The Commonwealth Bank is running an AWS ‘frontier’ AI agent in parallel with its on-call engineers, with the express aim of making early-morning wake-up calls less taxing.
The bank revealed the on-call use case, and several others, at AWS Summit Sydney earlier this month. Its early access to the AWS-built agent was touted at the vendor’s US conference at the end of last year, but there was scant information at the time about how it was being tested or used.
With CBA’s core banking now running in the cloud, the bank can use the AWS DevOps agent to troubleshoot problems in that environment, as well as in other systems that depend on cloud services.
On-call support workflows in most organisations rely on engineers to connect the dots by correlating data to identify a root cause and a mitigation or resolution.
Head of cloud services Jason Sandery said that for incidents occurring outside business hours, the on-call team often comprised engineers with varied domain knowledge and skill sets.
“With an on-call rotation, [there are] different engineers with different skills and experiences. An identity expert is awesome at identity, [but] they might not be the greatest at troubleshooting a hardcore network problem,” he said.
A broader complication is the nature of after-hours on-call support, which forces engineers out of bed to deal with potentially critical incidents and alerts.
“Nobody is at their best at two o’clock in the morning when you get a phone call asking why core banking is down,” Sandery said.
“Responding quickly and being able to rapidly troubleshoot from a cold start in a complex and really high pressure environment every time [is] a really hard ask.
“We’re trying to remove one of the hardest parts of incident response - the cold start - and really cold, get me out of bed cold.
“We want to reduce the cognitive load on engineers through that process, especially after hours - and part of that is [also] helping to improve decisions that are made during those moments of duress.”
Sandery said that, typically, when an issue arises it triggers an alert that is logged through PagerDuty and picked up by an on-call engineer.
“This is when the clock really starts. The moment the engineer picks up that phone, gets out of bed, grabs the laptop, finds a quiet spot, boots up, VPN, MFA - just getting to the console in these parts on a good day is 20-to-30 minutes, and during this time, probably arguably the most important 20 minutes, there’s no investigation because the platform team, largely us or others, are still getting up out of bed,” Sandery said.
“Once they’re logged in, the investigation begins. If it’s a really complex problem, level one usually taps out, they tip level two and we go through that same process - page out, wake up, log in, get in. Then, the hardcore investigation starts - anywhere from 60 minutes to three hours or longer.”
With the DevOps agent, the process is “forked” at the alarm stage.
“The alarm goes off, PagerDuty still goes off, [and the on-call] engineer still gets the phone call,” Sandery said.
“This time, though, as it hits PagerDuty, we’re sending the same context and same alarm information directly to the DevOps agent.
“So, while the engineer is waking up, the agent has already started its investigation - checking accounts, resources. It’s getting that information and correlating the signals … so by the time the engineer gets their laptop up and running, they’re not staring at an empty dashboard. They’re presented with an investigation summary, a likely root cause and a suggested set of remediation activities.
“Now, the engineer’s job isn’t where do I start - it’s do I agree with this recommendation. That alone removes a huge amount of cognitive load from a high stress situation.”
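The fork Sandery describes amounts to a fan-out: one alarm payload is delivered to several independent consumers, so the paging path and the agent path both receive the same context. The Python below is a minimal sketch under stated assumptions; `Alarm`, `fork_alarm`, `page_on_call` and `start_agent_investigation` are hypothetical names rather than CBA’s, PagerDuty’s or AWS’ actual integration, and the payload fields are illustrative, not PagerDuty’s event schema.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alarm:
    # Hypothetical alarm payload; fields are illustrative, not PagerDuty's event schema.
    alarm_id: str
    service: str
    summary: str
    details: dict = field(default_factory=dict)


# A sink is any consumer of an alarm: the paging path or the agent path.
Sink = Callable[[Alarm], None]


def fork_alarm(alarm: Alarm, sinks: list[Sink]) -> list[str]:
    """Deliver the same alarm to every sink, in order; return names of sinks that raised.

    Each sink is called independently, so an exception in one path (say, the agent)
    does not stop the others (say, the page to the on-call engineer).
    """
    failed: list[str] = []
    for sink in sinks:
        try:
            sink(alarm)
        except Exception:
            failed.append(getattr(sink, "__name__", repr(sink)))
    return failed


if __name__ == "__main__":
    # Hypothetical stand-ins for the two paths described in the article.
    def page_on_call(alarm: Alarm) -> None:
        print(f"paging on-call engineer for {alarm.alarm_id}: {alarm.summary}")

    def start_agent_investigation(alarm: Alarm) -> None:
        print(f"agent investigating {alarm.service} with context {alarm.details}")

    alarm = Alarm("ALM-001", "core-banking", "Elevated error rate", {"region": "ap-southeast-2"})
    failures = fork_alarm(alarm, [page_on_call, start_agent_investigation])
    print("failed sinks:", failures)
```

Listing the page first reflects Sandery’s point that the engineer still gets the phone call regardless of what the agent does. This sketch only isolates exceptions; a production version would likely dispatch asynchronously so that a slow agent call cannot delay the page.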
Sandery said the agent’s involvement could cut the time needed to identify a root cause to 30-to-50 minutes.
“It’s pretty cool but it’s not magic: it’s really just removing a lot of that dead time and helping with the manual correlation to put together a root cause.
“It’s largely [getting to] the same root cause that our engineers would get to, but just a little bit quicker.”
The agent’s capability was demonstrated to CBA early on, when AWS first gave the bank access to the DevOps agent while the two were working together in Seattle.
The bank was the first organisation anywhere in the world to gain access to the beta version of the agent back in September, AWS confirmed.
Sandery said the team purposely threw a challenging problem at the agent, expecting it to fail.
“We recreated a gnarly network problem that took us generally about five-to-six hours to diagnose and fix, [that had a] pretty large-ranging impact in [production],” he said.
“We recreated it, pointed the agent at it, laughed … [and] waited for [it to produce AI] slop.
“[But after] 25 minutes, it identified the root cause, and gave us essentially what we ended up doing in [production], just a slightly separate way. And I think it was at that point we were like, this could be really important to help our engineers when they need it the most.”
For code-related issues, on-call engineers can go a step further: beyond using the DevOps agent to speed up identifying the issue, they can engage a second agent to produce a fix.
“If you’re lucky enough to have an internal AI-powered engineering team - mad props to ours - they’ve created a code agent called Patchwork,” he said.
“We’re able to actually take that mitigation step, plug it into Patchwork and hey presto, if it’s a code-related or an IaC [infrastructure-as-code]-related problem, [the] code agent actually gives you that fix - obviously with a human in the loop, [who] must review [the output before applying it].”
General problem resolution
The bank isn’t limiting its use of the DevOps agent to incident response in high-pressure on-call situations.
Sandery said that more than 10,000 engineers use the bank’s cloud platform daily, creating “a huge amount of demand for general support work - ‘How come I can’t access that database?’, ‘Why can’t I get to the internet?’ or ‘Why did my build fail?’”
“A lot of our effort isn’t [on] incidents - thank God, quite frankly - it’s questions, investigations, support tickets. These are really important - they either mean a team is blocked or something is going to cause an incident,” Sandery said.
“[By] building this [DevOps] agent directly into our platform, it’s really going to reduce customer wait time on this, and it’s also going to remove a lot of repetitive diagnostic work that our central teams do.
“So, CBA teams are getting access to a new engineering tool - one that’s always on, always on-call, that can rapidly troubleshoot from a cold start, [but also] help you debug your app and our complex foundational platforms.
“In fact, it can also help you debug and troubleshoot the underlying AWS platform, which we’ve never had before - and answer questions like why you can’t connect to a certain database.”
Sandery said the cloud enablement team at CBA is aiming for the DevOps agent to handle about half of all internal support and AWS support requests.
‘New and unusual patterns’
Separately, in the first-day keynote at AWS Summit Sydney, general manager of core banking technology Simon Davies hinted at additional detection roles for the DevOps agent in the cloud-based banking core.
“For a class of known issues, we could already take action without any human intervention, but what I find really exciting about the DevOps agent is that it can detect new and unusual patterns - things we haven’t seen before - and it can respond in ways beyond what we’ve explicitly defined,” Davies said.
“Instead of us stitching together signals and manually analysing the correlation between them, this system observes its own state, reasons about it and acts.
“And in that way, we move from systems that are operated to systems that operate parts of themselves.”
Ry Crozier attended AWS Summit Sydney as a guest of AWS.
