The Amazon Web Services outage in northern Virginia was caused by a software bug in an automated DNS management system that led one automated component to delete another’s work.
The cloud provider published an extensive post-incident report late on Friday, Australian time, shedding light on a disruption widely described as the biggest to hit internet infrastructure in more than a year.
The post-incident report notes that “there were three distinct periods of impact to customer applications”, though the initial problems with DynamoDB are likely to be of most interest.
The official root cause is attributed to “a latent race condition in the DynamoDB DNS management system that resulted in an incorrect empty DNS record for the service’s regional endpoint (dynamodb.us-east-1.amazonaws.com) that … automation failed to repair.”
The race condition, a type of software bug, involved “an unlikely interaction” between two automated components of the same type in the DynamoDB DNS management architecture.
AWS said there are two distinct components in the architecture: a “DNS Planner, [which] … periodically creates a new DNS plan for each of the service’s endpoints”, and DNS Enactors that “pick up the latest plan” and systematically apply it to the endpoints.
“This process typically completes rapidly and does an effective job of keeping DNS state freshly updated,” AWS said.
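In outline, this is a planner/worker split: one component computes the desired DNS state and others apply it. The sketch below is purely illustrative, using invented names such as DnsPlan, DnsPlanner and DnsEnactor rather than anything from AWS’s actual implementation, but it captures the shape of the architecture the report describes.

    # Illustrative sketch only; names and structures are assumptions,
    # not AWS's actual code.
    from dataclasses import dataclass

    @dataclass
    class DnsPlan:
        generation: int                   # monotonically increasing plan version
        records: dict                     # endpoint name -> list of IP addresses

    class DnsPlanner:
        """Periodically produces a new DNS plan for each service endpoint."""
        def __init__(self):
            self.generation = 0

        def create_plan(self, records: dict) -> DnsPlan:
            self.generation += 1
            return DnsPlan(self.generation, records)

    class DnsEnactor:
        """Picks up the latest plan and applies it to DNS, endpoint by endpoint."""
        def __init__(self, dns_state: dict):
            self.dns_state = dns_state    # shared DNS state all Enactors write to

        def apply(self, plan: DnsPlan):
            for endpoint, ips in plan.records.items():
                self.dns_state[endpoint] = {"generation": plan.generation, "ips": ips}

In the happy path, each Enactor applies an up-to-date plan and the shared DNS state converges on the newest one, which is why AWS says the process normally keeps DNS state “freshly updated”.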
AWS said the work of different DNS Enactors can sometimes overlap, usually without problems.
But, in this instance, one DNS Enactor “experienced unusually high delays, needing to retry its update on several of the DNS endpoints” while another Enactor picked up a newer plan and “rapidly” applied it to endpoints.
“The timing of these events triggered the latent race condition,” AWS said.
“When the second Enactor (applying the newest plan) completed its endpoint updates, it then invoked [a] clean-up process, which identifies plans that are significantly older than the one it just applied and deletes them,” AWS said.
“At the same time that this clean-up process was invoked, the first Enactor (which had been unusually delayed) applied its much older plan to the regional DynamoDB endpoint, overwriting the newer plan.
“The second Enactor’s clean-up process then deleted this older plan because it was many generations older than the plan it had just applied.
“As this plan was deleted, all IP addresses for the regional endpoint were immediately removed”.
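Laid out as a timeline, the failure mode is easier to see. The snippet below is a simplified reconstruction based only on the wording of the report; the data model, generation numbers and IP addresses are invented for illustration, but the sequence of events mirrors AWS’s description: a delayed Enactor overwrites a newer plan with a stale one, and the other Enactor’s clean-up then deletes that stale plan, leaving the endpoint with no addresses.

    # Simplified reconstruction; data model, thresholds and addresses are invented.
    ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

    dns_state = {}     # endpoint -> {"generation": ..., "ips": [...]}
    plan_store = {}    # plan generation -> records kept around until cleaned up

    def enactor_apply(generation, ips):
        """An Enactor applying its plan; the last writer wins."""
        plan_store[generation] = ips
        dns_state[ENDPOINT] = {"generation": generation, "ips": ips}

    def enactor_cleanup(just_applied, stale_threshold=5):
        """Delete plans significantly older than the one just applied."""
        for gen in list(plan_store):
            if gen <= just_applied - stale_threshold:
                del plan_store[gen]
                # If that stale plan is what DNS is currently serving,
                # deleting it empties the endpoint's record set.
                if dns_state.get(ENDPOINT, {}).get("generation") == gen:
                    dns_state[ENDPOINT]["ips"] = []

    # 1. A faster Enactor applies the newest plan (generation 20)...
    enactor_apply(20, ["203.0.113.10", "203.0.113.11"])
    # 2. ...then the delayed Enactor finally finishes, overwriting it
    #    with its much older plan (generation 10).
    enactor_apply(10, ["198.51.100.5"])
    # 3. The faster Enactor's clean-up now deletes generation 10 as too old,
    #    even though it is the plan actually serving the endpoint.
    enactor_cleanup(just_applied=20)

    print(dns_state[ENDPOINT])   # {'generation': 10, 'ips': []} -- empty record

With the active plan deleted, the regional endpoint resolved to nothing, which is consistent with AWS’s statement that “all IP addresses for the regional endpoint were immediately removed”.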
AWS said that “manual operator intervention” was ultimately required to mitigate the incident.
As an immediate step, AWS said it has disabled both the “DNS Planner and the DNS Enactor automation worldwide”.
“In advance of re-enabling this automation, we will fix the race condition scenario and add additional protections to prevent the application of incorrect DNS plans,” the cloud provider said.
The problems with DynamoDB in US-EAST-1 led to disruptions of other AWS cloud services that depend on it.
Problems with EC2 instances arose because a subsystem that depends on DynamoDB to function was unable to reach the service, and that failure had flow-on impacts.
Other AWS services that depend on DynamoDB also experienced issues during the incident.
