Code defect behind Easter outage for Azure

Recovery time from overloading exceeded service goals.

Microsoft has published a root cause analysis of an outage of its Azure Domain Name System that struck the cloud platform over Easter, causing intermittent failures for customers accessing and managing their Microsoft services globally.

The problems started at around 8.30 am on April 2, when the Azure DNS servers received an anomalous surge in queries for an unspecified set of domains hosted on Microsoft's cloud.

Microsoft said it was ready for such surges, with layers of caches and traffic shaping to mitigate the effect, but a bug in its DNS service made the overloading worse.

"In this incident, one specific sequence of events exposed a code defect in our DNS service that reduced the efficiency of our DNS Edge caches," the company said.

"As our DNS service became overloaded, DNS clients began frequent retries of their requests which added workload to the DNS service.

"Since client retries are considered legitimate DNS traffic, this traffic was not dropped by our volumetric spike mitigation systems."
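The retry effect Microsoft describes can be illustrated with a toy model. The sketch below is not Microsoft's code: the function name `offered_load`, the fixed per-interval capacity and all of the numbers are assumptions chosen purely for illustration. It counts how many queries reach a server in one interval when every unanswered query is sent again.

```python
def offered_load(base_queries: int, capacity: int, max_retries: int) -> int:
    """Total queries sent in one interval when unanswered queries are retried.

    Toy model (assumed, not from the root cause analysis): the server can
    answer at most `capacity` queries per interval, and each client resends
    an unanswered query up to `max_retries` times.
    """
    remaining_capacity = capacity
    pending = base_queries
    total_sent = 0
    for _ in range(max_retries + 1):  # first attempt plus retries
        if pending == 0:
            break
        total_sent += pending
        answered = min(pending, remaining_capacity)
        remaining_capacity -= answered
        pending -= answered
    return total_sent


# Below capacity, retries add nothing; above it, they multiply the load.
print(offered_load(800, 1000, 2))   # 800
print(offered_load(1500, 1000, 2))  # 2500
```

In this model a surge that is 50 percent over capacity ends up generating 2.5 times the server's capacity in traffic, and each retried query looks like a legitimate request.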

Multiple Microsoft services, including Azure, Office, Microsoft 365, Dynamics and Xbox Live, were impacted.

Some customers reported being unable to access the Azure service status web page, but it's not clear if that issue was related to the DNS outage.

Microsoft apologised for the impact caused by the outage and said it would repair the code defect so that all DNS requests can be effectively handled in cache.

At the same time, the company said the recovery time from the outage exceeded its design goals.

The Easter outage came just over two weeks after a wrongly removed digital key locked Microsoft customers out of their applications, causing access issues for 12 hours.

Copyright © iTnews.com.au. All rights reserved.