Failed DNS server restarts caused Salesforce outage

By Juha Saarinen

May 22 2021 1:07PM

Configuration change "exposed a design issue in the shutdown process".

Domain name servers that did not restart as expected after a configuration change caused Salesforce's services to go down worldwide on May 12, the company said in a final root cause analysis of the incident.

On that day, "a configuration change was made as an emergency fix at the network tier, which was designed to address a functional gap in preparation for an upcoming maintenance activity," Salesforce said.

Salesforce uses the Berkeley Internet Name Domain (BIND) software for DNS.

The change, made using a script, was intended to enable DNS resolution between an existing Salesforce Australia data centre and a new Hyperforce environment set to undergo maintenance.

The script, which Salesforce says had been used for the past three years without ill effects, used an internal method called Metazone change.

This deploys new configuration data through a DNS zone transfer, but in the May 12 incident the script did not behave as expected.

A UNIX kill command did not wait long enough for the BIND named process to exit cleanly or to remove its process identification (PID) file.
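As an illustration of the shutdown problem, the following is a minimal Python sketch of a stop routine that waits for a process to exit before doing anything else, and clears the PID file itself if the process had to be force-killed. It is not Salesforce's script. The function name `stop_and_wait`, the timeout value and the use of a `subprocess.Popen` child handle are assumptions made for the example.

```python
import os
import signal
import subprocess


def stop_and_wait(proc: subprocess.Popen, pid_file: str, timeout: float = 10.0) -> bool:
    """Stop a child process and make sure no PID file is left behind.

    Returns True if the process exited after SIGTERM within `timeout`
    seconds, False if it had to be force-killed. (Hypothetical helper.)
    """
    proc.send_signal(signal.SIGTERM)  # ask the process to shut down cleanly
    try:
        proc.wait(timeout=timeout)    # give it time to clean up after itself
        clean = True
    except subprocess.TimeoutExpired:
        proc.kill()                   # escalate only after the grace period
        proc.wait()
        clean = False

    # Whichever path was taken, a leftover PID file would block the next start.
    try:
        os.remove(pid_file)
    except FileNotFoundError:
        pass
    return clean
```

The point of the sketch is the ordering: the caller learns whether the shutdown was clean, and the PID file is gone before any restart is attempted.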

On restart, the named startup script checks for an existing PID file to determine if an instance is already running.

If the script finds a PID file, it exits immediately, and as a result, the named DNS server process did not restart.
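A startup check that trusts the mere presence of a PID file can be fooled by a stale file. The sketch below shows one common alternative on POSIX systems: read the PID from the file and test whether that process still exists before refusing to start. The function names `pid_is_running` and `should_start` are hypothetical, and nothing here reflects how the actual named startup script is written.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)          # signal 0 checks for existence, sends nothing
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # process exists but belongs to another user
    return True


def should_start(pid_file: str) -> bool:
    """Decide whether a new instance may start, removing a stale PID file."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return True              # no PID file, nothing is running
    except ValueError:
        pid = -1                 # unreadable contents, treat as stale

    if pid_is_running(pid):
        return False             # a live instance owns the PID file

    os.remove(pid_file)          # stale file left by an unclean shutdown
    return True
```

A check like this still has limits, since PIDs can be reused by unrelated processes, so it is a mitigation rather than a guarantee.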

Salesforce said the script failure had global impact because the Metazone changes were deployed to named servers across all its data centres worldwide.

Many named services failed to restart, causing widespread disruption for Salesforce customers.

A lack of automation with safeguards for DNS changes to protect against unforeseen incidents was a contributing factor for the outage, along with insufficient guardrails to enforce the change management process.
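The article does not say what safeguards Salesforce plans to add. As a generic illustration of the idea, the sketch below applies a change to servers in small batches and stops at the first failed health check, so a faulty change cannot reach every data centre at once. The names `staged_rollout`, `apply_change` and `is_healthy` are hypothetical placeholders, not part of any Salesforce or BIND tooling.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    batches: Iterable[Iterable[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply a change batch by batch, halting on the first unhealthy server.

    Returns the servers that were changed and passed their health check.
    Raises RuntimeError as soon as one server fails, leaving all later
    servers and batches untouched.
    """
    completed: List[str] = []
    for batch in batches:
        for server in batch:
            apply_change(server)
            if not is_healthy(server):
                raise RuntimeError(f"rollout halted: {server} failed health check")
            completed.append(server)
    return completed
```

In use, the first batch would typically be a single low-risk server, with later batches growing only after the earlier ones have been confirmed healthy.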

Salesforce's Sales, Service, Marketing, Commerce, Government and Experience Clouds all became inaccessible for users, along with Heroku, Pardot and Industries.

Adding to Salesforce customers' woes, the status.salesforce.com site experienced such high traffic that it, too, became unavailable.

Customers were also unable to log support cases due to multi-factor authentication problems.

Salesforce has apologised for the outage and has put a moratorium in place on all DNS changes across the company.

The script that triggered the outage has also been removed.
