Amazon Web Services experienced an outage of roughly 40 minutes yesterday thanks to a network configuration mistake by a third-party provider, affecting a wide range of the cloud provider's customers.
The outage occurred between 10.25 and 11.07 AEST, when solutions provider AxCelX wrongly advertised that it could carry traffic bound for AWS, using the Border Gateway Protocol (BGP) that governs how data is routed between autonomous networks.
AWS posted a post-mortem of the incident, and said "providers should normally reject these routes by policy, but in this case, the routes were accepted and propagated to other ISPs, affecting some end users' ability to access AWS resources".
AxCelX owned up to the mistake and apologised on Twitter.
"Our sincere apologies to everyone who experienced a route leak via AS33083 of AWS. We have a new prefix-list facing Hibernia." (Axcelx, @Axcelx, July 1, 2015)
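To illustrate the kind of policy AWS describes and the prefix-list fix AxCelX mentions, here is a minimal Python sketch of a prefix filter. It is not either company's actual configuration: the function name `accept_route`, the allowed prefixes (taken from the reserved documentation ranges) and the /24 length limit are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical list of prefixes a neighbour is authorised to announce.
# These are reserved documentation ranges, not real AWS or AxCelX prefixes.
ALLOWED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def accept_route(prefix, allowed=ALLOWED_PREFIXES, max_len=24):
    """Accept an announced prefix only if it sits inside an allowed block
    and is no more specific than max_len (an assumed limit)."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    # Compare only networks of the same IP version, since subnet_of()
    # raises TypeError when the versions differ.
    return any(
        net.version == block.version and net.subnet_of(block)
        for block in allowed
    )


if __name__ == "__main__":
    for announced in ["192.0.2.0/24", "203.0.113.0/24", "192.0.2.128/25"]:
        verdict = "accept" if accept_route(announced) else "reject"
        print(announced, verdict)
```

A filter along these lines, applied to each neighbour, is what would stop a leaked route from being accepted and passed on to other ISPs.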
Nick Kephart of network performance monitoring firm ThousandEyes conducted an analysis of the outage after his company noticed that its corporate website and other facilities hosted on AWS were not accessible.
The analysis showed that AxCelX's BGP configuration mistake caused that company and ISP Hibernia to appear incorrectly in the network path to Amazon, resulting in heavy packet loss for traffic bound for AWS.
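A check of this kind can be sketched in a few lines of Python. This is not ThousandEyes' method, only an illustration: the function `unexpected_hops` is hypothetical, and every AS number other than AS33083 (the one named in AxCelX's tweet) is a placeholder from the range reserved for documentation.

```python
def unexpected_hops(as_path, expected_transit):
    """Return transit ASNs in as_path that are not in expected_transit.

    as_path is ordered from the nearest neighbour to the origin, so the
    last element (the origin AS) is excluded from the check.
    """
    flagged = []
    for asn in as_path[:-1]:
        if asn not in expected_transit and asn not in flagged:
            flagged.append(asn)
    return flagged


if __name__ == "__main__":
    # Hypothetical observed path: 64510 stands in for the origin network,
    # 64500 and 64501 for its usual transit providers.
    observed = [64500, 33083, 64501, 64510]
    print(unexpected_hops(observed, expected_transit={64500, 64501}))
```

Any AS number the function returns is one that does not normally sit between the observer and the destination, which is the signature of a leak like this one.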
While the outage was brief, Kephart noted it affected several well-known web properties such as Tinder, Netflix, financial firm Experian and Yelp.
The BGP routing information that networks exchange to direct data traffic is by and large trust-based, and is prone to configuration mistakes that can result in large-scale outages affecting millions of internet users.
In June this year, a massive route leak by Telekom Malaysia caused severe access problems for Australian and New Zealand providers, after the South East Asian telco's network was swamped with large amounts of traffic it wrongly said it could accept.