AWS Outage Disrupts Major Platforms Following DNS Resolution Failure

A significant AWS outage in the US-EAST-1 region caused widespread service disruptions for major platforms including Slack, Atlassian, and Snapchat.
The incident was triggered by DNS resolution issues affecting DynamoDB service endpoints, cascading to multiple AWS services.
Full service restoration took over 15 hours, highlighting critical dependencies in cloud infrastructure.

Amazon Web Services experienced a major service disruption on Monday, with user reports indicating problems beginning around 9:13 AM EDT according to Downdetector data. The outage originated in AWS's critical US-EAST-1 region in Northern Virginia and quickly cascaded to affect numerous major platforms and applications.

The incident was caused by DNS resolution issues affecting DynamoDB service endpoints, which then propagated to impact multiple AWS services and the applications that depend on them. According to people familiar with the matter, the DNS problems created a cascading failure that affected even Amazon's own retail operations and subsidiary services.

AWS engineering teams identified the DNS resolution problem and implemented an initial mitigation, though full recovery proved more complex. "We're continuing to work towards full recovery of all services," an AWS spokesperson said in a statement. The company temporarily throttled certain operations, including EC2 instance launches, to facilitate complete system restoration.

Monitoring data from ThousandEyes confirmed the issue resided within AWS's backend systems rather than network infrastructure, with no coinciding network events detected. This marks another in a series of outages affecting US-EAST-1, which remains one of AWS's oldest and most heavily used regions.

For enterprise customers, the disruption highlighted the risks of cloud concentration and single-region dependencies. Many organizations following AWS best practices maintain multi-region architectures precisely to limit the impact of such outages, though cost considerations lead others to concentrate resources in a single region.
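As a rough illustration of the multi-region idea, the sketch below picks the first region whose DynamoDB endpoint hostname still resolves. The region list, the function names, and the choice of a DNS lookup as the health check are assumptions made for this example, not a description of how AWS or any affected customer handles failover. A successful lookup does not prove the service is healthy, and failing over only helps if the data is already replicated to the fallback region.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical preference order; a real deployment would choose its own regions.
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]


def dynamodb_host(region: str) -> str:
    """Return the regional DynamoDB endpoint hostname for a region."""
    return f"dynamodb.{region}.amazonaws.com"


def resolves(host: str) -> bool:
    """Return True if the hostname resolves through the system resolver."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def pick_region(regions: Iterable[str] = REGIONS,
                check: Callable[[str], bool] = resolves) -> Optional[str]:
    """Return the first region whose endpoint passes the check, or None."""
    for region in regions:
        if check(dynamodb_host(region)):
            return region
    return None
```

The check is passed in as a parameter so that it can be swapped for a stronger probe, such as a real read against a replicated table, or replaced with a stub in tests.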

AWS has not yet commented on whether customers will receive service level agreement credits, though the extended duration of the outage likely makes many enterprise customers eligible for compensation under their contracts. The company typically publishes a detailed post-mortem analysis following significant incidents, which customers will be watching closely for insights into preventing similar disruptions.
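To give a sense of scale, the arithmetic behind a monthly uptime figure can be sketched as follows. It assumes, purely for illustration, that a service was completely unavailable for 15 hours in a 31-day month; real SLA definitions of downtime and the thresholds that trigger credits vary by service and are not restated here.

```python
def monthly_uptime_percent(downtime_hours: float, days_in_month: int = 31) -> float:
    """Return uptime as a percentage of all hours in the month."""
    total_hours = days_in_month * 24
    return 100.0 * (total_hours - downtime_hours) / total_hours


# 15 hours of downtime out of 744 hours in a 31-day month
print(round(monthly_uptime_percent(15), 2))  # prints 97.98
```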

Correction: An earlier version of this article misstated the exact start time of the AWS outage. The incident began at approximately 11:49 PM PDT on October 19 (6:49 AM UTC on October 20).