What we know about the massive Amazon Web Services outage

A massive outage at Amazon’s cloud computing service disrupted apps and websites around the world Monday, leaving customers and businesses feeling the effects.

Amazon Web Services said the outage, which started early in the morning, was resolved by Monday evening.

Here’s what we know about what happened.

What is AWS?

Amazon Web Services is Amazon’s cloud services unit, which provides digital services to companies, governments and individuals.

Shion Guha, professor of human-centred data science at the University of Toronto, says when most people think about Amazon, they think about its online store or Prime Video, but AWS is where the company makes a lot of its money.

Guha said businesses seek out AWS for its centralized network of servers and other distributed software products.

Businesses can use AWS to host their website, for example, and can buy computing power on AWS so they can run analysis on large amounts of data.

Amazon is the leading provider of cloud infrastructure and platform services, accounting for more than 41 per cent of the market, according to research group Gartner. Google and Microsoft are the next biggest competitors.

Who was affected?

The outage on Monday morning impacted popular social media apps like Snapchat and Reddit and disrupted businesses around the world.

Fortnite, Roblox, Clash Royale and Clash of Clans were among the games that were down, while Chime and PayPal’s Venmo were among the financial platforms that faced issues, according to outage-tracking website Downdetector.

The outage also affected food delivery and rideshare apps, as well as messaging app Signal, streaming apps Netflix and Disney+, video chat app Zoom, and Amazon’s own services, among many others.

Guha said Canadian customers would notice the outage in a variety of ways. For example, he said Canadian telecommunications firms host a significant portion of their back-end operations on AWS, so during an outage people are likely to have trouble doing things like paying their phone bills.

He said a number of agencies at all levels of government in Canada also use AWS, which could cause issues with, for example, trying to renew a driver’s licence.

“Essentially, our critical point of failure lies in the products of one company,” Guha said. “And so when there is an outage in one company, it has all of these networked effects to so many other businesses and government agencies.”

By 2:20 p.m. ET, Downdetector had received more than 13 million reports from users related to the incident, including more than 351,000 from Canada.

Davi Ottenheimer, a security operations and compliance manager and vice president at data infrastructure company Inrupt, told CBC News that we’ve seen massive global outages before, but this one is “unique in the way that it has affected so many major services.”

What happened?

The company said the problem stemmed from a cluster of data centres in northern Virginia, known as US-EAST-1. An issue with the Domain Name System, or DNS, prevented applications from finding the correct address for AWS’s DynamoDB API, a cloud database that stores user information and other critical data.

Ian Lin, director of research and development with cybersecurity firm Packetlabs, said that DNS translates domain names into numerical IP addresses so web users can reach a website by typing, for example, google.ca.

“So when the DNS goes down … then all these services kind of just can’t talk to each other,” he told CBC News.
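
To make Lin's point concrete, here is a minimal Python sketch of the lookup step an application performs before it can talk to another service. It is an illustration only, not AWS's code: the function name `resolve`, the `lookup` parameter and the hostname `example.com` are assumptions chosen for the example. When the lookup fails, the application has no address to connect to, even if the service behind that address is running normally.

```python
import socket


def resolve(hostname, port=443, lookup=socket.getaddrinfo):
    """Return the IP addresses DNS gives for hostname, or [] if the lookup fails.

    `lookup` defaults to the standard library resolver; it is a parameter
    (an assumption of this sketch) so a failure can be simulated offline.
    """
    try:
        results = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be translated into an address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    # Hypothetical hostname, used only for illustration.
    addresses = resolve("example.com")
    if addresses:
        print("Resolved to:", ", ".join(addresses))
    else:
        print("Lookup failed: the app has no address to connect to.")
```

An empty list here is the situation Lin describes: the service may be healthy, but callers cannot find it.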

An aerial view of an Amazon Web Services Data Center known as US East 1 in Ashburn, Va., Monday, where AWS says the outage originated. (Jonathan Ernst/Reuters)

The outage was at least the third time in five years that AWS’s US-EAST-1 cluster — its oldest and largest location for web services — contributed to a major internet meltdown.

Disruptions began just after 3 a.m. ET. Many applications gradually came back online in Canada during the afternoon, and AWS said it had resolved the issue by 6 p.m. ET.

Guha says the source of the problem could have been anything from a physical issue, like mice chewing through wires on a server farm, to a software glitch, or a virus or malware attack.

“That’s why it takes a long time to kind of fix these issues … they have to go through each and every one of their products to make sure that fixing this issue right here is not going to affect another thing down the line. So it’s one of the most complex things that they can do,” he said.

Patrick Burgess, a cybersecurity expert at U.K.-based BCS, The Chartered Institute for IT, told The Associated Press that there is no indication the outage was caused by a cyberattack.

“This looks like a good old-fashioned technology issue,” he said.

There are “well-established processes” to deal with outages at AWS, as well as rivals Google and Microsoft, Burgess said, adding that such outages are usually over in “hours rather than days.”
