Typo Blamed for Amazon's Internet-Crippling Outage
By Samuel Gibbs
Published: March 3, 2017

Amazon has blamed the outage of its S3 web service, which took down many different sites, services and devices across the internet, on a typo.

The failure on Tuesday of a critical part of Amazon Web Services (AWS) called S3 (Simple Storage Service) led to sites such as Business Insider and Medium failing, while some people found they could not turn on their internet-connected lightbulbs because the automation service IFTTT was knocked offline.

Amazon said that at the time of the outage one of its engineers was attempting to diagnose why its billing service for S3 was running slowly. The engineer attempted to take a small subset of the servers for one of S3’s subsystems involved in billing offline for inspection, executing a command from Amazon's "established playbook."

Amazon said in an apology to customers: "Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region."

Like most other cloud providers, Amazon builds S3 and the other services under the AWS banner with redundancy in mind, allowing individual parts to fail without taking out the whole system. But it seems that accidentally taking offline the wrong servers, and too many of them, set off a cascade of larger problems.

The problem was compounded by the fact that Amazon had not restarted the index subsystem, on which parts of AWS rely, for years.

Amazon said: "We have not completely restarted the index subsystem or the placement subsystem in our larger regions for many years. S3 has experienced massive growth over the last several years and the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected."

The issue actually only affected Amazon's Northern Virginia region, but that was enough to cause major problems for sites and services using that particular data center region.

Amazon apologized for the issue and said it has put safeguards in place to prevent human error from causing the same problems in the future.
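
One way to picture a safeguard against this kind of human error is a check that refuses any removal request that is too large or that would leave a subsystem short of servers. The sketch below is purely illustrative: the article does not describe Amazon's tooling, and every name and number in it (`Fleet`, `plan_removal`, `max_batch`, the server counts) is a hypothetical assumption.

```python
from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal request is rejected by a safety check."""


@dataclass
class Fleet:
    # Hypothetical model of one subsystem's servers, for illustration only.
    name: str
    active: int        # servers currently in service
    min_required: int  # fewest servers the subsystem needs to keep running


def plan_removal(fleet: Fleet, requested: int, max_batch: int = 5) -> int:
    """Validate a removal request and return how many servers may be removed.

    Rejects requests that are non-positive, larger than max_batch, or that
    would leave the fleet with fewer than min_required active servers.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive number of servers")
    if requested > max_batch:
        raise CapacityError(
            f"refusing to remove {requested} servers from {fleet.name}: "
            f"limit is {max_batch} per command"
        )
    remaining = fleet.active - requested
    if remaining < fleet.min_required:
        raise CapacityError(
            f"refusing to remove {requested} servers from {fleet.name}: "
            f"{remaining} would remain, minimum is {fleet.min_required}"
        )
    return requested


if __name__ == "__main__":
    billing = Fleet(name="billing", active=40, min_required=30)
    print(plan_removal(billing, 4))   # a small, intended request is accepted
    try:
        plan_removal(billing, 40)     # a mistyped, oversized request is rejected
    except CapacityError as err:
        print(err)
```

With a check like this, a mistyped quantity fails loudly before any server is touched, instead of quietly removing more capacity than the operator intended.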

© 2017 Guardian Web under contract with NewsEdge/Acquire Media. All rights reserved.
© Copyright 2017 NewsFactor Network. All rights reserved. Member of Accuserve Ad Network.