Deployment-related outages at Microsoft and Amazon. What we can learn.
Two weeks ago the high-profile outage was Amazon’s S3. This week it was widespread outages at Microsoft’s Skype, Outlook, Xbox, OneDrive, and Hotmail.
Those of us on the outside looking in can never be 100% certain of all the actions (and inactions) that led up to an event, but the recent Amazon and Microsoft outages have a few things in common:
Neither company blamed an outside attack or hack.
Neither company blamed equipment failures.
Both blamed errors in deploying changes.
As Microsoft noted on their Azure status history page: https://azure.microsoft.com/en-us/status/history/
“Engineers identified a recent deployment task as the potential root cause. Engineers rolled back the recent deployment task to mitigate the issue.”
In explaining their own outage two weeks ago, Amazon said that “one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended,” allowing “too much capacity to be removed too quickly.”
MTTD is Key:
Mean Time to Detect (MTTD) measures how long a problem goes unnoticed, and it is the key metric when hunting for the root cause of an outage. Orca’s ‘at-a-glance’ feature highlights all configurations that are out of compliance, have recently changed or drifted from their desired state, or could adversely affect deployments. Because time is key, Orca can also alert you to configuration issues by email. No more guessing or wondering and, more importantly, no more waiting.
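To make “drifted from their desired state” concrete, here is a minimal sketch of drift detection in Python. It is an illustration only and not how Orca works internally: the function name `find_drift`, the `MISSING` marker, and the sample settings are all hypothetical, and it assumes a configuration can be represented as a flat dictionary.

```python
from typing import Any

# Hypothetical marker for a setting present on one side only.
MISSING = "<missing>"


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every setting that differs."""
    drift = {}
    # Look at every setting that appears on either side.
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


# Illustrative values only: the desired state vs. what is actually deployed.
desired = {"max_connections": 100, "tls": "1.2"}
actual = {"max_connections": 500, "tls": "1.2", "debug": True}

for setting, (want, have) in find_drift(desired, actual).items():
    print(f"{setting}: desired={want!r} actual={have!r}")
```

Running a comparison like this on a schedule, and alerting whenever the result is not empty, is one simple way to shorten the time between a bad change and someone noticing it.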
MTTR is Better By Far:
Mean Time to Recover (MTTR) from a failed configuration change or deployment is the essential metric during an outage. That’s why Orca has built-in automatic rollbacks of configuration deployments and application releases, and can quickly restore settings from a prior state or release in the event of a disaster.
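For readers who want to track these two numbers themselves, the following Python sketch shows one way to compute them from incident timestamps. The `Incident` record and the sample incidents are hypothetical, and the sketch assumes MTTD is measured from the start of the problem to its detection and MTTR from detection to recovery; some teams measure MTTR from the start of the problem instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime    # when the bad change took effect
    detected: datetime   # when the problem was noticed
    recovered: datetime  # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time from start of the problem to detection."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from detection to recovery (an assumption; definitions vary)."""
    return mean_delta([i.recovered - i.detected for i in incidents])


# Made-up incidents for illustration.
incidents = [
    Incident(datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 9, 20), datetime(2017, 3, 1, 10, 20)),
    Incident(datetime(2017, 3, 8, 14, 0), datetime(2017, 3, 8, 14, 10), datetime(2017, 3, 8, 14, 40)),
]

print("MTTD:", mttd(incidents))  # 0:15:00
print("MTTR:", mttr(incidents))  # 0:45:00
```

In this made-up data, detection took 20 and 10 minutes and recovery took 60 and 30 minutes, giving an MTTD of 15 minutes and an MTTR of 45 minutes.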
While there is no single silver bullet to prevent every possible outage, focusing on MTTD and MTTR will lessen the pain. To learn about using automation to slash your MTTD and MTTR, contact an Orca Product Specialist today.