What Do We Know About the Microsoft Azure Outage?

March 16, 2021 | Alexa Hernandez

On March 15, Microsoft experienced a widespread outage in Azure Active Directory. According to Microsoft, the outage was caused by a “rotation of keys.” The result was a 14-hour outage that took down Office 365, Dynamics 365, Xbox Live, Teams, and additional third-party apps that depend on Azure for authentication.

We’ve said it before, and we’ll say it again: no business, big or small, is immune to outages if it is not managing the full lifecycle of its machine identities. When automation and full certificate visibility aren’t the backbone of your machine identity management strategy, error is inevitable. Let’s take a closer look at what exactly went wrong at Microsoft, and how you can avoid a similar occurrence at your organization using Venafi’s comprehensive platform for machine identity management.

What Exactly Went Wrong With Microsoft Azure?

Officials from Microsoft have confirmed that on March 15th “an error occurred in the rotation of keys used to support Azure AD's use of OpenID, and other, Identity standard protocols for cryptographic signing operations.”

As part of Microsoft’s standard security practices, an automated system eliminates redundant keys. According to Microsoft, for the last few weeks “a particular key was marked as 'retain' for longer than normal to support a complex cross-cloud migration. This exposed a bug where the automation incorrectly ignored that 'retain' state, leading it to remove that particular key."
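To make the failure mode concrete, here is a minimal sketch of a key cleanup routine that honors a 'retain' flag. It is purely illustrative and is not Microsoft's code: the `SigningKey` class, the `retain` field, the `keys_to_remove` function, and the 30-day threshold are all assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SigningKey:
    # All fields are hypothetical, chosen only for this illustration.
    key_id: str
    created_at: datetime
    retain: bool = False  # when True, cleanup must leave the key in place


def keys_to_remove(keys, now, max_age):
    """Return the ids of keys older than max_age that are NOT marked retain."""
    return [
        k.key_id
        for k in keys
        if not k.retain and now - k.created_at > max_age
    ]


now = datetime(2021, 3, 15, tzinfo=timezone.utc)
keys = [
    SigningKey("old", now - timedelta(days=90)),
    SigningKey("migration", now - timedelta(days=90), retain=True),
    SigningKey("current", now - timedelta(days=1)),
]

removable = keys_to_remove(keys, now, timedelta(days=30))
print(removable)  # ['old']
```

Dropping the `not k.retain` check from the filter would also return "migration", which is the kind of mistake Microsoft describes: the automation ignored the 'retain' state and removed a key that was still in use.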

Once that key was removed, any app using Azure AD authentication immediately started rejecting tokens that were signed with the removed key. The result? All Microsoft users who attempted to log in to affected apps and third-party services were rejected.
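The sketch below shows, under simplifying assumptions, why removing a published key causes otherwise valid tokens to be rejected. It uses HMAC from the Python standard library as a stand-in for the asymmetric signatures that real OpenID deployments use, and the token layout, the `kid` field name, and the `sign` and `verify` helpers are hypothetical.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: bytes) -> bytes:
    # HMAC-SHA256 stands in for a real asymmetric signature here.
    return hmac.new(secret, payload, hashlib.sha256).digest()


def verify(token: dict, published_keys: dict) -> bool:
    """Accept a token only if its signing key is still published."""
    secret = published_keys.get(token["kid"])
    if secret is None:
        return False  # the key named by the token is gone, so reject
    expected = sign(token["payload"], secret)
    return hmac.compare_digest(expected, token["signature"])


published_keys = {"key-a": b"secret-a", "key-b": b"secret-b"}
token = {
    "kid": "key-a",
    "payload": b"user=alice",
    "signature": sign(b"user=alice", published_keys["key-a"]),
}

accepted_before = verify(token, published_keys)
del published_keys["key-a"]  # simulate the key being removed
accepted_after = verify(token, published_keys)
```

The token itself never changes in this example. Only the set of published keys does, which is enough to turn an accepted login into a rejected one.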

While Microsoft did swiftly take action to mitigate the impact, the outage couldn’t be immediately reversed due to “different server implementations that handle caching differently.” It wasn’t until the affected apps had picked up the updated key metadata and refreshed their caches that users could regain access to their accounts.
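As a rough illustration of why recovery depended on cache refreshes, the following sketch models an app that caches key metadata for a fixed time. The `KeyCache` class, its time-to-live policy, and the injected `fetch` and `clock` callables are assumptions made for this example; real server implementations differ, which is the point Microsoft makes.

```python
import time
from typing import Callable, Dict, Optional


class KeyCache:
    """Hypothetical cache that reloads key metadata once its TTL expires."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, bytes]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Dict[str, bytes] = {}
        self._loaded_at: Optional[float] = None

    def get(self, key_id: str) -> Optional[bytes]:
        expired = (
            self._loaded_at is None
            or self._clock() - self._loaded_at >= self._ttl
        )
        if expired:
            # Copy the metadata so later upstream changes are not seen
            # until the next refresh.
            self._keys = dict(self._fetch())
            self._loaded_at = self._clock()
        return self._keys.get(key_id)


fake_now = [0.0]  # controllable clock for the demonstration
published = {"key-a": b"a"}
cache = KeyCache(lambda: published, ttl_seconds=3600, clock=lambda: fake_now[0])

first = cache.get("key-a")        # loads metadata into the cache
published.clear()
published["key-b"] = b"b"         # upstream metadata is corrected
stale = cache.get("key-b")        # cache has not refreshed yet
fake_now[0] += 3600
fresh = cache.get("key-b")        # TTL elapsed, cache reloads
```

In this sketch the app keeps using its old copy of the metadata until the TTL elapses, so a fix published upstream only takes effect after each app's next refresh.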

On the outage, Microsoft released a statement saying that they “understand how incredibly impactful and unacceptable this is and apologize deeply. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future.”

How Could Venafi Have Helped Prevent This Outage?

According to Michael Thelander, Venafi Director of Product Marketing, “Poorly orchestrated key rotation is the Achilles heel of modern digital transformation efforts; this oversight [is] capable of bringing down entire applications and services in an instant.”

This is no isolated incident or freak occurrence. Outages, audit failures, lack of visibility… these are all the result of failing to secure and manage machine identities across your entire organization.

“Unfortunately, these kinds of outages will only continue until organizations adopt an enterprise-wide approach to managing the machine identities these keys and certificates represent,” Thelander comments.

“Digital transformation is not going to slow down, and this requires automation of keys and certificates found in workloads, containers, and across cloud environments as well as those in on-premises environments.”


To kickstart your digital transformation, learn more about how a single platform for enterprise-wide machine identity management can help you eliminate certificate-related outages for good!

About the author

Alexa Hernandez

Alexa is the Web Marketing Specialist at Venafi.
