Microsoft, the company that most organizations rely on for internal certificates, had a bit of trouble managing its own external certificates. This morning users around the world were unable to log in to Microsoft Teams due to an apparent outage. The company later identified the cause of the outage as an expired authentication certificate. It’s still unclear how many of the service's 20 million users worldwide were impacted.
The good news for Microsoft is that they were able to identify the cause of the outage and remediate it relatively quickly by applying a new certificate to the service. The bad news is that they were left with a little egg on their face for a simple and easily avoidable mistake that had (apparently) widespread impact.
“We've determined that an authentication certificate has expired causing, users to have issues using the service. We're developing a fix to apply a new certificate to the service which will remediate impact. Further updates can be found under TM202916 in the admin center.” (Microsoft 365 Status, @MSFT365Status, February 3, 2020)
Many organizations are not so lucky. It often takes hours to identify the cause of an outage as an expired certificate. And it may then take just as long to locate and replace it.
Kevin Bocek, vice president of security strategy and threat intelligence at Venafi, warns that the Microsoft outage is not as uncommon as we’d like it to be. “Microsoft is experiencing something that happens every day to Global 5000 businesses. Certificates can take weeks to renew and mistakes are often made. These mistakes can cause a service or application to go down for hours, days, and, in some cases, even longer. This is not a unique occurrence, and unfortunately Microsoft Azure and LinkedIn have experienced outages due to expired certificates in the past.”
Certificates play a bigger role in our security infrastructure than many imagine. As Kevin Bocek notes, “The main issue is that certificates act as authenticators for machines; they authorize machine-to-machine connections and communications. Keys and certificates serve as machine identities and they are critical to [making] today’s global economy work. When they expire, or are untrusted, business stops.”
Indeed, in recent history, we’ve seen high-profile certificate outages with much bigger impact. In December of 2018, a 24-hour outage that impacted more than 30 million customers of multiple U.K.-based mobile providers—including O2, Tesco Mobile and Sky Mobile—was traced back to the expiration of one or more certificates. Even worse, recent government reports on Equifax’s 2017 breach all pointed to the expiration of a certificate and the failure of internal systems to compensate for the loss of this control.
All of that adds up to lost availability, productivity and reputation, and it can even lead to regulatory fines and threaten job security. As Eva Hanscom mentioned in a recent blog, “Leading analysts report that the cost of a critical infrastructure outage in Global 5000 organizations can average $5,600 per minute, or more than $300,000 per hour. For large networks, severe outages can take days to resolve and cost as much as $500,000 per hour or more.”
Why is this issue still impacting large, security-conscious organizations around the world?
According to Kevin Bocek, “The problem is that most businesses and government agencies are using thousands of certificates but they don’t have the insight or automation needed to replace certificates before they expire. And, an outage based on a failed certificate is really painful, not just for consumers but also for the IT and security teams trying to fix them. Finding an expired certificate manually is like looking for a very specific needle in a stack of needles.”
This problem is becoming even more critical as the volume and rate of change of machine identities, such as TLS certificates, increase. Without comprehensive visibility and intelligence, certificate-related outages will remain a relatively common, and ultimately painful, occurrence. Automating certificate renewals throughout your infrastructure is also critical preventive medicine for eliminating outages.
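To make the visibility point concrete, here is a minimal sketch of the kind of check that can flag a certificate before it expires. It is an illustration only, not how Microsoft or Venafi do it: the function names and the 30-day threshold are assumptions made for this example, and it uses only Python's standard `ssl` and `socket` modules.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Feb  4 12:00:00 2020 GMT". A negative result means the
    certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True if the certificate expires within `threshold_days` (an assumed policy)."""
    return days_until_expiry(not_after, now) <= threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the server certificate's notAfter.

    With the default context, an already-expired certificate fails the
    handshake with ssl.SSLCertVerificationError, which is itself a signal
    that something needs attention.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

In practice, a scheduled job could call `fetch_not_after` for every host in an inventory and raise an alert whenever `needs_renewal(..., datetime.now(timezone.utc))` returns true. A check like this only covers certificates you already know about and can reach over the network, which is why discovery and automated renewal matter as much as monitoring.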
Does your organization have the visibility, intelligence and automation you need to avoid the aftermath of an embarrassing, and potentially risky, certificate outage?