Everyone knows that certificate outages are painful. Just ask anyone who has had to deal with the tangled aftermath of an expired certificate: there are so many unknowns, and so many unanticipated consequences. That's perhaps why, when it comes to measuring just how bad a given outage was, the details often get blurred by the post-traumatic stress, making it hard to get answers that quantify the impact. How long was the outage? Too long. How many systems were impacted? Too many. How much revenue was lost? Too much. But that particular type of denial won't help anyone prevent a similar outage in the future.
That’s why it’s so amazing that Epic Games was entirely transparent about a certificate outage that impacted the company on April 6. In the spirit of openness and goodwill, the company shared their outage story with the world. In their own words, “It is embarrassing when a certificate expires, but we felt it was important to share our story here in hopes that others can also take our learnings and improve their systems.”
The company goes on to reveal in-depth details about why the outage happened, how big the impact was, and how long it took to fix. This is incredibly valuable information to help organizations everywhere understand why they need to take certificate management seriously. This level of sharing is downright…well…epic! And I applaud Epic Games for this heroic level of candor and downright altruism.
It’s bad enough when one system goes down. But what you will see in the story that Epic Games shares is that certificate outages often have unanticipated, critical impact on systems beyond those directly involved in the original outage. Epic Games outlines two additional areas of substantial impact beyond the initial outage triggered by the expired certificate:
It’s hard to imagine a more careful, complete summary of the impacts of a certificate outage. Many companies choose to overlook these peripheral impacts. In this case, over 25 critical staff members were pulled away from other pressing duties to repair the damage. Millions of connections were disrupted. And thousands of frustrated customers (the exact number was not quantified) were served invalid content from the company’s online store. This brings concrete meaning to otherwise vague terms like lost revenue, diverted productivity, customer dissatisfaction and brand damage.
But the relatively mild user irritation caused by the initial outage did not dissipate once the expired certificate was replaced. As I suspect is often the case, the impact lasted much longer than anyone could have predicted. While the expired certificate was detected and replaced in near-record time (approximately 37 minutes), the aftermath lingered for nearly 5 hours afterwards. Here’s the exact timeline that Epic Games shared:
Now that is an afternoon that I would not wish on anyone. But congratulations on a successful resolution. So, how can you be sure that this won’t happen to your organization? First, as Epic Games now does, you need to recognize the critical importance of each and every digital certificate that acts as a machine identity anywhere in your network. You need to know how many you have, where they are being used, and…yes…when they will expire. Once you are armed with that information, you can safely automate the entire certificate lifecycle so that there will be no nasty surprises.
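To make the "know when they will expire" step concrete, here is a minimal sketch, using only Python's standard library, of how you might check how many days remain on a server's TLS certificate. This is an illustrative example of the general technique, not Epic Games' or Venafi's actual tooling; the function names are my own.

```python
import ssl
import socket
from datetime import datetime, timezone

def parse_not_after(not_after: str) -> datetime:
    """Parse the 'notAfter' field returned by getpeercert(),
    e.g. 'Jun  1 12:00:00 2030 GMT', into an aware UTC datetime."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    return expires.replace(tzinfo=timezone.utc)

def days_until_expiry(hostname: str, port: int = 443) -> int:
    """Fetch a server's TLS certificate and return whole days until it expires."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return (parse_not_after(cert["notAfter"]) - datetime.now(timezone.utc)).days
```

Run something like this on a schedule against every hostname in your inventory and alert when the result drops below a comfortable threshold (30 days, say). Of course, a script like this only covers certificates reachable over the network; a full machine identity program also has to account for internal CAs, client certificates, and certificates sitting on devices no scanner can reach, which is where discovery and lifecycle automation come in.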
Venafi offers a comprehensive platform for machine identity management that has helped the world’s leading companies keep track of their certificates and avoid outages. In fact, based on the lessons we’ve learned from working with 400+ global customers, we’ve created a proven, 8-step methodology that combines people, process and technology. If you follow this blueprint, we guarantee that you can stop TLS certificate-related outages forever.
Tired of worrying when your next certificate outage will hit? Contact us.