The what-could-possibly-go-wrong scenario
A single certificate expired and an outage was reported. Over the next two weeks you chased outage after outage, running from VLAN to VLAN trying to pick up the pieces left by a barely tracked certificate. The worst part is that each outage had to be reported to you by your customers.
Here’s a quick look at how Murphy’s Law was not your friend throughout the entire arduous process of recovery. The one shining star throughout the ordeal is that you have a great Certificate Authority (CA). They are a market leader and do a great job of telling you when your certificate is going to expire. So, 30 days before expiration, you followed their notification and went to renew your certificate.
- Murphy Strikes 1: You go to log in, only to realize that your CA uses certificate authentication and you don’t have the credential in your Microsoft key store.
- Murphy Strikes 2: You get the credential and load it into your key store, only to find out that it only gives you access to abc.com. The one you need, xyz.com, is under a different credential.
Finally, you have the right credential and you renew the certificate for another year of peace and uninterrupted service. You take that certificate and start the arduous effort of provisioning it to each application it lives on. Luckily it is not a wildcard certificate, so you know where it lives. Even though this certificate has over 200 Subject Alternative Names, you can get the work done with some basic, time-consuming elbow grease.
- Murphy Strikes 3: Despite all of your blood, sweat and tears, you still have outages being reported and, worse yet, they are intermittent. (People who fix things are not fans of the horrific word INTERMITTENT.)
Now, you could go run Wireshark and debug packet captures, and if you are lucky you might see what is happening. It just so happens I have seen this issue a lot, so I am going to cut to the chase and tell you that you have a certificate and a load balancer at play. When you distributed this certificate, you got it running on 10 of the 11 servers behind the load balancer. So, it all works great except for the roughly 1 in 11 requests that land on the server still presenting the old certificate.
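If you want to confirm the one-server-left-behind theory without a packet capture, one option is to connect to each backend directly and compare certificate fingerprints. Here is a minimal Python sketch using only the standard library. The function names and the idea of treating the most common fingerprint as the "new" certificate are assumptions made for illustration, not something from the original incident; comparing against the known fingerprint of the renewed certificate would be stricter.

```python
import hashlib
import socket
import ssl
from collections import Counter
from typing import Dict, List


def fingerprint(der_bytes: bytes) -> str:
    """Return the SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def fetch_cert_der(address: str, server_name: str, port: int = 443,
                   timeout: float = 5.0) -> bytes:
    """Connect straight to one backend, bypassing the load balancer,
    and return the leaf certificate it presents in DER form."""
    context = ssl.create_default_context()
    # Verification is switched off on purpose: we want to see whatever
    # certificate is served, even an expired one.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=server_name) as tls:
            return tls.getpeercert(binary_form=True)


def find_stragglers(fingerprints: Dict[str, str]) -> List[str]:
    """Return the backends whose fingerprint differs from the most common one.

    Assumption: the majority of backends already serve the renewed certificate.
    """
    if not fingerprints:
        return []
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(addr for addr, fp in fingerprints.items() if fp != majority)
```

In practice you would build the `fingerprints` mapping by calling `fetch_cert_der` for each backend address (hypothetical addresses such as `10.0.0.1` through `10.0.0.11`) with the public hostname as `server_name`, then pass the result to `find_stragglers`. Anything it returns is a server worth a closer look.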
- Murphy Strikes 4 (just one more time): Your CA goes out of business, or one of your disgruntled employees with access to that certificate and private key just left the company, or there is a new private key compromise that affects this certificate, or…the list goes on and on and on.
Now you get to fix those 200+ certificate machine identities all over again. At this point, you realize that it is time to bring in some automation.
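Automation can start small. As one hedged illustration, the sketch below flags a certificate for renewal once it is within a threshold of its expiration date. It assumes you already have each certificate's `notAfter` value as a string in the format Python's `ssl` module reports (for example `Jan  5 09:34:43 2019 GMT`); the function names and the 30-day default, which simply mirrors the CA notification window in the story, are illustrative choices.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and the certificate's notAfter time.

    `not_after` uses the format reported by the ssl module,
    e.g. "Jan  5 09:34:43 2019 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate expires within `threshold_days` (or already has)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

Run on a schedule against an inventory of certificates, a check like `needs_renewal(not_after, datetime.now(timezone.utc))` gives you your own warning instead of relying on a customer to report the outage.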
Does this scenario sound far-fetched? Ahem. I tell you this story because it is real. In fact, this is a very simplified version. Managing and protecting machine identities is nothing new. Saving time and effort, and sparing yourself certificate outage fire drills, can be.
Here’s to smarter work with fewer emergencies in 2019!