Your data in transit must be encrypted at all times, whether it flows through your website, your DevOps pipelines, or any other sort of application. However, the more you rely on encryption, the more certificates you have to protect and manage. And if any of those certificates expires unnoticed, it can trigger an application outage. Certificate outages can be painful! Downtime can wreak havoc, and keeping everything running smoothly is critical to pleasing your customers and keeping your business going. Fortunately, there are practical ways to avoid certificate outages. Here are some of the best!
- Watch your certificate renewal processes
Certificates expire or get revoked, and new certificates must take their place. It’s a process that never stops. One common cause of certificate outages is certificates that aren’t renewed properly! That’s why you must watch your certificate renewal processes. If you’re not careful, they can become unreliable, and human error is often the culprit. Test your systems to confirm that certificates are actually renewed whenever it becomes necessary. You should also automate as much as you can.
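One way to test a renewal process is to check, for each certificate, whether its renewal point has passed and whether the replacement really outlives the original. The sketch below assumes a renewal rule of "renew once two-thirds of the lifetime has elapsed"; that fraction and both function names are illustrative assumptions, not a standard your tooling necessarily follows.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(not_before, not_after, now, fraction=2 / 3):
    """True once `fraction` of the certificate lifetime has elapsed.

    The two-thirds default is an assumption for illustration only.
    """
    lifetime = not_after - not_before
    renew_at = not_before + lifetime * fraction
    return now >= renew_at


def renewal_succeeded(old_not_after, new_not_after):
    """A renewal only helps if the replacement expires later than the original."""
    return new_not_after > old_not_after


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

print(renewal_due(issued, expires, issued + timedelta(days=30)))
print(renewal_due(issued, expires, issued + timedelta(days=75)))
print(renewal_succeeded(expires, expires + timedelta(days=60)))
```

Running a check like this on a schedule, rather than by hand, keeps the test itself from becoming another source of human error.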
- Use better certificate policies
Your policies for TLS certificates and other such machine identities must have standard practices that are implemented enterprise wide. Inconsistencies in the deployment of certificates can lead to certificate outages. How can you be sure that all of the aspects of your PKI and certificate implementation are compatible without good, universal certificate policies? Incompatibilities can lead to outages and downtime. Remove the guesswork for CAs, attributes and configurations with better, more detailed, and more consistent policies.
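A written policy becomes much easier to enforce once it can be checked by a machine. Here is a minimal sketch of that idea. The policy fields, the CA name, and the numeric limits are all hypothetical placeholders, and the certificate is represented as a plain dictionary rather than parsed from a real file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertPolicy:
    """Hypothetical enterprise-wide policy; every value here is an example."""
    allowed_issuers: frozenset
    min_key_bits: int
    max_validity_days: int


def policy_violations(cert, policy):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if cert["issuer"] not in policy.allowed_issuers:
        problems.append(f"issuer {cert['issuer']!r} is not an approved CA")
    if cert["key_bits"] < policy.min_key_bits:
        problems.append(f"key size {cert['key_bits']} is below {policy.min_key_bits} bits")
    if cert["validity_days"] > policy.max_validity_days:
        problems.append(f"validity of {cert['validity_days']} days exceeds {policy.max_validity_days}")
    return problems


# Placeholder policy values, chosen only to make the example concrete.
POLICY = CertPolicy(
    allowed_issuers=frozenset({"Example Internal CA"}),
    min_key_bits=2048,
    max_validity_days=398,
)

good = {"issuer": "Example Internal CA", "key_bits": 2048, "validity_days": 90}
bad = {"issuer": "Unknown CA", "key_bits": 1024, "validity_days": 825}

print(policy_violations(good, POLICY))
for problem in policy_violations(bad, POLICY):
    print(problem)
```

Returning every violation at once, instead of stopping at the first, gives the requester a complete list to fix in one pass.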
- Improve certificate visibility and warn of impending outages
There are often warning signs that a certificate outage may happen soon, such as certificates approaching the end of their lifespans. You need systems in place that warn you before those expiration dates arrive. Don’t try to track down individual certificates by hand. Have a good warning system in place in order to avert disaster across the enterprise.
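At its core, an expiry warning system compares each certificate’s expiration date with the current time and sorts the results by urgency. The sketch below shows that logic over an in-memory inventory. The `CertRecord` type, the 30-day and 7-day thresholds, and the host names are assumptions made for illustration; a real deployment would read the inventory from its certificate management tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertRecord:
    """One inventory entry; a hypothetical shape, not a real product schema."""
    name: str
    not_after: datetime  # expiration time, timezone-aware


def classify(cert, now, warn_days=30, critical_days=7):
    """Return 'expired', 'critical', 'warning', or 'ok' for one certificate."""
    remaining = cert.not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=critical_days):
        return "critical"
    if remaining <= timedelta(days=warn_days):
        return "warning"
    return "ok"


def expiry_report(certs, now):
    """List (name, status, days_left) for every certificate needing attention, soonest first."""
    rows = []
    for cert in sorted(certs, key=lambda c: c.not_after):
        status = classify(cert, now)
        if status != "ok":
            rows.append((cert.name, status, (cert.not_after - now).days))
    return rows


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
inventory = [
    CertRecord("api.example.test", now + timedelta(days=90)),
    CertRecord("shop.example.test", now + timedelta(days=20)),
    CertRecord("mail.example.test", now + timedelta(days=3)),
    CertRecord("old.example.test", now - timedelta(days=1)),
]
report = expiry_report(inventory, now)
for row in report:
    print(row)
```

Passing `now` in as a parameter, rather than reading the clock inside the function, makes the thresholds easy to test against fixed dates.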
- Real-life events can cause outages, so prepare for them
There’s more to the deployment of computer technology than the computer technology itself. Sometimes real-world events can lead to certificate outages if you don’t prepare for them! An excellent example is the US government shutdown in early 2019. As Martin Thorpe, Enterprise Architect for Venafi, said at the time:
“The US shutdown has now left a mark on the digital world. Several government websites, such as the DoD, now greet users with a ‘CERT_DATE_INVALID’ warning in place of the website itself. At best, this isn’t a good look for the government departments concerned. At worst, the thousands of Americans who rely on these websites are left cut off from the services they need.”
Test your PKI and investigate whether your certificates can be revoked, rotated and renewed, even if your staff don’t show up to your office. I’m writing this during the COVID-19 pandemic, which has also thrown a wrench into business-as-usual. Real-life events are a lot less likely to cause outages if you automate your certificate management.
- Improve your machine identity workflows
Your machine identity workflows must be well integrated into all of your systems that use them. DevOps, ITSM, ticketing, dynamic web app databases—all of your pertinent backends need certificate deployment systems that run smoothly. Any glitch could cause an outage! So, integrate your certificate services properly, ensuring compatibility and proper functionality.
- Document your certificate service processes
The human beings who work with your certificate services need thorough documentation about how everything works. They should also be well trained and ready to troubleshoot if something goes wrong. If you don’t prepare your team with all of the necessary knowledge in order to maintain certificate systems properly, you may very likely have disastrous outages.
- Assign owners to your certificates
Your certificates need clearly assigned owners. The buck needs to stop somewhere. If you don’t know who is responsible for each certificate lifecycle action, chaos can ensue, which of course can lead to certificate outages. And in case a certificate owner doesn’t take action within an acceptable time frame, detailed escalation paths must be defined.
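An escalation path can be as simple as an ordered list of time thresholds and the roles to notify once each one passes. The roles and day counts in this sketch are hypothetical; substitute whatever your own policy defines.

```python
from datetime import timedelta

# Hypothetical escalation path: (time without action, role to notify).
ESCALATION_PATH = [
    (timedelta(days=0), "certificate owner"),
    (timedelta(days=3), "team lead"),
    (timedelta(days=7), "security operations"),
]


def who_to_notify(time_without_action, path=ESCALATION_PATH):
    """Return every role whose threshold has been reached, in escalation order."""
    return [role for threshold, role in path if time_without_action >= threshold]


print(who_to_notify(timedelta(days=1)))
print(who_to_notify(timedelta(days=4)))
print(who_to_notify(timedelta(days=10)))
```

Keeping earlier roles in the result means the original owner stays informed even after the issue has been escalated.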
- Implement technologies for certificate visibility and automation
Your organization could be generating hundreds of certificates per day, especially in DevOps. Your certificate processes can become an unmanageable mess without proper visibility into all of your certificates. You can’t fix what you cannot see. If you deploy good technology that provides thorough visibility and extensive automation, certificate outages can be eliminated.
- Validate certificate owner actions
Staffing changes, new hires, and promotions can mean that certificate owners are sometimes learning on the job. That’s okay, and often necessary. Unfortunately, human beings make mistakes, and people are more likely to make mistakes when they’re new. Implementing systems that automatically validate certificate owner actions can help your applications overcome staffing growing pains and avoid certificate outages.
- If outages do happen, investigate them
It is possible to completely eliminate certificate outages in your networks and applications. But just in case they do happen, they need to be thoroughly investigated. That way, they can be prevented in the future. Look at your network and application logs. Ask technical staff what happened when the outage was discovered. Retrace your steps and figure out what went wrong. Then fix your certificate implementation accordingly to make sure it doesn’t happen again.
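When you go through the logs, a small script can pull out the lines that point at certificate problems so you can build a timeline. The sketch below searches for a few error strings, including the `CERT_DATE_INVALID` warning mentioned earlier. The pattern list is illustrative rather than exhaustive, and the sample log lines are invented for the example.

```python
import re

# Illustrative patterns only; extend the list for the software you actually run.
CERT_ERROR = re.compile(
    r"CERT_DATE_INVALID|certificate has expired|certificate verify failed",
    re.IGNORECASE,
)


def certificate_errors(log_lines):
    """Return (line_number, text) for each log line that mentions a certificate error."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_lines, start=1)
        if CERT_ERROR.search(line)
    ]


# Invented sample data in a made-up log format.
sample_log = [
    "2024-01-01T00:00:01Z gateway request ok",
    "2024-01-01T00:00:02Z gateway upstream error: certificate has expired",
    "2024-01-01T00:00:03Z browser-report NET::ERR_CERT_DATE_INVALID",
    "2024-01-01T00:00:04Z gateway request ok",
]
hits = certificate_errors(sample_log)
for number, text in hits:
    print(number, text)
```

The first matching line is usually the most useful one, since it shows when the outage actually began rather than when someone noticed it.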
Certificate outages can be eliminated, even on complex networks! Proper training, automation, and careful configuration are essential. The key to preventing outages is your very important keys! Those are your machine identities, of course. And it’s just as important to protect them against outages as it is to protect them against malicious misuse.