5 Reasons Your Certificates Keep Expiring

July 27, 2020 | Mark Sanders

Nearly every organization struggles with certificate-related outages. To people who don’t work with PKI every day, managing TLS certificates seems like it should be very straightforward, but even large organizations with strong IT and security practices regularly fall victim to certificate outages.

I have been at Venafi for almost 9 years, and during that time I’ve worked with clients from around the world. Before that, my focus was network and systems management and network operations, so I’ve been in the trenches both as a vendor and as a team member trying to keep systems up and operating reliably.

I’ve seen a lot of organizations from all kinds of industries in various stages of maturity when it comes to managing and securing their machine identities. At this point I can pretty much predict the challenges, struggles and pains that an organization is having, and is going to have, based on the maturity level of its machine identity management program.

These are real-world stories that I have personally seen multiple times while working with organizations all over the world.

Ownership problems

Joe requested a certificate, so Joe’s email address is listed as the owner of the cert. An email was sent to Joe 30 days before the certificate was set to expire.

  • Problem 1 – Joe changed teams and he’s sure that someone on his previous team is now responsible for that old cert.
  • Problem 2 – Joe left the company 6 months ago.
  • Problem 3 – Joe’s super busy with important stuff. He sees in the email that he’s got 30 days until the cert expires so that means he has 29 whole days until he needs to handle this. You can guess what happens.

Certificate Management via Spreadsheet / Wiki / SharePoint

Susan created a spreadsheet to track certificates. When someone requests a certificate, Susan logs the cert name, requestor and expiration date. Every week Susan generates a report to identify the certs expiring within 30 days. She sends an email to the owner to let them know the cert is going to expire.

  • Problem 1 – See “ownership problem” above.
  • Problem 2 – Susan goes on vacation, or gets sick, or takes a few days off unexpectedly. Who’s going to update the spreadsheet when she’s out? Are we not going to allow any cert requests while she’s away?

I know what you are thinking right now. “Come on Venafi, of course she gets vacation (or sick leave or whatever). No organization is going to place such an important task on just one person.” OR you might be thinking “Dang! This Venafi guy knows exactly what my life is like. This is what I deal with every day.” If your organization is trying to manage certificates using a static list that is manually maintained, it doesn’t really matter which belief you have. You will fail, and eventually one of those failures will be significant.
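To make the process concrete, here is a minimal sketch of Susan’s weekly report, assuming her spreadsheet boils down to rows of certificate name, owner and expiration date. The sample rows, the `INVENTORY` name and the `expiring_within` function are hypothetical and exist only for illustration. Note what the script cannot fix: it only reads the static list, so a stale owner or a missing row produces a wrong report just as quickly as it does by hand.

```python
from datetime import date, timedelta

# Hypothetical inventory rows: (certificate name, owner email, expiration date).
INVENTORY = [
    ("www.example.com", "joe@example.com", date(2020, 8, 10)),
    ("api.example.com", "web-team@example.com", date(2021, 3, 1)),
    ("intranet.example.com", "joe@example.com", date(2020, 7, 20)),
]


def expiring_within(inventory, today, window_days=30):
    """Return rows that expire on or before today + window_days, soonest first.

    Rows that have already expired are included, because they need attention too.
    """
    cutoff = today + timedelta(days=window_days)
    due = [row for row in inventory if row[2] <= cutoff]
    return sorted(due, key=lambda row: row[2])


if __name__ == "__main__":
    # The report date is fixed here so the example is reproducible.
    for name, owner, expires in expiring_within(INVENTORY, date(2020, 7, 27)):
        print(f"{name} expires {expires.isoformat()} (owner: {owner})")
```

Every row in the sample that comes due is owned by Joe, which is exactly the ownership problem described earlier.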

Unknown location

You know that spreadsheet or SharePoint or wiki that Susan at Company X created to track certs? Or you know some other system or tool that Tony at Company Y uses to track his certs? What if it doesn’t track where the cert is installed? For that matter, how do either Susan or Tony know where any cert is installed?

Tony has a form that he uses for certificate requests. Susan uses tickets. The form and ticket ask the requestor to provide the information on where the certificate will be installed. So now, 30 days before the certificate is going to expire, Susan and Tony both send email notifications to the cert owner. Susan can even open a ticket to let the owner know the cert is going to expire. The ticket and notification tell the owner where the cert is installed, based on the info they provided 2 years ago (1 year ago beginning this Sept., but that’s another story that will complicate Susan’s and Tony’s lives even more). The owner responds and says they need to renew the cert. Tony and Susan both follow the processes for their respective organizations and provide a renewed cert to the owner well before the certificate expires. The countdown begins: 20 days, 10 days, 5, 4, 3, 2, 1. OUTAGE. What the heck happened?

  • Problem 1 – The cert was updated on the load balancer, but someone forgot that the cert is also installed on the app server behind the load balancer.
  • Problem 2 – The cert was installed on a cluster of web servers. It was updated on 4 of the 5 servers, but somehow we forgot it was installed on the 5th.
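Both problems come down to not checking what each endpoint actually serves. One way to catch the forgotten server is to ask every known endpoint for its certificate and compare fingerprints. The sketch below is an illustration, not a complete discovery tool: the function names are hypothetical, and it assumes you can connect to each server directly (not only through the load balancer) and that you already have the SHA-256 fingerprint of the renewed certificate.

```python
import hashlib
import ssl


def served_fingerprint(host, port=443):
    """Return the SHA-256 fingerprint of the certificate a listener presents.

    ssl.get_server_certificate() fetches the certificate without validating it,
    so this also works for endpoints that already serve an expired certificate.
    """
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def stale_endpoints(expected_fingerprint, endpoints, fetch=served_fingerprint):
    """Return the (host, port) pairs whose served certificate is not the expected one.

    `fetch` is injectable so the comparison can be exercised without a network.
    """
    stale = []
    for host, port in endpoints:
        if fetch(host, port) != expected_fingerprint:
            stale.append((host, port))
    return stale
```

This only helps for endpoints that are on the list. A certificate installed somewhere nobody wrote down stays invisible to a check like this.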

He said / She said

This is not always the blame game. Sometimes, maybe even most of the time, this is a communication or process issue. Here’s what goes wrong:

Company A is a hosting provider of some sort. Their customers need to use certificates to access Company A’s services. In some scenarios the customer is responsible for the cert, and in others Company A might be responsible for cert generation. In either case, if the cert is not managed, monitored and secured properly, there will be an outage. And guess what? Even if the customer was responsible for the cert, it will still be Company A’s fault that the cert expired, because it is their service the customer is using, and the customer is always right.

In some organizations the app team is responsible for the certs their apps are consuming. In other organizations the device owners are responsible for the certs installed on their devices. In some organizations the SecOps team is responsible. In other organizations it’s a mix. Who gets notified? Who must approve this spend? In these mixed responsibility situations, each potential owner thinks things like:

  • “I don’t have access to the webserver where that cert is installed so it’s not my job.” 
  • “There’s no ticket assigned with my name on it so I’m not your man.” 
  • “My app runs on several systems which are managed by the Ops team so I’m sure they’re going to renew the cert each year.”

There are endless variations on this theme, and it’s easy to see how this can become confusing.

Restarting services / daemons / bindings

App owners and Ops teams are busy. Their days are filled with tasks to deploy new things and keep everything else running. Installing certs is not something that they do every day. So, when they get notified that a cert is going to expire soon, they follow the corporate process to get the cert renewed. Once the cert is renewed, they need to install it. They copy the cert and key into the appropriate location and assume all is well.

25 days later there’s an outage with a severity 1 ticket. The app owner or ops team checks the database. Nothing. They check the network. Nothing. They check the VM. Nothing. They check physical. Nothing. They check the app stack. Nothing. They check all the logs. Nothing. (If this happens on a critical system, everyone’s blood pressure is ticking up a notch or two by this point.)

At this point someone says, “Wait, isn’t this the system where we just renewed the cert?” Turns out someone copied the new cert to the system but didn’t do the final binding and/or restart the services. Because these things didn’t happen, the original cert was still in operation when it expired, so they had an outage.
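A quick check right after the install would have caught this: connect to the running service and confirm that the certificate it presents expires later than the one being replaced. The sketch below is one way to do that under stated assumptions. The function names are hypothetical, and it assumes the service is reachable over TLS at a host name that matches its certificate.

```python
import socket
import ssl
from datetime import datetime, timezone


def parse_not_after(not_after):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to an aware datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def served_not_after(host, port=443, timeout=5.0):
    """Return the expiry of the certificate the live listener presents.

    Uses a default (validating) context, so the handshake raises
    ssl.SSLCertVerificationError if the served certificate fails validation,
    for example because it has already expired.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return parse_not_after(tls.getpeercert()["notAfter"])


def renewal_is_live(served_expiry, old_expiry):
    """True if the listener serves a certificate that outlives the one being replaced."""
    return served_expiry > old_expiry
```

If `renewal_is_live(served_not_after(host), old_expiry)` is still false after the install, the new files are on disk but the service has not picked them up, which points at the missing binding or restart.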

For organizations without a strong machine identity management program, these fundamental problems tend to show up regardless of the type of organization, its business model and how it uses machine identities.

If reading about these issues gives you a strong sense of déjà vu, and you’d like to figure out how to solve these problems once and for all, check out our approach. It’s helped many of our customers eliminate certificate-related outages completely.
