The machine identities powering your digital transformation are at risk. They could be cheetahs.
Genetic variability is a quantitative measure of how much individual genotypes—the collection of genes that makes you “you” and me “me” in all our glorious uniqueness—vary from one another within a given population. As a baseline example, most of our planet’s animal species have variability in about 20 percent of their genes. About one-fifth of their genes can come in more than one form, just as human eyes can be blue, brown, green or somewhere in between.
Consider the dizzying menagerie of dogs competing at Westminster Kennel Club every year. Poodles, Labradors, boxers and mastiffs all look very different, but they have the same genes. The differences in appearance result from domesticated dogs having a higher-than-average genetic variability of about 27 percent.[i] They were bred this way on purpose. This variability determines whether a given breed is lean or round, a sprinter or an endurance specialist, or sports short, curly fur versus long, flowing Lhasa Apso locks. Human populations, by comparison, vary from one another by only about 5.4 percent. That’s why all humans look largely the same, while dog breeds appear so radically diverse.
The cheetah, on the other hand, is not so lucky. Of all land mammals, cheetahs have the lowest genetic variability, with only one percent of their genes having variable forms. In essence, more than 99 percent of all cheetahs look the same, smell the same, walk the same and sound the same. Even long-tenured zookeepers have trouble telling individual cheetahs apart. This “sameness” was likely driven by a series of environmental bottlenecks, such as droughts, disease and famines over millions of years, so that populations of African cheetahs were repeatedly whittled down to a small, partially inbred group of struggling survivors.
Why does it matter? In many cases, variability within a species safeguards its viability. Environments change, going from cold to hot and from wet to dry. Diseases like COVID-19 can suddenly flare up, with no previous exposure to test defense mechanisms and grant immunity. If a highly uniform population is sensitive to these changes, or vulnerable to a new disease, and lacks a healthy supply of non-vulnerable variants in its midst, the results can be disastrous.
The world’s fastest, sleekest and least genetically diverse land mammal is just one quirky novel virus from becoming extinct. “Genetic variability” is the reserve store—the biological life raft—that would allow a species like the cheetah to weather nature’s storms.
The Need for Crypto-Agility
Surprisingly, cheetah evolution and survival can tell us a lot about the need for agility in modern cryptographic systems, the protocols and functions that allow machine-to-machine authentication and encryption. For the purposes of this paper we’ll define crypto-agility in a way that addresses two interrelated but different problem areas. First, the problems:
Why do these reports matter? Because machine identities matter. Everyone intuitively knows (but often fails to recognize) that networks large and small—from VPNs to the World Wide Web—are populated by two kinds of entities: people and machines. “People” we can relate to, but “machines” are more difficult to understand. They include the obvious bits of physical and virtual hardware, but also services and serverless workloads, algorithms and code running in containers. While people use usernames and passwords to authenticate themselves on these networks, machines use machine identities, such as TLS and other kinds of X.509-based certificates, Secure Shell (SSH) keys, code signing keys and API keys.
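To make this concrete, the machine identity a server presents can be inspected with nothing more than Python’s standard library. The sketch below is illustrative, not part of any particular product; the `summarize` helper and its field names are assumptions for the example, built around the dictionary that `getpeercert()` actually returns.

```python
import socket
import ssl

def fetch_machine_identity(host, port=443, timeout=5):
    """Retrieve the TLS certificate a server presents as its machine identity."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # getpeercert() returns a dict with subject, issuer,
            # validity dates and subjectAltName entries.
            return tls.getpeercert()

def summarize(cert):
    """Reduce a getpeercert()-style dict to the fields an inventory needs."""
    subject = {name: value for rdn in cert["subject"] for (name, value) in rdn}
    issuer = {name: value for rdn in cert["issuer"] for (name, value) in rdn}
    return {
        "subject_cn": subject.get("commonName"),
        "issuer_cn": issuer.get("commonName"),
        "expires": cert["notAfter"],
    }
```

Calling `summarize(fetch_machine_identity("example.com"))` against a live endpoint would yield the subject, issuer and expiry for that one machine identity—the raw material every later step in this paper builds on.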
The two instances reported in Threatpost and in ZDNet, when viewed in the context of a world where machine identities are increasingly and unambiguously critical, together lay out the need for “crypto-agility.” Crypto-agility is, simply put, the ability for an organization to adapt fast. That adaptation may be needed in order to respond to a cryptographic threat, as in the case of SHA-1 or vulnerabilities like Heartbleed, or it may be required because of issues with the cryptographic supply chain, as in the case of Let’s Encrypt, which provides cryptographic services in the form of TLS certificates.
An example of two kinds of machine identities at work, with TLS certificates and SSH keys represented
Importantly, these adaptations must be made rapidly and without impacting the applications and services they support. As TechTarget reports, “The idea behind crypto-agility is to adapt to an alternative cryptographic standard quickly without making any major changes to infrastructure.”[vi] In a world of digital transformation, the application is king. It can’t be pulled offline and rebuilt because of an issue arising from underlying cryptography or a certificate authority.
Crypto-Agility Requires Visibility, Intelligence and Automation
Achieving crypto-agility is more difficult than it first appears, especially for large organizations. Most don’t know how many keys or certificates they have, where they are located in their infrastructure or who on their staff has the access and knowledge to correct them.
Certificate-related website or service outages graphically illustrate the fact that all TLS certificates have an expiration date. When a certificate expires, the site or service it protects is no longer trusted by modern software and may be rejected or fail to respond altogether. Several recent incidents demonstrate this:
Microsoft Teams users experienced a massive certificate-related outage in February 2020
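Detecting this failure mode programmatically is straightforward. As an illustrative Python sketch (not any vendor’s tooling), the `notAfter` string that the standard library’s `getpeercert()` returns can be converted into days-to-expiry using `ssl.cert_time_to_seconds`:

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter timestamp (negative if expired)."""
    # ssl.cert_time_to_seconds parses the "Jun  1 12:00:00 2031 GMT"
    # format that getpeercert() uses for validity dates.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days
```

A negative return value means the outage has, in effect, already begun: modern clients will refuse the connection.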
While the maximum duration of a certificate has typically been three years, browser developers like Apple have increasingly pushed for shorter durations to add more frequent review of the security configurations and the business ownership of machine-to-machine connections. As Computer Business Review recently put it: “Apple is planning to more than halve how long its Safari browser will trust TLS certificates, cutting the time to just 13 months, putting fresh pressure on organizations to get their certificate management practices in shape.” The article goes on to say “As of September 1, 2020, Apple is setting a hard trust limit of 398 days. (The current acceptable duration is 825 days). Certificates issued on or after that date with term beyond 398 days will be distrusted in Apple products.”[xi]
While this is good security thinking, Apple’s decision will most likely exacerbate the “certificate expiration” problem experienced by Microsoft, O2, LinkedIn, Equifax and many others. Large organizations simply don’t have the visibility, the intelligence (in the form of specific insight into the “who, what, why and where” of TLS certificates and keys) or the automation to manage the tens of thousands of certificates their sites, services and the underlying machines rely on.
TLS certificates are used liberally throughout the network infrastructure that powers digital transformation, from application servers to load balancers to printers and everywhere in between. So liberally, in fact, that their location and configuration are often unknown. TechValidate, a service that captures feedback and comments from information technology users, reports that “On average, IT security professionals found 57,420 additional SSL/TLS keys and certificates using Venafi [a machine identity protection platform] that were previously unknown.”[xii]
But certificate-related outages due to expired certificates are just one symptom of a much larger problem that stems from the sprawl of machine identities, their ubiquity in all aspects of life and their inherent complexity. Visibility, intelligence and automation address the problem in these ways:
Visibility: The first step in any security framework is to inventory the assets you have. With machine identities, information security teams need to ask: Where are the certificates our web servers, applications and services depend on? Where are the SSH keys our system administrators have created to authenticate from one machine to another? Are private code signing keys being left on developers’ servers, only to be stolen and used to sign malware, as in the ShadowHammer attack on ASUS in spring 2019 that infected a million computers belonging to the manufacturer’s customers? Where are the certificates and keys for mobile devices and IoT devices? Can we account for every machine-to-machine authentication event in the enterprise and be assured that none run invisibly?
“Digital Visibility” is not a trivial thing for small companies—let alone Global 5000 enterprises. These organizations need to find machine and device identities across their far-flung networks, whether on traditional web servers, security devices, load balancers or cloud-based databases. Visibility is needed across cloud containers and virtual systems, the applications and services they’re running and the connections they’re making to other systems, whether those systems are in public or private clouds or in traditional on-premises networks.
Paul Haywood, former head of information security at Barclays, offers this cautionary tale about SSH keys that authenticate and encrypt connections for back-end systems and their administrators. SSH keys provide password-less, non-expiring replacements for system access, and can be copied, shared and transported with ease. “We suspected at the time Barclays had close to 500,000 SSH keys, both public and private, some with root access, throughout our various networks, so we launched an inventory and intelligence-gathering effort across all of them. A few months later we had accounted for nearly 4 million keys. After a year of dedicated effort from the Barclays’ IT team, the number came down to a manageable and sustainable 600,000 keys. We understood the risk but had no idea of its enormity.”[xiii]
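A first pass at this kind of inventory can be sketched in a few lines of Python. The endpoints and the `fetch` callable here are placeholder assumptions—real discovery would probe network ranges and many ports—but the shape of the problem is the same: sweep, record, and treat unreachable endpoints as findings in their own right.

```python
def build_inventory(endpoints, fetch):
    """Sweep (host, port) endpoints and record every certificate found.

    `fetch` is any callable returning a getpeercert()-style dict for an
    endpoint, so a real TLS probe can be dropped in later. Unreachable
    hosts are recorded rather than silently skipped: gaps in visibility
    are findings too.
    """
    inventory, unreachable = {}, []
    for host, port in endpoints:
        try:
            cert = fetch(host, port)
        except OSError:
            unreachable.append((host, port))
            continue
        subject = {name: value for rdn in cert["subject"] for (name, value) in rdn}
        inventory[(host, port)] = {
            "common_name": subject.get("commonName"),
            "expires": cert["notAfter"],
        }
    return inventory, unreachable
```

Injecting the prober as a parameter keeps the sweep logic testable without a network—useful, given that the whole point of discovery is that you do not yet know what it will find.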
Intelligence: When an organization has accounted for all its machine identities and their machine-to-machine authentication events, it needs intelligence about each connection. Is it relying on a potentially forged or fraudulent certificate? Is the certificate about to expire, threatening gaps in visibility that would obscure risks or weaknesses in your security posture? Were TLS certificates issued by the appropriate CAs? Recall the Let’s Encrypt example used earlier: if a CA is not allowed per your information security policy, can you find every certificate it has issued and replace them?
Even worse, is a certificate responsible not just for delivering applications and content to end users, but for enabling your network security tools? The Equifax hack of 2017 traces back to an exploited vulnerability in Apache Struts. But why did the intrusion go undetected for nearly three months? Equifax had traffic inspection tools that should have revealed the anomalous traffic, but that equipment relied on a TLS certificate to open and inspect encrypted traffic. A report by the U.S. House of Representatives Committee on Oversight and Government Reform[xiv] showed that the equipment’s TLS certificate had expired 10 months before the breach. As a result, the equipment was blind to network anomalies, and this blindness allowed the attackers to remain in the network for over 70 days, pivoting to over 50 different databases.
Knowing which of your network systems and tools—whether they are delivery-focused or security-focused—rely on certificates and keys is critical intelligence. This is true when an attack surface needs to be analyzed or when expenses need to be prioritized.
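One slice of that intelligence—checking issuers against an approved-CA policy, as in the Let’s Encrypt scenario above—can be sketched as a simple filter. The field names and the policy set below are illustrative assumptions, not a prescribed schema:

```python
def policy_violations(inventory, allowed_issuers):
    """Flag certificates whose issuing CA is not on the approved list.

    `inventory` maps an endpoint to a record carrying an 'issuer_cn'
    field; `allowed_issuers` is the set of CA common names that the
    organization's information security policy permits.
    """
    return {
        endpoint: record
        for endpoint, record in inventory.items()
        if record.get("issuer_cn") not in allowed_issuers
    }
```

The same pattern extends to other policy dimensions: key length, signature algorithm, or whether a certificate also underpins a security tool and therefore deserves priority.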
Automation: Renewing, revoking or updating digital keys and certificates can be a laborious and time-consuming process. But with an automated system, expiring certificates can be detected and renewed, regardless of the CA or internal system that originated them, before they cause blind spots or operational outages. Weak or guessable SSH keys can be rotated and reinforced, without the costly delays that come from tracking down owners and system administrators.
Perhaps most importantly, the tools an enterprise’s operations and defenses rely on, like load balancers and threat protection systems, can maintain visibility into the sites and services they serve and protect. As an example: if Equifax had automated the renewal of the certificate that supported their traffic inspection system, it’s entirely possible that their 2017 breach would have been detected in minutes rather than months.
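The selection step of such automation can be sketched as follows. The 30-day window and the inventory shape are illustrative assumptions; in a real deployment the returned endpoints would be handed to whatever CA integration the organization uses (an ACME client, an internal PKI) for the actual renewal.

```python
import ssl
from datetime import datetime, timezone

RENEWAL_WINDOW_DAYS = 30  # renew well before expiry, not at the deadline

def select_for_renewal(inventory, now=None):
    """Return the endpoints whose certificates fall inside the renewal window."""
    now = now or datetime.now(timezone.utc)
    due = []
    for endpoint, record in inventory.items():
        expires = datetime.fromtimestamp(
            ssl.cert_time_to_seconds(record["expires"]), tz=timezone.utc)
        if (expires - now).days <= RENEWAL_WINDOW_DAYS:
            due.append(endpoint)
    return due
```

Run on a schedule, a loop like this is what turns certificate expiry from an outage-causing surprise into a routine, invisible maintenance event.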
The Pillars of Crypto-Agility
To achieve crypto-agility, enterprises must be able to respond promptly to mass certificate and key replacement events. At the same time, they must be able to demonstrate policy compliance for all certificates and identify any anomalies. This requires comprehensive visibility and detailed intelligence, as well as automation to enable replacement and renewal at machine speed and scale.
If the current rate of growth in certificates weren’t enough reason to master crypto-agility, another is on the horizon: quantum computing. Because modern cryptography relies on mathematical algorithms that can take years or even centuries for traditional computers to break, we have enjoyed a relatively secure few decades. Hash algorithms like MD-5 and SHA-1 have been replaced by stronger algorithms with longer digests. But quantum computers will provide an exponential increase in on-demand computing power, putting all known security algorithms at risk. As Rob Stubbs reported in the Cryptomathic blog, “This will break nearly every practical application of cryptography in use today, making e-commerce and many other digital applications that we rely on in our daily lives totally insecure.”[xvii]
Adapting fast to new encryption methods and protocols may become a necessary survival skill.
The Pillars of CA-Agility
But what if no “cryptopocalypse” occurs? No new bugs or vulnerabilities arise, and there are no broken hashing algorithms like MD-5 or SHA-1. There are no compromised cryptographic libraries like the one Microsoft announced on January 20 of this year, which allows remote code execution via spoofed TLS certificates.[xviii] Even in this ideal state, organizations need to protect against CAs that have poor security management practices, go out of business or whose systems are compromised.
Alongside the requirements outlined above, organizations also need to:
The Challenge of Scale
If all of this makes sense, one more lens will add clarity and urgency to the situation: the lens of “scale.”
Digital transformation has radically changed the way we create, host and use online services and applications. Monolithic application architectures of the past might have used one TLS certificate to authenticate and encrypt connections between users and their application. But today, containers and mesh architectures have led to an explosion in the number of TLS certificates required to deliver a service, and a single application might require dozens or even hundreds of TLS certificates.
In the last thirty years, the number of TLS certificates required to enable and secure online applications and services has exploded.
Christian Posta, field CTO at solo.io, puts it this way in his blog on modern architectures: “As we break applications into smaller services, we increase our surface area for attacks. … Do you secure your internal applications with SSL/TLS? All of the communication that we used to do inside a single monolith via local calls is now exposed over the network. The first thing we need to do is encrypt all of our internal traffic.”[xx]
DevOps practices, cloud-native development, lift-and-shift cloud patterns, IoT, virtualization and containerization are all driving the increased use of TLS certificates to provide the end-to-end encryption Posta calls for. Who is managing these certificates? How are they configured? Who is revoking or renewing them as needed?
Industry research firm Gartner, in the paper “Technology Insight for X.509 Certificate Management” states that “Organizations with roughly 100 or more X.509 certificates in use that are using manual processes typically need one full-time equivalent (FTE) to discover and manage certificates within their organizations.”[xxi]
Contrast this with the number of active domains in the world: Verisign reports over 326 million domain names registered across top-level domains (TLDs) like .com and .net,[xxii] and the majority of these domains have many subdomains or addressable applications and services. Modern, large organizations engaged in digital transformation efforts are likely counting their TLS certificates not in the hundreds but in the tens of thousands. And the number of TLS certificates that need to be managed is increasing by the hour.
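A back-of-the-envelope calculation makes the staffing implication concrete. Combining Gartner’s roughly-100-certificates-per-FTE rule of thumb with the TechValidate average of 57,420 previously unknown certificates cited earlier:

```python
# Gartner's rule of thumb from this paper: ~100 manually managed
# certificates consume roughly one full-time equivalent (FTE).
CERTS_PER_FTE = 100

def manual_ftes_needed(certificate_count):
    """Estimate the staff required to manage certificates entirely by hand."""
    return certificate_count / CERTS_PER_FTE

# TechValidate's average of previously unknown certificates found.
print(manual_ftes_needed(57_420))  # 574.2 -- hundreds of full-time staff
```

No organization staffs certificate management with hundreds of full-time employees, which is precisely the argument for automation.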
But What Does COVID-19 Have to Do with It?
The title of this paper, “Cheetahs, COVID-19 and the Demand for Crypto-Agility,” is not merely meant as an attention-grabber. It points directly at one of the issues crypto-agility is designed to solve: the ability to quickly adapt to sudden changes in circumstance.
This paper is written in the shadow of 2020’s novel coronavirus and the illness it causes, COVID-19. The United States has become the epicenter of a pandemic, with more than twice as many confirmed cases as China and with a particularly strong grip on New York City. Why has this virus been so easily transmitted?
Without going into detail, the virus evolved to fit the way humans gather, communicate, breathe and even talk. Spike proteins on the virus’s surface make it easy for the virus to embed itself in soft, wet, warm tissue. The spike proteins also contain a site that recognizes and is activated by furin, a host-cell enzyme found in various human organs such as the lungs. Apart from its biology, the virus is extraordinarily lightweight and can linger in the air for 30 minutes and travel up to 4.5 meters.[xxiii] In dense cities like New York, Seattle and New Orleans, where people gather in coffee shops and markets and rely heavily on mass transit, the 4.5-meter distance and 30-minute survival time are ideal tools for transmission. Biologically, the novel coronavirus is a great fit for human anatomy and behavior.
And how have we adapted to change our behavior in the face of the pandemic? What was normal behavior four weeks ago—such as plane travel or large, in-person gatherings—has been replaced by remote meetings and video conferencing. In times like these, the businesses and organizations that are nimble thrive. Those that aren’t, don’t.
As far as COVID-19 itself goes, there’s good news: Humans, unlike cheetahs, have a pretty fair amount of genetic diversity. We are in fact three times more diverse in our genetic makeup than cheetahs.[xxiv] This means the likelihood of there being some random combination of genes, ribosomes or proteins in an individual’s genotype that makes them slightly less susceptible to coronavirus and COVID-19 than another is exponentially higher than if we were cheetahs. In the next year or so, one or more of those variability factors will be isolated to deliver a vaccine. At the end of the day, this variability makes us stronger (at least in this one narrow sense) than cheetahs.
Maintaining the same “strength through diversity” in our networks is equally important, though for wildly different reasons. We may still need on-premises data centers for highly privileged information—but we also need cloud platforms to support fast-charging digital transformation initiatives, such as spinning up thousands of TLS-protected instances in seconds. We may need special TLS certificates for endpoints or for thousands of suddenly remote users. In both cases, crypto-agility allows for the same policy—the same set of protective services, like well-configured TLS certificates and reliably produced and tracked SSH keys—to be assessed and enforced across increasingly diverse networks.
With diversity also comes flexibility: Good business sense tells us not to allow popular cloud platforms like Amazon AWS, Google Cloud Platform and Azure to lock us in perpetually or put us in a position where they compete for our business. Crypto-agility lets us shift between a variety of platforms and a variety of public and private CAs for any business, security or productivity reason that arises.
And if there is an equivalent of a coronavirus among machines—whether another cryptographic library bug or a compromised certificate authority—we need to be able to quickly pivot workloads, services and configurations back to a known-good, safe state. Crypto-agility provides that roadmap.
Time magazine published an article listing three reasons why the SoftBank IPO “flopped.” Reason three was “An inauspicious service outage.” It went on to say, “On Dec. 6, a service outage lasting more than 4 hours hit 34m customers across the country, sending the price of SoftBank shares down by as much as 6%.”