It's time! August 2018 represents the 7th time I've published a report of the Alexa Top 1 Million sites so let's get stuck in and see what changes have taken place over the last six months on the biggest sites on the web.
As always the data from my crawler is available along with all of the data you see in this report. I recently launched https://crawler.ninja which you can read about in the launch blog but all of the details on how to get the raw data and statistics are on there, including how to donate if you'd like to support the crawler (which would be really awesome because it's expensive!). That said, let's get on to what you all came here for, the results!
Diving right in I'm really please to say that all of the metrics we care about are still going up! This is genuinely great news and shows that we are continuing to make progress on making the web more secure.
Here's the one we've all been waiting for, and this one is a pretty big announcement too. Not only because we've seen amazing growth in HTTPS again in this crawl, but because we've passed through 50% of the Alexa Top 1 Million sites actively redirecting to HTTPS for the first time!
???50%+ of Top 1 Million sites on HTTPS!
This is really awesome progress and you can see we're maintaining strong growth in the HTTPS department. In the previous report it looked like the growth of HTTPS had slowed, which it had at the time, but as you can see from the graph here, adoption has picked up again and we're continuing to see that sharp incline sustained. The growth shown here in this graph is unrivaled in any other security mechanism and if you think about the effort required to achieve this, how impressive it is becomes crystal clear.
Looking at the history we've made serious progress in the last couple of years and again we're continuing to see maintained growth which is exactly what we need. The web is now well on its way to being 100% encrypted and long may it continue.
HTTP Public Key Pinning
In the Feb 2018 report I saw a notable growth in the use of HPKP, PKP was up 183% and PKPRO was up 3,924% too! I noted at the time how I'd recently given up on HPKP and how Chrome was deprecating it so we might see a reversal of that trend, and we have. There are still far more sites lower down the ranking using HPKP, thanks almost exclusively to Tumblr, so the distribution is still the same, but the numbers are a lot less now.
The use of PKP is down -18% and the use of PKPRO is down -5% so rather than continued growth like all other metrics, we're seeing sites drop the header now. If you check the distinct PKP PKPRO values in the top 1 million sites, you can see that thousands of them are all using the same policy, and that's because they're all related to Yahoo in some way, mostly Tumblr sites. You can see the list of sites using PKP PKPRO and you'll notice that a lot are indeed subdomains of Tumblr and they use either https://cspreports.srvcs.tumblr.com/hpkp or http://csp.yahoo.com/beacon/csp?src=yahoocom-hpkp-report-only as their reporting endpoint.
It's also great to see that the use of Security Headers is also continuing to grow, with a notable increase in the adoption of some headers in particular. We saw a 40% increase in CSP which is epic and a 23% increase in HSTS which is following behind the increase in HTTPS usage.
Whilst we did see a slight reduction in the use of CSPRO, we saw a considerably larger increase in the use of CSP. My guess on what's most likely happening is that sites are moving from a report only version of a policy to an enforced version, which shows progress in deployments of CSP.
We've seen some truly massive growth in HTTPS and with that comes an associated growth in the use of CAs. One of the CAs that seem to be helping the growth in adoption is Let's Encrypt, and they seem to be seeing the largest amount of growth themselves too!
Their presence is in the top 1 million has seen similar growth across the board, from the very top to the very bottom they've increased their presence.
Their own stats show exactly how well they're doing with 53.5 million active certificates currently issued and a rough average of 600,000 issued per day! Keep up the good work!
Despite seeing strong growth in HTTPS across the top 1 million sites, EV certificates have not seen much of that growth at all.
The graph is pretty flat and overall there really hasn't been much change and I actually just wrote a blog post about sites that used to have EV. Using my crawler data I can track sites that used to have EV certs and have switched away from them to either OV or DV certs. With such a massive flood of new sites coming to HTTPS and the proposed benefits of EV, I'd have thought we'd at least see a little more increase in the use of EV but we really haven't.
Certificate Authority Authorisation is a new DNS record that all sites should be using, but most aren't. The last scan in Feb 2018 was the first time I tracked the use of CAA so this scan is the first time we can look at growth from one scan to the next. Whilst we have seen an increase in the use of CAA, it's not a huge increase...
Some of the spikes on that graph looked a little odd so I checked it out and it turns out they're not errors. There are literally huge collections of sites within those groups that deploy CAA on their domains, and the vast majority of them are sites with a .pl TLD. I can't explain it, but the data is definitely right. Take a look at the list of sites using CAA but remember that updates daily so the sites could move around the list so I took a screen shot to show the results on the day of the report.
A new metric to track in this report is the use of the security.txt file. If you've not heard of this file then you can learn more about it in my blog Say hello to security.txt but in short, it's a file where you can put security contact information for your site/company so that researchers can make responsible disclosures more easily. You can see my security.txt file right here.
It's really simple and is intended to make the information easy to find in a reliable location. Even though it's a pretty new standard we have seen some good adoption but a lot of the large spikes here are down to Tumblr creating the files for sites they manage.
Whilst I did track Referrer Policy in the Feb 2018 report, I didn't present any data on it as it was a fairly new addition at the time. It has since been added to Security Headers and counts towards your grade there and we are seeing people starting to use it.
Interestingly this header doesn't show the same trend as other Security Headers and has a fairly consistent deployment across all of the top 1 million sites. It's a little strange to see such a break but perhaps this will change over time as the header becomes more widely deployed?
Another new Security Header, Feature Policy is so new that whilst I've added support to the Security Headers scan, it doesn't count towards your grade yet. How new this header is can also be clearly seen in the crawl results.
Those are some really low numbers but, in honesty, this is where I'd expect it to be but not where I'd like it to be. You can also see, even at such low numbers, the trend emerging that follows the other headers of higher usage in the higher ranked sites and a decrease as you go down the ranking. I wonder if that will be maintained as usage grows up to the Feb 2019 where I hope numbers will be a lot bigger!
This is the overview of some of the crawl data but of course, all of the crawler data is now much more available.
Security Headers Grades:
Sites using strict-transport-security:
Sites using content-security-policy:
Sites using content-security-policy-report-only:
Sites using x-webkit-csp:
Sites using x-content-security-policy:
Sites using public-key-pins:
Sites using public-key-pins-report-only:
Sites using x-content-type-options:
Sites using x-frame-options:
Sites using x-xss-protection:
Sites using x-download-options:
Sites using x-permitted-cross-domain-policies:
Sites using access-control-allow-origin:
Sites using referrer-policy:
Sites using feature-policy:
Sites redirecting to HTTPS:
Sites using Let's Encrypt certificate:
Top 10Server headers:
Top 10Server headers:
Top 10Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 159,635
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 54,328
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Domain Validation Secure Server CA 252,811
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 33,823
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 13,147
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority"13,090
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA 12,108
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 201810,576
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 10,352
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2 7,846
Top 10 Cipher Suites:
Top 10PFS Key Exchange Params:
ECDH, P-256, 256bits 263,502
ECDH, P-521, 521bits 4,842
ECDH, P-384, 384bits 4,051
DH, 1024bits 3,619
DH, 2048bits 1,285
DH, 4096bits 125
ECDH, B-571, 570bits 62
DH, 3072bits 9
ECDH, brainpoolP512r1, 512bits 7
ECDH, secp256k1, 256bits 2
Top Key Sizes:
Sites using CAA:
I'd highly recommend checking out this folder on the crawler site!
This is a gathering of other observations that I've made using the crawler data.
Whilst we're seeing an increase in the adoption of HTTPS, we're not seeing ECDSA keys grabbing as much of that new adoption as I'd like. It seems that RSA will remain the top choice for quite some time.
There isn't much change in cipher suites or the trends in their use either, these graphs remain surprisingly similar despite the huge shifts in total HTTPS usage.
At first I was surprised to see a drop in the amount of support for TLSv1.2 despite a huge growth in HTTPS, how can that be possible? Then I realised, the crawler doesn't gather data when using TLSv1.3 and there's probably a good portion of sites using that now. I've made a note to upgrade the crawler and hopefully we should see some really good adoption stats for TLSv1.3 in the next crawl!
If you want to see the Google Sheet that I produced all of these graphs from, which contains the raw scan data too, you can get that here.
The raw data is all available through https://crawler.ninja and you really should check out the text files for for all of the stats too. They contain a lot of interesting data and you can even get them in JSON format if you want to do something more than look through them. These JSON files are what powers https://whynohttps.com/ to provide the list of HTTP sites!