
Oops, my UUIDs collided!

This post is part of a collection on UUIDs. Universally Unique Identifiers (UUIDs) are a handy tool for a distributed systems architect. They provide a method by which a distributed system can generate IDs without meaningful risk of duplicates. These tools are very widely used and do their job quite well. This post describes instances where UUID generation fails to provide uniqueness, known as a UUID collision. This mystical event is mathematically so unlikely that you can often dismiss it with a hand-wave, yet collisions occur. This post provides specific real-world examples of collisions and explains what went wrong.

When using UUIDv4, you need to be sure that your random number generator is working properly. If you use a weak source of entropy, you'll harm your collision resistance. The UUID spec indicates that you SHOULD use a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) to mitigate this concern, but it allows fallbacks when a CSPRNG is not available.

The JavaScript standard includes Math.random(), a simple random number generator. Math.random() is not a secure random number generator and should not be used in security-sensitive contexts. But the quality of Math.random() output can be surprisingly poor, making it especially unsuitable for use in UUID generation. There are several different JavaScript runtimes and each can implement Math.random() differently. This article describes collisions of randomized IDs (although not UUIDs) when using the MWC1616 implementation in the V8 JavaScript engine. The post describes real-world collisions and highlights how bad random number generators can be. Thankfully, V8 has since switched to xorshift128+, which produces better randomness.

A UUID implementation that was vulnerable to this issue was the JavaScript uuid library (I'll call it uuidjs to avoid confusion). uuidjs releases before 7.0 would use a CSPRNG when available, but fall back to Math.random() otherwise.
This concern was disclosed as a security vulnerability (CVE-2015-8851) and the fallback was removed. But uuidjs users experienced an even worse class of collision. GoogleBot, a JavaScript-enabled web crawler, is known to use an implementation of Math.random() that always starts from the same seed. This is an intentional design decision by Google so that the crawler won't consider client-side dynamic content as a change to the underlying page content. Some users of uuidjs found that GoogleBot was sending network requests containing colliding UUIDs. If you search Google for the UUIDs listed in the bug report you'll find a diverse set of websites are impacted by this issue.

UUIDv4 5f1b428b-53a5-4116-b2a1-2d269dd8e592 appears on many websites

If you search for this UUID on other search engines you may only see the uuidjs issue (and perhaps this blog post). This specific UUID is an artifact of how Google indexes web pages. In summary, you may experience UUID collisions if your random number generator is a poor source of entropy. Real-world random number generators can be comically broken.

I accidentally experienced a UUID collision when I was an intern. I was writing some C++ code that defined a couple of COM objects. Each COM object needs to have a unique class ID (CLSID) and Microsoft uses a UUID for this (note: Microsoft calls them GUIDs). The natural thing to do when creating yet another COM object is to copy the source code of an existing one. I copy/pasted some code and forgot to change the CLSID, resulting in a collision.

Forgetting to change a hard-coded UUID is a common issue, and not just for interns. Users have found BIOS UUID collisions for the ID "03000200-0400-0500-0006-000700080009". The issue appears to be a hardware vendor that ships devices with this hardcoded UUID with the expectation that OEMs will change it. The OEM doesn't change it, and users experience UUID collisions. If you reuse a UUID, you're obviously going to have a collision.
What if we introduce an adversary to the system? If you accept UUIDs from outside your sphere of trust and control then you may encounter UUIDs that are generated with the intent of collision. As my internship example shows, the attacker doesn't need to do anything complicated; they just send the same UUID twice. Any client-side generation of UUIDs is at risk of this class of collision. As always, use caution when handling untrusted data.

Unfortunately, UUIDv3 and UUIDv5 have not aged well. These get their collision resistance entirely from cryptographic hash functions, and the underlying functions have long been broken. Historically, it made sense to think about MD5 and SHA-1 hashes as unpredictable values, and you could model their collision resistance as random sampling (the birthday paradox again). But once collisions could be forced, it was no longer safe to model them that way. A malicious user who can control the inputs to these hash functions could trigger a collision. Since the UUIDv3 and UUIDv5 algorithms simply prepend the namespace to the name, it's trivial to generate UUID collisions from existing hash collisions. I haven't seen an example of this being demonstrated, so here goes.

UUIDv3 collision example

This prints:

UUIDv5 collision example

This prints:

Credit goes to Marc Stevens for the underlying MD5 collision and the authors of the SHAttered paper for the underlying SHA-1 collision.

UUID collisions are a fun topic because the first thing you learn about UUIDs is that they are guaranteed to be unique. Looking at how UUIDs can collide in practice is a good overview of the sort of problems that pop up in software development. Broken dependencies, error-prone developers, hard-coded values, weak cryptography, and malicious inputs are familiar dangers. UUIDs are not immune.
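The namespace-prepending detail above is what makes hash collisions carry over to UUIDs. Here's a sketch of my own (not the post's missing examples) that recomputes a UUIDv3 by hand: the version and variant bits are applied after hashing, so any two names whose MD5 inputs collide under the same namespace yield identical UUIDs.

```python
import hashlib
import uuid

def manual_uuid3(namespace: uuid.UUID, name: bytes) -> uuid.UUID:
    # UUIDv3 is MD5(namespace_bytes || name), with the version and
    # variant bits overwritten afterwards. A collision in the MD5 input
    # therefore survives into the final UUID.
    digest = bytearray(hashlib.md5(namespace.bytes + name).digest())
    digest[6] = (digest[6] & 0x0F) | 0x30  # version 3
    digest[8] = (digest[8] & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(bytes=bytes(digest))
```

This matches the standard library's output, so given two colliding MD5 messages that share the namespace as a prefix, stripping the prefix yields two names with identical UUIDv3s.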


Why UUIDs won't protect your secrets

This post is part of a collection on UUIDs. Insecure Direct Object Reference (IDOR) occurs when a resource can be accessed directly by its ID even when the user does not have proper authorization to access it. IDOR is a common mistake when using a separate service for storing files, such as a publicly readable Amazon S3 bucket. The web application may perform access control checks correctly, but the storage service does not. Here's vulnerable Django code which allows a user to view their latest billing statement:

While Django ensures the user is logged in and only provides them with bills they own, S3 has no concept of Django users, and performs no such authorization checks. A simple attack would start from a known URL and increment the bill ID:

The attacker can keep trying bill IDs, potentially accessing the entire collection of bills. What if we changed the Django model to use UUIDs for the primary key instead of an auto-increment? The new URLs will look like: my-bucket.us-east-1.s3.amazonaws.com/bill-9c742b6a-3401-4f3d-bee7-6f5086c6811f. UUIDs aren't guessable, so the attacker can't just "add one" to the URL to access other users' files, right?

Unfortunately, this is only a partial fix. Even when URLs are unguessable, that doesn't mean an attacker can't learn them. A classic example starts with a former employee who used their personal computer for work. Hopefully their user account was quickly disabled, blocking them from accessing the company's web application. But sensitive URLs may still exist in their browser history. Even a non-technical attacker can pull off this attack, just by clicking through their browser history. Thankfully, many companies require employees to use company-issued devices when performing work, so this attack may be limited to former employees who violated that rule. The accidental leaking of URLs is probably a more reasonable concern.
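The vulnerable pattern above can be sketched in a few lines. The bucket name and URL scheme here are illustrative placeholders, not the post's actual code: the web app checks ownership before handing out a URL, but S3 itself enforces nothing, so the URL is the only gate.

```python
# Hypothetical public bucket; any request that reaches S3 succeeds.
BUCKET_URL = "https://my-bucket.us-east-1.s3.amazonaws.com"

def bill_url(bill_id: int) -> str:
    # The Django view would verify request.user owns this bill,
    # then redirect the browser straight to the S3 object.
    return f"{BUCKET_URL}/bill-{bill_id}"

# The attack: start from your own bill's URL and increment the ID.
guesses = [bill_url(i) for i in range(100, 103)]
```

With sequential IDs, enumerating the entire bucket is just a counting loop.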
For example, if only managers are authorized to view bills, you need to be careful not to leak the bill ID in other views where other employees have access. If you use secret UUIDs, think of them as toxic assets. They taint anything they touch. If they end up in logs, then logs must be kept secret. If they end up in URLs, then browser history must be kept secret. This is no small challenge.

Another concern for leaked UUIDs is rotation. Whenever a secret key is compromised, leaked, or known to have been stored improperly, it should be changed. The same holds true for secret URLs. Make sure you have a way to rotate secret URLs, otherwise you may end up stuck in a compromised state. Again, no small challenge. If this sounds like a huge pain… it is. Let's find a better solution.

The best approach is to ensure every request for sensitive data is authorized. One fix is to route file access through the web application. Continuing our example, the user would access /api/bill/100 and the file would be streamed from storage through the web app to the user's browser. If the user tries to access /api/bill/101, where they lack authorization, the web application can deny the request. Make sure the storage bucket is private, such that access must route via the web app. This approach is a good quick fix, but there are other approaches to consider.

If your storage provider is Amazon S3 you should consider pre-signed URLs. These URLs allow the browser to download the file directly from S3, without streaming through the web app. The URL contains a cryptographic signature with a short expiration date. These URLs are still sensitive, but the short expiration mitigates a number of concerns. Again, make sure the storage bucket is private. A key benefit of the pre-signed URL approach is that it offloads file access from your web application, reducing load on the application server.

Let's consider a well-known application that doesn't follow this advice.
YouTube, a popular video hosting service, allows uploaders to mark videos as "unlisted". This is a compromise between public and private. The owner of the video can copy their video's URL and share it out-of-band, like in a private chat room. This way, people in the private chat room can view the video, but the owner doesn't need to grant them access one-at-a-time and the viewers don't need to log in. In essence, anyone who knows the URL is considered authorized, from YouTube's perspective.

YouTube visibility selection

This approach uses unguessable URLs, which contain a random video ID. This appears to be 11 random alphanumeric characters, which offer around 64 bits of entropy. This is suitably unguessable, but the security is questionable. Once the URL is shared with others, the owner loses the ability to assert access control over the video. An authorized viewer can choose to share the URL with others. Users may expect that the video has proper access control restrictions and share the URL in a public-facing document, not realizing that leaking the URL leaks the video.

Consider unlistedvideos.com, an index of unlisted YouTube videos. Users who discover unlisted videos can upload those URLs to the site, thus leaking the content to a broad audience. The large number of videos listed on the site shows how weak this form of access control is in practice.

If your unlisted content leaks to unauthorized viewers, you can regain control by marking the video as private. This prevents anyone from accessing the video until you grant their account access. Of course, you probably chose to make the video unlisted to avoid needing to manage individual account access. You could also try re-uploading the video, marking it as unlisted, and sharing the new link, but the risk of a subsequent leak remains.

Another example of this design appears later in this blog post: AWS billing estimates. AWS appears to use 160 bits of entropy to protect these URLs.
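The "around 64 bits" figure above is easy to check. An 11-character ID over an alphanumeric alphabet (62 characters) carries about 65.5 bits; YouTube IDs actually appear to use a 64-character base64url alphabet, which gives exactly 66 bits. A quick calculation:

```python
import math

# Bits of entropy in an 11-character random ID, per alphabet size.
alnum_bits = 11 * math.log2(62)    # alphanumeric-only reading, ~65.5 bits
base64_bits = 11 * math.log2(64)   # base64url alphabet, exactly 66 bits
```

Either way, the ID sits in the mid-60s of bits, well short of UUIDv4's 122.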
Here's the verbiage AWS uses when you create a share link.

AWS billing share dialog

Interestingly, I'm not seeing a way to delete a billing estimate once shared. The creator appears to lose all ability to manage access once the link is shared outside their sphere of control. Be very careful not to put sensitive data in your billing estimates.

Unlisted content is an example of IDOR as an intentional security design. The uploader is expected to decide if unlisted offers the right security posture for their content. There are use cases where the effort needed to individually grant users access outweighs the risk of using unlisted. Not everyone is dealing in highly sensitive content, after all.

OK, maybe you want to create something like YouTube unlisted content, despite these concerns. In that case, we should ignore security concerns related to "leaked URLs" as that is "by design". Unlisted URLs are sort of like bearer tokens or API tokens which grant access to a single resource. Let's focus on attacks that guess URLs and consider how guessable UUIDs actually are.

UUIDv4 contains 122 random bits, much more than the 64 bits of a YouTube video ID, so there's little to contest about UUIDv4 guessability. But what about newer formats like UUIDv7? UUIDv7 embeds a timestamp at the start, such that the IDs generally increase over time. There are some claimed benefits, such as improved write performance for certain types of databases. Unfortunately, the timestamp makes UUIDv7s easier to guess. The attacker needs to figure out the timestamp and then brute-force the random bits. Learning the timestamp may not be that difficult: users sometimes have access to metadata for resources they don't have full permission to access. In our "latest bill" example, the bills are probably generated by a batch job kicked off by cron. As such, the bills are likely created one after another in a narrow time period.
This is especially true if the attacker has the UUID of their own bill as a reference. An attacker may be able to guess a small window around when the target object's UUID was created.

Other UUID generation methods recommend creating UUIDs in large batches and then assigning them to resources, in order, as resources are created. With this approach, the UUID timestamp is loosely correlated with the resource creation timestamp, but doesn't contain a high-precision timestamp for the resource creation. This mitigates some classes of information leakage related to timestamps. Unfortunately, it also bunches UUIDs together very tightly, such that many IDs will share the exact same timestamp. Learning one UUID leaks the timestamp of the entire batch.

At first glance, the random bits seem to save us. There are still 74 random bits in a UUIDv7; still more than a YouTube video ID. That's 2^74 possible random suffixes (18,889,465,931,478,580,854,784). Well beyond what an attacker can reasonably brute-force over the Internet. I would end the blog post here, but UUIDv7 offers additional optional methods which we need to consider.

The spec allows monotonic counters to be used when multiple UUIDs are created within the same timestamp. This ensures that IDs created by a single node are monotonically increasing, even within a single millisecond. The first UUID in a given timestamp uses a randomized counter value. Subsequent IDs in the same millisecond increment that counter by one. When the counter method is used, an attacker who learns one UUIDv7 can predict the counters of neighboring IDs by adding or subtracting one. A random suffix still exists, and that would still need to be brute-forced.

Of note for Django users, Python 3.14 introduced UUIDv7 in the standard library. Python uses a 42-bit counter, which is the maximum width the spec allows. That means Python's UUIDv7 only has 32 random bits, offering only 2^32 possible random suffixes (4,294,967,296).
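To make the timestamp leak concrete, here's a small sketch (a hypothetical helper of mine; the standard library doesn't ship one) showing how anyone holding a UUIDv7 can read off its creation time, regardless of which counter method the generator used.

```python
import datetime
import uuid

def uuid7_created_at(u: uuid.UUID) -> datetime.datetime:
    # The top 48 bits of a UUIDv7 are milliseconds since the Unix epoch,
    # so the creation time is plainly visible to anyone who sees the ID.
    ms = u.int >> 80
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)
```

An attacker with one reference UUID (their own bill, say) can anchor their brute-force search to a narrow window around this timestamp.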
Four billion seems like a big number, but is it large enough? On average, this is 1,657 requests per second sustained over a month. Is that possible? S3 claims it will automatically scale to "at least 5,500 GET requests per second". On the attacker side, HTTP load testing tools easily scale this high. k6, a popular load testing tool, suggests using a single machine unless you need to exceed 100,000 requests per second. The attack fits within the system's limits and appears feasible.

Adding a rate limiter would force the attacker to distribute their attack, increasing attacker cost and complexity. Cloud providers like Amazon S3 don't offer rate-limiting controls, so you'll need to consider a WAF. This changes the user-facing URL, so adding a WAF may break old URLs.

There's cost asymmetry here too. An attacker who guesses 2^32 S3 URLs will cost your service at least $1,700 on your AWS bill. If you don't have monitoring set up, you may not realize you're under attack until you get an expensive bill. The attacker's cost could be as low as a single machine.

I'm uneasy about the security here, as the attack appears technically feasible. But the attack doesn't seem very attractive to an attacker, as they may not be able to target a specific resource. An application that had juicy enough content to be worth attacking in this way would probably worry about "URLs leaking". In that case, unlisted URLs are a poor fit for the product and the fixes listed earlier should be used. Which renders the entire point moot, as you should never end up here.

But it's not an entirely theoretical concern. If you search on GitHub, you can find examples of applications that use UUIDv7 IDs and the "public-read" ACL. The sensitivity of the data they store and the exact UUIDv7 implementation they use varies. Nevertheless, 32 random bits is too small to be considered unguessable, especially for a cloud service like S3 which lacks rate-limit controls.
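The arithmetic above is worth spelling out. The S3 GET price used here ($0.0004 per 1,000 requests) is my assumption based on current list pricing, not a figure from the post:

```python
# Back-of-the-envelope numbers for brute-forcing a 32-bit random suffix.
guesses = 2**32                       # 4,294,967,296 possible suffixes
seconds_per_month = 30 * 24 * 3600
rate = guesses / seconds_per_month    # sustained requests/second over a month
cost = guesses / 1000 * 0.0004        # defender's S3 GET charges in dollars
```

That works out to roughly 1,657 requests per second and about $1,700 of charges billed to the victim, which is where the cost asymmetry comes from.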
A common theme of UUIDv7 adoption is to avoid exposing the IDs publicly. One concern driving this trend relates to IDs leaking timing information, which can be sensitive in certain situations. A simple approach uses a random ID, perhaps UUIDv4, as the external ID and UUIDv7 as the database primary key. This can be done using a separate database column and index for the external ID. Another intriguing approach is UUIDv47, which uses SipHash to securely hash the UUIDv7 into a UUIDv4-like ID. SipHash requires a secret key to operate, so you'll need to manage that key. Unfortunately, rotating the key will invalidate old IDs, which would break external integrations like old URLs. This may prevent systems from changing keys after a key compromise. Caveat emptor. Either of these approaches could help in our "unlisted URLs with UUIDv7" example.

Postgres currently uses the "replace leftmost random bits with increased clock precision" method when generating UUIDv7 IDs. Postgres converts 12 of the random bits into extra timestamp bits. This means Postgres UUIDv7 timestamps have nanosecond-scale granularity instead of millisecond. As such, Postgres UUIDv7s have 62 random bits, in the current implementation. So when it comes to UUIDv7 guessability, it really depends on which optional methods the implementation chooses. Be careful when adopting newer UUID versions, as the properties and trade-offs are distinct from earlier versions.

The authors of UUIDv7 knew about these guessability concerns and discuss them in RFC 9562. The spec offers a "monotonic random" counter method, which increments the counter by a random amount instead of one. While that method would help mitigate this attack, I wasn't able to find an implementation that actually uses it.

RFC 9562: Universally Unique IDentifiers (UUIDs) (2024)
Python uuid.uuid7
100,000,000 S3 requests per day
k6 load generator
Postgres UUIDv7 generator


The UUID collection

I've been getting back into blogging and am happy to share that my next group of blog posts will be a collection. When I write I often end up with a massive amount of content. I have a hard time keeping posts focused on a single theme; writing is thinking and there are just so many fun angles to ponder! As a post is polished for publication, I trim sections to the "cutting room floor". This time I had trouble letting go, so I spun out a couple of extra blog posts. Presented here in no particular order and on no particular schedule, this collection covers UUIDs, especially collisions, guessability, and implementation quirks.

Why UUIDs won't protect your secrets: UUIDs and Insecure Direct Object Reference


The most popular email providers

I downloaded the Tranco list of top domain names and ran DNS MX lookups to detect mail servers. I used the top 100,000 domain names, which should be a reasonable sample of large and popular sites. I found that 72% of the scanned domains had a mail server configured (an MX record was present). Here are the most popular email hosting platforms, as of August 2025. I've aggregated by the second-level domain name of the MX records, then truncated to providers supporting at least 25 of the scanned domain names.

Google and Microsoft grab the top spots, acting as email provider for about 60% of the email-enabled domains analyzed. A diverse group of hosting providers, registrars, email and security products form the long tail. The scanning code is available on GitHub.
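The aggregation step can be sketched as follows. This is my own illustration of the idea, not the published scanning code: group MX hostnames by their last two labels so that, for example, Google Workspace's various MX hosts all count toward one provider.

```python
def second_level_domain(host: str) -> str:
    # Naive aggregation key: keep the last two labels of the MX hostname,
    # e.g. "aspmx.l.google.com." -> "google.com". A production version
    # should consult the Public Suffix List to handle names like
    # "example.co.uk" correctly.
    labels = host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])
```

Counting domains per aggregation key then yields the provider ranking.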


Choosing Vim over VSCode

You've probably noticed that VSCode is very popular among programmers. As I drop into a Vim session, yet again, I wonder if I'm missing something. What keeps me coming back to Vim?

Here's roughly the same view as seen in VSCode versus Vim. I'm working on a Python file (sql.py) which is reading some content written by a Go file (sql.go) from another project. This is exploratory code for a project I've written about before. I have a SQLite shell open so I can explore the schemas and table content. And I usually keep an extra terminal open for one-off commands.

It took me a couple of tries to make a screenshot that was fair to VSCode. Initially, I had a terminal on the bottom of two side-by-side editor views. This left tons of white space at the ends of lines and made the terminal tiny. I had to look online to figure out how to move my VSCode buffer. I needed to click on "Terminal" and then a hidden menu would open. One of the alleged benefits of using an IDE like VSCode is the low learning curve and mouse-based controls, but that falls apart when things are hidden. Even after spending a couple of minutes trying to size things, VSCode is still showing less information. The Vim screenshot even shows that I have another tab of my terminal editing my blog (using Vim). The density is much better, and much easier for me to control.

You may have noticed that VSCode is complaining about errors in my Go code. VSCode is trying to be smart, so it's linting my Go. This could be a helpful feature, but it's confused about my workspaces. VSCode can't figure out that ../feed2pages-action/sql.go uses ../feed2pages-action/util.go. So now I have these squiggly lines all over my code, because VSCode is wrong. Perhaps I should open two windows, one for each project, and put them side by side? This makes me wonder if the trend towards mono-repos is driven partly by people fighting with VSCode.

I also notice that there are some badges on the left side of the screen trying to catch my attention.
What could be so important? Ah, Microsoft wants me to restart VSCode for some updates. No thanks… I'm trying to write code. Writing code takes a lot of focus; eliminating noise is really important to me.

I've built up a ton of muscle memory around Vim commands. So installing Vim keybindings is my go-to whenever I try a new IDE. This keeps me sane, letting me edit much as I would in Vim proper. But the Vim plugin is always some weak subset of Vim, or conflicts with built-in IDE hotkeys.

I've built up a ton of muscle memory around optimizing my views. I like to have a bunch of buffers open with the files I'm working on or referencing. Being able to quickly split a view in a way that makes sense to me is great. If I need to see more of a buffer I'll expand it to full size. On Vim this works with all buffers, including terminal buffers. But VSCode Vim doesn't implement these. Editor views can support some Vim resizing commands, but not others. Terminal views don't seem to support Vim commands at all. This means my terminal and editor views can't intermix. If I want to resize the terminals, for example, I need to carefully hover over the tiny (three pixel?) border between views and then click and drag. Managing views on the screen is a pain with VSCode.

My typical workflow starts by opening a shell. It gives me ready access to open files, check my git status, or use other tools. I've done lots of DevOps work where I need to interact with bare-bones remote systems over SSH, so shell commands already flow from my fingers. Once I open Vim, I tend to open a shell and then work from inside Vim.

Remote development is where Vim shines. I could try to use VSCode Remote Development, but I wouldn't want to use that to connect to a "production" system. So, I'm back to using the tools I can find on the system. Knowing Vim is really helpful because it's almost always installed and it's miles ahead of other pre-installed editors like Nano. It's not for me.
I'm pretty unimpressed with how LLMs have been marketed. They are an amazing technology, absolutely. But I think they are harmful in a lot of the ways people are trying to use them. A week back I came across a comment mentioning that the poster liked asking "write a password generator" as an interview question. It's easy until you realize you should account for arcane rules like "no repeating letters". I gave it to ChatGPT's GPT-4o, the same model Copilot uses, to see if it could pass the interview. What stood out is that I kept needing to remind it of the rules. Eventually, I gave up as I ran out of GPT-4o credits (and I'm not gonna pay for this). The last output reverted a previous fix yet again.

GPT-4o writes code that looks quite decent (especially for "interview coding"), but it really struggled to keep track of requirements and leaned on me to do careful code reviews. A human usually writes better code than what you see in the interview, but this is the best GPT can do. I'd be pretty upset if a junior developer kept sending me code reviews that added and removed previously discussed fixes, never producing all of them together. For a junior developer, I'd tolerate some back-and-forth as I'm training them to improve their craft. But for Copilot? I'm not interested in volunteering to train AI to write code, and the code quality isn't enough to make it a fair trade. I totally understand the appeal for "boiler-plate" coding, or for people learning how to code. But as an experienced software developer it just slows me down.

I've seen a number of editors grow popular and then wane. At various times I've seen Emacs, Eclipse, IntelliJ IDEA, Atom, SlickEdit, gedit, Notepad++, Sublime, NetBeans, Visual Studio, Xcode, IDLE, PyCharm, Android Studio, and more. Some of those aren't around any more. I'm not worried about Vim getting eaten by a big tech company, or enshittified to feed advertising or AI ambitions.
VSCode seems like easy mode, so I fully understand that people gravitate towards it. Vim has a steep learning curve, and honestly after around 12 years of using it I'm still learning (how to add things to the spell check dictionary is a recent one). I'm glad I invested time in learning Vim; it's paid off. So I think others would similarly benefit if they started today.

The file I'm working on (sql.py) shows 47 lines in VSCode versus 53 in Vim.
I'm using the SQLite shell as a primary reference, so I put it in a big buffer in Vim.
I'm using sql.go as a secondary reference, so I put it in a small buffer in Vim.
VSCode keeps terminals and code separate, so I can't optimize my view.


MTA-STS Preload

SMTP MTA Strict Transport Security (MTA-STS) is a mechanism for securing email as it travels between mail servers. Similarly to DANE, MTA-STS signals to the sender that they should use TLS and should validate TLS certificates when sending messages. Without these tools, email is vulnerable to STARTTLS downgrade attacks. MTA-STS offers protection by caching the receiving domain's policy, if any exists. Once the policy has been cached, the sender will continue to validate TLS certificates for the destination domain, up to a "max_age" limit. However, if the policy has not been cached and an attacker blocks the MTA-STS lookup, then the connection remains vulnerable. This means that MTA-STS cannot protect the first message being sent to a domain, and cannot protect against an attacker who persistently blocks the policy lookup.

A similar challenge exists on the web. When a user connects to a website and the website presents an invalid certificate (like a self-signed certificate), the browser doesn't know if it's safe to continue. Since it's possible that a self-signed certificate is intentional, the browser asks the user for help. Unfortunately, users aren't the best at knowing if it's safe to continue: some will click through the warning, allowing an attacker-in-the-middle to intercept their traffic.

The web addresses this problem with HTTP Strict Transport Security (HSTS). When a user connects to a website, the website includes an HSTS header that informs the browser that the website uses HTTPS and that the operators intend to keep their certificate valid. Once cached, the browser will refuse to connect to the website over unencrypted HTTP or when the certificate appears invalid. The user isn't given an option to proceed, as the browser is now confident that there's a security problem. Unfortunately, web browsers can't know the HSTS policy of a domain until they see the HSTS header. This means that the first request to a website is vulnerable.
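For context, a minimal MTA-STS deployment pairs a DNS TXT record with a policy file fetched over HTTPS from a well-known path. Here's a sketch using placeholder names, following the RFC 8461 format:

```
; DNS record announcing that a policy exists
; (the "id" changes whenever the policy changes, invalidating caches)
_mta-sts.example.com. IN TXT "v=STSv1; id=20250101T000000"

# Served at https://mta-sts.example.com/.well-known/mta-sts.txt
version: STSv1
mode: enforce
mx: *.mail.example.com
max_age: 604800
```

Senders cache this policy for up to max_age seconds (a week here), which is what provides protection between successful lookups.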
Generally, this isn't a large problem. For many websites, users make many return visits from the same device, so the window of vulnerability is quite small. But some websites have high threat models or use cases that prevent HSTS header caching (like a website you only access when setting up a new device). Web browsers address the first-connection vulnerability using the HSTS preload list, which is installed with the web browser. The preload list tracks websites that are well known to support HSTS and ensures that even the first connection is protected by HSTS. There are challenges with the preload list, like its size, but it's a viable solution.

Senders who are concerned about the vulnerability of the first message sent to a domain will want additional protection. As part of my project to measure MTA-STS adoption I ended up building a list of domains with support. I published the list in a GitHub repo and created a pull-request based workflow for adding additional domains. The list currently includes almost 3,000 domains.

I reached out to Vladislav, the creator of the MTA-STS resolver for Postfix, and he pointed me towards a way to pre-warm the cache. First enable the setting in Postfix's configuration, then scan the domains with postmap. Here's an example showing gmail.com being added to the cache (indented for clarity):

This approach works great as it double-checks whether the domain still has MTA-STS support. You can warm your MTA-STS cache with this command, which downloads the latest preload list and processes each domain with postmap:

You may want to run this command when setting up a new Postfix server or when enabling MTA-STS support for the first time. The one-liner can also be run as a cron job, if you'd like to keep your cache up to date as the preload list grows.

MTA-STS supports policy delegation, where the MTA-STS DNS records use CNAME to point to records hosted by their email provider.
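A delegation setup might look like the following zone-file sketch, with "provider.example" standing in for the email provider's actual hostnames (both names here are placeholders):

```
; The TXT announcement record is delegated to the provider
_mta-sts.example.com. IN CNAME _mta-sts.provider.example.

; The policy host is delegated too; the provider serves
; https://mta-sts.example.com/.well-known/mta-sts.txt with a
; valid certificate for mta-sts.example.com
mta-sts.example.com.  IN CNAME mta-sts-host.provider.example.
```

The domain owner maintains two CNAMEs and the provider handles policy content, updates, and certificates.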
Without policy delegation, a domain wishing to enable MTA-STS needs to host their MTA-STS policy document on their own. While a traditional web server could be used, static web hosting providers are also suitable. There are several companies that offer MTA-STS policy hosting:

EasyDMARC - easydmarc.pro
Red Sift - ondmarc.com
Mailhardener - mailhardener.com
URIports - uriports.com
Sendmarc - sdmarc.net
mta-sts.tech - I'm unsure who runs this

I'd like to see more mail providers add first-party support for MTA-STS policy hosting. It really simplifies adoption. A domain owner can enable MTA-STS by setting two DNS records, without needing to provide their own policy hosting. This would make the complexity of configuring MTA-STS similar to setting up SPF or DKIM records, easing the burden of enabling MTA-STS on a domain.


The next stage of getlocalcert.net

Update: In exciting news, the getlocalcert.net project will continue under new management. I’m grateful to William Harrison for taking the lead on future development. Over the next month, I’ll be transferring domains, servers, and other project resources. There may be short periods of downtime during the transition, but users should be minimally disrupted. Please consider sponsoring William to support this next stage of the project. This post previously claimed the project would be shut down. The original post follows, with references to shutting down removed. I started the getlocalcert.net project to explore two key ideas:

- Are public CA-issued certificates useful on private networks?
- Can I offer free and anonymous subdomain registrations without fighting spam?

The answer to both questions has been yes. Here are some statistics. Since the July 2023 launch, around 900 users logged in with GitHub. Anonymous registration complicated measuring unique users, but I believe 2,000 unique users accessed the service. Those users created 3,500 zones and 450 TLS certificates (1, 2). Around 370 of those were issued by Let’s Encrypt and 70 by ZeroSSL. The longest active subdomain was from the original launch, July 2023. In all, 17 subdomains were actively renewing their TLS certificates. It’s difficult to know exactly what the subdomains were used for, as they were only used on private networks, but keywords in the names provide some hints: I made signup as frictionless as possible, so 20% of users reaching the marketing page logged in with GitHub. But issuing a TLS certificate requires installing an ACME client, which requires more effort, and many users didn’t get that far. Since the project was focused on private domain usage, there was no need to support any public IP address ranges. Without public IP addresses there was little opportunity for users to host malware, phishing sites, or illegal content. To avoid spam email from these domains I configured PowerDNS to publicly host some default records, like a null MX record, a null DKIM record, and an empty SPF list. This approach proved to be extremely effective. I was able to keep anonymous subdomain registration open, even without CAPTCHAs.
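The anti-spam default records described above might look like this in zone-file form (a sketch; the null MX follows RFC 7505):

```
; Refuse all inbound mail for the subdomain (null MX, RFC 7505)
@            IN MX  0 .
; Declare that no host is authorized to send mail for it
@            IN TXT "v=spf1 -all"
; Publish an empty DKIM key so any DKIM signature fails to validate
*._domainkey IN TXT "v=DKIM1; p="
```

Together these tell receiving servers to reject mail claiming to come from the subdomain, which removes most of its value to spammers.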


Is email confidential in transit yet?

When I’ve talked to developers about the confidentiality of email in transit, between mail servers, I usually hear one of these responses:

- Everyone knows email isn’t secure
- While email is vulnerable, hop-to-hop connections between servers are secure enough
- Email is secure because every modern mail server uses TLS

Which is it? In this post, I explore the current state of server-to-server transport encryption and examine the confidentiality challenges we still face. There are a lot of areas to consider here, so I want to start by refining the scope. Email travels quite a bit: from the message author’s device to their mail server, between mail servers, and finally to the recipient’s device so they can read it. I’m focusing solely on the confidentiality of messages as they transit between mail servers. I won’t be covering:

- Message integrity (DKIM)
- User submission and retrieval of messages
- Denial of service and other attacks
- End-to-end message encryption (S/MIME and PGP)

Google publicly tracks the volume of unencrypted email they send. As of 2024, they cite that 2% of the email they send is sent unencrypted. Similarly, Cloudflare reports that 6% of the email they receive arrives unencrypted. These messages are vulnerable to passive eavesdropping. This contradicts the notion that everyone uses TLS, so we’ve got to dig in further. Here are the worst offenders, by volume, according to Google: Many of these domains are operated by financial institutions or telecoms. If you decided to require TLS, you’d be unable to send mail to the organizations on the right. Looking at the list, you may not know any of these companies, so maybe this doesn’t seem like a problem. But any large email provider sees a meaningful volume of unencrypted outbound email, so “mandatory TLS” still isn’t a reasonable default policy. While the progress to encrypt over 90% of email in transit has been a great success, TLS isn’t a complete solution. Most servers use opportunistic TLS as their default policy. In this mode, senders blindly accept whatever certificates they receive. It doesn’t matter if the certificate is expired, self-signed, or for the wrong host name: the sender doesn’t verify the certificate.

Opportunistic TLS is still a huge step forward, as it prevents passive eavesdropping, but it cannot protect against active attacks. I explored this issue and some of the solutions in a previous blog post. Even when sending servers support DANE and/or MTA-STS, they often need to send mail to domains that don’t. The most common fallback behavior continues to be opportunistic TLS, which doesn’t verify certificates. Most popular mail server software and services support TLS certificate verification as a non-default configuration option, so why not enable it? In 2018, the EFF reported that about half of the mail servers supporting STARTTLS used self-signed certificates. There’s lots of documentation that cites widespread use of self-signed certificates, expired certificates, or certificates with mismatched host names. Verifying certificates wasn’t really an option earlier, but what about today? To learn more, I scanned Cloudflare’s list of the top 1,000 domain names to:

- identify mail servers (when present);
- check for STARTTLS support; and
- check if the certificate was issued by a public CA.

The project code is open source, so you can run your own analysis. Here’s a summary of key statistics:

- 1,000 domain names checked
- 544 domains had MX records

Of the 544 domains supporting email:

- 99.3% had STARTTLS support (540 domains)
- 94.1% used valid certificates issued by a public CA (512 domains)

This is exceptional progress. Unfortunately, I still can’t recommend TLS certificate verification as a default policy, as a small number of domains still lack STARTTLS and CA-issued certificates. MTA-STS and DANE remain the best options for securely authenticating mail servers. To answer this question, I downloaded the larger “top 100,000 domains” list from Cloudflare Radar. I scanned each domain to identify mail servers, DNSSEC support, DANE support, and MTA-STS support. Of the scanned domains, 65,244 supported email (i.e., had MX records). The code for this scanner is also open source. Of the domains supporting email:

- 8% supported DNSSEC (5,436 domains)
- 0.9% supported DANE (564 domains)
- 1.2% supported MTA-STS (792 domains)
- 0.7% used MTA-STS in ’enforce’ mode (415 domains)

There’s lots of consolidation of email hosting providers. Here are the top providers seen during scanning: If you want inbound DANE support for your mail, you may need to shop around. Outlook began rolling out inbound DANE support in 2024.

Domains need to opt in to Outlook’s DANE support, so roll-out will take time. For each domain using outlook.com’s mail servers, I checked for TLSA records on the customer subdomain (i.e., customer.mail.protection.outlook.com, where “customer” varies per domain). None had DANE support enabled. Looking again at my list of 65,244 domain names: Edit: I had originally missed MTA-STS policy delegation (CNAME). I rescanned the domains and have updated the numbers with around 50 more domains. To support MTA-STS, a mail server needs a TLS certificate issued by a public CA. Thankfully, these are easy to find: Before you rush off to enable MTA-STS on your domain, remember that providers can change their TLS configuration. Ask your provider if they are committed to using valid, CA-issued TLS certificates long-term. Ideally, they should document this publicly. qq.com’s mxdomain.qq.com mail server’s certificate currently doesn’t match its host name, but all the other qq.com mail servers use trusted certificates. This issue won’t prevent you from using MTA-STS with qq.com, as the sender would switch to one of the other mail servers. But again, I’d recommend checking whether your provider will commit to ongoing use of public CA-issued certificates first. While support for DANE and MTA-STS is very low in both cases, MTA-STS was present on more domains. There isn’t a clear leader, as many of the MTA-STS policies weren’t in enforce mode. Since MTA-STS is a newer technology (2018 for MTA-STS vs. 2012 for DANE), MTA-STS adoption appears to be more rapid. The widespread use of public CA-issued certificates means that the majority of domains may be eligible to enable inbound MTA-STS without changing mail providers. One of the common responses to concerns about email confidentiality in transit is that server-to-server connections are sufficiently secure. Consolidation of email has led to a large volume of email being sent between several major providers, like Google’s Workspace and Microsoft 365.

Connections between the Microsoft Cloud and the Google Cloud are vastly different from user-to-server connections (where concerns like malicious WiFi routers exist). It seems possible that your email is secure, in practice, because the network routes between servers only include trustworthy network operators. In 2015, a joint report by the University of Michigan, Google, and the University of Illinois, Urbana-Champaign on STARTTLS stripping offered a chilling overview of observed attacks:

- “in seven countries, more than 20% of all messages are actively prevented from being encrypted”
- “We identify 41,405 SMTP servers in 4,714 ASes and 193 countries that cannot protect mail from passive eavesdroppers due to STARTTLS corruption on the network.”
- “96% of messages sent from Tunisia to Gmail are downgraded to cleartext”

Attackers are present. Admittedly, lots of email isn’t sensitive, so securing it provides little marginal value. Most email isn’t even wanted (estimates for spam range from 45% to 85% of all email). What if we drill down into messages that are obviously sensitive? Consider password reset emails. Email-backed accounts are extremely common on the web, and an attacker could perform account takeover if they intercept password reset emails. I tested several services and found the same result: they don’t require TLS when sending password reset emails. One of the most sensitive services I can think of that uses password reset emails is Amazon AWS, so let’s consider their approach. I spun up a test mail server, disabled TLS, and signed up for an AWS account. I then clicked through the password reset flow a couple of times. AWS always sent the password reset email in cleartext, without any encryption. Looking at the email headers, AWS used its own Amazon SES to deliver the password reset email. SES supports the TLS policy, but it’s not being used here. This leads me to believe that AWS chose not to require TLS for these sensitive messages. If you’re scratching your head, remember that multi-factor authentication (MFA) mitigates this issue. Amazon “strongly recommends” MFA for the root account. AWS has numerous high-quality MFA options, so I don’t think there’s a meaningful account takeover risk here… as long as you use MFA. I want to emphasize: AWS is not alone here.

I couldn’t find a service that required or verified TLS for password reset emails. Knowing the history of email security, this isn’t surprising, but at some point, I hope this will change. While the number of servers supporting TLS and server authentication has risen greatly, the number of senders requiring and validating TLS certificates is still very low. Adoption of DANE and MTA-STS is also low. Even seemingly high-sensitivity email, like AWS root account password reset emails, remains vulnerable to 2002-era downgrade attacks. We can do better. If you run a mail provider that supports multiple domains, consider centrally hosting the MTA-STS policy server for your customers’ domains. You’ll need to issue certificates (likely using ACME HTTP-01) for each connected customer domain. This would further simplify the MTA-STS setup process (add an A record and a TXT record), bringing the overall complexity for customers in line with setting up DKIM and SPF. If you run a mail server, make sure you have a TLS certificate that supports server authentication. Consider enabling DNSSEC/DANE and MTA-STS for both inbound and outbound email. If you accept mail on your domain, consider adding inbound support for MTA-STS (if your mail provider can support it) and TLS-RPT. Jean’s MTA-STS guide is a great starting point. Finally, you may add your domain to the MTA-STS cache warming list. Investigate your logs to see what behaviors are present. How much of your email is encrypted? Can you find any evidence of downgrade attacks? How much mail would be impacted if you changed your TLS policies? If you use a hosted email provider, find out their transit security policy. Prompt them to adopt DANE or MTA-STS. Encourage them to report statistics about how many unencrypted messages they send and receive. Email should be confidential. It’s going to take more work to protect messages in transit. One challenge was deciding which CAs should be considered trusted.

I used the list installed by Debian’s ca-certificates package. CA trust is a really important concern, and there’s a variety of different trust stores you could pick, each of which contains some differences. I’m not sure if I picked the “best” trust store for this measurement, but any popular trust store was probably sufficient. Conversely, mail servers should select a CA that is widely trusted, such that any sender will be able to validate their certificate. The test server I used when checking AWS’s password reset emails was MTA-STS test system “B” described in an earlier blog post. This mail server had MTA-STS enabled, so senders supporting MTA-STS should not send it unencrypted email. As Amazon SES doesn’t support MTA-STS, the AWS password reset email was delivered anyway. Another experiment I ran considered whether CAs require TLS certificates when using email-based domain control validation (DCV). They don’t.
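For domains considering inbound MTA-STS, the full setup is small: one TXT record plus one policy file served over HTTPS. A sketch with placeholder names (the id is an arbitrary version string you bump whenever the policy changes):

```
; DNS: advertise that an MTA-STS policy exists
_mta-sts.example.com. IN TXT "v=STSv1; id=20240601"

# HTTPS: served at https://mta-sts.example.com/.well-known/mta-sts.txt
version: STSv1
mode: enforce
mx: mx1.example.com
mx: mx2.example.com
max_age: 604800
```

Starting with `mode: testing` alongside TLS-RPT is a common way to validate the policy before switching to `enforce`.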


TLDs with novel behavior

ICANN recently reserved .internal for private use, sanctioning the use of the TLD for ad-hoc purposes. The decision solidifies .internal in this role, which it has long served, albeit in an unofficial capacity. The reservation was important, as previous use was not sanctioned and was on shaky ground. Consider what happened to the .dev TLD. In 2017, programmers who used .dev for their internal development networks were shocked to find that their browsers suddenly required HTTPS. Google had registered .dev as a gTLD and enabled HSTS preload for the entire TLD. Realizing that their usage was akin to squatting, these programmers had little choice but to migrate to new domain names. ICANN’s reservation of .internal protects the TLD from a similar fate. The modern web runs on HTTPS, and internal networks are no exception. Generating TLS certificates for .internal domains is fairly easy:

- make up a domain name, no registration needed
- then generate a self-signed certificate
- or create your own private certificate authority (CA)

The hard part is getting all the relevant devices and software to trust those certificates. System imaging and other provisioning tools are frequently used to pre-load the CA into the numerous trust stores. Unfortunately, the provisioning process can be quite complicated and many organizations struggle here. In bring-your-own-device environments, or when vendors and contractors connect with their own systems, the burden of updating trust stores falls to the end user. In many cases, incomplete rollout of the private CA leads to users clicking through certificate warning messages when connecting to internal sites. Another challenge of private CAs is that they are often able to issue certificates for all domain names. This is a huge security concern for any application trusting the private CA, and a reason end users may avoid installing a private CA, if given the option. It’s sometimes possible to constrain private CAs, but you’ve got to do it carefully. First, be sure to use the name constraints extension to restrict your CA to a small domain subtree.

You may need to switch to different CA generation software, as name constraint support is occasionally missing. Second, make sure the client software you’re using actually supports name constraints. Client-side support for name constraints has been growing, but it’s an optional extension, so it’s perfectly reasonable for software to ignore it. Private CAs are a pain to work with. If you run your internal network under a delegated TLD, like .com, then you can get a TLS certificate from a public CA. The process of getting a certificate from a public CA is easy, great automation tools exist, and it often doesn’t cost any money. You won’t be able to use the ACME HTTP-01 challenge (which requires making your server publicly accessible, at least briefly), but the ACME DNS-01 challenge works great on internal systems. Using a public CA is also transparent to the end user: their device trusts the public CA out of the box. If you already own a domain name, you can run your internal network off a subdomain, like internal.example.com, although you could also register a dedicated domain. There are some free subdomain registration services, but domain names are cheap enough that it’s probably best to register your own. One of the biggest arguments against using public CAs for private networks is that the certificates you register end up in public certificate transparency logs. This means that anyone can list your internal host names. Consider these:

- internal.office.com
- internal.espn.com
- internal.amazon.com
- internal.microsoft.com
- internal.baidu.com

That’s a little awkward. There are some mitigations, like using obscure names or wildcard certificates, but this is a problem worthy of careful consideration. A year ago, I built the getlocalcert.net service to explore and promote the use of public CAs for private networks. The service offers free subdomains, but constrains usage to private networks.

I specifically focused on private networks as it dodges virtually all concerns of malicious use: it’s not possible to publicly host malware, illegal content, or to send spam with these subdomains. This narrow focus has allowed me to keep registration free and open. The getlocalcert service exposes some limited information on the public internet. The TXT record is available through public DNS, allowing certificate issuance via the ACME DNS-01 verification method. This means that our subdomains are only usable for private/internal purposes, while still supporting public CAs. One limitation of getlocalcert is that it’s implemented as a subdomain registration service. As I mentioned earlier, it’s probably better to register your own domain name than to use someone else’s free subdomains. While our inclusion on the public suffix list blurs the line a little, getlocalcert still only offers subdomains. But what if getlocalcert had its own gTLD? ICANN allows anyone to apply for a gTLD, although it’s a complicated process and costs over $200,000 in fees. There’s a lot of activity in this space, with almost two thousand applications currently tracked by ICANN. There are several potentially suitable options, like .private. Users could register domains like example.private and my-house.private, which they’d operate on their private networks. Unlike with .internal, they’d be able to prove they uniquely own their domain name, so public CAs would be able to issue certificates. This approach could have minimal impact on existing use, especially if domain-wide settings like HSTS preload are not enabled. Most of the new gTLDs appear to be focused on vanity. Domain names like example.cool and example.discount aren’t functionally different from example-cool.com or example-discount.net. But we certainly could have TLDs that are functionally different.

Presumably ICANN would reject a TLD that breaks too many norms, but a TLD serving a niche through constrained operation seems plausible. I won’t be pursuing a gTLD for getlocalcert, but the idea has been bouncing around in my head for a while now.
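Returning to the name-constraints point from earlier: here is a hypothetical sketch of minting a constrained private CA with OpenSSL 1.1.1 or later. The `-addext` values follow OpenSSL’s x509v3_config syntax, and the `.internal` subtree is just an example; this is an illustration, not a hardened CA setup.

```shell
# Create a private CA whose issued certificates are only valid under
# .internal. Clients that honor the (critical) name constraints extension
# will reject any leaf certificate for names outside that subtree.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout internal-ca.key -out internal-ca.crt \
  -subj "/CN=Example Internal CA" \
  -addext "basicConstraints=critical,CA:TRUE" \
  -addext "keyUsage=critical,keyCertSign,cRLSign" \
  -addext "nameConstraints=critical,permitted;DNS:.internal"

# Inspect the constraint in the generated certificate:
openssl x509 -in internal-ca.crt -noout -text | grep -A3 "Name Constraints"
```

Marking the extension critical matters: clients that don’t understand name constraints should then refuse the chain rather than silently ignore the restriction.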


SMTP Downgrade Attacks and MTA-STS

In this post, I audit several prominent mail providers to discover how they handle email encryption and show how MTA-STS can help improve email security. When SMTP was created, it was cleartext only, as we hadn’t yet figured out transport layer security (TLS). When TLS was finally ready, we needed a way to phase it in. STARTTLS was created and offered opportunistic encryption. Basically, a mail sender could ask the destination mail server: “Do you support encryption?” If the reply was positive, then a TLS session would be established using the certificate the server provided. If not, then a cleartext SMTP session would be used. Anyone who’s studied network security will see a problem here. An active attacker-in-the-middle (AitM) can inject their own response, claiming that encryption isn’t supported, tricking the sender into using cleartext and allowing the attacker to eavesdrop on the message. This is a classic downgrade attack. Even when the receiving mail server presents a TLS certificate, troubles abound. Consider the options a sending mail server has when it is presented with a TLS certificate it doesn’t trust. Maybe the hostname doesn’t match, it’s expired, or it’s signed by an unknown certificate authority (CA). The sender can:

- send using the untrusted certificate;
- downgrade to cleartext; or
- refuse to send the message.

Most mail senders chose the first option, as it protects against passive AitM and ensures email is delivered. DANE’s TLSA record offers one possible improvement, which leverages signed DNS records. Unfortunately, DNSSEC adoption has been slow. Measurement studies show that around 5% of domains currently use DNSSEC. Several large email providers, like Gmail, do not support DNSSEC. As such, other options are being considered. MTA-STS provides a way for mail servers to indicate that they will support encryption using a TLS certificate issued by a trusted certificate authority (CA). The policy includes a parameter that tells the sending mail server how long it should remember the policy.

I won’t get into the full details of how MTA-STS works, as it involves several steps and is well documented elsewhere. I stood up a pair of Postfix mail servers and configured them with a catch-all email address and a mailbox_command that logs incoming messages to a database. Server A has a TLS certificate issued by Let’s Encrypt, a widely trusted public CA. Server B has TLS fully disabled. Both servers have an MTA-STS policy set to . These servers help me infer the sender’s policy by seeing which mail servers receive email. The sender cannot distinguish B’s intentionally misconfigured lack of TLS support from an AitM attack. A sender that properly implements MTA-STS should deliver mail to A and refuse to use an unencrypted connection with B. I picked six mail providers for testing. The first group are marketed as supporting MTA-STS: The second group includes other prominent mail providers, although these do not claim to support MTA-STS: During early testing, I sent an email from Gmail to the audit servers but found that Google delivered the message to both mail servers. Gmail has MTA-STS support, so this was surprising. Upon further research, I found that lazy caching was the problem: “… fetch a new policy (perhaps asynchronously, so as not to block message delivery).” (RFC 8461 §5.1) When testing mail senders, I needed to send multiple messages. The first would prime the cache and additional messages would be processed using the cached policy. This unfortunately means the first message sent to a domain could still be vulnerable to downgrade attacks, but this is intentionally permitted by the RFC for performance reasons. Using email for password reset or email verification has long been an industry standard. Magic links that directly log you into an account, avoiding passwords altogether, are also fairly popular. But are those sensitive messages getting delivered without encryption?

These sorts of messages are often sent using transactional email providers, but I wasn’t able to find any hosted providers that offered MTA-STS support. It is possible to self-host Postfix and use an MTA-STS extension to do it yourself. Here are the hosted providers I tested: Most transactional email providers support some combination of toggles for enabling TLS and/or certificate verification. These sorts of toggles are difficult to enable in practice, as some mail servers still don’t support TLS and many don’t use valid certificates. Google reports that 2% of the emails they send aren’t encrypted. It’s unclear how many of the encrypted connections used trusted certificates. Requiring TLS and fully verifying certificates would block potential customers, so many websites avoid these settings for business reasons. MTA-STS allows domains to opt in to TLS verification while still allowing the sender to support mail servers with weaker security settings. This is the best of both worlds. I’ve also included a check for inbound MTA-STS policy, for completeness. Here’s what I found: Surprisingly, Tuta Mail consistently sent the message to both mail servers. I didn’t see any requests reaching the MTA-STS policy server, so there’s no indication that the feature has been implemented. Their marketing material clearly mentions MTA-STS support, and the feature is marked as completed in their issue tracker, but perhaps only inbound MTA-STS is supported. Otherwise, MTA-STS support matched product documentation. What happens when you send an email and the message can’t be delivered due to MTA-STS? You’ll receive a message like these indicating that there’s a problem: This timely feedback is critical for helping the user understand what happened to the email they sent. Outlook only notifies the user when the message fully fails, after 24 hours. Awkwardly, this email contains a typo and doesn’t mention that a TLS verification issue prevented delivery.

Hopefully, the user experience improves over time. To help others audit MTA-STS, I’m open sourcing the project code and allowing others to use the hosted audit tool (although I don’t plan to keep it online long-term). It uses Docker Compose and assumes DigitalOcean as the hosting provider. Substitute your own domain names and other settings where needed. There are four other Postfix servers I haven’t mentioned, which help audit some other related concerns. Check the code for full details. While the audit infrastructure is online, you can generate test email addresses and see sender information (not content) for any received email. Messages are being logged, so don’t expect any privacy for messages you send here. Run the audit.sh tool to get started: This shows that Yahoo delivered email to both the A and the B server. If you receive email on your own domain, consider enabling MTA-STS. Increased adoption of MTA-STS will help encourage mail senders to add support. Check with your email provider to learn how you can get started. Here’s the official documentation for the mail providers I’ve tested:

- Outlook.com
- Proton Mail

Protonmail doesn’t claim to support MTA-STS for custom domains, but others have offered solutions: The general process for enabling inbound MTA-STS is well documented too:

- Using Apache
- Using GitHub Pages

Fastmail and Yahoo use certificates from DigiCert, a trusted CA, so they could enable inbound MTA-STS without much work. You should always set up TLSRPT when enabling MTA-STS, so be sure to enable that as well. MTA-STS provides a promising approach to strengthening SMTP security. Unfortunately, adoption by major providers is low. The lack of MTA-STS support by hosted transactional email providers leaves the password reset emails of countless websites vulnerable to downgrade attacks and snooping. Everyone should encourage transactional email providers to adopt MTA-STS. Google provides the best user experience for messages that have been delayed or failed due to MTA-STS.

Notifying the user about the problem quickly and providing several updates was very helpful. Implementers should be sure to consider how they communicate MTA-STS-induced behaviors to their users. HSTS is a similar technology that helps us secure websites. HSTS uses a preload list, which allows web browsers to ship with a list of domain names that are well known to support HSTS. SMTP used to have the STARTTLS Policy List, but it was shut down in favor of MTA-STS. Like HSTS, the MTA-STS RFC doesn’t specify which certificate authorities should be used when validating a certificate. The server is promising to use a certificate that the client (i.e., web browser or connecting mail server) will trust. This centralizes the decision of which CAs to trust in the hands of the popular web browsers (like Chrome) and popular mail providers (like Gmail). Web browsers have rapidly dropped support for CAs after security incidents, so mail servers need to keep track of which CAs the senders trust and adjust when needed. There’s a ton of extra information I’m collecting with this test framework:

- TLSRPT - aggregate reports from mail senders about failed messages
- HTTP logs showing MTA-STS policy file fetching
- Logs indicate that Proton Mail uses , hinting that they run a Postfix stack
- IPv6 support

Future research could build on this foundation. Finally, here’s a TLSRPT report from Microsoft. You can see that I tried a couple of different configurations, but Microsoft correctly rejected the connection.
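For reference, an MTA-STS policy document is just a small key-value text file fetched from `https://mta-sts.<domain>/.well-known/mta-sts.txt`. Here’s a sample policy (with hypothetical MX names) and a trivial shell extraction of its mode field:

```shell
# A sample MTA-STS policy, inlined for illustration
policy='version: STSv1
mode: enforce
mx: mx1.example.com
mx: *.backup-mx.example.com
max_age: 604800'

# Extract the mode field ("enforce", "testing", or "none")
mode=$(printf '%s\n' "$policy" | sed -n 's/^mode:[[:space:]]*//p')
echo "$mode"   # prints: enforce
```

Only a policy in `enforce` mode causes a conforming sender to refuse delivery; `testing` mode merely generates TLSRPT reports, which is why so many of the policies I observed offer no downgrade protection yet.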


One year of getlocalcert.net

Since 2012, Freenom acted as the domain name registrar for several free top-level domains, including .cf, .ga, .gq, .ml, and .tk. Unfortunately, at the start of 2023, it was pretty clear that Freenom was not doing well. Freenom was sued by Meta for ignoring abuse complaints and quickly halted new registrations. In early 2024, Freenom announced they were exiting the domain registration business. While this is a loss for anyone in the hobbyist and low-cost project space, there was a noticeable impact on phishing. By March 2024, many of the domains registered on these TLDs disappeared. Spam and abuse are clearly challenges. Others who offer free subdomains, like FreeDNS at afraid.org, seem to address spam concerns by requiring users to solve CAPTCHAs on every DNS entry update. These CAPTCHAs are seriously hard: Thankfully, even more options exist, so you can still find a free domain if you need one. Many of these use a GitHub pull request approach to manage registrations: One cool thing about this approach is that we can see how the abuse management process works: it’s just more pull requests. Of the ~4,200 domains hosted by is-a.dev, there were only 15 issues tagged for abuse. That’s a pretty good ratio. I’ve always wanted to run a subdomain registration service, but fears of battling spam always changed my mind. A useful approach to building hobbyist projects is to think about the things you want to build (like a domain name registration service) and the things you don’t want to build (like tooling for fighting spam and abuse). Can you reduce the project’s scope to avoid the latter while still providing enough features to support at least a niche use case? For hobby projects, even a narrow use case will have tons of surface area to work on. You can always increase scope later, if you’d like. In early 2023, watching the fall of Freenom, I decided to build an alternative, but laser-focused on a specific use case so as to avoid the spam concerns.
getlocalcert.net is a (sub)domain registration service focused on private network use. Users can register up to five free subdomains of or with their GitHub account. We’re compatible with the ACME DNS-01 protocol and Let’s Encrypt. This means you can get a free, globally trusted wildcard TLS certificate for any web services you run. What makes getlocalcert unique is that we don’t host any , , , or DNS records publicly (other than , which points to ). This means you can’t use these domains for hosting websites or email publicly. Instead, you’ll need to host your own DNS records privately on your internal network. The decision to avoid hosting public DNS records was twofold. First, to prevent spam, malware, illegal content, or other questionable content from tainting the domain. I wanted to avoid playing whack-a-mole with take-down requests or needing to plaster the service with CAPTCHAs. Second, to keep the service focused on a single use case. There are a ton of challenges to tackle within just the narrow scope I was considering. By focusing on a smaller scope, I avoided problems that felt intractable. With such a restricted starting place, you may wonder how getlocalcert is even useful. Here are the use cases I think are well suited. Many people go the private certificate authority (CA) route and use a top-level domain like , , , , and others. But private CAs need to be installed on every system that will use the network… which is awkward when your friends come over and don’t want to install your trust root. Others struggle to install private CAs on mobile devices. With getlocalcert, you can use free Let’s Encrypt certificates, which all your devices already trust. Consider getlocalcert if: If you’re testing HTTPS locally, try a domain, which always points to localhost. This lets your DevOps tools run using their own real domain and a trusted certificate, without needing to mess with DNS or private CAs.
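As background on the ACME DNS-01 flow used here: validation hinges on a TXT record, and that record can be delegated to a separate DNS zone with a CNAME. A sketch with hypothetical names:

```
; Send ACME DNS-01 challenges for example.com to a dedicated challenge zone
_acme-challenge.example.com. IN CNAME challenge.throwaway-dns.example.

; CAA: even if the challenge zone's DNS server were compromised,
; only the listed CA may issue for example.com
example.com. IN CAA 0 issue "letsencrypt.org"
```

The CA follows the CNAME when looking up the challenge TXT record, so the zone with the better update API can answer on the main domain’s behalf.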
Consider getlocalcert if: You don’t need to register an account to use this workflow, the “instant subdomain” feature will provide you with a UUID subdomain. You may want a ACME DNS-01 Validation Domain . This is a really cool read, even if you don’t need this approach. Basically, you add a CNAME record that tells Let’s Encrypt (and others) to validate your domain via a “throwaway” domain. If the “throwaway” domain has a better API for updating DNS entries, you enable automation. The best part is that with CAA Records you don’t need to trust your “throwaway” domain’s DNS server. Consider getlocalcert if: There’s a couple tools that provide easy validation domains, including acme-dns (open source, self-hosted) and Certify DNS (commercial). Check these out as well. I had a couple false starts when I started the project. My first pass tried to build something very minimal on top of afraid.org’s FreeDNS. Unfortunately, you’ll usually hit Let’s Encrypt’s rate limits if you try to register issue a TLS certificate with these one of these domains. This made it clear that I needed my own domain names and my own services. I took a hard look at CoreDNS for a while, but ended up choosing PowerDNS as the underlying DNS service. PowerDNS felt more mature, stable, and was quite easy to integrate with. I chose a Django stack to keep development productive and used server-side rendering as I didn’t want to pull out React for this project. I’ll probably reach for Alpine or HTMX if I need more interactivity. Generating TLS certificates with ACME is a complicated process. There’s lots of tools to choose from and a number of different approaches you could take. I spent about half the project effort building out the getting started documentation and validating the steps . I launched on July 11th, 2023 and gathered some interest on social media. Traffic spiked to a peak of 800 unique visits per hour. 
While the vast majority of these visitors didn't create an account, I was excited to see that some users made it all the way through the funnel to TLS certificate issuance. I expected a steep drop-off in engagement due to the complexity and niche focus, so I considered these results a win. So far in 2024, I've reached some important milestones. Thousands of domains have been registered, and hundreds of certificates have been issued (340 from Let's Encrypt, 60 from ZeroSSL). Our domain names were added to the public suffix list , so each subdomain is treated equivalently to any traditional domain by any tool that supports the list. I have a small but steady stream of sign-ups, so usage is slowly trending upwards. On the operations side, it's been fairly low maintenance. One breakage occurred due to Python's pip installer refusing to break an externally managed Python environment . This is a good change, as using a venv is a much safer approach, but it required Dockerfile changes to upgrade. Thankfully, testing caught this issue. The first production outage occurred when upgrading PowerDNS versions due to a missed database migration. Automated testing didn't catch this as it uses a fresh database each time, which automatically used the newer schema. Downtime was about 45 minutes. Since the localcert.net domains require local DNS resolution, no lookup failures would have occurred. ACME DNS certificate issuance and renewal were impacted, but renewals should have retried successfully after a delay. Maintenance and updates have otherwise been quick and painless. On the feature side, my focus has been elsewhere, so the service is still very much an MVP. When I'm ready, there's a fairly lengthy TODO list of improvements waiting for me. To those who have given getlocalcert a try, thanks for being part of the journey; I hope you've built something awesome! If you'd like to share, I'd love to hear about it.
you’ve got a private network of computers you run your own internal DNS resolver you want TLS certs on your internal web pages for HTTPS but you don’t want to pay for a domain name, set up a private Certificate Authority, or click-though self-signed certificate warning messages. you’re testing a web service you need to connect to it using HTTPS (sometimes existing client software requires HTTPS) but you don’t want to pay for a domain name, set up a private Certificate Authority, or click-though self-signed certificate warning messages. you have your own domain name you’d like to use ACME DNS-01 for certificate issuance (perhaps for a wildcard cert) but editing DNS entries on your existing DNS service is hard due to missing API support, bureaucracy, etc.


Moving to Hugo

I wrote my first blog post 10 years ago using Jekyll as a framework. I didn't love the process. It took quite a bit of work to get something I liked. I felt like I was promised a quick-and-easy solution but found that I was sinking an unexpected amount of time into tweaking things. I recently started blogging again and chose to use plain HTML, without a framework. This ended up clicking for me. I found a small amount of CSS that got me something readable and then focused on writing. Using , , , , , , , , and little else got me enough basic formatting. When I occasionally wanted something more, I had all of HTML at my disposal. I've done enough web programming that HTML flows from my fingers, so while HTML is more verbose than the Markdown equivalent, it doesn't impact writing speed for me. I have long memorized HTML's syntax, while I need to remember how Markdown does images ( ). Plain HTML has fast cycle times since there's no build process to drop you out of flow. If you've written HTML before, then plain HTML blogging has zero learning curve relative to any framework you could choose. This was surprising to me, as HTML seems strictly worse than Markdown for blogging, but it somehow made it easier to focus on content. I think there are a few key things that made this approach work. For one, I was starting from an empty blog, so I didn't need to worry about fancy features like tag clouds, pagination, or plug-ins. Navigation would be simple: each post has a link to the homepage, and the homepage links to each post in reverse chronological order. I also wasn't too worried about each post having a consistent style. I adopted an archive-after-publishing mentality: each post would be self-contained with its own CSS, and I wouldn't worry if some old post looked different from the latest. The biggest pain points were keeping things in sync.
I had the title, publication date, and other metadata of each post duplicated in three places: the post itself, the homepage, and the RSS feed. This was annoying, but solvable with a little vigilance to clean things up. I discovered h-feeds recently, which could improve the situation. Instead of maintaining a separate RSS feed you’d annotate your blog posts and listing with microformats . These would carry equivalent information to an RSS feed, and h-feed compatible feed readers would be able to process these web pages directly. This keeps your blog DRY . But support for h-feeds is lacking, so a “legacy” RSS feed is still highly desired. I’ve been using Hugo for a couple other static sites recently and I’ve really liked the way it works. It’s blazing fast to render a site, so it won’t slow you down (this blog renders in 48ms). The primitives you work with (layouts, partials, frontmatter) are productive. There’s a learning curve, but I paid that cost already, so Hugo was now at a very clear advantage. My first plan was to drop all my old content under and just write new content. This wouldn’t quite work as Hugo was now in charge of my RSS feed and it wouldn’t know about the old content. A light alternative is to use a Hugo shortcode to embed the original plain HTML for each post. In the end I converted each post to Markdown manually keeping only small sections as unchanged plain HTML. I broke my rule and tweaked the CSS for my older posts to bring them under a common theme. An interesting side effect of this approach was that Hugo minified my old posts. So my pages should now load slightly faster than before. I needed to customize the RSS feed layout: I kept my original RSS feed’s GUIDs, so my old posts hopefully didn’t re-appear in your feed. I’ve added extra stuff like the source:blogroll element.
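The GUID-preserving part of a custom Hugo RSS layout might look like the fragment below. The guid front-matter field is my assumption for carrying over a legacy identifier, not a Hugo built-in:

```xml
<item>
  <title>{{ .Title }}</title>
  <link>{{ .Permalink }}</link>
  <pubDate>{{ .Date.Format "Mon, 02 Jan 2006 15:04:05 -0700" }}</pubDate>
  <!-- Reuse the legacy GUID from front matter when present,
       falling back to the permalink for new posts -->
  <guid isPermaLink="false">{{ .Params.guid | default .Permalink }}</guid>
  <description>{{ .Summary | html }}</description>
</item>
```

Because feed readers deduplicate items by GUID, keeping the old values stable is what prevents previously published posts from reappearing as unread.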


RSS Categories - In Practice

I’ve been building a directory of RSS feeds which has quickly grown to over a thousand feeds. To build the directory, I wrote a web crawler (open source) which fetches each feed, parses it to collect metadata, identifies OPML blogrolls , and spiders to each new feed it discovers. I’m hopeful the directory will expand as more people adopt OPML for their blogrolls. One challenge I needed to tackle was to figure out how the RSS (and Atom) spec is used in practice. There’s some instances where the spec is followed, some where the spec is vague, and others where the spec is not followed. The category tags of RSS are a good example of this. The category element is an optional sub-element of the channel and item elements. The spec for category is pretty short: is an optional sub-element of . It has one optional attribute, domain, a string that identifies a categorization taxonomy. The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below: You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain. Right off the bat, we see “Processors may establish conventions for the interpretation of categories” . So the spec doesn’t have all the information we need to proceed. With over a thousand feeds in my dataset, I decided to look at what conventions I could deduce. A first observation was that elements are rarely present for the channel. Only about twenty in one thousand have any. My blog didn’t either… so I added one. Channel categories would be really useful for navigating a large directory of feeds, but they are too rare. Item categories, I.E. tags for individual posts, are more common with at least 25% of the feeds in my dataset using them. 
Another observation was that the content of the category element isn't always intended for humans. For example, one blog uses numbers in the category and another uses URI fragments. One place where actual usage diverges from the spec is how forward slashes are used. For example, a post with the category should be interpreted as being about , which is a subcategory of . But it's clearly not a subcategory. I wasn't able to find any categories that were obviously using a forward slash as a subcategory separator. I decided it would be best to assume that feeds don't follow the spec, and subcategories aren't a thing. You'll notice that the domain attribute from the example above looks like a URL. However, the spec says that it's just a string, and there's a comment at the end of the spec that explains this. In RSS 2.0, a provision is made for linking a channel to its identifier in a cataloging system, using the channel-level category feature, described above. For example, to link a channel to its Syndic8 identifier, include a category element as a sub-element of , with domain "Syndic8", and value the identifier for your channel in the Syndic8 database. The appropriate category element for Scripting News would be . The Syndic8 database is defunct, but I was able to find an archived version . Here's the link for Scripting News . The Syndic8 page shows a few other categories being used: DMOZ, NewsIsFree (NIF), and TX. Unfortunately, DMOZ shut down in 2017, NIF is currently full of affiliate links, and TX is too vague to identify. The archived Syndic8 page. Syndic8 was nowhere to be found in my dataset. I turned to sourcegraph.com to see if I could find an example. This pointed to a 16-year-old example file : Cool, that matches the RSS spec and shows how the forward slash was intended to be used. But I'm not seeing anything like that in use today. The domain attribute in my dataset was rarely present. When it was, it was always a URL. Awkwardly, the most commonly used URL, , throws a 404 error.
An archive of the page shows that this was previously a password-protected API. Blogger deprecated this API sometime around 2006. Blogs hosted on blogger.com continue to use this value, although the meaning isn't defined in any of the docs (as far as I could find). Kinda odd. All of the others are URLs to a blog-specific categorization scheme. If you open the URL, you'll see all the posts on that blog in that category. So the domain is usually a link to more content in that category. I noticed that 5% of the feeds were podcasts. While podcasts use RSS for their format, they have a bunch of unique traits. For one, iTunes defined the itunes:category element as the preferred way to set the category. This element stores the category as an attribute. Podcasts are required to have an itunes:category , so this is present in all the podcast feeds I've found. There's too much detail in the iTunes docs to go into it, but I found it interesting that iTunes uses a fixed set of categories . The categorization scheme is surprisingly small: only 120 categories. Large directories can have a huge number of categories: the defunct DMOZ had over a million. The iTunes category system's small size makes it easy to assign a category to your podcast (iTunes only lets you pick one). But it wouldn't be suitable for blogs, or the web in general; it's far too small. Another interesting trend is the use of hashtags, either as categories or in the description of a feed. I don't see any evidence that RSS tools directly support hashtags, so this may be bleed-over from users familiar with them on other platforms. I was able to find around 700 examples of a hashtag being used in my dataset, between categories and descriptions. There's no central authority on which hashtags should be used or what they mean. This potentially fragments naming, as different people will choose different hashtags for related topics, but it ends up working well in practice. Writers gravitate towards popular hashtags to try to catch the attention of readers.
Readers flock to popular hashtags because they see those being used. It's an organic approach that trends towards a common naming system without central authority. Hashtags are sort of a good fit for RSS feed categorization. The lack of a central authority matches the federated nature of RSS feeds. When RSS was being developed, the idea that you could centrally define a taxonomy for the entire web was considered possible. Today, few online directories exist, and they are tiny compared to major search engine indexes. My current approach is:

- Violate the RSS spec: don't treat forward slashes as subcategory separators
- Use alternate categorization elements, like itunes:category
- Scan descriptions for hashtags; use these as categories if none were specified
- If a channel still doesn't have any categories, use the categories of the most recent post

Using this model, here are the most popular categories in my dataset: As expected, without a consistent taxonomy we have some duplicated elements, like Technology, technology, and tech. You can see all the results at RSS Blogroll Network .
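The heuristics above can be sketched in Python using only the standard library. This is an illustrative simplification of the approach, not the crawler's actual code:

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by podcast feeds for itunes:category elements
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"
HASHTAG = re.compile(r"#(\w+)")

def channel_categories(channel):
    """Best-effort category extraction following the heuristics above."""
    # Take <category> text verbatim; don't treat "/" as a subcategory separator.
    cats = [c.text for c in channel.findall("category") if c.text]
    # Accept podcast-style <itunes:category text="..."/> elements too.
    cats += [c.get("text") for c in channel.findall(ITUNES + "category")
             if c.get("text")]
    if not cats:
        # Fall back to hashtags found in the channel description.
        cats = HASHTAG.findall(channel.findtext("description") or "")
    if not cats:
        # Finally, borrow the categories of the most recent item.
        items = channel.findall("item")
        if items:
            cats = [c.text for c in items[0].findall("category") if c.text]
    return cats

demo = ET.fromstring(
    "<channel><description>Posts about #python and #rss</description></channel>"
)
print(channel_categories(demo))  # ['python', 'rss']
```

A real implementation would also need case normalization (Technology vs. technology vs. tech) before counting popularity, which is the duplication problem noted above.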


RSS blogrolls are a federated social network

RSS and other web feeds are a great way to keep track of articles published by your favorite blogs. But feed discovery remains challenging. Some recent work in this space opens up new opportunities. Since the earliest blogs were published, blogrolls have helped readers discover new blogs. Each blogger could promote the blogs they follow by listing them somewhere on their site (their blogroll). Readers could discover these suggestions as they browsed, helping them explore the blogosphere. A blogroll could be as simple as a list of hyperlinks, although recent tooling has become more advanced. Here's XKCD's blogroll, promoting other webcomics. With the rise of PageRank-based search engines (i.e. Google), content discovery suddenly became easy. This sucked the wind out of the sails of manually curated blogrolls and directories: readers no longer needed them. Twenty years later, the major search engines are losing the battle to filter out link farms and AI-generated slop. Overwhelming SEO pressure has mutated websites so that now you can't get a recipe for brownies without a long backstory. They've turned towards value extraction, filling search results with so many ads that they sometimes push actual results below the fold. (Although honestly, you're using an ad blocker , right?) It's a troubling development that makes us yearn for the early web. Top search results are often just the same large websites: Wikipedia, IMDB, StackOverflow, etc. It's a sea of sameness that makes the Internet feel boring and beige. Personal blogs, small community websites, and other digital gardens still exist, and it's easier than ever to create your own, but Google won't find them for you. Instead we're seeing modern search engines like Marginalia emerge to elevate the small web. Social networks followed a similar path. They proved that analyzing connected users (followers, friends, etc.) helped provide quality recommendations for content and new connections.
They connected a billion people online in a way that was previously not possible, providing real value to the users. Unfortunately, the centralized social networks were also profit-driven, and they tuned their graph analytics and recommendation engines towards engagement over all else. From fueling rage and teen mental-health crises to offensive advertising, malvertising, poor moderation, login-gated walled gardens, censorship, privacy-invasive practices, and the numerous degradations of Twitter: enshittification is in full swing. Many, myself included, have fled to the fediverse. Popular software includes Mastodon , Pixelfed , and Lemmy . The fediverse is still small, with tens of millions of comments per month across millions of active users; tiny compared to the big centralized social networks, which boast one to three billion active users. But it's the quality of communications, community, and user-focused software that sets it apart. Moderation in the fediverse is much better staffed , and it shows. Could a modern look at RSS feeds and blogrolls help the small web, much like the fediverse is revitalizing community in social networks? RSS readers work great when you've found a site that publishes content you want to consume. You paste a URL into your feed reader, and most tools will automatically detect available feeds. Your reader will periodically scan your feeds for new content and track what you've already read. It's a personalized news feed from sources you curate. Importantly, an RSS reader is your user agent, and you can configure filters, processing, or other automation to suit your needs (instead of the needs of advertisers and shareholders). But RSS historically hasn't helped discover new feeds. You'd need to leave your RSS reader, navigate to the website to view any blogrolls, or use other platforms to discover new sites.
Some RSS readers support recommendations, but these are mostly centralized recommendations, not the sort of emergent recommendations you'll get from a social network. At its core, a social network is a collection of links between users. The terminology varies, but these links could be friends, followers, subscribers, professional connections, etc. If we wanted to make RSS social, we'd need a way to define links between the feeds. An RSS feed is mostly just a list of posts. You link to it from your website using a link element: A blogroll traditionally is a list of hyperlinks to suggested websites: Discovering these blogroll links programmatically is challenging. Search engine crawlers can do it by exhaustively crawling the web, but the purpose of each hyperlink is unclear. There's no specification for how to find a blogroll page or determine which hyperlinks are part of the blogroll. What's missing is a specification for machine-readable blogrolls and a way to link to them from a homepage or RSS feed. This idea is being explored by a couple of different people at the moment. Dave Winer has created a namespaced RSS element, source:blogroll , which points to a list of RSS feed subscriptions in the OPML format. He also proposed the HTML link element to make OPML blogrolls discoverable from webpages. This specification makes blogrolls machine-discoverable and machine-readable. Micro.blog has adopted the HTML link element for their hosted blogging service. OPML (Outline Processor Markup Language) is commonly used by RSS readers to import and export feed subscriptions. It's a well-supported standard for sharing feed subscriptions and a natural choice to improve blogroll connectivity. RSS feeds have always been part of the blogroll network, but these changes make it feasible to discover and walk the network. It's worth noting, these are only forward links that describe which feeds a site follows or recommends.
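Put together, a discoverable blogroll takes just two pieces. The URLs and rel="blogroll" value below follow my reading of the proposal; the names are hypothetical:

```xml
<!-- In the page's <head>: make the blogroll machine-discoverable -->
<link rel="blogroll" type="text/xml" href="https://example.com/blogroll.opml">

<!-- blogroll.opml: a machine-readable list of followed feeds -->
<opml version="2.0">
  <head><title>My blogroll</title></head>
  <body>
    <outline text="Example Blog" type="rss"
             xmlUrl="https://blog.example/feed.xml"
             htmlUrl="https://blog.example/"/>
  </body>
</opml>
```

A crawler that understands these two conventions can hop from a homepage to its blogroll, and from each outline's xmlUrl on to the next site.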
Backward links , describing who subscribes to a feed, are not specified and would need to be discovered by walking the connection network. As a starting place, I manually identified twenty websites with OPML blogrolls. From these I can follow links to find 150 distinct websites. Several of the links go to YouTube pages, reminding us that RSS isn't just for blogs. The network is small but could quickly exceed the scale of large manually curated RSS feed lists (like Awesome RSS Feeds , which has around 500 feeds). It's only been two months since the specification was proposed, so rapid growth is possible. To track the network growth over time, I've created a git scraping crawler that I'll run until it becomes impractical. You can view the latest network graph , which will be updated periodically. If you'd like to join the network, but your blog isn't discoverable from the network yet, check out the repo for instructions on manually adding your website. Opt-out instructions are also in the repo. Deploying an OPML blogroll is as easy as uploading the file to your hosting provider and linking to it. Check if your existing hosting or site generator supports discoverable blogrolls. Early adopters like micro.blog are ready to go. Others may take longer, so be sure to let them know you are interested. Ideally, you'll want to auto-generate an HTML view of your OPML blogroll for web browsers. The blog you're reading now was a collection of raw HTML files hosted on GitHub Pages, so I wrote something custom. My subscription list is on FeedLand (another Dave Winer creation), so I link there instead of hosting my own. I wrote an RSS crawler that collects recent posts from my subscription list and recommendations from discoverable blogrolls they contain. You can view my feeds and their recommendations . If you'd like to use my approach, check out these repos: These are either open source or public domain, so fork away. Hopefully I've piqued your interest in RSS feeds and blogrolls.
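The parsing half of such a crawler is small. This is a minimal, network-free sketch in Python (standard library only; URLs are made up), not the code behind my actual tools:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class BlogrollFinder(HTMLParser):
    """Collects the href of every <link rel="blogroll"> in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.blogrolls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "blogroll" and a.get("href"):
            self.blogrolls.append(a["href"])

def opml_feed_urls(opml_text):
    """Returns the xmlUrl of every outline in an OPML blogroll."""
    root = ET.fromstring(opml_text)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

# To walk the network: fetch each page, find its blogrolls, parse the OPML,
# queue every newly discovered feed, and repeat until nothing new appears.
finder = BlogrollFinder()
finder.feed('<head><link rel="blogroll" type="text/xml" '
            'href="https://example.com/blogroll.opml"></head>')
print(finder.blogrolls)  # ['https://example.com/blogroll.opml']
```

A production crawler would add fetching, politeness delays, deduplication, and the opt-out handling mentioned above.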
The value of a network grows with the number of nodes, so I'm hopeful others will join us. It's still unclear how RSS readers will adopt the blogroll network. Discovery by walking the network or using graph analysis for generating recommendations seem possible. Software could make it easy to publish feeds or support back-ends like FeedLand for subscription management. My network mapping tools may be the first, but there are many possible improvements. I'm excited to see what others can build.

- feed2pages-action - The golang utility and GitHub Action
- feed2pages - A demo site using Hugo
- feed2pages-papermod - A demo site with the PaperMod Hugo theme


Designing Personal Software

I’ve been thinking a lot about the type of software I want to build and use. I spend so much of my screen time using large feature-heavy software, which are one-size-fits-none at best or outright hostile. I’m left frustrated, distracted, and wanting something better. I write lots of mini software projects as a hobby, and a couple have been successful. By success I don’t mean that other people use them, but instead that I keep using them. Or that my toddlers played with a couple mini games I built, which taught them to use a keyboard and mouse. These are not things I’d ever put on my resume or claim as examples of good code, but I’m oddly satisfied each time I build one. Reflecting on my development process I realized that building software for yourself provides several unique benefits. My wife and I have a set of tasks that need to get done every night. It’s a repeating pattern each week. We usually switch off who does chores and who helps the kids get to bed, but sometimes we swap nights or split up the tasks and chit-chat while we work. A couple examples are: We could try to remember these things, it’s not that hard to do, but after a long day of working, we just want to make the evening easy. We both really like checking things off a list, the act of marking something done is a little reward that gets us on to the next chore. To try to coordinate the evening I tried setting up a couple to-do apps with our weekly tasks. But each one didn’t quite solve the problem and we abandoned each after a while. The one I remember the most was Todoist. It’s a very complicated app and it’s visually busy. There are lots of options for all-of-the-things, none of which I’m going to use. My goal was to make our evening easier, and something chock-full of colorful features isn’t going to do it. My biggest frustration was that it would roll over to the next day at midnight. Unfortunately, I tend to procrastinate chores and stay up late (or early, I guess). 
At midnight, Todoist would mark tasks for the current day as "overdue" and start showing the new day's tasks. This wasn't my workflow: I wasn't ready to be done with the "current" day, and Todoist started showing me tasks that I didn't want to look at yet. Perhaps there was a way to configure Todoist not to do that (you can often coax feature-full products into doing what you need), but I didn't want to fight with it. After Todoist and several other apps failed to stick, I gave up and used a small whiteboard. It was so much easier! Most importantly, we kept using it. Here's an example: At the start of the week, we'd reset the board by redrawing a circle on the days when each task needs to be completed. Each evening we'd look at the board for tasks to be done and draw an X for each when complete. So the photo above tells me that I need to put the trash out to complete Wednesday's chores. At one point we improved the workflow by using some small magnets to track tasks. We'd slide each magnet to the next occurrence of a task, eventually clearing the current day. This solved the "where did the marker go" problem, but the concept is largely the same. As a software developer, I felt a little let down. Software is supposed to provide a productivity boost, yet I preferred an old-school approach. What went wrong? Part of the issue is that a whiteboard is effortlessly flexible. You can track chores, take down messages, build a grocery list, or play tic-tac-toe. A computer needs to be told the rules, but with a whiteboard, you can enforce them ad hoc. With off-the-shelf software, you need to tweak your workflow to match the capabilities of the system. We could have tried changing our workflows to match existing to-do apps, but we didn't want to. What if I built my own? As an initial thought experiment, I built the most literal equivalent in raw HTML. It's just a bunch of checkboxes and a reset button for the end of the week.
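A minimal sketch of that raw-HTML approach might look like this. The task names and layout here are illustrative, not the actual embedded page:

```html
<!-- One day's chores: plain checkboxes, no framework -->
<h2>Wednesday</h2>
<label><input type="checkbox"> Make lunch for kids</label><br>
<label><input type="checkbox"> Move the trash can into the street</label><br>

<!-- End-of-week reset: clear every checkbox -->
<button onclick="document.querySelectorAll('input[type=checkbox]')
                         .forEach(c => c.checked = false)">
  Reset week
</button>
```

Everything lives in one file, and adding a task means editing one line of HTML.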
In the rare instances I needed to add a task I could edit the HTML. This is usable, although rough. This approach is simple enough I’ve embedded it in the blog post. This still didn’t feel right. I still feel like there’s too much on the screen. Do I need to look at tasks for Friday when I’m trying to work on Wednesday’s chores? Reducing visual distractions turned out to be important to us. Eventually, I built Chore Wheel . Chore Wheel isn’t a general-purpose task tracker, it solves a specific problem, for two people, and I run it from a single device. Chore Wheel clicked for us and we used it every single night for about a year. It even felt better than the whiteboard, which was repurposed. I think there are a couple reasons why Chore Wheel worked so well for us: There are also a number of anti-features that are intentionally missing. I don’t care about cross-device syncing. We already had a kitchen iPad for music and recipes, so it became the dedicated Chore Wheel device. I don’t want the app on my phone because if I take my phone out to check the list I’ll start doing something else instead. The kitchen iPad has nothing but the software we use in the kitchen, minimizing distractions. Introducing syncing would require a back-end, authentication, and a whole bunch of other things. The app avoids that by saving to the browser’s local storage. Chore Wheel doesn’t assign tasks. I could easily add something that helps us decide who’s in charge of chores for the evening. This would ensure that chores are being done equally, but not equitably . If my wife had a rough day at the office, I’d much prefer to notice and tackle the chores myself than for the app to tell her she’s on the hook. Splitting responsibilities exactly 50/50 leads to resentment, which I wanted to avoid. You may have noticed Chore Wheel has an unusual style. Admittedly adding a CSS framework was planned but I found I liked the unique look I organically built. 
I think this style works well specifically because it doesn’t look like all the other software out there. If I’m staring at a to-do list that looks like the software I use at work, suddenly I’m thinking about work again. Finally, Chore Wheel has some whimsy. I never add fun elements to projects at my day job, that’s just not the type of software I work on, and I’m usually opposed to excess visuals. But for Chore Wheel I added some CSS confetti that pours down when the list is complete. I think it’s usually hard to know when it’s OK to add fun elements, but for personal projects, you just build what feels right to you. There are a couple philosophical concepts at play here. First is that software should do a single job. Chore Wheel only tracks our evening chores. There are lots of other to-do lists we create: which groceries to buy, when to do car maintenance, packing lists, and so on. Chore Wheel doesn’t concern itself with those. Second is that perfection is achieved when there’s nothing left to take away. General-purpose software tries to solve so many different jobs that it can’t be simplified. Chore Wheel solves such a narrowly scoped problem that it can be reduced very far. I don’t take either philosophy to the extreme. I don’t expect others to find Chore Wheel useful. The app is aggressively optimized for our needs in a very specific workflow. Instead, I want to encourage you to try creating your own to-do app. Building software for yourself that you’ll actually enjoy using is an adventure in self-discovery. You’ll learn a little more about what motivates you and how your brain works. Starting with something small like a to-do list helps you focus on your personal user-stories instead of getting caught up building complicated internals. Start by looking at what kind of to-do lists you create already. These could be on paper, whiteboard, in an app, or in your head. Pick exactly one list to focus on. 
Consider where you are when you make the list, how you decide what to add, and how you complete tasks on the list. This is the workflow you're going to work with. Think about how a custom app would help you perform the workflow as-is. If there are things that aren't working in the workflow, think about how a custom app could change the workflow. Finally, build an MVP and iterate. Keep it simple, have fun, and make it yours. Starting from a simple prompt of "build a to-do list", everyone will end up at something drastically different.

- Make lunch for kids (if the next day is a school day)
- Move the trash can into the street (if tomorrow is garbage day)
- Water the plants (occasionally)

- It shows a single day at a time
- The user decides when to move to the next day
- A chore is completed with a single tap
- Adding a task has a simple UI
- There are no unneeded features


Certificate Authority Trustworthiness

The certificate authority (CA) system does an incredible job of solving an impossible challenge. Think about it. The CAs measure control of a domain name and then issue TLS certificates that pair cryptographic keys to those names. They do this on a global scale, often automatically. It’s impossible to do this perfectly, and unfortunately, they occasionally fail. In this post I describe the challenges the CAs face, describe a history of failures, and explain the process we use to maintain confidence in the system in spite of it all. The certificate authorities (CAs) solve a foundational key exchange problem for the Internet. They allow us to authenticate the TLS keys used by web servers, which they do by verifying control of domain names and signing certificates that associate public keys with these names. Authentication is a critical part of encrypting communications. Without authentication you may be encrypting with an attacker’s key, allowing them to eavesdrop on or tamper with your data in transit. Methods like certificate pinning work for things like IoT or mobile applications that communicate with a single back-end server. The developer can hardcode the certificate fingerprint and push an update any time it changes. But pinning doesn’t scale for websites or email. We need something Internet-scale, and we’ve got the CAs. Mozilla maintains a long list of CA compliance bugs that tracks over a thousand concerns. Most of these aren’t worth discussing, so let’s start with noteworthy CA-related problems from the past year:

e-Tugra (November 2022)

An Internet-facing administration tool used by the e-Tugra CA had a wide-open sign-up page that allowed Ian Carroll, a security researcher, to register an account and view sensitive content. Of top concern: the confirmation codes used by the email domain control validation (DCV) method were visible.
With this access the security researcher could have started a certificate signing request for a domain they didn't own, chosen the email DCV, and then intercepted the confirmation code via the administration tool 1 . e-Tugra fixed the issue when notified, but the community acknowledged "this isn't a little mistake" . Chrome and Mozilla distrusted e-Tugra after around six months of discussion.

Trustcor (November 2022)

Trustcor appeared to have "shared corporate officers, operational control and technical integrations" with Measurement Systems, which "engaged in the distribution of an SDK containing malware". Chrome distrusted the CA, citing a loss of confidence: "Behavior that attempts to degrade or subvert security and privacy on the web is incompatible with organizations whose CA certificates are included in the Chrome Root Store." Trustcor contested the claims as "opinion, circumstantial evidence, conjecture, and fear-mongering".

HiCA (July 2023)

HiCA used a remote code execution exploit on users' systems as part of a certificate issuance process. While HiCA's use of this technique wasn't malicious, the approach was immediately treated as a security vulnerability and fixed once proper notification was given. The concern was briefly discussed on the Mozilla forum, which acknowledged that HiCA is not a CA. HiCA was assisting in the certificate issuance process using an actual CA, using a process that was otherwise by the book. "Literally anyone can do this and do monumentally stupid/insecure things; it's not productive to have a discussion every time this happens." No action was recommended, as the browsers decide on CA trust, not tools or third parties that users choose to assist the issuance process. HiCA shut down soon after these events, citing security incidents.

Unknown (March 2022)

An unnamed CA was hacked as part of a suspected state-sponsored hacking campaign also targeting government agencies and defense contractors.
There is "no evidence to suggest [the hackers] were successful in compromising digital certificates". The lack of clarity on which CA was hacked, how they were hacked, and the lack of adequate public disclosure is troubling. There are many more of these that are older, but I think the recent events tell the story well 2 . The above shows successful attacks, poor security practices, and questionable organizations. While these are concerning, none of these events appear to have caused actual certificate mis-issuance 3 . The web browsers carefully consider the risk of maintaining trust relationships when these sorts of events happen, sometimes revoking trust after a thorough review. Mis-issuances occur somewhat rarely, so let’s look further back in time.

MCS Holdings (March 2015)

MCS Holdings, an intermediary CA, mis-issued certificates for various domains, including Google's. These certificates appeared to be used for an internal man-in-the-middle proxy, but not external to the company. Google immediately distrusted the intermediary CA and quickly distrusted CNNIC, the root CA used by MCS Holdings.

ANSSI (December 2013)

A similar incident occurred with ANSSI.

DigiNotar (July 2011)

Hackers fraudulently obtained certificates from the DigiNotar CA for numerous high-profile domains, 531 certificates in all. An active man-in-the-middle attack using these certificates was performed against users connecting to Google's services. The trust in DigiNotar was revoked and the company soon filed for bankruptcy. The DigiNotar hack is a textbook example of what we don’t want. The hackers not only compromised the CA, but they also fraudulently issued certificates, established a man-in-the-middle network position and intercepted the emails of 300,000 people . Thankfully, the DigiNotar hack is an outlier. Each web browser maintains a list of the CAs they trust out-of-the-box. As we’ve already seen, this trust can be revoked when problems arise. But what’s the inclusion process?
Review is process heavy, focusing on security assurance and the trustworthiness of the organization operating the CA. There are independent audits and security standards . In the end, it’s a subjective decision with lots of supporting documentation. For transparency, Mozilla and the CA/Browser Forum use public discussion when deciding if a new CA should be added as a trust root. There are many boring examples that prompt little debate, like the inclusion request for LAWtrust. The denials can be terrifying:

December 2019: Accused of spying; the inclusion request was denied and their existing intermediate CA certificate was distrusted.
December 2015: “it appears that the owner of this CA has used their certificates to MITM”
November 2018: “did not disclose the incident, nor - given that the other two were never revoked - did they apparently perform a scan of their certificates to identify any others.”
March 2018: “A CA can’t simply fix one problem after another as we find them during the inclusion process.”

A rough way to measure the security of a system is to observe how often it is attacked versus adjacent areas. Attackers have limited resources, so they are biased to choose the weakest links (or perceived weakest). Here are some examples of how attackers bypass the protections the CAs provide, without attacking the CAs directly:

Hackers often have success simply using invalid certificates. Users may suffer from security fatigue and click through when faced with an active man-in-the-middle attack. These attacks are relatively easy to perform and off-the-shelf tools exist.

Look-alike domains are quite effective. An attacker registers a domain that looks like the target domain name. Since they own the look-alike domain they can get valid TLS certificates. Social engineering is typically used to trick the victim into connecting to the look-alike.

Blocking traffic on port 443 (HTTPS) still works to perform downgrade attacks. Savvy users may notice a missing lock icon in the address bar, but others won't realize there is an issue. Tools like HTTPS-only mode and HSTS add protection but aren't widely used.

Weak encryption and TLS bugs can be exploited: Logjam , weak primes , Sweet32 , Heartbleed , RC4 , Lucky Thirteen , POODLE , FREAK , and BEAST .

Between 2015 and 2020 the Government of Kazakhstan repeatedly attempted to mandate the installation of a government-operated root certificate on its citizens' devices. This certificate would have allowed the government to perform man-in-the-middle attacks on HTTPS traffic. Browser vendors responded by deny-listing these trust roots , such that they would not be trusted even if the user manually installed them.

Hackers can easily obtain certificates for sites they've already hacked by manipulating DNS records (ACME DNS-01), modifying files on web servers (ACME HTTP-01), or stealing verification codes from email inboxes. This works because the CAs validate domain control, not ownership. They could even steal existing certificates from servers they've compromised. NB: the hackers don't always need to perform a man-in-the-middle attack if they've already compromised an endpoint; they've already carried out a higher-impact attack.

Malware may install its own root CA certificates to allow snooping on HTTPS traffic.

Opportunity to attack the CAs exists, but these adjacent attacks are significantly more common.

Irrevocable Trust

CA root certificates are long lived, up to 25 years . The DigiNotar root certificates that were maliciously used by hackers in 2011 are still not expired . So active revocations are required when issues arise. Unfortunately, even when the root certificates expire, they don’t really expire. Many consumer devices contain hard-coded trust stores that stop getting updated soon after the initial sale. This was a concern for Let’s Encrypt when the IdenTrust root certificate expired. Around a third of Android devices trusted the expiring root certificate, but hadn’t been updated to trust Let’s Encrypt’s new root. IdenTrust agreed to sign Let’s Encrypt’s certificate past the expiration of their own trust root. This worked because Android doesn’t enforce the expiration of its trust roots. While this approach allowed many old Android devices to remain usable, it underscores a problem: the CAs often cannot be distrusted, not via software update nor expiration dates. As such, some devices will forever trust the problematic CAs referenced earlier in the post.

Government control of CAs

A recurring topic of discussion is government coercion of the CAs. Every CA operates within the jurisdiction of a government, which can exert legal pressure on the CA. Many countries have laws that compel technology companies to assist the government in certain circumstances. The CAs promise not to mis-issue certificates, but a request from their government could supersede that promise. Understanding the legal exposure of a CA is a complicated question of foreign law. Since revoking a trust root isn’t always possible, it’s also an exercise in predicting how laws may change. Given those concerns, you may be surprised to know that some of the CAs are directly operated by government agencies. Here are several from the Microsoft trust store :

Department of Defence Australia (cert)
Government of Brazil, Instituto Nacional de Tecnologia da Informação (ITI) (cert)
Government of Finland, Population Register Centre’s (Väestörekisterikeskus, VRK) (cert)
Government of Hong Kong (SAR), Hongkong Post, Certizen (cert)
Government of India, Ministry of Communications & Information Technology, Controller of Certifying Authorities (CCA) (cert)
Government of Korea, KLID (cert)
Government of Lithuania, Registru Centras (cert)
Government of Portugal, Sistema de Certificação Electrónica do Estado (SCEE) / Electronic Certification System of the State (cert)
Government of Saudi Arabia, NCDC (cert)
Government of South Africa, Post Office Trust Centre (cert)
Government of Spain, Autoritat de Certificació de la Comunitat Valenciana (ACCV) (cert)
Government of Spain, Dirección General de la Policía – Ministerio del Interior – España (cert)
Government of Spain, Fábrica Nacional de Moneda y Timbre (FNMT) (cert)
Government of Sweden (Försäkringskassan) (cert)
Government of Taiwan, Government Root Certification Authority (GRCA) (cert)
Government of The Netherlands, PKIoverheid (Logius) (cert)
Government of Turkey, Kamu Sertifikasyon Merkezi (Kamu SM) (cert)
Government of Uruguay, Agency for E-Government and Information Society (AGESIC) (cert)
Korea Information Security Agency (KISA) (cert)
Macao Post and Telecommunications Bureau (cert)
Swiss BIT, Swiss Federal Office of Information Technology, Systems and Telecommunication (FOITT) (cert)
Thailand National Root Certificate Authority (Electronic Transactions Development Agency) (cert)

Historic trust also existed for:

China Internet Network Information Center (CNNIC) (discussed earlier)
Government of France (ANSSI, DCSSI) (discussed earlier)
Government of Japan, Ministry of Internal Affairs and Communications
Government of Latvia, Latvian State Radio & Television Centre (LVRTC)
Government of Mexico, Autoridad Certificadora Raiz de la Secretaria de Economia
Government of Venezuela, Superintendencia de Servicios de Certificación Electrónica (SUSCERTE)
Post of Slovenia
The Uruguayan Post, “El Correo Uruguayo”
U.S. Federal Public Key Infrastructure (US FPKI) (removal)

The Mozilla list and Google list are much shorter than Microsoft’s list. Unfortunately, Microsoft doesn’t operate a public discussion forum, so the purpose and justification of these inclusions is not apparent. One of the most important functions of a government is to provide services to its people. It’s common for governments to provide security services, issue identity documents, and handle delivery of postal mail, for example. I don’t think it’s unusual for governments to seek to provide Internet-based identity services. But there is a conflict of interest, as governments also perform law enforcement, intelligence, and military operations. The global reach a government-operated CA has may not be appropriate.

Trusting the system

With all these problems, why do we still trust the CA system? One reason is the lack of a better alternative. I’m planning to post about DNSSEC+DANE soon, but in short: it’s a mess. This is an incredibly hard problem space, and nothing else is viable. The web browsers have done an excellent job defining security standards, reviewing inclusion requests, considering revocations, and being transparent to the users about their decisions. It’s a dynamic process needing constant attention and vigilance. Requiring certificate transparency logs (a public log of every issued certificate) provides great tooling to audit the issuance practices and swiftly detect problems . Recent improvements like Certificate Authority Authorization (CAA) reduce the impact of certain classes of CA security incidents. Attacks like the DigiNotar hack are much easier to detect these days and we have tools to reduce impact.

Call to action

The large number of attacks impacting adjacent systems is a strong signal that effort is best spent securing those adjacent systems. Security isn’t about perfection; it’s about strengthening weak areas. With that said, I think there’s still high-value work to be done here.

Transparent inclusion decisions

This post heavily references actions of the Mozilla trust store, largely due to the public discussion forum they use. Without such public discussion, users have no idea what the inclusion process looked like. Sometimes objections are raised, and the justification used for inclusion helps concerned users understand the nuance of the role the CAs fill. Microsoft, Apple, and Google own major web browsers but do not provide an equivalent open discussion forum. Each browser trusts a distinct set of CAs, so the justification for inclusion can’t always be inferred from Mozilla’s forum. You can review these lists here: Chrome

None of the CAs offer bug bounties; they instead rely on private auditors. There’s clearly low-hanging fruit the auditors are missing. Bug bounties are a great way to encourage altruistic hackers to take a look at your security while providing safe harbor . Without these incentives, less scrupulous hackers will look anyway and may sell their findings to malicious actors.

Some devices can be challenging to update, especially if they are abandoned by the manufacturer. Let’s Encrypt had this issue when they were getting started as a CA, particularly for Android devices. The only way to mass deploy updates to the trust store on Android is via over-the-air (OTA) updates. Thankfully Android is fixing this issue by adding updatable trust stores . All devices should support updating trust stores.

If Chrome doesn’t trust a CA, why should my Firefox browser trust it? And vice versa. Any competently operated website wouldn’t use a CA that isn’t widely trusted, so the loss of functionality should be marginal. When a distrust action is taken, it’s common for all the trust stores to agree on revocation, but decisions don’t always happen at the same time. Custom browser builds with a security focus should use a trust store that only includes trust roots that are in all the major browser trust stores. Better UX can be helpful for end-user pruning of trust roots.
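The Certificate Authority Authorization (CAA) mechanism mentioned above is worth a concrete illustration: CAA policies are ordinary DNS records that restrict which CAs may issue certificates for a domain. A hypothetical zone snippet (the domain and mailbox are placeholders; the issue value is each CA's published identifier, shown here with Let's Encrypt's):

```
example.com.  IN  CAA  0 issue "letsencrypt.org"
example.com.  IN  CAA  0 issuewild ";"
example.com.  IN  CAA  0 iodef "mailto:security@example.com"
```

A compliant CA must check for these records before issuance and refuse to issue if it isn't listed, so a compromised or coerced CA elsewhere can't quietly issue for the domain without violating the rules.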


HSTS preload adoption and challenges

HTTP Strict Transport Security (HSTS) is a way to signal to a web client that valid HTTPS certificates must be used when connecting to a domain. There are two main benefits to HSTS. First, it prevents a user from connecting over an unencrypted HTTP connection. Unencrypted HTTP leaks data to the network and is vulnerable to manipulation in man-in-the-middle attacks. Second, it prevents a user from connecting if the server presents an untrusted TLS certificate. Users aren’t great at deciding whether a warning message about an untrusted TLS certificate is a security concern or just a misconfiguration. They don’t have the knowledge or tools to decide, and they may inadvertently allow an attack to proceed by clicking through the warning. A site that deploys HSTS has determined that it can encrypt all its web traffic with HTTPS and accurately manage its certificates. HSTS preload is a mitigation against a very specific attack. A web browser needs to talk to a web server to see if it has HSTS headers. Once it does, it can cache those headers and will be protected until they expire. However, that first load is vulnerable because the HSTS policy is not yet known (the bootstrap man-in-the-middle vulnerability). A malicious network operator could block HTTPS on port 443, for example, to try to trick browsers into thinking sites don’t support TLS. Web browsers ship with a large hardcoded list of domain names known to support HSTS: the HSTS preload list. As such, the first load of these websites can be protected. Here’s the HSTS status for the apex domain of the twenty most visited domains on the internet: I want to call out a couple of things. Only eight of the top twenty domains are on the Chrome HSTS preload list. Half aren’t using HSTS at their apex domain at all. Yahoo and OpenAI are using the HSTS headers, but they aren’t on the preload list. It depends on your browser settings. You should, right now, make sure you’ve enabled HTTPS-only mode .
Unfortunately, this isn’t the default setting, so many users will still connect over HTTP in some situations. Without this setting, clicking on a URL like (insecure URL) will cause your browser to connect over unencrypted HTTP. Google responds by sending an HTTP redirect to the HTTPS site, but that redirect can be tampered with as it is served unencrypted. It could redirect to a malware or phishing site if you’re on an untrusted network. is also vulnerable to these issues, but only the first time you connect. An HSTS-aware browser ( 98.5% currently ) will cache the HSTS setting, preventing the use of HTTP or bogus TLS certificates until it expires (per the directive). This is a significantly stronger position, as subsequent visits are forced to use HTTPS and require the TLS certificate to pass all validation checks. The sites that support preload, like , will always use HTTPS. Your browser knows before you ever connect that the site requires HTTPS, so it will never try HTTP. We’ve got to start with . They’re at the top of the list in terms of traffic and Google manages “the” HSTS preload list. Google and its customers are frequent targets of state-sponsored hackers. Shouldn’t they be using HSTS? They do use HSTS, but not at the apex domain: Google has lots of projects hosted as subdomains under and they can’t enable domain-wide security settings until all subdomains can support them. HSTS headers are missing from several subdomains like, and . Even very new projects, like the AI tool Bard, are being launched without HSTS support. demonstrates another interesting thing about the preload list. Some of the Google subdomains, like , , and , are on the preload list even though the preload list doesn’t currently support adding subdomains. These must have been added before that rule took effect. The preload list is ever-growing, so there are space concerns and they’ve got to be careful about adding too many entries.
The current list has 120,699 entries and has a file size of 18 MB. A per-subdomain list would be far too massive. Google publishes a transparency report which shows how much of their traffic is encrypted using HTTPS. They’ve been stuck at ~95% for the last five years, so it looks like they’re still a long way from being HTTPS-only. Private browsing windows (e.g., Incognito in Chrome) don’t save the HSTS setting after the window is closed. This protects privacy, but has a security impact since every time is like the first time: there’s no cached HSTS setting to protect you. If most users are opening a site in a private window and closing it with every visit, HSTS headers are mostly useless. and don’t support HSTS, although both supports HSTS and is on the preload list. Being on the preload list is probably more important here, as the HSTS headers have reduced effect. doesn’t set HSTS headers on the apex, but many of the subdomains use it. Let’s take a look. I can’t find any subdomains of without HSTS headers (can you?). Amazon looks much closer to enabling HSTS for their apex domain and adding it to the preload list than Google. I found one. A preload list is bulky and requires regular maintenance, so it’s difficult to ship it alongside most web clients. It sort of feels like something the operating system could provide, similar to the certificate authority trust roots, but no operating systems do this today. I found that curl has optionally supported HSTS since 2021 and wget has supported HSTS since 2015 (currently enabled by default). wget supports preload lists and a third-party tool can import Chromium’s preload list. curl doesn’t support preload lists. Excluding traditional web browsers, client-side support for HSTS is rare. The only HTTP client library I found supporting HSTS was libcurl. None of the other popular HTTP clients have HSTS support, including Python’s Requests, C++’s Boost, Java’s HttpClient, and others.
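For clients without built-in support, the header itself isn’t hard to handle. Here’s a minimal sketch (simplified relative to RFC 6797; it ignores quoted-string edge cases) of parsing a Strict-Transport-Security value into a policy a client could cache:

```python
# Sketch: parse a Strict-Transport-Security header value into a policy dict.
# Simplified relative to RFC 6797; real parsers must handle quoting and
# duplicate directives more carefully.

def parse_hsts(header: str) -> dict:
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":  # non-standard directive Chromium requires for preloading
            policy["preload"] = True
    return policy

print(parse_hsts("max-age=63072000; includeSubDomains; preload"))
# {'max_age': 63072000, 'include_subdomains': True, 'preload': True}
```

A client would cache this policy per host and refuse plain-HTTP connections until max-age seconds have elapsed since the header was last seen.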
Not even the venerable OpenSSL supports HSTS. Software developers and other technical users just need to be careful it seems. Make sure you’re using as most tools will happily use . To get added to the Chromium preload list, a domain needs to configure HSTS headers, including a non-standard directive, and then manually request addition. Mozilla copies the Chromium list but validates each domain by checking that the HSTS headers are still present . Domains that are currently online but aren’t sending HSTS headers are filtered out of Mozilla’s derivative list. Since the inclusion rules are different we end up with two distinct lists. The HSTS RFC mentions preload lists briefly but doesn’t specify behavior. So neither the Chromium nor Mozilla approach is wrong. But the lists have several thousand discrepant domains. To me, the Mozilla behavior feels more true to the standard as it doesn’t require nonstandard directives and you’ve got to keep the HSTS headers active, or else the HSTS preload will drop. The disagreement between the lists causes an inconsistent security posture on Chrome and Firefox. A site operator may think they enjoy HSTS preload protections, while it actually varies by browser. Chromium runs a website that checks if domains are on “the” preload list. Mozilla doesn’t have an equivalent. This may perpetuate the misunderstanding that there’s only one HSTS preload list. What other domains are impacted? I ran a diff between the Mozilla preload list and the Chromium preload list . There are a ton of domains in the output that aren’t currently online and the vast majority appear to be niche websites. Few scream high-profile targets, so their presence on the preload list is questionable. After all, the HSTS header offers great protection on its own; the preload list only protects against the bootstrap man-in-the-middle vulnerability. Does the (currently offline) marketing website for a bed and breakfast really need to be on the preload list? 
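The list comparison described above is easy to reproduce. A minimal sketch, assuming each preload list has already been exported to a simple list of domains (the domains below are made-up placeholders):

```python
# Sketch: diff two HSTS preload exports with set arithmetic.
# Input lists are placeholders for the exported Chromium and Mozilla lists.

def preload_diff(chromium_domains, mozilla_domains):
    """Return domains present in exactly one of the two preload lists."""
    chromium, mozilla = set(chromium_domains), set(mozilla_domains)
    return {
        "chromium_only": sorted(chromium - mozilla),
        "mozilla_only": sorted(mozilla - chromium),
    }

print(preload_diff(["a.example", "b.example"], ["b.example", "c.example"]))
# {'chromium_only': ['a.example'], 'mozilla_only': ['c.example']}
```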
Certain websites are highly targeted to the point that any exposure to HTTP or bogus TLS certificates causes measurable harm. There are some related improvements, like HTTPS-only mode in browsers, that partially mitigate some of the same problems HSTS is trying to solve. But HSTS preload provides full coverage whereas the others apply only partially. Do we need 100k domains on the preload list? Certainly not. The public-suffix list , another list hardcoded in web browsers, has minimum user count requirements for addition. The public-suffix list requirements state that “projects not serving more then thousands of users are quite likely to be declined” although enforcement appears to be based on self-reporting. The HSTS list enforces no such policy, so there are likely many domains with very low usage. Just how fast is the HSTS preload list growing? Since the preload list is stored in git I’m able to look at the history to pull out some statistics: Here’s the HSTS preload domain count over time, showing a troubling growth rate and the recent large drop (log scale): The majority of domains removed on April 21st currently show negative WHOIS checks. A sample of domains shows 84% are no longer registered. A domain that isn’t registered has no owner, so the HSTS policy can reasonably be reset. There’s some minor concern about temporarily lapsed domains, but a delay before removal can mitigate that. Removed entries often belong to services that provide free subdomains, like: *.ddns.net (126 removed), *.azurewebsites.net (93 removed), *.herokuapp.com (57 removed), and *.duckdns.org (28 removed). It seems like these subdomains shouldn’t have been added to the list, as subdomains can no longer be added, but they are domain names too. Each of these domains is on the public-suffix list, which indicates that child domain names are owned by different entities, so each should be handled as a domain name and be able to set their HSTS policies independently.
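Counting entries for statistics like these just means parsing the list file at each point in history. A sketch of the per-checkout counting step, assuming a JSON structure like Chromium’s transport_security_state_static.json (the structure shown is an assumption; the real file is far larger and carries license comments that must be stripped first):

```python
import json

# Hypothetical snippet mimicking the assumed shape of Chromium's
# transport_security_state_static.json.
sample = """
{"entries": [
  {"name": "example.com", "mode": "force-https", "include_subdomains": true},
  {"name": "example.org", "mode": "force-https"},
  {"name": "pinned.example", "pins": "test"}
]}
"""

entries = json.loads(sample)["entries"]
# Count only entries that actually force HTTPS (some entries exist for pinning).
preloaded = [e["name"] for e in entries if e.get("mode") == "force-https"]
print(len(preloaded))  # 2
```

Running this over each git revision of the file yields the domain-count-over-time series.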
HSTS is a valuable technology and protects against real-world attacks. HSTS preload is a niche security feature that even Big-Tech companies are slow to adopt, treating it as an optional defense-in-depth measure. Concerns about the size of the list have already triggered restrictions on the addition of new subdomains and cleanup efforts. I suspect HSTS preload will have reduced importance in the years ahead as features like HTTPS-only mode become more widely deployed. The list will continue to grow, however, as it’s easy to add new domains. I suspect the list will eventually adopt more restrictions to slow growth and avoid including domains that don’t need it. A total of 229k domain names have ever been preloaded. 108k of those have been removed (47%). The largest removal was on April 21st, removing 82k domains. One entry was removed because the entire ccTLD is discontinued: google.tp.


A mental model for on-demand pricing

Cost optimization is a constant concern in cloud architecture. Cloud vendors often obfuscate costs or frame them in unhelpful ways. Changing the way you think about cloud computing costs can go a long way toward getting your bill under control. In this post I discuss a strategy for deciding how many instances to reserve when auto-scaling. The first trick cloud marketers pulled off was to get everyone thinking about computing costs using on-demand pricing as the baseline. This lets them frame reserved instances as a discount or a savings; they are giving you a deal! Avoid this style of thinking. Let’s look at some pricing examples. Prices are from Vantage’s EC2 Instance Comparison tool: EC2 instances running Linux in us-east-2 (Ohio), monthly cost. Reserved costs are “no upfront” for one year. Actual prices are subject to change and exclude EBS, data transfer, and other expected costs. Framing the pricing this way tends to make on-demand the default choice. It looks like you have the option to save money, if you can predict your usage and reserve your instances. However, if I flip the narrative to make reserved the baseline, this changes: Now I’m thinking, “Ouch, those on-demand instances are pricey!” I’d need to justify paying a premium over the “default” option of a reserved instance. If I can’t predict my usage, then I’m going to pay a premium. Is this mental model better? No, not really. It’s just another way marketing can frame prices to stir up emotions. Let’s find a better model. About 92% of the EC2 instances available show an on-demand markup between 1.5x and 1.7x. The mean and median markup is 1.6x. I’m not certain why AWS uses this number. It could be based on utilization rates, desired profit margins, or an artifact of financial risk management. Whatever the cause, knowing this “constant” can greatly speed up thinking about cost optimizations. I included i2.xlarge in the table as it’s clearly an outlier.
The entire i2 family (xlarge, 2xlarge, 4xlarge, and 8xlarge) uses a 2x markup. You can see those as the four large spikes in the graph. I suspect the higher rate reflects that the i2 series is a “previous generation” instance family and customers should migrate to new hardware. In this case, that’s the i3 family, which all use a 1.5x markup. As customers migrate off this old hardware it will become harder and harder to support variable demand. If you’re using these, please upgrade; the new instance types are much cheaper. If we base our cloud-pricing mental model in reserved instances, then an on-demand instance (running 100% of the time) costs about 1.6 times more. Let’s define this as a constant: the “on-demand markup” constant is 1.6. Under this model, a reserved instance costs 1 and an on-demand instance running 50% of the time costs 0.8 (i.e. half of the “on-demand markup” constant). These unit-less costs can help decide when it makes sense to use a reserved instance, and when an on-demand instance is better. Unit-less values can be confusing to deal with, so I’ll make up a unit “C” so you know we’re talking about unit-less cost. To get back to actual costs, just multiply by the reserved instance cost. For example: 3.5 C for t3.medium is $66.68 ( ) monthly. It doesn’t matter if those are all on-demand instances, or if some of them are reserved; we’ve abstracted that away. Every pricing effort has a different starting place. Maybe you’re running on-demand instances and looking to get a discount by running reserved instances to handle your baseline load. Maybe you’re running a reserved instance fleet and you’d like to scale down when load is low, again reducing costs. In either case, you’re going to want to find break-even points. We can figure out how many hours a day we can run an on-demand instance before it’s cheaper to just reserve it.
This is easy to calculate with the on-demand markup constant: On-demand only makes sense if you can run the instance for less than 15 hours per day; otherwise reserved is cheaper. Or the other way around: on-demand becomes viable if you can keep an instance off for at least 9 hours per day. Each hour an on-demand instance runs costs 0.0667 C ( ). If you run an on-demand instance for 14 hours a day, one hour below the 15 hour break-even point, then you save 0.0667 C. If you only need it for five hours, then you’re ten hours below and you save 0.667 C. The opposite is also true: if you run an on-demand instance for 16 hours a day, one hour above the break-even point, then you’re spending 0.0667 C more than if you had reserved it. If you run an on-demand instance 100% of the time then the cost is 0.6 C ( ) more than reserved (there’s our 1.6x markup again). If your load is more seasonal, you’ll want to calculate the break-even point in days: You’ll need to turn your on-demand instance off at least 137 days per year to see a benefit. Each excess on-demand day costs 0.0044 C. If your load pattern has both seasonal and daily patterns, you’ll need to build a hybrid model. It’s tempting to use reserved instances for your baseline load and on-demand instances for the variable load. But this is not optimal! You may benefit from making the cut a little higher. Consider the following simplified load pattern. It’s a sinusoidal load centered on 100 requests per second (RPS), rising and falling 40 RPS through a single day. Your load pattern will be different, so treat the following as an example only. If each instance can handle ten RPS, then fourteen instances can handle the peak load and seven can handle the minimum load. You may expect the minimum to be six instances, as the minimum load is 60 RPS, but this is only momentarily true. As soon as the request rate rises, to, say, 60.1 RPS, you’ll need to round up to seven instances.
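The break-even arithmetic above is simple enough to sketch in a few lines (this is just the post’s math, not quoted AWS prices):

```python
# The "on-demand markup" constant: on-demand price / reserved price.
MARKUP = 1.6

# Daily break-even: hours per day an on-demand instance can run
# before simply reserving it becomes cheaper.
break_even_hours = 24 / MARKUP          # 15.0 hours/day
hourly_cost = MARKUP / 24               # 0.0667 C per on-demand hour

# Seasonal break-even: days per year the instance must stay off.
off_days_needed = 365 - 365 / MARKUP    # ~137 days/year
daily_cost = MARKUP / 365               # 0.0044 C per on-demand day

print(f"{break_even_hours:.1f} h/day, {off_days_needed:.0f} days/year")
# 15.0 h/day, 137 days/year
```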
This is too brief to scale down to six instances. I’ve marked the number of required instances in red, which follows a step pattern. To optimize cost, you’ll want to determine how many instances can be turned off for nine hours or more. Remember, you won’t see a cost savings if you run an on-demand instance for 15 hours or more. I counted the required instance counts throughout the day as: My baseline load requires seven instances, but I’ll want to reserve nine, since each of those is needed for too many hours. The rest will be on-demand: Now that we know the optimal strategy, we can contrast it with other strategies. The “all on-demand” strategy uses auto-scaling but doesn’t reserve any instances. You should expect it to save some money when it turns off unneeded instances but to overpay for the baseline load. The “all reserved” strategy reserves the maximum required instances and keeps them all running. It’s very easy to compute the cost of this strategy: each instance costs 1 C, for a total of 14 C in our example. The “baseline” strategy reserves instances for the minimum load and uses on-demand instances to auto-scale for the remaining load. Finally, the “optimal” strategy reserves any instance that runs 15 hours a day or more and uses auto-scaling on-demand instances for the rest. Since we know 15 hours is the break-even point, this should perform the best. As promised, the optimal strategy performs the best. I’ve shaded cells in red when they are above the 1 C cost of a reserved instance and green when we’re getting a discount. This clearly shows when each strategy does well, and where it performs poorly. Something else I hope jumps out here: auto-scaling with only on-demand instances is the most expensive option, about 43% worse than using a fixed-size reserved instance fleet. If you’ve got baseline load, you may be better off reserving instances than auto-scaling, if you only choose one.
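The instance counting above can be reproduced with a short simulation (a sketch of the post’s example load only, not real AWS data):

```python
import math

MARKUP = 1.6                  # on-demand markup constant
BREAK_EVEN_H = 24 / MARKUP    # 15 hours/day
RPS_PER_INSTANCE = 10

def load(t):
    # The example load: sinusoidal, centered on 100 RPS, +/- 40 RPS per day.
    return 100 + 40 * math.sin(2 * math.pi * t / 24)

# Count how many hours per day each instance slot is needed,
# sampling at one-minute resolution.
STEPS = 24 * 60
hours = {}
for s in range(STEPS):
    n = math.ceil(load(24 * s / STEPS) / RPS_PER_INSTANCE)
    for k in range(1, n + 1):
        hours[k] = hours.get(k, 0) + 24 / STEPS

# Reserve every slot needed at least 15 hours/day; the rest go on-demand.
reserved = sum(1 for h in hours.values() if h >= BREAK_EVEN_H)
on_demand = len(hours) - reserved
print(reserved, on_demand)  # 9 5
```

This reproduces the numbers in the text: seven instances are needed around the clock, the eighth and ninth clear the 15-hour break-even, and the remaining five (the tenth through fourteenth) are cheaper as on-demand.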
If all this math is too much, here’s the solution visually. Draw a horizontal line through the graph at the height where the load curve sits above the line for exactly 15 hours per day. Divide that RPS value by the load each instance can handle (10 RPS in this example). This is the number of instances you should reserve; the rest can be on-demand. Can you tell the optimal strategy just from the shape of these load graphs? When load is flat: the best strategy is to reserve instances. There’s no opportunity to turn off instances, so on-demand and auto-scaling won’t help. When load is extremely spiky: you can auto-scale, letting you save money during the off-hours. When load is briefly low: auto-scaling won’t save costs, as you won’t be able to turn off on-demand instances for long enough. Hopefully those weren’t too challenging. These concepts can help visually assess where auto-scaling and on-demand instances can help. In absolute terms, the estimated annual cost of the example would be: These figures are obtained by multiplying the reserved instance cost back in. There’s some discrepancy here as the on-demand markup isn’t exactly 1.6x for all instances. AWS also offers three-year reservations and reservations with different payment terms. I don’t think these meaningfully change the previous analysis, but I’ll talk through them. Each of these options can further reduce the reserved instance price, which lowers the break-even point. This will reduce the number of on-demand instances you’ll want to use, causing you to auto-scale less. You’ve probably noticed that EC2 offers three-year reservations as well, at an even further discount. I usually avoid these. Three years is a long time to lock in your capacity and a lot can change in that time. Here are some things to keep in mind: Capacity planning is about predicting the future. You’ll never be able to predict it perfectly, but some predictions are safer than others.
Amazon marketing considers full upfront reserved instances to be “discounts”, where you can save by paying more up front. As before, we can shift our thinking and consider these to be markups, with full upfront being the default option. Under this model, the no upfront reservation charges a premium to let you pay in installments. Prepayment generally gets you a 6.7% discount (1 year reserved, no upfront vs. full upfront), although there’s some variation again: I don’t think these variants are too interesting from a DevOps perspective; it’s just a question of finances. If you’ve got cash “burning a hole in your wallet”, pay up front to see a cost savings. Choosing full upfront changes the “on-demand markup” to 1.7x, so you’ll want to adjust your models. The break-even point on a daily basis is lowered to 14.1 hours, making it slightly less desirable to auto-scale. If you overestimate your capacity needs, then you’ll have too many reserved instances. Compared to an optimal allocation, you’ll miss out on 0.0667 C of savings for each instance-hour. If you underestimate your capacity needs, then you’ll have too few reserved instances. Compared to an optimal allocation, you’ll overpay at 0.0667 C for each instance-hour. If you experience growth, you may have an opportunity to reserve additional instances. But be careful about having too many reservations that end at different times of the year. At some point you’ll need to end your usage of this instance type. Timing your instance type upgrade based on reservation expiration lets you maximize your value, but you’ll want all your reservations to end on the same date. Intentionally overestimating capacity needs has its own benefits. You’ll have more stable costs if you use fewer on-demand instances. You can limit your worst case scenario costs by putting a cap on your auto-scaling. Using extra reserved instances can reduce that worst case cost.
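The shift in break-even point from the payment terms is easy to check (1.6x and 1.7x are the post’s observed constants, not quoted AWS prices):

```python
# How payment terms shift the daily break-even point: a cheaper
# reservation means a higher effective on-demand markup.
break_even = {label: 24 / markup
              for label, markup in [("no upfront", 1.6), ("full upfront", 1.7)]}
for label, h in break_even.items():
    print(f"{label}: break-even {h:.1f} h/day")
# no upfront: break-even 15.0 h/day
# full upfront: break-even 14.1 h/day
```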
This is helpful if you need your projected budget to be close to your actual bill. Something to watch out for is that on-demand instances may not always be available. All cloud providers have capacity limits, and sometimes the entire availability zone can reach these. This may be more common with smaller regions, uncommon instance types, or during certain busy times of the year. You’re also more likely to see this with certain deployment patterns, like deploying to a full fresh set of instances, cutting traffic over to the new set, and then terminating the old set. This pattern causes you to temporarily have twice the instance count. I’ve ignored spot instances so far. These are harder to work with and harder to optimize (instance weighting can require careful application profiling, for example). The prices are more variable, so it’s harder to reason about and model these costs. You also need to be OK if you’re not able to get a spot instance when you need it, which works OK for background jobs but not for typical web server load. “We strongly warn against using Spot Instances for these workloads or attempting to fail-over to On-Demand Instances to handle interruptions.” – Best practices for EC2 Spot I hope I’ve helped explain the dynamics of EC2 pricing and made it a little easier to think about optimization strategies and their impact. This sort of analysis should work on other cloud services that use similar billing models, but be sure to model your actual load and the markups. Good luck!
- the tenth instance I’ll run on-demand for fourteen hours, reducing the cost of that instance by 0.0667 C ( )
- the 11th instance I’ll run on-demand for about 12 hours, reducing the cost of that instance by 0.19 C ( )
- the 12th at about 10 hours and a 0.333 C reduction
- the 13th at 8 hours and a 0.466 C reduction; and
- the 14th at 5.5 hours and a 0.633 C reduction.
- You may release new features or experience new usage patterns, causing a different instance type to be optimal
- AWS may launch new instance types, which are usually much cheaper
- Your service may experience decreased load if customer needs change


Improving HTTPS on private networks

For the last couple of months I’ve been fanatical about understanding why HTTPS adoption on private networks is so poor, and so poorly implemented. I believe the challenge comes down to poor usability of the ACME certificate issuance protocols in the context of private networks, small but meaningful cost pressures, and legacy momentum. I’m excited to share a service that I believe can address these shortcomings and improve network security while decreasing toil. In this post I share the launch of getlocalcert.net. Learn how the service can help you issue HTTPS certificates that are suitable for use on localhost, private networks, and internal networks. I walk through account registration, installation, and configuration of the Caddy web server for automated certificate issuance. With a nod to the popular Let’s Encrypt project, which helped the public Internet transition from 25% to 80% of web pages using HTTPS over the last ten years, the goal of getlocalcert.net is to increase adoption of HTTPS on private networks through better tooling, improved documentation, a free subdomain name registrar, and a free DNS service. Our model uses globally trusted HTTPS certificates from certificate authorities (CAs) that support the ACME DNS-01 protocol, such as the free domain validated (DV) certificates of Let’s Encrypt and ZeroSSL. These HTTPS certificates are preferred over self-signed certificates and private CAs as they require the least amount of configuration management on end-user devices. While I rarely encounter HTTP-only websites on the Internet these days, they are still quite common on private networks. When HTTPS is used, it’s often via private CAs. A private CA requires a complex roll-out of the root certificate onto every device on the network. If misused, the root certificate could enable a man-in-the-middle (MITM) attack against every website on the Internet.
Further, managing certificate renewals can be a complicated manual process, resulting in unneeded toil, expired certificates, and broken websites. Web browsers are pushing HTTPS adoption forward by disabling or limiting features like service workers, CORS, geolocation, device orientation, notifications, and caching on insecure webpages. HTTP/2 and HTTP/3 have drastically improved performance, but they only support encrypted connections. Websites run on private networks may benefit from these modern features, so a proper HTTPS deployment is valuable. getlocalcert is now available for early access and a subset of the planned roadmap has been implemented. As of today, getlocalcert supports: Already, this limited feature set offers a number of advantages. I configured a simple HTTPS website using the Caddy web server, a free subdomain from getlocalcert, and a free Let’s Encrypt HTTPS certificate. If you’re already familiar with Linux system administration tasks, this process should take about five to ten minutes to replicate. First I opened console.getlocalcert.net to register an account. I needed to sign in with my GitHub account, but we’ll eventually support email/password as well. I logged in and clicked the button. The service assigned me a globally unique subdomain name (2.localhostcert.net). Here’s my domain details page, showing that several default DNS records were present and locked. These locked records keep getlocalcert focused on our niche: private networks. At the bottom of the page I saw the section for API keys, although none existed yet. I created one by clicking . The next page shows the API key. You’ll want to save yours in a password manager, credential vault, or similar secure storage. Don’t commit these to a git repository. If you accidentally expose your API key, you can delete it from the listing on the previous page. I deleted mine before publishing this blog post.
I used these credentials for the rest of the setup. If you’re following along, you should substitute your own values anywhere you see these. Next I set up Caddy. Caddy supports getlocalcert using the ACME-DNS module , which isn’t in the default build. There’s currently an issue impacting Caddy builds, so I needed to build Caddy using a non-standard approach. I created a minimal Caddyfile: Notice that I used the Let’s Encrypt staging environment. I didn’t want to use Let’s Encrypt production yet as it has strict rate limits; best practice is to confirm everything is working in staging first. I saved my getlocalcert credentials to a JSON file. Many ACME clients support a credentials file of this format. All set, I started Caddy and watched the certificate issuance process in the log output: Once I saw I knew I had passed the ACME DNS-01 challenge. I checked my work by loading 2.localhostcert.net in Firefox. The error message I got here was expected, as Firefox doesn’t trust the Let’s Encrypt staging environment. If you see an issue at this step, spend some time debugging here before proceeding, or ask for help . Finally, I commented out (#) the line in my Caddyfile and ran Caddy again. This time the certificate was issued by Let’s Encrypt production, and I reloaded 2.localhostcert.net in Firefox. Now that Caddy was using a Let’s Encrypt production certificate, I no longer saw the certificate warning message. My web page is pretty boring, but it’s loading over a secure HTTP/2 connection. If you follow these steps you’ll find that Caddy manages HTTPS certificate renewals automatically, so there’s minimal manual maintenance toil. In just a couple minutes I’ve registered a free subdomain, configured a web server, and issued a free HTTPS certificate that my web browser immediately trusted. But this was just a quick demo, so there are some additional things you’ll want to try next.
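For the curious, the DNS-01 challenge Caddy just passed boils down to publishing a hashed key authorization as a TXT record, per RFC 8555. Here’s a sketch of the computation; the token and thumbprint values are made-up placeholders:

```python
import base64
import hashlib

def b64url(data: bytes) -> str:
    # Unpadded base64url encoding, as ACME requires (RFC 8555).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Hypothetical values: the challenge token comes from the ACME server;
# the thumbprint is the RFC 7638 hash of your account's public key.
token = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"
account_key_thumbprint = "NzbLsXh8uDCcd-6MNwXF4W_7noWXFZAfHkxZsRGC9Xs"

# DNS-01: publish base64url(SHA-256(token "." thumbprint)) as a TXT
# record at _acme-challenge.<your-domain>; the CA verifies it over DNS.
key_authorization = f"{token}.{account_key_thumbprint}"
txt_value = b64url(hashlib.sha256(key_authorization.encode()).digest())
print(f"_acme-challenge TXT record value: {txt_value}")
```

Because the proof travels over DNS rather than HTTP, this challenge works even for hosts that are unreachable from the public Internet, which is exactly why getlocalcert builds on it.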
You’ll probably want to configure Caddy to do something useful, so head to the Caddy docs to learn more. Check out the docs to learn about several additional supported tools. Prefer to use acme.sh with NGINX? That’s compatible with getlocalcert too! If you’d like to host multiple subdomains, you’ll currently need to use a wildcard certificate. Support for proper subdomain management is on the roadmap, but for now you can ask Caddy to request a wildcard certificate using: A great reference for this setup is here . If you’d like to access your web server from a different computer on the same network, you can do this by changing your local DNS settings. An easy approach is to edit your hosts file . You’ll add an entry like: Or if you’ve set up a wildcard certificate: Alternatively, if you run a DNS server like Unbound, you can manage your DNS resolution on that server. A simple Unbound view to remap to looks like: These work best with static IP addressing. Read on in the DNS docs for more help. There are a lot of improvements planned: I’d love to hear your thoughts as well. Check out the discussion of the getlocalcert security model and roadmap in the docs . None of what we’ve discussed so far is valuable if it doesn’t provide suitable security, and as such, security remains a primary focus of the project. More details will be provided as we grow. Please give us a try . I hope you’ll find that getlocalcert can simplify HTTPS certificate management for private networks. If you’d like updates on this and other projects, check out the subscription information below.
- Free subdomain registration under
- Issuance of free, globally valid HTTPS certificates via ACME DNS-01
- Wildcard certificates are supported
- Public DNS A records locked to
- Certificate automation using several popular ACME clients
- Public CAs can issue HTTPS certificates to subdomains of , unlike , , or , which require a private CA
- Zero client configuration: web browsers already trust certificates issued by Let’s Encrypt and ZeroSSL
- Free subdomain registration supports cost-sensitive use cases
- Certificate misissuance can be monitored (private CAs aren’t subject to certificate transparency rules)
- Managed A records other than just 127.0.0.1
- Deep subdomain support
- Vanity/custom subdomain names
- Bring your own domain name
- Public suffix list and rate limits
- Documentation of all compatible ACME clients
- Privacy mitigations to counter hostnames leaked in certificate transparency logs
- Documentation for additional use cases


Name "Constrain't" on Chrome

The X.509 Name Constraints extension is a powerful way to limit a certificate authority (CA) to only issuing certificates for specific TLDs or domain names. Unfortunately, Google Chrome doesn’t currently enforce name constraints for user-imported trust roots on Linux. A review of related blog posts shows that developers have a poor understanding of how this feature is implemented, which could have unintended security impact. While Chrome’s behavior is standards compliant, it is inconsistent with other web browsers, other TLS clients, and even Google Chrome running on different operating systems. Starting with the next Chrome release, enforcement of name constraints for trust roots will be enabled. Let’s say you run an internal network, probably using domains under . A public CA can’t issue HTTPS certificates for because this TLD is not part of the public, global DNS system. No one can prove ownership of these domain names, so the public CAs don’t work here. If you’re building with HTTPS, you may try creating a private CA. Creating your own private CA is well documented online, so it’s easy to get started. Unfortunately, as I’ll explain, lots of documentation has omissions and security problems. Let’s start with this guide from DigitalOcean. DigitalOcean has some great documentation, so this looks like a good reference. Following the instructions, I can create a private CA and issue a certificate for . I’m going to use a Linux Mint 21.1 VM for this. The Google Chrome version is (released March 2023) and Firefox is (released December 2022). The easiest way to confirm my work is to import the private CA into my browser trust store, start a web server, and load the page with Chrome. I’ll do this with: I can see that everything works in Chrome. Success! Unfortunately, I’ve actually created a private CA with way too much power. If I follow the instructions from the DigitalOcean post again, but using , I’ll also get a valid certificate.
When I run with this new certificate, Chrome trusts it too. My concern is that compromise or misuse of the private CA makes it easier to perform a man-in-the-middle attack against any browser that trusts that CA. The web browser will consider the misissued certificate valid, so the user will have no indication that something bad has happened. Perhaps the issue is just that I picked the wrong guide; DigitalOcean hosts great content, but this one didn’t have what I wanted. Let’s choose a different blog, like this one , which was written by a Google security engineer. This guide focuses on the X.509 certificate extension known as Name Constraints. A CA certificate can specify name constraints such that it can only issue certificates that fall under a specified TLD or domain name. That sounds like a great mitigation; it reduces the scope of impact for a compromised private CA. Let’s try it out. Following the instructions, I see openssl confirm that is valid and is invalid. Great, it really looks like I’ve done it. Our private CA can issue certificates for , but not . But I’d like to take one more verification step: I’ll check my certificates in Chrome. After all, my users will connect via a web browser, not using the OpenSSL command line tool. As expected, works: But, yikes! also works: Does Firefox see the same thing? No, Firefox says that the CA , which is what I had hoped Chrome would do. It sort of looks like a bug in Google Chrome’s TLS implementation. What’s actually happening here? This chromium issue has a great explanation of what’s going on. First, notice that both Firefox on Linux and Chromium on macOS reject the certificate. So this appears to be Chrome on Linux specific behavior. Ryan Sleevi gives a great answer about this, which I’ll quote: “It sounds like you’re placing nameConstraints on the root, which is not supported, not only in Chrome, but many major PKI implementations.
That’s because RFC 5280 does not require such support; imported root certificates are treated as trust anchors (that is, only the Subject and SPKI are used, not other extensions)”, “this may be WontFix/WorkingAsIntended”, “Marking this as not a Security Bug.” Looking at the RFC text I see: I agree, a TLS client can choose to process or ignore name constraints. Either choice is standards compliant. Ryan also provides a workaround: “place the constraints on the intermediate” . Unfortunately, placing constraints on the intermediate leaves the root certificate unconstrained, so it’s only a partial fix for my use case. Instead of adding name constraints to the private CA’s root certificate, I could create an intermediate certificate and put the name constraint there. Then I’d issue certificates only with the intermediate cert. If the intermediate certificate is stolen, it could be used to issue additional certificates for the attacker. But if the attacker issued certificates that violate the name constraints, a compliant client would reject them, mitigating the MITM attack risk. So we’ve limited the scope to just the constrained namespace. Unfortunately, a stolen root certificate behaves differently. Google Chrome on Linux won’t honor any name constraints on the root, so the attacker can issue certificates for any domain name. Web browsers will consider these certificates to be valid, and the MITM attack can proceed unnoticed. It should go without saying that all CA private keys, especially the root’s, need to be handled with security best practices. Starting with Chrome 112, name constraints will be enforced for user-imported trust roots. Chrome 112 is currently in beta, so let’s give it a try: Success! Chrome on Linux will soon behave the same as Chrome on macOS and Firefox. (Since this post was published, Chrome 112 has reached the Stable channel.)
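To make the enforcement concrete, here’s a simplified sketch (not Chrome’s actual code) of the RFC 5280 dNSName subtree check a compliant client applies; the `internal` constraint and hostnames are hypothetical examples:

```python
def in_dns_subtree(hostname: str, constraint: str) -> bool:
    # RFC 5280 4.2.1.10: a dNSName constraint covers the name itself
    # and every subdomain of it.
    hostname, constraint = hostname.lower(), constraint.lower()
    return hostname == constraint or hostname.endswith("." + constraint)

def permitted(hostname: str, permitted_subtrees: list[str]) -> bool:
    # When permittedSubtrees is present, every dNSName in a leaf
    # certificate must fall inside at least one listed subtree.
    return any(in_dns_subtree(hostname, c) for c in permitted_subtrees)

# e.g. a CA created with: nameConstraints = permitted;DNS:internal
subtrees = ["internal"]
print(permitted("app.internal", subtrees))  # True  (issuance allowed)
print(permitted("example.org", subtrees))   # False (a compliant client rejects)
```

A client that skips this check on the trust anchor, as pre-112 Chrome on Linux did, accepts both certificates, which is exactly the behavior demonstrated above.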
I’m happy Chrome is implementing this feature, as it brings Chrome’s handling of X.509 certificates more in line with developer expectations. One remaining concern of mine stems from Ryan’s assessment of the TLS client landscape: “nameConstraints on the root, […] is not supported, not only in Chrome, but many major PKI implementations” . While Google Chrome will add support very soon, other fully standards compliant TLS clients may continue to lack support. A TLS client can defend its lack of support by citing that the RFC considers enforcement to be optional. It’s unclear to me which clients enforce name constraints on trust roots and which don’t, so I’m wary of considering name constraints as more than a partial mitigation. Developers tend to test on one platform and assume that security behaviors will match on others, but this isn’t the case. Tracking behavioral differences of TLS implementations is a hard problem. Netflix created BetterTLS to track standards implementation of TLS clients. The results for Chromium show no failed tests, even though I clearly demonstrated enforcement was missing. I suspect this is because Chrome’s lack of enforcement is standards compliant, even if it’s undesired. My best guidance, for now, is to consider all private CAs as if they could issue certificates for the entire Internet. Treat the private keys of your private CAs with the utmost care to protect them from compromise or misuse. Avoid trusting a private CA unless you are sure the private keys will be properly protected. If it’s helpful to anyone, I’m sharing the private CAs and certificates I created. Some of these are password protected; the password is . Please don’t use these outside a VM or other isolated environment: they are leaked keys and they are unsafe.
- “Where a CA distributes self-signed certificates to specify trust anchor information, certificate extensions can be used to specify recommended inputs to path validation.”
- “Similarly, a name constraints extension could be included to indicate that paths beginning with this trust anchor should be trusted only for the specified name spaces.”
- “Implementations that use self-signed certificates to specify trust anchor information are free to process or ignore such information.”
