Mythos and its impact on security
I’m sure by now you’ve all read the news about Anthropic’s new “Mythos” model and its apparently “dangerous” capabilities in finding security vulnerabilities. I’m sure everyone reading this also has opinions about that. Well, here are a few of mine.

Firstly, it’s tempting to dismiss the announcement as pure marketing hype. Anthropic are rumoured to be approaching an IPO, so a lot of hype is to be expected, and we’ve seen the “dangerous” card played before with GPT-2. Throughout the history of computer security, both new tools and security researchers themselves have often been branded as dangerous or irresponsible. If you have sympathy for this viewpoint, then I hate to break it to you: Mythos is not inserting vulnerabilities into software; they were there all along. (Vibe-coding notwithstanding.)

That’s not to say that Mythos doesn’t represent a potentially interesting breakthrough. (Although apparently many existing small models are able to reproduce its findings, at least in part.) Nor is it to say that releasing Mythos carries no risk: potentially quite a large risk in some cases, and its ability to synthesise actual exploits is concerning. But all security tools that find vulnerabilities come with a risk, and they also come with an upside: letting defenders find vulnerabilities too (ideally, first).

Anthropic quote costs of around $10,000–20,000 for each vulnerability they found. You can quibble with those costs, and I’m sure they’ll come down over time, but at the moment I think it’s fair to say that this won’t be run over every single software project out there. If it’s going to be used by bad actors, then it’ll probably still be somewhat targeted at high-impact systems. I’m sure we’ll see some new zero-day exploits of edge devices and probably an uptick in ransomware attacks, but it’s not like edge devices don’t regularly get exploited anyway. (Spoiler: many security products are shockingly poorly designed and implemented.)
But on the plus side, I can see Mythos and similar models being an excellent add-on to your annual pentest engagement. At those costs you’re not going to run it on every build pipeline, and there’s probably going to be a certain amount of expertise required to get the most from it on a limited budget. As with all new tools, eventually the findings will plateau: there are only so many times you can run the same tool over the same source code and come up with new findings. That’s not to say that there won’t still be vulnerabilities (there almost certainly will), just that the tool will no longer be able to find them.

As a former AI researcher myself (before modern ML exploded), I find this aspect of the Mythos write-up quite interesting. Most security tools suffer from problems with false positives, and LLMs are of course famous for that: they are “bullshit machines”. Putting it in slightly less pejorative terms, I would call them abduction machines: they generate plausible hypotheses to explain some set of observations. (Training an LLM is induction, but what they do at runtime is closer to abduction.) In the case of a chatbot, the “observations” are the token context window, and the hypotheses are the plausible next-token completions. In the case of vulnerability hunting, the observations are the source code and a prompt asking the model to look for a vulnerability, and the hypotheses are the generated potential vulnerabilities. Despite knowing how this works, it is still kind of magic to me that the latter emerges from the former (plausible vulnerabilities from merely predicting the most likely next tokens given the context). Broadly speaking, the better the model, the more likely those hypotheses are to be accurate. But they are still wrong an awful lot of the time, and false positives are the death of productivity. We’ve all seen reports of open source projects being overwhelmed by “slop” AI-generated vulnerability reports.
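To see why false positives are so corrosive, here’s a back-of-the-envelope calculation. The numbers are invented for illustration, not taken from the Mythos write-up:

```python
# Illustrative triage arithmetic: why false positives are "the death of
# productivity". All numbers here are made up for the sake of the example.

def triage_hours(reports: int, true_positive_rate: float,
                 minutes_per_report: float = 30.0) -> tuple[float, float]:
    """Return (total triage hours, hours of triage per *real* finding)."""
    real_findings = reports * true_positive_rate
    total_hours = reports * minutes_per_report / 60.0
    return total_hours, total_hours / real_findings

# A tool that is "right" 5% of the time, across 1,000 generated reports:
total, per_real = triage_hours(1000, 0.05)
# total == 500.0 hours of triage; per_real == 10.0 hours per genuine bug
```

At a 95% false-positive rate, each genuine finding costs ten hours of human triage before any fix is even attempted.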
But recently that seems to have changed, and a larger volume of high-quality reports is being submitted to many high-profile projects. What changed? I think the clue is front and centre in Anthropic’s write-up: use of Address Sanitizer (ASan) as an oracle to weed out false positives.

I think this is a crucial dividing line that separates successful from unsuccessful uses of AI. This is why “agentic” (grr) AI is relatively successful at software development. The models aren’t inherently much better at writing code than at any other task, but there already exists a large body of automated “bullshit correctors”: type checkers, linters, automated test suites, etc. (Many of which use techniques from earlier waves of “symbolic” AI research, just saying…) These oracles provide a clear signal about whether a hypothesis generated by the LLM is bullshit or not. (I would hypothesise that LLMs are likely to produce better code in languages with more sophisticated type systems.) Hence why we see quite a lot of progress and marketing for AI systems in such use-cases, despite those markets being relatively small compared to the AI companies’ massive valuations and funding. I’m guessing investors are not going to be pleased to have stumped up billions for a slice of a dev tools company? But software development does seem somewhat unique in this regard.

Getting back to vulnerability hunting and oracles: this is the same situation that fuzzers face. A fuzzer is generally only really good at finding vulnerabilities when there is a good oracle to decide whether you’ve found one or not (PDF link). Like Mythos, fuzzers are very good at finding crashes and (via oracles like ASan) memory safety issues, but they are not going to find subtle violations of user expectations. Mythos is clearly more than just a fuzzer, though: it’s also looking at the source code and doing somewhat sophisticated “analysis” of potential weaknesses. But I think the problem of needing an oracle will remain.
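That generate-then-verify structure can be sketched in a few lines. Everything here is hypothetical: `propose_candidates` stands in for the model, and `oracle` stands in for something like rebuilding the target with ASan and attempting to reproduce a crash.

```python
# A minimal sketch of the generate-then-verify loop: an abduction machine
# proposes hypotheses, and an oracle weeds out the bullshit. The function
# names are invented for illustration; a real oracle might rebuild the
# target with -fsanitize=address and try to trigger a crash.
from typing import Callable, Iterable

def verified_findings(
    propose_candidates: Callable[[str], Iterable[str]],  # the LLM stand-in
    oracle: Callable[[str], bool],                       # e.g. "does ASan fire?"
    source: str,
) -> list[str]:
    """Keep only the hypotheses the oracle can actually confirm."""
    return [c for c in propose_candidates(source) if oracle(c)]

# Stub demonstration: three hypotheses, only one survives the oracle.
def hypotheses(_src: str) -> list[str]:
    return ["overflow in parse()", "UAF in close()", "off-by-one in copy()"]

confirmed = verified_findings(hypotheses, lambda c: "UAF" in c, "...source...")
# confirmed == ["UAF in close()"]
```

The point is that the oracle, not the model, is what makes the output trustworthy: the same loop with a weak oracle just emits plausible-sounding slop.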
Without an oracle, I’m sure that Mythos would still find genuine vulnerabilities, but they would be overwhelmed by slop false positives, drowning the signal in noise. For me, then, this is the most interesting open question for LLM-based vulnerability finders: which classes of bugs can we write (or train) good oracles for? Potentially quite a lot, I think, but definitely not everything. Humans will still have an edge in finding complex bugs for a long time to come.

Obviously, Anthropic’s take is that you should use AI tools to find all the bugs first. You could dismiss that as an obvious attempt to cash in, and it even has shades of a protection racket. But I do think that Mythos and models like it are probably worth using, as an add-on to a human penetration test or similar.

But really, Mythos is just the latest tool revealing the continuing poor state of software security. Such tools keep finding a frightening array of vulnerabilities because there is a terrifying quantity of them out there, and we keep adding more. The situation is not going to be improved by throwing more AI at the problem. If anything, the rise of vibe-coding is likely to accelerate the trend. As I’ve covered here before, even apparent experts produce total garbage when writing security services with LLM assistance. If you want secure software, you need to slow down and think carefully and deeply, not rush ever faster to churn out more and more junk.

In the short term, we can keep doing the things we know how to do: thinking about security earlier in the design process, incorporating basic security tools and testing into build pipelines, ensuring we can patch CVEs quickly (but not too quickly), and so on. Longer term, we know how to solve some classes of vulnerabilities altogether. For example, we know that memory-safe programming languages eliminate whole swathes of potential issues, including many of the sort that Mythos is good at discovering.
We’ve known this for decades but still write lots of software in unsafe languages. Numerous reports and government proclamations are slowly shifting that, but we still have a very long way to go. Capability-based security would solve many other classes of vulnerabilities, whether in CPUs, operating systems, or supply chains. These are not easy fixes, and they would require massive investment over many years. Profit-driven companies are not going to pursue them without regulatory pressure, and are largely going in the opposite direction.

Such fundamental changes would not solve everything: there will still be vulnerabilities, but fewer and less severe ones if we do it right. Strong security mechanisms can provide a multiplicative reduction in risk. But if we really do want more secure software, and a foundation for our digital society that we can actually trust, then I don’t see an alternative. Finding and fixing individual vulnerabilities will never deliver that, however good the tools get.
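As a footnote to the capability point above, the object-capability idea can be sketched in miniature (a toy Python illustration, not any particular system): code is handed a narrow capability rather than picking up ambient authority.

```python
# A toy sketch of object-capability style (illustrative only): authority is
# passed in as a narrow capability rather than acquired ambiently.
from typing import Callable

def make_append_capability(log: list[str]) -> Callable[[str], None]:
    """Grant the holder the ability to append to *this* log, and nothing else."""
    def append(line: str) -> None:
        log.append(line)
    return append

def run_plugin(emit: Callable[[str], None]) -> None:
    # The plugin can emit log lines, but it cannot read the log, open
    # files, or touch anything it wasn't explicitly handed.
    emit("plugin started")

audit_log: list[str] = []
run_plugin(make_append_capability(audit_log))
# audit_log == ["plugin started"]
```

A compromised plugin here can spam the log, but it cannot read secrets or open sockets: whole classes of confused-deputy bugs shrink when authority must be passed explicitly.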