Text Diffusion Models are Faster at Writing Code
In this post, I run small experiments showing that diffusion language models generate code (and other structured text) at a faster rate than free-form prose. Increased structure tends to correlate with reduced entropy, which leads to higher-confidence token predictions, which in turn means more tokens can be decoded in parallel per step.¹
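To make that mechanism concrete, below is a minimal sketch of confidence-thresholded parallel decoding for a masked diffusion model. It is illustrative rather than any particular library's API: `model`, `MASK_ID`, and `THRESHOLD` are assumed stand-ins.

```python
import torch

MASK_ID = 0        # hypothetical mask-token id (an assumption, not a real API)
THRESHOLD = 0.9    # commit a position once the model is this confident

def diffusion_decode(model, seq_len, max_steps=64):
    """Confidence-thresholded parallel decoding for a masked diffusion LM."""
    # Start from a fully masked sequence (batch of 1).
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)
    for _ in range(max_steps):
        masked = tokens == MASK_ID
        if not masked.any():
            break                                    # every position committed
        logits = model(tokens)                       # (1, seq_len, vocab_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        # Commit every masked position whose confidence clears the bar.
        # Low-entropy (structured) text clears it at many positions at once,
        # which is exactly the "more tokens per step" effect.
        accept = masked & (conf >= THRESHOLD)
        if not accept.any():
            # Avoid stalling: commit only the single most confident position.
            flat = (conf * masked).view(-1)
            accept.view(-1)[flat.argmax()] = True
        tokens = torch.where(accept, pred, tokens)
    return tokens
```

The `accept` mask is where the speedup lives: on low-entropy text, many positions clear the threshold in the same step, so the loop finishes in far fewer than `seq_len` iterations.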
In speculative decoding (for autoregressive models), we speed up generation by using a smaller draft model to propose multiple tokens, which are then verified in parallel by the larger target model. The core idea is that most tokens are easily predictable; thus, we should be able to use a smaller and faster model for them.
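For reference, here is a condensed sketch of that propose-then-verify loop in its simplest greedy-acceptance form (the published algorithm uses rejection sampling so the output matches the target model's distribution exactly); `draft_model`, `target_model`, and `speculative_step` are illustrative names:

```python
import torch

def speculative_step(target_model, draft_model, prefix, k=4):
    """One round of (greedy) speculative decoding.

    `prefix` is a (1, t) tensor of token ids; both models map a token
    tensor to logits of shape (1, t, vocab). All names are stand-ins.
    """
    # 1) The small draft model proposes k tokens, one at a time (cheap).
    drafted = prefix
    for _ in range(k):
        next_tok = draft_model(drafted)[:, -1].argmax(dim=-1, keepdim=True)
        drafted = torch.cat([drafted, next_tok], dim=1)

    # 2) The large target model scores all k proposals in ONE forward pass.
    target_logits = target_model(drafted)
    t = prefix.shape[1]
    for i in range(k):
        # Logits at position t+i-1 predict the token at position t+i.
        target_tok = target_logits[:, t + i - 1].argmax(dim=-1)
        if target_tok.item() != drafted[0, t + i].item():
            # First disagreement: keep the target's choice, drop the rest.
            return torch.cat([drafted[:, : t + i], target_tok.view(1, 1)], dim=1)
    return drafted  # all k draft tokens accepted
```

The entire speedup comes from the verification step: checking `k` drafted tokens costs one large-model forward pass instead of `k` sequential ones.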
The classic example of an easily predictable continuation is the following sentence: