Zig's Lovely Syntax
It’s a bit of a silly post, because syntax is the least interesting detail about the language, but, still, I can’t stop thinking how Zig gets this detail just right for the class of curly-braced languages, and, well, now you’ll have to think about that too.

At first glance, Zig looks almost exactly like Rust, because Zig borrows from Rust liberally. And I think that Rust has great syntax, considering all the semantics it needs to express (see “Rust’s Ugly Syntax”). But Zig improves on that, mostly by leveraging simpler language semantics, but also through some purely syntactical tasteful decisions.

How do you spell the number ninety-two? Easy, `92`. But what type is that? Statically-typed languages often come with several flavors of integers, differing in width and signedness, and there’s often a suffix syntax for literals of a particular type. Zig doesn’t have suffixes, because, in Zig, all integer literals have the same type: `comptime_int`. The value of an integer literal is known at compile time and is coerced to a specific type on assignment or ascription, as in `const x: i32 = 92;` or `@as(i32, 92)`. To emphasize, this is not type inference, this is implicit comptime coercion. This does mean that code like `var x = 92;` generally doesn’t work, and requires an explicit type.

Raw or multiline strings are spelled by starting every line of the string with `\\`; the content runs to the end of the source line. This syntax doesn’t require a special form for escaping `\\` itself. It nicely dodges indentation problems that plague every other language with a similar feature. And, the best thing ever: lexically, each line is a separate token. As Zig has only line-comments, this means that a newline is always whitespace. Unlike most other languages, Zig can be correctly lexed in a line-by-line manner.

Raw strings are perhaps the biggest improvement of Zig over Rust.
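To make the two literal forms above concrete, here is a minimal sketch. The names `poem` and `ninety_two` are made up for illustration, and the code assumes a reasonably recent Zig toolchain.

```zig
const std = @import("std");
const assert = std.debug.assert;

// A multiline string: every line starts with `\\` and runs to the end of
// the source line. Quotes and backslashes inside need no escaping, and the
// indentation before `\\` is not part of the string.
const poem =
    \\Roses are red,
    \\  "Violets" are \blue\.
;

pub fn main() void {
    // An unsuffixed integer literal has type comptime_int.
    const ninety_two = 92;
    comptime assert(@TypeOf(ninety_two) == comptime_int);

    // It is coerced to a runtime type on ascription, or explicitly via @as.
    const x: i32 = ninety_two;
    const y = @as(u8, ninety_two);
    assert(x == 92 and y == 92);

    // The two source lines are joined by exactly one newline,
    // and the backslashes are kept verbatim.
    assert(std.mem.count(u8, poem, "\n") == 1);
    assert(std.mem.indexOfScalar(u8, poem, '\\') != null);
}
```

Note that the closing `;` sits on its own line: each `\\` line consumes the rest of its source line, so nothing else can follow it there.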
Rust brute-forces the problem with its `r#"..."#` syntax, which does the required job, technically, but suffers from the mentioned problems: indentation is messy, nesting quotes requires adjusting hashes, an unclosed raw literal breaks the following lexical structure completely, and rustfmt’s formatting of raw strings tends to be rather ugly. On the plus side, this syntax at least cannot be expressed by a context-free grammar!

For the record, Zig takes C syntax for record literals (not that C would notice): `const p: Point = .{ .x = 1, .y = 2 };`. The `.{` feels weird! It will make sense by the end of the post. Here, I want only to note the `.x = 1` part, which matches the assignment syntax `p.x = 1`. This is great! This means that grepping for `.x =` gives you all instances where a field is written to. This is hugely valuable: most usages are reads, but, to understand the flow of data, you only need to consider writes. The ability to mechanically partition the entire set of usages into a majority of boring reads and a few interesting writes does wonders for code comprehension.

Where Zig departs from C the most is the syntax for types. C uses a needlessly confusing spiral rule. In Zig, all types are prefix: `?[3]u32` reads left to right as an optional array of three `u32`s, and `*const ?[3]u32` as a const pointer to one. While the pointer type is prefix, pointer dereference is postfix (`ptr.*`), which is a more natural subject-verb order to read.

Zig has general syntax for “raw” identifiers: `@"a name with spaces"`. It is useful to avoid collisions with keywords, or for exporting a symbol whose name is otherwise not a valid Zig identifier. It is a bit more to type than Kotlin’s delightful backtick-quoted names, but manages to re-use Zig’s syntax for built-ins (`@`) and strings.

Like Rust, Zig goes for `fn foo` function declaration syntax. This is such a massive improvement over C/Java style function declarations: it puts the `fn` token (which is completely absent in the traditional C family) and the function name next to each other, which means that a textual search for `fn foo` allows you to quickly find the function. Then Zig adds a little twist. While in Rust we write `fn add(x: i32, y: i32) -> i32`, in Zig it is `fn add(x: i32, y: i32) i32`. The arrow is gone!
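The declaration forms from the last few paragraphs fit into one small sketch. `Point`, `add` and `firstOrZero` are illustrative names, not taken from any real codebase.

```zig
const std = @import("std");
const assert = std.debug.assert;

const Point = struct { x: i32, y: i32 };

// `fn` and the name sit next to each other; the return type follows the
// parameter list directly, with no arrow.
fn add(x: i32, y: i32) i32 {
    return x + y;
}

// Types read left to right: a const pointer to an optional array of three u32.
fn firstOrZero(ptr: *const ?[3]u32) u32 {
    // Dereference is postfix `.*`; `orelse` handles the null case.
    const array = ptr.* orelse return 0;
    return array[0];
}

pub fn main() void {
    // Field initialization `.x = 1` mirrors field assignment `p.x = 10`,
    // so a search for `.x =` finds both kinds of write.
    var p: Point = .{ .x = 1, .y = 2 };
    p.x = 10;

    // A raw identifier: any string can serve as a name.
    const @"a name with spaces" = add(p.x, p.y);
    assert(@"a name with spaces" == 12);

    const array: [3]u32 = .{ 7, 8, 9 };
    const maybe: ?[3]u32 = array;
    assert(firstOrZero(&maybe) == 7);
}
```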
Now that I’ve used this for some time, I find the arrow very annoying to type, and adding to the visual noise. Rust needs the arrow: Rust has lambdas with an inferred return type, and, in a lambda, the return type is optional. So you need some sort of explicit syntax to tell the parser whether there is a return type (compare `|x| x + 1` with `|x| -> i32 { x + 1 }`). And it’s understandable that lambdas and functions would want to use compatible syntax. But Zig doesn’t have lambdas, so it just makes the type mandatory. So the main is `pub fn main() void`. Related small thing, but, as the name of the type, I think I like `void` more than `()`.

Zig uses `const` and `var` for binding values to names. This is ok, a bit weird after Rust, whose `let` would be `const` in Zig, but not really noticeable after some months. I do think this particular part is not great, because `const`, the more frequent one, is longer. I think Kotlin nails it: `val`, `var`, `fun`. Note all three are monosyllabic, unlike `const` and `fn`! The number of syllables matters more than the number of letters!

Like Rust, Zig uses `name: Type` syntax for ascribing types, which is better than `Type name` because optional suffixes are easier to parse visually and mechanically than optional prefixes.

Zig doesn’t use `&&` and `||` and spells the relevant operators as `and` and `or`. This is easier to type and much easier to read, but there’s also a deeper reason why they are not sigils. Zig marks any control flow with a keyword. And, because boolean operators short-circuit, they are control flow! Treating them as normal binary operators leads to an entirely incorrect mental model. For bitwise operations, Zig of course uses `&` and `|`.

Both Zig and Rust have statements and expressions. Zig is a bit more statement oriented, and requires explicit returns. Furthermore, because there are no lambdas, the scope of a return is always clear. Relatedly, the value of a block expression is void. A block is a list of statements, and doesn’t have an optional expression at the end.
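A short sketch of the points above: `const` versus `var`, keyword boolean operators next to sigil bitwise ones, and explicit returns. The functions `inRange` and `sumOfEvens` are invented for this example.

```zig
const std = @import("std");
const assert = std.debug.assert;

// `and` / `or` short-circuit, so they are keywords like all other control
// flow; `&` and `|` are plain bitwise operators.
fn inRange(x: u32, lo: u32, hi: u32) bool {
    return lo <= x and x < hi;
}

fn sumOfEvens(xs: []const u32) u32 {
    var total: u32 = 0; // `var`: mutated below
    for (xs) |x| {
        const is_even = (x & 1) == 0; // `const`: never reassigned
        if (is_even) total += x;
    }
    // The return is always explicit: a block has no trailing value.
    return total;
}

pub fn main() void {
    assert(inRange(5, 0, 10) and !inRange(10, 0, 10));
    assert(sumOfEvens(&.{ 1, 2, 3, 4 }) == 6);
}
```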
Not having a trailing expression in blocks removes the semicolon problem: while Rust’s rules around semicolons are sufficiently clear (until you get to macros), there’s some constant mental overhead to getting them right all the time. Zig is more uniform and mechanical here. If you need a block that yields a value, Zig supports a general syntax for breaking out of a labeled block: `blk: { ... break :blk value; }`.

Rust makes the pedantically correct choice regarding `if`s: braces are mandatory. This removes the dreaded “dangling else” grammatical ambiguity. While theoretically nice, it makes a one-line `if`-expression feel too heavy. It’s not the braces, it’s the whitespace around them. But the ternary is important! Exploding a simple choice into a multi-line condition hurts readability. Zig goes with the traditional choice of making parentheses required and braces optional, as in `if (cond) a else b`.

By itself, this does create a risk of style bugs. But in Zig the formatter (non-configurable, user-directed) is a part of the compiler, and formatting errors that can mask bugs are caught during compilation. For example, `1 -2` is an error due to inconsistent whitespace around the minus sign, which signals a plausible mixup of binary and unary minus. No such errors are currently produced for incorrect indentation (the value add there is relatively little, given `zig fmt`), but this is planned.

NB: because Rust requires branches to be blocks, it is forced to make `{ expr }` a synonym for `(expr)`. Otherwise, the ternary would be even more unusable! Syntax design is tricky! Whether you need `return`s and whether you make `()` or `{}` mandatory in ifs are not orthogonal!

Like Python, Zig allows `else` on loops. Unlike Python, loops are expressions, which leads to nicely readable imperative searches.

Zig doesn’t have a syntactically-infinite loop like Rust’s `loop {}` or Go’s `for {}`. Normally I’d consider that a drawback, because these loops produce different control flow, affecting reachability analysis in the compiler, and I don’t think it’s great to make reachability dependent on a condition being visibly constant. But!
As Zig places `comptime` semantics front and center, and the rules for what is and isn’t a comptime constant are a backbone of every feature, “anything equivalent to `while (true)`” becomes sufficiently precise.

Incidentally, these days I tend to write “infinite” loops as a `for` over a bounded range, with an `else` branch that panics. Almost always there is an up-front bound for the number of iterations until the break, and it’s worth asserting this bound, because debugging crashes is easier than debugging hangs.

Constructs such as `if`, `while`, `for`, `switch`, and `catch` all use the same Ruby/Rust inspired `|name|` syntax for naming captured values. I like how the iterator comes first, and then the name of an item follows, logically and syntactically.

I have a very strong opinion about variable shadowing. It goes both ways: I spent hours debugging code which incorrectly tried to use a variable that was shadowed by something else, but I also spent hours debugging code that accidentally used a variable that should have been shadowed! I really don’t know whether on balance it is better to forbid or encourage shadowing!

Zig of course forbids shadowing, but what’s curious is that it’s just one episode of the large crusade against any complexity in name resolution. There’s no “prelude”: if you want to use anything from std, you need to import it with `const std = @import("std");`. There are no glob imports: if you want to use an item from std, you need to import it by name, as in `const assert = std.debug.assert;`.

Zig doesn’t have inheritance, mixins, argument-dependent lookup, extension functions, implicits or traits, so, if you see `x.foo()`, that is guaranteed to be a boring method declared on `x`’s type. Similarly, while Zig has powerful comptime capabilities, it intentionally disallows declaring methods at compile time.

Like Rust, Zig used to allow a method and a field to share a name, because it actually is syntactically clear enough at the call site which is which. But then this feature got removed from Zig.

More generally, Zig doesn’t have namespaces. There can be only one kind of `foo` in scope, while Rust allows things like a type and a function sharing the same name. I am astonished at the relative lack of inconvenience in Zig’s approach.
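The loop forms discussed above can be sketched as follows. `indexOf` and `collatzSteps` are hypothetical helpers, and the bound of 10 000 iterations is an assumption picked for the example, not a recommendation.

```zig
const std = @import("std"); // no prelude: even std is imported explicitly
const assert = std.debug.assert;

// A loop is an expression: `break` supplies the value when the search
// succeeds, and the `else` branch supplies it when the loop runs out.
fn indexOf(xs: []const u32, needle: u32) ?usize {
    return for (xs, 0..) |x, i| {
        if (x == needle) break i;
    } else null;
}

// An "infinite" loop with an up-front bound: exceeding the bound crashes
// instead of hanging.
fn collatzSteps(start: u64) u32 {
    const safety_bound = 10_000; // assumed bound, for illustration only
    var n = start;
    var steps: u32 = 0;
    for (0..safety_bound) |_| {
        if (n == 1) break;
        n = if (n % 2 == 0) n / 2 else 3 * n + 1;
        steps += 1;
    } else @panic("loop safety counter exceeded");
    return steps;
}

pub fn main() void {
    const xs = [_]u32{ 10, 20, 30 };
    // `if` names the unwrapped payload the same way `for` names an element:
    // the thing being inspected comes first, the capture follows.
    if (indexOf(&xs, 20)) |i| {
        assert(i == 1);
    } else unreachable;
    assert(indexOf(&xs, 99) == null);
    assert(collatzSteps(6) == 8); // 6, 3, 10, 5, 16, 8, 4, 2, 1
}
```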
Turns out that `foo.bar` is all the syntax you’ll ever need for accessing things? For the historically inclined, see “The module naming situation” thread in the Rust mailing list archive to learn the story of how Rust got its `::` syntax.

The lack of namespaces touches on the most notable (by its absence) feature of Zig syntax, which deeply relates to the most profound aspect of Zig’s semantics. Everything is an expression. By which I mean, there are no separate syntactic categories of values, types, and patterns.

Values, types, and patterns are of course different things. And usually in the language grammar it is syntactically obvious whether a particular text fragment refers to a type or a value. So the standard way is to have separate syntax families for the three categories, which need to be internally unambiguous, but can be ambiguous across the categories because the place in the grammar dictates the category: when parsing `let x: T = v`, everything until `:` is a pattern, stuff between `:` and `=` is a type, and after `=` we have a value.

There are two problems here. First, there’s a combinatorial explosion of sorts in the syntax, because, while the three categories describe different things, it turns out that they have the same general tree-ish shape.

The second problem is that it might be hard to maintain category separation in the grammar. Rust started with the three categories separated by a bright line. But then, changes happen. Originally, Rust only allowed `place = value` syntax for assignment. But today you can also write a pattern on the left to do unpacking, like `(a, b) = (b, a)`. Similarly, the turbofish used to move the parser from the value to the type mode, but now const parameters are values that can be found in the type position!

The alternative is not to pick this fight at all. Rather than trying to keep the categories separate in the syntax, use the same surface syntax to express all three, and categorize later, during semantic analysis. In fact, this already happens in the `place = value` example: the two sides are different things!
One is a place (lvalue) and the other is a “true” value (rvalue), but we use the same syntax for both.

I don’t think such syntactic unification necessarily implies semantic unification, but Zig does treat everything uniformly, as a value with comptime and runtime behavior (for some values, runtime behavior may be missing, for others, comptime behavior). The fact that you can write an `if` where a type goes is occasionally useful. But the fact that simple types look like simple values syntactically consistently makes the language feel significantly less busy.

As a special case of everything being an expression, instances of generic types look like `ArrayList(u32)`. Just a function call! Though, there’s some resistance to trickery involved to make this work. Usually, languages rely on type inference to allow eliding generic arguments. That in turn requires making argument syntax optional, and that in turn leads to separating generic and non-generic arguments into separate parameter lists and some introducer sigil for generics, like `::<>` or `!`.

Zig solves this syntactic challenge in the most brute-force way possible. Generic parameters are never inferred: if a function takes 3 comptime arguments and 2 runtime arguments, it will always be called with 5 arguments syntactically.

Like with the (absence of) importing flourishes, a reasonable reaction would be “wait, does this mean that I’ll have to specify the types all the time?” And, like with import, in practice this is a non-issue. The trick is comptime closures. Consider a generic `ArrayList`. We have to specify the type `T` when creating an instance of an `ArrayList(T)`. But subsequently, when we are using the array list, we don’t have to specify the type parameter again, because the type of the list variable already closes over `T`. This is the major truth of object-oriented programming, the truth so profound that no one even notices it: in real code, 90% of functions are happiest as (non-virtual) methods. And, because of that, the annotation burden in real-world Zig programs is low.
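Here is a minimal sketch of a generic type as a plain function call. To stay independent of the standard library’s `ArrayList` API, it uses a hypothetical fixed-capacity `Stack` invented for this example.

```zig
const std = @import("std");
const assert = std.debug.assert;

// A generic type is a function that takes comptime arguments and returns
// a type. `Stack` is a made-up example, not a std container.
fn Stack(comptime T: type, comptime capacity: usize) type {
    return struct {
        items: [capacity]T = undefined,
        len: usize = 0,

        const Self = @This();

        fn push(self: *Self, item: T) void {
            assert(self.len < capacity);
            self.items[self.len] = item;
            self.len += 1;
        }

        fn pop(self: *Self) ?T {
            if (self.len == 0) return null;
            self.len -= 1;
            return self.items[self.len];
        }
    };
}

pub fn main() void {
    // The comptime arguments are spelled out once, at instantiation...
    var stack: Stack(u32, 8) = .{};
    // ...and never again: the methods are declared inside the returned
    // struct, so they already close over `T` and `capacity`.
    stack.push(1);
    stack.push(2);
    assert(stack.pop().? == 2);
    assert(stack.pop().? == 1);
    assert(stack.pop() == null);
}
```

The call sites of `push` and `pop` carry no type arguments at all, which is the “comptime closure” effect the paragraph above describes.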
While Zig doesn’t have Hindley-Milner constraint-based type inference, it relies heavily on one specific way to propagate types. Let’s revisit the first example, integer literals, and let a runtime `if` choose between two of them without saying which type the result should have. This doesn’t compile: the two literals are different comptime values, and we can’t select between the two at runtime because they are different. We need to coerce the constants to a specific runtime type. But coercing the `if` as a whole doesn’t kick the can far enough and essentially reproduces the `if` with two incompatible branches. We need to sink the coercion down the branches.

And that’s exactly how Zig’s “Result Location Semantics” works. Type “inference” runs a simple left-to-right tree-walking algorithm, which resembles an interpreter’s `eval`. In fact, `eval` is exactly what happens. Zig is not a compiler, it is an interpreter. When `eval` evaluates an expression, it gets, besides the expression itself, the type expected of the result and the location where the result should be stored.

When interpreting code like an assignment of an `if` to an object field, the interpreter passes the result location (the field) and type down the tree of subexpressions. The `if` branches store the result directly into the object field (there’s a store inside each branch, as opposed to one after the `if`), and each coerces its comptime constant to the appropriate runtime type of the result.

This mechanism enables concise syntax for specifying enums. When `eval` evaluates a switch, it first evaluates the scrutinee, and realizes that it has an enum type. When evaluating an arm, it sets the result type to that enum for the arm’s condition, and an enum literal (a dot followed by the variant name) gets coerced to the enum. The same happens for the next arm, where the result type further sinks down an `if`.

Result type semantics also explains the leading dot in the record literal syntax. Syntactically, we just want to disambiguate records from blocks. But, semantically, we want to coerce the literal to whatever type we want to get out of this expression. In Zig, the leading dot is a shorthand for spelling out the result type.

I must confess that `.{}` did weird me out a lot at first while writing code (I don’t mind reading the dot). It’s not the easiest thing to type! But that was fixed once I added a snippet expanding to `.{}`.
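A small sketch of result types sinking into branches, arms, and literals. `Direction`, `Point`, `StepOptions` and both functions are illustrative names chosen for this example.

```zig
const std = @import("std");
const assert = std.debug.assert;

const Direction = enum { north, south, east, west };
const Point = struct { x: i32, y: i32 };
const StepOptions = struct { fast: bool = false };

fn opposite(d: Direction) Direction {
    // The scrutinee has type Direction, so each `.name` pattern is coerced
    // to Direction; the return type does the same for each arm's result.
    return switch (d) {
        .north => .south,
        .south => .north,
        .east => .west,
        .west => .east,
    };
}

fn step(d: Direction, options: StepOptions) Point {
    // The result type i32 sinks into both branches, so each literal is
    // coerced separately and a runtime choice between them is fine.
    const speed: i32 = if (options.fast) 2 else 1;
    // The return type Point sinks into every arm, so `.{ ... }` needs no
    // type name.
    return switch (d) {
        .north => .{ .x = 0, .y = speed },
        .south => .{ .x = 0, .y = -speed },
        .east => .{ .x = speed, .y = 0 },
        .west => .{ .x = -speed, .y = 0 },
    };
}

pub fn main() void {
    // The parameter type is the result type for each argument.
    assert(opposite(.north) == .south);
    const slow = step(.north, .{}); // every option left at its default
    const fast = step(.west, .{ .fast = true }); // one option named
    assert(slow.x == 0 and slow.y == 1);
    assert(fast.x == -2 and fast.y == 0);
}
```

Passing `.{}` or `.{ .fast = true }` for the last parameter is the same record literal again, coerced to `StepOptions` by the parameter’s type.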
The benefits of lightweight record literal syntax are huge, as they allow for some pretty nice APIs. In particular, you get named and default arguments for free, by taking a struct with default field values as a parameter. I don’t really miss the absence of named arguments in Rust, you can always design APIs without them. But they are free in Zig, so I use them liberally. Syntax wise, we get two features (calling functions and initializing objects) for the price of one!

Finally, the thing that weirds out some people when they see Zig code, and makes others reconsider their choice of GitHub handles, even when they haven’t seen any Zig: the `@` syntax for built-in functions. Every language needs to glue “userspace” code with primitive operations supported by the compiler. Usually, the gluing is achieved by making the standard library privileged and allowing it to define intrinsic functions without bodies, or by adding ad-hoc operators directly to the language (like Rust’s `as`). And Zig does have a fair amount of operators, like the wrapping `+%` or the saturating `+|`. But the release valve for a lot of functionality is built-in functions in a distinct syntactic namespace, so Zig separates out the individual casts, such as `@intCast`, `@floatCast`, `@ptrCast`, and `@bitCast`. There’s no need to overload casting when you can give each variant a name.

There’s also `@as` for type ascription. The type goes first, because the mechanism here is result type semantics: `@as` evaluates the first argument as a type, and then uses that as the result type for the second argument. Curiously, I think `@as` actually can be implemented in userspace, as `fn as(comptime T: type, value: T) T`. In Zig, the type of a function parameter may depend on the values of preceding (comptime) ones!

My favorite builtin is `@import`. First, it’s the most obvious way to import code: it’s crystal clear where the file comes from. But, second, it is an instance of reverse syntax sugar. You see, import isn’t really a function. You can’t pass it a path stored in a variable. The argument of `@import` has to be a string, syntactically. It really is syntax, except that the function-call form is re-used, because it already has the right shape.

So, this is it.
Just a bunch of silly syntactical decisions, which add up to a language which is positively enjoyable to read. As for big lessons, obviously, the fewer features your language has, the less syntax you’ll need. And less syntax is generally good, because varied syntactic constructs tend to step on each other’s toes. Languages are not combinations of orthogonal aspects. Features tug and pull the language in different directions, and their combinations might turn out to be miraculous features in their own right, or might drag the language down.

Even with a small feature-set fixed, there’s still a lot of work to pick a good concrete syntax: unambiguous to parse, useful to grep, easy to read and not too painful to write. A smart thing is of course to steal and borrow solutions from other languages, not because of familiarity, but because ruthless natural selection tends to weed out poor ideas. But there’s a lot of inertia in languages, so there’s no need to fear innovation. If an odd-looking syntax is actually good, people will take to it.

Is there anything about Zig’s syntax I don’t like? I thought no, when starting this post. But in the process of writing it I did discover one form that annoys me. It is the while loop with the increment: `while (cond) : (inc) { body }`. This is two-thirds of a C-style `for` loop (without the declarator), and it sucks for the same reason: control flow jumps all over the place and is unrelated to the source code order. We go from the condition, to the body, to the increment. But in the source order the increment is between the condition and the body. In Zig, this loop sucks for one additional reason: that `:` separating the increment is, I think, the single example of control flow in Zig that is expressed by a sigil, rather than a keyword.

This form used to be rather important, as Zig lacked a counting loop. It has the `for (0..n) |i|` form now, so I am tempted to call the while-with-increment redundant.
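Annoyingly, `while (cond) : (inc) { body }` is almost equivalent to `while (cond) { defer inc; body }`. But not exactly: if `body` contains a `break` or a `return`, the `defer` version would run the `inc` one extra time, which is useless and might be outright buggy. Oh well.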
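The difference between the two forms is easy to observe in a small sketch. `countWithContinueExpr` and `countWithDefer` are hypothetical helpers that return the counter as it stands after the loop exits.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Continue expression: `i += 1` runs after each completed iteration,
// but not when the body exits through `break`.
fn countWithContinueExpr(limit: u32, stop_at: u32) u32 {
    var i: u32 = 0;
    while (i < limit) : (i += 1) {
        if (i == stop_at) break;
    }
    return i;
}

// `defer` version: the increment runs on every exit from the loop body,
// including the exit through `break`.
fn countWithDefer(limit: u32, stop_at: u32) u32 {
    var i: u32 = 0;
    while (i < limit) {
        defer i += 1;
        if (i == stop_at) break;
    }
    return i;
}

pub fn main() void {
    // Without a `break`, the two forms agree.
    assert(countWithContinueExpr(5, 99) == 5);
    assert(countWithDefer(5, 99) == 5);

    // With a `break` at 3, the deferred increment runs one extra time.
    assert(countWithContinueExpr(5, 3) == 3);
    assert(countWithDefer(5, 3) == 4);
}
```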