Anton Zhiyanov 1 week ago

Porting Go's strings package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. I started with the io package, which provides core abstractions like Reader and Writer, as well as general-purpose functions like Copy. But io isn't very interesting on its own, since it doesn't include specific reader or writer implementations. So my next choices were naturally bytes and strings — the workhorses of almost every Go program. This post is about how the porting process went.

Bits and UTF-8 • Bytes • Allocators • Buffers and builders • Benchmarks • Optimizing search • Optimizing builder • Wrapping up

Before I could start porting bytes, I had to deal with its dependencies first: math/bits and unicode/utf8. Both of these packages are made up of pure functions, so they were pretty easy to port. The only minor challenge was the difference in operator precedence between Go and C — specifically, bit shifts (<<, >>). In Go, bit shifts have higher precedence than addition and subtraction. In C, they have lower precedence. The simplest solution was to just use parentheses everywhere shifts are involved.

With bits and utf8 done, I moved on to bytes. The package provides functions for working with byte slices. Some of them were easy to port. Just like in Go, the bytes-to-string macro doesn't allocate memory; it just reinterprets the byte slice's underlying storage as a string. The comparison function (which works like in Go) is easy to implement using the libc API. Another example is IndexByte, which looks for a specific byte in a slice; I used a regular C loop to mimic Go's range loop. But functions like these don't allocate memory. What should I do with the ones that clearly do? I had a decision to make. The Go runtime handles memory allocation and deallocation automatically.
In C, I had a few options: An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. See Allocators from C to Zig if you want to learn more about them. For me, the winner was clear. Modern systems programming languages like Zig and Odin clearly showed the value of allocators: An is an interface with three methods: , , and . In C, it translates to a struct with function pointers: As I mentioned in the post about porting the io package , this interface representation isn't as efficient as using a static method table, but it's simpler. If you're interested in other options, check out the post on interfaces . By convention, if a function allocates memory, it takes an allocator as its first parameter. So Go's : Translates to this C code: If the caller doesn't care about using a specific allocator, they can just pass an empty allocator, and the implementation will use the system allocator — , , and from libc. Here's a simplified version of the system allocator (I removed safety checks to make it easier to read): The system allocator is stateless, so it's safe to have a global instance: Here's an example of how to call with an allocator: Way better than hidden allocations! Besides pure functions, and also provide types like , , and . I ported them using the same approach as with functions. For types that allocate memory, like , the allocator becomes a struct field: The code is pretty wordy — most C developers would dislike using instead of something shorter like . My solution to this problem is to automatically translate Go code to C (which is actually what I do when porting Go's stdlib). If you're interested, check out the post about this approach — Solod: Go can be a better C . Types that don't allocate, like , need no special treatment — they translate directly to C structs without an allocator field. The package is the twin of , so porting it was uneventful. 
Here's a usage example in Go and C side by side. Again, the C code is just a more verbose version of Go's implementation, plus explicit memory allocation.

What's the point of writing C code if it's slow, right? I decided it was time to benchmark the ported C types and functions against their Go versions. To do that, I ported the benchmarking part of Go's testing package. Surprisingly, the simplified version was only 300 lines long and included everything I needed. A sample benchmark reads almost like Go's benchmarks. To monitor memory usage, I created a tracking allocator — a memory allocator that wraps another allocator and keeps track of allocations. The benchmark gets an allocator and wraps it in the tracking allocator. There's no auto-discovery, but the manual setup is quite straightforward.

With the benchmarking setup ready, I ran benchmarks on the bytes package. Some functions did well — about 1.5-2x faster than their Go equivalents. But Index (searching for a substring in a string) was a total disaster — it was nearly 20 times slower than in Go. The problem was caused by the IndexByte function we looked at earlier. This "pure" Go implementation is just a fallback: on most platforms, Go uses a specialized version of IndexByte written in assembly. For the C version, the easiest solution was to use memchr, which is also optimized for most platforms. With this fix, the benchmark results changed drastically. Still not quite as fast as Go, but it's close. Honestly, I don't know why the memchr-based implementation is still slower than Go's assembly here, but I decided not to pursue it any further. After running the rest of the function benchmarks, the ported versions won all of them except for two.

strings.Builder is a common way to compose strings from parts in Go, so I tested its performance too. The results were worse than I expected: the C version performed about the same as Go, while I expected it to be faster.
Unlike , is written entirely in Go, so there's no reason the ported version should lose in this benchmark. The method looked almost identical in Go and C: Go's automatically grows the backing slice, while does it manually ( , on the contrary, doesn't grow the slice — it's merely a wrapper). So, there shouldn't be any difference. I had to investigate. Looking at the compiled binary, I noticed a difference in how the functions returned results. Go returns multiple values in separate registers, so uses three registers: one for 8-byte , two for the interface (implemented as two 8-byte pointers). But in C, was a single struct made up of two unions and a pointer: Of course, this 56-byte monster can't be returned in registers — the C calling convention passes it through memory instead. Since is on the hot path in the benchmark, I figured this had to be the issue. So I switched from a single monolithic type to signature-specific types for multi-return pairs: Now, the implementation in C looked like this: is only 16 bytes — small enough to be returned in two registers. Problem solved! But it wasn't — the benchmark only showed a slight improvement. After looking into it more, I finally found the real issue: unlike Go, the C compiler wasn't inlining calls. Adding and moving to the header file made all the difference: 2-4x faster. That's what I was hoping for! Porting and was a mix of easy parts and interesting challenges. The pure functions were straightforward — just translate the syntax and pay attention to operator precedence. The real design challenge was memory management. Using allocators turned out to be a good solution, making memory allocation clear and explicit without being too difficult to use. The benchmarks showed that the C versions outperformed Go in most cases, sometimes by 2-4x. The only exceptions were and , where Go relies on hand-written assembly. 
The optimization was an interesting challenge: what seemed like a return-type issue was actually an inlining problem, and fixing it gave a nice speed boost. There's a lot more of Go's stdlib to port. In the next post, we'll cover — a very unique Go package. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod . The and packages are included, of course. implements bit counting and manipulation functions. implements functions for UTF-8 encoded text. Loop over the slice indexes with ( is a macro that returns , similar to Go's built-in). Access the i-th byte with (a bounds-checking macro that returns ). Use a reliable garbage collector like Boehm GC to closely match Go's behavior. Allocate memory with libc's and have the caller free it later with . Introduce allocators. It's obvious whether a function allocates memory or not: if it has an allocator as a parameter, it allocates. It's easy to use different allocation methods: you can use for one function, an arena for another, and a stack allocator for a third. It helps with testing and debugging: you can use a tracking allocator to find memory leaks, or a failing allocator to test error handling. Figuring out how many iterations to run. Running the benchmark function in a loop. Recording metrics (ns/op, MB/s, B/op, allocs/op). Reporting the results.

Anton Zhiyanov 3 weeks ago

Porting Go's io package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. So I started with the standard library packages that had the fewest dependencies, and one of them was the io package. This post is about how that went.

io package • Slices • Multiple returns • Errors • Interfaces • Type assertion • Specialized readers • Copy • Wrapping up

io is one of the core Go packages. It introduces the concepts of readers and writers, which are also common in other programming languages. In Go, a reader is anything that can read some raw data (bytes) from a source into a slice. A writer is anything that can take some raw data from a slice and write it to a destination. The package defines many other interfaces, as well as combinations of them. It also provides several functions, the most well-known being Copy, which copies all data from a source (represented by a reader) to a destination (represented by a writer). C, of course, doesn't have interfaces. But before I get into that, I had to make several other design decisions.

In general, a slice is a linear container that holds N elements of type T. Typically, a slice is a view of some underlying data. In Go, a slice consists of a pointer to a block of allocated memory, a length (the number of elements in the slice), and a capacity (the total number of elements that can fit in the backing memory before the runtime needs to re-allocate). Interfaces in the io package work with fixed-length slices (readers and writers should never append to a slice), and they only use byte slices. So there's a simpler way to represent this in C. But since I needed a general-purpose slice type, I decided to do it the Go way instead, plus a bound-checking helper to access slice elements. So far, so good.
Let's look at the Read method again. It returns two values: an int and an error. C functions can only return one value, so I needed to figure out how to handle this. The classic approach would be to pass output parameters by pointer. But that doesn't compose well and looks nothing like Go. Instead, I went with a result struct: a union that can store any primitive type, as well as strings, slices, and pointers, combined with an error. So our Read method (let's assume it's just a regular function for now) translates to a C function that returns a result, and the caller accesses the value and the error through the result's fields.

For the error type itself, I went with a simple pointer to an immutable string, plus a constructor macro. I wanted to avoid heap allocations as much as possible, so I decided not to support dynamic errors. Only sentinel errors are used, and they're defined at the file level. Errors are compared by pointer identity, not by string content — just like sentinel errors in Go. An error is just a pointer. This keeps error handling cheap and straightforward.

This was the big one. In Go, an interface is a type that specifies a set of methods. Any concrete type that implements those methods satisfies the interface — no explicit declaration needed. In C, there's no such mechanism. For interfaces, I decided to use "fat" structs with function pointers. That way, Go's Reader becomes a struct in C: a pointer holds the concrete value, and each method becomes a function pointer that takes that pointer as its first argument. This is less efficient than using a static method table, especially if the interface has a lot of methods, but it's simpler. So I decided it was good enough for the first version. Now functions can work with interfaces without knowing the specific implementation: calling a method on the interface just goes through the function pointer. Go's interface is more than just a value wrapper with a method table.
It also stores type information about the value it holds. Since the runtime knows the exact type inside the interface, it can try to "upgrade" the interface (for example, a regular reader) to another interface using a type assertion. The last thing I wanted to do was reinvent Go's dynamic type system in C, so dropping this feature was an easy decision. There's another kind of type assertion, though — when we unwrap the interface to get the value of a specific type. And this kind of assertion is quite possible in C. All we have to do is compare function pointers. If two different types happened to share the same method implementation, this would break. In practice, each concrete type has its own methods, so the function pointer serves as a reliable type tag.

After I decided on the interface approach, porting the actual types was pretty easy. For example, LimitedReader wraps a reader and stops with EOF after reading N bytes. The logic is straightforward: if there are no bytes left, return EOF. Otherwise, if the buffer is bigger than the remaining size, shorten it. Then, call the underlying reader, and decrease the remaining size. The ported C code is a bit more verbose, but nothing special. The multiple return values, the interface call, and the slice handling are all implemented as described in previous sections.

Copy is where everything comes together. In Go, Copy allocates its buffer on the heap with make. I could take a similar approach in C — make Copy take an allocator and use it to create the buffer. But since this is just a temporary buffer that only exists during the function call, I decided stack allocation was a better choice: the buffer is allocated on the stack with a bounds-checking macro that wraps C's alloca. It moves the stack pointer and gives you a chunk of memory that's automatically freed when the function returns.
People often avoid using alloca because it can cause a stack overflow, but using a bounds-checking wrapper fixes this issue. Another common concern with alloca is that it's not block-scoped — the memory stays allocated until the function exits. However, since we only allocate once, this isn't a problem.

In the simplified C version of Copy, you can see all the parts from this post working together: a function accepting interfaces, slices passed to interface methods, a result type wrapping multiple return values, error sentinels compared by identity, and a stack-allocated buffer used for the copy.

Porting Go's io package to C meant solving a few problems: representing slices, handling multiple return values, modeling errors, and implementing interfaces using function pointers. None of this needed anything fancy — just structs, unions, functions, and some macros. The resulting C code is more verbose than Go, but it's structurally similar, easy enough to read, and this approach should work well for other Go packages too.

The io package isn't very useful on its own — it mainly defines interfaces and doesn't provide concrete implementations. So, the next two packages to port were naturally bytes and strings — I'll talk about those in the next post. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod. The io package is included, of course.

Anton Zhiyanov 3 weeks ago

Solod: Go can be a better C

I'm working on a new programming language named Solod ( So ). It's a strict subset of Go that translates to C, without hidden memory allocations and with source-level interop. Highlights: So supports structs, methods, interfaces, slices, multiple returns, and defer. To keep things simple, there are no channels, goroutines, closures, or generics. So is for systems programming in C, but with Go's syntax, type safety, and tooling. Hello world • Language tour • Compatibility • Design decisions • FAQ • Final thoughts This Go code in a file : Translates to a header file : Plus an implementation file : In terms of features, So is an intersection between Go and C, making it one of the simplest C-like languages out there — on par with Hare. And since So is a strict subset of Go, you already know it if you know Go. It's pretty handy if you don't want to learn another syntax. Let's briefly go over the language features and see how they translate to C. Variables • Strings • Arrays • Slices • Maps • If/else and for • Functions • Multiple returns • Structs • Methods • Interfaces • Enums • Errors • Defer • C interop • Packages So supports basic Go types and variable declarations: is translated to ( ), to ( ), and to ( ). is not treated as an interface. Instead, it's translated to . This makes handling pointers much easier and removes the need for . is translated to (for pointer types). Strings are represented as type in C: All standard string operations are supported, including indexing, slicing, and iterating with a for-range loop. Converting a string to a byte slice and back is a zero-copy operation: Converting a string to a rune slice and back allocates on the stack with : There's a stdlib package for heap-allocated strings and various string operations. Arrays are represented as plain C arrays ( ): on arrays is emitted as compile-time constant. Slicing an array produces a . 
Slices are represented as type in C: All standard slice operations are supported, including indexing, slicing, and iterating with a for-range loop. As in Go, a slice is a value type. Unlike in Go, a nil slice and an empty slice are the same thing: allocates a fixed amount of memory on the stack ( ). only works up to the initial capacity and panics if it's exceeded. There's no automatic reallocation; use the stdlib package for heap allocation and dynamic arrays. Maps are fixed-size and stack-allocated, backed by parallel key/value arrays with linear search. They are pointer-based reference types, represented as in C. No delete, no resize. Only use maps when you have a small, fixed number of key-value pairs. For anything else, use heap-allocated maps from the package (planned). Most of the standard map operations are supported, including getting/setting values and iterating with a for-range loop: As in Go, a map is a pointer type. A map emits as in C. If-else and for come in all shapes and sizes, just like in Go. Standard if-else with chaining: Init statement (scoped to the if block): Traditional for loop: While-style loop: Range over an integer: Regular functions translate to C naturally: Named function types become typedefs: Exported functions (capitalized) become public C symbols prefixed with the package name ( ). Unexported functions are . Variadic functions use the standard syntax and translate to passing a slice: Function literals (anonymous functions and closures) are not supported. So supports two-value multiple returns in two patterns: and . Both cases translate to C type: Named return values are not supported. Structs translate to C naturally: works with types and values: Methods are defined on struct types with pointer or value receivers: Pointer receivers pass in C and cast to the struct pointer. 
Value receivers pass the struct by value, so modifications operate on a copy: Calling methods on values and pointers emits pointers or values as necessary: Methods on named primitive types are also supported. Interfaces in So are like Go interfaces, but they don't include runtime type information. Interface declarations list the required methods: In C, an interface is a struct with a pointer and function pointers for each method (less efficient than using a static method table, but simpler; this might change in the future): Just as in Go, a concrete type implements an interface by providing the necessary methods: Passing a concrete type to functions that accept interfaces: Type assertion works for concrete types ( ), but not for interfaces ( ). Type switch is not supported. Empty interfaces ( and ) are translated to . So supports typed constant groups as enums: Each constant is emitted as a C : is supported for integer-typed constants: Iota values are evaluated at compile time and translated to integer literals: Errors use the type (a pointer): So only supports sentinel errors, which are defined at the package level using (implemented as compiler built-in): Errors are compared using . This is an O(1) operation (compares pointers, not strings): Dynamic errors ( ), local error variables ( inside functions), and error wrapping are not supported. schedules a function or method call to run at the end of the enclosing scope. The scope can be either a function (as in Go): Or a bare block (unlike Go): Deferred calls are emitted inline (before returns, panics, and scope end) in LIFO order: Defer is not supported inside other scopes like or . Include a C header file with : Declare an external C type (excluded from emission) with : Declare an external C function (no body or ): When calling extern functions, and arguments are automatically decayed to their C equivalents: string literals become raw C strings ( ), string values become , and slices become raw pointers. 
This makes interop cleaner: The decay behavior can be turned off with the flag: The package includes helpers for converting C pointers back to So string and slice types. The package is also available and is implemented as compiler built-ins. Each Go package is translated into a single + pair, regardless of how many files it contains. Multiple files in the same package are merged into one file, separated by comments. Exported symbols (capitalized names) are prefixed with the package name: Unexported symbols (lowercase names) keep their original names and are marked : Exported symbols are declared in the file (with for variables). Unexported symbols only appear in the file. Importing a So package translates to a C : Calling imported symbols uses the package prefix: That's it for the language tour! So generates C11 code that relies on several GCC/Clang extensions: You can use GCC, Clang, or to compile the transpiled C code. MSVC is not supported. Supported operating systems: Linux, macOS, and Windows (partial support). So is highly opinionated. Simplicity is key . Fewer features are always better. Every new feature is strongly discouraged by default and should be added only if there are very convincing real-world use cases to support it. This applies to the standard library too — So tries to export as little of Go's stdlib API as possible while still remaining highly useful for real-world use cases. No heap allocations are allowed in language built-ins (like maps, slices, new, or append). Heap allocations are allowed in the standard library, but they must clearly state when an allocation happens and who owns the allocated data. Fast and easy C interop . Even though So uses Go syntax, it's basically C with its own standard library. Calling C from So, and So from C, should always be simple to write and run efficiently. The So standard library (translated to C) should be easy to add to any C project. Readability . 
There are several languages that claim they can transpile to readable C code. Unfortunately, the C code they generate is usually unreadable or barely readable at best. So isn't perfect in this area either (though it's arguably better than others), but it aims to produce C code that's as readable as possible. Go compatibility . So code is valid Go code. No exceptions. Raw performance . You can definitely write C code by hand that runs faster than code produced by So. Also, some features in So, like interfaces, are currently implemented in a way that's not very efficient, mainly to keep things simple. Hiding C entirely . So is a cleaner way to write C, not a replacement for it. You should know C to use So effectively. Go feature parity . Less is more. Iterators aren't coming, and neither are generic methods. I have heard these several times, so it's worth answering. Why not Rust/Zig/Odin/other language? Because I like C and Go. Why not TinyGo? TinyGo is lightweight, but it still has a garbage collector, a runtime, and aims to support all Go features. What I'm after is something even simpler, with no runtime at all, source-level C interop, and eventually, Go's standard library ported to plain C so it can be used in regular C projects. How does So handle memory? Everything is stack-allocated by default. There's no garbage collector or reference counting. The standard library provides explicit heap allocation in the package when you need it. Is it safe? So itself has few safeguards other than the default Go type checking. It will panic on out-of-bounds array access, but it won't stop you from returning a dangling pointer or forgetting to free allocated memory. Most memory-related problems can be caught with AddressSanitizer in modern compilers, so I recommend enabling it during development by adding to your . Can I use So code from C (and vice versa)? Yes. So compiles to plain C, therefore calling So from C is just calling C from C. 
Calling C from So is equally straightforward. Can I compile existing Go packages with So? Not really. Go uses automatic memory management, while So uses manual memory management. So also supports far fewer features than Go. Neither Go's standard library nor third-party packages will work with So without changes. How stable is this? Not for production at the moment. Where's the standard library? There is a growing set of high-level packages ( , , , ...). There are also low-level packages that wrap the libc API ( , , , ...). Check the links below for more details. Even though So isn't ready for production yet, I encourage you to try it out on a hobby project or just keep an eye on it if you like the concept. Further reading: Go in, C out. You write regular Go code and get readable C11 as output. Zero runtime. No garbage collection, no reference counting, no hidden allocations. Everything is stack-allocated by default. Heap is opt-in through the standard library. Native C interop. Call C from So and So from C — no CGO, no overhead. Go tooling works out of the box — syntax highlighting, LSP, linting and "go test". Binary literals ( ) in generated code. Statement expressions ( ) in macros. for package-level initialization. for local type inference in generated code. for type inference in generic macros. for and other dynamic stack allocations. Installation and usage So by example Language description Stdlib description Source code

Anton Zhiyanov 2 months ago

Allocators from C to Zig

An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. Many C programs use the standard libc allocator, or at best, let you switch it out for another one like jemalloc or mimalloc. Unlike C, modern systems languages usually treat allocators as first-class citizens. Let's look at how they handle allocation and then create a C allocator following their approach.

Rust • Zig • Odin • C3 • Hare • C • Final thoughts

Rust is one of the older languages we'll be looking at, and it handles memory allocation in a more traditional way. Right now, it uses a global allocator, but there's an experimental Allocator API implemented behind a feature flag (issue #32838). We'll set the experimental API aside and focus on the stable one. The documentation begins with a clear statement: in a given program, the standard library has one "global" memory allocator that is used for example by Box and Vec. Followed by a vague one: currently the default global allocator is unspecified. It doesn't mean that a Rust program will abort an allocation, of course. In practice, Rust uses the system allocator as the global default (but the Rust developers don't want to commit to this, hence the "unspecified" note).

The global allocator interface is defined by the GlobalAlloc trait in the std::alloc module. It requires the implementor to provide two essential methods — alloc and dealloc — and provides two more based on them — alloc_zeroed and realloc. The Layout struct describes a piece of memory we want to allocate — its size in bytes and alignment.

Memory alignment

Alignment restricts where a piece of data can start in memory. The memory address for the data has to be a multiple of a certain number, which is always a power of 2. Alignment depends on the type of data: CPUs are designed to read "aligned" memory efficiently.
For example, if you read a 4-byte integer starting at address 0x03 (which is unaligned), the CPU has to do two memory reads — one for the first byte and another for the other three bytes — and then combine them. But if the integer starts at address 0x04 (which is aligned), the CPU can read all four bytes at once. Aligned memory is also needed for vectorized CPU operations (SIMD), where one processor instruction handles a group of values at once instead of just one. The compiler knows the size and alignment for each type, so we can use the constructor or helper functions to create a valid layout: Don't be surprised that a takes up 32 bytes. In Rust, the type can grow, so it stores a data pointer, a length, and a capacity (3 × 8 = 24 bytes). There's also 1 byte for the boolean and 7 bytes of padding (because of 8-byte alignment), making a total of 32 bytes. is the default memory allocator provided by the operating system. The exact implementation depends on the platform . It implements the trait and is used as the global allocator by default, but the documentation does not guarantee this (remember the "unspecified" note?). If you want to explicitly set as the global allocator, you can use the attribute: You can also set a custom allocator as global, like in this example: To use the global allocator directly, call the and functions: In practice, people rarely use or directly. Instead, they work with types like , or that handle allocation for them: The allocator doesn't abort if it can't allocate memory; instead, it returns (which is exactly what recommends): The documentation recommends using the function to signal out-of-memory errors. It immediately aborts the process, or panics if the binary isn't linked to the standard library. Unlike the low-level function, types like or call if allocation fails, so the program usually aborts if it runs out of memory: Allocator API • Memory allocation APIs Memory management in Zig is explicit. 
There is no default global allocator, and any function that needs to allocate memory accepts an allocator as a separate parameter. This makes the code a bit more verbose, but it matches Zig's goal of giving programmers as much control and transparency as possible. An allocator in Zig is a struct with an opaque self-pointer and a method table with four methods: Unlike Rust's allocator methods, which take a raw pointer and a size as arguments, Zig's allocator methods take a slice of bytes ( ) — a type that combines both a pointer and a length. Another interesting difference is the optional parameter, which is the first return address in the allocation call stack. Some allocators, like the , use it to keep track of which function requested memory. This helps with debugging issues related to memory allocation. Just like in Rust, allocator methods don't return errors. Instead, and return if they fail. Zig also provides type-safe wrappers that you can use instead of calling the allocator methods directly: Unlike the allocator methods, these allocation functions return an error if they fail. If a function or method allocates memory, it expects the developer to provide an allocator instance: Zig's standard library includes several built-in allocators in the namespace. asks the operating system for entire pages of memory, each allocation is a syscall: allocates memory into a fixed buffer and doesn't make any heap allocations: wraps a child allocator and allows you to allocate many times and only free once: The call frees all memory. Individual calls are no-ops. (aka ) is a safe allocator that can prevent double-free, use-after-free and can detect leaks: is a general-purpose thread-safe allocator designed for maximum performance on multithreaded machines: is a wrapper around the libc allocator: Zig doesn't panic or abort when it can't allocate memory. 
An allocation failure is just a regular error that you're expected to handle: Allocators • std.mem.Allocator • std.heap Odin supports explicit allocators, but, unlike Zig, it's not the only option. In Odin, every scope has an implicit variable that provides a default allocator: If you don't pass an allocator to a function, it uses the one currently set in the context. An allocator in Odin is a struct with an opaque self-pointer and a single function pointer: Unlike other languages, Odin's allocator uses a single procedure for all allocation tasks. The specific action — like allocating, resizing, or freeing memory — is decided by the parameter. The allocation procedure returns the allocated memory (for and operations) and an error ( on success). Odin provides low-level wrapper functions in the package that call the allocator procedure using a specific mode: There are also type-safe builtins like / (for a single object) and / (for multiple objects) that you can use instead of the low-level interface: By default, all builtins use the context allocator, but you can pass a custom allocator as an optional parameter: To use a different allocator for a specific block of code, you can reassign it in the context: Odin's provides two different allocators: When using the temp allocator, you only need a single call to clear all the allocated memory. Odin's standard library includes several allocators, found in the and packages. The procedure returns a general-purpose allocator: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access, similar to in Zig: There are also others, such as or . Like Zig, Odin doesn't panic or abort when it can't allocate memory. Instead, it returns an error code as the second return value: Allocators • base:runtime • core:mem Like Zig and Odin, C3 supports explicit allocators. Like Odin, C3 provides two default allocators: heap and temp. 
An allocator in C3 is a interface with an additional option of zeroing or not zeroing the allocated memory: Unlike Zig and Odin, the and methods don't take the (old) size as a parameter — neither directly like Odin nor through a slice like Zig. This makes it a bit harder to create custom allocators because the allocator has to keep track of the size along with the allocated memory. On the other hand, this approach makes C interop easier (if you use the default C3 allocator): data allocated in C can be freed in C3 without needing to pass the size parameter from the C code. Like in Odin, allocator methods return an error if they fail. C3 provides low-level wrapper macros in the module that call allocator methods: These either return an error (the -suffix macros) or abort if they fail. There are also functions and macros with similar names in the module that use the global allocator instance: If a function or method allocates memory, it often expects the developer to provide an allocator instance: C3 provides two thread-local allocator instances: There are functions and macros in the module that use the temporary allocator: The macro releases all temporary allocations when leaving the scope: Some types, like or , use the temp allocator by default if they are not initialized: C3's standard library includes several built-in allocators, found in the module. is a wrapper around libc's malloc/free: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access: There are also others, such as or . Like Zig and Odin, C3 can return an error in case of allocation failure: C3 can also abort in case of allocation failure: Since the functions and macros in the module use instead of , it looks like aborting on failure is the preferred approach. Memory Handling • core::mem::allocator • core::mem Unlike other languages, Hare doesn't support explicit allocators.
The standard library has multiple allocator implementations, but only one of them is used at runtime. Hare's compiler expects the runtime to provide and implementations: The programmer isn't supposed to access them directly (although it's possible by importing and calling or ). Instead, Hare uses them to provide higher-level allocation helpers. Hare offers two high-level allocation helpers that use the global allocator internally: and . can allocate individual objects. It takes a value, not a type: can also allocate slices if you provide a second parameter (the number of items): works correctly with both pointers to single objects (like ) and slices (like ). Hare's standard library has three built-in memory allocators: The allocator that's actually used is selected at compile time. Like other languages, Hare returns an error in case of allocation failure: You can abort on error with : Or propagate the error with : Dynamic memory allocation • malloc.ha Many C programs use the standard libc allocator, or at most, let you swap it out for another one using macros: Or using a simple setter: While this might work for switching the libc allocator to jemalloc or mimalloc, it's not very flexible. For example, trying to implement an arena allocator with this kind of API is almost impossible. Now that we've seen the modern allocator design in Zig, Odin, and C3 — let's try building something similar in C. There are a lot of small choices to make, and I'm going with what I personally prefer. I'm not saying this is the only way to design an allocator — it's just one way out of many. Our allocator should return an error instead of if it fails, so we'll need an error enum: The allocation function needs to return either a tagged union (value | error) or a tuple (value, error). Since C doesn't have these built in, let's use a custom tuple type: The next step is the allocator interface. 
I think Odin's approach of using a single function makes the implementation more complicated than it needs to be, so let's create separate methods like Zig does: This approach to interface design is explained in detail in a separate post: Interfaces in C . Zig uses byte slices ( ) instead of raw memory pointers. We could make our own byte slice type, but I don't see any real advantage to doing that in C — it would just mean more type casting. So let's keep it simple and stick with like our ancestors did. Now let's create generic and wrappers: I'm taking for granted here to keep things simple. A more robust implementation should properly check if it is available or pass the type to directly. We can even create a separate pair of helpers for collections: We could use some macro tricks to make and work for both a single object and a collection. But let's not do that — I prefer to avoid heavy-magic macros in this post. As for the custom allocators, let's start with a libc wrapper. It's not particularly interesting, since it ignores most of the parameters, but still: Usage example: Now let's use that field to implement an arena allocator backed by a fixed-size buffer: Usage example: As shown in the examples above, the allocation method returns an error if something goes wrong. While checking for errors might not be as convenient as it is in Zig or Odin, it's still pretty straightforward: Here's an informal table comparing allocation APIs in the languages we've discussed: In Zig, you always have to specify the allocator. In Odin, passing an allocator is optional. In C3, some functions require you to pass an allocator, while others just use the global one. In Hare, there's a single global allocator. As we've seen, there's nothing magical about the allocators used in modern languages. While they're definitely more ergonomic and safe than C, there's nothing stopping us from using the same techniques in plain C. on Unix platforms; on Windows; : alignment = 1. 
Can start at any address (0, 1, 2, 3...). : alignment = 4. Must start at addresses divisible by 4 (0, 4, 8, 12...). : alignment = 8. Must start at addresses divisible by 8 (0, 8, 16...). is for general-purpose allocations. It uses the operating system's heap allocator. is for short-lived allocations. It uses a scratch allocator (a kind of growing arena). is for general-purpose allocations. It uses an operating system's heap allocator (typically a libc wrapper). is for short-lived allocations. It uses an arena allocator. The default allocator is based on the algorithm from the Verified sequential malloc/free paper. The libc allocator uses the operating system's malloc and free functions from libc. The debug allocator uses a simple mmap-based method for memory allocation.

Anton Zhiyanov 2 months ago

(Un)portable defer in C

Modern system programming languages, from Hare to Zig, seem to agree that is a must-have feature. It's hard to argue with that, because makes it much easier to free memory and other resources correctly, which is crucial in languages without garbage collection. The situation in C is different. There was a N2895 proposal by Jens Gustedt and Robert Seacord in 2021, but it was not accepted for C23. Now, there's another N3734 proposal by JeanHeyd Meneide, which will probably be accepted in the next standard version. Since isn't part of the standard, people have created lots of different implementations. Let's take a quick look at them and see if we can find the best one. C23/GCC  • C11/GCC  • GCC/Clang  • MSVC  • Long jump  • STC  • Stack  • Simplified GCC/Clang  • Final thoughts Jens Gustedt offers this brief version: Usage example: This approach combines C23 attribute syntax ( ) with GCC-specific features: nested functions ( ) and the attribute. It also uses the non-standard macro (supported by GCC, Clang, and MSVC), which expands to an automatically increasing integer value. Nested functions and cleanup in GCC A nested function (also known as a local function) is a function defined inside another function: Nested functions can access variables from the enclosing scope, similar to closures in other languages, but they are not first-class citizens and cannot be passed around like function pointers. The attribute runs a function when the variable goes out of scope: The function should take one parameter, which is a pointer to a type that's compatible with the variable. If the function returns a value, it will be ignored. On the plus side, this version works just like you'd expect to work. On the downside, it's only available in C23+ and only works with GCC (not even Clang supports it, because of the nested function). We can easily adapt the above version to use C11: Usage example: The main downside remains: it's GCC-only. 
Clang fully supports the attribute, but it doesn't support nested functions. Instead, it offers the blocks extension, which works somewhat similar: We can use Clang blocks to make a version that works with both GCC and Clang: Usage example: Now it works with Clang, but there are several things to be aware of: On the plus side, this implementation works with both GCC and Clang. The downside is that it's still not standard C, and won't work with other compilers like MSVC. MSVC, of course, doesn't support the cleanup attribute. But it provides "structured exception handling" with the and keywords: The code in the block will always run, no matter how the block exits — whether it finishes normally, returns early, or crashes (for example, from a null pointer dereference). This isn't the we're looking for, but it's a decent alternative if you're only programming for Windows. There are well-known implementations by Jens Gustedt and moon-chilled that use and . I'm mentioning them for completeness, but honestly, I would never use them in production. The first one is extremely large, and the second one is extremely hacky. Also, I'd rather not use long jumps unless it's absolutely necessary. Still, here's a usage example from Gustedt's library: Here, all deferred statements run at the end of the guarded block, no matter how we exit the block (normally or through ). The stc library probably has the simplest implementation ever: Usage example: Here, the deferred statement is passed as and is used as the loop increment. The "defer-aware" block of code is the loop body. Since the increment runs after the body, the deferred statement executes after the main code. This approach works with all mainstream compilers, but it falls apart if you try to exit early with or : Dmitriy Kubyshkin provides a implementation that adds a "stack frame" of deferred calls to any function that needs them. Here's a simplified version: Usage example: This version works with all mainstream compilers. 
Also, unlike the STC version, defers run correctly in case of early exit: Unfortunately, there are some drawbacks: The Stack version above doesn't support deferring code blocks. In my opinion, that's not a problem, since most defers are just "free this resource" actions, which only need a single function call with one argument. If we accept this limitation, we can simplify the GCC/Clang version by dropping GCC's nested functions and Clang's blocks: Works like a charm: Personally, I like the simpler GCC/Clang version better. Not having MSVC support isn't a big deal, since we can run GCC on Windows or use the Zig compiler, which works just fine. But if I really need to support GCC, Clang, and MSVC — I'd probably go with the Stack version. Anyway, I don't think we need to wait for to be added to the C standard. We already have at home! We must compile with . We must put a after the closing brace in the deferred block: . If we need to modify a variable inside the block, the variable must be declared with : Defer only supports single-function calls, not code blocks. We always have to call at the start of the function and exit using . In the original implementation, Dmitriy overrides the keyword, but this won't compile with strict compile flags (which I think we should always use). The deferred function runs before the return value is evaluated, not after.

Anton Zhiyanov 2 months ago

Interfaces and traits in C

Everyone likes interfaces in Go and traits in Rust. Polymorphism without class-based hierarchies or inheritance seems to be the sweet spot. What if we try to implement this in C? Interfaces in Go  • Traits in Rust  • Toy example  • Interface definition  • Interface data  • Method table  • Method table in implementor  • Type assertions  • Final thoughts An interface in Go is a convenient way to define a contract for some useful behavior. Take, for example, the honored : Anything that can read data into a byte slice provided by the caller is a . Quite handy, because the code doesn't need to care where the data comes from — whether it's memory, the file system, or the network. All that matters is that it can read the data into a slice: We can provide any kind of reader: Go's interfaces are structural, which is similar to duck typing. A type doesn't need to explicitly state that it implements ; it just needs to have a method: The Go compiler and runtime take care of the rest: A trait in Rust is also a way to define a contract for certain behavior. Here's the trait: Unlike in Go, a type must explicitly state that it implements a trait: The Rust compiler takes care of the rest: Either way, whether it's Go or Rust, the caller only cares about the contract (defined as an interface or trait), not the specific implementation. Let's make an even simpler version of — one without any error handling (Go): Usage example: Let's see how we can do this in C! The main building blocks in C are structs and functions, so let's use them. Our will be a struct with a single field called . This field will be a pointer to a function with the right signature: To make fully dynamic, let's turn it into a struct with a function pointer (I know, I know — just bear with me): Here's the "method" implementation: The is pretty obvious: And, finally, the function: See how easy it is to turn a into a : all we need is . Pretty cool, right? Not really. 
Actually, this implementation is seriously flawed in almost every way (except for the definition). Memory overhead . Each instance has its own function pointers (8 bytes per function on a 64-bit system) as "methods", which isn't practical even if there are only a few of them. Regular objects should store data, not functions. Layout dependency . Converting from to like only works if both structures have the same field as their first member. If we try to implement another interface: Everything will fall apart: and have different layouts, so type conversion in ⓧ is invalid and causes undefined behavior. Lack of type safety . Using a as the receiver in means the caller can pass any type, and the compiler won't even show a warning: C isn't a particularly type-safe language, but this is just too much. Let's try something else. A better way is to store a reference to the actual object in the interface: We could have the method in the interface take a instead of a , but that would make the implementation more complicated without any real benefits. So, I'll keep it as . Then will only have its own fields: We can make the method type-safe: To make this work, we add a method that returns the instance wrapped in a interface: The and functions remain quite simple: This approach is much better than the previous one: Since our type now knows about the interface (through the method), our implementation is more like a basic version of a Rust trait than a true Go interface. For simplicity, I'll keep using the term "interface". There is one downside, though: each instance has its own function pointer for every interface method. Since only has one method, this isn't an issue. But if an interface has a dozen methods and the program uses a lot of these interface instances, it can become a problem. Let's fix this. Let's extract interface methods into a separate structure — the method table.
The interface references its methods through the field: and don't change at all: The method initializes the static method table and assigns it to the interface instance: The only difference in is that it calls the method on the interface indirectly using the method table ( instead of ): stays the same: Now the instance always has a single pointer field for its methods. So even for large interfaces, it only uses 16 bytes ( + fields). This approach also keeps all the benefits from the previous version: We can even add a separate helper so the client doesn't have to worry about implementation details: There's another approach I've seen out there. I don't like it, but it's still worth mentioning for completeness. Instead of embedding the method table in the interface, we can place it in the implementation ( ): We initialize the method table in the constructor: now takes a pointer: And converts to with a simple type cast: This keeps pretty lightweight, only adding one extra field. But the cast only works because is the first field in . If we try to implement a second interface, things will break — just like in the very first solution. I think the "method table in the interface" approach is much better. Go has an function that copies data from a source (a reader) to a destination (a writer): There's an interesting comment in its documentation: If implements , the copy is implemented by calling . Otherwise, if implements , the copy is implemented by calling . Here's what the function looks like: is a type assertion that checks if the reader is not just a , but also implements the interface. The Go runtime handles these kinds of dynamic type checks. Can we do something like this in C? I'd prefer not to make it fully dynamic, since trying to recreate parts of the Go runtime in C probably isn't a good idea. What we can do is add an optional method to the interface: Then we can easily check if a given is also a : Still, this feels a bit like a hack.
I'd rather avoid using type assertions unless it's really necessary. Interfaces (traits, really) in C are possible, but they're not as simple or elegant as in Go or Rust. The method table approach we discussed is a good starting point. It's memory-efficient, as type-safe as possible given C's limitations, and supports polymorphic behavior. Here's the full source code if you are interested: The struct is lean and doesn't have any interface-related fields. The method takes a instead of a . The cast from to is handled inside the method. We can implement multiple interfaces if needed. Lightweight structure. Easy conversion from to . Supports multiple interfaces.

Anton Zhiyanov 3 months ago

Go 1.26 interactive tour

Go 1.26 is coming out in February, so it's a good time to explore what's new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is. Read on and see! new(expr)  • Type-safe error checking  • Green Tea GC  • Faster cgo and syscalls  • Faster memory allocation  • Vectorized operations  • Secret mode  • Reader-less cryptography  • Goroutine leak profile  • Goroutine metrics  • Reflective iterators  • Peek into a buffer  • Process handle  • Signal as cause  • Compare IP subnets  • Context-aware dialing  • Fake example.com  • Optimized fmt.Errorf  • Optimized io.ReadAll  • Multiple log handlers  • Test artifacts  • Modernized go fix  • Final thoughts This article is based on the official release notes from The Go Authors and the Go source code, licensed under the BSD-3-Clause license. This is not an exhaustive list; see the official release notes for that. I provide links to the documentation (𝗗), proposals (𝗣), commits (𝗖𝗟), and authors (𝗔) for the features described. Check them out for motivation, usage, and implementation details. I also have dedicated guides (𝗚) for some of the features. Error handling is often skipped to keep things simple. Don't do this in production ツ Previously, you could only use the built-in with types: Now you can also use it with expressions: If the argument is an expression of type T, then allocates a variable of type T, initializes it to the value of , and returns its address, a value of type . This feature is especially helpful if you use pointer fields in a struct to represent optional values that you marshal to JSON or Protobuf: You can use with composite values: And function calls: Passing is still not allowed: 𝗗 spec • 𝗣 45624 • 𝗖𝗟 704935 , 704737 , 704955 , 705157 • 𝗔 Alan Donovan The new function is a generic version of : It's type-safe and easier to use: is especially handy when checking for multiple types of errors. 
It makes the code shorter and keeps error variables scoped to their blocks: Another issue with is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement ): doesn't cause a runtime panic; it gives a clear compile-time error instead: doesn't use , executes faster, and allocates less than : Since can handle everything that does, it's a recommended drop-in replacement for new code. 𝗗 errors.AsType • 𝗣 51945 • 𝗖𝗟 707235 • 𝗔 Julien Cretel The new garbage collector (first introduced as experimental in 1.25) is designed to make memory management more efficient on modern computers with many CPU cores. Go's traditional garbage collector algorithm operates on a graph, treating objects as nodes and pointers as edges, without considering their physical location in memory. The scanner jumps between distant memory locations, causing frequent cache misses. As a result, the CPU spends too much time waiting for data to arrive from memory. More than 35% of the time spent scanning memory is wasted just stalling while waiting for memory accesses. As computers get more CPU cores, this problem gets even worse. Green Tea shifts the focus from being processor-centered to being memory-aware. Instead of scanning individual objects, it scans memory in contiguous 8 KiB blocks called spans . The algorithm focuses on small objects (up to 512 bytes) because they are the most common and hardest to scan efficiently. Each span is divided into equal slots based on its assigned size class , and it only contains objects of that size class. For example, if a span is assigned to the 32-byte size class, the whole block is split into 32-byte slots, and objects are placed directly into these slots, each starting at the beginning of its slot. Because of this fixed layout, the garbage collector can easily find an object's metadata using simple address arithmetic, without checking the size of each object it finds.
When the algorithm finds an object that needs to be scanned, it marks the object's location in its span but doesn't scan it immediately. Instead, it waits until there are several objects in the same span that need scanning. Then, when the garbage collector processes that span, it scans multiple objects at once. This is much faster than going over the same area of memory multiple times. To make better use of CPU cores, GC workers share the workload by stealing tasks from each other. Each worker has its own local queue of spans to scan, and if a worker is idle, it can grab tasks from the queues of other busy workers. This decentralized approach removes the need for a central global list, prevents delays, and reduces contention between CPU cores. Green Tea uses vectorized CPU instructions (only on amd64 architectures) to process memory spans in bulk when there are enough objects. Benchmark results vary, but the Go team expects a 10–40% reduction in garbage collection overhead in real-world programs that rely heavily on the garbage collector. Plus, with vectorized implementation, an extra 10% reduction in GC overhead when running on CPUs like Intel Ice Lake or AMD Zen 4 and newer. Unfortunately, I couldn't find any public benchmark results from the Go team for the latest version of Green Tea, and I wasn't able to create a good synthetic benchmark myself. So, no details this time :( The new garbage collector is enabled by default. To use the old garbage collector, set at build time (this option is expected to be removed in Go 1.27). 𝗣 73581 • 𝗔 Michael Knyszek In the Go runtime, a processor (often referred to as a P) is a resource required to run the code. For a thread (a machine or M) to execute a goroutine (G), it must first acquire a processor. Processors move through different states. They can be (executing code), (waiting for work), or (paused because of the garbage collection). 
Previously, processors had a state called used when a goroutine is making a system or cgo call. Now, this state has been removed. Instead of using a separate processor state, the system now checks the status of the goroutine assigned to the processor to see if it's involved in a system call. This reduces internal runtime overhead and simplifies code paths for cgo and syscalls. The Go release notes say -30% in cgo runtime overhead, and the commit mentions an 18% sec/op improvement: I decided to run the CgoCall benchmarks locally as well: Either way, both a 20% and a 30% improvement are pretty impressive. And here are the results from a local syscall benchmark: That's pretty good too. 𝗖𝗟 646198 • 𝗔 Michael Knyszek The Go runtime now has specialized versions of its memory allocation function for small objects (from 1 to 512 bytes). It uses jump tables to quickly choose the right function for each size, instead of relying on a single general-purpose implementation. The Go release notes say "the compiler will now generate calls to size-specialized memory allocation routines". But based on the code, that's not completely accurate: the compiler still emits calls to the general-purpose function. Then, at runtime, dispatches those calls to the new specialized allocation functions. This change reduces the cost of small object memory allocations by up to 30%. The Go team expects the overall improvement to be ~1% in real allocation-heavy programs. I couldn't find any existing benchmarks, so I came up with my own. And indeed, running it on Go 1.25 compared to 1.26 shows a significant improvement: The new implementation is enabled by default. You can disable it by setting at build time (this option is expected to be removed in Go 1.27). 𝗖𝗟 665835 • 𝗔 Michael Matloob The new package provides access to architecture-specific vectorized operations (SIMD — single instruction, multiple data). This is a low-level package that exposes hardware-specific functionality. 
It currently only supports amd64 platforms. Because different CPU architectures have very different SIMD operations, it's hard to create a single portable API that works for all of them. So the Go team decided to start with a low-level, architecture-specific API first, giving "power users" immediate access to SIMD features on the most common server platform — amd64. The package defines vector types as structs, like (a 128-bit SIMD vector with sixteen 8-bit integers) and (a 512-bit SIMD vector with eight 64-bit floats). These match the hardware's vector registers. The package supports vectors that are 128, 256, or 512 bits wide. Most operations are defined as methods on vector types. They usually map directly to hardware instructions with zero overhead. To give you a taste, here's a custom function that uses SIMD instructions to add 32-bit float vectors: Let's try it on two vectors: Common operations in the package include: The package uses only AVX instructions, not SSE. Here's a simple benchmark for adding two vectors (both the "plain" and SIMD versions use pre-allocated slices): The package is experimental and can be enabled by setting at build time. 𝗗 simd/archsimd • 𝗣 73787 • 𝗖𝗟 701915 , 712880 , 729900 , 732020 • 𝗔 Junyang Shao , Sean Liao , Tom Thorogood Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, ephemeral keys (temporary keys used to negotiate the session) need to be erased from memory immediately after the handshake. If there's no reliable way to clear this memory, these keys could stay there indefinitely. An attacker who finds them later could re-derive the session key and decrypt past traffic, breaking forward secrecy. In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. 
Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it. The Go team's solution to this problem is the new package. It lets you run a function in secret mode . After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable. This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it. Here's an example that shows how might be used in a more or less realistic setting. Let's say you want to generate a session key while keeping the ephemeral private key and shared secret safe: Here, the ephemeral private key and the raw shared secret are effectively "toxic waste" — they are necessary to create the final session key, but dangerous to keep around. If these values stay in the heap and an attacker later gets access to the application's memory (for example, via a core dump or a vulnerability like Heartbleed), they could use these intermediates to re-derive the session key and decrypt past conversations. By wrapping the calculation in , we make sure that as soon as the session key is created, the "ingredients" used to make it are permanently destroyed. This means that even if the server is compromised in the future, this specific past session can't be exposed, which ensures forward secrecy. The current implementation only supports Linux (amd64 and arm64). On unsupported platforms, invokes the function directly. Also, trying to start a goroutine within the function causes a panic (this will be fixed in Go 1.27). 
The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use behind the scenes. The package is experimental and can be enabled by setting at build time. 𝗗 runtime/secret • 𝗣 21865 • 𝗖𝗟 704615 • 𝗔 Daniel Morsing Current cryptographic APIs, like or , often accept an as the source of random data: These APIs don't commit to a specific way of using random bytes from the reader. Any change to underlying cryptographic algorithms can change the sequence or amount of bytes read. Because of this, if the application code (mistakenly) relies on a specific implementation in Go version X, it might fail or behave differently in version X+1. The Go team chose a pretty bold solution to this problem. Now, most crypto APIs will just ignore the random parameter and always use the system random source ( ). The change applies to the following subpackages: still uses the random reader if provided. But if is nil, it uses an internal secure source of random bytes instead of (which could be overridden). To support deterministic testing, there's a new package with a single function. It sets a global, deterministic cryptographic randomness source for the duration of the given test: affects and all implicit sources of cryptographic randomness in the packages: To temporarily restore the old reader-respecting behavior, set (this option will be removed in a future release). 𝗗 testing/cryptotest • 𝗣 70942 • 𝗖𝗟 724480 • 𝗔 Filippo Valsorda , qiulaidongfeng A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. Here's a simple example: If we call and don't read from the output channel, the inner goroutine will stay blocked trying to send to the channel for the rest of the program: Unlike deadlocks, leaks do not cause panics, so they are much harder to spot. 
Also, unlike data races, Go's tooling did not address them for a long time. Things started to change in Go 1.24 with the introduction of the package. Not many people talk about it, but is a great tool for catching leaks during testing. Go 1.26 adds a new experimental profile designed to report leaked goroutines in production. Here's how we can use it in the example above: As you can see, we have a nice goroutine stack trace that shows exactly where the leak happens. The profile finds leaks by using the garbage collector's marking phase to check which blocked goroutines are still connected to active code. It starts with runnable goroutines, marks all sync objects they can reach, and keeps adding any blocked goroutines waiting on those objects. When it can't add any more, any blocked goroutines left are waiting on resources that can't be reached — so they're considered leaked. Here's the gist of it: For even more details, see the paper by Saioc et al. If you want to see how (and ) can catch typical leaks that often happen in production — check out my article on goroutine leaks . The profile is experimental and can be enabled by setting at build time. Enabling the experiment also makes the profile available as a net/http/pprof endpoint, . According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile. 𝗗 runtime/pprof • 𝗚 Detecting leaks • 𝗣 74609 , 75280 • 𝗖𝗟 688335 • 𝗔 Vlad Saioc New metrics in the package give better insight into goroutine scheduling: Here's the full list: Per-state goroutine metrics can be linked to common production issues. For example, an increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand. 
You can read the new metric values using the regular function: The per-state numbers (not-in-go + runnable + running + waiting) are not guaranteed to add up to the live goroutine count ( , available since Go 1.16). All new metrics use counters. 𝗗 runtime/metrics • 𝗣 15490 • 𝗖𝗟 690397 , 690398 , 690399 • 𝗔 Michael Knyszek The new and methods in the package return iterators for a type's fields and methods: The new methods and return iterators for the input and output parameters of a function type: The new methods and return iterators for a value's fields and methods. Each iteration yields both the type information ( or ) and the value: Previously, you could get all this information by using a for-range loop with methods (which is what iterators do internally): Using an iterator is more concise. I hope it justifies the increased API surface. 𝗗 reflect • 𝗣 66631 • 𝗖𝗟 707356 • 𝗔 Quentin Quaadgras The new method in the package returns the next N bytes from the buffer without advancing it: If returns fewer than N bytes, it also returns : The slice returned by points to the buffer's content and stays valid until the buffer is changed. So, if you change the slice right away, it will affect future reads: The slice returned by is only valid until the next call to a read or write method. 𝗗 Buffer.Peek • 𝗣 73794 • 𝗖𝗟 674415 • 𝗔 Ilia Choly After you start a process in Go, you can access its ID: Internally, the type uses a process handle instead of the PID (which is just an integer), if the operating system supports it. Specifically, in Linux it uses pidfd , which is a file descriptor that refers to a process. Using the handle instead of the PID makes sure that methods always work with the same OS process, and not a different process that just happens to have the same ID. Previously, you couldn't access the process handle. 
Now you can, thanks to the new method: calls a specified function and passes a process handle as an argument: The handle is guaranteed to refer to the process until the callback function returns, even if the process has already terminated. That's why it's implemented as a callback instead of a field or method. is only supported on Linux 5.4+ and Windows. On other operating systems, it doesn't execute the callback and returns an error. 𝗗 Process.WithHandle • 𝗣 70352 • 𝗖𝗟 699615 • 𝗔 Kir Kolyshkin

returns a context that gets canceled when any of the specified signals is received. Previously, the canceled context only showed the standard "context canceled" cause: Now the context's cause shows exactly which signal was received: The returned type, , is based on , so it doesn't provide the actual value — just its string representation. 𝗗 signal.NotifyContext • 𝗖𝗟 721700 • 𝗔 Filippo Valsorda

An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation: In Go, an IP prefix is represented by the type. The new method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code: orders two prefixes as follows: This follows the same order as Python's and the standard IANA (Internet Assigned Numbers Authority) convention. 𝗗 Prefix.Compare • 𝗣 61642 • 𝗖𝗟 700355 • 𝗔 database64128

The package has top-level functions for connecting to an address using different networks (protocols) — , , , and . They were made before was introduced, so they don't support cancellation: There's also a type with a general-purpose method. It supports cancellation and can be used to connect to any of the known networks: However, it's a bit less efficient than network-specific functions like — because of the extra overhead from address resolution and network type dispatching. So, network-specific functions in the package are more efficient, but they don't support cancellation.
The type supports cancellation, but it's less efficient. The Go team decided to resolve this contradiction. The new context-aware methods ( , , , and ) combine the efficiency of the existing network-specific functions with the cancellation capabilities of : I wouldn't say that having three different ways to dial is very convenient, but that's the price of backward compatibility. 𝗗 net.Dialer • 𝗣 49097 • 𝗖𝗟 490975 • 𝗔 Michael Fraenkel The default certificate already lists in its DNSNames (a list of hostnames or domain names that the certificate is authorized to secure). Because of this, doesn't trust responses from the real : To fix this issue, the HTTP client returned by now redirects requests for and its subdomains to the test server: 𝗗 Server.Client • 𝗖𝗟 666855 • 𝗔 Sean Liao People often point out that using for plain strings causes more memory allocations than . Because of this, some suggest switching code from to when formatting isn't needed. The Go team disagrees. Here's a quote from Russ Cox: Using is completely fine, especially in a program where all the errors are constructed with . Having to mentally switch between two functions based on the argument is unnecessary noise. With the new Go release, this debate should finally be settled. For unformatted strings, now allocates less and generally matches the allocations for . Specifically, goes from 2 allocations to 0 allocations for a non-escaping error, and from 2 allocations to 1 allocation for an escaping error: This matches the allocations for in both cases. The difference in CPU cost is also much smaller now. Previously, it was ~64ns vs. ~21ns for vs. for escaping errors, now it's ~25ns vs. ~21ns. Here are the "before and after" benchmarks for the change. The non-escaping case is called , and the escaping case is called . If there's just a plain error string, it's . If the error includes formatting, it's . 
Seconds per operation: Bytes per operation: Allocations per operation: If you're interested in the details, I highly recommend reading the CL — it's perfectly written. 𝗗 fmt.Errorf • 𝗖𝗟 708836 • 𝗔 thepudds Previously, allocated a lot of intermediate memory as it grew its result slice to the size of the input data. Now, it uses intermediate slices of exponentially growing size, and then copies them into a final perfectly-sized slice at the end. The new implementation is about twice as fast and uses roughly half the memory for a 65KiB input; it's even more efficient with larger inputs. Here are the geomean results comparing the old and new versions for different input sizes: See the full benchmark results in the commit. Unfortunately, the author didn't provide the benchmark source code. Ensuring the final slice is minimally sized is also quite helpful. The slice might persist for a long time, and the unused capacity in a backing array (as in the old version) would just waste memory. As with the optimization, I recommend reading the CL — it's very good. Both changes come from thepudds , whose change descriptions are every reviewer's dream come true. 𝗗 io.ReadAll • 𝗖𝗟 722500 • 𝗔 thepudds The package, introduced in version 1.21, offers a reliable, production-ready logging solution. Since its release, many projects have switched from third-party logging packages to use it. However, it was missing one key feature: the ability to send log records to multiple handlers, such as stdout or a log file. The new type solves this problem. It implements the standard interface and calls all the handlers you set up. For example, we can create a log handler that writes to stdout: And another handler that writes to a file: Finally, combine them using a : I'm also printing the file contents here to show the results. When the receives a log record, it sends it to each enabled handler one by one. 
If any handler returns an error, doesn't stop; instead, it combines all the errors using : The method reports whether any of the configured handlers is enabled: Other methods — and — call the corresponding methods on each of the enabled handlers. 𝗗 slog.MultiHandler • 𝗣 65954 • 𝗖𝗟 692237 • 𝗔 Jes Cok Test artifacts are files created by tests or benchmarks, such as execution logs, memory dumps, or analysis reports. They are important for debugging failures in remote environments (like CI), where developers can't step through the code manually. Previously, the Go test framework and tools didn't support test artifacts. Now they do. The new methods , , and return a directory where you can write test output files: If you use with , this directory will be inside the output directory (specified by , or the current directory by default): As you can see, the first time is called, it writes the directory location to the test log, which is quite handy. If you don't use , artifacts are stored in a temporary directory which is deleted after the test completes. Each test or subtest within each package has its own unique artifact directory. Subtest outputs are not stored inside the parent test's output directory — all artifact directories for a given package are created at the same level: The artifact directory path normally looks like this: But if this path can't be safely converted into a local file path (which, for some reason, always happens on my machine), the path will simply be: (which is what happens in the examples above) Repeated calls to in the same test or subtest return the same directory. 𝗗 T.ArtifactDir • 𝗣 71287 • 𝗖𝗟 696399 • 𝗔 Damien Neil Over the years, the command became a sad, neglected bag of rewrites for very ancient Go features. But now, it's making a comeback. The new is re-implemented using the Go analysis framework — the same one uses. 
While and now use the same infrastructure, they have different purposes and use different sets of analyzers: By default, runs a full set of analyzers (currently, there are more than 20). To choose specific analyzers, use the flag for each one, or use to run all analyzers except the ones you turned off. For example, here we only enable the analyzer: And here, we enable all analyzers except : Currently, there's no way to suppress specific analyzers for certain files or sections of code. To give you a taste of analyzers, here's one of them in action. It replaces loops with or : If you're interested, check out the dedicated blog post for the full list of analyzers with examples. 𝗗 cmd/fix • 𝗚 go fix • 𝗣 71859 • 𝗔 Alan Donovan Go 1.26 is incredibly big — it's the largest release I've ever seen, and for good reason: All in all, a great release! You might be wondering about the package that was introduced as experimental in 1.25. It's still experimental and available with the flag. P.S. To catch up on other Go releases, check out the Go features by version list or explore the interactive tours for Go 1.25 and 1.24 . P.P.S. Want to learn more about Go? Check out my interactive book on concurrency a vector from array/slice, or a vector to array/slice. Arithmetic: , , , , . Bitwise: , , , , . Comparison: , , , , . Conversion: , , . Masking: , , . Rearrangement: . Collect live goroutines . Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now. Mark reachable memory . Trace pointers from roots to find which synchronization objects (like channels or wait groups) are currently reachable by these roots. Resurrect blocked goroutines . Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots. Iterate . Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects. Report the leaks . 
Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

Total number of goroutines since the program started. Number of goroutines in each state. Number of active threads.

First by validity (invalid before valid). Then by address family (IPv4 before IPv6). Then by masked IP address (network IP). Then by prefix length. Then by unmasked address (original IP).

Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply. Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes that are always safe to apply, but don't necessarily indicate problems with the code.

It brings a lot of useful updates, like the improved builtin, type-safe error checking, and goroutine leak detector. There are also many performance upgrades, including the new garbage collector, faster cgo and memory allocation, and optimized and . On top of that, it adds quality-of-life features like multiple log handlers, test artifacts, and the updated tool. Finally, there are two specialized experimental packages: one with SIMD support and another with protected mode for forward secrecy.

Anton Zhiyanov 3 months ago

Fear is not advocacy

AI advocates seem to be the only kind of technology advocates who feel this imminent urge to constantly criticize developers for not being excited enough about their tech. It would be crazy if I presented new Go features like this: If you still don't use the package, all your systems will eventually succumb to concurrency bugs. If you don't use iterators, you have absolutely nothing interesting to build. The job of an advocate is to spark interest, not to reproach people or instill FOMO. And yet that's exactly what AI advocates do. What a weird way to advocate. This whole "devote your life to AI right now, or you'll be out of a job soon" narrative is false. You don't have to be a world-class algorithm expert to write good software. You don't have to be a Linux expert to use containers. And you don't have to spend all your time now trying to become an expert in chasing ever-changing AI tech. As with any new technology, developers adopting AI typically fall into four groups: early adopters, early majority, late majority, and laggards. Right now, AI advocates are trying to shame everyone into becoming early adopters. But it's perfectly okay to wait if you're sceptical. Being part of the late majority is a safe and reasonable choice. If anything, you'll have fewer bugs to deal with. As the industry adopts AI practices, you'll naturally absorb just the right amount of them. You are going to be fine.

Anton Zhiyanov 3 months ago

'Better C' playgrounds

I have a soft spot for the "better C" family of languages: C3, Hare, Odin, V, and Zig. I'm not saying these languages are actually better than C — they're just different. But I needed to come up with an umbrella term for them, and "better C" was the only thing that came to mind. I believe playgrounds and interactive documentation make programming languages easier for more people to learn. That's why I created online sandboxes for these langs. You can try them out below, embed them on your own website, or self-host and customize them. If you're already familiar with one of these languages, maybe you could even create an interactive guide for it? I'm happy to help if you want to give it a try. C3  • Hare  • Odin  • V  • Zig  • Editors An ergonomic, safe, and familiar evolution of C. ⛫  homepage • αω  tutorial • ⚘  community A systems programming language designed to be simple, stable, and robust. ⛫  homepage • αω  tutorial • ⚘  community A high-performance, data-oriented systems programming language. ⛫  homepage • αω  tutorial • ⚘  community A language with C-level performance and rapid compilation speeds. ⛫  homepage • αω  tutorial • ⚘  community A language designed for performance and explicit control with powerful metaprogramming. ⛫  homepage • αω  tutorial • ⚘  community If you want to do more than just "hello world," there are also full-size online editors . They're pretty basic, but still can be useful.

Anton Zhiyanov 3 months ago

Go feature: Modernized go fix

Part of the Accepted! series: Go proposals and features explained in simple terms. The modernized command uses a fresh set of analyzers and the same infrastructure as . Ver. 1.26 • Tools • Medium impact The is re-implemented using the Go analysis framework — the same one uses. While and now use the same infrastructure, they have different purposes and use different sets of analyzers: See the full set of fix's analyzers in the Analyzers section. The main goal is to bring modernization tools from the Go language server (gopls) to the command line. If includes the modernize suite, developers can easily and safely update their entire codebase after a new Go release with just one command. Re-implementing also makes the Go toolchain simpler. The unified and use the same backend framework and extension mechanism. This makes the tools more consistent, easier to maintain, and more flexible for developers who want to use custom analysis tools. Implement the new command: By default, runs a full set of analyzers (see the list below). To choose specific analyzers, use the flag for each one, or use to run all analyzers except the ones you turned off. For example, here we only enable the analyzer: And here, we enable all analyzers except : Currently, there's no way to suppress specific analyzers for certain files or sections of code. Here's the list of fixes currently available in , along with examples. 
any  • bloop  • fmtappendf  • forvar  • hostport  • inline  • mapsloop  • minmax  • newexpr  • omitzero  • plusbuild  • rangeint  • reflecttypefor  • slicescontains  • slicessort  • stditerators  • stringsbuilder  • stringscut  • stringcutprefix  • stringsseq  • testingcontext  • waitgroup

Replace with : Replace for-range over with and remove unnecessary manual timer control: Replace with to avoid intermediate string allocation: Remove unnecessary shadowing of loop variables: Replace network addresses created with by using instead, because host-port pairs made with don't work with IPv6: Inline function calls according to the comment directives: Replace explicit loops over maps with calls to package ( , , , or depending on the context): Replace if/else statements with calls to or : Replace custom "pointer to" functions with : Remove from struct-type fields because this tag doesn't have any effect on them: Remove obsolete comments: Replace 3-clause for loops with for-range over integers: Replace with : Replace loops with or : Replace with for basic types: Use iterators instead of / -style APIs for certain types in the standard library: Replace repeated with : Replace some uses of and string slicing with or : Replace / with and / with : Replace ranging over / with / : Replace with in tests: Replace + with :

𝗣 71859 👥 Alan Donovan , Jonathan Amsterdam

Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply. Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes that are always safe to apply, but don't necessarily indicate problems with the code.

Anton Zhiyanov 3 months ago

Detecting goroutine leaks in modern Go

Deadlocks, race conditions, and goroutine leaks are probably the three most common problems in concurrent Go programming. Deadlocks usually cause panics, so they're easier to spot. The race detector can help find data races (although it doesn't catch everything and doesn't help with other types of race conditions). As for goroutine leaks, Go's tooling did not address them for a long time. A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. We'll look at some examples shortly. Things started to change in Go 1.24 with the introduction of the package. There will be even bigger changes in Go 1.26, which adds a new experimental profile that reports leaked goroutines. Let's take a look!

A simple leak  • Detection: goleak  • Detection: synctest  • Detection: pprof  • Algorithm  • Range over channel  • Double send  • Early return  • Take first  • Cancel/timeout  • Orphans  • Final thoughts

Let's say there's a function that runs the given functions concurrently and sends their results to an output channel: And a simple test: Send three functions to be executed and collect the results from the output channel. The test passed, so the function works correctly. But does it really? Let's pass three functions to without collecting the results, and count the goroutines: After 50 ms — when all the functions should definitely have finished — there are still three running goroutines ( ). In other words, all the goroutines are stuck. The reason is that the channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside get blocked on sending the result to . Let's modify the test to catch the leak. Obviously, we don't want to rely on in tests — such a check is too fragile. Let's use a third-party goleak package instead: playground ▶ The test output clearly shows where the leak occurs.
Goleak uses internally, but it does so quite efficiently. It inspects the stack for unexpected goroutines up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly. Still, I'd prefer not to use third-party packages and . Let's check for leaks without any third-party packages by using the package (experimental in Go 1.24, production-ready in Go 1.25+): I'll keep this explanation short since isn't the main focus of this article. If you want to learn more about it, check out the Concurrency testing guide. I highly recommend it — is super useful! Here's what happens: Next, comes into play. It tries to wait for all child goroutines to finish before it returns. But if it sees that some goroutines are durably blocked (in our case, all three are blocked trying to send to the channel), it panics: main bubble goroutine has exited but blocked goroutines remain So, here we found the leak without using or goleak. Pretty useful! Let's check for leaks using the new profile type (experimental in Go 1.26). We'll use a helper function to run the profiled code and print the results when the profile is ready: Call with three functions and observe all three leaks: We have a nice goroutine stack trace that shows exactly where the leak happens. Unfortunately, we had to use again, so this probably isn't the best way to test — unless we combine it with to use the fake clock. On the other hand, we can collect a from a running program, which makes it really useful for finding leaks in production systems (unlike ). Pretty neat. This profile uses the garbage collector's marking phase to find goroutines that are permanently blocked (leaked). The approach is explained in detail in the proposal and the paper by Saioc et al. — check it out if you're interested. 
Here's the gist of it: In the rest of the article, we'll review the different types of leaks often observed in production and see whether and are able to detect each of them (spoiler: they are). Based on the code examples from the common-goroutine-leak-patterns repository by Georgian-Vlad Saioc, licensed under the Apache-2.0 license. One or more goroutines receive from a channel using , but the sender never closes the channel, so all the receivers eventually leak: Notice how and give almost the same stack traces, clearly showing the root cause of the problem. You'll see this in the next examples as well. Fix: The sender should close the channel after it finishes sending. Try uncommenting the ⓧ line and see if both checks pass. The sender accidentally sends more values to a channel than intended, and leaks: Fix: Make sure that each possible path in the code sends to the channel no more times than the receiver is ready for. Alternatively, make the channel's buffer large enough to handle all possible sends. Try uncommenting the ⓧ line and see if both checks pass. The parent goroutine exits without receiving a value from the child goroutine, so the child leaks: Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending. Try making the channel buffered at line ⓧ and see if both checks pass. Similar to "early return". If the parent is canceled before receiving a value from the child goroutine, the child leaks: Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending. Try making the channel buffered at line ⓧ and see if both checks pass. The parent launches N child goroutines, but is only interested in the first result. The rest N-1 children leak: Using (zero items, the parent leaks): Using (multiple items, children leak): Using (zero items, the parent leaks): Using (multiple items, children leak): Fix: Make the channel's buffer large enough to hold values from all child goroutines. 
Also, return early if the source collection is empty. Try changing the implementation as follows and see if both checks pass: Inner goroutines leak because the client doesn't follow the contract described in the type's interface and documentation. Let's say we have a type with the following contract: The implementation isn't particularly important — what really matters is the public contract. Let's say the client breaks the contract and doesn't stop the worker: Then the worker goroutines will leak, just like the documentation says. Fix: Follow the contract and stop the worker to make sure all goroutines are stopped. Try uncommenting the ⓧ line and see if both checks pass. Thanks to improvements in Go 1.24-1.26, it's now much easier to catch goroutine leaks, both during testing and in production. The package is available in 1.24 (experimental) and 1.25+ (production-ready). If you're interested, I have a detailed interactive guide on it. The profile will be available in 1.26 (experimental). According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile. Check the proposal and the commits for more details on :

P.S. If you are into concurrency, check out my interactive book .

1. The call to starts a testing bubble in a separate goroutine.
2. The call to starts three goroutines.
3. The call to blocks the root bubble goroutine.
4. One of the goroutines executes , tries to write to , and gets blocked (because no one is reading from ). The same thing happens to the other two goroutines.
5. sees that all the child goroutines in the bubble are durably blocked, so it unblocks the root goroutine.
6. The inner test function finishes.

1. Collect live goroutines. Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now.
2. Mark reachable memory. Trace pointers from roots to find which memory objects (like channels or mutexes) are currently reachable by these roots.
3. Resurrect blocked goroutines. Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots.
4. Iterate. Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects.
5. Report the leaks. Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

𝗣 74609 , 75280 👥 Vlad Saioc , Michael Knyszek 𝗖𝗟 688335 👥 Vlad Saioc

Anton Zhiyanov 4 months ago

Timing 'Hello, world'

Here's a little unscientific chart showing the compile/run times of a "hello world" program in different languages: For interpreted languages, the times shown are only for running the program, since there's no separate compilation step. I had to shorten the Kotlin bar a bit to make it fit within 80 characters. All measurements were done in single-core, containerized sandboxes on an ancient CPU, and the timings include the overhead of . So the exact times aren't very interesting, especially for the top group (Bash to Ruby) — they all took about the same amount of time. Here is the program source code in C: Other languages: Bash · C# · C++ · Dart · Elixir · Go · Haskell · Java · JavaScript · Kotlin · Lua · Odin · PHP · Python · R · Ruby · Rust · Swift · V · Zig Of course, this ranking will be different for real-world projects with lots of code and dependencies. Still, I found it curious to see how each language performs on a simple "hello world" task.

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency is out!

My book on concurrent programming in Go is finally finished. It walks you through goroutines, channels, select, pipelines, synchronization, race prevention, time handling, signaling, atomicity, testing, and concurrency internals. The book follows my usual style: clear explanations with interactive examples, plus auto-tested exercises so you can practice as you go. I genuinely think it's the best practical guide for everyone learning concurrency from scratch or looking to go beyond the basics. There's a dedicated page with all the book details — check it out !

Anton Zhiyanov 4 months ago

Go proposal: Secret mode

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Automatically erase used memory to prevent secret leaks. Ver. 1.26 • Stdlib • Low impact The new runtime/secret package lets you run a function in secret mode . After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable. This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it. The package is experimental and is mainly for developers of cryptographic libraries, not for application developers. Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, session keys (used to encrypt and decrypt data during a specific communication session) need to be erased from memory after they're used. If there's no reliable way to clear this memory, the keys could stay there indefinitely, which would break forward secrecy. In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it. The solution is to provide a runtime mechanism that automatically erases all temporary storage used during sensitive operations. This will make it easier for library developers to write secure code without using workarounds.
Add the runtime/secret package and its functions: The current implementation has several limitations:

- Only supported on linux/amd64 and linux/arm64. On unsupported platforms, secret.Do invokes the function directly.
- Protection does not cover any global variables that the function writes to.
- Trying to start a goroutine within secret.Do causes a panic.
- If the function calls , erasure is delayed until all deferred functions are executed.
- Heap allocations are only erased if ➊ the program drops all references to them, and ➋ then the garbage collector notices that those references are gone. The program controls the first part, but the second part depends on when the runtime decides to act.
- If the function panics, the panicked value might reference memory allocated inside secret.Do. That memory won't be erased until (at least) the panicked value is no longer reachable.
- Pointer addresses might leak into data buffers that the runtime uses for garbage collection. Do not put confidential information into pointers.

The last point might not be immediately obvious, so here's an example. If an offset in an array is itself secret (you have a array and the secret key always starts at ), don't create a pointer to that location (don't create a pointer to ). Otherwise, the garbage collector might store this pointer, since it needs to know about all active pointers to do its job. If someone launches an attack to access the GC's memory, your secret offset could be exposed. The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use runtime/secret behind the scenes. As of Go 1.26, the package is experimental and can be enabled by setting at build time. Use secret.Do to generate a session key and encrypt a message using AES-GCM: Note that secret.Do protects not just the raw key, but also the cipher structure (which contains the expanded key schedule) created inside the function. This is a simplified example, of course — it only shows how memory erasure works, not a full cryptographic exchange. In real situations, the key needs to be shared securely with the receiver (for example, through key exchange) so decryption can work. 𝗣 21865 • 𝗖𝗟 704615 • 👥 Daniel Morsing , Dave Anderson , Filippo Valsorda , Jason A. Donenfeld , Keith Randall , Russ Cox

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency internals

This is a chapter from my book on Go concurrency , which teaches the topic from the ground up through interactive examples. Here's where we started this book: Functions that run with are called goroutines. The Go runtime juggles these goroutines and distributes them among operating system threads running on CPU cores. Compared to OS threads, goroutines are lightweight, so you can create hundreds or thousands of them. That's generally correct, but it's a little too brief. In this chapter, we'll take a closer look at how goroutines work. We'll still use a simplified model, but it should help you understand how everything fits together. Concurrency • Goroutine scheduler • GOMAXPROCS • Concurrency primitives • Scheduler metrics • Profiling • Tracing • Keep it up At the hardware level, CPU cores are responsible for running parallel tasks. If a processor has 4 cores, it can run 4 instructions at the same time — one on each core. At the operating system level, a thread is the basic unit of execution. There are usually many more threads than CPU cores, so the operating system's scheduler decides which threads to run and which ones to pause. The scheduler keeps switching between threads to make sure each one gets a turn to run on a CPU, instead of waiting in line forever. This is how the operating system handles concurrency. At the Go runtime level, a goroutine is the basic unit of execution. The runtime scheduler runs a fixed number of OS threads, often one per CPU core. There can be many more goroutines than threads, so the scheduler decides which goroutines to run on the available threads and which ones to pause. The scheduler keeps switching between goroutines to make sure each one gets a turn to run on a thread, instead of waiting in line forever. This is how Go handles concurrency. The Go runtime scheduler doesn't decide which threads run on the CPU — that's the operating system scheduler's job. 
The Go runtime makes sure all goroutines run on the threads it manages, but the OS controls how and when those threads actually get CPU time. The scheduler's job is to run M goroutines on N operating system threads, where M can be much larger than N. Here's a simple way to do it: Take goroutines G11-G14 and run them: Goroutine G12 got blocked while reading from the channel. Put it back in the queue and replace it with G15: But there are a few things to keep in mind. Let's say goroutines G11–G14 are running smoothly without getting blocked by mutexes or channels. Does that mean goroutines G15–G20 won't run at all and will just have to wait ( starve ) until one of G11–G14 finally finishes? That would be unfortunate. That's why the scheduler checks each running goroutine roughly every 10 ms to decide if it's time to pause it and put it back in the queue. This approach is called preemptive scheduling: the scheduler can interrupt running goroutines when needed so others have a chance to run too. System calls The scheduler can manage a goroutine while it's running Go code. But what happens if a goroutine makes a system call, like reading from disk? In that case, the scheduler can't take the goroutine off the thread, and there's no way to know how long the system call will take. For example, if goroutines G11–G14 in our example spend a long time in system calls, all worker threads will be blocked, and the program will basically "freeze". To solve this problem, the scheduler starts new threads if the existing ones get blocked in a system call. For example, here's what happens if G11 and G12 make system calls: Here, the scheduler started two new threads, E and F, and assigned goroutines G15 and G16 from the queue to these threads. When G11 and G12 finish their system calls, the scheduler will stop or terminate the extra threads (E and F) and keep running the goroutines on four threads: A-B-C-D. This is a simplified model of how the goroutine scheduler works in Go. 
If you want to learn more, I recommend watching the talk by Dmitry Vyukov, one of the scheduler's developers: Go scheduler: Implementing language with lightweight concurrency ( video , slides ) We said that the scheduler uses N threads to run goroutines. In the Go runtime, the value of N is set by a parameter called GOMAXPROCS. The GOMAXPROCS runtime setting controls the maximum number of operating system threads the Go scheduler can use to execute goroutines concurrently. It defaults to the value of runtime.NumCPU, which is the number of logical CPUs on the machine. Strictly speaking, runtime.NumCPU is either the total number of logical CPUs or the number allowed by the CPU affinity mask, whichever is lower. This can be adjusted by the CPU quota, as explained below. For example, on my 8-core laptop, the default value of GOMAXPROCS is also 8: You can change GOMAXPROCS by setting the GOMAXPROCS environment variable or calling runtime.GOMAXPROCS: You can also undo the manual changes and go back to the default value set by the runtime. To do this, use the runtime.SetDefaultGOMAXPROCS function (Go 1.25+): Go programs often run in containers, like those managed by Docker or Kubernetes. These systems let you limit the CPU resources for a container using a Linux feature called cgroups . A cgroup (control group) in Linux lets you group processes together and control how much CPU, memory, and network I/O they can use by setting limits and priorities. For example, here's how you can limit a Docker container to use only four CPUs: Before version 1.25, the Go runtime didn't consider the CPU quota when setting the GOMAXPROCS value. No matter how you limited CPU resources, GOMAXPROCS was always set to the number of logical CPUs on the host machine: Starting with version 1.25, the Go runtime respects the CPU quota: So, the default GOMAXPROCS value is set to either the number of logical CPUs or the CPU limit enforced by cgroup settings for the process, whichever is lower. Note on CPU limits Cgroups actually offer not just one, but two ways to limit CPU resources: Docker's --cpus and --cpu-period / --cpu-quota set the quota, while --cpu-shares sets the shares.
Kubernetes' CPU limit sets the quota, while CPU request sets the shares. Go's runtime only takes the CPU quota into account, not the shares. Fractional CPU limits are rounded up: On a machine with multiple CPUs, the minimum default value for GOMAXPROCS is 2, even if the CPU limit is set lower: The Go runtime automatically updates GOMAXPROCS if the CPU limit changes. It happens up to once per second (less frequently if the application is idle). Let's take a quick look at the three main concurrency tools for Go: goroutines, channels, and select. A goroutine is implemented as a pointer to a runtime structure called g. Here's what it looks like: The structure has many fields, but most of its memory is taken up by the stack, which holds the goroutine's local variables. By default, each stack gets 2 KB of memory, and it grows if needed. Because goroutines use very little memory, they're much more efficient than operating system threads, which usually need about 1 MB each. Their small size lets you run tens (or even hundreds) of thousands of goroutines on a single machine. A channel is implemented as a pointer to a runtime structure called hchan. Here's what it looks like: The buffer array (buf) has a fixed size (dataqsiz, which you can get with the cap builtin). It's created when you make a buffered channel. The number of items in the channel (qcount, which you can get with the len builtin) increases when you send to the channel and decreases when you receive from it. The close builtin sets the closed field to 1. Sending an item to an unbuffered channel, or to a buffered channel that's already full, puts the goroutine into the sendq queue. Receiving from an empty channel puts the goroutine into the recvq queue. The select logic is implemented in the selectgo function. It's a huge function that takes a list of select cases and (very simply put) works as follows: ✎ Exercise: Runtime simulator Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it .
If you are okay with just theory for now, let's continue. Metrics show how the Go runtime is performing, like how much heap memory it uses or how long garbage collection pauses take. Each metric has a unique name (for example, ) and a value, which can be a number or a histogram. We use the package to work with metrics. List all available metrics with descriptions: Get the value of a specific metric: Here are some goroutine-related metrics: In real projects, runtime metrics are usually exported automatically with client libraries for Prometheus, OpenTelemetry, or other observability tools. Here's an example for Prometheus: The exported metrics are then collected by Prometheus, visualized, and used to set up alerts. Profiling helps you understand exactly what the program is doing, what resources it uses, and where in the code this happens. Profiling is often not recommended in production because it's a "heavy" process that can slow things down. But that's not the case with Go. Go's profiler is designed for production use. It uses sampling, so it doesn't track every single operation. Instead, it takes quick snapshots of the runtime every 10 ms and puts them together to give you a full picture. Go supports the following profiles: The easiest way to add a profiler to your app is by using the package. When you import it, it automatically registers HTTP handlers for collecting profiles: Or you can register profiler handlers manually: After that, you can start profiling with a specific profile by running the command with the matching URL, or just open that URL in your browser: For the CPU profile, you can choose how long the profiler runs (the default is 30 seconds). Other profiles are taken instantly. After running the profiler, you'll get a binary file that you can open in the browser using the same utility. For example: The pprof web interface lets you view the same profile in different ways. 
My personal favorites are the flame graph , which clearly shows the call hierarchy and resource usage, and the source view, which shows the exact lines of code. You can also profile manually. To collect a CPU profile, use and : To collect other profiles, use : Profiling is a broad topic, and we've only touched the surface. To learn more, start with these articles: Tracing records certain types of events while the program is running, mainly those related to concurrency and memory: If you enabled the profiling server as described earlier, you can collect a trace using this URL: Trace files can be quite large, so it's better to use a small N value. After tracing is complete, you'll get a binary file that you can open in the browser using the utility: In the trace web interface, you'll see each goroutine's "lifecycle" on its own line. You can zoom in and out of the trace with the W and S keys, and you can click on any event to see more details: You can also collect a trace manually: Flight recording is a tracing technique that collects execution data, such as function calls and memory allocations, within a sliding window that's limited by size or duration. It helps to record traces of interesting program behavior, even if you don't know in advance when it will happen. The type (Go 1.25+) implements a flight recorder in Go. It tracks a moving window over the execution trace produced by the runtime, always containing the most recent trace data. Here's an example of how you might use it. First, configure the sliding window: Then create the recorder and start it: Continue with the application code as usual: Finally, save the trace snapshot to a file when an important event occurs: Use to view the trace in the browser: ✎ Exercise: Comparing blocks Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it . 
If you are okay with just theory for now, let's continue. Now you can see how challenging the Go scheduler's job is. Fortunately, most of the time you don't need to worry about how it works behind the scenes — sticking to goroutines, channels, select, and other synchronization primitives is usually enough. This is the final chapter of my "Gist of Go: Concurrency" book. I invite you to read it — the book is an easy-to-understand, interactive guide to concurrency programming in Go. Pre-order for $10   or read online Put all goroutines in a queue. Take N goroutines from the queue and run them. If a running goroutine gets blocked (for example, waiting to read from a channel or waiting on a mutex), put it back in the queue and run the next goroutine from the queue. CPU quota — the maximum CPU time the cgroup may use within some period window. CPU shares — relative CPU priorities given to the kernel scheduler. Go through the cases and check if the matching channels are ready to send or receive. If several cases are ready, choose one at random (to prevent starvation, where some cases are always chosen and others are never chosen). Once a case is selected, perform the send or receive operation on the matching channel. If there is a default case and no other cases are ready, pick the default. If no cases are ready, block the goroutine and add it to the channel queue for each case. Count of goroutines created since program start (Go 1.26+). Count of live goroutines (created but not finished yet). An increase in this metric may indicate a goroutine leak. Approximate count of goroutines running or blocked in a system call or cgo call (Go 1.26+). An increase in this metric may indicate problems with such calls. Approximate count of goroutines ready to execute, but not executing (Go 1.26+). An increase in this metric may mean the system is overloaded and the CPU can't keep up with the growing number of goroutines. Approximate count of goroutines executing (Go 1.26+). 
Always less than or equal to . Approximate count of goroutines waiting on a resource — I/O or sync primitives (Go 1.26+). An increase in this metric may indicate issues with mutex locks, other synchronization blocks, or I/O issues. The current count of live threads that are owned by the runtime (Go 1.26+). The current setting — the maximum number of operating system threads the scheduler can use to execute goroutines concurrently. CPU . Shows how much CPU time each function uses. Use it to find performance bottlenecks if your program is running slowly because of CPU-heavy tasks. Heap . Shows the heap memory currently used by each function. Use it to detect memory leaks or excessive memory usage. Allocs . Shows which functions have used heap memory since the profiler started (not just currently). Use it to optimize garbage collection or reduce allocations that impact performance. Goroutine . Shows the stack traces of all current goroutines. Use it to get an overview of what the program is doing. Block . Shows where goroutines block waiting on synchronization primitives like channels, mutexes and wait groups. Use it to identify synchronization bottlenecks and issues in data exchange between goroutines. Disabled by default. Mutex . Shows lock contentions on mutexes and internal runtime locks. Use it to find "problematic" mutexes that goroutines are frequently waiting for. Disabled by default. Profiling Go Programs Diagnostics goroutine creation and state changes; system calls; garbage collection; heap size changes;

Anton Zhiyanov 4 months ago

Go proposal: Type-safe error checking

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Introducing errors.AsType — a modern, type-safe alternative to errors.As. Ver. 1.26 • Stdlib • High impact The new errors.AsType function is a generic version of errors.As: It's type-safe, faster, and easier to use: errors.As is not deprecated (yet), but AsType is recommended for new code. The errors.As function requires you to declare a variable of the target error type and pass a pointer to it: It makes the code quite verbose, especially when checking for multiple types of errors: With a generic AsType, you can specify the error type right in the function call. This makes the code shorter and keeps error variables scoped to their blocks: Another issue with errors.As is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement the error interface). While static analysis tools usually catch these issues, using the generic AsType has several benefits:

- No reflection¹.
- No runtime panics.
- Fewer allocations.
- Compile-time type safety.

Finally, AsType can handle everything that As does, so it's a drop-in improvement for new code. Add the AsType function to the errors package: Recommend using AsType instead of As: Open a file and check if the error is related to the file path: 𝗣 51945 • 𝗖𝗟 707235

¹ Unlike As, AsType doesn't use the reflect package, but it still relies on type assertions and interface checks. These operations access runtime type metadata, so AsType isn't completely "reflection-free" in the strict sense.  ↩︎

Anton Zhiyanov 4 months ago

Go proposal: Goroutine metrics

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Export goroutine-related metrics from the Go runtime. Ver. 1.26 • Stdlib • Medium impact New metrics in the runtime/metrics package give better insight into goroutine scheduling:

- Total number of goroutines since the program started.
- Number of goroutines in each state.
- Number of active threads.

Go's runtime/metrics package already provides a lot of runtime stats, but it doesn't include metrics for goroutine states or thread counts. Per-state goroutine metrics can be linked to common production issues. An increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand. Observability systems can track these counters to spot regressions, find scheduler bottlenecks, and send alerts when goroutine behavior changes from the usual patterns. Developers can use them to catch problems early without needing full traces. Add the following metrics to the runtime/metrics package: The per-state numbers are not guaranteed to add up to the live goroutine count (/sched/goroutines:goroutines, available since Go 1.16). All metrics use uint64 counters. Start some goroutines and print the metrics after 100 ms of activity: No surprises here: we read the new metric values the same way as before — using metrics.Read . 𝗣 15490 • 𝗖𝗟 690397 , 690398 , 690399 P.S. If you are into goroutines, check out my interactive book on concurrency

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency testing

This is a chapter from my book on Go concurrency , which teaches the topic from the ground up through interactive examples. Testing concurrent programs is a lot like testing single-task programs. If the code is well-designed, you can test the state of a concurrent program with standard tools like channels, wait groups, and other abstractions built on top of them. But if you've made it this far, you know that concurrency is never that easy. In this chapter, we'll go over common testing problems and the solutions that Go offers. Waiting for goroutines • Checking channels • Checking for leaks • Durable blocking • Instant waiting • Time inside the bubble • Thoughts on time 1  ✎ • Thoughts on time 2  ✎ • Checking for cleanup • Bubble rules • Keep it up Let's say we want to test this function: Calculations run asynchronously in a separate goroutine. However, the function returns a result channel, so this isn't a problem: At point ⓧ, the test is guaranteed to wait for the inner goroutine to finish. The rest of the test code doesn't need to know anything about how concurrency works inside the function. Overall, the test isn't any more complicated than if the function were synchronous. But we're lucky that it returns a channel. What if it doesn't? Let's say the function looks like this: We write a simple test and run it: The assertion fails because at point ⓧ, we didn't wait for the inner goroutine to finish. In other words, we didn't synchronize the two goroutines. That's why the variable still has its initial value (0) when we do the check. We can add a short delay with time.Sleep: The test is now passing. But using time.Sleep to sync goroutines isn't a great idea, even in tests. We don't want to set a custom delay for every function we're testing. Also, the function's execution time may be different on the local machine compared to a CI server. If we use a longer delay just to be safe, the tests will end up taking too long to run.
Sometimes you can't avoid using in tests, but since Go 1.25, the package has made these cases much less common. Let's see how it works. The package has a lot going on under the hood, but its public API is very simple: The function creates an isolated bubble where you can control time to some extent. Any new goroutines started inside this bubble become part of the bubble. So, if we wrap the test code with , everything will run inside the bubble — the test code, the function we're testing, and its goroutine. At point ⓧ, we want to wait for the goroutine to finish. The function comes to the rescue! It blocks the calling goroutine until all other goroutines in the bubble are finished. (It's actually a bit more complicated than that, but we'll talk about it later.) In our case, there's only one other goroutine (the inner goroutine), so will pause until it finishes, and then the test will move on. Now the test passes instantly. That's better! ✎ Exercise: Wait until done Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it . If you are okay with just theory for now, let's continue. As we've seen, you can use to wait for the tested goroutine to finish, and then check the state of the data you are interested in. You can also use it to check the state of channels. Let's say there's a function that generates N numbers like 11, 22, 33, and so on: And a simple test: Set N=2, get the first number from the generator's output channel, then get the second number. The test passed, so the function works correctly. But does it really? Let's use in "production": Panic! We forgot to close the channel when exiting the inner goroutine, so the for-range loop waiting on that channel got stuck. Let's fix the code: And add a test for the channel state: The test is still failing, even though we're now closing the channel when the goroutine exits. 
This is a familiar problem: at point ⓧ, we didn't wait for the inner goroutine to finish. So when we check the channel, it hasn't closed yet. That's why the test fails. We can delay the check using : But it's better to use : At point ⓧ, blocks the test until the only other goroutine (the inner goroutine) finishes. Once the goroutine has exited, the channel is already closed. So, in the select statement, the case triggers with set to , allowing the test to pass. As you can see, the package helped us avoid delays in the test, and the test itself didn't get much more complicated. As we've seen, you can use to wait for the tested goroutine to finish, and then check the state of the data or channels. You can also use it to detect goroutine leaks. Let's say there's a function that runs the given functions concurrently and sends their results to an output channel: And a simple test: Send three functions to be executed, get the first result from the output channel, and check it. The test passed, so the function works correctly. But does it really? Let's run three times, passing three functions each time: After 50 ms — when all the functions should definitely have finished — there are still 9 running goroutines ( ). In other words, all the goroutines are stuck. The reason is that the channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside get blocked when they try to send the result of to . Let's fix this by adding a buffer of the right size to the channel: Then add a test to check the number of goroutines: The test is still failing, even though the channel is now buffered, and the goroutines shouldn't block on sending to it. This is a familiar problem: at point ⓧ, we didn't wait for the running goroutines to finish. So is greater than zero, which makes the test fail. We can delay the check using (not recommended), or use a third-party package like goleak (a better option): The test passes now. 
By the way, goleak also uses time.Sleep internally, but it does so much more efficiently. It tries up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly. Even better, we can check for leaks without any third-party packages by using synctest: Earlier, I said that synctest.Wait blocks the calling goroutine until all other goroutines finish. Actually, it's a bit more complicated. Wait blocks until all other goroutines either finish or become durably blocked . We'll talk about "durably" later. For now, let's focus on "become blocked." Let's temporarily remove the buffer from the channel and check the test results: Here's what happens: Next, synctest.Test comes into play. It not only starts the bubble goroutine, but also tries to wait for all child goroutines to finish before it returns. If it sees that some goroutines are stuck (in our case, all 9 are blocked trying to send to the channel), it panics: main bubble goroutine has exited but blocked goroutines remain So, we found the leak without using time.Sleep or goleak, thanks to the useful features of synctest.Test and synctest.Wait: Now let's make the channel buffered and run the test again: As we've found, Wait blocks until all goroutines in the bubble — except the one that called it — have either finished or are durably blocked. Let's figure out what "durably blocked" means. For synctest, a goroutine inside a bubble is considered durably blocked if it is blocked by any of the following operations: Other blocking operations are not considered durable, and Wait ignores them. For example: The distinction between "durable" and other types of blocks is just an implementation detail of the testing/synctest package. It's not a fundamental property of the blocking operations themselves. In real-world applications, this distinction doesn't exist, and "durable" blocks are neither better nor worse than any others. Let's look at an example.
Let's say there's a type that performs some asynchronous computation: Our goal is to write a test that checks the result while the calculation is still running . Let's see how the test changes depending on how is implemented (except for the version — we'll cover that one a bit later). Let's say is implemented using a done channel: Naive test: The check fails because when is called, the goroutine in hasn't set yet. Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on reading from the channel. This channel is created inside the bubble, so the block is durable. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using select: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on a select statement. Both channels used in the select ( and ) are created inside the bubble, so the block is durable. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using a wait group: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the wait group's call. The group's method was called inside the bubble, so this is a durable block. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using a condition variable: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the condition variable's call. This is a durable block. The call returns as soon as happens, and we get the current value of . Let's say is implemented using a mutex: Let's try using to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the mutex's call. doesn't consider blocking on a mutex to be durable. The call ignores the block and never returns. The test hangs and only fails when the overall timeout is reached. 
You might be wondering why the authors didn't consider blocking on mutexes to be durable. There are a couple of reasons: ⌘ ⌘ ⌘ Let's go back to the original question: how does the test change depending on how the type is implemented? It doesn't change at all. We used the exact same test code every time: If your program uses durably blocking operations, synctest.Wait always works the same way: Very convenient! ✎ Exercise: Blocking queue Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. Inside the bubble, time works differently. Instead of using a regular wall clock, the bubble uses a fake clock that can jump forward to any point in the future. This can be quite handy when testing time-sensitive code. Let's say we want to test this function: The positive scenario is straightforward: send a value to the channel, call the function, and check the result: The negative scenario, where the function times out, is also pretty straightforward. But the test takes the full three seconds to complete: We're actually lucky the timeout is only three seconds. It could have been as long as sixty! To make the test run instantly, let's wrap it in synctest.Test: Note that there is no synctest.Wait call here, and the only goroutine in the bubble (the root one) gets durably blocked on a select statement inside the function. Here's what happens next: Thanks to the fake clock, the test runs instantly instead of taking three seconds like it would with the "naive" approach. You might have noticed that quite a few circumstances coincided here: We'll look at the alternatives soon, but first, here's a quick exercise. ✎ Exercise: Wait, repeat Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue. The fake clock in synctest can be tricky. It moves forward only if: ➊ all goroutines in the bubble are durably blocked; ➋ there's a future moment when at least one goroutine will unblock; and ➌ synctest.Wait isn't running. Let's look at the alternatives. I'll say right away, this isn't an easy topic. But when has time travel ever been easy? :) Here's the function we're testing: Let's run it in a separate goroutine, so there will be two goroutines in the bubble: synctest.Test panicked because the root bubble goroutine finished while the other goroutine was still blocked on a select. Reason: synctest only advances the clock if all goroutines are blocked — including the root bubble goroutine. How to fix: Use time.Sleep to make sure the root goroutine is also durably blocked. Now all three conditions are met again (all goroutines are durably blocked; the moment of future unblocking is known; there is no call to synctest.Wait). The fake clock moves forward 3 seconds, which unblocks the goroutine. The goroutine finishes, leaving only the root one, which is still blocked on time.Sleep. The clock moves forward another 2 seconds, unblocking the root goroutine. The assertion passes, and the test completes successfully. But if we run the test with the race detector enabled (using the -race flag), it reports a data race on the shared variable: Logically, using time.Sleep in the root goroutine doesn't guarantee that the other goroutine (which writes to the variable) will finish before the root goroutine reads from it. That's why the race detector reports a problem. Technically, the test passes because of how synctest is implemented, but the race still exists in the code. The right way to handle this is to call synctest.Wait after time.Sleep: Calling synctest.Wait ensures that the goroutine finishes before the root goroutine reads the variable, so there's no data race anymore. Here's the function we're testing: Let's replace time.Sleep in the root goroutine with synctest.Wait: synctest.Test panicked because the root bubble goroutine finished while the other goroutine was still blocked on a select.
Reason: synctest only advances the clock if there is no active synctest.Wait running. If all bubble goroutines are durably blocked but a synctest.Wait is running, synctest won't advance the clock. Instead, it will simply finish the synctest.Wait call and return control to the goroutine that called it (in this case, the root bubble goroutine). How to fix: don't use synctest.Wait here. Let's update the function to use context cancellation instead of a timer: We won't cancel the context in the test: synctest.Test panicked because all goroutines in the bubble are hopelessly blocked. Reason: synctest only advances the clock if it knows how much to advance it. In this case, there is no future moment that would unblock the select in the function. How to fix: Manually unblock the goroutine and call synctest.Wait to wait for it to finish. Now, canceling the context unblocks the select in the function, while synctest.Wait makes sure the goroutine finishes before the test checks the results. Let's update the function to lock the mutex before doing any calculations: In the test, we'll lock the mutex before calling the function, so it will block: The test failed because it hit the overall timeout set in the test. Reason: synctest only works with durable blocks. Blocking on a mutex lock isn't considered durable, so the bubble can't do anything about it — even though the sleeping inner goroutine would have unlocked the mutex in 10 ms if the bubble had used the wall clock. How to fix: Don't use synctest here. Now the mutex unlocks after 10 milliseconds (wall clock), the function finishes successfully, and the check passes. The clock inside the bubble won't move forward if: ✎ Exercise: Asynchronous repeater Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. Let's practice understanding time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground. Here's a function that performs synchronous work: And a test for it: What is the test missing at point ⓧ?
✓ Thoughts on time 1 There's only one goroutine in the test, so when the function gets blocked by time.Sleep, the time in the bubble jumps forward by 3 seconds. Then the function sets the result and finishes. Finally, the test checks the result and passes successfully. No need to add anything. Let's keep practicing our understanding of time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground. Here's a function that performs asynchronous work: And a test for it: What is the test missing at point ⓧ? ✓ Thoughts on time 2 Let's go over the options. ✘ synctest.Wait This won't help because synctest.Wait returns as soon as time.Sleep inside the goroutine is called. The check fails, and synctest.Test panics with the error: "main bubble goroutine has exited but blocked goroutines remain". ✘ time.Sleep Because of the time.Sleep call in the root goroutine, the wait inside the goroutine is already over by the time the result is checked. However, there's no guarantee that the goroutine's write has run yet. That's why the test might pass or might fail. ✘ synctest.Wait, then time.Sleep This option is basically the same as just using time.Sleep, because synctest.Wait returns before the time.Sleep in the goroutine even starts. The test might pass or might fail. ✓ time.Sleep, then synctest.Wait This is the correct answer: Since the root goroutine isn't blocked, it checks the result while the other goroutine is blocked by the time.Sleep call. The check fails, and synctest.Test panics with the message: "main bubble goroutine has exited but blocked goroutines remain". Sometimes you need to test objects that use resources and should be able to release them. For example, this could be a server that, when started, creates a pool of network connections, connects to a database, and writes file caches. When stopped, it should clean all this up. Let's see how we can make sure everything is properly stopped in the tests. We're going to test this server: Let's say we wrote a basic functional test: The test passes, but does that really mean the server stopped when we called Stop? Not necessarily.
For example, here's a buggy implementation where our test would still pass: As you can see, the author simply forgot to stop the server here. To detect the problem, we can wrap the test in synctest.Test and see it panic: The server ignores the Stop call and doesn't stop the goroutine running inside. Because of this, the goroutine gets blocked while writing to the channel. When synctest.Test finishes, it detects the blocked goroutine and panics. Let's fix the server code (to keep things simple, we won't support multiple Start or Stop calls): Now the test passes. Here's how it works: Instead of using defer to stop something, it's common to use the t.Cleanup method. It registers a function that will run when the test finishes: Functions registered with t.Cleanup run in last-in, first-out (LIFO) order, after all deferred functions have executed. In the test above, there's not much difference between using defer and t.Cleanup. But the difference becomes important if we move the server setup into a separate helper function, so we don't have to repeat the setup code in different tests: The defer approach doesn't work because it calls Stop when the helper function returns — before the test assertions run: The t.Cleanup approach works because it calls Stop when the test has finished — after all the assertions have already run: Sometimes, a context (context.Context) is used to stop the server instead of a separate Stop method. In that case, our server interface might look like this: Now we don't even need to use defer or t.Cleanup to check whether the server stops when the context is canceled. Just pass t.Context() as the context: t.Context returns a context that is automatically created when the test starts and is automatically canceled when the test finishes. Here's how it works: To check for stopping via a method or function, use defer or t.Cleanup. To check for cancellation or stopping via context, use t.Context. Inside a bubble, t.Context returns a context whose Done channel is associated with the bubble. The context is automatically canceled when synctest.Test ends. Functions registered with t.Cleanup inside the bubble run just before synctest.Test finishes. Let's go over the rules for living in the bubble.
The following operations durably block a goroutine: The limitations are quite logical, and you probably won't run into them. Don't create channels or objects that contain channels (like tickers or timers) outside the bubble. Otherwise, the bubble won't be able to manage them, and the test will hang: Don't access synchronization primitives associated with a bubble from outside the bubble: Don't call T.Run, T.Parallel, or T.Deadline inside a bubble: Don't call synctest.Test inside the bubble: Don't call synctest.Wait from outside the bubble: Don't call synctest.Wait concurrently from multiple goroutines: ✎ Exercise: Testing a pipeline Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. The testing/synctest package is a complicated beast. But now that you've studied it, you can test concurrent programs no matter what synchronization tools they use — channels, selects, wait groups, timers or tickers, or even time.Sleep. In the next chapter, we'll talk about concurrency internals (coming soon). Pre-order for $10 or read online. Three calls to start 9 goroutines. The call to synctest.Wait blocks the root bubble goroutine. One of the goroutines finishes its work, tries to write to the channel, and gets blocked (because no one is reading from it). The same thing happens to the other 8 goroutines. synctest.Wait sees that all the child goroutines in the bubble are blocked, so it unblocks the root goroutine. The root goroutine finishes. synctest.Wait unblocks as soon as all other goroutines are durably blocked. synctest.Test panics when finished if there are still blocked goroutines left in the bubble. Sending to or receiving from a channel created within the bubble. A select statement where every case is a channel created within the bubble. Calling WaitGroup.Wait if all Add calls were made inside the bubble. Sending to or receiving from a channel created outside the bubble. Calling Mutex.Lock or RWMutex.Lock.
I/O operations (like reading a file from disk or waiting for a network response). System calls and cgo calls. Mutexes are usually used to protect shared state, not to coordinate goroutines (the example above is completely unrealistic). In tests, you usually don't need to pause before locking a mutex to check something. Mutex locks are usually held for a very short time, and mutexes themselves need to be as fast as possible. Adding extra logic to support synctest could slow them down in normal (non-test) situations. It waits until all other goroutines in the bubble are blocked. Then, it unblocks the goroutine that called it. The bubble checks if the goroutine can be unblocked by waiting. In our case, it can — we just need to wait 3 seconds. The bubble's clock instantly jumps forward 3 seconds. The select in the function chooses the timeout case, and the function returns. The test assertions for the value and the error both pass successfully. There's no synctest.Wait call. There's only one goroutine. The goroutine is durably blocked. It will be unblocked at a certain point in the future. There are goroutines that aren't durably blocked. It's unclear how much time to advance. synctest.Wait is running. Because of the time.Sleep call in the root goroutine, the wait inside the goroutine is already over by the time the result is checked. Because of the synctest.Wait call, the goroutine is guaranteed to finish (and hence to write the result) before the result is checked. The main test code runs. Before the test finishes, the deferred Stop is called. In the server goroutine, the corresponding case in the select statement triggers, and the goroutine ends. synctest.Test sees that there are no blocked goroutines and finishes without panicking. The main test code runs. Before the test finishes, the context is automatically canceled. The server goroutine stops (as long as the server is implemented correctly and checks for context cancellation). synctest.Test sees that there are no blocked goroutines and finishes without panicking. A bubble is created by calling synctest.Test. Each call creates a separate bubble. Goroutines started inside the bubble become part of it.
The bubble can only manage durable blocks. Other types of blocks are invisible to it. If all goroutines in the bubble are durably blocked with no way to unblock them (such as by advancing the clock or returning from a synctest.Wait call), synctest.Test panics. When synctest.Test finishes, it tries to wait for all child goroutines to complete. However, if even a single goroutine is durably blocked, it panics. Calling t.Context returns a context whose Done channel is associated with the bubble. Functions registered with t.Cleanup run inside the bubble, immediately before synctest.Test returns. Calling synctest.Wait in a bubble blocks the goroutine that called it. synctest.Wait returns when all other goroutines in the bubble are durably blocked. synctest.Test returns when all other goroutines in the bubble have finished. The bubble uses a fake clock (starting at 2000-01-01 00:00:00 UTC). Time in the bubble only moves forward if all goroutines are durably blocked. Time advances by the smallest amount needed to unblock at least one goroutine. If the bubble has to choose between moving time forward or returning from a running synctest.Wait, it returns from synctest.Wait. A blocking send or receive on a channel created within the bubble. A blocking select statement where every case is a channel created within the bubble. Calling WaitGroup.Wait if all Add calls were made inside the bubble.

Anton Zhiyanov 5 months ago

Go proposal: Context-aware Dialer methods

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Add context-aware, network-specific methods to the net.Dialer type. Ver. 1.26 • Stdlib • Low impact The Dialer type connects to the address using a given network (protocol) — TCP, UDP, IP, or Unix sockets. The new context-aware methods (DialTCP, DialUDP, DialIP, and DialUnix) combine the efficiency of the existing network-specific functions (which skip address resolution and dispatch) with the cancellation capabilities of context.Context. The net package already has top-level functions for different networks (DialTCP, DialUDP, DialIP, and DialUnix), but these were made before context was introduced, so they don't support cancellation: On the other hand, the Dialer type has a general-purpose DialContext method. It supports cancellation and can be used to connect to any of the known networks: However, if you already know the network type and address, using DialContext is a bit less efficient than network-specific functions like DialTCP due to: Address resolution overhead: DialContext handles address resolution internally (like DNS lookups and converting to a TCPAddr or UDPAddr) using the network and address strings you provide. Network-specific functions accept a pre-resolved address object, so they skip this step. Network type dispatch: DialContext must route the call to the protocol-specific dialer. Network-specific functions already know which protocol to use, so they skip this step. So, network-specific functions in the net package are more efficient, but they don't support cancellation. The Dialer type supports cancellation, but it's less efficient. This proposal aims to solve the mismatch by adding context-aware, network-specific methods to the Dialer type. Also, adding new methods to the Dialer lets you use the newer address types from the netip package (like netip.AddrPort instead of net.TCPAddr), which are preferred in modern Go code. Add four new methods to the Dialer type: The method signatures are similar to the existing top-level functions, but they also accept a context and use the newer address types from the netip package.
Use the DialTCP method to connect to a TCP server: Use the DialUnix method to connect to a Unix socket: In both cases, the dialing fails because I didn't bother to start the server in the playground :) 𝗣 49097 • 𝗖𝗟 657296

Anton Zhiyanov 5 months ago

Go proposal: Compare IP subnets

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Compare IP address prefixes the same way IANA does. Ver. 1.26 • Stdlib • Low impact An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation: In Go, an IP prefix is represented by the netip.Prefix type. The new Compare method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code. The imposed order matches both Python's implementation and the assumed order from IANA. When the Go team initially designed the IP subnet type (netip.Prefix), they chose not to add a Compare method because there wasn't a widely accepted way to order these values. Because of this, if a developer needs to sort IP subnets — for example, to organize routing tables or run tests — they have to write their own comparison logic. This results in repetitive and error-prone code. The proposal aims to provide a standard way to compare IP prefixes. This should reduce boilerplate code and help programs sort IP subnets consistently. Add the Compare method to the netip.Prefix type: Compare orders two prefixes as follows: first by validity (invalid before valid), then by address family (IPv4 before IPv6), then by masked IP address (the network IP), then by prefix length, and finally by unmasked address (the original IP). This follows the same order as Python's implementation and the standard IANA convention. Sort a list of IP prefixes: 𝗣 61642 • 𝗖𝗟 700355
