Anton Zhiyanov 1 week ago

Porting Go's strings package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. I started with the io package, which provides core abstractions like Reader and Writer, as well as general-purpose functions like Copy. But io isn't very interesting on its own, since it doesn't include specific reader or writer implementations. So my next choices were naturally bytes and strings — the workhorses of almost every Go program. This post is about how the porting process went.

Bits and UTF-8 • Bytes • Allocators • Buffers and builders • Benchmarks • Optimizing search • Optimizing builder • Wrapping up

Before I could start porting bytes, I had to deal with its dependencies first: math/bits and unicode/utf8. Both of these packages are made up of pure functions, so they were pretty easy to port. The only minor challenge was the difference in operator precedence between Go and C — specifically, bit shifts (<<, >>). In Go, bit shifts have higher precedence than addition and subtraction. In C, they have lower precedence. The simplest solution was to just use parentheses everywhere shifts are involved.

With bits and utf8 done, I moved on to bytes. The package provides functions for working with byte slices. Some of them were easy to port. Just like in Go, the bytes-to-string macro doesn't allocate memory; it just reinterprets the byte slice's underlying storage as a string. The comparison function (which works like in Go) is easy to implement using the libc API. Another example is IndexByte, which looks for a specific byte in a slice; I used a regular C loop to mimic Go's range loop. But functions like these don't allocate memory. What should I do with the ones that clearly do? I had a decision to make. The Go runtime handles memory allocation and deallocation automatically.
In C, I had a few options: An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. See Allocators from C to Zig if you want to learn more about them. For me, the winner was clear. Modern systems programming languages like Zig and Odin clearly showed the value of allocators: An is an interface with three methods: , , and . In C, it translates to a struct with function pointers: As I mentioned in the post about porting the io package , this interface representation isn't as efficient as using a static method table, but it's simpler. If you're interested in other options, check out the post on interfaces . By convention, if a function allocates memory, it takes an allocator as its first parameter. So Go's : Translates to this C code: If the caller doesn't care about using a specific allocator, they can just pass an empty allocator, and the implementation will use the system allocator — , , and from libc. Here's a simplified version of the system allocator (I removed safety checks to make it easier to read): The system allocator is stateless, so it's safe to have a global instance: Here's an example of how to call with an allocator: Way better than hidden allocations! Besides pure functions, and also provide types like , , and . I ported them using the same approach as with functions. For types that allocate memory, like , the allocator becomes a struct field: The code is pretty wordy — most C developers would dislike using instead of something shorter like . My solution to this problem is to automatically translate Go code to C (which is actually what I do when porting Go's stdlib). If you're interested, check out the post about this approach — Solod: Go can be a better C . Types that don't allocate, like , need no special treatment — they translate directly to C structs without an allocator field. The package is the twin of , so porting it was uneventful. 
Here's a usage example in Go and C side by side. Again, the C code is just a more verbose version of Go's implementation, plus explicit memory allocation.

What's the point of writing C code if it's slow, right? I decided it was time to benchmark the ported C types and functions against their Go versions. To do that, I ported the benchmarking part of Go's testing package. Surprisingly, the simplified version was only 300 lines long and included everything I needed. A sample benchmark reads almost like Go's benchmarks. To monitor memory usage, I created a tracking allocator — a memory allocator that wraps another allocator and keeps track of allocations. The benchmark gets an allocator and wraps it in the tracking allocator. There's no auto-discovery, but the manual setup is quite straightforward.

With the benchmarking setup ready, I ran benchmarks on the bytes package. Some functions did well — about 1.5-2x faster than their Go equivalents. But Index (searching for a substring in a string) was a total disaster — it was nearly 20 times slower than in Go. The problem was caused by the IndexByte function we looked at earlier. This "pure" Go implementation is just a fallback: on most platforms, Go uses a specialized version of IndexByte written in assembly. For the C version, the easiest solution was to use memchr, which is also optimized for most platforms. With this fix, the benchmark results changed drastically. Still not quite as fast as Go, but it's close. Honestly, I don't know why the memchr-based implementation is still slower than Go's assembly here, but I decided not to pursue it any further. After running the rest of the function benchmarks, the ported versions won all of them except for two.

strings.Builder is a common way to compose strings from parts in Go, so I tested its performance too. The results were worse than I expected: the C version performed about the same as Go, while I expected it to be faster.
Unlike , is written entirely in Go, so there's no reason the ported version should lose in this benchmark. The method looked almost identical in Go and C: Go's automatically grows the backing slice, while does it manually ( , on the contrary, doesn't grow the slice — it's merely a wrapper). So, there shouldn't be any difference. I had to investigate. Looking at the compiled binary, I noticed a difference in how the functions returned results. Go returns multiple values in separate registers, so uses three registers: one for 8-byte , two for the interface (implemented as two 8-byte pointers). But in C, was a single struct made up of two unions and a pointer: Of course, this 56-byte monster can't be returned in registers — the C calling convention passes it through memory instead. Since is on the hot path in the benchmark, I figured this had to be the issue. So I switched from a single monolithic type to signature-specific types for multi-return pairs: Now, the implementation in C looked like this: is only 16 bytes — small enough to be returned in two registers. Problem solved! But it wasn't — the benchmark only showed a slight improvement. After looking into it more, I finally found the real issue: unlike Go, the C compiler wasn't inlining calls. Adding and moving to the header file made all the difference: 2-4x faster. That's what I was hoping for! Porting and was a mix of easy parts and interesting challenges. The pure functions were straightforward — just translate the syntax and pay attention to operator precedence. The real design challenge was memory management. Using allocators turned out to be a good solution, making memory allocation clear and explicit without being too difficult to use. The benchmarks showed that the C versions outperformed Go in most cases, sometimes by 2-4x. The only exceptions were and , where Go relies on hand-written assembly. 
The optimization was an interesting challenge: what seemed like a return-type issue was actually an inlining problem, and fixing it gave a nice speed boost. There's a lot more of Go's stdlib to port. In the next post, we'll cover — a very unique Go package. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod . The and packages are included, of course. implements bit counting and manipulation functions. implements functions for UTF-8 encoded text. Loop over the slice indexes with ( is a macro that returns , similar to Go's built-in). Access the i-th byte with (a bounds-checking macro that returns ). Use a reliable garbage collector like Boehm GC to closely match Go's behavior. Allocate memory with libc's and have the caller free it later with . Introduce allocators. It's obvious whether a function allocates memory or not: if it has an allocator as a parameter, it allocates. It's easy to use different allocation methods: you can use for one function, an arena for another, and a stack allocator for a third. It helps with testing and debugging: you can use a tracking allocator to find memory leaks, or a failing allocator to test error handling. Figuring out how many iterations to run. Running the benchmark function in a loop. Recording metrics (ns/op, MB/s, B/op, allocs/op). Reporting the results.

Anton Zhiyanov 3 weeks ago

Porting Go's io package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. So I started with the standard library packages that had the fewest dependencies, and one of them was the io package. This post is about how that went.

io package • Slices • Multiple returns • Errors • Interfaces • Type assertion • Specialized readers • Copy • Wrapping up

io is one of the core Go packages. It introduces the concepts of readers and writers, which are also common in other programming languages. In Go, a reader is anything that can read some raw data (bytes) from a source into a slice. A writer is anything that can take some raw data from a slice and write it to a destination. The package defines many other interfaces, as well as combinations of them. It also provides several functions, the most well-known being Copy, which copies all data from a source (represented by a reader) to a destination (represented by a writer). C, of course, doesn't have interfaces. But before I get into that, I had to make several other design decisions.

In general, a slice is a linear container that holds N elements of type T. Typically, a slice is a view of some underlying data. In Go, a slice consists of a pointer to a block of allocated memory, a length (the number of elements in the slice), and a capacity (the total number of elements that can fit in the backing memory before the runtime needs to re-allocate). Interfaces in the io package work with fixed-length slices (readers and writers should never append to a slice), and they only use byte slices. So there's a simpler way to represent this in C. But since I needed a general-purpose slice type, I decided to do it the Go way instead, plus a bound-checking helper to access slice elements. So far, so good.
Let's look at the Read method again. It returns two values: an int and an error. C functions can only return one value, so I needed to figure out how to handle this. The classic approach would be to pass output parameters by pointer. But that doesn't compose well and looks nothing like Go. Instead, I went with a result struct: a union that can store any primitive type, as well as strings, slices, and pointers, combined with an error. So our Read method (let's assume it's just a regular function for now) translates to a C function that returns a result, and the caller accesses the value and the error through the result's fields.

For the error type itself, I went with a simple pointer to an immutable string, plus a constructor macro. I wanted to avoid heap allocations as much as possible, so I decided not to support dynamic errors. Only sentinel errors are used, and they're defined at the file level. Errors are compared by pointer identity, not by string content — just like sentinel errors in Go. An error is just a pointer. This keeps error handling cheap and straightforward.

This was the big one. In Go, an interface is a type that specifies a set of methods. Any concrete type that implements those methods satisfies the interface — no explicit declaration needed. In C, there's no such mechanism. For interfaces, I decided to use "fat" structs with function pointers. That way, Go's Reader becomes a struct in C: a pointer holds the concrete value, and each method becomes a function pointer that takes that pointer as its first argument. This is less efficient than using a static method table, especially if the interface has a lot of methods, but it's simpler. So I decided it was good enough for the first version. Now functions can work with interfaces without knowing the specific implementation: calling a method on the interface just goes through the function pointer. Go's interface is more than just a value wrapper with a method table.
It also stores type information about the value it holds. Since the runtime knows the exact type inside the interface, it can try to "upgrade" the interface (for example, a regular reader) to another interface using a type assertion. The last thing I wanted to do was reinvent Go's dynamic type system in C, so dropping this feature was an easy decision. There's another kind of type assertion, though — when we unwrap the interface to get the value of a specific type. And this kind of assertion is quite possible in C. All we have to do is compare function pointers. If two different types happened to share the same method implementation, this would break. In practice, each concrete type has its own methods, so the function pointer serves as a reliable type tag.

After I decided on the interface approach, porting the actual types was pretty easy. For example, LimitedReader wraps a reader and stops with EOF after reading N bytes. The logic is straightforward: if there are no bytes left, return EOF. Otherwise, if the buffer is bigger than the remaining size, shorten it. Then, call the underlying reader, and decrease the remaining size. The ported C code is a bit more verbose, but nothing special. The multiple return values, the interface call, and the slice handling are all implemented as described in previous sections.

Copy is where everything comes together. In Go, Copy allocates its buffer on the heap with make. I could take a similar approach in C — make Copy take an allocator and use it to create the buffer. But since this is just a temporary buffer that only exists during the function call, I decided stack allocation was a better choice: the buffer is allocated on the stack with a bounds-checking macro that wraps C's alloca. It moves the stack pointer and gives you a chunk of memory that's automatically freed when the function returns.
People often avoid using alloca because it can cause a stack overflow, but using a bounds-checking wrapper fixes this issue. Another common concern with alloca is that it's not block-scoped — the memory stays allocated until the function exits. However, since we only allocate once, this isn't a problem.

In the simplified C version of Copy, you can see all the parts from this post working together: a function accepting interfaces, slices passed to interface methods, a result type wrapping multiple return values, error sentinels compared by identity, and a stack-allocated buffer used for the copy.

Porting Go's io package to C meant solving a few problems: representing slices, handling multiple return values, modeling errors, and implementing interfaces using function pointers. None of this needed anything fancy — just structs, unions, functions, and some macros. The resulting C code is more verbose than Go, but it's structurally similar, easy enough to read, and this approach should work well for other Go packages too.

The io package isn't very useful on its own — it mainly defines interfaces and doesn't provide concrete implementations. So, the next two packages to port were naturally bytes and strings — I'll talk about those in the next post. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod. The io package is included, of course.

Anton Zhiyanov 3 weeks ago

Solod: Go can be a better C

I'm working on a new programming language named Solod ( So ). It's a strict subset of Go that translates to C, without hidden memory allocations and with source-level interop. Highlights: So supports structs, methods, interfaces, slices, multiple returns, and defer. To keep things simple, there are no channels, goroutines, closures, or generics. So is for systems programming in C, but with Go's syntax, type safety, and tooling. Hello world • Language tour • Compatibility • Design decisions • FAQ • Final thoughts This Go code in a file : Translates to a header file : Plus an implementation file : In terms of features, So is an intersection between Go and C, making it one of the simplest C-like languages out there — on par with Hare. And since So is a strict subset of Go, you already know it if you know Go. It's pretty handy if you don't want to learn another syntax. Let's briefly go over the language features and see how they translate to C. Variables • Strings • Arrays • Slices • Maps • If/else and for • Functions • Multiple returns • Structs • Methods • Interfaces • Enums • Errors • Defer • C interop • Packages So supports basic Go types and variable declarations: is translated to ( ), to ( ), and to ( ). is not treated as an interface. Instead, it's translated to . This makes handling pointers much easier and removes the need for . is translated to (for pointer types). Strings are represented as type in C: All standard string operations are supported, including indexing, slicing, and iterating with a for-range loop. Converting a string to a byte slice and back is a zero-copy operation: Converting a string to a rune slice and back allocates on the stack with : There's a stdlib package for heap-allocated strings and various string operations. Arrays are represented as plain C arrays ( ): on arrays is emitted as compile-time constant. Slicing an array produces a . 
Slices are represented as type in C: All standard slice operations are supported, including indexing, slicing, and iterating with a for-range loop. As in Go, a slice is a value type. Unlike in Go, a nil slice and an empty slice are the same thing: allocates a fixed amount of memory on the stack ( ). only works up to the initial capacity and panics if it's exceeded. There's no automatic reallocation; use the stdlib package for heap allocation and dynamic arrays. Maps are fixed-size and stack-allocated, backed by parallel key/value arrays with linear search. They are pointer-based reference types, represented as in C. No delete, no resize. Only use maps when you have a small, fixed number of key-value pairs. For anything else, use heap-allocated maps from the package (planned). Most of the standard map operations are supported, including getting/setting values and iterating with a for-range loop: As in Go, a map is a pointer type. A map emits as in C. If-else and for come in all shapes and sizes, just like in Go. Standard if-else with chaining: Init statement (scoped to the if block): Traditional for loop: While-style loop: Range over an integer: Regular functions translate to C naturally: Named function types become typedefs: Exported functions (capitalized) become public C symbols prefixed with the package name ( ). Unexported functions are . Variadic functions use the standard syntax and translate to passing a slice: Function literals (anonymous functions and closures) are not supported. So supports two-value multiple returns in two patterns: and . Both cases translate to C type: Named return values are not supported. Structs translate to C naturally: works with types and values: Methods are defined on struct types with pointer or value receivers: Pointer receivers pass in C and cast to the struct pointer. 
Value receivers pass the struct by value, so modifications operate on a copy: Calling methods on values and pointers emits pointers or values as necessary: Methods on named primitive types are also supported. Interfaces in So are like Go interfaces, but they don't include runtime type information. Interface declarations list the required methods: In C, an interface is a struct with a pointer and function pointers for each method (less efficient than using a static method table, but simpler; this might change in the future): Just as in Go, a concrete type implements an interface by providing the necessary methods: Passing a concrete type to functions that accept interfaces: Type assertion works for concrete types ( ), but not for interfaces ( ). Type switch is not supported. Empty interfaces ( and ) are translated to . So supports typed constant groups as enums: Each constant is emitted as a C : is supported for integer-typed constants: Iota values are evaluated at compile time and translated to integer literals: Errors use the type (a pointer): So only supports sentinel errors, which are defined at the package level using (implemented as compiler built-in): Errors are compared using . This is an O(1) operation (compares pointers, not strings): Dynamic errors ( ), local error variables ( inside functions), and error wrapping are not supported. schedules a function or method call to run at the end of the enclosing scope. The scope can be either a function (as in Go): Or a bare block (unlike Go): Deferred calls are emitted inline (before returns, panics, and scope end) in LIFO order: Defer is not supported inside other scopes like or . Include a C header file with : Declare an external C type (excluded from emission) with : Declare an external C function (no body or ): When calling extern functions, and arguments are automatically decayed to their C equivalents: string literals become raw C strings ( ), string values become , and slices become raw pointers. 
This makes interop cleaner: The decay behavior can be turned off with the flag: The package includes helpers for converting C pointers back to So string and slice types. The package is also available and is implemented as compiler built-ins. Each Go package is translated into a single + pair, regardless of how many files it contains. Multiple files in the same package are merged into one file, separated by comments. Exported symbols (capitalized names) are prefixed with the package name: Unexported symbols (lowercase names) keep their original names and are marked : Exported symbols are declared in the file (with for variables). Unexported symbols only appear in the file. Importing a So package translates to a C : Calling imported symbols uses the package prefix: That's it for the language tour! So generates C11 code that relies on several GCC/Clang extensions: You can use GCC, Clang, or to compile the transpiled C code. MSVC is not supported. Supported operating systems: Linux, macOS, and Windows (partial support). So is highly opinionated. Simplicity is key . Fewer features are always better. Every new feature is strongly discouraged by default and should be added only if there are very convincing real-world use cases to support it. This applies to the standard library too — So tries to export as little of Go's stdlib API as possible while still remaining highly useful for real-world use cases. No heap allocations are allowed in language built-ins (like maps, slices, new, or append). Heap allocations are allowed in the standard library, but they must clearly state when an allocation happens and who owns the allocated data. Fast and easy C interop . Even though So uses Go syntax, it's basically C with its own standard library. Calling C from So, and So from C, should always be simple to write and run efficiently. The So standard library (translated to C) should be easy to add to any C project. Readability . 
There are several languages that claim they can transpile to readable C code. Unfortunately, the C code they generate is usually unreadable or barely readable at best. So isn't perfect in this area either (though it's arguably better than others), but it aims to produce C code that's as readable as possible. Go compatibility . So code is valid Go code. No exceptions. Raw performance . You can definitely write C code by hand that runs faster than code produced by So. Also, some features in So, like interfaces, are currently implemented in a way that's not very efficient, mainly to keep things simple. Hiding C entirely . So is a cleaner way to write C, not a replacement for it. You should know C to use So effectively. Go feature parity . Less is more. Iterators aren't coming, and neither are generic methods. I have heard these several times, so it's worth answering. Why not Rust/Zig/Odin/other language? Because I like C and Go. Why not TinyGo? TinyGo is lightweight, but it still has a garbage collector, a runtime, and aims to support all Go features. What I'm after is something even simpler, with no runtime at all, source-level C interop, and eventually, Go's standard library ported to plain C so it can be used in regular C projects. How does So handle memory? Everything is stack-allocated by default. There's no garbage collector or reference counting. The standard library provides explicit heap allocation in the package when you need it. Is it safe? So itself has few safeguards other than the default Go type checking. It will panic on out-of-bounds array access, but it won't stop you from returning a dangling pointer or forgetting to free allocated memory. Most memory-related problems can be caught with AddressSanitizer in modern compilers, so I recommend enabling it during development by adding to your . Can I use So code from C (and vice versa)? Yes. So compiles to plain C, therefore calling So from C is just calling C from C. 
Calling C from So is equally straightforward. Can I compile existing Go packages with So? Not really. Go uses automatic memory management, while So uses manual memory management. So also supports far fewer features than Go. Neither Go's standard library nor third-party packages will work with So without changes. How stable is this? Not for production at the moment. Where's the standard library? There is a growing set of high-level packages ( , , , ...). There are also low-level packages that wrap the libc API ( , , , ...). Check the links below for more details. Even though So isn't ready for production yet, I encourage you to try it out on a hobby project or just keep an eye on it if you like the concept. Further reading: Go in, C out. You write regular Go code and get readable C11 as output. Zero runtime. No garbage collection, no reference counting, no hidden allocations. Everything is stack-allocated by default. Heap is opt-in through the standard library. Native C interop. Call C from So and So from C — no CGO, no overhead. Go tooling works out of the box — syntax highlighting, LSP, linting and "go test". Binary literals ( ) in generated code. Statement expressions ( ) in macros. for package-level initialization. for local type inference in generated code. for type inference in generic macros. for and other dynamic stack allocations. Installation and usage So by example Language description Stdlib description Source code

Anton Zhiyanov 2 months ago

Allocators from C to Zig

An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. Many C programs use the standard libc allocator, or at best, let you switch it out for another one like jemalloc or mimalloc. Unlike C, modern systems languages usually treat allocators as first-class citizens. Let's look at how they handle allocation and then create a C allocator following their approach.

Rust • Zig • Odin • C3 • Hare • C • Final thoughts

Rust is one of the older languages we'll be looking at, and it handles memory allocation in a more traditional way. Right now, it uses a global allocator, but there's an experimental Allocator API implemented behind a feature flag (issue #32838). We'll set the experimental API aside and focus on the stable one. The documentation begins with a clear statement: in a given program, the standard library has one "global" memory allocator that is used for example by Box and Vec. Followed by a vague one: currently the default global allocator is unspecified. It doesn't mean that a Rust program will abort an allocation, of course. In practice, Rust uses the system allocator as the global default (but the Rust developers don't want to commit to this, hence the "unspecified" note).

The global allocator interface is defined by the GlobalAlloc trait in the std::alloc module. It requires the implementor to provide two essential methods — alloc and dealloc — and provides two more based on them — alloc_zeroed and realloc. The Layout struct describes a piece of memory we want to allocate — its size in bytes and alignment.

Memory alignment

Alignment restricts where a piece of data can start in memory. The memory address for the data has to be a multiple of a certain number, which is always a power of 2. Alignment depends on the type of data: CPUs are designed to read "aligned" memory efficiently.
For example, if you read a 4-byte integer starting at address 0x03 (which is unaligned), the CPU has to do two memory reads — one for the first byte and another for the other three bytes — and then combine them. But if the integer starts at address 0x04 (which is aligned), the CPU can read all four bytes at once. Aligned memory is also needed for vectorized CPU operations (SIMD), where one processor instruction handles a group of values at once instead of just one. The compiler knows the size and alignment for each type, so we can use the constructor or helper functions to create a valid layout: Don't be surprised that a takes up 32 bytes. In Rust, the type can grow, so it stores a data pointer, a length, and a capacity (3 × 8 = 24 bytes). There's also 1 byte for the boolean and 7 bytes of padding (because of 8-byte alignment), making a total of 32 bytes. is the default memory allocator provided by the operating system. The exact implementation depends on the platform . It implements the trait and is used as the global allocator by default, but the documentation does not guarantee this (remember the "unspecified" note?). If you want to explicitly set as the global allocator, you can use the attribute: You can also set a custom allocator as global, like in this example: To use the global allocator directly, call the and functions: In practice, people rarely use or directly. Instead, they work with types like , or that handle allocation for them: The allocator doesn't abort if it can't allocate memory; instead, it returns (which is exactly what recommends): The documentation recommends using the function to signal out-of-memory errors. It immediately aborts the process, or panics if the binary isn't linked to the standard library. Unlike the low-level function, types like or call if allocation fails, so the program usually aborts if it runs out of memory: Allocator API • Memory allocation APIs Memory management in Zig is explicit. 
There is no default global allocator, and any function that needs to allocate memory accepts an allocator as a separate parameter. This makes the code a bit more verbose, but it matches Zig's goal of giving programmers as much control and transparency as possible. An allocator in Zig is a struct with an opaque self-pointer and a method table with four methods: Unlike Rust's allocator methods, which take a raw pointer and a size as arguments, Zig's allocator methods take a slice of bytes ( ) — a type that combines both a pointer and a length. Another interesting difference is the optional parameter, which is the first return address in the allocation call stack. Some allocators, like the , use it to keep track of which function requested memory. This helps with debugging issues related to memory allocation. Just like in Rust, allocator methods don't return errors. Instead, and return if they fail. Zig also provides type-safe wrappers that you can use instead of calling the allocator methods directly: Unlike the allocator methods, these allocation functions return an error if they fail. If a function or method allocates memory, it expects the developer to provide an allocator instance: Zig's standard library includes several built-in allocators in the namespace. asks the operating system for entire pages of memory, each allocation is a syscall: allocates memory into a fixed buffer and doesn't make any heap allocations: wraps a child allocator and allows you to allocate many times and only free once: The call frees all memory. Individual calls are no-ops. (aka ) is a safe allocator that can prevent double-free, use-after-free and can detect leaks: is a general-purpose thread-safe allocator designed for maximum performance on multithreaded machines: is a wrapper around the libc allocator: Zig doesn't panic or abort when it can't allocate memory. 
An allocation failure is just a regular error that you're expected to handle: Allocators • std.mem.Allocator • std.heap Odin supports explicit allocators, but, unlike Zig, it's not the only option. In Odin, every scope has an implicit variable that provides a default allocator: If you don't pass an allocator to a function, it uses the one currently set in the context. An allocator in Odin is a struct with an opaque self-pointer and a single function pointer: Unlike other languages, Odin's allocator uses a single procedure for all allocation tasks. The specific action — like allocating, resizing, or freeing memory — is decided by the parameter. The allocation procedure returns the allocated memory (for and operations) and an error ( on success). Odin provides low-level wrapper functions in the package that call the allocator procedure using a specific mode: There are also type-safe builtins like / (for a single object) and / (for multiple objects) that you can use instead of the low-level interface: By default, all builtins use the context allocator, but you can pass a custom allocator as an optional parameter: To use a different allocator for a specific block of code, you can reassign it in the context: Odin's provides two different allocators: When using the temp allocator, you only need a single call to clear all the allocated memory. Odin's standard library includes several allocators, found in the and packages. The procedure returns a general-purpose allocator: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access, similar to in Zig: There are also others, such as or . Like Zig, Odin doesn't panic or abort when it can't allocate memory. Instead, it returns an error code as the second return value: Allocators • base:runtime • core:mem Like Zig and Odin, C3 supports explicit allocators. Like Odin, C3 provides two default allocators: heap and temp. 
An allocator in C3 is a interface with an additional option of zeroing or not zeroing the allocated memory: Unlike Zig and Odin, the and methods don't take the (old) size as a parameter — neither directly like Odin nor through a slice like Zig. This makes it a bit harder to create custom allocators because the allocator has to keep track of the size along with the allocated memory. On the other hand, this approach makes C interop easier (if you use the default C3 allocator): data allocated in C can be freed in C3 without needing to pass the size parameter from the C code. Like in Odin, allocator methods return an error if they fail. C3 provides low-level wrapper macros in the module that call allocator methods: These either return an error (the -suffix macros) or abort if they fail. There are also functions and macros with similar names in the module that use the global allocator instance: If a function or method allocates memory, it often expects the developer to provide an allocator instance: C3 provides two thread-local allocator instances: There are functions and macros in the module that use the temporary allocator: The macro releases all temporary allocations when leaving the scope: Some types, like or , use the temp allocator by default if they are not initialized: C3's standard library includes several built-in allocators, found in the module. is a wrapper around libc's malloc/free: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access: There are also others, such as or . Like Zig and Odin, C3 can return an error in case of allocation failure: C3 can also abort in case of allocation failure: Since the functions and macros in the module use instead of , it looks like aborting on failure is the preferred approach. Memory Handling • core::mem::allocator • core::mem Unlike other languages, Hare doesn't support explicit allocators.
The standard library has multiple allocator implementations, but only one of them is used at runtime. Hare's compiler expects the runtime to provide and implementations: The programmer isn't supposed to access them directly (although it's possible by importing and calling or ). Instead, Hare uses them to provide higher-level allocation helpers. Hare offers two high-level allocation helpers that use the global allocator internally: and . can allocate individual objects. It takes a value, not a type: can also allocate slices if you provide a second parameter (the number of items): works correctly with both pointers to single objects (like ) and slices (like ). Hare's standard library has three built-in memory allocators: The allocator that's actually used is selected at compile time. Like other languages, Hare returns an error in case of allocation failure: You can abort on error with : Or propagate the error with : Dynamic memory allocation • malloc.ha Many C programs use the standard libc allocator, or at most, let you swap it out for another one using macros: Or using a simple setter: While this might work for switching the libc allocator to jemalloc or mimalloc, it's not very flexible. For example, trying to implement an arena allocator with this kind of API is almost impossible. Now that we've seen the modern allocator design in Zig, Odin, and C3 — let's try building something similar in C. There are a lot of small choices to make, and I'm going with what I personally prefer. I'm not saying this is the only way to design an allocator — it's just one way out of many. Our allocator should return an error instead of if it fails, so we'll need an error enum: The allocation function needs to return either a tagged union (value | error) or a tuple (value, error). Since C doesn't have these built in, let's use a custom tuple type: The next step is the allocator interface. 
I think Odin's approach of using a single function makes the implementation more complicated than it needs to be, so let's create separate methods like Zig does: This approach to interface design is explained in detail in a separate post: Interfaces in C . Zig uses byte slices ( ) instead of raw memory pointers. We could make our own byte slice type, but I don't see any real advantage to doing that in C — it would just mean more type casting. So let's keep it simple and stick with like our ancestors did. Now let's create generic and wrappers: I'm taking for granted here to keep things simple. A more robust implementation should properly check if it is available or pass the type to directly. We can even create a separate pair of helpers for collections: We could use some macro tricks to make and work for both a single object and a collection. But let's not do that — I prefer to avoid heavy-magic macros in this post. As for the custom allocators, let's start with a libc wrapper. It's not particularly interesting, since it ignores most of the parameters, but still: Usage example: Now let's use that field to implement an arena allocator backed by a fixed-size buffer: Usage example: As shown in the examples above, the allocation method returns an error if something goes wrong. While checking for errors might not be as convenient as it is in Zig or Odin, it's still pretty straightforward: Here's an informal table comparing allocation APIs in the languages we've discussed: In Zig, you always have to specify the allocator. In Odin, passing an allocator is optional. In C3, some functions require you to pass an allocator, while others just use the global one. In Hare, there's a single global allocator. As we've seen, there's nothing magical about the allocators used in modern languages. While they're definitely more ergonomic and safe than C, there's nothing stopping us from using the same techniques in plain C. on Unix platforms; on Windows; : alignment = 1. 
Can start at any address (0, 1, 2, 3...). : alignment = 4. Must start at addresses divisible by 4 (0, 4, 8, 12...). : alignment = 8. Must start at addresses divisible by 8 (0, 8, 16...). is for general-purpose allocations. It uses the operating system's heap allocator. is for short-lived allocations. It uses a scratch allocator (a kind of growing arena). is for general-purpose allocations. It uses an operating system's heap allocator (typically a libc wrapper). is for short-lived allocations. It uses an arena allocator. The default allocator is based on the algorithm from the Verified sequential malloc/free paper. The libc allocator uses the operating system's malloc and free functions from libc. The debug allocator uses a simple mmap-based method for memory allocation.

Anton Zhiyanov 2 months ago

(Un)portable defer in C

Modern system programming languages, from Hare to Zig, seem to agree that is a must-have feature. It's hard to argue with that, because makes it much easier to free memory and other resources correctly, which is crucial in languages without garbage collection. The situation in C is different. There was a N2895 proposal by Jens Gustedt and Robert Seacord in 2021, but it was not accepted for C23. Now, there's another N3734 proposal by JeanHeyd Meneide, which will probably be accepted in the next standard version. Since isn't part of the standard, people have created lots of different implementations. Let's take a quick look at them and see if we can find the best one. C23/GCC  • C11/GCC  • GCC/Clang  • MSVC  • Long jump  • STC  • Stack  • Simplified GCC/Clang  • Final thoughts Jens Gustedt offers this brief version: Usage example: This approach combines C23 attribute syntax ( ) with GCC-specific features: nested functions ( ) and the attribute. It also uses the non-standard macro (supported by GCC, Clang, and MSVC), which expands to an automatically increasing integer value. Nested functions and cleanup in GCC A nested function (also known as a local function) is a function defined inside another function: Nested functions can access variables from the enclosing scope, similar to closures in other languages, but they are not first-class citizens and cannot be passed around like function pointers. The attribute runs a function when the variable goes out of scope: The function should take one parameter, which is a pointer to a type that's compatible with the variable. If the function returns a value, it will be ignored. On the plus side, this version works just like you'd expect to work. On the downside, it's only available in C23+ and only works with GCC (not even Clang supports it, because of the nested function). We can easily adapt the above version to use C11: Usage example: The main downside remains: it's GCC-only. 
Clang fully supports the attribute, but it doesn't support nested functions. Instead, it offers the blocks extension, which works somewhat similar: We can use Clang blocks to make a version that works with both GCC and Clang: Usage example: Now it works with Clang, but there are several things to be aware of: On the plus side, this implementation works with both GCC and Clang. The downside is that it's still not standard C, and won't work with other compilers like MSVC. MSVC, of course, doesn't support the cleanup attribute. But it provides "structured exception handling" with the and keywords: The code in the block will always run, no matter how the block exits — whether it finishes normally, returns early, or crashes (for example, from a null pointer dereference). This isn't the we're looking for, but it's a decent alternative if you're only programming for Windows. There are well-known implementations by Jens Gustedt and moon-chilled that use and . I'm mentioning them for completeness, but honestly, I would never use them in production. The first one is extremely large, and the second one is extremely hacky. Also, I'd rather not use long jumps unless it's absolutely necessary. Still, here's a usage example from Gustedt's library: Here, all deferred statements run at the end of the guarded block, no matter how we exit the block (normally or through ). The stc library probably has the simplest implementation ever: Usage example: Here, the deferred statement is passed as and is used as the loop increment. The "defer-aware" block of code is the loop body. Since the increment runs after the body, the deferred statement executes after the main code. This approach works with all mainstream compilers, but it falls apart if you try to exit early with or : Dmitriy Kubyshkin provides a implementation that adds a "stack frame" of deferred calls to any function that needs them. Here's a simplified version: Usage example: This version works with all mainstream compilers. 
Also, unlike the STC version, defers run correctly in case of early exit: Unfortunately, there are some drawbacks: The Stack version above doesn't support deferring code blocks. In my opinion, that's not a problem, since most defers are just "free this resource" actions, which only need a single function call with one argument. If we accept this limitation, we can simplify the GCC/Clang version by dropping GCC's nested functions and Clang's blocks: Works like a charm: Personally, I like the simpler GCC/Clang version better. Not having MSVC support isn't a big deal, since we can run GCC on Windows or use the Zig compiler, which works just fine. But if I really need to support GCC, Clang, and MSVC — I'd probably go with the Stack version. Anyway, I don't think we need to wait for to be added to the C standard. We already have at home! We must compile with . We must put a after the closing brace in the deferred block: . If we need to modify a variable inside the block, the variable must be declared with : Defer only supports single-function calls, not code blocks. We always have to call at the start of the function and exit using . In the original implementation, Dmitriy overrides the keyword, but this won't compile with strict compile flags (which I think we should always use). The deferred function runs before the return value is evaluated, not after.

Anton Zhiyanov 2 months ago

Interfaces and traits in C

Everyone likes interfaces in Go and traits in Rust. Polymorphism without class-based hierarchies or inheritance seems to be the sweet spot. What if we try to implement this in C? Interfaces in Go  • Traits in Rust  • Toy example  • Interface definition  • Interface data  • Method table  • Method table in implementor  • Type assertions  • Final thoughts An interface in Go is a convenient way to define a contract for some useful behavior. Take, for example, the honored : Anything that can read data into a byte slice provided by the caller is a . Quite handy, because the code doesn't need to care where the data comes from — whether it's memory, the file system, or the network. All that matters is that it can read the data into a slice: We can provide any kind of reader: Go's interfaces are structural, which is similar to duck typing. A type doesn't need to explicitly state that it implements ; it just needs to have a method: The Go compiler and runtime take care of the rest: A trait in Rust is also a way to define a contract for certain behavior. Here's the trait: Unlike in Go, a type must explicitly state that it implements a trait: The Rust compiler takes care of the rest: Either way, whether it's Go or Rust, the caller only cares about the contract (defined as an interface or trait), not the specific implementation. Let's make an even simpler version of — one without any error handling (Go): Usage example: Let's see how we can do this in C! The main building blocks in C are structs and functions, so let's use them. Our will be a struct with a single field called . This field will be a pointer to a function with the right signature: To make fully dynamic, let's turn it into a struct with a function pointer (I know, I know — just bear with me): Here's the "method" implementation: The is pretty obvious: And, finally, the function: See how easy it is to turn a into a : all we need is . Pretty cool, right? Not really. 
Actually, this implementation is seriously flawed in almost every way (except for the definition). Memory overhead . Each instance has its own function pointers (8 bytes per function on a 64-bit system) as "methods", which isn't practical even if there are only a few of them. Regular objects should store data, not functions. Layout dependency . Converting from to like only works if both structures have the same field as their first member. If we try to implement another interface: Everything will fall apart: and have different layouts, so type conversion in ⓧ is invalid and causes undefined behavior. Lack of type safety . Using a as the receiver in means the caller can pass any type, and the compiler won't even show a warning: C isn't a particularly type-safe language, but this is just too much. Let's try something else. A better way is to store a reference to the actual object in the interface: We could have the method in the interface take a instead of a , but that would make the implementation more complicated without any real benefits. So, I'll keep it as . Then will only have its own fields: We can make the method type-safe: To make this work, we add a method that returns the instance wrapped in a interface: The and functions remain quite simple: This approach is much better than the previous one: Since our type now knows about the interface (through the method), our implementation is more like a basic version of a Rust trait than a true Go interface. For simplicity, I'll keep using the term "interface". There is one downside, though: each instance has its own function pointer for every interface method. Since only has one method, this isn't an issue. But if an interface has a dozen methods and the program uses a lot of these interface instances, it can become a problem. Let's fix this. Let's extract interface methods into a separate structure — the method table.
The interface references its methods through the field: and don't change at all: The method initializes the static method table and assigns it to the interface instance: The only difference in is that it calls the method on the interface indirectly using the method table ( instead of ): stays the same: Now the instance always has a single pointer field for its methods. So even for large interfaces, it only uses 16 bytes ( + fields). This approach also keeps all the benefits from the previous version: We can even add a separate helper so the client doesn't have to worry about implementation details: There's another approach I've seen out there. I don't like it, but it's still worth mentioning for completeness. Instead of embedding the method table in the interface, we can place it in the implementation ( ): We initialize the method table in the constructor: now takes a pointer: And converts to with a simple type cast: This keeps pretty lightweight, only adding one extra field. But the cast only works because is the first field in . If we try to implement a second interface, things will break — just like in the very first solution. I think the "method table in the interface" approach is much better. Go has an function that copies data from a source (a reader) to a destination (a writer): There's an interesting comment in its documentation: If implements , the copy is implemented by calling . Otherwise, if implements , the copy is implemented by calling . Here's what the function looks like: is a type assertion that checks if the reader is not just a , but also implements the interface. The Go runtime handles these kinds of dynamic type checks. Can we do something like this in C? I'd prefer not to make it fully dynamic, since trying to recreate parts of the Go runtime in C probably isn't a good idea. What we can do is add an optional method to the interface: Then we can easily check if a given is also a : Still, this feels a bit like a hack.
I'd rather avoid using type assertions unless it's really necessary. Interfaces (traits, really) in C are possible, but they're not as simple or elegant as in Go or Rust. The method table approach we discussed is a good starting point. It's memory-efficient, as type-safe as possible given C's limitations, and supports polymorphic behavior. Here's the full source code if you are interested: The struct is lean and doesn't have any interface-related fields. The method takes a instead of a . The cast from to is handled inside the method. We can implement multiple interfaces if needed. Lightweight structure. Easy conversion from to . Supports multiple interfaces.

Anton Zhiyanov 3 months ago

Go 1.26 interactive tour

Go 1.26 is coming out in February, so it's a good time to explore what's new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is. Read on and see! new(expr)  • Type-safe error checking  • Green Tea GC  • Faster cgo and syscalls  • Faster memory allocation  • Vectorized operations  • Secret mode  • Reader-less cryptography  • Goroutine leak profile  • Goroutine metrics  • Reflective iterators  • Peek into a buffer  • Process handle  • Signal as cause  • Compare IP subnets  • Context-aware dialing  • Fake example.com  • Optimized fmt.Errorf  • Optimized io.ReadAll  • Multiple log handlers  • Test artifacts  • Modernized go fix  • Final thoughts This article is based on the official release notes from The Go Authors and the Go source code, licensed under the BSD-3-Clause license. This is not an exhaustive list; see the official release notes for that. I provide links to the documentation (𝗗), proposals (𝗣), commits (𝗖𝗟), and authors (𝗔) for the features described. Check them out for motivation, usage, and implementation details. I also have dedicated guides (𝗚) for some of the features. Error handling is often skipped to keep things simple. Don't do this in production ツ Previously, you could only use the built-in with types: Now you can also use it with expressions: If the argument is an expression of type T, then allocates a variable of type T, initializes it to the value of , and returns its address, a value of type . This feature is especially helpful if you use pointer fields in a struct to represent optional values that you marshal to JSON or Protobuf: You can use with composite values: And function calls: Passing is still not allowed: 𝗗 spec • 𝗣 45624 • 𝗖𝗟 704935 , 704737 , 704955 , 705157 • 𝗔 Alan Donovan The new function is a generic version of : It's type-safe and easier to use: is especially handy when checking for multiple types of errors. 
It makes the code shorter and keeps error variables scoped to their blocks: Another issue with is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement ): doesn't cause a runtime panic; it gives a clear compile-time error instead: doesn't use , executes faster, and allocates less than : Since can handle everything that does, it's a recommended drop-in replacement for new code. 𝗗 errors.AsType • 𝗣 51945 • 𝗖𝗟 707235 • 𝗔 Julien Cretel The new garbage collector (first introduced as experimental in 1.25) is designed to make memory management more efficient on modern computers with many CPU cores. Go's traditional garbage collector algorithm operates on a graph, treating objects as nodes and pointers as edges, without considering their physical location in memory. The scanner jumps between distant memory locations, causing frequent cache misses. As a result, the CPU spends too much time waiting for data to arrive from memory. More than 35% of the time spent scanning memory is wasted just stalling while waiting for memory accesses. As computers get more CPU cores, this problem gets even worse. Green Tea shifts the focus from being processor-centered to being memory-aware. Instead of scanning individual objects, it scans memory in contiguous 8 KiB blocks called spans . The algorithm focuses on small objects (up to 512 bytes) because they are the most common and hardest to scan efficiently. Each span is divided into equal slots based on its assigned size class , and it only contains objects of that size class. For example, if a span is assigned to the 32-byte size class, the whole block is split into 32-byte slots, and objects are placed directly into these slots, each starting at the beginning of its slot. Because of this fixed layout, the garbage collector can easily find an object's metadata using simple address arithmetic, without checking the size of each object it finds.
When the algorithm finds an object that needs to be scanned, it marks the object's location in its span but doesn't scan it immediately. Instead, it waits until there are several objects in the same span that need scanning. Then, when the garbage collector processes that span, it scans multiple objects at once. This is much faster than going over the same area of memory multiple times. To make better use of CPU cores, GC workers share the workload by stealing tasks from each other. Each worker has its own local queue of spans to scan, and if a worker is idle, it can grab tasks from the queues of other busy workers. This decentralized approach removes the need for a central global list, prevents delays, and reduces contention between CPU cores. Green Tea uses vectorized CPU instructions (only on amd64 architectures) to process memory spans in bulk when there are enough objects. Benchmark results vary, but the Go team expects a 10–40% reduction in garbage collection overhead in real-world programs that rely heavily on the garbage collector. Plus, with vectorized implementation, an extra 10% reduction in GC overhead when running on CPUs like Intel Ice Lake or AMD Zen 4 and newer. Unfortunately, I couldn't find any public benchmark results from the Go team for the latest version of Green Tea, and I wasn't able to create a good synthetic benchmark myself. So, no details this time :( The new garbage collector is enabled by default. To use the old garbage collector, set at build time (this option is expected to be removed in Go 1.27). 𝗣 73581 • 𝗔 Michael Knyszek In the Go runtime, a processor (often referred to as a P) is a resource required to run the code. For a thread (a machine or M) to execute a goroutine (G), it must first acquire a processor. Processors move through different states. They can be (executing code), (waiting for work), or (paused because of the garbage collection). 
Previously, processors had a state called used when a goroutine is making a system or cgo call. Now, this state has been removed. Instead of using a separate processor state, the system now checks the status of the goroutine assigned to the processor to see if it's involved in a system call. This reduces internal runtime overhead and simplifies code paths for cgo and syscalls. The Go release notes say -30% in cgo runtime overhead, and the commit mentions an 18% sec/op improvement: I decided to run the CgoCall benchmarks locally as well: Either way, both a 20% and a 30% improvement are pretty impressive. And here are the results from a local syscall benchmark: That's pretty good too. 𝗖𝗟 646198 • 𝗔 Michael Knyszek The Go runtime now has specialized versions of its memory allocation function for small objects (from 1 to 512 bytes). It uses jump tables to quickly choose the right function for each size, instead of relying on a single general-purpose implementation. The Go release notes say "the compiler will now generate calls to size-specialized memory allocation routines". But based on the code, that's not completely accurate: the compiler still emits calls to the general-purpose function. Then, at runtime, dispatches those calls to the new specialized allocation functions. This change reduces the cost of small object memory allocations by up to 30%. The Go team expects the overall improvement to be ~1% in real allocation-heavy programs. I couldn't find any existing benchmarks, so I came up with my own. And indeed, running it on Go 1.25 compared to 1.26 shows a significant improvement: The new implementation is enabled by default. You can disable it by setting at build time (this option is expected to be removed in Go 1.27). 𝗖𝗟 665835 • 𝗔 Michael Matloob The new package provides access to architecture-specific vectorized operations (SIMD — single instruction, multiple data). This is a low-level package that exposes hardware-specific functionality. 
It currently only supports amd64 platforms. Because different CPU architectures have very different SIMD operations, it's hard to create a single portable API that works for all of them. So the Go team decided to start with a low-level, architecture-specific API first, giving "power users" immediate access to SIMD features on the most common server platform — amd64. The package defines vector types as structs, like (a 128-bit SIMD vector with sixteen 8-bit integers) and (a 512-bit SIMD vector with eight 64-bit floats). These match the hardware's vector registers. The package supports vectors that are 128, 256, or 512 bits wide. Most operations are defined as methods on vector types. They usually map directly to hardware instructions with zero overhead. To give you a taste, here's a custom function that uses SIMD instructions to add 32-bit float vectors: Let's try it on two vectors: Common operations in the package include: The package uses only AVX instructions, not SSE. Here's a simple benchmark for adding two vectors (both the "plain" and SIMD versions use pre-allocated slices): The package is experimental and can be enabled by setting at build time. 𝗗 simd/archsimd • 𝗣 73787 • 𝗖𝗟 701915 , 712880 , 729900 , 732020 • 𝗔 Junyang Shao , Sean Liao , Tom Thorogood Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, ephemeral keys (temporary keys used to negotiate the session) need to be erased from memory immediately after the handshake. If there's no reliable way to clear this memory, these keys could stay there indefinitely. An attacker who finds them later could re-derive the session key and decrypt past traffic, breaking forward secrecy. In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. 
Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it. The Go team's solution to this problem is the new package. It lets you run a function in secret mode . After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable. This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it. Here's an example that shows how might be used in a more or less realistic setting. Let's say you want to generate a session key while keeping the ephemeral private key and shared secret safe: Here, the ephemeral private key and the raw shared secret are effectively "toxic waste" — they are necessary to create the final session key, but dangerous to keep around. If these values stay in the heap and an attacker later gets access to the application's memory (for example, via a core dump or a vulnerability like Heartbleed), they could use these intermediates to re-derive the session key and decrypt past conversations. By wrapping the calculation in , we make sure that as soon as the session key is created, the "ingredients" used to make it are permanently destroyed. This means that even if the server is compromised in the future, this specific past session can't be exposed, which ensures forward secrecy. The current implementation only supports Linux (amd64 and arm64). On unsupported platforms, invokes the function directly. Also, trying to start a goroutine within the function causes a panic (this will be fixed in Go 1.27). 
The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use behind the scenes. The package is experimental and can be enabled by setting at build time. 𝗗 runtime/secret • 𝗣 21865 • 𝗖𝗟 704615 • 𝗔 Daniel Morsing Current cryptographic APIs, like or , often accept an as the source of random data: These APIs don't commit to a specific way of using random bytes from the reader. Any change to underlying cryptographic algorithms can change the sequence or amount of bytes read. Because of this, if the application code (mistakenly) relies on a specific implementation in Go version X, it might fail or behave differently in version X+1. The Go team chose a pretty bold solution to this problem. Now, most crypto APIs will just ignore the random parameter and always use the system random source ( ). The change applies to the following subpackages: still uses the random reader if provided. But if is nil, it uses an internal secure source of random bytes instead of (which could be overridden). To support deterministic testing, there's a new package with a single function. It sets a global, deterministic cryptographic randomness source for the duration of the given test: affects and all implicit sources of cryptographic randomness in the packages: To temporarily restore the old reader-respecting behavior, set (this option will be removed in a future release). 𝗗 testing/cryptotest • 𝗣 70942 • 𝗖𝗟 724480 • 𝗔 Filippo Valsorda , qiulaidongfeng A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. Here's a simple example: If we call and don't read from the output channel, the inner goroutine will stay blocked trying to send to the channel for the rest of the program: Unlike deadlocks, leaks do not cause panics, so they are much harder to spot. 
Also, unlike data races, Go's tooling did not address them for a long time. Things started to change in Go 1.24 with the introduction of the package. Not many people talk about it, but is a great tool for catching leaks during testing. Go 1.26 adds a new experimental profile designed to report leaked goroutines in production. Here's how we can use it in the example above: As you can see, we have a nice goroutine stack trace that shows exactly where the leak happens. The profile finds leaks by using the garbage collector's marking phase to check which blocked goroutines are still connected to active code. It starts with runnable goroutines, marks all sync objects they can reach, and keeps adding any blocked goroutines waiting on those objects. When it can't add any more, any blocked goroutines left are waiting on resources that can't be reached — so they're considered leaked. Here's the gist of it: For even more details, see the paper by Saioc et al. If you want to see how (and ) can catch typical leaks that often happen in production — check out my article on goroutine leaks . The profile is experimental and can be enabled by setting at build time. Enabling the experiment also makes the profile available as a net/http/pprof endpoint, . According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile. 𝗗 runtime/pprof • 𝗚 Detecting leaks • 𝗣 74609 , 75280 • 𝗖𝗟 688335 • 𝗔 Vlad Saioc New metrics in the package give better insight into goroutine scheduling: Here's the full list: Per-state goroutine metrics can be linked to common production issues. For example, an increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand. 
You can read the new metric values using the regular function: The per-state numbers (not-in-go + runnable + running + waiting) are not guaranteed to add up to the live goroutine count ( , available since Go 1.16). All new metrics use counters. 𝗗 runtime/metrics • 𝗣 15490 • 𝗖𝗟 690397 , 690398 , 690399 • 𝗔 Michael Knyszek The new and methods in the package return iterators for a type's fields and methods: The new methods and return iterators for the input and output parameters of a function type: The new methods and return iterators for a value's fields and methods. Each iteration yields both the type information ( or ) and the value: Previously, you could get all this information by using a for-range loop with methods (which is what iterators do internally): Using an iterator is more concise. I hope it justifies the increased API surface. 𝗗 reflect • 𝗣 66631 • 𝗖𝗟 707356 • 𝗔 Quentin Quaadgras The new method in the package returns the next N bytes from the buffer without advancing it: If returns fewer than N bytes, it also returns : The slice returned by points to the buffer's content and stays valid until the buffer is changed. So, if you change the slice right away, it will affect future reads: The slice returned by is only valid until the next call to a read or write method. 𝗗 Buffer.Peek • 𝗣 73794 • 𝗖𝗟 674415 • 𝗔 Ilia Choly After you start a process in Go, you can access its ID: Internally, the type uses a process handle instead of the PID (which is just an integer), if the operating system supports it. Specifically, in Linux it uses pidfd , which is a file descriptor that refers to a process. Using the handle instead of the PID makes sure that methods always work with the same OS process, and not a different process that just happens to have the same ID. Previously, you couldn't access the process handle. 
Now you can, thanks to the new method: calls a specified function and passes a process handle as an argument: The handle is guaranteed to refer to the process until the callback function returns, even if the process has already terminated. That's why it's implemented as a callback instead of a field or method. is only supported on Linux 5.4+ and Windows. On other operating systems, it doesn't execute the callback and returns an error. 𝗗 Process.WithHandle • 𝗣 70352 • 𝗖𝗟 699615 • 𝗔 Kir Kolyshkin

returns a context that gets canceled when any of the specified signals is received. Previously, the canceled context only showed the standard "context canceled" cause: Now the context's cause shows exactly which signal was received: The returned type, , is based on , so it doesn't provide the actual value — just its string representation. 𝗗 signal.NotifyContext • 𝗖𝗟 721700 • 𝗔 Filippo Valsorda

An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation: In Go, an IP prefix is represented by the type. The new method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code: orders two prefixes as follows: This follows the same order as Python's and the standard IANA (Internet Assigned Numbers Authority) convention. 𝗗 Prefix.Compare • 𝗣 61642 • 𝗖𝗟 700355 • 𝗔 database64128

The package has top-level functions for connecting to an address using different networks (protocols) — , , , and . They were made before was introduced, so they don't support cancellation: There's also a type with a general-purpose method. It supports cancellation and can be used to connect to any of the known networks: However, it's a bit less efficient than network-specific functions like — because of the extra overhead from address resolution and network type dispatching. So, network-specific functions in the package are more efficient, but they don't support cancellation.
The type supports cancellation, but it's less efficient. The Go team decided to resolve this contradiction. The new context-aware methods ( , , , and ) combine the efficiency of the existing network-specific functions with the cancellation capabilities of : I wouldn't say that having three different ways to dial is very convenient, but that's the price of backward compatibility. 𝗗 net.Dialer • 𝗣 49097 • 𝗖𝗟 490975 • 𝗔 Michael Fraenkel The default certificate already lists in its DNSNames (a list of hostnames or domain names that the certificate is authorized to secure). Because of this, doesn't trust responses from the real : To fix this issue, the HTTP client returned by now redirects requests for and its subdomains to the test server: 𝗗 Server.Client • 𝗖𝗟 666855 • 𝗔 Sean Liao People often point out that using for plain strings causes more memory allocations than . Because of this, some suggest switching code from to when formatting isn't needed. The Go team disagrees. Here's a quote from Russ Cox: Using is completely fine, especially in a program where all the errors are constructed with . Having to mentally switch between two functions based on the argument is unnecessary noise. With the new Go release, this debate should finally be settled. For unformatted strings, now allocates less and generally matches the allocations for . Specifically, goes from 2 allocations to 0 allocations for a non-escaping error, and from 2 allocations to 1 allocation for an escaping error: This matches the allocations for in both cases. The difference in CPU cost is also much smaller now. Previously, it was ~64ns vs. ~21ns for vs. for escaping errors, now it's ~25ns vs. ~21ns. Here are the "before and after" benchmarks for the change. The non-escaping case is called , and the escaping case is called . If there's just a plain error string, it's . If the error includes formatting, it's . 
Seconds per operation: Bytes per operation: Allocations per operation: If you're interested in the details, I highly recommend reading the CL — it's perfectly written. 𝗗 fmt.Errorf • 𝗖𝗟 708836 • 𝗔 thepudds Previously, allocated a lot of intermediate memory as it grew its result slice to the size of the input data. Now, it uses intermediate slices of exponentially growing size, and then copies them into a final perfectly-sized slice at the end. The new implementation is about twice as fast and uses roughly half the memory for a 65KiB input; it's even more efficient with larger inputs. Here are the geomean results comparing the old and new versions for different input sizes: See the full benchmark results in the commit. Unfortunately, the author didn't provide the benchmark source code. Ensuring the final slice is minimally sized is also quite helpful. The slice might persist for a long time, and the unused capacity in a backing array (as in the old version) would just waste memory. As with the optimization, I recommend reading the CL — it's very good. Both changes come from thepudds , whose change descriptions are every reviewer's dream come true. 𝗗 io.ReadAll • 𝗖𝗟 722500 • 𝗔 thepudds The package, introduced in version 1.21, offers a reliable, production-ready logging solution. Since its release, many projects have switched from third-party logging packages to use it. However, it was missing one key feature: the ability to send log records to multiple handlers, such as stdout or a log file. The new type solves this problem. It implements the standard interface and calls all the handlers you set up. For example, we can create a log handler that writes to stdout: And another handler that writes to a file: Finally, combine them using a : I'm also printing the file contents here to show the results. When the receives a log record, it sends it to each enabled handler one by one. 
If any handler returns an error, doesn't stop; instead, it combines all the errors using : The method reports whether any of the configured handlers is enabled: Other methods — and — call the corresponding methods on each of the enabled handlers. 𝗗 slog.MultiHandler • 𝗣 65954 • 𝗖𝗟 692237 • 𝗔 Jes Cok Test artifacts are files created by tests or benchmarks, such as execution logs, memory dumps, or analysis reports. They are important for debugging failures in remote environments (like CI), where developers can't step through the code manually. Previously, the Go test framework and tools didn't support test artifacts. Now they do. The new methods , , and return a directory where you can write test output files: If you use with , this directory will be inside the output directory (specified by , or the current directory by default): As you can see, the first time is called, it writes the directory location to the test log, which is quite handy. If you don't use , artifacts are stored in a temporary directory which is deleted after the test completes. Each test or subtest within each package has its own unique artifact directory. Subtest outputs are not stored inside the parent test's output directory — all artifact directories for a given package are created at the same level: The artifact directory path normally looks like this: But if this path can't be safely converted into a local file path (which, for some reason, always happens on my machine), the path will simply be: (which is what happens in the examples above) Repeated calls to in the same test or subtest return the same directory. 𝗗 T.ArtifactDir • 𝗣 71287 • 𝗖𝗟 696399 • 𝗔 Damien Neil Over the years, the command became a sad, neglected bag of rewrites for very ancient Go features. But now, it's making a comeback. The new is re-implemented using the Go analysis framework — the same one uses. 
While and now use the same infrastructure, they have different purposes and use different sets of analyzers: By default, runs a full set of analyzers (currently, there are more than 20). To choose specific analyzers, use the flag for each one, or use to run all analyzers except the ones you turned off. For example, here we only enable the analyzer: And here, we enable all analyzers except : Currently, there's no way to suppress specific analyzers for certain files or sections of code. To give you a taste of analyzers, here's one of them in action. It replaces loops with or : If you're interested, check out the dedicated blog post for the full list of analyzers with examples. 𝗗 cmd/fix • 𝗚 go fix • 𝗣 71859 • 𝗔 Alan Donovan Go 1.26 is incredibly big — it's the largest release I've ever seen, and for good reason: All in all, a great release! You might be wondering about the package that was introduced as experimental in 1.25. It's still experimental and available with the flag. P.S. To catch up on other Go releases, check out the Go features by version list or explore the interactive tours for Go 1.25 and 1.24 . P.P.S. Want to learn more about Go? Check out my interactive book on concurrency a vector from array/slice, or a vector to array/slice. Arithmetic: , , , , . Bitwise: , , , , . Comparison: , , , , . Conversion: , , . Masking: , , . Rearrangement: . Collect live goroutines . Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now. Mark reachable memory . Trace pointers from roots to find which synchronization objects (like channels or wait groups) are currently reachable by these roots. Resurrect blocked goroutines . Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots. Iterate . Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects. Report the leaks . 
Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

Total number of goroutines since the program started. Number of goroutines in each state. Number of active threads.

First by validity (invalid before valid). Then by address family (IPv4 before IPv6). Then by masked IP address (network IP). Then by prefix length. Then by unmasked address (original IP).

Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply. Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes that are always safe to apply, but don't necessarily indicate problems with the code.

It brings a lot of useful updates, like the improved builtin, type-safe error checking, and goroutine leak detector. There are also many performance upgrades, including the new garbage collector, faster cgo and memory allocation, and optimized and . On top of that, it adds quality-of-life features like multiple log handlers, test artifacts, and the updated tool. Finally, there are two specialized experimental packages: one with SIMD support and another with protected mode for forward secrecy.

Anton Zhiyanov 3 months ago

Fear is not advocacy

AI advocates seem to be the only kind of technology advocates who feel this imminent urge to constantly criticize developers for not being excited enough about their tech. It would be crazy if I presented new Go features like this: If you still don't use the package, all your systems will eventually succumb to concurrency bugs. If you don't use iterators, you have absolutely nothing interesting to build. The job of an advocate is to spark interest, not to reproach people or instill FOMO. And yet that's exactly what AI advocates do. What a weird way to advocate. This whole "devote your life to AI right now, or you'll be out of a job soon" narrative is false. You don't have to be a world-class algorithm expert to write good software. You don't have to be a Linux expert to use containers. And you don't have to spend all your time now trying to become an expert in chasing ever-changing AI tech. As with any new technology, developers adopting AI typically fall into four groups: early adopters, early majority, late majority, and laggards. Right now, AI advocates are trying to shame everyone into becoming early adopters. But it's perfectly okay to wait if you're sceptical. Being part of the late majority is a safe and reasonable choice. If anything, you'll have fewer bugs to deal with. As the industry adopts AI practices, you'll naturally absorb just the right amount of them. You are going to be fine.

Anton Zhiyanov 3 months ago

'Better C' playgrounds

I have a soft spot for the "better C" family of languages: C3, Hare, Odin, V, and Zig. I'm not saying these languages are actually better than C — they're just different. But I needed to come up with an umbrella term for them, and "better C" was the only thing that came to mind. I believe playgrounds and interactive documentation make programming languages easier for more people to learn. That's why I created online sandboxes for these langs. You can try them out below, embed them on your own website, or self-host and customize them. If you're already familiar with one of these languages, maybe you could even create an interactive guide for it? I'm happy to help if you want to give it a try. C3  • Hare  • Odin  • V  • Zig  • Editors An ergonomic, safe, and familiar evolution of C. ⛫  homepage • αω  tutorial • ⚘  community A systems programming language designed to be simple, stable, and robust. ⛫  homepage • αω  tutorial • ⚘  community A high-performance, data-oriented systems programming language. ⛫  homepage • αω  tutorial • ⚘  community A language with C-level performance and rapid compilation speeds. ⛫  homepage • αω  tutorial • ⚘  community A language designed for performance and explicit control with powerful metaprogramming. ⛫  homepage • αω  tutorial • ⚘  community If you want to do more than just "hello world," there are also full-size online editors . They're pretty basic, but still can be useful.

Anton Zhiyanov 3 months ago

Go feature: Modernized go fix

Part of the Accepted! series: Go proposals and features explained in simple terms. The modernized command uses a fresh set of analyzers and the same infrastructure as . Ver. 1.26 • Tools • Medium impact The is re-implemented using the Go analysis framework — the same one uses. While and now use the same infrastructure, they have different purposes and use different sets of analyzers: See the full set of fix's analyzers in the Analyzers section. The main goal is to bring modernization tools from the Go language server (gopls) to the command line. If includes the modernize suite, developers can easily and safely update their entire codebase after a new Go release with just one command. Re-implementing also makes the Go toolchain simpler. The unified and use the same backend framework and extension mechanism. This makes the tools more consistent, easier to maintain, and more flexible for developers who want to use custom analysis tools. Implement the new command: By default, runs a full set of analyzers (see the list below). To choose specific analyzers, use the flag for each one, or use to run all analyzers except the ones you turned off. For example, here we only enable the analyzer: And here, we enable all analyzers except : Currently, there's no way to suppress specific analyzers for certain files or sections of code. Here's the list of fixes currently available in , along with examples. 
any  • bloop  • fmtappendf  • forvar  • hostport  • inline  • mapsloop  • minmax  • newexpr  • omitzero  • plusbuild  • rangeint  • reflecttypefor  • slicescontains  • slicessort  • stditerators  • stringsbuilder  • stringscut  • stringcutprefix  • stringsseq  • testingcontext  • waitgroup

Replace with : Replace for-range over with and remove unnecessary manual timer control: Replace with to avoid intermediate string allocation: Remove unnecessary shadowing of loop variables: Replace network addresses created with by using instead, because host-port pairs made with don't work with IPv6: Inline function calls according to the comment directives: Replace explicit loops over maps with calls to package ( , , , or depending on the context): Replace if/else statements with calls to or : Replace custom "pointer to" functions with : Remove from struct-type fields because this tag doesn't have any effect on them: Remove obsolete comments: Replace 3-clause for loops with for-range over integers: Replace with : Replace loops with or : Replace with for basic types: Use iterators instead of / -style APIs for certain types in the standard library: Replace repeated with : Replace some uses of and string slicing with or : Replace / with and / with : Replace ranging over / with / : Replace with in tests: Replace + with :

𝗣 71859 👥 Alan Donovan , Jonathan Amsterdam

Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply. Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes that are always safe to apply, but don't necessarily indicate problems with the code.

Anton Zhiyanov 3 months ago

Detecting goroutine leaks in modern Go

Deadlocks, race conditions, and goroutine leaks are probably the three most common problems in concurrent Go programming. Deadlocks usually cause panics, so they're easier to spot. The race detector can help find data races (although it doesn't catch everything and doesn't help with other types of race conditions). As for goroutine leaks, Go's tooling did not address them for a long time. A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. We'll look at some examples shortly. Things started to change in Go 1.24 with the introduction of the package. There will be even bigger changes in Go 1.26, which adds a new experimental profile that reports leaked goroutines. Let's take a look!

A simple leak  • Detection: goleak  • Detection: synctest  • Detection: pprof  • Algorithm  • Range over channel  • Double send  • Early return  • Take first  • Cancel/timeout  • Orphans  • Final thoughts

Let's say there's a function that runs the given functions concurrently and sends their results to an output channel: And a simple test: Send three functions to be executed and collect the results from the output channel. The test passed, so the function works correctly. But does it really? Let's pass three functions to without collecting the results, and count the goroutines: After 50 ms — when all the functions should definitely have finished — there are still three running goroutines ( ). In other words, all the goroutines are stuck. The reason is that the channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside get blocked on sending the result to . Let's modify the test to catch the leak. Obviously, we don't want to rely on in tests — such a check is too fragile. Let's use a third-party goleak package instead: playground ▶ The test output clearly shows where the leak occurs.
Goleak uses internally, but it does so quite efficiently. It inspects the stack for unexpected goroutines up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly. Still, I'd prefer not to use third-party packages and . Let's check for leaks without any third-party packages by using the package (experimental in Go 1.24, production-ready in Go 1.25+): I'll keep this explanation short since isn't the main focus of this article. If you want to learn more about it, check out the Concurrency testing guide. I highly recommend it — is super useful! Here's what happens: Next, comes into play. It tries to wait for all child goroutines to finish before it returns. But if it sees that some goroutines are durably blocked (in our case, all three are blocked trying to send to the channel), it panics: main bubble goroutine has exited but blocked goroutines remain So, here we found the leak without using or goleak. Pretty useful! Let's check for leaks using the new profile type (experimental in Go 1.26). We'll use a helper function to run the profiled code and print the results when the profile is ready: Call with three functions and observe all three leaks: We have a nice goroutine stack trace that shows exactly where the leak happens. Unfortunately, we had to use again, so this probably isn't the best way to test — unless we combine it with to use the fake clock. On the other hand, we can collect a from a running program, which makes it really useful for finding leaks in production systems (unlike ). Pretty neat. This profile uses the garbage collector's marking phase to find goroutines that are permanently blocked (leaked). The approach is explained in detail in the proposal and the paper by Saioc et al. — check it out if you're interested. 
Here's the gist of it: In the rest of the article, we'll review the different types of leaks often observed in production and see whether and are able to detect each of them (spoiler: they are). Based on the code examples from the common-goroutine-leak-patterns repository by Georgian-Vlad Saioc, licensed under the Apache-2.0 license. One or more goroutines receive from a channel using , but the sender never closes the channel, so all the receivers eventually leak: Notice how and give almost the same stack traces, clearly showing the root cause of the problem. You'll see this in the next examples as well. Fix: The sender should close the channel after it finishes sending. Try uncommenting the ⓧ line and see if both checks pass. The sender accidentally sends more values to a channel than intended, and leaks: Fix: Make sure that each possible path in the code sends to the channel no more times than the receiver is ready for. Alternatively, make the channel's buffer large enough to handle all possible sends. Try uncommenting the ⓧ line and see if both checks pass. The parent goroutine exits without receiving a value from the child goroutine, so the child leaks: Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending. Try making the channel buffered at line ⓧ and see if both checks pass. Similar to "early return". If the parent is canceled before receiving a value from the child goroutine, the child leaks: Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending. Try making the channel buffered at line ⓧ and see if both checks pass. The parent launches N child goroutines, but is only interested in the first result. The rest N-1 children leak: Using (zero items, the parent leaks): Using (multiple items, children leak): Using (zero items, the parent leaks): Using (multiple items, children leak): Fix: Make the channel's buffer large enough to hold values from all child goroutines. 
Also, return early if the source collection is empty. Try changing the implementation as follows and see if both checks pass: Inner goroutines leak because the client doesn't follow the contract described in the type's interface and documentation. Let's say we have a type with the following contract: The implementation isn't particularly important — what really matters is the public contract. Let's say the client breaks the contract and doesn't stop the worker: Then the worker goroutines will leak, just like the documentation says. Fix: Follow the contract and stop the worker to make sure all goroutines are stopped. Try uncommenting the ⓧ line and see if both checks pass. Thanks to improvements in Go 1.24-1.26, it's now much easier to catch goroutine leaks, both during testing and in production. The package is available in 1.24 (experimental) and 1.25+ (production-ready). If you're interested, I have a detailed interactive guide on it. The profile will be available in 1.26 (experimental). According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile. Check the proposal and the commits for more details on :

P.S. If you are into concurrency, check out my interactive book .

1. The call to starts a testing bubble in a separate goroutine.
2. The call to starts three goroutines.
3. The call to blocks the root bubble goroutine.
4. One of the goroutines executes , tries to write to , and gets blocked (because no one is reading from ). The same thing happens to the other two goroutines.
5. sees that all the child goroutines in the bubble are durably blocked, so it unblocks the root goroutine.
6. The inner test function finishes.

1. Collect live goroutines. Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now.
2. Mark reachable memory. Trace pointers from roots to find which memory objects (like channels or mutexes) are currently reachable by these roots.
3. Resurrect blocked goroutines. Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots.
4. Iterate. Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects.
5. Report the leaks. Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

𝗣 74609 , 75280 👥 Vlad Saioc , Michael Knyszek 𝗖𝗟 688335 👥 Vlad Saioc

Anton Zhiyanov 4 months ago

Timing 'Hello, world'

Here's a little unscientific chart showing the compile/run times of a "hello world" program in different languages: For interpreted languages, the times shown are only for running the program, since there's no separate compilation step. I had to shorten the Kotlin bar a bit to make it fit within 80 characters. All measurements were done in single-core, containerized sandboxes on an ancient CPU, and the timings include the overhead of . So the exact times aren't very interesting, especially for the top group (Bash to Ruby) — they all took about the same amount of time. Here is the program source code in C: Other languages: Bash · C# · C++ · Dart · Elixir · Go · Haskell · Java · JavaScript · Kotlin · Lua · Odin · PHP · Python · R · Ruby · Rust · Swift · V · Zig Of course, this ranking will be different for real-world projects with lots of code and dependencies. Still, I found it curious to see how each language performs on a simple "hello world" task.

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency is out!

My book on concurrent programming in Go is finally finished. It walks you through goroutines, channels, select, pipelines, synchronization, race prevention, time handling, signaling, atomicity, testing, and concurrency internals. The book follows my usual style: clear explanations with interactive examples, plus auto-tested exercises so you can practice as you go. I genuinely think it's the best practical guide for everyone learning concurrency from scratch or looking to go beyond the basics. There's a dedicated page with all the book details — check it out !

Anton Zhiyanov 4 months ago

Go proposal: Secret mode

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Automatically erase used memory to prevent secret leaks. Ver. 1.26 • Stdlib • Low impact The new runtime/secret package lets you run a function in secret mode . After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable. This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it. The package is experimental and is mainly for developers of cryptographic libraries, not for application developers. Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, session keys (used to encrypt and decrypt data during a specific communication session) need to be erased from memory after they're used. If there's no reliable way to clear this memory, the keys could stay there indefinitely, which would break forward secrecy. In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it. The solution is to provide a runtime mechanism that automatically erases all temporary storage used during sensitive operations. This will make it easier for library developers to write secure code without using workarounds.
Add the runtime/secret package and its functions: The current implementation has several limitations:

- Only supported on linux/amd64 and linux/arm64. On unsupported platforms, secret.Do invokes the function directly.
- Protection does not cover any global variables that the function writes to.
- Trying to start a goroutine within secret.Do causes a panic.
- If the function calls , erasure is delayed until all deferred functions are executed.
- Heap allocations are only erased if ➊ the program drops all references to them, and ➋ then the garbage collector notices that those references are gone. The program controls the first part, but the second part depends on when the runtime decides to act.
- If the function panics, the panicked value might reference memory allocated inside secret.Do. That memory won't be erased until (at least) the panicked value is no longer reachable.
- Pointer addresses might leak into data buffers that the runtime uses for garbage collection. Do not put confidential information into pointers.

The last point might not be immediately obvious, so here's an example. If an offset in an array is itself secret (you have a array and the secret key always starts at ), don't create a pointer to that location (don't create a pointer to ). Otherwise, the garbage collector might store this pointer, since it needs to know about all active pointers to do its job. If someone launches an attack to access the GC's memory, your secret offset could be exposed. The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use runtime/secret behind the scenes. As of Go 1.26, the package is experimental and can be enabled by setting at build time. Use secret.Do to generate a session key and encrypt a message using AES-GCM: Note that secret.Do protects not just the raw key, but also the cipher structure (which contains the expanded key schedule) created inside the function. This is a simplified example, of course — it only shows how memory erasure works, not a full cryptographic exchange. In real situations, the key needs to be shared securely with the receiver (for example, through key exchange) so decryption can work. 𝗣 21865 • 𝗖𝗟 704615 • 👥 Daniel Morsing , Dave Anderson , Filippo Valsorda , Jason A. Donenfeld , Keith Randall , Russ Cox

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency internals

This is a chapter from my book on Go concurrency , which teaches the topic from the ground up through interactive examples. Here's where we started this book: Functions that run with are called goroutines. The Go runtime juggles these goroutines and distributes them among operating system threads running on CPU cores. Compared to OS threads, goroutines are lightweight, so you can create hundreds or thousands of them. That's generally correct, but it's a little too brief. In this chapter, we'll take a closer look at how goroutines work. We'll still use a simplified model, but it should help you understand how everything fits together. Concurrency • Goroutine scheduler • GOMAXPROCS • Concurrency primitives • Scheduler metrics • Profiling • Tracing • Keep it up At the hardware level, CPU cores are responsible for running parallel tasks. If a processor has 4 cores, it can run 4 instructions at the same time — one on each core. At the operating system level, a thread is the basic unit of execution. There are usually many more threads than CPU cores, so the operating system's scheduler decides which threads to run and which ones to pause. The scheduler keeps switching between threads to make sure each one gets a turn to run on a CPU, instead of waiting in line forever. This is how the operating system handles concurrency. At the Go runtime level, a goroutine is the basic unit of execution. The runtime scheduler runs a fixed number of OS threads, often one per CPU core. There can be many more goroutines than threads, so the scheduler decides which goroutines to run on the available threads and which ones to pause. The scheduler keeps switching between goroutines to make sure each one gets a turn to run on a thread, instead of waiting in line forever. This is how Go handles concurrency. The Go runtime scheduler doesn't decide which threads run on the CPU — that's the operating system scheduler's job. 
The Go runtime makes sure all goroutines run on the threads it manages, but the OS controls how and when those threads actually get CPU time. The scheduler's job is to run M goroutines on N operating system threads, where M can be much larger than N. Here's a simple way to do it: Take goroutines G11-G14 and run them: Goroutine G12 got blocked while reading from the channel. Put it back in the queue and replace it with G15: But there are a few things to keep in mind. Let's say goroutines G11–G14 are running smoothly without getting blocked by mutexes or channels. Does that mean goroutines G15–G20 won't run at all and will just have to wait ( starve ) until one of G11–G14 finally finishes? That would be unfortunate. That's why the scheduler checks each running goroutine roughly every 10 ms to decide if it's time to pause it and put it back in the queue. This approach is called preemptive scheduling: the scheduler can interrupt running goroutines when needed so others have a chance to run too. System calls The scheduler can manage a goroutine while it's running Go code. But what happens if a goroutine makes a system call, like reading from disk? In that case, the scheduler can't take the goroutine off the thread, and there's no way to know how long the system call will take. For example, if goroutines G11–G14 in our example spend a long time in system calls, all worker threads will be blocked, and the program will basically "freeze". To solve this problem, the scheduler starts new threads if the existing ones get blocked in a system call. For example, here's what happens if G11 and G12 make system calls: Here, the scheduler started two new threads, E and F, and assigned goroutines G15 and G16 from the queue to these threads. When G11 and G12 finish their system calls, the scheduler will stop or terminate the extra threads (E and F) and keep running the goroutines on four threads: A-B-C-D. This is a simplified model of how the goroutine scheduler works in Go. 
If you want to learn more, I recommend watching the talk by Dmitry Vyukov, one of the scheduler's developers: Go scheduler: Implementing language with lightweight concurrency ( video , slides ) We said that the scheduler uses N threads to run goroutines. In the Go runtime, the value of N is set by a parameter called GOMAXPROCS. The GOMAXPROCS runtime setting controls the maximum number of operating system threads the Go scheduler can use to execute goroutines concurrently. It defaults to the value of runtime.NumCPU, which is the number of logical CPUs on the machine. Strictly speaking, runtime.NumCPU is either the total number of logical CPUs or the number allowed by the CPU affinity mask, whichever is lower. This can be adjusted by the CPU quota, as explained below. For example, on my 8-core laptop, the default value of GOMAXPROCS is also 8: You can change GOMAXPROCS by setting the GOMAXPROCS environment variable or calling runtime.GOMAXPROCS: You can also undo the manual changes and go back to the default value set by the runtime. To do this, use the runtime.SetDefaultGOMAXPROCS function (Go 1.25+): Go programs often run in containers, like those managed by Docker or Kubernetes. These systems let you limit the CPU resources for a container using a Linux feature called cgroups . A cgroup (control group) in Linux lets you group processes together and control how much CPU, memory, and network I/O they can use by setting limits and priorities. For example, here's how you can limit a Docker container to use only four CPUs: Before version 1.25, the Go runtime didn't consider the CPU quota when setting the GOMAXPROCS value. No matter how you limited CPU resources, GOMAXPROCS was always set to the number of logical CPUs on the host machine: Starting with version 1.25, the Go runtime respects the CPU quota: So, the default GOMAXPROCS value is set to either the number of logical CPUs or the CPU limit enforced by cgroup settings for the process, whichever is lower. Note on CPU limits Cgroups actually offer not just one, but two ways to limit CPU resources: Docker's --cpus and --cpu-period / --cpu-quota set the quota, while --cpu-shares sets the shares.
Kubernetes' CPU limit sets the quota, while CPU request sets the shares. Go's runtime only takes the CPU quota into account, not the shares. Fractional CPU limits are rounded up: On a machine with multiple CPUs, the minimum default value for GOMAXPROCS is 2, even if the CPU limit is set lower: The Go runtime automatically updates GOMAXPROCS if the CPU limit changes. It happens up to once per second (less frequently if the application is idle). Let's take a quick look at the three main concurrency tools for Go: goroutines, channels, and select. A goroutine is implemented as a pointer to a runtime structure called g. Here's what it looks like: The structure has many fields, but most of its memory is taken up by the stack, which holds the goroutine's local variables. By default, each stack gets 2 KB of memory, and it grows if needed. Because goroutines use very little memory, they're much more efficient than operating system threads, which usually need about 1 MB each. Their small size lets you run tens (or even hundreds) of thousands of goroutines on a single machine. A channel is implemented as a pointer to a runtime structure called hchan. Here's what it looks like: The buffer array (buf) has a fixed size (dataqsiz, which you can get with the cap builtin). It's created when you make a buffered channel. The number of items in the channel (qcount, which you can get with the len builtin) increases when you send to the channel and decreases when you receive from it. The close builtin sets the closed field to 1. Sending an item to an unbuffered channel, or to a buffered channel that's already full, puts the goroutine into the sendq queue. Receiving from an empty channel puts the goroutine into the recvq queue. The select logic is implemented in the selectgo function. It's a huge function that takes a list of select cases and (very simply put) works as follows: ✎ Exercise: Runtime simulator Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it .
If you are okay with just theory for now, let's continue. Metrics show how the Go runtime is performing, like how much heap memory it uses or how long garbage collection pauses take. Each metric has a unique name (for example, ) and a value, which can be a number or a histogram. We use the package to work with metrics. List all available metrics with descriptions: Get the value of a specific metric: Here are some goroutine-related metrics: In real projects, runtime metrics are usually exported automatically with client libraries for Prometheus, OpenTelemetry, or other observability tools. Here's an example for Prometheus: The exported metrics are then collected by Prometheus, visualized, and used to set up alerts. Profiling helps you understand exactly what the program is doing, what resources it uses, and where in the code this happens. Profiling is often not recommended in production because it's a "heavy" process that can slow things down. But that's not the case with Go. Go's profiler is designed for production use. It uses sampling, so it doesn't track every single operation. Instead, it takes quick snapshots of the runtime every 10 ms and puts them together to give you a full picture. Go supports the following profiles: The easiest way to add a profiler to your app is by using the package. When you import it, it automatically registers HTTP handlers for collecting profiles: Or you can register profiler handlers manually: After that, you can start profiling with a specific profile by running the command with the matching URL, or just open that URL in your browser: For the CPU profile, you can choose how long the profiler runs (the default is 30 seconds). Other profiles are taken instantly. After running the profiler, you'll get a binary file that you can open in the browser using the same utility. For example: The pprof web interface lets you view the same profile in different ways. 
My personal favorites are the flame graph , which clearly shows the call hierarchy and resource usage, and the source view, which shows the exact lines of code. You can also profile manually. To collect a CPU profile, use and : To collect other profiles, use : Profiling is a broad topic, and we've only touched the surface. To learn more, start with these articles: Tracing records certain types of events while the program is running, mainly those related to concurrency and memory: If you enabled the profiling server as described earlier, you can collect a trace using this URL: Trace files can be quite large, so it's better to use a small N value. After tracing is complete, you'll get a binary file that you can open in the browser using the utility: In the trace web interface, you'll see each goroutine's "lifecycle" on its own line. You can zoom in and out of the trace with the W and S keys, and you can click on any event to see more details: You can also collect a trace manually: Flight recording is a tracing technique that collects execution data, such as function calls and memory allocations, within a sliding window that's limited by size or duration. It helps to record traces of interesting program behavior, even if you don't know in advance when it will happen. The type (Go 1.25+) implements a flight recorder in Go. It tracks a moving window over the execution trace produced by the runtime, always containing the most recent trace data. Here's an example of how you might use it. First, configure the sliding window: Then create the recorder and start it: Continue with the application code as usual: Finally, save the trace snapshot to a file when an important event occurs: Use to view the trace in the browser: ✎ Exercise: Comparing blocks Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it . 
If you are okay with just theory for now, let's continue. Now you can see how challenging the Go scheduler's job is. Fortunately, most of the time you don't need to worry about how it works behind the scenes — sticking to goroutines, channels, select, and other synchronization primitives is usually enough. This is the final chapter of my "Gist of Go: Concurrency" book. I invite you to read it — the book is an easy-to-understand, interactive guide to concurrency programming in Go. Pre-order for $10   or read online Put all goroutines in a queue. Take N goroutines from the queue and run them. If a running goroutine gets blocked (for example, waiting to read from a channel or waiting on a mutex), put it back in the queue and run the next goroutine from the queue. CPU quota — the maximum CPU time the cgroup may use within some period window. CPU shares — relative CPU priorities given to the kernel scheduler. Go through the cases and check if the matching channels are ready to send or receive. If several cases are ready, choose one at random (to prevent starvation, where some cases are always chosen and others are never chosen). Once a case is selected, perform the send or receive operation on the matching channel. If there is a default case and no other cases are ready, pick the default. If no cases are ready, block the goroutine and add it to the channel queue for each case. Count of goroutines created since program start (Go 1.26+). Count of live goroutines (created but not finished yet). An increase in this metric may indicate a goroutine leak. Approximate count of goroutines running or blocked in a system call or cgo call (Go 1.26+). An increase in this metric may indicate problems with such calls. Approximate count of goroutines ready to execute, but not executing (Go 1.26+). An increase in this metric may mean the system is overloaded and the CPU can't keep up with the growing number of goroutines. Approximate count of goroutines executing (Go 1.26+). 
Always less than or equal to . Approximate count of goroutines waiting on a resource — I/O or sync primitives (Go 1.26+). An increase in this metric may indicate issues with mutex locks, other synchronization blocks, or I/O issues. The current count of live threads that are owned by the runtime (Go 1.26+). The current setting — the maximum number of operating system threads the scheduler can use to execute goroutines concurrently. CPU . Shows how much CPU time each function uses. Use it to find performance bottlenecks if your program is running slowly because of CPU-heavy tasks. Heap . Shows the heap memory currently used by each function. Use it to detect memory leaks or excessive memory usage. Allocs . Shows which functions have used heap memory since the profiler started (not just currently). Use it to optimize garbage collection or reduce allocations that impact performance. Goroutine . Shows the stack traces of all current goroutines. Use it to get an overview of what the program is doing. Block . Shows where goroutines block waiting on synchronization primitives like channels, mutexes and wait groups. Use it to identify synchronization bottlenecks and issues in data exchange between goroutines. Disabled by default. Mutex . Shows lock contentions on mutexes and internal runtime locks. Use it to find "problematic" mutexes that goroutines are frequently waiting for. Disabled by default. Profiling Go Programs Diagnostics goroutine creation and state changes; system calls; garbage collection; heap size changes;

Anton Zhiyanov 4 months ago

Go proposal: Type-safe error checking

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Introducing errors.AsType — a modern, type-safe alternative to errors.As. Ver. 1.26 • Stdlib • High impact The new errors.AsType function is a generic version of errors.As: It's type-safe, faster, and easier to use: errors.As is not deprecated (yet), but AsType is recommended for new code. The errors.As function requires you to declare a variable of the target error type and pass a pointer to it: It makes the code quite verbose, especially when checking for multiple types of errors: With a generic AsType, you can specify the error type right in the function call. This makes the code shorter and keeps error variables scoped to their blocks: Another issue with errors.As is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement the error interface). While static analysis tools usually catch these issues, using the generic AsType has several benefits:

- No reflection¹.
- No runtime panics.
- Fewer allocations.
- Compile-time type safety.

Finally, AsType can handle everything that As does, so it's a drop-in improvement for new code. Add the AsType function to the errors package: Recommend using AsType instead of As: Open a file and check if the error is related to the file path: 𝗣 51945 • 𝗖𝗟 707235

¹ Unlike As, AsType doesn't use the reflect package, but it still relies on type assertions and interface checks. These operations access runtime type metadata, so AsType isn't completely "reflection-free" in the strict sense.  ↩︎

Anton Zhiyanov 4 months ago

Go proposal: Goroutine metrics

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Export goroutine-related metrics from the Go runtime. Ver. 1.26 • Stdlib • Medium impact New metrics in the runtime/metrics package give better insight into goroutine scheduling:

- Total number of goroutines since the program started.
- Number of goroutines in each state.
- Number of active threads.

Go's runtime/metrics package already provides a lot of runtime stats, but it doesn't include metrics for goroutine states or thread counts. Per-state goroutine metrics can be linked to common production issues. An increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand. Observability systems can track these counters to spot regressions, find scheduler bottlenecks, and send alerts when goroutine behavior changes from the usual patterns. Developers can use them to catch problems early without needing full traces. Add the following metrics to the runtime/metrics package: The per-state numbers are not guaranteed to add up to the live goroutine count (/sched/goroutines:goroutines, available since Go 1.16). All metrics use uint64 counters. Start some goroutines and print the metrics after 100 ms of activity: No surprises here: we read the new metric values the same way as before — using metrics.Read . 𝗣 15490 • 𝗖𝗟 690397 , 690398 , 690399 P.S. If you are into goroutines, check out my interactive book on concurrency

Anton Zhiyanov 4 months ago

Gist of Go: Concurrency testing

This is a chapter from my book on Go concurrency , which teaches the topic from the ground up through interactive examples. Testing concurrent programs is a lot like testing single-task programs. If the code is well-designed, you can test the state of a concurrent program with standard tools like channels, wait groups, and other abstractions built on top of them. But if you've made it this far, you know that concurrency is never that easy. In this chapter, we'll go over common testing problems and the solutions that Go offers. Waiting for goroutines • Checking channels • Checking for leaks • Durable blocking • Instant waiting • Time inside the bubble • Thoughts on time 1  ✎ • Thoughts on time 2  ✎ • Checking for cleanup • Bubble rules • Keep it up Let's say we want to test this function: Calculations run asynchronously in a separate goroutine. However, the function returns a result channel, so this isn't a problem: At point ⓧ, the test is guaranteed to wait for the inner goroutine to finish. The rest of the test code doesn't need to know anything about how concurrency works inside the function. Overall, the test isn't any more complicated than if the function were synchronous. But we're lucky that it returns a channel. What if it doesn't? Let's say the function looks like this: We write a simple test and run it: The assertion fails because at point ⓧ, we didn't wait for the inner goroutine to finish. In other words, we didn't synchronize the two goroutines. That's why the variable still has its initial value (0) when we do the check. We can add a short delay with time.Sleep: The test is now passing. But using time.Sleep to sync goroutines isn't a great idea, even in tests. We don't want to set a custom delay for every function we're testing. Also, the function's execution time may be different on the local machine compared to a CI server. If we use a longer delay just to be safe, the tests will end up taking too long to run.
Sometimes you can't avoid using in tests, but since Go 1.25, the package has made these cases much less common. Let's see how it works. The package has a lot going on under the hood, but its public API is very simple: The function creates an isolated bubble where you can control time to some extent. Any new goroutines started inside this bubble become part of the bubble. So, if we wrap the test code with , everything will run inside the bubble — the test code, the function we're testing, and its goroutine. At point ⓧ, we want to wait for the goroutine to finish. The function comes to the rescue! It blocks the calling goroutine until all other goroutines in the bubble are finished. (It's actually a bit more complicated than that, but we'll talk about it later.) In our case, there's only one other goroutine (the inner goroutine), so will pause until it finishes, and then the test will move on. Now the test passes instantly. That's better! ✎ Exercise: Wait until done Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it . If you are okay with just theory for now, let's continue. As we've seen, you can use to wait for the tested goroutine to finish, and then check the state of the data you are interested in. You can also use it to check the state of channels. Let's say there's a function that generates N numbers like 11, 22, 33, and so on: And a simple test: Set N=2, get the first number from the generator's output channel, then get the second number. The test passed, so the function works correctly. But does it really? Let's use in "production": Panic! We forgot to close the channel when exiting the inner goroutine, so the for-range loop waiting on that channel got stuck. Let's fix the code: And add a test for the channel state: The test is still failing, even though we're now closing the channel when the goroutine exits. 
This is a familiar problem: at point ⓧ, we didn't wait for the inner goroutine to finish. So when we check the channel, it hasn't closed yet. That's why the test fails. We can delay the check using : But it's better to use : At point ⓧ, blocks the test until the only other goroutine (the inner goroutine) finishes. Once the goroutine has exited, the channel is already closed. So, in the select statement, the case triggers with set to , allowing the test to pass. As you can see, the package helped us avoid delays in the test, and the test itself didn't get much more complicated. As we've seen, you can use to wait for the tested goroutine to finish, and then check the state of the data or channels. You can also use it to detect goroutine leaks. Let's say there's a function that runs the given functions concurrently and sends their results to an output channel: And a simple test: Send three functions to be executed, get the first result from the output channel, and check it. The test passed, so the function works correctly. But does it really? Let's run three times, passing three functions each time: After 50 ms — when all the functions should definitely have finished — there are still 9 running goroutines ( ). In other words, all the goroutines are stuck. The reason is that the channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside get blocked when they try to send the result of to . Let's fix this by adding a buffer of the right size to the channel: Then add a test to check the number of goroutines: The test is still failing, even though the channel is now buffered, and the goroutines shouldn't block on sending to it. This is a familiar problem: at point ⓧ, we didn't wait for the running goroutines to finish. So is greater than zero, which makes the test fail. We can delay the check using (not recommended), or use a third-party package like goleak (a better option): The test passes now. 
By the way, goleak also uses time.Sleep internally, but it does so much more efficiently. It tries up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly. Even better, we can check for leaks without any third-party packages by using synctest: Earlier, I said that synctest.Wait blocks the calling goroutine until all other goroutines finish. Actually, it's a bit more complicated. Wait blocks until all other goroutines either finish or become durably blocked . We'll talk about "durably" later. For now, let's focus on "become blocked." Let's temporarily remove the buffer from the channel and check the test results: Here's what happens: Next, synctest.Test comes into play. It not only starts the bubble goroutine, but also tries to wait for all child goroutines to finish before it returns. If it sees that some goroutines are stuck (in our case, all 9 are blocked trying to send to the channel), it panics: main bubble goroutine has exited but blocked goroutines remain So, we found the leak without using time.Sleep or goleak, thanks to the useful features of synctest.Test and synctest.Wait: Now let's make the channel buffered and run the test again: As we've found, Wait blocks until all goroutines in the bubble — except the one that called it — have either finished or are durably blocked. Let's figure out what "durably blocked" means. For synctest, a goroutine inside a bubble is considered durably blocked if it is blocked by any of the following operations: Other blocking operations are not considered durable, and Wait ignores them. For example: The distinction between "durable" and other types of blocks is just an implementation detail of the testing/synctest package. It's not a fundamental property of the blocking operations themselves. In real-world applications, this distinction doesn't exist, and "durable" blocks are neither better nor worse than any others. Let's look at an example.
Let's say there's a type that performs some asynchronous computation: Our goal is to write a test that checks the result while the calculation is still running . Let's see how the test changes depending on how is implemented (except for the version — we'll cover that one a bit later). Let's say is implemented using a done channel: Naive test: The check fails because when is called, the goroutine in hasn't set yet. Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on reading from the channel. This channel is created inside the bubble, so the block is durable. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using select: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on a select statement. Both channels used in the select ( and ) are created inside the bubble, so the block is durable. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using a wait group: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the wait group's call. The group's method was called inside the bubble, so this is a durable block. The call in the test returns as soon as happens, and we get the current value of . Let's say is implemented using a condition variable: Let's use to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the condition variable's call. This is a durable block. The call returns as soon as happens, and we get the current value of . Let's say is implemented using a mutex: Let's try using to wait until the goroutine is blocked at point ⓧ: In ⓧ, the goroutine is blocked on the mutex's call. doesn't consider blocking on a mutex to be durable. The call ignores the block and never returns. The test hangs and only fails when the overall timeout is reached. 
You might be wondering why the authors didn't consider blocking on mutexes to be durable. There are a couple of reasons: ⌘ ⌘ ⌘ Let's go back to the original question: how does the test change depending on how the type is implemented? It doesn't change at all. We used the exact same test code every time: If your program uses durably blocking operations, synctest.Wait always works the same way: Very convenient! ✎ Exercise: Blocking queue Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. Inside the bubble, time works differently. Instead of using a regular wall clock, the bubble uses a fake clock that can jump forward to any point in the future. This can be quite handy when testing time-sensitive code. Let's say we want to test this function: The positive scenario is straightforward: send a value to the channel, call the function, and check the result: The negative scenario, where the function times out, is also pretty straightforward. But the test takes the full three seconds to complete: We're actually lucky the timeout is only three seconds. It could have been as long as sixty! To make the test run instantly, let's wrap it in synctest.Test: Note that there is no synctest.Wait call here, and the only goroutine in the bubble (the root one) gets durably blocked on a select statement inside the function. Here's what happens next: Thanks to the fake clock, the test runs instantly instead of taking three seconds like it would with the "naive" approach. You might have noticed that quite a few circumstances coincided here: We'll look at the alternatives soon, but first, here's a quick exercise. ✎ Exercise: Wait, repeat Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue. The fake clock in synctest can be tricky. It moves forward only if: ➊ all goroutines in the bubble are durably blocked; ➋ there's a future moment when at least one goroutine will unblock; and ➌ synctest.Wait isn't running. Let's look at the alternatives. I'll say right away, this isn't an easy topic. But when has time travel ever been easy? :) Here's the function we're testing: Let's run it in a separate goroutine, so there will be two goroutines in the bubble: synctest.Test panicked because the root bubble goroutine finished while the other goroutine was still blocked on a select. Reason: synctest only advances the clock if all goroutines are blocked — including the root bubble goroutine. How to fix: Use time.Sleep to make sure the root goroutine is also durably blocked. Now all three conditions are met again (all goroutines are durably blocked; the moment of future unblocking is known; there is no call to synctest.Wait). The fake clock moves forward 3 seconds, which unblocks the goroutine. The goroutine finishes, leaving only the root one, which is still blocked on time.Sleep. The clock moves forward another 2 seconds, unblocking the root goroutine. The assertion passes, and the test completes successfully. But if we run the test with the race detector enabled (using the -race flag), it reports a data race on the shared variable: Logically, using time.Sleep in the root goroutine doesn't guarantee that the other goroutine (which writes to the variable) will finish before the root goroutine reads from it. That's why the race detector reports a problem. Technically, the test passes because of how synctest is implemented, but the race still exists in the code. The right way to handle this is to call synctest.Wait after time.Sleep: Calling synctest.Wait ensures that the goroutine finishes before the root goroutine reads the variable, so there's no data race anymore. Here's the function we're testing: Let's replace time.Sleep in the root goroutine with synctest.Wait: synctest.Test panicked because the root bubble goroutine finished while the other goroutine was still blocked on a select.
Reason: synctest only advances the clock if there is no active synctest.Wait running. If all bubble goroutines are durably blocked but a synctest.Wait is running, synctest won't advance the clock. Instead, it will simply finish the synctest.Wait call and return control to the goroutine that called it (in this case, the root bubble goroutine). How to fix: don't use synctest.Wait here. Let's update the function to use context cancellation instead of a timer: We won't cancel the context in the test: synctest.Test panicked because all goroutines in the bubble are hopelessly blocked. Reason: synctest only advances the clock if it knows how much to advance it. In this case, there is no future moment that would unblock the select in the function. How to fix: Manually unblock the goroutine and call synctest.Wait to wait for it to finish. Now, canceling the context unblocks the select in the function, while synctest.Wait makes sure the goroutine finishes before the test checks the results. Let's update the function to lock the mutex before doing any calculations: In the test, we'll lock the mutex before calling the function, so it will block: The test failed because it hit the overall timeout set in the test. Reason: synctest only works with durable blocks. Blocking on a mutex lock isn't considered durable, so the bubble can't do anything about it — even though the sleeping inner goroutine would have unlocked the mutex in 10 ms if the bubble had used the wall clock. How to fix: Don't use synctest here. Now the mutex unlocks after 10 milliseconds (wall clock), the function finishes successfully, and the check passes. The clock inside the bubble won't move forward if: ✎ Exercise: Asynchronous repeater Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. Let's practice understanding time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground. Here's a function that performs synchronous work: And a test for it: What is the test missing at point ⓧ?
✓ Thoughts on time 1 There's only one goroutine in the test, so when the function gets blocked by time.Sleep, the time in the bubble jumps forward by 3 seconds. Then the function sets the result and finishes. Finally, the test checks the result and passes successfully. No need to add anything. Let's keep practicing our understanding of time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground. Here's a function that performs asynchronous work: And a test for it: What is the test missing at point ⓧ? ✓ Thoughts on time 2 Let's go over the options. ✘ synctest.Wait This won't help because synctest.Wait returns as soon as time.Sleep inside the goroutine is called. The check fails, and synctest.Test panics with the error: "main bubble goroutine has exited but blocked goroutines remain". ✘ time.Sleep Because of the time.Sleep call in the root goroutine, the wait inside the goroutine is already over by the time the result is checked. However, there's no guarantee that the goroutine's write has run yet. That's why the test might pass or might fail. ✘ synctest.Wait, then time.Sleep This option is basically the same as just using time.Sleep, because synctest.Wait returns before the time.Sleep in the goroutine even starts. The test might pass or might fail. ✓ time.Sleep, then synctest.Wait This is the correct answer: Since the root goroutine isn't blocked, it checks the result while the other goroutine is blocked by the time.Sleep call. The check fails, and synctest.Test panics with the message: "main bubble goroutine has exited but blocked goroutines remain". Sometimes you need to test objects that use resources and should be able to release them. For example, this could be a server that, when started, creates a pool of network connections, connects to a database, and writes file caches. When stopped, it should clean all this up. Let's see how we can make sure everything is properly stopped in the tests. We're going to test this server: Let's say we wrote a basic functional test: The test passes, but does that really mean the server stopped when we called Stop? Not necessarily.
For example, here's a buggy implementation where our test would still pass: As you can see, the author simply forgot to stop the server here. To detect the problem, we can wrap the test in synctest.Test and see it panic: The server ignores the Stop call and doesn't stop the goroutine running inside. Because of this, the goroutine gets blocked while writing to the channel. When synctest.Test finishes, it detects the blocked goroutine and panics. Let's fix the server code (to keep things simple, we won't support multiple Start or Stop calls): Now the test passes. Here's how it works: Instead of using defer to stop something, it's common to use the t.Cleanup method. It registers a function that will run when the test finishes: Functions registered with t.Cleanup run in last-in, first-out (LIFO) order, after all deferred functions have executed. In the test above, there's not much difference between using defer and t.Cleanup. But the difference becomes important if we move the server setup into a separate helper function, so we don't have to repeat the setup code in different tests: The defer approach doesn't work because it calls Stop when the helper function returns — before the test assertions run: The t.Cleanup approach works because it calls Stop when the test has finished — after all the assertions have already run: Sometimes, a context (context.Context) is used to stop the server instead of a separate Stop method. In that case, our server interface might look like this: Now we don't even need to use defer or t.Cleanup to check whether the server stops when the context is canceled. Just pass t.Context() as the context: t.Context returns a context that is automatically created when the test starts and is automatically canceled when the test finishes. Here's how it works: To check for stopping via a method or function, use defer or t.Cleanup. To check for cancellation or stopping via context, use t.Context. Inside a bubble, t.Context returns a context whose Done channel is associated with the bubble. The context is automatically canceled when synctest.Test ends. Functions registered with t.Cleanup inside the bubble run just before synctest.Test finishes. Let's go over the rules for living in the bubble.
The following operations durably block a goroutine: The limitations are quite logical, and you probably won't run into them. Don't create channels or objects that contain channels (like tickers or timers) outside the bubble. Otherwise, the bubble won't be able to manage them, and the test will hang: Don't access synchronization primitives associated with a bubble from outside the bubble: Don't call T.Run, T.Parallel, or T.Deadline inside a bubble: Don't call synctest.Test inside the bubble: Don't call synctest.Wait from outside the bubble: Don't call synctest.Wait concurrently from multiple goroutines: ✎ Exercise: Testing a pipeline Theory alone isn't enough: practice is crucial for turning abstract knowledge into skills. The full version of the book contains a lot of exercises — that's why I recommend getting it. If you are okay with just theory for now, let's continue. The testing/synctest package is a complicated beast. But now that you've studied it, you can test concurrent programs no matter what synchronization tools they use — channels, selects, wait groups, timers or tickers, or even time.Sleep. In the next chapter, we'll talk about concurrency internals (coming soon). Pre-order for $10 or read online. Three calls to start 9 goroutines. The call to synctest.Wait blocks the root bubble goroutine. One of the goroutines finishes its work, tries to write to the channel, and gets blocked (because no one is reading from it). The same thing happens to the other 8 goroutines. synctest.Wait sees that all the child goroutines in the bubble are blocked, so it unblocks the root goroutine. The root goroutine finishes. synctest.Wait unblocks as soon as all other goroutines are durably blocked. synctest.Test panics when finished if there are still blocked goroutines left in the bubble. Sending to or receiving from a channel created within the bubble. A select statement where every case is a channel created within the bubble. Calling WaitGroup.Wait if all Add calls were made inside the bubble. Sending to or receiving from a channel created outside the bubble. Calling Mutex.Lock or RWMutex.Lock.
I/O operations (like reading a file from disk or waiting for a network response). System calls and cgo calls. Mutexes are usually used to protect shared state, not to coordinate goroutines (the example above is completely unrealistic). In tests, you usually don't need to pause before locking a mutex to check something. Mutex locks are usually held for a very short time, and mutexes themselves need to be as fast as possible. Adding extra logic to support synctest could slow them down in normal (non-test) situations. It waits until all other goroutines in the bubble are blocked. Then, it unblocks the goroutine that called it. The bubble checks if the goroutine can be unblocked by waiting. In our case, it can — we just need to wait 3 seconds. The bubble's clock instantly jumps forward 3 seconds. The select in the function chooses the timeout case, and the function returns. The test assertions for the value and the error both pass successfully. There's no synctest.Wait call. There's only one goroutine. The goroutine is durably blocked. It will be unblocked at a certain point in the future. There are goroutines that aren't durably blocked. It's unclear how much time to advance. synctest.Wait is running. Because of the time.Sleep call in the root goroutine, the wait inside the goroutine is already over by the time the result is checked. Because of the synctest.Wait call, the goroutine is guaranteed to finish (and hence to write the result) before the result is checked. The main test code runs. Before the test finishes, the deferred Stop is called. In the server goroutine, the corresponding case in the select statement triggers, and the goroutine ends. synctest.Test sees that there are no blocked goroutines and finishes without panicking. The main test code runs. Before the test finishes, the context is automatically canceled. The server goroutine stops (as long as the server is implemented correctly and checks for context cancellation). synctest.Test sees that there are no blocked goroutines and finishes without panicking. A bubble is created by calling synctest.Test. Each call creates a separate bubble. Goroutines started inside the bubble become part of it.
The bubble can only manage durable blocks. Other types of blocks are invisible to it. If all goroutines in the bubble are durably blocked with no way to unblock them (such as by advancing the clock or returning from a synctest.Wait call), synctest.Test panics. When synctest.Test finishes, it tries to wait for all child goroutines to complete. However, if even a single goroutine is durably blocked, it panics. Calling t.Context returns a context whose Done channel is associated with the bubble. Functions registered with t.Cleanup run inside the bubble, immediately before synctest.Test returns. Calling synctest.Wait in a bubble blocks the goroutine that called it. synctest.Wait returns when all other goroutines in the bubble are durably blocked. synctest.Test returns when all other goroutines in the bubble have finished. The bubble uses a fake clock (starting at 2000-01-01 00:00:00 UTC). Time in the bubble only moves forward if all goroutines are durably blocked. Time advances by the smallest amount needed to unblock at least one goroutine. If the bubble has to choose between moving time forward or returning from a running synctest.Wait, it returns from synctest.Wait. A blocking send or receive on a channel created within the bubble. A blocking select statement where every case is a channel created within the bubble. Calling WaitGroup.Wait if all Add calls were made inside the bubble.

Anton Zhiyanov 5 months ago

Go proposal: Context-aware Dialer methods

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Add context-aware, network-specific methods to the net.Dialer type. Ver. 1.26 • Stdlib • Low impact The Dialer type connects to the address using a given network (protocol) — TCP, UDP, IP, or Unix sockets. The new context-aware methods (DialTCP, DialUDP, DialIP, and DialUnix) combine the efficiency of the existing network-specific functions (which skip address resolution and dispatch) with the cancellation capabilities of context.Context. The net package already has top-level functions for different networks (DialTCP, DialUDP, DialIP, and DialUnix), but these were made before context was introduced, so they don't support cancellation: On the other hand, the Dialer type has a general-purpose DialContext method. It supports cancellation and can be used to connect to any of the known networks: However, if you already know the network type and address, using DialContext is a bit less efficient than network-specific functions like DialTCP due to: Address resolution overhead: DialContext handles address resolution internally (like DNS lookups and converting to a TCPAddr or UDPAddr) using the network and address strings you provide. Network-specific functions accept a pre-resolved address object, so they skip this step. Network type dispatch: DialContext must route the call to the protocol-specific dialer. Network-specific functions already know which protocol to use, so they skip this step. So, network-specific functions in the net package are more efficient, but they don't support cancellation. The Dialer type supports cancellation, but it's less efficient. This proposal aims to solve the mismatch by adding context-aware, network-specific methods to the Dialer type. Also, adding new methods to the Dialer lets you use the newer address types from the netip package (like netip.AddrPort instead of net.TCPAddr), which are preferred in modern Go code. Add four new methods to the Dialer type: The method signatures are similar to the existing top-level functions, but they also accept a context and use the newer address types from the netip package.
Use the DialTCP method to connect to a TCP server: Use the DialUnix method to connect to a Unix socket: In both cases, the dialing fails because I didn't bother to start the server in the playground :) 𝗣 49097 • 𝗖𝗟 657296

Anton Zhiyanov 5 months ago

Go proposal: Compare IP subnets

Part of the Accepted! series, explaining the upcoming Go changes in simple terms. Compare IP address prefixes the same way IANA does. Ver. 1.26 • Stdlib • Low impact An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation: In Go, an IP prefix is represented by the netip.Prefix type. The new Compare method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code. The imposed order matches both Python's implementation and the assumed order from IANA. When the Go team initially designed the IP subnet type (netip.Prefix), they chose not to add a Compare method because there wasn't a widely accepted way to order these values. Because of this, if a developer needs to sort IP subnets — for example, to organize routing tables or run tests — they have to write their own comparison logic. This results in repetitive and error-prone code. The proposal aims to provide a standard way to compare IP prefixes. This should reduce boilerplate code and help programs sort IP subnets consistently. Add the Compare method to the netip.Prefix type: Compare orders two prefixes as follows: first by validity (invalid before valid), then by address family (IPv4 before IPv6), then by masked IP address (the network IP), then by prefix length, and finally by unmasked address (the original IP). This follows the same order as Python's implementation and the standard IANA convention. Sort a list of IP prefixes: 𝗣 61642 • 𝗖𝗟 700355
