Go 1.26 interactive tour
Go 1.26 is coming out in February, so it's a good time to explore what's new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is. Read on and see! new(expr) • Type-safe error checking • Green Tea GC • Faster cgo and syscalls • Faster memory allocation • Vectorized operations • Secret mode • Reader-less cryptography • Goroutine leak profile • Goroutine metrics • Reflective iterators • Peek into a buffer • Process handle • Signal as cause • Compare IP subnets • Context-aware dialing • Fake example.com • Optimized fmt.Errorf • Optimized io.ReadAll • Multiple log handlers • Test artifacts • Modernized go fix • Final thoughts This article is based on the official release notes from The Go Authors and the Go source code, licensed under the BSD-3-Clause license. This is not an exhaustive list; see the official release notes for that. I provide links to the documentation (𝗗), proposals (𝗣), commits (𝗖𝗟), and authors (𝗔) for the features described. Check them out for motivation, usage, and implementation details. I also have dedicated guides (𝗚) for some of the features. Error handling is often skipped to keep things simple. Don't do this in production ツ Previously, you could only use the built-in with types: Now you can also use it with expressions: If the argument is an expression of type T, then allocates a variable of type T, initializes it to the value of , and returns its address, a value of type . This feature is especially helpful if you use pointer fields in a struct to represent optional values that you marshal to JSON or Protobuf: You can use with composite values: And function calls: Passing is still not allowed: 𝗗 spec • 𝗣 45624 • 𝗖𝗟 704935 , 704737 , 704955 , 705157 • 𝗔 Alan Donovan The new function is a generic version of : It's type-safe and easier to use: is especially handy when checking for multiple types of errors. It makes the code shorter and keeps error variables scoped to their blocks: Another issue with is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement ): doesn't cause a runtime panic; it gives a clear compile-time error instead: doesn't use , executes faster, and allocates less than : Since can handle everything that does, it's a recommended drop-in replacement for new code. 𝗗 errors.AsType • 𝗣 51945 • 𝗖𝗟 707235 • 𝗔 Julien Cretel The new garbage collector (first introduced as experimental in 1.25) is designed to make memory management more efficient on modern computers with many CPU cores. Go's traditional garbage collector algorithm operates on graph, treating objects as nodes and pointers as edges, without considering their physical location in memory. The scanner jumps between distant memory locations, causing frequent cache misses. As a result, the CPU spends too much time waiting for data to arrive from memory. More than 35% of the time spent scanning memory is wasted just stalling while waiting for memory accesses. As computers get more CPU cores, this problem gets even worse. Green Tea shifts the focus from being processor-centered to being memory-aware. Instead of scanning individual objects, it scans memory in contiguous 8 KiB blocks called spans . The algorithm focuses on small objects (up to 512 bytes) because they are the most common and hardest to scan efficiently. Each span is divided into equal slots based on its assigned size class , and it only contains objects of that size class. For example, if a span is assigned to the 32-byte size class, the whole block is split into 32-byte slots, and objects are placed directly into these slots, each starting at the beginning of its slot. Because of this fixed layout, the garbage collector can easily find an object's metadata using simple address arithmetic, without checking the size of each object it finds. When the algorithm finds an object that needs to be scanned, it marks the object's location in its span but doesn't scan it immediately. Instead, it waits until there are several objects in the same span that need scanning. Then, when the garbage collector processes that span, it scans multiple objects at once. This is much faster than going over the same area of memory multiple times. To make better use of CPU cores, GC workers share the workload by stealing tasks from each other. Each worker has its own local queue of spans to scan, and if a worker is idle, it can grab tasks from the queues of other busy workers. This decentralized approach removes the need for a central global list, prevents delays, and reduces contention between CPU cores. Green Tea uses vectorized CPU instructions (only on amd64 architectures) to process memory spans in bulk when there are enough objects. Benchmark results vary, but the Go team expects a 10–40% reduction in garbage collection overhead in real-world programs that rely heavily on the garbage collector. Plus, with vectorized implementation, an extra 10% reduction in GC overhead when running on CPUs like Intel Ice Lake or AMD Zen 4 and newer. Unfortunately, I couldn't find any public benchmark results from the Go team for the latest version of Green Tea, and I wasn't able to create a good synthetic benchmark myself. So, no details this time :( The new garbage collector is enabled by default. To use the old garbage collector, set at build time (this option is expected to be removed in Go 1.27). 𝗣 73581 • 𝗔 Michael Knyszek In the Go runtime, a processor (often referred to as a P) is a resource required to run the code. For a thread (a machine or M) to execute a goroutine (G), it must first acquire a processor. Processors move through different states. They can be (executing code), (waiting for work), or (paused because of the garbage collection). Previously, processors had a state called used when a goroutine is making a system or cgo call. Now, this state has been removed. Instead of using a separate processor state, the system now checks the status of the goroutine assigned to the processor to see if it's involved in a system call. This reduces internal runtime overhead and simplifies code paths for cgo and syscalls. The Go release notes say -30% in cgo runtime overhead, and the commit mentions an 18% sec/op improvement: I decided to run the CgoCall benchmarks locally as well: Either way, both a 20% and a 30% improvement are pretty impressive. And here are the results from a local syscall benchmark: That's pretty good too. 𝗖𝗟 646198 • 𝗔 Michael Knyszek The Go runtime now has specialized versions of its memory allocation function for small objects (from 1 to 512 bytes). It uses jump tables to quickly choose the right function for each size, instead of relying on a single general-purpose implementation. The Go release notes say "the compiler will now generate calls to size-specialized memory allocation routines". But based on the code, that's not completely accurate: the compiler still emits calls to the general-purpose function. Then, at runtime, dispatches those calls to the new specialized allocation functions. This change reduces the cost of small object memory allocations by up to 30%. The Go team expects the overall improvement to be ~1% in real allocation-heavy programs. I couldn't find any existing benchmarks, so I came up with my own. And indeed, running it on Go 1.25 compared to 1.26 shows a significant improvement: The new implementation is enabled by default. You can disable it by setting at build time (this option is expected to be removed in Go 1.27). 𝗖𝗟 665835 • 𝗔 Michael Matloob The new package provides access to architecture-specific vectorized operations (SIMD — single instruction, multiple data). This is a low-level package that exposes hardware-specific functionality. It currently only supports amd64 platforms. Because different CPU architectures have very different SIMD operations, it's hard to create a single portable API that works for all of them. So the Go team decided to start with a low-level, architecture-specific API first, giving "power users" immediate access to SIMD features on the most common server platform — amd64. The package defines vector types as structs, like (a 128-bit SIMD vector with sixteen 8-bit integers) and (a 512-bit SIMD vector with eight 64-bit floats). These match the hardware's vector registers. The package supports vectors that are 128, 256, or 512 bits wide. Most operations are defined as methods on vector types. They usually map directly to hardware instructions with zero overhead. To give you a taste, here's a custom function that uses SIMD instructions to add 32-bit float vectors: Let's try it on two vectors: Common operations in the package include: The package uses only AVX instructions, not SSE. Here's a simple benchmark for adding two vectors (both the "plain" and SIMD versions use pre-allocated slices): The package is experimental and can be enabled by setting at build time. 𝗗 simd/archsimd • 𝗣 73787 • 𝗖𝗟 701915 , 712880 , 729900 , 732020 • 𝗔 Junyang Shao , Sean Liao , Tom Thorogood Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, ephemeral keys (temporary keys used to negotiate the session) need to be erased from memory immediately after the handshake. If there's no reliable way to clear this memory, these keys could stay there indefinitely. An attacker who finds them later could re-derive the session key and decrypt past traffic, breaking forward secrecy. In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it. The Go team's solution to this problem is the new package. It lets you run a function in secret mode . After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable. This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it. Here's an example that shows how might be used in a more or less realistic setting. Let's say you want to generate a session key while keeping the ephemeral private key and shared secret safe: Here, the ephemeral private key and the raw shared secret are effectively "toxic waste" — they are necessary to create the final session key, but dangerous to keep around. If these values stay in the heap and an attacker later gets access to the application's memory (for example, via a core dump or a vulnerability like Heartbleed), they could use these intermediates to re-derive the session key and decrypt past conversations. By wrapping the calculation in , we make sure that as soon as the session key is created, the "ingredients" used to make it are permanently destroyed. This means that even if the server is compromised in the future, this specific past session can't be exposed, which ensures forward secrecy. The current implementation only supports Linux (amd64 and arm64). On unsupported platforms, invokes the function directly. Also, trying to start a goroutine within the function causes a panic (this will be fixed in Go 1.27). The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use behind the scenes. The package is experimental and can be enabled by setting at build time. 𝗗 runtime/secret • 𝗣 21865 • 𝗖𝗟 704615 • 𝗔 Daniel Morsing Current cryptographic APIs, like or , often accept an as the source of random data: These APIs don't commit to a specific way of using random bytes from the reader. Any change to underlying cryptographic algorithms can change the sequence or amount of bytes read. Because of this, if the application code (mistakenly) relies on a specific implementation in Go version X, it might fail or behave differently in version X+1. The Go team chose a pretty bold solution to this problem. Now, most crypto APIs will just ignore the random parameter and always use the system random source ( ). The change applies to the following subpackages: still uses the random reader if provided. But if is nil, it uses an internal secure source of random bytes instead of (which could be overridden). To support deterministic testing, there's a new package with a single function. It sets a global, deterministic cryptographic randomness source for the duration of the given test: affects and all implicit sources of cryptographic randomness in the packages: To temporarily restore the old reader-respecting behavior, set (this option will be removed in a future release). 𝗗 testing/cryptotest • 𝗣 70942 • 𝗖𝗟 724480 • 𝗔 Filippo Valsorda , qiulaidongfeng A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. Here's a simple example: If we call and don't read from the output channel, the inner goroutine will stay blocked trying to send to the channel for the rest of the program: Unlike deadlocks, leaks do not cause panics, so they are much harder to spot. Also, unlike data races, Go's tooling did not address them for a long time. Things started to change in Go 1.24 with the introduction of the package. Not many people talk about it, but is a great tool for catching leaks during testing. Go 1.26 adds a new experimental profile designed to report leaked goroutines in production. Here's how we can use it in the example above: As you can see, we have a nice goroutine stack trace that shows exactly where the leak happens. The profile finds leaks by using the garbage collector's marking phase to check which blocked goroutines are still connected to active code. It starts with runnable goroutines, marks all sync objects they can reach, and keeps adding any blocked goroutines waiting on those objects. When it can't add any more, any blocked goroutines left are waiting on resources that can't be reached — so they're considered leaked. Here's the gist of it: For even more details, see the paper by Saioc et al. If you want to see how (and ) can catch typical leaks that often happen in production — check out my article on goroutine leaks . The profile is experimental and can be enabled by setting at build time. Enabling the experiment also makes the profile available as a net/http/pprof endpoint, . According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile. 𝗗 runtime/pprof • 𝗚 Detecting leaks • 𝗣 74609 , 75280 • 𝗖𝗟 688335 • 𝗔 Vlad Saioc New metrics in the package give better insight into goroutine scheduling: Here's the full list: Per-state goroutine metrics can be linked to common production issues. For example, an increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand. You can read the new metric values using the regular function: The per-state numbers (not-in-go + runnable + running + waiting) are not guaranteed to add up to the live goroutine count ( , available since Go 1.16). All new metrics use counters. 𝗗 runtime/metrics • 𝗣 15490 • 𝗖𝗟 690397 , 690398 , 690399 • 𝗔 Michael Knyszek The new and methods in the package return iterators for a type's fields and methods: The new methods and return iterators for the input and output parameters of a function type: The new methods and return iterators for a value's fields and methods. Each iteration yields both the type information ( or ) and the value: Previously, you could get all this information by using a for-range loop with methods (which is what iterators do internally): Using an iterator is more concise. I hope it justifies the increased API surface. 𝗗 reflect • 𝗣 66631 • 𝗖𝗟 707356 • 𝗔 Quentin Quaadgras The new method in the package returns the next N bytes from the buffer without advancing it: If returns fewer than N bytes, it also returns : The slice returned by points to the buffer's content and stays valid until the buffer is changed. So, if you change the slice right away, it will affect future reads: The slice returned by is only valid until the next call to a read or write method. 𝗗 Buffer.Peek • 𝗣 73794 • 𝗖𝗟 674415 • 𝗔 Ilia Choly After you start a process in Go, you can access its ID: Internally, the type uses a process handle instead of the PID (which is just an integer), if the operating system supports it. Specifically, in Linux it uses pidfd , which is a file descriptor that refers to a process. Using the handle instead of the PID makes sure that methods always work with the same OS process, and not a different process that just happens to have the same ID. Previously, you couldn't access the process handle. Now you can, thanks to the new method: calls a specified function and passes a process handle as an argument: The handle is guaranteed to refer to the process until the callback function returns, even if the process has already terminated. That's why it's implemented as a callback instead of a field or method. is only supported on Linux 5.4+ and Windows. On other operating systems, it doesn't execute the callback and returns an error. 𝗗 Process.WithHandle • 𝗣 70352 • 𝗖𝗟 699615 • 𝗔 Kir Kolyshkin returns a context that gets canceled when any of the specified signals is received. Previously, the canceled context only showed the standard "context canceled" cause: Now the context's cause shows exactly which signal was received: The returned type, , is based on , so it doesn't provide the actual value — just its string representation. 𝗗 signal.NotifyContext • 𝗖𝗟 721700 • 𝗔 Filippo Valsorda An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation: In Go, an IP prefix is represented by the type. The new method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code: orders two prefixes as follows: This follows the same order as Python's and the standard IANA (Internet Assigned Numbers Authority) convention. 𝗗 Prefix.Compare • 𝗣 61642 • 𝗖𝗟 700355 • 𝗔 database64128 The package has top-level functions for connecting to an address using different networks (protocols) — , , , and . They were made before was introduced, so they don't support cancellation: There's also a type with a general-purpose method. It supports cancellation and can be used to connect to any of the known networks: However, a bit less efficient than network-specific functions like — because of the extra overhead from address resolution and network type dispatching. So, network-specific functions in the package are more efficient, but they don't support cancellation. The type supports cancellation, but it's less efficient. The Go team decided to resolve this contradiction. The new context-aware methods ( , , , and ) combine the efficiency of the existing network-specific functions with the cancellation capabilities of : I wouldn't say that having three different ways to dial is very convenient, but that's the price of backward compatibility. 𝗗 net.Dialer • 𝗣 49097 • 𝗖𝗟 490975 • 𝗔 Michael Fraenkel The default certificate already lists in its DNSNames (a list of hostnames or domain names that the certificate is authorized to secure). Because of this, doesn't trust responses from the real : To fix this issue, the HTTP client returned by now redirects requests for and its subdomains to the test server: 𝗗 Server.Client • 𝗖𝗟 666855 • 𝗔 Sean Liao People often point out that using for plain strings causes more memory allocations than . Because of this, some suggest switching code from to when formatting isn't needed. The Go team disagrees. Here's a quote from Russ Cox: Using is completely fine, especially in a program where all the errors are constructed with . Having to mentally switch between two functions based on the argument is unnecessary noise. With the new Go release, this debate should finally be settled. For unformatted strings, now allocates less and generally matches the allocations for . Specifically, goes from 2 allocations to 0 allocations for a non-escaping error, and from 2 allocations to 1 allocation for an escaping error: This matches the allocations for in both cases. The difference in CPU cost is also much smaller now. Previously, it was ~64ns vs. ~21ns for vs. for escaping errors, now it's ~25ns vs. ~21ns. Here are the "before and after" benchmarks for the change. The non-escaping case is called , and the escaping case is called . If there's just a plain error string, it's . If the error includes formatting, it's . Seconds per operation: Bytes per operation: Allocations per operation: If you're interested in the details, I highly recommend reading the CL — it's perfectly written. 𝗗 fmt.Errorf • 𝗖𝗟 708836 • 𝗔 thepudds Previously, allocated a lot of intermediate memory as it grew its result slice to the size of the input data. Now, it uses intermediate slices of exponentially growing size, and then copies them into a final perfectly-sized slice at the end. The new implementation is about twice as fast and uses roughly half the memory for a 65KiB input; it's even more efficient with larger inputs. Here are the geomean results comparing the old and new versions for different input sizes: See the full benchmark results in the commit. Unfortunately, the author didn't provide the benchmark source code. Ensuring the final slice is minimally sized is also quite helpful. The slice might persist for a long time, and the unused capacity in a backing array (as in the old version) would just waste memory. As with the optimization, I recommend reading the CL — it's very good. Both changes come from thepudds , whose change descriptions are every reviewer's dream come true. 𝗗 io.ReadAll • 𝗖𝗟 722500 • 𝗔 thepudds The package, introduced in version 1.21, offers a reliable, production-ready logging solution. Since its release, many projects have switched from third-party logging packages to use it. However, it was missing one key feature: the ability to send log records to multiple handlers, such as stdout or a log file. The new type solves this problem. It implements the standard interface and calls all the handlers you set up. For example, we can create a log handler that writes to stdout: And another handler that writes to a file: Finally, combine them using a : I'm also printing the file contents here to show the results. When the receives a log record, it sends it to each enabled handler one by one. If any handler returns an error, doesn't stop; instead, it combines all the errors using : The method reports whether any of the configured handlers is enabled: Other methods — and — call the corresponding methods on each of the enabled handlers. 𝗗 slog.MultiHandler • 𝗣 65954 • 𝗖𝗟 692237 • 𝗔 Jes Cok Test artifacts are files created by tests or benchmarks, such as execution logs, memory dumps, or analysis reports. They are important for debugging failures in remote environments (like CI), where developers can't step through the code manually. Previously, the Go test framework and tools didn't support test artifacts. Now they do. The new methods , , and return a directory where you can write test output files: If you use with , this directory will be inside the output directory (specified by , or the current directory by default): As you can see, the first time is called, it writes the directory location to the test log, which is quite handy. If you don't use , artifacts are stored in a temporary directory which is deleted after the test completes. Each test or subtest within each package has its own unique artifact directory. Subtest outputs are not stored inside the parent test's output directory — all artifact directories for a given package are created at the same level: The artifact directory path normally looks like this: But if this path can't be safely converted into a local file path (which, for some reason, always happens on my machine), the path will simply be: (which is what happens in the examples above) Repeated calls to in the same test or subtest return the same directory. 𝗗 T.ArtifactDir • 𝗣 71287 • 𝗖𝗟 696399 • 𝗔 Damien Neil Over the years, the command became a sad, neglected bag of rewrites for very ancient Go features. But now, it's making a comeback. The new is re-implemented using the Go analysis framework — the same one uses. While and now use the same infrastructure, they have different purposes and use different sets of analyzers: By default, runs a full set of analyzers (currently, there are more than 20). To choose specific analyzers, use the flag for each one, or use to run all analyzers except the ones you turned off. For example, here we only enable the analyzer: And here, we enable all analyzers except : Currently, there's no way to suppress specific analyzers for certain files or sections of code. To give you a taste of analyzers, here's one of them in action. It replaces loops with or : If you're interested, check out the dedicated blog post for the full list of analyzers with examples. 𝗗 cmd/fix • 𝗚 go fix • 𝗣 71859 • 𝗔 Alan Donovan Go 1.26 is incredibly big — it's the largest release I've ever seen, and for good reason: All in all, a great release! You might be wondering about the package that was introduced as experimental in 1.25. It's still experimental and available with the flag. P.S. To catch up on other Go releases, check out the Go features by version list or explore the interactive tours for Go 1.25 and 1.24 . P.P.S. Want to learn more about Go? Check out my interactive book on concurrency a vector from array/slice, or a vector to array/slice. Arithmetic: , , , , . Bitwise: , , , , . Comparison: , , , , . Conversion: , , . Masking: , , . Rearrangement: . Collect live goroutines . Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now. Mark reachable memory . Trace pointers from roots to find which synchronization objects (like channels or wait groups) are currently reachable by these roots. Resurrect blocked goroutines . Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots. Iterate . Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects. Report the leaks . Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked. Total number of goroutines since the program started. Number of goroutines in each state. Number of active threads. First by validity (invalid before valid). Then by address family (IPv4 before IPv6). Then by masked IP address (network IP). Then by prefix length. Then by unmasked address (original IP). Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply. Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes are always safe to apply, but don't necessarily indicate problems with the code. It brings a lot of useful updates, like the improved builtin, type-safe error checking, and goroutine leak detector. There are also many performance upgrades, including the new garbage collector, faster cgo and memory allocation, and optimized and . On top of that, it adds quality-of-life features like multiple log handlers, test artifacts, and the updated tool. Finally, there are two specialized experimental packages: one with SIMD support and another with protected mode for forward secrecy.