Debugging a rustc segfault on illumos
At Oxide, we use Helios as the base OS for the cloud computers we sell. Helios is a distribution of illumos, a Unix-based operating system descended from Solaris. As someone who learned illumos on the job, I've been really impressed by the powerful debugging tools it provides. I had a chance to use some of them recently to track down a segmentation fault in the Rust compiler, with the help of several of my colleagues. I learned a lot from the process, and I thought I'd write about it!

I'm writing this post for an audience of curious technologists who aren't necessarily familiar with systems work. If you're an experienced systems developer, parts of it are likely familiar to you—feel free to skip over them.

A couple of weeks ago, I wanted to make a change to the Rust standard library on illumos. I logged into my illumos box and cloned the Rust repository (revision ). Following the setup instructions, I configured the build system with the build profile. When I went to run the build, I saw an error with the following output:

Quite concerning! Like any good technologist I tried running the command again. But the segfault seemed to be completely deterministic: the program would crash at the same point every time.

Coincidentally, we had our fortnightly "Rust @ Oxide" virtual meetup at around that time. There wasn't much to discuss there, so we turned that meeting into a debugging session. (I love how my coworkers get excited about debugging strange issues.)

Like the compilers for many other languages, the Rust compiler is written in the language it compiles (in this case, Rust). In other words, the Rust compiler is self-hosting. Any self-hosting compiler needs to answer the question: how in the world do you compile the compiler if you don't already have a working compiler? This is known as the bootstrapping problem. There are several ways to address the problem, but the two most common are:

Use the previous version of the compiler. In other words, use version N-1 of the compiler to compile version N. For example, use Rust 1.75 to compile Rust 1.76. From where do you begin, though? The earliest versions of Rust were written in OCaml. So if you're spinning up Rust on a brand new platform and have an OCaml compiler available, you can actually start from there and effectively create your own lineage of compilers. There are also implementations of Rust in other languages, like in C++, which can be used to build some (typically pretty old) version of the compiler. Interestingly, these other implementations don't need to be perfect—for example, since they're only used to compile code that's known to be valid, they don't need to handle errors well. That's a large chunk of the complexity of a real compiler.

Cross-compile from another platform. As a shortcut, if you have a way to cross-compile code from another platform, you can use that to set up the initial compiler. This is the most common method for setting up Rust on a new platform. (But note that method 1 must be used on at least one platform.)

While bootstrapping from the previous version of Rust, the toolchain follows a series of stages, ranging from stage 0 to stage 2. In our case, since we're working with the standard library, we're only concerned with stage 0: the standard library compiled with the previous version of rustc. That is the build process that crashed.

The first thing to find is the version of rustc that's crashing. There are a few ways to find the compiler, but a simple find command works well. This command finds the compiler at .
Let's ask it for its version:

Can the bug be reproduced independently of the Rust toolchain? The toolchain does all sorts of non-standard things, so it's worth checking. The error output names the crate that was being compiled, so let's try building that separately. Again, there are a few ways to do this, but the easiest is to make a simple Cargo project that depends on the crate, and then run cargo build. I didn't have rustc 1.80.0 beta 1 on the machine, so I tried with the 1.80.0 release:

Yep, it crashes in the same spot. This is a minimal-enough example, so let's work with this.

When a program crashes, systems are typically configured to generate a core dump, also known as a core file. The first step while debugging any crash is to ensure that core dumps are generated, and then to find one to examine it. On illumos, many of the system-level administration tools have names ending in adm. The tool for managing core files is called coreadm. Let's run that:

This suggests that "per-process core dumps" are enabled. The lack of a pattern indicates that the defaults are used. Generally, on Unix systems the default is to generate a file named core in the current directory of the crashing process. A simple ls in our little test project doesn't show a core file, which means that it might be elsewhere. Let's just do a global search for it. This showed a few core files on my system, including one inside Cargo's registry. Bingo! That looks like a hit. (Why is it in the registry? Because when compiling a crate, Cargo sets the current working directory of the child process to the crate's directory.)

The next step is to move the file into another directory 1. After doing that, let's start examining it. The best way to examine a core file on illumos is with the Modular Debugger, mdb. mdb is a powerful tool that can be used to inspect the state of both live and dead processes, as well as the kernel itself. Using mdb with the core file is simple: just point it at the file.

The first step is to enable symbol demangling 2. The command to do that in mdb is $G, so let's run that:

(The output says "C++", but illumos's demangler can handle Rust symbols, too.)

Let's look at the CPU registers now. A register stores a small amount of data that the CPU can access very quickly. Core files typically have the contents of registers at the time of the crash, which can be very useful for debugging. In mdb, the command to print out registers is $r or ::regs. Here's the output:

All right, there's a lot going on here. A full accounting of the registers on x86-64 is beyond the scope of this post, but if you're interested here's a quick summary. The most important registers here are %rip, %rsp, and %rbp. All three of these are 64-bit addresses.

%rip is the instruction pointer, also known as the program counter. It is a special register that points to the next instruction to be executed. The CPU uses %rip to keep track of where it is in the program.

%rsp is the stack pointer. The call stack is a region of memory that is used to store function call information and local variables. The stack pointer points to the head of the stack. Note that on most architectures, including x86-64, the stack grows down in memory: when a function is called, a new stack frame is set up and the stack pointer is decremented by however much space the function needs.

%rbp is the base pointer, more commonly known as the frame pointer. It points to the base of the current stack frame 3.

We can also look at the call stack via the $C command. The stack turns out to be enormous (full output):

(The ! is used to send the output to a shell command, in this case one that counts the number of lines.)

It looks like the crash is in the parser.
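As a quick aside: the claim above that the stack grows downward is easy to observe from Rust itself. The sketch below is mine, for illustration only (it was not part of the debugging session). It prints the address of a local variable at increasing recursion depths, and the printed addresses should decrease as the recursion gets deeper, by roughly the size of one stack frame per call.

```rust
// Illustration: watch the call stack grow downward.
// Each recursive call gets a new stack frame at a lower address.
fn probe(depth: usize) -> usize {
    let local = 0u8;
    // `{:p}` prints the address of `local` within this frame.
    println!("depth {depth}: local at {:p}", &local);
    if depth < 4 {
        // Not a tail call: the addition keeps this frame alive while the
        // deeper calls run.
        probe(depth + 1) + local as usize
    } else {
        local as usize
    }
}

fn main() {
    probe(0);
}
```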
(Notably, the crash happened while compiling a crate whose name suggests automatic code generation. Generated code often tends to stress the parser in ways that manually written code does not.)

Based on the call stack, it looks like the parser is recursive in nature. A quick Google search confirms that the parser is a "simple hand-written recursive descent parser". This isn't surprising, since most production parsers are written this way.

Turning our attention to the instruction pointer %rip, we can use the ::dis command to disassemble the function at that address. (Full output; a flag ensures that addresses are not converted to very long function names.) So it looks like the crash is happening in a call instruction to another function. (Keep in mind that this information could be completely unreliable! The stack might be corrupted, the registers might be wrong, and so on. But it's what we have for now.)

On virtual memory systems, which include all modern desktop and server systems, each process gets the illusion that it has a very large amount of memory all to itself. This is called the address space of a process. The instructions, the call stack, and the heap all get their own regions of addresses in that space, called memory mappings. The 64-bit addresses that we saw earlier are all part of the address space.

mdb has a command to look up which part of memory an address falls in. Let's look at the stack pointer first: This tells us that the address is in a small 4 KiB range. What about the frame pointer? This appears to be in a different range, whose ending address is 1028 KiB, or 1 MiB plus one 4 KiB page 4, away from its starting address.

Something else that's relevant here is what permissions each range of addresses has. Like files on Unix, a block of virtual memory can have read, write, or execute permissions. (In this case, execute means that it is valid for the instruction pointer to point here 5.) On illumos, a tool called pmap can show these mappings. pmap works on both live processes and core files. Running it shows the permissions for the addresses we're interested in (full output):

The 1028 KiB range is read-write, and the 4 KiB range above that doesn't have any permissions whatsoever. This would explain the segfault. A segfault is an attempt to operate on a part of memory that the program doesn't have permissions for. Attempting to read from or write to memory which has no permissions is an example of that.

At this point, we have enough information to come up with a theory:

The thread had a call stack of 1028 KiB available to it. The stack pointer was only 320 bytes away from the end of that range, and the thread tried to create a frame of 1312 bytes. This caused the call stack to be exhausted: the thread ran out of space 6. When the thread ran out of space, it indexed into a 4 KiB section known as a guard page. The thread did not have any permissions to operate on that page; in fact, the page is designed to cause a segfault if accessed in any way. The program then (correctly) segfaulted.

But there are also other bits of evidence that this theory doesn't explain, or even cuts against. (This is what makes post-mortem debugging exciting! There are often contradictory-seeming pieces of information that need to be explained.)

The memory isn't marked as a call stack. That's not how call stacks are supposed to be marked! In the pmap output, there's a line showing how a real thread stack is labelled, and you'd expect our call stack to be marked the same way.

Why is the size of the allocation 1028 KiB? You'd generally expect stack sizes to be a round power of two.

Isn't 1028 KiB kind of small? The thread is a non-main thread, and the default stack size for Rust threads is 2 MiB. Why is our thread ~1 MiB and not 2 MiB?

How are call stack sizes determined? On Unix platforms, for the main thread, the call stack size is determined by ulimit -s (in KiB). On my illumos machine, this printed 10240, indicating a 10 MiB call stack.

For child threads, the call stack size is determined by whatever created them. For Rust, the default is 2 MiB.
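To make "whatever created them" concrete, here is a small sketch of mine (not code from the post) showing how the spawning program picks a child thread's stack size: std::thread::Builder::stack_size sets it explicitly, while plain thread::spawn uses the 2 MiB default, which the RUST_MIN_STACK environment variable can change for threads that don't set an explicit size.

```rust
use std::thread;

fn main() {
    // Spawn a worker with an explicit 1 MiB stack. This overrides both the
    // 2 MiB default and any RUST_MIN_STACK setting for this thread.
    let handle = thread::Builder::new()
        .name("small-stack".into())
        .stack_size(1024 * 1024)
        .spawn(|| {
            // Anything that recurses deeply here has only ~1 MiB to work
            // with, so it exhausts its stack much sooner than a thread
            // using the 2 MiB default would.
            println!("running on a 1 MiB stack");
        })
        .expect("failed to spawn thread");
    handle.join().expect("worker panicked");

    // thread::spawn (without a Builder) uses the default size instead.
    thread::spawn(|| println!("running on the default-sized stack"))
        .join()
        .unwrap();
}
```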
Why doesn't this crash happen on other platforms? If this is a crash in the parser, one would ordinarily expect it to arise everywhere. Yet it doesn't seem to occur on Linux, macOS, or Windows. What's special about illumos?

Setting RUST_MIN_STACK doesn't help. Rust-created thread stack sizes can be configured via the RUST_MIN_STACK environment variable. If we try to use that, it turns out that the build crashes at exactly the same spot. That's really strange! It is possible that the stack size was overridden at thread creation time; the documentation notes that an explicit stack size set at spawn time takes precedence over the environment variable. But that seems unlikely.

Looking towards the bottom of the call stack, there's something really strange: notice the jump in addresses partway down? Normally, stack addresses are decremented as new functions are called: the number goes down. In this case the stack address is incremented. The number went up. Strange. Also notice that this coincides with a call into a crate called stacker. Now that's a real lead!

What part of memory is that address in? Looking it up shows that it is part of the stack for thread 3, and pmap agrees:

What is stacker? Time for some googling! Per the documentation, stacker is: "A library to help grow the stack when it runs out of space. This is an implementation of manually instrumented segmented stacks where points in a program's control flow are annotated with 'maybe grow the stack here'. Each point of annotation indicates how far away from the end of the stack it's allowed to be, plus the amount of stack to allocate if it does reach the end."

Because the parser is recursive, it is susceptible to call stack exhaustion. The use of stacker is supposed to prevent, or at least mitigate, that.

How does stacker work? The library has a pretty simple API: the developer is expected to intersperse calls to stacker::maybe_grow within their recursive functions. If less than red_zone bytes of stack space remain, maybe_grow will allocate a new segment of stack_size bytes, and run the provided callback with the stack pointer pointing to the new segment.

How does rustc use stacker? The code is in this file. It requests an additional 1 MiB stack with a red zone of 100 KiB.

Why did stacker create a new stack segment? In our case, the call is at the very bottom of the stack, when plenty of space should be available, so stacker ordinarily should not need to allocate a new segment. Why did it do so here? The answer is in stacker's source code. There is code to guess the remaining stack size on many platforms, but it isn't enabled on illumos: there, the guess always returns None.

With this information in hand, we can flesh out our call stack exhaustion theory:

Some file in the crate being compiled was triggering the crash by requiring more than 1 MiB of stack space: the parser, running against that file, needed more than 1 MiB of stack space, but less than 2 MiB.

Had this bug occurred on other platforms like Linux, this issue would have been a showstopper. However, it wasn't visible on those platforms because:

Threads created by Rust use a 2 MiB stack by default. rustc requested that stacker create a 1 MiB stack segment, but only if less than 100 KiB of stack space was left. On the other platforms, stacker could see that well over 100 KiB of stack space was left, and so it did not allocate a new segment. On illumos, stacker could not see how much stack was left, and so it allocated a new 1 MiB segment. That 1 MiB segment was simply not enough to parse the file.

rustc didn't call stacker enough! In order for it to work, maybe_grow needs to be interspersed throughout the recursive code. But some recursive parts did not appear to have called it.

(It is somewhat ironic that stacker, a library meant to prevent call stack exhaustion, was actively making life worse here.)

Where does the 1028 KiB come from? Looking at the source code, it looks like the library first computes the number of requested pages by dividing the requested stack size by the page size, rounding up. Then it adds 2 to that. In our case: the requested stack size is 1 MiB, which with 4 KiB pages works out to 256 pages; adding 2 gives 258 pages, which is 1032 KiB. This explains both the 1028 KiB allocation (one guard page after the stack), and the 4 KiB guard page we're crashing at (one guard page before the stack).
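Here is a minimal sketch of that pattern, written for illustration rather than taken from rustc (rustc wraps the same call in a small helper of its own). It assumes a Cargo dependency on the stacker crate and uses the 100 KiB red zone and 1 MiB segment size mentioned above.

```rust
// Illustration of the stacker pattern described above; not rustc's code.
// Assumes `stacker` is listed as a dependency in Cargo.toml.

const RED_ZONE: usize = 100 * 1024; // 100 KiB
const STACK_PER_RECURSION: usize = 1024 * 1024; // 1 MiB

// A toy recursive descent over nested parentheses, instrumented the way a
// real parser might be: each level checks the remaining stack space first.
// On platforms where stacker cannot guess the remaining space, maybe_grow
// assumes the worst and switches to a fresh 1 MiB segment.
fn nesting_depth(input: &[u8], pos: usize) -> usize {
    stacker::maybe_grow(RED_ZONE, STACK_PER_RECURSION, || {
        if input.get(pos) == Some(&b'(') {
            1 + nesting_depth(input, pos + 1)
        } else {
            0
        }
    })
}

fn main() {
    // Deeply nested input, standing in for a large generated file.
    let input = vec![b'('; 100_000];
    println!("nesting depth: {}", nesting_depth(&input, 0));
}
```

Run against deeply nested input, the recursion keeps working because stacker keeps handing it fresh segments; without the maybe_grow wrapper, the same input is likely to exhaust an ordinary thread stack.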
If the issue is that a 1 MiB stack isn't enough, it should be possible to reproduce this on other platforms by setting their stack size to something smaller than the 2 MiB default. With a stack size of 1 MiB or less, we would expect that stacker gets called as before, and that there are two possibilities: either stacker decides there is enough stack space and doesn't create a new segment, or it decides there isn't enough and does create a new 1 MiB segment. In either case, 1 MiB is simply not enough to parse the file, and the program crashes.

Let's try the compilation on Linux with a reduced stack size. This does crash as expected. The full output is here. Some of the symbols are missing, but the crash does seem to be in parser code. (At this point, we could have gone further and tried to make a debug-assertions build of rustc – but it was already pretty clear why the crash was happening.)

Call stack exhaustion in the parser suggests that the crash is happening in some kind of large, automatically generated file. But what file is it? It's hard to tell by looking at the core file itself, but we have another dimension of debugging at hand: syscall tracers! These tools print out all the syscalls made by a process. Most OSes have some means to trace syscalls: strace on Linux, dtruss on macOS, Process Monitor on Windows, and truss on illumos 7. Since we're interested in file reads, we can try filtering the trace down to the open and openat syscalls. You need to open a file to read it, after all. (Alternatively, we can also simply not filter out any syscalls, dump the entire trace to a file, and then look at it afterwards.)

On illumos, we tell truss to run the build, filtering syscalls to open and openat, and following child processes: This prints out every file that the child tries to open (full output):

It looks like the crash is in a file inside the crate's out directory. With Cargo, a file being in an out directory is a pretty strong indication that it is generated by a build script. On Linux, a similar command is: This command also blames the same file.

What does this file look like, anyway? Here's my copy. It's pretty big and deeply nested! It does look large and complex enough to trigger call stack exhaustion.

Syscall traces would definitely be somewhat harder to get if the crash weren't so easily reproducible. Someone smarter than me should write about how to figure this out using just the core file. The file's fully loaded into memory so it seems like it should be possible.

Going back to the beginning: the reason I went down this adventure was because I wanted to make an unrelated change to the Rust standard library. But the stage 0 compiler being broken meant that it was impossible to get to the point where I could build the standard library as-is, let alone test that change. How can we work around this?

Well, going back to basics, where did the stage 0 compiler come from? It came from Rust's CI, and it wasn't actually built on illumos! (Partly because there's no publicly-available CI system running illumos.) Instead, it was cross-compiled from Linux to illumos. Based on this, my coworker Joshua suggested that I try and do whatever Rust's CI does to build a stage 0 compiler for illumos. Rust's CI uses a set of Docker images to build distribution artifacts. In theory, building a patched rustc should be as simple as running these commands on my Linux machine:

In reality, there were some Docker permissions issues due to which I had to make a couple of changes to the script. Overall, though, it was quite simple. Here's the patch I built the compiler with, including the changes to the CI scripts. The result of building the compiler was a set of tarballs, just like the ones published by Rust's CI. After copying the files over to my illumos machine, I wasn't sure which tarballs to extract. So I made a small change to the bootstrap script to use my patched tarballs.
With this patch, I was able to successfully build Rust's standard library on illumos and test my changes. Hooray! (Here's what I was trying to test.)

Update 2024-08-05: After this post was published, jyn pointed out on Mastodon that the component that crashed is actually optional, and that I could have also worked around the issue by disabling it in the build system's configuration. Thanks!

The bug occurred due to a combination of several factors. It also revealed a few other issues, such as the lack of an environment variable workaround and some missing error reporting. Here are some ways we can make the situation better, and have an easier time debugging similar issues in the future.

rustc's parser isn't using stacker enough. The basic problem underneath it all is that the part of the parser that triggered the bug wasn't calling stacker often enough to make new stack segments. The parser should be calling it more than it is today. Filed as rust-lang/rust#128422.

stacker cannot detect the stack size on illumos. This is something that we should fix in stacker, but it is actually a secondary issue here. On other platforms, stacker's ability to detect the stack size was masking the bug. Fixing this requires two changes: a PR adding the needed function, and a PR using that function to detect the stack size on illumos.

stacker-created segments don't print a nice message on stack exhaustion. This is a bit ironic, because stacker is supposed to prevent stack exhaustion. But when it does happen, it would be nice if stacker printed out a message like standard Rust does. This is rust-lang/stacker#59.

On illumos, the Rust runtime doesn't print a message on stack exhaustion. Separate from the previous point, on illumos the Rust runtime doesn't print a message on stack exhaustion even when using native stacks. Filed as rust-lang/rust#128568.

Rust's CI doesn't run on illumos. At Oxide, we have an existential dependency on Rust targeting illumos. Even a shadow CI that ran on nightly releases would have caught this issue right away. We're discussing the possibilities for this internally; stay tuned!

stacker segment sizes can't be controlled via the environment. Being able to control stack sizes with RUST_MIN_STACK is a great way to work around issues, but it doesn't appear that stacker segment sizes can be controlled in this manner. Maybe that functionality should be added to stacker, or to rustc itself? Opened a discussion on internals.rust-lang.org.

Maybe a crater run with a smaller stack size? It would be interesting to see if there are other parts of the Rust codebase that need to call stacker more as well.

The build tooling could suggest disabling optional components. Since the crate that crashed was part of an optional component, the tooling could notice when a build fails in such a component and recommend disabling it. Added 2024-08-05, suggested by jyn.

To me, this is the most exciting part of debugging: what kinds of changes can we make, both specific and systemic ones, to make life easier for our future selves?

This was a really fun debugging experience because I got to learn about several illumos debugging tools, and also because we could synthesize information from several sources to figure out a complex issue. (Thankfully, the root cause was straightforward, with no memory corruption or other "spooky action at a distance" involved.)

Debugging this was a real team effort. I couldn't have done it without the assistance of several of my exceptional colleagues. In no particular order: Joshua M. Clulow, Matt Keeter, Cliff Biffle, Steve Klabnik, and artemis everfree. Thanks to all of you!

I neglected to do this during my own debugging session, which led to some confusion when I re-ran the process and found that the core file had been overwritten. ↩︎

Name mangling is a big topic of its own, but the short version is that the Rust compiler uses an algorithm to encode function names into the binary. The encoding is designed to be reversible, and the process of doing so is called demangling.
(Other languages like C++ do name mangling, too.) ↩︎

You might have heard about "frame pointer omission", which is a technique to infer the base of stack frames rather than storing it in %rbp explicitly. In this case, the frame pointer is not omitted. ↩︎

A page is the smallest amount of physical memory that can be atomically mapped to virtual memory. On x86-64, the page size is virtually always 4 KiB. ↩︎

Memory being both writable and executable is dangerous, and modern systems do not permit this by default for security reasons. Some platforms like iOS even make it impossible for memory to be writable and executable, unless the platform holder gives you the corresponding permissions. ↩︎

This is generally known as a "stack overflow", but that term can also mean a stack-based buffer overflow. Throughout this document, we use "call stack exhaustion" to avoid confusion. ↩︎

There is likely some way to get rustc itself to print out which files it opened, but the beauty of system call tracers is that you don't need to know anything about the program you're tracing. ↩︎