Latest Posts (10 found)
Pat Shaughnessy 1 week ago

Compiling Ruby To Machine Language

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. Here's an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I'm still finishing this up, so this content is fresh off the page! It's been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it's very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby's runtime performance.

To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later, Ruby executes the machine language version instead of the original YARV instructions. To keep track of these counts, YJIT saves an internal counter next to the YARV instruction sequence for each function or block. Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT-related values: jit_entry and jit_entry_calls. As we'll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry, Figure 4-5 also shows jit_entry_calls, YJIT's internal counter. Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls. Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter starts at zero and increases by 1 each time Range#each calls the block at (3). When jit_entry_calls reaches a particular threshold, YJIT compiles the YARV instructions into machine language.
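The counting mechanism above can be sketched in plain Ruby. This is only an illustration: the real counter lives inside YJIT's C/Rust implementation, and the names here (ISeqEntry, THRESHOLD, interpret) are made up for the sketch.

```ruby
# Illustrative sketch of YJIT's call-counting trigger, NOT Ruby's
# actual implementation. jit_entry starts as nil and is replaced by
# "compiled code" once jit_entry_calls reaches the threshold.
class ISeqEntry
  THRESHOLD = 30 # YJIT's default call threshold for small programs

  attr_reader :jit_entry, :jit_entry_calls

  def initialize
    @jit_entry = nil     # later: pointer to compiled machine code
    @jit_entry_calls = 0 # how many times this block has been called
  end

  def call
    if @jit_entry
      @jit_entry.call    # run the compiled version
    else
      @jit_entry_calls += 1
      compile! if @jit_entry_calls >= THRESHOLD
      interpret          # fall back to the YARV interpreter
    end
  end

  private

  def compile!
    @jit_entry = -> { interpret } # stand-in for generated machine code
  end

  def interpret
    :interpreted
  end
end

entry = ISeqEntry.new
40.times { entry.call }
puts entry.jit_entry_calls # stops counting once compilation happens
```

After the 30th call, jit_entry is no longer nil and later calls take the compiled path directly.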
By default, for small Ruby programs, YJIT in Ruby 3.5 uses a threshold of 30. Larger programs, like Ruby on Rails web applications, use a larger threshold of 120. (You can also change the threshold by passing --yjit-call-threshold when you run your Ruby program.) While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks. YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program's behavior, and avoid compiling code that your program doesn't need. As we'll see next, a single YJIT block doesn't correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks. Let's see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT increments the jit_entry_calls counter again, just before Ruby runs the block for the 30th time. Since jit_entry_calls has now reached the threshold value of 30, YJIT triggers the compilation process. YJIT compiles the first YARV instruction, getlocal_WC_1, and saves machine language instructions that perform the same work as getlocal_WC_1 into a new YJIT block: On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block. On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1. Next, the YJIT compiler continues and compiles the second YARV instruction from the left side of Figure 4-7: getlocal_WC_0 at index 2. On the left side, Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6.
But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0. Let's take a look inside this new block. YJIT compiles, or translates, the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the following machine language instructions into this new block: Figure 4-8 shows a closer view of the new YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language mnemonics corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. The YARV instructions on the left are: getlocal_WC_1, which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0, which loads a local variable from the current stack frame and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into registers on my M1 microprocessor: x1 and x9. If you're curious and would like to learn more about what the machine language instructions mean and how they work, the section "Adding Two Integers Using Machine Language" discusses the instructions for this example in more detail. Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: it doesn't know the types of the addition's arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types? Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions.
And, of course, adding or concatenating two strings is an entirely different operation altogether. In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus, YJIT needs to know exactly what types of values the Ruby program might add at (3) in Listing 4-1. You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += i block at (3) is always adding one integer to another. But YJIT doesn't know this. YJIT uses a clever trick to solve this problem. Instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in. YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9. Figure 4-9 shows the YARV instructions on the left, and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-9, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch. Since this new branch doesn't point to a block yet, YJIT sets up the branch to point to a branch stub instead.
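Listing 4-1 itself isn't reproduced in this excerpt, but based on the description above (a range from 1 through 40, iterated with Range#each, and a block that sums each element) it presumably looks something like this reconstruction:

```ruby
# A guess at the shape of Listing 4-1, reconstructed from the text:
# (1) a range from 1 through 40, and (3) the sum += i block that
# becomes the hot spot YJIT eventually compiles.
sum = 0
(1..40).each do |i| # (1) Range#each calls the block 40 times
  sum += i          # (3) the hot block
end
puts sum
```

Running this with `ruby --yjit` (and, say, `--yjit-call-threshold=30`) would trigger compilation of the block on its 30th call, as described above.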

Pat Shaughnessy 2 weeks ago

YARV’s Internal Stack and Your Ruby Stack

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. The content of Chapter 3, about the YARV virtual machine, hasn't changed much since 2014. However, I did update all of the diagrams to account for some new values YARV now saves inside of each stack frame. And some of the common YARV instructions were renamed as well. I also moved some content that was previously part of Chapter 4 here into Chapter 3. Right now I'm rewriting Chapter 4 from scratch, describing Ruby's new JIT compilers.

As we'll see in a moment, YARV uses a stack internally to track intermediate values, arguments, and return values. YARV is a stack-oriented virtual machine. In addition to its own internal stack, YARV keeps track of your Ruby program's call stack, recording which methods call which other methods, functions, blocks, lambdas, and so on. In fact, YARV is not just a stack machine: it's a double-stack machine! It has to track the arguments and return values not only for its own internal instructions but also for your Ruby program. Figure 3-1 shows YARV's basic registers and internal stack. YARV's internal stack is on the left. The SP label is the stack pointer, or the location of the top of the stack. On the right are the instructions that YARV is executing. PC is the program counter, or the location of the current instruction. You can see the YARV instructions that Ruby compiled from the puts 2+2 example on the right side of Figure 3-1. YARV stores both the SP and PC registers in a C structure called rb_control_frame_t, along with the current value of Ruby's self variable and some other values not shown here. At the same time, YARV maintains another stack of these rb_control_frame_t structures, as shown in Figure 3-2.
This second stack of rb_control_frame_t structures represents the path that YARV has taken through your Ruby program, and YARV's current location. In other words, this is your Ruby call stack: what you would see if you ran puts caller. The CFP register points to the current frame. Each stack frame in your Ruby program's stack contains, in turn, different values for the self, PC, and SP registers, as shown in Figure 3-1. Ruby also keeps track of the type of code running at each level in your Ruby call stack, indicated by the "[BLOCK]" and "[METHOD]" notation in Figure 3-2. To help you understand this a bit better, here are a couple of examples. I'll begin with the simple 2+2 example from Chapters 1 and 2, shown again in Listing 3-1. This one-line Ruby script doesn't have a Ruby call stack, so I'll focus on the internal YARV stack for now. Figure 3-3 shows how YARV will execute this script, beginning with the first instruction, putself. As you can see in Figure 3-3, YARV starts the program counter (PC) at the first instruction, and initially the stack is empty. Now YARV executes the putself instruction, pushing the current value of self onto the stack, as shown in Figure 3-4. Because this simple script contains no Ruby objects or classes, the self pointer is set to the default top self object. This is an instance of the Object class that Ruby automatically creates when YARV starts. It serves as the receiver for method calls and the container for instance variables in the top-level scope. The top self object contains a single, predefined to_s method, which returns the string "main." You can see this for yourself by running ruby -e 'puts self' in the console, which prints main. YARV will use this self value on the stack when it executes the opt_send_without_block instruction: self is the receiver of the puts method because I didn't specify a receiver for this method call. Next, YARV executes putobject 2.
It pushes the numeric value 2 onto the stack and increments the PC again, as shown in Figure 3-5. This is the first step of the receiver (arguments) operation pattern described in "How Ruby Compiles a Simple Script" on page 34. First, Ruby pushes the receiver onto the internal YARV stack. In this example, the Integer object 2 is the receiver of the message/method +, which takes a single argument, also a 2. Next, Ruby pushes the argument 2, as shown in Figure 3-6. Finally, Ruby executes the + operation. In this case, opt_plus is an optimized instruction that adds two values, the receiver and the argument, as shown in Figure 3-7. As you can see in Figure 3-7, the opt_plus instruction leaves the result, 4, at the top of the stack. Now Ruby is perfectly positioned to execute the puts function call: the receiver self is first on the stack, and the single argument, 4, is at the top of the stack. (I'll describe how method lookup works in Chapter 6.) Next, Figure 3-8 shows what happens when Ruby executes the puts method call. As you can see, the opt_send_without_block instruction leaves the return value, nil, at the top of the stack. Finally, Ruby executes the last instruction, leave, which finishes the execution of our simple, one-line Ruby program. Of course, when Ruby executes the puts call, the C code implementing the puts function will actually display the value 4 in the console output.
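You can inspect this instruction sequence yourself with RubyVM::InstructionSequence, which ships with CRuby. The exact formatting of the disassembly varies between Ruby versions, but the instructions discussed above should all appear:

```ruby
# Disassemble the puts 2+2 example to see the YARV instructions
# discussed above: putself, putobject, opt_plus, opt_send_without_block,
# and leave. Output formatting varies by Ruby version.
insns = RubyVM::InstructionSequence.compile("puts 2+2").disasm
puts insns
```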

Pat Shaughnessy 3 weeks ago

Compiling a Call to a Block

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. This week's excerpt is from Chapter 2, about Ruby's compiler. Whenever I think about it, I'm always surprised that Ruby has a compiler like C, Java or any other programming language. The only difference is that we don't normally interact with Ruby's compiler directly. The developers who contributed Ruby's new parser, Prism, also had to rewrite the Ruby compiler because Prism now produces a completely different, redesigned abstract syntax tree (AST). Chapter 2's outline is more or less the same as it was in 2014, but I redrew all of the diagrams and updated much of the text to match the new AST nodes and other changes for Prism.

Next, let's compile my 10.times do example from Listing 1-1 in Chapter 1 (see Listing 2-2). Notice that this example passes a block to the times method. This is interesting because it will give us a chance to see how the Ruby compiler handles blocks. Figure 2-13 shows the AST for the 10.times do example again. The left side of Figure 2-13 shows the AST for the 10.times function call: the call node and the receiver 10, represented by an integer node. On the right, Figure 2-13 shows the beginning of the AST for the block: do |n| puts n end, represented by the block node. You can see Ruby has added a scope node on both sides, since there are two lexical scopes in Listing 2-2: the top level and the block. Let's break down how Ruby compiles the main portion of the script shown on the left of Figure 2-13. As before, Ruby starts with the first PM_SCOPE_NODE and creates a new snippet of YARV instructions, as shown in Figure 2-14. Next, Ruby steps down the AST nodes to PM_CALL_NODE, as shown in Figure 2-15.
At this point, there is still no code generated, but notice in Figure 2-13 that two arrows lead from PM_CALL_NODE: one to PM_INTEGER_NODE, which represents the 10 in the 10.times call, and another to the inner block. Ruby will first continue down the AST to the integer node and compile the 10.times method call. The resulting YARV code, following the same receiver-arguments-message pattern we saw in Figures 2-7 through 2-11, is shown in Figure 2-16. Notice that the new YARV instructions shown in Figure 2-16 push the receiver (the integer object 10) onto the stack first, after which Ruby generates an instruction to execute the times method call. But notice, too, the block in <main> argument in the send instruction. This indicates that the method call also contains a block argument: do |n| puts n end. In this example, the arrow from PM_CALL_NODE to the second PM_SCOPE_NODE has caused the Ruby compiler to include this block argument. Ruby continues by compiling the inner block, beginning with the second PM_CALL_NODE shown at right in Figure 2-13. Figure 2-17 shows what the AST for that inner block looks like. Notice Ruby inserted a scope node at the top of this branch of the AST as well. Figure 2-17 shows the scope node contains two values: argc=1 and locals: [n]. These values were empty in the parent scope node, but Ruby set them here to indicate the presence of the block parameter n. From a relatively high level, Figure 2-18 shows how Ruby compiles the inner block. You can see the parent PM_SCOPE_NODE at the top, along with the YARV code from Figure 2-16. And below that, Figure 2-18 shows the inner scope node for the block, along with the YARV instructions for the block's call to puts n. Later in this chapter we'll learn how Ruby handles parameters and local variables, like n in this example, and why Ruby generates these instructions for puts n.
The key point for now is that Ruby compiles each distinct scope in your Ruby program—methods, blocks, classes, or modules, for example—into a separate snippet of YARV instructions.
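You can see this one-snippet-per-scope behavior directly by disassembling the example: the block shows up as a separate "block in" instruction sequence in the output.

```ruby
# Each lexical scope becomes its own instruction sequence. Disassembling
# the 10.times example shows two sections: one for the main scope and a
# separate "block in" section for the block's own YARV instructions.
src = "10.times do |n| puts n end"
puts RubyVM::InstructionSequence.compile(src).disasm
```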

Pat Shaughnessy 1 month ago

Parsing: How Ruby Understands Your Code

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. Update: I've made a lot of progress so far this year. I had time to completely rewrite Chapters 1 and 2, which cover Ruby's new Prism parser and the Ruby compiler, which now handles the Prism AST. I also updated Chapter 3 about YARV, and right now I'm working on rewriting Chapter 4, which will cover YJIT and possibly other Ruby JIT compilers. Here's an excerpt from the new version of Chapter 1. Many thanks to Kevin Newton, who reviewed the content about Prism and had a number of corrections and great suggestions. Also thanks to Douglas Eichelberger, who had some great feedback as well. I'll post more excerpts from Chapters 2, 3 and 4 in the coming weeks. Thanks for everyone's interest in Ruby Under a Microscope!

Once Ruby converts your code into a series of tokens, what does it do next? How does it actually understand and run your program? Does Ruby simply step through the tokens and execute each one in order? No. Your code still has a long way to go before Ruby can run it. The next step on its journey through Ruby is called parsing, where words or tokens are grouped into sentences or phrases that make sense to Ruby. When parsing, Ruby takes into account the order of operations, methods, blocks, and other larger code structures. Ruby's parsing engine defines Ruby's syntax rules. Reading in tokens, Ruby matches the token types, and the order in which the tokens appear, against a large series of patterns. These patterns, indeed, are the heart and soul of the Ruby language. How we write a function call, how we define a method using the def keyword, how we write classes and modules - the patterns Ruby looks for define the language.
Ruby's parsing algorithm has three high-level steps: identify, recurse, and compare. Let's break down these ideas further by following Ruby through the "Hello World" program. Afterwards, we'll look at a second, slightly more complicated example. As we saw in the previous section, Ruby first converts the text in this code file into tokens. For Hello World, Ruby's tokenizer produces these five tokens: To make the following diagrams simpler, let's redraw these tokens in a more compact format: Using a single gray line of text, Figure 1-15 shows the five tokens from Figure 1-14 in a more compact format. First, PM_TOKEN_IDENTIFIER represents the word "puts" from the beginning of the program. Next, three tokens make up the string literal value: PM_TOKEN_STRING_BEGIN for the first double quote, followed by PM_TOKEN_STRING_CONTENT for the words Hello and World, and PM_TOKEN_STRING_END for the second quote. Finally, the program ends with PM_TOKEN_EOF to mark the end of the source code file. Now let's follow Ruby as it processes the Hello World example using the three steps: identify, recurse and compare. First, identify. How does Ruby understand what the first token, PM_TOKEN_IDENTIFIER, means? Figure 1-16 represents the state of Ruby's parser when it starts to parse this code. At this moment, Ruby is just getting started by inspecting the puts identifier. One of the patterns Ruby looks for matches the identifier; but what does this identifier mean? Ruby knows puts could be a local variable, or it could be the name of a function to call. Since there are no local variables defined in this program, Ruby determines that the puts identifier represents a function the program is calling. (It's also possible that the program is about to create a new local variable like this: puts = "Hello World". If that were the case, Ruby would see the assignment operator next and parse things differently.) What happens next?
After matching the token to the function call pattern, Ruby records this match in a data structure called an abstract syntax tree (AST). Ruby and most other programming languages use ASTs to record the results of parsing tokens like this. As we'll see, the AST's tree structure is well suited for holding the nested, recursive structure of computer programs. Figure 1-17 shows the first node Ruby saves in the AST. In a moment, Ruby will begin to add more nodes to the AST. Before proceeding to the next token, let's imagine the syntax pattern for a function call: In Ruby the parentheses are optional, so this pattern also applies: NOTE The original version of the Ruby parser used patterns or grammar rules like this directly, with a tool called a parser generator. However, starting with Ruby 3.3, Ruby uses a new parser called Prism, which detects these patterns directly using hand-written C code. After parsing the first token, Ruby inspects the second token. According to the function call pattern, Ruby knows the second token might represent the first argument to the function call. But how many arguments are there? And what is each argument? The program in Listing 1-11 is very simple, but it could have instead printed a very complex expression - the arguments to puts could have run on for many lines and used hundreds of tokens. Second, recurse. To parse each of the arguments to puts, Ruby has to call itself. Figure 1-18 shows two levels of the Ruby parser's call stack; the top line shows Ruby parsing the puts identifier token and matching the function call pattern. The second line shows how Ruby called itself to parse the second token, PM_TOKEN_STRING_BEGIN, the leading quote of the string literal. Think of these lines as the backtrace of the Ruby parser. Figure 1-18 also shows the value 14 on the right side. While calling itself recursively, Ruby passes in a numeric value called the binding power. We'll return to this later.
Now that Ruby has called itself, Ruby starts the three-step process all over again: identify, recurse and compare. This time, Ruby has to identify what the PM_TOKEN_STRING_BEGIN token means. This token always indicates the start of a string value. In this example PM_TOKEN_STRING_BEGIN represents the double quote that appears after puts. But the same token might represent a single quote or one of the other ways you can write a string in Ruby, for example using %Q or %q. Ruby's new parser, Prism, next parses the string contents directly by processing the following two tokens: In this example, Ruby's parser is done after finding the PM_TOKEN_STRING_END token and can continue to the next step. More complex strings - strings that contain interpolated values using #{} for example - might have required Ruby to call itself yet again to process more nested expressions. But for the simple "Hello World" string Ruby is done. To record the string value, Ruby creates a new AST node called PM_STRING_NODE. Figure 1-20 shows the two AST nodes Ruby has created so far: the call node created earlier, and now a new string node. Ruby's parser is a recursive descent parser. This computer science term describes parsers that resemble the grammar or syntax rules of the programs they parse, and that call themselves recursively in a top-down manner as they process nested structures. Many modern programming languages use this general approach. To recap, the three steps are:

Identify: First, Ruby identifies what the next token represents. Ruby does this by comparing the token's type - and possibly the types of the following tokens - with a large series of patterns. If one pattern matches, Ruby understands what your code means. If not, Ruby emits a syntax error.

Recurse: Second, Ruby calls itself. Each value in one of the syntax patterns can itself be a subexpression - a smaller program that Ruby needs to parse. To do this, Ruby calls itself recursively.

Compare: Third, Ruby compares the current token with the next token to determine which has higher precedence. This comparison leads Ruby down a specific path, processing the tokens in a certain order.
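Since Ruby 3.3, Prism also ships as a Ruby library, so you can parse the Hello World program yourself and inspect the nodes described above. A minimal example (the method names here, Prism.parse and CallNode#name, are Prism's public Ruby API; exact node details can vary between Prism versions):

```ruby
require "prism"

# Parse the Hello World program and look at the resulting AST.
# The first statement should be the call node for puts, holding
# the string node as its argument.
result = Prism.parse('puts "Hello World"')
call = result.value.statements.body.first
puts call.type # the node type of the puts call
puts call.name # the method being called
```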

Pat Shaughnessy 9 months ago

Write Barriers

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. Ruby's garbage collection implementation is complex and confusing - yet powerful and beautiful at the same time. Chapter 13 summarizes and explains advanced aspects of Ruby's garbage collector. I love learning about the difficult bits at a deep enough level to be able to explain them to someone else. In RUM I often show snippets from Ruby's C source code in gray callouts. These sections are for readers who already understand C syntax, or who would like to learn it. They can give you signposts in case you ever decide to read Ruby's source code for yourself.

Write barriers warn Ruby's garbage collection algorithm whenever your program creates a new object that might have to be marked. You can think of the barriers as small boxes that surround arrays, hashes and other data structures you might have in your program. For example, in Listing 13-1 Ruby places a write barrier around arr, the array that contains all of the new objects: Figure 13-9 repeats Figure 13-6, which showed the array arr just after Ruby removed it from the mark stack; all of the elements of arr are still on the stack. On the left, the dotted rectangle represents a write barrier around this array. (In reality, there's nothing special about this array; Ruby uses write barriers for all arrays, hashes and other data structures.) The write barrier, as the name implies, jumps into action whenever your program writes to the array inside.
In this example, the barrier will be triggered when the program runs the line of code at (3) in Listing 13-1, inside the loop: If your program was running between incremental GC steps when Ruby added a new element to the array, Ruby would move the array back on to the mark stack: Figure 13-10 shows the array arr on the left, inside of a write barrier. The line of code just above, arr.push, triggers the write barrier because it writes to the array's contents. The write barrier then moves the array arr back on to the mark stack, shown on the right. Now during the next GC step, Ruby will process the array's children again, even if it had processed them earlier. This allows Ruby to find and mark the new object just added to the array. The mark stack is how Ruby remembers its place inside of the GC algorithm, between one incremental GC step and the next. Whenever an incremental GC step starts, Ruby continues to mark the objects present on the mark stack, whether they were pending from last time or were modified in the meantime. Whenever your program modifies an array, hash or other value protected by write barriers, Ruby calls this C code internally. The line at (1) in Listing 13-2 pushes a modified object back on to the mark stack, as shown above in Figure 13-10. Ruby first checks at (2) whether it is currently in the midst of an incremental garbage collection. If it is, Ruby continues to the line at (3), and checks whether the given object was already completely processed: whether Ruby marked it and all of its children. (Recall the non-inclusive term "black" from the tricolor GC algorithm.) For example, in Figure 13-9 the value was completely processed, since Ruby marked it, moved all of its children on to the mark stack, and removed it from the mark stack. If Ruby did already mark the value, and if Ruby already pushed the value's children on to the mark stack, then Ruby knows it needs to process the value again - since your program modified it.
In this case, Ruby continues on to the line at (4), which pushes the value back on to the mark stack. Later Ruby will iterate over its children again, pushing each of them back on to the mark stack. Looking ahead to the next section, "Generational GC" on page 22, write barriers use the code in the else clause at (5) to handle "old" values. We'll cover generational GC next. But first, Experiment 13-1 will take a look at incremental GC in action.
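The logic described above can be sketched in Ruby. This is only an illustration of the checks at (2), (3), (4) and (5): the real code is C inside Ruby's GC, and the names here (MarkStack, write_barrier, marked, children_pushed) are invented for the sketch.

```ruby
# Illustrative sketch of the write-barrier checks described above,
# NOT Ruby's actual C implementation.
MarkStack = []

def write_barrier(obj, during_incremental_marking:)
  return unless during_incremental_marking          # (2) only during incremental GC
  if obj[:marked] && obj[:children_pushed]          # (3) value fully processed ("black")?
    MarkStack.push(obj)                             # (4) push it back to be re-scanned
  else
    # (5) the else clause: generational GC handles "old" values here
  end
end

arr = { marked: true, children_pushed: true }       # a fully processed value
write_barrier(arr, during_incremental_marking: true)
puts MarkStack.length                               # arr is back on the mark stack
```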

Pat Shaughnessy 9 months ago

Using Different Size Pools

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. The Ruby team has done a tremendous amount of work over the past decade on Ruby's garbage collection (GC) implementation. In fact, Ruby's new GC is one of the key reasons Ruby 3 is so much faster than Ruby 2. To bring all of this work to light, I decided to rewrite Chapter 12 from scratch, covering garbage collection in Ruby more accurately and in more depth. But then, after a few months, I realized I had gotten carried away and wrote too much material for one chapter. So the updated book will have two new chapters on garbage collection: one on garbage collection basics and a second new chapter on incremental and generational garbage collection. Here's a small excerpt.

Ruby 3.2 and later provide a way to see statistics about size pools: the GC.stat_heap class method. GC.stat_heap returns a hash as shown in Listing 12-5. Listing 12-5 shows the output of GC.stat_heap for Size Pool 0, which includes the slot size (:slot_size=>40), the number of pages Ruby has allocated so far for this size (:heap_eden_pages=>10) and the total number of slots allocated across all of these pages (:heap_eden_slots=>16374). The output of GC.stat_heap continues on to show the same statistics for the other size pools. We can use GC.stat_heap to investigate how Ruby uses size pools as we allocate more and more objects. Listing 12-6 shows a Ruby program that allocates arrays of varying sizes, and then records the output from GC.stat_heap. This program contains an inner loop and an outer loop. The outer loop at (3) in Listing 12-6 iterates over arrays of different capacities, from 1 up to 100 (CAPACITY_ITER). For each array capacity, the program creates 10,000 (ALLOCATE_ITER) array objects of that size using the inner loop at (4).
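Listing 12-6 isn't reproduced in this excerpt, but based on the description it presumably looks something like the following scaled-down sketch. The loop counts here are deliberately much smaller than the listing's 100 and 10,000, and the exact GC.stat_heap keys and values depend on your Ruby version and platform.

```ruby
# A scaled-down reconstruction of the Listing 12-6 experiment:
# allocate arrays of increasing capacity, keep them alive in `all`,
# and record heap_eden_slots for each size pool. Requires Ruby 3.2+.
CAPACITY_ITER = 5    # the real listing uses 100
ALLOCATE_ITER = 100  # the real listing uses 10,000

all = []                                # (2) keeps the arrays alive
(0..CAPACITY_ITER).each do |capacity|   # (3) outer loop over capacities
  ALLOCATE_ITER.times do                # (4) inner allocation loop
    all << Array.new(capacity, 0)
  end
  slots = GC.stat_heap.transform_values { |h| h[:heap_eden_slots] } # (5)
  puts "capacity=#{capacity} #{slots.inspect}"                      # (6)
end
```

As in the chapter's experiment, the slot counts grow in different pools as the array capacity crosses each pool's embedding limit.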
Note the program saves all of the new arrays into a single array called all, created at (2). This ensures that Ruby doesn't free all of our new arrays by running a garbage collection. After creating 10,000 arrays of the given capacity, the program saves the heap_eden_slots value from the return value of GC.stat_heap for all of the size pools at (5), and then prints out the results at (6). Running the code in Listing 12-6 produces this output: The output in Listing 12-7 shows how many slots Ruby has allocated in total after each iteration of the outer, array capacity loop. For example: … shows that after allocating 10,000 empty arrays, Ruby now uses a total of 22923 slots for Size Pool 0, 6548 for Size Pool 1, 2861 for Size Pool 2, etc. If you try running this, you will see slightly different values. Plotting these values, we can see which pool Ruby uses for new arrays of various capacities: Each bar in Figure 12-10 represents values from a line of output from Listing 12-7. For example, the first bar on the far left plots this output: The first value, 0, is the position of each bar on the x-axis, while the bar's color segments display the following three values: The dark grey bar at the bottom left corner represents Size Pool 0 (22923), the lighter bar above it shows Size Pool 1 (6548), and the lightest, top bar shows Size Pool 2 (2861). Moving to the right, each successive bar shows the values for different array capacities. Looking over the entire graph, we can see the following pattern in Figure 12-10: The dark grey bars for Size Pool 0 at the bottom of Figure 12-10 increase in size from capacity 0 through capacity 3, and then remain the same height for capacities 4 and greater. The medium grey bars for Size Pool 1 have the same height from capacity 0 through 3, but then increase from capacity 4 through 8. From capacity 9 onward, the medium grey bars have the same height again.
The light grey bars at the top for Size Pool 2 remain small for capacities 0 through 8, and then increase in size steadily for capacities 9 through 18. Then they remain unchanged after that. Figure 12-10 shows Ruby saves new arrays using the following size pools: Plotting the entire output from Listing 12-7 up to capacity=100, we see the following: Figure 12-11 shows an interesting pattern. Ruby uses size pools 0 through 4 to save arrays with capacities from 0 up to 78 in a similar way: For each capacity range, the length of the corresponding bars grows steadily, and then stops growing. However, once we started to save large arrays with capacities of 79 or more elements, Ruby saved them in the original Size Pool 0 again. This indicates that Ruby stopped embedding the array elements in the size pool entirely, and instead allocated a new, separate memory segment to save the elements. For these large arrays, the small 40-byte RVALUE slots in Size Pool 0 were sufficient, because each slot contained a pointer to the array data, not the embedded array data itself. Figure 12-12 shows how large arrays, arrays that contain 79 or more elements, do not save their elements inside the Array structure, but instead save a pointer (ptr) which contains the location of a separate memory segment that holds the array elements. One key detail of this experiment was in Listing 12-6 at (2): the all array. The inner loop just below, at (4), saved each new array into the all array. This meant all the new arrays were in fact still in use, so Ruby's garbage collector could not reclaim their memory. Without this line of code, we would not have seen the total number of slots continually increase, preventing us from discovering which size pools Ruby saved the arrays into. But how did Ruby's garbage collector know this, exactly? How does Ruby identify which values are used and unused by our programs? And how does it reclaim memory from the unused values?
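You can see the embedded-versus-separate-buffer split for yourself with the objspace standard library. This is a quick sketch; exact byte counts vary by Ruby version and platform, so only the relative sizes matter here.

```ruby
# A quick check of the embedded vs. separate-buffer behavior described
# above, using the objspace standard library. Exact byte counts vary by
# Ruby version and platform, so only the relative sizes matter here.
require 'objspace'

small = Array.new(3, 0)    # few elements: embedded inside the slot
large = Array.new(100, 0)  # too many to embed: separate memory segment

puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(large)  # includes the separate segment
```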
Let’s take a look at Ruby’s garbage collection process next.

Pat Shaughnessy 9 months ago

Inserting One New Element into Hashes of Varying Sizes

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. RUM includes a series of “experiments”: simple code snippets that show evidence that the book's explanations are accurate. One of the first experiments I wrote back in 2013, Experiment 7-2 is a fun way to see exactly when Ruby increases the number of bins in a hash table. The experiments in RUM are a great way to see for yourself how Ruby works. They also keep me honest; in fact, I ran this code again recently using Ruby 3.4.1 and saw different results from what I expected! One way to test whether this rehashing, or redistribution, of entries really occurs when Ruby expands a hash is to measure the amount of time Ruby takes to save one new element into existing hashes of different sizes. As we add more elements to the same hash, we should eventually see evidence that Ruby is taking extra time to rehash the elements. The code for this experiment is shown in Listing 7-3. At (1) the outer loop iterates over hash sizes from 0 to 100, and at (2) the inner loop creates 10,000 hashes of the given size. After disabling garbage collection, this experiment uses the benchmark library to measure how long it takes Ruby to insert a single new value at (3) into all 10,000 hashes of the given size. The results are surprising! Figure 7-13 shows the results for Ruby 3.4.1. Interpreting these data values from left to right, we see the following: It takes about 0.6ms to insert the first element into an empty hash (10,000 times). As the hash size increases from 2 to 8, the amount of time required to insert a new element stays about the same: 0.6ms, more or less. Inserting the 9th key/value pair takes much longer, about 2ms for 10,000 hashes!
Next, as the hash size increases from 10 up to 16, once again Ruby can insert new elements quickly, between 0.6ms and 0.7ms (10,000 times). A huge spike! It takes almost 3.1ms to insert the 17th element. And then once again starting with the 18th element, the time to insert each element drops back to around 0.7ms-0.8ms. A 3rd spike appears when Ruby inserts the 33rd element: almost 5ms. The graph once again flattens and returns to around 0.7-0.8ms per element (10,000 times). And a 4th spike appears when Ruby inserts the 65th element: almost 6ms. What's going on here? Well, Ruby spends the extra time required to insert that 17th key/value pair expanding the hash table: reallocating the entries array from 16 to 32 entries and the bin array from 32 to 64 bins, and then reassigning the entries to the new bin array. Ruby expands the entries and bins arrays a second time after inserting the 33rd entry, this time from 32 to 64 entries and 64 to 128 bins. (Recall the table, shown on page 15, used powers of 2 to determine these array sizes.) The smaller spike on the 9th insert in this figure is curious. While not as pronounced as the spike at the 17th element, this smaller spike is nonetheless noticeable. As it turns out, Ruby contains another optimization that speeds up hash access even more for small hashes that contain fewer than 9 elements.

Pat Shaughnessy 10 months ago

Updating Ruby Under a Microscope

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while to finish. Leave a comment or drop me a line and I'll email you when it's finished. In the meantime, here’s an excerpt from a rewrite of Chapter 7 about hash tables. Vladimir Makarov and the Ruby team redesigned Ruby’s hash table implementation back in 2015 to use open addressing, shortly after I published the first edition of RUM. This seemed like a good place to start. Hash tables are a commonly used, well-known, age-old concept in computer science. They organize values into groups, or bins , based on an integer value calculated from each value — a hash . When you need to find a value, you can figure out which bin it’s in by recalculating its hash value, thus speeding up the search. Figure 7-1 shows a single hash object and its hash table. On the left is the RHash (short for Ruby hash ) structure. On the right, you see the hash table used by this hash, represented by the st_table structure. This C structure contains the basic information about the hash table, including the number of entries saved in the table and pointers to the entries and bins. Each RHash structure contains a pointer to a corresponding st_table structure. For many hashes, Ruby initially creates 32 entries and 64 bins. (Hashes with 8 or fewer entries work somewhat differently; see “Optimization for Small Hashes” on page 20.) The best way to understand how a hash table works is by stepping through an example. Suppose I add a new key/value pair to a hash: While executing this line of code, Ruby saves the key and value into the first entry, as shown in Figure 7-2. Here you can see the new key/value pair inside the first entry slot, called an st_table_entry in Ruby’s C source code. Ruby saves the keys and values in the entries array in the order you add them. This makes it easy for Ruby to return them back to you in the same order.
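You can verify the insertion-order behavior directly:

```ruby
# Because the entries array preserves insertion order, a Ruby hash
# returns its keys and values in the order you added them:
h = {}
h[:first] = 1
h[:second] = 2
h[:third] = 3

p h.keys    # => [:first, :second, :third]
p h.values  # => [1, 2, 3]
```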
Also see that Ruby saved an entry index of 0 in the third bin, bin number 2. Ruby did this by taking the given key — in this example, the symbol :key — and passing it to an internal hash function that returns a pseudorandom integer: Next, Ruby takes the hash value and calculates the modulus by the number of bins, which is the remainder after dividing the hash value by the number of bins. Now let’s add a second element to the hash: This time let’s imagine that the hash value of the new key divided by 64 yields a remainder of 5. Figure 7-3 shows that Ruby fills in a second st_table_entry structure in the entries array, and saves the entry index 1 in bin number 5, the sixth bin. The benefit of using a hash table becomes clear when you ask Ruby to retrieve the value for a given key. For example: If Ruby had saved all of the keys and values in an array or linked list, it would have to iterate over all the elements in that array or list, looking for :key. This might take a very long time, depending on the number of elements. But using a hash table, Ruby can jump straight to the key it needs to find by recalculating the hash value for that key. To recalculate the hash value for a particular key, Ruby simply calls the hash function again. Then, it redivides the hash value by the number of bins to get the remainder, or the modulus. At this point, Ruby knows to look in bin number 2, finds the entry index 0 there, and in turn finds the entry with the key :key in entry number 0, the first entry slot. Ruby can later find the value for the second key by repeating the same hash calculation. Figure 7-4 explains how Ruby would find these two values in the hash table: On the left side, the first arrow starts from the third bin (bin index 2) and points to the first key/value pair, at index 0. To the right, the second arrow starts from the sixth bin (bin index 5) and points to the second key/value pair, at index 1.
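The two-step bin calculation can be sketched in plain Ruby, using Object#hash as a stand-in for Ruby's internal C hash function. NUM_BINS and bin_index are illustrative names for this sketch, not Ruby internals.

```ruby
# A plain-Ruby sketch of the two-step bin calculation described above.
# Object#hash stands in for Ruby's internal C hash function; NUM_BINS
# and bin_index are illustrative names, not Ruby internals.
NUM_BINS = 64

def bin_index(key)
  key.hash % NUM_BINS   # pseudorandom integer, reduced to a bin number
end

# The same key always hashes to the same bin, so a lookup can jump
# straight to the right bin instead of scanning every entry.
puts bin_index(:key)
puts bin_index(:key) == bin_index(:key)
```

Note the hash function is seeded per process, so the bin number printed will change from run to run, but it stays consistent within one run.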

Pat Shaughnessy 3 years ago

LLVM IR: The Esperanto of Computer Languages

I empathize with people who have to learn English as a foreign language. English grammar is inconsistent, arbitrary and hard to master. English spelling is even worse. I sometimes find myself apologizing for my language’s shortcomings. But learning any foreign language as an adult is very difficult. Esperanto , an “artificial language,” is different. Invented by Ludwik Zamenhof in 1887, Esperanto has a vocabulary and grammar that are logical and consistent, designed to be easier to learn. Zamenhof intended Esperanto to become the universal second language. Computers have to learn foreign languages too. Every time you compile and run a program, your compiler translates your code into a foreign language: the native machine language that runs on your target platform. Compilers should have been called translators. And compilers struggle with the same things we do: inconsistent grammar and vocabulary, and other peculiarities of the target platform. Recently, however, more and more compilers translate your code to an artificial machine language. They produce a simpler, more consistent, more powerful machine language that doesn’t actually run on any machine. This artificial machine language, LLVM IR, makes writing compilers simpler, and reading the code compilers produce simpler too. LLVM IR is becoming the universal second language for compilers. The Low Level Virtual Machine (LLVM) project had the novel idea of inventing a virtual machine that was easy for compiler engineers to use as a target platform. The LLVM team designed a special instruction set called intermediate representation (IR). New, modern languages such as Rust, Swift, Clang-based versions of C and many others first translate your code to LLVM IR. Then they use the LLVM framework to convert the IR into actual machine language for any target platform LLVM supports: LLVM is great for compilers.
Compiler engineers don’t have to worry about the detailed instruction set of each platform, and LLVM optimizes your code for whatever platform you choose automatically. And LLVM is also great for people like me who are interested in what machine language instructions look like and how CPUs execute them. LLVM instructions are much easier to follow than real machine instructions. Let’s take a look at one! Here’s a line of LLVM IR I generated from a simple Crystal program: Wait a minute! This isn’t simple or easy to follow at all! What am I talking about here? At first glance, this does look confusing. But as we’ll see, most of the confusing syntax is related to Crystal, not LLVM. Studying this line of code will reveal more about Crystal than it will about LLVM. The rest of this article will unpack and explain what this line of code means. It looks complex, but is actually quite simple. The instruction above is a function call in LLVM IR. To produce this code, I wrote a small Crystal program and then translated it using this command: The compiler option directed Crystal to generate a file called array_example.ll, which contains the line above along with thousands of other lines. We’ll get to the Crystal code in a minute. But for now, how do I get started understanding what the LLVM code means? The LLVM Language Reference Manual has documentation for call and all of the other LLVM IR instructions. Here’s the syntax for call: My example instruction doesn’t use many of these options. Removing the unused options, I can see the actual, basic syntax of call. In order from left to right, these values are: which register to save the result in, the type of the return value, a pointer to the function to call, and the arguments to pass to that function. What does all of this mean, exactly? Let’s find out! Starting on the left and moving right, let’s step through the instruction: The token to the left of the equals sign tells LLVM where to save the return value of the function call that follows.
This isn’t a normal variable; it’s an LLVM “register.” Registers are physical circuits located on microprocessor chips used to save intermediate values. Saving a value in a CPU register is much faster than saving a value in memory, since the register is located on the same chip as the rest of the microprocessor. Saving a value in RAM, on the other hand, requires transmitting that value from one chip to another and is much slower, relatively speaking. Unfortunately, each CPU has a limited number of registers available, and so compilers have to decide which values are used frequently enough to warrant saving in nearby registers, and which other values can be moved out to more distant memory. Unlike the limited number of registers available on a real CPU, the imaginary LLVM microprocessor has an infinite number of them. Because of this, compilers that target LLVM can simply save values to a register whenever they would like. There’s no need to find an available register, or to move an existing value out of a register first before using it for something else: busy work that normal machine language code can’t avoid. In this program, the Crystal compiler had already saved 56 other values in “registers,” and so for this line of LLVM IR, Crystal simply used the next register, number 57. Moving left to right, LLVM instructions next indicate the type of the function call’s return value: The name of this type is generated by the Crystal compiler, not by LLVM. That is, this is a type from my Crystal program. It could have been anything, and indeed other compilers that target LLVM will generate completely different type names. The example Crystal program I used to generate this LLVM code was: When I compiled this program, Crystal generated the instruction above, which returns a pointer to the new array. Since the array contains integers, Crystal uses a generic type. Machine languages that target real machines only support hardware types that machine supports.
For example, Intel x86 assembly language allows you to save integers of different widths, 16, 32 or 64 bits for example, and an Intel x86 CPU has registers designed to hold values of each of these sizes. LLVM IR is more powerful. It supports “structure types,” similar to a C structure or an object in a language like Crystal or Swift. Here the syntax indicates the name inside the quotes is the name of a structure type. And the asterisk which follows, like in C, indicates the type of the return value of my function call is a pointer to this structure. My example LLVM program defines the type like this: Structure types allow LLVM IR programs to create pointers to structures or objects, and to access any of the values inside each object. That makes writing a compiler much easier. In my example, the call instruction returns a pointer to an object which contains four 32-bit integer values, followed by a pointer to other 32-bit integer values. But what are all of these integer values? Above I said this function call was returning a new array - how can that be the case? LLVM itself has no idea, and no opinion on the matter. To understand what these values are, and what they have to do with the array in my program, we need to learn more about the Crystal compiler that generated this LLVM IR code. Reading the Crystal standard library , we can see Crystal implements arrays like this: The comments above are very illustrative and complete - the Crystal team took the time to document their standard library and explain not only how to use each class, like Array , but how each is implemented internally. In this case, we can see the four values inside the LLVM structure type hold the size and capacity of the array, among other things. And the final value is a pointer to the actual contents of the array. The target of the call instruction appears next, after the return type: This is quite a mouthful! What sort of function is this? There are two steps to understanding this: First, the syntax.
This is simply a global identifier in this LLVM program. So my instruction is just calling a global function. In LLVM programs, all functions are global; there is no concept of a class, module or similar groupings of code. But what in the world does that crazy identifier mean? LLVM ignores this complex name; for LLVM it is just a name like any other. But for Crystal, the name has much more significance. Crystal encoded a lot of information into this one name. Crystal can do this because the LLVM code isn’t intended for anyone to read directly. Crystal has created a “mangled name,” meaning the original name of the function to call is there, but it’s been mangled or rewritten in a confusing manner. Crystal rewrites function names to ensure they are unique. In Crystal, like in many other statically typed languages, functions with different argument types or return value types are actually different functions. So in Crystal if I write two methods with the same name but different parameter types, I have two separate, different functions. The type of the parameter distinguishes one from the other. Crystal generates unique function names by encoding the arguments, return value and type of the receiver into the function name string, making it quite complex. Let’s break it down. The first part of the name is the type of the receiver: the function is actually a method on the generic Array class, and in this case the receiver is an array holding 32-bit integers, the Array(Int32) class. Crystal includes both names in the mangled function name. Next comes the name of the function Crystal is calling, followed by the function’s parameter types; in this case, Crystal is passing in a single integer, so we just see one type. Last is the return value type, a new array containing integers. As I discussed in my last post , the Crystal compiler internally rewrites my array literal expression into code that creates and initializes a new array object: In this expanded code, Crystal calls an array-construction method and passes in the required capacity of the new array.
And to distinguish this method from other functions that might exist in my program, the compiler generated the mangled name we see above. Finally, after the function name the LLVM IR instruction shows the arguments for the function call: LLVM IR uses parentheses, like most languages, to enclose the arguments to a function call. And the types precede each value: both arguments here are 32-bit integers. But wait a minute! We saw just above that the expanded Crystal code for generating the array literal passes a single value into the call. And looking at the mangled function name above, we also see there is a single parameter to the function call. But reading the LLVM IR code we can see a second value is also passed in: 610. What in the world does 610 mean? I don’t have 610 elements in my new array, and 610 is not one of the array elements. So what is going on here? Crystal is an object oriented language, meaning that each function is optionally associated with a class. In OOP parlance, we say that we are “sending a message” to a “receiver.” In this case, the function is the message, and the array class is the receiver. In fact, this function is really a class method: we are calling it on the Array class, not on an instance of one array. Regardless, LLVM IR doesn’t support classes or instance methods or class methods. In LLVM IR, we only have simple, global functions. And indeed, the LLVM virtual machine doesn’t care what these arguments are or what they mean. LLVM doesn’t encode the meaning or purpose of each argument; it just does what the Crystal compiler tells it to do. But Crystal, on the other hand, has to implement object oriented behavior somehow. Specifically, the function needs to behave differently depending on which class it was called for, depending on what the receiver is. For example, an array literal holding two integers has to return an array of two integers, while an array literal holding two strings has to return an array of two strings. How does this work in the LLVM IR code?
To implement object oriented behavior, Crystal passes the receiver as a hidden, special argument to the function call: This receiver argument is a reference or pointer to the receiver’s object, and is normally known as self. Here the hidden argument is a reference or tag corresponding to the Array(Int32) class, the receiver. And the other value is the actual argument to the method. Reading the LLVM IR code, we’ve learned that Crystal secretly passes a hidden argument to every method call on an object. Then inside each method, the code has access to self, the object instance that code is running for. Some languages, like Rust, require us to name self explicitly in each method definition; in Crystal this behavior is automatic and hidden. LLVM IR is a simple language designed for compiler engineers. I think of it like a blank slate for them to write on. Most LLVM instructions are quite simple and easy to understand; as we saw above, understanding the basic syntax of the call instruction wasn’t hard at all. The hard part was understanding how the Crystal compiler, which targets LLVM IR, generates code. The LLVM syntax itself was easy to follow; it was the Crystal language’s implementation that was harder to understand. And this is the real reason to learn about LLVM IR syntax. If you take the time to learn how LLVM instructions work, then you can start to read the code your favorite language’s compiler generates. And once you can do that, you can learn more about how your favorite compiler works, and what your programs actually do when you run them.

Pat Shaughnessy 3 years ago

Visiting an Abstract Syntax Tree

In my last post , I explored how Crystal parsed a simple program and produced a data structure called an abstract syntax tree (AST). But what does Crystal do with the AST? Why bother going to such lengths to create it? After Crystal parses my code, it repeatedly steps through all the entries or nodes in the AST and builds up a description of the intended meaning and behavior of my code. This process is known as semantic analysis . Later, Crystal will use this description to convert my program into a machine language executable. But what does this description contain? What does it really mean for a compiler to understand anything? Let’s pack our bags and visit an abstract syntax tree with Crystal to find out. Imagine several tourists visiting a famous tree: Each of them sees the same tree in a different way. The tree doesn’t change, but the perspective of each person looking at it is different. They each take a different photo, or remember different details. In Computer Science this separation of the data structure (the tree) from the algorithms using it (the tourists) is known as the visitor pattern . This technique allows compilers and other programs to run multiple algorithms on the same data structure without making a mess. The visitor pattern calls for two functions: accept and visit. First, a node in the data structure “accepts” a visitor: After accepting a visitor, the node turns around and calls the visit method on the visitor: The visit method implements whatever algorithm that visitor is interested in. This seems kind of pointless… why use accept at all? We could just call visit directly. The key is that, after calling the visitor and passing itself, the node passes the visitor to each of its children, recursively: And then the visitor can visit each of the child nodes also. The visitor class doesn’t necessarily need to know anything about how to navigate the node data structure.
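Here is a minimal Ruby sketch of the accept/visit handshake described above. The class names are illustrative, not Crystal's actual code.

```ruby
# A minimal Ruby sketch of the visitor pattern described above; the
# class and method names are illustrative, not Crystal's actual code.
class Node
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  # The node accepts a visitor, calls visit on it, then recursively
  # passes the visitor to each of its children.
  def accept(visitor)
    visitor.visit(self)
    children.each { |child| child.accept(visitor) }
  end
end

# One visitor implements one algorithm; more visitors can be added
# without changing the Node class at all.
class NameCollector
  attr_reader :names

  def initialize
    @names = []
  end

  def visit(node)
    @names << node.name
  end
end

tree = Node.new(:assign, [Node.new(:var), Node.new(:array_literal)])
collector = NameCollector.new
tree.accept(collector)
p collector.names  # => [:assign, :var, :array_literal]
```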
And more and more visitor classes can implement new algorithms without changing the underlying data structure and breaking each other. In order to understand what my code means, Crystal reads through my program’s AST over and over again using different visitors. Each algorithm looks for certain syntax, records information about the types and objects my code uses, or possibly even transforms my code into a different form. Crystal implements the basics of the visitor pattern in visitor.cr , inside the superclass of all AST nodes: Each AST node subclass implements its own version of this method. To get a sense of how the visitor pattern works inside of Crystal, let’s look at one line of code from my last post: As I explained last month, the Crystal parser generates this AST tree fragment: Once the parser is finished and has created this small tree, the Crystal compiler steps through it a number of different times, looking for classes, variables, type declarations, etc. Each of these passes through the AST is performed by a different visitor class. The most important visitor class the Crystal compiler uses is called simply MainVisitor . You can find the code for MainVisitor in main_visitor.cr : Since Crystal supports typed parameters and method overloading, the MainVisitor class implements a different visit method for each type of node that it visits. Now let’s look at three examples of what the MainVisitor class does with my code: identifying variables, assigning types and expanding array literals. The Crystal compiler is much too complex to describe in a single blog post, but hopefully I can give you a glimpse into the sort of work Crystal does during semantic analysis. Obviously, my example code creates and initializes a variable called arr: But how does Crystal identify this variable’s name and value? What does it do with this assignment?
The MainVisitor class starts to process my code by visiting the root node of this branch of my AST, the Assign node: As you can see, earlier during the parsing phase Crystal had saved the target variable and value of this assign statement in the child AST nodes. The target variable, arr, appears in the Var node, and the value to assign is an ArrayLiteral node. The MainVisitor now knows I declared a new variable, called arr, in the current lexical scope. Since my program has no classes, methods or any other lexical scopes, Crystal saves this variable in a table of variables for the top level program: Actually, to be more accurate, there will always be many other variables in this table along with arr. All Crystal programs automatically include the standard library, so Crystal also saves all of the top level variables from the standard library in this table. In a more normal program, there will be many lexical scopes for different method and class or module definitions, and the MainVisitor will save each variable in the corresponding table. Probably the most important function of the MainVisitor is to assign a type to each value in my program. The simplest example of this is when the MainVisitor visits a NumberLiteral node: Looking at the size of the numeric value, Crystal determines the type should be Int32. Crystal then saves this type right inside of the NumberLiteral node: Strictly speaking, this violates the visitor pattern because the visitors shouldn’t be modifying the data structure they visit. But the type of each node, the type of each programming construct in my program, is really an integral part of that node. In this case the MainVisitor is really just completing each node. It’s not changing the structure of the AST in this case… although as we’ll see in a minute the MainVisitor does this for other nodes! Sometimes type values can’t be determined from the intrinsic value of an AST node. Often the type of a node is determined by other nodes in the AST. Recall my example line of code is: Here Crystal automatically sets the type of the arr variable to the type of the array literal expression: Array(Int32).
In Computer Science, this is known as type inference . Because Crystal can automatically determine the type of arr, I don’t need to declare it explicitly by writing something more complicated like this: Type inference allows me to write concise, clean code with fewer type annotations. Most modern, statically typed languages such as Swift, Rust, Julia, Kotlin, etc., use type inference in the same way as Crystal. Even newer versions of Java or C++ use type inference. The Crystal compiler implements type inference when the MainVisitor encounters an Assign AST node, which we saw above. After encountering the Assign node, Crystal recursively processes one of the two child nodes, the value, and its child nodes. When this process finishes, Crystal knows the type of the value node is Array(Int32): I’ll take a closer look at how Crystal processes the value node next. But for now, once Crystal has the type of the value node it copies that type over to the target Var node and sets its type also: But Crystal does something else interesting here: It sets up a dependency between the two AST nodes: it “binds” the variable to the value: This binding dependency allows Crystal to later update the type of the variable whenever necessary. In this case the value will always have the same type, but I suspect that sometimes the Crystal compiler can update types during semantic analysis. In this way if the Crystal compiler ever changed its mind about the type of some value, it can easily update the types of any dependent values. I also suspect Crystal uses these type dependency connections to produce error messages whenever you pass an incorrect type to some function, for example. These are just guesses, however; if anyone from the Crystal team knows exactly what these type bindings are used for let me know. Update: Ary Borenszweig explained that sometimes the Crystal compiler updates the type of variables based on how they are used. He posted an interesting example on The Crystal Programming Language Forum .
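The bind-and-propagate idea described above can be sketched generically in Ruby. This is an illustrative sketch of the general mechanism, not Crystal's actual implementation; all the names here are my own.

```ruby
# An illustrative sketch of the type-binding dependency described above;
# this is not Crystal's actual implementation, just the general idea.
class TypedNode
  attr_reader :type, :dependents

  def initialize
    @type = nil
    @dependents = []
  end

  # Make `other` depend on this node's type.
  def bind(other)
    dependents << other
    other.update_type(type)
  end

  # Changing this node's type propagates to every dependent node.
  def update_type(new_type)
    @type = new_type
    dependents.each { |node| node.update_type(new_type) }
  end
end

value = TypedNode.new       # e.g. the value node of an assignment
variable = TypedNode.new    # e.g. the node for the target variable
value.update_type("Array(Int32)")
value.bind(variable)
p variable.type             # => "Array(Int32)"

value.update_type("Array(Int32 | String)")  # the compiler changes its mind
p variable.type             # => "Array(Int32 | String)"
```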
So far we’ve seen Crystal set the type of each NumberLiteral node to Int32, and we’ve seen Crystal assign the type Array(Int32) to arr. But how did Crystal determine the type of the array literal itself? This is where things get even more complicated. Sometimes during semantic analysis the Crystal compiler completely rewrites parts of your code, replacing it with something else. This happens even with my simple example.

When visiting the ArrayLiteral node, the MainVisitor expands this simple line of code into something more complex. Reading the expanded code, you can see how later my compiled program will create the new array. First Crystal creates an empty array with a capacity of 2, using typeof to determine the element type. typeof returns the type (or multiple types inside a union type) found in the given set of values, in this case just Int32. Later Crystal sets the two elements in the array just by assigning them.

Crystal achieves this by replacing part of my program’s AST with a new branch. For clarity, I’m not drawing the AST nodes for the inner assign operations, only the first line.

With this new, updated AST we can see exactly how Crystal determines the type of my variable arr. Starting at the root of my AST, the MainVisitor visits all of the AST nodes in a series of recursive calls, and it determines the types of each of these nodes as it returns from those recursive calls.

Some interesting details here that I don’t understand completely, or don’t have space to explain here:

The TypeOf node calculates a common union type using a type formula. In this example, it just returns Int32, because both elements of my array are simple 32 bit integers.

I believe the Generic node refers to a Crystal generic class via a Path node, in this example Array.

When the MainVisitor processes the Generic node, it fills in the type parameter with Int32, arriving at the type Array(Int32).

The Call node looks up the method my code is calling and uses the type from that method’s return value. I didn’t have time to explore how method lookup works in Crystal, however, so I’m not sure about this.

Today we looked at a tiny piece of what the Crystal compiler can do.
There are many more types of AST nodes, each of which the MainVisitor class handles differently. And there are many different visitor classes beyond the MainVisitor as well. When analyzing a more complex program, Crystal has to understand class and module definitions, instance and class variables, type annotations, different lexical scopes, macros, and much, much more. Crystal will need all of this information later, during the code generation phase, the next step that follows semantic analysis.

But I hope this article gave you a sense of what sort of work a compiler has to do in order to understand your code. As you can see, for a statically typed language like Crystal, the compiler spends much of its time identifying all of the types in your code, and determining which programming constructs, or AST nodes, have which types.

Next time I’ll look at code generation: now that Crystal has identified the variables, function calls and types in my code, it is ready to generate the machine language code needed to execute my program. To do that, it will leverage the LLVM framework.
