Posts in Lua (20 found)
Brain Baking 2 weeks ago

Favourites of March 2026

Our daughter turned three. We’re beyond exhausted, but a ripgrep search in this repository yields five more instances of the word exhausted in combination with parenting, so I’ll shut up. I guess we also celebrate that after three years of pure chaos, we’re… still alive? Previous month: February 2026.

I am just two levels short of finishing Gobliins 6 before deciding to throw in the towel. Thanks to the increased presence of moon logic, the entire adventure was more frustrating than relaxing. As a big Gobliins fan, I have to admit: the game left me a bit disappointed. It’s all right; I’ll just replay Gob3 again. As it left me wanting more, I went back to the original Gobliiins game that I somehow missed, as back in the day my dad bought Gobliins 2 and we just continued with 3 without looking back. It’s still worth exploring, but very basic, and the presence of the life bar is a very strange (and bad!) design choice that fortunately was abandoned in the sequels. I charged the Analogue Pocket and hope to get in some good ol’ Game Boy (Color) games in the coming month.

I read a depressing amount of personal genAI tales; more than enough to fill another blog post. I’ll try to keep these out of here as much as possible. My wife bumped into a hacker called Un Kyu Lee crafting his own micro journal hardware. The result looks very cool, including a hinge to hang it on the door as a physical reminder. I’d rather keep on journaling with my fountain pens, but still, very cool!

Related topics: metapost. By Wouter Groeneveld on 1 April 2026. Reply via email.

- Michael vibe-code-ported an X11 window manager into Wayland; an interesting Claude experiment to see how agentic development works.
- Greg Newman hosted the Emacs Blog Post Carnival 2025-07 on writing experiences and summarised the participating links. Lots of little gems in there.
- Rijksmuseum writes about the discovery of the new Rembrandt painting.
Well, “new”—it’s been in a private collection for years and only recently resurfaced.

- Peter Bridger shares his experience in the retro happening SWAG February 2026. I wish we had something similar nearby!
- Chuck Jordan shares SimCity vibes. As one of the original programmers involved in the project, he would know. (Via The Virtual Moose)
- The 1MB Club has an interesting (older) article I read last month: consider disabling HTTPS auto redirects. I can’t remember why I turned this back on: I want my old WinXP machine to be able to reach as well without the extra TLS overhead. Funny though: they mention “You can freely view this website on both HTTPS and HTTP.” I remove the in the protocol, press , and get redirected. Whoops.
- PolyWolf has been thinking about blazing fast static site generators. This is a goldmine, as I have a wild idea to write my own generator in Clojure. When the exhaustion and brain fog go away, that is.
- According to Rishi Baldawa, the reviewer isn’t the bottleneck. This one’s a bit AI-flavoured, so beware if you’re coming down with an AI cold. (I know I have. Handkerchiefs full.)
- Marcin Wichary’s keyboard grandmastery again shines through in his Apple Fn endgame article. I wish his keyboard book wasn’t sold out.
- Wordsmith writes about the underrated simplicity of the original Harvest Moon (1996) video game.
- Dale Mellor defends using a dynamically-produced blog site, which is a nice change given the static site generator craziness. I’m still on Hugo and have little need for the points he brings up, but some others might.
- Tazjin tries out Guix as a Nixer. I was eyeing Guix as a budding Lisp fanboy, but both options still can’t seem to fit in my head. I’ll let it stew for a little while longer.
- Homo Ludditus announces distro hopping time. The conclusion? “The madhouse could be a valid destination. But I’m still looking for better alternatives.” So much for 2026 as the year of the Linux desktop, huh.
- The Digital Antiquarian writes about the year of peak Might & Magic, when New World Computing was still on top of the world.
- Here’s an interesting thought experiment by Andrey Listopadov: what if structural editing was a mistake?
- In this 2020 post by Vincent Bernat, photos of a bunch of cool vintage PC expansion cards are shared, in conjunction with period-correct software that made great use of them.
- Gabor Torok switched to KDE Plasma, an interesting read because we both switched to OSX because of reasons and are trying to crawl out of the Apple hole. I don’t know if I’m quite ready yet.
- Did you know there’s a relation between knitting and programming? Abbey Perini does.
- Mykal Machon shares some insightful guiding principles to lead a fuller life. Judging by the principles, I don’t think Mykal has any young kids.
- I’m using this as a checklist to find out if I missed essential albums: Hip Hop Golden Age’s Top 40 Hip Hop Albums of 1998.
- Here’s another GitHub “awesome” list; this time public APIs. Could be useful. Already used for my courses.
- It doesn’t hurt to link to the 2007 Slow Code manifesto.
- FontCrafter is a cool way to generate a real font based on your handwriting.
- WireTap is an open source Ngrok alternative.
- The Stump Window Manager is the only WM (except the obvious EXWM) I could find that’s written in Common Lisp.
- I should look into Ulauncher if I ever want to make the switch to Linux to replace Alfred.
- Christoph Frick shares a cool GitHub Gist showcasing that you can write your AwesomeWM config in Fennel instead of Lua.
- Yazi looks like an Emacs Dired inside a shell?

Ankur Sethi 1 month ago

I built a programming language using Claude Code

Over the course of four weeks in January and February, I built a new programming language using Claude Code. I named it Cutlet after my cat. It’s completely legal to do that. You can find the source code on GitHub, along with build instructions and example programs.

I’ve been using LLM-assisted programming since the original GitHub Copilot release in 2021, but so far I’ve limited my use of LLMs to generating boilerplate and making specific, targeted changes to my projects. While working on Cutlet, though, I allowed Claude to generate every single line of code. I didn’t even read any of the code. Instead, I built guardrails to make sure it worked correctly (more on that later).

I’m surprised by the results of this experiment. Cutlet exists today. It builds and runs on both macOS and Linux. It can execute real programs. There might be bugs hiding deep in its internals, but they’re probably no worse than ones you’d find in any other four-week-old programming language in the world. I have Feelings™ about all of this and what it means for my profession, but I want to give you a tour of the language before I get up on my soapbox.

If you want to follow along, build the Cutlet interpreter from source and drop into a REPL using .

Arrays and strings work as you’d expect in any dynamic language. Variables are declared with the keyword. Variable names can include dashes. Same syntax rules as Raku. The only type of number (so far) is a double.

Here’s something cool: the meta-operator turns any regular binary operator into a vectorized operation over an array. In the next line, we’re multiplying every element of by 1.8, then adding 32 to each element of the resulting array. The operator is a zip operation. It zips two arrays into a map.

Output text using the built-in function. This function returns , which is Cutlet’s version of .

The meta operator also works with comparisons. Here’s another cool bit: you can index into an array using an array of booleans.
This is a filter operation. It picks the element indexes corresponding to and discards those that correspond to . Here’s a shorter way of writing that.

Let’s print this out with a user-friendly message. The operator concatenates strings and arrays. The built-in turns things into strings.

The meta-operator in the prefix position acts as a reduce operation. Let’s find the average temperature. adds all the temperatures, and the built-in finds the length of the array. Let’s print this out nicely, too.

Functions are declared with . Everything in Cutlet is an expression, including functions and conditionals. The last value produced by an expression in a function becomes its return value. Your own functions can work with too. Let’s reduce the temperatures with our function to find the hottest temperature.

Cutlet can do a lot more. It has all the usual features you’d expect from a dynamic language: loops, objects, prototypal inheritance, mixins, a mark-and-sweep garbage collector, and a friendly REPL. We don’t have file I/O yet, and some fundamental constructs like error handling are still missing, but we’re getting there! See TUTORIAL.md in the git repository for the full documentation.

I’m a frontend engineer and (occasional) designer. I’ve tried using LLMs for building web applications, but I’ve always run into limitations. In my experience, Claude and friends are scary good at writing complex business logic, but fare poorly on any task that requires visual design skills. Turns out describing responsive layouts and animations in English is not easy. No amount of screenshots and wireframes can communicate fluid layouts and animations to an LLM. I’ve wasted hours fighting with Claude about layout issues it swore it had fixed, but which I could still see plainly with my leaky human eyes. I’ve also found these tools to excel at producing cookie-cutter interfaces they’ve seen before in publicly available repositories, but they fall off when I want to do anything novel.
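Cutlet’s actual operator spellings are elided in this excerpt, so here is a hedged Python sketch of the array semantics just described (vectorized binary operations, zip-to-map, boolean-mask filtering, and prefix-position reduce); the temperature values are invented for illustration:

```python
from functools import reduce

celsius = [20.0, 25.0, 30.0]

# Vectorized meta-operator: apply a binary operator element-wise over an array
# (Celsius to Fahrenheit: multiply each element by 1.8, then add 32 to each).
fahrenheit = [c * 1.8 + 32 for c in celsius]

# Zip operator: combine two arrays into a map.
days = ["mon", "tue", "wed"]
by_day = dict(zip(days, fahrenheit))

# Boolean-mask indexing: keep the elements whose mask entry is true,
# discard those whose entry is false.
mask = [f > 70 for f in fahrenheit]
hot = [f for f, keep in zip(fahrenheit, mask) if keep]

# Prefix meta-operator as reduce: sum the array, then divide by its length
# to find the average temperature.
average = sum(fahrenheit) / len(fahrenheit)

# A user-defined function can drive the reduce too (hottest temperature).
hottest = reduce(lambda a, b: a if a > b else b, fahrenheit)
```

This is only an approximation of the described behaviour in another language, not Cutlet syntax; see TUTORIAL.md in the repository for the real operators.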
I often work with clients building complex data visualizations for niche domains, and LLMs have comprehensively failed to produce useful outputs on these projects.

On the other hand, I’d seen people accomplish incredible things using LLMs in the last few months, and I wanted to replicate those experiments myself. But my previous experience with LLMs suggested that I had to pick my project carefully. A small, dynamic programming language met all my requirements.

Finally, this was also an experiment to figure out how far I could push agentic engineering. Could I compress six months of work into a few weeks? Could I build something that was beyond my own ability to build? What would my day-to-day work life look like if I went all-in on LLM-driven programming? I wanted to answer all these questions.

I went into this experiment with some skepticism. My previous attempts at building something entirely using Claude Code hadn’t worked out. But this attempt has not only been successful, it has produced results beyond what I’d imagined possible. I don’t hold the belief that all software in the future will be written by LLMs. But I do believe there is a large subset that can be partially or mostly outsourced to these new tools.

Building Cutlet taught me something important: using LLMs to produce code does not mean you forget everything you’ve learned about building software. Agentic engineering requires careful planning, skill, craftsmanship, and discipline, just like any software worth building before generative AI. The skills required to work with coding agents might look different from typing code line-by-line into an editor, but they’re still very much the same engineering skills we’ve been sharpening all our careers. There is a lot of work involved in getting good output from LLMs. Agentic engineering does not mean dumping vague instructions into a chat box and harvesting the code that comes out.
I believe there are four main skills you have to learn today in order to work effectively with coding agents:

Models and harnesses are changing rapidly, so figuring out which problems LLMs are good at solving requires developing your intuition, talking to your peers, and keeping your ear to the ground. However, if you don’t want to stay up-to-date with a rapidly-changing field—and I wouldn’t judge you for it, it’s crazy out there—here are two questions you can ask yourself to figure out if your problem is LLM-shaped:

If the answer to either of those questions is “no”, throwing AI at the problem is unlikely to yield good results. If the answer to both of them is “yes”, then you might find success with agentic engineering. The good news is that the cost of figuring this out is the price of a Claude Code subscription and one sacrificial lamb on your team willing to spend a month trying it out on your codebase.

LLMs work with natural language, so learning to communicate your ideas using words has become crucial. If you can’t explain your ideas in writing to your co-workers, you can’t work effectively with coding agents. You can get a lot out of Claude Code using simple, vague, overly general prompts. But when you do that, you’re outsourcing a lot of your thinking and decision-making to the robot. This is fine for throwaway projects, but you probably want to be more careful when you’re building something you will put into production and maintain for years. You want to feed coding agents precisely written specifications that capture as much of your problem space as possible.

While working on Cutlet, I spent most of my time writing, generating, reading, and correcting spec documents. For me, this was a new experience. I primarily work with early-stage startups, so for most of my career, I’ve treated my code as the spec. Writing formal specifications was an alien experience. Thankfully, I could rely on Claude to help me write most of these specifications.
I was only comfortable doing this because Cutlet was an experiment. On a project I wanted to stake my reputation on, I might take the agent out of the equation altogether and write the specs myself.

This was my general workflow while making any change to Cutlet:

This workflow front-loaded the cognitive effort of making any change to the language. All the thinking happened before a single line of code was written, which is something I almost never do. For me, programming involves organically discovering the shape of a problem as I’m working on it. However, I’ve found that working that way with LLMs is difficult. They’re great at making sweeping changes to your codebase, but terrible at quick, iterative, organic development workflows. Maybe my workflow will evolve as inference gets faster and models become better, but until then, this waterfall-style model works best.

I find this to be the most interesting and fun part of working with coding agents. It’s a whole new class of problem to solve! The core principle is this: coding agents are computer programs, and therefore have a limited view of the world they exist in. Their only window into the problem you’re trying to solve is the directory of code they can access. This doesn’t give them enough agency or information to be able to do a good job. So, to help them thrive, you must give them that agency and information in the form of tools they can use to reach out into the wider world.

What does this mean in practice? It looks different for different projects, but this is what I did for Cutlet:

All these tools and abilities guaranteed that any updates to the code resulted in a project that at least compiled and executed. But more importantly, they increased the information and agency Claude had access to, making it more effective at discovering and debugging problems without my intervention.
If I keep working on this project, my main focus will be to give my agents even more insight into the artifact they are building, even more debugging tools, even more freedom, and even more access to useful information.

You will want to come up with your own tooling that works for your specific project. If you’re building a Django app, you might want to give the agent access to a staging database. If you’re building a React app, you might want to give it access to a headless browser. There’s no single answer that works for every project, and I bet people are going to come up with some very interesting tools that allow LLMs to observe the results of their work in the real world.

Coding agents can sometimes be inefficient in how they use the tools you give them. For example, while working on this project, sometimes Claude would run a command, decide its output was too long to fit into the context window, and run it again with the output piped to . Other times it would run , forget to the output for errors, and run it a second time to capture the output. This would result in the same expensive checks running multiple times in the course of making a single edit. These mistakes slowed down the agentic loop significantly.

I could fix some of these performance bottlenecks by editing or changing the output of a custom script. But there were some issues that required more effort to discover and fix. I quickly got into the habit of observing the agent at work, noticing sequences of commands that the agent repeated over and over again, and turning them into scripts for the agent to call instead. Many of the scripts in Cutlet’s directory came about this way. This was very manual, very not-fun work. I’m hoping this becomes more automated as time goes on. Maybe a future version of Claude Code could review its own tool-calling outputs and suggest scripts you could write for it?

Of course, the most fruitful optimization was to run Claude inside Docker with and access.
By doing this, I took myself out of the agentic loop. After a plan file had been produced, I didn’t want to hang around babysitting agents and saying every time they wanted to run .

As Cutlet evolved, the infrastructure I built for Claude also evolved. Eventually, I captured many of the workflows Claude naturally followed as scripts, slash commands, or instructions in . I also learned where the agent stumbled most, and preempted those mistakes by giving it better instructions or scripts to run.

The infrastructure I built for Claude was also valuable for me, the human working on the project. The same scripts that helped Claude automate its work also helped me accomplish common tasks quickly. As the project grows, this infrastructure will keep evolving along with it. Models change all the time. So do project requirements and workflows. I look at all this project infrastructure as an organic thing that will keep changing as long as the project is active.

Now that it’s possible for individual developers to accomplish so much in so little time, is software engineering as a career dead? My answer to this question is nope, not at all. Software engineering skills are just as valuable today as they were before language models got good. If I hadn’t taken a compilers course in college and worked through Crafting Interpreters, I wouldn’t have been able to build Cutlet. I still had to make technical decisions that I could only make because I had (some) domain knowledge and experience.

Besides, I had to learn a bunch of new skills in order to work effectively on Cutlet. These new skills also required technical knowledge. A strange and new and different kind of technical knowledge, but technical knowledge nonetheless. Before working on this project, I was worried about whether I’d have a job five years from now. But today I’m convinced that the world will continue to need software engineers in the future.
Our jobs will transform—and some people might not enjoy the new jobs anymore—but there will still be plenty of work for us to do. Maybe we’ll have even more work to do than before, since LLMs allow us to build a lot more software a lot faster. And for those of us who never want to touch LLMs, there will be domains where LLMs never make any inroads. My friends who work on low-level multimedia systems have found less success using LLMs compared to those who build webapps. This is likely to be the case for many years to come. Eventually, those jobs will transform, too, but it will be a far slower shift.

Is it fair to say that I built Cutlet? After all, Claude did most of the work. What was my contribution here besides writing the prompts? Moreover, this experiment only worked because Claude had access to multiple language runtimes and computer science books in its training data. Without the work done by hundreds of programmers, academics, and writers who have freely donated their work to the public, this project wouldn’t have been possible.

So who really built Cutlet? I don’t have a good answer to that. I’m comfortable taking credit for the care and feeding of the coding agent as it went about generating tokens, but I don’t feel a sense of ownership over the code itself. I don’t consider this “my” work. It doesn’t feel right. Maybe my feelings will change in the future, but I don’t quite see how.

Because of my reservations about who this code really belongs to, I haven’t added a license to Cutlet’s GitHub repository. Cutlet belongs to the collective consciousness of every programming language designer, implementer, and educator to have released their work on the internet.

(Also, it’s worth noting that Cutlet almost certainly includes code from the Lua and Python interpreters. It referred to those languages all the time when we talked about language features.
I’ve also seen a ton of code from Crafting Interpreters making its way into the codebase with my own two fleshy eyes.)

I’d be remiss if I didn’t include a note on mental health in this already mammoth blog post. It’s easy to get addicted to agentic engineering tools. While working on this project, I often found myself at my computer at midnight going “just one more prompt”, as if I were playing the world’s most obscure game of Civilization. I’m embarrassed to admit that I often had Claude Code churning away in the background when guests were over at my place, when I stepped into the shower, or when I went off to lunch.

There’s a heady feeling that comes from accomplishing so much in so little time. More addictive than that is the unpredictability and randomness inherent to these tools. If you throw a problem at Claude, you can never tell what it will come up with. It could one-shot a difficult problem you’ve been stuck on for weeks, or it could make a huge mess. Just like a slot machine, you can never tell what might happen. That creates a strong urge to try using it for everything, all the time. And just like with slot machines, the house always wins. These days, I set limits on how long and how often I’m allowed to use Claude.

As LLMs become widely available, we as a society will have to figure out the best way to use them without destroying our mental health. This is the part I’m not very optimistic about. We have comprehensively failed to regulate or limit our use of social media, and I’m willing to bet we’ll have a repeat of that scenario with LLMs.

Now that we can produce large volumes of code very quickly, what can we do that we couldn’t do before? This is another question I’m not equipped to answer fully at the moment. That said, one area where I can see LLMs being immediately useful to me personally is the ability to experiment very quickly.
It’s very easy for me to try out ten different features in Cutlet because I just have to spec them out and walk away from the computer. Failed experiments cost almost nothing. Even if I can’t use the code Claude generates, having working prototypes helps me validate ideas quickly and discard bad ones early. I’ve also been able to radically reduce my dependency on third-party libraries in my JavaScript and Python projects. I often use LLMs to generate small utility functions that previously required pulling in dependencies from NPM or PyPI.

But honestly, these changes are small beans. I can’t predict the larger societal changes that will come about because of AI agents. All I can say is programming will look radically different in 2030 than it does in 2026.

This project was a proof of concept to see how far I could push Claude Code. I’m currently looking for a new contract as a frontend engineer, so I probably won’t have the time to keep working on Cutlet. I also have a few more ideas for pushing agentic programming further, so I’m likely to prioritize those over continuing work on Cutlet. When the mood strikes me, I might still add small features to the language now and then. Now that I’ve removed myself from the development loop, it doesn’t take a lot of time and effort. I might even do Advent of Code using Cutlet in December!

Of course, if you work at Anthropic and want to give me money so I can keep running this experiment, I’m available for contract work for the next 8 months :)

For now, I’m closing the book on Cutlet and moving on to other projects (and cat). Thanks to Shruti Sunderraman for proofreading this post. Also thanks to Cutlet the cat for walking across the keyboard and deleting all my work three times today.

- I didn’t want to solve a particularly novel problem, but I wanted the ability to sometimes steer the LLM into interesting directions.
- I didn’t want to manually verify LLM-generated code. I wanted to give the LLM specifications, test cases, documentation, and sample outputs, and make it do all the difficult work of figuring out if it was doing the right thing.
- I wanted to give the agent a strong feedback loop so it could run autonomously.
- I don’t like MCPs. I didn’t want to deal with them. So anything that required connecting to a browser, taking screenshots, or talking to an API over the network was automatically disqualified.
- I wanted to use a boring language with as few external dependencies as possible.

- LLMs know how to build language implementations because their training data contains thousands of existing implementations, papers, and CS books.
- I was intrigued by the idea of creating a “remix” language by picking and choosing features I enjoy from various existing languages.
- I could write a bunch of small deterministic programs along with their expected outputs to test the implementation. I could even get Claude to write them for me, giving me a potentially infinite number of test cases to verify that the language was working correctly.
- Language implementations can be tested from the command line, with purely textual inputs and outputs. No need to take screenshots or videos or set up fragile MCPs. There’s no better feedback loop for an agent than “run and until there are no more errors”.
- C is as boring as it gets, and there are a large number of language implementations built in C.

- Understanding which problems can be solved effectively using LLMs, which ones need a human in the loop, and which ones should be handled entirely by humans.
- Communicating your intent clearly and defining criteria for success.
- Creating an environment in which the LLM can do its best work.
- Monitoring and optimizing the agentic loop so the agent can work efficiently.

- For the problem you want to solve, is it possible to define and verify success criteria in an automated fashion?
- Have other people solved this problem—or a similar one—before?
In other words, is your problem likely to be in the training data for an LLM?

First, I’d present the LLM with a new feature (e.g. loops) or refactor (e.g. moving from a tree-walking interpreter to a bytecode VM). Then I’d have a conversation with it about how the change would work in the context of Cutlet, how other languages implemented it, design considerations, ideas we could steal from interesting/niche languages, etc. Just a casual back-and-forth, the same way you might talk to a co-worker.

After I had a good handle on what the feature or change would look like, I’d ask the LLM to give me an implementation plan broken down into small steps. I’d review the plan and go back and forth with the LLM to refine it. We’d explore various corner cases, footguns, gotchas, missing pieces, and improvements. When I was happy with the plan, I’d ask the LLM to write it out to a file that would go into a directory. Sometimes we’d end up with 3-4 plan files for a single feature. This was intentional. I needed the plans to be human-readable, and I needed each plan to be an atomic unit I could roll back if things didn’t work out. They also served as a history of the project’s evolution. You can find all the historical plan files in the Cutlet repository.

I’d read and review the generated plan file, go back and forth again with the LLM to make changes to it, and commit it when everything looked good. Finally, I’d fire up a Docker container, run Claude with all permissions—including access—and ask it to implement my plan.

Comprehensive test suite. My project instructions told Claude to write tests and make sure they failed before writing any new code. Alongside, I asked it to run tests after making significant code changes or merging any branches. Armed with a constantly growing test suite, Claude was able to quickly identify and fix any regressions it introduced into the codebase. The tests also served as documentation and specification.

Sample inputs and outputs.
These were my integration tests. I added a number of example programs to the Cutlet repository—most of them written by Claude itself—that not only serve as documentation for humans, but also as an end-to-end test suite. The project instructions told Claude to run all of them and verify their output after every code change.

Linters, formatters, and static analysis tools. Cutlet uses and to ensure a baseline of code quality. Just like with tests, the project instructions asked the LLM to run these tools after every major code change. I noticed that would often produce diagnostics that would force Claude to rewrite parts of the code. If I had access to some of the more expensive static analysis tools (such as Coverity), I would have added them to my development process too.

Memory safety tools. I asked Claude to create a target that rebuilt the entire project and test suite with ASan and UBSan enabled (with LSan riding along via ASan), then ran every test under the instrumented build. The project instructions included running this check at the end of implementing a plan. This caught memory errors—use-after-free, buffer overflows, undefined behavior—that neither the tests nor the linter could find. Running these tests took time and greatly slowed down the agent, but they caught even more issues than .

Symbol indexes. The agent had access to and for navigating the source code. I don’t know how useful this was, because I rarely ever saw it use them. Most of the time it would just the code for symbols. I might remove this in the future.

Runtime introspection tools. Early in the project, I asked Claude to give Cutlet the ability to dump the token stream, AST, and bytecode for any piece of code to the standard output before executing it. This allowed the agent to quickly figure out if it had introduced errors into any part of the execution pipeline without having to navigate the source code or drop into a debugger.

Pipeline tracing.
I asked Claude to write a Python script that fed a Cutlet program through the interpreter with debug flags to capture the full compilation pipeline: the token stream, the AST, and the bytecode disassembly. It then mapped each token type, AST node, and opcode back to the exact source locations in the parser, compiler, and VM where they were handled. When an agent needed to add a new language feature, it could run the tracer on an example of a similar existing feature to see precisely which files and functions to touch. I was very proud of this machinery, but I never saw Claude make much use of it either.

Running with every possible permission. I wanted the agent to work autonomously and have access to every debugging tool it might want to use. To do this, I ran it inside a Docker container with enabled and full access. I believe this is the only practical way to use coding agents on large projects. Answering permissions prompts is cognitively taxing when you have five agents working in parallel, and restricting their ability to do whatever they want makes them less effective at their job. We will need to figure out all sorts of safety issues that arise when you give LLMs the ability to take full control of a system, but on this project, I was willing to accept the risks that come with YOLO mode.

(think) 1 month ago

Emacs and Vim in the Age of AI

It’s tough to make predictions, especially about the future. – Yogi Berra

I’ve been an Emacs fanatic for over 20 years. I’ve built and maintained some of the most popular Emacs packages, contributed to Emacs itself, and spent countless hours tweaking my configuration. Emacs isn’t just my editor – it’s my passion, and my happy place. Over the past year I’ve also been spending a lot of time with Vim and Neovim, relearning them from scratch and having a blast contrasting how the two communities approach similar problems. It’s been a fun and refreshing experience. 1

And lately, like everyone else in our industry, I’ve been playing with AI tools – Claude Code in particular – watching the impact of AI on the broader programming landscape, and pondering what it all means for the future of programming. Naturally, I keep coming back to the same question: what happens to my beloved Emacs and its “arch nemesis” Vim in this brave new world? I think the answer is more nuanced than either “they’re doomed” or “nothing changes”. Predicting the future is obviously hard work, but it’s fun to speculate. My reasoning is that every major industry shift presents plenty of risks and opportunities for those involved in it, so I want to spend a bit of time ruminating over the risks and opportunities for Emacs and Vim.

VS Code is already the dominant editor by a wide margin, and it’s going to get first-class integrations with every major AI tool – Copilot (obviously), Codex, Claude, Gemini, you name it. Microsoft has every incentive to make VS Code the best possible host for AI-assisted development, and the resources to do it. On top of that, purpose-built AI editors like Cursor, Windsurf, and others are attracting serious investment and talent. These aren’t adding AI to an existing editor as an afterthought – they’re building the entire experience around AI workflows.
They offer integrated context management, inline diffs, multi-file editing, and agent loops that feel native rather than bolted on. Every developer who switches to one of these tools is a developer who isn’t learning Emacs or Vim keybindings, isn’t writing Elisp, and isn’t contributing to our ecosystems. The gravity well is real.

I’ve never tried Cursor or Windsurf, simply because they are essentially forks of VS Code and I can’t stand VS Code. I’ve tried VS Code several times over the years and never felt productive in it, for a variety of reasons.

Part of the case for Emacs and Vim has always been that they make you faster at writing and editing code. The keybindings, the macros, the extensibility – all of it is in service of making the human more efficient at the mechanical act of coding. But if AI is writing most of your code, how much does mechanical editing speed matter? When you’re reviewing and steering AI-generated diffs rather than typing code character by character, the bottleneck shifts from “how fast can I edit” to “how well can I specify intent and evaluate output.” That’s a fundamentally different skill, and it’s not clear that Emacs or Vim have an inherent advantage there. The learning curve argument gets harder to justify too. “Spend six months learning Emacs and you’ll be 10x faster” is a tough sell when a junior developer with Cursor can scaffold an entire application in an afternoon. 2

VS Code has Microsoft. Cursor has venture capital. Emacs has… a small group of volunteers and the FSF. Vim had Bram, and now has a community of maintainers. Neovim has a small but dedicated core team. This has always been the case, of course, but AI amplifies the gap. Building deep AI integrations requires keeping up with fast-moving APIs, models, and paradigms. Well-funded teams can dedicate engineers to this full-time. Volunteer-driven projects move at the pace of people’s spare time and enthusiasm.
Let’s go all the way: what if programming as we know it is fully automated within the next decade? If AI agents can take a specification and produce working, tested, deployed software without human intervention, we won’t need coding editors at all. Not Emacs, not Vim, not VS Code, not Cursor. The entire category becomes irrelevant. I don’t think this is likely in the near term, but it’s worth acknowledging as a possibility. The trajectory of AI capabilities has surprised even the optimists (and I was initially an AI skeptic, but the rapid advancements last year eventually changed my mind).

Here’s the thing almost nobody is talking about: Emacs and Vim have always suffered from the obscurity of their extension languages. Emacs Lisp is a 1980s Lisp dialect that most programmers have never seen before. VimScript is… VimScript. Even Lua, which Neovim adopted specifically because it’s more approachable, is niche enough that most developers haven’t written a line of it. This has been the single biggest bottleneck for both ecosystems. Not the editors themselves – they’re incredibly powerful – but the fact that customizing them requires learning an unfamiliar language, and most people never make it past copying snippets from blog posts and READMEs. I felt incredibly overwhelmed by Elisp and VimScript when I was learning Emacs and Vim for the first time, and I imagine I wasn’t the only one. I started to feel very productive in Emacs only after putting in quite a lot of time to actually learn Elisp properly. (I never bothered to do the same for VimScript, though, and admittedly I’m not too eager to master Lua either.)

AI changes this overnight. You can now describe what you want in plain English and get working Elisp, VimScript, or Lua. “Write me an Emacs function that reformats the current paragraph to 72 columns and adds a prefix” – done. “Configure lazy.nvim to set up LSP with these keybindings” – done.
The extension language barrier, which has been the biggest obstacle to adoption for decades, is suddenly much lower.

After 20+ years in the Emacs community, I often have the feeling that a relatively small group – maybe 50 to 100 people – is driving most of the meaningful progress. The same names show up in MELPA, on the mailing lists, and in bug reports. This isn’t a criticism of those people (I’m proud to be among them), but it’s a structural weakness. A community that depends on so few contributors is fragile. And it’s not just Elisp and VimScript. The C internals of both Emacs and Vim (and Neovim’s C core) are maintained by an even smaller group. Finding people who are both willing and able to hack on decades-old C codebases is genuinely hard, and it’s only getting harder as fewer developers learn C at all.

AI tools can help here in two ways. First, they lower the barrier for new contributors – someone who understands the concept of what they want to build can now get AI assistance with the implementation in an unfamiliar language. Second, they help existing maintainers move faster. I’ve personally found that AI is excellent at generating test scaffolding, writing documentation, and handling the tedious parts of package maintenance that slow everything down.

The Emacs and Neovim communities aren’t sitting idle. There are already impressive AI integrations – gptel, ellama, and aider.el on the Emacs side; avante.nvim and codecompanion.nvim for Neovim (the full list appears at the end of this post) – and this is just a sample. Building these integrations isn’t as hard as it might seem – the APIs are straightforward, and the extensibility of both editors means you can wire up AI tools in ways that feel native. With AI assistance, creating new integrations becomes even easier. I wouldn’t be surprised if the pace of plugin development accelerates significantly.

Here’s an irony that deserves more attention: many of the most powerful AI coding tools are terminal-native. Claude Code, Aider, and various Copilot CLI tools all run in the terminal. And what lives in the terminal? Emacs and Vim.
3 Running Claude Code in an Emacs buffer or a Neovim terminal split is a perfectly natural workflow. You get the AI agent in one pane and your editor in another, with all your keybindings and tools intact. There’s no context switching to a different application – it’s all in the same environment. This is actually an advantage over GUI-based AI editors, where the AI integration is tightly coupled to the editor’s own interface. With terminal-native tools, you get to choose your own editor and your own AI tool, and they compose naturally.

Emacs’s “editor as operating system” philosophy is uniquely well-suited to AI integration. It’s not just a code editor – it’s a mail client (Gnus, mu4e), a note-taking system (Org mode), a Git interface (Magit), a terminal emulator, a file manager, an RSS reader, and much more. AI can be integrated at every one of these layers. Imagine an AI assistant that can read your org-mode agenda, draft email replies in mu4e, help you write commit messages in Magit, and refactor code in your source buffers – all within the same environment, sharing context. No other editor architecture makes this kind of deep, cross-domain integration as natural as Emacs does. Admittedly, I stopped using Emacs as my OS a long time ago, and these days I use it mostly for programming and blogging. (I’m writing this article in Emacs.) Still, I’m only one Emacs user, and many are probably using it in a more holistic manner.

One of the most underappreciated benefits of AI for Emacs and Vim users is mundane: troubleshooting. Both editors have notoriously steep learning curves and opaque error messages. “Wrong type argument: stringp, nil” has driven more people away from Emacs than any competitor ever did. AI tools are remarkably good at explaining cryptic error messages, diagnosing configuration issues, and suggesting fixes. They can read your init file and spot the problem. They can explain what a piece of Elisp does.
They can help you understand why your keybinding isn’t working. This dramatically flattens the learning curve – not by making the editor simpler, but by giving every user access to a patient, knowledgeable guide. I don’t really need any AI assistance to troubleshoot anything in my Emacs setup, but it’s been handy occasionally in Neovim-land, where my knowledge is relatively modest by comparison.

There’s at least one documented case of someone returning to Emacs after years away, specifically because Claude Code made it painless to fix configuration issues. They’d left for IntelliJ because the configuration burden got too annoying – and came back once AI removed that barrier. “Happy f*cking days I’m home again,” as they put it. If AI can bring back lapsed Emacs users, that’s a good thing in my book.

Let’s revisit the doomsday scenario. Say programming is fully automated and nobody writes code anymore. Does Emacs die? Not necessarily. Emacs is already used for far more than programming. People use Org mode to manage their entire lives – tasks, notes, calendars, journals, time tracking, even academic papers. Emacs is a capable writing environment for prose, with excellent support for LaTeX, Markdown, AsciiDoc, and plain text. You can read email, browse the web, manage files, and yes, play Tetris.

Vim, similarly, is a text editing paradigm as much as a program. Vim keybindings have colonized every text input in the computing world – VS Code, IntelliJ, browsers, shells, even Emacs (via Evil mode). Even if the Vim program fades, the Vim idea is immortal. 4

And who knows – maybe there’ll be a market for artisanal, hand-crafted software one day. “Locally sourced, free-range code, written by a human in Emacs.” I’d buy that t-shirt. And I’m fairly certain those artisan programmers won’t be using VS Code. So even in the most extreme scenario, both editors have a life beyond code. A diminished one, perhaps, but a life nonetheless.
I think what’s actually happening is more interesting than “editors die” or “editors are fine.” The role of the editor is shifting. For decades, the editor was where you wrote code. Increasingly, it’s becoming where you review, steer, and refine code that AI writes. The skills that matter are shifting from typing speed and editing gymnastics to specification clarity, code reading, and architectural judgment. In this world, the editor that wins isn’t the one with the best code completion – it’s the one that gives you the most control over your workflow. And that has always been Emacs and Vim’s core value proposition.

The question is whether the communities can adapt fast enough. The tools are there. The architecture is there. The philosophy is right. What’s needed is people – more contributors, more plugin authors, more documentation writers, more voices in the conversation. AI can help bridge the gap, but it can’t replace genuine community engagement.

Not everyone in the Emacs and Vim communities is enthusiastic about AI, and the objections go beyond mere technophobia. There are legitimate ethical concerns that are going to be debated for a long time:

Energy consumption. Training and running large language models requires enormous amounts of compute and electricity. For communities that have long valued efficiency and minimalism – Emacs users who pride themselves on running a 40-year-old editor, Vim users who boast about their sub-second startup times – the environmental cost of AI is hard to ignore.

Copyright and training data. LLMs are trained on vast corpora of code and text, and the legality and ethics of that training remain contested. Some developers are uncomfortable using tools that may have learned from copyrighted code without explicit consent. This concern hits close to home for open-source communities that care deeply about licensing.

Job displacement. If AI makes developers significantly more productive, fewer developers might be needed.
This is an uncomfortable thought for any programming community, and it’s especially pointed for editors whose identity is built around empowering human programmers. These concerns are already producing concrete action. The Vim community recently saw the creation of EVi, a fork of Vim whose entire raison d’être is to provide a text editor free from AI integration. Whether you agree with the premise or not, the fact that people are forking established editors over this tells you how strongly some community members feel. I don’t think these concerns should stop anyone from exploring AI tools, but they’re real and worth taking seriously. I expect to see plenty of spirited debate about this on emacs-devel and the Neovim issue tracker in the years ahead.

The future ain’t what it used to be. – Yogi Berra

I won’t pretend I’m not worried. The AI wave is moving fast, the incumbents have massive advantages in funding and mindshare, and the very nature of programming is shifting under our feet. It’s entirely possible that Emacs and Vim will gradually fade into niche obscurity, used only by a handful of diehards who refuse to move on. But I’ve been hearing that Emacs is dying for 20 years, and it’s still here. The community is small but passionate, the editor is more capable than ever, and the architecture is genuinely well-suited to the AI era. Vim’s situation is similar – the core idea is so powerful that it keeps finding new expression (Neovim being the latest and most vigorous incarnation).

The editors that survive won’t be the ones with the flashiest AI features. They’ll be the ones whose users care enough to keep building, adapting, and sharing. That’s always been the real engine of open-source software, and no amount of AI changes that. So if you’re an Emacs or Vim user: don’t panic, but don’t be complacent either. Learn the new AI tools (if you’re not fundamentally opposed to them, that is). Pimp your setup and make it awesome. Write about your workflows.
Help newcomers. The best way to ensure your editor survives the AI age is to make it thrive in it. Maybe the future ain’t what it used to be – but that’s not necessarily a bad thing. That’s all I have for you today. Keep hacking!

If you’re curious about my Vim adventures, I wrote about them in Learning Vim in 3 Steps.  ↩︎
Not to mention you’ll probably have to put in several years in Emacs before you’re actually more productive than you were with your old editor/IDE of choice.  ↩︎
At least some of the time. Admittedly I usually use Emacs in GUI mode, but I always use (Neo)vim in the terminal.  ↩︎
Even Claude Code has vim mode.  ↩︎

gptel – a versatile LLM client that supports multiple backends (Claude, GPT, Gemini, local models)
ellama – an Emacs interface for interacting with LLMs via llama.cpp and Ollama
aider.el – Emacs integration for Aider, the popular AI pair programming tool
copilot.el – GitHub Copilot integration (I happen to be the current maintainer of the project)
elysium – an AI-powered coding assistant with inline diff application
agent-shell – a native Emacs buffer for interacting with LLM agents (Claude Code, Gemini CLI, etc.) via the Agent Client Protocol
avante.nvim – a Cursor-like AI coding experience inside Neovim
codecompanion.nvim – a Copilot Chat replacement supporting multiple LLM providers
copilot.lua – native Copilot integration for Neovim
gp.nvim – ChatGPT-like sessions in Neovim with support for multiple providers

JSLegendDev 1 month ago

If You Like PICO-8, You'll Love KAPLAY (Probably)

I’ve been checking out PICO-8 recently. For those unaware, it’s a nicely constrained environment for making small games in Lua. It provides a built-in editor allowing you to write code, make sprites, make tile maps, and make sounds. This makes it ideal for prototyping game ideas or making small games.

You know what tool is also great for prototyping game ideas or making small games? Well… KAPLAY! It’s a simple, free and open source library for making games in JavaScript. I suspect there might be a sizeable overlap between people who like PICO-8 and those who would end up liking or even loving KAPLAY as well if they gave it a try.

During my PICO-8 learning journey, I came across this nice tutorial teaching you how to make a coin collecting game in 10 minutes. In this article, I’d like to teach you how to build roughly the same game in KAPLAY. This will better demonstrate in what ways this game library makes game development faster, much like PICO-8. Feel free to follow along if you wish!

KAPLAY lacks all of the tools included in PICO-8. There is no all-in-one package you can use to write your code, make your sprites, build your maps, or even make sounds. You might be wondering, then, how KAPLAY is in any way similar to PICO-8 if it lacks all of this? My answer: KAPLAY makes up for it by making the coding part really easy by offering you a lot of logic built-in. For example, it handles collisions, physics, scene management, animations, etc. for you. You’ll see some of this in action when we arrive at the part where we write the game’s code.

Now, how do we use KAPLAY? Here’s the simplest way I’ve found. You install VSCode (a popular code editor) along with the Live Server extension (found in the extensions marketplace within the editor). You then create a folder that you open within VSCode. Once the folder is opened, we only need to create two files: one called index.html and the other main.js.
Your index.html file is just a minimal web page: since KAPLAY works on the web, it lives within a web page, and index.html is that page. Then, we link our JavaScript file to it. We set the script type to “module” so we can use import statements in our JS. In main.js, we then import and initialize KAPLAY. Voilà! We can now use the KAPLAY library.

Since we installed the Live Server extension, you should now have access to a “Go Live” button at the bottom of the editor. To actually run the game, all you have to do is click it. This will open the web page in your default browser. KAPLAY by default creates a canvas with a checkered pattern. One pretty cool thing about this setup is that every time you change something in your code and hit save (Ctrl+S, or Cmd+S on a Mac), the web page reloads and you can see your latest changes instantly.

I’ve created the following spritesheet to be used in our game. Note that since the image is transparent, the cloud to the right is not really visible. You can download the image above to follow along. The next step is to place the image in the same folder as our HTML page and JavaScript file. We’re now ready to make our game.

Here we set the width and the height of our canvas. The letterbox option is used to make sure the canvas scales with the browser window without losing its aspect ratio. We can load our spritesheet by using the loadSprite KAPLAY function. The first param is the name you want to use to refer to it elsewhere in your code. The second param is the path to that asset. Finally, the third param is used to tell KAPLAY how to slice your image into individual frames. Considering that in our spritesheet we have three sprites placed in a row, the sliceX property should be set to 3. Since we have only one sprite per column (because we only have one column), sliceY should be set to 1.

To make the coins fall from the top, we’ll use KAPLAY’s physics system. You can set the gravity by calling KAPLAY’s setGravity function.
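The original post’s code listings did not survive the trip into this page, so here is a minimal sketch of what the setup in main.js could look like. The CDN URL, canvas dimensions, and gravity value are my own assumptions; index.html only needs a `<script type="module" src="main.js"></script>` tag inside the body:

```javascript
// main.js – minimal KAPLAY bootstrap (a sketch; exact values are assumptions)
import kaplay from "https://unpkg.com/kaplay/dist/kaplay.mjs";

// Create the game canvas. letterbox scales the canvas with the
// browser window while preserving its aspect ratio.
kaplay({
  width: 960,
  height: 540,
  letterbox: true,
});

// Slice the spritesheet into individual frames:
// three sprites in a single row, one row total.
loadSprite("spritesheet", "./spritesheet.png", {
  sliceX: 3,
  sliceY: 1,
});

// Enable gravity so the coins we spawn later will fall.
setGravity(1600);
```

By default, KAPLAY registers its functions (loadSprite, setGravity, add, etc.) as globals after initialization, which is why they can be called directly.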
KAPLAY’s add function is used to create a game object by providing an array of components. These components are offered by KAPLAY and come with prebuilt functionality. The rect() component sets the graphics of the game object to be a rectangle with a width and height of 1000, while the color component sets its color. You should have the following result at this stage.

Creating The Basket

The basket is also a game object with several different components. Here is what each does:

sprite() – Sets the sprite used by the game object. The first param is for providing the name of the sprite we want to use. Since we’re using a spritesheet which contains three different sprites in the same image, we need to specify the frame to use. The basket sprite corresponds to frame 0.

anchor() – By default, game objects are positioned based on their top-left corner. However, I prefer having it centered. The anchor component is for this purpose.

pos() – This component is used to set the position of the game object on the canvas. Here we also use center(), a KAPLAY-provided function that returns the coordinates of the center of the canvas.

area() – This component is used to set the hitbox of a game object. This will allow KAPLAY’s physics system to handle collisions for us. There is a debug mode you can access by pressing the f1 key (fn+f1 on Mac) which will make hitboxes visible. Example when debug mode is on. As for setting the shape of the hitbox, you can use the Rect class, which takes 3 params. The first expects a vec2 (a data structure offered by KAPLAY used to hold a pair of values) describing where to place the hitbox relative to the game object. If set to 0, the hitbox will have the same position as the game object. The remaining two params set its width and height.

body() – Finally, the body component makes the game object susceptible to physics. If added alone, the game object will be affected by gravity. However, to prevent this, we can set the isStatic property to true.
This is very useful, for example, in a platformer where platforms need to be static so they don’t fall off. Here we can use the move method, available on all game objects, to make the basket move in the desired direction.

The loop function spawns a coin every second. We use the randi function to pick a random X position between 10 and 950. The offscreen component is used to destroy the game object once it’s out of view. Finally, a simple string “coin” is added alongside the array of components to tag the game object being created. This will allow us to determine which coin collided with the basket so we can destroy it and increase the score.

Text can be displayed by creating a game object with the text component. To know when a coin collides with the basket, we can use its onCollide method (available by default). The first param of that method is the tag of the game object you want to check collisions with. Since we have multiple coins using the “coin” tag, the specific coin currently colliding with the basket will be passed as a param to the collision handler. Now we can destroy the coin, increase the score, and display the new score.

As mentioned earlier, KAPLAY does not have a map making tool. However, it does offer the ability to create maps using arrays of strings. For anything more complex, you should check out Tiled, which is also open source and made for that purpose. Where we place the # character in the string array determines where clouds will be placed in the game.

Publishing a KAPLAY game is very simple. You compress your folder into a .zip archive and upload it to itch.io or any other site you wish. The game will be playable in the browser without players needing to download it. Now, what if you’d like to make it downloadable as well? A very simple tool you can use is GemShell. It allows you to make executables for Windows/Mac/Linux in what amounts to a click. You can use the lite version for free.
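The gameplay listings are also missing from this page, so here is a rough sketch tying the pieces above together: the basket, the coin spawner, the collision-based scoring, and the string-array cloud map. Frame indices, hitbox sizes, speeds, and tile dimensions are illustrative assumptions, not values from the original post:

```javascript
// Sketch of the gameplay code described above (assumed values throughout).
const basket = add([
  sprite("spritesheet", { frame: 0 }),        // frame 0 = the basket
  anchor("center"),                           // position by the sprite's center
  pos(center().x, height() - 40),
  area({ shape: new Rect(vec2(0), 60, 30) }), // custom hitbox
  body({ isStatic: true }),                   // solid, but unaffected by gravity
]);

// Move the basket with the arrow keys.
onKeyDown("left", () => basket.move(-320, 0));
onKeyDown("right", () => basket.move(320, 0));

// Score display, created with the text component.
let score = 0;
const scoreLabel = add([text("Score: 0"), pos(24, 24)]);

// Spawn a falling coin at a random X position every second.
loop(1, () => {
  add([
    sprite("spritesheet", { frame: 1 }), // frame 1 = the coin
    anchor("center"),
    pos(randi(10, 950), 0),
    area(),
    body(),                        // affected by gravity, so it falls
    offscreen({ destroy: true }),  // clean up coins that leave the screen
    "coin",                        // tag used to look up collisions
  ]);
});

// When a coin touches the basket, remove it and bump the score.
basket.onCollide("coin", (coin) => {
  destroy(coin);
  score++;
  scoreLabel.text = `Score: ${score}`;
});

// Clouds placed wherever # appears in the string map.
addLevel(["  #    #  ", " #      # "], {
  tileWidth: 64,
  tileHeight: 64,
  tiles: { "#": () => [sprite("spritesheet", { frame: 2 })] },
});
```

Note how the “coin” tag does the heavy lifting: onCollide("coin", …) receives the specific colliding coin, so destroying it and updating the score takes only a few lines.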
If you plan on upgrading, you can use my link to get 15% off your purchase. To be transparent, this is an affiliate link: if you end up purchasing the tool using my link, I’ll get a cut of that sale.

I just scratched the surface of KAPLAY today. I hope this gave you a good idea of what it’s like to make games with it. If you’re interested in more technical articles like this one, I recommend subscribing so you don’t miss out on future publications. Subscribe now

In the meantime, you can check out the following:

Andre Garzia 1 month ago

Building your own blogging tools is a fun journey

# Building your own blogging tools is a fun journey

I read a very interesting blog post today: ["So I've Been Thinking About Static Site Generators" by PolyWolf](https://wolfgirl.dev/blog/2026-02-23-so-ive-been-thinking-about-static-site-generators/) in which she goes in depth about her quest to create a 🚀BLAZING🔥 fast [static site generator](https://en.wikipedia.org/wiki/Static_site_generator). It was a very good read and I'm amazed at how fast she got things running. The [conversation about the post on Lobste.rs](https://lobste.rs/s/pgh4ss/so_i_ve_been_thinking_about_static_site) is also full of gems. Seeing so many people pouring energy into the specific problem of making SSGs very fast feels to me pretty much like modders getting the utmost performance out of their CPUs or car engines. It is fun to see how they are doing and how fast they can make clean and incremental builds go.

> No one will ever complain about their SSG being too fast.

As someone who used [a very slow SSG](https://docs.racket-lang.org/pollen/) for years and eventually migrated to [my own homegrown dynamic site](/2025/03/why-i-choose-lua-for-this-blog.html), I understand how frustrating slow site generation can be. In my own personal case, I decided to go with an old-school dynamic website using old 90s tech such as *cgi-bin* scripts in [Lua](https://lua.org). That eliminates the need for rebuilds of the site, as it is generated at runtime. One criticism I keep hearing is about the scalability of my approach. People say: *"what if one of your posts goes viral and the site crashes?"* Well, that is ok for me, cause if I get a cold or flu I crash too — why would I demand of my site something I don't demand of myself? Jokes aside, the problem of scalability can be dealt with by having some heuristic figuring out when a post is getting hot and then generating a static version of that post while keeping posts that are not hot dynamic. I'm not worried about it.
Instead of devoting my time to the engineering problem of making my SSG fast, I decided to put my energy elsewhere. A point that is often overlooked by many people developing blogging systems is the editing and posting workflow. They'll have really fast SSGs and then let the user figure out how to write the source files using whatever tool they want. Nothing wrong with that, but I want something better than launching $EDITOR to write my posts. In my case, what prevented me from posting more was not how long my SSG took to rebuild my site, but the friction between wanting to post and having the post written. What tools to use, how to handle file uploads, etc. So I began optimising and developing tools to help me with that.

First, I [made a simple posting interface](/2025/01/creating-a-simple-posting-interface.html). This is not a part of the blogging system; it is an independent tool that shares the code base with the rest of the blog (just so I have my own CGI routines available). Internally it uses [micropub](https://www.w3.org/TR/micropub/) to publish. After that, I made it into a Firefox Add-on. The add-on is built for ad-hoc distribution and not shared on the store; it is just for me. Once installed, I get a sidebar that allows me to edit or post.

![Editor](/2026/02/img/c6e6afba-3141-4eca-a0f4-3425f7bea0d8.png)

This is part of making my web browser of choice not only a web browser but a web making tool. I'm integrating all I need to write posts into the browser itself and thus diminishing the distance between browsing the web and making the web. I added features to the add-on to help me quote posts, get addresses as I browse them, and edit my own posts. It is all there, right in the browser.

![quoting a post](/2026/02/img/b6d2abc1-c4ae-42c5-931f-d390fc9b793f.png)

Like PolyWolf, I am passionate about my tools and blogging. I think we should take it upon ourselves to build the tools we need if they're not available already (or just for the fun of it).
Even though I'm no longer on the SSG bandwagon, I'm deeply interested in blogging and would like to see more people experimenting with building their own tools, especially if their focus is on interesting UX and writing workflows.

(think) 1 month ago

Learning Vim in 3 Steps

Every now and then someone asks me how to learn Vim. 1 My answer is always the same: it’s simpler than you think, but it takes longer than you’d like. Here’s my bulletproof 3-step plan.

Start with vimtutor – it ships with Vim and takes about 30 minutes. It’ll teach you enough to survive: moving around, editing text, saving, quitting. The essentials.

Once you’re past that, I strongly recommend Practical Vim by Drew Neil. This book changed the way I think about Vim. I had known the basics of Vim for over 20 years, but the Vim editing model never really clicked for me until I read it. The key insight is that Vim has a grammar – operators (verbs) combine with motions (nouns) to form commands. d (delete) + w (word) = dw. c (change) + i" (inside quotes) = ci". Once you internalize this composable language, you stop memorizing individual commands and start thinking in Vim. The book is structured as 121 self-contained tips rather than a linear tutorial, which makes it great for dipping in and out. You could also just read Vim’s built-in documentation cover to cover – it’s excellent. But let’s be honest, few people have that kind of patience. Other resources worth checking out are listed at the end of this post.

Resist the temptation to grab a massive Neovim distribution like LazyVim on day one. You’ll find it overwhelming if you don’t understand the basics and don’t know how the Vim/Neovim plugin ecosystem works. It’s like trying to drive a race car before you’ve learned how a clutch works. Instead, start with a minimal configuration and grow it gradually. I wrote about this in detail in Build your .vimrc from Scratch – the short version is that modern Vim and Neovim ship with excellent defaults and you can get surprisingly far with a handful of settings. I’m a tinkerer by nature. I like to understand how my tools operate at their fundamental level, and I always take that approach when learning something new.
Building your config piece by piece means you understand every line in it, and when something breaks you know exactly where to look. I’m only half joking. Peter Norvig’s famous essay Teach Yourself Programming in Ten Years makes the case that mastering any complex skill requires sustained, deliberate practice over a long period – not a weekend crash course. The same applies to Vim. Grow your configuration one setting at a time. Learn Vimscript (or Lua if you’re on Neovim). Read other people’s configs. Maybe write a small plugin. Every month you’ll discover some built-in feature or clever trick that makes you wonder how you ever lived without it. One of the reasons I chose Emacs over Vim back in the day was that I really hated Vimscript – it was a terrible language to write anything in. These days the situation is much better: Vim9 Script is a significant improvement, and Neovim’s switch to Lua makes building configs and plugins genuinely enjoyable. Mastering an editor like Vim is a lifelong journey. Then again, the way things are going with LLM-assisted coding, maybe you should think long and hard about whether you want to commit your life to learning an editor when half the industry is “programming” without one. But that’s a rant for another day. If this bulletproof plan doesn’t work out for you, there’s always Emacs. Over 20 years in and I’m still learning new things – these days mostly how to make the best of evil-mode so I can have the best of both worlds. As I like to say: The road to Emacs mastery is paved with a lifetime of invocations. That’s all I have for you today. Keep hacking! Just kidding – everyone asks me about learning Emacs. But here we are.  ↩︎ Advent of Vim – a playlist of short video tutorials covering basic Vim topics. Great for visual learners who prefer bite-sized lessons. ThePrimeagen’s Vim Fundamentals – if you prefer video content and a more energetic teaching style. vim-be-good – a Neovim plugin that gamifies Vim practice. 
Good for building muscle memory.
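The minimal-configuration advice above can be made concrete. As a hypothetical sketch (not the author’s actual config), a first Neovim init.lua might be nothing more than:

```lua
-- A minimal Neovim init.lua: a handful of settings you can fully
-- explain, grown one line at a time as needs arise.
vim.opt.number = true        -- show line numbers
vim.opt.ignorecase = true    -- case-insensitive search...
vim.opt.smartcase = true     -- ...unless the pattern contains capitals
vim.opt.undofile = true      -- persistent undo across sessions
vim.g.mapleader = " "        -- use space as the leader key

-- Add mappings only when you feel the friction they remove:
vim.keymap.set("n", "<leader>w", "<cmd>write<cr>", { desc = "Save file" })
```

Every line here is something you can look up with :help, which keeps the config understandable when something breaks.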

Michael Lynch 4 months ago

Refactoring English: Month 12

Hi, I’m Michael. I’m a software developer and founder of small, indie tech businesses. I’m currently working on a book called Refactoring English: Effective Writing for Software Developers. Every month, I publish a retrospective like this one to share how things are going with my book and my professional life overall. At the start of each month, I declare what I’d like to accomplish. Here’s how I did against those goals: I’ve gotten stuck on my design docs chapter. There’s a lot I want to cover, and I’m having trouble articulating some of it and deciding how much of it belongs in the book. Part of the problem is that the chapter is so long that it feels overwhelming to tackle all at once. My new plan is to break the chapter into smaller sections and focus on those one at a time. I think this is my last “hard” chapter, as I have a better sense of what I want to say in the remaining chapters. I keep procrastinating on this even though I enjoy doing it and get useful responses. I keep automating more of the logistical work in the hopes that reducing initial friction will motivate me to do it more. 3,508 people read the post, so it was somewhat successful at attracting new readers. Bob Nystrom, the author I was writing about, liked my article, which was gratifying. I figured even if my article flopped, at least it would let Bob Nystrom know how much I appreciated his work. November was a good month in terms of visits and sales. Visits were down slightly from October, but it was still one of the strongest months of the year. I did a Black Friday discount for 30% off. I only advertised it to readers on my mailing list, as I always feel strange spamming a sale everywhere. But the announcement was successful, as 18 customers purchased for a total of $359.41. Peter Spiess-Knafl, co-founder of zeitkapsl, cited Refactoring English in a blog post, which reached #1 on Lobsters.
I was glad to see Peter’s post, as my plan for the book has always been for it to help readers write successful blog posts and be happy enough about the book that they recommend it. I read Hacker News so often that I feel like I’d be good at predicting which stories will reach the front page, but I’ve never tested this belief rigorously. So, I made a game to test my accuracy. The game shows me the newest submissions to Hacker News, and the player predicts whether or not they’ll reach the front page: The biggest problem with the game is that a story can take up to 24 hours to reach the front page. Waiting 24 hours for results sucks the fun out of the game. I tried changing the rules so that you’re predicting whether an article will reach the front page in its first 30 minutes, but 30 minutes still feels painfully slow. My new idea is to make a tentative call 10 minutes after a story has been submitted. Given the story’s age, upvotes, and comment count, I can calculate some rough probability of whether it has a chance of hitting the front page. So, if you predicted a story would reach the front page, but 10 minutes later, it still has no upvotes or comments, the game will tentatively tell you that you got it wrong, but you can still get the points back if the story makes a miraculous comeback in the next 24 hours. I thought about making a version of the game where you guess the results of past stories. That way, I could give instant feedback because the answer is already available, but that feels less fun, as other people have made similar games. Plus, for the HN diehards I’m hoping this game appeals to, past data ruins it because you kind of remember what was on the front page and what wasn’t. My wife and I had our first child last year , so we wanted a way to share baby photos with our family privately. Some of my friends had used apps like this, but they were all ad-supported. 
I hate the idea of companies slapping ads on photos of my child, so I looked for other options. When I came across TinyBeans, I thought I’d found a winner. They had a paid version that disabled ads, and privacy was the main feature they advertised: perfect! Then, I started using TinyBeans, and there were ads everywhere. “Buy our photo books!” “Give us more personal information!” I opened the app just now and had to dismiss three separate ads to see photos of my own child. TinyBeans shows me three huge ads when I open the app, even though I’m a paying customer and have dismissed these exact ads dozens of times before. It also turns out that my family members receive even more ads than I see, including for third-party services. Here’s a recent one that encourages my family to invest in some scammy AI company: When TinyBeans sends emails to my family, they stick spammy ads like these in between photos of my son. The “no ads” promise of the paid tier is limited to me and my wife; TinyBeans bombards everyone else in my family with ads and upsells. I wanted to ditch TinyBeans early on, but I was too busy with new parent stuff to find a new app and migrate my whole family to it. So, each month, I begrudgingly give TinyBeans my $9. Then, Black Friday happened. TinyBeans sent me an email patting themselves on the back for not cluttering my inbox with Black Friday deals because all the deals would be in the app. TinyBeans sends me a pointless email to boast about not cluttering my inbox with pointless emails. Great, an email congratulating yourself about how little you’ll email me. But that wasn’t even true! TinyBeans proceeded to send me four more emails telling me to check my app for Black Friday deals: After promising not to bombard me with Black Friday promotions, TinyBeans emailed me five Black Friday promotions. That pushed me over the edge, and now I’m on a spite mission to create my own TinyBeans replacement and stop giving TinyBeans my money. 
“And what are your reasons for wanting to create an app to share baby photos?” The only functionality I care about in TinyBeans is: How hard could that be? 20 hours of dev work? The TinyBeans web and Android apps suck anyway, so I’ll be glad to move away from them. And because the experience is mostly email-based, I can replace TinyBeans with my own app without my family having to do any work as part of the migration. I’m not starting a company to compete with TinyBeans. I just want to make a web app that replaces TinyBeans’ functionality. One of my shameful secrets as a developer is that I’m bad at managing windows on my screen. I compensate by overusing my mouse, even though that’s slow and inefficient. Last year, I switched from Windows to Linux and got a 49" ultrawide monitor . While Windows was designed for mouse-happy users like me, Linux desktops are much more keyboard-focused, so my lack of keyboard discipline began catching up with me. I’d keep opening windows and never close them, so I’d end up with 10+ VS Code windows, 10+ Firefox windows, and 5 different instances of the calculator app for one-off calculations. They were all in one big pile in the middle of my desktop. At that point, it was obvious I was wasting tons of screen real estate and burning time locating my windows. I tried a few different window managers, but I kept running into issues. Like I couldn’t get lockscreens to work, or they’d fail to use my monitor’s full 5120x1440 resolution. The fastest person I’ve ever seen navigate their computer is my friend okay zed . I asked him for advice, and he explained his approach to window management . His strategy is to use many virtual desktops where windows are almost always full screen within the desktop. He uses xmonad, but he suggested I try Awesome Window Manager. I liked okay’s philosophy of single-purpose virtual desktops, so I created an Awesome window manager configuration around it. 
So, I have a dedicated desktop for my blog, a dedicated desktop for my book, one for email, etc. I try to limit myself to 1-2 windows per desktop, but sometimes I’ll pull up a third or fourth while looking something up. Here’s what my blog desktop looks like, which is pretty typical: one VS Code window for editing, one Firefox window for viewing the result, and sometimes a second Firefox window for looking stuff up: I didn’t like any of the default desktop modes, so I had to roll my own. It gives each window 25% of my screen’s width, and if I open more than four, it squishes everything to fit. I can also manually expand or contract windows with Shift+Win+H and Shift+Win+L. Except sometimes I accidentally lock myself out because Win+L is my hotkey for locking the screen. Based on a few weeks with Awesome, here’s how I’m feeling: I was talking to LGUG2Z on Mastodon about how annoying it is to embed tweets on my blog. If the user deletes their tweet, I end up with dead content in my post. Even when it works, my readers have to load trackers from Twitter. I’ve been working around it by just screenshotting tweets, but that’s an ugly solution. I want to embed tweets in Hugo (the static site generator I use for this blog) with a shortcode, and then Hugo could fetch the tweet data and store it under source control so that I don’t have an ongoing dependency on Twitter. LGUG2Z explored this idea and implemented support for it on his Zola blog. He runs a script to pre-download data once from external sources (like tweets), and then he can embed the content in his blog without re-retrieving it at blog build time or reader visit time. I tried to adapt LGUG2Z’s solution for Hugo, but it got too complicated. I wrote a standalone script that downloads data from Twitter, and then I’d render it in a tweet-like UI. Regular text tweets worked okay, but once I got to tweets with embedded media or retweets, it felt like I was building too much on a shaky foundation.
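The custom layout described earlier (each window gets 25% of the screen width, squeezed when more than four are open) maps fairly directly onto Awesome’s layout API. A hypothetical sketch, not the author’s actual code:

```lua
-- Hypothetical custom Awesome WM layout: every client gets a quarter
-- of the workarea's width; with more than four clients, widths shrink
-- so everything still fits on screen.
local quarters = { name = "quarters" }

function quarters.arrange(p)
  local n = #p.clients
  if n == 0 then return end
  local wa = p.workarea
  -- 25% width each, unless that would overflow the workarea
  local w = math.min(math.floor(wa.width / 4), math.floor(wa.width / n))
  for i, c in ipairs(p.clients) do
    p.geometries[c] = {
      x = wa.x + (i - 1) * w,
      y = wa.y,
      width = w,
      height = wa.height,
    }
  end
end

return quarters
```

An arrange function like this is all a layout needs; it can then be registered alongside Awesome’s built-in modes.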
I used to store all of my photos on Google Photos. Despite my privacy concerns, Google Photos was just so much better than anything else that I held my nose and gave them all my photos. I’ve since become more privacy-sensitive and distrustful of Google, so I stopped uploading new photos to Google Photos, but I haven’t found a replacement. I’ve heard good things about Immich and Ente, so I was glad to see this detailed writeup from Michael Stapelberg about his experience setting up an Immich server using NixOS. Firefox recently improved their Enhanced Tracking Protection, a feature I didn’t realize existed. I turned it on, and it blocks trackers that uBlock was allowing and hasn’t produced any false positives. I just discovered “Rich Friend, Poor Friend” from 2022 and the follow-up from a few weeks ago. I definitely relate to hiring professionals instead of asking my friends for help (e.g., hiring movers instead of asking friends). I’m maybe in the worst part of the curve, where I’m wealthy enough to not want to ask friends to help me move but not so wealthy that I have a separate guest house to make it easy to host them. The Deel corporate espionage story is getting surprisingly little attention in my bubble. In March 2025, Rippling revealed that they discovered one of their employees was actually a corporate spy working for their competitor, Deel. When they caught the spy, he ran into the bathroom and tried to flush his phone down the toilet. Rippling posted an update in November that they found banking records showing that Deel had routed payments to the spy through the wife of Deel’s COO. The wife was, coincidentally, a compliance lead at Robinhood, another company known for its scummy ethics. As an unhappy former Deel customer, I’m happy to see them get their comeuppance. I’m working on a game to predict which posts will reach the front page of Hacker News. I’m creating a family photo sharing app out of spite. I switched to a keyboard-first window manager.
Result: Published one new chapter.
Result: I only reached out to two readers (one responded).
Result: Published “What Makes the Intro to Crafting Interpreters so Good?”

My family can browse the baby photos and videos I’ve uploaded.
My family members can subscribe to receive new photos and videos via email.
My family members can comment or give emoji reactions to photos.

What I like
Encourages me to keep single-purpose desktops for better focus.
Encourages me to navigate via keyboard hotkeys rather than mouse clicks.
Doesn’t crash on suspend 2% of the time like Gnome did.

What I dislike
Everything is implemented in and configured through Lua, a language I don’t know. I’m using LLMs to write all my configs.
The configuration is fairly low-level, so you have to write your own logic for things like filling the viewport without overflowing it. I don’t like any of the default desktop modes, so I had to roll my own.
The documentation is all text, which feels bizarre for software designed specifically around graphics.
If you accidentally define conflicting hotkeys, Awesome doesn’t warn you.
If I click a link outside of Firefox, sometimes it loads the link in a browser that isn’t on my current desktop. I’m guessing it loads it on whatever Firefox window I most recently touched.

What I still need to figure out
How to implement “scratchpad” functionality, like if I want to pull up my password manager as a floating window or summon the calculator for a quick calculation, then dismiss it.
How to put more widgets into the status bar, like network connectivity and resource usage.

Published “What Makes the Intro to Crafting Interpreters so Good?”
Published “My First Impressions of MeshCore Off-Grid Messaging”.
Published “Add a VLAN to OPNsense in Just 26 Clicks Across 6 Screens”.
Created a tiny Zig utility called count-clicks to count clicks and keystrokes on an X11 system.
Got Awesome Window Manager working.
Quick feedback is important in creating a fun game.
TinyBeans actually has a lot of ads, even on the paid version. The Awesome window manager is a better fit for my needs than Gnome. Publish a game that attracts people to the Refactoring English website. Publish two chapters of Refactoring English. Write a design doc for a just-for-fun family photo sharing app. If you’re interested in beta testing the “Will it Hit the Front Page?” game, reach out.
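The ten-minute tentative call described above could be as simple as a threshold on early engagement. A toy sketch with made-up numbers (real probabilities would have to come from historical HN data):

```lua
-- Toy sketch of the tentative-call idea: ten minutes in, use early
-- upvotes and comments to guess whether a story still has a shot at
-- the front page. All thresholds here are invented for illustration.
local function tentative_call(story)
  if story.age_minutes < 10 then
    return "too-early"
  end
  -- Crude engagement score: early upvotes weigh more than comments.
  local score = story.upvotes + 0.5 * story.comments
  if score >= 3 then
    return "on-track"   -- tentatively score the player's "yes" as correct
  end
  return "off-track"    -- a comeback within 24 hours can still flip this
end
```

The game would show the tentative result immediately and quietly reconcile it against the real outcome a day later.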

JSLegendDev 4 months ago

Making a Small RPG

I’ve always wanted to try my hand at making an RPG but always assumed it would take too much time. However, I didn’t want to give up before trying, so I started to think of ways I could still make something compelling in 1-2 months. To help me come up with something, I decided to look into older RPGs, as I had a hunch they could teach me a lot about scoping: back in the 80s, games were small because of technical limitations. A game that particularly caught my attention was the first Dragon Quest. This game was very important because it popularized the RPG genre in Japan by simplifying the formula, making it more accessible. It can be considered the father of the JRPG sub-genre. What caught my attention was the simplicity of the game. There were no party members, the battle system was turn-based and simple, and you were free to just explore around. I was particularly surprised by how the game could give a sense of exploration while the map was technically very small. This was achieved by making the player move on an overworld map with a different scale compared to when navigating towns and points of interest. In the overworld section, the player appeared bigger while the geography was smaller, allowing players to cover large amounts of territory relatively quickly. The advantage of this was that you could switch between biomes quickly without it feeling jarring. You still had the impression of traversing a large world despite it being small in reality. This idea of using an overworld map was common in older games but somehow died off as devs had fewer and fewer technical limitations and more budget to work with. Seeing its potential, I decided that I would include one in my project even if I didn’t have a clear vision at this point. Playing Dragon Quest 1 also reminded me of how annoying random battle encounters were. You would take a few steps and get assaulted by an enemy of some kind.
At the same time, this mechanic was needed, because grinding was necessary to be able to face stronger enemies in further zones of the map. My solution: what if, instead of getting assaulted, you were the one doing the assault? As you moved on the map, encounter opportunities signified by a star would appear. Only if you went there and overlapped with one would a battle start. This gave the player agency to determine whether they needed to battle or not. This idea seemed so appealing that I knew I needed to include it in my project. While my vision of what I wanted to make started to become clearer, I also started to get a sense of what I didn’t want to make. The idea of including a traditional turn-based battle system was unappealing. That wasn’t because I hated this type of gameplay, but ever since I made a 6-hour tutorial on how to build one, I realized how complicated pulling one off is. Sure, you can get something basic quickly, but actually making it engaging and well balanced is another story. A story that would take more than 1-2 months to deal with. I needed to opt for something more real-time and action-based if I wanted to complete this project in a reasonable time frame. Back in 2015, an RPG released that would prove to be very influential and “broke the internet”. It was impossible to avoid seeing mentions of Undertale online. It was absolutely everywhere. The game received praise for a lot of different aspects, but what held my attention was its combat system. It was the first game I was aware of that included a section of combat dedicated to avoiding projectiles (otherwise known as bullet hell) in a turn-based battle system. This made the combat more action-oriented, which translated into something very engaging and fun. This type of gameplay left a strong impression on my mind, and I thought that making something similar would be a better fit for my project as it was simpler to implement.
While learning about Dragon Quest 1, I couldn’t help but be reminded of The Legend of Zelda: Breath of the Wild, released in 2017. Similarly to Dragon Quest, a lot of freedom was granted to the player in how and when they tackled the game’s objectives. For example, in Breath of the Wild, you could go straight to the final boss after the tutorial section. I wanted to take this aspect of the game and incorporate it into my project. I felt it would be better to have one final boss, with every other enemy encounter being optional preparation you could engage with to get stronger. This felt like something that was achievable in a smaller scope compared to crafting a linear story the player would progress through. Another game that inspired me was Elden Ring, an open world action RPG similar to Breath of the Wild in its world structure but with the DNA of Dark Souls, a trilogy of games made previously by the same developers. What stuck with me regarding Elden Ring, for the purpose of my project, was the unique way it handled experience points. It was the first RPG I played that used them as a currency you could spend to level up the different attributes making up your character or to buy items. Taking inspiration from it, I decided that my project would feature individually upgradable stats and that experience points would act as a currency. The idea was that the player would gain an amount of the game’s currency after battle and use that to upgrade different attributes. Like in Elden Ring, if you died in combat you would lose all the currency you were currently holding. I needed a system like this for my project to count as an RPG, since by definition an RPG is stats-driven. A system like this would also allow the player to manage difficulty more easily, and it would act as the progression system of my game. When I started getting into game development, I quickly came across Pico-8. Pico-8, for those unaware, is a fantasy console with a set of limitations.
It’s not a console you buy physically but rather a software program that runs on your computer (or in a web browser) and mimics an older console that never existed. To put it simply, it’s like running an emulator for a console that could’ve existed but never actually did. Hence the fantasy aspect of it. Pico-8 includes everything you need to make games. It has a built-in code editor, sprite editor, map editor, sound editor, etc. It uses the approachable Lua programming language, which is similar to Python. Since Pico-8 is limited, it’s easier to actually finish making a game rather than getting caught in scope creep. One game made in Pico-8 particularly caught my interest. In this game you play as a little character on a grid. Your goal is to fight just one boss. To attack this boss, you need to step on a glowing tile while avoiding taking damage from incoming obstacles and projectiles thrown at you. (Epilepsy warning regarding the game footage below due to the usage of flashing bright colors.) This game convinced me to ditch the turn-based aspect I envisioned for my project entirely. Rather than having bullet hell sections within a turn-based system like in Undertale, the whole battle would instead be bullet hell. I could make the player attack without needing to have turns by making attack zones spawn within the battlefield. The player would then need to collide with them for an attack to register. I was now convinced that I had something to stand on. It was now time to see if it would work in practice, but I needed to clearly formulate my vision first. The game I had in mind would take place across two main scenes. The first was the overworld, in which the player moved around and could engage in battle encounters, lore encounters, heal, or upgrade their stats. The second, the battle scene, would be where battles would take place.
The player would be represented by a cursor and would be expected to move around, dodging incoming attacks while seeking to collide with attack zones to deal damage to the enemy. The purpose of the game was to defeat a single final boss named King Donovan, a tyrant ruling over the land of Hydralia where the game took place. At any point, the player could enter the castle to face the final boss immediately. However, most likely, the boss would be too strong. To prepare, the player would roam around the world engaging in various battle encounters. Depending on where the encounter was triggered, a different enemy would show up that fitted the theme of the location. The enemy’s difficulty, and the experience reward if beaten, would drastically vary depending on the location. Finally, the player could level up and heal in a village. I was now ready to start programming the game and figuring out the details as I went along. For this purpose, I decided to write the game using the JavaScript programming language and the KAPLAY game library. I chose these tools because they were what I was most familiar with. For JavaScript, I knew the language before getting into game dev, as I previously worked as a software developer for a company whose product was a complex web application. While most of the code was in TypeScript, knowing JavaScript was pretty much necessary to work in TypeScript, since TypeScript is a superset of JavaScript. As an aside, despite its flaws as a language, JavaScript is an extremely empowering language to know as a solo dev. You can make games, websites, web apps, browser extensions, desktop apps, mobile apps, server-side apps, etc. with this one language. It’s like the English of programming languages. Not perfect, but highly useful in today’s world. I’ll just caveat that using JavaScript makes sense for 2D games and light 3D games. For anything more advanced, you’d be better off using Unreal, Unity, or Godot.
As for the KAPLAY game library, it allows me to make games quickly because it provides a lot of functionality out of the box. It’s also very easy to learn. While it’s relatively easy to package a JavaScript game as an app that can be put on Steam, what about consoles? Well, it’s not straightforward at all, but at the same time, I don’t really care about consoles unless my game is a smash hit on Steam. If my game does become very successful, then it would make sense business-wise to pay a porting company to remake the game for consoles, getting devkits, dealing with optimizations, and all the complexity that comes with publishing a game on these platforms. Anyway, to start off the game’s development, I decided to implement the battle scene first with all of its related mechanics, as I needed to make sure the battle system I had in mind was fun to play in practice. To save time later down the line, I also figured that I would give the game a square aspect ratio. This would save time during asset creation, especially for the map, since I wanted the whole map to be visible at once and wouldn’t use a scrolling camera for this game. After a while, I had a first “bare bones” version of the battle system. You could move around to avoid projectiles and attack the enemy by colliding with red attack zones. Initially, I wanted the player to have many stats they could upgrade: their health (HP), speed, attack power, and FP, which stood for focus points. However, I had to axe the FP stat. I originally wanted to use it as a way to introduce a cost to using items in battle, but I gave up on the idea of making items entirely as they would require too much time to create and properly balance. I also had the idea of adding a stamina mechanic similar to the one you see in Elden Ring. Moving around would consume stamina that could only replenish when you stopped moving.
I initially thought that this would result in fun gameplay, as you could upgrade your stamina over time, but it ended up being tedious and useless. Therefore, I ended up removing it as well. Now that the battle system was mostly done, I decided to work on the world scene where the player could move around. I first implemented battle encounters that would spawn randomly on the screen as red squares. I then created the upgrade system, allowing the player to upgrade three stats: health (HP), attack power, and speed. In this version of the game, the player could restore their health near where they could upgrade their stats. While working on the world scene was the focus, I also made a tweak to the battle scene. Instead of displaying the current amount of health left as a fraction, I decided a health bar would be necessary: when engaged in a fast-paced battle, the player does not have time to interpret fractions to determine the state of their health. A health bar conveys the info faster in this context. However, I quickly noticed an issue with how health was restored in my game. Since the world was constrained to a single screen, going back to the center to get healed after every fight became the optimal way to play. This resulted in feeling obligated to go back to the center rather than freely roaming around. To fix this issue, I made it so the player needed to pay to heal, using the same currency as for leveling up. Now you needed to carefully balance between healing or saving your experience currency for an upgrade by continuing to explore and engage in battle, all while keeping in mind that you could lose all of your currency if defeated in battle. It’s important to note that you could also heal partially, which provided flexibility in how the player managed the currency resource. Now that I was satisfied with the “bare bones” state of the game, I needed to make nice-looking graphics. To achieve this, I decided to go with a pixel art style.
I could spend a lot of time explaining how to make good pixel art, but I already did so previously; I recommend checking my post on the topic. I started by putting a lot of effort into drawing the overworld map, as the player would spend a lot of time in it. It was at this stage that I decided to make villages the places where you would heal or level up. To make this clearer, I added icons on top of each village to make it obvious what each was for. Once I was satisfied with how the map turned out, I started designing and implementing the player character. For each distinct zone of the map, I added a collider so that battle encounters could determine which enemy and what background to display during battle. It was at this point that I made encounters appear as flashing stars on the map. Since my work on the overworld was done, I now needed to produce a variety of battle backgrounds to really immerse the player in the world. I sat down and locked in. These were by far the most time-intensive art assets to make for this project, but I’m happy with the results. After finishing all the backgrounds, I implemented the logic to show them in battle according to the zone where the encounter occurred. The next assets to make were the enemies. This was another time-intensive task, but I’m happy with how they turned out. The character at the bottom left is King Donovan, the main antagonist of the game.

Further Developing The Battle Gameplay

While developing the game, I noticed that it took too much time to go from one end of the battle zone to the other. This made the gameplay tedious, so I decided to make the battle zone smaller. At this point, I also changed the player cursor to be diamond-shaped and red rather than a white circle. I also decided to use the same flashing star sprite used for encounters on the map, but this time for attack zones. I also decided to change the font used in the game to something better.
At this point, the projectiles thrown towards the player didn’t move in a cohesive pattern the player could learn over time. It was also absolutely necessary to create a system in which the attack patterns of the enemy would be progressively shown to the player. This is why I stopped everything to work on the enemy’s attack patterns. By the same token, I also started to add effects to make the battle more engaging, and sprites for the projectiles. While the game was coming along nicely, I started to experience performance issues. I go into more detail in a previous post if you’re interested. To add another layer of depth to my game, I decided that the reward you got from a specific enemy encounter would depend not only on which enemy you were fighting but also on how much damage you took. For example, if a basic enemy in the Hydralia field would give you a reward of 100 after battle, you would actually get less than that if you took damage during the battle. This was to encourage careful dodging of projectiles and to reward players who learned the enemy pattern thoroughly. This would also add replayability, as there was now a purpose to fighting the same enemy over and over again. The formula I used to determine the final reward granted can be described as follows: At this point, it wasn’t well communicated to the player how much of the base reward they were granted after battle. That’s why I added the “Excellence” indication. When beating an enemy without taking damage, instead of the usual “Foe Vanquished” message appearing on the screen, you would get a “Foe Vanquished With Excellence” message in bright yellow. In addition to being able to enter battle encounters, I wanted the player to have lore/tips encounters. Using the same system, I would randomly spawn a flashing star of a blueish-white color. If the player overlapped with it, a dialogue box would appear telling them some lore/tips related to the location they were in.
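The post’s actual reward formula isn’t reproduced here, but based on the description (full base reward only for a damage-free fight, less otherwise), it presumably has a shape something like this purely illustrative sketch:

```lua
-- Purely illustrative reward formula (NOT the game's real one):
-- a flawless fight pays the full base reward; otherwise the payout
-- shrinks with the fraction of max HP lost during the battle.
local function battle_reward(base_reward, damage_taken, max_hp)
  if damage_taken == 0 then
    return base_reward   -- "Foe Vanquished With Excellence"
  end
  local hp_lost = math.min(damage_taken / max_hp, 1)
  return math.floor(base_reward * (1 - 0.5 * hp_lost))
end
```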
Sometimes, these encounters would result in a chest containing an exp or currency reward. This was to give the player a reason to pursue these encounters. This is still a work in progress, as I haven’t decided what kind of lore to express through these. One thing I forgot to show earlier was how I revamped the menu to use the new font. That’s all I have to share for now. What do you think? I also think it’s a good time to ask for advice regarding the game’s title. Since the game takes place in a land named Hydralia, I thought about using the same name for the game. However, since your mission is to defeat a tyrant king named Donovan, maybe a title like Hydralia: Donovan’s Demise would be a better fit. If you have any ideas regarding naming, feel free to leave a comment! Anyway, if you want to keep up with the game’s development or are more generally interested in game development, I recommend subscribing so you don’t miss out on future posts. In the meantime, you can read the following:

Playtank 5 months ago

Maximum Iteration

The quality of your game is directly related to the number of iterations you have time to make. The adage is that game development is an iterative process. We know we should be tweaking and tuning our game until it feels and runs great; to make it the best it can be, greater than the sum of its parts; and, early on, to make sure that the features we work on are worth pursuing. An iteration can be as small as an incremented variable or as big as a complete reset of your entire game project. What iterations have in common is that the only way to get more of them is to teach yourself the right mindset and to continuously remove anything that costs time. For the past few years, this has been at the top of my mind: how to maximise iteration. At the very highest level, you need to remove obstacles, clicks, and tools. The fewer things a developer needs to know and do per iteration, the better. Those three are what this is all about. I’ve come up with five areas where you need to optimise iteration, which I’ve obsessively built into my own pipelines. These five are what the rest of this post elaborates on.

Iterating on object and state authoring means creating new objects and states and connecting them to data. A character that can roam, shoot, and take cover, and has MoveSpeed, TurnSpeed, and Morale, perhaps. This is one of those things where many developers will get used to how their first engine does things and forever see it as the norm. But most tools for object authoring are actually quite terrible (in my opinion), and are also highly unlikely to match your specific needs. They are far more likely to present you with hoops to jump through and prevent you from achieving fast iteration. It’s not unusual for getting a new object into a game to take hours and involve multiple people, particularly if the game’s pipeline has grown organically over several years of production.
Where you only had to add a single collision capsule at first, maybe you must now add a full ragdoll, two different sets of hit capsules, IK targets, and a bunch of other things before the new asset works as intended, some of which have to be created manually. Forget one step, and your game may crash or exhibit weird results. This is a big threat to iteration; maybe the biggest. So if you can, you should make your own tools for object authoring that are perfectly suited to your needs, require as few steps as possible, and waste as little time as possible. Or use a tool that’s specifically made for exactly the thing you need, if you can find it. I tend to think of objects in systemic design as Characters, Props, and Devices. This is not in any way strict; it’s only what my favorite designs tend to need. If you are working on a grand strategy game, a puzzle game, or something else, the nature of your objects may vary. The key to object authoring is variation. A lamp is not the same thing as a crate or a human, but they should be able to interact in interesting ways. To make them interact, you need to be able to vary them easily and then hand off responsibility to the game’s systems in a predictable way. Something that can’t be stressed enough is to always set working defaults for all of your objects. Make sure that objects work out of the box so iteration can begin immediately. Few things waste more time than “oops, forgot the flag that did the thing.” The most intuitive way to represent objects is to use objects, unsurprisingly. A Character can be expected to do certain things and a Door will do other things. Enemy and Player can now inherit from Character, and they may make use of a Gun or a Broom depending on the kind of game you’re making. With this setup, authoring objects is no harder than inheriting from the right class and then tweaking the numbers. This is how Unreal Engine is used by many teams.
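As a rough sketch of this inheritance style, with working defaults baked into the base class (hypothetical names; Lua, in keeping with the scripting examples later in the post):

```lua
-- Sketch of inheritance-based authoring: a Character base class with
-- working defaults, and authored objects that just tweak numbers.
local Character = { MoveSpeed = 10, TurnSpeed = 90, Morale = 50 }
Character.__index = Character

function Character.new(overrides)
  return setmetatable(overrides or {}, Character)
end

-- Authoring a goblin is nothing more than overriding a few values;
-- anything not overridden falls back to a working default.
local goblin = Character.new({ MoveSpeed = 14, Morale = 20 })

print(goblin.MoveSpeed)  -- 14, its own tweak
print(goblin.TurnSpeed)  -- 90, inherited working default
```

The important property is that a freshly authored object works out of the box: nothing crashes just because a field was forgotten.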
But this gets cumbersome if you want a character that can fly, or if you want the dialogue system that characters have but for something that cannot move. Or maybe the spline following that characters have, but now for a train car. Authoring with object-oriented systems seems intuitive but doesn’t handle exceptions well. Everything now needs to be a character if it wants to access certain things, and designers will have to learn the intricacies of all the objects in the game before they can truly begin iterating. Component-based authoring flips this around: if you want your object to collide with things in a physics simulation, you add a Collider. If you want it to move on a flow field, you add FlowFieldMove. The sum of an object’s components dictates its behavior. This may use many different types of component setups, but the two most common are GameObject/Component (GO/C) and Entity Component System (ECS). Both Unreal and Unity use the first, but in very different ways. Both also provide ways to use the second, but in ways that are mostly incompatible with the first. Conceptually, component-based object authoring is great. In practice, it tends to be a deep rabbit hole of exceptions and flawed component combinations that have grown organically through an engine’s lifetime. Most game engines today are data-driven at some level. You plug data in, it gets compiled into an engine-friendly format, and voila: the engine knows what to do. The data is picked up by a renderer, physics engine, or something else, and things simply happen just the way they are supposed to because the data is clear enough to just chug along. Like feeding coal into a steam engine. With a data-driven approach, you will usually be collecting all that data and bundling it up using authoring tools. Bring in the mesh asset, animate it using animation assets, play some sound assets on cue, etc. The data itself will drive the process.
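A minimal sketch of data driving the process (all names and record shapes below are my own invention): the systems only know how to consume each kind of record, and the data decides what actually happens.

```lua
-- Plain data records; the engine consumes whatever it is fed.
local level_data = {
  { kind = "mesh",  asset = "crate.obj", x = 4, y = 2 },
  { kind = "sound", asset = "creak.wav", trigger = "on_touch" },
}

-- One handler per kind of record.
local handlers = {
  mesh  = function(r) return ("spawn %s at (%d,%d)"):format(r.asset, r.x, r.y) end,
  sound = function(r) return ("cue %s when %s"):format(r.asset, r.trigger) end,
}

for _, record in ipairs(level_data) do
  print(handlers[record.kind](record))
end
```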
For example, in a “target-based” setup, one piece of data activates another, which activates a third, and so on, until the game level or other logic has run its course. You need ways to define how something goes from Alive to Dead, or when something should be Idle instead of Moving. This layer of authoring and iteration is very rarely straightforward, and parts of it are almost always deep down in the code for your game. This is bad. So let’s discuss how to make it not bad, and how to open up your game for more direct rules authoring through state transitions. If my use of the word “state” in this post gets confusing, you can look into the state-space prototyping post to see what I mean. This is not standard jargon used by all game developers, but it is a key part of my own framework. A good state authoring tool allows you to list which states an object can be in, where it can collect changes from, and how it behaves in relation to other objects and their state. Just to be clear: this doesn’t have to be complex at all. It can be enough to list the actions an entity can use and then leave it to other systems to actually select actions. Take a look at the An Object-Rich World post if you are curious about other models for working with permissions and restrictions. The most important element of permissions and restrictions is predictability. There are many cases where our games become interconnected in ways that are not immediately visible. For example, when you say that a character’s ability to Move has been restricted due to a state, you may have to manually add this to multiple places. Perhaps the sound, animation, and head-bobbing systems also need to be paused separately. This is extremely bad, because it means both that you will get unpredictable results and that you will often have to revisit the same changes. A specific state is only relevant for a particular object. A generic state can be used by any object sharing the same characteristics.
Think of the idea of spotting something, for example. A sensor picking up that an object can be seen. If a player is going to spot something, this needs to be specific, since the player’s avatar, unlike an NPC avatar, will generally have a camera attached to it. So to check if the player spots something, we can use the camera’s viewport to determine if the thing is on-screen or not. A generic version of the same thing could instead use the avatar’s forward vector, an arbitrary angle, and perhaps a linecast, to determine if the object can be seen. This could be used by any avatar, player or otherwise, and would probably be accurate enough if your game doesn’t need more granularity. An exclusive state is the only state that can run at a given time, whereas an inclusive state also allows other states to run alongside it. Parallel states are made to run at the same time as each other and may therefore not poke at the same data, or you could get unpredictable results. A state is conditional if it only activates based on preset conditions. It’s your if-then-else setup. Conditionals will often need considerable tweaking, and if you’re not careful in how you build such systems, they can turn into a tangled mess, just like nested ifs. Common ways to handle conditional states are predicate functions, tags, flags, and many of the other things brought up in the A State-Rich Simulation post. Preferably, setting or changing conditionals should be just a click or two, and it should respect the type of data separation mentioned earlier. When a game has multiple dynamic sources for conditions, it quickly gets complicated. For this reason, your tools should provide debug settings for visualising where conditions are coming from, and you can also log everything that gets triggered by certain conditions during a session. A state is injected when it’s pushed into an object.
This can follow any number of systemic effects, from straight-up addition to slightly more granular propagation. Common points in a game simulation for state to get injected are collision events, spawning or destruction, proximity, spotting, and various forms of scripted messaging. This means that having a solid system for defining such injections is a great starting point for how transitions will work in your game. If you have the concept of a Room, for example, this Room may keep track of what’s inside of it and then propagate that knowledge to anyone visiting the room. Objects would then inject their presence into the room, while the room would inject relevant state into the objects in turn. An explicit conditional state is something like the Idle state pushing a Move state onto an internal stack because the move vector’s magnitude is higher than zero. These are the only circumstances where Move will ever happen, making it an explicit transition. A dynamic state would be something like a gunshot killing you by injecting the Dead state. This is a dynamic transition because it can happen at any time, and beyond any restrictions on the injection itself (ammo, aiming, etc.), you won’t be defining anything in advance, and you’re not really waiting for it to happen. It happens when it happens, or it may not happen at all. A state is timed if it remains active for a limited time. It can also loop over a given duration and either bounce back (i.e., from 0 to 1 back to 0) or reset and repeat. The current value of the timed state is often referred to simply as T and should be a normalized (0-1) floating point number. This type of state is extremely handy, and you will want to tweak how the T value output gets handled in as many varied ways as possible. You want to be able to use curves, easing functions, and every conceivable kind of interpolation.
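A timed state with a shaped T can be sketched in a few lines. The smoothstep easing here is just one illustrative choice; in practice you would plug in whatever curve or interpolation the situation needs:

```lua
-- Sketch of a timed state: elapsed time is normalized to T in 0..1,
-- then shaped by an easing function before anything consumes it.
local function smoothstep(t)
  return t * t * (3 - 2 * t)
end

local function timed_state(duration, ease)
  local elapsed = 0
  return function(dt)  -- tick; returns the shaped T
    elapsed = math.min(elapsed + dt, duration)
    return ease(elapsed / duration)
  end
end

local fade = timed_state(2.0, smoothstep)
print(fade(0.5))  -- 0.15625: early in the fade, eased below linear
print(fade(0.5))  -- 0.5: the midpoint (smoothstep is symmetric)
```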
Timed state can be used to achieve anything from a Thief-style AI sense of “smell,” to a menu blend, to an animation system, to reward pizzazz. It’s the perfect type of state for an interstitial and is where you will be able to do much of your polish. A state is interstitial when it’s added between other states without affecting them beyond the delay this may cause. Screen fades, stop frames, and sound triggers are some examples of this. Objects and states define the game at its highest level. But you will also want to change the rat catcher’s catching range from 2.3 to 2.5, and maybe add an additional key to a curve to make a fade-in smoother. It’s been mentioned before, but it may be worth repeating: you should separate data from objects from the very beginning of your project. Every second you can avoid having to navigate the jungle of files in your project is a second gained towards additional iteration. Remember: remove clicks and remove tools. Many games will expect either a database approach (“spreadsheet specific,” in Michael Sellers’ terms), or they will have a hard connection between an object and its data. But a good data authoring tool is either integrated with the game engine or is an established external tool, such as a spreadsheet or database, that has a single-click or dynamic export/import process into the game. Many games to this day keep data hard-coded into their compiled executables. This can be done for security or obfuscation reasons, out of habit, or because the engine used for a certain game is structured that way. For a small game with simple data, this is rarely an issue. You can make your changes, recompile, and then test, all within seconds. But for bigger or more complex projects, it can have a cascading effect on iteration complexity. It also forces you to rely on programmers even for changes that have nothing to do with game logic or code. If you can avoid this, do so.
It doesn’t matter if a compile takes five minutes; it’ll be stealing those five minutes over and over again, and it will decrease the number of iterations you can make. Issues with compiled data are not new. One common way to avoid some of them is to use lightweight text files that can be loaded and interpreted at runtime. This can be done in one of two ways. You can construct data this way. The below is a small example of this, where Lua was used to package information about different sectors in a space game. In this case, a sector has details about which other sectors the player can travel to, which pilots are present in the sector, and which stations and colonies can be visited. This is information that could’ve been hardcoded into the client, but this way it’s made available at runtime and much easier to iterate on. You can build logic this way. The next example is also Lua, but is a narrative sequence from the same space game. By exposing gameplay features to Lua, it becomes possible to script sequences that can be loaded and parsed by the engine on demand. One benefit of this is that you can rewrite the script, make the engine reload the data, and then test within moments of making the change. If there’s such a thing as a standard today, it’s to store your data in a database. This database may live on a proprietary server owned by the developer or publisher, or it can utilise something in the cloud, like Microsoft Azure or Amazon Web Services (AWS). It can also be an offline database that you store with your game client, much like a script. A database forces you to decouple data from objects and allows live editing of data (if in the cloud). Most modern live service games do this for at least some of their data, as it makes it a lot easier to respond to community feedback and fix data-related issues. Planning how you structure your data before a project begins can save you many headaches.
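The original Lua snippets referenced above aren’t reproduced in this text. As a purely illustrative stand-in (every name below is invented), a sector file following that description might look like:

```lua
-- Hypothetical sector definition: a plain Lua table the engine can
-- load, and reload, at runtime without a client rebuild.
return {
  name      = "Outer Reach",
  travel_to = { "Haven Prime", "Dust Belt" },  -- reachable sectors
  pilots    = { "Vex", "Marlowe" },            -- present in this sector
  stations  = { "Relay 7" },
  colonies  = { "New Tharsis" },
}
```

Because the file is just a returned table, the engine can `dofile` it again after an edit and see the new values within moments.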
If you want to do MoveSpeed, you could have a MoveSpeed baseline multiplier at 1.0, each object could have a MoveSpeed attribute of maybe 10-20, and gear or other props could then add their own MoveSpeed modifiers on top as additions, multipliers, cumulative multipliers, or some other thing. You’d get something like MoveSpeed = Baseline * (Attribute + Modifier(s)). If you manage to separate these from their objects, you can mix things up for any reason you want without ever touching or even looking for the objects again. The amount of time this saves for more iteration can’t be overstated. (Again: remove clicks, remove tools.) Maybe you want to modify Baseline based on difficulty, so that MoveSpeed is 1.5x on Easy but only 0.75x on Hard. Or go in there and double the MoveSpeed attribute for all enemies that have the Small trait. With this type of separation, all of those things can suddenly be done in seconds. This makes everything from bulk operations to conditional exceptions a lot easier to make, and therefore to iterate on. A change set is a collection of changes made to your existing data. You can look at it as a changelist or commit in version control. Bundling variables into change sets is a handy way to keep track of what you are doing and makes it easier to compare one change to another. Change sets really come into their own if you can combine them, turn them on and off, and provide more than one at a time. Over time, these sets can become like a log of your earlier tweaks, creating a kind of tweak history for your game’s design. To know how any iteration works out, you need to play it. But it’s not enough to merely play as you usually do. You need to compare changes and report when something doesn’t work out. Even as a solo developer, a solid reporting tool can be the difference between fixing problems and shipping with them. This is where your change sets from before will work their magic.
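The MoveSpeed = Baseline * (Attribute + Modifier(s)) layering above can be sketched as follows (the concrete numbers are only illustrative):

```lua
-- Compute a final stat from separated data layers; nothing here ever
-- needs to touch, or even find, the object that owns the attribute.
local function move_speed(baseline, attribute, modifiers)
  local total = attribute
  for _, m in ipairs(modifiers or {}) do
    total = total + m  -- additive modifiers from gear, props, etc.
  end
  return baseline * total
end

print(move_speed(1.0, 12, {}))      -- plain enemy: 12
print(move_speed(1.5, 12, {}))      -- Easy-difficulty baseline: 18
print(move_speed(1.0, 12, { 12 }))  -- "Small" trait attribute doubled: 24
```

Tweaking difficulty is now one number (the baseline), and a bulk change like doubling every Small enemy’s speed is a modifier, not an object-by-object edit.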
Let’s say you made a “goblin damage debuff” change set where you decreased how much damage the goblin dealt by half, and you now go into your change set tool to activate that change set. Or you tell external playtesters to play once with and once without the change set. You can suddenly talk about balancing the same way you’d talk about feature implementations. I encountered Semantic Versioning during my first mobile game studio experience, at Stardoll Mobile Games. I’ve stuck to it ever since. The summary for Semantic Versioning is so simple, yet so powerful: “Given a version number MAJOR.MINOR.PATCH, increment the MAJOR version when you make incompatible API changes, the MINOR version when you add functionality in a backward compatible manner, and the PATCH version when you make backward compatible bug fixes.” This is a convenient way to plan your releases. The Patch version can be automatically incremented whenever you build your game to identify each change, and you can regulate when the Minor and Major versions must be incremented. For example, you can plan to only release a new Major when you are releasing new content, and a Minor when features are added or changed. At Calm Island, we used to maintain one Dev and one Stable branch. The latter meant we could always show the game to any external stakeholders, even if it may have been an older build. The stable version was also the one deployed to stores after final validation. The idea of always keeping your game playable may sound self-explanatory, but good processes for this are uncommon. Many studios still use a single main branch for everything, and when a deadline looms, the only way to safeguard its health is to enact some kind of commit/submit stop where no one is allowed to push anything that risks the playability of the build. This often results in a rush of new code and content right after the stop is lifted, which almost always breaks something and may take days or weeks to resolve. A common issue with playtesting is that you need to jump through hoops before you can test the thing you’re actually working on.
This can be because you need to launch the game, go through the splash screen, load the right level, noclip or teleport to the right place, etc., before you actually play. If your game is unstable (see Always Playable above), this can be further exacerbated by crashes or bugs that are not yours to fix. To avoid this, it’s important to be able to do targeted testing: using isolated environments, such as a “gym” level for movement testing, and testing exactly the thing you just tweaked or implemented without any distractions. You need to be able to mix and match both systems and change sets in your game to iterate as much as possible. Play without the enemy AI running, with no props spawning, or with that goblin damage debuff or double move speed turned on or off. You can look at this like the layers in Photoshop, where you can turn things on or off so they don’t impact your testing when you need to test something specific. Once you have a modular setup, make sure that you can switch quickly and easily between different modules as well. Make them interchangeable. If you need to test playing against only a single goblin, but that goblin can’t move, and you have only torches and stale bread, then it should involve as few clicks and tools as possible to do so. Once the data is separated, you can take it one step further: you can remove entire segments of your game and isolate iteration and testing to retention loops or other long-term systems. Think of a standard game loop. You have some inputs into each session, such as matchmaking settings or difficulty selection. This input affects how the session plays. Once the session completes, you get outputs, such as XP or treasure, that you can then reinvest into progression. This is the template for many standard game loops. Simulated state allows you to pretend that one of these steps happened without actually having to take the time to play them.
You can randomise the inputs and then play, or skip the session entirely to work only on the output and investment cycle. Once you reach the modular and interchangeable iteration dream, this is quite possible. The value of this type of testing is high, since long-term systems often don’t get the testing they need simply because you must finish a real session of gameplay to get the “proper” outputs. Being able to compare different iterations to each other, and to choose which comparisons to make, is more of a meta tool than a testing tool; it’s more about comparing the results you gain from testing than the testing itself. Look at the Game Balancing Guide for some inspiration on what kinds of things you could potentially compare. If you find something that’s not great or that you want to revisit, make it easy to take notes or report to a central system; you may even go so far as to generate planning tickets from an in-engine event. Have your testers press some easy-to-access key combination (on controllers, maybe holding both triggers and both stick buttons down for one second). Sometimes in a big team, the more technical tasks involved with the build and distribution process are invisible to you. You may hear about porting or signing or compliance, but you never have to deal with any of it. You happily playtest on whatever is easy and available, usually your development computer; sometimes even inside of your development environment. The reason this happens is that your updating process is not built with iteration in mind. Builds take too long, frequently don’t work, and distributing to local devices is a hassle. Many teams “forget,” or rather downprioritise, testing on their proper target devices. One of the stranger things I’ve run into is developers who not only dislike testing on their current target platform but basically refuse to. It’s so much easier to stay in your comfortable development environment indefinitely.
Some studios may even resent some of their own target platforms, for example mobile platforms or consoles, allowing personal opinion to affect their professionalism. But there’s really no excuse: you should always test on your target devices. Something that’s easy to overlook is keeping visible and easily copy/pasteable version information on-screen in your game. This is good for a product after launch too, so that players can provide you with more detailed information if they experience bugs or crashes. One of the first things I did in gamedev was to drive cars along a race track’s edges to make sure that the collisions worked like they should; a kind of testing that you can automate relatively easily. In test-driven development, testing and automation are already part of the thinking, and there’s really no need for game development to be different. Automate the right things, however. An automated test can’t tell you about quality. It can’t suggest design changes or warn that a player may not understand the phrasing of a dialogue line. Automate regression testing, compliance testing, integration testing, and the driving along the tracks to test collision. But don’t automate quality testing. Building for all of your platforms without having to do so manually is an essential element of game development. No amount of testing in a development environment compares to testing real builds. Automated builds are often triggered by new commits or version increments. It’s also common to have nightly builds, hourly builds, and build cadences based on testing needs and build duration. What’s important for such a pipeline is that it can clearly say what’s going wrong by posting logs and details to the relevant people; a Slack channel, for example. What you absolutely don’t want is to put developers on full-time duty to get builds out. Once you have a build, you need to get that build onto the right device for testing.
Most devkits and software platforms allow remote connection. You can usually set up jobs to trigger automatically when a build completes and publish your game to your testing platform (or even live) without requiring any work at all. Hopefully, this post provides some food for thought on iteration and what it really means. If not, tell me every way I’m wrong in an e-mail to [email protected] or in a comment. Here’s the list:

Remove obstacles. Make the process of iteration as fast as possible by removing gatekeepers and bottlenecks. Maybe you shouldn’t go through the full approval process for a quality-of-life improvement, maybe your playtesters should get three separate sets of things to test instead of just one, and maybe a developer can prioritise their own tasks rather than sitting in hours-long meetings or being micromanaged.

Remove clicks. I once heard the suggestion that you lose 50% of viewers with every required interaction on a website. More clicks will invite more pain points and more potential human errors, and will also lead to fewer iterations. Just imagine (or remember) not having box selection in a node tool vs having it.

Remove tools. The more tools you require, the more special skills, licenses, installation time, and so on you need. Everything in your pipeline that can be either bundled into something else or removed entirely via devops automation should be considered. Not least of all because tools development is itself a deep rabbit hole.

Authoring objects and data.
Transitioning objects between states.
Tweaking and rebalancing data.
Testing and comparing iterations.
Updating the game for testing and distribution.

For object-oriented authoring: clearly visualise what an object can (and can’t) do based on its inheritance; don’t hide logic deep in a dropdown hierarchy.
For component-based authoring: make non-destructive tools with opt-in as the default rather than opt-out. Provide good error messaging for when requirements are not met.
For data-driven authoring: provide clear debug information and visual representations for where data is coming from, when, and what it allows. Make it clear what data is expected where, so no steps are missed.
Make it easy to list states and transitions per object.
Provide state transition information with data reporting, so that you can keep track of all the whens and whys.
Make states have meaning; if a state says that an object cannot move, this should be definitive.
Differentiate between Specific and Generic states, so that you will never accidentally add state to an object that won’t work.
Set clear guardrails between Exclusive, Inclusive, and Parallel states.
Plan what you need each state to be able to do and where to get its data.
Visualise which conditions apply at a given moment and why. Show when conditions are unavailable and why. Log transition changes and which conditions made them change.
Show when, how, and from where a state injection occurs.
Make it clear which explicit states are running at any given time.
When dynamic state is triggered, make all of its relevant overrides predictable and singular: it should always be enough to turn something on or off once.
Provide visualisations of start and end positions for timed states. Allow developers to scroll timed states manually to preview them.
Allow states to resume after interruption, so that you can use interstitials in a non-destructive way.
Separate your data into logical containers, such as Baseline, Attribute, and Modifier.
Bundle collections of changes into change sets. E.g., “double move speed.”
Identify change sets modularly, so you can test more than one thing at a time.
Maintain clear versioning, even if just for yourself (e.g., Semantic Versioning: increment MAJOR for incompatible changes, MINOR for backward compatible additions, PATCH for backward compatible bug fixes).
Make sure that you can always play a recent version of your game.
Provide shortcuts and settings that let you avoid time sinks.
Make it easy to choose what to test. Make it clear what is being tested.
Make your systems modular. Make modules easy to toggle.
Allow testers to easily switch out and modify what they are testing: anything with the same output should be able to tie into the correct input.
Make it possible to simulate systems without running them.
Show the data; show comparisons.
Make it easy to file bug reports and provide feedback without leaving your game. Integrate screenshot tools and video recording.
Always test on target devices: no amount of emulation will ever compensate for real qualitative testing. Have as many diverse target devices available as financially and physically possible. Test your lowest spec targets.
Make version numbers visible in all game builds, including release.
Automate functionality testing, but not quality testing.
Build the game automatically and get new builds continuously without requiring manual intervention.
Remove all obstacles for build distribution: make it a single click (or less) to get a functional build to play on the right device.

Giles's blog 5 months ago

Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch

I recently posted about Andrej Karpathy's classic 2015 essay, " The Unreasonable Effectiveness of Recurrent Neural Networks ". In that post, I went through what the essay said, and gave a few hints on how the RNNs he was working with at the time differ from the Transformers-based LLMs I've been learning about. This post is a bit more hands-on. To understand how these RNNs really work, it's best to write some actual code, so I've implemented a version of Karpathy's original code using PyTorch's built-in class -- here's the repo . I've tried to stay as close as possible to the original, but I believe it's reasonably PyTorch-native in style too. (Which is maybe not all that surprising, given that he wrote it using Torch, the Lua-based predecessor to PyTorch.) In this post, I'll walk through how it works, as of commit . In follow-up posts, I'll dig in further, actually implementing my own RNNs rather than relying on PyTorch's. If you already have a basic understanding of what RNNs are and roughly how they work, you should be fine with this post. However, if you're coming directly from normal "vanilla" neural nets, or even Transformers-based LLMs (like the one I'm working through in my LLM from scratch series), then it's definitely worth reading through the last post , where I give a crash course in the important stuff. So with that said, let's get into the weirdest bit from a "normal" LLM perspective: the dataset. Every now and then on X/Twitter you'll see wry comments from practitioners along the lines of "AI is 5% writing cool models and 95% wrangling data". My limited experience bears this out, and for RNNs it's particularly weird, because the format of the data that you feed in is very different to what you might be used to for LLMs. With a transformers-based LLM, you have a fixed context length -- for the GPT-2 style ones I've posted about in the past, for example, you have a fixed set of position embeddings. 
More recent position encoding mechanisms exist that aren't quite so constraining, but even then, for a given training run you're going to be thinking in terms of a specific context length -- let's call it n -- that you want to train for. So: you split up your training data into independent chunks, each one n long. Then you designate some subset of those your validation set (and perhaps another bunch your test set), and train on them -- probably in a completely random order. You'll be training with batches of course; each batch would likely be a completely random set of chunks. To get to the core of how different RNNs are, it helps to start with an idealised model of how you might train one. Remember, an RNN receives an input, uses that to modify its internal hidden state , and then emits an output based on the updated hidden state. Then you feed in the next input, update the hidden state again, get the next output, and so on. Let's imagine that you wanted to train an RNN on the complete works of Shakespeare. A super-simple -- if impractical -- way to do that would be to feed it in, character by character. Each time you'd work out your cross-entropy loss . Once you'd run it all through, you'd use those accumulated per-character losses to work out an overall loss (probably just by averaging them). You would run a backward pass using that loss, and use that to adjust the parameters. If you're feeling all at sea with that backpropagation over multiple steps of a single neural network with hidden state, check out the " Training RNNs " section of the last post. You can see that in this model, we don't have any kind of chunked data. The whole thing is just run through as a single sequence. But there are three problems: Let's address those -- firstly, those vanishing or exploding gradients. In the last post I touched on truncated backpropagation through time (TBPTT). 
The idea is that instead of backpropagating through every step we took while going through our batched input sequences, we run a number of them through, then backpropagate, and then continue. Importantly, we keep the hidden state going through the whole sequence -- but we detach it from the compute graph after each of these steps, which essentially means that we start accumulating gradients afresh, as if it was a new sequence, but because it started from a non-zero initial hidden state, we're still getting some training value from the stuff we've already been through. 2 Imagine we have this simple sequence: Let's say we're doing TBPTT of length 3: we can split up our training set so that it looks like this: So now, we just feed in "a", then "b", then "c", then do our TBPTT -- we calculate loss just over those items, update our gradients, and then detach the hidden state, but keep its raw, un-gradient-ed value. Then we start with that stored hidden state, and feed in "d", "e", "f". Rinse and repeat. In practice we'd probably throw away that short sequence at the end (because it would cause issues with gradient updates -- more here ), so we'd just get this: Now, let's look into batching. It's a bit harder, but with a bit of thought it's clear enough. Let's say that you want b items in your batch. You can just split your data into b separate sequences, and then "stack them up", like this with b = 2 : So for training, we'd feed our vector in as a batch, calculate loss on both of them, then , and so on. The important thing is that each batch position -- each row, in that example -- is a consistent, continuous, meaningful sequence in and of itself. Finally, for validation, you also need some real sequences. For that, you can just split up the batched subsequences, with a "vertical" slice.
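All of that data-wrangling is easy to sketch in plain Python. To be clear, this is my own toy version, not code from the repo, and the names `make_tbptt_rows` and `chunk_rows` are invented:

```python
# Sketch (mine, not the post's code): chunk a sequence for TBPTT-style
# training while keeping each batch row a contiguous subsequence.
def make_tbptt_rows(data, batch_size):
    """Split data into batch_size contiguous rows of equal length."""
    row_len = len(data) // batch_size  # any ragged tail is dropped
    return [data[i * row_len:(i + 1) * row_len] for i in range(batch_size)]

def chunk_rows(rows, seq_len):
    """Cut each row into seq_len pieces; each batch is one vertical slice."""
    n_chunks = len(rows[0]) // seq_len  # again, short remainders are dropped
    return [
        [row[c * seq_len:(c + 1) * seq_len] for row in rows]
        for c in range(n_chunks)
    ]

rows = make_tbptt_rows("abcdefghijkl", batch_size=2)  # ["abcdef", "ghijkl"]
batches = chunk_rows(rows, seq_len=3)  # [["abc", "ghi"], ["def", "jkl"]]
train, val = batches[:-1], batches[-1:]  # the "vertical" slice for validation
```

Each element of `batches` is one training step's worth of input, and reading down a single row across successive batches gives back a real contiguous slice of the original text.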
Let's take the rather extreme view that you want 50% of your data for validation (in reality it would be more like 10-20%, but using 50% here makes it clearer): Your training set would wind up being this: ...and the validation set this: And we're done! So that's what we wind up feeding in. And it kind of looks a bit like what we might wind up feeding in to a regular LLM training loop! It's a set of fixed-length chunks. But there's one critically important difference -- they're not in an arbitrary order, and we can't randomise anything. The sequence of inputs in, for example, batch position one, needs to be a real sequence from our original data. This has been a lot of theoretical stuff for a post that is meant to be getting down and dirty with the code. But I think it's important to get it clear before moving on to the code because when you see it, it looks pretty much like normal dataset-wrangling -- so you need to know why it's really not. Let's get into the code now. In the file , we define our dataset: The that we pass in will be our complete training corpus -- eg. the complete works of Shakespeare -- and is the limit we're going to apply to our truncated backpropagation through time -- that is, three in the example above. Karpathy's blog post mentions using 100, though he says that limiting it to 50 doesn't have any major impact. Next, we make sure that we have at least enough data to do one of those TBPTTs, plus one extra byte at the end (remember, we need our targets for the predictions -- the Ys are the Xs shifted left with an extra byte at the end). ...and we stash away the data, trimmed so that we have an exact number of these sequences, plus one extra byte for our shifted-left targets. Now we create a tokeniser. 3 This is related to something I mentioned in the last post. Karpathy's post talks about character-based RNNs, but the code works with bytes. The RNNs receive as their input a one-hot vector. 
Now, if we just used the bytes naively, that would mean we'd need 256 inputs (and accept 256 outputs) to handle that representation. That's quite a lot of inputs, and the network would have to learn quite a lot about them -- which would be wasteful, because real human-language text, at least in European languages, will rarely use most of them. His solution is to convert each byte into an ID; there are exactly as many possible IDs as there are different bytes in the training corpus, and they're assigned an ID based on their position in their natural sort order -- that is, if our corpus was just the bytes , and , then we'd have this mapping 4 : We just run the full dataset through to get the set of unique bytes, then sort it -- that gives us a Python list in the right order so that we can just do lookups into it to map from an ID to the actual byte. The class is defined in and is too simple to be worth digging into; it just defines quick and easy ways to get the vocab size (the number of IDs we have), and to encode sequences of bytes into PyTorch tensors of byte IDs and to decode them in the other direction. Because these byte IDs are so similar to the token IDs that we use in LLMs, I've adopted the name "tokens" for them just because it's familiar (I don't know if this is standard). So, at this point, we have our data and our tokenizer; we finish up by stashing away an encoded version of the data ready to go: Next we define a method to say how long our dataset is -- this is calculated in terms of how many TBPTT sequences it has: -- and a method: This works out the start and the end of the th subsequence of length in the data. It then returns four things: The code as it stands doesn't actually use the last two, the raw bytes -- but they did prove useful when debugging, and I've left them in just in case they're useful in the future. 
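To recap that tokenisation idea as standalone code -- this is a minimal sketch of my own, not the actual class from the repo:

```python
class ByteTokenizer:
    """Map each byte that occurs in the corpus to a dense ID, in sorted order."""
    def __init__(self, corpus: bytes):
        # iterating over bytes yields ints, so this is a sorted list of byte values
        self.id_to_byte = sorted(set(corpus))
        self.byte_to_id = {b: i for i, b in enumerate(self.id_to_byte)}

    @property
    def vocab_size(self):
        return len(self.id_to_byte)

    def encode(self, data: bytes):
        return [self.byte_to_id[b] for b in data]

    def decode(self, ids):
        return bytes(self.id_to_byte[i] for i in ids)

tok = ByteTokenizer(b"abracadabra")
# unique bytes in sorted order are a, b, c, d, r -> IDs 0..4, vocab size 5
```

The point is that the one-hot input vectors only need to be as wide as the number of distinct bytes actually present in the corpus, rather than a full 256.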
If you look back at the more theoretical examples above, what this Dataset is doing is essentially the first bit: the splitting into BPTT-length subsequences and dropping any short ones from the end -- the bit where we go from The only extra thing is that it also works out our target sequences, which will be a transformation like this: So that's our . Next we have a simple function to read in data; like the original code I just assume that input data is in some file called in a directory somewhere: Now we have the next step, the function : This looks a little more complicated than it actually is, because it's building up a list of tuples, each one of which is a set of , , and . If we imagine that it only did the , it would look like this: So, what it's doing is working out how many batches of size there are in the sequence. With our toy sequence ...and a batch size of two, there are . In this case, it would then loop from zero to 3 inclusive. Inside that loop it would create a list, then loop from zero to 1 inclusive. The first time round that loop it would get the item at , which is 0 + 0 * 4 = 0, so the subsequence . It would add that to the list. Then it would go round the inner loop again, and get the item at the new . is now 1, so that would be 0 + 1 * 4 = 4, so it would get the subsequence at index 4, which is , and add that to the list. We'd now have finished our first run through the inner loop, and we'd have the list [ , ], so we stack them up into a 2-D tensor: Hopefully it's now fairly clear that in our next pass around the outer loop, we'll pull out the items at index 1 and index 5 to get our next batch, and , and so on, so that at the end we have done the full calculation to get this: ...as a list of 2 × 3 PyTorch tensors. And equally hopefully, it's clear that the code in is just doing that, but not only for the but also for the , and . 
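Here's that index arithmetic as a self-contained sketch -- my reconstruction of the logic described above, not the repo's exact code:

```python
# With B rows per batch, row j of batch i is chunk i + j * num_batches,
# so each row reads as one continuous sequence across successive batches.
def assemble_batches(chunks, batch_size):
    num_batches = len(chunks) // batch_size  # leftover chunks get dropped
    batches = []
    for i in range(num_batches):
        batches.append([chunks[i + j * num_batches] for j in range(batch_size)])
    return batches

chunks = ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx"]
batches = assemble_batches(chunks, batch_size=2)
# batch 0 pairs chunk 0 with chunk 4, batch 1 pairs chunk 1 with chunk 5, etc.
```

In the real code each of those rows would then be stacked into a 2-D tensor, and the same indexing is applied to the targets as well as the inputs.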
One thing to note before moving on is what happens if the number of items doesn't divide evenly into batches -- this code: ...means that we'll drop them. So, for example, if we wanted a batch size of three with our toy sequence ...then we'd get this: ...and the and would be dropped. And that's it for the dataset code! You might be wondering where the split to get the validation set comes -- that's actually later on, in the training code that actually uses this stuff. So let's move on to that! This is, logically enough, in the file train_rnn.py . There's quite a lot of code in there, but much of it is stuff I put in for quality-of-life (QoL) while using this. It's useful -- but I'll skip it for now and come back to it later. Initially, I want to focus on the core. We'll start with the function at the bottom. It starts like this: The -related stuff is QoL, so we'll come back to it later. All we need to know right now is that it's a way of getting information into the system about where its input data is, plus some other stuff -- in particular our TBPTT sequence length and our . So it uses that to read in some training data, then initialises one of our s with it and the , then uses to split it into batches. Next we have this: So our gives us a validation data percentage; we do some sanity checks and then just slice off an appropriate amount from the end of the we got to split the data into train and validation sets. That's the equivalent of the transform from the example earlier from To this training set: ...and this validation set: Now, we create our model: We're using a new class, which is an extension of the PyTorch built-in class -- we'll come back to that later. It's also getting parameters (things like the size of the hidden state and the number of layers) from the . Finally, we do the training in a function: So let's look at that now. 
It starts like this: That's fairly standard boilerplate to use CUDA if we have it, and to put the model onto whatever device we wind up using. Next: The class name for the optimiser is another one of those things from the , as are the learning rate and weight decay hyperparameters. So we just create an instance of it, and give it the model's parameters to work with along with those. Next, we get our patience: This is a QoL thing, but I think it's worth going into what it actually means. When we're training, we normally train for a fixed number of epochs. However, sometimes we might find that our model was overfitting -- say, at epoch 50 out of 100 we might see that the training loss was still decreasing, but our validation loss started rising. Any further training past that point might be pointless -- if we're doing things properly, we're saving checkpoints of the model periodically, so we'd be able to resurrect the model that we had at the point where validation loss was lowest, but we're still wasting time continuing training. A common solution to that is to have early stopping in the training loop. If the validation loss starts rising then we bail out early, and don't do the full number of epochs that we originally planned to do. Naively, we might keep track of the validation loss from the last epoch, and then if the current epoch has a higher loss, then we bail out. However, sometimes you find that validation loss rises a bit, but then starts going down again -- it's kind of like a meta version of finding a local minimum in the loss function itself. The solution to that is to use patience -- a measure of how many epochs of rising validation loss you're willing to put up with before you do your early exit. That's the number we're getting from our here -- it's a positive number (note the paranoid ), and if it's not defined we just assume that we have infinite patience. 
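As a standalone illustration of how patience-based early stopping plays out, here's a toy sketch of mine, with made-up loss numbers rather than anything from a real run:

```python
# Minimal early-stopping-with-patience logic (names and numbers invented).
def train_with_patience(val_losses, patience):
    """Return the epoch we stop at, given per-epoch validation losses."""
    best_loss, best_epoch = None, None
    for epoch, loss in enumerate(val_losses):
        if best_loss is None or loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch > patience:
            return epoch  # ran out of patience: bail out early
    return len(val_losses) - 1  # got through every planned epoch

# Loss dips, wobbles up, recovers at epoch 3, then rises for good:
stop = train_with_patience([3.0, 2.5, 2.6, 2.4, 2.7, 2.8, 2.9], patience=2)
# stops at epoch 6, once three epochs in a row have been worse than epoch 3
```

Note that the brief rise at epoch 2 doesn't trigger a stop, which is exactly the "local minimum" situation patience is there to ride out.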
The next two lines are related to patience too -- before we go into our main training loop, we define the two variables we need to control early exit with patience: Pretty obviously, those are the best validation loss that we've seen so far, and the number of the epoch where we saw it. Right, finally we get to some training code! We have our epoch loop: We're using the rather nice module to get progress bars showing how far we are through the train (ignoring any early exits due to running out of patience, of course). We start the epoch by generating some random text from the model. This gives us a reasonably easy-to-understand indication of progress as we go. Next we put our model into training mode: ...set an initial empty hidden state: You might be wondering why the hidden state is getting a variable of its own, given that it's meant to be hidden -- it's right there in the name! Don't worry, we'll come to that. Next we initialise some variables we'll use to keep track of loss -- the total loss across all of the batches we've pushed through, plus the total number of tokens. The metric we track for each epoch is the loss per token, so we use those to work out an average. Now it's time to start the inner training loop over our batches: We're just unpacking those tuples that were created by into our and (I think I was being ultra-cautious about things here when I added to the start of ). And again we're using to have a sub-progress bar for this epoch. Next, we move our Xs and Ys to the device we have the model sitting on: And then run it through the model. The code to do this looks like this: ...and I think it's worth breaking down a bit. You can see that there's a branch at the top, if there's a hidden state then we need to pass it in and if there isn't, we don't. 
But let's focus on the no-hidden state option in the branch first, because there's something surprising there: Remember the description of an RNN from above: an RNN receives an input, uses that to modify its internal hidden state , and then emits an output based on the updated hidden state. Then you feed in the next input, update the hidden state again, get the next output, and so on. We can easily extend that to handle batches -- you'd give the RNN a batch of inputs (let's say a tensor b × 1 ), and get a batch of results, also b × 1 . You'd also need the RNN to hold b hidden states, but that's not a big jump. But what we're doing in that code is something different -- we're feeding in a whole series of inputs -- that is, is of size b × n , where n is our desired TBPTT sequence length. What's worse, in our description above, the hidden state was just that -- something hidden in the model. Now it's being returned by the RNN! What's going on? Let's start off with that hidden state. We often need to do stuff with the hidden state from outside the RNN -- indeed, we're detaching it as an important part of our TBPTT. So the PyTorch RNN actually does work rather like the simplified model that I described in my last post , and treats the hidden state like an output, like in this pseudocode: That is, the hidden state is an input and a return value, like this: OK, so the hidden state thing makes sense. How about the fact that we're feeding in a whole set of inputs? This is actually just due to a quality of life thing provided by PyTorch's various RNN classes. Wanting to feed in a sequence is, of course, a super-common thing to want to do with an RNN. So instead of having to do something like the pseudocode above, it's baked in. When you run ...then because is b × n , it just runs the RNN n times, accumulating the outputs, then returns the outputs as another b × n tensor, along with the final from the last run through that loop.
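Here's a toy illustration of that interface, with invented sizes -- this isn't code from the repo, just `nn.LSTM` used directly:

```python
import torch
from torch import nn

# Made-up sizes for illustration only.
vocab_size, hidden_size, b, n = 5, 8, 2, 3
lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size,
               num_layers=1, batch_first=True)

token_ids = torch.randint(0, vocab_size, (b, n))
x = nn.functional.one_hot(token_ids, num_classes=vocab_size).float()  # b × n × vocab

# No initial state given: PyTorch starts from zeros, runs all n steps,
# and hands the final hidden state back as an explicit (h, c) tuple.
out, (h, c) = lstm(x)

# For TBPTT we'd detach here -- keep the values, drop the compute graph --
# then pass the state back in for the next chunk of the sequence.
h, c = h.detach(), c.detach()
out2, (h2, c2) = lstm(x, (h, c))
```

`out` is b × n × hidden_size (one output per input step), while `h` and `c` are num_layers × b × hidden_size: the state after the last step.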
(There is a wrinkle there that we'll come to shortly.) With that explained, hopefully that branch is clear. We don't have a hidden state right now, so we run all of the inputs across all of our batch items through the RNN in one go, and we get the outputs plus the hidden state that the RNN had at the end of processing that batch of sequences. Now let's look at the other branch, where there is a pre-existing hidden state: Hopefully the last line is clear -- we're just doing the same as we did in the branch, but we're passing the hidden state in because in this case we actually have one. The first two lines are a bit more complex. As you know, we need to detach the hidden state from PyTorch's computation graph in order to truncate our backpropagation through time. We're doing that here at the start of the loop just to make sure that each batch that we're pushing through starts with a guaranteed-detached hidden state. So that explains those calls to the methods. The fact that our hidden state is a tuple of two things that we have to detach separately is a little deeper; for now, all we need to know is that the LSTM models that we're using are a variant of RNN that has two hidden states rather than one, and so we need to handle that. I'll go into that in more depth in a future post. Once we've done that, we've completed our forward pass for this batch. Let's move on to the backward pass. Next, we have this: Pretty standard stuff. is defined further up in the file: It's exactly the same as the function we used to calculate loss in the LLM-from-scratch posts: I wrote more about that here if you're interested in the details. Next, we do something new: This is something that is generally very useful in RNNs. They are prone to vanishing and exploding gradients, and this code is to help handle the exploding case.
What it says is, if we've defined a , we use it to clip gradients when they get too big, which means that training is going to be better because we're not going to have updates swinging wildly up and down. Let's say that we set to 1.0. If, at the time this code is run, the norm of the gradients -- which is a measurement of their size 5 -- is, say, 10, then they would all be scaled down to 10% of their size, making the new norm 1.0. So that keeps them in check, and stops any wild variations in gradient updates. So, in short -- it's a stabilisation technique to stop exploding gradients leading to issues with training. Next, we have our normal code to update the parameters based on these (potentially clipped) gradients: And finally, we update our count of how many inputs we've seen and our total loss so far in this epoch: That's our training loop! Once we've done that code -- run our input through the model, calculated loss, worked out our gradients, clipped them if necessary, done our update and stored away our housekeeping data -- we can move on to the next batch in our sequences. When we've gone through all of the batches that we have, our training for the epoch is complete. We print out our loss per-token: ...and then it's time for our validation loop. This is so similar to the training loop that I don't think it needs a detailed explanation: The only big difference (apart from the lack of a backward pass and parameter updates) is that we're not detaching the hidden state, which makes sense -- we're in a block with the model in mode, so there is no computation graph to detach them from. Validation done, it's time for a bit of housekeeping: All we're doing here is keeping track of whether this is the best epoch in terms of validation loss. The boolean is exactly what it says it is. If we're on our first run through the loop ( is None) then we record our current val loss as , and store this epoch's number into .
Otherwise, we do have an existing , and if our current val loss is lower than that one, we also stash away our current loss and epoch as the best ones. Otherwise we are clearly not in the best epoch so we update to reflect that. Once we've done that, we save a checkpoint: I'll go into the persistence stuff -- saving and loading checkpoints -- later on. Next, a QoL thing -- we generate a chart showing how training and validation loss have been going so far: Again, I'll go into that later. Finally, we do our early stopping if we need to: If the current epoch is more than epochs past the one that had the best validation loss so far, then we stop. That's the end of the outer loop over epochs for our training! If we manage to get through all of that, we print out some sample text: ...and we're done! That's our training loop. Now let's move on to the model itself. I called my model class a , and you can see the code here . It's actually not a great name, as it implies there's something specifically Andrej Karpathy-like about it as a way of doing LSTMs, while what I was trying to express is that it wraps a regular PyTorch LSTM with some extra stuff to make it work more like his original Lua Torch implementation . I tried to come up with a more descriptive name, but they all started feeling like the kinds of class names you get in "Enterprise" Java code like , so I gave up and named it after Karpathy. Hopefully he'll never find out, and won't mind if he does... 6 The Lua code does four things differently to PyTorch's built-in class: Let's look at the code now: You can see that it's doing 1 to 3 of those steps above -- the one-hot, the extra dropout, and the linear layer to project back to vocab space. The only other oddity there is this kwarg: That's the wrinkle I was talking about when we went through the training loop and discussed batches.
The PyTorch LSTM by default expects the batch dimension to be the second one of the input tensors -- that is, instead of passing in a b × n tensor, it wants an n × b one. That's not what I'm used to (nor is it what the original Lua code uses, if I'm reading it correctly), but luckily it can be overridden by the logically-named option. The only step we don't do in this class is the softmaxing of the logits to convert them to probabilities. That's because PyTorch's built-in wants logits rather than probabilities, so it was easier to just call softmax on the outputs where necessary. So that's our model. Let's take a look at the code that we can use to run it and generate some text. The code for this is in . Ignoring the boilerplate that parses the command-line options, we can start here: So, we're taking the directory and run name used by the QoL helpers that I'll be describing later, a specific checkpoint of a training run to use, the number of bytes that we want to generate, the temperature to use when sampling (more about temperature here ) and a "primer" text. That last one is because in order to get something out of our RNN, we need to feed something in. I tried using a single random byte from the vocab initially (that's still the default, as we'll see shortly), and that was OK, but the bytes aren't equally represented in the training data (eg. "z" is less common than "e", but weird bytes that only occur in occasional multibyte unicode characters are rarer still) -- and that means that we might be trying to get our RNN to start with something it hasn't seen very much, so we get bad results. Even worse, because some of the input text is unicode, there's no guarantee that a random byte is even valid on its own -- it might be something that only makes sense after some leader bytes. So I found that in general it's best to provide a fixed string to start with -- say, "ACT" for Shakespeare, or "He said" for "War and Peace".
So, with those command-line flags, we start off by using the QoL stuff to get the metadata we need about the model: ...then we use our persistence code to load up the desired checkpoint: At this point we have the version of the model that was saved for that checkpoint, and its associated tokeniser. We move this to an appropriate device -- CUDA if we have it, CPU otherwise: ...and then use a helper function to generate some text: Once we have that, we print it out, after decoding it as UTF-8: If a primer was provided, we print it first, but if the primer was a random byte we don't. Also, because the generated bytes might include invalid Unicode, we just replace those with "?" when we decode (that kwarg). Let's look at the helper next. So, after a little bit of paranoia about our desired sequence length, we make sure we're not tracking gradients and put the model into eval mode (to disable dropout). Next, we work out our primer bytes -- either by picking a random one, or by decoding the string that we were provided into its constituent UTF-8 bytes: The primer needs to be converted to the byte token IDs that our tokeniser uses: The is something you might remember from the LLM posts -- we need to run a batch through our RNN, and the is just a tensor of n bytes. adds on an extra dimension so that it's 1 × n , as we want. Next, we put the primer onto the same device as the model: As an aside, I think I might start using code like that more often, I often find myself passing variables around and TBH it seems much more natural to just ask the model what device it's using. Next, we run it through the model: Now we use a helper function to sample from those logits to get our first generated byte: Note that we are explicitly taking the last item from . It is a b × n × v tensor, where b is our batch size (always one in this script), n is the length of the primer that we fed in, and v is our vocab size. 
The just extracts the last item along the n dimension so that we have the b × v logits that came out of the RNN for the last character of the primer, which is what we want. We'll get to the function later, but it returns a b × 1 tensor, so now, we just extract the byte ID from it and put it into a new list: Next comes our autoregressive loop -- we've already generated one byte, so we loop times to get the rest, each time running the model on the last byte we got, sampling from the distribution implied by the logits, and adding it onto our list: Once that's done, we have our generated byte IDs in , so we just use the tokeniser to turn them back into bytes and return the result: Easy, right? Now let's look at . The function takes logits and the temperature: Firstly, we handle the case where temperature is zero. By convention this means greedy sampling -- we just always return the highest-probability next token, so we can use for that: If the temperature is non-zero, we divide the logits by it and run softmax over the result: ...and then we just sample from the probability distribution that we get from that: And that's it! The only things to explain now are the quality of life stuff, and the persistence functions that handle saving and loading checkpoints. Let's look at our QoL things first. When I started building this code I knew I wanted to run RNNs on multiple input texts -- Shakespeare, "War and Peace", etc. I also realised that for each of those input texts, I'd want to try different model sizes. The underlying concept I came up with was to have "experiments", which would each have a particular training text. Each experiment would have multiple "runs", which would have particular training hyperparameters -- the model size, number of epochs, and so on. I decided to represent that with a directory structure, which you can see here . 
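Before we look at the directory structure in detail, the sampling logic just described is simple enough to sketch in plain Python -- this is my own version of it, not the repo's torch implementation:

```python
import math, random

# Temperature sampling over a list of logits: temperature 0 means greedy
# argmax; otherwise divide the logits by the temperature, softmax, and
# sample from the resulting distribution.
def sample_from_logits(logits, temperature):
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

greedy = sample_from_logits([2.0, 0.5, -1.0], temperature=0)  # always index 0
token = sample_from_logits([2.0, 0.5, -1.0], temperature=0.8)
```

Lower temperatures sharpen the distribution towards the biggest logit; higher ones flatten it out, which is why high-temperature samples look more chaotic.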
One subdirectory per experiment; if you go into one of them, you'll see that it has two subdirectories, one for the training data and one for the different training runs I tried. The data directory contains the training data itself. That file only exists in this one experiment, though, because I was concerned about copyright for the other training sets. There is a file in the data directories of all experiments, though, which explains how to get the data. The runs directory has more in it. Each run is for a particular set of hyperparameters, so let's look at the ones for one run. We have two files. The first looks like this: It's essentially the model-specific hyperparameters, the ones we pass in when creating our model -- for example, remember this from the training code: That's this JSON dict loaded into Python. The second file has the training hyperparameters: Hopefully these are all familiar from the training code; they all go into the training function, so they're used in code like this: So, now, when we look at the start of the training and generation scripts and see things like this: ...it should be clear that we're loading up those JSON dicts from those files. You can see that code at the start of the QoL module. It looks like this: So, some basic sanity checking that we have the directories we expect. Next: ...we create a checkpoints directory if it doesn't exist, stashing away its path, then finally we load up those two JSON files: The rest of that file handles checkpointing, so let's move on to that. Remember, in the training loop, each epoch we saved a checkpoint: ...and at the start of the code to generate some text, we load one: Let's take a look at saving first. Each checkpoint is a directory, with a name based on the timestamp when it was saved, inside the directory for the run that it relates to; so firstly we work out the full path for that: (The checkpoint directories inside experiments are explicitly ignored in our .gitignore file so that we don't accidentally commit them.)
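Stepping back to the JSON config loading for a moment, the pattern boils down to a few lines. A minimal sketch, with hypothetical file and key names (`model_params.json`, `hidden_size`) standing in for the post's actual ones:

```python
import json
import os
import tempfile

# Hypothetical layout and names -- the post's real ones differ:
# <experiment>/runs/<run>/model_params.json holds the hyperparameters.
run_dir = tempfile.mkdtemp()
with open(os.path.join(run_dir, "model_params.json"), "w") as f:
    json.dump({"hidden_size": 512, "num_layers": 2}, f)

# Basic sanity checking that we have the directory we expect...
if not os.path.isdir(run_dir):
    raise SystemExit(f"no such run directory: {run_dir}")

# ...create the checkpoints directory if it doesn't exist, stashing its path...
checkpoints_dir = os.path.join(run_dir, "checkpoints")
os.makedirs(checkpoints_dir, exist_ok=True)

# ...and load the JSON dict of hyperparameters into Python:
with open(os.path.join(run_dir, "model_params.json")) as f:
    model_params = json.load(f)
print(model_params["hidden_size"])  # 512
```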
Now, we don't want half-saved checkpoints lying around due to crashes or anything like that, so we initially create a directory to write to using the path that we're going to use, but with a temporary suffix at the end: Next, we write a metadata file (its path within the checkpoint's dir is worked out by a helper function) containing some useful information about the model's progress -- its epoch number, the training and validation loss, and the mapping that its tokeniser uses (from which we can later construct a new tokeniser): Then we dump the model's current parameters into a file, using a save function from a Hugging Face library (getting the file's path through another helper function): Now that our checkpoint is complete, we can rename our temporary directory to the real name for the checkpoint: Next, we do some symlinks. We want a symlink in the run's directory called "best", which links to the checkpoint that had the lowest validation loss. The training loop tracks whether any given epoch had the lowest, and you can see that it's passed in as a parameter; if it's true, we create the symlink, removing any pre-existing one: For completeness, we also create one that points to the most recent checkpoint -- that will always be the one we're saving right now, so: And that's it for saving! Loading is even simpler (and note that we can just specify "best" as the checkpoint, thanks to that symlink -- I pretty much always do): So, we've made sure that the checkpoint directory is indeed a directory. Next, we load up the model metadata: ...and we use the corresponding Hugging Face function to load our parameters: Now we can construct a tokeniser based on the mapping that we put into the metadata: ...and a model based on the other metadata parameters: ...and load the parameters into it: That's it! We can return the model and the tokeniser for use: So that's all the code needed for checkpointing. Now let's look at the final QoL trick, one that I left out of the earlier list because it needs the checkpoints to work: charting our progress.
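Before we get to charting, the crash-safe save pattern above -- write into a temporary directory, rename it into place, then repoint symlinks -- is worth a quick sketch. Directory and file names here are illustrative, not the post's:

```python
import json
import os
import tempfile

# Sketch of a crash-safe checkpoint save. A partially written
# checkpoint only ever exists under the ".tmp"-suffixed name.
run_dir = tempfile.mkdtemp()
checkpoint_dir = os.path.join(run_dir, "2026-04-01-120000")

tmp_dir = checkpoint_dir + ".tmp"  # half-saved work goes here
os.makedirs(tmp_dir)
with open(os.path.join(tmp_dir, "metadata.json"), "w") as f:
    json.dump({"epoch": 7, "train_loss": 1.21, "val_loss": 1.35}, f)
# (the model parameters would also be dumped into tmp_dir here)

os.rename(tmp_dir, checkpoint_dir)  # atomic within one filesystem

# Repoint a "best"/"latest"-style symlink at the finished checkpoint:
link = os.path.join(run_dir, "latest")
if os.path.islink(link):
    os.remove(link)
os.symlink(checkpoint_dir, link)
print(sorted(os.listdir(run_dir)))  # ['2026-04-01-120000', 'latest']
```

The point of the rename is that a crash mid-save leaves only a `.tmp` directory behind, which later runs can recognise and ignore.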
Remember this line from the training loop, which was called after we saved our checkpoint? It generates charts like this: The chart is updated every epoch and saved into the root of the run's directory. There's also a helpful HTML file placed there that reloads the generated chart every second, so you can just load it into a browser tab while you're training and watch progress live. Let's look into the code. The charting function starts like this: So, we use a utility function (which we'll get into in a moment) to load up the data -- training and validation loss per epoch, and the specific epoch that was the best. Once we have that, we just use pyplot (with my preferred xkcd styling) to plot the two loss lines: We also plot a single vertical red line at the best epoch, so that we can see if we're past that and running into the patience period: Then a bit more pyplot boilerplate... ...and we've got our chart, saved as an image file. Finally, we just copy that useful auto-reloading HTML page into the same directory as the chart: ...and we're done. So, how do we get the data? Originally I was keeping lists of loss values over time, but eventually I realised that the data was already there in the checkpoint metadata files. So the helper function just iterates over the checkpoints, skipping the symlinks, creating lists of (epoch number, loss) tuples for both training and validation loss using the numbers in those metadata files; for the "best" symlink it just stores its epoch number: Those loss lists will be in whatever random order the directory listing returned them in, so we sort them by epoch number: ...and we have something we can return to the charting code: That brings us to the end of the charting code -- and, indeed, to the end of all of the code in this repo! So let's wrap up. That was quite a long writeup, but I think it was worthwhile.
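As a footnote to the charting section, the scan-skip-sort pattern of that data helper looks something like this; the metadata file name and layout are stand-ins, not the post's actual ones:

```python
import json
import os
import tempfile

# Build a fake checkpoints directory to scan, one subdir per epoch:
checkpoints_dir = tempfile.mkdtemp()
for epoch, (tr, va) in enumerate([(2.0, 2.1), (1.5, 1.7), (1.2, 1.4)], start=1):
    d = os.path.join(checkpoints_dir, f"ckpt-{epoch:03d}")
    os.makedirs(d)
    with open(os.path.join(d, "metadata.json"), "w") as f:
        json.dump({"epoch": epoch, "train_loss": tr, "val_loss": va}, f)

train, val = [], []
for name in os.listdir(checkpoints_dir):  # arbitrary order
    path = os.path.join(checkpoints_dir, name)
    if os.path.islink(path):
        continue  # skip "best"/"latest"-style symlinks
    with open(os.path.join(path, "metadata.json")) as f:
        meta = json.load(f)
    train.append((meta["epoch"], meta["train_loss"]))
    val.append((meta["epoch"], meta["val_loss"]))

train.sort(key=lambda pair: pair[0])  # sort by epoch number
val.sort(key=lambda pair: pair[0])
print([e for e, _ in train])  # [1, 2, 3]
```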
Indeed, if you look at the commit history, you'll see that there were one or two places where, while explaining the code, I realised that it was doing things badly -- not so badly that it didn't work, or gave bad results, but doing things in a way that offended my sense of what's right as an engineer. Hopefully it was interesting, and has set things up well for the next step, where I'll use the same framework, but plug in my own RNN implementation so that we can see how it compares. Stay tuned :-) Intuitively: if you train on "I like bacon", then "I like cheese", then "I like wine", then you can imagine that they might have different effects -- maybe the first would have the largest impact, then the second, then the third -- or perhaps it might be the other way around. By comparison, if you trained on all three in parallel, you would expect them to be more evenly balanced in their effect.  ↩ I'm accumulating a never-ending list of things to dig into in the future, but let me add yet another one: it would be good to work through how PyTorch uses this compute graph in practice to do all of its automated differentiation magic! Andrej Karpathy will likely pop up again, as he did pretty much that in his micrograd project.  ↩ In case you're wondering: I tend to use UK spelling like "tokeniser" in writing, as it's much more natural to me. But in code I tend to standardise (or standardize) on the US spelling. For private projects like this, it doesn't matter much, but when collaborating with other people from various places in the world, it's helpful to use a standardised spelling just to make life easier when searching code.  ↩ Sharp-eyed readers might note that my token IDs start at zero, while Karpathy's start at 1. Zero-based indexing is the natural way to represent them in Python, one-based in Lua. Keeping things natural like that makes it a bit easier when we convert things into one-hot vectors later.  ↩ Remember that gradients are vectors in a high-dimensional space.
So to work out a measurement of size, for each parameter we square all of the numbers in its gradient, then add them together. We then add those per-parameter sums together across all parameters, and take the square root of the total.  ↩ Thanks to Claude for generating that monstrosity of a Java class name. It added: "For bonus points, imagine this is in a package like: And it probably has exactly one method: :-)"  ↩ Vanishing/exploding gradients. Let's say that we're training a three-layer network on the 5,617,124 characters of the Project Gutenberg "Complete Works of Shakespeare". Unrolled over the whole text, that's essentially backpropagation through a 16-million-layer network. You won't get far through that before your gradients vanish to zero or explode to infinity. The only meaningful parameter updates will be for the last something-or-other layers. Batching. Running multiple inputs through a model in parallel has two benefits: it's faster and more efficient, and it means that your gradient updates are informed by multiple inputs at the same time, which will make them more stable. 1 Validation. There's nothing held out as a validation set, so we will have no way of checking whether our model is really learning, or just memorising the training set. (There's the same problem with the test set, but for this writeup I'll ignore that, as the solution is the same too.) Each item gives us: the byte IDs of the bytes in that sequence -- these are the ones we'll run through the model, our Xs (note that these are slices of the PyTorch tensors that were returned by the tokeniser, so they're tensors themselves); the shifted-left-by-one-plus-an-extra-byte target sequence as byte IDs -- the Ys for those Xs, likewise tensors; and the raw bytes for each of those two sequences. It accepts the inputs as "token IDs", and maps them to a one-hot vector itself. It applies dropout after the last layer of the LSTM (rather than just internally between the layers).
It expands the output vector back out to the vocab size with a linear layer after the LSTM so that we have logits across our vocab space. This is because an LSTM's output has the same dimensionality as the hidden state. It runs those logits through softmax so that it returns probabilities.
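The gradient-size measurement described in the footnotes -- square every component, sum within and then across parameters, take the square root -- is just a global L2 norm. A toy illustration with made-up numbers:

```python
import math

# Hypothetical gradients: one flat list of numbers per parameter tensor.
param_grads = [
    [0.3, -0.4],       # parameter 1's gradient
    [1.2, 0.0, -0.5],  # parameter 2's gradient
]

# Square every component, sum within and then across parameters,
# and take the square root of the grand total: the global grad norm.
total = sum(g * g for grad in param_grads for g in grad)
grad_norm = math.sqrt(total)
print(round(grad_norm, 4))  # 1.3928
```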

Stone Tools 6 months ago

Superbase on the Commodore 64

When it comes to databases, I've never been much more than a dabbler. I remember helping dad with PFS:File so he could do mail merge. I remember address books and recipe filers. I once tried committing my comic book collection to ClarisWorks. Regardless of the actual efficacy of those endeavors, working with database management systems never stopped feeling important. I was "getting work done," howsoever illusory it may have been. These days, the average consumer probably shies away from any kind of hardcore database software. Purpose-built apps which manage specific data (address books, invoicing software) do most of our heavy lifting, and basic spreadsheets (Google Sheets, Notion, Airtable) tend to fill in the remaining niche gaps. Back then, though, the industry was hell-bent on transforming rapidly improving home computers into productivity powerhouses, and database software promised to unlock a chunk of that power. Superbase on the Commodore 64 was itself put to work in forensic medicine in England and to help catch burglars in Florida. Maybe it can help me keep track of who borrowed my VHS copy of Gremlins 2: The New Batch. The manual has a three-part tutorial, the first two parts of which have an audio component (ripped from cassette tapes). I will absolutely use it for an authentic learning experience. I'm looking forward to some pre-YouTube tutorial content: "What's up everyone, it's ya boy Peter comin' atchu with another Superbase tutorial. If you're enjoying these audio tapes, drop a like on our answering machine and subscribe to AHOY! Magazine." From first boot, I feel the pain. After the almost instantaneous launching of trs80gp into Electric Pencil last blog, getting Superbase launched in VICE is annoyingly slow. I appreciate a pedantic pursuit of accuracy as much as anyone, but two full minutes to load Superbase is ridiculous for my 2025 interests. Luckily VICE has a "WARP" mode which runs some 1500% faster, bringing boot time to under 10 seconds.
A C64 speed one could only dream of back then is a keystroke away, to enable or dismiss on a whim. How spoiled we are! Here I am, a businessman of 1983, knitted tie looking sharp with my mullet, ready to thrust my 70s HVAC business into the neon-soaked future of 80s information technology. (The company must pivot or die!) First things first, "What is a database?" I wonder, sipping a New York Seltzer. According to the very slow audio tutorial, "It's an electronic filing cabinet!" So far, so good. "And just as in an ordinary filing cabinet, information is stored in batches called 'files', and you can think of Superbase as an office containing a number of electronic filing cabinets." OK, so if Superbase is my office, and my office currently contains seven filing cabinets with 150 files per, I'll make seven databases to hold my information? "Superbase will allow you to hold up to 15 files in each database." OK, I'm not sure I heard that correctly. Rather than having seven cabinets with 150 files each, I instead have 70 cabinets with 15 files each? Is this the "office of the future?" Come to think of it, are we even using the same definition of the word "file?" When I ask Marlene to bring me "the Doogan file" I receive a file folder filled with Doogan-related stuff: one client, one file. "Each of the files is made of bits of information known as RECORDS. For example, you may have a file containing names of companies. In that case each company name would be one RECORD." A file which contains only the names of companies? Now I'm learning that records are made of FIELDS. But we were just told that a RECORD is "a bit of information" like a company name. This filing cabinet metaphor is falling apart and I'm only five minutes into a 60-minute tutorial. Not only did society have to learn how to create new tools for moving into the information age, we also had to learn how to teach one another how to use those tools. In Superbase's case, I find the manual mostly OK.
It offers a glossary, sample code, and a robust rundown of each menu and command. What's missing is an explanation of the mental shift required in moving from analog to digital files. Where a traditional filing cabinet is organized by relation, our C64 will discover relations (though this is not a relational database); a kind of inversion of the physical filing cabinet strategy. Without my 2025 understanding of such things, I would be completely lost right now about how Superbase and databases work. At any rate, working through the tutorial, I do find the operation of the software quite simple so far. Place the cursor where you want to add a field name or field input area and start typing. Dedicated keystrokes set the start and end points of a field, which doubles as a visual way to set the length of that field. The field's reference name is only ever the word to the immediate left of the field entry area. Simple, if inflexible. Setting field types is also easy enough, even if the purpose and usage of the "key" field is never made explicitly clear. It is only ever described as being the field that records will be sorted on by default. Guidance on choosing an appropriate key field and how to format it is essentially nonexistent. Querying records is straightforward, though there is definitely a learning curve. Partials, wildcards, absence of a value, value sets and ranges, and comparatives (values <100, for example) are all possible and chainable. The syntax is relatively clear, even if conventions (such as which character serves as the wildcard token) have subtly changed. I've now built something like a phone book and entered some sample data. This usage of the database matches my mental model of the object being replaced and I'm feeling somewhat confident. But this is also something I could have built with a type-in BASIC program from Popular Computing Weekly.
If I put myself in the mindset of someone reading a contemporary book like Business Systems on the Commodore 64 by Susan Curran and Margaret Norman , it is quite unclear how my filing cabinet data and organizational structure translates to floppy disk. With floppy drives, a printer, and more I have spent almost $5000 (in 2025 money) on this system. For that outlay of cash, am I really asking too much for someone to help guide me into a "paperless office?" Speaking of which. George Pake of Xerox PARC (yes, that Xerox PARC ) gave an interview to Businessweek in June 1975 in which he spoke of his vision for a "paperless office." The later spread of that concept into larger circles seems to owe a lot to F.W. Lancaster. In 1978, Lancaster published Toward Paperless Information Systems and spent a full chapter contemplating what a paperless research lab might look like in the year 2000. Lancaster's vision paralleled a fair amount of what we know today as the internet. To readers of the time it was all brand new conceptually, so he spent a lot of time explaining concepts like "keeping a journal on the computer" and how databases could just as easily be located 5000 miles away as 5 feet away. He couldn't quite envision high resolution video displays, and expected graphic data to remain in microfilm/fiche. He could envision "pay as you go" for data access, however. It should be noted that the phrase "paperless office" does not appear in Lancaster's book (it does in his previous book). That phrase had already started an upward trend since before the Pake interview, but in my research it does seem that Lancaster really helped mainstream the concept. Lancaster identified three main functions of computer use in a paperless office. Especially in the 80s, transmit and receive were a long way from being cheap and ubiquitous enough to replace paper between two parties. That sounds obvious, but hype around the "paperless office" made it easy to overlook such flaws. 
Besides, wasn't it a matter of time before the flaws were resolved? Wasn't everyone working toward the same paperless vision? Well that's hard to say, given the slightly mixed messaging of the time. 1983's The Work Revolution by Gail Garfield Schwartz PhD and William Neikirk says explicitly, "we are at the brink of the paperless office." 1982's The Word Processing Handbook by Russell Allen Stultz cautions us, "The notion of a 'paperless office' is just that, a notion." But May 1983's Compute Magazine keeps the dream alive with a multi-page article, "VICSTATION: A Paperless Office" as though it had already arrived and was waiting for you to catch up. Computer magazines and academic investigations were typically cold on the idea of the "paperless office" ever coming to fruition. Rather they saw (quite correctly) that if everyone had simple, easy-to-use publishing tools at their fingertips paper usage would increase . The mainstream, ever one to latch onto a snappy catch phrase, really did seem to push the idea to the masses as an inevitability . A CEO in 1983 really couldn't be blamed for buying into the hype. To not have bought into it would have felt tantamount to corporate negligence. I asked ChatGPT for a modern parallel and all it said was, "Time is a flat circle." Building out anything more advanced than the most rudimentary of rolodexes required a lot of patience and forbidden knowledge. As noted earlier, the manual only gets you so far. There was a decent stream of books published during the early 80s which tried to fill various knowledge gaps. Some would tackle general "using your computer for business" while others would target specific software + hardware combinations. Database Management for the Apple from 1983, the release year for Superbase , has some great illustrations and explanations about databases and how they work conceptually. It digs into how to mentally adjust your thinking from manual filing to electronic filing. 
It also includes fully commented source code in BASIC for an entire database program. A bargain for $12.95 ($40 in 2025), but probably ignored by C64 Superbase users? Unfortunately for us in 1983, the book we Superbase users desperately need won't be published for three more years. Superbase: The Book, by Dr. Bruce Hunt, was published in 1986 by Precision Software Ltd, the very makers of Superbase itself, for $15.95 ($47 in 2025). It straight up acknowledges the lack of help over the years in making the most of Superbase. "Part I: Setting Up a System" addresses almost every single thing I complained about in the tutorial. It contains a mea culpa for failing to help users build anything beyond the most rudimentary of address books. It then moves into "the most important discussion in the book." A conceptual framework for thinking about your existing files, and how to translate them into data that leverages Superbase's power, is well explained with concrete examples. As well, it works diligently to show you that the way files and fields were set up in the tutorials that shipped with Superbase was woefully inadequate for making good use of the product. We learned it by watching you! As an example, the bare "firstname" and "lastname" fields from the tutorial are considered here more thoroughly. We are given a proper mental context for why a name is more complex than it first looks. As data, it is better broken into at least five fields: title, initials, first name, surname, suffix. Heck, I'd throw "middle" in the mix as well. Then Dr. Hunt explains what is actually a very powerful idea: record fields don't have to exist exclusively for human-readable output purposes. That is true, and almost counter to the shallow way fields are treated in the manual, which only ever seemed to consider field data as output to the screen or a printer. "The crucial realization is that you don't need to restrict the fields in the record to the ones that will be printed."
Many examples of private data that you might want to attach to a customer record are given, as well as ways to use fields solely to increase the flexibility of Superbase's query tools. Lastly, in what felt like the book invading my mind and reading my thoughts directly, an entire section is devoted to understanding key values, how they work, and ideas for generating robust, flexible keys. The remainder of the book continues in the same fashion, providing straightforward explanations and solutions to common user issues and confusions. It's a solid B+ effort, even if the Apple database book feels more friendly and carefully designed. I'd give this book an A had Precision Software not made its customers wait three years for it. Here in 2025, the further into the tutorial I delve, the more the phrase "deal-breaker" comes up. I'll start with the format of the "Date" field type, and maybe you can spot the problem? We can enter the date in two ways, but either way the year is two digits and ONLY two digits. This restricts our range of possible years to 1900 - 1999. That's right, returning after a 30 year absence: it's the Y2K problem! Not only does this prevent us from bringing Superbase into the future, but we also cannot log even the recent (relative to 1983) historical past. I had a great-grandmother alive at that time who was born in the late 1800s, yet Superbase cannot calculate her age. Moving on, a feature I enjoy in modern databases (or at least ones more sophisticated than Superbase) is input validation. Being able to standardize certain field data against a master file, to ensure data consistency, would be really nice. It's also a bit of a drag that a record's key value can only ever be a text string, even if you only use numbers. The manual gives a specific workaround for this issue, which is to pad a number string with leading zeros. This basically equates to no auto-increment for you.
Something I very much appreciate is that the entire program can be run strictly through textual commands; no F-keys or menus necessary. In fact, I dare say the menus hide the true power of the system, functioning as a "beginner's mode" from which the user is expected to graduate to command-line "expert mode" later. Personally, I say just jump straight into expert mode. A simple naming convention in a command lets us read and write values from records, and BASIC-style variables can store those values for further processing inside longer, complex commands. As a developer, I'm happy. As a non-developer, this would be an utter brick wall of complexity for which I'd probably hire an expert to help me build a bespoke database solution. "Batch" is similar to "Calc" (itself a free-form or record-specific calculator) but works across a set of records. We can perform a query, store the result as a "list," then "Batch" perform actions or calculations on every record in that list. Very useful, but it comes with a note. "Takes a while" is just south of an outright lie. I must remember that this represents many users' first transition to electronic file management. Anything faster than doing work by hand had already paid for itself; that's true even today. That said, consider this. I ran "Batch" on eight (8!) records to read a specific numeric field, reduce that value by 10%, then write that new, lower value back into each record. Now, further consider that a C64 floppy can hold about 500 records, which seems like a perfectly reasonable amount of data for a business to want to process. ONE AND A QUARTER HOURS! Look, I know it was magical to type a command, hit a button, and have tedious work done while you took a long lunch. I once tasked a Macintosh with a 48-hour render in Infini-D. Here in 2025, I'm balking even at the 6 minute best case scenario in VICE.
On real hardware, we must also heed the advice from the book Business Systems on the Commodore 64: In fairness, most of the things I'd want to do are simple lookups and record updates from time to time. Were I stuck on 1982 hardware, it would be possible to mitigate the slow processing by working processing time into my weekly work schedule. I wouldn't necessarily be "happy" about that situation, and might even start to question my investment if that were the end of the features. Luckily, Superbase offers a killer feature which offsets the speed issue: programmability. The commands we've been using so far are in reality one-line BASIC programs, and more complex, proper programs can be authored in the "Prog" menu. We are now unbound, limited only by our knowledge of BASIC (so I'm quite limited) to extend the program and work around the "deal-breakers" I encountered earlier. Not every standard BASIC command is available (we can't do graphics, for example), but 40 of the heavy hitters are here, plus 50 Superbase-specific additions. I don't want to sound naive, but I was shocked at the depth and robustness, yes, even the inclusion of its programming language. It's far more forward thinking than I expected for $99 on a 64K machine. But I also cannot credit the manual with giving too much help with these functions. It's quite bare-bones. After all is said and done, the simple form building and robust search tools have won me over, but the limitations are frustrating. Whether I could make this any kind of a daily driver depends on what I can make of the programmability. It's asking a lot of me to become proficient in BASIC here in 2025. But the journey is its own reward. I press onward. Initially I thought I would build a database of productivity software for the Commodore 64, inspired by Lemon64. The truth is, after my training to date I am still a fair distance from accomplishing that, though I can visualize a path to success.
There are two main issues I need to solve within the confines of Superbase's tools and limitations. Doing so will give me more confidence that it is still useful for projects of humble sizes. Thinking of a Lemon64-alike, to constrain the software "genre" field (for example), I need a master list against which to validate my input. Superbase has some interesting commands that appear to do cross-file lookups: The code examples are not particularly instructive, at least not for what I want to do. The linking feature needs a lot more careful attention and practice to leverage. Rethinking my approach to the problem of data conformity, I have come to realize that the answer was right in front of me. All I really need is the humble checkbox. There is no such UI element on a machine which pre-dates the Macintosh and has no GUI operating system, but I can mimic one with a list of genre field names, each of a single-character field length. Type anything into a corresponding field to designate that genre. When doing a query for genre, I can search for records whose matching field is "not empty." Faking it is A-OK in my book. Without a working date solution, my options for using Superbase in 2025 are restricted. I can either only track things from the 20th century, or only track things that don't need dates. Neither is ideal. Working on UNIX-based systems professionally all day long, I think it would be nice to get this C64 on board the "epoch time" train. Date representation as a sequential integer feels like a good solution. It would allow me to do chronological sorting easily, do calendar math trivially, and standardize my data with the modern world. However, the C64's signed integers don't have the numeric precision to handle epoch time's per-second precision. A "big numbers" solution could overcome this, but that is a heavy way just to track the year 2000. If I limit myself to per-day precision (ignoring timezones, ahem), that would cover me from 1970 - 2059. Not bad!
I poked around looking for pre-existing BASIC solutions to the Y2K problem and came up empty-handed. Hopping into Pico-8 (my programming sketchpad of choice) I roughed out my idea as a proof of concept. Then, after many "How do I fill an array with data in BASIC?" simpleton questions answered by blogs, forum posts, and wikis, I converted my Lua into a couple of BASIC routines which successfully generate an epoch day from a YYYY/MM/DD date and back again. Y2K solved! Snippets from my date <-> epoch converter routines; now it's 2059's Chris's problem.

```basic
1 REM human yyyy mm dd to epoch day format
5 REM set up our globals and arrays
10 y=2025:m=8:d=29
11 isleap=0:yd=0:ep=0
15 dim dc%(12)
16 for i=1 to 12
17 read dc%(i)
18 next
99 REM this is the program proper, just a sequence of subroutines
100 gosub 1000
200 gosub 2000
300 gosub 3000
400 print "epoch: ";ep
900 end
999 REM is the current year (y) a leap year or not? 0=yes, 1=no
1000 if y-(int(y/4)*4)>0 then leap=1:goto 1250
1050 leap=0
1100 if y-(int(y/100)*100)>0 then goto 1250
1150 leap=1
1200 if y-(int(y/400)*400)=0 then leap=0
1250 isleap=leap
1300 return
1999 REM calculate number of days that have passed in the current year
2000 yd=dc%(m)
2010 yd=yd+d
2020 if isleap=0 and m>2 then yd=yd+1
2030 return
2999 REM the epoch calculation, includes leap year adjustments
3000 ty=y-1900
3010 p1=int((ty-70)*365)
3020 p2=int((ty-69)/4)
3030 p3=int((ty-1)/100)
3040 p4=int((ty+299)/400)
3050 ep=yd+p1+p2-p3+p4-1
3060 return
4999 REM days passed tally for subroutine at 2000
5000 data 0,31,59,90,120,151
5001 data 181,212,243,273,304,334
```

```basic
5 REM epoch date back to human readable format
10 y=0:m=0:d=0
11 isleap=0:yd=0:ep=20329
15 dim md%(12)
16 for i=1 to 12
17 read md%(i)
18 next
100 gosub 2000
200 print y, m, d
900 end
999 REM is the current year (y) a leap year or not? 0=yes, 1=no
1000 if y-(int(y/4)*4)>0 then leap=1:goto 1250
1050 leap=0
1100 if y-(int(y/100)*100)>0 then goto 1250
1150 leap=1
1200 if y-(int(y/400)*400)=0 then leap=0
1250 isleap=leap
1300 return
1999 REM add days to 1970 Jan 1 counting up until we reach our epoch (ep) target
2000 y=1970:dy=0:td=ep
2049 REM ---- get the year
2050 gosub 1000
2100 if isleap=0 then dy=366
2200 if isleap>0 then dy=365
2300 if td>dy or td=dy then td=td-dy:y=y+1:goto 2050
2399 REM ---- get the month
2400 m=1:dm=0
2500 dm=md%(m)
2700 if m=2 and isleap=0 then dm=dm+1
2800 if td>dm or td=dm then td=td-dm:m=m+1:goto 2500
2899 REM add in the remaining days, +1 because calendars start day 1, not 0
2900 d=td+1
3000 return
4999 REM days-per-month lookup array data
5000 data 31,28,31,30,31,30
5001 data 31,31,30,31,30,31
```

I'm hedging here as I've had a kind of up-and-down experience with the software. I have the absolute luxury of having the fastest, most tricked-out, most infinite storage of any C64 that ever existed in 1983. Likewise, I possess time travel abilities, plucking articles and books from "the future" to solve my problems. I have it made. There are limitations to be sure, starting with the 40-column display. But I also find the limitations kind of liberating? I can't do anything and everything, so I have to focus and zero in on what data is truly important and how to store that data efficiently. The form layout tools are as simplistic as it gets, which also means I can't spend hours fiddling with layouts. Even if the manual let me down, the intention behind its design unlocks a vast untapped power in a Commodore 64. It's almost magical how much it can do with so little. I can easily see why it won over so many reviewers back in the day. Though the cost and complexity would have frustrated me back in the day, in the here and now, with the resources available to me, it could possibly meet my needs for a basic, occasional, nuts-and-bolts database.
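As a cross-check on the converter's arithmetic (handy off-C64, too), the same formula ports directly to a modern language and can be compared against a known-good date library. A sketch in Python; the function names are mine, and the leap-day adjustment applies only to months after February:

```python
from datetime import date

# days elapsed in the year before each month (non-leap year),
# matching the DATA table in the BASIC listing
DC = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]

def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def to_epoch_day(y, m, d):
    # day-of-year, counting from 1; leap years add a day after February
    yd = DC[m - 1] + d
    if is_leap(y) and m > 2:
        yd += 1
    ty = y - 1900
    # the same four correction terms as the BASIC subroutine at line 3000
    return yd + (ty - 70) * 365 + (ty - 69) // 4 - (ty - 1) // 100 + (ty + 299) // 400 - 1

# cross-check against Python's own date arithmetic
assert to_epoch_day(2025, 8, 29) == (date(2025, 8, 29) - date(1970, 1, 1)).days  # 20329
```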
It would require learning a fair bit more BASIC to really do genuinely useful things, but overall it's pretty good! Ways to improve the experience, notable deficiencies, workarounds, and notes about incorporating the software into modern workflows (if possible).

While Warp mode in VICE is very handy, it's only truly useful when I hit slowness due to disk access. I'm sure I'll find more activities that benefit as this blog progresses, but for text-input based productivity tools, warp mode also warps the keyboard input. Utterly unusable. Basically I just use the system at normal speed. When I commit to a long-term action like loading the database, sorting, or something, I temporarily warp until I get feedback that the process is complete.

Superbase The Book tells us that realistically a floppy will accommodate about 480 records. However, 1Mb and 10Mb hard drives are apparently supported, so storage should be fine with a proper VICE setup.

VICE v3.9 (64-bit, GTK3) on Windows 11
x64sc ("cycle-based and pixel-accurate VIC-II emulation")
drive 8: 1541-II; drive 9: 1581
model settings: C64C PAL
printer 4: "IEC device", file system, ASCII, text, device 1 to .out file
Superbase v3.01 (multi-floppy, 1581-compatible)

Create information
Transmit information
Receive information

"Ultimately, the workstation configuration will probably replace the usual office furnishings as the organization evolves toward the 'paperless office'" - The Office of the Future, Ronald P. Uhlig, 1979

"Transformation of the office into a paperless world began in the early 1980s. Computers have been an integral component of the paperless office concept." - OMNI Future Almanac, 1982

"This information revolution is transforming society through basic changes in our jobs and lifestyles. Indeed, the paperless office of the future and computerized home communications centers are information age miracles not to be hoped for, but expected."
- America Wants to Know: The Issues and the Answers of the Eighties, George Horace Gallup, 1983

Base C64: 1 minute, 6.88 seconds. In WARP: 6.22 seconds.
Base C64: 1.25 hours. In WARP: 6 minutes.

I want to constrain some data to a standardized set of fixed values. I want to solve the Y2K problem.

: select a second file (same database only; no cross-database lookups) whose records you want to look up
: specify the field in that file against which you want to do lookups
: close the link to the second file
: "reverse" the linked files; the linked file becomes primary and vice versa

Key input repeating like the system is demon-possessed? Warp mode is probably still on.

A snapshot saves the C64 state, but not the emulator state. So if you have a disk in the drive when you take a snapshot, that disk will not be inserted when you restore the state. Save your snapshot with a name that reminds you which diskette should be inserted in which drive to continue smoothly from the snapshot.

Superbase's developers understood that data migration and interoperability are critical. We cannot have our data locked into a proprietary format with no option to move to a different system. Print and Export accept formatting parameters which allow us to effectively duplicate CSV format. Printing with VICE generates an ASCII file. Exporting puts the data onto our virtual disk image. To get data off that disk image into our host operating system, we need to be able to browse disk contents and extract files. On Windows, DirMaster works nicely. For macOS and Linux, the DirMaster dev created dm, a command-line utility for browsing and working with C64 disk image files.

Speed. I'm spoiled, I admit it. For standard searches it's snappy enough, but batch operations are tedious.

Superbase isn't particularly easy to use with multiple floppies. The manual addendum for v3.01 says that a two-drive setup is supported, but I didn't really see how to do that.
The initial data disk formatting routine offered no opportunity to point to drive #9, for example. I wish VICE would show the name of each .d64 file currently inserted into the virtual floppy drives. It's a little tough not having access to modern GUI elements in the form builder, like pull-down menus. "Build Your Own" is a powerful, flexible, time-consuming process using Superbase's programming tools. Getting around the limitations of the pre-built fields, forms, etc. seems possible with enough BASIC knowledge, time, and desire to commit to Superbase. Once that data is in there, it's honestly easier to let it stay there than to try to work out some export/import function. This may be an issue for your use case.

Kartik Agaram 6 months ago

Quickly make any LÖVE app programmable from within the app

It's a very common workflow: type out a LÖVE app, try running it, get an error, go back to the source code. How can we do all of this from within the LÖVE app itself, so there's nothing to install? This is a story about a hundred lines of code that do it. I'm probably not the first to discover the trick, but I hadn't seen it before and it feels a bit magical. Read more

NULL on error 7 months ago

Carimbo now has better stack traces and Sentry integration

As I’ve already said countless times, I’m working on my first game for Steam, which you can check out online at reprobate.site . Since it’s a paid game, I need to provide proper bug support. With that in mind, and based on both professional and personal experience, I decided to use Sentry . Integrating it with C++ and Lua should have been straightforward, but I ran into some issues with the Conan package manager. Initially, the package wasn’t being included in the compiler’s include flags, which led me to open an issue both on Conan Center and on Sentry Native . After spending a whole day on this, I eventually found out that the fix was actually pretty simple. Now Carimbo has native support for Sentry, both on the web (WebAssembly) and natively (Android, iOS, Linux, Windows, and macOS). Here’s how I managed to get it working. This was certainly my biggest problem. For some reason, even when following Conan’s documentation, I couldn’t get it to include the header path for Sentry Native. In the end, my solution looked like this: The native part was pretty easy, but for the web part I had an insight while walking, because I realized I could inject JavaScript using the Emscripten API. Since the game assets are stored in a compressed file, PhysicsFS provides an API to handle them transparently. It’s great for distributing the game — you only need the cartridge.zip and the executable — and it works even better on the web. The engine must provide a searcher so that Lua can find the game’s other Lua scripts. For this, I use a custom searcher: it first looks for the scripts inside the game package, and if they’re not found, it falls back to the interpreter’s default search to load them from the standard library. To improve stack traces, the secret lies in the second parameter of lua.load. You can pass a string starting with "@" followed by the file name. This alone gives you a much richer stack trace.
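Chunk names beginning with "@" are how Lua marks a chunk as coming from a file, which is what makes tracebacks point at real source locations. The same idea exists in other embedded scripting setups; here is a small Python analog using compile(), where the file name is made up purely for illustration:

```python
import traceback

source = 'def boom():\n    raise RuntimeError("oops")\nboom()\n'

# Compiling with an explicit (here invented) file name means the
# traceback reports that name, much like Lua does for chunks whose
# chunk name starts with "@".
code = compile(source, "scripts/player.lua", "exec")
try:
    exec(code)
except RuntimeError:
    tb = traceback.format_exc()

assert "scripts/player.lua" in tb
```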
In Carimbo, I have a terminate hook that catches any exception and atexit hooks to always handle cleanup. This way, I can provide Sentry support in a practically universal and abstract manner for the engine’s user. You can find more details about the engine and its implementation in the official repository: github.com/willtobyte/carimbo .

NULL on error 8 months ago

Poor Man’s Shaders

Spoiler: it’s not shaders. I’m waiting for a universal solution that the SDL developers are working on — cross-platform, multi-API shaders, SDL_shadercross . The idea is that you write shaders in a single language, and at runtime, they get compiled for the target GPU. Unfortunately, it’s a large and complex project, and it will take time before it becomes stable. In the meantime, in my Carimbo engine, I was wondering if I could implement something similar to shaders — something that would allow Lua code to write arbitrary pixels into a buffer and stream that buffer into a texture. So I created what I call a canvas, which is basically a texture the same size as the screen, rendered after certain elements. The set_pixel function receives a pointer to a uint32_t buffer that exactly matches the texture size. This pointer is actually a Lua string, which I found to be the most performant way to transfer data between Lua and C++ without relying on preallocated buffers. On the Lua side: Some effects I’ve created so far: https://youtu.be/GUWTWRQuzxw https://youtu.be/usJ9QM7V8BI https://youtu.be/DUhQmL91cNA
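The buffer-to-texture approach is easy to prototype in any language. A rough Python sketch of the idea (the names and the "effect" are mine, not Carimbo's): pack one 32-bit value per pixel into a contiguous byte buffer, then hand the whole buffer to the renderer in a single call instead of making per-pixel calls across the language boundary.

```python
import struct

WIDTH, HEIGHT = 64, 36  # tiny stand-in for a screen-sized texture

def render_frame(t):
    # Build the whole frame as one byte blob (one uint32 RGBA pixel each),
    # mirroring the "Lua string as pixel buffer" trick described above.
    pixels = bytearray(WIDTH * HEIGHT * 4)
    for y in range(HEIGHT):
        for x in range(WIDTH):
            shade = (x * 4 + y * 7 + t) % 256  # cheap moving-gradient "effect"
            struct.pack_into("<I", pixels, (y * WIDTH + x) * 4,
                             (shade << 24) | (shade << 16) | (shade << 8) | 0xFF)
    return bytes(pixels)  # a single blob to stream into the texture

frame = render_frame(0)
assert len(frame) == WIDTH * HEIGHT * 4
```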

NULL on error 9 months ago

AI will replace programmers—just not yet, because it still generates extremely inefficient code.

I was working on my engine, which includes a sort of canvas where Lua code can generate chunks of pixels and send them in batches for the C++ engine to render. This worked very well and smoothly at 60 frames per second with no frame drops at low resolutions (240p, which is the screen size of my games). However, when I happened to try 1080p, the frame rate dropped. Since I was in a rush and a bit lazy—because I can’t afford to spend too much time on personal projects—I decided to use AI to optimize it, and this was the best solution I could squeeze out. It went from 40 FPS down to 17, much worse than the initial implementation! Naturally, the code was not just complex, but also way slower. That’s when I decided to take my brain off the shelf and came up with this solution: Kabum! Smooth 60 frames per second, even at 8K resolution or higher.

maxdeviant.com 9 months ago

The ComputerCraft Iceberg

My friend Steffen recently turned me on to the ComputerCraft mod for Minecraft. For the uninitiated—a group I myself was a member of until a mere 24 hours ago—ComputerCraft is a mod that adds programmable computers and turtles to the game. "Turtles, you say? What, like these fellas ?" Cute as they may be, the sea variety of turtles are not the ones I'm excited to talk about today. Let me introduce you to a new kind of turtle: These turtles—which get their name from turtle graphics —are little robots that you can control programmatically. Inside of each one is a ComputerCraft computer. Players are able to write programs in Lua and execute those programs on the turtle. Programs have access to a number of different APIs, including the turtle module that provides functions for controlling the turtle. For instance, calling turtle.forward() will move the turtle forward, while turtle.dig() will have the turtle dig the block in front of it. It all started with a video Steffen sent me of a turtle-driven tree farm he had built in his world. The turtle would walk a loop around a patch of trees, checking each spot to see if a tree was grown yet. If it detected a grown tree, it would chop down the tree, replace it with a sapling, and continue on to the next spot. I decided to start up a new Minecraft world to give it a go. For my initial foray into working with turtles, I copied the tree farm program using the code that was visible in the video. I transcribed it, making a few tweaks as I went, and soon ended up with an automated tree farm of my own: During the course of building it and trying it out, I even managed to find a bug in the original program that needed fixing: With my wood situation sorted, I turned my attention to mining. Initially I wanted to write a branch mining program to assist me in quickly finding more diamonds, but this proved to be somewhat complex.
I scoped down the implementation to a simple tunnel miner that would mine a tunnel and place torches on the wall every so often: It was at this point that my software engineer brain started screaming at me. I had these two working programs, but was already noticing common functions that were duplicated between the two. I factored out a new module to house the helper functions I had written for dealing with the turtle's inventory: Keeping with the mining theme, the next program I wrote was for digging out vertical mine shafts. I could imagine wanting to have different-sized mine shafts based on the need, so for this program I explored taking user input as arguments to the program: While working on that program, I noticed that the layer-mining logic could be generalized into a general-purpose function. While in this case we care about mining out a layer of blocks, the core algorithm of moving a turtle around a plane could have lots of different uses. I pulled this out into its own function: This refactoring then enabled me to quickly whip up a new program for having a turtle farm wheat for me: At this point it was bedtime, and I had wrapped up my first day of working with ComputerCraft. I had gotten to grips with the basics of Lua (as this was my first time using it in any real capacity), written a handful of different programs, pulled some common functionality into modules, and was feeling pretty happy with it all. As I got ready for bed, I found myself pondering how I would maintain all of this code as I continued to expand my ComputerCraft usage. Something I had observed during my first day was that I spent a lot of time testing my programs "in production", as it were. The general flow of creating a new program looked something like: I spent a lot of time watching the turtle churn through its instructions, waiting for it to reach the point in the program that needed testing and observation.
I even created a separate Minecraft world that I would use to test my programs in before letting the turtles run them in my actual world. The process was slow and time-consuming. The answer to this, of course, was testing. I needed a way to write tests that I could run over and over as I made changes to the programs, and test that they were all still working in a variety of different scenarios. Bringing forth this vision of automated testing required one crucial component: a way to simulate ComputerCraft in a controlled environment. I'd spent the previous day steeped in Lua, but I set it aside for a moment and broke ground on a new Rust project. My initial idea for the simulator was quite simple: create a simplified representation of a Minecraft world, a simulated turtle that exists in that world, and an embedded Lua VM to run the programs. A few hours of hacking later, and I could write tests like this: There's still more surface area that the simulator will need to cover, but I'm excited that I was able to prove out the concept quickly. That's all for now, but I'll likely be writing more about my ComputerCraft adventures in the future.

Write the first version of a program
Run it on the turtle
See something not work as expected
Refine the program
Rinse and repeat.
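I don't know what the author's Rust simulator's test API actually looks like, but the concept is easy to sketch: a grid world, a simulated turtle that lives in it, and assertions about the world after a program runs. A toy Python version with invented names:

```python
class World:
    def __init__(self, blocks):
        self.blocks = set(blocks)  # occupied (x, y) cells

class Turtle:
    def __init__(self, world, x=0, y=0):
        self.world, self.x, self.y = world, x, y

    def forward(self):
        # refuse to move into an occupied cell, like a real turtle would
        if (self.x + 1, self.y) in self.world.blocks:
            return False
        self.x += 1
        return True

    def dig(self):
        # remove the block directly ahead, if any
        self.world.blocks.discard((self.x + 1, self.y))

# a "test" in the spirit of the simulator: tunnel through one block
world = World(blocks=[(1, 0)])
t = Turtle(world)
assert t.forward() is False   # blocked
t.dig()                       # clear the block ahead
assert t.forward() is True
assert (t.x, t.y) == (1, 0)
```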

the website of jyn 10 months ago

how i use my terminal

this is a whole blog post because it is "outside the overton window"; it usually takes at least a video before people even understand the thing i am trying to describe. so, here's the video: the steps here that tend to surprise people are 0:11 , 0:21 , and 0:41 . when i say "surprise" i don't just mean that people are surprised that i've set this up, but they are surprised this is possible at all. here's what happens in that video: i got annoyed at VSCode a while back for being laggy, especially when the vim plugin was running, and at having lots of keybind conflicts between the editor, vim plugin, terminal, and window management. i tried zed but at the time it was quite immature (and still had the problem of lots of keybind conflicts). i switched to using nvim in the terminal, but quickly got annoyed at how much time i spent copy-pasting filenames into the editor; in particular i would often copy-paste files with columns from ripgrep, get a syntax error, and then have to edit them before actually opening the file. this was quite annoying. what i wanted was an equivalent of ctrl-click in vscode, where i could take an arbitrary file path and have it open as smoothly as i could navigate to it. so, i started using tmux and built it myself. people sometimes ask me why i use tmux. this is why! this is the whole reason! (well, this and session persistence.) terminals are stupidly powerful and most of them expose almost none of it to you as the user. i like tmux, despite its age, bugs, and antiquated syntax, because it's very extensible in this way. this is done purely with tmux config: and this is the contents of : i will not go through the whole regex, but uh. there you go. i spent more time on this than i probably should have. this is actually a trick; there are many steps here. this part is not so bad. tmux again. i also have a version that always opens an editor in the current pane, instead of launching in the default application. 
for example i use by default to view json files, but to edit them. here is the trick. i have created a shell script (actually a perl script) that is the default application for all text files. setting up that many file associations by hand is a pain. i will write a separate blog post about the scripts that install my dotfiles onto a system. i don't use Nix partly because all my friends who use Nix have even weirder bugs than they already had, and partly because i don't like the philosophy of not being able to install things at runtime. i want to install things at runtime and track that i did so. that's a separate post too. the relevant part is this: this bounces back to tmux. in particular, this is being very dumb and assuming that tmux is running on the machine where the file is, which happens to be the case here. this is not too bad to ensure - i just use a separate terminal emulator tab for each instance of tmux i care about; for example i will often have open one Windows Terminal tab for WSL on my local laptop, one for my desktop, and one for a remote work machine via a VPN. there's actually even more going on here—for example i am translating the syntax to something vim understands, and overriding so that it doesn't error out on the —but for the most part it's straightforward and not that interesting. this is a perl script that scripts tmux to send keys to a running instance of nvim (actually the same perl script as before, so that both of these can be bound to the same keybind regardless of whether nvim is already open or not): well. well. now that you mention it. the last thing keeping me on tmux was session persistence and Ansuz has just released a standalone tool that does persistence and nothing else . so. 
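stepping back to the filename-matching step from earlier: the core of "translate into something vim understands" is parsing ripgrep's file:line:column prefix into editor arguments. the author's regex is far more elaborate; here is a deliberately simplified python sketch of the same idea (names are mine):

```python
import re

# matches e.g. "src/main.rs:42:7:fn main() {"
RG_PREFIX = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+)(?::(?P<col>\d+))?")

def to_editor_args(text):
    m = RG_PREFIX.match(text)
    if not m:
        return [text]  # plain path, open as-is
    if m["col"]:
        # vim-style: place the cursor at line and column
        return [m["file"], f"+call cursor({m['line']}, {m['col']})"]
    return [m["file"], f"+{m['line']}"]

assert to_editor_args("src/main.rs:42:7:fn main() {") == \
    ["src/main.rs", "+call cursor(42, 7)"]
```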
i plan to switch to kitty in the near future, which lets me keep all these scripts and does not require shoving a whole second terminal emulator inside my terminal emulator, which hopefully will reduce the number of weird mysterious bugs i encounter on a regular basis. the reason i picked kitty over wezterm is that ssh integration works by integrating with the shell, not by launching a server process, so it doesn't need to be installed on the remote. this mattered less for tmux because tmux is everywhere, but hardly anywhere has wezterm installed by default. honestly, yeah. i spend quite a lot less time fighting my editor these days. that said, i cannot in good conscience recommend this to anyone else. all my scripts are fragile and will probably break if you look at them wrong, which is not ideal if you haven't written them yourself and don't know where to start debugging them. if you do want something similar without writing your own tools, i can recommend: hopefully this was interesting! i am always curious what tools people use and how - feel free to email me about your own setup :)

0:00 i start with Windows Terminal open on my laptop.
0:02 i hit ctrl + shift + 5 , which opens a new terminal tab which ssh's to my home desktop and immediately launches tmux.
0:03 tmux launches my default shell, zsh, which shows a prompt while loading the full config asynchronously.
0:08 i use to fuzzy find a recent directory.
0:09 i start typing a ripgrep command. zsh autofills the command since i've typed it before and i accept it with ctrl + f .
0:11 i hit ctrl + k f , which tells tmux to search all output in the scrollback for filenames. the filenames are highlighted in blue.
0:12 i hold n to navigate through the files. there are a lot of them, so it takes me a bit to find the one i'm looking for.
0:21 i press o to open the selected file in my default application ( ). tmux launches it in a new pane. note that this is still running on the remote server ; it is opening a remote file in a remote tmux pane. i do not need to have this codebase cloned locally on my laptop.
0:26 i try to navigate to several references using rust-analyzer, which fails because RA doesn't understand the macros in this file. at 0:32 i finally find one which works and navigate to it.
0:38 i hit ctrl + k h , which tells tmux to switch focus back to the left pane.
0:39 i hit n again. the pane is still in "copy-mode", so all the files from before are still the focus of the search. they are highlighted again and tmux selects the next file in search order.
0:41 i hit o , which opens a different file than before, but in the same instance of .
0:43 i hit b , which shows my open file buffers. in particular, this shows that the earlier file is still open. i switch back and forth between the two files a couple times before ending the stream.

i don't need a fancy terminal locally; something with nice fonts is enough. all the fancy things are done through tmux, which is good because it means they work on Windows too without needing to install a separate terminal. the editor thing works even if the editor doesn't support remote scripting. nvim does support RPC, but this setup also worked back when i used and . i could have written this such that the fancy terminal emulator scripts were in my editor, not in tmux (e.g. in nvim). but again this locks me into the editor; and the built-in terminals in editors are usually not very good. it's much easier to debug when something goes wrong (vscode's debugging tools are mostly for plugin extension authors and running them is non-trivial). with vim plugins i can just add print statements to the lua source and see what's happening. all my keybinds make sense to me! my editor is less laggy.
my terminal is much easier to script through tmux than through writing a VSCode plugin, which usually involves setting up a whole typescript toolchain and context-switching into a new project.

fish + zoxide + fzf . that gets you steps 4, 5, and kinda sorta-ish 6.
"builtin functionality in your editor" - fuzzy find, full text search, tabs and windows, and "open recent file" are all commonly supported.
qf , which gets you the "select files in terminal output" part of 6, kinda. you have to remember to pipe your output to it though, so it doesn't work after the fact and it doesn't work if your tool is interactive. note that it hard-codes a vi-like CLI ( ), so you may need to fork it or still add a script that takes the place of $EDITOR. see julia evans' most recent post for more info.
e , which gets you the "translate into something your editor recognizes" part of 8, kinda. i had never heard of this tool until i wrote my own with literally exactly the same name that did literally exactly the same thing, forgot to put it in PATH, and got a suggestion from asking if i wanted to install it, lol.
or or , all of which get you 12, kinda. the problem with this is that they don't all support , and it means you have to modify this whenever you switch editors. admittedly most people don't switch editors that often, lol.

terminals are a lot more powerful than people think! by using terminals that let you script them, you can do quite a lot of things. you can kinda sorta replicate most of these features without scripting your terminal, as long as you don't mind tying yourself to an editor. doing this requires quite a lot of work, because no one who builds these tools thought of these features ahead of time.

Weakty 10 months ago

Radio Silence - An Unfinished Playdate Game

About 8 months ago, I started building my second game for the Playdate. The working title was "Black Hole" (later to be renamed "Radio Silence") and the idea was simple: you are a crew member in a spaceship that needs to enter a black hole. And so you orbit around the black hole, collecting raw materials from space debris, which you then use to craft items to help you successfully enter and cross through a black hole without getting crushed (Yes, I know that it's physically impossible to go through a black hole, but this is also a fictional game). When I started this game, I was coming off the momentum of having finished my first game for the Playdate, which was a success beyond my expectations (people actually paid for it!). I was determined to keep up that momentum. This time around I wanted to build something different. Whereas my first game used the Pulp game engine on the web, I now wanted to use Lua to build something that was conceptually simpler—less of a story, and more of an arcade game. In the end, I lost steam on this project and have decided to let it go. But first, let me walk you through how far I got and some of the things I learned. As I mentioned, it's an arcade-style game and conceptually has a pretty simple story. The core loop is to collect materials, craft upgrades, and then to descend safely into the black hole. There were a few areas that I could embellish to make it more than this simple game loop, but at the end of the day, there wasn't much more to it than that. I'll walk you through the few main core mechanics that I had to build, show you some screenshots, and share what I enjoyed and disliked about that process. I began by creating a very simple mechanic, the ability for the ship to orbit a black hole and slowly descend into it. I spent maybe a week and a half on this, and it wasn't too complicated. I had to learn a little bit of math and got some help from GPT.
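That math is essentially polar-coordinate orbiting with a slowly shrinking radius. A rough sketch of the idea in Python (the constants and function name are illustrative, not from the game):

```python
import math

def orbit_step(cx, cy, angle, radius, speed=0.02, decay=0.05):
    # advance the orbit angle and shrink the radius so the ship
    # spirals slowly toward the centre
    angle += speed
    radius = max(0.0, radius - decay)
    x = cx + radius * math.cos(angle)
    y = cy + radius * math.sin(angle)
    return x, y, angle, radius

# spiral a ship around a "black hole" at (200, 120), starting 100px out
x, y, angle, radius = 0.0, 0.0, 0.0, 100.0
for _ in range(10):
    x, y, angle, radius = orbit_step(200, 120, angle, radius)
# after 10 steps the radius has shrunk by 10 * 0.05
```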
In the end, I had a sprite that rotated around another sprite, the first of which would slowly move towards the second. Again, it was a matter of adding collision detection between the two so that when the ship touched the black hole, it would disappear, and the game would be over. But this alone was kind of boring. I needed to introduce some kind of mechanic that would keep the player paying attention, and I also wanted to find some way of using the crank. And so I built a strange little UI that was meant to represent the navigation of the ship to stay in balance with the black hole's orbit. The idea was that as you would rotate the crank, you needed to keep a special nav point in between two closing-in meters. It would get progressively harder as the game moved on, and using the crank, you would have to gently navigate this balance. Eventually, I had something in place so that when you lost control of the nav point's balance, it would cause the ship's health to decrease to a point where if it reached zero, it would fall into the black hole. For a while, I was satisfied with this, and then for a time, I thought, "This isn't very good, let's remove it." And then a few months later, I brought it back again. So I was feeling a little fickle about this feature. Before I got to the next large mechanic, I spent a good amount of time on the atmosphere of the game - I made a title screen, intro music, game music, and a generative background with shooting stars. This part was relaxing and fun, but also a place where I could noodle indefinitely. Here was when I played around with adding 1-bit characters from an asset pack I found on itch.io. I figured this could be a fun way to introduce a way of guiding the player on what to do if necessary. From the outset, I knew that I wanted to have a crafting component to this game. What I didn't know was how much work that would take. 
There is a lot of business logic involved in building a crafting system, and it really slowed down the momentum of the game. In fact, it got to a point where I didn't want to work on the game at all because of the crafting system. Some of the work involved:

- Building the crafting menu sprite itself
- Building a means of collecting raw materials (asteroids)
- Displaying items that you've picked up in a grid
- Making it so that items that have been acquired can be combined with other items, i.e. creating recipes
- Showing the recipe once it's been crafted in the right-hand side pane of the menu
- Implementing the actual effects of that item once it has been crafted

It was around this time that I started to think to myself, "I don't really want to make a game like this," and I realized that I was losing interest. I did have fun creating the raw materials, the items, and their respective descriptions (which had to be created as PNGs).

At the beginning of this project, I started documenting my weekly progress on the game. This literally meant sitting in front of a camera, talking about what I had built, combining that with footage of the Playdate and of development, and adding some reflections on top. I had some hopes that this would a) be interesting, b) provide a record of my progress that I could feel motivated by seeing, and c) maybe get people interested in the game. Pretty typical stuff.

While I do enjoy video editing, and I don't actually mind being on camera, I found this pretty tedious by the end. It was taking energy that I could have been spending on the game, or really, on other things as well. None of the videos got many views, just about 100 or so per video. I was surprised to see that some repeat viewers would show up, comment, and provide feedback as well as encouragement. That in itself is a nice component, especially when you're building something on your own, bit by bit.

More importantly, I have a complicated relationship with social media. Mostly, I find participating in it to be a net negative for my life, personally. I felt that if I wanted something to be "successful" by the standards of YouTube, I would have to compromise some of my values on how I present content. That would mean creating something oriented around clickbait, with unrealistic, catfishing thumbnails and titles, rather than something truly representative of what I want to make. At one point I asked myself: am I trying to build an audience as fast and as large as possible? Or do I just want to make a game, be able to say I finished a game, and share what I learned? Documenting projects on YouTube doesn't have to be mutually exclusive with those two things, but I think that any part of me with an interest in reaching more and more people and having a "successful" video really just took away from my original intent. It's a tough balance, to be sure.

I eventually did finish the crafting system. And while I would call it the motivation killer of the project, it was what came after that made me realize I didn't want to do any more work on this thing. After finishing these mechanics, when I looked at the game, I saw something very robotic. There weren't any animations; there wasn't any fluidity, or "juice" as some people refer to it in games. That was where I started to feel the most defeated. In fact, it wouldn't have been much work to add a few animations to spruce up the game. But something about even just having to learn how the animation class worked on the Playdate felt insurmountable. I no longer wanted to work on the project. For what the game was, in the simplicity of its game loop, I just didn't feel the idea and the execution so far merited any more time. Simple as that.

If it isn't already pretty clear from this retrospective, there are a few things I would change. I wouldn't make YouTube videos at all; I just don't think it's worth it. While I did enjoy some parts of it, it took away the energy I needed to finish the game, and it distracted and discouraged me. I would also have spent more time brainstorming what I wanted to build for my next game. While I was proud of myself for not picking the first idea that came into my head (which is what I did for my last game), and while I did indeed brainstorm several possible options, I think I should have spent more time thinking about what I really wanted to build.

It's hard to let go of projects, especially ones that you've spent a fair bit of time on (and even more especially, ones that you have publicly discussed). It's hard to let go, but life's a bit too short to spend toiling through projects you don't find much joy in anymore. Thankfully, I learned a lot as I went along.

Speaking of life: things are about to change around here anyway. We're having a kid! It's nice to let go of projects to make way for this big change; this way I have one less unfinished thing floating around in my brain during this period. Someday, when I feel a bit more like I've got my feet under me with parenting, I might scope out something small I can build in the few scraps of time I have after the long (but rewarding) days to come. Will it be a game? Maybe. But maybe it'll be some writing, or drawing. Maybe something with a little less staring at a screen.

NULL on error 10 months ago

How to avoid dynamic linking of Steam’s client library using a very old trick

As you know, this blog is more focused on sharing code snippets than on teaching, so today I'm going to show you something I recently discovered. If you've been following me, you know I've been working in my free time on a 2D game engine where creators can build games using only Lua — and I'd say, even fairly complex ones. Right now, I'm working on a point-and-click game that you can play here: https://bereprobate.com/. It's built using this same engine, and I'm publishing builds in parallel to Steam and the Web using GitHub Actions.

The thing is, Steam — which is the main target platform for this game — supports achievements, and I want to include them. But to use achievements, you have to link the Steam library into your engine. The problem is that doing so creates a dependency on that library in the binaries, which I don't want. I also don't want to maintain a separate build just for that.

Then I thought: "Why not load the Steam library dynamically? Use LoadLibraryA on Windows and dlopen on macOS. (Sorry Linux — it's Proton-only for now.)" I tried the experiment below, and it worked. If the DLL/dylib is present, the Steam features work just fine. If not, everything runs normally.

Achievement class
