computing 2+2: so many sandboxes
Sandboxes are so in right now. If you're doing agentic stuff, you've no doubt thought about what Simon Willison calls the lethal trifecta: private data, untrusted content, and external communication. If you work in a VM, for example, you can avoid putting a secret on that VM, and then that secret--that's not there!--can't be exfiltrated. If you want to deal with untrusted data, you can also cut off external communication. You can still use an agent, but you need to either limit its network access or limit its tools. So, today's task is to run 2+2 five ways.

Cloud Hypervisor is a Virtual Machine Monitor that runs on top of the Linux kernel's KVM (Kernel-based Virtual Machine), which in turn runs on CPUs that support virtualization. A cloud-hypervisor VM sorta looks like a process on the host (and can be managed with cgroups, for example), but it's running a full Linux kernel. With the appropriate kernel options, you can run Docker containers, do tricky networking things, use nested virtualization, and so on. Lineage-wise, it's in the same family as Firecracker and crosvm. It avoids implementing floppy devices and tries to be pretty small.

Traditionally, people tell you to unpack a file system and maybe make a vinyl out of it using an iso image or some such. A trick is to instead start with a container image for your userspace, and then you get all the niceties (and all the warts) of Docker. Takes about 2 seconds.

gVisor implements a large chunk of the Linux syscall interface in a Go process. Think of it as a userland kernel. It came out of Google's App Engine work. It can use systrap/seccomp, ptrace, and KVM tricks to do the interception. The downside of gVisor is that you can't do some things inside of it. For example, you can't run vanilla Docker inside of gVisor because it doesn't support Docker's networking tricks.

Again, let's use Docker to get ourselves a userland. No need for a kernel image. runsc stands for "run sandboxed container."
Monty is a Python interpreter written in Rust. It doesn't expose the host, but the code it runs can call functions that you explicitly expose. This one's super fast.

Pyodide is CPython compiled to WebAssembly. Deno is a JS runtime with permission-based security. Deno happens to run wasm code fine, so we're using it as a wasm runtime. There are other choices.

Chromium is probably the world's most popular sandbox. This is pretty much the same as Deno: it's V8 under the hood. There are lots of ways to drive Chromium: Puppeteer, headless, etc. Let's try rodney.

Run Pyodide inside Deno inside gVisor inside cloud-hypervisor. Setting up the networking and the file system/disk sharing for these things is usually not trivial, especially if you don't want to accidentally expose the VMs to each other, and so forth.

I want to compare two possible agents: a coding agent and a logs agent.

A coding agent needs a full Linux, because, at the end of the day, it needs to edit files and run tests and operate git. Your sandboxing options are going to end up being a VM or a container of some sort.

A logs agent needs access to your logs (say, the ability to run read-only queries on ClickHouse) and it needs to be able to send you its output. In the minimal case, it doesn't need any sandboxing at all, since it can't touch anything else. If you want it to be able to produce a graph, however, it will need to write out a file. At the minimum, it will need to take the results of its queries and pair them with an HTML file that has some JS that renders them with Vega-Lite. You might also want to mix and match the results of multiple queries, and do some data munging outside of SQL. This is where a setup like Monty or Pyodide comes in handy. Giving the agent access to some Python considerably expands how much the agent can do, and you can do it cheaply and safely with these sandboxes.
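To make the logs-agent case a bit more concrete, here is a rough sketch of the "pair the query results with an HTML file" step, in plain Python with only the standard library. Everything named here is an assumption for illustration: `ROWS` is made-up data standing in for a query result, `build_chart_html` is a hypothetical helper, and the page pulls Vega, Vega-Lite, and vega-embed from the jsDelivr CDN. It is one way to do it, not necessarily how any particular agent does.

```python
import json
from html import escape

# Hypothetical rows, standing in for the result of a read-only logs query.
ROWS = [
    {"minute": "12:00", "errors": 3},
    {"minute": "12:01", "errors": 7},
    {"minute": "12:02", "errors": 2},
]


def build_chart_html(rows, x, y, title="query results"):
    """Return a standalone HTML page that renders rows as a Vega-Lite bar chart."""
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x, "type": "ordinal"},
            "y": {"field": y, "type": "quantitative"},
        },
    }
    # Log data is untrusted: a "</script>" inside a value would otherwise
    # close the inline script tag early, so escape every "</" in the JSON.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return f"""<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
<div id="vis"></div>
<script>vegaEmbed("#vis", {spec_json});</script>
</body>
</html>
"""
```

Calling `build_chart_html(ROWS, "minute", "errors")` gives you a string that the agent's single "write a file" tool can save as something like `report.html`. The sandboxed Python never needs network or file system access of its own; it just turns rows into text.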
In this vein, if you use DSPy for RLMs (recursive language models), its implementation hands the LLM the Deno/Pyodide setup so that the LLM can have "infinite" context.

Browser-based agents are a thing too. Itsy-Bitsy is a bookmarklet-based agent. It runs in the context of the web page it's operating on.

Let me know what other systems I missed!