I decided to take the plunge into micro VMs. My goal? To set up a headless micro VM capable of running graphical programs remotely.
As a first milestone, I wanted to get Firefox running and smoothly playing videos. (See Part 1 for a breakdown of why I passed on other isolation methods.)
Overview
At a high level, the concept is simple: I click an icon, and Firefox opens seamlessly on my screen while actually running securely in a VM.
This setup is similar to disposable VMs in Qubes OS. When the program closes, the VM is destroyed, leaving absolutely zero trace on the disk. To pull this off, I needed to boot a micro VM with a minimal kernel and disk image, and forward both graphics and audio to my main (daily-driver) VM.
The Kernel
Unlike standard VMs, micro VMs do without PCI devices. Instead,
they rely on virtio devices attached over memory-mapped I/O
(virtio_mmio), such as the virtio_blk block device, which the
kernel must support natively. Crucially, these drivers must be compiled
directly into the kernel rather than loaded as modules, since there is
no initramfs to load them from before the root disk is mounted.
I tested a few pre-built kernels, but none fit the bill:
- Alpine image
- Debian standard kernel
- Debian cloud image (nocloud)
I decided to build my own using the kernel config recommended by Firecracker (config, doc). It turned out to be a straightforward process, resulting in a tiny, lightning-fast kernel.
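Since every driver the VM needs at boot has to be `=y` rather than `=m`, a quick scan of the kernel's `.config` can catch options that ended up as modules. The script below is a minimal sketch of such a check. The `REQUIRED_BUILTIN` list is my own illustrative assumption (virtio-mmio, disk, network, vsock, ext4), not taken from the Firecracker config.

```python
import re
from pathlib import Path

# Assumption: an illustrative (not exhaustive) set of options that a
# virtio-mmio micro VM with disk, network and vsock needs built in (=y).
REQUIRED_BUILTIN = [
    "CONFIG_VIRTIO_MMIO",      # virtio transport over memory-mapped I/O
    "CONFIG_VIRTIO_BLK",       # root disk
    "CONFIG_VIRTIO_NET",       # TAP-backed network
    "CONFIG_VSOCKETS",         # AF_VSOCK core
    "CONFIG_VIRTIO_VSOCKETS",  # virtio transport for vsock
    "CONFIG_EXT4_FS",          # root filesystem (assumed ext4)
]

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> dict[str, str]:
    """Map each CONFIG_* symbol to its value ('y', 'm', a string, or 'n')."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            values[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            values[m.group(1)] = "n"
    return values


def check_builtin(text: str, required=REQUIRED_BUILTIN) -> dict[str, str]:
    """Return {symbol: problem} for each required option that is not =y."""
    values = parse_kconfig(text)
    problems: dict[str, str] = {}
    for sym in required:
        val = values.get(sym, "missing")
        if val == "m":
            # Without an initramfs there is nothing to load the module from.
            problems[sym] = "built as a module"
        elif val != "y":
            problems[sym] = f"not built in ({val})"
    return problems


def report(config_path: str = ".config") -> None:
    """Print every problem found in a kernel .config file."""
    for sym, why in check_builtin(Path(config_path).read_text()).items():
        print(f"{sym}: {why}")
```

Running `report("path/to/.config")` after `make olddefconfig` prints nothing when all listed options are built in.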
The Disk Image
To run multiple programs in their own isolated micro VMs, I plan to build a single shared disk image containing all the necessary software. I can then launch multiple micro VMs from this identical base, instructing each one to run a different program, much like templates in Qubes OS.
Because I am using direct kernel boot, there is no need for
initramfs or initrd. In fact, I experimented
with an initramfs-only setup (no disk image at all) but
quickly abandoned it. I needed a file that could be directly mapped to
memory, and extracting an archive into RAM defeated the purpose.
Naturally, I tried a few pre-built disk images:
- Alpine image: Failed. /sbin/init runs but complains that openrc is missing. This comment suggests I might need to manually install and configure it.
- Alpine netboot: Boots, but expects alternative boot media.
- Debian cloud image: Works, but too slow (~8 seconds) to boot.
I also considered reusing the disk image of my Viewer VM (my daily
driver, akin to sys-gui in Qubes OS). While it would
guarantee an up-to-date environment, I hit a few roadblocks:
- It would require a dedicated /home partition.
- There was a high risk of accidentally copying sensitive /etc secrets into an untrusted VM.
- My daily driver, Fedora Silverblue, is simply too bloated for a micro VM.
Building and Formatting
I needed a minimal footprint and official repository support, making Debian the obvious choice (though I may explore Fedora later for specific packages).
To build the image, there are several options, but they all have too many dependencies:
- bootc (which I’ve tried previously, but only for servers)
- mkosi (which I’ve also tried before)
- virt-builder
While debootstrap is classic for Debian, it struggled
without root privileges. I successfully pivoted to
mmdebstrap, which gracefully handled most issues. I did
have to manually fix /etc/resolv.conf to prevent Podman’s
virtual DNS from leaking into the VM.
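A rootless build along these lines can be expressed as a single mmdebstrap command. The sketch below only assembles the argument list. `--mode=unshare`, `--variant`, `--include` and `--customize-hook` are real mmdebstrap options; the package list, the placeholder nameserver and this particular resolv.conf fix are my assumptions, not necessarily how this setup does it.

```python
import shlex


def mmdebstrap_argv(suite: str, target: str, packages: list[str],
                    nameserver: str) -> list[str]:
    """Build a rootless mmdebstrap command line.

    The customize hook runs after package installation with the chroot
    directory as "$1"; it replaces /etc/resolv.conf with a static entry so
    the build container's DNS setup does not end up in the image.
    """
    entry = shlex.quote(f"nameserver {nameserver}")
    # rm first so we never write through a symlink pointing outside the chroot
    hook = f'rm -f "$1/etc/resolv.conf" && echo {entry} > "$1/etc/resolv.conf"'
    argv = [
        "mmdebstrap",
        "--mode=unshare",     # user namespaces, no real root needed
        "--variant=minbase",  # assumption: smallest base, add what you need
    ]
    if packages:
        argv.append("--include=" + ",".join(packages))
    argv += [f"--customize-hook={hook}", suite, target]
    return argv
```

For example, `subprocess.run(mmdebstrap_argv("bookworm", "rootfs.tar", ["systemd-sysv", "firefox-esr", "waypipe"], "192.0.2.1"), check=True)` would produce a tarball. 192.0.2.1 is a documentation address standing in for the VM's real DNS server, and systemd-sysv is listed because minbase ships without an init system.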
For the format, I needed to minimize the host’s memory footprint,
including the page cache. I also plan to let multiple micro VMs use
shared memory for the disk image. This means I will need to put raw disk
images into huge pages, so I naturally passed on LVM thin snapshots.
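Getting a raw image into huge pages means placing it on a hugetlbfs mount such as /dev/hugepages. hugetlbfs supports `ftruncate` and `mmap` but not `write(2)`, so a plain `cp` fails. Here is a minimal sketch of copying through a shared mapping instead, assuming 2 MiB huge pages and that enough of them are reserved (for example via `vm.nr_hugepages`). It is my own illustration, not the script mentioned later in this post.

```python
import mmap
import os

HUGE_PAGE = 2 * 1024 * 1024   # assumption: default 2 MiB huge pages
CHUNK = 64 * 1024 * 1024      # copy in 64 MiB slices


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def copy_via_mmap(src_path: str, dst_path: str, align: int = HUGE_PAGE) -> int:
    """Copy src into dst through a shared mapping instead of write(2).

    The destination is padded with zeros up to a multiple of `align`.
    Returns the destination size in bytes.
    """
    src_size = os.path.getsize(src_path)
    dst_size = round_up(max(src_size, 1), align)
    fd = os.open(dst_path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, dst_size)  # hugetlbfs requires a page-aligned size
        with mmap.mmap(fd, dst_size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as dst, \
                open(src_path, "rb") as src:
            offset = 0
            while chunk := src.read(CHUNK):
                dst[offset:offset + len(chunk)] = chunk
                offset += len(chunk)
            dst.flush()
    finally:
        os.close(fd)
    return dst_size
```

For a raw disk image, the zero padding simply appears as unused space at the end of the virtual disk.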
Because QEMU’s qcow2 format isn’t natively supported by
mmdebstrap, I am just storing the raw disk images offline
(roughly 1.1GB). In the future, I might compress them manually.
Maintenance
Keeping the kernel and disk image updated is tricky but critical. My current plan is to schedule offline weekly or biweekly builds from scratch. While in-place upgrades might be possible, building fresh is significantly safer and cleaner.
Running the Micro VM
Running the VM wasn’t too difficult, but it came with its own set of quirks:
- Standard serial ports show early boot messages, while hvc0 does not.
- By default, QEMU uses SeaBIOS (with -M microvm). Despite documentation claiming it supports direct kernel boot, the kernel couldn’t find the disk. Enabling qboot ultimately solved the issue.
- To minimize host memory usage, I wanted to copy the disk image to /dev/hugepages. This was surprisingly stubborn: I could create, read, and mmap files there, but not write to them. I ended up writing a custom Python script to “copy” the image file using mmap.
- I bridged pre-allocated TAP devices for each VM to handle firewall rules easily, avoiding the setuid qemu-bridge-helper. Since my minimal image lacked basic networking tools like ip, ping, ifconfig, and systemd-networkd, debugging was a fun adventure using /proc/net/dev and /proc/net/fib_trie. I also learned about the ip=:::::eth0:dhcp kernel parameter, which makes the kernel itself configure eth0 via DHCP at boot.
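Pulling these quirks together, a single micro VM launch might look like the argument list below. It is a sketch. The device names (`virtio-blk-device`, `virtio-net-device`, `vhost-vsock-device`) are QEMU's virtio-mmio variants, but the qboot.rom path, memory size, CID and TAP name are assumptions to adapt.

```python
# Assumption: where your distro installs QEMU's qboot firmware image.
QBOOT_ROM = "/usr/share/qemu/qboot.rom"


def microvm_argv(kernel: str, image: str, tap: str, cid: int,
                 memory: str = "1G", cpus: int = 2) -> list[str]:
    """Assemble a qemu-system-x86_64 command line for one micro VM."""
    if cid < 3:
        raise ValueError("guest CIDs 0-2 are reserved; use 3 or higher")
    cmdline = " ".join([
        "console=ttyS0",         # legacy serial: shows early boot messages
        "root=/dev/vda", "rw",   # first virtio-blk disk
        "ip=:::::eth0:dhcp",     # in-kernel DHCP on eth0 (needs CONFIG_IP_PNP_DHCP)
    ])
    return [
        "qemu-system-x86_64",
        "-M", "microvm",
        "-bios", QBOOT_ROM,      # qboot instead of the default SeaBIOS
        "-enable-kvm", "-cpu", "host",
        "-m", memory, "-smp", str(cpus),
        "-nodefaults", "-no-user-config", "-nographic",
        "-serial", "stdio",
        "-kernel", kernel,
        "-append", cmdline,
        # Raw image mounted rw, so assumed to be this VM's own copy
        # (e.g. one placed in /dev/hugepages)
        "-drive", f"id=root,file={image},format=raw,if=none",
        "-device", "virtio-blk-device,drive=root",
        # Pre-created TAP device; script=no keeps qemu-bridge-helper out of it
        "-netdev", f"tap,id=net0,ifname={tap},script=no,downscript=no",
        "-device", "virtio-net-device,netdev=net0",
        # vsock for the waypipe and audio proxies
        "-device", f"vhost-vsock-device,guest-cid={cid}",
    ]
```

Pass the list to `subprocess.run`, or print it with `shlex.join()` to inspect it. Each VM needs a unique CID of at least 3, because 0 to 2 are reserved (2 is the host).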
Graphics
Without graphics, this setup is just a science experiment. This was the most fascinating and difficult part of the build.
Options
Remote desktop approaches like VNC, RDP, and SPICE are good defaults. However, I avoided them because I want:
- A minimal guest system
- A seamless experience
- A (more) secure setup
I surveyed options capable of directly forwarding Wayland:
- waypipe: forwards the Wayland protocol over a network.
- Sommelier: part of Chromium OS; delegates compositing in the VM to the host using shared memory (Read more). This relies on the virtio_wl kernel module, and it is designed to work with crosvm. Spectrum OS also utilizes it (Read more).
- wayland-proxy-virtwl: similar to Sommelier, also meant for crosvm. It can talk to a local Wayland compositor directly or via virtio-gpu (Read more).
- qubes-wayland: designed for Wayland in Qubes OS, using Xen’s vchan.
Ultimately, I decided to use waypipe.
Running Waypipe
Since my Viewer VM and the micro VM are isolated on the network, I decided to connect them using vsock. The waypipe manpage mentioned:
When running both client and server in virtual machines it is possible to enable the VMADDR_FLAG_TO_HOST flag for sibling communication by prefixing the CID with an s
I thought, “Ha! That’s not very difficult,” and was excited to try it
out. However, before long, I realized it wouldn’t work without a vsock
bridge like vhost-device-vsock,
which is also mentioned in the manpage a few lines below. I don’t like
this bridge, so I just used socat to forward the vsock port
from the host to the Viewer VM. It worked perfectly.
Isolation and Authentication
The tricky part was routing multiple untrusted micro VMs securely. If I have two micro VMs running, how do I ensure VM A cannot intercept VM B’s waypipe connection?
One option is to use waypipe ssh, where the Viewer VM
initiates the entire process. It should be very secure, but I’d
rather avoid the encryption overhead.
Instead, I designed a custom authentication/dispatcher setup:
- On the Host: A stateless proxy forwards a vsock port to the Viewer VM, prepending the peer’s CID before the actual data.
- In the Viewer VM: A receiving bridge reads the CID, decides which UNIX socket the micro VM is authorized to use, and routes the connection accordingly.
- In the Micro VM: waypipe connects out to the host’s vsock port.
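The host half of this scheme can be sketched with asyncio. The key property is that the peer address of an accepted AF_VSOCK connection is `(cid, port)`, and the CID is assigned by the host via `guest-cid`, so it can serve as an identity. The ports, the Viewer VM's CID and the 4-byte big-endian header are hypothetical choices for illustration; the post doesn't specify its wire format.

```python
import asyncio
import socket
import struct

# Hypothetical values: port the micro VMs dial, and where the Viewer VM listens.
LISTEN_PORT = 5000
VIEWER_CID = 3
VIEWER_PORT = 5001

# Hypothetical framing: the peer CID as a 4-byte big-endian integer.
HEADER = struct.Struct(">I")


def make_header(cid: int) -> bytes:
    """Encode the connecting micro VM's CID as the stream prefix."""
    return HEADER.pack(cid)


async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy one direction until EOF, then half-close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        writer.close()


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # For AF_VSOCK the peer address is (cid, port); the CID comes from the host.
    peer_cid, _ = writer.get_extra_info("peername")
    up = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    up.setblocking(False)
    try:
        await asyncio.get_running_loop().sock_connect(up, (VIEWER_CID, VIEWER_PORT))
    except OSError:
        up.close()
        writer.close()
        return
    up_reader, up_writer = await asyncio.open_connection(sock=up)
    up_writer.write(make_header(peer_cid))  # stateless: prefix, then relay
    await asyncio.gather(pump(reader, up_writer), pump(up_reader, writer))
    up_writer.close()
    writer.close()


async def main() -> None:
    """Run with asyncio.run(main()) on the host."""
    listener = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    listener.bind((socket.VMADDR_CID_ANY, LISTEN_PORT))
    server = await asyncio.start_server(handle, sock=listener)
    async with server:
        await server.serve_forever()
```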
I wrote this bridge in ~100 lines of Python using
os.splice() and asyncio. Handling the
scheduling, blocking IO, EOF, and error propagation was incredibly
tricky, but the resulting efficiency was worth it.
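A compact sketch of such a Viewer-side dispatcher follows, assuming the host prefixes each stream with the micro VM's CID as a 4-byte big-endian integer. Unlike the version described above, it copies data through userspace with asyncio streams instead of `os.splice()`, which is simpler but costs extra copies. `ALLOWED` and its socket paths are hypothetical; each path would be the listening socket of a separate waypipe client instance.

```python
import asyncio
import socket
import struct
from typing import Dict, Optional

# Hypothetical framing: 4-byte big-endian peer CID before the real stream.
HEADER = struct.Struct(">I")

# Hypothetical policy: micro VM CID -> UNIX socket of a dedicated
# waypipe client instance in the Viewer VM.
ALLOWED: Dict[int, str] = {
    10: "/run/user/1000/waypipe-vm10.sock",
    11: "/run/user/1000/waypipe-vm11.sock",
}


def parse_header(buf: bytes) -> int:
    """Decode the CID prefix written by the host-side proxy."""
    (cid,) = HEADER.unpack(buf)
    return cid


def route(cid: int, allowed: Dict[int, str] = ALLOWED) -> Optional[str]:
    """Return the UNIX socket this CID may use, or None to reject it."""
    return allowed.get(cid)


async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy one direction until EOF, then half-close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        writer.close()


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        cid = parse_header(await reader.readexactly(HEADER.size))
    except asyncio.IncompleteReadError:
        writer.close()  # stream ended before a full header arrived
        return
    path = route(cid)
    if path is None:
        writer.close()  # unknown CID: drop the connection
        return
    try:
        up_reader, up_writer = await asyncio.open_unix_connection(path)
    except OSError:
        writer.close()
        return
    await asyncio.gather(pump(reader, up_writer), pump(up_reader, writer))
    up_writer.close()
    writer.close()


async def main(port: int) -> None:
    """Listen on a vsock port; run with asyncio.run(main(PORT))."""
    listener = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    listener.bind((socket.VMADDR_CID_ANY, port))
    server = await asyncio.start_server(handle, sock=listener)
    async with server:
        await server.serve_forever()
```

Keeping the policy in a plain dict makes the authorization decision easy to audit: a CID missing from the map never reaches any waypipe socket.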
Theoretically, waypipe in the Viewer VM could directly listen on vsock ports, eliminating the need for the bridge. In practice, there are issues:
- I run waypipe in a rootless Podman container, which cannot listen on a vsock without the --privileged flag. It turns out vsock is blocked by seccomp by default, and I’d have to write a custom profile to allow it.
- The authentication logic would need to move to the host. While the host can directly see the peer CID, coordinating which vsock ports are open to which micro VMs between the Viewer VM and the host is complicated.
Audio
Waypipe handles graphics, keyboard, and mouse data, but leaves audio behind. QEMU features a PipeWire backend, but utilizing it would have required installing bulky GUI packages in Debian, as Debian does not offer a finer-grained package.
Instead, I built a multi-hop local proxy that forwards a vsock port
directly to the local pipewire-pulse UNIX socket. This
completely bypasses the network and PipeWire’s built-in authentication,
which is perfectly acceptable for my isolated use case.
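The last hop of such a proxy is simple: accept a vsock connection and relay it to pipewire-pulse's socket, which by default lives at `$XDG_RUNTIME_DIR/pulse/native`, just like PulseAudio's. Below is a minimal asyncio sketch. The port number is a hypothetical placeholder, and there is deliberately no authentication, matching the trade-off described above.

```python
import asyncio
import os
import socket
from typing import Mapping

AUDIO_PORT = 5002  # hypothetical vsock port that the host forwards audio to


def pulse_socket_path(env: Mapping[str, str] = os.environ) -> str:
    """pipewire-pulse's default socket: $XDG_RUNTIME_DIR/pulse/native."""
    return os.path.join(env["XDG_RUNTIME_DIR"], "pulse", "native")


async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy one direction until EOF, then half-close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        writer.close()


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        up_reader, up_writer = await asyncio.open_unix_connection(pulse_socket_path())
    except (OSError, KeyError):
        writer.close()  # no pipewire-pulse socket (or XDG_RUNTIME_DIR unset)
        return
    await asyncio.gather(pump(reader, up_writer), pump(up_reader, writer))
    up_writer.close()
    writer.close()


async def main() -> None:
    """Run with asyncio.run(main()) in the Viewer VM."""
    listener = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    listener.bind((socket.VMADDR_CID_ANY, AUDIO_PORT))
    server = await asyncio.start_server(handle, sock=listener)
    async with server:
        await server.serve_forever()
```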
Side note: I also tried to simply add Listen=vsock... to
pipewire.socket, but it didn’t work. I guess I’d have to
add something to the PipeWire config as well, which may not even be
supported.
Running Firefox
With the infrastructure in place, it was time to run Firefox. I wrapped it in a systemd service within the micro VM so that the micro VM shuts down automatically when Firefox exits.
It worked like a charm.
Video playback is surprisingly smooth and completely lag-free. The CPU usage is slightly higher than native execution, but acceptable. I might look for better software decoders in the future.
Conclusion
Building this headless micro VM setup was a fantastic learning adventure.
There are still plenty of missing pieces to tackle in the future. To name a few:
- Exploring options that are more secure than waypipe.
- Optimizing memory usage across multiple micro VMs.
- Enforcing specific window border colors for visual security cues.
- Seamlessly opening host files or URLs inside the micro VMs.
- Implementing persistent states (similar to Qubes OS AppVMs).