[Update] I managed to work out the AppArmor profile, and decided to go with guestmount for now.
I am setting up a maintenance pipeline for my virtual machines.
The pipeline has two main routines:
- The BACKUP routine: Every day, this routine shuts down each VM, backs up the data in /var, updates the VM's disk image if a new version is available, and then restarts it.
- The BUILD routine: Every week, this routine uses a special builder VM to create new disk images for all VMs.
There is a scheduling conflict with the builder VM: the BACKUP routine needs to shut it down, while the BUILD routine needs it running. To resolve this, I merged both routines into a single set of systemd services that runs daily. The BUILD routine starts automatically when the builder VM starts, at the end of the BACKUP routine. The builder VM's systemd unit has an ExecCondition= check, so the unit is skipped six days a week.
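The condition check itself can be very small. The sketch below assumes a Python helper and Sunday as the build day; both are illustrative choices, not details of this setup. systemd proceeds with the unit only if the ExecCondition= command exits 0, and an exit status from 1 to 254 skips the unit without marking it as failed.

```python
import datetime

# Assumption: the weekly build runs on Sunday (date.weekday(): Monday=0 ... Sunday=6).
BUILD_WEEKDAY = 6


def should_build(day: datetime.date) -> bool:
    """Return True if `day` is the configured build day."""
    return day.weekday() == BUILD_WEEKDAY


def main() -> int:
    # Exit 0: systemd proceeds with the unit.
    # Exit 1: systemd skips the unit without treating it as a failure.
    return 0 if should_build(datetime.date.today()) else 1
```

As a standalone script it would end with `sys.exit(main())` and be referenced from the unit as something like `ExecCondition=/usr/local/bin/is-build-day`, where the path is hypothetical.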
Surprisingly, the most difficult part of this pipeline was not the scheduling, but the backup process itself.
There are two general approaches to backing up a VM: backing up the entire disk image or backing up the files directly from the filesystem.
Backing Up Disk Images
Backing up a disk image is straightforward because disk images are just regular files. However, this method is often inefficient:
- Deduplication may not work well if the disk image is compressed.
- Incremental backups are not natively supported.
- The entire disk image must be backed up, including unused and deleted data.
- You cannot easily choose which specific files to include or exclude from the backup.
There are some ways to improve this approach:
- For deduplication: I can decompress the disk image and pipe the data stream directly to the backup software without saving the decompressed file.
- For incremental backups: I can create snapshots and back up only the differences. I would also need to regularly merge the snapshots.
- To reduce backup size: I can defragment, shrink, wipe, and sparsify the disk image (for example, with virt-sparsify) before backing it up.
- To exclude specific files: I can put the files or directories that I don't want to back up on a separate disk image.
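The snapshot idea for incremental backups could be sketched with qcow2 overlays. This is one possible scheme, not a tested workflow: the VM is assumed to boot from `overlay`, a qcow2 image whose backing file is `base`, so the overlay holds only blocks written since it was created. `backup-tool` is a hypothetical placeholder for the backup program, and all paths are made up.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder for whatever backup program is in use.
BACKUP_TOOL = "backup-tool"


def daily_plan(base: Path, overlay: Path, merge: bool) -> list[list[str]]:
    """Commands to run while the VM is shut down."""
    steps = [[BACKUP_TOOL, str(overlay)]]  # back up only the differences
    if merge:
        steps += [
            ["qemu-img", "commit", str(overlay)],  # fold the changes into base
            ["rm", "-f", str(overlay)],
            # Start a fresh, empty overlay on top of the merged base.
            ["qemu-img", "create", "-f", "qcow2",
             "-b", str(base), "-F", "qcow2", str(overlay)],
            [BACKUP_TOOL, str(base)],  # new full backup after the merge
        ]
    return steps


def run_plan(steps: list[list[str]]) -> None:
    """Execute the steps in order, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)
```

Because the overlay is only reset on merge days, each daily backup holds everything written since the last merge, which makes it differential rather than strictly incremental. Stacking a new overlay every day would make it incremental, at the cost of a longer chain to merge.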
Backing Up Filesystems
One can back up files from inside the VM. It is also possible to mount the disk image from the host system when the VM is shut down.
Raw disk images can be mounted directly using loop devices.
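For a raw image with a partition table, a read-only loop mount might look like the sketch below. It needs root, and the helper names, mountpoint and partition number are assumptions. `losetup --partscan` asks the kernel to create `/dev/loopNpM` nodes for the partitions. This path has the host kernel parse the guest's filesystem, which matters if the guest is untrusted.

```python
import subprocess


def attach_loop(image: str) -> str:
    """Attach `image` read-only to the first free loop device; return e.g. "/dev/loop0"."""
    result = subprocess.run(
        ["losetup", "--find", "--show", "--read-only", "--partscan", image],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def partition_device(loop_dev: str, number: int) -> str:
    """With --partscan, partitions appear as /dev/loopNpM."""
    return f"{loop_dev}p{number}"


def mount_cmd(device: str, mountpoint: str) -> list[str]:
    """Read-only mount of one guest partition (requires root)."""
    return ["mount", "-o", "ro", device, mountpoint]


def cleanup_cmds(loop_dev: str, mountpoint: str) -> list[list[str]]:
    """Unmount first, then release the loop device."""
    return [["umount", mountpoint], ["losetup", "--detach", loop_dev]]
```

A caller would attach the image, run `mount_cmd(partition_device(dev, 1), "/mnt/vm")` with `subprocess.run(..., check=True)`, copy the files, and then run the cleanup commands in order.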
Qcow2 images have several options:
- `qemu-nbd` can expose an image as a block device (e.g., /dev/nbd*), which can then be mounted. This requires the nbd kernel module and root access. The qemu-nbd man page warns that this may not be suitable for untrusted guests. To back up multiple VMs, I would also need a way to find an available /dev/nbd* device.
- The image can also be exported without root, using tools such as `qemu-nbd`, `nbdfuse`, and `qemu-storage-daemon`. This article is worth reading.
- `qemu-nbd` also supports exposing an internal snapshot of a qcow2 image.
- `guestmount` can mount a disk image without needing root. It uses QEMU and FUSE. While it works, it can be slow, and creating a secure AppArmor profile for it is difficult due to its complexity.
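Finding an available `/dev/nbd*` device for the `qemu-nbd` option could be handled with a small heuristic like the one below. It assumes the `nbd` module is loaded and relies on the kernel creating `/sys/block/nbdN/pid` only while a device is connected; the function name is hypothetical. The check is racy, since another process could claim the device between the check and the connect, so a caller should retry if the connect fails.

```python
from pathlib import Path
from typing import Optional


def find_free_nbd(sys_block: Path = Path("/sys/block")) -> Optional[str]:
    """Return the first /dev/nbdN that does not appear to be connected."""
    # Only top-level nbdN entries, sorted numerically (nbd2 before nbd10).
    candidates = sorted(
        (p for p in sys_block.glob("nbd*") if p.name[3:].isdigit()),
        key=lambda p: int(p.name[3:]),
    )
    for dev in candidates:
        # Heuristic: a missing pid file suggests no client is attached.
        if not (dev / "pid").exists():
            return f"/dev/{dev.name}"
    return None
```

The returned device could then be used with `qemu-nbd --read-only --connect=/dev/nbdN image.qcow2`, and released with `qemu-nbd --disconnect /dev/nbdN` after the backup.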
My Thoughts
All of these options have trade-offs between security, performance, complexity, and flexibility.
In my case, my priorities are:
- Security: I do not trust the guest VMs. This means the guest should not connect to the host, and the host should not load the guest's disk image using a kernel module. While I could move the backup logic to a separate VM, this would add a lot of complexity.
- Simplicity: I want a simple workflow that is easy to maintain and secure. I prefer to avoid writing complicated AppArmor profiles that are difficult to update.
- Performance: I don't need the backup to be very fast, but I don't want to keep a VM shut down for too long.
- Space Efficiency: I don't have a large amount of data, so disk space is not a major concern.
Considering these factors, I prefer backing up the entire disk image. Recall that my root filesystem is created from a Containerfile, and /etc is transient, so I primarily need to back up /var.
For now, I plan to use qcow2 images with compression. I will decompress the disk image and pipe the data to the backup software.
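The decompress-and-pipe step could be sketched as follows. It assumes `nbdcopy` (from libnbd) reads the qcow2 image through a private, read-only `qemu-nbd` and writes raw disk bytes to standard output, and that restic is the backup program reading from standard input. The backup software is not named here, so treat both commands as examples; the helper names are hypothetical.

```python
import subprocess
from typing import Sequence


def decompress_cmd(image: str) -> list[str]:
    # nbdcopy starts qemu-nbd itself and writes the raw guest disk to stdout ("-").
    return ["nbdcopy", "--", "[", "qemu-nbd", "--read-only",
            "--format=qcow2", image, "]", "-"]


def backup_cmd(vm_name: str) -> list[str]:
    # Assumption: restic, with the repository configured via environment variables.
    return ["restic", "backup", "--stdin", "--stdin-filename", f"{vm_name}.raw"]


def stream_pipeline(producer: Sequence[str], consumer: Sequence[str]) -> int:
    """Run `producer | consumer` without a temporary file.

    Returns the producer's exit status if it failed, otherwise the consumer's.
    """
    prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
    try:
        cons = subprocess.Popen(consumer, stdin=prod.stdout)
    except BaseException:
        prod.kill()
        prod.wait()
        raise
    finally:
        # Drop our copy of the pipe so the producer sees SIGPIPE if the consumer exits.
        prod.stdout.close()
    cons_rc = cons.wait()
    prod_rc = prod.wait()
    return prod_rc or cons_rc
```

A call would look like `stream_pipeline(decompress_cmd("/var/lib/vms/web.qcow2"), backup_cmd("web"))`, with a hypothetical path. The stream has the full virtual size of the disk, so unallocated regions are likely sent as zeros; a chunking, deduplicating backup tool should store those cheaply.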
In the future, I might explore some optimizations:
- Using a raw disk image on a ZFS host filesystem with compression and possibly deduplication enabled.
- Taking a snapshot (either qcow2 or ZFS) and backing it up. This would allow me to restart the VM without waiting for the backup to finish.
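For the snapshot option, the ordering is what lets the VM restart early. A sketch of the ZFS variant, assuming each VM's raw image sits in its own dataset, a systemd unit per VM, and restic as the backup program (all names hypothetical): the snapshot is taken while the VM is still shut down, the VM is started, and only then is the frozen image read from the dataset's `.zfs/snapshot` directory.

```python
def snapshot_plan(dataset: str, mountpoint: str, image: str,
                  vm_unit: str, stamp: str) -> list[list[str]]:
    """Commands in order; the VM is assumed to be shut down before the first one."""
    snap = f"backup-{stamp}"
    # Snapshots are readable under <mountpoint>/.zfs/snapshot/<name>/.
    snap_path = f"{mountpoint}/.zfs/snapshot/{snap}/{image}"
    return [
        ["zfs", "snapshot", f"{dataset}@{snap}"],  # near-instant, VM still off
        ["systemctl", "start", vm_unit],           # VM can run again right away
        ["restic", "backup", snap_path],           # read the frozen image at leisure
        ["zfs", "destroy", f"{dataset}@{snap}"],   # drop the snapshot afterwards
    ]
```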