
Exploring Immutable Distros and Declarative Management

My current server setup, based on Debian Stable and Docker, has served me reliably for years. It's stable, familiar, and gets the job done. However, an intriguing article I revisited recently about Fedora CoreOS, rpm-ostree, and OSTree native containers sparked my curiosity and sent me down a rabbit hole exploring alternative approaches to system management. Could there be a better way?

Core Goals & Requirements

Before diving into new technologies, I wanted to define what "better" means for my use case:

  • The base operating system must update automatically and reliably.
  • Hosted services (applications) should be updatable either automatically or manually, depending on the service.
  • Configuration and data files need to be easy to modify, and crucially, automatically tracked and backed up.

Current Setup: Debian Stable + Docker


My current infrastructure consists of several servers, all running Debian Stable.

  • System updates are handled automatically via unattended-upgrades.
  • Services consist of a mix of native Debian packages and applications running in rootless Docker containers. Docker images are updated either periodically via scripts or manually when needed.
  • Config and data files are backed up automatically using rsync, but tracking which files need backing up is manual. This is my main pain point. Every time I add or modify a service or configuration file, I have to remember to update my backup scripts.

While this setup works, it's not perfect:
  • The manual tracking of configuration files is error-prone and tedious. I like how systems like NixOS manage the entire system declaratively from configuration files, automatically tracking changes. However, I have reservations about NixOS's learning curve and ecosystem, and I generally prefer sticking to more "traditional" distributions like Debian if possible.
  • Some services are Internet-facing. While rootless Docker improves security over rootful Docker, it's not a full security boundary. Moving these services into dedicated VMs could offer better isolation.
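To make the first pain point concrete, here is a minimal sketch of what "declarative" tracking could look like even on my current setup: one manifest lists each service with its config and data paths, and the backup list is derived from it instead of being maintained by hand. The service names, paths, and destination are hypothetical, and this is not what my backup scripts do today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One hosted service and the paths that must be backed up for it."""
    name: str
    config_paths: tuple = ()
    data_paths: tuple = ()


# Hypothetical manifest: names and paths are made up for illustration.
MANIFEST = [
    Service("nginx", config_paths=("/etc/nginx",)),
    Service(
        "nextcloud",
        config_paths=("/srv/nextcloud/config",),
        data_paths=("/srv/nextcloud/data",),
    ),
]


def backup_paths(manifest):
    """Return the sorted, de-duplicated list of paths to back up."""
    paths = set()
    for svc in manifest:
        paths.update(svc.config_paths)
        paths.update(svc.data_paths)
    return sorted(paths)


def rsync_command(manifest, dest):
    """Build an rsync argument list; nothing is executed here."""
    # -a preserves permissions and timestamps.
    # -R (--relative) keeps the full source path under the destination.
    return ["rsync", "-aR", *backup_paths(manifest), dest]


if __name__ == "__main__":
    # The destination is a placeholder, not a real host.
    print(" ".join(rsync_command(MANIFEST, "backup-host:/backups/server1/")))
```

Adding a service then means adding one entry to the manifest, and the backup command follows automatically. It still does not catch files I edit outside the manifest, which is the gap the declarative systems below try to close.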

Fedora CoreOS + OSTree Native Container

Fedora CoreOS is an automatically updating, minimal, monolithic, container-focused operating system. Key concepts here include:

  • The core OS is largely read-only, making updates safer and more predictable (atomic rollbacks).
  • Configuration is applied on the first boot using Butane/Ignition. This is great for initial setup but less ideal for ongoing configuration changes.
  • rpm-ostree allows layering additional RPM packages onto the immutable base image. While OSTree tracks file changes, rpm-ostree itself doesn't inherently provide a declarative way to manage the list of installed packages or track arbitrary configuration file modifications in a user-friendly, declarative manifest like NixOS does. It knows files changed, but not necessarily why in a structured way (e.g., "/usr/bin/vim was added" vs "package vim was installed").
  • OSTree Native Containers: The article that inspired me highlighted using OSTree's ability to pull container images as system updates. The workflow looks like this:
    • Define a system image using a Containerfile/Dockerfile, starting from a CoreOS base.
    • Build this definition into an OCI container image.
    • Install a standard CoreOS, using a minimal Butane config that tells rpm-ostree to "rebase" onto your custom container image.
    • To update the system or add packages, modify the Containerfile, rebuild the image, push it to a registry, and the CoreOS systems will eventually pull and apply it as their next OS update.

This approach essentially creates a custom, version-controlled OS image. Variations include building the image in a CI/CD pipeline or a registry's build service (such as quay.io), building locally, or even using coreos-assembler with a lower-level "treefile" manifest.
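The workflow above can be sketched as a small script that renders the Containerfile and lists the commands involved. This is an illustration under assumptions, not a tested deployment: the image reference quay.io/example/my-fcos:latest and the package names are hypothetical, and the `ostree-unverified-registry:` prefix skips signature verification, which I would not want on a real Internet-facing host. The commands are only built as lists, never executed.

```python
def render_containerfile(base_image, packages):
    """Render a Containerfile that layers RPM packages onto a CoreOS base."""
    lines = [f"FROM {base_image}"]
    if packages:
        pkgs = " ".join(sorted(packages))
        # Installing packages and committing in one RUN step is the
        # commonly documented pattern for OSTree native containers.
        lines.append(f"RUN rpm-ostree install {pkgs} && ostree container commit")
    return "\n".join(lines) + "\n"


def workflow_commands(image_ref):
    """Return the build, push, and rebase commands as argument lists."""
    return [
        # On the build machine: build the OCI image from the Containerfile.
        ["podman", "build", "-t", image_ref, "-f", "Containerfile", "."],
        # On the build machine: publish it to a registry.
        ["podman", "push", image_ref],
        # On the CoreOS host: switch the OS to the custom image.
        # "unverified" means no signature check is performed.
        ["rpm-ostree", "rebase", f"ostree-unverified-registry:{image_ref}"],
    ]


if __name__ == "__main__":
    # Hypothetical package list and image name, for illustration only.
    print(render_containerfile("quay.io/fedora/fedora-coreos:stable", ["vim", "htop"]))
    for cmd in workflow_commands("quay.io/example/my-fcos:latest"):
        print(" ".join(cmd))
```

In the setup described above, the rebase step would be triggered from the minimal Butane config on first boot rather than typed by hand; afterwards the host follows new versions of the same image tag.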

Bootable Containers

Emerging from the CoreOS world is Fedora Bootc. This seems specifically designed for use cases requiring more customization than standard CoreOS allows. Instead of Ignition and rpm-ostree, bootc directly manages bootable container images derived from a Containerfile. You essentially build your entire OS, including customizations, as a container image, and the system boots directly into it and updates by pulling new image versions. This feels conceptually cleaner and more aligned with the goal of a declaratively built system image.

Cloud-init

cloud-init is the de facto standard for bootstrapping cloud instances across various Linux distributions. Like Butane/Ignition, it runs on first boot, but critically, it can be re-run on subsequent boots. This makes testing configuration changes much easier, as you don't necessarily need to re-image the entire machine for every small tweak. It's widely supported but focuses more on initial setup and less on managing the entire OS lifecycle declaratively like the OSTree/Bootc approaches.

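As a rough sketch of what a cloud-init configuration looks like, the script below renders a user-data document with two commonly used keys, `packages` and `write_files`. The package and file contents are hypothetical. It emits the body as JSON, on the assumption that a simple JSON document like this one is also parseable as YAML, which is what cloud-init reads after the `#cloud-config` header.

```python
import json


def render_user_data(packages, files):
    """Render a #cloud-config document from a package list and a path->content map."""
    config = {
        # Packages to install on first boot.
        "packages": sorted(packages),
        # Files to create, each with a path, content, and mode.
        "write_files": [
            {"path": path, "content": content, "permissions": "0644"}
            for path, content in sorted(files.items())
        ],
    }
    # The first line must be the "#cloud-config" marker.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


# Commands to make cloud-init run again on the next boot (not executed here).
RERUN_COMMANDS = [
    ["cloud-init", "clean", "--logs"],
    ["reboot"],
]


if __name__ == "__main__":
    # Hypothetical example content.
    print(render_user_data(["rsync"], {"/etc/motd": "managed by cloud-init\n"}))
```

After editing the user-data, clearing cloud-init's state with the commands in `RERUN_COMMANDS` and rebooting applies it again on the same machine, which is the re-run advantage over Butane/Ignition mentioned above.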

Guix

Guix System is the GNU project's take on a declarative operating system, similar in principle to NixOS. It uses Guile/Scheme for its configuration language and boasts a clean command-line interface. While appealing from a purity perspective, its smaller community and adoption compared to NixOS make it a less pragmatic choice for me currently. Like NixOS, it can build VM images and manage VMs.

Conclusion and Thoughts

  • Conceptually, Fedora Bootc is the most compelling alternative I found. It directly addresses the desire to build a customized, yet manageable and updatable, system image using familiar container tooling. The main drawback? It's very new, lacks widespread adoption, and crucially, there's no "Debian Bootc" yet. If a stable, Debian-based bootc implementation existed, I'd likely jump on it.
  • Cloud-init is mature, widely adopted, and works across many distributions (including Debian). It could improve my bootstrapping process, but it doesn't solve the core desire for managing the entire system state declaratively post-install.
  • CoreOS/OSTree: While powerful, the standard CoreOS workflow with rpm-ostree layering feels slightly less integrated than bootc for building a heavily customized system declaratively. The native container approach is interesting but adds complexity.
  • NixOS/Guix/Ansible: These are powerful, but represent a significant shift in tooling and philosophy. They feel like a larger commitment than I'm ready for, especially given my preference for sticking closer to traditional distribution paradigms if possible.

For the time being, I'll likely stick with my current Debian + Docker setup. That said, I may spin up a VM and try a few of these options to understand them better.
