OTA Connect Developer Guide

OSTree and TreeHub

OSTree

OSTree is an open-source tool that combines a "git-like" model for committing and downloading bootable filesystem trees with a layer for deploying them and managing the bootloader configuration. It is actively developed and supported by Red Hat, and is used in Flatpak and Project Atomic.

For more on why OSTree is the best tool for the job of doing embedded device updates, you can also jump straight to Comparing full-filesystem update strategies.

Doing It Wrong™: Bad choices for embedded updates
  • Package managers

    Hey, every Linux distro does it this way—​why don’t I just use a dpkg/rpm/apk-based package manager for my embedded system? I can control it remotely, and as long as I maintain the same sequence of package installs from the same sources, I should have perfectly consistent filesystems, right?

    The problem with this approach is that updates aren’t guaranteed to be atomic, so it’s quite easy to get the system into a state that requires user intervention to fix, especially if it’s rebooted during an update. That might be fine on the desktop or on a server where there’s a reasonable expectation that a user could intervene, but it doesn’t work for embedded devices.

  • "Update Mode" and similar designs

    The idea here is that you boot into a mode that allows the root filesystem to be overwritten, either via a pre-downloaded image or via something streamed over the network. Fortunately, we rarely see this design in the wild anymore, with the occasional exception of industrial control systems where the update process is closely supervised. When we do see it, it’s typically a legacy of days when microcontroller flashing would be done in person, with a new image streamed over a serial interface to overwrite the old system. This is a very poor choice for embedded, because of the risk that users may disconnect the device while the new image is being flashed, potentially bricking the device or requiring a more difficult intervention to fix. Lower-end home routers and DSL modems sometimes go this route; doing a quick Google search for "firmware update bricked router" should show why this is a bad idea.

TreeHub

Since OSTree is "git-like", you can probably imagine that you can have remote repositories. TreeHub is exactly that. It’s seamlessly integrated into the meta-updater layer and into the HERE OTA Connect site itself. Your builds get automatically pushed to TreeHub as soon as you make them, and you can use OTA Connect to wirelessly update your devices—​one at a time, or in targeted campaigns. You can even set certain devices to automatically pull updates from TreeHub as soon as they’re pushed, and stop wasting time re-flashing the units on your test bench every time you build new code.

Comparing full-filesystem update strategies

OSTree provides a number of very significant technological advantages over other full-filesystem updating schemes. For embedded systems that need a solution for safe, atomic, full-filesystem updates, the usual approach is to have some kind of dual-bank scheme. Here, we’re going to take a look at the difference between OSTree and dual-bank systems, and the advantages OSTree can provide.

Dual-bank

In a dual-bank system, the read-only root filesystem is kept on a different partition from the writable user space, so that when an update is needed the whole partition can be overwritten. For atomicity and safety, this read-only partition is duplicated: there are two complete copies of the filesystem, kept on different partitions, and the active partition can be selected at boot time.

When the system needs to be updated, the new filesystem image is written to the inactive partition, and the next time the system reboots, that partition becomes the active one.
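
The bank-switching logic described above can be sketched as a toy model. This is only an illustration: the `DualBankDevice` class, the bank names "A" and "B", and the version strings are hypothetical, and a real device would perform the switch in its bootloader rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class DualBankDevice:
    """Toy model of a device with two root-filesystem banks, "A" and "B"."""
    banks: dict = field(default_factory=lambda: {"A": "v1.0", "B": None})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def write_update(self, image: str) -> None:
        # Only the inactive bank is overwritten; the running system is untouched.
        self.banks[self.inactive()] = image

    def reboot_into_update(self) -> None:
        # Switch banks only if the inactive bank actually holds an image.
        target = self.inactive()
        if self.banks[target] is None:
            raise RuntimeError("inactive bank is empty")
        self.active = target

    def running_version(self) -> str:
        return self.banks[self.active]


device = DualBankDevice()
device.write_update("v1.1")
print(device.running_version())  # v1.0 (the update is staged, not yet active)
device.reboot_into_update()
print(device.running_version())  # v1.1
```

After the switch, bank "A" still holds v1.0, which is what makes a rollback possible.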

Figure 1: Dual-bank update process

The main advantage of this update model is its safety. Updates are always strictly atomic, and there is always a known good image that can be rolled back to. However, there are significant trade-offs in flexibility and materials costs: the size of the root partition must be chosen when the system is flashed for the very first time, and the duplication of the root partition doubles the space required. When choosing how big to make the root partition, a device manufacturer has to consider not just how big the filesystem image currently is, but must also estimate and plan for the size of all future updates. If the size chosen is too small, it may restrict the ability to add new features. Making it larger, of course, adds to the bill of materials for the product, and since the partition is duplicated, every extra megabyte of future capacity actually costs two megabytes to accommodate.
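
The "every extra megabyte costs two" arithmetic can be made concrete with a short calculation. The image size and headroom below are made-up numbers chosen only for illustration, and the helper name is hypothetical.

```python
def dual_bank_root_storage_mb(current_image_mb: int, headroom_mb: int) -> int:
    """Flash needed for the two root banks, given the image size and the
    spare room reserved in each bank for future growth."""
    bank_size_mb = current_image_mb + headroom_mb
    return 2 * bank_size_mb  # the root partition exists twice


# Hypothetical numbers: a 400 MB image, with and without 100 MB of headroom.
base = dual_bank_root_storage_mb(400, 0)
with_headroom = dual_bank_root_storage_mb(400, 100)
print(with_headroom - base)  # 200
```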

OSTree

OSTree checksums individual files and stores them as content-addressed objects, much like git. The read-only filesystem is built by "checking out" a particular revision, and hardlinking the content-addressed objects into the actual Linux directory structure. Multiple filesystem versions can be stored, and any content that is duplicated across versions is only stored once. A complete history of all versions is stored in TreeHub, but it is not required to store that complete revision history on the device. Only one partition is needed—​writable user space can be on the same partition as the OSTree content store.
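
To make the content-addressing and hardlinking idea concrete, here is a minimal sketch. It is not OSTree's actual on-disk format or API: the function names, the flat `objects` directory, and the path-to-checksum manifest are simplifications invented for this example, and real OSTree also tracks file metadata and directory objects, which are omitted here.

```python
import hashlib
import os
import tempfile


def store_object(objects_dir: str, content: bytes) -> str:
    """Store content under its SHA-256 checksum; identical content is stored once."""
    checksum = hashlib.sha256(content).hexdigest()
    path = os.path.join(objects_dir, checksum)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return checksum


def commit(objects_dir: str, tree: dict) -> dict:
    """Store every file of a tree and return a {relative path: checksum} manifest."""
    return {rel: store_object(objects_dir, data) for rel, data in tree.items()}


def checkout(objects_dir: str, manifest: dict, dest: str) -> None:
    """Build a directory tree by hardlinking stored objects into place."""
    for rel, checksum in manifest.items():
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(os.path.join(objects_dir, checksum), target)


root = tempfile.mkdtemp()
objects = os.path.join(root, "objects")
os.makedirs(objects)

# Two hypothetical versions that differ in a single file.
v1 = commit(objects, {"bin/app": b"app 1.0", "etc/motd": b"hello"})
v2 = commit(objects, {"bin/app": b"app 1.1", "etc/motd": b"hello"})

checkout(objects, v1, os.path.join(root, "deploy-v1"))
checkout(objects, v2, os.path.join(root, "deploy-v2"))

print(len(os.listdir(objects)))  # 3 (etc/motd is shared by both versions)
```

Both checkouts contain a full directory tree, but the unchanged file occupies disk space only once because each checkout merely links to the same stored object.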

When the system needs to be updated, HERE OTA Connect sends a small metadata file with a particular commit identifier. The client pulls that commit from TreeHub, only downloading the new files, and only downloading binary diffs of changed files. Once the pull is complete and verified, the system is instructed to boot into the new version the next time it starts up.
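
The "download only what is missing, then verify" step can be sketched as follows. This is a simplified model under stated assumptions: plain dictionaries stand in for TreeHub and for the device's object store, the `pull` function is hypothetical, and binary diffs of changed files are not modeled (a changed file is fetched whole).

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pull(commit_manifest: dict, remote_objects: dict, local_objects: dict) -> list:
    """Fetch the objects named by a commit that are not yet stored locally.

    Each fetched object is checked against its checksum before it is stored.
    Returns the list of checksums that were downloaded.
    """
    downloaded = []
    for checksum in sorted(set(commit_manifest.values())):
        if checksum in local_objects:
            continue  # already on the device, nothing to download
        data = remote_objects[checksum]
        if sha256(data) != checksum:
            raise ValueError(f"corrupt object {checksum}")
        local_objects[checksum] = data
        downloaded.append(checksum)
    return downloaded


# Hypothetical state: the device has v1.0, and the server also has v1.1.
remote = {sha256(b): b for b in (b"app 1.0", b"app 1.1", b"hello")}
local = {sha256(b): b for b in (b"app 1.0", b"hello")}
new_commit = {"bin/app": sha256(b"app 1.1"), "etc/motd": sha256(b"hello")}

fetched = pull(new_commit, remote, local)
print(len(fetched))  # 1 (only the changed file is downloaded)
```

Because objects are named by their checksum, verifying a download amounts to re-hashing it, and an interrupted pull can be resumed by simply running it again.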

Figure 2: OSTree update process

With OSTree, you no longer need to guess how much room you might need in the future to expand your system; the OSTree content store expands and contracts as needed. You also save a significant amount of space, since content shared between versions is stored only once. OSTree also allows you to garbage-collect old images: if you upgrade v1.0 → v1.1 → v1.2, for example, by default the OTA Connect client will garbage-collect all local objects unique to v1.0. If you later decided that you did want to go back to v1.0, you still could: if you pushed v1.0 from OTA Connect, the client would download only the diff from TreeHub, repopulate the local object store, and then reboot into that version. Of course, it’s also possible to configure OSTree to keep more than two revisions on the local disk; this can be particularly useful in QA workflows, allowing for rapid testing of a feature or an external integration against multiple different firmware versions.
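
The garbage-collection rule (drop objects that only a discarded version refers to) can be sketched like this. The manifests and the short placeholder strings used instead of real checksums are hypothetical, and the `garbage_collect` function is an illustration, not the client's actual implementation.

```python
def garbage_collect(local_objects: dict, kept_manifests: list) -> set:
    """Delete every stored object that no kept version references.

    Returns the set of checksums that were removed.
    """
    referenced = set()
    for manifest in kept_manifests:
        referenced.update(manifest.values())
    removed = set(local_objects) - referenced
    for checksum in removed:
        del local_objects[checksum]
    return removed


# Placeholder "checksums" keep the example readable.
v1_0 = {"bin/app": "app-1.0", "etc/motd": "motd", "lib/old.so": "old-lib"}
v1_1 = {"bin/app": "app-1.1", "etc/motd": "motd"}
v1_2 = {"bin/app": "app-1.2", "etc/motd": "motd"}

local = {c: b"" for m in (v1_0, v1_1, v1_2) for c in m.values()}

# Keep the two most recent versions; objects unique to v1.0 are dropped.
removed = garbage_collect(local, [v1_1, v1_2])
print(sorted(removed))  # ['app-1.0', 'old-lib']
```

The shared `motd` object survives because v1.1 and v1.2 still reference it, so going back to v1.0 later would require downloading only the two removed objects again.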

Best of all, you get all of these benefits without having to give up the safety of a dual-bank setup. Updates are still strictly atomic; if power is lost during the download of an update, the client will still boot into the old system when it starts up next, and will simply resume the download it had begun. You still always have a known good image on the system to roll back to; in fact, as stated above, you can keep as many revisions as your storage allows, which is impossible in a dual-bank system.