Proposal: Live Upgrade
Replacing a running service with a new binary, without dropping outstanding capability references or losing in-flight work.
Problem
In a Linux-like system, “upgrading a service” is one of:
- Restart: stop the old process, start the new one. Clients holding file descriptors, sockets, or pipes to the old process receive ECONNRESET or EPIPE and must reconnect. Session state is lost unless clients serialize it themselves.
- Graceful restart (nginx -s reload, unicorn, systemd socket activation): new process starts alongside old, inherits the listening socket, old drains in-flight requests. Works only for request/response protocols where the session is the request. Does nothing for stateful sessions.
- Live patch (kpatch, ksplice): binary-level function replacement. Narrow, fragile, no schema for state layout changes.
None of these compose with a capability OS. A CapId held by a client
points at a specific process; if that process exits, the cap is dead.
There is no “the service” abstraction the kernel could re-bind — the
point of capabilities is that they identify a specific reference, not
a name that could be redirected after the fact.
But capOS has a kernel-side primitive the Linux model lacks: the kernel
already owns the authoritative table of every CapId and which process
serves it. Rewriting “cap X is served by process v1” → “cap X is served
by process v2” is a table update. The question is when it is safe, and
how v2 inherits enough state to answer the next call.
Three Cases
Live upgrade has three distinct cost profiles. The right design is to make each one explicit rather than pretend the hard case doesn’t exist.
Case 1: Stateless services
Each SQE is independent; the service holds no state that matters across calls. A request router, a pure codec, a logger that flushes to an external sink.
Upgrade is trivial: start v2, retarget every CapId from v1 to v2,
exit v1. Clients may observe a small latency spike; no DISCONNECTED
CQE fires. Only the kernel primitive is needed.
Case 2: State externalized into other caps
The service’s in-memory data is a cache or dispatch table; durable state
lives behind caps the service holds (Store, SessionMap, Namespace).
v1’s held caps are passed to v2 at spawn time (via the supervisor, per
the manifest), kernel retargets client caps, v1 exits.
Architecturally this is the idiomatic capOS pattern: services stay thin, state is factored into dedicated holders with their own caps. The Fetch/HttpEndpoint split in the service-architecture proposal already pushes in this direction. In that world, most services fall into this bucket by construction.
Case 3: Stateful services requiring migration
The service has in-memory state that matters: a JIT’s code cache, a codec’s ring buffer, a parser’s arena, session data not yet flushed. Upgrade requires v1 to hand its state to v2.
capOS’s contribution here is that the state wire format is already capnp — the same format the service uses for IPC. v1 serializes its state as a capnp message; v2 consumes it. There is no separate serialization layer to build and no opportunity for it to drift from the IPC format.
The contract extends the service’s capnp interface:
```capnp
interface Upgradable {
  # Called on v1 by the supervisor. Returns a snapshot of service
  # state and stops accepting new calls. Calls already in flight
  # complete before the snapshot returns.
  quiesce @0 () -> (state :Data);

  # Called on v2 after spawn. Loads state from the snapshot. After
  # this returns, v2 is ready to serve calls.
  resume @1 (state :Data) -> ();
}
```
The state schema is service-defined. Schema evolution follows capnp’s standard rules: adding fields is backward-compatible, renaming requires care, removing requires a major version bump.
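To make the contract concrete, here is a minimal Rust sketch of a service implementing quiesce/resume. This is illustrative, not the capos-rt helper: the trait shape, `SessionService`, and its fields are assumptions, and plain key/value pairs stand in for the capnp-encoded state the real contract specifies.

```rust
use std::collections::HashMap;

// Hypothetical Rust-side mirror of the Upgradable capnp interface.
// In real capOS the snapshot would be a capnp message; sorted
// key/value pairs stand in for it here.
trait Upgradable {
    /// Stop accepting new calls and return a snapshot of state.
    fn quiesce(&mut self) -> Vec<(String, String)>;
    /// Load state from a snapshot; the service is ready afterwards.
    fn resume(&mut self, state: Vec<(String, String)>);
}

struct SessionService {
    accepting: bool,
    sessions: HashMap<String, String>,
}

impl SessionService {
    fn new() -> Self {
        SessionService { accepting: true, sessions: HashMap::new() }
    }
}

impl Upgradable for SessionService {
    fn quiesce(&mut self) -> Vec<(String, String)> {
        self.accepting = false; // no new calls after this point
        let mut snap: Vec<_> = self.sessions.drain().collect();
        snap.sort(); // deterministic snapshot order
        snap
    }

    fn resume(&mut self, state: Vec<(String, String)>) {
        self.sessions = state.into_iter().collect();
        self.accepting = true;
    }
}
```

The v2 binary can add fields to its snapshot schema freely; under capnp evolution rules, a v1 snapshot decodes in v2 with the new fields at their defaults.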
Kernel Primitive: CapRetarget
The kernel exposes the retarget as a capability method, not a syscall:
```capnp
interface ProcessControl {
  # Atomically redirect every CapId currently served by `old` to
  # be served by `new`. Requires: `new` implements a schema
  # superset of `old` (schema-id compatibility), `new` is Ready,
  # `old` is Quiesced (graceful) or the caller has permission to
  # force.
  retargetCaps @0 (old :ProcessHandle, new :ProcessHandle,
                   mode :RetargetMode) -> ();
}

enum RetargetMode {
  graceful @0;  # old must be Quiesced; in-flight calls drain on old
  force @1;     # caps redirect immediately; in-flight calls fail
}
```
Only a process holding a ProcessControl cap to both processes can
perform this — typically the supervisor that spawned them. The kernel
never initiates upgrades.
Atomicity is per-CapId. From a client’s perspective, the retarget is a
single point in time: a CALL SQE submitted before retarget goes to v1;
a CALL SQE submitted after goes to v2. A CALL already dispatched to v1
either completes there (graceful) or returns a DISCONNECTED CQE
(force).
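The retarget semantics can be simulated with a few lines of Rust. This is a sketch of the kernel-side table rewrite under stated assumptions: `CapId`, `Pid`, and the `ProcState` names are illustrative, not the real capOS kernel types, and the single-lock atomicity is only noted in a comment.

```rust
use std::collections::HashMap;

type CapId = u64;
type Pid = u32;

#[derive(Clone, Copy, PartialEq)]
enum ProcState { Ready, Quiesced }

enum RetargetMode { Graceful, Force }

// Hypothetical mini-model of the kernel's authoritative cap table.
struct Kernel {
    serves: HashMap<CapId, Pid>, // "cap X is served by process P"
    states: HashMap<Pid, ProcState>,
}

impl Kernel {
    fn retarget_caps(&mut self, old: Pid, new: Pid, mode: RetargetMode)
        -> Result<(), &'static str>
    {
        // `new` must be Ready in both modes.
        if self.states.get(&new) != Some(&ProcState::Ready) {
            return Err("new process not Ready");
        }
        // Graceful mode additionally requires `old` to be Quiesced.
        if let RetargetMode::Graceful = mode {
            if self.states.get(&old) != Some(&ProcState::Quiesced) {
                return Err("old process not Quiesced");
            }
        }
        // The table update itself: every cap served by `old` now
        // points at `new`. In a real kernel this runs under one
        // lock, so each CapId flips at a single point in time.
        for pid in self.serves.values_mut() {
            if *pid == old { *pid = new; }
        }
        Ok(())
    }
}
```

Note that caps served by unrelated processes are untouched; only `old`'s entries are rewritten.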
Supervisor-Level Upgrade Protocol
The primitives above compose into a protocol the supervisor runs:
1. spawn v2 from the new binary in the manifest
2. Case 1 & 2: v2.resume(EMPTY_STATE)
   Case 3: state = v1.quiesce(); v2.resume(state)
3. kernel.retargetCaps(v1, v2, graceful)
4. wait for v1 to drain (graceful mode)
5. v1.exit()
If any step fails, the supervisor rolls back: kill v2, resume v1 (if quiesced), log the failure. Because the retarget hasn’t happened yet, clients never observe the aborted attempt.
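A compressed Rust sketch of the Case 3 path, including the rollback branch. The `Proc` struct and the `resume_ok` flag are stand-ins for real process handles and a fallible `v2.resume` call; steps 3–4 are noted in comments rather than modeled.

```rust
// Hypothetical process handle; fields are illustrative only.
struct Proc {
    alive: bool,
    quiesced: bool,
    state: Vec<u8>, // capnp-encoded snapshot in the real system
}

/// Supervisor-side upgrade for a stateful (Case 3) service.
/// `resume_ok` simulates whether v2.resume succeeds.
fn upgrade(v1: &mut Proc, mut v2: Proc, resume_ok: bool)
    -> Result<Proc, &'static str>
{
    // Step 2 (Case 3): quiesce v1 and hand its snapshot to v2.
    v1.quiesced = true;
    v2.state = v1.state.clone();
    if !resume_ok {
        // Rollback: kill v2, un-quiesce v1. The retarget has not
        // happened yet, so clients never observe the attempt.
        v2.alive = false;
        v1.quiesced = false;
        return Err("v2.resume failed; rolled back");
    }
    // Steps 3-4: retargetCaps(v1, v2, graceful), drain v1 (elided).
    // Step 5: v1 exits.
    v1.alive = false;
    Ok(v2)
}
```

The key invariant the sketch preserves is that v1 is only ever torn down after v2 has accepted the state and the retarget point has passed.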
In-Flight Calls
The subtle case is a client that has already posted a CALL SQE to v1 when the retarget happens. Two options:
- Graceful mode. v1 finishes the call, kernel routes the CQE back to the client on v1’s ring. v1 exits only after its ring is empty. This preserves call semantics; v1 and v2 coexist briefly.
- Force mode. The in-flight CALL returns DISCONNECTED. Client retries against v2. Appropriate when v1 is wedged and quiesce won't return.
In graceful mode the client cannot distinguish “call landed on v1” from “call landed on v2” — which is the point. Capability identity survives the upgrade; process identity does not.
Relationship to Fault Containment
Live upgrade and fault containment (driver panics → supervisor respawns) share machinery. The difference is one step of the protocol:
- Fault containment: v1 has crashed; kernel has already marked it dead and epoch-bumped its caps. Supervisor spawns v2, issues a graceful retarget (no quiesce — v1 is gone; in-flight CALLs already delivered DISCONNECTED). Clients reconnect to v2.
- Live upgrade: v1 is healthy; supervisor initiates quiesce → state transfer → retarget, and no CQE ever reports DISCONNECTED to any caller.
The epoch-based revocation work from Stage 6 is the foundation for both. CapRetarget is one additional primitive layered on top.
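The shared machinery can be shown in miniature: both paths rewrite the same cap entry, and only the crash path bumps the epoch. The `CapEntry` fields and function names below are assumptions for illustration, not the real Stage 6 types.

```rust
// Hypothetical kernel cap entry; calls are stamped with the epoch
// the client observed when it acquired the cap.
#[derive(Clone, Copy)]
struct CapEntry {
    server: u32, // pid of the serving process
    epoch: u32,  // bumped on revocation
}

// Crash path: v1 died, so the entry is revoked by bumping the epoch
// before pointing it at the respawned process. Stale calls fail.
fn on_crash(e: &mut CapEntry, respawned: u32) {
    e.epoch += 1;
    e.server = respawned;
}

// Live-upgrade path: same field rewrite, but the epoch is preserved,
// so no caller ever observes DISCONNECTED.
fn on_retarget(e: &mut CapEntry, v2: u32) {
    e.server = v2;
}

fn call_ok(e: &CapEntry, stamped_epoch: u32) -> bool {
    stamped_epoch == e.epoch // mismatch => DISCONNECTED CQE
}
```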
Security and Trust
Live upgrade does not expand the trust model. The supervisor already holds the authority to kill, restart, and reassign caps for services it spawned — upgrade is a refinement of that authority, not a new principal. Requirements:
- Only a holder of ProcessControl caps to both old and new can call retargetCaps. By construction this is the supervisor that spawned them.
- The new binary must be legitimately obtained — in practice, loaded from the same content-addressed store as everything else (ties to Content-Addressed Boot).
- Schema compatibility (new is a superset of old) is checked by the kernel before retarget. This prevents an upgrade from silently narrowing the interface clients depend on.
Non-Goals
- Code hot-patching. No binary-level function replacement. Upgrade is at the process boundary, not the symbol boundary.
- Kernel live replacement. Covered by Reboot-Proof / process persistence (reboot with state preserved, not live replacement). The kernel is a single trust domain; replacing it in place needs a different design.
- Automatic schema migration across incompatible changes. If v2’s state schema is not a capnp-evolution-compatible superset of v1’s, the service author writes the migration. The kernel does not.
- System-wide registry of upgradable services. The supervisor knows what it spawned; there is no ambient discovery.
Phased Implementation
- CapRetarget primitive. Kernel operation + ProcessControl cap. Useful immediately for stateless services (Case 1) and as the foundation of Fault Containment (respawn with a new process, point its caps to a fresh instance).
- Upgradable interface. Schema, contract documentation, and a Rust helper in capos-rt that services derive.
- Graceful drain. Quiesce + in-flight call completion + v1 exit synchronization.
- Stateful demo. A service maintaining session state, upgraded live with zero session loss. This is the Live Upgrade observable milestone.
Related Work
- Erlang/OTP code_change/3 is the closest prior art: processes upgrade their behavior module in place, with a callback to migrate state. capOS differs only in that state transport goes through capnp rather than Erlang term format, and that the process boundary is an OS process rather than a BEAM process.
- Fuchsia component updates rebind component instances in the routing graph. Similar primitive in a different mechanism.
- nginx -s reload is graceful restart for request/response servers. The design here generalizes it by exposing the state migration point explicitly rather than relying on "the session is the request."