
Genode OS Framework: Research Report for capOS

Research on Genode’s capability-based component framework, session routing, VFS architecture, and POSIX compatibility – with lessons for capOS.

1. Capability-Based Component Framework

Core Abstraction: RPC Objects

Genode’s fundamental abstraction is the RPC object. Every service in the system is implemented as an RPC object that can be invoked by clients holding a capability to it. The capability is an unforgeable reference – a kernel-protected token that names a specific RPC object and grants the holder the right to invoke its methods.

Genode supports multiple microkernels (NOVA, seL4, Fiasco.OC, a custom base-hw kernel). The capability model is consistent across all of them, though the kernel-level implementation details differ. The framework abstracts kernel capabilities into its own uniform model.

Key properties of Genode capabilities:

  • Unforgeable. A capability can only be obtained by delegation from a holder or creation by the kernel. There is no mechanism to synthesize a capability from an integer or address.
  • Typed. Each capability refers to an RPC object with a specific interface. The C++ type system enforces interface contracts at compile time.
  • Delegatable. A capability holder can pass it to another component via RPC arguments, allowing authority to flow through the system graph.
  • Revocable. Capabilities can be revoked (invalidated). When an RPC object is destroyed, all capabilities pointing to it become invalid.
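These four properties can be made concrete with a toy Python model. This is a conceptual sketch only – the class names and the `destroy`/`delegate` methods are invented for illustration and are not Genode’s C++ API:

```python
# Toy model of the four capability properties; all names here are
# invented for illustration -- this is not Genode's C++ API.

class RpcObject:
    """Server-side object reachable only through capabilities."""
    def __init__(self, interface):
        self.interface = interface   # dict of method -> callable ("typed")
        self.alive = True

    def destroy(self):
        self.alive = False           # invalidates every capability to it

class Capability:
    """Opaque reference: there is no way to build one from an integer."""
    def __init__(self, obj):
        self._obj = obj

    def invoke(self, method, *args):
        if not self._obj.alive:
            raise RuntimeError("capability revoked")   # "revocable"
        return self._obj.interface[method](*args)

    def delegate(self):
        """Passing a capability yields an independent reference."""
        return Capability(self._obj)

# A LOG-like object with a single method.
log = RpcObject({"write": lambda msg: f"logged: {msg}"})
cap = Capability(log)      # created alongside the object ("unforgeable")
peer = cap.delegate()      # authority flows by delegation ("delegatable")
assert peer.invoke("write", "hi") == "logged: hi"
log.destroy()              # revocation invalidates both references
```

The essential point the model captures: holding the Python object reference is the only way to invoke, mirroring how a kernel-protected capability is the only way to reach an RPC object.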

Capability Types in Genode

Genode distinguishes several kinds of capabilities based on what they refer to:

  1. Session capabilities. The most common type. A session capability refers to a service session – an ongoing relationship between a client and a server. Example: a Log_session capability lets a client write log messages to a specific log session on a LOG server.

  2. Parent capability. Every component holds an implicit capability to its parent. This is the channel through which it requests resources and sessions. The parent capability is never explicitly passed – it’s built into the component framework.

  3. Dataspace capabilities. Represent shared-memory regions. A Ram_dataspace capability grants access to a specific region of physical memory. Dataspaces are the mechanism for bulk data transfer between components (the RPC path is for small messages and control).

  4. Signal capabilities. Used for asynchronous notifications. A signal source produces signals; holders of the signal capability can register handlers. Signals are Genode’s primary async notification mechanism – they don’t carry data, just wake up the receiver.

Sessions: The Service Contract

A session is the central concept of Genode’s inter-component communication. It represents an established relationship between a client component and a server component, with negotiated resource commitments.

Session lifecycle:

  1. Request. A client asks its parent to create a session of a specific type (e.g., Gui::Session, File_system::Session, Nic::Session). The request includes a label string and optional session arguments.

  2. Routing. The parent routes the session request according to its policy (see Section 2). The request may traverse multiple levels of the component tree.

  3. Creation. The server creates a session object, allocates resources for it (e.g., a shared-memory buffer), and returns a session capability to the client.

  4. Use. The client invokes RPC methods on the session capability. The server handles the calls. Both sides can use shared dataspaces for bulk data.

  5. Close. Either side can close the session. Resources committed to the session are released back.

This model is fundamentally different from Unix IPC (anonymous pipes/sockets). Every session is:

  • Typed – the interface is known at compile time.
  • Named – sessions carry a label used for routing and policy.
  • Resource-accounted – the client explicitly donates RAM to the server via a “session quota” to fund the server-side state for this session. This prevents denial-of-service through resource exhaustion.

Resource Trading

Genode’s resource model is unique and worth studying closely. Resources (primarily RAM) flow through the component tree:

  • The kernel grants a fixed RAM budget to core (the root component).
  • Core grants budgets to its children (typically just init).
  • Init grants budgets to its children according to the deployment config.
  • Each component can donate RAM to servers when opening sessions.

The session_quota mechanism works as follows: when a client opens a session, it specifies how much RAM it donates. This RAM transfer goes from the client’s budget to the server’s budget. The server uses this donated RAM to allocate server-side state for the session. When the session closes, the RAM flows back.

This creates a closed accounting system:

  • No component can use more RAM than it was granted.
  • Servers don’t need their own large budgets – clients fund their sessions.
  • Resource exhaustion is contained: a misbehaving client can only exhaust its own budget, not the server’s.
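The quota-transfer mechanism can be sketched as a small accounting model. `Budget` and `transfer` are invented names for illustration, not Genode API; the invariant they demonstrate (conserved total, no over-commit) is the one described above:

```python
# Conceptual sketch of session-quota accounting; Budget and transfer
# are invented names, not Genode API.

class Budget:
    def __init__(self, name, ram_mb):
        self.name = name
        self.ram = ram_mb

def transfer(src, dst, amount):
    """Move RAM quota between budgets; the system total is conserved."""
    if amount > src.ram:
        raise RuntimeError("quota exceeded")   # no over-commit, ever
    src.ram -= amount
    dst.ram += amount

client = Budget("client", 16)
server = Budget("server", 4)
total = client.ram + server.ram

transfer(client, server, 8)     # session open: client donates quota
assert (client.ram, server.ram) == (8, 12)

transfer(server, client, 8)     # session close: quota flows back
assert client.ram + server.ram == total   # closed accounting
```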

Capability Invocation vs. Delegation

Genode distinguishes two fundamental operations on capabilities:

Invocation: calling an RPC method on the capability. The caller sends a message to the RPC object named by the capability, the server processes it and returns a result. This is synchronous in Genode – the caller blocks until the server replies. (Asynchronous interaction uses signals and shared memory.)

Delegation: passing a capability as an argument in an RPC call. When a capability appears as a parameter or return value, the kernel transfers the capability reference to the receiving component. The receiver now holds an independent reference to the same RPC object. This is how authority propagates through the system.

Example: when a client opens a File_system::Session, the session creation returns a session capability. If the file system server needs to allocate memory, it calls back to the client’s RAM service using a RAM capability that was delegated during session setup.

Capabilities in Genode RPC are transferred by the kernel during the IPC operation – the framework marshals them into a special “capability argument” slot in the IPC message, and the kernel copies the capability reference into the receiver’s capability space. This is transparent to application code: capabilities appear as typed C++ objects in the RPC interface.

2. Session Routing

The Problem Session Routing Solves

In a traditional OS, services are found via well-known names in a global namespace (D-Bus addresses, socket paths, service names). This creates ambient authority – any process can connect to any service if it knows the name.

Genode has no global service namespace. A component can only obtain sessions through its parent. The parent decides which server to route each session request to. This means:

  • Service visibility is controlled structurally.
  • A component can only reach services its parent explicitly allows.
  • Different children of the same parent can be routed to different servers for the same service type.

Parent-Child Relationship

Every Genode component (except core) has exactly one parent. The parent:

  1. Created the child (spawned it with an initial set of resources).
  2. Intercepts all session requests from the child.
  3. Routes requests according to its routing policy.
  4. Can deny requests entirely (the child gets an error).

This creates a tree structure where authority flows downward. A child cannot bypass its parent to reach a service the parent didn’t approve.

Init’s Routing Configuration

Init reads an XML configuration that specifies which components to start and how to route their session requests. This is the core of system policy.

A minimal init config:

<config>
  <parent-provides>
    <service name="LOG"/>
    <service name="ROM"/>
    <service name="CPU"/>
    <service name="RAM"/>
    <service name="PD"/>
  </parent-provides>

  <start name="timer">
    <resource name="RAM" quantum="1M"/>
    <provides> <service name="Timer"/> </provides>
    <route>
      <service name="ROM"> <parent/> </service>
      <service name="LOG"> <parent/> </service>
      <service name="CPU"> <parent/> </service>
      <service name="RAM"> <parent/> </service>
      <service name="PD">  <parent/> </service>
    </route>
  </start>

  <start name="test-log">
    <resource name="RAM" quantum="1M"/>
    <route>
      <service name="Timer"> <child name="timer"/> </service>
      <service name="LOG">   <parent/> </service>
      <!-- remaining services routed to parent by default -->
      <any-service> <parent/> </any-service>
    </route>
  </start>
</config>

Key routing directives:

  • <parent/> – route to the parent (upward delegation).
  • <child name="x"/> – route to a specific child (sibling routing).
  • <any-child/> – route to any child that provides the service.
  • <any-service> – catch-all for unspecified service types.
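Resolution over such a route table can be sketched as a first-match lookup. This is a conceptual model – the dict encoding and the target strings are invented for illustration, but the semantics (match service name, optionally label, fall through to the catch-all, deny if nothing matches) follow the directives above:

```python
# First-match route resolution, modeled on init's <route> semantics.
# The dict encoding and target strings are invented for illustration.

def resolve(routes, service, label=""):
    for rule in routes:
        name = rule.get("service")             # None models <any-service>
        if name is not None and name != service:
            continue
        if "label" in rule and rule["label"] != label:
            continue
        return rule["target"]
    raise LookupError(f"no route for {service}")   # request denied

# Routes mirroring the test-log <start> entry above.
routes = [
    {"service": "Timer", "target": "child:timer"},
    {"service": "LOG",   "target": "parent"},
    {"target": "parent"},                      # <any-service> catch-all
]
assert resolve(routes, "Timer") == "child:timer"
assert resolve(routes, "ROM") == "parent"      # falls through to catch-all
```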

Label-Based Routing

Labels are strings attached to session requests. They carry context about who is requesting and what they want, enabling fine-grained routing decisions.

When a client requests a session, it attaches a label. As the request traverses the routing tree, each intermediate component (typically init) can prepend its own label. By the time the request reaches the server, the label encodes the full path through the component tree.

Example: a component named my-app inside an init subsystem named apps requests a File_system session with label "data". The composed label arriving at the file system server is: "apps -> my-app -> data".

The server can use this label for:

  • Access control. Grant different permissions based on who is asking.
  • Isolation. Store data in different directories per client.
  • Logging. Identify which component generated a message.

Label-based routing in init config:

<start name="fs">
  <provides> <service name="File_system"/> </provides>
  <route> ... </route>
</start>

<start name="app-a">
  <route>
    <service name="File_system" label="data">
      <child name="fs"/>
    </service>
    <service name="File_system" label="config">
      <child name="config-fs"/>
    </service>
  </route>
</start>

Here, app-a’s file system requests are split: requests labeled "data" go to one server, requests labeled "config" go to another. The application code is unchanged – the routing is entirely a deployment decision.
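The label composition described earlier can be sketched as repeated prefixing. This is a toy model of the " -> " convention, not Genode code; the `policy_dir` helper is an invented example of server-side policy dispatch:

```python
# Toy model of label composition: each hop prefixes its child's name
# with " -> " before forwarding the request (not Genode code).

def prepend(component_name, label):
    return f"{component_name} -> {label}"

label = "data"                     # label chosen by the client
label = prepend("my-app", label)   # inner init prefixes the child's name
label = prepend("apps", label)     # outer init prefixes the subsystem name
assert label == "apps -> my-app -> data"   # what the server sees

# The server can dispatch policy on the composed label, e.g. a
# per-subsystem storage directory (invented example):
def policy_dir(label):
    return "/" + label.split(" -> ")[0]
assert policy_dir(label) == "/apps"
```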

Routing as Policy

The critical insight is that routing IS access control. There is no separate permission system. If a component’s route config doesn’t include a path to a network service, that component has no network access – period. It cannot discover the network service because it has no way to name it.

This replaces:

  • Firewall rules (routing controls which network services are reachable)
  • File permissions (routing controls which file system sessions are available)
  • Process isolation policies (routing controls everything)

The routing configuration is equivalent to a whitelist of allowed service connections for each component. Adding or removing access means editing the init config, not modifying the component’s code or the server’s access control lists.

Dynamic Routing and Sculpt

In the static case (Genode’s test scenarios), routing is defined once in init’s config. In Sculpt OS (Section 6), the routing configuration can be modified at runtime, allowing users to install applications and connect them to services dynamically.

3. VFS on Top of Capabilities

The VFS Layer

Genode’s VFS (Virtual File System) is a library-level abstraction, not a kernel feature. It provides a path-based file-like interface implemented as a plugin architecture within a component’s address space.

The VFS exists because many existing applications (and libc) expect file-like access patterns. Rather than forcing all code to use Genode’s native session/capability model, the VFS provides a translation layer.

Architecture:

Application code
  |
  |  POSIX: open(), read(), write()
  v
libc (Genode's port of FreeBSD libc)
  |
  |  VFS API: vfs_open(), vfs_read(), vfs_write()
  v
VFS library (in-process)
  |
  |  Plugin dispatch based on mount point
  v
VFS plugins (in-process)
  |
  +--> ram_fs plugin (in-memory file system)
  +--> <fs> plugin (delegates to File_system session)
  +--> <terminal> plugin (delegates to Terminal session)
  +--> <log> plugin (delegates to LOG session)
  +--> <nic> plugin (delegates to Nic session, for socket layer)
  +--> <block> plugin (delegates to Block session)
  +--> <dir> plugin (combines subtrees)
  +--> <tar> plugin (read-only tar archive)
  +--> <import> plugin (populate from ROM)
  +--> <pipe> plugin (in-process pipe pair)
  +--> <rtc> plugin (system clock)
  +--> <zero> plugin (/dev/zero equivalent)
  +--> <null> plugin (/dev/null equivalent)
  ...

VFS Plugin Architecture

Each VFS plugin is a dynamically loadable library (or statically linked module) that implements a file-system-like interface. Plugins handle:

  • open/close – create/destroy file handles
  • read/write – data transfer
  • stat – metadata queries
  • readdir – directory enumeration
  • ioctl – device-specific control (limited)

Plugins are composed by the VFS configuration, which is XML embedded in the component’s config:

<config>
  <vfs>
    <dir name="dev">
      <log/>
      <null/>
      <zero/>
      <terminal name="stdin" label="input"/>
      <inline name="rtc">2024-01-01 00:00</inline>
    </dir>
    <dir name="tmp"> <ram/> </dir>
    <dir name="data"> <fs label="persistent"/> </dir>
    <dir name="socket"> <lxip dhcp="yes"/> </dir>
  </vfs>
  <libc stdout="/dev/log" stderr="/dev/log" stdin="/dev/stdin"
        rtc="/dev/rtc" socket="/socket"/>
</config>

This config creates a virtual filesystem tree:

  • /dev/log – writes go to the LOG session
  • /dev/null, /dev/zero – standard synthetic files
  • /dev/stdin – reads from a Terminal session
  • /tmp/ – in-memory filesystem (RAM-backed)
  • /data/ – delegates to a File_system session labeled “persistent”
  • /socket/ – network sockets via an in-process TCP/IP stack (<lxip>, a ported Linux stack, per the config above; <lwip> is the lighter lwIP alternative)

The <fs> plugin is the bridge from VFS to Genode’s capability world. When the application does open("/data/foo.txt"), the <fs> plugin translates this into a File_system::Session RPC call to the external file system server that the component’s routing connects to.
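The path-to-plugin step of this translation can be sketched as longest-prefix dispatch over a mount table. This is a conceptual model of the dispatch only – the plugin names are invented, and the actual RPC marshalling is not shown:

```python
# Longest-prefix mount dispatch, sketching how a VFS resolves a path to
# a plugin before any RPC happens. Plugin names are invented.

mounts = {
    "/dev/log": "log-plugin",    # writes become LOG-session RPCs
    "/tmp":     "ram-plugin",    # stays entirely in-process
    "/data":    "fs-plugin",     # bridges to a File_system session
    "/":        "root",
}

def dispatch(path):
    """Return (plugin, path-inside-mount) for the longest matching mount."""
    best = max((m for m in mounts
                if path == m or path.startswith(m.rstrip("/") + "/")),
               key=len)
    return mounts[best], path[len(best.rstrip("/")):] or "/"

plugin, rest = dispatch("/data/foo.txt")
assert plugin == "fs-plugin"     # this open() becomes a File_system RPC
assert rest == "/foo.txt"
```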

File System Components

Genode has several file system server components:

  • ram_fs – in-memory file system server. Multiple components can share files through it by routing their File_system sessions to it.
  • vfs_server (previously vfs) – a file system server backed by the VFS plugin architecture itself. This enables recursive composition: a VFS server can mount another VFS server.
  • fatfs – FAT file system driver over a Block session.
  • ext2_fs – ext2/3/4 via a ported NetBSD implementation (rump kernel).
  • store_fs / recall_fs – content-hash-based storage (experimental in some Genode releases).

The file system server is a regular Genode component. It receives a Block session (from a block device driver), provides File_system sessions, and the routing determines who can access what:

block_driver -> provides Block session
       |
       v
fatfs -> consumes Block session, provides File_system session
       |
       v
application -> consumes File_system session via VFS <fs> plugin

Libc Integration

Genode ports a substantial subset of FreeBSD’s libc. The integration point is the VFS: libc’s file operations are implemented by calling the VFS layer, which dispatches to plugins, which invoke Genode sessions as needed.

The libc port modifies FreeBSD libc minimally. Most changes are in the “backend” layer that replaces kernel syscalls with VFS calls:

  • open() -> vfs_open() -> VFS plugin dispatch
  • read() -> vfs_read() -> VFS plugin
  • socket() -> via VFS socket plugin (<lxip> or <lwip>)
  • mmap() -> supported for anonymous mappings and file-backed read-only
  • fork() -> NOT supported (no fork() in Genode)
  • exec() -> NOT supported (no in-place process replacement)
  • pthreads -> supported via Genode’s Thread API
  • select()/poll() -> supported via VFS notification mechanism
  • signal() -> partial support (SIGCHLD, basic signal delivery)

The key architectural decision: libc talks to the VFS library (in-process), the VFS talks to Genode sessions (cross-process RPC). Application code never directly touches Genode capabilities – the VFS mediates everything.

4. POSIX Compatibility

The Noux Approach (Historical)

Genode’s early POSIX approach was Noux, a process runtime that emulated Unix-like process semantics (fork, exec, pipe) on top of Genode. Noux ran as a single Genode component containing multiple “Noux processes” that shared an address space but had separate VFS views.

Noux supported:

  • fork() via copy-on-write within the Noux address space
  • exec() via in-place program replacement
  • pipe() for inter-process communication
  • A shared file system namespace

Noux was eventually deprecated because:

  1. It conflated multiple processes in one address space, undermining Genode’s isolation model.
  2. Fork emulation was fragile and slow.
  3. The libc-based VFS approach (Section 3) achieved better compatibility with less complexity.

Current Approach: libc + VFS

The current POSIX compatibility strategy:

  1. FreeBSD libc port. Provides standard C library functions. Modified to use Genode’s VFS instead of kernel syscalls.

  2. VFS plugins as POSIX backends. Each POSIX I/O pattern maps to a VFS plugin:

    • File I/O -> <fs> plugin -> File_system session
    • Sockets -> <lxip> or <lwip> plugin -> Nic session (in-process TCP/IP stack)
    • Terminal I/O -> <terminal> plugin -> Terminal session
    • Device access -> custom VFS plugins

  3. No fork(). The most significant POSIX omission. Programs that require fork() must be modified to use posix_spawn() or Genode’s native child-spawning mechanism. In practice, many programs use fork() only for daemon patterns or subprocess creation, and can be adapted.

  4. No exec(). Related to no fork(): there’s no in-place process replacement. New processes are created as new Genode components.

  5. Signals. Basic support – enough for SIGCHLD notification and simple signal handling. Complex signal semantics (real-time signals, signal-driven I/O) are not supported.

  6. pthreads. Fully supported via Genode’s native threading.

  7. mmap. Anonymous mappings and read-only file-backed mappings work. MAP_SHARED with write semantics is limited.

What Works in Practice

Genode has successfully ported:

  • Qt5/Qt6 – the full widget toolkit, including QtWebEngine (Chromium). This is the basis of Sculpt’s GUI.
  • VirtualBox – full x86 virtualization (runs Windows, Linux guests).
  • Mesa/Gallium – GPU-accelerated 3D graphics.
  • curl, wget, fetchmail – network utilities.
  • GCC toolchain – compiler, assembler, linker running on Genode.
  • bash – with limitations (no job control via signals, no fork-heavy patterns). Works for simple scripting.
  • vim, nano – terminal editors.
  • OpenSSL/LibreSSL – cryptographic libraries.
  • Various system utilities – ls, cp, rm, etc. via Coreutils port.

Applications that don’t port well:

  • Anything deeply dependent on fork+exec patterns (e.g., traditional Unix shells for complex scripting).
  • Programs relying on procfs, sysfs, or Linux-specific interfaces.
  • Daemons using inotify or Linux-specific async I/O.
  • Programs that assume global file system namespace visibility.

Practical Porting Effort

For most POSIX applications, porting involves:

  1. Build the application using Genode’s ports system (downloads upstream source, applies patches, builds with Genode’s toolchain).
  2. Write a VFS configuration that provides the file-like resources the application expects.
  3. Write a routing configuration that connects the application to required services.
  4. Patch fork() calls if present (usually replacing with posix_spawn() or restructuring to avoid subprocess creation).

The VFS configuration is where the “impedance mismatch” between POSIX expectations and Genode capabilities is resolved. The application thinks it’s accessing /etc/resolv.conf – the VFS plugin infrastructure translates this to capability-mediated access.

5. Component Architecture

Core, Init, and User Components

Core (or base-hw/base-nova/etc.): the lowest-level component, running directly on the microkernel. Core provides the fundamental services: protection domains and RAM allocation (PD sessions), CPU time (CPU sessions), ROM access to boot modules (ROM sessions), IRQ delivery, and I/O memory access. Core is the only component with direct hardware access. Everything else goes through core.

Init: the first user-level component, child of core. Init reads its XML configuration and manages the component tree. Init’s responsibilities:

  • Parse <start> entries and spawn components.
  • Route session requests between components according to <route> rules.
  • Manage component lifecycle (restart policies, resource reclamation).
  • Propagate configuration changes (dynamic reconfiguration in Sculpt).

User components: all other components. They can be:

  • Servers that provide sessions (drivers, file systems, network stacks).
  • Clients that consume sessions (applications).
  • Both simultaneously (a network stack consumes NIC sessions and provides socket-level sessions).
  • Sub-inits – components that run their own init-like management for a subtree of components.

Resource Trading in Practice

Resources in Genode flow through the tree. A concrete example:

  1. Core has 256 MB RAM total.
  2. Core grants 250 MB to init, keeps 6 MB for kernel structures.
  3. Init grants 10 MB to the timer driver, 50 MB to the GUI subsystem, 20 MB to the network subsystem, 5 MB to a log server.
  4. When the GUI subsystem starts a framebuffer driver, it donates 8 MB from its 50 MB budget to the driver as a session quota.
  5. The framebuffer driver uses this donated RAM for the frame buffer allocation.

If the GUI subsystem wants more RAM for a new application, it can reclaim RAM by closing sessions (getting donated RAM back) or requesting more from its parent (init).

The accounting is strict: at any point, the sum of all RAM budgets across all components equals the total system RAM. There is no over-commit. This prevents the “OOM killer” problem – each component knows exactly how much RAM it can use.
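The example’s figures can be checked against this invariant directly. A sketch of the bookkeeping, using only the numbers from the steps above:

```python
# Checking the example's arithmetic against the closed-accounting
# invariant: budgets always sum to the machine's total RAM.

TOTAL = 256  # MB

budgets = {"core": 6, "init": 250}                     # steps 1-2
assert sum(budgets.values()) == TOTAL

for child, mb in {"timer": 10, "gui": 50, "net": 20, "log": 5}.items():
    budgets["init"] -= mb                              # step 3: init grants
    budgets[child] = mb
assert sum(budgets.values()) == TOTAL

budgets["gui"] -= 8                                    # step 4: session quota
budgets["fb_drv"] = 8
assert sum(budgets.values()) == TOTAL                  # strict, no over-commit
```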

Practical Component Patterns

Driver components follow a common pattern:

  • Receive: Platform session (for I/O port/memory access), IRQ session
  • Provide: A device-specific session (NIC, Block, GPU, Audio, etc.)
  • Stateless: all per-client state funded by session quota

Multiplexer components:

  • Receive: one instance of a service
  • Provide: multiple instances to clients
  • Example: NIC router receives one NIC session, provides multiple sessions with packet routing between clients

Proxy components:

  • Forward one session type, possibly filtering or transforming
  • Example: nic_bridge, nitpicker (GUI multiplexer), VFS server

Subsystem inits:

  • A component running its own init for a group of related components
  • Isolates the subtree: crash of the subsystem doesn’t affect siblings
  • Example: Sculpt’s drivers subsystem, network subsystem

6. Sculpt OS

What Sculpt Demonstrates

Sculpt OS is Genode’s demonstration desktop operating system. It turns the component framework into a usable system where:

  • Users install and run applications at runtime.
  • Each application runs in its own isolated component with explicitly configured capabilities.
  • A GUI lets users connect applications to services (routing).
  • The entire system is reconfigurable without reboot.

Architecture

Sculpt’s component tree:

core
  |
  init
    |
    +--> drivers subsystem (sub-init)
    |      +--> platform_drv (PCI, IOMMU)
    |      +--> fb_drv (framebuffer)
    |      +--> usb_drv (USB host controller)
    |      +--> wifi_drv (wireless)
    |      +--> ahci_drv (SATA)
    |      +--> nvme_drv (NVMe)
    |      +--> ...
    |
    +--> runtime subsystem (sub-init, user-managed)
    |      +--> (user-installed applications)
    |
    +--> leitzentrale (management GUI)
    |      +--> system shell
    |      +--> config editor
    |
    +--> nitpicker (GUI multiplexer)
    +--> nic_router (network multiplexer)
    +--> ram_fs (shared file system)
    +--> ...

User Experience of Capabilities

In Sculpt, installing an application means:

  1. Download the package (a Genode component archive).
  2. Edit a “deploy” configuration that specifies which services the application can access (routing rules).
  3. The runtime subsystem spawns the component with the specified routing.

A text editor gets: File_system session (to read/write files), GUI session (for display), Terminal session (optionally). It does NOT get: network access, block device access, or access to other applications’ file systems.

A web browser gets: GUI session, Nic session (for network), GPU session (for rendering), File_system session (for downloads). Each service connection is an explicit choice.

The deploy config is the security policy. A user can see exactly what authority each application has, and can change it by editing the config.

Lessons from Sculpt

  1. Capabilities need a management UI. Raw capability graphs are incomprehensible to users. Sculpt provides a GUI that presents service connections in an understandable way (though it’s still oriented toward power users).

  2. Routing is the killer feature. Being able to route the same session type to different servers for different clients is extremely powerful. One application’s “file system” is local storage; another’s is a network share – same code, different routing.

  3. Sub-inits provide failure isolation. The drivers subsystem can crash and restart without affecting applications. Sculpt’s robustness comes from this hierarchical isolation.

  4. Dynamic reconfiguration is essential. A static boot config (like capOS’s current manifest) is fine for servers and embedded systems, but a general-purpose OS needs to add/remove/reconfigure components at runtime.

  5. Package management is a routing problem. Installing an application in Sculpt is not “copy binary to disk” – it’s “add a component to the runtime subsystem with specific routing rules.” The binary is almost secondary to the routing.

  6. POSIX compat through VFS works. Sculpt runs real desktop applications (Qt-based apps, VirtualBox, web browser) using the VFS-mediated POSIX layer. The capability model doesn’t prevent running complex existing software – it just requires explicit service configuration.

7. Relevance to capOS

VFS Capability Design

Genode’s approach: The VFS is an in-process library with a plugin architecture. It mediates between libc/POSIX and Genode sessions. The VFS configuration is per-component XML.

Lessons for capOS:

  1. Don’t put the VFS in the kernel. Genode’s VFS is entirely userspace, which is correct for a capability OS. capOS should do the same – the VFS is a library linked into processes that need POSIX compatibility, not a kernel subsystem.

  2. Plugin model maps well to Cap’n Proto. Each Genode VFS plugin bridges to a specific session type. In capOS, each VFS “backend” would bridge to a specific capability interface:

    Genode VFS plugin                 capOS VFS backend
    <fs> -> File_system session       FsBackend -> Namespace + Store caps
    <terminal> -> Terminal session    TerminalBackend -> Console cap
    <lxip> -> Nic session             NetBackend -> TcpSocket/UdpSocket caps
    <log> -> LOG session              LogBackend -> Console cap
    <ram> -> in-process RAM           RamBackend -> in-process (no cap needed)

  3. VFS config should be declarative. Rather than hardcoding mount points, capOS processes using libcapos-posix should receive a VFS mount table as part of their initial capability set. This could be a Cap’n Proto struct:

    struct VfsMountTable {
        mounts @0 :List(VfsMount);
    }
    
    struct VfsMount {
        path @0 :Text;           # mount point, e.g. "/data"
        union {
            namespace @1 :Void;  # use the Namespace cap named in capName
            console @2 :Void;    # use a Console cap
            ram @3 :Void;        # in-memory filesystem
            socket @4 :Void;     # socket interface
        }
        capName @5 :Text;        # name of the cap in CapSet backing this mount
    }
    

    This separates the VFS topology (a deployment decision) from the application code (which just calls open()).

  4. Genode’s <fs> plugin is the key analog. capOS’s Namespace capability is equivalent to Genode’s File_system session. The libcapos-posix path resolution layer (open() -> namespace.resolve()) is exactly Genode’s <fs> VFS plugin. The existing capOS design in docs/proposals/userspace-binaries-proposal.md is already on the right track.

  5. Consider streaming for large files. Genode uses shared-memory dataspaces for bulk data transfer in file system sessions. capOS’s current Store interface returns Data (a capnp blob), which means the entire object is copied per get() call. For large files, a streaming interface (with a shared-memory buffer and cursor) would be more efficient. This is capOS’s Open Question #4.

Session Routing Patterns

Genode’s approach: XML-configured routing in init, label-based dispatch, parent mediates all session requests.

Lessons for capOS:

  1. The manifest IS the routing config. capOS’s SystemManifest with structured CapRef source entries such as { service = { service = "net-stack", export = "nic" } } is functionally equivalent to Genode’s init routing config. The capOS design already handles the static case well.

  2. Label-based routing is valuable. Genode’s ability to route different requests from the same client to different servers (based on labels) maps directly to capOS’s capability naming. capOS already does this implicitly – a process can receive separate Namespace caps for “config” and “data”. The key insight is that this should be a deployment-time decision, not an application-time decision.

  3. Consider dynamic routing. capOS’s current manifest is static (baked into the ISO). For a more flexible system, init should support runtime reconfiguration:

    • Reload the manifest from a Store cap.
    • Add/remove services without reboot.
    • Re-route sessions when services restart.

    Genode achieves this via init’s config ROM, which can be updated at runtime. capOS could achieve it by having init watch a Namespace cap for manifest updates.

  4. Parent-mediated routing has costs. In Genode, every session request traverses the component tree. This adds latency and complexity. capOS’s direct capability passing (a process holds a cap directly, not through its parent) avoids this overhead. The tradeoff: capOS has less runtime control over routing (once a cap is passed, the parent can’t intercept invocations on it).

    This is a deliberate design choice. capOS favors direct caps (lower overhead, simpler) over proxied caps (more control). Genode’s session routing is powerful but adds a layer of indirection that may not be worth it for capOS’s use case.

  5. Service export needs a protocol. Genode’s session model has server components explicitly announce what services they provide. capOS’s ProcessHandle.exported() mechanism serves the same purpose. The manifest’s exports field pre-declares what a service will export, which helps init plan the dependency graph before spawning anything.

POSIX Compatibility Without Compromising Capabilities

Genode’s approach: libc port + VFS + per-component VFS config. No global namespace. No fork(). Applications see a curated file tree, not the real system.

Lessons for capOS:

  1. The VFS is a capability adapter, not a capability. The VFS library runs inside the process that needs POSIX compatibility. It doesn’t weaken the capability model because it can only access capabilities the process was granted. This matches capOS’s libcapos-posix design exactly.

  2. musl over FreeBSD libc. Genode uses FreeBSD libc because of its clean backend interface. capOS plans to use musl, which has an even cleaner __syscall() interface. This is a good choice. Genode’s experience shows that the libc implementation matters less than the VFS/backend layer quality.

  3. No fork() is fine. Genode has operated without fork() for over 15 years and runs complex software (Qt, VirtualBox, Chromium). The applications that truly need fork() are rare, and most need only posix_spawn() semantics. capOS should not attempt to implement fork() – focus on posix_spawn() backed by a ProcessSpawner cap.
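
    A sketch of that spawn-not-fork surface (CapSet and ProcessHandle appear elsewhere in this report; the exact field names and signature are assumptions):

    # Hypothetical ProcessSpawner: enough to back posix_spawn(), with
    # no fork() semantics – the child's authority is listed explicitly.
    interface ProcessSpawner {
        spawn @0 (image :Text,
                  argv :List(Text),
                  envp :List(Text),
                  caps :CapSet) -> (proc :ProcessHandle);
    }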

  4. Sockets via in-process TCP/IP stack. Genode’s network VFS plugins run a TCP/IP stack inside the application process – <lwip> embeds the lwIP stack, while <lxip> embeds a ported Linux TCP/IP stack – using the NIC session for raw packet I/O. This avoids the overhead of routing every socket call through a separate network stack component.

    capOS could offer a similar choice:

    • Out-of-process: socket calls go to the network stack component via TcpSocket/UdpSocket caps (safer, more isolated, more overhead).
    • In-process: an lwIP/smoltcp library runs inside the application, consuming a raw Nic cap (less isolation, less overhead, more authority).

    For most applications, out-of-process sockets via caps are fine. For high-performance networking (database, web server), an in-process stack over a raw NIC cap may be needed.
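
    A sketch of the raw Nic cap such an in-process stack would consume (method names are assumptions; a real design would likely be ring-based once async rings exist):

    # Hypothetical raw NIC capability for an in-process TCP/IP stack.
    interface Nic {
        macAddress @0 () -> (mac :Data);
        transmit   @1 (frame :Data) -> ();
        # Blocking receive shown for simplicity; an async-ring variant
        # would replace this with submission/completion entries.
        receive    @2 () -> (frame :Data);
    }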

  5. select/poll/epoll need async caps. Genode implements select/poll via VFS notifications (signals on file readiness). capOS needs the async capability rings (io_uring-inspired) from Stage 4 before select/poll can work. This is a natural fit: each polled fd maps to a pending capability invocation in the completion ring.
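
    One way to picture the fd-to-ring mapping, assuming the Stage 4 ring design (struct and field names are illustrative):

    # Hypothetical completion-ring entry: each polled fd corresponds to
    # a pending invocation whose completion carries a readiness token.
    struct Completion {
        token  @0 :UInt64;  # correlates completion with submitted fd/op
        status @1 :Int32;   # readiness flags or error code
        bytes  @2 :UInt32;  # bytes transferred, if any
    }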

Component Patterns for Cap’n Proto Interfaces

Genode’s patterns and their capOS/Cap’n Proto equivalents:

  1. Session creation = factory method on a capability.

    Genode: client requests a Nic::Session from its parent, which routes to a NIC driver server.

    capOS: client holds a NetworkManager cap and calls create_tcp_socket() to get a TcpSocket cap. The factory pattern is the same, but capOS does it via direct cap invocation instead of parent-mediated session requests.

    Cap’n Proto naturally supports this via interfaces that return interfaces:

    interface NetworkManager {
        createTcpSocket @0 () -> (socket :TcpSocket);
        createUdpSocket @1 () -> (socket :UdpSocket);
        createTcpListener @2 (addr :IpAddress, port :UInt16)
            -> (listener :TcpListener);
    }
    
  2. Resource quotas in session creation.

    Genode: session requests include a RAM quota donated from client to server.

    capOS should consider this pattern. Currently, capOS processes receive a FrameAllocator cap for memory. If a server needs to allocate memory per-client, the client should fund it. Cap’n Proto schema could encode this:

    interface FileSystem {
        open @0 (path :Text, bufferPages :UInt32)
            -> (file :File);
        # bufferPages: number of pages the client donates for
        # server-side buffering. Server allocates from a shared
        # FrameAllocator or the client passes frames explicitly.
    }
    

    This prevents the denial-of-service problem where a client opens many sessions, exhausting the server’s memory.

  3. Multiplexer components.

    Genode: nic_router takes one NIC session, provides many. nitpicker takes one framebuffer, provides many GUI sessions.

    capOS equivalent: a process that consumes a Nic cap and provides multiple TcpSocket/UdpSocket caps. This is already what the network stack component does in capOS’s service architecture proposal. Cap’n Proto’s interface model makes this natural – the multiplexer implements one interface (NetworkManager) using another (Nic).

  4. Attenuation = capability narrowing.

    Genode: servers can return restricted capabilities (e.g., a read-only file handle from a read-write file system session).

    capOS: already planned via Fetch -> HttpEndpoint narrowing, Store -> read-only Store, Namespace -> scoped Namespace. The pattern is sound. Cap’n Proto interfaces make the attenuation explicit in the schema.
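
    Sketched as schema fragments – the attenuation methods are assumptions; only the Fetch/Store/Namespace narrowings themselves come from the capOS plan:

    # Hypothetical attenuation methods returning narrowed capabilities.
    interface Store {
        asReadOnly @0 () -> (store :Store);   # writes fail on the result
    }
    interface Namespace {
        scope @0 (prefix :Text) -> (ns :Namespace);  # subtree-only view
    }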

  5. Dataspace pattern for bulk data.

    Genode uses shared-memory dataspaces for efficient bulk transfer (file contents, network packets, framebuffers). The RPC path carries only small control messages and capability references.

    capOS currently moves Cap’n Proto control messages through capability rings and bounded kernel scratch, with no zero-copy bulk-data object yet. For bulk data, capOS should add a SharedBuffer capability:

    interface SharedBuffer {
        # Map a shared memory region into the caller's address space
        map @0 () -> (addr :UInt64, size :UInt64);
        # Notify that data has been written to the buffer
        signal @1 (offset :UInt64, length :UInt64) -> ();
    }
    

    File system and network operations would use SharedBuffer for data transfer and capability invocations for control, matching Genode’s split between RPC and dataspaces.

  6. Sub-init pattern for failure domains.

    Genode: a sub-init manages a subtree of components. If the subtree crashes, only the sub-init restarts it.

    capOS: a supervisor process (not necessarily init) holds a ProcessSpawner cap and manages a group of services. This is already described in the service architecture proposal’s supervision tree. The key addition from Genode: make sub-supervisors a first-class pattern with their own manifest fragments, not just ad-hoc supervision loops.
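
    A manifest fragment for such a sub-supervisor might look like this (struct and field names are hypothetical):

    # Hypothetical manifest fragment handed to a sub-supervisor along
    # with a ProcessSpawner cap scoped to its subtree.
    struct SupervisorFragment {
        services      @0 :List(Text);   # names of manifest entries it owns
        restartPolicy @1 :RestartPolicy;
        enum RestartPolicy { never @0; onFailure @1; always @2; }
    }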

Summary of Key Takeaways for capOS

| Area | Genode approach | capOS adaptation |
|------|-----------------|------------------|
| Capability model | Kernel-enforced caps to RPC objects | Kernel-enforced caps to Cap’n Proto objects (aligned) |
| Service discovery | Parent-mediated session routing | Manifest-driven cap passing at spawn (simpler, less dynamic) |
| VFS | In-process library with plugin architecture | libcapos-posix with mount table from CapSet (same pattern) |
| POSIX | FreeBSD libc + VFS backends | musl + libcapos-posix backends (same architecture) |
| fork() | Not supported | Not supported (use posix_spawn -> ProcessSpawner) |
| Bulk data | Shared-memory dataspaces | SharedBuffer design exists; implementation pending |
| Resource accounting | Session quotas (RAM donated per session) | Authority-accounting design exists; unified ledgers pending |
| Routing labels | String labels on session requests, routed by init | Cap naming in manifest serves same purpose |
| Dynamic reconfig | Init config ROM updated at runtime | Manifest reload via Store cap (future) |
| Failure isolation | Sub-inits as failure domains | Supervisor processes (same concept, different mechanism) |
| Async notification | Signal capabilities | Async cap rings / io_uring model (more general) |

Top Recommendations

  1. Add session quotas / resource trading. This is the most important Genode pattern capOS hasn’t adopted yet. Without it, a malicious client can exhaust a server’s memory by opening many capability sessions. Design resource donation into the Cap’n Proto schema for session-creating interfaces.

  2. Design a SharedBuffer capability. Copying capnp messages through the kernel works for control messages but not for bulk data. A shared-memory mechanism (like Genode’s dataspaces) is essential for file I/O, networking, and GPU rendering.

  3. Keep VFS as a library, not a service. Genode’s in-process VFS is the right pattern. capOS’s libcapos-posix should work the same way – a library that translates POSIX calls to capability invocations within the process. No VFS server component needed (though a file system server implementing the Namespace/Store interface is separate).

  4. Add a declarative VFS mount table to process init. Each POSIX-compat process should receive a mount table (as a capnp struct) that maps paths to capabilities. This separates deployment policy from application code, matching Genode’s per-component VFS config.
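
    A sketch of such a mount table (struct and field names are assumptions):

    # Hypothetical per-process mount table, delivered at spawn time.
    struct MountTable {
        mounts @0 :List(Mount);
        struct Mount {
            path     @0 :Text;        # e.g. "/etc", "/data"
            backing  @1 :Capability;  # Namespace/Store cap for this subtree
            readOnly @2 :Bool;
        }
    }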

  5. Plan for dynamic reconfiguration. The static manifest is fine for now, but Sculpt shows that a usable capability OS needs runtime service management. Design init so it can accept manifest updates through a cap, not just from the boot image.

  6. Don’t over-engineer routing. Genode’s parent-mediated session routing is powerful but complex. capOS’s direct capability passing is simpler and sufficient for most use cases. Add proxy/mediator patterns only when specific needs arise (e.g., capability revocation, load balancing).

References

  • Genode Foundations book (genode.org/documentation/genode-foundations/) – the authoritative source for architecture, session model, routing, VFS, and component composition.
  • Norman Feske, “Genode Operating System Framework” (2008-2025) – release notes and design documentation at genode.org.
  • Sculpt OS documentation at genode.org/download/sculpt – practical deployment of the capability model.
  • Genode source repository: github.com/genodelabs/genode – reference implementations of VFS plugins, file system servers, libc port.