
Capability Model

How capabilities work in capOS.

Status: Partially implemented. Generation-tagged cap tables, typed schema interface IDs, manifest/CapSet grants, badges, transport-level release, and Endpoint copy/move transfer are implemented. Revocation propagation, persistence, and bulk-data capabilities remain future work.

What is a Capability

A capability in capOS is a reference to a kernel object that carries:

  • An interface (what methods can be called), defined by a Cap’n Proto schema
  • A permission (the object it references, enforced by the kernel)
  • A wire format (Cap’n Proto serialized messages for all invocations)

A process can only access a resource if it holds a capability to it. There is no ambient authority – no global namespace, no “open by path” syscall, no implicit resource access.

Schema as Contract

Capability interfaces are defined in .capnp schema files under schema/. The schema is the canonical interface definition. Currently defined:

interface Console {
    write @0 (data :Data) -> ();
    writeLine @1 (text :Text) -> ();
}

interface FrameAllocator {
    allocFrame @0 () -> (physAddr :UInt64);
    freeFrame @1 (physAddr :UInt64) -> ();
    allocContiguous @2 (count :UInt32) -> (physAddr :UInt64);
}

interface VirtualMemory {
    map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
    unmap @1 (addr :UInt64, size :UInt64) -> ();
    protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}

interface Endpoint {}

interface ProcessSpawner {
    spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

interface ProcessHandle {
    wait @0 () -> (exitCode :Int64);
}

interface BootPackage {
    manifestSize @0 () -> (size :UInt64);
    readManifest @1 (offset :UInt64, maxBytes :UInt32) -> (data :Data);
}

# Management-only introspection. Ordinary handle release uses the system
# transport opcode CAP_OP_RELEASE, not a method here.
interface CapabilityManager {
    list @0 () -> (capabilities :List(CapabilityInfo));
    # grant is planned for Stage 6 (IPC and Capability Transfer)
}

Each interface has a unique 64-bit TYPE_ID generated by the Cap’n Proto compiler. TYPE_ID is the schema constant; interface_id is the runtime metadata carried in CapSet/bootstrap descriptions and endpoint delivery headers. Method dispatch uses the interface assigned to the capability entry plus a method_id, which selects a method within that schema.

This is not capability identity. A CapId is the authority-bearing handle in a process table, analogous to an fd. Multiple capabilities can expose the same interface:

  • cap_id=3 -> serial-backed Console
  • cap_id=4 -> log-buffer-backed Console
  • cap_id=5 -> Console proxy served by another process

All three use the same Console TYPE_ID, but they are different objects with different authority. The manifest/CapSet should record the expected schema TYPE_ID as interface metadata for typed handle construction. Normal CALL SQEs do not need to repeat it because the kernel or serving transport can derive it from the target capability entry. CapSqe keeps reserved tail padding for ABI stability.

The kernel exposes the initial CapSet to each process as a read-only 4 KiB page mapped at capos_config::capset::CAPSET_VADDR and passes its address in RDX to _start. The page starts with a CapSetHeader { magic, version, count } and is followed by CapSetEntry { cap_id, name_len, interface_id, name: [u8; 32] } records in manifest declaration order. Userspace looks up caps by the manifest name rather than by numeric index (capos_config::capset::find), so grants can be reordered in system.cue without breaking clients. The mapping is installed without WRITABLE so userspace cannot mutate its own bootstrap authority map.
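The page layout described above can be sketched as follows. The record shapes and the lookup-by-name behavior follow the text; the exact integer widths for magic/version/count and this `find` signature are assumptions, not the real capos-config definitions.

```rust
// Layout sketch of the read-only CapSet page. Field widths for the header
// are assumed; entry fields mirror the text.
#[repr(C)]
pub struct CapSetHeader {
    pub magic: u32,
    pub version: u32,
    pub count: u32, // number of CapSetEntry records following the header
}

#[repr(C)]
pub struct CapSetEntry {
    pub cap_id: u32,
    pub name_len: u8,
    pub interface_id: u64,
    pub name: [u8; 32], // manifest name; only the first name_len bytes are used
}

// Look up a grant by manifest name rather than numeric index, so grants can
// be reordered in system.cue without breaking clients.
pub fn find<'a>(entries: &'a [CapSetEntry], name: &str) -> Option<&'a CapSetEntry> {
    entries.iter().find(|e| {
        let n = e.name_len as usize;
        n <= e.name.len() && &e.name[..n] == name.as_bytes()
    })
}
```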

Security invariant: a CapTable entry exposes one public interface. If the same backing state must be available through multiple interfaces, mint multiple capability entries, each wrapping the same state with a narrower interface. Do not grant one handle that accepts unrelated interface_id values; that makes hidden authority easy to miss during review.

Invocation Path

Capabilities are invoked via a shared-memory capability ring (io_uring-inspired). Each process has a submission queue (SQ) and completion queue (CQ) mapped into its address space. Two invocation paths exist:

Caller builds capnp params message
    → serialize to bytes (write_message_to_words)
    → write CALL SQE to SQ ring (pure userspace memory write)
    → advance SQ tail
    → caller invokes cap_enter for ordinary capability methods
      (timer polling only runs explicitly interrupt-safe CALL targets)
    → kernel reads SQE, validates user buffers
    → CapTable.call(cap_id, method_id, bytes)
    → kernel writes CQE to CQ ring
    ... caller reads CQE after cap_enter, or spin-polls only for
        interrupt-safe/non-CALL ring work ...
    → caller reads CQE result
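The pure-userspace "write SQE, advance tail" step in the flow above can be illustrated with a simplified single-producer ring. The `Sqe` field names here are illustrative stand-ins; the real CapSqe layout lives in capos-config, and the real ring is shared memory rather than a Vec.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Simplified single-producer submission ring: write the entry, then publish
// it with a release store to the tail. No syscall is involved in either step.
#[derive(Clone, Copy, Default)]
struct Sqe {
    cap_id: u32,
    method_id: u16,
    param_addr: u64,
    param_len: u32,
}

struct SubmissionQueue {
    entries: Vec<Sqe>, // shared-memory slots in the real ring
    head: AtomicU32,   // advanced by the consumer (kernel)
    tail: AtomicU32,   // advanced by the producer (userspace)
}

impl SubmissionQueue {
    fn push(&mut self, sqe: Sqe) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let cap = self.entries.len() as u32;
        if tail.wrapping_sub(head) == cap {
            return false; // ring full; caller must wait for CQEs
        }
        let idx = (tail % cap) as usize;
        self.entries[idx] = sqe; // plain memory write
        // Publish: the kernel must observe the entry before the new tail.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }
}
```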

CapObject::call does not receive a caller-supplied interface ID. The cap table derives the invoked interface from the target entry before invoking the object. The SQE carries only the capability handle and method ID because each capability entry owns one public interface:

pub trait CapObject: Send + Sync {
    fn interface_id(&self) -> u64;
    fn label(&self) -> &str;
    fn call(
        &self,
        method_id: u16,
        params: &[u8],
        result: &mut [u8],
        reply_scratch: &mut dyn ReplyScratch,
    ) -> capnp::Result<CapInvokeResult>;
}

All communication goes through serialized capnp messages, even when caller and callee are in the same address space. This ensures the wire format is always exercised and makes the transition to cross-address-space IPC seamless.

The result buffer is supplied by the caller (the user-validated SQE result region). Implementations serialize directly into it and return the number of bytes written, so the kernel’s dispatch path does not allocate an intermediate Vec<u8> per invocation.

Capability Table

Each process has its own capability table (CapTable), created at process startup. The kernel also maintains a global table (KERNEL_CAPS) for kernel-internal use. Each table maps a CapId (u32) to a boxed CapObject.

CapId encoding: [generation:8 | index:24]. The generation counter increments when a slot is freed, so stale CapIds (from a previous occupant of the slot) are rejected with CapError::StaleGeneration rather than accidentally referring to a different capability.
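The encoding above can be sketched directly; the helper names here are illustrative, not the kernel's actual API, and only the bit layout is taken from the text.

```rust
// [generation:8 | index:24] CapId packing.
const INDEX_BITS: u32 = 24;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

fn encode(generation: u8, index: u32) -> u32 {
    debug_assert!(index <= INDEX_MASK);
    ((generation as u32) << INDEX_BITS) | index
}

fn generation(cap_id: u32) -> u8 {
    (cap_id >> INDEX_BITS) as u8
}

fn index(cap_id: u32) -> u32 {
    cap_id & INDEX_MASK
}

// A lookup is stale when the slot's current generation no longer matches the
// generation packed into the presented CapId.
fn is_stale(cap_id: u32, slot_generation: u8) -> bool {
    generation(cap_id) != slot_generation
}
```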

Operations:

  • insert(obj) – register a new capability, returns its CapId
  • get(id) – look up a capability by ID (validates generation)
  • remove(id) – revoke a capability, bumps slot generation
  • call(id, method_id, params) – dispatch a method call against the interface assigned to the capability entry

Each service receives capabilities from cap::create_all_service_caps(), which runs a two-pass resolution over the whole manifest: pass 1 materializes each service’s kernel-sourced caps as Arc<dyn CapObject> and records its declared exports; pass 2 assembles each service’s CapTable in declaration order, cloning the exported Arc when another service’s CapRef resolves via CapSource::Service. Declaration order is preserved because numeric CapIds are assigned by insertion order and smoke tests depend on specific indices. CapRef.source is a structured capnp union, not an authority string:

struct CapRef {
    name @0 :Text;
    expectedInterfaceId @1 :UInt64;
    union {
        unset @2 :Void; # invalid; keeps omitted sources fail-closed
        kernel @3 :KernelCapSource;
        service @4 :ServiceCapSource;
    }
}

enum KernelCapSource {
    console @0;
    endpoint @1;
    frameAllocator @2;
    virtualMemory @3;
}

struct ServiceCapSource {
    service @0 :Text;
    export @1 :Text;
}

The source selector chooses the object or authority to grant. The expectedInterfaceId value is a schema compatibility check against the constructed object, not the authority selector itself. This distinction matters because different objects can implement the same interface.
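Under heavily simplified types (an `Arc<String>` stands in for `Arc<dyn CapObject>`, exports are pre-materialized, and the expectedInterfaceId check is omitted), the two-pass resolution described above might look like:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for Arc<dyn CapObject>; Arc::ptr_eq lets us verify that a service
// grant aliases the exporter's object rather than a fresh copy.
type Obj = Arc<String>;

enum CapSource {
    Kernel(&'static str),                        // e.g. console, endpoint
    Service { service: String, export: String }, // another service's export
}

struct CapRef { name: String, source: CapSource }
struct Service { name: String, grants: Vec<CapRef>, exports: Vec<(String, Obj)> }

fn resolve(services: &[Service]) -> HashMap<String, Vec<(String, Obj)>> {
    // Pass 1: index every declared export by (service, export) name.
    let mut exports: HashMap<(String, String), Obj> = HashMap::new();
    for s in services {
        for (export, obj) in &s.exports {
            exports.insert((s.name.clone(), export.clone()), obj.clone());
        }
    }
    // Pass 2: assemble each table in declaration order, so numeric CapIds
    // (assigned by insertion order) stay stable.
    let mut tables = HashMap::new();
    for s in services {
        let table = s.grants.iter().map(|g| {
            let obj = match &g.source {
                CapSource::Kernel(kind) => Arc::new(kind.to_string()),
                CapSource::Service { service, export } => exports
                    .get(&(service.clone(), export.clone()))
                    .expect("unresolved service export") // fail closed
                    .clone(),
            };
            (g.name.clone(), obj)
        }).collect();
        tables.insert(s.name.clone(), table);
    }
    tables
}
```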

Transport-Level Capability Lifetime

Cap’n Proto applications do not usually model capability lifetime as an application method on every interface. The RPC transport owns capability reference bookkeeping.

The standard Cap’n Proto RPC protocol is stateful per connection. Each side keeps four tables: questions, answers, imports, and exports. Import/export IDs are connection-local, not global object names. When an exported capability is sent over the connection, the export reference count is incremented. When the importing side drops its last local reference, the transport sends Release to decrement the remote export count. Implementations may batch these releases. If the connection is lost, in-flight questions fail, imports become broken, and exports/answers are implicitly released. Persistent capabilities, when implemented, are a separate SturdyRef mechanism and should not be treated as owned pointers.

This distinction matters for capOS:

  • close() is application protocol. A File.close() method can flush dirty state, commit metadata, or tell a server that a session should end.
  • Release / cap drop is transport protocol. It removes one reference from the caller’s local capability namespace and eventually lets the serving side reclaim the object if no references remain.
  • Process exit is bulk transport cleanup. Dropping the process must release all caps in its table, cancel pending calls, and wake peers waiting on those calls.

capOS therefore needs a system transport layer in the userspace runtime (capos-rt / later language runtimes), not just raw SQE helpers. That transport should own typed client handles, local reference counts, promise-pipelined answers, and broken-cap state. When the last local handle is dropped, it should submit a transport-level release operation to the kernel ring.

Ordinary handle release is a transport concern, not an application method. The target design: the generated client drops the last local handle (RAII / GC / finalizer), the runtime transport submits the CAP_OP_RELEASE ring opcode, and the kernel removes the caller’s CapTable slot with mutable access to that table. Encoding release as a regular method call on CapabilityManager was rejected because it would mutate the same table used to dispatch the call; CapabilityManager is therefore management-only (list(), later grant()), not the default release path. CAP_OP_FINISH remains reserved in the same transport opcode namespace for application-level “end of work” signals that the transport must deliver reliably, so the kernel can tell them apart from a truly malformed opcode.

Current status: the kernel dispatches CAP_OP_RELEASE as a local cap-table slot removal and fails closed for stale or non-owned cap IDs. capos-rt bootstrap handles remain explicitly non-owning, while adopted owned handles queue CAP_OP_RELEASE on final drop. Result-cap adoption validates the kernel-supplied interface ID before producing an owned typed handle. CAP_OP_FINISH remains reserved and returns CAP_ERR_UNSUPPORTED_OPCODE. Process exit remains the fallback cleanup path for unreleased local slots.

Access Control: Interfaces, Not Rights Bitmasks

capOS deliberately does not use a rights bitmask (READ/WRITE/EXECUTE) on capability entries, despite this being standard in Zircon and seL4. The reason is that Cap’n Proto typed interfaces already serve as the access control mechanism, and a parallel rights system creates an impedance mismatch.

Why rights bitmasks exist in other systems: Zircon and seL4 use rights because their syscall interfaces are untyped – a handle is an opaque reference to a kernel object, and the kernel needs something to decide which fixed syscalls are allowed. capOS has typed interfaces where the .capnp schema defines exactly what methods exist.

capOS’s approach: the interface IS the permission. To restrict what a caller can do, grant a narrower capability:

  • Fetch (full HTTP) → HttpEndpoint (scoped to one origin)
  • Store (read-write) → Store wrapper that rejects write methods
  • Namespace (full) → Namespace scoped to a prefix

The “restricted” capability is a different CapObject implementation that wraps the original. The kernel doesn’t know or care – it dispatches to whatever CapObject is in the slot. Attenuation is userspace/schema logic, not a kernel mechanism.
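A minimal sketch of such a wrapper, using a simplified stand-in for the CapObject trait (no serialization; method IDs and error strings are illustrative):

```rust
// Attenuation is just another trait implementation that filters method IDs
// before delegating. The kernel dispatches to it like any other CapObject
// in a table slot.
trait CapObject {
    fn call(&self, method_id: u16, params: &[u8]) -> Result<Vec<u8>, &'static str>;
}

struct Store; // full read-write object

impl CapObject for Store {
    fn call(&self, method_id: u16, _params: &[u8]) -> Result<Vec<u8>, &'static str> {
        match method_id {
            0 => Ok(b"value".to_vec()), // get
            1 => Ok(Vec::new()),        // put
            _ => Err("unknown method"),
        }
    }
}

// Same interface, narrower authority: rejects the write method up front.
struct ReadOnlyStore<T: CapObject>(T);

impl<T: CapObject> CapObject for ReadOnlyStore<T> {
    fn call(&self, method_id: u16, params: &[u8]) -> Result<Vec<u8>, &'static str> {
        if method_id == 1 {
            return Err("read-only capability");
        }
        self.0.call(method_id, params)
    }
}
```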

When transfer control is needed (Stage 6): meta-rights for the capability itself (can it be transferred? duplicated?) may be added as a small bitmask. These are about the reference, not the referenced object, and don’t overlap with interface-level method access control.

See research.md for the cross-system analysis that led to this decision (§1 Capability Table Design).

Planned Enhancements (from research)

Tracked in ROADMAP.md Stages 5-6:

  • Badge (from seL4) – u64 value per capability entry, delivered to the server on invocation. Implemented for manifest cap refs, IPC transfer, and ProcessSpawner endpoint-client minting so servers can distinguish callers without separate capability objects per client.
  • Epoch (from EROS) – per-object revocation epoch. Incrementing the epoch invalidates all outstanding references. O(1) revoke, O(1) check.

Current Limitations

  • Blocking wait exists, but waits are still process-level. cap_enter(min_complete, timeout_ns) processes pending SQEs and can block the current process until enough CQEs exist or a finite timeout expires. It is not yet a general futex/thread wait primitive; in-process threading and futex-shaped measurements are tracked separately.
  • No persistence. Capabilities exist only at runtime.
  • Capability transfer is implemented for Endpoint CALL/RECV/RETURN. Transfer descriptors on the capability ring let callers and receivers copy or move transferable local caps through IPC messages. See storage-and-naming-proposal.md “IPC and Capability Transfer” for the full design.
  • Transfer ABI (3.6.0 draft). Sideband transfer descriptors are defined in capos-config/src/ring.rs as CapTransferDescriptor:
    • cap_id is the sender-side local capability-table handle.
    • transfer_mode is either CAP_TRANSFER_MODE_COPY or CAP_TRANSFER_MODE_MOVE.
    • xfer_cap_count in CapSqe is the descriptor count.
    • For CALL/RETURN, descriptors are packed at addr + len after the payload bytes and must be aligned to CAP_TRANSFER_DESCRIPTOR_ALIGNMENT.
    • Result-cap insertion semantics are defined by CapCqe: result reports normal payload bytes, while cap_count reports how many CapTransferResult { cap_id, interface_id } records were appended immediately after those payload bytes in result_addr when CAP_CQE_TRANSFER_RESULT_CAPS is set. User space must bound-check result + cap_count * CAP_TRANSFER_RESULT_SIZE against its requested result_len.
    • Transfer-bearing SQEs are fail-closed:
      • unsupported-by-kernel-transfer path: CAP_ERR_TRANSFER_NOT_SUPPORTED (until sideband transfer is enabled),
      • malformed descriptor metadata (invalid mode, reserved bits, non-zero _reserved0, misalignment, overflow): CAP_ERR_INVALID_TRANSFER_DESCRIPTOR,
      • all other reserved-field misuse remains CAP_ERR_INVALID_REQUEST.
  • No revocation propagation. Removing a table entry doesn’t invalidate copies or derived capabilities. Epoch-based revocation is planned.
  • No bulk data path. All data goes through capnp message copy. SharedBuffer / MemoryObject capability needed for file I/O, networking, GPU data plane. See storage-and-naming-proposal.md “Shared Memory for Bulk Data” for the interface design.
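The userspace bound check from the Transfer ABI bullet above can be sketched as follows. The 16-byte record size is an assumption about the packed width of CapTransferResult (u32 cap_id plus padding plus u64 interface_id), not a value taken from capos-config.

```rust
// Bound check for CapCqe transfer results: `result_bytes` payload bytes are
// followed by `cap_count` CapTransferResult records, and the whole region
// must fit within the requested result_len.
const CAP_TRANSFER_RESULT_SIZE: usize = 16; // assumed record width

fn transfer_results_fit(result_bytes: usize, cap_count: usize, result_len: usize) -> bool {
    cap_count
        .checked_mul(CAP_TRANSFER_RESULT_SIZE)
        .and_then(|records| result_bytes.checked_add(records))
        .map(|end| end <= result_len)
        .unwrap_or(false) // any arithmetic overflow fails closed
}
```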

Future Directions

  • Capability transfer. Cross-process capability calls already go through the kernel via Endpoint objects with RECV/RETURN SQE opcodes on the existing per-process capability ring (no new syscalls). The remaining transfer work will carry capability references with sideband descriptors and install result caps in the receiver’s local table. See storage-and-naming-proposal.md for how this enables Directory.open() returning File caps, Namespace.sub() returning scoped Namespace caps, etc.
  • Persistence. Serialize capability state to storage using capnp format. Restore capabilities across reboots.
  • Network transparency. Forward capability calls to remote machines using the same capnp wire format. A remote Console capability looks identical to a local one.