
Fuchsia Zircon Kernel: Research Report for capOS

Research into Zircon’s design to inform capOS decisions on the capability model, IPC, virtual memory, async I/O, and interface definition.

1. Handle-Based Capability Model

Overview

Zircon implements capabilities as handles. A handle is a process-local integer (similar to a Unix file descriptor) that references a kernel object and carries a bitmask of rights. The kernel maintains a per-process handle table that maps handle values to (kernel_object_pointer, rights) pairs. Processes can only interact with kernel objects through handles they hold.

There is no ambient authority in Zircon. A process cannot address kernel objects by name, path, or global ID – it must possess a handle. The initial set of handles is passed to a process at creation time by its parent (or by the component framework).

Handle Representation

Internally, a handle is:

  • A process-local 32-bit integer (the “handle value”). The low two bits encode a generation counter to detect use-after-close.
  • A reference to a kernel object (refcounted Dispatcher in Zircon’s C++).
  • A rights bitmask (zx_rights_t, a uint32_t).

The handle table is per-process, so handle value 0x1234 in process A and 0x1234 in process B refer to completely different objects (or nothing).

Rights

Rights are a bitmask that constrain what operations a handle can perform. Key rights include:

Right                     Meaning
ZX_RIGHT_DUPLICATE        Can be duplicated via zx_handle_duplicate()
ZX_RIGHT_TRANSFER         Can be sent through a channel
ZX_RIGHT_READ             Can read data (channel messages, VMO bytes)
ZX_RIGHT_WRITE            Can write data
ZX_RIGHT_EXECUTE          VMO can be mapped as executable
ZX_RIGHT_MAP              VMO can be mapped into a VMAR
ZX_RIGHT_GET_PROPERTY     Can query object properties
ZX_RIGHT_SET_PROPERTY     Can modify object properties
ZX_RIGHT_SIGNAL           Can set user signals on the object
ZX_RIGHT_WAIT             Can wait on the object’s signals
ZX_RIGHT_MANAGE_PROCESS   Can perform management ops on a process
ZX_RIGHT_MANAGE_THREAD    Can manage threads

When a syscall is invoked on a handle, the kernel checks that the handle’s rights include the rights required by that syscall. For example, zx_channel_write() requires ZX_RIGHT_WRITE on the channel handle.

Rights can only be reduced, never amplified. zx_handle_duplicate() takes a rights mask and the new handle gets original_rights & requested_rights.
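The attenuation rule is small enough to state in code. A sketch of the masking behavior described above, with illustrative bit values standing in for the real zx_rights_t constants:

```python
# Sketch of Zircon-style rights attenuation on duplicate.
# Bit values are illustrative, not the real zx_rights_t constants.
RIGHT_DUPLICATE = 1 << 0
RIGHT_TRANSFER  = 1 << 1
RIGHT_READ      = 1 << 2
RIGHT_WRITE     = 1 << 3

def handle_duplicate(original_rights: int, requested_rights: int) -> int:
    """New handle gets the intersection: rights can shrink, never grow."""
    if not original_rights & RIGHT_DUPLICATE:
        raise PermissionError("source handle lacks DUPLICATE right")
    return original_rights & requested_rights

# A read/write handle duplicated with a read-only mask yields a read-only handle.
src = RIGHT_DUPLICATE | RIGHT_READ | RIGHT_WRITE
assert handle_duplicate(src, RIGHT_READ) == RIGHT_READ
# Rights the source does not hold cannot be gained through the mask.
assert handle_duplicate(src, RIGHT_READ | RIGHT_TRANSFER) == RIGHT_READ
```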

Handle Lifecycle

Creation: Syscalls that create kernel objects return handles. For example, zx_channel_create() returns two handles (one for each endpoint). zx_vmo_create() returns a VMO handle. The initial rights are defined per object type (e.g., a new channel endpoint gets READ|WRITE|TRANSFER|DUPLICATE|SIGNAL|WAIT).

Duplication: zx_handle_duplicate(handle, rights) -> new_handle. Creates a second handle to the same kernel object, possibly with reduced rights. The original is untouched. Requires ZX_RIGHT_DUPLICATE on the source handle.

Transfer: Handles are transferred through channels. When a message is written to a channel, handles listed in the message are moved from the sender’s handle table to a transient state inside the channel message. When the message is read, those handles are installed into the receiver’s handle table with new handle values. The original handle values in the sender become invalid. Transfer requires ZX_RIGHT_TRANSFER on each handle being sent.

Replacement: zx_handle_replace(handle, rights) -> new_handle. Atomically invalidates the old handle and creates a new one with the specified rights (must be a subset). This avoids a window where two handles exist simultaneously (unlike duplicate-then-close). Useful for reducing rights before transferring.

Closing: zx_handle_close(handle). Removes the handle from the process’s table and decrements the kernel object’s refcount. When the last handle to an object is closed, the object is destroyed (with some exceptions like the kernel itself keeping references).
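A toy model of the close path (names and structure are illustrative, not Zircon internals):

```python
# Toy model of handle close semantics: the kernel object is destroyed when
# its last handle is closed. Names are illustrative, not Zircon API.
class KernelObject:
    def __init__(self):
        self.refcount = 0
        self.destroyed = False
    def destroy(self):
        self.destroyed = True

class HandleTable:
    def __init__(self):
        self._table = {}
        self._next = 1
    def install(self, obj: KernelObject) -> int:
        obj.refcount += 1
        hv, self._next = self._next, self._next + 1
        self._table[hv] = obj
        return hv
    def close(self, hv: int):
        obj = self._table.pop(hv)
        obj.refcount -= 1
        if obj.refcount == 0:
            obj.destroy()

table = HandleTable()
obj = KernelObject()
h1 = table.install(obj)   # e.g. returned by a create syscall
h2 = table.install(obj)   # e.g. a duplicate of h1
table.close(h1)
assert not obj.destroyed  # another handle still references the object
table.close(h2)
assert obj.destroyed      # last close destroys it
```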

Comparison to capOS

capOS’s current CapTable maps CapId (u32) to an Arc<dyn CapObject>. The shared Arc lets a single kernel capability (for example, a kernel:endpoint owned by one service and referenced by another through CapSource::Service) back multiple per-process CapTable slots for cross-process IPC. This is conceptually similar to Zircon’s handle table, but with key differences:

Aspect              Zircon                                     capOS (current)
Rights              Bitmask per handle                         None (all-or-nothing)
Object types        Fixed kernel types (Channel, VMO, etc.)    Extensible via CapObject trait
Transfer            Move semantics through channels            Copy/move descriptors through Endpoint IPC
Duplication         Explicit with rights reduction             Copy transfer for transferable holds
Revocation          Close handle; object dies with last ref    Remove from table; no propagation
Interface           Fixed syscalls per object type             Cap’n Proto method dispatch
Generation counter  Low bits of handle value                   Upper bits of CapId

Recommendations for capOS:

  1. Keep method authority in typed interfaces for now. Zircon’s rights bitmask is useful for an untyped syscall surface. capOS currently uses narrow Cap’n Proto interfaces plus hold-edge transfer metadata; generic READ/WRITE flags would duplicate schema-level authority unless a concrete cross-interface need appears.

  2. Handle generation counters. Implemented: capOS encodes a generation tag in the upper bits of CapId, with lower bits selecting the table slot. This catches stale CapId use after slot reuse.

  3. Move semantics for transfer. Implemented for Endpoint CALL/RETURN sideband descriptors. Copy transfer remains explicit and requires a transferable source hold.

  4. replace operation. An atomic replace (invalidate old, create new with reduced rights) is cleaner than duplicate-then-close for rights attenuation before transfer.
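The generation-tag scheme in item 2 can be sketched as follows. The 8-bit/24-bit split and the CapTable shape are hypothetical, chosen only to illustrate stale-id detection:

```python
# Hypothetical CapId layout: upper 8 bits = generation tag, lower 24 = slot index.
GEN_BITS, SLOT_BITS = 8, 24
SLOT_MASK = (1 << SLOT_BITS) - 1

def make_cap_id(slot: int, gen: int) -> int:
    return ((gen & ((1 << GEN_BITS) - 1)) << SLOT_BITS) | (slot & SLOT_MASK)

class CapTable:
    def __init__(self, size=16):
        self.gens = [0] * size          # generation counter per slot
        self.objs = [None] * size
    def insert(self, slot, obj):
        self.gens[slot] += 1            # bump generation on slot reuse
        self.objs[slot] = obj
        return make_cap_id(slot, self.gens[slot])
    def lookup(self, cap_id):
        slot, gen = cap_id & SLOT_MASK, cap_id >> SLOT_BITS
        if self.gens[slot] != gen or self.objs[slot] is None:
            raise KeyError("stale or invalid CapId")
        return self.objs[slot]

t = CapTable()
old = t.insert(3, "endpoint-A")
assert t.lookup(old) == "endpoint-A"
new = t.insert(3, "endpoint-B")       # slot reused for a different object
assert t.lookup(new) == "endpoint-B"
try:
    t.lookup(old)                     # stale CapId is caught, not misrouted
    assert False
except KeyError:
    pass
```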

2. Channels

Overview

Zircon channels are the fundamental IPC primitive. A channel is a bidirectional, asynchronous message-passing pipe with two endpoints. Each endpoint is a separate kernel object referenced by a handle.

Creation and Structure

zx_channel_create(options, &handle0, &handle1) creates a channel and returns handles to both endpoints. Each endpoint can be independently transferred to different processes. When one endpoint is closed, the other becomes “peer-closed” (signaled with ZX_CHANNEL_PEER_CLOSED).

Message Format

A channel message consists of:

  • Data: Up to 65,536 bytes (64 KiB) of arbitrary byte payload.
  • Handles: Up to 64 handles transferred with the message.

Messages are discrete and ordered (FIFO). There is no streaming or partial reads – you read a complete message or nothing.

Write and Read Syscalls

Write: zx_channel_write(handle, options, bytes, num_bytes, handles, num_handles)

  • Copies bytes into the kernel message queue.
  • Moves each handle in the handles array from the caller’s handle table into the message. If any handle is invalid or lacks ZX_RIGHT_TRANSFER, the entire write fails and no handles are moved.
  • The write is non-blocking. If the peer has been closed, returns ZX_ERR_PEER_CLOSED.

Read: zx_channel_read(handle, options, bytes, handles, num_bytes, num_handles, actual_bytes, actual_handles)

  • Dequeues the next message. Copies data into bytes, installs handles into the caller’s handle table, writing new handle values into the handles array.
  • If the buffer is too small, returns ZX_ERR_BUFFER_TOO_SMALL and fills actual_bytes/actual_handles so the caller can retry with a larger buffer.
  • Non-blocking by default.

zx_channel_call: A synchronous call primitive. Writes a message to the channel, then blocks waiting for a reply with a matching transaction ID. This is the primary mechanism for client-server RPC. The kernel optimizes this path to avoid unnecessary scheduling: if the server thread is waiting to read, the kernel can directly switch to it (similar to L4 IPC optimizations).

Handle Transfer Mechanics

When handles are sent through a channel:

  1. The kernel validates all handles (exist, have TRANSFER right).
  2. Handles are atomically removed from the sender’s table.
  3. Handle objects are stored inside the kernel message structure.
  4. On read, handles are inserted into the receiver’s table with fresh handle values.
  5. If the channel is destroyed with unread messages containing handles, those handles are closed (objects’ refcounts decremented).

This is critical: handle transfer is move, not copy. The sender loses the handle. To keep a copy, the sender must duplicate before sending.
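A toy model of the steps above, illustrating the all-or-nothing validation and the move semantics (names are made up, not the Zircon API):

```python
# Toy model of move-semantics handle transfer through a channel.
# RIGHT_TRANSFER and the table layout are illustrative, not Zircon's.
RIGHT_TRANSFER = 1 << 1

class Channel:
    def __init__(self):
        self.queue = []   # FIFO of (bytes, [(object, rights), ...])

def channel_write(channel, sender_table, data, handle_values):
    # 1. Validate every handle before moving any (all-or-nothing).
    for hv in handle_values:
        obj, rights = sender_table[hv]
        if not rights & RIGHT_TRANSFER:
            raise PermissionError("handle lacks TRANSFER right")
    # 2-3. Remove from the sender's table and store inside the message.
    moved = [sender_table.pop(hv) for hv in handle_values]
    channel.queue.append((data, moved))

def channel_read(channel, receiver_table, next_value):
    data, moved = channel.queue.pop(0)
    # 4. Install into the receiver's table under fresh handle values.
    new_values = []
    for entry in moved:
        receiver_table[next_value] = entry
        new_values.append(next_value)
        next_value += 1
    return data, new_values

sender, receiver = {0x10: ("vmo-object", RIGHT_TRANSFER)}, {}
ch = Channel()
channel_write(ch, sender, b"hello", [0x10])
assert 0x10 not in sender                      # sender lost the handle (move)
data, installed = channel_read(ch, receiver, next_value=0x20)
assert data == b"hello" and receiver[0x20][0] == "vmo-object"
```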

Signals

Each channel endpoint has associated signals:

  • ZX_CHANNEL_READABLE – at least one message is queued.
  • ZX_CHANNEL_PEER_CLOSED – the other endpoint was closed.

Processes can wait on these signals using zx_object_wait_one(), zx_object_wait_many(), or by binding to a port (see Section 4).

FIDL Relationship

Channels carry raw bytes + handles. FIDL (Section 5) provides the structured protocol layer on top: it defines how bytes are laid out (message header with transaction ID, ordinal, flags; then the payload) and how handles in the message correspond to protocol-level concepts (client endpoints, server endpoints, VMOs, etc.).

Every FIDL protocol communication happens over a channel. A FIDL “client end” is a channel endpoint handle where the client sends requests and reads responses. A “server end” is the other endpoint where the server reads requests and sends responses.

Comparison to capOS

capOS currently uses shared submission/completion rings with Endpoint objects for cross-process CALL/RECV/RETURN routing. Same-process capabilities dispatch directly through the holder’s table; cross-process Endpoint calls queue to the server ring and can trigger a direct IPC handoff when the receiver is blocked.

Aspect               Zircon Channels                          capOS
Topology             Point-to-point, 2 endpoints              Endpoint-routed capability calls
Async                Non-blocking read/write + signal waits   Shared SQ/CQ rings
Handle/cap transfer  Embedded in messages                     Sideband transfer descriptors
Message format       Raw bytes + handles                      Cap’n Proto serialized
Size limits          64 KiB data, 64 handles                  64 KiB params (current limit)
Buffering            Kernel-side message queue                Endpoint queues plus per-process rings

Recommendations for capOS:

  1. Capability transfer alongside capnp messages. Zircon embeds handles as out-of-band data alongside message bytes. capOS has adopted the same separation with ring sideband transfer descriptors and result-cap records. That keeps the kernel from parsing arbitrary Cap’n Proto payload graphs.

  2. Two-endpoint channels vs. Endpoint calls. Zircon’s channels are general-purpose pipes. capOS uses a lighter Endpoint CALL/RECV/RETURN model where a capability invocation is routed to the serving process rather than requiring a channel object per connection.

  3. Message size limits. Zircon’s 64 KiB limit has been a pain point (large data must go through VMOs). capOS’s capnp messages naturally handle this because large data can be a separate VMO-like capability referenced in the message. Keep the per-message limit reasonable (64 KiB is a good default) and use capability references for bulk data.

3. VMARs and VMOs

Virtual Memory Objects (VMOs)

A VMO is a kernel object representing a contiguous region of virtual memory that can be mapped into address spaces. VMOs are the fundamental unit of memory in Zircon.

Types:

  • Paged VMO: Backed by the page fault handler. Pages are allocated on demand. This is the default.
  • Physical VMO: Backed by a specific contiguous range of physical memory. Used for device MMIO.
  • Contiguous VMO: Like a paged VMO but guarantees physically contiguous pages. Used for DMA.

Key operations:

  • zx_vmo_create(size, options) -> handle: Create a paged VMO.
  • zx_vmo_read(handle, buffer, offset, length): Read bytes from a VMO.
  • zx_vmo_write(handle, buffer, offset, length): Write bytes to a VMO.
  • zx_vmo_get_size() / zx_vmo_set_size(): Query/resize.
  • zx_vmo_op_range(): Operations like commit (force-allocate pages), decommit (release pages back to system), cache ops.

VMOs can be read/written directly via syscalls without mapping them. This is useful for small transfers but less efficient than mapping for large data.

Copy-on-Write (CoW) Cloning

zx_vmo_create_child(handle, options, offset, size) -> child_handle

Creates a child VMO that is a CoW clone of a range within the parent. Several clone types exist:

  • Snapshot (ZX_VMO_CHILD_SNAPSHOT): Point-in-time snapshot. Both parent and child see CoW pages. Writes to either side trigger page copies. The child is fully independent after creation – closing the parent does not affect committed pages in the child.

  • Slice (ZX_VMO_CHILD_SLICE): A window into the parent. No CoW – writes to the slice are visible through the parent and vice versa. The child cannot outlive the parent.

  • Snapshot-at-least-on-write (ZX_VMO_CHILD_SNAPSHOT_AT_LEAST_ON_WRITE): Like snapshot but allows the implementation to share unchanged pages between parent and child more aggressively (pages only diverge when written).

CoW cloning is central to how Fuchsia implements fork()-like semantics for memory (though Fuchsia doesn’t have fork()) and how it shares immutable data (e.g., shared libraries are CoW-cloned VMOs).
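A minimal sketch of snapshot-style divergence. This toy model copies a page on every write rather than tracking per-page sharing, but it shows the observable behavior: pages are shared after the clone and diverge once written:

```python
# Toy model of snapshot-style CoW cloning: parent and child share pages
# until one side writes, which copies just that page.
PAGE = 4096

class Vmo:
    def __init__(self, pages):
        self.pages = pages                      # list of bytearray pages

    def snapshot_child(self):
        # Child initially shares every page object with the parent (no copying).
        return Vmo(list(self.pages))

    def write(self, page_index, offset, data):
        # Copy-on-write: copy the page before modifying it, so the other
        # side keeps seeing the original contents. (A real kernel only
        # copies pages that are actually shared; this sketch always copies.)
        page = bytearray(self.pages[page_index])
        page[offset:offset + len(data)] = data
        self.pages[page_index] = page

parent = Vmo([bytearray(PAGE) for _ in range(4)])
child = parent.snapshot_child()
assert child.pages[0] is parent.pages[0]        # shared after clone
child.write(0, 0, b"child data")
assert child.pages[0] is not parent.pages[0]    # diverged on write
assert parent.pages[0][:10] == bytearray(10)    # parent sees original zeros
```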

Virtual Memory Address Regions (VMARs)

A VMAR represents a contiguous range of virtual address space within a process. VMARs form a tree rooted at the process’s root VMAR, which covers the entire user-accessible address space.

Hierarchy:

Root VMAR (entire user address space)
  +-- Sub-VMAR A (e.g., 0x1000..0x10000)
  |     +-- Mapping of VMO X at offset 0x1000
  |     +-- Sub-VMAR B (0x5000..0x8000)
  |           +-- Mapping of VMO Y at offset 0x5000
  +-- Sub-VMAR C (0x20000..0x30000)
        +-- Mapping of VMO Z at offset 0x20000

Key operations:

  • zx_vmar_map(vmar, options, offset, vmo, vmo_offset, len) -> addr: Map a VMO (or a range of it) into the VMAR at a specific offset or let the kernel choose (ASLR).
  • zx_vmar_unmap(vmar, addr, len): Remove a mapping.
  • zx_vmar_protect(vmar, options, addr, len): Change permissions (read/write/execute) on a mapped range.
  • zx_vmar_allocate(vmar, options, offset, size) -> child_vmar, addr: Create a sub-VMAR.
  • zx_vmar_destroy(vmar): Recursively unmap everything and destroy all sub-VMARs. Prevents new mappings.

ASLR: Zircon implements address space layout randomization through VMARs. Unless the caller pins a mapping to a specific offset (ZX_VM_SPECIFIC), the kernel randomizes placement within the VMAR; ZX_VM_OFFSET_IS_UPPER_LIMIT constrains the randomized placement to fall below a given offset.

Permissions: Mapping permissions (R/W/X) are constrained by the VMO handle’s rights. A VMO handle without ZX_RIGHT_EXECUTE cannot be mapped as executable, regardless of what the zx_vmar_map() call requests.
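The constraint can be sketched directly. The permission constants are illustrative, not the real zx_vm_option_t or zx_rights_t values; as described above, a request exceeding the handle's rights is rejected rather than silently narrowed:

```python
# Sketch of the permission rule: a mapping may not request permissions
# beyond the VMO handle's rights. Constants are illustrative only.
PERM_READ, PERM_WRITE, PERM_EXECUTE = 1, 2, 4

def vmar_map_perms(requested: int, vmo_handle_rights: int) -> int:
    # Any requested bit not present in the handle's rights is an error.
    if requested & ~vmo_handle_rights:
        raise PermissionError("mapping requests permissions the handle lacks")
    return requested

# A VMO handle without EXECUTE cannot back an executable mapping.
rw_handle = PERM_READ | PERM_WRITE
assert vmar_map_perms(PERM_READ, rw_handle) == PERM_READ
try:
    vmar_map_perms(PERM_READ | PERM_EXECUTE, rw_handle)
    assert False
except PermissionError:
    pass
```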

Why VMARs Matter

VMARs provide:

  1. Sandboxing within a process. A component can be given a sub-VMAR handle instead of the root VMAR, limiting where it can map memory.
  2. Hierarchical cleanup. Destroying a VMAR recursively unmaps everything beneath it.
  3. Controlled mapping. The parent decides the address space layout for child components by allocating sub-VMARs and passing only sub-VMAR handles.

Comparison to capOS

capOS currently has AddressSpace plus a VirtualMemory capability for anonymous map/unmap/protect operations. FrameAllocator returns typed MemoryObject ownership caps rather than raw physical frame grants, but MemoryObject does not yet provide mapping, cloning, or zero-copy sharing.

Aspect          Zircon                               capOS (current)
Memory objects  VMO (paged, physical, contiguous)    Owned MemoryObject caps plus anonymous VirtualMemory mappings
CoW             VMO child clones (snapshot, slice)   Not implemented
Address space   VMAR tree                            Flat AddressSpace plus VirtualMemory cap
Sharing         Map same VMO in multiple processes   Not implemented
Permissions     Per-mapping + per-handle rights      Per-page flags at mapping time

Recommendations for capOS:

  1. VMO-equivalent capability. A “MemoryObject” capability that represents a range of memory (backed by demand-paging or physical pages). This becomes the unit of sharing: pass a MemoryObject cap through IPC, and the receiver maps it into their address space. Define it in schema/capos.capnp.

  2. Sub-VMAR capabilities for sandboxing. When spawning a process, instead of granting access to the full address space, grant a sub-region capability. This limits where the process can map memory.

  3. CoW cloning is valuable but not urgent. The primary use case (shared libraries, fork) may not apply to capOS’s early stages. Design the VMO interface to support cloning later.

  4. VMO read/write without mapping. Zircon allows reading/writing VMO contents via syscall without mapping. This is useful for small IPC data and avoids TLB pressure. Consider supporting this in capOS’s MemoryObject.

4. Async Model (Ports)

Overview

Zircon’s async I/O model is built around ports – kernel objects that receive event packets. A port is similar to Linux’s epoll but with important differences. It is the foundation for all async programming in Fuchsia.

Port Basics

A port is a kernel object with a queue of packets (zx_port_packet_t). Packets arrive either from signal-based waits or from direct user queuing.

Key operations:

  • zx_port_create(options) -> handle: Create a port.
  • zx_port_wait(port, deadline) -> packet: Dequeue the next packet, blocking until one is available or the deadline expires.
  • zx_port_queue(port, packet): Manually enqueue a user packet.
  • zx_port_cancel(port, source, key): Cancel pending waits.

Signal-Based Async (Object Wait Async)

zx_object_wait_async(object, port, key, signals, options):

This is the primary mechanism. It tells the kernel: “when object has any of these signals asserted, deliver a packet to port with this key.”

Two modes:

  • One-shot (ZX_WAIT_ASYNC_ONCE): The wait fires once and is automatically removed. The user must re-register after handling.
  • Edge-triggered (ZX_WAIT_ASYNC_EDGE): Fires each time a signal transitions from deasserted to asserted. Stays registered.

Packet Format

typedef struct zx_port_packet {
    uint64_t key;              // User-defined key (set during wait_async)
    uint32_t type;             // ZX_PKT_TYPE_SIGNAL_ONE, ZX_PKT_TYPE_USER, etc.
    zx_status_t status;        // Result status
    union {
        zx_packet_signal_t signal;   // Which signals triggered
        zx_packet_user_t user;       // User-queued packet payload (32 bytes)
        zx_packet_guest_bell_t guest_bell;
        // ... other packet types
    };
} zx_port_packet_t;

The signal variant includes trigger (which signals were waited on), observed (current signal state), and a count (for edge-triggered, how many transitions).
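For concreteness, a packet of this shape can be packed with Python's struct module. The type constant and the exact signal-payload layout here are assumptions for illustration; only the 8 + 4 + 4 + 32 byte overall shape follows the struct above:

```python
import struct

# Pack a zx_port_packet_t-shaped record: u64 key, u32 type, i32 status,
# then a 32-byte union payload. Constants and payload layout below are
# illustrative assumptions, not values from the Zircon headers.
PKT_TYPE_SIGNAL_ONE = 2          # assumed value, for illustration only

def pack_signal_packet(key, status, trigger, observed, count):
    # Assumed zx_packet_signal_t prefix: u32 trigger, u32 observed,
    # u64 count; pad the union out to 32 bytes.
    payload = struct.pack("<IIQ", trigger, observed, count).ljust(32, b"\0")
    return struct.pack("<QIi", key, PKT_TYPE_SIGNAL_ONE, status) + payload

pkt = pack_signal_packet(key=0xfeed, status=0,
                         trigger=0x1, observed=0x3, count=1)
assert len(pkt) == 48            # 8 + 4 + 4 + 32 bytes
key, ptype, status = struct.unpack_from("<QIi", pkt)
assert key == 0xfeed and ptype == PKT_TYPE_SIGNAL_ONE
```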

Async Dispatching (libasync)

Fuchsia’s userspace async libraries (libasync, async-loop) provide a higher-level event loop:

  1. async::Loop: An event loop that owns a port and dispatches events to registered handlers.
  2. async::Wait: Wraps zx_object_wait_async() with a callback. When the signal fires, the loop calls the handler.
  3. async::Task: Runs a closure on the loop’s dispatcher.
  4. FIDL bindings: The async FIDL bindings register channel-readable waits on the loop’s port. When a message arrives, the FIDL dispatcher decodes it and calls the appropriate protocol method handler.

The typical pattern:

loop = async::Loop()              // creates and owns a port via zx_port_create()

// Register interest in channel readability
zx_object_wait_async(channel, loop.port, key, ZX_CHANNEL_READABLE, options)

// Event loop
while (true) {
    packet = zx_port_wait(loop.port, deadline)
    handler = lookup(packet.key)
    handler(packet)
    // Re-register if the wait was one-shot
}

Comparison to Linux io_uring

Aspect              Zircon Ports                               Linux io_uring
Model               Event notification (signals)               Operation submission/completion
Submission          No SQ; operations are separate syscalls    SQ ring: batch operations
Completion          Port packet queue                          CQ ring in shared memory
Kernel transitions  One per wait_async + one per port_wait     One per io_uring_enter (batched)
Memory sharing      No shared ring buffers                     SQ/CQ are mmap’d shared memory
Zero-copy           Not for port packets                       Registered buffers, fixed files
Batching            No inherent batching                       Core design: submit N ops, one syscall
Chaining            Not supported                              SQE linking (sequential/parallel)
Scope               Signal notification only                   Full I/O operations (read, write, send, recv, fsync, …)

Key differences:

  1. Ports are notification-based; io_uring is operation-based. A port tells you “something happened” (a signal was asserted), then you do separate syscalls to act on it (read the channel, accept the socket, etc.). io_uring lets you submit the actual I/O operation and the kernel does it asynchronously, returning the result in the completion ring.

  2. io_uring avoids syscalls for submission. The submission queue is shared memory – userspace writes SQEs and the kernel reads them without a syscall (in polling mode) or with a single io_uring_enter() for a batch of operations. Ports require a syscall per wait_async registration.

  3. io_uring supports chaining. SQE linking allows dependent operations (e.g., “read from file, then write to socket”) without returning to userspace between steps.

  4. Ports are simpler. The signal model is straightforward and composes well with Zircon’s object model. io_uring’s complexity (dozens of opcodes, registered buffers, fixed files, kernel-side polling) is much higher.

Performance Tradeoffs

Ports:

  • Pro: Simple, well-integrated with kernel object model, easy to reason about.
  • Con: Extra syscalls per operation (wait_async to register, port_wait to receive, then the actual operation syscall). At least 3 syscalls per async operation.

io_uring:

  • Pro: Can batch many operations in a single syscall. Shared-memory rings avoid copies. Kernel-side polling can eliminate syscalls entirely.
  • Con: Complex API surface, security attack surface (many kernel bugs have been in io_uring), complex state management.

Comparison to capOS’s Planned Async Rings

capOS plans io_uring-inspired capability rings: an SQ where userspace submits capnp-serialized capability invocations and a CQ where the kernel posts completions.

Aspect           Zircon Ports                       capOS Planned Rings
Submission       Separate syscalls                  SQ in shared memory
Completion       Port packet queue (kernel-owned)   CQ in shared memory
Operation scope  Signal notification only           Full capability invocations
Batching         None                               Natural (fill SQ, single syscall)
Wire format      Fixed packet struct                Cap’n Proto messages

Recommendations for capOS:

  1. The io_uring model is better than ports for capOS’s use case. Since every operation in capOS is a capability invocation (not just a signal notification), putting the full operation in the submission ring eliminates the extra round-trip that ports require. This is the right choice.

  2. Keep a signal/notification mechanism too. Even with async rings, capOS needs a way to wait for events (e.g., “data available on this channel”, “process exited”). Consider a simple signal/wait mechanism alongside the capability rings – perhaps signal delivery goes through the CQ as a special completion type.

  3. Study io_uring’s SQE linking. Chaining dependent capability calls (e.g., “read from FileStore, then write to Console”) without returning to userspace is powerful. This maps naturally to Cap’n Proto promise pipelining: “call method A on cap X, then call method B on the result’s capability” – the kernel can chain these internally.

  4. Registered/fixed capabilities. io_uring has “fixed files” (registered fd set for faster lookup). capOS could have a “hot set” of capabilities pinned in the SQ context for faster dispatch (avoid per-call table lookup).

  5. Completion ordering. io_uring completions can arrive out of order. capOS’s CQ should also support out-of-order completion (each SQE has a user_data tag echoed in the CQE) to enable true async pipelining.
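Item 5's tag-matching scheme in miniature. Everything here is a toy model (plain deques standing in for shared-memory rings):

```python
# Sketch of out-of-order completion matching: each submission carries a
# user_data tag that is echoed in its completion entry, so completions
# can be matched to requests in any order. Names are hypothetical.
import collections

pending = {}                    # user_data -> description of in-flight op
sq = collections.deque()        # submission ring (toy)
cq = collections.deque()        # completion ring (toy)

def submit(user_data, op):
    pending[user_data] = op
    sq.append({"user_data": user_data, "op": op})

def kernel_complete(user_data, result):
    cq.append({"user_data": user_data, "result": result})

submit(1, "read FileStore")
submit(2, "write Console")
# The kernel may finish the second operation first.
kernel_complete(2, "ok")
kernel_complete(1, b"file bytes")

results = {}
while cq:
    cqe = cq.popleft()
    op = pending.pop(cqe["user_data"])      # match by tag, not by order
    results[op] = cqe["result"]

assert results == {"write Console": "ok", "read FileStore": b"file bytes"}
```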

5. FIDL (Fuchsia Interface Definition Language)

Overview

FIDL is Fuchsia’s IDL for defining protocols that communicate over channels. It serves a similar role to Cap’n Proto schemas in capOS: defining the contract between client and server.

FIDL vs. Cap’n Proto: Schema Language

FIDL example:

library fuchsia.example;

type Color = strict enum : uint32 {
    RED = 1;
    GREEN = 2;
    BLUE = 3;
};

protocol Painter {
    SetColor(struct { color Color; }) -> ();
    DrawLine(struct { x0 float32; y0 float32; x1 float32; y1 float32; }) -> ();
    -> OnPaintComplete(struct { num_pixels uint64; });
};

Equivalent Cap’n Proto:

enum Color { red @0; green @1; blue @2; }

interface Painter {
    setColor @0 (color :Color) -> ();
    drawLine @1 (x0 :Float32, y0 :Float32, x1 :Float32, y1 :Float32) -> ();
}

Key differences in the schema language:

Feature           FIDL                                   Cap’n Proto
Unions            flexible union, strict union           Anonymous unions in structs
Enums             strict enum, flexible enum             enum (always strict)
Optionality       box<T>, nullable types                 Default values, union with Void
Evolution         flexible keyword for forward compat    Field numbering, @N ordinals
Tables            table (like protobuf, sparse)          struct with default values
Events            -> EventName(...) server-sent          No built-in events
Error syntax      -> () error uint32                     Must encode in return struct
Capability types  client_end:P, server_end:P             interface P as field type
FIDL’s table type is analogous to Cap’n Proto structs in terms of evolvability (can add fields without breaking), but Cap’n Proto structs are more compact on the wire (fixed-size inline section + pointers) while FIDL tables use an envelope-based encoding.

Wire Format Comparison

FIDL wire format:

  • Little-endian, 8-byte aligned.
  • Messages have a 16-byte header: txid (4 bytes), flags (3 bytes), magic byte (0x01), ordinal (8 bytes).
  • Structs are laid out inline with natural alignment and explicit padding.
  • Out-of-line data (strings, vectors, tables) uses offset-based indirection via “envelopes” (inline 8-byte entry: 4 bytes num_bytes, 2 bytes num_handles, 2 bytes flags).
  • Handles are out-of-band. The wire format contains ZX_HANDLE_PRESENT (0xFFFFFFFF) or ZX_HANDLE_ABSENT (0x00000000) markers where handles appear. The actual handles are in the channel message’s handle array, consumed in order of appearance in the linearized message.
  • Encoding is done into a contiguous byte buffer + a separate handle array, matching the channel write API.
  • No pointer arithmetic. FIDL v2 uses a “depth-first traversal order” encoding where out-of-line objects are laid out sequentially. Offsets are not stored; the decoder walks the type schema to find boundaries.
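The 16-byte header can be illustrated with struct packing. The magic value 0x01 follows the layout described above; the flag bytes and ordinal value are placeholders:

```python
import struct

# Pack a FIDL transactional message header: 4-byte txid, 3 flag bytes,
# 1 magic byte (0x01), then an 8-byte method ordinal. The ordinal value
# below is made up for illustration.
FIDL_MAGIC = 0x01

def pack_fidl_header(txid: int, ordinal: int) -> bytes:
    return struct.pack("<I3sBQ", txid, b"\x00\x00\x00", FIDL_MAGIC, ordinal)

hdr = pack_fidl_header(txid=7, ordinal=0x1234_5678_9ABC_DEF0)
assert len(hdr) == 16
txid, _flags, magic, ordinal = struct.unpack("<I3sBQ", hdr)
assert txid == 7 and magic == FIDL_MAGIC
```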

Cap’n Proto wire format:

  • Little-endian, 8-byte aligned (word-based).
  • Messages have a segment table header listing segment sizes.
  • Structs have a fixed data section + pointer section. Pointers are relative offsets (self-relative, in words).
  • Uses pointer-based random access: can read any field without parsing the entire message.
  • Capabilities are indexed. Cap’n Proto’s RPC protocol assigns capability table indices to interface references in messages. The actual capability (file descriptor, handle, etc.) is transferred out-of-band.
  • Supports multi-segment messages (FIDL is always single-segment).
  • Zero-copy read: can read directly from the wire buffer without deserialization.

Key wire format differences:

Property             FIDL                                        Cap’n Proto
Random access        No (sequential decode)                      Yes (pointer-based)
Zero-copy read       Partial (decode-on-access for some types)   Full (read from buffer)
Segments             Single contiguous buffer                    Multi-segment
Pointers             Implicit (traversal order)                  Explicit (relative offsets)
Size overhead        Smaller (no pointer words)                  Larger (pointer section)
Decode cost          Must validate sequentially                  Can validate lazily
Handle/cap encoding  Presence markers + out-of-band array        Cap table indices + out-of-band

FIDL Capability Transfer

FIDL has first-class syntax for capability transfer in protocols:

protocol FileSystem {
    Open(resource struct {
        path string:256;
        flags uint32;
        object server_end:File;
    }) -> ();
};

protocol File {
    Read(struct { count uint64; }) -> (struct { data vector<uint8>:MAX; });
    GetBuffer(struct { flags uint32; }) -> (resource struct { buffer zx.Handle:VMO; });
};

  • server_end:File – a channel endpoint where the server will serve the File protocol. The client creates a channel, keeps the client end, and sends the server end through this call.
  • client_end:File – a channel endpoint for a client of the File protocol.
  • zx.Handle:VMO – a handle to a specific kernel object type (VMO).
  • The resource keyword marks types that contain handles (and thus cannot be copied, only moved).

The FIDL compiler tracks handle ownership: types containing handles are “resource types” with move semantics. This is enforced at the language binding level (e.g., in C++, resource types are move-only; in Rust, they implement Drop but not Clone).

Comparison to capOS’s Cap’n Proto Usage

Cap’n Proto natively supports capability transfer through its interface types:

interface FileSystem {
    open @0 (path :Text, flags :UInt32) -> (file :File);
}

interface File {
    read @0 (count :UInt64) -> (data :Data);
    getBuffer @1 (flags :UInt32) -> (buffer :MemoryObject);
}

In standard Cap’n Proto RPC, file :File in the return type means “a capability to a File interface.” The RPC system assigns a capability table index, transfers it out-of-band, and the receiver gets a live reference to invoke further methods.

Recommendations for capOS:

  1. Use out-of-band capability transfer beside Cap’n Proto payloads. Cap’n Proto RPC has capability descriptors indexed into a capability table, but capOS currently keeps kernel transfer semantics in ring sideband records so the kernel can treat Cap’n Proto payload bytes as opaque. Promise pipelining should build on that sideband result-cap namespace rather than requiring general payload traversal in the kernel.

  2. No need to switch to FIDL. Cap’n Proto’s wire format is superior for capOS’s use case:

    • Random access means runtimes and services can inspect specific fields without full deserialization. The kernel should keep using bounded sideband metadata for transport decisions.
    • Zero-copy read means less allocation in userspace protocol handling.
    • Multi-segment messages allow avoiding large contiguous allocations.
    • Promise pipelining is native to Cap’n Proto RPC, aligning with capOS’s planned async ring chaining.
  3. FIDL’s resource keyword is worth imitating. Mark capnp types that contain capabilities differently from pure-data types. This could be done at the schema level (Cap’n Proto already distinguishes interface fields) or as a convention. This enables the kernel to fast-path messages that contain no capabilities (no need to scan for capability descriptors).

  4. FIDL’s table type for evolution. Cap’n Proto structs already support adding fields, but capOS should be aware that FIDL tables are more explicitly designed for cross-version compatibility. For system interfaces that will evolve, consider using Cap’n Proto groups or designing structs with generous ordinal spacing.

6. Synthesis: Relevance to capOS

Handle Model vs. Typed Capability Dispatch

Zircon’s handle model is untyped at the handle level – a handle is just (object_ref, rights). The type comes from the object. All operations go through fixed syscalls (zx_channel_write, zx_vmo_read, etc.).

capOS’s model is typed at the capability level – each capability implements a Cap’n Proto interface with method dispatch. Operations go through ring SQEs such as CAP_OP_CALL, with Cap’n Proto params and results carried in userspace buffers.

Both are valid. Zircon’s approach is lower overhead (no serialization for simple operations like vmo_read), while capOS’s approach gives uniformity (every operation has the same wire format, enabling persistence and network transparency).

Hybrid recommendation: For performance-critical operations (memory mapping, signal waiting), consider adding “fast-path” syscalls that bypass capnp serialization, similar to how Zircon has dedicated syscalls per object type. The capnp path remains the general mechanism and the “canonical” interface.

Async Rings vs. Ports: The Right Call

capOS’s io_uring-inspired async rings are a better fit than Zircon’s port model for a capability OS:

  1. Ports require separate syscalls for registration, waiting, and the actual operation. Async rings batch everything.
  2. Cap’n Proto’s promise pipelining maps naturally to SQE chaining.
  3. The shared-memory ring design avoids kernel-side queuing overhead.

However, learn from ports:

  • The signal model (each object has a signal set, watchers are notified) is clean and composable. Consider making “wait for signal” a CQ event type.
  • zx_port_queue() (user-initiated packets) is useful for waking up event loops from user code. Support user-initiated CQ entries.
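The two port lessons can be sketched together as a completion ring that accepts both kernel completions and user-posted wakeups, mirroring zx_port_queue()'s user packets. This is an illustrative single-threaded model, not capOS's actual ring ABI; entry layout and type codes are invented for the example.

```c
#include <stdint.h>

/* Toy completion queue: kernel completions and user-posted wakeup
 * entries share one ring, so an event loop has a single wait point. */

#define CQ_SIZE    8              /* power of two */
#define CQ_MASK    (CQ_SIZE - 1)
#define CQE_KERNEL 0              /* posted by the kernel */
#define CQE_USER   1              /* posted from userspace, a la zx_port_queue */

typedef struct { uint32_t type; uint64_t data; } cqe_t;

typedef struct {
    cqe_t    entries[CQ_SIZE];
    uint32_t head, tail;          /* head: consumer, tail: producer */
} cq_t;

int cq_post(cq_t *q, uint32_t type, uint64_t data) {
    if (q->tail - q->head == CQ_SIZE)
        return -1;                /* ring full */
    q->entries[q->tail & CQ_MASK] = (cqe_t){ type, data };
    q->tail++;
    return 0;
}

int cq_pop(cq_t *q, cqe_t *out) {
    if (q->head == q->tail)
        return -1;                /* ring empty */
    *out = q->entries[q->head & CQ_MASK];
    q->head++;
    return 0;
}
```

In a real shared-memory ring the head and tail would be updated with release/acquire atomics; the point here is only that user-initiated entries need no new mechanism beyond a reserved CQE type.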

VMO/VMAR vs. capOS Memory Model

capOS should implement VMO-equivalent capabilities after the current Endpoint and transfer baseline:

  • IPC already has shared rings, but bulk data still needs explicit shared memory objects.
  • Capability transfer of memory regions (passing a MemoryObject cap through IPC) is the standard pattern for bulk data transfer.
  • CoW cloning enables efficient process creation.

Proposed capability interfaces:

```capnp
interface MemoryObject {
    read @0 (offset :UInt64, count :UInt64) -> (data :Data);
    write @1 (offset :UInt64, data :Data) -> ();
    getSize @2 () -> (size :UInt64);
    setSize @3 (size :UInt64) -> ();
    createChild @4 (offset :UInt64, size :UInt64, options :UInt32) -> (child :MemoryObject);
}

interface AddressRegion {
    map @0 (offset :UInt64, vmo :MemoryObject, vmoOffset :UInt64, len :UInt64, flags :UInt32) -> (addr :UInt64);
    unmap @1 (addr :UInt64, len :UInt64) -> ();
    protect @2 (addr :UInt64, len :UInt64, flags :UInt32) -> ();
    allocateSubRegion @3 (offset :UInt64, size :UInt64) -> (region :AddressRegion, addr :UInt64);
}
```
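The copy-on-write semantics behind createChild can be shown with a toy model: a child reads through to its parent until its first write, which snapshots the data. This is a one-page illustration of the semantics, not an implementation of the interface above; all names and the page size are invented.

```c
#include <stdint.h>
#include <string.h>

#define PAGE 64

/* Toy CoW object: `owned` flips to 1 once the child has its own copy. */
typedef struct vmo {
    struct vmo *parent;
    uint8_t     data[PAGE];
    int         owned;
} vmo_t;

uint8_t vmo_read(vmo_t *v, uint32_t off) {
    if (!v->owned && v->parent)            /* CoW: read through to parent */
        return vmo_read(v->parent, off);
    return v->data[off];
}

void vmo_write(vmo_t *v, uint32_t off, uint8_t b) {
    if (!v->owned && v->parent) {          /* first write: copy the page */
        memcpy(v->data, v->parent->data, PAGE);
        v->owned = 1;
    }
    v->data[off] = b;
}
```

After the copy, writes to the parent are no longer visible to the child, which is exactly the isolation property that makes CoW cloning safe for process creation.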

FIDL vs. Cap’n Proto: Stay with Cap’n Proto

Cap’n Proto is the right choice for capOS. The advantages over FIDL:

  1. Language-independent standard. FIDL is Fuchsia-only. Cap’n Proto has implementations in C++, Rust, Go, Python, Java, etc.
  2. Zero-copy random access. The kernel can inspect message fields without full deserialization.
  3. Promise pipelining. Native to capnp-rpc, enabling the async ring chaining that capOS plans.
  4. Persistence. Cap’n Proto messages are self-describing (with schema) and suitable for on-disk storage – important for capOS’s planned capability persistence.

The one thing FIDL does better: tight integration of handle/capability metadata in the type system (the resource keyword, client_end/server_end syntax, handle type constraints). capOS should ensure its capnp schemas clearly distinguish capability-carrying types and that the kernel enforces capability transfer semantics.

Concrete Action Items for capOS

Ordered by priority and dependency:

  1. Keep typed-interface authority model. Do not add a Zircon-style generic rights bitmask unless a concrete method-attenuation need arises that narrow wrapper capabilities and transfer-mode metadata cannot satisfy.

  2. Handle generation counters. Done: upper bits of CapId detect stale references.

  3. Design MemoryObject/SharedBuffer capability. Define and implement the shared-memory object that replaces raw-frame transfer for bulk IPC.

  4. Design AddressRegion capability (Stage 5). Sub-VMAR-like sandboxing. The root VMAR handle is part of the initial capability set.

  5. Capability transfer sideband. Baseline CALL/RETURN copy and move transfer is implemented; promise-pipelined result-cap mapping still needs a precise rule before pipeline dispatch lands.

  6. Async rings with signal delivery. SQ/CQ capability rings are implemented for transport; notification objects and promise pipelining remain future work.

  7. User-queued CQ entries (with async rings). Allow userspace to post wake-up events to its own CQ, enabling pure-userspace event loop integration.
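The generation-counter scheme from item 2 can be sketched as follows: the upper bits of a CapId hold a per-slot generation, so a reused table slot rejects stale ids. Field widths and names here are illustrative, not capOS's actual CapId layout.

```c
#include <stdint.h>

/* Illustrative CapId layout: 8 generation bits above 24 index bits.
 * A slot's generation is bumped when the slot is reused, so ids minted
 * against the old generation fail validation (use-after-close detection). */

#define GEN_SHIFT 24u
#define IDX_MASK  ((1u << GEN_SHIFT) - 1)

typedef uint32_t capid_t;

capid_t capid_make(uint32_t index, uint32_t gen) {
    return (gen << GEN_SHIFT) | (index & IDX_MASK);
}

/* Returns 1 iff the id's generation matches the slot's current one. */
int capid_valid(capid_t id, const uint8_t *slot_gens) {
    return (id >> GEN_SHIFT) == slot_gens[id & IDX_MASK];
}
```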

Appendix: Key Zircon Syscall Reference

For reference, the most architecturally significant Zircon syscalls:

| Syscall | Purpose |
| --- | --- |
| zx_handle_close | Close a handle |
| zx_handle_duplicate | Duplicate with rights reduction |
| zx_handle_replace | Atomically replace with new rights |
| zx_channel_create | Create a channel pair |
| zx_channel_read | Read a message + handles from a channel |
| zx_channel_write | Write a message + handles to a channel |
| zx_channel_call | Synchronous write-then-read (RPC) |
| zx_port_create | Create an async port |
| zx_port_wait | Wait for the next packet |
| zx_port_queue | Enqueue a user packet |
| zx_object_wait_async | Register a signal wait on a port |
| zx_object_wait_one | Synchronous wait on one object |
| zx_vmo_create | Create a virtual memory object |
| zx_vmo_read / zx_vmo_write | Direct VMO access |
| zx_vmo_create_child | CoW clone |
| zx_vmar_map | Map a VMO into an address region |
| zx_vmar_unmap | Unmap a range |
| zx_vmar_allocate | Create a sub-VMAR |
| zx_process_create | Create a process (with root VMAR) |
| zx_process_start | Start process execution |