Fuchsia Zircon Kernel: Research Report for capOS
Research into Zircon’s design to inform capOS decisions on the capability model, IPC, virtual memory, async I/O, and interface definition.
1. Handle-Based Capability Model
Overview
Zircon implements capabilities as handles. A handle is a process-local integer (similar to a Unix file descriptor) that references a kernel object and carries a bitmask of rights. The kernel maintains a per-process handle table that maps handle values to (kernel_object_pointer, rights) pairs. Processes can only interact with kernel objects through handles they hold.
There is no ambient authority in Zircon. A process cannot address kernel objects by name, path, or global ID – it must possess a handle. The initial set of handles is passed to a process at creation time by its parent (or by the component framework).
Handle Representation
Internally, a handle is:
- A process-local 32-bit integer (the “handle value”). The low two bits encode a generation counter to detect use-after-close.
- A reference to a kernel object (a refcounted `Dispatcher` in Zircon’s C++).
- A rights bitmask (`zx_rights_t`, a `uint32_t`).
The handle table is per-process, so handle value 0x1234 in process A and
0x1234 in process B refer to completely different objects (or nothing).
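For intuition, a handle table entry conceptually pairs an object reference with a rights mask and a generation tag. The following C sketch is illustrative only – Zircon’s actual kernel structures (`Handle`, `Dispatcher`) differ in detail:

```c
#include <stdint.h>

/* Illustrative per-process handle table entry (not Zircon's real layout). */
typedef struct handle_entry {
    void*    object;  /* refcounted pointer to the kernel object */
    uint32_t rights;  /* zx_rights_t bitmask bound to this handle */
    uint32_t gen;     /* generation tag mixed into the handle value */
} handle_entry_t;

/* Lookup decodes a handle value into (index, gen); the entry resolves only
 * if the stored generation matches, which catches use-after-close. */
```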
Rights
Rights are a bitmask that constrain what operations a handle can perform. Key rights include:
| Right | Meaning |
|---|---|
| `ZX_RIGHT_DUPLICATE` | Can be duplicated via `zx_handle_duplicate()` |
| `ZX_RIGHT_TRANSFER` | Can be sent through a channel |
| `ZX_RIGHT_READ` | Can read data (channel messages, VMO bytes) |
| `ZX_RIGHT_WRITE` | Can write data |
| `ZX_RIGHT_EXECUTE` | VMO can be mapped as executable |
| `ZX_RIGHT_MAP` | VMO can be mapped into a VMAR |
| `ZX_RIGHT_GET_PROPERTY` | Can query object properties |
| `ZX_RIGHT_SET_PROPERTY` | Can modify object properties |
| `ZX_RIGHT_SIGNAL` | Can set user signals on the object |
| `ZX_RIGHT_WAIT` | Can wait on the object’s signals |
| `ZX_RIGHT_MANAGE_PROCESS` | Can perform management ops on a process |
| `ZX_RIGHT_MANAGE_THREAD` | Can manage threads |
When a syscall is invoked on a handle, the kernel checks that the handle’s
rights include the rights required by that syscall. For example,
zx_channel_write() requires ZX_RIGHT_WRITE on the channel handle.
Rights can only be reduced, never amplified. zx_handle_duplicate() takes
a rights mask and the new handle gets original_rights & requested_rights.
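For example, a holder can mint a read-only duplicate of a VMO handle before sharing it. A sketch against the Zircon C API (error handling abbreviated):

```c
#include <zircon/syscalls.h>

/* Request a subset of the original rights: the duplicate can be read,
 * mapped, and transferred, but not written. The original is unchanged. */
zx_handle_t make_readonly_dup(zx_handle_t vmo) {
    zx_handle_t ro = ZX_HANDLE_INVALID;
    zx_status_t status = zx_handle_duplicate(
        vmo, ZX_RIGHT_READ | ZX_RIGHT_MAP | ZX_RIGHT_TRANSFER, &ro);
    return (status == ZX_OK) ? ro : ZX_HANDLE_INVALID;
}
```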
Handle Lifecycle
Creation: Syscalls that create kernel objects return handles. For example,
zx_channel_create() returns two handles (one for each endpoint).
zx_vmo_create() returns a VMO handle. The initial rights are defined per
object type (e.g., a new channel endpoint gets
READ|WRITE|TRANSFER|DUPLICATE|SIGNAL|WAIT).
Duplication: zx_handle_duplicate(handle, rights) -> new_handle. Creates
a second handle to the same kernel object, possibly with reduced rights. The
original is untouched. Requires ZX_RIGHT_DUPLICATE on the source handle.
Transfer: Handles are transferred through channels. When a message is
written to a channel, handles listed in the message are moved from the
sender’s handle table to a transient state inside the channel message. When the
message is read, those handles are installed into the receiver’s handle table
with new handle values. The original handle values in the sender become invalid.
Transfer requires ZX_RIGHT_TRANSFER on each handle being sent.
Replacement: zx_handle_replace(handle, rights) -> new_handle. Atomically
invalidates the old handle and creates a new one with the specified rights
(must be a subset). This avoids a window where two handles exist simultaneously
(unlike duplicate-then-close). Useful for reducing rights before transferring.
Closing: zx_handle_close(handle). Removes the handle from the process’s
table and decrements the kernel object’s refcount. When the last handle to an
object is closed, the object is destroyed (with some exceptions like the
kernel itself keeping references).
Comparison to capOS
capOS’s current CapTable maps CapId (u32) to an Arc<dyn CapObject>. The
shared Arc lets a single kernel capability (for example, a kernel:endpoint
owned by one service and referenced by another through CapSource::Service)
back multiple per-process CapTable slots for cross-process IPC. This is
conceptually similar to Zircon’s handle table, but with key differences:
| Aspect | Zircon | capOS (current) |
|---|---|---|
| Rights | Bitmask per handle | None (all-or-nothing) |
| Object types | Fixed kernel types (Channel, VMO, etc.) | Extensible via CapObject trait |
| Transfer | Move semantics through channels | Copy/move descriptors through Endpoint IPC |
| Duplication | Explicit with rights reduction | Copy transfer for transferable holds |
| Revocation | Close handle; object dies with last ref | Remove from table; no propagation |
| Interface | Fixed syscall per object type | Cap’n Proto method dispatch |
| Generation counter | Low bits of handle value | Upper bits of CapId |
Recommendations for capOS:
- Keep method authority in typed interfaces for now. Zircon’s rights bitmask is useful for an untyped syscall surface. capOS currently uses narrow Cap’n Proto interfaces plus hold-edge transfer metadata; generic READ/WRITE flags would duplicate schema-level authority unless a concrete cross-interface need appears.
- Handle generation counters. Implemented: capOS encodes a generation tag in the upper bits of `CapId`, with lower bits selecting the table slot. This catches stale `CapId` use after slot reuse (see the sketch after this list).
- Move semantics for transfer. Implemented for Endpoint CALL/RETURN sideband descriptors. Copy transfer remains explicit and requires a transferable source hold.
- A `replace` operation. An atomic replace (invalidate old, create new with reduced rights) is cleaner than duplicate-then-close for rights attenuation before transfer.
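A minimal sketch of the generation-tag scheme. The bit split and helper names here are illustrative, not capOS’s actual code:

```c
#include <stdint.h>

#define CAP_INDEX_BITS 20
#define CAP_INDEX_MASK ((1u << CAP_INDEX_BITS) - 1)

/* Pack a table slot and its current generation into one CapId. */
static inline uint32_t cap_id_make(uint32_t slot, uint32_t gen) {
    return (gen << CAP_INDEX_BITS) | (slot & CAP_INDEX_MASK);
}
static inline uint32_t cap_id_slot(uint32_t id) { return id & CAP_INDEX_MASK; }
static inline uint32_t cap_id_gen(uint32_t id)  { return id >> CAP_INDEX_BITS; }

/* Lookup succeeds only if the slot's current generation matches the tag,
 * so a CapId that survived a close/reuse cycle is rejected as stale. */
```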
2. Channels
Overview
Zircon channels are the fundamental IPC primitive. A channel is a bidirectional, asynchronous message-passing pipe with two endpoints. Each endpoint is a separate kernel object referenced by a handle.
Creation and Structure
zx_channel_create(options, &handle0, &handle1) creates a channel and returns
handles to both endpoints. Each endpoint can be independently transferred to
different processes. When one endpoint is closed, the other becomes
“peer-closed” (signaled with ZX_CHANNEL_PEER_CLOSED).
Message Format
A channel message consists of:
- Data: Up to 65,536 bytes (64 KiB) of arbitrary byte payload.
- Handles: Up to 64 handles transferred with the message.
Messages are discrete and ordered (FIFO). There is no streaming or partial reads – you read a complete message or nothing.
Write and Read Syscalls
Write: `zx_channel_write(handle, options, bytes, num_bytes, handles, num_handles)`
- Copies `bytes` into the kernel message queue.
- Moves each handle in the `handles` array from the caller’s handle table into the message. If any handle is invalid or lacks `ZX_RIGHT_TRANSFER`, the entire write fails and no handles are moved.
- The write is non-blocking. If the peer has been closed, returns `ZX_ERR_PEER_CLOSED`.
Read: `zx_channel_read(handle, options, bytes, handles, num_bytes, num_handles, actual_bytes, actual_handles)`
- Dequeues the next message. Copies data into `bytes` and installs handles into the caller’s handle table, writing the new handle values into the `handles` array.
- If the buffer is too small, returns `ZX_ERR_BUFFER_TOO_SMALL` and fills `actual_bytes`/`actual_handles` so the caller can retry with a larger buffer.
- Non-blocking by default.
zx_channel_call: A synchronous call primitive. Writes a message to the
channel, then blocks waiting for a reply with a matching transaction ID. This
is the primary mechanism for client-server RPC. The kernel optimizes this path
to avoid unnecessary scheduling: if the server thread is waiting to read, the
kernel can directly switch to it (similar to L4 IPC optimizations).
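A minimal `zx_channel_call` sketch in C (no handles attached; error handling elided):

```c
#include <zircon/syscalls.h>

/* Synchronous RPC: write `req`, then block for the reply whose
 * transaction ID (first 4 bytes of the message) matches. */
zx_status_t rpc_call(zx_handle_t chan,
                     void* req, uint32_t req_len,
                     void* resp, uint32_t resp_cap, uint32_t* resp_len) {
    zx_channel_call_args_t args = {
        .wr_bytes = req,    .wr_num_bytes = req_len,
        .wr_handles = NULL, .wr_num_handles = 0,
        .rd_bytes = resp,   .rd_num_bytes = resp_cap,
        .rd_handles = NULL, .rd_num_handles = 0,
    };
    uint32_t actual_handles = 0;
    return zx_channel_call(chan, 0, ZX_TIME_INFINITE, &args,
                           resp_len, &actual_handles);
}
```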
Handle Transfer Mechanics
When handles are sent through a channel:
- The kernel validates all handles (they exist and carry `ZX_RIGHT_TRANSFER`).
- Handles are atomically removed from the sender’s table.
- Handle objects are stored inside the kernel message structure.
- On read, handles are inserted into the receiver’s table with fresh handle values.
- If the channel is destroyed with unread messages containing handles, those handles are closed (objects’ refcounts decremented).
This is critical: handle transfer is move, not copy. The sender loses the
handle. To keep a copy, the sender must duplicate before sending.
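The resulting idiom for sharing (rather than giving away) an object looks like this (sketch; error handling elided):

```c
#include <zircon/syscalls.h>

/* Send a VMO through a channel while keeping a local reference.
 * Transfer is move, so we duplicate first and send the duplicate. */
void send_vmo_keep_copy(zx_handle_t chan, zx_handle_t vmo) {
    zx_handle_t dup = ZX_HANDLE_INVALID;
    zx_handle_duplicate(vmo, ZX_RIGHT_SAME_RIGHTS, &dup);

    const char msg[] = "here is a vmo";
    /* On success, `dup` leaves our handle table; `vmo` stays valid. */
    zx_channel_write(chan, 0, msg, sizeof(msg), &dup, 1);
}
```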
Signals
Each channel endpoint has associated signals:
- `ZX_CHANNEL_READABLE` – at least one message is queued.
- `ZX_CHANNEL_PEER_CLOSED` – the other endpoint was closed.
Processes can wait on these signals using zx_object_wait_one(),
zx_object_wait_many(), or by binding to a port (see Section 4).
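For example, a blocking wait for readability (sketch):

```c
#include <zircon/syscalls.h>

/* Block until a message is queued or the peer endpoint closes. */
zx_signals_t wait_readable(zx_handle_t channel) {
    zx_signals_t observed = 0;
    zx_object_wait_one(channel,
                       ZX_CHANNEL_READABLE | ZX_CHANNEL_PEER_CLOSED,
                       ZX_TIME_INFINITE, &observed);
    return observed;  /* caller checks which signal actually fired */
}
```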
FIDL Relationship
Channels carry raw bytes + handles. FIDL (Section 5) provides the structured protocol layer on top: it defines how bytes are laid out (message header with transaction ID, ordinal, flags; then the payload) and how handles in the message correspond to protocol-level concepts (client endpoints, server endpoints, VMOs, etc.).
Every FIDL protocol communication happens over a channel. A FIDL “client end” is a channel endpoint handle where the client sends requests and reads responses. A “server end” is the other endpoint where the server reads requests and sends responses.
Comparison to capOS
capOS currently uses shared submission/completion rings with Endpoint objects for cross-process CALL/RECV/RETURN routing. Same-process capabilities dispatch directly through the holder’s table; cross-process Endpoint calls queue to the server ring and can trigger a direct IPC handoff when the receiver is blocked.
| Aspect | Zircon Channels | capOS |
|---|---|---|
| Topology | Point-to-point, 2 endpoints | Endpoint-routed capability calls |
| Async | Non-blocking read/write + signal waits | Shared SQ/CQ rings |
| Handle/cap transfer | Embedded in messages | Sideband transfer descriptors |
| Message format | Raw bytes + handles | Cap’n Proto serialized |
| Size limits | 64 KiB data, 64 handles | 64 KiB params (current limit) |
| Buffering | Kernel-side message queue | Endpoint queues plus per-process rings |
Recommendations for capOS:
- Capability transfer alongside capnp messages. Zircon embeds handles as out-of-band data alongside message bytes. capOS has adopted the same separation with ring sideband transfer descriptors and result-cap records. That keeps the kernel from parsing arbitrary Cap’n Proto payload graphs.
- Two-endpoint channels vs. Endpoint calls. Zircon’s channels are general-purpose pipes. capOS uses a lighter Endpoint CALL/RECV/RETURN model where a capability invocation is routed to the serving process rather than requiring a channel object per connection.
- Message size limits. Zircon’s 64 KiB limit has been a pain point (large data must go through VMOs). capOS’s capnp messages naturally handle this because large data can be a separate VMO-like capability referenced in the message. Keep the per-message limit reasonable (64 KiB is a good default) and use capability references for bulk data.
3. VMARs and VMOs
Virtual Memory Objects (VMOs)
A VMO is a kernel object representing a contiguous region of virtual memory that can be mapped into address spaces. VMOs are the fundamental unit of memory in Zircon.
Types:
- Paged VMO: Backed by the page fault handler. Pages are allocated on demand. This is the default.
- Physical VMO: Backed by a specific contiguous range of physical memory. Used for device MMIO.
- Contiguous VMO: Like a paged VMO but guarantees physically contiguous pages. Used for DMA.
Key operations:
- `zx_vmo_create(size, options) -> handle`: Create a paged VMO.
- `zx_vmo_read(handle, buffer, offset, length)`: Read bytes from a VMO.
- `zx_vmo_write(handle, buffer, offset, length)`: Write bytes to a VMO.
- `zx_vmo_get_size()` / `zx_vmo_set_size()`: Query/resize.
- `zx_vmo_op_range()`: Operations like commit (force-allocate pages), decommit (release pages back to the system), and cache ops.
VMOs can be read/written directly via syscalls without mapping them. This is useful for small transfers but less efficient than mapping for large data.
Copy-on-Write (CoW) Cloning
zx_vmo_create_child(handle, options, offset, size) -> child_handle
Creates a child VMO that is a CoW clone of a range within the parent. Several clone types exist:
- Snapshot (`ZX_VMO_CHILD_SNAPSHOT`): Point-in-time snapshot. Both parent and child see CoW pages. Writes to either side trigger page copies. The child is fully independent after creation – closing the parent does not affect committed pages in the child.
- Slice (`ZX_VMO_CHILD_SLICE`): A window into the parent. No CoW – writes to the slice are visible through the parent and vice versa. The child cannot outlive the parent.
- Snapshot-at-least-on-write (`ZX_VMO_CHILD_SNAPSHOT_AT_LEAST_ON_WRITE`): Like snapshot, but allows the implementation to share unchanged pages between parent and child more aggressively (pages only diverge when written).
CoW cloning is central to how Fuchsia implements fork()-like semantics for
memory (though Fuchsia doesn’t have fork()) and how it shares immutable data
(e.g., shared libraries are CoW-cloned VMOs).
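A snapshot clone in code (sketch; error handling elided):

```c
#include <zircon/syscalls.h>

/* Point-in-time CoW snapshot of the first 64 KiB of `vmo`. After this,
 * writes on either side copy pages instead of mutating shared ones. */
zx_handle_t snapshot_64k(zx_handle_t vmo) {
    zx_handle_t child = ZX_HANDLE_INVALID;
    zx_status_t status = zx_vmo_create_child(
        vmo, ZX_VMO_CHILD_SNAPSHOT, /*offset=*/0, /*size=*/64 * 1024, &child);
    return (status == ZX_OK) ? child : ZX_HANDLE_INVALID;
}
```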
Virtual Memory Address Regions (VMARs)
A VMAR represents a contiguous range of virtual address space within a process. VMARs form a tree rooted at the process’s root VMAR, which covers the entire user-accessible address space.
Hierarchy:
Root VMAR (entire user address space)
+-- Sub-VMAR A (e.g., 0x1000..0x10000)
| +-- Mapping of VMO X at offset 0x1000
| +-- Sub-VMAR B (0x5000..0x8000)
| +-- Mapping of VMO Y at offset 0x5000
+-- Sub-VMAR C (0x20000..0x30000)
+-- Mapping of VMO Z at offset 0x20000
Key operations:
- `zx_vmar_map(vmar, options, offset, vmo, vmo_offset, len) -> addr`: Map a VMO (or a range of it) into the VMAR at a specific offset, or let the kernel choose (ASLR).
- `zx_vmar_unmap(vmar, addr, len)`: Remove a mapping.
- `zx_vmar_protect(vmar, options, addr, len)`: Change permissions (read/write/execute) on a mapped range.
- `zx_vmar_allocate(vmar, options, offset, size) -> child_vmar, addr`: Create a sub-VMAR.
- `zx_vmar_destroy(vmar)`: Recursively unmap everything and destroy all sub-VMARs. Prevents new mappings.
ASLR: Zircon implements address space layout randomization through VMARs.
When ZX_VM_OFFSET_IS_UPPER_LIMIT or no specific offset is given, the kernel
randomizes placement within the VMAR.
Permissions: Mapping permissions (R/W/X) are constrained by the VMO
handle’s rights. A VMO handle without ZX_RIGHT_EXECUTE cannot be mapped
as executable, regardless of what the zx_vmar_map() call requests.
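A short sketch tying these together: a direct (unmapped) VMO write, then a read-only mapping whose permissions are bounded by the handle’s rights (error handling elided):

```c
#include <zircon/process.h>
#include <zircon/syscalls.h>

/* Create a one-page VMO, write it without mapping, map it read-only.
 * Requesting ZX_VM_PERM_WRITE would additionally require ZX_RIGHT_WRITE
 * on the VMO handle: handle rights bound mapping permissions. */
zx_vaddr_t map_greeting(void) {
    zx_handle_t vmo;
    zx_vmo_create(4096, 0, &vmo);
    zx_vmo_write(vmo, "hello", /*offset=*/0, /*len=*/5);

    zx_vaddr_t addr = 0;
    zx_vmar_map(zx_vmar_root_self(), ZX_VM_PERM_READ,
                /*vmar_offset=*/0, vmo, /*vmo_offset=*/0, 4096, &addr);
    return addr;  /* kernel-chosen (randomized) address */
}
```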
Why VMARs Matter
VMARs provide:
- Sandboxing within a process. A component can be given a sub-VMAR handle instead of the root VMAR, limiting where it can map memory.
- Hierarchical cleanup. Destroying a VMAR recursively unmaps everything beneath it.
- Controlled mapping. The parent decides the address space layout for child components by allocating sub-VMARs and passing only sub-VMAR handles.
Comparison to capOS
capOS currently has AddressSpace plus a VirtualMemory capability for
anonymous map/unmap/protect operations. There is no VMO-like shared memory
object yet; FrameAllocator still exposes raw physical frame grants.
| Aspect | Zircon | capOS (current) |
|---|---|---|
| Memory objects | VMO (paged, physical, contiguous) | Raw frames plus anonymous VirtualMemory mappings |
| CoW | VMO child clones (snapshot, slice) | Not implemented |
| Address space | VMAR tree | Flat AddressSpace plus VirtualMemory cap |
| Sharing | Map same VMO in multiple processes | Not implemented |
| Permissions | Per-mapping + per-handle rights | Per-page flags at mapping time |
Recommendations for capOS:
- VMO-equivalent capability. A “MemoryObject” capability that represents a range of memory (backed by demand paging or physical pages). This becomes the unit of sharing: pass a MemoryObject cap through IPC, and the receiver maps it into their address space. Define it in `schema/capos.capnp`.
- Sub-VMAR capabilities for sandboxing. When spawning a process, instead of granting access to the full address space, grant a sub-region capability. This limits where the process can map memory.
- CoW cloning is valuable but not urgent. The primary use case (shared libraries, fork) may not apply to capOS’s early stages. Design the VMO interface to support cloning later.
- VMO read/write without mapping. Zircon allows reading/writing VMO contents via syscall without mapping. This is useful for small IPC data and avoids TLB pressure. Consider supporting this in capOS’s MemoryObject.
4. Async Model (Ports)
Overview
Zircon’s async I/O model is built around ports – kernel objects that
receive event packets. A port is similar to Linux’s epoll but with important
differences. It is the foundation for all async programming in Fuchsia.
Port Basics
A port is a kernel object with a queue of packets (zx_port_packet_t).
Packets arrive either from signal-based waits or from direct user queuing.
Key operations:
- `zx_port_create(options) -> handle`: Create a port.
- `zx_port_wait(port, deadline) -> packet`: Dequeue the next packet, blocking until one is available or the deadline expires.
- `zx_port_queue(port, packet)`: Manually enqueue a user packet.
- `zx_port_cancel(port, source, key)`: Cancel pending waits.
Signal-Based Async (Object Wait Async)
zx_object_wait_async(object, port, key, signals, options):
This is the primary mechanism. It tells the kernel: “when object has any of
these signals asserted, deliver a packet to port with this key.”
Two modes:
- One-shot (`ZX_WAIT_ASYNC_ONCE`): The wait fires once and is automatically removed. The user must re-register after handling.
- Edge-triggered (`ZX_WAIT_ASYNC_EDGE`): Fires each time a signal transitions from deasserted to asserted. Stays registered.
Packet Format
typedef struct zx_port_packet {
uint64_t key; // User-defined key (set during wait_async)
uint32_t type; // ZX_PKT_TYPE_SIGNAL_ONE, ZX_PKT_TYPE_USER, etc.
zx_status_t status; // Result status
union {
zx_packet_signal_t signal; // Which signals triggered
zx_packet_user_t user; // User-queued packet payload (32 bytes)
zx_packet_guest_bell_t guest_bell;
// ... other packet types
};
} zx_port_packet_t;
The signal variant includes trigger (which signals were waited on),
observed (current signal state), and a count (for edge-triggered, how many
transitions).
Async Dispatching (libasync)
Fuchsia’s userspace async libraries (`libasync`, `async-loop`) provide a
higher-level event loop:
- `async::Loop`: An event loop that owns a port and dispatches events to registered handlers.
- `async::Wait`: Wraps `zx_object_wait_async()` with a callback. When the signal fires, the loop calls the handler.
- `async::Task`: Runs a closure on the loop’s dispatcher.
- FIDL bindings: The async FIDL bindings register channel-readable waits on the loop’s port. When a message arrives, the FIDL dispatcher decodes it and calls the appropriate protocol method handler.
The typical pattern, sketched here as minimal C (`handler_for` and `KEY_CHANNEL` are illustrative names):
zx_handle_t port;
zx_port_create(0, &port);
// Register one-shot interest in channel readability.
zx_object_wait_async(channel, port, KEY_CHANNEL, ZX_CHANNEL_READABLE, ZX_WAIT_ASYNC_ONCE);
for (;;) {
    zx_port_packet_t packet;
    zx_port_wait(port, ZX_TIME_INFINITE, &packet);
    handler_for(packet.key)(&packet);  // dispatch on the user-defined key
    // One-shot waits are removed on delivery; re-register to keep listening.
    zx_object_wait_async(channel, port, KEY_CHANNEL, ZX_CHANNEL_READABLE, ZX_WAIT_ASYNC_ONCE);
}
Comparison to Linux io_uring
| Aspect | Zircon Ports | Linux io_uring |
|---|---|---|
| Model | Event notification (signals) | Operation submission/completion |
| Submission | No SQ; operations are separate syscalls | SQ ring: batch operations |
| Completion | Port packet queue | CQ ring in shared memory |
| Kernel transitions | One per wait_async + one per port_wait | One per io_uring_enter (batched) |
| Memory sharing | No shared ring buffers | SQ/CQ are mmap’d shared memory |
| Zero-copy | Not for port packets | Registered buffers, fixed files |
| Batching | No inherent batching | Core design: submit N ops, one syscall |
| Chaining | Not supported | SQE linking (sequential/parallel) |
| Scope | Signal notification only | Full I/O operations (read, write, send, recv, fsync, …) |
Key differences:
- Ports are notification-based; io_uring is operation-based. A port tells you “something happened” (a signal was asserted), then you make separate syscalls to act on it (read the channel, accept the socket, etc.). io_uring lets you submit the actual I/O operation; the kernel performs it asynchronously and returns the result in the completion ring.
- io_uring avoids syscalls for submission. The submission queue is shared memory – userspace writes SQEs and the kernel reads them without a syscall (in polling mode) or with a single `io_uring_enter()` for a batch of operations. Ports require a syscall per `wait_async` registration.
- io_uring supports chaining. SQE linking allows dependent operations (e.g., “read from file, then write to socket”) without returning to userspace between steps.
- Ports are simpler. The signal model is straightforward and composes well with Zircon’s object model. io_uring’s complexity (dozens of opcodes, registered buffers, fixed files, kernel-side polling) is much higher.
Performance Tradeoffs
Ports:
- Pro: Simple, well-integrated with kernel object model, easy to reason about.
- Con: Extra syscalls per operation (wait_async to register, port_wait to receive, then the actual operation syscall). At least 3 syscalls per async operation.
io_uring:
- Pro: Can batch many operations in a single syscall. Shared-memory rings avoid copies. Kernel-side polling can eliminate syscalls entirely.
- Con: Complex API surface, security attack surface (many kernel bugs have been in io_uring), complex state management.
Comparison to capOS’s Planned Async Rings
capOS plans io_uring-inspired capability rings: an SQ where userspace submits capnp-serialized capability invocations and a CQ where the kernel posts completions.
| Aspect | Zircon Ports | capOS Planned Rings |
|---|---|---|
| Submission | Separate syscalls | SQ in shared memory |
| Completion | Port packet queue (kernel-owned) | CQ in shared memory |
| Operation scope | Signal notification only | Full capability invocations |
| Batching | None | Natural (fill SQ, single syscall) |
| Wire format | Fixed packet struct | Cap’n Proto messages |
Recommendations for capOS:
- The io_uring model is better than ports for capOS’s use case. Since every operation in capOS is a capability invocation (not just a signal notification), putting the full operation in the submission ring eliminates the extra round trip that ports require. This is the right choice.
- Keep a signal/notification mechanism too. Even with async rings, capOS needs a way to wait for events (e.g., “data available on this channel”, “process exited”). Consider a simple signal/wait mechanism alongside the capability rings – perhaps signal delivery goes through the CQ as a special completion type.
- Study io_uring’s SQE linking. Chaining dependent capability calls (e.g., “read from FileStore, then write to Console”) without returning to userspace is powerful. This maps naturally to Cap’n Proto promise pipelining: “call method A on cap X, then call method B on the result’s capability” – the kernel can chain these internally.
- Registered/fixed capabilities. io_uring has “fixed files” (a registered fd set for faster lookup). capOS could have a “hot set” of capabilities pinned in the SQ context for faster dispatch, avoiding a per-call table lookup.
- Completion ordering. io_uring completions can arrive out of order. capOS’s CQ should also support out-of-order completion (each SQE carries a user_data tag echoed in the CQE) to enable true async pipelining; see the sketch after this list.
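A hedged sketch of what such ring entries might look like. The layout, field names, and opcodes below are hypothetical, not capOS’s defined format:

```c
#include <stdint.h>

/* Hypothetical capOS submission-queue entry, mirroring io_uring's
 * user_data convention. */
typedef struct cap_sqe {
    uint32_t opcode;     /* e.g. a CAP_OP_CALL-style operation code */
    uint32_t cap_id;     /* target capability (generation-tagged) */
    uint64_t params;     /* user pointer to the Cap'n Proto params buffer */
    uint32_t params_len;
    uint32_t flags;      /* e.g. a LINK bit for chained calls */
    uint64_t user_data;  /* opaque tag, echoed in the matching CQE */
} cap_sqe_t;

/* Hypothetical completion-queue entry. Completions may land out of
 * order; user_data is how the submitter matches them up. */
typedef struct cap_cqe {
    uint64_t user_data;
    int32_t  status;
    uint32_t results_len;
} cap_cqe_t;
```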
5. FIDL (Fuchsia Interface Definition Language)
Overview
FIDL is Fuchsia’s IDL for defining protocols that communicate over channels. It serves a similar role to Cap’n Proto schemas in capOS: defining the contract between client and server.
FIDL vs. Cap’n Proto: Schema Language
FIDL example:
library fuchsia.example;
type Color = strict enum : uint32 {
RED = 1;
GREEN = 2;
BLUE = 3;
};
protocol Painter {
SetColor(struct { color Color; }) -> ();
DrawLine(struct { x0 float32; y0 float32; x1 float32; y1 float32; }) -> ();
-> OnPaintComplete(struct { num_pixels uint64; });
};
Equivalent Cap’n Proto:
enum Color { red @0; green @1; blue @2; }
interface Painter {
setColor @0 (color :Color) -> ();
drawLine @1 (x0 :Float32, y0 :Float32, x1 :Float32, y1 :Float32) -> ();
}
Key differences in the schema language:
| Feature | FIDL | Cap’n Proto |
|---|---|---|
| Unions | flexible union, strict union | Anonymous unions in structs |
| Enums | strict enum, flexible enum | enum (always strict) |
| Optionality | box<T>, nullable types | Default values, union with Void |
| Evolution | flexible keyword for forward compat | Field numbering, @N ordinals |
| Tables | table (like protobuf, sparse) | struct with default values |
| Events | -> EventName(...) server-sent | No built-in events |
| Error syntax | -> () error uint32 | Must encode in return struct |
| Capability types | client_end:P, server_end:P | interface P as field type |
FIDL’s table type is analogous to Cap’n Proto structs in terms of
evolvability (can add fields without breaking), but Cap’n Proto structs are
more compact on the wire (fixed-size inline section + pointers) while FIDL
tables use an envelope-based encoding.
Wire Format Comparison
FIDL wire format:
- Little-endian, 8-byte aligned.
- Messages have a 16-byte header: `txid` (4 bytes), flags (3 bytes), magic byte (0x01), ordinal (8 bytes); see the struct sketch after this list.
- Structs are laid out inline with natural alignment and explicit padding.
- Out-of-line data (strings, vectors, tables) uses offset-based indirection via “envelopes” (an inline 8-byte entry: 4 bytes num_bytes, 2 bytes num_handles, 2 bytes flags).
- Handles are out-of-band. The wire format contains `ZX_HANDLE_PRESENT` (0xFFFFFFFF) or `ZX_HANDLE_ABSENT` (0x00000000) markers where handles appear. The actual handles travel in the channel message’s handle array, consumed in order of appearance in the linearized message.
- Encoding is done into a contiguous byte buffer plus a separate handle array, matching the channel write API.
- No pointer arithmetic. FIDL v2 uses a “depth-first traversal order” encoding where out-of-line objects are laid out sequentially. Offsets are not stored; the decoder walks the type schema to find boundaries.
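The header maps onto a small C struct; Fuchsia’s C bindings define an equivalent `fidl_message_header_t` (sketched here from the layout above):

```c
#include <stdint.h>
#include <zircon/types.h>

/* The 16-byte FIDL message header described above. */
typedef struct fidl_header {
    zx_txid_t txid;         /* 4 bytes: transaction ID (0 for events) */
    uint8_t   flags[3];     /* 3 bytes: encoding flags */
    uint8_t   magic_number; /* 1 byte: wire format magic (0x01) */
    uint64_t  ordinal;      /* 8 bytes: method ordinal */
} fidl_header_t;
```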
Cap’n Proto wire format:
- Little-endian, 8-byte aligned (word-based).
- Messages have a segment table header listing segment sizes.
- Structs have a fixed data section + pointer section. Pointers are relative offsets (self-relative, in words).
- Uses pointer-based random access: can read any field without parsing the entire message.
- Capabilities are indexed. Cap’n Proto’s RPC protocol assigns capability table indices to interface references in messages. The actual capability (file descriptor, handle, etc.) is transferred out-of-band.
- Supports multi-segment messages (FIDL is always single-segment).
- Zero-copy read: can read directly from the wire buffer without deserialization.
Key wire format differences:
| Property | FIDL | Cap’n Proto |
|---|---|---|
| Random access | No (sequential decode) | Yes (pointer-based) |
| Zero-copy read | Partial (decode-on-access for some types) | Full (read from buffer) |
| Segments | Single contiguous buffer | Multi-segment |
| Pointers | Implicit (traversal order) | Explicit (relative offsets) |
| Size overhead | Smaller (no pointer words) | Larger (pointer section) |
| Decode cost | Must validate sequentially | Can validate lazily |
| Handle/cap encoding | Presence markers + out-of-band array | Cap table indices + out-of-band |
FIDL Capability Transfer
FIDL has first-class syntax for capability transfer in protocols:
protocol FileSystem {
Open(resource struct {
path string:256;
flags uint32;
object server_end:File;
}) -> ();
};
protocol File {
Read(struct { count uint64; }) -> (struct { data vector<uint8>:MAX; });
GetBuffer(struct { flags uint32; }) -> (resource struct { buffer zx.Handle:VMO; });
};
- `server_end:File` – a channel endpoint where the server will serve the `File` protocol. The client creates a channel, keeps the client end, and sends the server end through this call.
- `client_end:File` – a channel endpoint for a client of the `File` protocol.
- `zx.Handle:VMO` – a handle to a specific kernel object type (VMO).
- The `resource` keyword marks types that contain handles (and thus cannot be copied, only moved).
The FIDL compiler tracks handle ownership: types containing handles are
“resource types” with move semantics. This is enforced at the language binding
level (e.g., in C++, resource types are move-only; in Rust, they implement
Drop but not Clone).
Comparison to capOS’s Cap’n Proto Usage
Cap’n Proto natively supports capability transfer through its interface
types:
interface FileSystem {
open @0 (path :Text, flags :UInt32) -> (file :File);
}
interface File {
read @0 (count :UInt64) -> (data :Data);
getBuffer @1 (flags :UInt32) -> (buffer :MemoryObject);
}
In standard Cap’n Proto RPC, file :File in the return type means “a
capability to a File interface.” The RPC system assigns a capability table
index, transfers it out-of-band, and the receiver gets a live reference to
invoke further methods.
Recommendations for capOS:
- Use out-of-band capability transfer beside Cap’n Proto payloads. Cap’n Proto RPC has capability descriptors indexed into a capability table, but capOS currently keeps kernel transfer semantics in ring sideband records so the kernel can treat Cap’n Proto payload bytes as opaque. Promise pipelining should build on that sideband result-cap namespace rather than requiring general payload traversal in the kernel.
- No need to switch to FIDL. Cap’n Proto’s wire format is superior for capOS’s use case:
  - Random access means runtimes and services can inspect specific fields without full deserialization. The kernel should keep using bounded sideband metadata for transport decisions.
  - Zero-copy read means less allocation in userspace protocol handling.
  - Multi-segment messages avoid large contiguous allocations.
  - Promise pipelining is native to Cap’n Proto RPC, aligning with capOS’s planned async ring chaining.
- FIDL’s `resource` keyword is worth imitating. Mark capnp types that contain capabilities differently from pure-data types. This could be done at the schema level (Cap’n Proto already distinguishes `interface` fields) or as a convention. This enables the kernel to fast-path messages that contain no capabilities (no need to scan for capability descriptors).
- FIDL’s `table` type for evolution. Cap’n Proto structs already support adding fields, but capOS should be aware that FIDL tables are more explicitly designed for cross-version compatibility. For system interfaces that will evolve, consider using Cap’n Proto groups or designing structs with generous ordinal spacing.
6. Synthesis: Relevance to capOS
Handle Model vs. Typed Capability Dispatch
Zircon’s handle model is untyped at the handle level – a handle is just
(object_ref, rights). The type comes from the object. All operations go through
fixed syscalls (zx_channel_write, zx_vmo_read, etc.).
capOS’s model is typed at the capability level – each capability
implements a Cap’n Proto interface with method dispatch. Operations go through
ring SQEs such as CAP_OP_CALL, with Cap’n Proto params and results carried
in userspace buffers.
Both are valid. Zircon’s approach is lower overhead (no serialization for simple
operations like vmo_read), while capOS’s approach gives uniformity (every
operation has the same wire format, enabling persistence and network
transparency).
Hybrid recommendation: For performance-critical operations (memory mapping, signal waiting), consider adding “fast-path” syscalls that bypass capnp serialization, similar to how Zircon has dedicated syscalls per object type. The capnp path remains the general mechanism and the “canonical” interface.
Async Rings vs. Ports: The Right Call
capOS’s io_uring-inspired async rings are a better fit than Zircon’s port model for a capability OS:
- Ports require separate syscalls for registration, waiting, and the actual operation. Async rings batch everything.
- Cap’n Proto’s promise pipelining maps naturally to SQE chaining.
- The shared-memory ring design avoids kernel-side queuing overhead.
However, learn from ports:
- The signal model (each object has a signal set, watchers are notified) is clean and composable. Consider making “wait for signal” a CQ event type.
- `zx_port_queue()` (user-initiated packets) is useful for waking up event loops from user code. Support user-initiated CQ entries.
VMO/VMAR vs. capOS Memory Model
capOS should implement VMO-equivalent capabilities after the current Endpoint and transfer baseline:
- IPC already has shared rings, but bulk data still needs explicit shared memory objects.
- Capability transfer of memory regions (passing a MemoryObject cap through IPC) is the standard pattern for bulk data transfer.
- CoW cloning enables efficient process creation.
Proposed capability interfaces:
interface MemoryObject {
read @0 (offset :UInt64, count :UInt64) -> (data :Data);
write @1 (offset :UInt64, data :Data) -> ();
getSize @2 () -> (size :UInt64);
setSize @3 (size :UInt64) -> ();
createChild @4 (offset :UInt64, size :UInt64, options :UInt32) -> (child :MemoryObject);
}
interface AddressRegion {
map @0 (offset :UInt64, vmo :MemoryObject, vmoOffset :UInt64, len :UInt64, flags :UInt32) -> (addr :UInt64);
unmap @1 (addr :UInt64, len :UInt64) -> ();
protect @2 (addr :UInt64, len :UInt64, flags :UInt32) -> ();
allocateSubRegion @3 (offset :UInt64, size :UInt64) -> (region :AddressRegion, addr :UInt64);
}
FIDL vs. Cap’n Proto: Stay with Cap’n Proto
Cap’n Proto is the right choice for capOS. The advantages over FIDL:
- Language-independent standard. FIDL is Fuchsia-only. Cap’n Proto has implementations in C++, Rust, Go, Python, Java, etc.
- Zero-copy random access. The kernel can inspect message fields without full deserialization.
- Promise pipelining. Native to capnp-rpc, enabling the async ring chaining that capOS plans.
- Persistence. Cap’n Proto messages are self-describing (with schema) and suitable for on-disk storage – important for capOS’s planned capability persistence.
The one thing FIDL does better: tight integration of handle/capability metadata
in the type system (the resource keyword, client_end/server_end syntax,
handle type constraints). capOS should ensure its capnp schemas clearly
distinguish capability-carrying types and that the kernel enforces capability
transfer semantics.
Concrete Action Items for capOS
Ordered by priority and dependency:
1. Keep typed-interface authority model. Do not add a Zircon-style generic rights bitmask until a concrete method-attenuation need beats narrow wrapper capabilities and transfer-mode metadata.
2. Handle generation counters. Done: upper bits of `CapId` detect stale references.
3. Design MemoryObject/SharedBuffer capability. Define and implement the shared-memory object that replaces raw-frame transfer for bulk IPC.
4. Design AddressRegion capability (Stage 5). Sub-VMAR-like sandboxing. The root VMAR handle is part of the initial capability set.
5. Capability transfer sideband. Baseline CALL/RETURN copy and move transfer is implemented; promise-pipelined result-cap mapping still needs a precise rule before pipeline dispatch lands.
6. Async rings with signal delivery. SQ/CQ capability rings are implemented for transport; notification objects and promise pipelining remain future work.
7. User-queued CQ entries (with async rings). Allow userspace to post wake-up events to its own CQ, enabling pure-userspace event loop integration.
Appendix: Key Zircon Syscall Reference
For reference, the most architecturally significant Zircon syscalls:
| Syscall | Purpose |
|---|---|
| `zx_handle_close` | Close a handle |
| `zx_handle_duplicate` | Duplicate with rights reduction |
| `zx_handle_replace` | Atomic replace with new rights |
| `zx_channel_create` | Create channel pair |
| `zx_channel_read` | Read message + handles from channel |
| `zx_channel_write` | Write message + handles to channel |
| `zx_channel_call` | Synchronous write-then-read (RPC) |
| `zx_port_create` | Create async port |
| `zx_port_wait` | Wait for next packet |
| `zx_port_queue` | Enqueue user packet |
| `zx_object_wait_async` | Register signal wait on port |
| `zx_object_wait_one` | Synchronous wait on one object |
| `zx_vmo_create` | Create virtual memory object |
| `zx_vmo_read` / `zx_vmo_write` | Direct VMO access |
| `zx_vmo_create_child` | CoW clone |
| `zx_vmar_map` | Map VMO into address region |
| `zx_vmar_unmap` | Unmap |
| `zx_vmar_allocate` | Create sub-VMAR |
| `zx_process_create` | Create process (with root VMAR) |
| `zx_process_start` | Start process execution |