OS Error Handling in Capability Systems: Research Notes

Research on error handling patterns in capability-based and microkernel operating systems. Used as input for the capOS error handling proposal.


1. seL4

Error Codes

The seL4_Error enum in errors.h defines 11 values: seL4_NoError plus ten kernel error codes.

typedef enum {
    seL4_NoError            = 0,
    seL4_InvalidArgument    = 1,
    seL4_InvalidCapability  = 2,
    seL4_IllegalOperation   = 3,
    seL4_RangeError         = 4,
    seL4_AlignmentError     = 5,
    seL4_FailedLookup       = 6,
    seL4_TruncatedMessage   = 7,
    seL4_DeleteFirst        = 8,
    seL4_RevokeFirst        = 9,
    seL4_NotEnoughMemory    = 10,
} seL4_Error;

Error Return Mechanism

  • Capability invocations (kernel object operations) return seL4_Error directly.
  • IPC messages use seL4_MessageInfo_t with the fields label, length, extraCaps, and capsUnwrapped. The kernel copies the label unmodified and does not interpret it.
  • MR0 (Message Register 0) carries return codes for kernel object invocations via seL4_Call.
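To make the message-info fields concrete, here is a minimal sketch of packing and unpacking such a word. The field widths (7-bit length, 2-bit extraCaps, 3-bit capsUnwrapped, label in the remaining high bits) are an assumption based on the seL4 manual and should be checked against the target architecture. The helper names are hypothetical and not part of libsel4.

```python
# Field widths of seL4_MessageInfo_t, lowest bits first.
# ASSUMPTION: widths follow the seL4 manual; the label takes the high bits.
LENGTH_BITS = 7
EXTRA_CAPS_BITS = 2
CAPS_UNWRAPPED_BITS = 3

EXTRA_CAPS_SHIFT = LENGTH_BITS
CAPS_UNWRAPPED_SHIFT = EXTRA_CAPS_SHIFT + EXTRA_CAPS_BITS
LABEL_SHIFT = CAPS_UNWRAPPED_SHIFT + CAPS_UNWRAPPED_BITS


def pack_message_info(label, caps_unwrapped, extra_caps, length):
    """Pack the four fields into one message-info word (hypothetical helper)."""
    for value, bits in ((caps_unwrapped, CAPS_UNWRAPPED_BITS),
                        (extra_caps, EXTRA_CAPS_BITS),
                        (length, LENGTH_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in its bit width")
    if label < 0:
        raise ValueError("label must be non-negative")
    return ((label << LABEL_SHIFT)
            | (caps_unwrapped << CAPS_UNWRAPPED_SHIFT)
            | (extra_caps << EXTRA_CAPS_SHIFT)
            | length)


def unpack_message_info(word):
    """Split a message-info word back into its named fields."""
    return {
        "label": word >> LABEL_SHIFT,
        "capsUnwrapped": (word >> CAPS_UNWRAPPED_SHIFT)
                         & ((1 << CAPS_UNWRAPPED_BITS) - 1),
        "extraCaps": (word >> EXTRA_CAPS_SHIFT) & ((1 << EXTRA_CAPS_BITS) - 1),
        "length": word & ((1 << LENGTH_BITS) - 1),
    }
```

Because the kernel treats the label as opaque, a user-level server is free to use it as its own method selector or status code.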

Error Propagation

Fault handler mechanism: each TCB has a fault endpoint capability. On fault (capability fault, VM fault, etc.):

  1. Kernel blocks the faulting thread.
  2. Kernel sends an IPC to the fault endpoint with fault-type-specific fields.
  3. Fault handler (separate process) receives, fixes, and replies.
  4. Kernel resumes the faulting thread.
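The four steps above can be modelled as a toy simulation. This is only a sketch of the control flow, not seL4 code: Thread, FaultEndpoint, Kernel, and handle_one_fault are hypothetical names, and the endpoint is reduced to a simple queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    state: str = "running"


@dataclass
class FaultEndpoint:
    # Pending fault messages, oldest first.
    queue: deque = field(default_factory=deque)


class Kernel:
    def raise_fault(self, thread, endpoint, fault_type, **fields):
        thread.state = "blocked"                             # step 1
        endpoint.queue.append((thread, fault_type, fields))  # step 2

    def reply(self, thread):
        thread.state = "running"                             # step 4


def handle_one_fault(kernel, endpoint, fix):
    """Step 3: receive one fault message, fix the cause, then reply."""
    thread, fault_type, fields = endpoint.queue.popleft()
    fix(fault_type, fields)
    kernel.reply(thread)
```

The point of the design is that the faulting thread never sees an error code: it simply stops until a handler replies.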

Design Choices

  • seL4_NBSend on invalid capability: silently fails (prevents covert channels).
  • seL4_Send/seL4_Call on invalid capability: returns seL4_FailedLookup.
  • No application-level error convention – user servers choose their own protocol.
  • Partial capability transfer: if some caps in a multi-cap transfer fail, already-transferred caps succeed; extraCaps reflects the successful count.

Sources

  • seL4 errors.h: https://github.com/seL4/seL4/blob/master/libsel4/include/sel4/errors.h
  • seL4 IPC tutorial: https://docs.sel4.systems/Tutorials/ipc.html
  • seL4 fault handlers: https://docs.sel4.systems/Tutorials/fault-handlers.html
  • seL4 API reference: https://docs.sel4.systems/projects/sel4/api-doc.html

2. Fuchsia / Zircon

zx_status_t

Signed 32-bit integer. Negative = error, ZX_OK (0) = success.

Categories:

| Category | Examples |
|---|---|
| General | ZX_ERR_INTERNAL, ZX_ERR_NOT_SUPPORTED, ZX_ERR_NO_RESOURCES, ZX_ERR_NO_MEMORY |
| Parameter | ZX_ERR_INVALID_ARGS, ZX_ERR_WRONG_TYPE, ZX_ERR_BAD_HANDLE, ZX_ERR_BUFFER_TOO_SMALL |
| State | ZX_ERR_BAD_STATE, ZX_ERR_NOT_FOUND, ZX_ERR_TIMED_OUT, ZX_ERR_ALREADY_EXISTS, ZX_ERR_PEER_CLOSED |
| Permission | ZX_ERR_ACCESS_DENIED |
| I/O | ZX_ERR_IO, ZX_ERR_IO_REFUSED, ZX_ERR_IO_DATA_INTEGRITY, ZX_ERR_IO_DATA_LOSS |
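A short sketch of how the sign convention and the categories might be used together. The numeric values below are believed to match zircon/errors.h but are an assumption and should be verified there. The CATEGORY mapping and both helper functions are hypothetical and cover only a few of the codes listed above.

```python
ZX_OK = 0
# ASSUMPTION: numeric values as in zircon/errors.h; verify before relying on them.
ZX_ERR_INVALID_ARGS = -10
ZX_ERR_PEER_CLOSED = -24
ZX_ERR_NOT_FOUND = -25
ZX_ERR_ACCESS_DENIED = -30

# Hypothetical lookup from status to the category used in these notes.
CATEGORY = {
    ZX_ERR_INVALID_ARGS: "Parameter",
    ZX_ERR_PEER_CLOSED: "State",
    ZX_ERR_NOT_FOUND: "State",
    ZX_ERR_ACCESS_DENIED: "Permission",
}


def is_error(status):
    """Negative means error; ZX_OK (0) means success."""
    return status < 0


def category(status):
    """Return the category of a known error, or None for success/unknown."""
    return CATEGORY.get(status) if is_error(status) else None
```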

FIDL Error Handling (Three Layers)

Layer 1: Transport errors. The channel itself failed. Currently, every transport-level FIDL error closes the channel, and the client observes ZX_ERR_PEER_CLOSED.

Layer 2: Epitaphs (RFC-0053). Server sends a special final message before closing a channel, explaining why. Wire format: ordinal 0xFFFFFFFF, error status in the reserved uint32 of the FIDL message header. After sending, server closes the channel.

Layer 3: Application errors (RFC-0060). Methods declare error types:

Method() -> (string result) error int32;

Serialized as:

union MethodReturn {
    MethodResult result;
    int32 err;
};

Error types are constrained to int32, uint32, or an enum of one of those. There is deliberately no standard error enum: each service defines its own error domain. The stated rationale is that standard error enums “try to capture more detail than we think is appropriate.”

C++ binding: zx::result<T> (specialization of fit::result<zx_status_t, T>).
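The three layers can be sketched as a single classification step on the client side. This is a model of the layering only, not a FIDL binding. The class names and classify_outcome are hypothetical, the reply is represented as a tagged tuple, and the value of ZX_ERR_PEER_CLOSED is assumed.

```python
from dataclasses import dataclass

ZX_ERR_PEER_CLOSED = -24  # ASSUMPTION: value from zircon/errors.h


@dataclass
class TransportError:      # layer 1: channel closed with no explanation
    status: int


@dataclass
class Epitaph:             # layer 2: server explained why it closed
    status: int


@dataclass
class ApplicationError:    # layer 3: method-specific error value
    code: int


@dataclass
class Success:
    value: object


def classify_outcome(reply=None, epitaph_status=None):
    """reply is ("result", value) or ("err", code), or None if the channel
    closed before any reply arrived."""
    if reply is not None:
        tag, payload = reply
        if tag == "err":
            return ApplicationError(payload)
        if tag == "result":
            return Success(payload)
        raise ValueError("unknown union tag")
    if epitaph_status is not None:
        return Epitaph(epitaph_status)
    return TransportError(ZX_ERR_PEER_CLOSED)
```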

Sources

  • Zircon errors: https://fuchsia.dev/fuchsia-src/concepts/kernel/errors
  • RFC-0060 error handling: https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs/0060_error_handling
  • RFC-0053 epitaphs: https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs/0053_epitaphs

3. EROS / KeyKOS / Coyotos

KeyKOS Invocation Message Format

KC (Key, Order_code)
   STRUCTFROM(arg_structure)
   KEYSFROM(arg_key_slots)
   STRUCTTO(reply_structure)
   KEYSTO(reply_key_slots)
   RCTO(return_code_variable)

  • Order code: small integer selecting the operation (method selector).
  • Return code: integer returned by the invoked object via RCTO.
  • Data string: bulk data parameter (up to ~4KB).
  • Keys: up to 4 capability parameters in each direction.
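As an illustration of the message shape described above, the sketch below models an invocation and its reply as plain records. All names are hypothetical. The exact 4096-byte limit is an assumption standing in for "~4KB", and the order codes and return codes of the toy counter object are invented for the example.

```python
from dataclasses import dataclass

MAX_KEYS = 4           # up to 4 keys in each direction
MAX_DATA_BYTES = 4096  # ASSUMPTION: exact limit chosen to stand in for "~4KB"


@dataclass
class Invocation:
    order_code: int    # selects the operation
    data: bytes = b""  # bulk data string
    keys: tuple = ()   # capability parameters

    def __post_init__(self):
        if len(self.keys) > MAX_KEYS:
            raise ValueError("too many keys")
        if len(self.data) > MAX_DATA_BYTES:
            raise ValueError("data string too long")


@dataclass
class Reply:
    return_code: int   # what the caller receives via RCTO
    data: bytes = b""
    keys: tuple = ()


def make_counter():
    """Toy object: order code 0 increments and returns the count."""
    state = {"n": 0}

    def handle(inv):
        if inv.order_code == 0:
            state["n"] += 1
            return Reply(0, str(state["n"]).encode())
        return Reply(1)  # hypothetical non-zero code for an unknown order code

    return handle
```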

Invocation Primitives

  • CALL: send + block for reply. Kernel synthesizes a resume key (capability to resume caller) as 4th key parameter to callee.
  • RETURN: reply using a resume key + go back to waiting.
  • FORK: send and continue (fire-and-forget).

Keeper Error Handling

Every domain has a domain keeper slot. On hardware trap (illegal instruction, divide-by-zero, protection fault):

  1. Kernel invokes the keeper as if the domain had issued a CALL.
  2. Keeper receives fault information in the message.
  3. Keeper can fix and resume (via resume key) or terminate.
  4. A non-zero return code from a key invocation triggers the keeper mechanism.

Coyotos (EROS Successor) – Formalized Error Model

Coyotos cleanly separates invocation-level exceptions from application-level exceptions:

Invocation-level (before the target processes the message): MalformedSyscall, InvalidAddress, AccessViolation, DataAccessTypeError, CapAccessTypeError, MalformedSpace, MisalignedReference

Application-level: signaled via OPR0.ex flag bit in the reply control word. If set, remaining parameter words contain a 64-bit exception code plus optional info.
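A minimal sketch of how a caller might decode such a reply. The bit position of the ex flag and the layout of the parameter words are hypothetical placeholders, since these notes do not record them; consult the Coyotos specification for the real encoding.

```python
# HYPOTHETICAL: position of the ex flag inside the reply control word.
OPR0_EX_BIT = 1 << 0
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def decode_reply(control_word, params):
    """Return ("ok", params) or ("exception", code, info).

    ASSUMPTION: when the ex flag is set, the first parameter word holds the
    64-bit exception code and any further words are optional info.
    """
    if control_word & OPR0_EX_BIT:
        if not params:
            raise ValueError("exception reply without an exception code")
        return ("exception", params[0] & MASK_64, tuple(params[1:]))
    return ("ok", tuple(params))
```

The useful property is that a single flag tells the caller whether to interpret the payload as results or as an error, so no result value needs to be reserved as an error sentinel.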

Sources

  • KeyKOS architecture: https://dl.acm.org/doi/pdf/10.1145/858336.858337
  • Coyotos spec: https://hydra-www.ietfng.org/capbib/cache/shapiro:coyotosspec.html
  • EROS (SOSP 1999): https://sites.cs.ucsb.edu/~chris/teaching/cs290/doc/eros-sosp99.pdf

4. Plan 9 / 9P

9P2000 Rerror Format

size[4] Rerror tag[2] ename[s]

  • ename[s]: variable-length UTF-8 string describing the error.
  • No Terror message – only servers send errors.
  • String-based, not numeric. Conventional strings (“permission denied”, “file not found”) but no fixed taxonomy.

9P2000.u Extension (Unix compatibility)

size[4] Rerror tag[2] ename[s] errno[4]

Adds a 4-byte Unix errno as a hint. Clients should prefer the string. The ERRUNDEF sentinel is used when no Unix errno applies.
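Both wire formats can be sketched with the standard struct module. This assumes the usual 9P conventions: little-endian integers, a size field that counts itself, a one-byte message type (107 for Rerror), and strings encoded as a 2-byte length followed by UTF-8 bytes. The function names are hypothetical.

```python
import struct

RERROR = 107  # 9P message type for Rerror


def encode_rerror(tag, ename, errno=None):
    """Build an Rerror message; pass errno to append the 9P2000.u field."""
    name = ename.encode("utf-8")
    body = struct.pack("<BHH", RERROR, tag, len(name)) + name
    if errno is not None:
        body += struct.pack("<I", errno)
    # size[4] counts the whole message, including the size field itself.
    return struct.pack("<I", 4 + len(body)) + body


def decode_rerror(msg, dotu=False):
    """Parse an Rerror message; returns (tag, ename, errno or None)."""
    size, mtype, tag, name_len = struct.unpack_from("<IBHH", msg, 0)
    if size != len(msg) or mtype != RERROR:
        raise ValueError("not a well-formed Rerror message")
    name_end = 9 + name_len  # 9 bytes of fixed header precede the string
    ename = msg[9:name_end].decode("utf-8")
    errno = struct.unpack_from("<I", msg, name_end)[0] if dotu else None
    return tag, ename, errno
```

A client that follows the recommendation above would act on ename and consult errno only as a fallback.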

Design Rationale

Avoids “errno fragmentation” where different Unix variants assign different numbers to the same condition. The string is authoritative; the number is an optimization for Unix-compatibility clients.

Sources

  • 9P2000 RFC: https://ericvh.github.io/9p-rfc/rfc9p2000.html
  • 9P2000.u RFC: https://ericvh.github.io/9p-rfc/rfc9p2000.u.html

5. Genode

RPC Exception Propagation

GENODE_RPC_THROW(func_type, ret_type, func_name,
                 GENODE_TYPE_LIST(Exception1, Exception2, ...),
                 arg_type...)

Only the exception type crosses the boundary – exception objects (fields, messages) are not transferred. Server encodes a numeric Rpc_exception_code, client reconstructs a default-constructed exception of the matching type.

Undeclared exceptions: undefined behavior (server crash or hung RPC).

Infrastructure-Level Errors

  • RPC_INVALID_OPCODE: dispatched operation code doesn’t match.
  • Rpc_exception_code: integral type, computed as RPC_EXCEPTION_BASE - index_in_exception_list.
  • Ipc_error: kernel IPC failure (server unreachable).
  • Server death: capabilities become invalid, subsequent invocations produce Ipc_error.
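The encoding of exception types can be illustrated with a small model. It is written in Python rather than Genode's C++, the value of RPC_EXCEPTION_BASE is an assumption, and the two exception classes are hypothetical. Raising RuntimeError for an undeclared exception is a simplification: as noted above, Genode leaves that case undefined.

```python
RPC_EXCEPTION_BASE = -1000  # ASSUMPTION: actual value is defined by Genode


class PermissionDenied(Exception):  # hypothetical declared exception
    pass


class OutOfRam(Exception):          # hypothetical declared exception
    pass


DECLARED = (PermissionDenied, OutOfRam)


def encode_exception(exc, declared=DECLARED):
    """Server side: only the position of the type in the list is sent."""
    for index, exc_type in enumerate(declared):
        if type(exc) is exc_type:
            return RPC_EXCEPTION_BASE - index
    raise RuntimeError("undeclared exception cannot cross the RPC boundary")


def decode_exception(code, declared=DECLARED):
    """Client side: rebuild a default-constructed exception of that type."""
    index = RPC_EXCEPTION_BASE - code
    if not 0 <= index < len(declared):
        raise ValueError("not a declared exception code")
    return declared[index]()
```

Note that any message attached on the server side is lost in transit, which matches the rule that only the exception type crosses the boundary.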

Sources

  • Genode RPC: https://genode.org/documentation/genode-foundations/20.05/functional_specification/Remote_procedure_calls.html
  • Genode IPC: https://genode.org/documentation/genode-foundations/23.05/architecture/Inter-component_communication.html

6. Cross-System Comparison: Transport vs Application Errors

Every system surveyed here separates two failure modes:

  1. Transport errors – the invocation mechanism failed before the target processed the request (bad handle, insufficient rights, target dead, malformed message, timeout).

  2. Application errors – the service processed the request and returned a meaningful error (not found, resource exhausted, invalid operation).

| System | Transport errors | Application errors |
|---|---|---|
| seL4 | seL4_Error (11 values) from syscall | IPC message payload (user-defined) |
| Zircon | zx_status_t (~30 values) from syscall | FIDL per-method error type |
| EROS/Coyotos | Invocation exceptions (kernel) | OPR0.ex flag + code in reply |
| Plan 9 | Connection loss | Rerror with string |
| Genode | Ipc_error + RPC_INVALID_OPCODE | C++ exceptions via GENODE_RPC_THROW |
| Cap’n Proto RPC | disconnected/unimplemented | failed/overloaded or schema types |

Common pattern: small kernel error code set for transport + typed service-specific errors for application.
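The common pattern can be sketched as a result type with two separate error slots. This is an illustrative model only: the enums, the Outcome record, and the toy invoke function are hypothetical and do not correspond to any of the systems above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransportError(Enum):  # small, fixed, shared by every interface
    BAD_HANDLE = 1
    PEER_DEAD = 2


class StoreError(Enum):      # scoped to one hypothetical "store" interface
    NOT_FOUND = 1


@dataclass
class Outcome:
    transport: Optional[TransportError] = None
    app_error: Optional[Enum] = None
    value: object = None


def invoke(handle_table, handle, key):
    """Toy lookup call: handle_table maps handle -> dict, or None if dead."""
    if handle not in handle_table:
        return Outcome(transport=TransportError.BAD_HANDLE)
    store = handle_table[handle]
    if store is None:
        return Outcome(transport=TransportError.PEER_DEAD)
    if key not in store:
        return Outcome(app_error=StoreError.NOT_FOUND)
    return Outcome(value=store[key])
```

Keeping the two slots separate lets a caller retry or reconnect on transport failures without having to understand any service-specific error domain.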


7. POSIX errno: Strengths and Weaknesses for Capability Systems

Strengths

  • Simple (single integer, zero overhead on success).
  • Universal (every Unix developer knows it).
  • Low overhead (no allocation on error path).

Weaknesses for Capability Systems

  • Ambient authority assumption: EACCES/EPERM assume ACL-style access control. In capability systems, having the capability IS the permission.
  • Global flat namespace: all errors share one integer space. Capability systems have typed interfaces; errors should be scoped per-interface.
  • No structured information: just an integer, no “which argument” or “how much memory needed.”
  • Thread-local state: clobbered by intermediate calls, breaks down with async IPC or promise pipelining.
  • No transport/application distinction: EBADF (transport) and ENOENT (application) in the same space.
  • Not composable across trust boundaries: callee’s errno meaningless in caller’s address space without explicit serialization.

None of the capability systems surveyed here uses a POSIX-style global errno namespace.