
Proposal: Capability-Based Binaries, Language Support, and POSIX Compatibility

How userspace binaries receive, use, and compose capabilities — from the native Rust runtime through POSIX compatibility to running unmodified software.

Current State

The init binary (init/src/main.rs) and smoke services are no_std Rust binaries over capos-rt. The runtime owns _start, fixed heap initialization, CapSet parsing, exit/cap_enter syscall wrappers, typed clients, result-cap adoption, queued release flushing, and panic output. Init reads the BootPackage manifest, validates the metadata-only service graph, spawns child services through ProcessSpawner, waits on ProcessHandles, and exits. The former raw bootstrap syscall and demo-support runtime shims are historical; demo support now keeps only low-level transport helpers for intentionally malformed SQE/CQE smokes.

The kernel-side roadmap (Stages 4-6) provides the capability ring (SQ/CQ shared memory + cap_enter syscall, implemented), scheduling, and IPC. This proposal covers the userspace half: what binaries look like, how they’re built, and how existing software runs on a system with no ambient authority.

Part 1: Native Userspace Runtime (capos-rt)

The Problem

Every userspace binary currently needs to:

  • Define _start and a panic handler
  • Set up an allocator
  • Construct raw syscall wrappers
  • Manually serialize/deserialize capnp messages
  • Know the syscall ABI (register layout, method IDs)

This is fine for one proof-of-concept binary. It won’t scale to dozens of services.

Solution: A Userspace Runtime Crate

capos-rt is a no_std + alloc Rust crate that every native capOS binary depends on. It provides:

1. Entry point and allocator setup.

// capos-rt provides the real _start that:
// - initializes the heap allocator (bump allocator over a fixed region,
//   or grows via FrameAllocator cap if granted)
// - parses the initial capability set from a kernel-provided page
// - calls the user's main(CapSet)
// - calls sys_exit with the return value

#[capos_rt::main]
fn main(caps: CapSet) -> Result<(), Error> {
    let console = caps.get::<Console>("console")?;
    console.write_line("Hello from capOS")?;
    Ok(())
}

2. Syscall layer. Raw syscall asm wrapped in safe Rust functions. The entire syscall surface is 2 calls – new operations are SQE opcodes, not new syscalls:

  • sys_exit(code) – terminate process (syscall 1)
  • sys_cap_enter(min_complete, timeout_ns) – flush pending SQEs, then wait until N completions are available or the timeout expires (syscall 2)

Capability invocations go through the per-process SQ/CQ ring. capos-rt provides helpers for writing SQEs and reading CQEs:

/// Submit a CALL SQE to the capability ring and wait for the CQE.
pub fn cap_call(
    ring: &mut CapRing,
    cap_id: u32,
    method_id: u16,
    params: &[u8],
    result_buf: &mut [u8],
) -> Result<usize, CapError> {
    ring.push_call_sqe(cap_id, method_id, params);
    sys_cap_enter(1, u64::MAX);
    ring.pop_cqe(result_buf)
}

3. Cap’n Proto integration. Re-exports generated types from schema/capos.capnp and provides typed wrappers:

// Generated from schema + thin wrapper in capos-rt
impl Console {
    pub fn write(&self, data: &[u8]) -> Result<(), CapError> {
        let mut msg = capnp::message::Builder::new_default();
        let mut req = msg.init_root::<console::write_params::Builder>();
        req.set_data(data);
        self.invoke(0, &msg)  // method @0
    }

    pub fn write_line(&self, text: &str) -> Result<(), CapError> {
        let mut msg = capnp::message::Builder::new_default();
        let mut req = msg.init_root::<console::write_line_params::Builder>();
        req.set_text(text);
        self.invoke(1, &msg)  // method @1
    }
}

4. CapSet – the initial capability environment.

At spawn time, the kernel writes the process’s initial capabilities into a well-known page (or passes them via registers/stack – ABI TBD). capos-rt parses this into a CapSet: a name-to-CapId map.

pub struct CapSet {
    caps: BTreeMap<String, CapEntry>,
}

struct CapEntry {
    id: u32,            // authority-bearing slot in the process CapTable
    interface_id: u64,  // generated capnp TYPE_ID, carried for type checking
}

impl CapSet {
    /// Get a typed capability by name. Fails if not present or wrong type.
    pub fn get<T: Capability>(&self, name: &str) -> Result<T, CapError> { ... }

    /// List available capability names (for debugging/discovery).
    pub fn list(&self) -> impl Iterator<Item = (&str, u64)> { ... }
}

interface_id is not a handle. It is metadata carrying the generated Cap’n Proto TYPE_ID for the interface expected by the typed client. The handle is id (CapId). A typed client constructor must check that entry.interface_id == T::TYPE_ID, then store the CapId. Normal CALL SQEs do not need to repeat the interface ID because each capability table entry exposes one public interface. The ring SQE keeps fixed-size reserved padding for ABI stability, not a required interface field for the system transport.

This matters for the system transport because several capabilities can expose the same interface while representing different authority: a serial console, a log-buffer console, and a console proxy all have the Console TYPE_ID, but different CapId values.
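The type check described above can be sketched as follows. This is a hosted, simplified model: the CapError variants, the Capability trait shape, and the 0xc0de_0001 TYPE_ID value are illustrative stand-ins, not the real capos-rt definitions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum CapError {
    NotFound,
    WrongInterface { expected: u64, found: u64 },
}

#[derive(Clone, Copy)]
pub struct CapEntry {
    pub id: u32,           // the handle: slot in the process CapTable
    pub interface_id: u64, // metadata: generated capnp TYPE_ID
}

/// Implemented by generated typed clients (Console, Store, ...).
pub trait Capability: Sized {
    const TYPE_ID: u64;
    fn from_cap_id(id: u32) -> Self;
}

pub struct CapSet {
    caps: BTreeMap<String, CapEntry>,
}

impl CapSet {
    pub fn get<T: Capability>(&self, name: &str) -> Result<T, CapError> {
        let entry = self.caps.get(name).ok_or(CapError::NotFound)?;
        // The type check: interface_id is metadata, id carries the authority.
        if entry.interface_id != T::TYPE_ID {
            return Err(CapError::WrongInterface {
                expected: T::TYPE_ID,
                found: entry.interface_id,
            });
        }
        Ok(T::from_cap_id(entry.id))
    }
}

#[allow(dead_code)]
pub struct Console { cap_id: u32 }

impl Capability for Console {
    const TYPE_ID: u64 = 0xc0de_0001; // placeholder, not a real capnp TYPE_ID
    fn from_cap_id(id: u32) -> Self { Console { cap_id: id } }
}
```

Two entries with the same TYPE_ID but different id values model two consoles with distinct authority, as in the serial/log-buffer/proxy example.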

Crate Structure

capos-rt/
  Cargo.toml          # no_std + alloc, depends on capnp
  build.rs            # capnpc codegen from schema/capos.capnp
  src/
    lib.rs            # re-exports, #[capos_rt::main] macro
    syscall.rs        # raw asm syscall wrappers
    caps.rs           # CapSet, CapEntry, Capability trait
    alloc.rs          # userspace heap allocator setup
    generated.rs      # include!(capnp generated code)

capos-rt is NOT a workspace member (same as init/ – needs different code model and linker script). It’s a path dependency for userspace crates.

Init After capos-rt

// init/src/main.rs -- after capos-rt exists
use capos_rt::prelude::*;

#[capos_rt::main]
fn main(caps: CapSet) -> Result<(), Error> {
    let console = caps.get::<Console>("console")?;
    let spawner = caps.get::<ProcessSpawner>("spawner")?;
    let manifest = caps.get::<Manifest>("manifest")?;

    console.write_line("capOS init starting")?;

    let mut running_services = BTreeMap::new(); // name -> ProcessHandle
    for entry in manifest.services()? {
        let binary_name = entry.binary();
        let granted = resolve_caps(&entry, &running_services, &caps)?;
        let handle = spawner.spawn(entry.name(), binary_name, &granted)?;
        running_services.insert(entry.name(), handle);
    }

    supervisor_loop(&running_services, &spawner)
}

Part 2: Capability-Based Binary Model

Binary Format

ELF64, same as now. The kernel’s ELF loader (kernel/src/elf.rs) already handles PT_LOAD segments. No changes to the binary format itself.

What changes is the ABI contract between kernel and binary:

| Aspect | Current (Stage 3) | After capos-rt |
|---|---|---|
| Entry point | _start(), no args | _start(cap_page: *const u8) or via well-known address |
| Syscall ABI | ad-hoc (rax=0 write, rax=1 exit) | SQ/CQ ring + sys_cap_enter + sys_exit |
| Capability access | None | CapSet parsed from kernel-provided page |
| Serialization | None | capnp messages |
| Allocator | None (no heap) | Bump allocator, optionally backed by FrameAllocator cap |

Initial Capability Passing

The kernel needs to communicate the initial cap set to the new process. Options:

Option A: Well-known page. Kernel maps a read-only page at a fixed virtual address (e.g., 0x1000) containing a capnp-serialized InitialCaps message:

struct InitialCaps {
    entries @0 :List(InitialCapEntry);
}

struct InitialCapEntry {
    name @0 :Text;
    id @1 :UInt32;
    interfaceId @2 :UInt64;
}

Option B: Register convention. Pass pointer and length in rdi/rsi at entry. Simpler, but the data still needs to live somewhere in user memory.

Option C: Stack. Push the cap descriptor onto the user stack before iretq. Similar to how Linux passes auxv to _start.

Option A is cleanest – the page is always there, no calling-convention dependency, and it naturally extends to passing additional boot info later.
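A minimal sketch of the Option A flow, using a simplified fixed byte layout as a stand-in for the capnp-serialized InitialCaps message (the real parser would use the generated capnp reader over the mapped page). The layout here — count, then per-entry id, interfaceId, name length, name bytes — is purely illustrative.

```rust
fn read_u32(buf: &[u8], off: &mut usize) -> u32 {
    let v = u32::from_le_bytes(buf[*off..*off + 4].try_into().unwrap());
    *off += 4;
    v
}

fn read_u64(buf: &[u8], off: &mut usize) -> u64 {
    let v = u64::from_le_bytes(buf[*off..*off + 8].try_into().unwrap());
    *off += 8;
    v
}

/// Parse the kernel-provided cap page into (name, cap_id, interface_id)
/// triples -- the raw material for building a CapSet.
pub fn parse_initial_caps(page: &[u8]) -> Vec<(String, u32, u64)> {
    let mut off = 0;
    let count = read_u32(page, &mut off);
    let mut out = Vec::new();
    for _ in 0..count {
        let id = read_u32(page, &mut off);
        let interface_id = read_u64(page, &mut off);
        let name_len = read_u32(page, &mut off) as usize;
        let name = String::from_utf8(page[off..off + name_len].to_vec()).unwrap();
        off += name_len;
        out.push((name, id, interface_id));
    }
    out
}
```

Because the page is read-only and always mapped, this parse can run before the allocator is anything fancier than a bump pointer.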

Service Binary Lifecycle

1. Kernel loads ELF, creates address space, populates cap table
2. Kernel maps InitialCaps page at well-known address
3. Kernel enters userspace at _start

4. capos-rt _start:
   a. Initialize heap allocator
   b. Parse InitialCaps page into CapSet
   c. Call user's main(CapSet)

5. User main:
   a. Extract needed caps from CapSet
   b. Do work (invoke caps, serve requests)
   c. Optionally export caps to parent once ProcessHandle export lookup exists

6. On return from main (or sys_exit):
   a. Kernel destroys process
   b. All caps in process's cap table are dropped
   c. Parent's ProcessHandle receives exit notification
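The ordering in step 4 can be sketched as a hosted simulation. rt_start and the simplified CapSet alias are illustrative; the real runtime does this inside its #[no_mangle] _start over a fixed heap region.

```rust
// Simplified stand-in for the real name -> CapId map.
type CapSet = Vec<(String, u32)>;

/// Mirror of the _start sequence: allocator first (parsing allocates),
/// then the InitialCaps page, then the user's main.
fn rt_start(
    init_heap: fn(),
    parse_cap_page: fn() -> CapSet,
    user_main: fn(CapSet) -> i32,
) -> i32 {
    init_heap();                 // 4a: heap before anything that allocates
    let caps = parse_cap_page(); // 4b: CapSet from the well-known page
    user_main(caps)              // 4c: return value becomes the sys_exit code
}
```

The invariant worth noting: nothing before step 4a may allocate, which is why the cap page must be parseable with a fixed-region bump allocator.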

Part 3: Language Support Roadmap

Tier 1: Rust (native, now)

Rust is the only language that matters until the runtime is stable. Reasons:

  • no_std + alloc works today with the existing kernel
  • capnp crate (v0.25) has no_std support with codegen
  • Zero runtime overhead – no GC, no dynamic linker, no libc
  • Same language as the kernel, shared understanding of the memory model
  • Ownership model maps naturally to capability lifecycle

All system services (drivers, network stack, store) will be Rust.

Tier 2: C (via libcapos.h, after Stage 6)

C is the second target because most existing driver code and system software is C, and the FFI boundary with Rust is trivial.

libcapos is a static library providing:

#include <capos.h>

// Ring-based capability invocation (synchronous wrapper around SQ/CQ ring)
int cap_call(cap_ring_t *ring, uint32_t cap_id, uint16_t method_id,
             const void *params, size_t params_len,
             void *result, size_t result_len);

// Typed wrappers (generated from .capnp schema)
int console_write(cap_t console, const void *data, size_t len);
int console_write_line(cap_t console, const char *text);

// CapSet access
cap_t capset_get(const char *name);
uint64_t capset_interface_id(const char *name);

// Syscalls (the entire syscall surface -- 2 calls total)
_Noreturn void sys_exit(int code);                   // terminate
uint32_t sys_cap_enter(uint32_t min_complete,        // flush SQEs + wait
                       uint64_t timeout_ns);

Implementation: libcapos is Rust compiled to a static .a with a C ABI (#[no_mangle] extern "C"). The capnp message construction happens in Rust behind the C API. This avoids requiring a C capnp implementation.

C binaries link against libcapos.a and use the same linker script as Rust userspace binaries. The entry point and allocator setup are in libcapos.
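A sketch of what that FFI boundary might look like for console_write_line. The cap_t-as-u32 representation and the stubbed Rust body are assumptions for illustration, not the actual libcapos code.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// C-visible wrapper: validate and convert the C string, then hand off to
/// the Rust-side typed client. Returns 0 on success, -1 on bad input.
#[no_mangle]
pub extern "C" fn console_write_line(cap_id: u32, text: *const c_char) -> c_int {
    // Safety: the C caller must pass a valid NUL-terminated string or NULL.
    let s = unsafe {
        if text.is_null() {
            return -1;
        }
        CStr::from_ptr(text)
    };
    match s.to_str() {
        Ok(line) => {
            rust_side_write_line(cap_id, line);
            0
        }
        Err(_) => -1, // reject non-UTF-8 input
    }
}

fn rust_side_write_line(_cap: u32, _line: &str) {
    // In the real libcapos this builds a capnp message and pushes a CALL SQE;
    // stubbed here so the boundary itself is the focus.
}
```

All capnp construction stays on the Rust side of this boundary, which is exactly why no C capnp implementation is needed.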

Tier 3: Regular Rust Runtime Support

After the native capos-rt service model is stable, the next language priority is making capOS build and run ordinary Rust programs as far as the capability model permits. The target is not an ambient POSIX clone; it is a Rust runtime path where common crates can use allocation, time, threading where available, and capability-backed I/O through capOS-native shims.

This has higher priority than C++ and should be evaluated before broad POSIX compatibility work, because Rust is already the system language and can reuse the existing capos-rt ownership and ring abstractions directly.

Tier 4: Go (GOOS=capos)

Go is the next high-priority runtime after regular Rust. It needs in-process threading, futex-like wait/wake, TLS/runtime metadata support, GC integration, and a network poller mapped to capOS capabilities. See docs/proposals/go-runtime-proposal.md for the dedicated plan.

Go has higher priority than C++ because it unlocks CUE and a large practical tooling/runtime ecosystem; C++ support should not displace the Go runtime track.

Tier 5: Any Language Targeting WASI (longer term)

See Part 5 below. Languages that compile to WASI (Rust, C, Go, etc.) can run on capOS through a WASI-to-capability translation layer.

Important distinction: WASI works differently for compiled vs. interpreted languages:

  • Compiled languages (Rust, C) compile directly to .wasm — no interpreter in the loop. WASI is a clean, efficient execution path.
  • Interpreted languages (Python, JS, Lua) still need their interpreter (CPython, QuickJS, etc.) — it’s just compiled to .wasm instead of native code. The stack becomes: script → interpreter.wasm → WASI runtime → kernel. You pay for a wasm sandbox layer on top of the interpreter you’d need anyway.

For interpreted languages, WASI sandboxing is valuable when running untrusted code (plugins, user-submitted scripts) where you don’t trust the interpreter itself. For trusted system scripts, native CPython/QuickJS via the POSIX layer (Part 4) is simpler and faster — the capability model already constrains what the process can do.

Tier 6: Managed Runtimes (much later)

Languages with their own runtimes (Java, .NET) would need their runtime ported to capOS. This is large effort and low priority. WASI is the pragmatic answer for these languages.

Go is a special case — see docs/proposals/go-runtime-proposal.md for the custom GOOS=capos path (motivated by CUE support). Go via WASI (GOOS=wasip1) is an alternative for CPU-bound use cases but lacks goroutine parallelism and networking.

C++ Note: pg83/std

pg83/std (https://github.com/pg83/std) was reviewed as a possible easier path to C++ on capOS. It is MIT licensed and centered on ObjPool, an arena-owned object graph model with small containers and lightweight public interfaces.

The useful subset for capOS is the low-level core: std/mem, std/lib, std/str, std/map, std/alg, std/typ, and std/sys/atomic. The main shim boundary is std/sys/crt.cpp, which currently provides allocation, memory/string intrinsics, and monotonic time through hosted libc calls.

The full library is not a shortcut to C++ support. It assumes hosted/POSIX facilities in large areas: malloc/free, clock_gettime, pthreads, poll, epoll/kqueue, sockets, fd I/O, DNS, and optional TLS libraries. Its build also expects a C++26-capable compiler. On the current development host, g++ 13.3.0 rejected -std=c++26 and clang++ was unavailable.

Treat it as a later C++ experiment after libcapos and C/C++ startup exist: port only the freestanding arena/container subset first, with exceptions and RTTI disabled unless a concrete C++ ABI decision enables them. Regular Rust and Go remain higher-priority runtime tracks.

Language-Specific Notes

Python

CPython is a C program. It can reach capOS via two paths:

  1. WASI: CPython compiled to python.wasm, runs inside Wasmtime/WAMR on capOS. Note: this is still CPython — WASI doesn’t eliminate the interpreter, it just compiles it to wasm. The stack is: script.py → python.wasm → WASI runtime (native) → kernel.
  2. POSIX layer: CPython compiled to native ELF via musl + libcapos-posix. Direct: script.py → cpython (native) → kernel.

WASI path — upstream status (as of March 2026):

  • CPython on WASI is Tier 2 since Python 3.13 (PEP 816)
  • Works for compute-only workloads (no I/O beyond stdout)
  • No sockets/networking — blocked on WASI 0.3 (no release date)
  • No threading — WASI SDK 26/27 have bugs, skipped by CPython
  • WASI 0.2 skipped entirely — going straight to 0.3
  • Python 3.14 targets WASI 0.1, SDK 24

POSIX path:

  • Full CPython built against musl + libcapos-posix
  • Networking works immediately (via TcpSocket/UdpSocket caps behind the POSIX socket shim), no dependency on WASI 0.3
  • More integration work than WASI, but unblocked

MicroPython: Small C program (~100K source) designed for embedded use. Builds against musl + libcapos-posix with minimal effort. No threading, no mmap, minimal syscall surface. Good for early scripting needs before full CPython is ported.

When to use which:

| Use case | Path | Why |
|---|---|---|
| Untrusted Python plugins | WASI | Wasm sandbox isolates interpreter bugs |
| System scripts, config tooling | POSIX (native CPython) | Simpler, faster, networking works |
| Early scripting before POSIX layer | WASI (compute-only) | Works today, no porting needed |
| Lightweight embedded scripting | MicroPython via POSIX | Tiny footprint, minimal deps |

Recommendation: Use POSIX path (native CPython) as the primary Python target once the POSIX layer exists. WASI path for sandboxed/untrusted execution. MicroPython for early experimentation. No custom Python runtime port needed — both paths reuse upstream CPython.

JavaScript / TypeScript

Same situation as Python — JS engines (V8, SpiderMonkey, QuickJS) are C/C++ programs that can be compiled to native via POSIX layer or to wasm via WASI. In both cases, the engine interprets JS; WASI just sandboxes the engine itself.

QuickJS is the MicroPython equivalent — tiny (~50K lines C), embeddable, trivially builds against libcapos. Good candidate for embedded scripting in capOS services without pulling in a full V8.

Lua

Tiny C implementation (~30K lines). Trivially builds against libcapos. Good candidate for an embedded scripting language in capOS services. Alternatively, runs via WASI with near-zero overhead.

Part 4: POSIX Compatibility Layer

Why POSIX at All?

capOS is not POSIX and doesn’t want to be. But:

  1. Existing software. Most useful software assumes POSIX. A DNS resolver, an HTTP server, a database – all speak open()/read()/write()/socket(). Without some compatibility layer, every piece of software must be rewritten.

  2. Developer familiarity. Programmers know POSIX. A compatibility layer lowers the barrier to writing capOS software, even if native caps are better.

  3. Gradual migration. Port software first with POSIX compat, then incrementally convert to native capabilities for tighter sandboxing.

The goal is NOT full POSIX compliance. It’s a pragmatic translation layer that maps POSIX concepts to capabilities, enabling existing software to run with minimal modification while preserving capability-based security.

Architecture: libcapos-posix

Application (C/Rust, uses POSIX APIs)
  │
  │  open(), read(), write(), socket(), ...
  │
  v
libcapos-posix (POSIX-to-capability translation)
  │
  │  Maps fds to caps, paths to namespace lookups
  │
  v
libcapos (native capability invocation)
  │
  │  SQ/CQ ring + cap_enter syscall
  │
  v
Kernel (capability dispatch)

libcapos-posix is a static library that provides POSIX-like function signatures. It is NOT libc – it doesn’t provide malloc (that’s the allocator in capos-rt/libcapos), locale support, or the thousand other things in glibc. It’s the ~50 syscall wrappers that matter for I/O.

File Descriptor Table

POSIX programs think in file descriptors. capOS has capabilities. The translation is a per-process fd-to-cap mapping table inside libcapos-posix:

struct FdTable {
    entries: BTreeMap<i32, FdEntry>,
    next_fd: i32,
}

enum FdEntry {
    /// Backed by a Console cap (stdout/stderr)
    Console { cap_id: u32 },
    /// Backed by a Namespace + hash (opened "file")
    StoreObject { namespace_cap: u32, hash: Vec<u8>, cursor: usize },
    /// Backed by a TcpSocket cap
    TcpSocket { cap_id: u32 },
    /// Backed by a UdpSocket cap
    UdpSocket { cap_id: u32 },
    /// Backed by a TcpListener cap
    Listener { cap_id: u32 },
    /// Pipe (IPC channel between two caps)
    Pipe { read_cap: u32, write_cap: u32 },
}

On process startup, libcapos-posix pre-populates:

  • fd 0 (stdin): if a Console or StdinReader cap is in the CapSet
  • fd 1 (stdout): mapped to Console cap
  • fd 2 (stderr): mapped to Console cap (or a separate Log cap)
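The pre-population might look like the following sketch, with simplified FdEntry variants (only the stdio-relevant ones) and cap IDs passed in directly instead of being looked up from the CapSet by name.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum FdEntry {
    Console { cap_id: u32 },
    Stdin { cap_id: u32 },
}

struct FdTable {
    entries: BTreeMap<i32, FdEntry>,
    next_fd: i32,
}

impl FdTable {
    /// Wire fds 0/1/2 to whatever stdio caps the CapSet actually granted;
    /// absent caps simply leave the fd slot empty.
    fn bootstrap(stdin_cap: Option<u32>, console_cap: Option<u32>) -> FdTable {
        let mut entries = BTreeMap::new();
        if let Some(id) = stdin_cap {
            entries.insert(0, FdEntry::Stdin { cap_id: id });
        }
        if let Some(id) = console_cap {
            entries.insert(1, FdEntry::Console { cap_id: id });
            entries.insert(2, FdEntry::Console { cap_id: id }); // stderr shares it
        }
        FdTable { entries, next_fd: 3 } // user fds start above stdio
    }
}
```

A process granted no console cap gets no fd 1 at all — writes to stdout fail structurally rather than by permission check.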

Path Resolution

POSIX open("/etc/config.toml", O_RDONLY) becomes:

  1. libcapos-posix looks up the process’s Namespace cap (from CapSet, name "fs" or "root")
  2. Strips leading / (there is no global root – the namespace IS the root)
  3. Calls namespace.resolve("etc/config.toml") to get a store hash
  4. Calls store.get(hash) to retrieve the object data
  5. Creates an FdEntry::StoreObject with cursor at 0
  6. Returns the fd number

Relative paths work the same way – there’s no cwd concept by default, but libcapos-posix can maintain a synthetic cwd string and prepend it.

Path scoping is automatic. If the process was granted a Namespace scoped to "myapp/", then open("/data.db") resolves to "myapp/data.db" in the store. The process can’t escape its namespace – there’s no .. traversal because namespaces are flat prefix scopes, not hierarchical directories.
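The six steps can be sketched with trait stubs standing in for the real Namespace and Store clients. The -2 errno value and the in-memory map implementations are purely illustrative.

```rust
trait Namespace {
    fn resolve(&self, path: &str) -> Option<Vec<u8>>;
}
trait Store {
    fn get(&self, hash: &[u8]) -> Option<Vec<u8>>;
}

/// open(path, O_RDONLY) translated: strip '/', resolve to a hash, fetch
/// the object, start the cursor at 0. Fd allocation (step 6) elided.
fn posix_open(
    ns: &dyn Namespace,
    store: &dyn Store,
    path: &str,
) -> Result<(Vec<u8>, usize), i32> {
    // Step 2: there is no global root -- the namespace IS the root.
    let rel = path.strip_prefix('/').unwrap_or(path);
    // Steps 3-4: name -> store hash, hash -> object bytes.
    let hash = ns.resolve(rel).ok_or(-2 /* ENOENT-style */)?;
    let data = store.get(&hash).ok_or(-2)?;
    // Step 5: cached object with cursor at 0.
    Ok((data, 0))
}

// In-memory stand-ins for testing the flow.
struct MapNs(std::collections::BTreeMap<String, Vec<u8>>);
impl Namespace for MapNs {
    fn resolve(&self, p: &str) -> Option<Vec<u8>> { self.0.get(p).cloned() }
}
struct MapStore(std::collections::BTreeMap<Vec<u8>, Vec<u8>>);
impl Store for MapStore {
    fn get(&self, h: &[u8]) -> Option<Vec<u8>> { self.0.get(h).cloned() }
}
```

Note how a failed lookup surfaces as ENOENT-style failure with no permission check anywhere: the name either resolves in the granted namespace or it does not exist.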

Supported POSIX Functions

Grouped by what capability backs them:

Console cap -> stdio:

| POSIX | capOS translation |
|---|---|
| write(1, buf, len) | console.write(buf[..len]) |
| write(2, buf, len) | console.write(buf[..len]) (or log cap) |
| read(0, buf, len) | stdin.read(buf, len) if stdin cap exists |

Namespace + Store caps -> file I/O:

| POSIX | capOS translation |
|---|---|
| open(path, flags) | namespace.resolve(path) -> store.get(hash) -> fd |
| read(fd, buf, len) | memcpy from cached store object at cursor |
| write(fd, buf, len) | buffer writes, flush to store.put() on close |
| close(fd) | if modified: store.put(data) -> namespace.bind(path, hash) |
| lseek(fd, off, whence) | update cursor in FdEntry |
| stat(path, buf) | namespace.resolve(path) -> synthesize stat from object metadata |
| unlink(path) | namespace.unbind(path) (object remains in store if referenced elsewhere) |
| opendir/readdir | namespace.list() filtered by prefix |
| mkdir(path) | no-op or create empty namespace prefix (namespaces are implicit) |
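The read/lseek rows above can be sketched against a cached object; StoreObject here is a pared-down FdEntry::StoreObject holding just the cached bytes and cursor.

```rust
struct StoreObject {
    data: Vec<u8>,
    cursor: usize,
}

/// read(fd, buf, len): copy from the cached store object at the cursor,
/// then advance it. Returns bytes copied; 0 signals end-of-object (EOF).
fn posix_read(obj: &mut StoreObject, buf: &mut [u8]) -> usize {
    let start = obj.cursor.min(obj.data.len());
    let remaining = &obj.data[start..];
    let n = remaining.len().min(buf.len());
    buf[..n].copy_from_slice(&remaining[..n]);
    obj.cursor += n;
    n
}
```

lseek is just an assignment to cursor under the same clamping rule, which is why it needs no capability invocation at all.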

TcpSocket/UdpSocket caps -> networking:

| POSIX | capOS translation |
|---|---|
| socket(AF_INET, SOCK_STREAM, 0) | net_mgr.create_tcp_socket() -> fd |
| connect(fd, addr, len) | tcp_socket.connect(addr) |
| bind(fd, addr, len) | tcp_listener.bind(addr) |
| listen(fd, backlog) | no-op (listener cap is already listening) |
| accept(fd) | tcp_listener.accept() -> new fd |
| send(fd, buf, len, 0) | tcp_socket.send(buf[..len]) |
| recv(fd, buf, len, 0) | tcp_socket.recv(buf, len) |

Not supported (returns ENOSYS or EACCES):

| POSIX | Why not |
|---|---|
| fork() | No address space cloning. Use posix_spawn() (maps to ProcessSpawner) |
| exec() | No in-place replacement. Use posix_spawn() |
| kill(pid, sig) | No signals. Future lifecycle work may add ProcessHandle kill semantics |
| chmod/chown | No permission bits. Authority is structural |
| mmap(MAP_SHARED) | No shared memory yet (future: SharedMemory cap) |
| ioctl | No device files. Use typed capability methods |
| ptrace | No debugging interface yet |
| pipe() | Possible via IPC caps, but not in initial version |
| select/poll/epoll | Requires async cap invocation (Stage 5+). Initial version is blocking only |

Process Creation Compatibility

capOS process creation is spawn-style, not fork/exec-style. A new process is a fresh ELF instance selected by ProcessSpawner, with an explicit initial CapSet assembled from granted capabilities. The parent address space is not cloned, and an existing process image is not replaced in place.

posix_spawn() is the compatibility primitive for subprocess creation. A libcapos-posix implementation maps it to ProcessSpawner.spawn(), translates file actions into fd-table setup and capability grants, and passes argv and environment data through the process bootstrap channel once that ABI exists. Programs that use the common fork() followed immediately by exec() pattern should be patched to call posix_spawn() directly.

Full fork() is intentionally not a native kernel primitive. Supporting it would require copy-on-write address-space cloning, parent/child register return semantics, fd-table duplication, a per-capability inheritance policy, safe handling for outstanding SQEs/CQEs, and defined behavior for endpoint calls, timers, waits, and process handles that are in flight at the fork point. Threaded POSIX processes add another constraint: only the calling thread is cloned, while locks and async-signal-safe state must remain coherent in the child.

If a concrete port needs more than posix_spawn(), the next step should be a narrow compatibility shim with vfork()/fork-for-exec semantics backed by ProcessSpawner, not a general kernel clone operation. That shim would suspend the parent, restrict child actions to exec-or-exit, and avoid pretending that arbitrary address-space cloning exists.
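The posix_spawn() mapping can be sketched as a pure translation step. CapGrant mirrors the schema struct from Part 7; SpawnRequest and the single console file action are illustrative, and argv/environment passing through the bootstrap channel is elided.

```rust
#[derive(Debug, PartialEq)]
struct CapGrant {
    name: String,
    cap_id: u32,
    interface_id: u64,
}

struct SpawnRequest {
    name: String,
    binary_name: String,
    grants: Vec<CapGrant>,
}

/// Translate a posix_spawn-style call into a ProcessSpawner request:
/// a dup2-to-stdout file action becomes a "console" grant for the child.
fn posix_spawn_to_request(binary_name: &str, stdio_console_cap: Option<u32>) -> SpawnRequest {
    let mut grants = Vec::new();
    if let Some(cap_id) = stdio_console_cap {
        grants.push(CapGrant {
            name: "console".to_string(),
            cap_id,
            interface_id: 0xc0de_0001, // placeholder TYPE_ID
        });
    }
    SpawnRequest {
        name: binary_name.to_string(),
        binary_name: binary_name.to_string(),
        grants,
    }
}
```

The important property: the child's authority is exactly the grant list, assembled by the parent before spawn, never inherited ambiently.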

Security Model

The POSIX layer does NOT weaken capability security. Every POSIX call translates to a capability invocation on caps the process was actually granted:

  • open("/etc/passwd") fails if the process’s namespace doesn’t contain "etc/passwd" – not because of permission bits, but because the name doesn’t resolve
  • socket(AF_INET, SOCK_STREAM, 0) fails if the process wasn’t granted a NetworkManager cap
  • fork() fails unconditionally – there’s no way to synthesize it from caps

A POSIX binary on capOS is more constrained than on Linux, not less. The compatibility layer provides familiar function signatures, not familiar authority.

Building POSIX-Compatible Binaries

my-app/
  Cargo.toml        # depends on capos-posix (which depends on capos-rt)
  src/main.rs       # uses libc-style APIs

Or for C:

#include <capos/posix.h>   // open, read, write, close, socket, ...
#include <capos/capos.h>   // cap_call, capset_get, ...

int main() {
    // Works -- stdout is mapped to Console cap
    write(1, "hello\n", 6);

    // Works -- if "data" namespace cap was granted
    int fd = open("/config.toml", O_RDONLY);
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);

    // Works -- if NetworkManager cap was granted
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    // ...
}

The linker pulls in libcapos-posix.a -> libcapos.a -> startup code. Same ELF output, same kernel loader.

musl as a Base (Optional, Later)

For broader C compatibility (printf, string functions, math), libcapos-posix can be layered under musl libc. musl has a clean syscall interface – all system calls go through a single __syscall() function. Replacing that function with capability-based dispatch gives you full libc on top of capOS capabilities:

// musl's syscall entry point -- we replace this
// musl's syscall entry point -- we replace this; arguments arrive as up
// to six machine words, as in musl's internal __syscall
long __syscall(long n, long a, long b, long c, long d, long e, long f) {
    switch (n) {
        case SYS_write:  return capos_write((int)a, (const void *)b, (size_t)c);
        case SYS_open:   return capos_open((const char *)a, (int)b, (int)c);
        case SYS_socket: return capos_socket((int)a, (int)b, (int)c);
        // ...
        default: return -ENOSYS;
    }
}

This is the same approach Fuchsia uses with fdio + musl, and Redox OS uses with relibc. It works and it gives you printf, fopen, getaddrinfo, and most of the C standard library.

Priority: after native capos-rt and libcapos are stable. musl integration is a significant engineering effort and should only be done when there’s actual software to port.

Part 5: WASI as an Alternative to POSIX

Why WASI Fits capOS Better Than POSIX

WASI (WebAssembly System Interface) was designed from the start as a capability-based system interface. Its concepts map almost directly to capOS:

| WASI concept | capOS equivalent |
|---|---|
| fd (pre-opened directory) | Namespace cap |
| fd (socket) | TcpSocket/UdpSocket cap |
| fd_write on stdout | Console.write() |
| Pre-opened dirs at startup | CapSet at spawn |
| No ambient filesystem access | No ambient authority |
| path_open scoped to pre-opened dir | namespace.resolve() scoped to granted prefix |

WASI programs already assume they get no ambient authority. A WASI binary compiled for capOS would need essentially zero translation for the security model – just a thin ABI adapter.

Architecture: Wasm Runtime as a capOS Service

WASI binary (.wasm)
  │
  │  WASI syscalls (fd_read, fd_write, path_open, ...)
  │
  v
wasm-runtime process (Wasmtime/wasm-micro-runtime, native capOS binary)
  │
  │  Translates WASI calls to capability invocations
  │  Each wasm instance gets its own CapSet
  │
  v
libcapos (native capability invocation)
  │
  v
Kernel

The wasm runtime is itself a native capOS process. It receives caps from its parent and partitions them among the wasm modules it hosts. This gives you:

  • Language independence. Any language that compiles to WASI (Rust, C, C++, Go, Python, JS, …) runs on capOS
  • Sandboxing for free. Wasm’s memory isolation + capOS capability scoping
  • No porting effort for software that already targets WASI
  • Density. Multiple wasm modules in one process, each with different caps

WASI vs Native Performance

Wasm adds overhead: bounds-checked memory, indirect calls, no SIMD (WASI preview 2 adds some). For system services (drivers, network stack), native Rust is the right choice. For application-level code (business logic, CLI tools, web services), wasm overhead is acceptable and the portability is worth it.

WASI Implementation Phases

Phase 1: wasm-micro-runtime as a capOS service. WAMR is a lightweight C wasm runtime designed for embedded/OS use. Build it as a native capOS C binary (via libcapos). Support fd_write (Console), proc_exit, and args_get – enough to run “hello world” wasm modules.

Phase 2: WASI filesystem via Namespace. Map WASI path_open/fd_read/fd_write to Namespace + Store caps. Pre-opened directories become Namespace caps.

Phase 3: WASI sockets. Map WASI socket APIs to TcpSocket/UdpSocket caps.

Phase 4: WASI component model. WASI preview 2 components can expose and consume typed interfaces. These map naturally to capOS capability interfaces – a wasm component that exports an HTTP handler becomes a capability that other processes can invoke.
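The Phase 1 fd_write path might look like this on the host side of the wasm runtime. Console is a stub trait standing in for the typed client, the iovec shape is simplified to owned buffers, and the errno constant is assumed from WASI preview 1.

```rust
struct Iovec {
    buf: Vec<u8>,
}

trait Console {
    fn write(&mut self, data: &[u8]);
}

/// Host-side fd_write: gather the iovecs and forward stdout/stderr to the
/// Console cap; returns the byte count on success, a WASI errno otherwise.
fn wasi_fd_write(console: &mut dyn Console, fd: u32, iovs: &[Iovec]) -> Result<usize, u16> {
    if fd != 1 && fd != 2 {
        return Err(8); // WASI preview 1 "badf" (assumed value)
    }
    let mut written = 0;
    for iov in iovs {
        console.write(&iov.buf);
        written += iov.buf.len();
    }
    Ok(written)
}
```

Each wasm instance would get its own Console (or none), so the same host function enforces per-instance capability scoping for free.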

Part 6: Putting It All Together – Porting Strategy

Spectrum of Integration

Most native                                              Most compatible
     |                                                          |
     v                                                          v
Native Rust    C with libcapos    C with POSIX layer    WASI binary
(capos-rt)     (typed caps)       (libcapos-posix)      (wasm runtime)

- Best perf     - Good perf        - Familiar API        - Any language
- Full cap      - Full cap         - Auto sandboxing     - Auto sandboxing
  control         control            via cap scoping       via wasm + caps
- Most work     - Moderate work    - Least rewrite       - Zero rewrite
  to write        to write           for existing C        for WASI targets

Example: Porting a DNS Resolver

Native Rust: Rewrite using capos-rt. Receives UdpSocket cap, serves DNS lookups as a DnsResolver capability. Other processes get a DnsResolver cap instead of calling getaddrinfo(). Clean, typed, minimal authority.

C with POSIX layer: Take an existing DNS resolver (e.g., musl’s getaddrinfo implementation or a standalone resolver). Compile against libcapos-posix. Give it a UdpSocket cap and a Namespace cap for /etc/resolv.conf. It calls socket(), sendto(), recvfrom() – all translated to cap invocations. Works with minimal changes, but can’t export a typed DnsResolver cap (it speaks POSIX, not caps).

WASI: Compile a Rust DNS resolver to WASI. Run it in the wasm runtime. Same capability scoping, but through the wasm sandbox.

Porting priority order:

  1. System services: native Rust only. Drivers, network stack, store, init – these are the foundation and must use capabilities natively. No POSIX layer here.

  2. First applications: native Rust. While the ecosystem is young, applications should use capos-rt directly. This validates the cap model.

  3. C compatibility: when porting specific software. Don’t build the POSIX layer speculatively. Build it when there’s a specific C program to port (e.g., a DNS resolver, an HTTP server, a database). Let real porting needs drive which POSIX functions to implement.

  4. WASI: as the general-purpose application runtime. Once the native runtime is stable, the wasm runtime becomes the “run anything” answer. Lower priority than native Rust, but higher priority than full POSIX/musl compat, because WASI’s capability model is a natural fit.

Part 7: Schema Extensions

New schema types needed for the userspace runtime:

# Extend schema/capos.capnp

struct InitialCaps {
    entries @0 :List(InitialCapEntry);
}

struct InitialCapEntry {
    name @0 :Text;
    id @1 :UInt32;
    interfaceId @2 :UInt64;
}

interface ProcessSpawner {
    spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

struct CapGrant {
    name @0 :Text;
    capId @1 :UInt32;
    interfaceId @2 :UInt64;
}

interface ProcessHandle {
    wait @0 () -> (exitCode :Int64);
}

These definitions now live in schema/capos.capnp as the single source of truth. spawn() returns the ProcessHandle through the ring result-cap list; handleIndex identifies that transferred cap in the completion. The first slice passes a boot-package binaryName instead of raw ELF bytes so spawn requests stay inside the bounded ring parameter buffer; manifest-byte exposure and bulk-buffer spawning remain later work. kill, post-spawn grants, and exported-cap lookup are deferred until their lifecycle semantics are implemented.
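To make the spawn flow concrete, here is a hedged sketch of the typed `ProcessSpawner` wrapper capos-rt could generate from the schema above. The `Ring` trait, `MockRing`, and the flat parameter encoding are illustrative stand-ins (real code serializes a capnp message and submits a CALL SQE over the shared ring); only the schema names (`CapGrant`, `spawn`, `handleIndex`) come from the proposal.

```rust
use std::collections::VecDeque;

/// Mirrors `struct CapGrant` in schema/capos.capnp.
#[derive(Clone)]
pub struct CapGrant {
    pub name: String,
    pub cap_id: u32,
    pub interface_id: u64,
}

/// Stand-in for the SQ/CQ transport: submit one CALL SQE, block via
/// cap_enter, and return the result-cap index from the completion.
pub trait Ring {
    fn call(&mut self, cap_id: u32, method_id: u16, params: &[u8]) -> Result<u16, i64>;
}

/// Typed wrapper over the ProcessSpawner interface.
pub struct ProcessSpawner<R: Ring> {
    ring: R,
    cap_id: u32,
}

impl<R: Ring> ProcessSpawner<R> {
    const SPAWN: u16 = 0; // spawn @0 in the schema

    pub fn new(ring: R, cap_id: u32) -> Self {
        Self { ring, cap_id }
    }

    /// spawn @0 (name, binaryName, grants) -> (handleIndex).
    /// The ProcessHandle cap itself arrives via the ring result-cap list;
    /// the returned index identifies it in the completion.
    pub fn spawn(&mut self, name: &str, binary_name: &str, grants: &[CapGrant]) -> Result<u16, i64> {
        // Placeholder flat encoding to keep the sketch self-contained;
        // the real runtime builds a capnp message here.
        let mut params = Vec::new();
        params.extend_from_slice(name.as_bytes());
        params.push(0);
        params.extend_from_slice(binary_name.as_bytes());
        params.push(0);
        params.extend_from_slice(&(grants.len() as u32).to_le_bytes());
        self.ring.call(self.cap_id, Self::SPAWN, &params)
    }
}

/// Test double: records submissions, hands back canned handle indices.
pub struct MockRing {
    pub submitted: Vec<(u32, u16, Vec<u8>)>,
    pub results: VecDeque<u16>,
}

impl Ring for MockRing {
    fn call(&mut self, cap_id: u32, method_id: u16, params: &[u8]) -> Result<u16, i64> {
        self.submitted.push((cap_id, method_id, params.to_vec()));
        self.results.pop_front().ok_or(-1)
    }
}

fn main() {
    let ring = MockRing { submitted: Vec::new(), results: VecDeque::from([7u16]) };
    let mut spawner = ProcessSpawner::new(ring, 3);
    let idx = spawner.spawn("logger", "logger.elf", &[]).unwrap();
    println!("handleIndex = {idx}");
}
```

Keeping `binaryName` as a short string (rather than ELF bytes) is what lets the whole request fit in the bounded ring parameter buffer.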

Implementation Phases

Phase 1: capos-rt (parallel with Stage 4)

  • Create capos-rt/ crate (no_std + alloc, path dependency)
  • Implement syscall wrappers (sys_exit, sys_cap_enter) and ring helpers
  • Implement CapSet parsing from well-known page
  • Implement typed Console wrapper (first cap used from userspace)
  • Rewrite init/ to use capos-rt
  • Entry point macro, panic handler, allocator setup

Deliverable: init prints “Hello” via Console cap invocation through capos-rt, not raw asm.
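From the binary's side, the Phase 1 deliverable could look roughly like the sketch below. The `CapSet::from_boot_page` and `Console` surfaces are assumptions about capos-rt's eventual API, stubbed here with std so the sketch runs on a host; on capOS the entry macro, allocator, and real cap_enter-backed wrappers replace the stubs.

```rust
use std::collections::HashMap;

/// Stub of the parsed initial-cap table from the well-known page.
pub struct CapSet {
    by_name: HashMap<String, u32>,
}

impl CapSet {
    /// Real code parses the InitialCaps capnp message; this fakes one entry.
    pub fn from_boot_page() -> Self {
        let mut by_name = HashMap::new();
        by_name.insert("console".to_string(), 1); // hypothetical cap id
        Self { by_name }
    }

    pub fn get(&self, name: &str) -> Option<u32> {
        self.by_name.get(name).copied()
    }
}

/// Typed Console wrapper; on capOS write_line builds a CALL SQE and
/// blocks in cap_enter until the completion arrives.
pub struct Console {
    cap_id: u32,
}

impl Console {
    pub fn from_cap(cap_id: u32) -> Self {
        Self { cap_id }
    }

    pub fn write_line(&self, line: &str) -> Result<(), i64> {
        // Stub: print locally instead of invoking the cap.
        println!("[cap {}] {}", self.cap_id, line);
        Ok(())
    }
}

fn main() {
    let caps = CapSet::from_boot_page();
    let console = Console::from_cap(caps.get("console").expect("no console cap"));
    console.write_line("Hello").unwrap();
}
```

The point of the shape: init never names a device or calls an ambient print syscall; it can only write because a `console` entry appeared in its CapSet.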

Phase 2: Service binaries (after Stage 6)

  • Add capnp codegen to capos-rt build.rs (shared with kernel)
  • Implement typed wrappers for all schema-defined caps
  • Build the first multi-process demo: init spawns server + client, client invokes server cap
  • Establish the pattern for service binaries (Cargo.toml template, linker script, build integration)

Deliverable: two userspace processes communicate via typed capabilities.

Phase 3: libcapos for C (after Phase 2)

  • Expose capos-rt functionality via extern "C" API
  • Write capos.h header
  • Build system support for C userspace binaries (linker script, startup)
  • Port one small C program as validation

Deliverable: a C “hello world” using console_write_line().

Phase 4: POSIX compatibility (driven by need)

  • Implement FdTable and path resolution
  • Start with file I/O (open/read/write/close over Namespace + Store)
  • Add socket wrappers once the network stack runs in userspace
  • Optionally integrate musl for full libc

Deliverable: an existing C program (e.g., a simple HTTP server) runs on capOS with minimal source changes.
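A minimal sketch of the FdTable translation Phase 4 starts with: open() resolves a path through a Namespace cap and binds the resulting object cap to a small-integer fd, which read()/write()/close() then translate back to cap invocations. The `Namespace` trait, error codes, and cap ids below are illustrative stand-ins, not a defined API.

```rust
use std::collections::HashMap;

/// Stand-in for the Namespace cap: path -> object cap id.
pub trait Namespace {
    fn resolve(&self, path: &str) -> Option<u32>;
}

pub struct FdTable {
    next_fd: i32,
    open_caps: HashMap<i32, u32>,
}

impl FdTable {
    pub fn new() -> Self {
        // fds 0..2 would be pre-bound to console caps in a fuller sketch.
        Self { next_fd: 3, open_caps: HashMap::new() }
    }

    /// POSIX open(): resolve through the Namespace cap, allocate an fd.
    pub fn open(&mut self, ns: &dyn Namespace, path: &str) -> Result<i32, i32> {
        let cap = ns.resolve(path).ok_or(-2)?; // ENOENT-style error
        let fd = self.next_fd;
        self.next_fd += 1;
        self.open_caps.insert(fd, cap);
        Ok(fd)
    }

    /// POSIX close(): drop the fd -> cap binding (real code releases the cap).
    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.open_caps.remove(&fd).map(|_| ()).ok_or(-9) // EBADF-style
    }

    /// Look up the cap that read()/write() would invoke for this fd.
    pub fn cap_for(&self, fd: i32) -> Option<u32> {
        self.open_caps.get(&fd).copied()
    }
}

/// Toy namespace with one entry, standing in for a resolv.conf grant.
struct OneFile;
impl Namespace for OneFile {
    fn resolve(&self, path: &str) -> Option<u32> {
        (path == "/etc/resolv.conf").then_some(41) // hypothetical cap id
    }
}

fn main() {
    let mut fds = FdTable::new();
    let fd = fds.open(&OneFile, "/etc/resolv.conf").unwrap();
    println!("fd = {fd}, cap = {:?}", fds.cap_for(fd));
    fds.close(fd).unwrap();
}
```

Note the sandboxing falls out for free: a path the process's Namespace cap can't resolve simply fails open(), with no global filesystem to escape into.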

Phase 5: WASI runtime (after Phase 3)

  • Build wasm-micro-runtime as a native capOS C binary
  • Map WASI fd_write/proc_exit to caps
  • Extend to filesystem and socket WASI APIs
  • Run a “hello world” wasm module

Deliverable: hello.wasm runs on capOS.

Open Questions

  1. Allocator strategy. Should the userspace heap be a fixed-size region (simple, but limits memory), or should it grow by invoking a FrameAllocator cap (flexible, but every allocation might syscall)? Likely answer: fixed initial region + grow-on-demand via cap.

  2. Async I/O. The SQ/CQ ring is inherently asynchronous (submit SQEs, poll CQEs), but the initial capos-rt wrappers provide blocking convenience (submit one CALL SQE + cap_enter(1, MAX)). Real services need batched async patterns. Options:

    • Submit multiple SQEs, poll CQEs in an event loop (io_uring style)
    • Green threads in capos-rt, each blocking on its own cap_enter
    • Userspace executor (like tokio) driving the ring

  3. Cap passing in POSIX layer. POSIX has SCM_RIGHTS for passing fds over Unix sockets. Should the POSIX layer support something similar for passing caps? Or is this native-only?

  4. Dynamic linking. Currently all binaries are statically linked. Should capOS support shared libraries? Probably not initially – static linking is simpler and the binaries are small. Revisit if binary size becomes a concern.

  5. WASI component model integration. WASI preview 2 components have typed imports/exports that could map to capnp interfaces. Should the wasm runtime auto-generate capnp-to-WIT adapters from schemas? This would let wasm components participate natively in the capability graph.

  6. Build system. How are userspace binaries packed into the boot image? Currently the Makefile builds init/ separately. With multiple service binaries, a more scalable approach is needed: a build manifest that lists all binaries, plus a Makefile target that builds and packs them all.
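As a sketch of the first async option from question 2 (io_uring-style batching): tag each SQE with a `user_data` value, submit a batch, then drain completions and dispatch by tag. The `Sqe`/`Cqe` shapes and `MockRing` are mocked stand-ins; on capOS, submit() would write SQEs into the shared ring and cap_enter(min_complete, max) would block for CQEs.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Clone)]
pub struct Sqe { pub user_data: u64, pub cap_id: u32, pub method_id: u16 }
#[derive(Clone)]
pub struct Cqe { pub user_data: u64, pub result: i64 }

/// Mock ring: every submitted SQE completes immediately with result 0.
pub struct MockRing { pending: VecDeque<Cqe> }

impl MockRing {
    pub fn new() -> Self { Self { pending: VecDeque::new() } }

    pub fn submit(&mut self, sqes: &[Sqe]) {
        for s in sqes {
            self.pending.push_back(Cqe { user_data: s.user_data, result: 0 });
        }
    }

    /// Stand-in for cap_enter: return up to `max` completions.
    pub fn wait(&mut self, max: usize) -> Vec<Cqe> {
        (0..max).filter_map(|_| self.pending.pop_front()).collect()
    }
}

fn main() {
    let mut ring = MockRing::new();
    // In-flight table: user_data -> continuation (here, just a label).
    let mut inflight: HashMap<u64, &str> = HashMap::new();

    let batch = [
        Sqe { user_data: 1, cap_id: 4, method_id: 0 },
        Sqe { user_data: 2, cap_id: 5, method_id: 0 },
    ];
    for s in &batch { inflight.insert(s.user_data, "op"); }
    ring.submit(&batch);

    // Event loop: block for completions, dispatch each by tag.
    while !inflight.is_empty() {
        for cqe in ring.wait(16) {
            inflight.remove(&cqe.user_data);
            println!("op {} completed with {}", cqe.user_data, cqe.result);
        }
    }
}
```

The same in-flight table is what a green-thread or executor-based design would hide behind wakers, so starting with the explicit loop keeps the later options open.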

Relationship to Other Proposals

  • Service architecture proposal – defines what services exist and how they compose. This proposal defines how those service binaries are built, what runtime they use, and how non-Rust software fits in.
  • Storage and naming proposal – the POSIX open()/read()/write() translation targets the Store and Namespace caps defined there.
  • Networking proposal – the POSIX socket translation targets the TcpSocket/UdpSocket caps from the network stack.