Proposal: Capability-Based Binaries, Language Support, and POSIX Compatibility
How userspace binaries receive, use, and compose capabilities — from the native Rust runtime through POSIX compatibility to running unmodified software.
Current State
The init binary (init/src/main.rs) and smoke services are no_std Rust
binaries over capos-rt. The runtime owns _start, fixed heap initialization,
CapSet parsing, exit/cap_enter syscall wrappers, typed clients, result-cap
adoption, queued release flushing, and panic output. Init reads the BootPackage
manifest, validates the metadata-only service graph, spawns child services
through ProcessSpawner, waits on ProcessHandles, and exits. The former raw
bootstrap syscall and demo-support runtime shims are historical; demo support
now keeps only low-level transport helpers for intentionally malformed SQE/CQE
smokes.
The kernel-side roadmap (Stages 4-6) provides the capability ring (SQ/CQ
shared memory + cap_enter syscall, implemented), scheduling, and IPC. This
proposal covers the userspace half: what binaries look like, how they’re built,
and how existing software runs on a system with no ambient authority.
Part 1: Native Userspace Runtime (capos-rt)
The Problem
Every userspace binary currently needs to:
- Define `_start` and a panic handler
- Set up an allocator
- Construct raw syscall wrappers
- Manually serialize/deserialize capnp messages
- Know the syscall ABI (register layout, method IDs)
This is fine for one proof-of-concept binary. It won’t scale to dozens of services.
Solution: A Userspace Runtime Crate
capos-rt is a no_std + alloc Rust crate that every native capOS binary
depends on. It provides:
1. Entry point and allocator setup.
// capos-rt provides the real _start that:
// - initializes the heap allocator (bump allocator over a fixed region,
// or grows via FrameAllocator cap if granted)
// - parses the initial capability set from a kernel-provided page
// - calls the user's main(CapSet)
// - calls sys_exit with the return value
#[capos_rt::main]
fn main(caps: CapSet) -> Result<(), Error> {
let console = caps.get::<Console>("console")?;
console.write_line("Hello from capOS")?;
Ok(())
}
2. Syscall layer. Raw syscall asm wrapped in safe Rust functions.
The entire syscall surface is 2 calls – new operations are SQE opcodes, not
new syscalls:
- `sys_exit(code)` – terminate process (syscall 1)
- `sys_cap_enter(min_complete, timeout_ns)` – flush pending SQEs, then wait until N completions are available or the timeout expires (syscall 2)
Capability invocations go through the per-process SQ/CQ ring. capos-rt
provides helpers for writing SQEs and reading CQEs:
/// Submit a CALL SQE to the capability ring and wait for the CQE.
pub fn cap_call(
    ring: &mut CapRing,
    cap_id: u32,
    method_id: u16,
    params: &[u8],
    result_buf: &mut [u8],
) -> Result<usize, CapError> {
    ring.push_call_sqe(cap_id, method_id, params);
    sys_cap_enter(1, u64::MAX);
    ring.pop_cqe(result_buf)
}
3. Cap’n Proto integration. Re-exports generated types from schema/capos.capnp
and provides typed wrappers:
// Generated from schema + thin wrapper in capos-rt
impl Console {
    pub fn write(&self, data: &[u8]) -> Result<(), CapError> {
        let mut msg = capnp::message::Builder::new_default();
        let mut req = msg.init_root::<console::write_params::Builder>();
        req.set_data(data);
        self.invoke(0, &msg) // method @0
    }

    pub fn write_line(&self, text: &str) -> Result<(), CapError> {
        let mut msg = capnp::message::Builder::new_default();
        let mut req = msg.init_root::<console::write_line_params::Builder>();
        req.set_text(text);
        self.invoke(1, &msg) // method @1
    }
}
4. CapSet – the initial capability environment.
At spawn time, the kernel writes the process’s initial capabilities into a
well-known page (or passes them via registers/stack – ABI TBD). capos-rt
parses this into a CapSet: a name-to-CapId map.
pub struct CapSet {
    caps: BTreeMap<String, CapEntry>,
}

struct CapEntry {
    id: u32,           // authority-bearing slot in the process CapTable
    interface_id: u64, // generated capnp TYPE_ID, carried for type checking
}

impl CapSet {
    /// Get a typed capability by name. Fails if not present or wrong type.
    pub fn get<T: Capability>(&self, name: &str) -> Result<T, CapError> { ... }

    /// List available capability names (for debugging/discovery).
    pub fn list(&self) -> impl Iterator<Item = (&str, u64)> { ... }
}
interface_id is not a handle. It is metadata carrying the generated Cap’n
Proto TYPE_ID for the interface expected by the typed client. The handle is
id (CapId). A typed client constructor must check that
entry.interface_id == T::TYPE_ID, then store the CapId. Normal CALL SQEs
do not need to repeat the interface ID because each capability table entry
exposes one public interface. The ring SQE keeps fixed-size reserved padding
for ABI stability, not a required interface field for the system transport.
This matters for the system transport because several capabilities can expose
the same interface while representing different authority: a serial console, a
log-buffer console, and a console proxy all have the Console TYPE_ID, but
different CapId values.
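The check a typed client constructor performs can be sketched as a host-runnable mock. The `Capability` trait shape, the `Console` struct, and the placeholder TYPE_ID below are illustrative assumptions, not the final capos-rt API:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum CapError {
    NotFound,
    WrongInterface { expected: u64, found: u64 },
}

struct CapEntry {
    id: u32,           // authority-bearing CapId
    interface_id: u64, // capnp TYPE_ID metadata, NOT a handle
}

trait Capability: Sized {
    const TYPE_ID: u64;
    fn from_cap_id(id: u32) -> Self;
}

struct CapSet {
    caps: BTreeMap<String, CapEntry>,
}

impl CapSet {
    /// Typed lookup: the name must exist AND carry the expected TYPE_ID.
    fn get<T: Capability>(&self, name: &str) -> Result<T, CapError> {
        let entry = self.caps.get(name).ok_or(CapError::NotFound)?;
        if entry.interface_id != T::TYPE_ID {
            return Err(CapError::WrongInterface {
                expected: T::TYPE_ID,
                found: entry.interface_id,
            });
        }
        Ok(T::from_cap_id(entry.id))
    }
}

struct Console { cap_id: u32 }

impl Capability for Console {
    const TYPE_ID: u64 = 0xAAAA; // placeholder, not a real generated ID
    fn from_cap_id(id: u32) -> Self { Console { cap_id: id } }
}
```

Two entries that share a TYPE_ID but differ in CapId both pass this check as `Console`, matching the serial/log-buffer/proxy example above; only the authority they carry differs.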
Crate Structure
capos-rt/
Cargo.toml # no_std + alloc, depends on capnp
build.rs # capnpc codegen from schema/capos.capnp
src/
lib.rs # re-exports, #[capos_rt::main] macro
syscall.rs # raw asm syscall wrappers
caps.rs # CapSet, CapEntry, Capability trait
alloc.rs # userspace heap allocator setup
generated.rs # include!(capnp generated code)
capos-rt is NOT a workspace member (same as init/ – needs different
code model and linker script). It’s a path dependency for userspace crates.
Init After capos-rt
// init/src/main.rs -- after capos-rt exists
use capos_rt::prelude::*;
#[capos_rt::main]
fn main(caps: CapSet) -> Result<(), Error> {
let console = caps.get::<Console>("console")?;
let spawner = caps.get::<ProcessSpawner>("spawner")?;
let manifest = caps.get::<Manifest>("manifest")?;
console.write_line("capOS init starting")?;
let mut running_services = BTreeMap::new();
for entry in manifest.services()? {
let binary_name = entry.binary();
let granted = resolve_caps(&entry, &running_services, &caps)?;
let handle = spawner.spawn(entry.name(), binary_name, &granted)?;
running_services.insert(entry.name(), handle);
}
supervisor_loop(&running_services, &spawner)
}
Part 2: Capability-Based Binary Model
Binary Format
ELF64, same as now. The kernel’s ELF loader (kernel/src/elf.rs) already
handles PT_LOAD segments. No changes to the binary format itself.
What changes is the ABI contract between kernel and binary:
| Aspect | Current (Stage 3) | After capos-rt |
|---|---|---|
| Entry point | _start(), no args | _start(cap_page: *const u8) or via well-known address |
| Syscall ABI | ad-hoc (rax=0 write, rax=1 exit) | SQ/CQ ring + sys_cap_enter + sys_exit |
| Capability access | None | CapSet parsed from kernel-provided page |
| Serialization | None | capnp messages |
| Allocator | None (no heap) | Bump allocator, optionally backed by FrameAllocator cap |
Initial Capability Passing
The kernel needs to communicate the initial cap set to the new process. Options:
Option A: Well-known page. Kernel maps a read-only page at a fixed virtual
address (e.g., 0x1000) containing a capnp-serialized InitialCaps message:
struct InitialCaps {
entries @0 :List(InitialCapEntry);
}
struct InitialCapEntry {
name @0 :Text;
id @1 :UInt32;
interfaceId @2 :UInt64;
}
Option B: Register convention. Pass pointer and length in rdi/rsi at
entry. Simpler, but the data still needs to live somewhere in user memory.
Option C: Stack. Push the cap descriptor onto the user stack before iretq.
Similar to how Linux passes auxv to _start.
Option A is cleanest – the page is always there, no calling-convention dependency, and it naturally extends to passing additional boot info later.
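The parsing work `_start` does under Option A can be illustrated with a parser over a hypothetical flat encoding. The proposal's actual format is a capnp-serialized `InitialCaps` message, so this layout (u16 entry count, then length-prefixed entries) is purely an assumption to show the shape:

```rust
// Hypothetical flat encoding, for illustration only. Each entry is:
//   u16 name_len | name bytes | u32 cap_id | u64 interface_id
fn parse_initial_caps(page: &[u8]) -> Vec<(String, u32, u64)> {
    let mut out = Vec::new();
    let count = u16::from_le_bytes([page[0], page[1]]) as usize;
    let mut off = 2;
    for _ in 0..count {
        let name_len = u16::from_le_bytes([page[off], page[off + 1]]) as usize;
        off += 2;
        let name = String::from_utf8(page[off..off + name_len].to_vec()).unwrap();
        off += name_len;
        let id = u32::from_le_bytes(page[off..off + 4].try_into().unwrap());
        off += 4;
        let iface = u64::from_le_bytes(page[off..off + 8].try_into().unwrap());
        off += 8;
        out.push((name, id, iface));
    }
    out
}
```

The real runtime would parse the capnp message in place (the page is read-only and mapped for the process's lifetime), so no copy of the name strings is strictly required.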
Service Binary Lifecycle
1. Kernel loads ELF, creates address space, populates cap table
2. Kernel maps InitialCaps page at well-known address
3. Kernel enters userspace at _start
4. capos-rt _start:
a. Initialize heap allocator
b. Parse InitialCaps page into CapSet
c. Call user's main(CapSet)
5. User main:
a. Extract needed caps from CapSet
b. Do work (invoke caps, serve requests)
c. Optionally export caps to parent once ProcessHandle export lookup exists
6. On return from main (or sys_exit):
a. Kernel destroys process
b. All caps in process's cap table are dropped
c. Parent's ProcessHandle receives exit notification
Part 3: Language Support Roadmap
Tier 1: Rust (native, now)
Rust is the only language that matters until the runtime is stable. Reasons:
- `no_std` + `alloc` works today with the existing kernel
- The `capnp` crate (v0.25) has `no_std` support with codegen
- Zero runtime overhead – no GC, no dynamic linker, no libc
- Same language as the kernel, shared understanding of the memory model
- Ownership model maps naturally to capability lifecycle
All system services (drivers, network stack, store) will be Rust.
Tier 2: C (via libcapos.h, after Stage 6)
C is the second target because most existing driver code and system software is C, and the FFI boundary with Rust is trivial.
libcapos is a static library providing:
#include <capos.h>
// Ring-based capability invocation (synchronous wrapper around SQ/CQ ring)
int cap_call(cap_ring_t *ring, uint32_t cap_id, uint16_t method_id,
const void *params, size_t params_len,
void *result, size_t result_len);
// Typed wrappers (generated from .capnp schema)
int console_write(cap_t console, const void *data, size_t len);
int console_write_line(cap_t console, const char *text);
// CapSet access
cap_t capset_get(const char *name);
uint64_t capset_interface_id(const char *name);
// Syscalls (the entire syscall surface -- 2 calls total)
_Noreturn void sys_exit(int code); // terminate
uint32_t sys_cap_enter(uint32_t min_complete, // flush SQEs + wait
uint64_t timeout_ns);
Implementation: libcapos is Rust compiled to a static .a with a C ABI
(#[no_mangle] extern "C"). The capnp message construction happens in Rust
behind the C API. This avoids requiring a C capnp implementation.
C binaries link against libcapos.a and use the same linker script as Rust
userspace binaries. The entry point and allocator setup are in libcapos.
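The Rust-behind-C-ABI pattern looks roughly like this. It is a hedged sketch: the signature is simplified to a pointer/length pair rather than the `cap_t` + NUL-terminated string in the header above, the status codes are assumptions, and in libcapos the body would build a capnp message and push a CALL SQE (with `#[no_mangle]` on the function so C can link against it):

```rust
// Assumed status codes, not the final ABI.
const CAP_OK: i32 = 0;
const CAP_EINVAL: i32 = -1;

/// C-callable wrapper: validate the pointer/length pair, turn it into a
/// Rust &str, and hand it to the typed Console client (stubbed here).
extern "C" fn console_write_line(text: *const u8, len: usize) -> i32 {
    if text.is_null() {
        return CAP_EINVAL;
    }
    // SAFETY: the C caller promises `text` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(text, len) };
    match std::str::from_utf8(bytes) {
        Ok(s) => {
            // Stub: real code would build a capnp message and invoke
            // the Console capability over the ring here.
            let _ = s;
            CAP_OK
        }
        Err(_) => CAP_EINVAL,
    }
}
```

Because the capnp serialization stays on the Rust side of the boundary, the C header only ever sees plain pointers, lengths, and integer status codes.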
Tier 3: Regular Rust Runtime Support
After the native capos-rt service model is stable, the next language
priority is making capOS build and run ordinary Rust programs as far as the
capability model permits. The target is not an ambient POSIX clone; it is a
Rust runtime path where common crates can use allocation, time, threading
where available, and capability-backed I/O through capOS-native shims.
This has higher priority than C++ and should be evaluated before broad POSIX
compatibility work, because Rust is already the system language and can reuse
the existing capos-rt ownership and ring abstractions directly.
Tier 4: Go (GOOS=capos)
Go is the next high-priority runtime after regular Rust. It needs in-process threading, futex-like wait/wake, TLS/runtime metadata support, GC integration, and a network poller mapped to capOS capabilities. See docs/proposals/go-runtime-proposal.md for the dedicated plan.
Go has higher priority than C++ because it unlocks CUE and a large practical tooling/runtime ecosystem; C++ support should not displace the Go runtime track.
Tier 5: Any Language Targeting WASI (longer term)
See Part 5 below. Languages that compile to WASI (Rust, C, Go, etc.) can run on capOS through a WASI-to-capability translation layer.
Important distinction: WASI works differently for compiled vs. interpreted languages:
- Compiled languages (Rust, C) compile directly to `.wasm` — no interpreter in the loop. WASI is a clean, efficient execution path.
- Interpreted languages (Python, JS, Lua) still need their interpreter (CPython, QuickJS, etc.) — it’s just compiled to `.wasm` instead of native code. The stack becomes: script → interpreter.wasm → WASI runtime → kernel. You pay for a wasm sandbox layer on top of the interpreter you’d need anyway.
For interpreted languages, WASI sandboxing is valuable when running untrusted code (plugins, user-submitted scripts) where you don’t trust the interpreter itself. For trusted system scripts, native CPython/QuickJS via the POSIX layer (Part 4) is simpler and faster — the capability model already constrains what the process can do.
Tier 6: Managed Runtimes (much later)
Languages with their own runtimes (Java, .NET) would need their runtime ported to capOS. This is large effort and low priority. WASI is the pragmatic answer for these languages.
Go is a special case — see docs/proposals/go-runtime-proposal.md
for the custom GOOS=capos path (motivated by CUE support). Go via WASI
(GOOS=wasip1) is an alternative for CPU-bound use cases but lacks
goroutine parallelism and networking.
C++ Note: pg83/std
pg83/std (https://github.com/pg83/std) was reviewed as a possible easier
path to C++ on capOS. It is MIT licensed and centered on ObjPool, an
arena-owned object graph model with small containers and lightweight public
interfaces.
The useful subset for capOS is the low-level core: std/mem, std/lib,
std/str, std/map, std/alg, std/typ, and std/sys/atomic. The main
shim boundary is std/sys/crt.cpp, which currently provides allocation,
memory/string intrinsics, and monotonic time through hosted libc calls.
The full library is not a shortcut to C++ support. It assumes hosted/POSIX
facilities in large areas: malloc/free, clock_gettime, pthreads, poll,
epoll/kqueue, sockets, fd I/O, DNS, and optional TLS libraries. Its build also
expects a C++26-capable compiler. On the current development host, g++
13.3.0 rejected -std=c++26 and clang++ was unavailable.
Treat it as a later C++ experiment after libcapos and C/C++ startup exist:
port only the freestanding arena/container subset first, with exceptions and
RTTI disabled unless a concrete C++ ABI decision enables them. Regular Rust and
Go remain higher-priority runtime tracks.
Language-Specific Notes
Python
CPython is a C program. It can reach capOS via two paths:
- WASI: CPython compiled to `python.wasm`, runs inside Wasmtime/WAMR on capOS. Note: this is still CPython — WASI doesn’t eliminate the interpreter, it just compiles it to wasm. The stack is: `script.py → python.wasm → WASI runtime (native) → kernel`.
- POSIX layer: CPython compiled to native ELF via musl + `libcapos-posix`. Direct: `script.py → cpython (native) → kernel`.
WASI path — upstream status (as of March 2026):
- CPython on WASI is Tier 2 since Python 3.13 (PEP 816)
- Works for compute-only workloads (no I/O beyond stdout)
- No sockets/networking — blocked on WASI 0.3 (no release date)
- No threading — WASI SDK 26/27 have bugs, skipped by CPython
- WASI 0.2 skipped entirely — going straight to 0.3
- Python 3.14 targets WASI 0.1, SDK 24
POSIX path:
- Full CPython built against musl + `libcapos-posix`
- Networking works immediately (via TcpSocket/UdpSocket caps behind the POSIX socket shim), no dependency on WASI 0.3
- More integration work than WASI, but unblocked
MicroPython: Small C program (~100K lines of source) designed for embedded use.
Builds against musl + libcapos-posix with minimal effort. No threading,
no mmap, minimal syscall surface. Good for early scripting needs before
full CPython is ported.
When to use which:
| Use case | Path | Why |
|---|---|---|
| Untrusted Python plugins | WASI | Wasm sandbox isolates interpreter bugs |
| System scripts, config tooling | POSIX (native CPython) | Simpler, faster, networking works |
| Early scripting before POSIX layer | WASI (compute-only) | Works today, no porting needed |
| Lightweight embedded scripting | MicroPython via POSIX | Tiny footprint, minimal deps |
Recommendation: Use POSIX path (native CPython) as the primary Python target once the POSIX layer exists. WASI path for sandboxed/untrusted execution. MicroPython for early experimentation. No custom Python runtime port needed — both paths reuse upstream CPython.
JavaScript / TypeScript
Same situation as Python — JS engines (V8, SpiderMonkey, QuickJS) are C/C++ programs that can be compiled to native via POSIX layer or to wasm via WASI. In both cases, the engine interprets JS; WASI just sandboxes the engine itself.
QuickJS is the MicroPython equivalent — tiny (~50K lines C), embeddable,
trivially builds against libcapos. Good candidate for embedded scripting
in capOS services without pulling in a full V8.
Lua
Tiny C implementation (~30K lines). Trivially builds against libcapos.
Good candidate for an embedded scripting language in capOS services.
Alternatively, runs via WASI with near-zero overhead.
Part 4: POSIX Compatibility Layer
Why POSIX at All?
capOS is not POSIX and doesn’t want to be. But:
- **Existing software.** Most useful software assumes POSIX. A DNS resolver, an HTTP server, a database – all speak `open()`/`read()`/`write()`/`socket()`. Without some compatibility layer, every piece of software must be rewritten.
- **Developer familiarity.** Programmers know POSIX. A compatibility layer lowers the barrier to writing capOS software, even if native caps are better.
- **Gradual migration.** Port software first with POSIX compat, then incrementally convert to native capabilities for tighter sandboxing.
The goal is NOT full POSIX compliance. It’s a pragmatic translation layer that maps POSIX concepts to capabilities, enabling existing software to run with minimal modification while preserving capability-based security.
Architecture: libcapos-posix
Application (C/Rust, uses POSIX APIs)
│
│ open(), read(), write(), socket(), ...
│
v
libcapos-posix (POSIX-to-capability translation)
│
│ Maps fds to caps, paths to namespace lookups
│
v
libcapos (native capability invocation)
│
│ SQ/CQ ring + cap_enter syscall
│
v
Kernel (capability dispatch)
libcapos-posix is a static library that provides POSIX-like function
signatures. It is NOT libc – it doesn’t provide malloc (that’s the
allocator in capos-rt/libcapos), locale support, or the thousand other
things in glibc. It’s the ~50 syscall wrappers that matter for I/O.
File Descriptor Table
POSIX programs think in file descriptors. capOS has capabilities. The
translation is a per-process fd-to-cap mapping table inside libcapos-posix:
struct FdTable {
    entries: BTreeMap<i32, FdEntry>,
    next_fd: i32,
}

enum FdEntry {
    /// Backed by a Console cap (stdout/stderr)
    Console { cap_id: u32 },
    /// Backed by a Namespace + hash (opened "file")
    StoreObject { namespace_cap: u32, hash: Vec<u8>, cursor: usize },
    /// Backed by a TcpSocket cap
    TcpSocket { cap_id: u32 },
    /// Backed by a UdpSocket cap
    UdpSocket { cap_id: u32 },
    /// Backed by a TcpListener cap
    Listener { cap_id: u32 },
    /// Pipe (IPC channel between two caps)
    Pipe { read_cap: u32, write_cap: u32 },
}
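A host-runnable sketch of the fd allocation and read path, reduced to two `FdEntry` variants (the error value and the method names are stand-ins, not the libcapos-posix API):

```rust
use std::collections::BTreeMap;

/// Simplified FdEntry: two variants only; the proposal lists more.
enum FdEntry {
    Console { cap_id: u32 },
    StoreObject { data: Vec<u8>, cursor: usize },
}

struct FdTable {
    entries: BTreeMap<i32, FdEntry>,
    next_fd: i32,
}

impl FdTable {
    /// fds 0-2 are pre-populated at startup, so allocation starts at 3.
    fn new(console_cap: u32) -> Self {
        let mut entries = BTreeMap::new();
        for fd in 0..3 {
            entries.insert(fd, FdEntry::Console { cap_id: console_cap });
        }
        FdTable { entries, next_fd: 3 }
    }

    /// open(): cache the store object's bytes and hand out the next fd.
    fn open_object(&mut self, data: Vec<u8>) -> i32 {
        let fd = self.next_fd;
        self.next_fd += 1;
        self.entries.insert(fd, FdEntry::StoreObject { data, cursor: 0 });
        fd
    }

    /// read(fd, buf): memcpy from the cached object at the cursor.
    fn read(&mut self, fd: i32, buf: &mut [u8]) -> Result<usize, i32> {
        match self.entries.get_mut(&fd) {
            Some(FdEntry::StoreObject { data, cursor }) => {
                let n = buf.len().min(data.len() - *cursor);
                buf[..n].copy_from_slice(&data[*cursor..*cursor + n]);
                *cursor += n;
                Ok(n)
            }
            _ => Err(-9), // EBADF-style error for non-readable entries
        }
    }
}
```

Note that a short read falls out naturally: once the cursor nears the end of the cached object, `read` returns fewer bytes than requested, and a subsequent read returns 0.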
On process startup, libcapos-posix pre-populates:
- fd 0 (stdin): if a `Console` or `StdinReader` cap is in the CapSet
- fd 1 (stdout): mapped to `Console` cap
- fd 2 (stderr): mapped to `Console` cap (or a separate `Log` cap)
Path Resolution
POSIX open("/etc/config.toml", O_RDONLY) becomes:
1. `libcapos-posix` looks up the process’s `Namespace` cap (from CapSet, name `"fs"` or `"root"`)
2. Strips the leading `/` (there is no global root – the namespace IS the root)
3. Calls `namespace.resolve("etc/config.toml")` to get a store hash
4. Calls `store.get(hash)` to retrieve the object data
5. Creates an `FdEntry::StoreObject` with cursor at 0
6. Returns the fd number
Relative paths work the same way – there’s no cwd concept by default, but
libcapos-posix can maintain a synthetic cwd string and prepend it.
Path scoping is automatic. If the process was granted a Namespace scoped
to "myapp/", then open("/data.db") resolves to "myapp/data.db" in the
store. The process can’t escape its namespace – there’s no .. traversal
because namespaces are flat prefix scopes, not hierarchical directories.
Supported POSIX Functions
Grouped by what capability backs them:
Console cap -> stdio:
| POSIX | capOS translation |
|---|---|
| `write(1, buf, len)` | `console.write(buf[..len])` |
| `write(2, buf, len)` | `console.write(buf[..len])` (or log cap) |
| `read(0, buf, len)` | `stdin.read(buf, len)` if stdin cap exists |
Namespace + Store caps -> file I/O:
| POSIX | capOS translation |
|---|---|
| `open(path, flags)` | `namespace.resolve(path)` -> `store.get(hash)` -> fd |
| `read(fd, buf, len)` | memcpy from cached store object at cursor |
| `write(fd, buf, len)` | buffer writes, flush to `store.put()` on close |
| `close(fd)` | if modified: `store.put(data)` -> `namespace.bind(path, hash)` |
| `lseek(fd, off, whence)` | update cursor in FdEntry |
| `stat(path, buf)` | `namespace.resolve(path)` -> synthesize stat from object metadata |
| `unlink(path)` | `namespace.unbind(path)` (object remains in store if referenced elsewhere) |
| `opendir`/`readdir` | `namespace.list()` filtered by prefix |
| `mkdir(path)` | no-op or create empty namespace prefix (namespaces are implicit) |
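The close-flush path in the table above can be sketched with a toy content-addressed store. The 64-bit hash and the map-backed `Store`/`Namespace` are stand-ins, not the real cap interfaces:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Store { objects: BTreeMap<u64, Vec<u8>> }
struct Namespace { bindings: BTreeMap<String, u64> }

impl Store {
    /// store.put(data) -> content hash (toy 64-bit hash, not the real scheme)
    fn put(&mut self, data: Vec<u8>) -> u64 {
        let mut h = DefaultHasher::new();
        data.hash(&mut h);
        let hash = h.finish();
        self.objects.insert(hash, data);
        hash
    }
}

/// close(fd) on a modified file: put the buffered bytes into the store,
/// then rebind the name to the new content hash. The old object stays
/// in the store if anything else still references it.
fn close_modified(store: &mut Store, ns: &mut Namespace, path: &str, buffered: Vec<u8>) {
    let hash = store.put(buffered);
    ns.bindings.insert(path.to_string(), hash); // namespace.bind(path, hash)
}
```

This is why `unlink` in the table is just `namespace.unbind`: names and objects have independent lifetimes, and "writing a file" is really binding a name to new immutable content.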
TcpSocket/UdpSocket caps -> networking:
| POSIX | capOS translation |
|---|---|
| `socket(AF_INET, SOCK_STREAM, 0)` | `net_mgr.create_tcp_socket()` -> fd |
| `connect(fd, addr, len)` | `tcp_socket.connect(addr)` |
| `bind(fd, addr, len)` | `tcp_listener.bind(addr)` |
| `listen(fd, backlog)` | no-op (listener cap is already listening) |
| `accept(fd)` | `tcp_listener.accept()` -> new fd |
| `send(fd, buf, len, 0)` | `tcp_socket.send(buf[..len])` |
| `recv(fd, buf, len, 0)` | `tcp_socket.recv(buf, len)` |
Not supported (returns ENOSYS or EACCES):
| POSIX | Why not |
|---|---|
| `fork()` | No address space cloning. Use `posix_spawn()` (maps to ProcessSpawner) |
| `exec()` | No in-place replacement. Use `posix_spawn()` |
| `kill(pid, sig)` | No signals. Future lifecycle work may add ProcessHandle kill semantics |
| `chmod`/`chown` | No permission bits. Authority is structural |
| `mmap(MAP_SHARED)` | No shared memory yet (future: SharedMemory cap) |
| `ioctl` | No device files. Use typed capability methods |
| `ptrace` | No debugging interface yet |
| `pipe()` | Possible via IPC caps, but not in initial version |
| `select`/`poll`/`epoll` | Requires async cap invocation (Stage 5+). Initial version is blocking only |
Process Creation Compatibility
capOS process creation is spawn-style, not fork/exec-style. A new process is a
fresh ELF instance selected by ProcessSpawner, with an explicit initial
CapSet assembled from granted capabilities. The parent address space is not
cloned, and an existing process image is not replaced in place.
posix_spawn() is the compatibility primitive for subprocess creation. A
libcapos-posix implementation maps it to ProcessSpawner.spawn(), translates
file actions into fd-table setup and capability grants, and passes argv and
environment data through the process bootstrap channel once that ABI exists.
Programs that use the common fork() followed immediately by exec() pattern
should be patched to call posix_spawn() directly.
Full fork() is intentionally not a native kernel primitive. Supporting it
would require copy-on-write address-space cloning, parent/child register return
semantics, fd-table duplication, a per-capability inheritance policy, safe
handling for outstanding SQEs/CQEs, and defined behavior for endpoint calls,
timers, waits, and process handles that are in flight at the fork point.
Threaded POSIX processes add another constraint: only the calling thread is
cloned, while locks and async-signal-safe state must remain coherent in the
child.
If a concrete port needs more than posix_spawn(), the next step should be a
narrow compatibility shim with vfork()/fork-for-exec semantics backed by
ProcessSpawner, not a general kernel clone operation. That shim would suspend
the parent, restrict child actions to exec-or-exit, and avoid pretending that
arbitrary address-space cloning exists.
Security Model
The POSIX layer does NOT weaken capability security. Every POSIX call translates to a capability invocation on caps the process was actually granted:
- `open("/etc/passwd")` fails if the process’s namespace doesn’t contain `"etc/passwd"` – not because of permission bits, but because the name doesn’t resolve
- `socket(AF_INET, SOCK_STREAM, 0)` fails if the process wasn’t granted a `NetworkManager` cap
- `fork()` fails unconditionally – there’s no way to synthesize it from caps
A POSIX binary on capOS is more constrained than on Linux, not less. The compatibility layer provides familiar function signatures, not familiar authority.
Building POSIX-Compatible Binaries
my-app/
Cargo.toml # depends on capos-posix (which depends on capos-rt)
src/main.rs # uses libc-style APIs
Or for C:
#include <capos/posix.h> // open, read, write, close, socket, ...
#include <capos/capos.h> // cap_call, capset_get, ...
int main() {
// Works -- stdout is mapped to Console cap
write(1, "hello\n", 6);
// Works -- if "data" namespace cap was granted
int fd = open("/config.toml", O_RDONLY);
char buf[4096];
ssize_t n = read(fd, buf, sizeof(buf));
close(fd);
// Works -- if NetworkManager cap was granted
int sock = socket(AF_INET, SOCK_STREAM, 0);
// ...
}
The linker pulls in libcapos-posix.a -> libcapos.a -> startup code.
Same ELF output, same kernel loader.
musl as a Base (Optional, Later)
For broader C compatibility (printf, string functions, math), libcapos-posix
can be layered under musl libc. musl has a clean
syscall interface – all system calls go through a single __syscall() function.
Replacing that function with capability-based dispatch gives you full libc on
top of capOS capabilities:
// musl's syscall entry point -- we replace this
long __syscall(long n, ...) {
    va_list ap;
    va_start(ap, n);
    long a = va_arg(ap, long);
    long b = va_arg(ap, long);
    long c = va_arg(ap, long);
    va_end(ap);
    switch (n) {
    case SYS_write:  return capos_write((int)a, (const void *)b, (size_t)c);
    case SYS_open:   return capos_open((const char *)a, (int)b, (mode_t)c);
    case SYS_socket: return capos_socket((int)a, (int)b, (int)c);
    // ...
    default:         return -ENOSYS;
    }
}
This is the same approach Fuchsia uses with fdio + musl, and Redox OS uses
with relibc. It works and it gives you printf, fopen, getaddrinfo, and
most of the C standard library.
Priority: after native capos-rt and libcapos are stable. musl integration is a significant engineering effort and should only be done when there’s actual software to port.
Part 5: WASI as an Alternative to POSIX
Why WASI Fits capOS Better Than POSIX
WASI (WebAssembly System Interface) was designed from the start as a capability-based system interface. Its concepts map almost directly to capOS:
| WASI concept | capOS equivalent |
|---|---|
| `fd` (pre-opened directory) | Namespace cap |
| `fd` (socket) | TcpSocket/UdpSocket cap |
| `fd_write` on stdout | `Console.write()` |
| Pre-opened dirs at startup | CapSet at spawn |
| No ambient filesystem access | No ambient authority |
| `path_open` scoped to pre-opened dir | `namespace.resolve()` scoped to granted prefix |
WASI programs already assume they get no ambient authority. A WASI binary compiled for capOS would need essentially zero translation for the security model – just a thin ABI adapter.
Architecture: Wasm Runtime as a capOS Service
WASI binary (.wasm)
│
│ WASI syscalls (fd_read, fd_write, path_open, ...)
│
v
wasm-runtime process (Wasmtime/wasm-micro-runtime, native capOS binary)
│
│ Translates WASI calls to capability invocations
│ Each wasm instance gets its own CapSet
│
v
libcapos (native capability invocation)
│
v
Kernel
The wasm runtime is itself a native capOS process. It receives caps from its parent and partitions them among the wasm modules it hosts. This gives you:
- Language independence. Any language that compiles to WASI (Rust, C, C++, Go, Python, JS, …) runs on capOS
- Sandboxing for free. Wasm’s memory isolation + capOS capability scoping
- No porting effort for software that already targets WASI
- Density. Multiple wasm modules in one process, each with different caps
WASI vs Native Performance
Wasm adds overhead: bounds-checked memory, indirect calls, no SIMD (WASI preview 2 adds some). For system services (drivers, network stack), native Rust is the right choice. For application-level code (business logic, CLI tools, web services), wasm overhead is acceptable and the portability is worth it.
WASI Implementation Phases
Phase 1: wasm-micro-runtime as a capOS service. WAMR
is a lightweight C wasm runtime designed for embedded/OS use. Build it as a
native capOS C binary (via libcapos). Support fd_write (Console),
proc_exit, and args_get – enough to run “hello world” wasm modules.
Phase 2: WASI filesystem via Namespace. Map WASI path_open/fd_read/
fd_write to Namespace + Store caps. Pre-opened directories become
Namespace caps.
Phase 3: WASI sockets. Map WASI socket APIs to TcpSocket/UdpSocket caps.
Phase 4: WASI component model. WASI preview 2 components can expose and consume typed interfaces. These map naturally to capOS capability interfaces – a wasm component that exports an HTTP handler becomes a capability that other processes can invoke.
Part 6: Putting It All Together – Porting Strategy
Spectrum of Integration
Most native Most compatible
| |
v v
Native Rust C with libcapos C with POSIX layer WASI binary
(capos-rt) (typed caps) (libcapos-posix) (wasm runtime)
- Best perf - Good perf - Familiar API - Any language
- Full cap - Full cap - Auto sandboxing - Auto sandboxing
control control via cap scoping via wasm + caps
- Most work - Moderate work - Least rewrite - Zero rewrite
to write to write for existing C for WASI targets
Example: Porting a DNS Resolver
Native Rust: Rewrite using capos-rt. Receives UdpSocket cap, serves
DNS lookups as a DnsResolver capability. Other processes get a
DnsResolver cap instead of calling getaddrinfo(). Clean, typed, minimal
authority.
C with POSIX layer: Take an existing DNS resolver (e.g., musl’s
getaddrinfo implementation or a standalone resolver). Compile against
libcapos-posix. Give it a UdpSocket cap and a Namespace cap for
/etc/resolv.conf. It calls socket(), sendto(), recvfrom() – all
translated to cap invocations. Works with minimal changes, but can’t export
a typed DnsResolver cap (it speaks POSIX, not caps).
WASI: Compile a Rust DNS resolver to WASI. Run it in the wasm runtime. Same capability scoping, but through the wasm sandbox.
Recommended Approach for capOS
- **System services: native Rust only.** Drivers, network stack, store, init – these are the foundation and must use capabilities natively. No POSIX layer here.
- **First applications: native Rust.** While the ecosystem is young, applications should use `capos-rt` directly. This validates the cap model.
- **C compatibility: when porting specific software.** Don’t build the POSIX layer speculatively. Build it when there’s a specific C program to port (e.g., a DNS resolver, an HTTP server, a database). Let real porting needs drive which POSIX functions to implement.
- **WASI: as the general-purpose application runtime.** Once the native runtime is stable, the wasm runtime becomes the “run anything” answer. Lower priority than native Rust, but higher priority than full POSIX/musl compat, because WASI’s capability model is a natural fit.
Part 7: Schema Extensions
New schema types needed for the userspace runtime:
# Extend schema/capos.capnp
struct InitialCaps {
entries @0 :List(InitialCapEntry);
}
struct InitialCapEntry {
name @0 :Text;
id @1 :UInt32;
interfaceId @2 :UInt64;
}
interface ProcessSpawner {
spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}
struct CapGrant {
name @0 :Text;
capId @1 :UInt32;
interfaceId @2 :UInt64;
}
interface ProcessHandle {
wait @0 () -> (exitCode :Int64);
}
These definitions now live in schema/capos.capnp as the single source of
truth. spawn() returns the ProcessHandle through the ring result-cap list;
handleIndex identifies that transferred cap in the completion. The first
slice passes a boot-package binaryName instead of raw ELF bytes so spawn
requests stay inside the bounded ring parameter buffer; manifest-byte exposure
and bulk-buffer spawning remain later work. kill, post-spawn grants, and
exported-cap lookup are deferred until their lifecycle semantics are
implemented.
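Assembling the `grants` list for `spawn()` from a parent's capability set could look like the following. This is an illustrative host-runnable sketch: `make_grants` and the tuple-valued map standing in for CapSet are assumptions, not the capos-rt API:

```rust
use std::collections::BTreeMap;

/// Mirrors the CapGrant schema struct: name + CapId + interface TYPE_ID.
struct CapGrant {
    name: String,
    cap_id: u32,
    interface_id: u64,
}

/// Build the grant list for a spawn request by looking up each wanted
/// name in the parent's capability set (here a map of name -> (id, iface)).
/// A missing name is an error: a child can only receive caps the parent holds.
fn make_grants(
    parent: &BTreeMap<String, (u32, u64)>,
    wanted: &[&str],
) -> Result<Vec<CapGrant>, String> {
    wanted
        .iter()
        .map(|name| {
            parent
                .get(*name)
                .map(|&(cap_id, interface_id)| CapGrant {
                    name: name.to_string(),
                    cap_id,
                    interface_id,
                })
                .ok_or_else(|| format!("missing capability: {name}"))
        })
        .collect()
}
```

This is the attenuation point: init decides per service which subset of its own caps appears in the child's `InitialCaps` page, and nothing else is reachable.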
Implementation Phases
Phase 1: capos-rt (parallel with Stage 4)
- Create `capos-rt/` crate (no_std + alloc, path dependency)
- Implement syscall wrappers (`sys_exit`, `sys_cap_enter`) and ring helpers
- Implement CapSet parsing from well-known page
- Implement typed Console wrapper (first cap used from userspace)
- Rewrite `init/` to use capos-rt
- Entry point macro, panic handler, allocator setup
Deliverable: init prints “Hello” via Console cap invocation through capos-rt, not raw asm.
Phase 2: Service binaries (after Stage 6)
- Add capnp codegen to capos-rt build.rs (shared with kernel)
- Implement typed wrappers for all schema-defined caps
- Build the first multi-process demo: init spawns server + client, client invokes server cap
- Establish the pattern for service binaries (Cargo.toml template, linker script, build integration)
Deliverable: two userspace processes communicate via typed capabilities.
Phase 3: libcapos for C (after Phase 2)
- Expose capos-rt functionality via `extern "C"` API
- Write `capos.h` header
- Build system support for C userspace binaries (linker script, startup)
- Port one small C program as validation
Deliverable: a C “hello world” using console_write_line().
Phase 4: POSIX compatibility (driven by need)
- Implement FdTable and path resolution
- Start with file I/O (open/read/write/close over Namespace + Store)
- Add socket wrappers when networking is userspace
- Optionally integrate musl for full libc
Deliverable: an existing C program (e.g., a simple HTTP server) runs on capOS with minimal source changes.
Phase 5: WASI runtime (after Phase 3)
- Build wasm-micro-runtime as a native capOS C binary
- Map WASI fd_write/proc_exit to caps
- Extend to filesystem and socket WASI APIs
- Run a “hello world” wasm module
Deliverable: hello.wasm runs on capOS.
Open Questions
- **Allocator strategy.** Should the userspace heap be a fixed-size region (simple, but limits memory), or should it grow by invoking a FrameAllocator cap (flexible, but every allocation might syscall)? Likely answer: fixed initial region + grow-on-demand via cap.
- **Async I/O.** The SQ/CQ ring is inherently asynchronous (submit SQEs, poll CQEs), but the initial `capos-rt` wrappers provide blocking convenience (submit one CALL SQE + `cap_enter(1, MAX)`). Real services need batched async patterns. Options:
  - Submit multiple SQEs, poll CQEs in an event loop (io_uring style)
  - Green threads in capos-rt, each blocking on its own `cap_enter`
  - Userspace executor (like tokio) driving the ring
- **Cap passing in POSIX layer.** POSIX has `SCM_RIGHTS` for passing fds over Unix sockets. Should the POSIX layer support something similar for passing caps? Or is this native-only?
- **Dynamic linking.** Currently all binaries are statically linked. Should capOS support shared libraries? Probably not initially – static linking is simpler and the binaries are small. Revisit if binary size becomes a concern.
- **WASI component model integration.** WASI preview 2 components have typed imports/exports that could map to capnp interfaces. Should the wasm runtime auto-generate capnp-to-WIT adapters from schemas? This would let wasm components participate natively in the capability graph.
- **Build system.** How are userspace binaries packed into the boot image? Currently the Makefile builds `init/` separately. With multiple service binaries, need a more scalable approach (build manifest that lists all binaries, Makefile target that builds and packs them all).
Relationship to Other Proposals
- **Service architecture proposal** – defines what services exist and how they compose. This proposal defines how those service binaries are built, what runtime they use, and how non-Rust software fits in.
- **Storage and naming proposal** – the POSIX `open()`/`read()`/`write()` translation targets the Store and Namespace caps defined there.
- **Networking proposal** – the POSIX socket translation targets the TcpSocket/UdpSocket caps from the network stack.