S.9 Design: Authority Graph and Resource Accounting for Transfer
This document defines the concrete S.9 design gate for:
- WORKPLAN 3.6 capability transfer (
xfer_cap_count, copy/move, rollback) - WORKPLAN 5.2 ProcessSpawner prerequisites (spawn quotas and result-cap insertion)
S.9 is complete when this design contract is concrete enough to guide implementation. The invariants and acceptance criteria below are implementation gates for later work in 3.6/5.2/S.8/S.12, not requirements for declaring the S.9 design artifact complete.
1. Authority Graph Model
Authority is modeled as a directed multigraph:
- Nodes:
Process(Pid)Object(ObjectId)(kernel object identity, independent of per-processCapId)
- Edges:
Hold(Pid -> ObjectId)with metadata:cap_id(table-local handle)interface_idbadgetransfer_mode(copy,move,non_transferable)origin(kernel,spawn_grant,ipc_transfer,result_cap)
Security invariant A1: all authority is represented by Hold edges; no
operation can create object authority outside this graph.
Security invariant A2: each process mutates only its own CapTable edges except
through explicit transfer/spawn transactions validated by the kernel.
Security invariant A3: for every live Hold edge there is exactly one
cap_id slot in one process table referencing the object generation.
2. Per-Process Resource Ledger and Quotas
Each process owns a kernel-maintained ResourceLedger with hard limits.
Enforcement is fail-closed at reservation time (before side effects).
ResourceLedger {
cap_slots_used / cap_slots_max
endpoint_queue_used / endpoint_queue_max
outstanding_calls_used / outstanding_calls_max
scratch_bytes_used / scratch_bytes_max
frame_grant_pages_used / frame_grant_pages_max
log_bytes_window_used / log_bytes_per_window (token bucket)
cpu_time_us_window_used / cpu_budget_us_per_window (token bucket)
}
Initial quota profile for Stage 6/5.2 bring-up (tunable by kernel config):
cap_slots_max: 256endpoint_queue_max: 128 messagesoutstanding_calls_max: 64scratch_bytes_max: 256 KiBframe_grant_pages_max: 4096 pages (16 MiB at 4 KiB pages)log_bytes_per_window: 64 KiB/sec with 256 KiB burstcpu_budget_us_per_window: 10,000 us per 100,000 us window
Security invariant Q1: no counter may exceed its max.
Security invariant Q2: every resource reservation has a matched release on all success, error, timeout, process-exit, and rollback paths.
Security invariant Q3: quota checks for transfer/spawn happen before mutating sender or receiver capability state.
3. Diagnostic Rate Limiting and Aggregation
Repeated invalid ring/cap submissions are aggregated per process and error key.
- Key:
(pid, error_code, opcode, cap_id_bucket) - Buckets:
cap_id_bucket = exact cap idfor stale/invalid cap failurescap_id_bucket = 0for structural ring errors
- Per-key token bucket: allow first
N=4emissions/sec, then suppress. - Suppressed counts are flushed once per second as one summary line:
pid=X invalid submissions suppressed=Y last_err=...
Security invariant D1: invalid submission floods cannot consume unbounded serial bandwidth or scheduler time in log formatting.
Security invariant D2: suppression never hides first-observation diagnostics for
a new (pid,error,opcode,cap bucket) key.
4. Transfer and Rollback Semantics
Transfers (xfer_cap_count > 0) use a kernel transfer transaction
(TransferTxn) scoped to a single SQE dispatch. The current ring ABI does not
provide kernel-owned SQE sequence numbers or a durable transaction table, so
userspace replay of a copy-transfer SQE is repeatable: each replay is treated
as a new copy grant. Move-transfer replay fails closed after the source slot is
removed or reserved by the first successful dispatch.
Future exactly-once replay suppression requires transaction identity scoped to
(sender_pid, call_id, sqe_seq) and a monotonic transfer epoch. Until that
exists, exactly-once claims apply only within one dispatch attempt, not across
malicious rewrites of shared SQ ring indexes.
Phases:
Prepare:- validate SQE transport fields and
xfer_cap_count - validate sender ownership/generation/transferability for each exported cap
- reserve receiver quota (
cap_slots,outstanding_calls, scratch if needed) - pin sender entries in txn state (no sender table mutation yet)
- validate SQE transport fields and
Commit:- insert destination edges exactly once
- for
copy: increment object refcount/export ref - for
move: remove sender slot only after destination insertion succeeds - publish completion/result
Finalize:- release transient reservations
- mark txn terminal (
committedoraborted)
On any error before Commit, rollback is full:
- receiver inserts are not visible
- sender slots/refcounts unchanged
- reservations released
- CQE returns transfer failure (
CAP_ERR_TRANSFER_ABORTED/ subtype)
On error during Commit, kernel executes compensating rollback to preserve
exactly-once visibility: either all inserts are visible with matching sender
state transition, or none are visible.
Security invariant T1: each transfer descriptor is applied at most once within a single SQE dispatch attempt.
Security invariant T2: move transfer is atomic from observer perspective; no state exists where both sender and receiver lose authority due to partial apply.
Security invariant T3: copy-transfer SQE replay is explicitly repeatable until kernel-owned transaction identity exists. Move-transfer replay fails closed after source removal or source reservation.
Security invariant T4: CAP_OP_RELEASE removes one local hold edge only from
the caller table and decrements remote export refs exactly once.
5. Integration with 3.6 Capability Transfer
3.6 implementation must consume this design directly:
CALLandRETURNvalidate all currently-reserved transfer fields fail-closed when unsupported.xfer_cap_countpath is wired throughTransferTxn(no ad hoc direct inserts).- Badge propagation is explicit in transfer descriptors and copied into destination edge metadata.
CAP_OP_RELEASEuses the same authority ledger and refcount bookkeeping.
3.6 acceptance criteria:
- Copy transfer produces one new receiver edge and retains sender edge.
- Move transfer produces one new receiver edge and deletes sender edge atomically.
- Any transfer failure leaves sender and receiver
CapTables unchanged. - Copy replay is an explicit repeatable-grant policy until a kernel-owned transaction identity is added; move replay fails closed after source removal or reservation.
CAP_OP_RELEASEon stale/non-owned cap fails closed without mutating other process tables.
6. Integration with 5.2 ProcessSpawner Prerequisites
5.2 must use the same accounting and transfer machinery:
spawn()preflights child quotas (cap_slots,outstanding_calls,scratch,frame_grant_pages, endpoint queue baseline) before mapping child memory or scheduling.- Parent-provided
CapGrantentries are inserted via the same transfer transaction semantics (copy for initial grants in 5.2.2). - Returned
ProcessHandleis inserted through the standard result-cap insertion path and accounted as a normal cap slot. - Child setup rollback must unwind:
- address space mappings
- ring page
- CapSet page
- kernel stack
- allocated frames
- provisional capability edges/reservations
5.2 acceptance criteria:
- Spawn failure at any step leaves no child-visible process and no leaked ledger usage.
- Successful spawn accounts all child bootstrap resources within quotas.
- Parent and child cap-table accounting remains balanced under repeated spawn/exit cycles.
ProcessHandle.waitand exit cleanup release outstanding-call/scratch/frame usage deterministically.
7. Implementation Notes for Verification Tracks
This design unblocks:
- S.8 hostile-input tests for quota and invalid-transfer failures.
- S.12 Kani bounds refresh for ledger and transfer invariants.
- Target 12 in
docs/proposals/security-and-verification-proposal.mdwith explicit allocator hooks and fail-closed exhaustion behavior.