Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Proposal: Device Manager Refactor

Before the current module split, kernel/src/device_manager.rs was the convergence point for production device authority, transitional userspace cap surfaces, QEMU proof harnesses, audit labels, DMA/MMIO/IRQ policy, and serialization checks. That shape was useful while the Device Driver Foundation gate was evolving, but it hid the target userspace-driver model behind a single large file.

The refactor should keep the kernel device manager as the authoritative ownership ledger for claimed devices while separating proof scaffolding and domain-specific record logic into clearer modules. It must not weaken the single ownership transaction across DMAPool, DMABuffer, DeviceMmio, and Interrupt.

Implementation Status

The first mechanical proof split landed at 99c37592 (refactor(kernel): split device-manager proof scaffolding). Current main keeps public proof wrapper functions in kernel/src/device_manager/mod.rs for existing virtio.rs call sites, with the moved proof scaffolding in kernel/src/device_manager/proofs.rs.

Later mechanical slices split handles/errors and domain record helpers into the current kernel/src/device_manager/ module tree. The transaction-helper cleanup also landed at 98dddb72 (device_manager: share authority admission helpers), with the aggregate PciDeviceRecord still serving as the single claimed-device ledger and existing proof/audit labels preserved.

Design Grounding

  • DMA Isolation Design requires one device-manager ledger of record for each claimed device before userspace NIC or block drivers receive hardware authority.
  • Service Architecture makes init the holder of DeviceManager and ProcessSpawner, with child hardware drivers receiving only scoped device caps.
  • Networking defines the target NIC split: a userspace NIC driver holds DeviceMmio, Interrupt, and DMAPool, then exports a Nic cap to a separate network stack.
  • Device Driver Foundation is the active implementation track for the hardware authority gates that make this refactor useful. The plan explicitly schedules this refactor as high-priority DDF risk reduction subordinate to behavior-moving authority slices, and any further split slice must be mechanical, behavior-preserving, and reduce review risk for upcoming DeviceMmio, Interrupt, or DMAPool authority work.
  • Pass AttachedDmaPoolRecord by reference is the ready DDF prerequisite that converts the device-manager ledger record from by-value to by-reference threading through the proof emission paths. The current by-value layout exhausted the BSP boot stack when the inline AttachedDmaPoolRecord::proof_buffers slot count was grown past three; the by-reference conversion unlocks further provider-TX descriptor concurrency without expanding the per-frame footprint of nested proof emissions.

No external prior-art report is required for the initial split: this is a repo-local maintainability refactor that preserves the existing accepted authority model rather than selecting a new OS design.

Module Shape

The accepted current shape has converted kernel/src/device_manager.rs into a kernel/src/device_manager/ module tree:

kernel/src/device_manager/
  mod.rs          public API, re-exports, lock-order notes
  handles.rs      BDF, owner/state, and handle structs
  error.rs        production DeviceManagerError and display helpers
  mmio.rs         DeviceMmio records and map/unmap/read/write admission
  dma_pool.rs     DMAPool records, accounting, budget, teardown evidence
  dma_buffer.rs   DMABuffer records, map/free/submit/complete admission
  interrupt.rs    interrupt records and route/wait/ack/mask/unmask admission
  proofs.rs       transitional QEMU proof entry points and proof logs

Future cleanup is limited to optional registry, ledger, or proof-internal splits if they reduce review risk for upcoming DDF work. The current accepted proof split is proofs.rs, not a proofs/ directory.

PciDeviceRecord should remain the aggregate owner of a claimed device’s ledger. The split should move record-specific logic behind modules, not create independent managers that can diverge during teardown.

Follow-Up Risk Reduction

Two adjacent tracks reduce review risk for further DDF proof growth without disturbing the accepted module shape:

  • The by-reference ledger-record threading prerequisite tracked in docs/tasks/ddf-attached-dmapool-record-by-ref.md converts AttachedDmaPoolRecord from by-value to by-reference through the affected proof emission paths so that growing inline proof slot counts no longer multiplies cumulative stack frames across nested proof calls.
  • The scheduler off-stack release work that landed under d322a78f (sched: make thread stack drops off-stack explicit) and 9b94ea7f (sched: release qemu proof stacks off-stack) already pulls QEMU release-proof process kernel stacks off the dropping thread, which removes one stack-pressure axis on the BSP boot path that previously interacted with the device-manager proof emissions.

Refactor Strategy

  1. Split proof code first. Landed at 99c37592. prove_qemu_*, proof log structs, proof-only error enums, and bounded proof helper functions moved into a proof module. Wrapper functions remain in mod.rs so current virtio.rs call sites do not churn.

  2. Split handles and errors. Landed at 734383f9. PciBdf, owner/state enums, handle structs, DeviceMmioRegion, and DeviceManagerError moved into dedicated modules.

  3. Split record domains. Landed at af539f6c. MMIO, DMA pool, DMA buffer, and interrupt attached-record logic moved into domain modules while PciDeviceRecord remains the aggregate ledger owner.

  4. Preserve one authoritative ledger. Every operation that creates, consumes, or releases device-visible authority must still update the claimed-device ledger as part of the same ownership transaction that changes device-manager state.

  5. Improve internal APIs after the split. Landed at 98dddb72. Narrow transaction helpers and typed admission contexts now remove repeated stale-handle, owner, generation, state, and attached-record checks while preserving the single aggregate ledger.

Constraints

  • Preserve the existing lock order: PCI_DEVICE_MANAGER before DEVICE_INTERRUPT_ROUTES.
  • Preserve cap semantics, audit labels, proof labels, and QEMU smoke output during the initial split.
  • Keep userspace-driver authority blocked until the Device Driver Foundation gates still marked open are closed.
  • Avoid broad call-site churn. Compatibility wrappers are acceptable during the mechanical phase.
  • Do not move authority decisions into userspace. Userspace drivers receive scoped caps, but the kernel remains the ledger and enforcement point.
  • Keep proof code available until userspace-driver production gates have equivalent coverage.

Validation

For mechanical file movement, run:

  • make fmt-check
  • cargo build --features qemu
  • make workflow-check

When a slice moves code that emits or validates device proof labels, also run the affected QEMU gates:

  • make run-net
  • make run-ddf-provider-consumer
  • make run-devicemmio-grant
  • make run-dmapool-grant
  • make run-interrupt-grant
  • relevant make run-hardware-audit* targets when audit or proof labels move

Choose those gates from the moved authority surface, not from the file move alone:

  • proof-log or proof-label movement needs the QEMU target that asserts those exact proof lines;
  • grant-source or cap-object movement needs the matching run-*-grant target plus any parent lifecycle target it depends on;
  • audit emission, snapshot decode, or audit-label movement needs the matching run-hardware-audit* target;
  • DMABuffer, DeviceMmio, Interrupt, selected provider TX, proof labels, or schema-comment movement for those surfaces needs make run-ddf-provider-consumer;
  • MMIO, DMA, IRQ, or teardown transaction movement needs the focused grant target and the broader device proof target that exercises stale handle, revoke/reset, or release behavior;
  • pure type, handle, or error-module movement may stop at make fmt-check, cargo build --features qemu, and make workflow-check only when the public diff leaves proof labels, grant behavior, and authority transactions unchanged.

Success Criteria

  • device_manager is a module tree rather than one monolithic source file.
  • Production authority paths are visibly separated from QEMU proof scaffolding.
  • Public behavior and existing proof/audit labels are unchanged by the initial split.
  • The module boundaries match the target userspace-driver design: kernel code owns claim, revoke, teardown, MMIO, DMA, and IRQ authority; userspace drivers consume only scoped capabilities.