Proposal: Device Manager Refactor
Before the current module split, kernel/src/device_manager.rs was the
convergence point for production device authority, transitional userspace cap
surfaces, QEMU proof harnesses, audit labels, DMA/MMIO/IRQ policy, and
serialization checks. That shape was useful while the Device Driver Foundation
gate was evolving, but it hid the target userspace-driver model behind a
single large file.
The refactor should keep the kernel device manager as the authoritative
ownership ledger for claimed devices while separating proof scaffolding and
domain-specific record logic into clearer modules. It must not weaken the
single ownership transaction across DMAPool, DMABuffer, DeviceMmio, and
Interrupt.
Implementation Status
The first mechanical proof split landed at 99c37592
(refactor(kernel): split device-manager proof scaffolding). Current main
keeps public proof wrapper functions in kernel/src/device_manager/mod.rs for
existing virtio.rs call sites, with the moved proof scaffolding in
kernel/src/device_manager/proofs.rs.
Later mechanical slices split handles/errors and domain record helpers into the
current kernel/src/device_manager/ module tree. The transaction-helper
cleanup also landed at 98dddb72 (device_manager: share authority admission helpers), with the aggregate PciDeviceRecord still serving as the single
claimed-device ledger and existing proof/audit labels preserved.
Design Grounding
- DMA Isolation Design requires one device-manager ledger of record for each claimed device before userspace NIC or block drivers receive hardware authority.
- Service Architecture makes init the
holder of
DeviceManagerandProcessSpawner, with child hardware drivers receiving only scoped device caps. - Networking defines the target NIC split: a
userspace NIC driver holds
DeviceMmio,Interrupt, andDMAPool, then exports aNiccap to a separate network stack. - Device Driver Foundation is the
active implementation track for the hardware authority gates that make this
refactor useful. The plan explicitly schedules this refactor as
high-priority DDF risk reduction subordinate to behavior-moving authority
slices, and any further split slice must be mechanical, behavior-preserving,
and reduce review risk for upcoming
DeviceMmio,Interrupt, orDMAPoolauthority work. - Pass AttachedDmaPoolRecord by reference
is the ready DDF prerequisite that converts the device-manager ledger record
from by-value to by-reference threading through the proof emission paths.
The current by-value layout exhausted the BSP boot stack when the inline
AttachedDmaPoolRecord::proof_buffersslot count was grown past three; the by-reference conversion unlocks further provider-TX descriptor concurrency without expanding the per-frame footprint of nested proof emissions.
No external prior-art report is required for the initial split: this is a repo-local maintainability refactor that preserves the existing accepted authority model rather than selecting a new OS design.
Module Shape
The accepted current shape has converted kernel/src/device_manager.rs into a
kernel/src/device_manager/ module tree:
kernel/src/device_manager/
mod.rs public API, re-exports, lock-order notes
handles.rs BDF, owner/state, and handle structs
error.rs production DeviceManagerError and display helpers
mmio.rs DeviceMmio records and map/unmap/read/write admission
dma_pool.rs DMAPool records, accounting, budget, teardown evidence
dma_buffer.rs DMABuffer records, map/free/submit/complete admission
interrupt.rs interrupt records and route/wait/ack/mask/unmask admission
proofs.rs transitional QEMU proof entry points and proof logs
Future cleanup is limited to optional registry, ledger, or proof-internal
splits if they reduce review risk for upcoming DDF work. The current accepted
proof split is proofs.rs, not a proofs/ directory.
PciDeviceRecord should remain the aggregate owner of a claimed device’s
ledger. The split should move record-specific logic behind modules, not create
independent managers that can diverge during teardown.
Follow-Up Risk Reduction
Two adjacent tracks reduce review risk for further DDF proof growth without disturbing the accepted module shape:
- The by-reference ledger-record threading prerequisite tracked in
docs/tasks/ddf-attached-dmapool-record-by-ref.md
converts
AttachedDmaPoolRecordfrom by-value to by-reference through the affected proof emission paths so that growing inline proof slot counts no longer multiplies cumulative stack frames across nested proof calls. - The scheduler off-stack release work that landed under
d322a78f(sched: make thread stack drops off-stack explicit) and9b94ea7f(sched: release qemu proof stacks off-stack) already pulls QEMU release-proof process kernel stacks off the dropping thread, which removes one stack-pressure axis on the BSP boot path that previously interacted with the device-manager proof emissions.
Refactor Strategy
-
Split proof code first. Landed at
99c37592.prove_qemu_*, proof log structs, proof-only error enums, and bounded proof helper functions moved into a proof module. Wrapper functions remain inmod.rsso currentvirtio.rscall sites do not churn. -
Split handles and errors. Landed at
734383f9.PciBdf, owner/state enums, handle structs,DeviceMmioRegion, andDeviceManagerErrormoved into dedicated modules. -
Split record domains. Landed at
af539f6c. MMIO, DMA pool, DMA buffer, and interrupt attached-record logic moved into domain modules whilePciDeviceRecordremains the aggregate ledger owner. -
Preserve one authoritative ledger. Every operation that creates, consumes, or releases device-visible authority must still update the claimed-device ledger as part of the same ownership transaction that changes device-manager state.
-
Improve internal APIs after the split. Landed at
98dddb72. Narrow transaction helpers and typed admission contexts now remove repeated stale-handle, owner, generation, state, and attached-record checks while preserving the single aggregate ledger.
Constraints
- Preserve the existing lock order:
PCI_DEVICE_MANAGERbeforeDEVICE_INTERRUPT_ROUTES. - Preserve cap semantics, audit labels, proof labels, and QEMU smoke output during the initial split.
- Keep userspace-driver authority blocked until the Device Driver Foundation gates still marked open are closed.
- Avoid broad call-site churn. Compatibility wrappers are acceptable during the mechanical phase.
- Do not move authority decisions into userspace. Userspace drivers receive scoped caps, but the kernel remains the ledger and enforcement point.
- Keep proof code available until userspace-driver production gates have equivalent coverage.
Validation
For mechanical file movement, run:
make fmt-checkcargo build --features qemumake workflow-check
When a slice moves code that emits or validates device proof labels, also run the affected QEMU gates:
make run-netmake run-ddf-provider-consumermake run-devicemmio-grantmake run-dmapool-grantmake run-interrupt-grant- relevant
make run-hardware-audit*targets when audit or proof labels move
Choose those gates from the moved authority surface, not from the file move alone:
- proof-log or proof-label movement needs the QEMU target that asserts those exact proof lines;
- grant-source or cap-object movement needs the matching
run-*-granttarget plus any parent lifecycle target it depends on; - audit emission, snapshot decode, or audit-label movement needs the matching
run-hardware-audit*target; DMABuffer,DeviceMmio,Interrupt, selected provider TX, proof labels, or schema-comment movement for those surfaces needsmake run-ddf-provider-consumer;- MMIO, DMA, IRQ, or teardown transaction movement needs the focused grant target and the broader device proof target that exercises stale handle, revoke/reset, or release behavior;
- pure type, handle, or error-module movement may stop at
make fmt-check,cargo build --features qemu, andmake workflow-checkonly when the public diff leaves proof labels, grant behavior, and authority transactions unchanged.
Success Criteria
device_manageris a module tree rather than one monolithic source file.- Production authority paths are visibly separated from QEMU proof scaffolding.
- Public behavior and existing proof/audit labels are unchanged by the initial split.
- The module boundaries match the target userspace-driver design: kernel code owns claim, revoke, teardown, MMIO, DMA, and IRQ authority; userspace drivers consume only scoped capabilities.