# Proposal: Device Manager Refactor

Before the current module split, `kernel/src/device_manager.rs` was the
convergence point for production device authority, transitional userspace cap
surfaces, QEMU proof harnesses, audit labels, DMA/MMIO/IRQ policy, and
serialization checks. That shape was useful while the Device Driver Foundation
gate was evolving, but it hid the target userspace-driver model behind a
single large file.

The refactor should keep the kernel device manager as the authoritative
ownership ledger for claimed devices while separating proof scaffolding and
domain-specific record logic into clearer modules. It must not weaken the
single ownership transaction across `DMAPool`, `DMABuffer`, `DeviceMmio`, and
`Interrupt`.

## Implementation Status

The first mechanical proof split landed at `99c37592`
(`refactor(kernel): split device-manager proof scaffolding`). Current `main`
keeps public proof wrapper functions in `kernel/src/device_manager/mod.rs` for
existing `virtio.rs` call sites, with the moved proof scaffolding in
`kernel/src/device_manager/proofs.rs`.

Later mechanical slices split handles/errors and domain record helpers into the
current `kernel/src/device_manager/` module tree. The transaction-helper
cleanup also landed at `98dddb72` (`device_manager: share authority admission
helpers`), with the aggregate `PciDeviceRecord` still serving as the single
claimed-device ledger and existing proof/audit labels preserved.

## Design Grounding

- [DMA Isolation Design](../dma-isolation-design.md) requires one
  device-manager ledger of record for each claimed device before userspace NIC
  or block drivers receive hardware authority.
- [Service Architecture](service-architecture-proposal.md) makes init the
  holder of `DeviceManager` and `ProcessSpawner`, with child hardware drivers
  receiving only scoped device caps.
- [Networking](networking-proposal.md) defines the target NIC split: a
  userspace NIC driver holds `DeviceMmio`, `Interrupt`, and `DMAPool`, then
  exports a `Nic` cap to a separate network stack.
- [Device Driver Foundation](../backlog/hardware-boot-storage.md) is the
  active implementation track for the hardware authority gates that make this
  refactor useful. The plan explicitly schedules this refactor as
  high-priority DDF risk reduction subordinate to behavior-moving authority
  slices, and any further split slice must be mechanical, behavior-preserving,
  and reduce review risk for upcoming `DeviceMmio`, `Interrupt`, or `DMAPool`
  authority work.
- [Pass AttachedDmaPoolRecord by reference](../tasks/done/2026-05-13/ddf-attached-dmapool-record-by-ref.md)
  is the ready DDF prerequisite that converts the device-manager ledger record
  from by-value to by-reference threading through the proof emission paths.
  The current by-value layout exhausted the BSP boot stack when the inline
  `AttachedDmaPoolRecord::proof_buffers` slot count was grown past three; the
  by-reference conversion unlocks further provider-TX descriptor concurrency
  without expanding the per-frame footprint of nested proof emissions.

No external prior-art report is required for the initial split: this is a
repo-local maintainability refactor that preserves the existing accepted
authority model rather than selecting a new OS design.

## Module Shape

The accepted current shape has converted `kernel/src/device_manager.rs` into a
`kernel/src/device_manager/` module tree:

```text
kernel/src/device_manager/
  mod.rs          public API, re-exports, lock-order notes
  handles.rs      BDF, owner/state, and handle structs
  error.rs        production DeviceManagerError and display helpers
  mmio.rs         DeviceMmio records and map/unmap/read/write admission
  dma_pool.rs     DMAPool records, accounting, budget, teardown evidence
  dma_buffer.rs   DMABuffer records, map/free/submit/complete admission
  interrupt.rs    interrupt records and route/wait/ack/mask/unmask admission
  proofs.rs       transitional QEMU proof entry points and proof logs
```

Future cleanup is limited to optional registry, ledger, or proof-internal
splits if they reduce review risk for upcoming DDF work. The current accepted
proof split is `proofs.rs`, not a `proofs/` directory.

`PciDeviceRecord` should remain the aggregate owner of a claimed device's
ledger. The split should move record-specific logic behind modules, not create
independent managers that can diverge during teardown.

### Follow-Up Risk Reduction

Two adjacent tracks reduce review risk for further DDF proof growth without
disturbing the accepted module shape:

- The by-reference ledger-record threading prerequisite tracked in
  [docs/tasks/ddf-attached-dmapool-record-by-ref.md](../tasks/done/2026-05-13/ddf-attached-dmapool-record-by-ref.md)
  converts `AttachedDmaPoolRecord` from by-value to by-reference through the
  affected proof emission paths so that growing inline proof slot counts no
  longer multiplies cumulative stack frames across nested proof calls.
- The scheduler off-stack release work that landed under `d322a78f`
  (`sched: make thread stack drops off-stack explicit`) and `9b94ea7f`
  (`sched: release qemu proof stacks off-stack`) already pulls QEMU
  release-proof process kernel stacks off the dropping thread, which removes
  one stack-pressure axis on the BSP boot path that previously interacted
  with the device-manager proof emissions.

## Refactor Strategy

1. **Split proof code first.** Landed at `99c37592`. `prove_qemu_*`, proof log
   structs, proof-only error enums, and bounded proof helper functions moved
   into a proof module. Wrapper functions remain in `mod.rs` so current
   `virtio.rs` call sites do not churn.

2. **Split handles and errors.** Landed at `734383f9`. `PciBdf`, owner/state
   enums, handle structs, `DeviceMmioRegion`, and `DeviceManagerError` moved
   into dedicated modules.

3. **Split record domains.** Landed at `af539f6c`. MMIO, DMA pool, DMA buffer,
   and interrupt attached-record logic moved into domain modules while
   `PciDeviceRecord` remains the aggregate ledger owner.

4. **Preserve one authoritative ledger.** Every operation that creates,
   consumes, or releases device-visible authority must still update the
   claimed-device ledger as part of the same ownership transaction that changes
   device-manager state.

5. **Improve internal APIs after the split.** Landed at `98dddb72`. Narrow
   transaction helpers and typed admission contexts now remove repeated
   stale-handle, owner, generation, state, and attached-record checks while
   preserving the single aggregate ledger.

## Constraints

- Preserve the existing lock order: `PCI_DEVICE_MANAGER` before
  `DEVICE_INTERRUPT_ROUTES`.
- Preserve cap semantics, audit labels, proof labels, and QEMU smoke output
  during the initial split.
- Keep userspace-driver authority blocked until the Device Driver Foundation
  gates still marked open are closed.
- Avoid broad call-site churn. Compatibility wrappers are acceptable during
  the mechanical phase.
- Do not move authority decisions into userspace. Userspace drivers receive
  scoped caps, but the kernel remains the ledger and enforcement point.
- Keep proof code available until userspace-driver production gates have
  equivalent coverage.

## Validation

For mechanical file movement, run:

- `make fmt-check`
- `cargo build --features qemu`
- `make workflow-check`

When a slice moves code that emits or validates device proof labels, also run
the affected QEMU gates:

- `make run-net`
- `make run-ddf-provider-consumer`
- `make run-devicemmio-grant`
- `make run-dmapool-grant`
- `make run-interrupt-grant`
- relevant `make run-hardware-audit*` targets when audit or proof labels move

Choose those gates from the moved authority surface, not from the file move
alone:

- proof-log or proof-label movement needs the QEMU target that asserts those
  exact proof lines;
- grant-source or cap-object movement needs the matching `run-*-grant` target
  plus any parent lifecycle target it depends on;
- audit emission, snapshot decode, or audit-label movement needs the matching
  `run-hardware-audit*` target;
- `DMABuffer`, `DeviceMmio`, `Interrupt`, selected provider TX, proof labels,
  or schema-comment movement for those surfaces needs
  `make run-ddf-provider-consumer`;
- MMIO, DMA, IRQ, or teardown transaction movement needs the focused grant
  target and the broader device proof target that exercises stale handle,
  revoke/reset, or release behavior;
- pure type, handle, or error-module movement may stop at `make fmt-check`,
  `cargo build --features qemu`, and `make workflow-check` only when the
  public diff leaves proof labels, grant behavior, and authority transactions
  unchanged.

## Success Criteria

- `device_manager` is a module tree rather than one monolithic source file.
- Production authority paths are visibly separated from QEMU proof scaffolding.
- Public behavior and existing proof/audit labels are unchanged by the initial
  split.
- The module boundaries match the target userspace-driver design: kernel code
  owns claim, revoke, teardown, MMIO, DMA, and IRQ authority; userspace drivers
  consume only scoped capabilities.
