IOMMU Remapping Grounding

This note records primary-source facts for IOMMU/remapping work. The Intel VT-d path has landed under #[cfg(feature = "qemu")] in kernel/src/iommu.rs as a QEMU q35 smoke (make run-iommu-remapping). It covers real VT-d table programming, a hardware-DMA translation proof, two-phase invalidation/IOTLB-flush revocation, and IOMMU-backed hostile stale-DMA smokes (see ddf-iommu-qemu-intel-remapping-smoke). DMAPool has manager-owned domain identity and mapping-lifecycle preflight records. For QEMU shapes without intel-iommu, the kernel-owned bounce-buffer fallback remains active (remapping_tables=not-programmed, hostile_hardware_isolation=not-claimed). AMD-Vi table programming and a bounce-buffer policy for non-IOMMU devices remain open.

Sources

  • Intel, Intel Virtualization Technology for Directed I/O Architecture Specification, content ID 671081. Intel page metadata on 2026-05-12 listed Date 2022-06-02 and Version 5.1 (Latest). Sections used: 6.2.2 “Context-Cache”, 6.2.4 “IOTLB”, 6.5.1 “Register-based Invalidation Interface”, 6.5.2 “Queued Invalidation Interface”, 6.5.3 “IOTLB Invalidation Considerations”, 6.6 “Set Root Table Pointer Operation”, 6.8 “Write Buffer Flushing”, 7.10 “Software Steps to Drain Page Requests & Responses”, 8.3 “DMA Remapping Hardware Unit Definition Structure”, 8.3.1 “Device Scope Structure”, 9.1 “Root Entry”, 9.3 “Context Entry”, 9.4 “Scalable-Mode Context-Entry”, and 11.4.5-11.4.9 covering the root-table-address, invalidation, fault, protected-memory-range, and invalidation-queue registers.
  • AMD, AMD I/O Virtualization Technology (IOMMU) Specification 48882, 48882-PUB Rev 3.10, February 2025. Sections used: 2.2 device table, device-table entry, I/O page table, and interrupt-remapping material; 2.4 “Commands”; 2.5 “Event Logging”; 3.4 “IOMMU MMIO Registers”; IVRS/device-table/page-table, command-buffer, completion-wait, invalidation, and event-log material.
  • QEMU, qemu-manpage entries for -device intel-iommu, -device amd-iommu, and -device virtio-iommu-pci; and QEMU PCI developer documentation for PCI IOMMU and IOTLB notifier APIs. These are current-master QEMU docs, not a frozen release manual; the qemu-manpage and PCI developer pages observed on 2026-05-12 were generated for QEMU version 11.0.50.

Intel VT-d Grounding

Intel VT-d identifies DMA request sources by PCI requester/source ID and resolves them through DMA remapping hardware units described by DMAR DRHD structures. The table path is rooted at a root table and context tables. Root entries select context tables. Context entries bind a source to a translation type, domain identifier, address width, and second-level page-table root. Scalable-mode context entries extend that context format. The landed QEMU smoke (kernel/src/iommu.rs, cfg(qemu)) uses this legacy-mode path: DRHD unit, PCI segment and BDF/source ID, domain ID, aw-bits=39 address width, and a 3-level second-level page-table root. Scalable-mode context entries, 48-bit IOVA space, interrupt remapping, and multi-device domains remain out of scope for the current slice.
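A minimal sketch of the legacy-mode entries follows. It assumes the bit layout as we read it from §9.1 and §9.3. For the root entry, Present is bit 0 and the context-table pointer is bits 63:12. For the context entry, Present is bit 0, TT is bits 3:2, the second-level page-table pointer is bits 63:12, AW is bits 66:64, and the domain ID is bits 87:72. The root table is indexed by bus number, and each context table is indexed by the device/function byte of the source ID. The names RootEntry, ContextEntry, and table_indices are hypothetical and are not the capOS code. Check every field against the specification before relying on it.

```rust
// Hypothetical VT-d legacy-mode entry layout; verify bit positions
// against VT-d spec sections 9.1 and 9.3 before use.

const PRESENT: u64 = 1 << 0;
const ADDR_MASK: u64 = !0xfff; // bits 63:12 hold a 4 KiB-aligned physical address

/// Context-entry AW field: 001b = 39-bit/3-level, 010b = 48-bit/4-level.
pub const AW_39_BIT: u64 = 0b001;
pub const AW_48_BIT: u64 = 0b010;
/// TT = 00b: untranslated requests use second-level translation.
pub const TT_UNTRANSLATED: u64 = 0b00;

/// 128-bit legacy root entry; one per PCI bus, 256 per root table.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
#[repr(C, align(16))]
pub struct RootEntry {
    lo: u64, // bit 0: Present; bits 63:12: context-table pointer
    hi: u64, // reserved in legacy mode, kept zero
}

/// 128-bit legacy context entry; one per device/function, 256 per context table.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
#[repr(C, align(16))]
pub struct ContextEntry {
    lo: u64, // bit 0: Present; bits 3:2: TT; bits 63:12: second-level table root
    hi: u64, // bits 2:0: AW; bits 23:8: domain ID (entry bits 87:72)
}

impl RootEntry {
    pub fn new(context_table_phys: u64) -> Self {
        assert_eq!(context_table_phys & 0xfff, 0, "context table must be 4 KiB aligned");
        RootEntry { lo: context_table_phys | PRESENT, hi: 0 }
    }
}

impl ContextEntry {
    pub fn new(slpt_root_phys: u64, domain_id: u16, aw: u64, tt: u64) -> Self {
        assert_eq!(slpt_root_phys & 0xfff, 0, "page-table root must be 4 KiB aligned");
        ContextEntry {
            lo: (slpt_root_phys & ADDR_MASK) | ((tt & 0b11) << 2) | PRESENT,
            hi: (u64::from(domain_id) << 8) | (aw & 0b111),
        }
    }

    pub fn domain_id(&self) -> u16 {
        (self.hi >> 8) as u16 // low 16 bits after the shift = bits 23:8
    }
}

/// Split a source ID (bus[15:8], device[7:3], function[2:0]) into the
/// root-table index (bus) and the context-table index (devfn).
pub fn table_indices(source_id: u16) -> (usize, usize) {
    (usize::from(source_id >> 8), usize::from(source_id & 0xff))
}
```

For the landed smoke's shape (aw-bits=39), a context entry built this way would carry AW_39_BIT and the physical address of a 3-level second-level table.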

Invalidation is part of the mapping lifetime, not a diagnostic detail. Intel’s register-based and queued invalidation interfaces cover context-cache, IOTLB, device-TLB, interrupt-entry-cache, and wait/completion descriptors. The landed smoke uses register-based context-cache invalidation (CCMD.ICC, global granularity) and domain-selective IOTLB invalidation (IOTLB.IVT at the ECAP.IRO-decoded offset). Both use bounded completion-bit polling. Page reuse is ordered strictly after invalidation completion. If a poll exhausts its budget without observing completion, the operation fails closed and the backing pages are not freed. Queued invalidation (GCMD.QIE) is not enabled in the current slice. The fault-reporting registers (FSTS.PPF, FRCD[0].F) are the minimum diagnostic surface for translation failures and protection faults. The unmapped-IOVA and stale-DMA hostile proofs exercise them.
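The invalidation-then-release ordering can be sketched as follows. This is an illustration under stated assumptions, not the kernel/src/iommu.rs code. The Mmio trait, InvalError, invalidate_domain, and revoke_mapping are hypothetical names. The register layout is our reading of the specification's register chapter and should be checked against it:

- CCMD is at offset 0x28, with ICC in bit 63 and CIRG in bits 62:61.
- IOTLB_REG sits 8 bytes after ECAP.IRO * 16, with IVT in bit 63, IIRG in bits 61:60, and DID in bits 47:32.

Hardware clears ICC and IVT when each invalidation completes.

```rust
// Hypothetical sketch: register-based VT-d invalidation with bounded polling
// and fail-closed page release. Offsets and bits are assumptions to verify.

pub const CCMD_OFFSET: usize = 0x28;
const CCMD_ICC: u64 = 1 << 63;             // set by software, cleared by hardware when done
const CCMD_CIRG_GLOBAL: u64 = 0b01 << 61;  // global context-cache invalidation
const IOTLB_IVT: u64 = 1 << 63;            // set by software, cleared by hardware when done
const IOTLB_IIRG_DOMAIN: u64 = 0b10 << 60; // domain-selective IOTLB invalidation
const POLL_BUDGET: u32 = 1_000;

/// IVA_REG sits at ECAP.IRO * 16; IOTLB_REG follows 8 bytes later.
pub fn iotlb_reg_offset(ecap: u64) -> usize {
    (((ecap >> 8) & 0x3ff) as usize) * 16 + 8
}

pub trait Mmio {
    fn read64(&mut self, offset: usize) -> u64;
    fn write64(&mut self, offset: usize, value: u64);
}

#[derive(Debug, PartialEq, Eq)]
pub enum InvalError {
    ContextCacheTimeout,
    IotlbTimeout,
}

/// Returns true once `busy` reads back as clear, false if the budget runs out.
fn poll_clear(mmio: &mut impl Mmio, offset: usize, busy: u64, budget: u32) -> bool {
    for _ in 0..budget {
        if (mmio.read64(offset) & busy) == 0 {
            return true;
        }
        core::hint::spin_loop();
    }
    false
}

pub fn invalidate_domain(mmio: &mut impl Mmio, ecap: u64, domain_id: u16) -> Result<(), InvalError> {
    mmio.write64(CCMD_OFFSET, CCMD_ICC | CCMD_CIRG_GLOBAL);
    if !poll_clear(mmio, CCMD_OFFSET, CCMD_ICC, POLL_BUDGET) {
        return Err(InvalError::ContextCacheTimeout);
    }
    let iotlb = iotlb_reg_offset(ecap);
    mmio.write64(iotlb, IOTLB_IVT | IOTLB_IIRG_DOMAIN | (u64::from(domain_id) << 32));
    if !poll_clear(mmio, iotlb, IOTLB_IVT, POLL_BUDGET) {
        return Err(InvalError::IotlbTimeout);
    }
    Ok(())
}

/// Fail closed: pages reach `free` only after both invalidations complete;
/// on timeout they are handed back to the caller, still quarantined.
pub fn revoke_mapping(
    mmio: &mut impl Mmio,
    ecap: u64,
    domain_id: u16,
    pages: Vec<u64>,
    free: &mut impl FnMut(u64),
) -> Result<(), (InvalError, Vec<u64>)> {
    match invalidate_domain(mmio, ecap, domain_id) {
        Ok(()) => {
            for p in pages {
                free(p);
            }
            Ok(())
        }
        Err(e) => Err((e, pages)),
    }
}
```

Returning the page list inside the error makes the fail-closed rule structural. On timeout, the caller still holds the pages and cannot return them to an allocator by accident. For a change that touches only second-level page tables, the IOTLB step is the one that matters. The context-cache step mirrors the pair of invalidations this note says the smoke issues.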

QEMU’s intel-iommu documentation is useful for focused emulator smokes but should not be treated as hardware coverage. It is q35-only in QEMU current master. Relevant options include intremap, caching-mode, device-iotlb, and aw-bits=39|48; QEMU documents 39-bit IOVA space for 3-level IOMMU page tables and 48-bit IOVA space for 4-level tables.
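The 39/48 pairing follows from the page-table geometry. This assumes 4 KiB pages (12 offset bits) and 512-entry tables (9 index bits per level): 12 + 9 * 3 = 39 and 12 + 9 * 4 = 48. The helper names below are hypothetical. context_aw_field reflects the VT-d context-entry AW encoding as we read §9.3: 001b for 3-level, 010b for 4-level, and 011b for 5-level.

```rust
// Sketch, assuming 4 KiB pages and 512-entry (9-bit) page-table levels:
// relates aw-bits to page-table depth and to the VT-d context-entry AW field.
const PAGE_OFFSET_BITS: u32 = 12;
const BITS_PER_LEVEL: u32 = 9;

/// IOVA width covered by a table with `levels` levels.
fn iova_bits(levels: u32) -> u32 {
    PAGE_OFFSET_BITS + BITS_PER_LEVEL * levels
}

/// Inverse of `iova_bits`; None when aw_bits is not a whole number of levels.
fn levels_for_aw_bits(aw_bits: u32) -> Option<u32> {
    let table_bits = aw_bits.checked_sub(PAGE_OFFSET_BITS)?;
    if table_bits == 0 || table_bits % BITS_PER_LEVEL != 0 {
        return None;
    }
    Some(table_bits / BITS_PER_LEVEL)
}

/// Context-entry AW encoding for 3- to 5-level tables (001b, 010b, 011b).
fn context_aw_field(levels: u32) -> u64 {
    assert!((3..=5).contains(&levels), "only 3-, 4-, or 5-level tables");
    u64::from(levels - 2)
}

/// Size of the IOVA space in bytes (exclusive upper bound of valid IOVAs).
fn iova_space_bytes(aw_bits: u32) -> u64 {
    1u64 << aw_bits
}
```

By this arithmetic, aw-bits=39 gives a 512 GiB IOVA space and aw-bits=48 gives 256 TiB.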

AMD-Vi Grounding

AMD-Vi uses a different vocabulary and table root. Device requests are keyed by DeviceID and resolved through a Device Table Entry. A DTE carries validity, translation, interrupt-remapping, DomainID, mode/page-table-depth, and page-table-root information. Future shared capOS abstractions can name the logical domain and IOVA lifetime generically, but AMD-specific code should not pretend it is programming Intel root/context tables.
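To keep the AMD vocabulary separate, an AMD-Vi DTE can be modeled as its own type rather than as a variant of the Intel entries. The sketch below assumes the field positions as we read them from 48882 §2.2:

- V is bit 0 and TV is bit 1.
- Mode is bits 11:9.
- The page-table root is bits 51:12.
- IR and IW are bits 61 and 62.
- DomainID is bits 79:64.
- Entries are 256 bits wide and indexed by DeviceID.

DeviceTableEntry, device_id, and dte_offset are hypothetical names. Interrupt-remapping and guest-translation fields are omitted. Check every position against the spec.

```rust
// Hypothetical AMD-Vi DTE sketch; verify field positions against 48882 section 2.2.
const DTE_V: u64 = 1 << 0;     // entry valid
const DTE_TV: u64 = 1 << 1;    // translation information valid
const DTE_MODE_SHIFT: u32 = 9; // bits 11:9: paging mode (levels; 0 = translation off)
const DTE_PT_ROOT_MASK: u64 = 0x000f_ffff_ffff_f000; // bits 51:12
const DTE_IR: u64 = 1 << 61;   // DMA read permitted
const DTE_IW: u64 = 1 << 62;   // DMA write permitted

/// DeviceID: the 16-bit PCI requester ID, bus[15:8], device[7:3], function[2:0].
pub fn device_id(bus: u8, device: u8, function: u8) -> u16 {
    (u16::from(bus) << 8) | (u16::from(device & 0x1f) << 3) | u16::from(function & 0x7)
}

/// Device-table entries are 256 bits (32 bytes), indexed by DeviceID.
pub fn dte_offset(id: u16) -> usize {
    usize::from(id) * 32
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
#[repr(C, align(32))]
pub struct DeviceTableEntry {
    pub data: [u64; 4], // data[2..4] (interrupt remapping and more) left zero here
}

impl DeviceTableEntry {
    pub fn translated(pt_root_phys: u64, levels: u8, domain_id: u16) -> Self {
        assert!((1..=6).contains(&levels), "paging mode must be 1..=6 levels");
        assert_eq!(pt_root_phys & 0xfff, 0, "page-table root must be 4 KiB aligned");
        let mut dte = DeviceTableEntry::default();
        dte.data[0] = DTE_V | DTE_TV | DTE_IR | DTE_IW
            | (u64::from(levels) << DTE_MODE_SHIFT)
            | (pt_root_phys & DTE_PT_ROOT_MASK);
        dte.data[1] = u64::from(domain_id); // DomainID: entry bits 79:64
        dte
    }

    pub fn domain_id(&self) -> u16 {
        self.data[1] as u16
    }

    pub fn mode(&self) -> u8 {
        ((self.data[0] >> DTE_MODE_SHIFT) & 0x7) as u8
    }
}
```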

AMD invalidation and completion are command-buffer operations. The future mapping lifetime must include command-buffer invalidation commands, completion wait, and event-log handling. The event log is the basic hardware-facing diagnostic record for malformed requests, page faults, and table errors; the MMIO register set covers control/status, command and event pointers, event-log state, alternate event-log buffers, device-table segment bases, and extended features.
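The AMD mapping-lifetime ordering is: invalidate, wait for completion, then release pages. It can be sketched with two command encodings. This assumes our reading of 48882 §2.4:

- Commands are 128 bits, with the opcode in bits 63:60.
- INVALIDATE_IOMMU_PAGES (opcode 03h) carries DomainID in bits 47:32 plus S/PDE flags and an address. With S set, the address 0x7FFF_FFFF_FFFF_F000 is what Linux's AMD IOMMU driver uses to cover every page in a domain.
- COMPLETION_WAIT (opcode 01h) has a store bit at bit 0, an 8-byte-aligned store address, and 64-bit store data.

The function names are hypothetical, as are the Vec standing in for the command ring and the observe callback standing in for a read of the semaphore word. A real driver would also advance the command-buffer tail register, which is not shown.

```rust
// Hypothetical AMD-Vi command encodings; verify against AMD 48882 section 2.4.
pub type Command = [u32; 4]; // one 128-bit command-buffer entry

const OP_COMPLETION_WAIT: u32 = 0x1;
const OP_INVALIDATE_IOMMU_PAGES: u32 = 0x3;
const COMPLETION_WAIT_STORE: u32 = 1 << 0; // write StoreData to StoreAddress on completion
const INV_PAGES_SIZE: u32 = 1 << 0;        // S: address encodes a range
const INV_PAGES_PDE: u32 = 1 << 1;         // also drop cached page-directory entries
const ALL_PAGES_ADDRESS: u64 = 0x7fff_ffff_ffff_f000; // with S set: whole domain

/// The opcode lives in bits 63:60, i.e. bits 31:28 of the second dword.
fn with_opcode(mut cmd: Command, op: u32) -> Command {
    cmd[1] |= op << 28;
    cmd
}

pub fn opcode(cmd: &Command) -> u32 {
    cmd[1] >> 28
}

pub fn invalidate_all_pages(domain_id: u16) -> Command {
    let cmd = [
        0,                    // PASID unused for host (GN = 0) invalidation
        u32::from(domain_id), // DomainID in bits 47:32
        (ALL_PAGES_ADDRESS as u32) | INV_PAGES_PDE | INV_PAGES_SIZE,
        (ALL_PAGES_ADDRESS >> 32) as u32,
    ];
    with_opcode(cmd, OP_INVALIDATE_IOMMU_PAGES)
}

pub fn completion_wait(store_phys: u64, token: u64) -> Command {
    assert_eq!(store_phys & 0x7, 0, "store address must be 8-byte aligned");
    let cmd = [
        (store_phys as u32) | COMPLETION_WAIT_STORE,
        ((store_phys >> 32) as u32) & 0x000f_ffff, // store address bits 51:32
        token as u32,
        (token >> 32) as u32,
    ];
    with_opcode(cmd, OP_COMPLETION_WAIT)
}

/// Queue invalidate + completion wait, then poll the semaphore. Returns true
/// only when the token was observed; on false the caller must keep the pages.
pub fn revoke(
    ring: &mut Vec<Command>,
    domain_id: u16,
    store_phys: u64,
    token: u64,
    mut observe: impl FnMut() -> u64,
    budget: u32,
) -> bool {
    ring.push(invalidate_all_pages(domain_id));
    ring.push(completion_wait(store_phys, token));
    (0..budget).any(|_| observe() == token)
}
```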

QEMU’s amd-iommu documentation is also q35-only in current master. The documented options include dma-remap for DMA address translation and permission checking and intremap for interrupt remapping. Treat these as emulator smoke inputs until capOS has separate hardware or provider evidence.

QEMU Test Surface

QEMU provides the emulator-level test surface for IOMMU smokes:

  • intel-iommu on q35 with aw-bits=39 (3-level second-level page tables) is the shape used by the landed make run-iommu-remapping smoke, pinned to QEMU 8.2.2. The smoke asserts table programming, hardware-DMA translation (mapped_iova_translated=hardware-dma), unmapped-IOVA fault observation (unmapped_iova_fault=observed), two-phase invalidation/IOTLB-flush, and IOMMU-backed hostile stale-DMA proofs.
  • amd-iommu on q35 with DMA remapping enabled is grounded here for a future AMD-Vi table-programming slice.
  • virtio-iommu-pci on q35 x86_64 or virt ARM covers a portable virtio-IOMMU frontend if selected later.
  • PCI IOMMU/IOTLB notifier APIs in QEMU developer docs describe how emulated devices observe translation changes; they are not guest architectural requirements.

QEMU citations in the Sources section are current-master documentation observed on 2026-05-12. Tests pin the local qemu-system-x86_64 --version, machine type, and full device option string in the smoke evidence.
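A smoke harness can turn the evidence keys above into a pass/fail check. The sketch assumes the serial log carries whitespace-separated key=value tokens. The keys and values it checks are the ones this note cites. The qemu_version key, the log format, and the names check_evidence and parse_evidence are hypothetical. Machine type and the full device option string could be pinned the same way.

```rust
// Hypothetical smoke-evidence check. The key=value log format and the
// `qemu_version` key are assumptions; the other pairs come from this note.
use std::collections::HashMap;

const REQUIRED: &[(&str, &str)] = &[
    ("mapped_iova_translated", "hardware-dma"),
    ("unmapped_iova_fault", "observed"),
    ("iova_export", "disabled-this-slice"),
    ("hostile_hardware_isolation", "not-claimed"),
];

/// Collect key=value tokens; tokens without '=' are ignored.
pub fn parse_evidence(log: &str) -> HashMap<&str, &str> {
    log.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .collect()
}

pub fn check_evidence(log: &str, pinned_qemu: &str) -> Result<(), String> {
    let kv = parse_evidence(log);
    for (key, want) in REQUIRED {
        match kv.get(key) {
            Some(got) if got == want => {}
            Some(got) => return Err(format!("{key}={got}, expected {want}")),
            None => return Err(format!("missing {key}")),
        }
    }
    match kv.get("qemu_version") {
        Some(v) if *v == pinned_qemu => Ok(()),
        other => Err(format!("qemu_version {other:?} does not match pin {pinned_qemu}")),
    }
}
```

Treating a claimed hostile_hardware_isolation value as a failure keeps the evidence from overstating what the smoke proves.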

Implementation Status and Future Slices

Intel VT-d QEMU smoke (landed, cfg(qemu)):

  • DMAR/DRHD discovery, MMIO/fault-status diagnostics, and disabled IOVA ledger preflight records: landed as prerequisites.
  • kernel/src/iommu.rs real VT-d legacy-mode entry programming, RTAR write, GCMD/GSTS SRTP-then-TE handshake, hardware-DMA translation proof via virtio-rng, unmapped-IOVA fault observation via FSTS/FRCD, two-phase invalidation/IOTLB-flush revocation, and IOMMU-backed hostile stale-DMA smokes: all landed as of 2026-05-14 (slices A1/A2/B/C). See ddf-iommu-qemu-intel-remapping-smoke.
  • IOVA export stays disabled for this slice (iova_export=disabled-this-slice); hostile_hardware_isolation=not-claimed in all evidence.
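The GCMD/GSTS handshake from the second bullet can be sketched as follows. The register offsets and bits are our reading of the VT-d register chapter:

- GCMD is at 0x18, with TE at bit 31 and SRTP at bit 30.
- GSTS is at 0x1C, with TES and RTPS at the same positions.
- RTADDR is at 0x20.

The Regs trait, enable_translation, and the invalidate_global callback are hypothetical. The global invalidation between SRTP and TE follows our reading of §6.6. This note does not say whether kernel/src/iommu.rs orders it the same way.

```rust
// Hypothetical sketch of the VT-d enable sequence; offsets and bits are
// assumptions to check against the VT-d register chapter.
pub const GCMD: usize = 0x18;       // Global Command (32-bit)
pub const GSTS: usize = 0x1c;       // Global Status (32-bit)
pub const RTADDR: usize = 0x20;     // Root Table Address (64-bit)
pub const GCMD_TE: u32 = 1 << 31;   // translation enable (persistent)
pub const GCMD_SRTP: u32 = 1 << 30; // set root table pointer (one-shot)
const GSTS_TES: u32 = 1 << 31;
const GSTS_RTPS: u32 = 1 << 30;

pub trait Regs {
    fn read32(&mut self, off: usize) -> u32;
    fn write32(&mut self, off: usize, v: u32);
    fn write64(&mut self, off: usize, v: u64);
}

#[derive(Debug, PartialEq, Eq)]
pub enum EnableError {
    RootPointerNotLatched,
    InvalidationFailed,
    TranslationNotEnabled,
}

/// Poll GSTS until `bit` is set, giving up after `budget` reads.
fn wait_status(r: &mut impl Regs, bit: u32, budget: u32) -> bool {
    (0..budget).any(|_| (r.read32(GSTS) & bit) != 0)
}

/// `gcmd_shadow` tracks persistent enable bits only, so one-shot commands
/// such as SRTP are never re-issued by later GCMD writes.
pub fn enable_translation<R: Regs>(
    r: &mut R,
    gcmd_shadow: &mut u32,
    root_table_phys: u64,
    invalidate_global: impl FnOnce(&mut R) -> bool,
    budget: u32,
) -> Result<(), EnableError> {
    // Legacy mode: TTM (bits 11:10) = 00b, so the low 12 bits stay zero.
    assert_eq!(root_table_phys & 0xfff, 0, "root table must be 4 KiB aligned");
    r.write64(RTADDR, root_table_phys);
    r.write32(GCMD, *gcmd_shadow | GCMD_SRTP);
    if !wait_status(r, GSTS_RTPS, budget) {
        return Err(EnableError::RootPointerNotLatched);
    }
    // Global context-cache and IOTLB invalidation before translation is on.
    if !invalidate_global(r) {
        return Err(EnableError::InvalidationFailed);
    }
    *gcmd_shadow |= GCMD_TE;
    r.write32(GCMD, *gcmd_shadow);
    if !wait_status(r, GSTS_TES, budget) {
        return Err(EnableError::TranslationNotEnabled);
    }
    Ok(())
}
```

The shadow register holds only the persistent GCMD bits. Later GCMD writes, such as enabling TE, therefore never repeat the one-shot SRTP command. A failed invalidation returns before TE is ever written.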

Future slices (not yet started):

  • AMD-Vi table programming: separate source grounding and evidence; AMD-specific DTE, DeviceID, command-buffer, and event-log names must not be conflated with Intel root/context tables.
  • Source-grounding refresh for AMD or additional Intel features (48-bit IOVA, scalable-mode context entries, interrupt remapping, device-IOTLB) when a real branch selects them.
  • Bounce-buffer policy for non-IOMMU devices: QEMU shapes without intel-iommu still use the kernel-owned bounce-buffer fallback. The choice between requiring IOMMU remapping and adopting an explicit bounce-buffer policy for those devices remains open.
  • Trusted multi-device sharing groups, production NIC or storage driver ownership, and moving the live virtio-net path off bounce buffers are not in scope for the current slice.