Scheduling
Scheduling decides which process runs, preserves CPU state across preemption and blocking, and integrates capability-ring progress with process execution.
Status: Partially implemented. Single-CPU preemptive round-robin scheduling, PIT timer interrupts, full context
switches, cap_enter blocking waits, user-mode idle, process exit, and direct
IPC handoff are implemented. SMP, per-CPU data, kernel-mode idle, priority, and
restart policy are future work.
Current Behavior
The scheduler stores processes in a BTreeMap<Pid, Process> and ready pids in
a VecDeque. The PIT fires at roughly 100 Hz through IRQ0. On each timer tick, the
kernel wakes timed-out or satisfied cap_enter waiters, processes the current
process’s ring in timer mode, saves the current context, rotates ready
processes, switches CR3, updates TSS.RSP0 and the syscall kernel stack, restores
FS base, and returns to the next user context.
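The rotation step can be sketched in isolation. This is a minimal model, not the kernel's code: it uses std collections in place of the kernel's allocator-backed ones, and the Pid alias, State enum, Scheduler fields, and on_tick name are all hypothetical. It leaves out the hardware steps (CR3, TSS.RSP0, FS base) entirely.

```rust
use std::collections::{BTreeMap, VecDeque};

type Pid = u32; // hypothetical alias; the real Pid type is not shown here

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Ready,
    Running,
    Blocked,
}

struct Process {
    state: State,
}

struct Scheduler {
    processes: BTreeMap<Pid, Process>,
    ready: VecDeque<Pid>,
    current: Option<Pid>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { processes: BTreeMap::new(), ready: VecDeque::new(), current: None }
    }

    fn spawn(&mut self, pid: Pid) {
        self.processes.insert(pid, Process { state: State::Ready });
        self.ready.push_back(pid);
    }

    /// Models one timer tick: requeue the current process if it is still
    /// runnable, then pick the first ready pid from the front of the queue.
    fn on_tick(&mut self) -> Option<Pid> {
        if let Some(pid) = self.current.take() {
            if let Some(p) = self.processes.get_mut(&pid) {
                if p.state == State::Running {
                    p.state = State::Ready;
                    self.ready.push_back(pid);
                }
            }
        }
        // Skip stale queue entries (exited or no longer ready).
        while let Some(pid) = self.ready.pop_front() {
            if let Some(p) = self.processes.get_mut(&pid) {
                if p.state == State::Ready {
                    p.state = State::Running;
                    self.current = Some(pid);
                    return Some(pid);
                }
            }
        }
        None
    }
}
```

With three processes spawned in order 1, 2, 3, successive calls to on_tick in this model yield 1, 2, 3, and then 1 again.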
cap_enter(min_complete, timeout_ns) processes pending SQEs immediately. If
the requested completion count is not available and the timeout permits
blocking, the current process enters Blocked(CapEnter { ... }) and the syscall
entry path switches to another process.
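The block-or-return decision and the matching wake condition can be sketched as pure functions. The encoding of the timeout is an assumption made for illustration: here 0 means "do not block" and u64::MAX means "wait indefinitely". The EnterOutcome type and both function names are hypothetical.

```rust
// Assumed encoding, not confirmed by this document.
const TIMEOUT_INFINITE: u64 = u64::MAX;

#[derive(Debug, PartialEq, Eq)]
enum EnterOutcome {
    /// Return to the caller immediately with the completions available now.
    Return { completed: u32 },
    /// Park the caller; deadline_ns is None for an indefinite wait.
    Block { min_complete: u32, deadline_ns: Option<u64> },
}

/// Decide what cap_enter does after pending SQEs have been processed.
fn cap_enter_decision(
    available_cqes: u32,
    min_complete: u32,
    timeout_ns: u64,
    now_ns: u64,
) -> EnterOutcome {
    if available_cqes >= min_complete || timeout_ns == 0 {
        return EnterOutcome::Return { completed: available_cqes };
    }
    let deadline_ns = if timeout_ns == TIMEOUT_INFINITE {
        None
    } else {
        Some(now_ns.saturating_add(timeout_ns))
    };
    EnterOutcome::Block { min_complete, deadline_ns }
}

/// Wake check run on each timer tick for a blocked waiter.
fn should_wake(
    available_cqes: u32,
    min_complete: u32,
    deadline_ns: Option<u64>,
    now_ns: u64,
) -> bool {
    available_cqes >= min_complete || deadline_ns.map_or(false, |d| now_ns >= d)
}
```

Keeping the decision free of scheduler state makes it easy to test separately from the syscall entry path that performs the actual switch.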
When endpoint delivery satisfies a blocked server RECV, the scheduler can set a direct IPC target. The next scheduling decision runs that server before ordinary round-robin work when it is ready and its process generation still matches the captured direct target.
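A rough sketch of how such a preference might be checked before falling back to round-robin follows. The DirectTarget struct, the generation field, and pick_next are hypothetical names, and the real scheduler may track more state than this model does.

```rust
use std::collections::{BTreeMap, VecDeque};

type Pid = u32; // hypothetical alias

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq)]
enum State {
    Ready,
    Blocked,
}

struct Process {
    generation: u64, // changes when a pid slot is reused (assumed scheme)
    state: State,
}

#[derive(Clone, Copy)]
struct DirectTarget {
    pid: Pid,
    generation: u64, // generation captured when the target was set
}

/// Prefer the direct IPC target when it is still valid; otherwise fall
/// back to the front of the ready queue. The target is consumed either way.
fn pick_next(
    processes: &BTreeMap<Pid, Process>,
    ready: &mut VecDeque<Pid>,
    direct: &mut Option<DirectTarget>,
) -> Option<Pid> {
    if let Some(target) = direct.take() {
        let valid = processes.get(&target.pid).map_or(false, |p| {
            p.generation == target.generation && p.state == State::Ready
        });
        if valid {
            // Remove the pid from the queue so it is not scheduled twice.
            ready.retain(|&pid| pid != target.pid);
            return Some(target.pid);
        }
    }
    ready.pop_front()
}
```

A stale target (mismatched generation, or a process that is no longer ready) is silently dropped, so the handoff never bypasses the ordinary liveness and state checks.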
Design
The implementation keeps ring dispatch outside the global scheduler lock. Timer dispatch extracts ring/cap/scratch handles, releases the scheduler lock, processes bounded SQEs, then reacquires the scheduler lock to choose the next process. This prevents Cap’n Proto decode, serial output, and capability method bodies from running under the global scheduler lock.
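The extract, release, dispatch, reacquire pattern can be illustrated with std::sync::Mutex standing in for the kernel's scheduler lock. Everything here is a simplified assumption: RingHandle, SchedState, MAX_SQES_PER_TICK, and timer_dispatch are hypothetical names, and "processing" an SQE is reduced to popping it from a queue.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{Arc, Mutex};

type Pid = u32; // hypothetical alias

const MAX_SQES_PER_TICK: usize = 4; // assumed bound for illustration

struct RingHandle {
    pending: Mutex<VecDeque<u32>>, // stand-in for a process's submission queue
}

struct SchedState {
    current: Pid,
    ready: VecDeque<Pid>,
    rings: BTreeMap<Pid, Arc<RingHandle>>,
}

/// Returns (SQEs processed, next pid to run).
fn timer_dispatch(sched: &Mutex<SchedState>) -> (usize, Pid) {
    // Step 1: take the lock only long enough to clone the ring handle.
    let ring = {
        let guard = sched.lock().unwrap();
        let handle = guard.rings.get(&guard.current).cloned();
        handle
    }; // scheduler lock released here

    // Step 2: bounded dispatch with the scheduler lock not held.
    let mut processed = 0;
    if let Some(ring) = ring {
        assert!(sched.try_lock().is_ok(), "dispatch must not hold the scheduler lock");
        let mut sqes = ring.pending.lock().unwrap();
        while processed < MAX_SQES_PER_TICK {
            match sqes.pop_front() {
                Some(_sqe) => processed += 1, // real code would run the method body
                None => break,
            }
        }
    }

    // Step 3: reacquire the lock to rotate and choose the next process.
    let mut guard = sched.lock().unwrap();
    let prev = guard.current;
    guard.ready.push_back(prev);
    let next = guard.ready.pop_front().unwrap_or(prev);
    guard.current = next;
    (processed, next)
}
```

Cloning a reference-counted handle keeps the ring usable after the guard is dropped, and the per-tick bound keeps one process's submissions from delaying the scheduling decision indefinitely.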
The idle task is currently a user-mode process with one code page and one stack page. It exists because the timer return path assumes interrupts entered from CPL3. A future kernel-mode idle loop requires distinct IRQ entry/restore handling for CPL0 and CPL3 frames.
Exit switches to the kernel PML4 before tearing down the exiting address space, releases capability authority, completes process waiters, and defers final drop until the scheduler is running on another kernel stack.
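The deferred final drop can be modeled as a two-phase handoff: exit parks the process object, and a later pass running on a different stack releases it. This sketch assumes a pending_drop list and a reap method, both hypothetical, and it does not model the PML4 switch, capability release, or waiter completion.

```rust
use std::collections::BTreeMap;

type Pid = u32; // hypothetical alias

#[allow(dead_code)]
struct Process {
    kernel_stack: Vec<u8>, // stand-in for the process's kernel stack allocation
}

struct Scheduler {
    processes: BTreeMap<Pid, Process>,
    current: Option<Pid>,
    pending_drop: Vec<Process>, // exited processes awaiting their final drop
}

impl Scheduler {
    /// Phase 1, running on the exiting process's kernel stack: unlink the
    /// process but keep it (and its stack) alive.
    fn exit_current(&mut self) {
        if let Some(pid) = self.current.take() {
            if let Some(p) = self.processes.remove(&pid) {
                self.pending_drop.push(p);
            }
        }
    }

    /// Phase 2, called once the scheduler runs on another kernel stack:
    /// drop parked processes. Returns how many were freed.
    fn reap(&mut self) -> usize {
        let n = self.pending_drop.len();
        self.pending_drop.clear(); // the final drop happens here
        n
    }
}
```

Splitting exit this way means the memory backing the stack in use is never freed while code is still executing on it.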
Invariants
- The idle process must never block in cap_enter or exit.
- Ring dispatch must not hold the scheduler lock.
- Timer dispatch runs with the current process CR3, so user buffers are accessible only for that process.
- Blocked cap_enter waiters wake when enough CQEs are available or their finite timeout expires.
- Direct IPC handoff is a scheduling preference, not a bypass of process liveness, generation, or state checks.
- The scheduler must update TSS.RSP0 and syscall kernel RSP on each switch.
- FS base is saved and restored across context switches for TLS.
- The final drop of an exiting process must not occur on its own kernel stack.
Code Map
- kernel/src/sched.rs: process table, run queue, blocking, wakeups, timer scheduling, exit, direct IPC target.
- kernel/src/arch/x86_64/context.rs: CPU context layout, timer entry/restore, tick counter.
- kernel/src/arch/x86_64/idt.rs: timer interrupt handler wiring.
- kernel/src/arch/x86_64/pic.rs and kernel/src/arch/x86_64/pit.rs: PIC remap and PIT setup.
- kernel/src/arch/x86_64/gdt.rs: TSS and kernel stack updates.
- kernel/src/arch/x86_64/syscall.rs: blocking syscall transition for cap_enter.
- kernel/src/arch/x86_64/tls.rs: FS base save/restore.
- kernel/src/process.rs: process state, kernel stacks, idle process.
Validation
- make run validates timer preemption, ring fairness, direct IPC handoff, blocked cap_enter wakeups, process exit, and clean halt.
- make run-spawn validates process wait blocking and child exit completion through ProcessHandle.wait.
- cargo build --features qemu verifies QEMU-only scheduler and halt paths.
- QEMU smoke output for IPC includes direct handoff diagnostics when the server is woken from a blocked RECV.
Open Work
- Replace the user-mode idle process with a kernel/per-CPU idle context after interrupt restore paths support CPL0 timer entries.
- Implement SMP with per-CPU scheduler state, per-CPU syscall stacks, and TLB shootdown.
- Add priority or policy scheduling only once the current authority and IPC semantics have stabilized.
- Add service restart policy outside the static boot graph.