How to Design an Operating System for Quantum Computers

A quantum OS is not 'quantum Linux' — it is a three-plane control system spanning a cloud/batch orchestrator (seconds–minutes), a near-time hybrid runtime (milliseconds), and a hard-real-time QEC microkernel (sub-microsecond). Covers the no-cloning constraints that break the classical OS playbook, an IR-first/provenance-first compiler stack (QIR/OpenQASM 3, SABRE, Pauli-frame tracking), the decoder-latency bottleneck (Google Willow 63 µs, Riverlane 16.32 µs, Quantinuum+NVQLink 67 µs qLDPC), a capability-descriptor HAL, and a three-phase roadmap toward IBM Starling / Quantinuum Apollo (2029).

Quantum Computing, Operating Systems, Quantum Error Correction, FTQC, NISQ, Real-Time Systems, Compiler, QIR, Decoder, AI Hardware, Research

Disclaimer: This article aggregates publicly reported research, peer-reviewed results, and vendor announcements. Some latency figures and hardware roadmaps (IBM Starling, Quantinuum Apollo, Riverlane Deltaflow 3, NVQLink) are vendor-reported or forward-looking projections rather than delivered systems. The reasoning involved AI-assisted generation, has not undergone peer review, and may contain errors. Data are current as of June 2026.

TL;DR

  • A quantum OS is not "quantum Linux": it is a three-plane control system that manages decaying, non-clonable qubits, calibration state, and tight classical-quantum feedback loops across three irreconcilable latency domains — a cloud/batch plane (seconds–minutes), a near-real-time hybrid runtime (milliseconds), and a hard-real-time control/QEC microkernel (sub-microsecond). The single most important design decision is to separate these planes physically and architecturally rather than force them into one scheduler, because the QEC plane must meet ~1 µs deadlines that no general-purpose OS can guarantee.
  • The design should be IR-first and provenance-first: a multi-level lowering stack (Problem/Hamiltonian → Algorithm → Circuit/QIR → Logical → Physical/Pulse) where every level is auditable and replayable, paired with a capability-descriptor HAL that exposes per-modality truth (gate times spanning ~4,000× from superconducting tens of ns to trapped-ion tens-to-hundreds of µs) instead of over-abstracting.
  • Build it in three phases tracking the hardware: (1) a NISQ resource-management OS today (fidelity-aware scheduling, multi-programming, sessions — proven by the QOS/Qiskit Runtime generation); (2) a dynamic-circuit OS for mid-circuit measurement and feed-forward; (3) an FTQC microkernel with FPGA/ASIC real-time decoders, Pauli-frame tracking, and lattice-surgery scheduling for the 2029-era machines (IBM Starling, Quantinuum Apollo).

Key Findings

  1. Real "quantum OS" research already exists and converges on resource management, not virtualization. The QOS system (Giortamis, Romão, Tornow, Bhatotia, OSDI '25, USENIX) is a modular cloud OS built on a "Qernel" abstraction with four components (error mitigator, fidelity estimator, multi-programmer, and scheduler), evaluated on 7,000 real quantum runs of more than 70,000 benchmark instances on IBM hardware. It reports that "the QOS achieves 2.6–456.5× higher fidelity, increases resource utilization by up to 9.6×, and reduces waiting times by up to 5× while sacrificing only 1–3% fidelity, on average, compared to the baselines." QNodeOS ("An operating system for executing applications on quantum network nodes," Delle Donne et al., Nature, March 12, 2025, DOI 10.1038/s41586-025-08704-w; the work was led by Stephanie Wehner at TU Delft/QuTech and demonstrated across trapped-ion and NV color-center platforms) is the first OS for quantum network nodes. It splits a Classical Network Processing Unit (CNPU) from a Quantum Network Processing Unit (QNPU) running C++ on FreeRTOS, with a hardware-specific QDriver HAL. Both validate the central thesis: a quantum OS manages scarce, noisy, heterogeneous resources and abstracts hardware; it does not provide processes-and-virtual-memory in the classical sense.

  2. The no-cloning theorem breaks the classical OS playbook. You cannot copy a qubit, so there is no swap, no checkpoint, no fork(), no core dump, no live migration of quantum state. Qubits also decohere on a fixed clock (superconducting T1 on the order of ~100 µs; idle decoherence in hundreds of µs). This forces a fundamentally different memory abstraction: time-bounded qubit leases rather than allocations, and "checkpoints" that can only be classical (Pauli-frame state, calibration data, measurement records, RNG seeds, parameter vectors) — never quantum amplitudes.

  3. Latency classes are the organizing principle. Hybrid workloads span: hard-real-time (<1 µs QEC rounds, sub-µs decode), near-time (ms feedback for variational loops, error mitigation), and batch/cloud (seconds–minutes queueing). Qiskit Runtime already codifies this with job/batch/session execution modes and Sampler/Estimator primitives. The QEC plane's deadline is brutal: superconducting surface codes generate syndromes every ~1 µs (Google Willow ran at a 1.1 µs cycle time), and if the decoder cannot keep pace, the backlog problem causes exponential slowdown in T-gate depth.

  4. Decoder latency is the FTQC bottleneck, and it is now an engineering problem. Google's Willow real-time decoder achieved an average latency of 63 µs (net 63 ± 17 µs) at distance-5 over a million cycles (Quantum error correction below the surface code threshold, Nature, 2024, s41586-024-08449-y); Riverlane's Deltaflow 2 demonstrated 16.32 µs on the same Google dataset and 6.5 µs decode-and-feedback on Rigetti hardware, with a Local Clustering Decoder running under 1 µs per round on FPGA. The utility-scale target is ~10 µs reaction time. Syndrome bandwidth scales as O(d²) per logical qubit and reaches Gb/s–Tb/s aggregate for large machines.

  5. Hardware is converging on dates but diverging on physics. Per IBM's June 10, 2025 roadmap, "by 2029, we will deliver IBM Quantum Starling — a large-scale, fault-tolerant quantum computer capable of running quantum circuits comprising 100 million quantum gates on 200 logical qubits… built at our historic facility in Poughkeepsie, New York," using qLDPC codes. Quantinuum targets Apollo (~2029–2030, hundreds of logical qubits) on trapped-ion QCCD; in 2025 a Harvard/MIT/QuEra collaboration demonstrated an integrated fault-tolerant architecture using 448 atomic qubits — "combining all essential elements for error-corrected quantum computation for the first time," with below-threshold performance across up to 96 logical qubits, published in Nature. Their gate times, connectivity, and control electronics differ by orders of magnitude — which is precisely why a capability-descriptor HAL beats a lowest-common-denominator abstraction.


Details

1. Why a quantum OS — and why it is not "quantum Linux"

Start from what an OS is for: it multiplexes scarce hardware among competing workloads, abstracts heterogeneous devices behind stable interfaces, schedules work against deadlines, isolates tenants, and manages the lifecycle of resources (memory, files, processes). A quantum OS must do all of these — but every primitive changes meaning because the resource it manages is physically alien.

The resources a quantum OS actually manages are:

  • Qubits as decaying, non-clonable leases. A qubit is not a register you allocate and free at leisure. It has a coherence budget (T1/T2) that begins draining the instant it is initialized. The QOS paper measured that on IBM hardware, qubits left idle for "more than a few hundred microseconds" decohere to |0⟩ — effectively a register that resets itself if you don't use it fast enough. There is no memcpy, because the no-cloning theorem forbids copying an unknown state. This single fact eliminates swap, checkpoint/restore, copy-on-write, live migration, and redundant backup — the entire classical toolkit for surviving faults by duplication.
  • Calibration state as first-class, drifting configuration. QPUs are recalibrated regularly, and after each cycle the noise model changes "unpredictably." The QOS authors measured a 12-qubit GHZ circuit's fidelity differing by 38% from best to worst across nominally identical IBM QPUs of the same model, and found "20 pairs of days with more than 5% difference in fidelity" for a 6-qubit GHZ on IBM Perth across 120 calibration days. Calibration data is therefore not a static device file — it is live, per-qubit, per-gate, time-varying telemetry that the scheduler must read on every decision.
  • Measurement as a destructive, probabilistic, one-way operation. Reading a qubit collapses it. "Output" is a probability distribution estimated over many shots, not a deterministic value. Fidelity — the QOS metric, F(P_ideal, P_noisy) = (Σ_i √(P_ideal(i)·P_noisy(i)))² — is the currency of correctness.
  • The classical-quantum feedback channel itself. In dynamic circuits and QEC, classical compute sits inside the quantum coherence window. Managing that latency is a resource-management problem the classical OS never had.
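The fidelity metric quoted in the measurement bullet above is straightforward to compute from two shot histograms. The following is a minimal sketch, assuming each distribution arrives as a dictionary mapping bitstrings to probabilities; the function name `distribution_fidelity` and the example histograms are illustrative and are not taken from the QOS codebase.

```python
import math

def distribution_fidelity(p_ideal, p_noisy):
    """F = (sum_i sqrt(P_ideal(i) * P_noisy(i)))**2 over all outcomes i."""
    outcomes = set(p_ideal) | set(p_noisy)
    overlap = sum(
        math.sqrt(p_ideal.get(k, 0.0) * p_noisy.get(k, 0.0)) for k in outcomes
    )
    return overlap ** 2

# Hypothetical histograms: an ideal Bell state and a noisy estimate of it.
ideal = {"00": 0.5, "11": 0.5}
noisy = {"00": 0.45, "11": 0.45, "01": 0.05, "10": 0.05}
print(round(distribution_fidelity(ideal, noisy), 3))  # 0.9
```

Outcomes that appear in only one histogram contribute zero to the overlap, which is why leakage into "01" and "10" lowers the score.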

What this means concretely: a quantum OS is not a port of Linux with quantum drivers. Processes do not have address spaces; "memory" cannot be paged; there are no page faults, no demand paging, no overcommit. The closest classical analog is a real-time control system fused with an HPC job scheduler — RTOS deadline discipline at the bottom, batch-scheduler fidelity/throughput optimization at the top, and a hybrid runtime bridging them.

2. The three-plane architecture

The core architectural claim of this design is that a quantum OS must be partitioned by latency domain, because no single scheduler can simultaneously honor a 1 µs QEC deadline and optimize multi-tenant cloud throughput over minutes. I propose three planes:

[Figure: the three-plane architecture. Plane A, cloud/batch orchestration (seconds–minutes); Plane B, hybrid runtime (milliseconds); Plane C, real-time QEC microkernel (nanoseconds–microseconds); all sitting over a QPU + capability-descriptor HAL.]

This mirrors how QNodeOS already split CNPU (general-purpose PC, classical logic) from QNPU (embedded FreeRTOS, quantum-block execution) from QDevice (hardware) "to allow for different timescales… realized at different timescale granularities." It also mirrors NVIDIA's NVQLink, which physically separates a Real-Time Host (GPUs/CPUs) from a QPU Control System (pulse processors) over a sub-4 µs interconnect. The three-plane split is the empirically validated shape of the field.

Why not unify? Because the failure modes are incompatible. A garbage-collection pause, a page fault, or a scheduler preemption — perfectly acceptable in Plane A — is catastrophic in Plane C, where missing a decode deadline triggers exponential backlog. Conversely, the hard-real-time discipline of Plane C (static allocation, bounded loops, no dynamic dispatch) would make Plane A's rich optimization logic impossible to express. The planes communicate through contracts, not shared mutable state: Plane A hands Plane B a compiled job and a session budget; Plane B hands Plane C a pulse/logical schedule plus a feed-forward map; Plane C streams back syndromes and measurement records.

3. The multi-level IR and compiler lowering stack

The compiler is the spine of the OS. The design principle is progressive lowering through auditable levels, each a well-defined IR with a stable schema, so that any execution can be replayed and any result traced to the exact circuit, calibration snapshot, and decoder configuration that produced it. This is the MLIR philosophy — and CUDA-Q already proves it works for quantum, lowering Python/C++ through its MLIR-based Quake (quantum) and CC (classical compute) dialects to QIR/LLVM and then to target code.

I propose the following stack, adapting the reference document's layering but grounding each level in an existing standard:

Level 0 — Problem/Intent IR. Users (or higher layers) express a scientific objective, not a circuit: a Hamiltonian, active space, observables, target accuracy, and a compute budget. This is the most speculative level and should be optional — most production users today submit circuits. But for chemistry/materials workloads it pays off, because it lets the planner choose algorithms and even decline to use the QPU (see §6). Example schema:

problem:
  kind: ground_state_energy
  system:
    hamiltonian_ref: "LiFePO4_interface_v3"   # content-addressed
    active_space: {orbitals: 24, electrons: 20}
  observables: [total_energy, dipole_z]
  target: {metric: energy, abs_error_Ha: 1.6e-3}  # chemical accuracy
  budget: {qpu_seconds: 600, usd: 250, wall_clock_h: 6}
  provenance: {owner: "lab/battery", run_group: "rg-0091"}

Level 1 — Hamiltonian IR. Fermionic/qubit operator with a chosen encoding (Jordan-Wigner, Bravyi-Kitaev). Auditable: the mapping is a pure function recorded with its parameters.

Level 2 — Algorithm IR. The chosen method (VQE ansatz, QAOA depth p, Trotter steps, QPE), still hardware-agnostic. This is where the "solver policy" picks between quantum and classical methods.

Level 3 — Circuit IR (QIR / OpenQASM 3). The hardware-agnostic gate/measurement program. QIR — the LLVM-based, QIR-Alliance standard adopted by NVIDIA, Quantinuum, Rigetti, and ORNL — is the right interchange format here because it represents qubits and results as opaque types via function calls, carries classical control flow natively (essential for hybrid and dynamic circuits), and inherits the entire LLVM optimization toolchain. OpenQASM 3 is the human-readable sibling, and crucially it adds dynamic-circuit constructs: mid-circuit measurement, classical feed-forward (if), real-time delay, and pulse-level grammar. A dynamic-circuit example:

OPENQASM 3.0;
include "stdgates.inc";
qubit[2] q;
bit[2] c;

h q[0];
c[0] = measure q[0];
if (c[0] == 1) {        // classical feed-forward inside coherence window
  x q[1];               // conditional correction
}
c[1] = measure q[1];

Level 4 — Logical IR (FTQC only). Logical qubits, code distance, logical operations expressed as lattice-surgery primitives and T-gate injections. This level did not meaningfully exist in production until 2025; it becomes load-bearing for Starling/Apollo-class machines.

Level 5 — Physical Mapping. Logical→physical qubit placement, routing, and SWAP insertion against the device coupling map. The workhorse algorithm is SABRE (Li, Ding, Xie, ASPLOS 2019) — a SWAP-based bidirectional heuristic search with reverse-traversal initial-layout optimization and a decay heuristic to trade off depth vs. SWAP count — now shipped in Qiskit as the LightSABRE variant. Qubit mapping is NP-hard, so this level is heuristic and must be calibration-aware: route around the noisiest qubits and couplers using live telemetry, not just topology.

Level 6 — Pulse IR (OpenPulse / device pulse dialect). Calibrated waveforms (microwave for superconducting, laser for ions/atoms). CUDA-Q's pipeline produces a pulse-level dialect before FPGA/runtime mediation; Qiskit Pulse / OpenPulse expose the same.

Why provenance must be built in, not bolted on. Quantum results are probabilistic and the hardware drifts. A result is only scientifically meaningful if you can reproduce the exact conditions: circuit hash, calibration snapshot ID, decoder version, shot count, RNG seeds, error-mitigation settings, and the full IR lowering trace. I recommend content-addressed storage of every IR level (hash each artifact), an append-only run ledger, and a rule that no result leaves the OS without a complete provenance manifest. This is the qsurf-style discipline of "same file → same hash → same results." It is also the only defense against the silent-drift failure mode where yesterday's calibration quietly invalidates today's comparison.
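The content-addressing rule can be sketched with nothing more than the Python standard library. This is an illustrative sketch under stated assumptions: IR artifacts are taken to be JSON-serializable dictionaries, and the field names (`ir_hashes`, `manifest_id`, `calibration_snapshot`) and example values are hypothetical, not an existing schema.

```python
import hashlib
import json

def content_hash(artifact: dict) -> str:
    """Hash a JSON-serializable artifact using a canonical key order."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_manifest(ir_levels: dict, run_settings: dict) -> dict:
    """Record one hash per IR level plus run settings, then hash the bundle."""
    manifest = {
        "ir_hashes": {name: content_hash(ir) for name, ir in sorted(ir_levels.items())},
        "run": run_settings,
    }
    manifest["manifest_id"] = content_hash(manifest)  # computed before the id is added
    return manifest

# Hypothetical artifacts for two IR levels and one run.
ir_levels = {
    "circuit": {"format": "openqasm3", "source": "h q[0]; c[0] = measure q[0];"},
    "mapping": {"layout": [0, 1], "router": "sabre"},
}
run_settings = {"calibration_snapshot": "cal-hypothetical-001", "shots": 4000,
                "rng_seed": 7, "decoder_version": "none"}

m1 = build_manifest(ir_levels, run_settings)
m2 = build_manifest(dict(reversed(list(ir_levels.items()))), dict(run_settings))
print(m1["manifest_id"] == m2["manifest_id"])  # True: key order does not matter
```

Canonical serialization is the design choice that matters here: without sorted keys, two byte-different encodings of the same artifact would hash differently and break replay.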

4. Core OS abstractions, redefined for quantum

| Classical | Quantum-OS equivalent | Why it differs |
|---|---|---|
| Process | Job / Qernel (a compiled program + its session context) | No address space; the unit is a circuit family + classical control + shot budget. QOS's "Qernel" is exactly this unifying execution unit. |
| Thread | Shot / sub-experiment within a PUB (primitive unified bloc) | Shots are embarrassingly parallel repetitions; PUBs vectorize over parameters/observables. |
| Virtual memory / address space | Qubit lease (time-bounded reservation of physical qubits + coherence budget) | No paging, no overcommit, no copy. A lease expires when coherence runs out: the "allocation" has a wall-clock TTL. |
| memcpy / swap / checkpoint | Forbidden (no-cloning) → only classical state may be saved | You may checkpoint Pauli frames, parameters, and measurement records, but never amplitudes. |
| File | Result set + provenance manifest (content-addressed) | Immutable, probabilistic, carries its full lineage. |
| Interrupt | Feed-forward event / syndrome-ready signal | A measurement outcome that must steer subsequent gates within the coherence window. Hard-real-time, not best-effort. |
| Device driver | QDriver / HAL translating IR → physical instructions | One per modality; the only hardware-specific component, exactly as in QNodeOS. |
| Scheduler | Fidelity- and calibration-aware multi-objective scheduler | Optimizes fidelity × utilization × wait time, not just CPU fairness. |

The "no-cloning means no swap" problem, handled. Since you cannot evict quantum state to disk and reload it, the OS must instead: (a) treat coherence time as a hard resource constraint baked into scheduling (a job that won't finish within the coherence budget must be cut or restructured); (b) use circuit cutting and qubit reuse to fit larger problems into fewer live qubits (both are in QOS's error-mitigation pipeline); and (c) for FTQC, rely on QEC itself as the "memory refresh" — a logical qubit's state is preserved not by copying but by continuous syndrome extraction and Pauli-frame tracking. The logical qubit is the persistence mechanism.
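Point (a) above, coherence time as a hard scheduling constraint, can be made concrete with a lease object that rejects work it cannot finish in time. This is a minimal sketch: the class name `QubitLease`, its fields, and the 100 µs budget are illustrative assumptions, not an interface from QOS or any vendor runtime.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QubitLease:
    qubits: tuple               # physical qubit indices reserved for the job
    granted_at_us: float        # time at which the qubits were initialized
    coherence_budget_us: float  # usable window before the state is treated as lost

    def expires_at_us(self) -> float:
        return self.granted_at_us + self.coherence_budget_us

    def admits(self, start_us: float, duration_us: float) -> bool:
        """True if a segment starting at start_us finishes before the lease expires."""
        return (start_us >= self.granted_at_us
                and start_us + duration_us <= self.expires_at_us())

lease = QubitLease(qubits=(3, 4, 5), granted_at_us=0.0, coherence_budget_us=100.0)
print(lease.admits(start_us=10.0, duration_us=60.0))   # True: ends at 70 µs
print(lease.admits(start_us=10.0, duration_us=120.0))  # False: would end at 130 µs
```

A scheduler built on this shape would cut or restructure any segment for which `admits` returns False rather than letting it run into decoherence.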

5. Scheduler design

The Plane-A scheduler is where most near-term value lives, because today's bottleneck is queueing and fidelity, not coherence. The design draws directly on QOS's validated results.

Calibration- and fidelity-aware QPU selection. Before scheduling, estimate the fidelity of each candidate (job, QPU) pairing using live calibration data and circuit features (width, depth, non-local gate count, plus the Supermarq feature vector). QOS's estimator "correctly identifies high-fidelity QPUs" without running the full job. A concrete scoring function for assigning job j to QPU q at time t:

score(j, q, t) =  w_f · F̂(j, q, t)                       # predicted fidelity (0..1)
               −  w_w · normalize(queue_wait(q, t))       # penalize loaded QPUs
               −  w_x · crosstalk_penalty(j, q)           # multi-programming risk
               −  w_c · cost(j, q)                         # $ / QPU-seconds
               +  w_u · effective_utilization_gain(j, q)   # packing benefit

where weights encode policy. QOS showed that sacrificing 1–3% fidelity to cut waiting times 5× is usually the right trade — so w_w should be non-trivial. The estimator must be re-queried after each calibration cycle because the noise model changes.
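The scoring function translates directly into code. The sketch below assumes every input has already been normalized to the range 0..1 except the queue wait, which is normalized against a hypothetical `max_wait_s`; the default weights and the two candidate QPUs are invented for illustration and are not values reported by QOS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    fidelity: float = 1.0      # w_f
    wait: float = 0.3          # w_w
    crosstalk: float = 0.2     # w_x
    cost: float = 0.1          # w_c
    utilization: float = 0.2   # w_u

def score(pred_fidelity, queue_wait_s, max_wait_s, crosstalk_penalty,
          cost_norm, utilization_gain, w=Weights()):
    """Weighted sum from the formula above; higher is better."""
    wait_norm = min(queue_wait_s / max_wait_s, 1.0) if max_wait_s > 0 else 0.0
    return (w.fidelity * pred_fidelity
            - w.wait * wait_norm
            - w.crosstalk * crosstalk_penalty
            - w.cost * cost_norm
            + w.utilization * utilization_gain)

# Hypothetical candidates: qpu_a is slightly more accurate but heavily queued.
candidates = {
    "qpu_a": dict(pred_fidelity=0.90, queue_wait_s=3000, crosstalk_penalty=0.0,
                  cost_norm=0.5, utilization_gain=0.0),
    "qpu_b": dict(pred_fidelity=0.88, queue_wait_s=600, crosstalk_penalty=0.1,
                  cost_norm=0.5, utilization_gain=0.3),
}
best = max(candidates, key=lambda q: score(max_wait_s=3600, **candidates[q]))
print(best)  # qpu_b
```

With these weights the 2-point fidelity loss on `qpu_b` is outweighed by its shorter queue and packing benefit, which is the trade the paragraph above argues for.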

Multi-programming and isolation. Packing multiple jobs onto one QPU raises utilization (QOS measured average single-job utilization at only ~26%) but risks crosstalk. QOS introduces compatibility scoring and effective utilization to co-locate only jobs that won't degrade each other, improving fidelity 1.15–9.6× at a target utilization. But multi-programming is also a security boundary: research has demonstrated crosstalk-mediated attacks where a malicious circuit degrades a co-located victim's fidelity, readout-crosstalk side channels that leak a victim's measurement outcomes (demonstrated on Rigetti Ankaa-3 via Amazon Braket), and the finding that standard reset gates fail to fully clear qubit state, leaking information between consecutively scheduled circuits. The OS must therefore treat co-location as a trust decision: insert buffer (idle) qubits between tenants, apply dynamical-decoupling (XYXY) sequences to computation qubits, enforce frequency-aware scheduling, and offer a "no co-tenancy" isolation tier for sensitive workloads. Crosstalk is simultaneously a performance knob and an attack surface.

Session management for hybrid loops. Variational algorithms iterate classical-optimizer → circuit → measurement → optimizer hundreds of times. Re-queueing each iteration is fatal to throughput. The Qiskit Runtime session model — where "a session starts when the first job within the session is started, and subsequent jobs within the session are prioritized by the scheduler," granting the user a dedicated window with exclusive system access — is the right primitive. The OS should expose three modes mirroring Qiskit Runtime: job (single PUB), batch (independent jobs packed for efficiency), and session (interactive, exclusive window for feedback loops). The scheduler must guard against the failure mode the docs warn about: "using too much code between iterative calls can lock the QPU and use excessive QPU time" — so sessions need an idle-timeout watchdog.
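An idle-timeout watchdog needs very little state. The sketch below is one possible shape, assuming the caller supplies timestamps in seconds; the class name `SessionWatchdog` and the 5 s timeout are hypothetical and do not describe how Qiskit Runtime implements its own session limits.

```python
class SessionWatchdog:
    """Flags a session for closure when no job has finished for idle_timeout_s."""

    def __init__(self, idle_timeout_s: float, opened_at_s: float = 0.0):
        self.idle_timeout_s = idle_timeout_s
        self.last_activity_s = opened_at_s

    def job_finished(self, now_s: float) -> None:
        """Record QPU activity; this resets the idle clock."""
        self.last_activity_s = now_s

    def should_close(self, now_s: float) -> bool:
        """True once the gap since the last job exceeds the timeout."""
        return now_s - self.last_activity_s > self.idle_timeout_s

dog = SessionWatchdog(idle_timeout_s=5.0)
dog.job_finished(now_s=2.0)
print(dog.should_close(now_s=6.0))   # False: 4 s idle
print(dog.should_close(now_s=8.5))   # True: 6.5 s idle
```

Passing the clock in explicitly keeps the logic deterministic and testable; a production version would read a monotonic clock instead.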

6. The hybrid runtime (Plane B)

This plane runs the classical half of hybrid algorithms co-located with the QPU to minimize latency. Its responsibilities:

  • Adaptive shot allocation. Don't spend a fixed shot budget per circuit; allocate more shots where variance is high and the optimizer is sensitive, fewer where the answer is already clear. Estimator primitives already accept per-PUB precision targets; the runtime should drive these adaptively.
  • Early stopping and parameter sweeps. Terminate VQE/QAOA when the energy plateaus; vectorize parameter sweeps as PUB arrays.
  • Error mitigation. Apply zero-noise extrapolation (ZNE), probabilistic error cancellation (PEC), measurement twirling, and dynamical decoupling as near-time classical pre/post-processing — GPU-accelerated where it helps (this is what Qiskit Runtime's "near-time" computation tier does).
  • A Bayesian experiment planner (optional, justified for scientific campaigns). The reference document proposes a planner maintaining belief states P(model|data) that allocates QPU time by expected information gain per cost. I judge this worth building as an opt-in Plane-A/B service, not a mandatory core layer. The justification: QPU time is the single most expensive, scarce resource in the stack; an active-learning planner that picks the next sub-problem (which molecule geometry, which parameter point) by maximizing information-gain-per-dollar directly attacks that scarcity. It composes naturally with adaptive shots. But it is domain-specific (most valuable for chemistry/materials sweeps) and adds significant complexity, so it must be optional and must never sit on the critical path of a simple circuit submission.
  • A "solver policy" that knows when NOT to use the QPU. This is the most important and most under-appreciated idea in the reference design, and I fully endorse it. For many problems a classical method (tensor networks, DMRG, classical Monte Carlo, or even a better heuristic) will beat a noisy NISQ circuit. The OS should maintain a solver registry with cost/accuracy models for each backend (classical and quantum) and route each sub-problem to the predicted best option, falling back to the QPU only when its expected value justifies the cost. This is defensible because the field itself has repeatedly found classical methods matching quantum claims; an OS that burns QPU budget on problems classical hardware solves better is mismanaging its scarcest resource. Concretely:
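EPS = 1e-9  # avoids division by zero for backends with negligible cost

def route(subproblem, budget):
    candidates = solver_registry.match(subproblem)   # registry of {classical, quantum} backends
    scored = [(s, s.expected_accuracy(subproblem), s.expected_cost(subproblem))
              for s in candidates]
    # keep only backends whose expected cost fits the budget (same scalar unit assumed)
    affordable = [x for x in scored if x[2] <= budget]
    if not affordable:
        return None   # nothing fits; the caller decides how to degrade
    # pick the backend with the highest expected accuracy per unit cost
    return max(affordable, key=lambda x: x[1] / max(x[2], EPS))[0]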
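The adaptive shot allocation described in the first bullet of this list can be sketched as a variance-proportional split. The heuristic used here (shots proportional to each circuit's estimated standard deviation, known as Neyman allocation in survey sampling) is a standard technique swapped in for illustration; it is not claimed to be what Qiskit Runtime or QOS does, and the function name and `min_shots` floor are assumptions.

```python
def allocate_shots(std_devs, total_shots, min_shots=1):
    """Split total_shots across circuits in proportion to estimated std dev."""
    n = len(std_devs)
    if total_shots < n * min_shots:
        raise ValueError("budget too small to give every circuit min_shots")
    spare = total_shots - n * min_shots
    total_sigma = sum(std_devs)
    if total_sigma == 0:
        weights = [1.0 / n] * n               # no variance estimate: split evenly
    else:
        weights = [s / total_sigma for s in std_devs]
    shots = [min_shots + int(spare * w) for w in weights]
    # Rounding down can leave a few shots unassigned; give them to the
    # highest-variance circuits first.
    leftover = max(total_shots - sum(shots), 0)
    by_variance = sorted(range(n), key=lambda i: std_devs[i], reverse=True)
    for i in by_variance[:leftover]:
        shots[i] += 1
    return shots

# Hypothetical estimates: the first circuit is four times noisier than the others.
plan = allocate_shots([4.0, 1.0, 1.0], total_shots=1000, min_shots=50)
print(plan, sum(plan))
```

The `min_shots` floor keeps every circuit's variance estimate alive, so a circuit that looked quiet in an early iteration can still attract shots later.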

7. The real-time control / QEC microkernel (Plane C)

This is the hardest plane and the one that distinguishes an FTQC OS from a NISQ job manager. It is a hard-real-time microkernel with deadline guarantees, and it should be designed like avionics or motor control: static allocation, bounded execution, no garbage collector, no surprises.

The deadline that defines everything. Superconducting surface codes run a QEC round every ~1 µs (Google Willow used a 1.1 µs cycle time). Each round, a distance-d rotated surface code produces d²−1 syndrome bits per logical qubit (one per stabilizer), and the syndrome volume scales as O(d²) — the area of the code patch. A distance-3 code emits 8 syndrome bits per round; distance-21 emits 440 bits. Aggregated across a large machine this reaches Gb/s to Tb/s of classical syndrome traffic. A prototype open-source FPGA QEC root node (Xilinx ZCU216 RFSoC, four 10 Gb/s transceivers) achieves ~38.8 Gb/s effective throughput (~38,788 syndrome bits/µs), upgradeable to ~108 Gb/s with 28 Gb/s transceivers.
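The syndrome arithmetic above is easy to check numerically. The sketch below assumes one syndrome bit per stabilizer per round and no packet framing overhead; the machine size of 100 logical qubits is a hypothetical figure chosen only to show the scale, not a description of any announced system.

```python
def syndrome_bits_per_round(d: int) -> int:
    """Rotated surface code of distance d: one bit per stabilizer, d^2 - 1 in total."""
    return d * d - 1

def syndrome_rate_gbps(d: int, logical_qubits: int, cycle_us: float) -> float:
    """Aggregate raw syndrome traffic in Gb/s."""
    bits_per_us = logical_qubits * syndrome_bits_per_round(d) / cycle_us
    return bits_per_us * 1e6 / 1e9   # bits/µs -> bits/s -> Gb/s

print(syndrome_bits_per_round(3), syndrome_bits_per_round(21))  # 8 440
print(round(syndrome_rate_gbps(d=21, logical_qubits=100, cycle_us=1.1), 1))  # 40.0
```

Under these assumptions a hundred distance-21 logical qubits already produce traffic comparable to the ~38.8 Gb/s root-node prototype mentioned above.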

The backlog problem. If decoder throughput r_proc falls below syndrome generation rate r_gen, with f = r_gen/r_proc > 1, the runtime of a k T-gate-depth algorithm grows as c·fᵏ — exponential slowdown. Crucially, this is triggered by non-Clifford (T) gates: Clifford operations can be tracked entirely in software via the Pauli frame and never need the decoder to "catch up," but a T gate cannot proceed until the Pauli frame is resolved, creating a hard feed-forward barrier with a latency deadline. The decode engine must drain its backlog before each barrier.
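A few lines of arithmetic show how quickly the backlog term grows. This sketch evaluates the c·fᵏ model stated above; treating the factor as flat when the decoder keeps pace (f ≤ 1) is a simplifying assumption made here, and the rates in the example are arbitrary.

```python
def backlog_slowdown(r_gen: float, r_proc: float, t_depth: int, c: float = 1.0) -> float:
    """Relative runtime factor c * f**k, with f = r_gen / r_proc and k = T-gate depth."""
    f = r_gen / r_proc
    if f <= 1.0:
        return c            # decoder keeps pace, so no exponential term
    return c * f ** t_depth

# A decoder running at half the syndrome rate, on a circuit of T-depth 10.
print(backlog_slowdown(r_gen=2.0, r_proc=1.0, t_depth=10))   # 1024.0
# Even a 10% throughput shortfall compounds badly at T-depth 100.
print(backlog_slowdown(r_gen=1.1, r_proc=1.0, t_depth=100) > 1e4)  # True
```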

Pauli-frame tracking — the key trick. Rather than physically applying every correction (which injects new errors), the OS maintains the Pauli frame as a classical record of accumulated X/Z corrections, updated as syndromes are decoded, and simply reinterprets measurement outcomes accordingly (Gottesman-Knill makes this efficient). This decouples Clifford-gate execution from decode latency — gates run at hardware speed while decoding proceeds in parallel. The frame manager only needs to resolve the frame at feed-forward barriers (T-gate injection, adaptive measurement, magic-state consumption). This is why "just-in-time decoding isn't needed for stabilizer codes" — corrections can be backdated into the frame and propagated forward.
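The bookkeeping behind a Pauli frame fits in two bit-vectors. The following is a minimal sketch that tracks pending X and Z corrections per qubit, propagates them through H, S, and CNOT using the standard Clifford conjugation rules, and reinterprets Z-basis measurement outcomes. Global phases and signs are ignored because they do not affect measurement statistics, and the class and method names are illustrative rather than taken from any control stack.

```python
class PauliFrame:
    """Tracks pending X/Z corrections per qubit as classical bits."""

    def __init__(self, n_qubits: int):
        self.x = [0] * n_qubits
        self.z = [0] * n_qubits

    def record_correction(self, qubit: int, pauli: str) -> None:
        """Fold a decoded correction ('X', 'Y' or 'Z') into the frame instead of applying it."""
        if pauli in ("X", "Y"):
            self.x[qubit] ^= 1
        if pauli in ("Z", "Y"):
            self.z[qubit] ^= 1

    def h(self, q: int) -> None:
        """H exchanges X and Z."""
        self.x[q], self.z[q] = self.z[q], self.x[q]

    def s(self, q: int) -> None:
        """S maps X to Y (up to phase) and leaves Z unchanged."""
        self.z[q] ^= self.x[q]

    def cnot(self, c: int, t: int) -> None:
        """X spreads from control to target; Z spreads from target to control."""
        self.x[t] ^= self.x[c]
        self.z[c] ^= self.z[t]

    def interpret_z_measurement(self, q: int, raw_bit: int) -> int:
        """A pending X (or Y) flips a Z-basis outcome; a pending Z does not."""
        return raw_bit ^ self.x[q]

frame = PauliFrame(2)
frame.record_correction(0, "X")      # decoder reports an X error on qubit 0
frame.cnot(0, 1)                     # the pending X now also sits on qubit 1
print(frame.interpret_z_measurement(1, raw_bit=0))  # 1
```

No physical gate was issued for the correction: the CNOT ran at hardware speed and only the classical interpretation of the readout changed.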

A concrete 6-layer Plane-C reference stack (synthesizing the real-time QEC system-stack literature):

  1. QPU control — generates physical pulses, reads out qubits.
  2. Readout processing — produces the per-round syndrome bitstring.
  3. Syndrome interface — structured readout records streamed to the decoder.
  4. Decode engine — receives syndrome packets, dispatches to the decoder backend, manages the decode queue, enforces deadlines, implements a sliding-window protocol to bound per-step computation and enable streaming. Drains backlog before feed-forward barriers.
  5. Frame manager — maintains the Pauli frame; resolves it at barriers; tracks logical measurement outcomes for lattice surgery.
  6. Logical scheduler — compiles the logical circuit into lattice-surgery operations and T-gate injections, programs Layer 1 with physical gate sequences, queries the frame manager for logical outcomes, and gives the decode engine advance notice of upcoming barriers so backlog can be drained in time.

Choosing the decoder. The classic trade-off:

  • MWPM (Minimum-Weight Perfect Matching) — the accuracy reference (PyMatching/Blossom). Near-optimal threshold but polynomial scaling; harder to hit sub-µs at scale.
  • Union-Find — near-linear time, slightly lower accuracy; UF is provably an approximation of the blossom algorithm, which explains why their accuracy is close. The right default for hardware real-time decoding.
  • Neural decoders — can exceed MWPM threshold on small distances and scale (a CNN decoder reached distances 9–513) but latency and determinism are concerns for hard-real-time.
  • Riverlane's Local Clustering Decoder — FPGA-based, decodes under 1 µs/round, reduces physical-qubit overhead ~4× under leakage-dominated noise.

I recommend a layered/windowed decoder: a fast inner decoder (UF or a lookup-table micro-decoder) running per-window on FPGA for the common case, with a more accurate outer decoder (MWPM/neural) handling harder windows under a relaxed deadline — exactly the streaming/sliding-window approach that lets Deltaflow process "in chunks rather than waiting for an entire shot."
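The tiering idea can be shown on a toy code. The sketch below uses a 5-bit repetition code rather than a surface code, a lookup table covering only zero- and single-error syndromes as the inner tier, and an exhaustive minimum-weight search standing in for MWPM as the outer tier. It illustrates only the dispatch between tiers; windows, deadlines, and FPGA implementation are out of scope, and all names are hypothetical.

```python
from itertools import product

N = 5  # data bits in a toy repetition code (illustrative, not a surface code)

def syndrome(error):
    """Parity of each neighbouring pair of data bits."""
    return tuple(error[i] ^ error[i + 1] for i in range(N - 1))

# Inner tier: lookup table for the no-error case (i = -1) and every single-bit error.
LUT = {}
for i in range(-1, N):
    err = tuple(1 if j == i else 0 for j in range(N))
    LUT[syndrome(err)] = err

def outer_decode(s):
    """Slow fallback: exhaustive minimum-weight search (a stand-in for MWPM)."""
    matches = [e for e in product((0, 1), repeat=N) if syndrome(e) == s]
    return min(matches, key=sum)

def decode(s):
    """Return (correction, tier); the fast path handles the common case."""
    if s in LUT:
        return LUT[s], "inner"
    return outer_decode(s), "outer"

print(decode(syndrome((0, 0, 1, 0, 0))))  # ((0, 0, 1, 0, 0), 'inner')
print(decode(syndrome((0, 1, 0, 1, 0))))  # ((0, 1, 0, 1, 0), 'outer')
```

In a real stack the inner tier would run per window on the FPGA and the outer tier would receive only the windows the table cannot resolve, under a relaxed deadline.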

Demonstrated latency numbers to budget against (these are achieved milestones, not projections, except where noted):

| Metric | Value | Context |
|---|---|---|
| Superconducting QEC round | ~1 µs (1.1 µs Google cycle) | hard deadline |
| Syndrome bits/round/logical qubit | d²−1 (8 at d=3, 440 at d=21) | scales O(d²) |
| Google Willow decoder latency | 63 ± 17 µs at d=5, 10⁶ cycles | Nature 2024 |
| Riverlane Deltaflow 2 (Willow data) | 16.32 µs mean (about 4× below Google) | one million rounds |
| Riverlane–Rigetti decode+feedback | 6.5 µs (8 qubits, 9 rounds) | first low-latency hw QEC |
| Local Clustering Decoder | < 1 µs / round | FPGA |
| Riverlane QECi round-trip | < 400 ns | interconnect spec |
| Utility-scale reaction-time target | ~10 µs | RSA-2048 class |
| NVQLink GPU–QPU | 400 Gb/s, < 4 µs round-trip | NVIDIA, Nov 2025 |
| Quantinuum Helios + NVQLink qLDPC | 67 µs reaction (32× under requirement) | first real-time qLDPC decode |

The Quantinuum Helios + NVQLink result (NVIDIA SC25 release, Nov 17, 2025) is worth detailing: using Bring's code (eight logical qubits in 30 physical qubits) decoded with BP+OSD, the system "achieved a reaction time of 67 microseconds, exceeding Helios' two-millisecond requirement by 32x… the world's first real-time use of a scalable decoder for… qLDPC codes," yielding a 5.4× error-rate improvement. Note that the 2 ms requirement reflects trapped-ion (millisecond-scale) gate timing — the same 67 µs decoder would not meet a superconducting machine's ~1 µs cadence, which is exactly why the deadline budget is modality-specific.

Implementation technology. Plane C should be built on FPGAs today, ASICs at scale, because they deliver deterministic low latency, jitter control, streaming bit-level compute, massive parallelism, and custom decode datapaths. Concrete, proven platforms:

  • QICK (Quantum Instrumentation Control Kit, Fermilab) — open-source Xilinx RFSoC controller, synthesizes pulses up to 6 GHz, achieved 99.93% average gate fidelity; firmware/software all open. Built on the XCZU28DR / ZCU216 RFSoC families.
  • QubiC (LBNL) — open-source RFSoC control supporting mid-circuit measurement and feed-forward.
  • Quantum Machines OPX1000 — Pulse Processing Unit (a 16-core classical processor, up to 128 cores per system) for real-time Turing-complete classical compute inline; OPX+ demonstrated 224 ns conditional-feedback latency.
  • Qblox and Zurich Instruments PQSC — control hardware deployed in real QEC integrations (OQC, IQM).

For firmware, Rust and C/C++ are the right languages: the RISC-Q real-time quantum-control SoC generator explicitly supports "C, C++, and Rust" via a bare-metal MMIO model. For deadline guarantees, adopt embedded Rust's no-heap discipline (#![no_std], the heapless crate for static-capacity collections): no allocator, no GC, no hidden allocations, so worst-case execution time is statically bounded — the same reason avionics avoids dynamic allocation. (Note: no-heap Rust is well-established for embedded systems and supported by quantum control SoCs, but I did not find a named production QEC decoder yet documented as no-heap Rust — this is a recommendation grounded in transferable practice, not a citation of existing quantum deployments.)

Cryo-CMOS and the wiring bottleneck. As physical-qubit counts head toward millions, running thousands of control wires from room temperature into the dilution refrigerator becomes infeasible. Intel's Horse Ridge (22 nm FinFET, verified at 4 K) and Horse Ridge II move control into the cryostat "as close as possible to the qubits… to streamline the complexity of control wiring," and Pando Tree pushes CMOS to the 10–20 mK stage to "enable quantum scaling to millions of qubits." A forward-looking Plane-C HAL should anticipate that some control and even pre-decoding will migrate into cryogenic silicon. IBM's qLDPC decoder is likewise designed to run in real time on classical hardware, reducing the need for co-located HPC systems.

Magic-state factories and lattice-surgery scheduling. Universal FTQC needs non-Clifford gates, supplied via magic states that cannot be prepared fault-tolerantly and must be distilled. The overhead is severe: a 15-to-1 distillation block needs 11 logical-qubit patches and outputs one magic state every 11 logical cycles, so a steady one-per-cycle rate requires ~121 logical qubits (11 blocks running in parallel × 11 patches each); two-level 225-to-1 distillation needs ~176 patches per output port. These factory patches sit on the grid boundary and are "never available for routing or data storage", a hidden overhead the OS must account for explicitly in logical-qubit allocation. The logical scheduler must therefore treat magic-state supply as a first-class resource with its own pipeline. Recent work (dynamic distillation pipelines, magic-state cultivation) shows 16–70% qubit savings via dynamic scheduling, so the scheduler should be dynamic, not static.
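The factory arithmetic above can be checked with a short sketch. The function name and parameters are illustrative assumptions, not part of any existing scheduler; only the 11-patch, 11-cycle figures come from the text.

```python
import math


def factory_footprint(patches_per_block: int,
                      cycles_per_output: int,
                      target_states_per_cycle: float) -> int:
    """Logical patches needed to sustain a target magic-state output rate.

    One block emits 1 / cycles_per_output states per logical cycle, so the
    number of parallel blocks is the target rate times cycles_per_output,
    rounded up to a whole block.
    """
    blocks = math.ceil(target_states_per_cycle * cycles_per_output)
    return blocks * patches_per_block


# Figures from the text: a 15-to-1 block uses 11 patches and
# produces one magic state every 11 logical cycles.
print(factory_footprint(11, 11, 1.0))  # 121 patches for one state per cycle
print(factory_footprint(11, 11, 0.5))  # 66 patches for one state every 2 cycles
```

A logical-qubit allocator could subtract this footprint from the patch budget before placing data qubits, which is one way to make the "hidden" factory overhead explicit.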

8. The HAL: capability descriptors, not over-abstraction

The single biggest temptation — and mistake — in quantum-OS design is to define one universal gate model and hide all hardware behind it. The hardware differences are too large to hide:

| Modality | 2-qubit gate time | Coherence | Connectivity | Control | QEC cycle |
|---|---|---|---|---|---|
| Superconducting (IBM Heron, Google Willow) | ~25–100 ns | ~100 µs | fixed nearest-neighbor | microwave AWG, cryo | ~1 µs |
| Trapped ion (Quantinuum Helios) | ~tens–100s µs | seconds–minutes | all-to-all (QCCD shuttling) | lasers | ~1 ms |
| Neutral atom (QuEra Gemini) | ~hundreds ns–µs | 1–10 s | reconfigurable (tweezers) | lasers, room-temp | ~10–100 µs |
| Photonic / spin | varies | varies | varies | optical / RF | varies |

Superconducting and trapped-ion two-qubit gates differ by roughly 4,000×. A scheduler that assumes one timescale will make catastrophically wrong decisions on the other. The solution, proven by QNodeOS's QDriver and CUDA-Q's modality-dependent lowering, is a capability-descriptor HAL: each device publishes a machine-readable descriptor of what it actually is, and the compiler/scheduler reason over the descriptor rather than a fiction.

device: quantinuum_helios
modality: trapped_ion
topology: {kind: all_to_all, shuttling: true, zones: [storage, gate, readout]}
qubits: {physical: 98, species: "Ba-137"}
native_gates: [rz, rxy, zz]
timing: {1q_gate_ns: 5000, 2q_gate_ns: 80000, measure_ns: 120000,
         coherence_s: 60}
fidelity: {2q_gate: 0.99921, measure: 0.999}     # live, per-calibration
features: [mid_circuit_measurement, qubit_reuse, real_time_compile,
           feed_forward]
qec: {codes_supported: [color, qLDPC], logical_qubits: 48,
      ratio_phys_to_logical: 2.04}
calibration: {snapshot_id: "cal-2026-06-05T08:00Z", ttl_h: 12}
control: {stack: "real_time_engine", language: "Guppy/QIR"}

(The 99.921% two-qubit gate fidelity and 48 logical / 98 physical qubit figures here are Quantinuum's reported Helios numbers as of late 2025.) The compiler consumes native_gates, topology, and timing to lower and route; the scheduler consumes fidelity, coherence_s, and calibration to score; the runtime consumes features to decide whether dynamic circuits are even possible. New modalities are onboarded by writing a descriptor + QDriver, not by rewriting the OS.
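To illustrate how a scheduler might reason over a descriptor rather than a fixed timescale, here is a minimal Python sketch. The class, function names, and the set of features treated as "dynamic-circuit capable" are assumptions for illustration. The Helios values are taken from the descriptor above, while the superconducting entry uses round numbers picked from the table's ranges and does not describe a real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceDescriptor:
    """A tiny subset of the descriptor fields (hypothetical schema)."""
    name: str
    two_q_gate_ns: int
    coherence_s: float
    features: frozenset


# Assumption: these two features are what the runtime needs for dynamic circuits.
DYNAMIC_CIRCUIT_FEATURES = frozenset({"mid_circuit_measurement", "feed_forward"})


def two_qubit_depth_budget(dev: DeviceDescriptor) -> int:
    """Rough number of sequential 2-qubit gates that fit in one coherence time."""
    coherence_ns = round(dev.coherence_s * 1e9)
    return coherence_ns // dev.two_q_gate_ns


def supports_dynamic_circuits(dev: DeviceDescriptor) -> bool:
    """True if the device advertises every feature dynamic circuits require."""
    return DYNAMIC_CIRCUIT_FEATURES <= dev.features


helios = DeviceDescriptor(
    name="quantinuum_helios",
    two_q_gate_ns=80_000,
    coherence_s=60.0,
    features=frozenset({"mid_circuit_measurement", "qubit_reuse",
                        "real_time_compile", "feed_forward"}),
)
# Illustrative values only (50 ns gate, 100 µs coherence), not a real device.
sc_example = DeviceDescriptor("superconducting_example", 50, 100e-6, frozenset())

print(two_qubit_depth_budget(helios))      # 750000
print(two_qubit_depth_budget(sc_example))  # 2000
print(supports_dynamic_circuits(helios))   # True
```

The slower gate does not imply a shallower circuit budget. A scheduler that ranks devices by gate time alone would draw the wrong conclusion here, which is the argument for publishing both numbers in the descriptor.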

9. Telemetry, calibration, drift detection, and the provenance store

A quantum OS lives or dies by its observability. Required subsystems:

  • Calibration manager — ingests each calibration cycle's data, versions it (content-addressed snapshot IDs with TTLs), and exposes it to the scheduler/estimator. A job's provenance pins the exact snapshot used.
  • Drift detector — runs lightweight benchmark circuits (a GHZ-state preparation, a randomized-benchmarking sequence) on a cadence and alerts when fidelity drifts beyond a configured threshold, triggering recalibration or QPU quarantine. QOS's data showed >5% day-to-day fidelity swings, so drift detection is not optional.
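  • Provenance store — append-only ledger of every run: IR-level hashes, calibration snapshot, decoder version/config, shot count, mitigation settings, RNG seeds, raw measurement records, and the result distribution. Content-addressing gives "same hash → same result" reproducibility for the stored record: identical inputs resolve to the same archived run, and classical post-processing can be replayed exactly from the raw measurement records, even though re-executing on hardware yields a fresh statistical sample. This is the foundation of scientific credibility and of debugging non-deterministic failures.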
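A content-addressed provenance record can be sketched in a few lines of Python using only the standard library. The record fields, the ledger class, and the sample values are hypothetical. The sketch hashes a canonical JSON encoding so that key order does not change the identifier.

```python
import hashlib
import json


def content_id(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_run_record(ir_text: str, calibration_snapshot: str,
                    decoder_version: str, shots: int, rng_seed: int) -> dict:
    """Collect the inputs that pin a run (hypothetical field names)."""
    return {
        "ir_sha256": hashlib.sha256(ir_text.encode("utf-8")).hexdigest(),
        "calibration_snapshot": calibration_snapshot,
        "decoder_version": decoder_version,
        "shots": shots,
        "rng_seed": rng_seed,
    }


class ProvenanceLedger:
    """Append-only in-memory ledger: entries are added, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, dict]] = []

    def append(self, record: dict) -> str:
        run_id = content_id(record)
        self._entries.append((run_id, dict(record)))  # store a copy
        return run_id

    def __len__(self) -> int:
        return len(self._entries)


ledger = ProvenanceLedger()
rec_a = make_run_record("OPENQASM 3; qubit q;", "cal-2026-06-05T08:00Z", "uf-1.0", 1000, 7)
rec_b = make_run_record("OPENQASM 3; qubit q;", "cal-2026-06-05T20:00Z", "uf-1.0", 1000, 7)
id_a = ledger.append(rec_a)
id_b = ledger.append(rec_b)
print(id_a != id_b)  # True: a different calibration snapshot yields a different ID
```

A production store would persist entries durably and also hash the raw measurement records; this sketch covers only the input side.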

10. Security and multi-tenancy

Beyond the crosstalk and readout side channels in §5, the OS must defend against: timing attacks (reset-operation timing leaks execution patterns), power-trace attacks (reverse-engineering gate-level circuits), malicious-compiler / QTrojan threats (a compromised pass disabling data encoding), and incomplete reset leakage between scheduled circuits. Defenses the OS should implement: tenant isolation tiers (with a "sole-occupancy" option), buffer qubits and DD sequences between co-located tenants, verified/attested compilation passes with provenance, mandatory active-reset verification between jobs, and frequency-aware scheduling to minimize crosstalk channels. Multi-tenancy is a throughput win but every shared physical resource is a potential covert channel — the OS is the trust boundary.
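As one concrete instance of the buffer-qubit defense, the placement check below rejects co-located allocations that share a qubit or sit on directly coupled qubits. It is a minimal sketch: the function name and the five-qubit line topology are assumptions, and a real policy would also weigh frequency collisions and measured crosstalk.

```python
def violates_buffer(coupling: dict, tenant_a: set, tenant_b: set) -> bool:
    """True if two tenants overlap or occupy directly coupled qubits.

    `coupling` maps each physical qubit to the qubits it is coupled to.
    Requiring at least one unallocated qubit between tenants is the
    simplest form of the buffer-qubit rule.
    """
    a, b = set(tenant_a), set(tenant_b)
    if a & b:
        return True
    return any(neighbor in b for q in a for neighbor in coupling.get(q, ()))


# Hypothetical 5-qubit line: 0 - 1 - 2 - 3 - 4
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(violates_buffer(LINE, {0, 1}, {2, 3}))  # True: qubits 1 and 2 are coupled
print(violates_buffer(LINE, {0, 1}, {3, 4}))  # False: qubit 2 acts as the buffer
```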


Recommendations

Phase 1 (now → ~2027): NISQ resource-management OS. Build Plane A and a thin Plane B first, because that is where today's hardware lives and where the proven wins are. Concretely:

  • Implement the Qernel/Job abstraction, a fidelity- and calibration-aware scheduler with QOS-style scoring, compatibility-based multi-programming, and session/batch/job execution modes (clone the Qiskit Runtime model).
  • Adopt QIR as the interchange IR and OpenQASM 3 as the human-facing circuit language; build the lowering stack on MLIR (follow CUDA-Q's Quake/CC precedent).
  • Stand up the provenance store and calibration manager from day one — retrofitting reproducibility is far harder than building it in.
  • Tech: Python/Go services for Plane A, C++/Rust runtime co-located with the QPU for Plane B, GPU for error mitigation.
  • Benchmark to advance: when multi-programming reliably holds fidelity loss under ~3% at >50% utilization, and the estimator predicts best-QPU selection accurately, move on.
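A fidelity-aware estimator can start from a very crude model: multiply per-operation fidelities and pick the device with the highest product. The sketch below is that simplification and is not QOS's actual scoring formula; it ignores single-qubit errors, decoherence during idling, crosstalk, and queue time. The first candidate reuses the Helios figures quoted earlier, and the second candidate is invented for comparison.

```python
def estimated_success(f_2q: float, f_meas: float, n_2q: int, n_meas: int) -> float:
    """Product-of-fidelities estimate of circuit success probability."""
    return (f_2q ** n_2q) * (f_meas ** n_meas)


def pick_best_qpu(candidates: dict, n_2q: int, n_meas: int) -> str:
    """Return the candidate name with the highest estimated success."""
    return max(
        candidates,
        key=lambda name: estimated_success(
            candidates[name]["f_2q"], candidates[name]["f_meas"], n_2q, n_meas
        ),
    )


candidates = {
    # Fidelities from the Helios descriptor in section 8.
    "quantinuum_helios": {"f_2q": 0.99921, "f_meas": 0.999},
    # Hypothetical device, invented for comparison only.
    "example_qpu_b": {"f_2q": 0.995, "f_meas": 0.99},
}

best = pick_best_qpu(candidates, n_2q=100, n_meas=10)
print(best)  # quantinuum_helios
```

In a real scheduler the fidelities would come from the pinned calibration snapshot, and the score would be traded off against queue wait and utilization.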

Phase 2 (~2026–2028): dynamic-circuit OS. Add real-time classical feed-forward inside the coherence window.

  • Implement mid-circuit measurement orchestration, conditional execution, and the feed-forward map contract between Plane B and Plane C.
  • Build the first Plane-C primitives on FPGA (QICK/QubiC-class hardware): readout processing, syndrome streaming, and a fast lookup-table/Union-Find decoder for small codes. Target sub-µs per-round decode in the common case.
  • Validate against reported numbers: aim for the ~10–20 µs reaction-time regime that Riverlane's reported 16.32 µs figure suggests is achievable, with Google's demonstrated 63 µs as the peer-reviewed reference point.
  • Benchmark to advance: sustained real-time decoding over ≥10⁶ rounds without backlog at distance-5, with feed-forward latency under the device's coherence-limited deadline.
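The lookup-table decoder mentioned above is easiest to see on the smallest useful example, the 3-qubit bit-flip repetition code. The sketch below is a classical toy in Python that operates on bit values directly. On real hardware the two syndrome bits would come from ancilla measurements, and the table would typically live in FPGA memory so each lookup takes constant time. All names are illustrative.

```python
# Syndrome (s1, s2) = (q0 XOR q1, q1 XOR q2) -> index of the qubit to flip.
LOOKUP = {
    (0, 0): None,  # no error detected
    (1, 0): 0,     # only the first parity check fires: flip qubit 0
    (1, 1): 1,     # both checks fire: flip the shared middle qubit
    (0, 1): 2,     # only the second parity check fires: flip qubit 2
}


def syndrome(bits: list) -> tuple:
    """Parity checks of neighbouring bits."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


def decode(bits: list) -> list:
    """Apply the single-bit correction the table prescribes."""
    flip = LOOKUP[syndrome(bits)]
    corrected = list(bits)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected


print(decode([0, 1, 0]))  # [0, 0, 0]
print(decode([1, 1, 0]))  # [1, 1, 1]
```

The table grows exponentially with the number of syndrome bits, which is why the text limits this approach to small codes and pairs it with Union-Find for larger ones.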

Phase 3 (~2028–2030+): FTQC microkernel. Track IBM Starling (2029, 200 logical qubits), Quantinuum Apollo, and QuEra/Pasqal's logical-qubit roadmaps.

  • Build the full 6-layer Plane-C stack: windowed/streaming decoder (UF inner + MWPM/neural outer), Pauli-frame manager, logical/lattice-surgery scheduler, and a dynamic magic-state-factory pipeline with explicit logical-qubit accounting for factory patches.
  • Move to ASIC or cryo-CMOS decoders/control as qubit counts and wiring demand (track Horse Ridge / Pando Tree / IBM's real-time qLDPC decoder).
  • Use RDMA/NVQLink-class interconnects (400 Gb/s, <4 µs) between the real-time host and the control system.
  • Implement Plane C in no-heap Rust / HDL with statically bounded worst-case execution time; treat it as safety-critical software.
  • Benchmark to advance: execute a non-Clifford logical circuit (real T-gate injections via distilled magic states) with the decoder keeping pace at the logical clock — i.e., f = r_gen/r_proc < 1 sustained.
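The condition f = r_gen/r_proc < 1 can be illustrated with a deliberately simple deterministic queue model in Python. Both functions and the timing values are assumptions chosen for illustration. The model treats syndrome generation and decoding as constant-rate processes and ignores jitter, windowing, and parallel decoders.

```python
def backlog_ratio(t_gen_us: float, t_proc_us: float) -> float:
    """f = r_gen / r_proc, with rates r = 1 / t, so f = t_proc / t_gen."""
    return t_proc_us / t_gen_us


def backlog_after(rounds: int, t_gen_us: float, t_proc_us: float) -> int:
    """Undecoded rounds left once `rounds` syndrome rounds have been generated."""
    elapsed_us = rounds * t_gen_us
    # The decoder cannot process more rounds than have been generated.
    processed = min(rounds, int(elapsed_us // t_proc_us))
    return rounds - processed


# Illustrative timings: a 1 µs syndrome round with a fast and a slow decoder.
print(backlog_ratio(1.0, 0.8))        # 0.8
print(backlog_after(1000, 1.0, 0.8))  # 0   (decoder keeps pace)
print(backlog_after(1000, 1.0, 2.0))  # 500 (backlog grows with every round)
```

When f exceeds 1 the backlog never drains, so every feed-forward decision waits longer than the previous one. That is why the benchmark demands f < 1 sustained rather than on average.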

Cross-cutting: keep the three planes physically and architecturally separated; never let Plane A's optimization logic or Plane B's GC-bearing runtime leak into Plane C's deadline domain. Make the HAL a capability descriptor, not a universal abstraction. Make provenance non-negotiable. Build the solver policy (classical fallback) early — the cheapest QPU-second is the one you don't spend.


Caveats

  • Hardware roadmaps are vendor projections. IBM Starling (2029, 200 logical qubits per IBM's June 2025 roadmap), Quantinuum Apollo, and QuEra/Pasqal logical-qubit targets are stated plans, not delivered systems; dates and counts may slip. The design is staged precisely so each phase delivers value independent of whether the next-phase hardware arrives on schedule.
  • Some Plane-C numbers are vendor-reported. Riverlane's 16.32 µs and <400 ns QECi figures come from Riverlane communications and tech press; the 63 µs (Google, Nature 2024) and 6.5 µs (Rigetti–Riverlane) figures are in peer-reviewed/primary sources. The "440" figure is a syndrome bit count at distance-21, not a latency — earlier informal framings sometimes conflate it with a latency budget.
  • Forward-looking capabilities flagged as such: Riverlane Deltaflow 3 "streaming logic" and large-scale NVQLink adoption are announced, not yet demonstrated at scale; cryo-CMOS decoding is early-stage.
  • The Problem-IR and Bayesian-planner layers are the most speculative parts of this design. They are valuable for scientific-campaign workloads (chemistry/materials sweeps) but should remain optional; most production users today submit circuits, and forcing intent-level abstraction on them adds latency and complexity without benefit.
  • The no-cloning constraint is permanent. No amount of OS engineering will give you swap-out of live quantum state, checkpointing, or copy-based live migration. Designs that assume classical fault-tolerance-by-duplication are fundamentally mistaken; resilience in the quantum domain comes only from QEC and from saving classical shadows (Pauli frames, parameters, records).