Ruoqi Jin

The Problem: Developers as AI Babysitters

The current AI-assisted development experience has a dirty secret: developers have become babysitters. You open a terminal, invoke Claude Code or Cursor, and then — you watch. You watch it read files. You watch it think. You approve permissions. You wait. When the session ends, everything it learned vanishes. Next time, you start over.

This is the Copilot trap. The tools are getting smarter, but the developer's role hasn't fundamentally changed — you're still the bottleneck, serializing work through a single session with no memory, no background processing, and no coordination between agents.

For non-trivial codebases — the kind with 10+ microservices, complex deployment pipelines, and years of accumulated architectural decisions — the cold-start penalty is devastating. Every session repeats the same discovery work. The agent reads the same files, re-learns the same conventions, and has zero awareness of what happened in previous conversations. Meanwhile, the expensive compute sitting behind these tools is serialized into a single thread of work, while you sit there watching.

Real autopilot doesn't require a human watching the screen. That's the gap MissionD fills.

The Insight: The Terminal Is the Universal API

Every AI coding tool — Claude Code, Gemini CLI, Codex — exposes the same interface: a terminal. Not an API, not a structured protocol, but a raw PTY stream of ANSI escape codes, Unicode characters, and implicit state transitions.

This is simultaneously the problem and the opportunity. The terminal is the lowest common denominator. If you can understand what's happening in a terminal — detect when the agent is thinking, when it's asking for permission, when it's idle — you can orchestrate any AI coding tool without modifying it. No custom APIs, no vendor lock-in, no integration overhead.

MissionD is built on this insight. It manages AI coding agents the way an operating system manages processes: through a semantic terminal layer that transforms raw PTY byte streams into structured state machines, plus a persistent daemon that outlives individual sessions and accumulates knowledge over time.

Architecture: 12 Pillars, 10 Crates

MissionD is 111,000 lines of Rust across 327 source files, organized into 10 crates with strict dependency boundaries:

Crate	Responsibility
`missiond-shared`	CliEngine enum + default paths — zero-dep shared primitives
`missiond-semantic`	Semantic terminal parser: fingerprints, state machine, pattern matching
`missiond-pty`	PTY session management: spawn, read, screenshot, anomaly detection
`missiond-core`	Types, DB traits, IPC, embedding, context — depends on pty + semantic + shared
`missiond-daemon`	Business logic: handlers, engines, workers, LLM gateways
`missiond-mcp`	MCP JSON-RPC stdio server — tool schema + dispatch
`missiond-attach`	CLI utility for attaching to running PTY sessions
`missiond-runner`	Claude CLI process wrapper — spawn + lifecycle management
`semantic-terminal-napi`	Node.js N-API bindings for the semantic parser
`skill-store`	Standalone AI skill marketplace microservice

The system is organized into 12 architectural pillars: core business tables, observability, pipeline & code intelligence, agents & skills, state machines, event bus & workers, engines, semantic parser, LLM gateways & context pipeline, MCP dispatch, transport & bootstrap, and standalone services.

Semantic Terminal Parser

The semantic terminal parser is the foundation that makes everything else possible. It transforms raw PTY output into a finite state machine with 8 states:

Starting → Idle → Thinking → Responding → ToolRunning → Confirming
                ↑         ↗         ↗              ↗
              Error   SlashMenu  (cycles back to Idle)

State detection uses a multi-layer pipeline: pattern config (YAML-defined regex sets per CLI engine) → fingerprint registry (structural hashing of screen regions) → state parser (ordered rule evaluation) → confirm parser (permission dialog detection) → tool output parser (tool invocation tracking).
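
The ordered rule evaluation step can be sketched as follows. This is a minimal illustration, not MissionD's parser: the `Rule` type, the `RULES` table, and the substrings in it are hypothetical stand-ins for the YAML-defined regex sets, and plain substring matching replaces regexes to keep the example dependency-free.

```rust
/// The eight PTY session states named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PtyState {
    Starting,
    Idle,
    Thinking,
    Responding,
    ToolRunning,
    Confirming,
    Error,
    SlashMenu,
}

/// One detection rule: if `needle` appears on screen, report `state`.
/// (Hypothetical shape; real pattern configs are regex sets per CLI engine.)
pub struct Rule {
    pub needle: &'static str,
    pub state: PtyState,
}

/// Hypothetical rules, ordered from most specific to least specific.
pub const RULES: &[Rule] = &[
    Rule { needle: "Do you want to proceed?", state: PtyState::Confirming },
    Rule { needle: "Running tool", state: PtyState::ToolRunning },
    Rule { needle: "Thinking", state: PtyState::Thinking },
    Rule { needle: "> ", state: PtyState::Idle },
];

/// Ordered evaluation: the first matching rule wins.
/// `None` means no rule matched, so the caller keeps the previous state.
pub fn detect_state(screen: &str, rules: &[Rule]) -> Option<PtyState> {
    rules
        .iter()
        .find(|rule| screen.contains(rule.needle))
        .map(|rule| rule.state)
}
```

Ordering matters here: a screen that shows both a spinner and a permission dialog resolves to `Confirming` because that rule is listed first.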

The parser supports multiple CLI engines: Claude Code and Gemini CLI each have their own detection logic and YAML pattern files, but share the same state machine abstraction. Adding support for a new AI coding tool (Codex, Cursor, etc.) is therefore confined to the parser layer, a new pattern config plus any engine-specific detection rules, and leaves the orchestration logic untouched.

A key technical challenge: terminal output is not a clean text stream. It contains ANSI escape sequences for cursor movement, color, and screen clearing. The parser must handle partial writes, screen reflows, and race conditions between PTY output and state transitions. Fingerprint-based detection (structural hashing of screen regions rather than exact string matching) provides robustness against rendering variations.
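
One way to read "structural hashing of screen regions" is to hash the shape of a region instead of its exact characters. The sketch below is an assumption about what such a fingerprint could look like, not the algorithm MissionD uses: `class_of`, `structure_of`, and `fingerprint` are hypothetical names, and the character classes are chosen only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a character to a coarse class so that wording changes do not
/// change the fingerprint, while layout and punctuation still do.
fn class_of(c: char) -> char {
    if c.is_alphabetic() {
        'a'
    } else if c.is_ascii_digit() {
        '0'
    } else if c.is_whitespace() {
        ' '
    } else {
        c // punctuation and box-drawing characters are kept as-is
    }
}

/// Reduce a screen region to its shape: classify every character and
/// collapse runs of the same class, line by line.
pub fn structure_of(region: &str) -> String {
    let mut out = String::new();
    for line in region.lines() {
        let mut prev: Option<char> = None;
        for c in line.chars().map(class_of) {
            if prev != Some(c) {
                out.push(c);
            }
            prev = Some(c);
        }
        out.push('\n');
    }
    out
}

/// Hash the shape. DefaultHasher is stable within one build, but its
/// algorithm is not guaranteed across Rust releases, so a fingerprint
/// that is persisted to disk would need a fixed hash function instead.
pub fn fingerprint(region: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    structure_of(region).hash(&mut hasher);
    hasher.finish()
}
```

Under this scheme two permission dialogs that differ only in the command being approved produce the same fingerprint, while a dialog and a spinner line do not.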

Slot-Based Compute Management

MissionD manages AI coding agents through a slot abstraction: 1 foreground slot (the user's active Claude Code session) + N background slots (daemon-managed processes). Each slot wraps a PTY session and tracks its semantic state, conversation ID, and assigned task.

The SlotManager is the single authority for slot lifecycle. It handles spawning, reclaiming, and health monitoring. Background slots can be dynamically allocated — the system maintains a pool of slots with configurable limits and automatic expiration.

On top of slots, a Board Task system provides DAG-based task management. Tasks have status (open → running → done/failed/blocked), priority levels, engineering phases (investigate → consult → plan → execute → finalize), dependency tracking, and lease-based claiming. An autopilot engine periodically ticks through the task board, dispatching eligible tasks to available slots.
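
A single autopilot tick over the task board might look like the sketch below. It is a simplified illustration under stated assumptions: the `BoardTask` fields, the "higher number means more urgent" priority convention, and the `tick` function are hypothetical, and lease-based claiming and engineering phases are left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Open,
    Running,
    Done,
    Failed,
    Blocked,
}

#[derive(Debug, Clone)]
pub struct BoardTask {
    pub id: u32,
    pub status: Status,
    pub priority: u8, // assumption: higher value means more urgent
    pub depends_on: Vec<u32>,
}

/// A task is eligible when it is open and every dependency is done.
pub fn eligible(task: &BoardTask, by_id: &HashMap<u32, BoardTask>) -> bool {
    task.status == Status::Open
        && task
            .depends_on
            .iter()
            .all(|dep| by_id.get(dep).map_or(false, |d| d.status == Status::Done))
}

/// One tick: claim up to `free_slots` eligible tasks, highest priority
/// first, mark them Running, and return their ids in dispatch order.
pub fn tick(tasks: &mut HashMap<u32, BoardTask>, free_slots: usize) -> Vec<u32> {
    let ready = {
        let view: &HashMap<u32, BoardTask> = tasks;
        let mut ready: Vec<(u8, u32)> = view
            .values()
            .filter(|t| eligible(t, view))
            .map(|t| (t.priority, t.id))
            .collect();
        // Priority descending, then id ascending for a deterministic order.
        ready.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
        ready
    };
    let claimed: Vec<u32> = ready.into_iter().take(free_slots).map(|(_, id)| id).collect();
    for id in &claimed {
        if let Some(task) = tasks.get_mut(id) {
            task.status = Status::Running;
        }
    }
    claimed
}
```

A task whose dependency is still running stays on the board untouched, which is what makes the DAG ordering fall out of repeated ticks.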

Knowledge Base: Six-Stage Hybrid Retrieval Pipeline

Session amnesia is the single biggest productivity drain with AI coding tools. MissionD solves this with a persistent knowledge base (1,400+ entries across architecture memories, debug patterns, policy decisions, and operational procedures), stored in either PostgreSQL or SQLite behind a MissionDB trait. The hard problem: how to retrieve the 10 most relevant entries in milliseconds?

Embedding Model: Qwen qwen3-embedding

Vectorization uses Alibaba's qwen3-embedding served by a local Ollama instance, with automatic dimension detection on startup. There is no fallback and no low-quality degradation: if Ollama is unavailable, the EmbeddingWorker stops immediately, failing fast and surfacing the problem. Quality standards for worker output are strict: a low-tier model is never allowed to pollute MissionD's knowledge memory. All KB entries, conversation summaries, and AST node embeddings are generated asynchronously by the EmbeddingWorker and cached in memory (kb_search_cache) for zero-I/O search.

Six-Stage Retrieval Pipeline

Every KB search passes through six stages with explicit mathematical parameters:

[Stage 1] Dual-Path Recall
  FTS5 full-text → up to 100 candidates (auto-fallback to LIKE for Chinese)
  Vector cosine similarity → fetch_k = max(limit×3, 60) candidates
  Similarity floor: cos_sim < 0.3 discarded (filter semantic noise)

[Stage 2] RRF Rank Fusion
  score = 0.4/(60+rank_fts+1) + 0.6/(60+rank_vec+1)
  Vector-dominant design: 60% embedding weight, 40% FTS weight
  Merges ranks (not raw scores) — naturally scale-invariant

[Stage 3] Temporal Decay
  decay = exp(-ln2 / half_life × age_days)
  Category-specific half-lives:
    debug memories → 14 days (fast decay)
    ops → 21 days | bugfix → 30 days | feature → 90 days
    architecture/policy/preference → never decay (evergreen)

[Stage 4] Drop-off Filter
  Discard entries with RRF score < 50% of top score
  Eliminates single-path weak signal noise

[Stage 5] MMR Diversity Re-ranking (explore mode)
  mmr = 0.7×relevance - 0.3×max(cosine_sim to already selected)
  Greedy selection ensures top-k covers distinct semantic regions

[Stage 6] Paginated Output
  Default limit=10, max 50
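
Stages 2 through 4 reduce to a few lines of arithmetic. The sketch below uses the constants listed above; the function names are hypothetical, and three details are assumptions: ranks are treated as 0-based, a candidate missing from one recall path contributes nothing for that path, and unknown categories are treated as evergreen.

```rust
/// Stage 2: weighted reciprocal rank fusion.
/// `None` means the candidate was not returned by that recall path.
pub fn rrf_score(rank_fts: Option<usize>, rank_vec: Option<usize>) -> f64 {
    const K: f64 = 60.0;
    let term = |weight: f64, rank: Option<usize>| {
        rank.map_or(0.0, |r| weight / (K + r as f64 + 1.0))
    };
    term(0.4, rank_fts) + term(0.6, rank_vec)
}

/// Category-specific half-lives in days; `None` means evergreen.
pub fn half_life_for(category: &str) -> Option<f64> {
    match category {
        "debug" => Some(14.0),
        "ops" => Some(21.0),
        "bugfix" => Some(30.0),
        "feature" => Some(90.0),
        "architecture" | "policy" | "preference" => None,
        _ => None, // assumption: unknown categories do not decay
    }
}

/// Stage 3: decay = exp(-ln2 / half_life * age_days).
pub fn decay(age_days: f64, half_life_days: Option<f64>) -> f64 {
    match half_life_days {
        Some(h) => (-std::f64::consts::LN_2 / h * age_days).exp(),
        None => 1.0,
    }
}

/// Stage 4: keep only entries scoring at least 50% of the top score.
/// Returns the survivors sorted by score, best first.
pub fn drop_off(mut scored: Vec<(u32, f64)>) -> Vec<(u32, f64)> {
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    let top = scored.first().map_or(0.0, |s| s.1);
    scored.retain(|s| s.1 >= 0.5 * top);
    scored
}
```

A 14-day-old debug memory is weighted at exactly half its fused score, while an architecture entry of any age keeps its full weight.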

Why This Design?

Pure FTS misses semantically related knowledge with different wording (“deploy failed” vs “deployment error”). Pure vector search loses exact keyword matches (error codes, function names). RRF fuses ranks rather than raw scores, naturally handling scale differences between the two systems. Temporal decay auto-retires stale debug memories while keeping architectural decisions permanently accessible. MMR prevents the top-10 from being dominated by 10 variants of the same topic.
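
The greedy MMR step is easiest to see on a tiny example. The sketch below applies the 0.7/0.3 formula from Stage 5; `Candidate` and `mmr_select` are hypothetical names, relevance is assumed to be normalized to a comparable scale, and the similarity penalty is floored at zero when nothing has been selected yet.

```rust
/// Cosine similarity between two vectors of equal length.
pub fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

pub struct Candidate {
    pub relevance: f64,
    pub embedding: Vec<f64>,
}

/// Greedy MMR: repeatedly pick the candidate maximizing
/// 0.7 * relevance - 0.3 * max(cosine similarity to already selected).
/// Returns indices into `cands` in selection order.
pub fn mmr_select(cands: &[Candidate], k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..cands.len()).collect();
    while selected.len() < k && !remaining.is_empty() {
        let score = |i: usize| {
            let max_sim = selected
                .iter()
                .map(|&s| cosine(&cands[i].embedding, &cands[s].embedding))
                .fold(0.0_f64, f64::max);
            0.7 * cands[i].relevance - 0.3 * max_sim
        };
        let (pos, _) = remaining
            .iter()
            .enumerate()
            .map(|(pos, &i)| (pos, score(i)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .expect("remaining is non-empty");
        selected.push(remaining.remove(pos));
    }
    selected
}
```

With two near-duplicate candidates and one distinct but less relevant candidate, the second pick goes to the distinct one: the duplicate's score drops from 0.665 to 0.365 once its twin is selected, below the distinct candidate's 0.42.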

Conflict Detection & Utility Scoring

On every KB write, the system detects existing entries with cosine similarity > 0.82, marks contradicts edges, and halves the new entry's confidence. This prevents outdated and current answers from coexisting.

Each entry also carries a utility score (0–1): every search hit adds 0.15 × (1 - current_score), asymptotically approaching 1.0 with regular access. Low-utility entries are pruned first during garbage collection — Darwinian knowledge evolution.
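
Both rules are small enough to write out. In this sketch the function names are hypothetical, and one detail is an assumption: confidence is halved once per write, however many existing entries exceed the 0.82 threshold.

```rust
/// One search hit: move the utility score 15% of the way toward 1.0.
pub fn bump_utility(score: f64) -> f64 {
    score + 0.15 * (1.0 - score)
}

/// Closed form of the same update after `hits` hits, starting from 0:
/// utility = 1 - 0.85^hits, which approaches 1.0 but never reaches it.
pub fn utility_after(hits: u32) -> f64 {
    1.0 - 0.85_f64.powi(hits as i32)
}

/// Conflict detection on write. `similarities[i]` is the cosine
/// similarity between the new entry and existing entry `i`.
/// Returns the adjusted confidence of the new entry and the indices of
/// the entries that should receive a `contradicts` edge.
pub fn confidence_on_write(confidence: f64, similarities: &[f64]) -> (f64, Vec<usize>) {
    let conflicts: Vec<usize> = similarities
        .iter()
        .enumerate()
        .filter(|(_, sim)| **sim > 0.82)
        .map(|(i, _)| i)
        .collect();
    let adjusted = if conflicts.is_empty() {
        confidence
    } else {
        confidence / 2.0 // assumption: halved once, not once per conflict
    };
    (adjusted, conflicts)
}
```

The closed form makes the growth rate concrete: ten hits bring an entry to roughly 0.80, and twenty to roughly 0.96.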

Additional Knowledge Structures

AST nodes — tree-sitter synced function/struct/enum definitions with vector embeddings for semantic code search
Beacons — named code landmarks enabling “show me the auth middleware” queries
Knowledge edges — typed relationships (prerequisite, supersedes, contradicts)
FTS snippets — auto-generated context fragments (highlighted matches, max 40 tokens) with category-based detail truncation (architecture modules: strip entirely; policy: 2000 chars; default: 800 chars)

When a new session starts, the Context Pipeline assembles a budget-constrained prompt from these sources in priority order: slot environment → skill context → KB entries → conversation history → topology map → CLAUDE.md. The budget allocator ensures the assembled context fits within token limits while maximizing relevance.
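
A priority-ordered budget allocator can be as simple as a greedy fill. This is a sketch of the general idea only: `Source`, `allocate`, and the token counts are hypothetical, and truncating the first source that does not fit (instead of skipping it) is an assumption.

```rust
/// A context source with the number of tokens it would like to contribute.
pub struct Source {
    pub name: &'static str,
    pub tokens: usize,
}

/// Walk sources in priority order and grant each as many tokens as still
/// fit. The first source that overflows is truncated to the remainder;
/// everything after it is dropped.
pub fn allocate(sources: &[Source], budget: usize) -> Vec<(&'static str, usize)> {
    let mut left = budget;
    let mut plan = Vec::new();
    for source in sources {
        if left == 0 {
            break;
        }
        let granted = source.tokens.min(left);
        if granted > 0 {
            plan.push((source.name, granted));
            left -= granted;
        }
    }
    plan
}
```

With a 1,000-token budget and sources wanting 200, 300, 1,000, and 2,000 tokens, the first two are included whole, the third is cut to 500, and the fourth is omitted.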

18 Background Workers

The daemon runs 18 background workers organized by their LLM dependency — a critical design decision for cost control and operational safety:

Tier	Workers	Trigger
Sonnet (5)	Embedding, Translation, Briefing, Architecture Maintenance, Retrospective	Channel / Interval / On-demand
Codex (2)	Step Narrator, Vision Worker	Event-driven (MessagePersisted)
Gemini (1)	Strategy Worker	Interval (300s, flag-gated)
Local (10)	Conversation Logger, Organizer, PTY Event, Tagger/Chunker, Experience Harvester, Reconcile (x2), AST Sync, Code Prefetch, Gemini Logger	Event-driven / Interval / Channel

All workers implement a unified BackgroundWorker trait with a KIND constant that declares their LLM dependency. This enables the ControlTree — a hierarchical pause/resume system with cascade priority:

Worker-level override — force-pause or force-resume individual workers (debug mode)
Global kill switch — pause all workers at once
Provider/domain cascade — disable all Sonnet workers by toggling one flag, or pause all knowledge-domain workers during maintenance

The ControlTree persists to disk and recovers on crash, ensuring operational state survives daemon restarts. Worker status is broadcast via a tokio watch channel for real-time observability.
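
The cascade can be modeled as a lookup that checks the most specific setting first. The sketch below assumes the three levels are listed above in priority order (worker override, then global kill switch, then provider flag); the field names and worker names are hypothetical, and domain-level flags and disk persistence are omitted.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Provider {
    Sonnet,
    Codex,
    Gemini,
    Local,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Override {
    ForcePause,
    ForceResume,
}

#[derive(Default)]
pub struct ControlTree {
    pub worker_overrides: HashMap<&'static str, Override>,
    pub global_pause: bool,
    pub paused_providers: HashSet<Provider>,
}

impl ControlTree {
    /// Resolve the effective state of one worker.
    pub fn is_paused(&self, worker: &str, provider: Provider) -> bool {
        // 1. A worker-level override beats everything else.
        if let Some(o) = self.worker_overrides.get(worker) {
            return *o == Override::ForcePause;
        }
        // 2. The global kill switch pauses every remaining worker.
        if self.global_pause {
            return true;
        }
        // 3. Otherwise the provider flag decides.
        self.paused_providers.contains(&provider)
    }
}
```

This ordering is what lets a single worker be force-resumed for debugging while the rest of the daemon stays paused.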

Event Bus: Causal Timeline Architecture

The event bus is not a simple pub-sub channel — it is a Cognitive Timeline that guarantees persistent storage, global monotonic sequencing, and causal ordering. Every event flows through a single path:

Producer → MPSC (unbounded) → Timeline Writer → DB (seq assigned) → broadcast<TimelineEvent>

The MPSC channel is deliberately unbounded: events are never dropped, preserving causal chain integrity. In practice the queue stays shallow because SQLite write throughput (>10K TPS in WAL mode) far exceeds the peak event production rate (~50/sec).

40+ Event Variants across 8 Categories

The DaemonEvent enum defines 40+ typed variants organized into 8 categories:

PTY events — PtyStateChanged, PtyOutput, PtyScreenshot
Message events — ConversationMessageLogged, ImageMessageInserted
Task events — TaskCreated, TaskCompleted, SlotTaskDispatched
Board events — BoardTaskCreated, BoardTaskUpdated, BoardTaskClaimed
Slot events — SlotBecameIdle, SlotStateChanged, SessionCompleted
Knowledge events — KBBatchMutated, DeepAnalysisCompleted
CLI engine events — CliRequestStarted, CliRequestCompleted, CliToolActivity
Cognitive pipeline events — SessionOrganized, TurnExtracted, IntentAnalyzed

Events are split into persistent and ephemeral at the Timeline Writer. Persistent events (slot state changes, session completions, KB mutations) are written to the system_timeline table with a monotonic sequence number, then broadcast. Ephemeral events (internal worker telemetry, batch progress) are broadcast to WebSocket clients and internal consumers but skip the database — preventing timeline table inflation from high-frequency worker chatter.
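
The single write path and the persistent/ephemeral split can be sketched with the standard library alone. This is an illustration under assumptions, not the daemon's code: a `Vec` stands in for the system_timeline table, a list of `Sender`s stands in for the tokio broadcast channel, `BatchProgress` is a hypothetical ephemeral variant, and ephemeral events are modeled as carrying no sequence number.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
pub enum DaemonEvent {
    SlotBecameIdle { slot: u32 },  // persistent
    BatchProgress { done: u32 },   // ephemeral (hypothetical variant)
}

impl DaemonEvent {
    fn is_persistent(&self) -> bool {
        matches!(self, DaemonEvent::SlotBecameIdle { .. })
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct TimelineEvent {
    pub seq: Option<u64>, // None for ephemeral events (assumption)
    pub event: DaemonEvent,
}

/// Drain the producer channel until every sender is dropped. Persistent
/// events get the next monotonic sequence number and are stored before
/// being broadcast; ephemeral events are broadcast only.
pub fn run_timeline_writer(
    rx: Receiver<DaemonEvent>,
    db: &mut Vec<TimelineEvent>,
    subscribers: &[Sender<TimelineEvent>],
) {
    let mut next_seq = 1u64;
    for event in rx {
        let seq = if event.is_persistent() {
            let assigned = next_seq;
            next_seq += 1;
            Some(assigned)
        } else {
            None
        };
        let out = TimelineEvent { seq, event };
        if out.seq.is_some() {
            db.push(out.clone());
        }
        for subscriber in subscribers {
            // A disconnected subscriber must not stall the writer.
            let _ = subscriber.send(out.clone());
        }
    }
}
```

Because one writer owns the counter, sequence numbers are gap-free and strictly increasing regardless of how many producers feed the channel.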

Causal Chain Tracking

Every TimelineEvent carries three fields for distributed tracing: trace_id (root ID spanning an entire causal chain, typically a conversation session ID), span_id (this event's unique ID), and parent_span_id (linking child events back to their cause). When a SlotBecameIdle event triggers memory extraction, which produces a KB entry, which in turn triggers embedding, every event in the chain shares the same trace_id, so causality can be reconstructed from the timeline.
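
Reconstructing a chain from these fields is a walk along parent links. In the sketch below the struct is a reduced, hypothetical version of TimelineEvent: span IDs are modeled as `u64`, and the `name` field is only there to make the output readable.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct TimelineEvent {
    pub trace_id: String,
    pub span_id: u64,
    pub parent_span_id: Option<u64>,
    pub name: &'static str,
}

/// Follow parent_span_id links from `leaf` back to the root and return
/// the event names root-first. Unknown spans yield an empty chain.
pub fn causal_chain(events: &[TimelineEvent], leaf: u64) -> Vec<&'static str> {
    let by_span: HashMap<u64, &TimelineEvent> =
        events.iter().map(|e| (e.span_id, e)).collect();
    let mut chain = Vec::new();
    let mut cursor = by_span.get(&leaf).copied();
    while let Some(event) = cursor {
        chain.push(event.name);
        if chain.len() > events.len() {
            break; // defensive guard against a cycle in malformed data
        }
        cursor = event
            .parent_span_id
            .and_then(|parent| by_span.get(&parent).copied());
    }
    chain.reverse();
    chain
}
```

In a real query the events would first be narrowed to one trace_id, which keeps the lookup table small.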

9 Event-Driven Consumers with Trailing-Edge Debounce

Rather than polling on intervals, MissionD uses event-driven consumers with trailing-edge debounce. Each consumer subscribes to the broadcast channel, filters for relevant event variants, and fires its handler only after a quiet window:

Extraction consumer — SlotBecameIdle → 500ms debounce → schedule_memory_tasks
Submit dispatcher — TaskCreated/TaskCompleted → 100ms debounce → dispatch_queued_submit_tasks
Decision consumer — QuestionCreated → 100ms debounce → process_pending_master_questions
Harvest consumer — NarrationSessionCompleted → immediate (no debounce, already infrequent)
Realtime extraction — ConversationMessageLogged → 3s debounce → check_realtime_extraction
Session reflection — SessionCompleted → 5s debounce → Strategy Worker + Retro Worker + deep analysis
KB consolidation — DeepAnalysisCompleted → counter accumulation (threshold: 5) → check_kb_consolidation
Intent analyst — TurnExtracted → 5min debounce OR 5 accumulated turns → Sonnet LLM intent detection
Sweeper — 30min periodic + startup scan → reconciliation across all pipelines

Why trailing-edge debounce instead of polling? During a burst of activity (e.g. a background slot executing a complex task), a slot might transition to Idle and back several times in rapid succession. Polling would either miss intermediate states or waste compute re-processing unchanged data. Trailing-edge debounce absorbs the burst and fires exactly once after the storm settles — the right moment to consolidate knowledge.
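
The behavior is easiest to check on a timeline of timestamps. The function below is a pure simulation written for illustration, not the async consumer itself; `trailing_edge_fires` is a hypothetical name, and it assumes the timestamps are sorted in ascending order.

```rust
/// Given sorted event timestamps (ms) and a quiet window (ms), return
/// the times at which a trailing-edge debounce would fire its handler.
pub fn trailing_edge_fires(event_times_ms: &[u64], window_ms: u64) -> Vec<u64> {
    let mut fires = Vec::new();
    let mut deadline: Option<u64> = None;
    for &t in event_times_ms {
        if let Some(d) = deadline {
            if t >= d {
                // The quiet window elapsed before this event arrived.
                fires.push(d);
            }
        }
        // Every event re-arms the timer.
        deadline = Some(t + window_ms);
    }
    if let Some(d) = deadline {
        fires.push(d); // the final burst fires once the stream goes quiet
    }
    fires
}
```

With a 500ms window, events at 0, 100, and 200ms produce a single firing at 700ms, and a later event at 2000ms produces a second firing at 2500ms: four events, two handler runs.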

Each consumer also handles broadcast::Lagged gracefully. When a consumer falls behind and the broadcast channel drops messages, it does a defensive re-process rather than silently losing data, and it retries with exponential backoff (100ms → 200ms → … → 2000ms cap) and ±25% jitter to prevent a thundering herd after a lag event.
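
The backoff schedule can be sketched as two small functions. The names are hypothetical, and two details are assumptions: the delay doubles on each attempt, and jitter is applied after the cap. The random source is passed in as a value in [-1, 1] so the example stays deterministic.

```rust
/// Exponential backoff: 100ms doubled per attempt, capped at 2000ms.
pub fn backoff_ms(attempt: u32) -> u64 {
    const BASE_MS: u64 = 100;
    const CAP_MS: u64 = 2000;
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    BASE_MS.saturating_mul(factor).min(CAP_MS)
}

/// Apply up to plus or minus 25% jitter. `unit` is a caller-supplied
/// random value in [-1, 1]; values outside that range are clamped.
pub fn with_jitter(delay_ms: u64, unit: f64) -> u64 {
    let factor = 1.0 + 0.25 * unit.clamp(-1.0, 1.0);
    (delay_ms as f64 * factor).round() as u64
}
```

The sixth attempt would be 3200ms uncapped, so it and every later attempt wait 2000ms before jitter.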

Cognitive Pipeline: Causal Event Chains

The most sophisticated use of the event bus is the Cognitive Pipeline — a multi-stage processing chain where each stage's output event triggers the next:

S2 Organizer: Listens for ConversationMessageLogged → repairs compaction fragment links and orphan parent references → emits SessionOrganized
S3 Tagger & Chunker: Listens for SessionOrganized → extracts structured Turns from flat message streams (pure rules, zero LLM calls), applies noise labels to overlong/binary tool results → emits TurnExtracted + sends EmbeddingTask::ProcessTurns to embedding channel
S4 Embedder: Receives ProcessTurns via MPSC → generates per-turn embeddings using qwen3-embedding via Ollama
S6 Intent Analyst: Listens for TurnExtracted → accumulates turns (5-min debounce OR 5 turns threshold) → Sonnet LLM analysis detects stuck retries, architecture exploration, refactoring shifts, scope creep → emits IntentAnalyzed

A single user message cascades through the pipeline: it arrives as a ConversationMessageLogged event, gets organized, chunked into turns, embedded for vector search, and analyzed for user intent — all through event-driven chaining with no polling, no cron jobs, no manual orchestration. Each stage operates independently with its own debounce window and backfill logic, and the sweeper provides a safety net for any events lost to broadcast lag or daemon restart.

71 MCP Tools across 4 Domains

The MCP server exposes 71 tools via JSON-RPC over stdio, organized into four domains:

Domain	Tools	Examples
Compute	18	PTY spawn/send/read/screenshot, task submit/query/cancel/delegate, slot management, worker control
Knowledge	26	KB query/mutate/remember, board CRUD + decompose, skill query/exec, code search, embedding ops, cascade planning
Communication	13	Conversation analysis, router chat, timeline, audit trail, retrospectives, LLM traces, beacons
System	14	Daemon control, config, logs, infrastructure ops, permissions, power control, inbox, incidents

The MCP architecture means the foreground Claude Code session (the “commander”) can control every aspect of the daemon through natural language: spawn background agents, query the knowledge base, check task status, pause workers, analyze past conversations. The tools are the vocabulary through which AI agents interact with persistent infrastructure.

Engines: Composite Orchestration

Three engine subsystems provide higher-order coordination:

Autopilot Engine — A tick-based pipeline that runs: memory scheduling → extraction check → board task dispatch → flow progression → supervision check. It automatically claims open tasks, assigns them to available slots, monitors progress, and handles failures.
Learning Engine — Contains the decision cascade (KB lookup → Gemini consult → decision slot → human escalation), experience extraction, intent analysis, timeline analysis, idle exploration, and historical scanning. When an agent encounters a decision point, the learning engine routes it through progressively more expensive resolution tiers.
Slot Orchestrator — Adapters for different AI tools. The CC Controller manages Claude Code instances; the Gemini Controller manages Gemini CLI instances. Each adapter translates between the slot abstraction and the specific CLI's interaction patterns.

Decision Cascade: Graceful Degradation

When an AI agent encounters a decision point — an ambiguous requirement, an unfamiliar code pattern, a permission question — most systems either hallucinate an answer or immediately escalate to the human. MissionD implements a four-tier decision cascade that routes questions through progressively more expensive resolution channels:

[Tier 1] KB Lookup — search the knowledge base for prior decisions
    ↓ not found
[Tier 2] Gemini Consult — ask Gemini for strategic guidance
    ↓ low confidence
[Tier 3] Decision Slot — spawn a dedicated Claude Code session to research
    ↓ still unresolved
[Tier 4] Human Escalation — surface the question to the developer

Each tier has a cost and a confidence threshold. Most questions resolve at Tier 1 (the KB already has the answer from a previous session). Tier 2 handles novel architectural questions cheaply. Tier 3 is reserved for complex decisions that require reading code. Tier 4 — human escalation — is the last resort, not the first. The result: the developer is interrupted only when genuinely needed, not on every minor decision.
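
The cascade is a loop over tiers that stops at the first sufficiently confident answer. The sketch below is illustrative only: the resolver functions are hard-coded stand-ins for the real KB search, Gemini call, and decision slot, and the confidence thresholds are invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    KbLookup,
    GeminiConsult,
    DecisionSlot,
    HumanEscalation,
}

pub struct Answer {
    pub text: String,
    pub confidence: f64,
}

pub struct TierSpec {
    pub tier: Tier,
    pub min_confidence: f64, // hypothetical threshold
    pub resolver: fn(&str) -> Option<Answer>,
}

// Stand-in resolvers; real ones would query the KB, call Gemini, or
// spawn a research session.
fn kb_lookup(q: &str) -> Option<Answer> {
    if q.contains("retry policy") {
        Some(Answer { text: "exponential backoff".into(), confidence: 0.95 })
    } else {
        None
    }
}
fn gemini_consult(_q: &str) -> Option<Answer> {
    Some(Answer { text: "not sure".into(), confidence: 0.4 })
}
fn decision_slot(q: &str) -> Option<Answer> {
    if q.contains("schema") {
        Some(Answer { text: "add a nullable column".into(), confidence: 0.8 })
    } else {
        None
    }
}

pub fn cascade() -> Vec<TierSpec> {
    vec![
        TierSpec { tier: Tier::KbLookup, min_confidence: 0.8, resolver: kb_lookup },
        TierSpec { tier: Tier::GeminiConsult, min_confidence: 0.7, resolver: gemini_consult },
        TierSpec { tier: Tier::DecisionSlot, min_confidence: 0.6, resolver: decision_slot },
    ]
}

/// Try each tier in order; later (more expensive) tiers run only when
/// earlier ones return nothing or fall below their threshold.
pub fn resolve(question: &str, tiers: &[TierSpec]) -> (Tier, Option<Answer>) {
    for spec in tiers {
        if let Some(answer) = (spec.resolver)(question) {
            if answer.confidence >= spec.min_confidence {
                return (spec.tier, Some(answer));
            }
        }
    }
    (Tier::HumanEscalation, None)
}
```

Because resolvers are invoked lazily inside the loop, a question answered by the KB never incurs the cost of the later tiers.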

Self-Learning: The Retrospective Loop

A system that doesn't learn from its failures is just an expensive automation. MissionD's retrospective pipeline ensures the system gets smarter with every session:

Session ends — the Retro Worker automatically pulls all tool calls, error rates, and operation trajectories from the completed session
Pattern analysis — identifies repeated mistakes, high-error tools, and inefficient sequences
Sonnet summarization — distills the raw data into structured retrospective results
Knowledge upsert — extracts “lessons learned” and persists them to the knowledge base as architecture memories, debug patterns, or operational procedures

This creates a compounding flywheel: the longer MissionD runs, the more it knows about your codebase. Past mistakes become future context. A bug fixed in session #47 prevents the same mistake in session #200. The Experience Harvester worker runs every 60 seconds, continuously scanning new messages for extractable knowledge. The AST Sync worker maintains a live map of every function, struct, and enum in the codebase with vector embeddings for semantic search.

The result: when a new session starts, the Context Pipeline can assemble a prompt that includes not just the relevant code, but the accumulated wisdom of all previous sessions — why certain decisions were made, what pitfalls to avoid, what patterns work best in this specific codebase.

Database: 37 Tables across 4 Pillars

The database schema is organized into four pillars, with 31,000 lines of Rust in the DB gateway layer alone:

Core Business — board_tasks, conversations, conversation_messages, knowledge, knowledge_edges, slot_sessions, slot_tasks, tasks, daemon_state, inbox_messages, prompt_snapshots
Observability — tool_calls, conversation_events, retrospective_results, system_timeline, gemini_requests, gemini_file_cache, incidents, token_usage, narration_cursors, message_narrations
Pipeline & Code Intel — ast_nodes, beacons, beacon_nodes, backfill_phases, watermarks, labels, translations, image_descriptions
Agents & Skills — agent_questions, dynamic_slots, skills, skill_topics, skill_blocks, router_chat_sessions, router_chat_messages

All database access goes through a single gateway layer in missiond-core/src/db/. Generated CRUD operations (from Forge codegen) live in a gen/ subdirectory; hand-written complex queries live alongside them. The MissionDB trait abstracts over PostgreSQL and SQLite backends, enabling the same daemon code to run against either.

State Machines

Six finite state machines govern lifecycle transitions throughout the system:

PTY Session (8 states) — Starting → Idle → Thinking → Responding → ToolRunning → Confirming → Error, with SlashMenu as a transient state
Board Task Status (5 states) — Open → Running → Done/Failed/Blocked
Engineering Phase (5 states) — Investigate → Consult → Plan → Execute → Finalize (cyclic)
Task (4 states) — Queued → Running → Completed/Failed
Question (3 states) — Pending → Answered/Dismissed
Extraction Phase (4 states) — Idle → Sending → WaitingForIdleness → Complete
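
Each of these machines can be enforced with a small transition table. The sketch below covers only the Board Task Status transitions spelled out above (Open → Running → Done/Failed/Blocked); whether other edges exist, such as reopening a blocked task, is not stated in this article, so they are rejected here. The function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoardStatus {
    Open,
    Running,
    Done,
    Failed,
    Blocked,
}

/// True only for the transitions documented in the list above.
pub fn can_transition(from: BoardStatus, to: BoardStatus) -> bool {
    use BoardStatus::*;
    matches!(
        (from, to),
        (Open, Running) | (Running, Done) | (Running, Failed) | (Running, Blocked)
    )
}

/// Apply a transition, or report why it was rejected.
pub fn transition(from: BoardStatus, to: BoardStatus) -> Result<BoardStatus, String> {
    if can_transition(from, to) {
        Ok(to)
    } else {
        Err(format!("illegal transition {:?} -> {:?}", from, to))
    }
}
```

Centralizing the check in one function means a handler cannot move a task straight from Open to Done without passing through Running.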

Data Flows

Eight primary data flows thread through the system:

[1] User Message → Knowledge
    PTY session → semantic parser → conversation logger
    → tagger/chunker → embedding worker → knowledge store

[2] Board Task Lifecycle
    board create → autopilot tick → slot dispatch
    → CC controller → PTY session → result harvest → board done

[3] Decision Cascade
    question raised → KB lookup → Gemini consult
    → decision slot → human escalation → answer routed

[4] MCP Request
    stdio JSON-RPC → MCP server → IPC bridge
    → handler dispatch → DB query → JSON-RPC response

[5] Context Assembly
    slot activated → context pipeline → budget allocator
    → source ranker → KB/skill/history fetch → truncation → prompt

[6] Retrospective
    session end → retro worker → tool stats → pattern analysis
    → Sonnet summarize → retrospective result → knowledge upsert

[7] Embedding Pipeline
    content created → embedding worker → model inference
    → vector storage → search index ready

[8] Gemini LLM Call
    handler request → LLM gate → rate check
    → Gemini client → API call → Gemini logger → response

LLM Gateway: Multi-Provider Dispatch

The daemon integrates three LLM providers through a unified gateway: Sonnet (via API for embeddings, translations, summaries), Gemini (via API and CLI for strategic analysis), and MiniMax (legacy, being phased out). An LLM Gate provides rate limiting, 429 backoff, and quota tracking across all providers.

The Gemini integration is particularly deep: file-level caching (hash-based dedup with TTL), streaming responses, and a dedicated Gemini CLI controller that manages Gemini as a PTY process alongside Claude Code instances.

Transport & Frontends

The daemon communicates through three transport layers:

IPC — Unix socket / TCP between the MCP server and the daemon process
WebSocket — tokio-tungstenite for real-time board UI and screenshot streaming
PTY — pseudo-terminal sessions for managing AI coding tool processes

A web-based Board Frontend (React + TypeScript) provides a dashboard for monitoring slots, tasks, conversations, knowledge entries, questions, timeline events, architecture summaries, and system health. It connects to the daemon via API routes that proxy to the IPC layer.

Daemon Bootstrap

Initialization follows a strict 6-phase dependency order:

Phase 1: Infrastructure
  DB → Embedding Model → Event Bus

Phase 2: Core Modules
  → PTY Manager → Slot Manager → Mission Control

Phase 3: Gateways
  → Gemini Gateway → Sonnet Gateway → LLM Gateway

Phase 4: Pipelines
  → Context Pipeline → Worker Registry → Control Tree

Phase 5: Workers (18 spawns)
  → All background workers

Phase 6: Engines
  → Autopilot → IPC Handler → WebSocket Server

Each component declares its dependencies explicitly. The PTY Manager depends on the Event Bus. The Slot Manager depends on DB, PTY Manager, and Event Bus. The Autopilot Engine depends on DB, Slot Manager, Event Bus, LLM Gateway, and Context Pipeline. This DAG ensures initialization never races.
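
Deriving a safe start order from declared dependencies is a topological sort. The sketch below encodes only the dependencies stated in the paragraph above; components whose dependencies are not stated (LLM Gateway, Context Pipeline) are given empty lists as placeholders, and `documented_deps` and `init_order` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Dependencies as stated in the text; empty lists are placeholders.
pub fn documented_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut deps = BTreeMap::new();
    deps.insert("DB", vec![]);
    deps.insert("Event Bus", vec![]);
    deps.insert("LLM Gateway", vec![]);
    deps.insert("Context Pipeline", vec![]);
    deps.insert("PTY Manager", vec!["Event Bus"]);
    deps.insert("Slot Manager", vec!["DB", "PTY Manager", "Event Bus"]);
    deps.insert(
        "Autopilot Engine",
        vec!["DB", "Slot Manager", "Event Bus", "LLM Gateway", "Context Pipeline"],
    );
    deps
}

/// Round-based topological sort: each round starts every component whose
/// dependencies are already up. Fails on a cycle or an unknown dependency.
pub fn init_order(
    deps: &BTreeMap<&'static str, Vec<&'static str>>,
) -> Result<Vec<&'static str>, String> {
    let mut done: BTreeSet<&'static str> = BTreeSet::new();
    let mut order = Vec::new();
    while order.len() < deps.len() {
        let ready: Vec<&'static str> = deps
            .iter()
            .filter(|(name, needs)| {
                !done.contains(*name) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return Err("dependency cycle or missing component".to_string());
        }
        for name in ready {
            done.insert(name);
            order.push(name);
        }
    }
    Ok(order)
}
```

Components returned in the same round have no dependencies on each other, so they are the ones that could be initialized concurrently.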

The Thesis: From Copilot to Autopilot

Software engineering is stuck in a transition. We've moved from “developer writes everything” to “developer + AI copilot” — but the next step, true autopilot, requires infrastructure that doesn't exist yet. Individual AI agents are increasingly capable, but they're trapped in stateless, single-threaded execution environments with no memory, no coordination, and no self-improvement.

MissionD argues that the missing layer is an operating system for AI coding agents — one that provides what every OS provides: process management (slots), persistent storage (knowledge base), inter-process communication (event bus + MCP), I/O abstraction (semantic terminal), and resource scheduling (ControlTree + autopilot).

The key architectural bet is that the terminal is the right abstraction boundary. By understanding terminals semantically rather than requiring custom APIs, MissionD can orchestrate any AI coding tool — present and future — without vendor cooperation. The semantic parser turns the chaos of raw PTY output into the structured state that orchestration requires.

The decision cascade ensures agents only escalate to humans when genuinely stuck. The retrospective loop means the system gets smarter with every session. The knowledge base means no context is ever truly lost.

The result: a single developer can operate a fleet of AI coding agents that share knowledge, coordinate tasks, and learn from every session. The developer's role shifts from watching the AI work to declaring intent and reviewing results. The cold-start problem becomes an infrastructure problem with an engineering solution. The babysitter becomes a commander.

By the Numbers

Metric	Value
Rust source lines	111,000
Source files	327
Crates	10
Database tables	37
Background workers	18
MCP tools	71
State machines	6
Data flows	8
Intent declaration	1,317 lines of S-expressions