MissionD
Multi-Agent Orchestration Daemon: Semantic Terminal Understanding, Persistent Knowledge, and 71 MCP Tools for AI Coding Agents
The Problem: Developers as AI Babysitters
The current AI-assisted development experience has a dirty secret: developers have become babysitters. You open a terminal, invoke Claude Code or Cursor, and then — you watch. You watch it read files. You watch it think. You approve permissions. You wait. When the session ends, everything it learned vanishes. Next time, you start over.
This is the Copilot trap. The tools are getting smarter, but the developer's role hasn't fundamentally changed — you're still the bottleneck, serializing work through a single session with no memory, no background processing, and no coordination between agents.
For non-trivial codebases — the kind with 10+ microservices, complex deployment pipelines, and years of accumulated architectural decisions — the cold-start penalty is devastating. Every session repeats the same discovery work. The agent reads the same files, re-learns the same conventions, and has zero awareness of what happened in previous conversations. Meanwhile, the expensive compute sitting behind these tools is serialized into a single thread of work, while you sit there watching.
Real autopilot doesn't require a human watching the screen. That's the gap MissionD fills.
The Insight: The Terminal Is the Universal API
Every AI coding tool — Claude Code, Gemini CLI, Codex — exposes the same interface: a terminal. Not an API, not a structured protocol, but a raw PTY stream of ANSI escape codes, Unicode characters, and implicit state transitions.
This is simultaneously the problem and the opportunity. The terminal is the lowest common denominator. If you can understand what's happening in a terminal — detect when the agent is thinking, when it's asking for permission, when it's idle — you can orchestrate any AI coding tool without modifying it. No custom APIs, no vendor lock-in, no integration overhead.
MissionD is built on this insight. It manages AI coding agents the way an operating system manages processes: through a semantic terminal layer that transforms raw PTY byte streams into structured state machines, plus a persistent daemon that outlives individual sessions and accumulates knowledge over time.
Architecture: 12 Pillars, 10 Crates
MissionD is 111,000 lines of Rust across 327 source files, organized into 10 crates with strict dependency boundaries:
| Crate | Responsibility |
|---|---|
| missiond-shared | CliEngine enum + default paths; zero-dep shared primitives |
| missiond-semantic | Semantic terminal parser: fingerprints, state machine, pattern matching |
| missiond-pty | PTY session management: spawn, read, screenshot, anomaly detection |
| missiond-core | Types, DB traits, IPC, embedding, context; depends on pty + semantic + shared |
| missiond-daemon | Business logic: handlers, engines, workers, LLM gateways |
| missiond-mcp | MCP JSON-RPC stdio server: tool schema + dispatch |
| missiond-attach | CLI utility for attaching to running PTY sessions |
| missiond-runner | Claude CLI process wrapper: spawn + lifecycle management |
| semantic-terminal-napi | Node.js N-API bindings for the semantic parser |
| skill-store | Standalone AI skill marketplace microservice |
The system is organized into 12 architectural pillars: core business tables, observability, pipeline & code intelligence, agents & skills, state machines, event bus & workers, engines, semantic parser, LLM gateways & context pipeline, MCP dispatch, transport & bootstrap, and standalone services.
Semantic Terminal Parser
The semantic terminal parser is the foundation that makes everything else possible. It transforms raw PTY output into a finite state machine with 8 states:
Starting → Idle → Thinking → Responding → ToolRunning → Confirming
↑ ↗ ↗ ↗
Error SlashMenu (cycles back to Idle)
State detection uses a multi-layer pipeline: pattern config (YAML-defined regex sets per CLI engine) → fingerprint registry (structural hashing of screen regions) → state parser (ordered rule evaluation) → confirm parser (permission dialog detection) → tool output parser (tool invocation tracking).
The parser supports multiple CLI engines: Claude Code and Gemini CLI each have their own detection logic and YAML pattern files, but share the same state machine abstraction. Adding support for a new AI coding tool (Codex, Cursor, etc.) therefore means writing a new pattern config and any engine-specific detection rules, without modifying orchestration logic.
A key technical challenge: terminal output is not a clean text stream. It contains ANSI escape sequences for cursor movement, color, and screen clearing. The parser must handle partial writes, screen reflows, and race conditions between PTY output and state transitions. Fingerprint-based detection (structural hashing of screen regions rather than exact string matching) provides robustness against rendering variations.
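The ordered-rule step of this pipeline can be illustrated with a small sketch. It is not MissionD's parser: it assumes plain substring markers in place of the YAML regex sets and fingerprints, strips only CSI escape sequences, and the marker strings and the `Rule`, `strip_ansi`, `detect_state`, and `example_rules` names are hypothetical.

```rust
/// The eight parser states named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Starting,
    Idle,
    Thinking,
    Responding,
    ToolRunning,
    Confirming,
    Error,
    SlashMenu,
}

/// One detection rule: if any marker appears on screen, report `state`.
/// The markers used below are hypothetical placeholders, not real CLI output.
pub struct Rule {
    pub state: AgentState,
    pub markers: &'static [&'static str],
}

/// Remove CSI escape sequences (ESC [ ... final byte) so that colour and
/// cursor codes do not break substring matching.
pub fn strip_ansi(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter and intermediate bytes up to the final byte (0x40..=0x7E).
            for n in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Ordered rule evaluation: the first matching rule wins, otherwise keep `previous`.
pub fn detect_state(rules: &[Rule], screen: &str, previous: AgentState) -> AgentState {
    let clean = strip_ansi(screen);
    rules
        .iter()
        .find(|r| r.markers.iter().any(|m| clean.contains(m)))
        .map(|r| r.state)
        .unwrap_or(previous)
}

/// Hypothetical rule set; a real one would be loaded from the per-engine YAML config.
pub fn example_rules() -> Vec<Rule> {
    vec![
        Rule { state: AgentState::Confirming, markers: &["Allow this action?"] },
        Rule { state: AgentState::Thinking, markers: &["Thinking"] },
        Rule { state: AgentState::Idle, markers: &["> "] },
    ]
}
```

Rule order matters here: a permission dialog usually still shows the input prompt, so the Confirming rule is placed ahead of the Idle rule.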
Slot-Based Compute Management
MissionD manages AI coding agents through a slot abstraction: 1 foreground slot (the user's active Claude Code session) + N background slots (daemon-managed processes). Each slot wraps a PTY session and tracks its semantic state, conversation ID, and assigned task.
The SlotManager is the single authority for slot lifecycle. It handles spawning, reclaiming, and health monitoring. Background slots can be dynamically allocated — the system maintains a pool of slots with configurable limits and automatic expiration.
On top of slots, a Board Task system provides DAG-based task management. Tasks have status (open → running → done/failed/blocked), priority levels, engineering phases (investigate → consult → plan → execute → finalize), dependency tracking, and lease-based claiming. An autopilot engine periodically ticks through the task board, dispatching eligible tasks to available slots.
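As a rough illustration of how one autopilot tick could pick work from a dependency graph, the sketch below treats a task as eligible when it is open and all of its dependencies are done. The `BoardTask` fields, the "higher priority number runs first" convention, and the `eligible` and `tick` functions are assumptions for the example; lease-based claiming and engineering phases are left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Open, Running, Done, Failed, Blocked }

#[derive(Debug, Clone)]
pub struct BoardTask {
    pub id: u32,
    pub status: Status,
    pub priority: u8, // higher runs first (assumed convention)
    pub depends_on: Vec<u32>,
}

/// A task is eligible when it is Open and every dependency is Done.
pub fn eligible(tasks: &[BoardTask]) -> Vec<u32> {
    let status: HashMap<u32, Status> = tasks.iter().map(|t| (t.id, t.status)).collect();
    let mut ready: Vec<&BoardTask> = tasks
        .iter()
        .filter(|t| t.status == Status::Open)
        .filter(|t| t.depends_on.iter().all(|d| status.get(d) == Some(&Status::Done)))
        .collect();
    // Highest priority first; ties broken by id for determinism.
    ready.sort_by(|a, b| b.priority.cmp(&a.priority).then(a.id.cmp(&b.id)));
    ready.into_iter().map(|t| t.id).collect()
}

/// One autopilot tick: claim up to `free_slots` eligible tasks and mark them Running.
pub fn tick(tasks: &mut [BoardTask], free_slots: usize) -> Vec<u32> {
    let picked: Vec<u32> = eligible(tasks).into_iter().take(free_slots).collect();
    for t in tasks.iter_mut() {
        if picked.contains(&t.id) {
            t.status = Status::Running;
        }
    }
    picked
}
```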
Knowledge Base: Six-Stage Hybrid Retrieval Pipeline
Session amnesia is the single biggest productivity drain with AI coding tools. MissionD solves this with a persistent knowledge base (1,400+ entries across architecture memories, debug patterns, policy decisions, operational procedures) backed by PostgreSQL + SQLite dual backend via a MissionDB trait. The hard problem: how to retrieve the 10 most relevant entries in milliseconds?
Embedding Model: Qwen qwen3-embedding
Vectorization uses Alibaba's qwen3-embedding via local Ollama service, with automatic dimension detection on startup. There is no fallback, no low-quality degradation — if Ollama is unavailable, the Embedding Worker stops immediately, failing fast and surfacing the problem. Quality standards for worker output are strict: a low-tier model is never allowed to pollute MissionD's knowledge memory. All KB entries, conversation summaries, and AST node embeddings are generated asynchronously by the EmbeddingWorker and cached in memory (kb_search_cache) for zero-I/O search.
Six-Stage Retrieval Pipeline
Every KB search passes through six stages with explicit mathematical parameters:
[Stage 1] Dual-Path Recall
FTS5 full-text → up to 100 candidates (auto-fallback to LIKE for Chinese)
Vector cosine similarity → fetch_k = max(limit×3, 60) candidates
Similarity floor: cos_sim < 0.3 discarded (filter semantic noise)
[Stage 2] RRF Rank Fusion
score = 0.4/(60+rank_fts+1) + 0.6/(60+rank_vec+1)
Vector-dominant design: 60% embedding weight, 40% FTS weight
Merges ranks (not raw scores) — naturally scale-invariant
[Stage 3] Temporal Decay
decay = exp(-ln2 / half_life × age_days)
Category-specific half-lives:
debug memories → 14 days (fast decay)
ops → 21 days | bugfix → 30 days | feature → 90 days
architecture/policy/preference → never decay (evergreen)
[Stage 4] Drop-off Filter
Discard entries with RRF score < 50% of top score
Eliminates single-path weak signal noise
[Stage 5] MMR Diversity Re-ranking (explore mode)
mmr = 0.7×relevance - 0.3×max(cosine_sim to already selected)
Greedy selection ensures top-k covers distinct semantic regions
[Stage 6] Paginated Output
Default limit=10, max 50
Why This Design?
Pure FTS misses semantically related knowledge with different wording (“deploy failed” vs “deployment error”). Pure vector search loses exact keyword matches (error codes, function names). RRF fuses ranks rather than raw scores, naturally handling scale differences between the two systems. Temporal decay auto-retires stale debug memories while keeping architectural decisions permanently accessible. MMR prevents the top-10 from being dominated by 10 variants of the same topic.
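The formulas in stages 2 through 5 are small enough to write out directly. The sketch below is an illustration under assumptions, not the daemon's code: ranks are taken as 0-based, an entry missing from one recall path contributes nothing for that path, unknown categories are treated as evergreen, and the function names are invented for the example. Which score feeds the drop-off filter (raw RRF or the decayed value) is left to the caller.

```rust
/// Stage 2: weighted Reciprocal Rank Fusion with k = 60.
/// Ranks are 0-based; `None` means the entry was absent from that recall path.
pub fn rrf_score(rank_fts: Option<usize>, rank_vec: Option<usize>) -> f64 {
    let term = |w: f64, r: Option<usize>| r.map_or(0.0, |r| w / (60.0 + r as f64 + 1.0));
    term(0.4, rank_fts) + term(0.6, rank_vec)
}

/// Stage 3: exponential decay; a `None` half-life marks an evergreen category.
pub fn decay(half_life_days: Option<f64>, age_days: f64) -> f64 {
    match half_life_days {
        Some(h) => (-(std::f64::consts::LN_2) / h * age_days).exp(),
        None => 1.0,
    }
}

/// Half-lives from the text; unknown categories are treated as evergreen (an assumption).
pub fn half_life(category: &str) -> Option<f64> {
    match category {
        "debug" => Some(14.0),
        "ops" => Some(21.0),
        "bugfix" => Some(30.0),
        "feature" => Some(90.0),
        _ => None,
    }
}

/// Stage 4: keep only entries scoring at least 50% of the best score.
/// Panics on NaN scores, which this sketch does not expect.
pub fn drop_off(mut scored: Vec<(String, f64)>) -> Vec<(String, f64)> {
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let top = scored.first().map_or(0.0, |s| s.1);
    scored.retain(|s| s.1 >= 0.5 * top);
    scored
}

/// Cosine similarity of two equal-length vectors.
pub fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Stage 5: greedy MMR. `items` holds (relevance, embedding); returns chosen indices.
/// Negative similarities are floored at 0 in this sketch.
pub fn mmr_select(items: &[(f64, Vec<f64>)], k: usize) -> Vec<usize> {
    let mut chosen: Vec<usize> = Vec::new();
    while chosen.len() < k.min(items.len()) {
        let mut best: Option<(usize, f64)> = None;
        for (i, (rel, emb)) in items.iter().enumerate() {
            if chosen.contains(&i) {
                continue;
            }
            let max_sim = chosen.iter().map(|&j| cosine(emb, &items[j].1)).fold(0.0, f64::max);
            let score = 0.7 * rel - 0.3 * max_sim;
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((i, score));
            }
        }
        chosen.push(best.unwrap().0);
    }
    chosen
}
```

An entry ranked first by both paths scores 0.4/61 + 0.6/61 = 1/61, and a debug memory that is exactly 14 days old keeps half of its score.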
Conflict Detection & Utility Scoring
On every KB write, the system detects existing entries with cosine similarity > 0.82, marks contradicts edges, and halves the new entry's confidence. This keeps outdated and current answers from coexisting unflagged.
Each entry also carries a utility score (0–1): every search hit adds 0.15 × (1 - current_score), asymptotically approaching 1.0 with regular access. Low-utility entries are pruned first during garbage collection — Darwinian knowledge evolution.
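The utility update has a simple closed form: after n hits starting from u0, the score is 1 - (1 - u0) × 0.85^n. A minimal sketch of both rules follows, with hypothetical function names and with the similarity values supplied by the caller rather than computed from embeddings.

```rust
/// Similarity above which an existing entry counts as a conflict.
pub const CONFLICT_THRESHOLD: f64 = 0.82;

/// One search hit moves utility 15% of the remaining distance toward 1.0.
pub fn bump_utility(current: f64) -> f64 {
    current + 0.15 * (1.0 - current)
}

/// Utility after `hits` consecutive search hits, starting from `start`.
pub fn utility_after(start: f64, hits: u32) -> f64 {
    (0..hits).fold(start, |u, _| bump_utility(u))
}

/// Confidence for a new entry: halved when any existing similarity exceeds the threshold.
pub fn confidence_on_write(base_confidence: f64, similarities: &[f64]) -> f64 {
    if similarities.iter().any(|&s| s > CONFLICT_THRESHOLD) {
        base_confidence / 2.0
    } else {
        base_confidence
    }
}
```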
Additional Knowledge Structures
- AST nodes — tree-sitter synced function/struct/enum definitions with vector embeddings for semantic code search
- Beacons — named code landmarks enabling “show me the auth middleware” queries
- Knowledge edges — typed relationships (prerequisite, supersedes, contradicts)
- FTS snippets — auto-generated context fragments (highlighted matches, max 40 tokens) with category-based detail truncation (architecture modules: strip entirely; policy: 2000 chars; default: 800 chars)
When a new session starts, the Context Pipeline assembles a budget-constrained prompt from these sources in priority order: slot environment → skill context → KB entries → conversation history → topology map → CLAUDE.md. The budget allocator ensures the assembled context fits within token limits while maximizing relevance.
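One simple way to read "budget-constrained, in priority order" is a greedy fill. The sketch below assumes each source arrives with a token estimate and that the last source to fit is truncated to whatever budget remains; the `Source` type and `allocate` function are hypothetical, and the real allocator may weigh relevance differently.

```rust
/// A candidate context source with its estimated token cost.
/// Sources are passed to `allocate` already sorted by priority.
pub struct Source {
    pub name: &'static str,
    pub tokens: usize,
}

/// Walk sources in priority order; include each one whole if it fits,
/// otherwise truncate it to the remaining budget, after which nothing else fits.
pub fn allocate(sources: &[Source], budget: usize) -> Vec<(&'static str, usize)> {
    let mut remaining = budget;
    let mut plan = Vec::new();
    for s in sources {
        if remaining == 0 {
            break;
        }
        let granted = s.tokens.min(remaining);
        plan.push((s.name, granted));
        remaining -= granted;
    }
    plan
}
```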
18 Background Workers
The daemon runs 18 background workers organized by their LLM dependency — a critical design decision for cost control and operational safety:
| Tier | Workers | Trigger |
|---|---|---|
| Sonnet (5) | Embedding, Translation, Briefing, Architecture Maintenance, Retrospective | Channel / Interval / On-demand |
| Codex (2) | Step Narrator, Vision Worker | Event-driven (MessagePersisted) |
| Gemini (1) | Strategy Worker | Interval (300s, flag-gated) |
| Local (10) | Conversation Logger, Organizer, PTY Event, Tagger/Chunker, Experience Harvester, Reconcile (x2), AST Sync, Code Prefetch, Gemini Logger | Event-driven / Interval / Channel |
All workers implement a unified BackgroundWorker trait with a KIND constant that declares their LLM dependency. This enables the ControlTree — a hierarchical pause/resume system with cascade priority:
- Worker-level override — force-pause or force-resume individual workers (debug mode)
- Global kill switch — pause all workers at once
- Provider/domain cascade — disable all Sonnet workers by toggling one flag, or pause all knowledge-domain workers during maintenance
The ControlTree persists to disk and recovers on crash, ensuring operational state survives daemon restarts. Worker status is broadcast via a tokio watch channel for real-time observability.
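The cascade can be pictured as a lookup that checks the most specific rule first. This sketch assumes the precedence follows the order of the list above (worker override, then global switch, then provider flag), omits the domain dimension and disk persistence, and uses invented type and field names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Provider { Sonnet, Codex, Gemini, Local }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Override { ForcePause, ForceResume }

#[derive(Default)]
pub struct ControlTree {
    pub worker_overrides: HashMap<String, Override>,
    pub global_paused: bool,
    pub provider_paused: HashMap<Provider, bool>,
}

impl ControlTree {
    /// Resolve whether a worker should be paused, most specific rule first.
    pub fn is_paused(&self, worker: &str, provider: Provider) -> bool {
        // 1. A per-worker override beats everything else.
        if let Some(o) = self.worker_overrides.get(worker) {
            return *o == Override::ForcePause;
        }
        // 2. The global kill switch pauses every worker without an override.
        if self.global_paused {
            return true;
        }
        // 3. Otherwise fall back to the provider-level flag.
        self.provider_paused.get(&provider).copied().unwrap_or(false)
    }
}
```

Under this reading, a force-resumed worker keeps running even while the global switch is on, which is what makes the override useful for debugging.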
Event Bus: Causal Timeline Architecture
The event bus is not a simple pub-sub channel — it is a Cognitive Timeline that guarantees persistent storage, global monotonic sequencing, and causal ordering. Every event flows through a single path:
Producer → MPSC (unbounded) → Timeline Writer → DB (seq assigned) → broadcast<TimelineEvent>
The MPSC channel is deliberately unbounded: events are never dropped, preserving causal chain integrity. In practice queue depth stays small because SQLite write throughput (>10K TPS in WAL mode) far exceeds the peak event production rate (~50/sec).
40+ Event Variants across 8 Categories
The DaemonEvent enum defines 40+ typed variants organized into 8 categories:
- PTY events — PtyStateChanged, PtyOutput, PtyScreenshot
- Message events — ConversationMessageLogged, ImageMessageInserted
- Task events — TaskCreated, TaskCompleted, SlotTaskDispatched
- Board events — BoardTaskCreated, BoardTaskUpdated, BoardTaskClaimed
- Slot events — SlotBecameIdle, SlotStateChanged, SessionCompleted
- Knowledge events — KBBatchMutated, DeepAnalysisCompleted
- CLI engine events — CliRequestStarted, CliRequestCompleted, CliToolActivity
- Cognitive pipeline events — SessionOrganized, TurnExtracted, IntentAnalyzed
Events are split into persistent and ephemeral at the Timeline Writer. Persistent events (slot state changes, session completions, KB mutations) are written to the system_timeline table with a monotonic sequence number, then broadcast. Ephemeral events (internal worker telemetry, batch progress) are broadcast to WebSocket clients and internal consumers but skip the database — preventing timeline table inflation from high-frequency worker chatter.
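A single-writer loop is enough to show how sequencing and the persistent/ephemeral split fit together. The sketch uses `std::sync::mpsc` and plain vectors as stand-ins for the async channel, the system_timeline table, and the broadcast channel; the `WorkerProgress` variant and the `run_writer` function are hypothetical.

```rust
use std::sync::mpsc;

#[derive(Debug, Clone, PartialEq)]
pub enum DaemonEvent {
    SlotBecameIdle { slot: u32 },
    KBBatchMutated { count: u32 },
    WorkerProgress { done: u32 }, // hypothetical ephemeral telemetry variant
}

impl DaemonEvent {
    /// Only persistent events receive a sequence number and a timeline row.
    pub fn is_persistent(&self) -> bool {
        !matches!(self, DaemonEvent::WorkerProgress { .. })
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct TimelineEvent {
    pub seq: Option<u64>, // None for ephemeral events
    pub event: DaemonEvent,
}

/// Drain the producer channel in arrival order until every sender is dropped.
/// Persistent events are appended to `timeline` before being broadcast;
/// ephemeral events are broadcast only.
pub fn run_writer(
    rx: mpsc::Receiver<DaemonEvent>,
    timeline: &mut Vec<TimelineEvent>,
    broadcast: &mut Vec<TimelineEvent>,
) {
    let mut next_seq: u64 = timeline.len() as u64 + 1;
    for event in rx {
        let seq = if event.is_persistent() {
            let s = next_seq;
            next_seq += 1;
            Some(s)
        } else {
            None
        };
        let te = TimelineEvent { seq, event };
        if te.seq.is_some() {
            timeline.push(te.clone());
        }
        broadcast.push(te);
    }
}
```

Because one loop owns the counter, sequence numbers are monotonic without any locking on the producer side.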
Causal Chain Tracking
Every TimelineEvent carries three fields for distributed tracing: trace_id (root ID spanning an entire causal chain, typically a conversation session ID), span_id (this event's unique ID), and parent_span_id (linking child events back to their cause). When a SlotBecameIdle triggers memory extraction, which produces a KB entry, which triggers embedding — the entire chain shares a trace_id, making it possible to reconstruct causality from the timeline.
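Reconstructing a chain from these three fields amounts to following parent links back to the root. The sketch below assumes numeric span IDs and an in-memory slice of events; the field types, the event names other than those mentioned in the text, and the `causal_chain` function are illustrative.

```rust
#[derive(Debug, Clone)]
pub struct TimelineEvent {
    pub trace_id: String,
    pub span_id: u64,
    pub parent_span_id: Option<u64>,
    pub name: &'static str,
}

/// Walk parent links from `span_id` back to the root and return names root-first.
pub fn causal_chain(events: &[TimelineEvent], span_id: u64) -> Vec<&'static str> {
    let mut chain = Vec::new();
    let mut cursor = Some(span_id);
    while let Some(id) = cursor {
        // Guard against cyclic parent links in malformed data.
        if chain.len() >= events.len() {
            break;
        }
        match events.iter().find(|e| e.span_id == id) {
            Some(e) => {
                chain.push(e.name);
                cursor = e.parent_span_id;
            }
            None => break,
        }
    }
    chain.reverse();
    chain
}
```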
9 Event-Driven Consumers with Trailing-Edge Debounce
Rather than polling on intervals, MissionD uses event-driven consumers with trailing-edge debounce. Each consumer subscribes to the broadcast channel, filters for relevant event variants, and fires its handler only after a quiet window:
- Extraction consumer — SlotBecameIdle → 500ms debounce → schedule_memory_tasks
- Submit dispatcher — TaskCreated/TaskCompleted → 100ms debounce → dispatch_queued_submit_tasks
- Decision consumer — QuestionCreated → 100ms debounce → process_pending_master_questions
- Harvest consumer — NarrationSessionCompleted → immediate (no debounce, already infrequent)
- Realtime extraction — ConversationMessageLogged → 3s debounce → check_realtime_extraction
- Session reflection — SessionCompleted → 5s debounce → Strategy Worker + Retro Worker + deep analysis
- KB consolidation — DeepAnalysisCompleted → counter accumulation (threshold: 5) → check_kb_consolidation
- Intent analyst — TurnExtracted → 5min debounce OR 5 accumulated turns → Sonnet LLM intent detection
- Sweeper — 30min periodic + startup scan → reconciliation across all pipelines
Why trailing-edge debounce instead of polling? During a burst of activity (e.g. a background slot executing a complex task), a slot might transition to Idle and back several times in rapid succession. Polling would either miss intermediate states or waste compute re-processing unchanged data. Trailing-edge debounce absorbs the burst and fires exactly once after the storm settles — the right moment to consolidate knowledge.
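The behaviour is easy to check with a pure simulation. Instead of real timers, the sketch below takes event arrival times in milliseconds and computes when a trailing-edge debounced handler would fire; `debounce_fire_times` is a hypothetical helper, not part of the daemon.

```rust
/// Given event arrival times in milliseconds (ascending) and a quiet window,
/// return the times at which a trailing-edge debounced handler fires.
pub fn debounce_fire_times(arrivals_ms: &[u64], window_ms: u64) -> Vec<u64> {
    let mut fires = Vec::new();
    for (i, &t) in arrivals_ms.iter().enumerate() {
        let deadline = t + window_ms;
        // The timer armed by this event survives only if no later event
        // lands before it expires.
        let superseded = arrivals_ms.get(i + 1).map_or(false, |&next| next < deadline);
        if !superseded {
            fires.push(deadline);
        }
    }
    fires
}
```

With a 500ms window, arrivals at 0, 100, and 300ms collapse into one firing at 800ms, and a later event at 2000ms fires separately at 2500ms.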
Each consumer also handles broadcast::Lagged gracefully: exponential backoff (100ms → 200ms → … → 2000ms cap) with ±25% jitter prevents thundering herd after a lag event. When a consumer falls behind and the broadcast channel drops messages, it does a defensive re-process rather than silently losing data.
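The backoff schedule can be written as two small functions. In this sketch the random source is passed in by the caller as a value in [0, 1], so the example needs no external crate; the function names and the attempt-numbering convention are assumptions.

```rust
/// Base delay for the n-th consecutive lag (0-based): 100ms doubling, capped at 2000ms.
pub fn base_backoff_ms(attempt: u32) -> u64 {
    let doubled = 100u64.saturating_mul(1u64 << attempt.min(20));
    doubled.min(2000)
}

/// Apply jitter; `unit` is a caller-supplied random value in [0, 1].
/// 0.0 maps to -25%, 0.5 to no change, and 1.0 to +25%.
pub fn jittered_backoff_ms(attempt: u32, unit: f64) -> u64 {
    let factor = 0.75 + 0.5 * unit.clamp(0.0, 1.0);
    (base_backoff_ms(attempt) as f64 * factor).round() as u64
}
```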
Cognitive Pipeline: Causal Event Chains
The most sophisticated use of the event bus is the Cognitive Pipeline — a multi-stage processing chain where each stage's output event triggers the next:
- S2 Organizer: Listens for ConversationMessageLogged → repairs compaction fragment links and orphan parent references → emits SessionOrganized
- S3 Tagger & Chunker: Listens for SessionOrganized → extracts structured Turns from flat message streams (pure rules, zero LLM calls), applies noise labels to overlong/binary tool results → emits TurnExtracted + sends EmbeddingTask::ProcessTurns to embedding channel
- S4 Embedder: Receives ProcessTurns via MPSC → generates per-turn embeddings using qwen3-embedding via Ollama
- S6 Intent Analyst: Listens for TurnExtracted → accumulates turns (5-min debounce OR 5 turns threshold) → Sonnet LLM analysis detects stuck retries, architecture exploration, refactoring shifts, scope creep → emits IntentAnalyzed
A single user message cascades through the pipeline: it arrives as a ConversationMessageLogged event, gets organized, chunked into turns, embedded for vector search, and analyzed for user intent — all through event-driven chaining with no polling, no cron jobs, no manual orchestration. Each stage operates independently with its own debounce window and backfill logic, and the sweeper provides a safety net for any events lost to broadcast lag or daemon restart.
71 MCP Tools across 4 Domains
The MCP server exposes 71 tools via JSON-RPC over stdio, organized into four domains:
| Domain | Tools | Examples |
|---|---|---|
| Compute | 18 | PTY spawn/send/read/screenshot, task submit/query/cancel/delegate, slot management, worker control |
| Knowledge | 26 | KB query/mutate/remember, board CRUD + decompose, skill query/exec, code search, embedding ops, cascade planning |
| Communication | 13 | Conversation analysis, router chat, timeline, audit trail, retrospectives, LLM traces, beacons |
| System | 14 | Daemon control, config, logs, infrastructure ops, permissions, power control, inbox, incidents |
The MCP architecture means the foreground Claude Code session (the “commander”) can control every aspect of the daemon through natural language: spawn background agents, query the knowledge base, check task status, pause workers, analyze past conversations. The tools are the vocabulary through which AI agents interact with persistent infrastructure.
Engines: Composite Orchestration
Three engine subsystems provide higher-order coordination:
- Autopilot Engine — A tick-based pipeline that runs: memory scheduling → extraction check → board task dispatch → flow progression → supervision check. It automatically claims open tasks, assigns them to available slots, monitors progress, and handles failures.
- Learning Engine — Contains the decision cascade (KB lookup → Gemini consult → decision slot → human escalation), experience extraction, intent analysis, timeline analysis, idle exploration, and historical scanning. When an agent encounters a decision point, the learning engine routes it through progressively more expensive resolution tiers.
- Slot Orchestrator — Adapters for different AI tools. The CC Controller manages Claude Code instances; the Gemini Controller manages Gemini CLI instances. Each adapter translates between the slot abstraction and the specific CLI's interaction patterns.
Decision Cascade: Graceful Degradation
When an AI agent encounters a decision point — an ambiguous requirement, an unfamiliar code pattern, a permission question — most systems either hallucinate an answer or immediately escalate to the human. MissionD implements a four-tier decision cascade that routes questions through progressively more expensive resolution channels:
[Tier 1] KB Lookup: search the knowledge base for prior decisions
↓ not found
[Tier 2] Gemini Consult: ask Gemini for strategic guidance
↓ low confidence
[Tier 3] Decision Slot: spawn a dedicated Claude Code session to research
↓ still unresolved
[Tier 4] Human Escalation: surface the question to the developer
Each tier has a cost and a confidence threshold. Most questions resolve at Tier 1 (the KB already has the answer from a previous session). Tier 2 handles novel architectural questions cheaply. Tier 3 is reserved for complex decisions that require reading code. Tier 4, human escalation, is the last resort, not the first. The result: the developer is interrupted only when genuinely needed, not on every minor decision.
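The cascade reduces to a loop over resolvers with a confidence gate. The sketch below is one possible shape, assuming each automated tier is a function that returns an answer plus a confidence in [0, 1]; the `Resolver` alias, the `resolve` function, and the threshold values a caller would pass are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier { KbLookup, GeminiConsult, DecisionSlot, HumanEscalation }

#[derive(Debug, Clone, PartialEq)]
pub struct Resolution {
    pub tier: Tier,
    pub answer: String,
}

/// A resolver returns an answer with a confidence in [0, 1], or None if it has nothing.
pub type Resolver = fn(&str) -> Option<(String, f64)>;

/// Try each automated tier in order and accept the first answer that clears
/// its threshold. If none does, hand the question to the human tier unanswered.
pub fn resolve(question: &str, tiers: &[(Tier, f64, Resolver)]) -> Resolution {
    for (tier, threshold, resolver) in tiers {
        if let Some((answer, confidence)) = resolver(question) {
            if confidence >= *threshold {
                return Resolution { tier: *tier, answer };
            }
        }
    }
    Resolution { tier: Tier::HumanEscalation, answer: String::new() }
}
```

Ordering the tiers from cheapest to most expensive is what keeps the average cost low: later resolvers are never called once an earlier one is confident enough.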
Self-Learning: The Retrospective Loop
A system that doesn't learn from its failures is just an expensive automation. MissionD's retrospective pipeline ensures the system gets smarter with every session:
- Session ends — the Retro Worker automatically pulls all tool calls, error rates, and operation trajectories from the completed session
- Pattern analysis — identifies repeated mistakes, high-error tools, and inefficient sequences
- Sonnet summarization — distills the raw data into structured retrospective results
- Knowledge upsert — extracts “lessons learned” and persists them to the knowledge base as architecture memories, debug patterns, or operational procedures
This creates a compounding flywheel: the longer MissionD runs, the more it knows about your codebase. Past mistakes become future context. A bug fixed in session #47 prevents the same mistake in session #200. The Experience Harvester worker runs every 60 seconds, continuously scanning new messages for extractable knowledge. The AST Sync worker maintains a live map of every function, struct, and enum in the codebase with vector embeddings for semantic search.
The result: when a new session starts, the Context Pipeline can assemble a prompt that includes not just the relevant code, but the accumulated wisdom of all previous sessions — why certain decisions were made, what pitfalls to avoid, what patterns work best in this specific codebase.
Database: 37 Tables across 4 Pillars
The database schema is organized into four pillars, with 31,000 lines of Rust in the DB gateway layer alone:
- Core Business — board_tasks, conversations, conversation_messages, knowledge, knowledge_edges, slot_sessions, slot_tasks, tasks, daemon_state, inbox_messages, prompt_snapshots
- Observability — tool_calls, conversation_events, retrospective_results, system_timeline, gemini_requests, gemini_file_cache, incidents, token_usage, narration_cursors, message_narrations
- Pipeline & Code Intel — ast_nodes, beacons, beacon_nodes, backfill_phases, watermarks, labels, translations, image_descriptions
- Agents & Skills — agent_questions, dynamic_slots, skills, skill_topics, skill_blocks, router_chat_sessions, router_chat_messages
All database access goes through a single gateway layer in missiond-core/src/db/. Generated CRUD operations (from Forge codegen) live in a gen/ subdirectory; hand-written complex queries live alongside them. The MissionDB trait abstracts over PostgreSQL and SQLite backends, enabling the same daemon code to run against either.
State Machines
Six finite state machines govern lifecycle transitions throughout the system:
- PTY Session (8 states) — Starting → Idle → Thinking → Responding → ToolRunning → Confirming → Error, with SlashMenu as a transient state
- Board Task Status (5 states) — Open → Running → Done/Failed/Blocked
- Engineering Phase (5 states) — Investigate → Consult → Plan → Execute → Finalize (cyclic)
- Task (4 states) — Queued → Running → Completed/Failed
- Question (3 states) — Pending → Answered/Dismissed
- Extraction Phase (4 states) — Idle → Sending → WaitingForIdleness → Complete
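Each of these machines can be expressed as an enum plus a transition table. Taking the four-state Task machine as an example, the sketch below allows only the transitions listed above; the method names are hypothetical, and any additional transitions the real system permits (such as retries) are not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState { Queued, Running, Completed, Failed }

impl TaskState {
    /// Transitions listed in the text: Queued -> Running -> Completed | Failed.
    pub fn can_transition_to(self, next: TaskState) -> bool {
        use TaskState::*;
        matches!((self, next), (Queued, Running) | (Running, Completed) | (Running, Failed))
    }

    /// Apply a transition, rejecting anything outside the table above.
    pub fn advance(self, next: TaskState) -> Result<TaskState, String> {
        if self.can_transition_to(next) {
            Ok(next)
        } else {
            Err(format!("illegal transition {:?} -> {:?}", self, next))
        }
    }
}
```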
Data Flows
Eight primary data flows thread through the system:
[1] User Message → Knowledge
PTY session → semantic parser → conversation logger
→ tagger/chunker → embedding worker → knowledge store
[2] Board Task Lifecycle
board create → autopilot tick → slot dispatch
→ CC controller → PTY session → result harvest → board done
[3] Decision Cascade
question raised → KB lookup → Gemini consult
→ decision slot → human escalation → answer routed
[4] MCP Request
stdio JSON-RPC → MCP server → IPC bridge
→ handler dispatch → DB query → JSON-RPC response
[5] Context Assembly
slot activated → context pipeline → budget allocator
→ source ranker → KB/skill/history fetch → truncation → prompt
[6] Retrospective
session end → retro worker → tool stats → pattern analysis
→ Sonnet summarize → retrospective result → knowledge upsert
[7] Embedding Pipeline
content created → embedding worker → model inference
→ vector storage → search index ready
[8] Gemini LLM Call
handler request → LLM gate → rate check
→ Gemini client → API call → Gemini logger → response
LLM Gateway: Multi-Provider Dispatch
The daemon integrates three LLM providers through a unified gateway: Sonnet (via API for embeddings, translations, summaries), Gemini (via API and CLI for strategic analysis), and MiniMax (legacy, being phased out). An LLM Gate provides rate limiting, 429 backoff, and quota tracking across all providers.
The Gemini integration is particularly deep: file-level caching (hash-based dedup with TTL), streaming responses, and a dedicated Gemini CLI controller that manages Gemini as a PTY process alongside Claude Code instances.
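Hash-based dedup with a TTL can be sketched with a map keyed by content hash. This is an assumption-laden illustration: it uses the standard library's `DefaultHasher` where a production cache would more likely use a cryptographic content hash, timestamps are passed in as plain seconds, and the `FileCache` type and the handle string are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub struct FileCache {
    ttl_secs: u64,
    // content hash -> (remote handle, time the entry was stored)
    entries: HashMap<u64, (String, u64)>,
}

impl FileCache {
    pub fn new(ttl_secs: u64) -> Self {
        Self { ttl_secs, entries: HashMap::new() }
    }

    /// Hash the file content so identical bytes map to the same key.
    fn key(content: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        content.hash(&mut h);
        h.finish()
    }

    /// Return the cached handle for identical content if it has not expired.
    pub fn get(&self, content: &[u8], now_secs: u64) -> Option<&str> {
        self.entries
            .get(&Self::key(content))
            .filter(|(_, stored)| now_secs.saturating_sub(*stored) < self.ttl_secs)
            .map(|(handle, _)| handle.as_str())
    }

    /// Record the handle returned by an upload of `content`.
    pub fn put(&mut self, content: &[u8], handle: &str, now_secs: u64) {
        self.entries.insert(Self::key(content), (handle.to_string(), now_secs));
    }
}
```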
Transport & Frontends
The daemon communicates through three transport layers:
- IPC — Unix socket / TCP between the MCP server and the daemon process
- WebSocket — tokio-tungstenite for real-time board UI and screenshot streaming
- PTY — pseudo-terminal sessions for managing AI coding tool processes
A web-based Board Frontend (React + TypeScript) provides a dashboard for monitoring slots, tasks, conversations, knowledge entries, questions, timeline events, architecture summaries, and system health. It connects to the daemon via API routes that proxy to the IPC layer.
Daemon Bootstrap
Initialization follows a strict 6-phase dependency order:
Phase 1: Infrastructure
DB → Embedding Model → Event Bus
Phase 2: Core Modules
→ PTY Manager → Slot Manager → Mission Control
Phase 3: Gateways
→ Gemini Gateway → Sonnet Gateway → LLM Gateway
Phase 4: Pipelines
→ Context Pipeline → Worker Registry → Control Tree
Phase 5: Workers (18 spawns)
→ All background workers
Phase 6: Engines
→ Autopilot → IPC Handler → WebSocket Server
Each component declares its dependencies explicitly. The PTY Manager depends on the Event Bus. The Slot Manager depends on DB, PTY Manager, and Event Bus. The Autopilot Engine depends on DB, Slot Manager, Event Bus, LLM Gateway, and Context Pipeline. This DAG ensures initialization never races.
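A declared dependency graph like this can be turned into a start-up order with a topological sort. The sketch below uses a layer-by-layer Kahn-style pass over the dependencies stated above; the `init_order` function and the snake_case component names are hypothetical, and components whose dependencies the text does not list would have to be declared with whatever the real system requires.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kahn-style topological sort over (component, dependencies) pairs.
/// Returns None if the graph has a cycle or names an undeclared dependency.
pub fn init_order(deps: &[(&'static str, Vec<&'static str>)]) -> Option<Vec<&'static str>> {
    let mut pending: BTreeMap<&'static str, BTreeSet<&'static str>> = deps
        .iter()
        .map(|(name, ds)| (*name, ds.iter().copied().collect::<BTreeSet<_>>()))
        .collect();
    let mut order = Vec::new();
    while !pending.is_empty() {
        // Components whose dependencies have all been initialised already.
        let ready: Vec<&'static str> = pending
            .iter()
            .filter(|(_, ds)| ds.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // nothing can start: cycle or missing declaration
        }
        for name in &ready {
            pending.remove(name);
        }
        for ds in pending.values_mut() {
            for name in &ready {
                ds.remove(name);
            }
        }
        order.extend(ready);
    }
    Some(order)
}
```

Components within the same layer come out in alphabetical order here only because of the `BTreeMap`; any order within a layer is valid.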
The Thesis: From Copilot to Autopilot
Software engineering is stuck in a transition. We've moved from “developer writes everything” to “developer + AI copilot” — but the next step, true autopilot, requires infrastructure that doesn't exist yet. Individual AI agents are increasingly capable, but they're trapped in stateless, single-threaded execution environments with no memory, no coordination, and no self-improvement.
MissionD argues that the missing layer is an operating system for AI coding agents — one that provides what every OS provides: process management (slots), persistent storage (knowledge base), inter-process communication (event bus + MCP), I/O abstraction (semantic terminal), and resource scheduling (ControlTree + autopilot).
The key architectural bet is that the terminal is the right abstraction boundary. By understanding terminals semantically rather than requiring custom APIs, MissionD can orchestrate any AI coding tool — present and future — without vendor cooperation. The semantic parser turns the chaos of raw PTY output into the structured state that orchestration requires.
The decision cascade ensures agents only escalate to humans when genuinely stuck. The retrospective loop means the system gets smarter with every session. The knowledge base means no context is ever truly lost.
The result: a single developer can operate a fleet of AI coding agents that share knowledge, coordinate tasks, and learn from every session. The developer's role shifts from watching the AI work to declaring intent and reviewing results. The cold-start problem becomes an infrastructure problem with an engineering solution. The babysitter becomes a commander.
By the Numbers
| Metric | Value |
|---|---|
| Rust source lines | 111,000 |
| Source files | 327 |
| Crates | 10 |
| Database tables | 37 |
| Background workers | 18 |
| MCP tools | 71 |
| State machines | 6 |
| Data flows | 8 |
| Intent declaration | 1,317 lines of S-expressions |