Forge

The Homoiconic Context: S-Expression Architecture Descriptors as Navigation Primitives for AI Coding Agents

Ruoqi Jin · April 2026 · arXiv:2604.13108

The Problem

AI coding agents spend most of their time lost. Given a task in an unfamiliar codebase, they grep for keywords, open wrong files, backtrack, and retry — burning tokens on navigation instead of the actual work. The variance is enormous: the same agent on the same task might take 4 steps or 14, depending on which files it happens to open first.

Every major AI coding tool — Cursor, Devin, SWE-agent — attacks this problem by making the agent smarter. Better prompts, better retrieval, better planning. But the fundamental issue isn't agent intelligence. It's that codebases are illegible. There's no map.

The Insight: Formalization Matters, Format Doesn't

Our controlled experiments (25 tasks × 3 conditions, Sonnet 4.6) showed that providing any form of architectural context — S-expressions, Markdown, JSON — reduces navigation steps by 33–44% (Cohen's d=0.92, Wilcoxon p=0.009). The format made no measurable difference to the LLM.

This was surprising. We expected S-expressions to outperform Markdown. They didn't. The LLM doesn't care about syntax. What matters is that someone sat down and wrote a formal architectural declaration at all.

But if format doesn't matter to the reader, it matters enormously to the writer and the toolchain. And that's where S-expressions win.

Why S-Expressions?

We tested four formats across generation fidelity and error resilience (96 generation runs + 93 fault injections, Sonnet 4.6):

Format     Parse Valid   Silent Corruption   Fatal Flaw
S-expr     96%           50%                 None
JSON       100%          21%                 Atomic failure (one error kills entire file)
YAML       92%           43%                 Silent semantic corruption
Markdown   N/A           N/A                 Zero error detection (0%)

S-expressions are the only tested format without a fatal flaw. JSON fails atomically: one misplaced comma makes the entire file unparseable. YAML silently corrupts semantics 43% of the time. Markdown has no grammar to validate against, so no error is ever detected. S-expressions degrade gracefully: structural errors such as unbalanced parentheses are caught by the parser, and a fault-tolerant parser can recover partial structure.
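
To make the contrast between atomic failure and graceful degradation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the parser used in the experiments: parse_tolerant and its recovery policy (ignore stray closing parentheses, close any lists still open at end of input) are hypothetical, and the two broken inputs are invented for the example.

```python
import json
import re

# Tokens: parentheses, double-quoted strings, or bare atoms.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse_tolerant(text):
    """Return (forms, problems). Never raises on unbalanced parentheses.

    Hypothetical recovery policy: stray ')' tokens are ignored, and lists
    still open at end of input are closed with whatever they contain.
    """
    stack, problems = [[]], []
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                problems.append("stray ')' ignored")
            else:
                done = stack.pop()
                stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) > 1:
        problems.append("%d list(s) left open at end of input" % (len(stack) - 1))
        while len(stack) > 1:  # close what is still open, keep its content
            done = stack.pop()
            stack[-1].append(done)
    return stack[0], problems

# One trailing comma: the standard JSON parser rejects the whole document.
broken_json = '{"service": "my-service", "routes": ["/api/users", "/api/health",]}'
try:
    json.loads(broken_json)
    json_result = "parsed"
except json.JSONDecodeError:
    json_result = "nothing recovered"

# Three missing closing parentheses: the error is reported, the content survives.
broken_sexpr = "(intent my-service (pillar api (route /api/users) (route /api/health"
forms, problems = parse_tolerant(broken_sexpr)

print(json_result)
print(forms)
print(problems)
```

In this sketch the JSON input yields nothing usable, while the S-expression input yields both a problem report and the full nested structure up to the point of truncation. A real tool would probably want to mark the recovered lists as suspect rather than trust them silently.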

Field Observations

Beyond controlled experiments, we observed the effect of intent.lisp files across 4 production projects over 8 weeks:

  • IQR reduction: 52% — Agent behavior variance dropped by half. The spread between best-case and worst-case navigation collapsed.
  • Mid-task confusion: −27% — Fewer mid-edit exploratory searches (grepping for files the agent should already know about).
  • Pre-edit exploration: +4× — Agents shifted from reactive search to upfront planning. They read the intent file first, then edited.
  • Compression ratio: 34:1 — A 200-line intent file describes architecture that would take 6,800 lines of code to express directly.

The Formalization Effect

Experiment E tested whether the value comes from the file's content or the process of creating it. We compared auto-generated intent files (170 lines, created by Claude in seconds) against hand-curated ones (698 lines, refined over weeks):

Auto-generated: 100% task completion. Blind (no context): 80%. (d=1.04, p=0.002)

The auto-generated file outperformed the blind condition even though it was shorter and less detailed than the hand-curated file and was created without human refinement. This suggests the value lies in having any formal architectural declaration, not in the quality of the declaration itself.

Architecture: intent.lisp

An intent.lisp file declares what a project is, not how it's implemented. A typical file contains:

(intent my-service
  (service-type "rust-axum")
  (database "postgresql (SQLx)")

  (pillar api
    (route /api/users
      (GET  :purpose "list users" :auth required)
      (POST :purpose "create user" :auth admin))
    (route /api/health
      (GET  :purpose "liveness check" :auth none)))

  (pillar data-layer
    (table users
      :columns (id:uuid name:text email:text)
      :constraints "UNIQUE(email)"))

  (downstream-services
    (calls auth-service :endpoint "https://auth.example.com")
    (calls object-storage :purpose "file uploads")))

The syntax is deliberately minimal. Only atoms and parenthesized lists. No special forms, no macros, no evaluation. A human can read it. A 150-line parser can parse it. An LLM can generate it.
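
As a rough illustration of how little machinery the syntax needs, the following Python sketch parses atoms and parenthesized lists into nested Python lists and then walks the tree to list routes. It is a sketch under assumptions, not the project's parser: the function names parse and routes are hypothetical, quoted strings are reduced to plain strings, and an unterminated quote is not handled.

```python
import re

# Tokens: parentheses, double-quoted strings, or runs of other non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(text):
    """Parse S-expression text into nested Python lists of string atoms."""
    stack = [[]]  # stack[0] collects the top-level forms
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            # Quotes are dropped, so strings and atoms are not distinguished.
            stack[-1].append(tok.strip('"'))
    if len(stack) != 1:
        raise SyntaxError("unexpected end of input: %d unclosed list(s)" % (len(stack) - 1))
    return stack[0]

def routes(tree):
    """Yield (path, method) for every (route PATH (METHOD ...)) form at any depth."""
    for node in tree:
        if isinstance(node, list):
            if node and node[0] == "route":
                for handler in node[2:]:
                    if isinstance(handler, list) and handler:
                        yield node[1], handler[0]
            else:
                yield from routes(node)

SAMPLE = '''
(intent my-service
  (pillar api
    (route /api/users
      (GET  :purpose "list users" :auth required)
      (POST :purpose "create user" :auth admin))
    (route /api/health
      (GET  :purpose "liveness check" :auth none))))
'''

tree = parse(SAMPLE)
print(list(routes(tree)))
# [('/api/users', 'GET'), ('/api/users', 'POST'), ('/api/health', 'GET')]
```

Because the parsed result is plain nested lists, a query such as "which endpoints exist" is a short tree walk with no schema or evaluation step.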

Honest Limitations

  • N=1 developer — All field observations come from a single developer's projects. We compensate with controlled experiments, but ecological validity remains limited.
  • S-expr error localization is poor — When errors are detected, the parser reports EOF rather than the actual error location (40-line offset in our tests). A fault-tolerant parser would improve this.
  • Same-session A/B showed no effect — Within sessions that had intent.lisp available, sessions where the agent read it vs. didn't read it showed identical efficiency (1.65 vs 1.65). The effect appears to be architectural, not per-session.
  • Multi-agent coordination is untested — We hypothesize that S-expressions enable partition-based agent coordination (each agent owns a pillar). This is future work.
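
One possible way to narrow the error-localization gap noted above is an indentation heuristic. The Python sketch below is an assumption-laden illustration, not something evaluated in the paper: guess_unclosed is a hypothetical helper, it assumes the file is conventionally indented, and it does not handle strings that span several lines.

```python
def guess_unclosed(text):
    """Return the 1-based line of the first list that looks unclosed, or None.

    Heuristic (an assumption, not the paper's method): in conventionally
    indented code, a line that begins with '(' at column c implies that every
    list opened on an earlier line at column >= c should already be closed.
    """
    open_stack = []  # (line_no, column) of each '(' not yet closed
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith("("):
            col = len(line) - len(stripped)
            for open_line, open_col in open_stack:
                if open_col >= col:
                    return open_line
        in_string = False
        for col, ch in enumerate(line):
            if ch == '"':
                in_string = not in_string
            elif in_string:
                continue  # parentheses inside strings do not count
            elif ch == "(":
                open_stack.append((line_no, col))
            elif ch == ")" and open_stack:
                open_stack.pop()
    # No indentation clue found: fall back to the most recently opened list.
    return open_stack[-1][0] if open_stack else None

# The (pillar api ...) list on line 2 is missing its closing parenthesis.
BROKEN = '''(intent my-service
  (pillar api
    (route /api/users
      (GET :purpose "list users" :auth required))
  (pillar data-layer
    (table users)))
'''

print(guess_unclosed(BROKEN))  # 2
```

A plain balanced-parenthesis check would only notice the problem at the end of this input; the heuristic instead points at line 2, where the unclosed list begins. It will mislead on files with irregular indentation, so it is better treated as a hint than as a diagnosis.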

The Thesis

AI coding agents don't need better models. They need better maps.

The act of formalizing a project's architecture into a machine-readable declaration — any format, any level of detail — cuts navigation waste by a third and halves behavioral variance. The format is an engineering tradeoff, not a scientific finding. S-expressions happen to be the most favorable tradeoff among tested formats: parseable, generatable, compact, and gracefully degrading.

One person managing 20 microservices with AI agents doesn't need 20× smarter agents. They need 20 intent files.

Citation

Jin, R. (2026). Formal Architecture Descriptors as
Navigation Primitives for AI Coding Agents.
arXiv:2604.13108. https://arxiv.org/abs/2604.13108