03 / 20 Query Loop src/query.ts (1,729 lines) — async generator

Input → API → Tools → Loop.

The conversation loop is an async generator — the REPL consumes streaming events via for await. Tools start executing while the API response is still streaming. Post-sampling hooks run compaction, memory extraction, and optional dream mode.
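The shape described above can be sketched as a tiny async generator plus a `for await` consumer (a minimal sketch; `LoopEvent`, `queryLoop`, and `consume` are illustrative names, not the real query.ts types):

```typescript
// Illustrative event union: streaming deltas, complete messages, and a
// terminal event, mirroring the StreamEvent / Message / Terminal yields
// described on this slide.
type LoopEvent =
  | { kind: "stream"; delta: string }
  | { kind: "message"; text: string }
  | { kind: "terminal"; reason: string };

async function* queryLoop(_prompt: string): AsyncGenerator<LoopEvent> {
  // Simulate tokens arriving incrementally over SSE.
  for (const delta of ["Hel", "lo"]) {
    yield { kind: "stream", delta };
  }
  // Full assistant message once the stream completes.
  yield { kind: "message", text: "Hello" };
  // Terminal event: the model stopped without requesting tools.
  yield { kind: "terminal", reason: "end_turn" };
}

// The REPL side: consume events with `for await`, as the slide describes.
async function consume(prompt: string): Promise<string[]> {
  const kinds: string[] = [];
  for await (const ev of queryLoop(prompt)) {
    kinds.push(ev.kind);
  }
  return kinds;
}
```

The generator pattern is what lets rendering begin before the turn finishes: each `yield` hands one event to the consumer immediately.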

1,729
Lines in query.ts — the heart of the system
async*
Generator — yields StreamEvent, Message, Terminal
Ctrl+C
Interrupt — aborts tools, preserves conversation

The Core Cycle

Every turn follows this exact sequence. The loop continues as long as the model returns tool_use blocks.

1
User sends message
Keyboard input or piped stdin. createUserMessage() wraps text in Anthropic format. Appended to conversation history (in-memory array). Attachments (images, PDFs) encoded as base64 content blocks.
2
Compaction phase
Before the API call, four strategies are checked in order: snip (drop old messages entirely), microcompact (clear tool result content, cache-aware), autocompact (Claude summarizes the conversation when tokens exceed the threshold), context collapse (fold multiple messages into one). See slide 06 for a deep dive.
3
System prompt assembled
Built from 10+ sources: role definition, tool schemas (with cache_control markers), CLAUDE.md files, MEMORY.md index, relevant memory files (Sonnet-selected), git context (branch, 5 commits, status ≤2000 chars), OS info, MCP instructions, hooks documentation, tips.
↓ streaming request
4
Stream to Claude API via SSE
Anthropic SDK sends BetaMessageStreamParams with system blocks (cache_control), messages, tools, thinking config, beta headers. Response tokens parsed incrementally, rendered to terminal in real-time. Fallback on 529: if FallbackTriggeredError, switch model and retry.
5
Tool execution — starts BEFORE stream ends
StreamingToolExecutor queues tool_use blocks as they arrive in the stream. Concurrent-safe tools (Read, Glob, Grep, etc.) start immediately in parallel (max 10). Non-concurrent tools (Bash, Edit, Write) wait for exclusive access.
Permission check → Pre-hooks → Execute → Post-hooks → Yield result
canUseTool() checks allow/deny rules. PreToolUse hooks can intercept. Tool runs. PostToolUse hooks can modify result. Result yielded as tool_result message.
↓ continuation decision
6
Continue, retry, or return
Decision tree based on stop_reason and error state (see below).
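The concurrency rule in step 5 can be sketched with two gates: a counting semaphore for concurrent-safe tools and a single-permit one for mutating tools (a simplified sketch; `Semaphore` and `runGated` are illustrative, and the real StreamingToolExecutor's exclusivity rules are stricter than shown here):

```typescript
// Minimal counting semaphore: acquire waits when no permits remain,
// release hands the permit to the next waiter if one is queued.
class Semaphore {
  private queue: Array<() => void> = [];
  constructor(private permits: number) {}
  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }
  release(): void {
    const next = this.queue.shift();
    if (next) next();
    else this.permits++;
  }
}

const parallel = new Semaphore(10); // concurrent-safe tools: up to 10 at once
const exclusive = new Semaphore(1); // Bash/Edit/Write: one at a time

// NOTE: this sketch only serializes mutating tools against each other;
// the real executor also waits for in-flight read-only tools to drain.
async function runGated<T>(safe: boolean, fn: () => Promise<T>): Promise<T> {
  const gate = safe ? parallel : exclusive;
  await gate.acquire();
  try {
    return await fn();
  } finally {
    gate.release();
  }
}
```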

Continuation Decision Tree

Condition | Action | Max
tool_use blocks present | Loop — run tools, send results, call API again | maxTurns
Prompt too long | Trigger autocompact → retry | 1
Max output tokens exceeded | Increase token limit → retry (floor 3000, safety buffer 1000) | 3
Media size error | Reactive compact → retry | 1
529 capacity error | Backoff → retry (or fallback model after 3) | 3
max_turns reached | Return { reason: 'max_turns_exceeded' } | —
end_turn (no tool_use) | Return { reason: 'end_turn' } | —
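The decision tree can be sketched as one function over stop state (a hedged sketch; the `Decision` type, error codes, and signature are illustrative, not the real query.ts API):

```typescript
// Possible outcomes of a turn, mirroring the table's three verbs.
type Decision =
  | { action: "loop" }
  | { action: "retry"; fix: string }
  | { action: "return"; reason: string };

function decide(
  hasToolUse: boolean,
  turns: number,
  maxTurns: number,
  error?: string
): Decision {
  // Recoverable errors trigger a fix-then-retry before anything else.
  if (error === "prompt_too_long") return { action: "retry", fix: "autocompact" };
  if (error === "max_output_tokens") return { action: "retry", fix: "raise_token_limit" };
  if (error === "media_size") return { action: "retry", fix: "reactive_compact" };
  if (error === "overloaded_529") return { action: "retry", fix: "backoff" };
  // Turn budget exhausted: stop even if the model wanted more tools.
  if (turns >= maxTurns) return { action: "return", reason: "max_turns_exceeded" };
  // Model requested tools: run them and call the API again.
  if (hasToolUse) return { action: "loop" };
  // Clean end_turn with no tool calls: the conversation turn is done.
  return { action: "return", reason: "end_turn" };
}
```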

Post-Sampling Hooks

After the model's final response (no more tool calls), background work fires:

A
Auto-compact check
If context approaching threshold (effectiveWindow - 13K buffer), schedules compaction for next turn.
B
Memory extraction (extractMemories)
Forked subagent analyzes the conversation and saves relevant memories. Fires every turn (gated by tengu_bramble_lintel, default: 1 = every turn).
C
Prompt suggestion + speculation
After 2+ turns, CHOMP generates next prompt suggestion. If speculation enabled (ANT-only), begins pre-computing the response. See slide 14.
D
Auto-dream (if eligible)
If ≥24h since last consolidation AND ≥5 sessions, autoDream fires for background memory consolidation. See slide 09.
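The gating checks in A and D reduce to two small predicates (constants mirror the slide; the helper names are assumptions, not the real query.ts functions):

```typescript
// A: schedule compaction once usage crosses (effectiveWindow - 13K buffer).
const AUTOCOMPACT_BUFFER = 13_000;

function shouldScheduleCompaction(
  usedTokens: number,
  effectiveWindow: number
): boolean {
  return usedTokens >= effectiveWindow - AUTOCOMPACT_BUFFER;
}

// D: auto-dream fires only after >=24h since the last consolidation
// AND at least 5 sessions.
const DAY_MS = 24 * 60 * 60 * 1000;

function isDreamEligible(
  msSinceLastConsolidation: number,
  sessionCount: number
): boolean {
  return msSinceLastConsolidation >= DAY_MS && sessionCount >= 5;
}
```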

Rendering Pipeline

How streaming tokens become terminal pixels.

React Components
389 component files
↓
Custom DOM
ink/dom.ts
↓
Yoga WASM
Flexbox layout
↓
Frame Buffer
Double-buffered
↓
Terminal
ANSI + mouse
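One common way a double-buffered frame stage avoids flicker is line diffing: compare the new frame against the previous one and re-emit only rows that changed (a hedged sketch of the general technique; the actual ink pipeline is not shown on this slide):

```typescript
// Compare two frames (arrays of terminal rows) and return only the rows
// that differ, keyed by row index. The renderer would then move the
// cursor to each changed row and rewrite just that line.
function diffFrames(prev: string[], next: string[]): Map<number, string> {
  const changed = new Map<number, string>();
  const rows = Math.max(prev.length, next.length);
  for (let row = 0; row < rows; row++) {
    if (prev[row] !== next[row]) {
      changed.set(row, next[row] ?? ""); // removed rows become blank
    }
  }
  return changed;
}
```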