03 / 20 Query Loop src/query.ts (1,729 lines) — async generator

Input → API → Tools → Loop.

The conversation loop is an async generator — the REPL consumes streaming events via for await. Tools start executing while the API response is still streaming. Post-sampling hooks run compaction, memory extraction, and optional dream mode.
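The shape described above can be sketched as a tiny async generator plus a `for await` consumer (a minimal sketch; `LoopEvent`, `queryLoop`, and `consume` are illustrative names, not the real query.ts types):

```typescript
// Illustrative event union: streaming deltas, complete messages, and a
// terminal event, mirroring the StreamEvent / Message / Terminal yields
// described on this slide.
type LoopEvent =
  | { kind: "stream"; delta: string }
  | { kind: "message"; text: string }
  | { kind: "terminal"; reason: string };

async function* queryLoop(_prompt: string): AsyncGenerator<LoopEvent> {
  // Simulate tokens arriving incrementally over SSE.
  for (const delta of ["Hel", "lo"]) {
    yield { kind: "stream", delta };
  }
  // Full assistant message once the stream completes.
  yield { kind: "message", text: "Hello" };
  // Terminal event: the model stopped without requesting tools.
  yield { kind: "terminal", reason: "end_turn" };
}

// The REPL side: consume events with `for await`, as the slide describes.
async function consume(prompt: string): Promise<string[]> {
  const kinds: string[] = [];
  for await (const ev of queryLoop(prompt)) {
    kinds.push(ev.kind);
  }
  return kinds;
}
```

The generator pattern is what lets rendering begin before the turn finishes: each `yield` hands one event to the consumer immediately.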

1,729
Lines in query.ts — the heart of the system
async*
Generator — yields StreamEvent, Message, Terminal
Ctrl+C
Interrupt — aborts tools, preserves conversation

The Core Cycle

Every turn follows this exact sequence. The loop continues as long as the model returns tool_use blocks.

1
User sends message
Keyboard input or piped stdin. createUserMessage() wraps text in Anthropic format. Appended to conversation history (in-memory array). Attachments (images, PDFs) encoded as base64 content blocks.
2
Compaction phase
Before the API call, four strategies are checked in order: snip (drop old messages entirely), microcompact (clear tool result content, cache-aware), autocompact (Claude summarizes the conversation when tokens exceed the threshold), context collapse (fold multiple messages into one). See slide 06 for a deep dive.
3
System prompt assembled
Built from 10+ sources: role definition, tool schemas (with cache_control markers), CLAUDE.md files, MEMORY.md index, relevant memory files (Sonnet-selected), git context (branch, 5 commits, status ≤2000 chars), OS info, MCP instructions, hooks documentation, tips.
↓ streaming request
4
Stream to Claude API via SSE
Anthropic SDK sends BetaMessageStreamParams with system blocks (cache_control), messages, tools, thinking config, beta headers. Response tokens parsed incrementally, rendered to terminal in real-time. Fallback on 529: if FallbackTriggeredError, switch model and retry.
5
Tool execution — starts BEFORE stream ends
StreamingToolExecutor queues tool_use blocks as they arrive in the stream. Concurrent-safe tools (Read, Glob, Grep, etc.) start immediately in parallel (max 10). Non-concurrent tools (Bash, Edit, Write) wait for exclusive access.
Permission check → Pre-hooks → Execute → Post-hooks → Yield result
canUseTool() checks allow/deny rules. PreToolUse hooks can intercept. Tool runs. PostToolUse hooks can modify result. Result yielded as tool_result message.
↓ continuation decision
6
Continue, retry, or return
Decision tree based on stop_reason and error state (see below).
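The concurrency rule in step 5 can be sketched with two gates: a counting semaphore for concurrent-safe tools and a single-permit one for mutating tools (a simplified sketch; `Semaphore` and `runGated` are illustrative, and the real StreamingToolExecutor's exclusivity rules are stricter than shown here):

```typescript
// Minimal counting semaphore: acquire waits when no permits remain,
// release hands the permit to the next waiter if one is queued.
class Semaphore {
  private queue: Array<() => void> = [];
  constructor(private permits: number) {}
  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }
  release(): void {
    const next = this.queue.shift();
    if (next) next();
    else this.permits++;
  }
}

const parallel = new Semaphore(10); // concurrent-safe tools: up to 10 at once
const exclusive = new Semaphore(1); // Bash/Edit/Write: one at a time

// NOTE: this sketch only serializes mutating tools against each other;
// the real executor also waits for in-flight read-only tools to drain.
async function runGated<T>(safe: boolean, fn: () => Promise<T>): Promise<T> {
  const gate = safe ? parallel : exclusive;
  await gate.acquire();
  try {
    return await fn();
  } finally {
    gate.release();
  }
}
```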

Continuation Decision Tree

Condition | Action | Max
tool_use blocks present | Loop — run tools, send results, call API again | maxTurns
Prompt too long | Trigger autocompact → retry | 1
Max output tokens exceeded | Increase token limit → retry (floor 3000, safety buffer 1000) | 3
Media size error | Reactive compact → retry | 1
529 capacity error | Backoff → retry (or fallback model after 3) | 3
max_turns reached | Return { reason: 'max_turns_exceeded' } | —
end_turn (no tool_use) | Return { reason: 'end_turn' } | —
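The decision tree can be sketched as one function over stop state (a hedged sketch; the `Decision` type, error codes, and signature are illustrative, not the real query.ts API):

```typescript
// Possible outcomes of a turn, mirroring the table's three verbs.
type Decision =
  | { action: "loop" }
  | { action: "retry"; fix: string }
  | { action: "return"; reason: string };

function decide(
  hasToolUse: boolean,
  turns: number,
  maxTurns: number,
  error?: string
): Decision {
  // Recoverable errors trigger a fix-then-retry before anything else.
  if (error === "prompt_too_long") return { action: "retry", fix: "autocompact" };
  if (error === "max_output_tokens") return { action: "retry", fix: "raise_token_limit" };
  if (error === "media_size") return { action: "retry", fix: "reactive_compact" };
  if (error === "overloaded_529") return { action: "retry", fix: "backoff" };
  // Turn budget exhausted: stop even if the model wanted more tools.
  if (turns >= maxTurns) return { action: "return", reason: "max_turns_exceeded" };
  // Model requested tools: run them and call the API again.
  if (hasToolUse) return { action: "loop" };
  // Clean end_turn with no tool calls: the conversation turn is done.
  return { action: "return", reason: "end_turn" };
}
```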

Post-Sampling Hooks

After the model's final response (no more tool calls), background work fires:

A
Auto-compact check
If context approaching threshold (effectiveWindow - 13K buffer), schedules compaction for next turn.
B
Memory extraction (extractMemories)
Forked subagent analyzes the conversation and saves relevant memories. Fires every turn (gated by tengu_bramble_lintel, default: 1 = every turn).
C
Prompt suggestion + speculation
After 2+ turns, CHOMP generates next prompt suggestion. If speculation enabled (ANT-only), begins pre-computing the response. See slide 14.
D
Auto-dream (if eligible)
If ≥24h since last consolidation AND ≥5 sessions, autoDream fires for background memory consolidation. See slide 09.
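The gating checks in A and D reduce to two small predicates (constants mirror the slide; the helper names are assumptions, not the real query.ts functions):

```typescript
// A: schedule compaction once usage crosses (effectiveWindow - 13K buffer).
const AUTOCOMPACT_BUFFER = 13_000;

function shouldScheduleCompaction(
  usedTokens: number,
  effectiveWindow: number
): boolean {
  return usedTokens >= effectiveWindow - AUTOCOMPACT_BUFFER;
}

// D: auto-dream fires only after >=24h since the last consolidation
// AND at least 5 sessions.
const DAY_MS = 24 * 60 * 60 * 1000;

function isDreamEligible(
  msSinceLastConsolidation: number,
  sessionCount: number
): boolean {
  return msSinceLastConsolidation >= DAY_MS && sessionCount >= 5;
}
```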

Rendering Pipeline

How streaming tokens become terminal pixels.

React Components
389 component files
↓
Custom DOM
ink/dom.ts
↓
Yoga WASM
Flexbox layout
↓
Frame Buffer
Double-buffered
↓
Terminal
ANSI + mouse
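One common way a double-buffered frame stage avoids flicker is line diffing: compare the new frame against the previous one and re-emit only rows that changed (a hedged sketch of the general technique; the actual ink pipeline is not shown on this slide):

```typescript
// Compare two frames (arrays of terminal rows) and return only the rows
// that differ, keyed by row index. The renderer would then move the
// cursor to each changed row and rewrite just that line.
function diffFrames(prev: string[], next: string[]): Map<number, string> {
  const changed = new Map<number, string>();
  const rows = Math.max(prev.length, next.length);
  for (let row = 0; row < rows; row++) {
    if (prev[row] !== next[row]) {
      changed.set(row, next[row] ?? ""); // removed rows become blank
    }
  }
  return changed;
}
```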