The conversation loop is an async generator — the REPL consumes streaming events via for await. Tools start executing while the API response is still streaming. Post-sampling hooks run compaction, memory extraction, and optional dream mode.
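This loop shape can be sketched as a minimal async generator. All names here (runConversation, ApiResponse, the event shapes) are illustrative assumptions, not the real implementation:

```typescript
// Hypothetical sketch of the conversation loop as an async generator.
// The REPL consumes it with for await; the real loop streams from the API.
type Event =
  | { type: "text"; text: string }
  | { type: "tool_result"; output: string };

interface ApiResponse {
  text: string;
  toolUses: { name: string; input: unknown }[];
}

async function* runConversation(
  callApi: () => Promise<ApiResponse>,
  runTool: (name: string, input: unknown) => Promise<string>,
  maxTurns = 25, // illustrative default
): AsyncGenerator<Event> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await callApi();
    yield { type: "text", text: response.text };
    if (response.toolUses.length === 0) return; // end_turn: no tool_use blocks
    for (const tu of response.toolUses) {
      // In the real loop, results are fed back to the API on the next iteration.
      yield { type: "tool_result", output: await runTool(tu.name, tu.input) };
    }
  }
}
```

The consumer side is then just `for await (const event of runConversation(...)) render(event);`, which is why tool results can be rendered while later API turns are still pending.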
Every turn follows this exact sequence. The loop continues as long as the model returns tool_use blocks.
1. createUserMessage() wraps the text in Anthropic message format and appends it to the in-memory conversation history array. Attachments (images, PDFs) are encoded as base64 content blocks.
2. The API request is built as BetaMessageStreamParams: system blocks (with cache_control), messages, tools, thinking config, and beta headers.
3. Response tokens are parsed incrementally and rendered to the terminal in real time. Fallback on 529: on FallbackTriggeredError, switch model and retry.
4. StreamingToolExecutor queues tool_use blocks as they arrive in the stream. Concurrent-safe tools (Read, Glob, Grep, etc.) start immediately in parallel (max 10); non-concurrent tools (Bash, Edit, Write) wait for exclusive access.
5. canUseTool() checks allow/deny rules, PreToolUse hooks can intercept, the tool runs, PostToolUse hooks can modify the result, and the result is yielded as a tool_result message.

| Condition | Action | Max |
|---|---|---|
| tool_use blocks present | Loop — run tools, send results, call API again | maxTurns |
| Prompt too long | Trigger autocompact → retry | 1 |
| Max output tokens exceeded | Increase token limit → retry (floor 3000, safety buffer 1000) | 3 |
| Media size error | Reactive compact → retry | 1 |
| 529 capacity error | Backoff → retry (or fallback model after 3) | 3 |
| max_turns reached | Return { reason: 'max_turns_exceeded' } | — |
| end_turn (no tool_use) | Return { reason: 'end_turn' } | — |
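The retryable rows of this table amount to a small policy lookup. A hedged sketch, with condition keys and action names that are my own labels rather than the real code paths:

```typescript
// Illustrative retry policy mirroring the table above.
const RETRY_POLICY: Record<string, { action: string; maxRetries: number }> = {
  prompt_too_long: { action: "autocompact", maxRetries: 1 },
  max_output_tokens_exceeded: { action: "raise_token_limit", maxRetries: 3 },
  media_size_error: { action: "reactive_compact", maxRetries: 1 },
  capacity_529: { action: "backoff_then_fallback_model", maxRetries: 3 },
};

// Returns the recovery action to take, or null when the condition is not
// retryable or its retry budget is exhausted (the error then propagates).
function recoveryAction(condition: string, attempt: number): string | null {
  const policy = RETRY_POLICY[condition];
  if (!policy || attempt >= policy.maxRetries) return null;
  return policy.action;
}
```

The key property is that each condition carries its own retry budget, so a 529 storm cannot consume the single autocompact attempt and vice versa.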
After the model's final response (no more tool calls), background work fires: compaction checks, memory extraction, and optional dream mode. One of these passes is frequency-gated by a feature flag (tengu_bramble_lintel, default: 1 = every turn).

Next: how streaming tokens become terminal pixels.
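Returning to the tool-scheduling rule from the turn sequence above (concurrent-safe tools run in parallel up to 10; Bash, Edit, and Write need exclusive access): it can be sketched with a counting semaphore where an exclusive tool simply acquires every permit. This is an illustrative model, not the real StreamingToolExecutor:

```typescript
// Counting semaphore: concurrency-safe tools take 1 permit, exclusive
// tools take all permits, so they run only when nothing else is in flight.
class Semaphore {
  private queue: (() => void)[] = [];
  constructor(private permits: number) {}
  async acquire(n = 1): Promise<void> {
    while (this.permits < n) {
      // Park until a release; re-check, since releases wake all waiters.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.permits -= n;
  }
  release(n = 1): void {
    this.permits += n;
    for (const wake of this.queue.splice(0)) wake();
  }
}

const MAX_CONCURRENT = 10;
const sem = new Semaphore(MAX_CONCURRENT);

async function runTool(
  concurrencySafe: boolean,
  fn: () => Promise<string>,
): Promise<string> {
  const permits = concurrencySafe ? 1 : MAX_CONCURRENT; // exclusive = all permits
  await sem.acquire(permits);
  try {
    return await fn();
  } finally {
    sem.release(permits);
  }
}
```

Note this sketch can starve an exclusive tool if safe tools keep arriving; a production scheduler would add queue ordering on top.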