Speculation System: src/services/PromptSuggestion/speculation.ts (992 lines)

It thinks while you think.

While you're reading Claude's response, a background speculation system is already pre-computing the next turn. It uses an overlay filesystem so speculative work can't damage real files. When you accept the suggestion, the pre-computed result is injected instantly — saving seconds per turn.

992 lines: speculation.ts is a substantial system

Max turns per speculation session

100: max messages generated speculatively

/tmp: overlay FS, isolated write sandbox

How Speculation Works — Step by Step

The system triggers when a prompt suggestion appears (after 2+ assistant turns). It forks a background agent to pre-compute what would happen if the user accepts that suggestion — running real API calls against the real Claude model.

Prompt suggestion appears

After 2+ assistant turns, the CHOMP system (tengu_chomp_inflection) generates a suggested next prompt. Suppressed during: permission prompts, plan mode, rate limits, elicitation.

Speculation starts — overlay filesystem created

A unique overlay directory is created at ~/.claude/tmp/speculation/[PID]/[8-char-uuid]/. All file writes will be redirected here. A fresh AbortController (child of the main one) is created for instant cancellation.
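
A minimal sketch of this setup step, assuming Node.js. The helper names createOverlayDir and createChildAbortController are hypothetical, and the base directory is a parameter so the sketch can be tried without touching ~/.claude. The real code in speculation.ts may be organized differently.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Hypothetical helper: creates <base>/<PID>/<8-char-uuid>/ and returns its path.
function createOverlayDir(
  base: string = path.join(os.homedir(), ".claude", "tmp", "speculation"),
): string {
  const id = randomUUID().slice(0, 8); // first 8 characters of a UUID
  const overlay = path.join(base, String(process.pid), id);
  fs.mkdirSync(overlay, { recursive: true }); // same effect as mkdir -p
  return overlay;
}

// Hypothetical helper: a controller that is aborted whenever its parent is.
// Aborting the child does not affect the parent.
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();
  if (parent.signal.aborted) {
    child.abort();
  } else {
    parent.signal.addEventListener("abort", () => child.abort(), { once: true });
  }
  return child;
}
```

Linking the child to the parent means that cancelling the main request also cancels speculation, while cancelling speculation alone leaves the main request untouched.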

Forked agent runs — real API calls

The system calls runForkedAgent() with the suggestion as the prompt. These are actual Claude API calls, not mocks, so they cost real tokens. The agent inherits the current conversation context via cache-safe params so that its requests can hit the prompt cache.

↓ tool interception

Tool calls intercepted by permission layer

Every tool call passes through a custom canUseTool interceptor that enforces speculation-safe constraints (see next section).

Read-only tools → allowed directly

Read, Glob, Grep, LSP, ToolSearch, TaskGet, and TaskList run against the real filesystem.

Write tools → redirected to overlay

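For Edit, Write, and NotebookEdit, the original file is first copied into the overlay (copy-on-write), and the write then targets the overlay path. Subsequent reads of that file are also redirected to the overlay. In permission modes that do not auto-approve edits, the write is not performed and speculation stops at an edit boundary instead.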

Bash → read-only check, stop at a boundary if not safe

checkReadOnlyConstraints() validates that the command is read-only (for example ls, cat, git log). If it is not, speculation pauses at a bash boundary.

All other tools → denied

WebSearch, WebFetch, Agent, MCP, etc.: speculation stops at a denied_tool boundary.
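
The four routing rules above can be sketched as a single decision function. This is an illustration only: decideToolCall, ToolDecision, and ToolCallContext are hypothetical names, the real canUseTool interceptor has a signature not shown here, and isReadOnlyCommand stands in for checkReadOnlyConstraints().

```typescript
type ToolDecision =
  | { kind: "allow" }
  | { kind: "redirect_to_overlay" }
  | { kind: "boundary"; boundary: "bash" | "edit" | "denied_tool" };

const READ_ONLY_TOOLS = new Set([
  "Read", "Glob", "Grep", "LSP", "ToolSearch", "TaskGet", "TaskList",
]);
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);

interface ToolCallContext {
  // Assumption: derived from the user's permission mode.
  editsAllowed: boolean;
  // Stand-in for checkReadOnlyConstraints(); its real signature is not shown.
  isReadOnlyCommand: (command: string) => boolean;
}

function decideToolCall(
  toolName: string,
  input: { command?: string },
  ctx: ToolCallContext,
): ToolDecision {
  if (READ_ONLY_TOOLS.has(toolName)) return { kind: "allow" };
  if (WRITE_TOOLS.has(toolName)) {
    return ctx.editsAllowed
      ? { kind: "redirect_to_overlay" }
      : { kind: "boundary", boundary: "edit" };
  }
  if (toolName === "Bash") {
    return ctx.isReadOnlyCommand(input.command ?? "")
      ? { kind: "allow" }
      : { kind: "boundary", boundary: "bash" };
  }
  // Everything else (WebSearch, WebFetch, Agent, MCP, ...) is denied.
  return { kind: "boundary", boundary: "denied_tool" };
}
```

Falling through to denied_tool by default means a newly added tool is excluded from speculation until someone explicitly classifies it.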

↓ completion or boundary

Speculation reaches a boundary or completes

Four possible outcomes tracked in CompletionBoundary: complete (full response with token count), bash (paused at unsafe command), edit (paused at file write needing permission), denied_tool (unsupported tool encountered).
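
One way to model these four outcomes is a discriminated union. The four type tags come from the text above. The payload fields (outputTokens, command, filePath, toolName) and both helper functions are assumptions made for illustration.

```typescript
// Sketch of CompletionBoundary; only the `type` tags are taken from the text.
type CompletionBoundary =
  | { type: "complete"; outputTokens: number }
  | { type: "bash"; command: string }
  | { type: "edit"; filePath: string }
  | { type: "denied_tool"; toolName: string };

// Hypothetical helper: human-readable summary of why speculation stopped.
function describeBoundary(b: CompletionBoundary): string {
  switch (b.type) {
    case "complete":
      return `completed with ${b.outputTokens} tokens`;
    case "bash":
      return `paused at unsafe command: ${b.command}`;
    case "edit":
      return `paused at file write needing permission: ${b.filePath}`;
    case "denied_tool":
      return `stopped at unsupported tool: ${b.toolName}`;
  }
}

// Hypothetical helper: only a fully completed speculation avoids a further
// API call; a null boundary is assumed to mean it was still running.
function needsNormalQueryLoop(b: CompletionBoundary | null): boolean {
  return b === null || b.type !== "complete";
}
```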

Overlay Filesystem — Copy-on-Write Isolation

The overlay filesystem is the key safety mechanism: all speculative writes go to an isolated temp directory. If speculation is aborted, the overlay is deleted and the real files are never touched.

Write Interception

When a write tool (Edit, Write, NotebookEdit) targets src/foo.ts:

Check if file already in overlay's writtenPaths set
If not: copy original from cwd/src/foo.ts to overlay/src/foo.ts
Add to writtenPaths set
Rewrite input path to point to overlay copy
Tool writes to overlay — real file untouched

Read Redirection

When a read tool targets a file that was previously written:

Check writtenPaths set for the relative path
If present: rewrite read path to overlay copy
If not: read from real filesystem (unchanged)

This means speculative edits are self-consistent — a Read after a speculative Write sees the speculated content, not the original.
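
The write interception and read redirection steps can be sketched with a shared writtenPaths set. OverlayState, redirectWrite, and redirectRead are hypothetical names. The sketch assumes Node.js and does not handle paths outside cwd.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface OverlayState {
  cwd: string;               // real working directory
  overlayDir: string;        // isolated overlay root
  writtenPaths: Set<string>; // relative paths already copied into the overlay
}

function toRelative(state: OverlayState, filePath: string): string {
  return path.relative(state.cwd, path.resolve(state.cwd, filePath));
}

// Returns the path a write tool should target instead of the real file.
function redirectWrite(state: OverlayState, filePath: string): string {
  const rel = toRelative(state, filePath);
  const overlayPath = path.join(state.overlayDir, rel);
  if (!state.writtenPaths.has(rel)) {
    const original = path.join(state.cwd, rel);
    fs.mkdirSync(path.dirname(overlayPath), { recursive: true });
    // Copy-on-write: seed the overlay with the current real content, if any.
    if (fs.existsSync(original)) fs.copyFileSync(original, overlayPath);
    state.writtenPaths.add(rel);
  }
  return overlayPath;
}

// Returns the overlay copy for files written earlier, else the real path.
function redirectRead(state: OverlayState, filePath: string): string {
  const rel = toRelative(state, filePath);
  return state.writtenPaths.has(rel)
    ? path.join(state.overlayDir, rel)
    : path.join(state.cwd, rel);
}
```

Keying the set by relative path lets the same lookup serve both directions: the write path adds entries, and the read path consults them.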

Overlay path: ~/.claude/tmp/speculation/[PID]/[8-char-uuid]/. The directory is created with mkdir -p when speculation starts and is removed on cleanup, with 3 retries and a 100 ms delay each.
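
The source does not say which API performs the cleanup. As one possible expression of "3 retries, 100 ms delay", Node's fs.rmSync accepts maxRetries and retryDelay options. The function name removeOverlay is hypothetical.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: delete the overlay tree, retrying on transient errors.
// maxRetries/retryDelay apply to recursive removal; force ignores a missing dir.
function removeOverlay(overlayDir: string): void {
  fs.rmSync(overlayDir, {
    recursive: true,
    force: true,
    maxRetries: 3,
    retryDelay: 100, // milliseconds between attempts
  });
}
```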

Accept vs Abort

User accepts suggestion

Abort speculation (if still running)
Copy modified files from overlay → real filesystem
Delete overlay directory
Filter speculated messages: strip thinking blocks, orphaned tool calls, interrupt messages
Inject messages into conversation — appears instant to user
Extract file reads from speculated messages → merge into file cache
If speculation completed fully: no further API call needed
If speculation hit boundary: continue with normal query loop
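
The message filtering step in the accept path might look like the sketch below. The Block and Message shapes are simplified assumptions, not the real message types. The isInterrupt flag is a hypothetical marker for interrupt messages, and an "orphaned" tool call is taken to mean a tool_use block with no matching tool_result.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
  isInterrupt?: boolean; // hypothetical marker
}

function filterSpeculatedMessages(messages: Message[]): Message[] {
  // Collect the ids of tool calls that actually received a result.
  const resolved = new Set<string>();
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_result") resolved.add(b.tool_use_id);
    }
  }
  return messages
    .filter((m) => !m.isInterrupt)
    .map((m) => ({
      ...m,
      content: m.content.filter(
        (b) =>
          b.type !== "thinking" &&
          !(b.type === "tool_use" && !resolved.has(b.id)),
      ),
    }))
    .filter((m) => m.content.length > 0); // drop messages left empty
}
```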

User types something else

abortSpeculation() fires immediately
AbortController signals forked agent to stop
Delete overlay directory — all speculative changes gone
Log analytics event: tengu_speculation with outcome "aborted"
State → idle. Zero impact on conversation.

Pipelined Speculation — Two Steps Ahead

When the first speculation completes fully (no boundary hit), the system immediately starts a second speculation for the NEXT turn — pre-computing two steps ahead.

Speculation 1 — user's current suggestion

↓ completes (boundary.type = 'complete')

Pipelined Speculation 2 — next suggestion, runs in background while user decides on Spec 1

↓ user accepts Spec 1

Spec 2 result already ready — inject immediately for the turn after
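
The pipelining trigger can be sketched as a small predicate. FinishedSpeculation and shouldStartPipelined are hypothetical names. The sketch assumes that a pipelined speculation does not itself start another one (keeping depth at two steps) and that a next suggestion must already be available. Neither assumption is confirmed by the source.

```typescript
interface FinishedSpeculation {
  isPipelined: boolean;
  boundary: { type: "complete" | "bash" | "edit" | "denied_tool" } | null;
}

// Start a second speculation only after a full completion of the first.
function shouldStartPipelined(
  spec: FinishedSpeculation,
  nextSuggestion: string | null,
): boolean {
  return (
    spec.boundary?.type === "complete" && // no boundary was hit
    !spec.isPipelined &&                  // assumption: at most two steps ahead
    nextSuggestion !== null               // assumption: needs a next suggestion
  );
}
```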

Cost implication: Speculative API calls are real API calls counted against usage. This is not free — it trades token cost for latency reduction. Anthropic internal users see feedback: "Speculated 3 tool uses · 1,247 tokens · +2.3s saved (4.7s this session)"
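
As an illustration of how the feedback line quoted above could be assembled from raw counters, here is a small formatter. The function name, its parameters, and the singular/plural handling are assumptions. Only the output format is taken from the quoted example.

```typescript
// Hypothetical formatter for the internal feedback line.
function formatSpeculationFeedback(
  toolUses: number,
  tokens: number,
  savedMs: number,
  sessionSavedMs: number,
): string {
  const secs = (ms: number): string => (ms / 1000).toFixed(1);
  const uses = `${toolUses} tool ${toolUses === 1 ? "use" : "uses"}`;
  const tokenText = tokens.toLocaleString("en-US"); // thousands separators
  return `Speculated ${uses} · ${tokenText} tokens · +${secs(savedMs)}s saved (${secs(sessionSavedMs)}s this session)`;
}
```

For example, formatSpeculationFeedback(3, 1247, 2300, 4700) reproduces the string quoted above.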

Permission Mode Interaction

Speculation's write behavior depends on the user's permission mode.

| Mode | Write tools during speculation | Effect |
| --- | --- | --- |
| `acceptEdits` | Allowed (via overlay) | Full speculation with file edits |
| `bypassPermissions` | Allowed (via overlay) | Full speculation with file edits |
| `default` | Blocked (hits boundary) | Speculation pauses at first edit, user must approve |
| `plan` | Blocked (hits boundary) | Speculation pauses at first edit |

Currently ANT-only: isSpeculationEnabled() checks process.env.USER_TYPE === 'ant'. External users don't have speculation yet. Toggleable via speculationEnabled in global config.
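
The mode table and the enablement check can be sketched as two predicates. speculativeWritesAllowed and isSpeculationEnabledSketch are hypothetical names. How the real isSpeculationEnabled() combines the USER_TYPE check with the speculationEnabled config flag is not stated, so the logical AND below is an assumption.

```typescript
type PermissionMode = "acceptEdits" | "bypassPermissions" | "default" | "plan";

// Per the table: only these two modes let write tools proceed via the overlay.
function speculativeWritesAllowed(mode: PermissionMode): boolean {
  return mode === "acceptEdits" || mode === "bypassPermissions";
}

// Assumption: both the environment check and the config toggle must pass.
function isSpeculationEnabledSketch(
  env: Record<string, string | undefined>,
  config: { speculationEnabled: boolean },
): boolean {
  return env["USER_TYPE"] === "ant" && config.speculationEnabled;
}
```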

State Machine

idle — no speculation running

↓ prompt suggestion + startSpeculation()

active — id, abort(), messagesRef, writtenPathsRef, boundary, startTime, toolUseCount, isPipelined, contextRef

↓ user types: abortSpeculation() → cleanup → idle

↓ user accepts: acceptSpeculation() → copy files → inject messages → idle

Active State Fields

| Field | Type | Purpose |
| --- | --- | --- |
| `id` | string (8 chars) | Unique speculation session ID |
| `abort` | () => void | Kill switch: signals the forked agent to stop |
| `messagesRef` | Mutable ref | Accumulates generated messages in real time |
| `writtenPathsRef` | Set<string> | Relative paths written to overlay |
| `boundary` | CompletionBoundary \| null | What stopped speculation (complete, bash, edit, denied_tool) |
| `isPipelined` | boolean | Second-order speculation flag |
| `pipelinedSuggestion` | object \| null | Next-turn suggestion for pipelined spec |
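
The idle/active state machine and its fields can be modeled as a discriminated union. This is a sketch under assumptions: the ref shape ({ current: ... }), the string message type, and the transition helpers toActive, onUserTyped, and onUserAccepted are all hypothetical. contextRef and pipelinedSuggestion are omitted because their shapes are not described. File copying and overlay cleanup are left out.

```typescript
type BoundaryType = "complete" | "bash" | "edit" | "denied_tool";

interface ActiveSpeculation {
  status: "active";
  id: string;
  abort: () => void;
  messagesRef: { current: string[] };        // assumed ref shape and message type
  writtenPathsRef: { current: Set<string> }; // assumed ref shape
  boundary: { type: BoundaryType } | null;
  startTime: number;
  toolUseCount: number;
  isPipelined: boolean;
}

type SpeculationState = { status: "idle" } | ActiveSpeculation;

// idle → active
function toActive(id: string, controller: AbortController, now: number = Date.now()): ActiveSpeculation {
  return {
    status: "active",
    id,
    abort: () => controller.abort(),
    messagesRef: { current: [] },
    writtenPathsRef: { current: new Set<string>() },
    boundary: null,
    startTime: now,
    toolUseCount: 0,
    isPipelined: false,
  };
}

// active → idle when the user types something else
function onUserTyped(state: SpeculationState): SpeculationState {
  if (state.status === "active") state.abort();
  return { status: "idle" };
}

// active → idle when the user accepts; returns the messages to inject
function onUserAccepted(state: SpeculationState): { next: SpeculationState; injected: string[] } {
  if (state.status !== "active") return { next: state, injected: [] };
  state.abort(); // stop the fork if it is still running
  return { next: { status: "idle" }, injected: [...state.messagesRef.current] };
}
```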

Analytics

Every speculation logs a tengu_speculation event with outcome, duration, tools executed, boundary type, and message count.

| Metric | Description |
| --- | --- |
| `speculation_id` | 8-char UUID |
| `outcome` | accepted / aborted / error |
| `duration_ms` | How long speculation ran |
| `tools_executed` | Count of successful tool calls |
| `boundary_type` | complete / bash / edit / denied_tool / null |
| `is_pipelined` | Whether this was a second-order speculation |
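
A sketch of how the tengu_speculation payload could be typed and built from the active state. The field names come from the table above. SpeculationSnapshot and buildSpeculationEvent are hypothetical, and mapping toolUseCount to tools_executed is an assumption. The message count mentioned in the text is omitted because its key is not listed.

```typescript
type BoundaryType = "complete" | "bash" | "edit" | "denied_tool";

interface SpeculationEvent {
  speculation_id: string;
  outcome: "accepted" | "aborted" | "error";
  duration_ms: number;
  tools_executed: number;
  boundary_type: BoundaryType | null;
  is_pipelined: boolean;
}

// Hypothetical subset of the active state needed for analytics.
interface SpeculationSnapshot {
  id: string;
  startTime: number;
  toolUseCount: number;
  boundary: { type: BoundaryType } | null;
  isPipelined: boolean;
}

function buildSpeculationEvent(
  spec: SpeculationSnapshot,
  outcome: SpeculationEvent["outcome"],
  now: number = Date.now(),
): SpeculationEvent {
  return {
    speculation_id: spec.id,
    outcome,
    duration_ms: now - spec.startTime,
    tools_executed: spec.toolUseCount, // assumption: counts successful calls
    boundary_type: spec.boundary?.type ?? null,
    is_pipelined: spec.isPipelined,
  };
}
```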