Speculation System
src/services/PromptSuggestion/speculation.ts (992 lines)

It thinks while you think.

While you're reading Claude's response, a background speculation system is already pre-computing the next turn. It uses an overlay filesystem so speculative work can't damage real files. When you accept the suggestion, the pre-computed result is injected instantly — saving seconds per turn.

992 lines: speculation.ts is a substantial system
20 max turns per speculation session
100 max messages generated speculatively
Overlay FS under ~/.claude/tmp: isolated write sandbox

How Speculation Works — Step by Step

The system triggers when a prompt suggestion appears (after 2+ assistant turns). It forks a background agent to pre-compute what would happen if the user accepts that suggestion — running real API calls against the real Claude model.

  1. Prompt suggestion appears
     After 2+ assistant turns, the CHOMP system (tengu_chomp_inflection) generates a suggested next prompt. Suggestions are suppressed during permission prompts, plan mode, rate limits, and elicitation.
  2. Speculation starts: overlay filesystem created
     A unique overlay directory is created at ~/.claude/tmp/speculation/[PID]/[8-char-uuid]/. All file writes are redirected here. A fresh AbortController (child of the main one) is created for instant cancellation.
  3. Forked agent runs: real API calls
     Calls runForkedAgent() with the suggestion as the prompt. This makes actual Claude API calls (not mocks) and costs real tokens. The agent inherits the current conversation context via cache-safe params, so requests can hit the prompt cache.
     ↓ tool interception
  4. Tool calls intercepted by permission layer
     Every tool call passes through a custom canUseTool interceptor that enforces speculation-safe constraints:
     Read-only tools → allowed directly. Read, Glob, Grep, LSP, ToolSearch, TaskGet, and TaskList run against the real filesystem.
     Write tools → redirected to overlay. For Edit, Write, and NotebookEdit, the original file is first copied to the overlay (copy-on-write), then the write targets the overlay path. Subsequent reads of that file also redirect to the overlay.
     Bash → read-only check, abort if not safe. checkReadOnlyConstraints() validates that the command is safe (such as ls, cat, git log). If not, speculation pauses at a bash boundary.
     All other tools → denied. WebSearch, WebFetch, Agent, MCP, etc.: speculation aborts at a denied_tool boundary.
     ↓ completion or boundary
  5. Speculation reaches a boundary or completes
     Four possible outcomes are tracked in CompletionBoundary: complete (full response with token count), bash (paused at unsafe command), edit (paused at file write needing permission), denied_tool (unsupported tool encountered).
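
The routing in step 4 can be sketched as a single decision function. This is a minimal illustration, not the actual canUseTool interceptor: the tool names and boundary kinds come from the steps above, while `routeToolCall`, `Decision`, `looksReadOnly`, and the fields carried on each boundary (other than the token count on `complete`) are hypothetical names chosen for the example. The `canWrite` flag stands in for whatever permission check decides whether edits may proceed.

```typescript
// Boundary kinds come from the text; the payload fields are assumptions.
type CompletionBoundary =
  | { type: "complete"; outputTokens: number }
  | { type: "bash"; command: string }
  | { type: "edit"; toolName: string; filePath: string }
  | { type: "denied_tool"; toolName: string };

// Hypothetical result of routing one tool call.
type Decision =
  | { kind: "allow" }
  | { kind: "redirect_to_overlay" }
  | { kind: "stop"; boundary: CompletionBoundary };

const READ_ONLY_TOOLS = new Set([
  "Read", "Glob", "Grep", "LSP", "ToolSearch", "TaskGet", "TaskList",
]);
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);

// Stand-in for checkReadOnlyConstraints(): a deliberately tiny allowlist.
// The real check is certainly more thorough than this.
function looksReadOnly(command: string): boolean {
  const trimmed = command.trim();
  // Reject anything that chains, pipes, redirects, or substitutes.
  if (/[;&|><`$]/.test(trimmed)) return false;
  const allowed = ["ls", "cat", "git log"];
  return allowed.some((p) => trimmed === p || trimmed.startsWith(p + " "));
}

function routeToolCall(
  toolName: string,
  input: { command?: string; file_path?: string },
  canWrite: boolean,
): Decision {
  if (READ_ONLY_TOOLS.has(toolName)) return { kind: "allow" };

  if (WRITE_TOOLS.has(toolName)) {
    if (canWrite) return { kind: "redirect_to_overlay" };
    const filePath = input.file_path ?? "";
    return { kind: "stop", boundary: { type: "edit", toolName, filePath } };
  }

  if (toolName === "Bash") {
    const command = input.command ?? "";
    return looksReadOnly(command)
      ? { kind: "allow" }
      : { kind: "stop", boundary: { type: "bash", command } };
  }

  // Everything else (WebSearch, WebFetch, Agent, MCP, ...) ends speculation.
  return { kind: "stop", boundary: { type: "denied_tool", toolName } };
}
```

Returning a boundary value instead of throwing keeps the "why did speculation stop" information available to whatever later decides how to resume.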

Overlay Filesystem — Copy-on-Write Isolation

The overlay is the key safety mechanism: all speculative writes go to an isolated temp directory. If speculation is aborted, the overlay is deleted, with zero impact on real files.

Write Interception

When a write tool (Edit, Write, NotebookEdit) targets src/foo.ts:

  1. Check if file already in overlay's writtenPaths set
  2. If not: copy original from cwd/src/foo.ts to overlay/src/foo.ts
  3. Add to writtenPaths set
  4. Rewrite input path to point to overlay copy
  5. Tool writes to overlay — real file untouched

Read Redirection

When a read tool targets a file that was previously written:

  1. Check writtenPaths set for the relative path
  2. If present: rewrite read path to overlay copy
  3. If not: read from real filesystem (unchanged)

This means speculative edits are self-consistent — a Read after a speculative Write sees the speculated content, not the original.
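
The write-interception and read-redirection steps above can be sketched as a small class. This is an illustration under assumptions, not the code in speculation.ts: `SpeculationOverlay`, `redirectWrite`, and `redirectRead` are hypothetical names, synchronous fs calls are used for brevity, and skipping the copy when the original file does not exist (a brand-new file) is a guess at how that case might be handled.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

class SpeculationOverlay {
  // Relative paths that have been written during this speculation.
  readonly writtenPaths = new Set<string>();
  private readonly cwd: string;
  private readonly overlayDir: string;

  constructor(cwd: string, overlayDir: string) {
    this.cwd = cwd;
    this.overlayDir = overlayDir;
  }

  // Returns the overlay path a write tool should target instead of the real file.
  redirectWrite(filePath: string): string {
    const rel = this.toRelative(filePath);
    const overlayPath = path.join(this.overlayDir, rel);
    if (!this.writtenPaths.has(rel)) {
      fs.mkdirSync(path.dirname(overlayPath), { recursive: true });
      const original = path.join(this.cwd, rel);
      // Copy-on-write: seed the overlay with the original content, if any.
      if (fs.existsSync(original)) fs.copyFileSync(original, overlayPath);
      this.writtenPaths.add(rel);
    }
    return overlayPath;
  }

  // Reads see the overlay copy only for files already written speculatively.
  redirectRead(filePath: string): string {
    const rel = this.toRelative(filePath);
    return this.writtenPaths.has(rel)
      ? path.join(this.overlayDir, rel)
      : path.join(this.cwd, rel);
  }

  private toRelative(filePath: string): string {
    const rel = path.relative(this.cwd, path.resolve(this.cwd, filePath));
    if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
      throw new Error(`path is outside the working directory: ${filePath}`);
    }
    return rel;
  }
}

// Usage: a speculative write leaves the real file untouched.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "overlay-demo-"));
const cwd = path.join(root, "project");
fs.mkdirSync(path.join(cwd, "src"), { recursive: true });
fs.writeFileSync(path.join(cwd, "src", "foo.ts"), "original");

const overlay = new SpeculationOverlay(cwd, path.join(root, "overlay"));
fs.writeFileSync(overlay.redirectWrite("src/foo.ts"), "speculated");

const seenBySpeculation = fs.readFileSync(overlay.redirectRead("src/foo.ts"), "utf8");
const onDisk = fs.readFileSync(path.join(cwd, "src", "foo.ts"), "utf8");
```

Keying the set on relative paths means the same entry resolves against either root, so one lookup serves both the write and the read side.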

Overlay path: ~/.claude/tmp/speculation/[PID]/[8-char-uuid]/. The directory is created with mkdir -p at speculation start. Cleanup is attempted with 3 retries and a 100 ms delay between attempts.
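
A rough sketch of creating and removing such a directory with Node's built-in fs module follows. The function names are hypothetical, and taking the first 8 characters of a random UUID is only an assumption about how the ID is derived. Note one difference from the description above: the `maxRetries` and `retryDelay` options of `fs.rmSync` apply a linear backoff rather than a fixed delay, so this approximates the retry behavior rather than reproducing it.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Creates <baseDir>/speculation/<pid>/<8-char-id>/ (the equivalent of mkdir -p).
function createOverlayDir(baseDir: string): string {
  const id = randomUUID().slice(0, 8); // assumption: ID is a UUID prefix
  const dir = path.join(baseDir, "speculation", String(process.pid), id);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Removes the overlay, retrying on transient errors such as EBUSY.
function removeOverlayDir(dir: string): void {
  fs.rmSync(dir, {
    recursive: true,
    force: true, // ignore "already gone"
    maxRetries: 3,
    retryDelay: 100,
  });
}

// Usage: a throwaway base directory stands in for ~/.claude/tmp.
const base = fs.mkdtempSync(path.join(os.tmpdir(), "claude-tmp-"));
const overlayDir = createOverlayDir(base);
removeOverlayDir(overlayDir);
```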

Accept vs Abort

User accepts suggestion

  1. Abort speculation (if still running)
  2. Copy modified files from overlay → real filesystem
  3. Delete overlay directory
  4. Filter speculated messages: strip thinking blocks, orphaned tool calls, interrupt messages
  5. Inject messages into conversation — appears instant to user
  6. Extract file reads from speculated messages → merge into file cache
  7. If speculation completed fully: no further API call needed
  8. If speculation hit boundary: continue with normal query loop
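
Step 4 of the accept path (filtering speculated messages) might look roughly like the sketch below. The `Message` and `Block` shapes are simplified assumptions modeled on common content-block conventions, not the internal types, and `filterSpeculatedMessages` is a hypothetical name. Interrupt messages are left out because the text does not say how they are represented.

```typescript
// Simplified, assumed message shapes for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

type Message = { role: "user" | "assistant"; content: Block[] };

function filterSpeculatedMessages(messages: Message[]): Message[] {
  // A tool call is "orphaned" when no tool_result ever answered it,
  // for example because speculation was aborted mid-call.
  const answered = new Set<string>();
  for (const message of messages) {
    for (const block of message.content) {
      if (block.type === "tool_result") answered.add(block.tool_use_id);
    }
  }

  return messages
    .map((message) => ({
      ...message,
      content: message.content.filter(
        (block) =>
          block.type !== "thinking" &&
          (block.type !== "tool_use" || answered.has(block.id)),
      ),
    }))
    // Drop messages that became empty after filtering.
    .filter((message) => message.content.length > 0);
}
```

Collecting the answered IDs in a first pass keeps the filter independent of message order.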

User types something else

  1. abortSpeculation() fires immediately
  2. AbortController signals forked agent to stop
  3. Delete overlay directory — all speculative changes gone
  4. Log analytics event: tengu_speculation with outcome "aborted"
  5. State → idle. Zero impact on conversation.
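
The child AbortController mentioned earlier is what makes this cancellation cheap. A minimal sketch of that pattern, using only the standard AbortController API, is shown here; `createChildAbortController` is a hypothetical helper name. Aborting the parent aborts the child, while aborting the child leaves the parent (and the main conversation) running.

```typescript
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();

  if (parent.signal.aborted) {
    // Parent is already cancelled: the child starts out cancelled too.
    child.abort(parent.signal.reason);
    return child;
  }

  const onParentAbort = (): void => child.abort(parent.signal.reason);
  parent.signal.addEventListener("abort", onParentAbort, { once: true });

  // If the child is aborted first, stop listening to the parent so the
  // listener does not accumulate across many speculation sessions.
  child.signal.addEventListener(
    "abort",
    () => parent.signal.removeEventListener("abort", onParentAbort),
    { once: true },
  );

  return child;
}

// Usage: cancelling a speculation does not cancel the main controller.
const mainController = new AbortController();
const speculationController = createChildAbortController(mainController);
speculationController.abort();
```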

Pipelined Speculation — Two Steps Ahead

When the first speculation completes fully (no boundary hit), the system immediately starts a second speculation for the NEXT turn — pre-computing two steps ahead.

Speculation 1 — user's current suggestion
↓ completes (boundary.type = 'complete')
Pipelined Speculation 2 — next suggestion, runs in background while user decides on Spec 1
↓ user accepts Spec 1
Spec 2 result is already ready: inject immediately for the turn after

Cost implication: Speculative API calls are real API calls counted against usage. This is not free; it trades token cost for latency reduction. Anthropic internal users see feedback such as: "Speculated 3 tool uses · 1,247 tokens · +2.3s saved (4.7s this session)"
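
The decision to pipeline can be sketched as a pure function. All names here are hypothetical, and the rule that a pipelined speculation does not itself start a third one is an assumption inferred from the "two steps ahead" description, not something the text states outright.

```typescript
type BoundaryType = "complete" | "bash" | "edit" | "denied_tool";

interface FinishedSpeculation {
  boundaryType: BoundaryType;
  isPipelined: boolean;
}

interface PipelinedRequest {
  suggestion: string;
  isPipelined: true;
}

// Returns the follow-up speculation to start, or null if none should run.
function planPipelinedSpeculation(
  finished: FinishedSpeculation,
  nextSuggestion: string | null,
): PipelinedRequest | null {
  // Only a fully completed speculation has a known end state to build on.
  if (finished.boundaryType !== "complete") return null;
  // Assumption: cap the look-ahead at two steps.
  if (finished.isPipelined) return null;
  if (nextSuggestion === null) return null;
  return { suggestion: nextSuggestion, isPipelined: true };
}
```

Capping the depth also bounds the token cost described above, since every extra step is another set of real API calls that may be thrown away.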

Permission Mode Interaction

Speculation's write behavior depends on the user's permission mode.

Mode | Write tools during speculation | Effect
acceptEdits | Allowed (via overlay) | Full speculation with file edits
bypassPermissions | Allowed (via overlay) | Full speculation with file edits
default | Blocked (hits boundary) | Speculation pauses at first edit, user must approve
plan | Blocked (hits boundary) | Speculation pauses at first edit

Currently ANT-only: isSpeculationEnabled() checks process.env.USER_TYPE === 'ant'. External users don't have speculation yet. Toggleable via speculationEnabled in global config.
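
The mode table and the enablement gate reduce to two small predicates. In this sketch the mode names and the USER_TYPE check come from the text; the function names and signatures are hypothetical, and treating an unset speculationEnabled as "on" is an assumption, since the default is not stated.

```typescript
type PermissionMode = "acceptEdits" | "bypassPermissions" | "default" | "plan";

// True when write tools may proceed (into the overlay) during speculation.
function canSpeculateWrites(mode: PermissionMode): boolean {
  return mode === "acceptEdits" || mode === "bypassPermissions";
}

// Hypothetical, parameterized variant of the isSpeculationEnabled() check.
function speculationEnabledFor(
  env: Record<string, string | undefined>,
  config: { speculationEnabled?: boolean },
): boolean {
  if (env.USER_TYPE !== "ant") return false;
  // Assumption: the config flag defaults to enabled when absent.
  return config.speculationEnabled !== false;
}
```

Passing the environment and config in as arguments keeps both predicates easy to test without touching process.env.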

State Machine

idle — no speculation running
↓ prompt suggestion + startSpeculation()
active — id, abort(), messagesRef, writtenPathsRef, boundary, startTime, toolUseCount, isPipelined, contextRef
↓ user types
abortSpeculation() → cleanup → idle
↓ user accepts
acceptSpeculation() → copy files → inject messages → idle

Active State Fields

Field | Type | Purpose
id | string (8 chars) | Unique speculation session ID
abort | () => void | Kill switch: signals forked agent to stop
messagesRef | Mutable ref | Accumulates generated messages in real-time
writtenPathsRef | Set<string> | Relative paths written to overlay
boundary | CompletionBoundary \| null | What stopped speculation (complete, bash, edit, denied_tool)
isPipelined | boolean | Second-order speculation flag
pipelinedSuggestion | object \| null | Next-turn suggestion for pipelined spec
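
The state machine and its active fields map naturally onto a discriminated union. The sketch below is an approximation: the field names follow the lists above, but the `status` discriminant, the `{ current: ... }` ref shape, and the function signatures are assumptions, and contextRef is omitted because its purpose is not described.

```typescript
type CompletionBoundary = { type: "complete" | "bash" | "edit" | "denied_tool" };

interface ActiveSpeculation {
  status: "active"; // hypothetical discriminant
  id: string;
  abort: () => void;
  messagesRef: { current: unknown[] };
  writtenPathsRef: { current: Set<string> };
  boundary: CompletionBoundary | null;
  startTime: number;
  toolUseCount: number;
  isPipelined: boolean;
  pipelinedSuggestion: object | null;
}

type SpeculationState = { status: "idle" } | ActiveSpeculation;

// idle -> active
function startSpeculation(
  id: string,
  abort: () => void,
  isPipelined = false,
): ActiveSpeculation {
  return {
    status: "active",
    id,
    abort,
    messagesRef: { current: [] },
    writtenPathsRef: { current: new Set<string>() },
    boundary: null,
    startTime: Date.now(),
    toolUseCount: 0,
    isPipelined,
    pipelinedSuggestion: null,
  };
}

// active -> idle (user typed something else). Overlay cleanup is omitted here.
function abortSpeculation(state: SpeculationState): SpeculationState {
  if (state.status === "active") state.abort();
  return { status: "idle" };
}
```

With a union like this, code that touches `abort` or `writtenPathsRef` has to check for the active state first, so the idle state cannot be mistaken for a running speculation.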

Analytics

Every speculation logs a tengu_speculation event with outcome, duration, tools executed, boundary type, and message count.

Metric | Description
speculation_id | 8-char UUID
outcome | accepted / aborted / error
duration_ms | How long speculation ran
tools_executed | Count of successful tool calls
boundary_type | complete / bash / edit / denied_tool / null
is_pipelined | Whether this was a second-order speculation
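
As an illustration, the event payload could be typed and assembled as follows. The metric names come from the table above; `buildSpeculationEvent` and its input shape are hypothetical, and mapping a tool-use counter straight onto tools_executed is an assumption. The message count mentioned in the prose is left out because its field name is not given.

```typescript
type Outcome = "accepted" | "aborted" | "error";
type BoundaryType = "complete" | "bash" | "edit" | "denied_tool";

interface SpeculationEvent {
  speculation_id: string;
  outcome: Outcome;
  duration_ms: number;
  tools_executed: number;
  boundary_type: BoundaryType | null;
  is_pipelined: boolean;
}

// Hypothetical snapshot of the fields needed from the active state.
interface SpeculationSnapshot {
  id: string;
  startTime: number; // epoch milliseconds
  toolUseCount: number;
  boundaryType: BoundaryType | null;
  isPipelined: boolean;
}

function buildSpeculationEvent(
  spec: SpeculationSnapshot,
  outcome: Outcome,
  now: number = Date.now(),
): SpeculationEvent {
  return {
    speculation_id: spec.id,
    outcome,
    // Clamp at zero in case the clock moved backwards.
    duration_ms: Math.max(0, now - spec.startTime),
    tools_executed: spec.toolUseCount,
    boundary_type: spec.boundaryType,
    is_pipelined: spec.isPipelined,
  };
}
```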