Token counting uses a hybrid approach: API usage data when available, character/4 heuristic for new messages. Autocompact triggers at effectiveWindow - 13K buffer. Claude summarizes old messages in a single maxTurns=1 call with NO tools allowed. Circuit breaker after 3 consecutive failures.
Not a simple tokenizer call — a hybrid system designed for accuracy during streaming with parallel tool calls.
- Exact counts from API usage data (input_tokens on each response), keyed by message.id. The function walks forward to the first sibling so all interleaved tool results are included.
- roughTokenCountEstimationForMessages(): character/4 ≈ token heuristic with 4/3× conservative padding, applied to messages added since the last API call returned usage data.
- messages.countTokens() on the Anthropic SDK: used for precise budget checks, but the hybrid estimation is faster for the hot-path threshold check.
- Override: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (0-100) triggers at a percentage of the effective window instead of the fixed 13K buffer.
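A minimal sketch of the hybrid counter: the function names come from the notes above, but the message shape and the exact bookkeeping are assumptions.

```typescript
// Simplified message shape (assumed): usageInputTokens is set once an API
// response has reported exact usage for the conversation up to that message.
interface Message {
  id: string;
  text: string;
  usageInputTokens?: number;
}

// character/4 ~ token heuristic, padded by 4/3 to stay conservative.
function roughTokenCountEstimationForMessages(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.text.length, 0);
  return Math.ceil((chars / 4) * (4 / 3));
}

function tokenCountWithEstimation(messages: Message[]): number {
  // Find the most recent message with exact usage data from the API...
  let lastKnown = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].usageInputTokens !== undefined) {
      lastKnown = i;
      break;
    }
  }
  const exact = lastKnown >= 0 ? messages[lastKnown].usageInputTokens! : 0;
  // ...and apply the heuristic only to messages added after it.
  return exact + roughTokenCountEstimationForMessages(messages.slice(lastKnown + 1));
}
```

With no usage data at all, the function degrades gracefully to the pure heuristic over the whole history.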
Applied in order. Each layer is independent — they don't all fire every turn.
Feature-gated (HISTORY_SNIP). Removes messages above a boundary. Cheapest option — no API call. The snipTokensFreed parameter tracks the freed tokens that the API's usage data still counts, so later threshold checks can subtract them.
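A hypothetical sketch of the snip operation: the boundary semantics and snipTokensFreed come from the notes above, but the function shape is illustrative.

```typescript
// Hypothetical: drop everything before the boundary and report the tokens
// freed, so callers can subtract snipTokensFreed from stale API usage counts.
function snipHistory<T>(
  messages: T[],
  boundaryIndex: number,
  tokenOf: (m: T) => number,
): { messages: T[]; snipTokensFreed: number } {
  const snipped = messages.slice(0, boundaryIndex);
  const snipTokensFreed = snipped.reduce((sum, m) => sum + tokenOf(m), 0);
  return { messages: messages.slice(boundaryIndex), snipTokensFreed };
}
```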
Time-based: If gap since last assistant message > 60min (server's cache TTL), clear all compactable tool results except last 5. Saves ~20-100K tokens.
Cached (ANT-only): Queue cache_edits for API to delete tool results without invalidating cached prefix.
Compactable tools: Read, Bash, Shell, Grep, Glob, WebSearch, WebFetch, Edit, Write.
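The time-based pruning rule can be sketched as follows; the tool-result shape is an assumption, and the compactable set mirrors the list above.

```typescript
// Compactable tools, as listed above.
const COMPACTABLE_TOOLS = new Set([
  'Read', 'Bash', 'Shell', 'Grep', 'Glob',
  'WebSearch', 'WebFetch', 'Edit', 'Write',
]);

interface ToolResult {
  toolName: string;
  content: string;
  cleared?: boolean;
}

const CACHE_TTL_MS = 60 * 60 * 1000; // server cache TTL: 60 minutes
const KEEP_LAST = 5;

// If the conversation has been idle past the cache TTL, the cached prefix is
// gone anyway, so clear all compactable tool results except the last 5.
function pruneStaleToolResults(
  results: ToolResult[],
  lastAssistantAt: number,
  now: number,
): ToolResult[] {
  if (now - lastAssistantAt <= CACHE_TTL_MS) return results;
  const compactableIdx = results
    .map((r, i) => (COMPACTABLE_TOOLS.has(r.toolName) ? i : -1))
    .filter((i) => i >= 0);
  const keep = new Set(compactableIdx.slice(-KEEP_LAST));
  return results.map((r, i) =>
    COMPACTABLE_TOOLS.has(r.toolName) && !keep.has(i)
      ? { ...r, content: '[cleared]', cleared: true }
      : r,
  );
}
```

Non-compactable results (and everything inside the TTL window) pass through untouched.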
Fires when tokenCountWithEstimation(messages) ≥ threshold. Calls Claude with maxTurns=1 and NO tools allowed (prevents slow streaming tool calls). Summary constrained to <analysis> (scratchpad, stripped) + <summary> blocks. Frees ~40-60% of context.
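The trigger condition itself is tiny; a sketch with an illustrative window size (the 13K buffer and the ≥ comparison come from the notes above):

```typescript
// Autocompact fires once the estimated count reaches the effective window
// minus a fixed 13K-token safety buffer.
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;

function autocompactThreshold(effectiveWindow: number): number {
  return effectiveWindow - AUTOCOMPACT_BUFFER_TOKENS;
}

function shouldAutocompact(estimatedTokens: number, effectiveWindow: number): boolean {
  return estimatedTokens >= autocompactThreshold(effectiveWindow);
}
```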
Feature-gated (CONTEXT_COLLAPSE). Mutually exclusive with autocompact — only one system active. Alternative approach that summarizes message groups rather than the whole conversation.
The compaction prompt (prompt.ts, 302 lines) is carefully designed to preserve critical information.
After compaction, a special marker is inserted into the conversation history.
| Field | Value |
|---|---|
| type | 'system' |
| subtype | 'compact_boundary' |
| content | 'Conversation compacted' |
| trigger | 'manual' \| 'auto' |
| preTokens | Context size before compaction |
| messagesSummarized | Count of summarized messages |
| preservedSegment | UUIDs of head/anchor/tail (partial compact) |
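The fields above can be written down as a type; this is a sketch from the table, and the exact runtime shape (e.g. whether preservedSegment is a UUID array) is an assumption.

```typescript
// Sketch of the compact-boundary marker inserted after compaction.
interface CompactBoundaryMarker {
  type: 'system';
  subtype: 'compact_boundary';
  content: 'Conversation compacted';
  trigger: 'manual' | 'auto';
  preTokens: number;           // context size before compaction
  messagesSummarized: number;  // count of summarized messages
  preservedSegment?: string[]; // UUIDs of head/anchor/tail (partial compact)
}

const example: CompactBoundaryMarker = {
  type: 'system',
  subtype: 'compact_boundary',
  content: 'Conversation compacted',
  trigger: 'auto',
  preTokens: 178_000,
  messagesSummarized: 42,
};
```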
| Setting / Env Var | Effect |
|---|---|
| autoCompactEnabled | Toggle auto-compact on/off (default: true) |
| /compact | Manual trigger — always available even if auto is off |
| DISABLE_COMPACT | Disables ALL compaction (auto + manual) |
| DISABLE_AUTO_COMPACT | Disables auto only — /compact still works |
| CLAUDE_CODE_AUTO_COMPACT_WINDOW | Override effective context window size |
| CLAUDE_AUTOCOMPACT_PCT_OVERRIDE | Trigger at X% of window instead of 13K buffer |
| Custom compaction instructions | Inject via settings or hooks — merged with base prompt |
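How the on/off toggles combine can be sketched like this; the precedence follows the table (DISABLE_COMPACT beats everything, DISABLE_AUTO_COMPACT leaves /compact alive), but the resolver function itself is hypothetical and the truthiness check is a simplification.

```typescript
// Hypothetical resolver for the toggles above. env is passed in explicitly so
// the sketch stays testable; real code would read process.env.
function compactionAllowed(
  env: Record<string, string | undefined>,
  autoCompactEnabled: boolean,
): { auto: boolean; manual: boolean } {
  // DISABLE_COMPACT kills both auto and manual compaction.
  if (env.DISABLE_COMPACT) return { auto: false, manual: false };
  // DISABLE_AUTO_COMPACT (or the setting) disables auto only.
  const auto = autoCompactEnabled && !env.DISABLE_AUTO_COMPACT;
  return { auto, manual: true }; // /compact stays available
}
```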