Token Tracking¶
RAG Modulo tracks LLM token usage to support cost monitoring and usage analytics.
Overview¶
Token tracking monitors LLM token usage across all operations:
- Search queries
- Conversation messages
- Chain of Thought reasoning steps
- Document summarization
Token Counting¶
Automatic Tracking¶
Tokens are automatically counted and logged for:
- Input tokens: the query plus the context sent to the LLM
- Output tokens: the generated response
- Total tokens: input tokens plus output tokens
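The three counts above can be sketched as a small value type. This is a minimal illustration, not RAG Modulo's actual implementation: `TokenUsage` and `usage_from_openai_style` are hypothetical names, and the input dictionary assumes the `prompt_tokens` / `completion_tokens` fields that OpenAI-style chat responses report under `usage`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for a single LLM call (hypothetical container)."""

    input_tokens: int   # query + context sent to the LLM
    output_tokens: int  # generated response

    @property
    def total_tokens(self) -> int:
        # Total is defined as input tokens plus output tokens.
        return self.input_tokens + self.output_tokens


def usage_from_openai_style(usage: dict) -> TokenUsage:
    """Map an OpenAI-style `usage` payload onto input/output counts."""
    return TokenUsage(
        input_tokens=int(usage["prompt_tokens"]),
        output_tokens=int(usage["completion_tokens"]),
    )


usage = usage_from_openai_style({"prompt_tokens": 800, "completion_tokens": 450})
print(usage.total_tokens)  # 1250
```

Deriving the total from the two parts, rather than storing it separately, keeps the three numbers from drifting apart.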
Storage¶
Token counts are persisted to PostgreSQL. An example record:

    {
      "total_tokens": 1250,
      "input_tokens": 800,
      "output_tokens": 450,
      "model": "gpt-4",
      "timestamp": "2025-01-09T10:00:00Z"
    }
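As a rough sketch of how a record with this shape might be assembled before it is written to the database, the hypothetical helper `build_usage_record` below fills in the five fields shown above. The function name and the choice to format timestamps in UTC with a trailing `Z` are assumptions drawn from the example record, not from the project's code.

```python
import json
from datetime import datetime, timezone


def build_usage_record(input_tokens, output_tokens, model, when=None):
    """Return a dict matching the example record's fields (hypothetical helper)."""
    # Default to the current time; always serialize in UTC.
    when = when or datetime.now(timezone.utc)
    return {
        "total_tokens": input_tokens + output_tokens,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_usage_record(
    800, 450, "gpt-4", datetime(2025, 1, 9, 10, 0, tzinfo=timezone.utc)
)
print(json.dumps(record, indent=2))
```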
Usage Monitoring¶
Per-User Tracking¶
    # Get user token usage
    ./rag-cli users tokens user_123

    # Get usage by date range
    ./rag-cli users tokens user_123 \
      --start 2025-01-01 \
      --end 2025-01-31
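To make the date-range query concrete, here is a small sketch of the aggregation it implies, run over records shaped like the stored example. The function `tokens_in_range` is hypothetical, and treating both `--start` and `--end` as inclusive UTC dates is an assumption; the CLI's exact boundary semantics are not stated here.

```python
from datetime import date, datetime


def tokens_in_range(records, start: date, end: date) -> int:
    """Sum total_tokens for records whose UTC date falls in [start, end]."""
    total = 0
    for record in records:
        # Accept the trailing "Z" used in the stored timestamps.
        stamp = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        if start <= stamp.date() <= end:
            total += record["total_tokens"]
    return total


records = [
    {"total_tokens": 1250, "timestamp": "2025-01-09T10:00:00Z"},
    {"total_tokens": 300, "timestamp": "2025-02-01T08:30:00Z"},
]
print(tokens_in_range(records, date(2025, 1, 1), date(2025, 1, 31)))  # 1250
```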
Per-Collection Tracking¶
API Access¶
Cost Estimation¶
Token costs vary by provider and model:
| Provider | Model | Input (USD per 1K tokens) | Output (USD per 1K tokens) |
|---|---|---|---|
| OpenAI | GPT-4 | $0.03 | $0.06 |
| OpenAI | GPT-3.5 | $0.001 | $0.002 |
| Anthropic | Claude 3 | $0.015 | $0.075 |
Calculate costs:
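One way to turn the table into an estimate is sketched below. The prices are copied from the table; `PRICES_PER_1K` and `estimate_cost` are hypothetical names, and `Decimal` is used only to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# USD per 1K tokens, copied from the table above: (input, output).
PRICES_PER_1K = {
    ("OpenAI", "GPT-4"): (Decimal("0.03"), Decimal("0.06")),
    ("OpenAI", "GPT-3.5"): (Decimal("0.001"), Decimal("0.002")),
    ("Anthropic", "Claude 3"): (Decimal("0.015"), Decimal("0.075")),
}


def estimate_cost(provider, model, input_tokens, output_tokens) -> Decimal:
    """Estimate the USD cost of one call from its token counts."""
    input_price, output_price = PRICES_PER_1K[(provider, model)]
    # Prices are quoted per 1K tokens, so divide the weighted sum by 1000.
    return (input_tokens * input_price + output_tokens * output_price) / 1000


cost = estimate_cost("OpenAI", "GPT-4", 800, 450)
print(f"${cost}")
```

For the example record (800 input and 450 output tokens on GPT-4), this works out to 0.8 × $0.03 + 0.45 × $0.06 = $0.024 + $0.027 = $0.051.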
Configuration¶
Set token limits per user:
Warnings¶
Users receive a warning as their usage approaches a limit, for example:

    {
      "warning": "Token usage at 85% of daily limit",
      "current_usage": 85000,
      "limit": 100000,
      "remaining": 15000
    }
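The warning payload can be derived from just the current usage and the limit. The sketch below is illustrative only: `usage_warning` is a hypothetical function, and the 80% threshold is an assumed value, since the point at which RAG Modulo starts warning is not specified here.

```python
def usage_warning(current_usage, limit, threshold=0.8):
    """Return a warning payload once usage reaches `threshold` of the limit, else None."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if current_usage < threshold * limit:
        return None  # still comfortably below the limit
    percent = current_usage * 100 // limit  # whole-number percentage
    return {
        "warning": f"Token usage at {percent}% of daily limit",
        "current_usage": current_usage,
        "limit": limit,
        # Clamp at zero so an overrun never reports negative remaining tokens.
        "remaining": max(limit - current_usage, 0),
    }


print(usage_warning(85000, 100000))
```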