LLM Provider Configuration¶
WikiMind supports multiple LLM providers with automatic selection, fallback, and cost tracking.
Supported Providers¶
| Provider | Default Model | API Key Variable | Notes |
|---|---|---|---|
| Anthropic | claude-sonnet-4-5 | ANTHROPIC_API_KEY | Default provider |
| OpenAI | gpt-4o | OPENAI_API_KEY | |
| Google Gemini | gemini-2.0-flash | GOOGLE_API_KEY | |
| Ollama | llama3.2 | -- | Local, no key needed |
| Mock | mock-1 | -- | CI/testing only |
Auto-Enable¶
Providers are auto-enabled when their API key is detected. You do not need to set `ENABLED=true` manually; just set the key:
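For example, to enable Anthropic, it is enough to export the key variable from the table above:

```shell
# Detecting this variable auto-enables the Anthropic provider on startup
export ANTHROPIC_API_KEY=sk-ant-...
```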
The auto-enable logic checks three sources:
- Environment variable (with or without the `WIKIMIND_` prefix)
- `.env` file
- OS keychain (via `keyring`)
To explicitly disable a provider whose key is set:
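A sketch, assuming the per-provider `ENABLED` flag follows the same naming as the Ollama example later on this page:

```shell
# The key stays set, but the router skips this provider
WIKIMIND_LLM__OPENAI__ENABLED=false
```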
Provider Selection¶
The LLM router selects providers in this order:
- Preferred provider -- if the request specifies one
- Default provider -- configured via `WIKIMIND_LLM__DEFAULT_PROVIDER`
- Fallback providers -- all other enabled providers, in order
flowchart TD
A["LLM Request"] --> B{Preferred<br/>provider?}
B -- Yes --> C{Available?}
B -- No --> D["Default provider"]
C -- Yes --> E["Call provider"]
C -- No --> D
D --> F{Available?}
F -- Yes --> E
F -- No --> G{Fallback<br/>enabled?}
G -- Yes --> H["Try next"]
G -- No --> I["Error"]
H --> F
E --> J{Success?}
J -- Yes --> K["Return response"]
J -- No --> G
Fallback¶
When WIKIMIND_LLM__FALLBACK_ENABLED=true (default), failed providers are skipped and the next available provider is tried. When disabled, the first failure raises an error.
Model Override¶
Override the default model for any provider:
WIKIMIND_LLM__ANTHROPIC__MODEL=claude-haiku-4-5-20251001
WIKIMIND_LLM__OPENAI__MODEL=gpt-4o-mini
WIKIMIND_LLM__GOOGLE__MODEL=gemini-2.0-flash
WIKIMIND_LLM__OLLAMA__MODEL=mistral
Ollama (Local Models)¶
Ollama does not require an API key but must be explicitly enabled:
WIKIMIND_LLM__OLLAMA__ENABLED=true
WIKIMIND_LLM__DEFAULT_PROVIDER=ollama
WIKIMIND_LLM__OLLAMA__MODEL=llama3.2
WIKIMIND_LLM__OLLAMA__BASE_URL=http://localhost:11434
Make sure Ollama is running locally with the model pulled:
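For example, with the default `llama3.2` model:

```shell
# Download the model, then start the Ollama server (listens on :11434 by default)
ollama pull llama3.2
ollama serve
```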
Cost Tracking¶
Every LLM call is logged with:
- Provider and model used
- Input and output token counts
- Calculated USD cost
- Latency in milliseconds
- Task type (compile, qa, ingest, lint)
Pricing¶
Current pricing (USD per 1M tokens):
| Provider | Model | Input | Output |
|---|---|---|---|
| Anthropic | claude-sonnet-4-5 | $3.00 | $15.00 |
| Anthropic | claude-haiku-4-5-20251001 | $0.80 | $4.00 |
| OpenAI | gpt-4o | $2.50 | $10.00 |
| OpenAI | gpt-4o-mini | $0.15 | $0.60 |
| Google | gemini-2.0-flash | $0.10 | $0.40 |
| Ollama | * | $0.00 | $0.00 |
Budget Management¶
Set a monthly spending ceiling:
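A sketch, assuming the environment variable mirrors the `llm.monthly_budget_usd` preference key shown under Runtime Preferences:

```shell
WIKIMIND_LLM__MONTHLY_BUDGET_USD=50.0
```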
When spending reaches 80% of the budget, a `budget.warning` WebSocket event is emitted. At 100%, a `budget.exceeded` event fires. These are displayed in the UI.
View current spending via the API:
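A sketch, assuming a hypothetical `/llm/usage` endpoint on the same port as the settings API:

```shell
# The endpoint path here is illustrative; check the API reference for the exact route
curl http://localhost:7842/llm/usage
```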
Mock Provider (Testing)¶
For CI and local testing without real API calls:
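A sketch, assuming the mock provider is enabled and selected the same way as Ollama:

```shell
WIKIMIND_LLM__MOCK__ENABLED=true
WIKIMIND_LLM__DEFAULT_PROVIDER=mock
```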
The mock provider returns deterministic canned responses for all task types. It never makes network calls and costs nothing.
Do not use in production
The mock provider returns static responses that do not reflect your actual source content.
API Key Storage¶
API keys can be stored in three locations (checked in order):
- Environment variable -- `ANTHROPIC_API_KEY=sk-ant-...` (with or without the `WIKIMIND_` prefix)
- `.env` file -- loaded automatically by Pydantic BaseSettings
- OS keychain -- stored via the `keyring` library
Store a key in the keychain:
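Using the `keyring` CLI (the `wikimind` service name here is an assumption; use whatever service name WikiMind expects):

```shell
# Prompts for the secret and stores it in the OS keychain
keyring set wikimind ANTHROPIC_API_KEY
```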
Keys stored as SecretStr are never logged in plaintext.
Runtime Preferences¶
Some LLM settings can be changed at runtime without restarting:
# Change default provider
curl -X PUT http://localhost:7842/settings/preferences \
-H "Content-Type: application/json" \
-d '{"key": "llm.default_provider", "value": "openai"}'
# Change monthly budget
curl -X PUT http://localhost:7842/settings/preferences \
-H "Content-Type: application/json" \
-d '{"key": "llm.monthly_budget_usd", "value": "100.0"}'
Runtime preferences are stored in the database and applied on startup.