07 / 20 API Layer & Providers src/services/api/client.ts · src/utils/model/providers.ts

4 providers. Same Claude, different infrastructure.

Claude Code talks to the same Claude model through 4 cloud providers — not different models, but different infrastructure wrappers. Enterprise customers route through their own cloud for data residency, compliance, and billing integration. Each has its own credential flow, region config, and feature limitations.

4 providers: same model, different infrastructure

10 max retries: exponential backoff (500ms base)

Prompt cache TTL — for subscribers

Why 4 Providers Exist

It's not about different models. It's about where the API call lands — which cloud, which region, whose billing, whose compliance boundary.

Anthropic Direct (firstParty)

Use when: You want full feature parity, team memory sync, prompt caching, and are comfortable with API key management or Claude.ai OAuth.

Auth: ANTHROPIC_API_KEY or OAuth tokens (Claude.ai subscribers).
Features: Full — team memory sync, prompt cache, batch API, custom headers, client request ID tracking.

AWS Bedrock

Use when: Your org standardizes on AWS, needs data to stay inside its VPC, and already has IAM/organization setup and AWS billing in place.

Auth: AWS IAM default credential chain or AWS_BEARER_TOKEN_BEDROCK. Per-model region overrides (ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION).
Missing: Team memory sync, client request IDs.

Google Vertex AI

Use when: Your org standardizes on GCP, needs model-specific region affinity, and uses GCP managed identity.

Auth: GCP Application Default Credentials via GoogleAuth. Per-model region overrides (VERTEX_REGION_CLAUDE_3_5_HAIKU, etc.).
Missing: Team memory sync. Risk of a 12s metadata-server timeout if the project ID is not set.

Azure Foundry

Use when: Your org standardizes on Azure and needs Azure AD integration or managed-identity compliance.

Auth: ANTHROPIC_FOUNDRY_API_KEY or Azure AD (DefaultAzureCredential → getBearerTokenProvider).
Missing: Team memory sync. Prompt cache and batch support are likely limited.

Feature Matrix

Feature	Anthropic	Bedrock	Vertex	Foundry
Team Memory Sync	✓ OAuth	✕	✕	✕
Prompt Cache	✓	✓	✓	?
Custom Headers	✓	ignored	ignored	ignored
Client Request ID	✓	✕	✕	✕
Telemetry (Datadog)	✓	disabled	disabled	disabled
Data Residency	Anthropic CDN	AWS region	GCP region	Azure region
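
The matrix above can be encoded as a lookup table so that calling code checks a capability instead of branching on the provider name. This is a minimal sketch, not the actual implementation: only `firstParty` appears in the source, so the other provider keys, the `Support` type, and the `supports` helper are hypothetical names. "ignored" and "disabled" cells are treated as unsupported, and the "?" cell as unknown.

```typescript
// Hypothetical identifiers; only "firstParty" is named in the source text.
type Provider = "firstParty" | "bedrock" | "vertex" | "foundry";
type Support = "yes" | "no" | "unknown";

interface ProviderFeatures {
  teamMemorySync: Support;
  promptCache: Support;
  customHeaders: Support;
  clientRequestId: Support;
  telemetry: Support;
}

// Values transcribed from the feature matrix.
// "ignored" and "disabled" are mapped to "no"; "?" is mapped to "unknown".
const FEATURES: Record<Provider, ProviderFeatures> = {
  firstParty: {
    teamMemorySync: "yes",
    promptCache: "yes",
    customHeaders: "yes",
    clientRequestId: "yes",
    telemetry: "yes",
  },
  bedrock: {
    teamMemorySync: "no",
    promptCache: "yes",
    customHeaders: "no",
    clientRequestId: "no",
    telemetry: "no",
  },
  vertex: {
    teamMemorySync: "no",
    promptCache: "yes",
    customHeaders: "no",
    clientRequestId: "no",
    telemetry: "no",
  },
  foundry: {
    teamMemorySync: "no",
    promptCache: "unknown",
    customHeaders: "no",
    clientRequestId: "no",
    telemetry: "no",
  },
};

// Conservative check: an "unknown" cell counts as unsupported.
function supports(provider: Provider, feature: keyof ProviderFeatures): boolean {
  return FEATURES[provider][feature] === "yes";
}
```

Treating "unknown" as unsupported is a deliberate choice in this sketch: a caller that skips an optional feature degrades more gracefully than one that sends a parameter the provider rejects.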

Retry Strategy

10 max retries with exponential backoff. Separate handling for 529 overloaded (3 max, then fallback model). Background queries bail immediately on 529 to prevent retry amplification.

Error	Strategy	Max
401 token expired	OAuth refresh → retry	1
403 token revoked	OAuth refresh → retry	1
429 rate limited	Backoff with `retry-after` header	10
529 overloaded (foreground)	Backoff → fallback model after 3	3
529 overloaded (background)	Bail immediately — no retry	0
ECONNRESET / EPIPE	Disable keep-alive → retry	1
Prompt too long	Autocompact → retry	1
Max output tokens	Increase limit (floor 3K, buffer 1K) → retry	3
Bedrock auth error	Refresh AWS credentials → retry	1
Vertex auth error	Refresh GCP credentials → retry	1
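
One way to read the table is as a per-error retry budget. The sketch below assumes exactly that and nothing more: the `ErrorKind` labels, the `MAX_RETRIES` table, and `shouldRetry` are hypothetical names, and the recovery actions themselves (token refresh, autocompact, fallback model) are left out.

```typescript
// Hypothetical labels for the rows of the retry table.
type ErrorKind =
  | "tokenExpired"          // 401
  | "tokenRevoked"          // 403
  | "rateLimited"           // 429
  | "overloadedForeground"  // 529, foreground query
  | "overloadedBackground"  // 529, background query
  | "connectionReset"       // ECONNRESET / EPIPE
  | "promptTooLong"
  | "maxOutputTokens"
  | "bedrockAuth"
  | "vertexAuth";

// Retry budgets copied from the "Max" column.
const MAX_RETRIES: Record<ErrorKind, number> = {
  tokenExpired: 1,
  tokenRevoked: 1,
  rateLimited: 10,
  overloadedForeground: 3,
  overloadedBackground: 0,
  connectionReset: 1,
  promptTooLong: 1,
  maxOutputTokens: 3,
  bedrockAuth: 1,
  vertexAuth: 1,
};

// True while the error kind still has retry budget left.
// retriesSoFar counts retries already performed for this error kind.
function shouldRetry(kind: ErrorKind, retriesSoFar: number): boolean {
  return retriesSoFar < MAX_RETRIES[kind];
}
```

With a budget of 0, a background query that hits 529 is never retried, which matches the "bail immediately" row.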

Backoff formula: min(500ms × 2^(attempt-1), 32s) + random(0, 25% of base). Server-specified retry-after header overrides the calculation. Unattended mode (CLAUDE_CODE_UNATTENDED_RETRY) retries forever with 5min max backoff.
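
The backoff formula can be sketched as a small pure function. Two assumptions are made here: "25% of base" is read as 25% of the capped exponential delay for the current attempt, and the `retry-after` value arrives in seconds. The function name, its parameters, and the injectable `random` source are hypothetical, added so the jitter can be tested deterministically.

```typescript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 32000;

// attempt is 1-based: the first retry uses attempt = 1.
// retryAfterSeconds, when present, overrides the calculation entirely.
// random defaults to Math.random and must return a value in [0, 1).
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined) {
    return retryAfterSeconds * 1000;
  }
  // min(500ms * 2^(attempt-1), 32s)
  const exponential = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
  // Assumption: jitter is up to 25% of the capped exponential delay.
  const jitter = random() * 0.25 * exponential;
  return exponential + jitter;
}
```

Without jitter the delays run 500ms, 1s, 2s, 4s and so on, reaching the 32s cap at the seventh attempt; jitter then spreads clients out so they do not retry in lockstep.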

Fast Mode

Same model (Opus 4.6), faster output. NOT a model switch. Has cooldown and permanent-disable states.

ENABLED → 429 → COOLDOWN (10 minutes) → ENABLED

PERMANENTLY DISABLED: entered when the overage-disabled header is received or the API rejects the param
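
The three states can be modelled as a small state machine. This is a sketch under assumptions, not the actual implementation: the state and event names, the `tick` event used to expire the cooldown, and the `nextState` function are all hypothetical. Only the 10-minute cooldown, the 429 trigger, and the two permanent-disable triggers come from the text.

```typescript
type FastModeState =
  | { kind: "enabled" }
  | { kind: "cooldown"; untilMs: number }
  | { kind: "disabled" }; // permanent, no way back

type FastModeEvent =
  | { kind: "rateLimited"; nowMs: number } // a 429 response
  | { kind: "tick"; nowMs: number }        // hypothetical clock check
  | { kind: "overageDisabled" }            // overage-disabled header seen
  | { kind: "paramRejected" };             // API rejected the fast-mode param

const COOLDOWN_MS = 10 * 60 * 1000; // 10 minutes

function nextState(state: FastModeState, event: FastModeEvent): FastModeState {
  // Permanent disable is terminal: no event leaves this state.
  if (state.kind === "disabled") return state;

  switch (event.kind) {
    case "overageDisabled":
    case "paramRejected":
      return { kind: "disabled" };
    case "rateLimited":
      // A 429 while enabled starts the cooldown; a 429 during cooldown is ignored.
      return state.kind === "enabled"
        ? { kind: "cooldown", untilMs: event.nowMs + COOLDOWN_MS }
        : state;
    case "tick":
      // The cooldown ends once the deadline has passed.
      return state.kind === "cooldown" && event.nowMs >= state.untilMs
        ? { kind: "enabled" }
        : state;
  }
}
```

Keeping the transition function pure, with the clock passed in through events, makes the cooldown testable without timers.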