07 / 20 API Layer & Providers src/services/api/client.ts · src/utils/model/providers.ts

4 providers. Same Claude, different infrastructure.

Claude Code talks to the same Claude model through 4 cloud providers — not different models, but different infrastructure wrappers. Enterprise customers route through their own cloud for data residency, compliance, and billing integration. Each has its own credential flow, region config, and feature limitations.

4 providers: same model, different infrastructure
10 max retries: exponential backoff (500ms base)
1h prompt cache TTL: for subscribers

Why 4 Providers Exist

It's not about different models. It's about where the API call lands — which cloud, which region, whose billing, whose compliance boundary.

Anthropic Direct (firstParty)

Use when: You want full feature parity, team memory sync, prompt caching, and are comfortable with API key management or Claude.ai OAuth.

Auth: ANTHROPIC_API_KEY or OAuth tokens (Claude.ai subscribers).
Features: Full — team memory sync, prompt cache, batch API, custom headers, client request ID tracking.

AWS Bedrock

Use when: Your org standardizes on AWS, needs data to stay inside its VPC, and already has IAM/organization setup and AWS billing in place.

Auth: AWS IAM default credential chain or AWS_BEARER_TOKEN_BEDROCK. Per-model region overrides (ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION).
Missing: Team memory sync, client request IDs.

Google Vertex AI

Use when: Your org standardizes on GCP, needs model-specific region affinity, has GCP managed identity.

Auth: GCP Application Default Credentials via GoogleAuth. Per-model region overrides (VERTEX_REGION_CLAUDE_3_5_HAIKU, etc.).
Missing: Team memory sync. Risk of a 12s metadata server timeout if the project ID is not set.

Azure Foundry

Use when: Your org standardizes on Azure, needs Azure AD integration, managed identity compliance.

Auth: ANTHROPIC_FOUNDRY_API_KEY or Azure AD (DefaultAzureCredential + getBearerTokenProvider).
Missing: Team memory sync. Prompt cache and batch are likely limited.
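The credential paths listed for the four providers can be summarized in a small lookup. This is a minimal sketch, not the code in client.ts: the provider labels other than firstParty, the AuthPath names, and the describeAuth helper are all hypothetical, and the "key wins when both are available" precedence is an assumption the text does not confirm.

```typescript
// "firstParty" appears in the text; the other three labels are hypothetical.
type Provider = "firstParty" | "bedrock" | "vertex" | "foundry";

// Hypothetical names for the credential paths described above.
type AuthPath =
  | "api-key"
  | "oauth"
  | "bearer-token"
  | "aws-default-credential-chain"
  | "gcp-application-default-credentials"
  | "azure-ad";

type Env = Record<string, string | undefined>;

// Picks a credential path from environment variables named in the text.
// Assumption: an explicit key or token takes precedence when it is set.
function describeAuth(provider: Provider, env: Env): AuthPath {
  switch (provider) {
    case "firstParty":
      return env["ANTHROPIC_API_KEY"] ? "api-key" : "oauth";
    case "bedrock":
      return env["AWS_BEARER_TOKEN_BEDROCK"]
        ? "bearer-token"
        : "aws-default-credential-chain";
    case "vertex":
      return "gcp-application-default-credentials";
    case "foundry":
      return env["ANTHROPIC_FOUNDRY_API_KEY"] ? "api-key" : "azure-ad";
  }
}
```

Passing the environment in as a plain object, rather than reading it globally, keeps the sketch easy to exercise with different configurations.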

Feature Matrix

Feature | Anthropic | Bedrock | Vertex | Foundry
Team Memory Sync | ✓ OAuth | ✗ | ✗ | ✗
Prompt Cache | ✓ |  |  | ?
Custom Headers | ✓ | ignored | ignored | ignored
Client Request ID | ✓ | ✗ |  | 
Telemetry (Datadog) |  | disabled | disabled | disabled
Data Residency | Anthropic CDN | AWS region | GCP region | Azure region
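A feature matrix like this is often encoded as a typed lookup so call sites can gate behavior per provider. The sketch below is illustrative only: the type names, the CAPABILITIES table, and the supports helper are hypothetical, and it covers just the rows whose values are stated for all four providers.

```typescript
// "firstParty" appears in the text; the other three labels are hypothetical.
type Provider = "firstParty" | "bedrock" | "vertex" | "foundry";

interface ProviderCapabilities {
  teamMemorySync: boolean; // on firstParty this additionally requires OAuth
  customHeaders: boolean;  // false means custom headers are ignored
  dataResidency: string;
}

// Values taken from the provider descriptions and matrix above.
const CAPABILITIES: Record<Provider, ProviderCapabilities> = {
  firstParty: { teamMemorySync: true, customHeaders: true, dataResidency: "Anthropic CDN" },
  bedrock: { teamMemorySync: false, customHeaders: false, dataResidency: "AWS region" },
  vertex: { teamMemorySync: false, customHeaders: false, dataResidency: "GCP region" },
  foundry: { teamMemorySync: false, customHeaders: false, dataResidency: "Azure region" },
};

type GatedFeature = "teamMemorySync" | "customHeaders";

// Returns whether a provider supports a boolean-valued feature.
function supports(provider: Provider, feature: GatedFeature): boolean {
  return CAPABILITIES[provider][feature];
}

// Usage: drop custom headers up front where they would be ignored anyway.
function effectiveHeaders(
  provider: Provider,
  custom: Record<string, string>,
): Record<string, string> {
  return supports(provider, "customHeaders") ? custom : {};
}
```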

Retry Strategy

10 max retries with exponential backoff. Separate handling for 529 overloaded (3 max, then fallback model). Background queries bail immediately on 529 to prevent retry amplification.
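The 529 rule can be sketched as a small decision function. All names here (QueryKind, OverloadAction, on529) are hypothetical, and the sketch assumes the counter tracks how many 529 retries have already been made for the current request.

```typescript
type QueryKind = "foreground" | "background";
type OverloadAction = "retry" | "fallback-model" | "bail";

const MAX_529_RETRIES = 3; // from the text: 3 max, then fallback model

// Decides what to do when a request comes back 529 overloaded.
// retriesSoFar: number of 529 retries already attempted (assumption).
function on529(kind: QueryKind, retriesSoFar: number): OverloadAction {
  if (kind === "background") {
    // Background queries never retry, to avoid retry amplification.
    return "bail";
  }
  return retriesSoFar < MAX_529_RETRIES ? "retry" : "fallback-model";
}
```

Keeping the background case as an early return makes the amplification guard hard to bypass by accident when the foreground policy changes.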

Error | Strategy | Max
401 token expired | OAuth refresh → retry | 1
403 token revoked | OAuth refresh → retry | 1
429 rate limited | Backoff with retry-after header | 10
529 overloaded (foreground) | Backoff → fallback model after 3 | 3
529 overloaded (background) | Bail immediately, no retry | 0
ECONNRESET / EPIPE | Disable keep-alive → retry | 1
Prompt too long | Autocompact → retry | 1
Max output tokens | Increase limit (floor 3K, buffer 1K) → retry | 3
Bedrock auth error | Refresh AWS credentials → retry | 1
Vertex auth error | Refresh GCP credentials → retry | 1
Backoff formula: min(500ms × 2^(attempt-1), 32s) + random(0, 25% of base). Server-specified retry-after header overrides the calculation. Unattended mode (CLAUDE_CODE_UNATTENDED_RETRY) retries forever with 5min max backoff.
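A worked version of the formula, as a sketch under stated assumptions: "25% of base" is read here as 25% of the capped exponential term rather than of the 500ms constant, and the function name, parameters, and injectable random source are hypothetical.

```typescript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 32000;

// Computes the wait before retry number `attempt` (1-based).
// retryAfterMs: server-specified retry-after, already converted to ms.
// random: injectable source in [0, 1) so the jitter can be tested.
function backoffDelayMs(
  attempt: number,
  retryAfterMs?: number,
  maxDelayMs: number = MAX_DELAY_MS,
  random: () => number = Math.random,
): number {
  // A server-specified retry-after overrides the calculation.
  if (retryAfterMs !== undefined) {
    return retryAfterMs;
  }
  const base = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxDelayMs);
  const jitter = random() * 0.25 * base; // assumption: jitter scales with base
  return base + jitter;
}
```

Ignoring jitter, the delays run 500ms, 1s, 2s, 4s, 8s, 16s, then hold at 32s from attempt 7 onward. For the unattended mode described above, the same function could be called with maxDelayMs set to 5 minutes (300000).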

Fast Mode

Same model (Opus 4.6), faster output. NOT a model switch. Has cooldown and permanent-disable states.

ENABLED → COOLDOWN: on 429
COOLDOWN → ENABLED: after 10 minutes
ENABLED → PERMANENTLY DISABLED: overage-disabled header, or API rejects the param
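The fast mode states can be modeled as a small state machine. This is a sketch only: the state and event names are hypothetical, and two behaviors are assumptions not confirmed by the text, namely that a 429 during cooldown restarts the 10 minute timer and that the permanent-disable triggers also apply while in cooldown.

```typescript
type FastModeState =
  | { kind: "enabled" }
  | { kind: "cooldown"; untilMs: number }
  | { kind: "permanentlyDisabled" };

type FastModeEvent =
  | { kind: "rateLimited"; nowMs: number } // a 429 response
  | { kind: "tick"; nowMs: number }        // periodic time check
  | { kind: "overageDisabled" }            // overage-disabled header seen
  | { kind: "paramRejected" };             // API rejected the fast mode param

const COOLDOWN_MS = 10 * 60 * 1000; // 10 minutes

function nextFastModeState(
  state: FastModeState,
  event: FastModeEvent,
): FastModeState {
  // Permanent disable is terminal: no event leaves it.
  if (state.kind === "permanentlyDisabled") {
    return state;
  }
  switch (event.kind) {
    case "overageDisabled":
    case "paramRejected":
      return { kind: "permanentlyDisabled" };
    case "rateLimited":
      // Assumption: a 429 during cooldown restarts the timer.
      return { kind: "cooldown", untilMs: event.nowMs + COOLDOWN_MS };
    case "tick":
      if (state.kind === "cooldown" && event.nowMs >= state.untilMs) {
        return { kind: "enabled" };
      }
      return state;
  }
}
```

Note that no transition changes the model: every state describes only whether the faster output path is requested.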