Claude Code talks to the same Claude model through four providers: Anthropic's own API plus three cloud platforms (AWS Bedrock, Google Vertex AI, and Foundry on Azure). These are not different models but different infrastructure wrappers. Enterprise customers route through their own cloud for data residency, compliance, and billing integration, and each provider has its own credential flow, region config, and feature limitations.
What changes is where the API call lands: which cloud, which region, whose billing, whose compliance boundary.
Anthropic (direct API)
Use when: You want the full feature set, including team memory sync and prompt caching, and are comfortable with API key management or Claude.ai OAuth.
Auth: ANTHROPIC_API_KEY or OAuth tokens (Claude.ai subscribers).
Features: Full (team memory sync, prompt cache, batch API, custom headers, client request ID tracking).
AWS Bedrock
Use when: Your org standardizes on AWS, needs data to stay inside its VPC, and already has IAM/organization setup and AWS billing in place.
Auth: AWS IAM default credential chain or AWS_BEARER_TOKEN_BEDROCK. Per-model region overrides are supported (ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION).
Missing: Team memory sync, client request IDs.
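The per-model region override for Bedrock can be pictured as a small lookup. This is a minimal sketch, not the actual implementation: `ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION` comes from the text above, while the fallback to the standard `AWS_REGION` variable and the `DEFAULT_REGION` value are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical last-resort default; the source does not state one.
DEFAULT_REGION = "us-east-1"


def bedrock_region(is_small_fast_model: bool,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the AWS region for a Bedrock request."""
    env = os.environ if env is None else env
    if is_small_fast_model:
        # The per-model override applies only to the small/fast model.
        override = env.get("ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION")
        if override:
            return override
    # Assumed fallback order: general AWS region, then the default.
    return env.get("AWS_REGION") or DEFAULT_REGION
```

With both variables set, the small/fast model would resolve to the override while every other model keeps the general region.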
Google Vertex AI
Use when: Your org standardizes on GCP, needs model-specific region affinity, and uses GCP managed identity.
Auth: GCP Application Default Credentials via GoogleAuth. Per-model region overrides are supported (VERTEX_REGION_CLAUDE_3_5_HAIKU, etc.).
Missing: Team memory sync. Risk: if the project ID is not set, expect a 12s metadata server timeout.
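The Vertex per-model overrides follow a visible naming pattern, which the sketch below turns into a lookup. Treat it as an illustration under assumptions: only `VERTEX_REGION_CLAUDE_3_5_HAIKU` appears in the text above. The rule for deriving the suffix from a model name, the fallback to a general `CLOUD_ML_REGION` variable, and the default region are all guesses made for this example.

```python
import os
import re
from typing import Mapping, Optional


def vertex_region_var(model: str) -> str:
    """Derive an override name such as VERTEX_REGION_CLAUDE_3_5_HAIKU.

    Assumed rule: uppercase the model name and replace every run of
    non-alphanumeric characters with a single underscore.
    """
    suffix = re.sub(r"[^A-Za-z0-9]+", "_", model).strip("_").upper()
    return f"VERTEX_REGION_{suffix}"


def vertex_region(model: str,
                  env: Optional[Mapping[str, str]] = None,
                  default: str = "us-east5") -> str:
    """Resolve the region: per-model override, general region, default."""
    env = os.environ if env is None else env
    return (env.get(vertex_region_var(model))
            or env.get("CLOUD_ML_REGION")  # assumed general-region variable
            or default)                    # hypothetical default
```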
Foundry (Azure)
Use when: Your org standardizes on Azure and needs Azure AD integration or managed identity compliance.
Auth: ANTHROPIC_FOUNDRY_API_KEY or Azure AD (DefaultAzureCredential → getBearerTokenProvider).
Missing: Team memory sync. Prompt cache and batch support are likely limited (unconfirmed).
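To compare the credential flows of the four providers side by side, the sketch below maps a provider name and an environment to the credential path it would take. The environment variable names come from the text above. The provider keys, the returned labels, and the precedence (an explicit key or token wins over ambient credentials) are assumptions for illustration only.

```python
from typing import Mapping


def credential_source(provider: str, env: Mapping[str, str]) -> str:
    """Name the credential path a provider would use (labels are illustrative)."""
    if provider == "anthropic":
        return "api_key" if env.get("ANTHROPIC_API_KEY") else "oauth"
    if provider == "bedrock":
        # Assumed precedence: bearer token first, then the IAM default chain.
        if env.get("AWS_BEARER_TOKEN_BEDROCK"):
            return "bearer_token"
        return "aws_default_chain"
    if provider == "vertex":
        # Application Default Credentials are the only path listed.
        return "gcp_adc"
    if provider == "foundry":
        return "api_key" if env.get("ANTHROPIC_FOUNDRY_API_KEY") else "azure_ad"
    raise ValueError(f"unknown provider: {provider}")
```

Passing the environment as a plain mapping keeps the function easy to test without touching real credentials.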
| Feature | Anthropic | Bedrock | Vertex | Foundry |
|---|---|---|---|---|
| Team Memory Sync | ✓ OAuth | ✕ | ✕ | ✕ |
| Prompt Cache | ✓ | ✓ | ✓ | ? |
| Custom Headers | ✓ | ignored | ignored | ignored |
| Client Request ID | ✓ | ✕ | ✕ | ✕ |
| Telemetry (Datadog) | ✓ | disabled | disabled | disabled |
| Data Residency | Anthropic CDN | AWS region | GCP region | Azure region |
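The matrix above can be encoded as data so that a feature check becomes a lookup. This is a sketch only: the dictionary keys and function names are hypothetical, "ignored" and "disabled" are treated as unsupported, and the "?" for Foundry prompt caching is kept as `None` (unknown) rather than guessed. The Data Residency row is left out because it is descriptive rather than a yes/no capability.

```python
from typing import Dict, List, Optional

PROVIDERS = ("anthropic", "bedrock", "vertex", "foundry")

# True = supported, False = missing/ignored/disabled, None = unknown ("?").
FEATURES: Dict[str, Dict[str, Optional[bool]]] = {
    "team_memory_sync": dict(zip(PROVIDERS, (True, False, False, False))),
    "prompt_cache": dict(zip(PROVIDERS, (True, True, True, None))),
    "custom_headers": dict(zip(PROVIDERS, (True, False, False, False))),
    "client_request_id": dict(zip(PROVIDERS, (True, False, False, False))),
    "telemetry": dict(zip(PROVIDERS, (True, False, False, False))),
}


def supports(provider: str, feature: str) -> Optional[bool]:
    """Return True/False, or None when the table marks support as unknown."""
    return FEATURES[feature][provider]


def unavailable_or_unknown(provider: str) -> List[str]:
    """List features that are not confirmed as supported for a provider."""
    return [name for name, row in FEATURES.items() if row[provider] is not True]
```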
Up to 10 retries with exponential backoff. 529 overloaded is handled separately: at most 3 retries, then a switch to the fallback model. Background queries bail immediately on 529 to prevent retry amplification.
| Error | Strategy | Max |
|---|---|---|
| 401 token expired | OAuth refresh → retry | 1 |
| 403 token revoked | OAuth refresh → retry | 1 |
| 429 rate limited | Backoff with retry-after header | 10 |
| 529 overloaded (foreground) | Backoff → fallback model after 3 | 3 |
| 529 overloaded (background) | Bail immediately — no retry | 0 |
| ECONNRESET / EPIPE | Disable keep-alive → retry | 1 |
| Prompt too long | Autocompact → retry | 1 |
| Max output tokens | Increase limit (floor 3K, buffer 1K) → retry | 3 |
| Bedrock auth error | Refresh AWS credentials → retry | 1 |
| Vertex auth error | Refresh GCP credentials → retry | 1 |
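One way to read the table above is as a policy lookup keyed by error type. The sketch below is an assumption-laden illustration: the error keys, strategy labels, and the `next_action` function are hypothetical names, and only the strategies and maximums are taken from the table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RetryPolicy:
    strategy: str      # what to do before retrying
    max_attempts: int  # retries allowed for this error type


# Keys and strategy labels are illustrative; the limits mirror the table.
POLICIES: Dict[str, RetryPolicy] = {
    "401_token_expired": RetryPolicy("oauth_refresh", 1),
    "403_token_revoked": RetryPolicy("oauth_refresh", 1),
    "429_rate_limited": RetryPolicy("backoff_with_retry_after", 10),
    "conn_reset": RetryPolicy("disable_keep_alive", 1),
    "prompt_too_long": RetryPolicy("autocompact", 1),
    "max_output_tokens": RetryPolicy("increase_limit", 3),
    "bedrock_auth": RetryPolicy("refresh_aws_credentials", 1),
    "vertex_auth": RetryPolicy("refresh_gcp_credentials", 1),
}


def next_action(error: str, attempts_so_far: int, background: bool = False) -> str:
    """Return 'retry', 'fallback_model', or 'give_up'."""
    if error == "529_overloaded":
        if background:
            return "give_up"  # background queries never retry on 529
        return "retry" if attempts_so_far < 3 else "fallback_model"
    policy = POLICIES.get(error)
    if policy is None or attempts_so_far >= policy.max_attempts:
        return "give_up"
    return "retry"
```

Keeping 529 outside the generic table mirrors the text, where overload handling has its own limit and its own exit (the fallback model) instead of a plain failure.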
min(500ms × 2^(attempt-1), 32s) + random(0, 25% of base). Server-specified retry-after header overrides the calculation. Unattended mode (CLAUDE_CODE_UNATTENDED_RETRY) retries forever with 5min max backoff.
Same model (Opus 4.6), faster output. NOT a model switch. Has cooldown and permanent-disable states.
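The backoff formula given above (exponential delay, capped, plus jitter) can be sketched as follows. Several details are assumptions rather than facts from the source: attempts are counted from 1, "base" is read as the capped exponential value, the retry-after header is taken to be in seconds, and unattended mode is modeled as simply raising the cap to 5 minutes. The function and parameter names are hypothetical.

```python
import random
from typing import Callable, Optional

BASE_MS = 500
CAP_MS = 32_000                    # 32s cap in normal mode
UNATTENDED_CAP_MS = 5 * 60 * 1000  # 5min cap in unattended mode


def backoff_ms(attempt: int,
               retry_after_s: Optional[float] = None,
               unattended: bool = False,
               rng: Callable[[], float] = random.random) -> float:
    """Delay in milliseconds before retry number `attempt` (1-indexed)."""
    if retry_after_s is not None:
        # A server-specified retry-after overrides the calculation.
        return retry_after_s * 1000
    cap = UNATTENDED_CAP_MS if unattended else CAP_MS
    base = min(BASE_MS * 2 ** (attempt - 1), cap)
    # Jitter: uniform in [0, 25% of base), via the injectable rng.
    return base + rng() * 0.25 * base
```

The `rng` parameter exists only so the jitter can be pinned in tests; in normal use the default `random.random` applies.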