OpenAI-compatible and Anthropic-style endpoints on one base URL. UseDocumentation Index
Fetch the complete documentation index at: https://docs.routing.run/llms.txt
Use this file to discover all available pages before exploring further.
/v1/chat/completions as the default path. routing.run handles model routing, failover, and retries behind a stable route/... model ID.
In practice, most integrations should treat routing.run as an OpenAI-compatible gateway with a stable route/... model namespace.
Base URL
Primary:https://api.routing.run by default. If it is slow or returning errors, switch to https://ai.routing.sh, a secondary endpoint hosted by the routing.run team. The same /v1 paths, model IDs, and API keys work on both hosts.
All endpoints are prefixed with /v1.
Key concepts
The core model is simple:- You call one public endpoint.
- You choose a
route/...model ID. - routing.run decides which upstream to try first and when to fail over.
Routing chains
Each model has an ordered routing chain. You send one request to aroute/... model ID, and routing.run handles fallback when an upstream fails or times out.
Circuit breakers
If an upstream fails 5 consecutive times, the circuit breaker opens and skips that upstream for 60 seconds. This prevents cascading failures and ensures fast responses.Automatic retries
The router readsretry_count from routing config (default 2). It drives an outer attempt loop together with walking the routing chain on each attempt. If any message has role: tool, retries are disabled (retry_count forced to 0) and only the first chain entry is used (tool results must stay on the same upstream).
Message truncation
Before an upstream call,truncate_messages keeps all system messages, then adds the newest non-system messages (walking newest-first) until estimated input tokens fit a budget derived from the model’s configured context_size minus reserved output headroom, and until at most 50 non-system messages remain. If nothing fits, at least one non-system message is still attempted (available_tokens floor in code).
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | Recommended default for apps, SDKs, and coding agents |
| POST | /v1/messages | Compatibility path for Anthropic-style messages |
| GET | /v1/models | Public model catalog. This exact path is anonymous, and authenticated callers currently see the same published catalog. |
| GET | /v1/models/ | Model metadata (requires X-API-Key or Bearer) |
| POST | /v1/embeddings | Create OpenAI-compatible embeddings |
| POST | /v1/rerank | Rerank documents for search and RAG workflows |
| POST | /v1/images/generations | Generate images |
| POST | /v1/audio/speech | Convert text to speech |
| POST | /v1/audio/transcriptions | Transcribe audio files |
| GET | /v1/pricing | Per-model pricing table (requires middleware auth — any valid rk_ or access JWT) |
| GET | /v1/status | Public health and service status |
| GET | /v1/settings | Public settings (public) |
Rate limits
Per-account daily request limits and credits are enforced in the API and surfaced in the dashboard. When the daily cap is exceeded the server returns 429, bodyDaily request limit exceeded, header X-Error-Code: DAILY_REQUEST_LIMIT_EXCEEDED (plain text, not JSON — see Authentication).
Separate per-IP limits apply to some paths (auth, /v1/user/key*, defaults); those return JSON with retry_after and Retry-After.
Next steps
OpenAI compatibility
Recommended default endpoint with OpenAI SDK support.
Audio
Generate speech and transcribe audio files.
Embeddings and rerank
Create embeddings and rerank documents through the same routing.run API key.