API overview - routing.run docs

OpenAI-compatible and Anthropic-style endpoints on one base URL. Use /v1/chat/completions as the default path. routing.run handles model routing, failover, and retries behind a stable route/... model ID. In practice, most integrations should treat routing.run as an OpenAI-compatible gateway with a stable route/... model namespace.

Base URL

Primary:

https://api.routing.run

Secondary:

https://ai.routing.sh

Use https://api.routing.run by default. If it is slow or returning errors, switch to https://ai.routing.sh, a secondary endpoint hosted by the routing.run team. The same /v1 paths, model IDs, and API keys work on both hosts. All endpoints are prefixed with /v1.

Key concepts

The core model is simple:

You call one public endpoint.
You choose a route/... model ID.
routing.run decides which upstream to try first and when to fail over.

Routing chains

Each model has an ordered routing chain. You send one request to a route/... model ID, and routing.run handles fallback when an upstream fails or times out.

Circuit breakers

If an upstream fails 5 consecutive times, the circuit breaker opens and skips that upstream for 60 seconds. This prevents cascading failures and ensures fast responses.

Automatic retries

The router reads retry_count from routing config (default 2). It drives an outer attempt loop together with walking the routing chain on each attempt. If any message has role: tool, retries are disabled (retry_count forced to 0) and only the first chain entry is used (tool results must stay on the same upstream).

Message truncation

Before an upstream call, truncate_messages keeps all system messages, then adds the newest non-system messages (walking newest-first) until estimated input tokens fit a budget derived from the model’s configured context_size minus reserved output headroom, and until at most 50 non-system messages remain. If nothing fits, at least one non-system message is still attempted (available_tokens floor in code).

Endpoints

Method	Endpoint	Description
POST	`/v1/chat/completions`	Recommended default for apps, SDKs, and coding agents
POST	`/v1/messages`	Compatibility path for Anthropic-style messages
GET	`/v1/models`	Public model catalog. This exact path is anonymous, and authenticated callers currently see the same published catalog.
GET	`/v1/models/`	Model metadata (requires `X-API-Key` or Bearer)
POST	`/v1/embeddings`	Create OpenAI-compatible embeddings
POST	`/v1/rerank`	Rerank documents for search and RAG workflows
POST	`/v1/images/generations`	Generate images
POST	`/v1/audio/speech`	Convert text to speech
POST	`/v1/audio/transcriptions`	Transcribe audio files
GET	`/v1/pricing`	Per-model pricing table (requires middleware auth — any valid `rk_` or access JWT)
GET	`/v1/status`	Public health and service status
GET	`/v1/settings`	Public settings (public)

Rate limits

Per-account daily request limits and credits are enforced in the API and surfaced in the dashboard. When the daily cap is exceeded the server returns 429, body Daily request limit exceeded, header X-Error-Code: DAILY_REQUEST_LIMIT_EXCEEDED (plain text, not JSON — see Authentication). Separate per-IP limits apply to some paths (auth, /v1/user/key*, defaults); those return JSON with retry_after and Retry-After.

Next steps

https://mintcdn.com/routing/Bdepg-ZiTHSkFbP-/images/ai-tools/openai.svg?fit=max&auto=format&n=Bdepg-ZiTHSkFbP-&q=85&s=20abb0f26a0ce48b6bff9705347b8d49

OpenAI compatibility

Recommended default endpoint with OpenAI SDK support.

Audio

Generate speech and transcribe audio files.

Embeddings and rerank

Create embeddings and rerank documents through the same routing.run API key.

Documentation Index

​Base URL

​Key concepts

​Routing chains

​Circuit breakers

​Automatic retries

​Message truncation

​Endpoints

​Rate limits

​Next steps