Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.routing.run/llms.txt

Use this file to discover all available pages before exploring further.

OpenAI-compatible and Anthropic-style endpoints on one base URL. Use /v1/chat/completions as the default path. routing.run handles model routing, failover, and retries behind a stable route/... model ID. In practice, most integrations should treat routing.run as an OpenAI-compatible gateway with a stable route/... model namespace.

Base URL

Primary:
https://api.routing.run
Secondary:
https://ai.routing.sh
Use https://api.routing.run by default. If it is slow or returning errors, switch to https://ai.routing.sh, a secondary endpoint hosted by the routing.run team. The same /v1 paths, model IDs, and API keys work on both hosts. All endpoints are prefixed with /v1.

Key concepts

The core model is simple:
  • You call one public endpoint.
  • You choose a route/... model ID.
  • routing.run decides which upstream to try first and when to fail over.

Routing chains

Each model has an ordered routing chain. You send one request to a route/... model ID, and routing.run handles fallback when an upstream fails or times out.

Circuit breakers

If an upstream fails 5 consecutive times, the circuit breaker opens and skips that upstream for 60 seconds. This prevents cascading failures and ensures fast responses.

Automatic retries

The router reads retry_count from routing config (default 2). It drives an outer attempt loop together with walking the routing chain on each attempt. If any message has role: tool, retries are disabled (retry_count forced to 0) and only the first chain entry is used (tool results must stay on the same upstream).

Message truncation

Before an upstream call, truncate_messages keeps all system messages, then adds the newest non-system messages (walking newest-first) until estimated input tokens fit a budget derived from the model’s configured context_size minus reserved output headroom, and until at most 50 non-system messages remain. If nothing fits, at least one non-system message is still attempted (available_tokens floor in code).

Endpoints

MethodEndpointDescription
POST/v1/chat/completionsRecommended default for apps, SDKs, and coding agents
POST/v1/messagesCompatibility path for Anthropic-style messages
GET/v1/modelsPublic model catalog. This exact path is anonymous, and authenticated callers currently see the same published catalog.
GET/v1/models/Model metadata (requires X-API-Key or Bearer)
POST/v1/embeddingsCreate OpenAI-compatible embeddings
POST/v1/rerankRerank documents for search and RAG workflows
POST/v1/images/generationsGenerate images
POST/v1/audio/speechConvert text to speech
POST/v1/audio/transcriptionsTranscribe audio files
GET/v1/pricingPer-model pricing table (requires middleware auth — any valid rk_ or access JWT)
GET/v1/statusPublic health and service status
GET/v1/settingsPublic settings (public)

Rate limits

Per-account daily request limits and credits are enforced in the API and surfaced in the dashboard. When the daily cap is exceeded the server returns 429, body Daily request limit exceeded, header X-Error-Code: DAILY_REQUEST_LIMIT_EXCEEDED (plain text, not JSON — see Authentication). Separate per-IP limits apply to some paths (auth, /v1/user/key*, defaults); those return JSON with retry_after and Retry-After.

Next steps

https://mintcdn.com/routing/Bdepg-ZiTHSkFbP-/images/ai-tools/openai.svg?fit=max&auto=format&n=Bdepg-ZiTHSkFbP-&q=85&s=20abb0f26a0ce48b6bff9705347b8d49

OpenAI compatibility

Recommended default endpoint with OpenAI SDK support.

Audio

Generate speech and transcribe audio files.

Embeddings and rerank

Create embeddings and rerank documents through the same routing.run API key.