POST /v1/chat/completions matches OpenAI Chat Completions. Point the SDK base_url at routing.run.
This is the recommended default endpoint for apps, SDKs, and coding agents.
Base URL
Primary: https://api.routing.run
Use the fallback base URL if the primary is slow or returning errors. The fallback is hosted by the routing.run team and supports the same API keys and paths.
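A minimal sketch of failing over between base URLs, assuming the caller supplies the fallback URL and that any thrown error should trigger the next attempt. `FALLBACK_BASE_URL` is a placeholder, not the real fallback address, and `joinUrl` and `firstSuccessful` are hypothetical helpers rather than part of any routing.run SDK.

```typescript
// Primary base URL from this page.
const PRIMARY_BASE_URL = "https://api.routing.run";
// Placeholder only: substitute the real fallback base URL.
const FALLBACK_BASE_URL = "https://fallback.example.invalid";

// Join a base URL and an API path without doubling or dropping the slash.
function joinUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Try each base URL in order and return the first successful result.
// `attempt` is whatever performs the request (for example a fetch wrapper).
async function firstSuccessful<T>(
  baseUrls: string[],
  path: string,
  attempt: (url: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no base URLs given");
  for (const baseUrl of baseUrls) {
    try {
      return await attempt(joinUrl(baseUrl, path));
    } catch (err) {
      lastError = err; // remember the failure and move on to the next base URL
    }
  }
  throw lastError;
}
```

Because the same API keys and paths apply to both hosts, only the base URL changes between attempts. A production client would probably fail over only on timeouts and 5xx responses rather than on every error.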
Quick setup
Request
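A minimal sketch of a request to this endpoint, assuming the API key is sent as a Bearer token in the Authorization header (as with the OpenAI API). The key value is a placeholder and `buildChatRequest` is a hypothetical helper, not a routing.run SDK function.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
  stream?: boolean;
}

// Build the URL and fetch-style options for POST /v1/chat/completions.
// Assumption: Bearer-token authentication, as in the OpenAI API.
function buildChatRequest(apiKey: string, body: ChatRequestBody) {
  return {
    url: "https://api.routing.run/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const chatRequest = buildChatRequest("rk_your_key_here", {
  model: "route/glm-5",
  messages: [{ role: "user", content: "Say hello in one sentence." }],
});
// To send it: const res = await fetch(chatRequest.url, chatRequest.init);
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.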
Chat models
| Route | Provider | Context |
|---|---|---|
| route/minimax-m2.5 | minimax/opencode | 100k |
| route/minimax-m2.7 | opencode/minimax | 100k |
| route/kimi-k2.5 | crof | 131k |
| route/kimi-k2.6 | crof | 131k |
| route/glm-5 | crof | 200k |
| route/deepseek-v3.2 | crof/chutes | 163k |
| route/deepseek-v4-pro | crof | 163k |
| route/qwen3.5-plus | opencode | 100k |
| route/mistral-large-3 | routing-inference | See model metadata |
| route/mistral-medium-2505 | routing-inference | See model metadata |
| route/mistral-small-2503 | routing-inference | See model metadata |
Mistral chat and summarization
Use Mistral models through the same POST /v1/chat/completions endpoint.
Chat request
Summarization request
Mistral models
| Model | Use case |
|---|---|
| route/mistral-large-3 | Premium chat completion and writing |
| route/mistral-medium-2505 | Summarization and balanced generation |
| route/mistral-small-2503 | Fast summarization and lightweight generation |
TypeScript example
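The following sketch shows one way to call the Mistral routes for chat and summarization. It assumes Bearer-token authentication, a runtime with a global fetch (Node 18+ or a browser), and that summarization is expressed as an ordinary chat completion with a system instruction. The prompt wording and the helper names (`mistralChat`, `mistralSummarize`, `postChat`) are illustrative.

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string }
interface ChatBody { model: string; messages: Message[]; max_tokens?: number }

// Chat request body for the premium Mistral route.
function mistralChat(prompt: string): ChatBody {
  return {
    model: "route/mistral-large-3",
    messages: [{ role: "user", content: prompt }],
  };
}

// Summarization body: a system instruction plus the text to summarize.
function mistralSummarize(text: string, maxTokens = 256): ChatBody {
  return {
    model: "route/mistral-small-2503",
    messages: [
      { role: "system", content: "Summarize the user's text in three sentences or fewer." },
      { role: "user", content: text },
    ],
    max_tokens: maxTokens,
  };
}

// Send a body to the endpoint and return the parsed JSON response.
async function postChat(apiKey: string, body: ChatBody): Promise<unknown> {
  const res = await fetch("https://api.routing.run/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`routing.run returned HTTP ${res.status}`);
  return res.json();
}
```

A call would look like `await postChat(apiKey, mistralSummarize(articleText))`. Swapping in route/mistral-medium-2505 trades some speed for more balanced output, per the table above.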
Request body
Response
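As a sketch of the response shape, the types below declare an OpenAI-style completion plus the routing.run extension fields named in this section. It assumes latency_ms and provider sit at the top level of the object. The sample payload is illustrative, not captured from a real call, and `extractReply` is a hypothetical helper that flags the case where a reasoning-oriented model used up max_tokens before producing visible content.

```typescript
// Only the fields used below are declared.
interface RoutedCompletion {
  choices: {
    message: { role: string; content: string | null; reasoning_content?: string };
    finish_reason: string | null;
  }[];
  latency_ms?: number; // routing.run extension (assumed top-level)
  provider?: string;   // routing.run extension (assumed top-level)
}

interface ExtractedReply {
  text: string;
  // True when reasoning was produced but no visible answer, and the
  // generation stopped at the token limit.
  truncatedByReasoning: boolean;
}

function extractReply(completion: RoutedCompletion): ExtractedReply {
  const choice = completion.choices[0];
  if (!choice) throw new Error("completion has no choices");
  const text = choice.message.content ?? "";
  const reasoning = choice.message.reasoning_content ?? "";
  const truncatedByReasoning =
    text.trim() === "" && reasoning !== "" && choice.finish_reason === "length";
  return { text, truncatedByReasoning };
}

// Illustrative payload only.
const sampleCompletion: RoutedCompletion = {
  choices: [{
    message: { role: "assistant", content: "", reasoning_content: "First, consider..." },
    finish_reason: "length",
  }],
  latency_ms: 1200,
  provider: "crof",
};
const extracted = extractReply(sampleCompletion);
```

When `truncatedByReasoning` is true, a caller might retry with a larger max_tokens rather than treat the empty content as a failure.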
latency_ms, provider, and, on some models, message.reasoning_content are routing.run extensions on top of the OpenAI completion object.
Live checks against route/glm-5.1-precision, route/qwen3.6-plus, and route/kimi-k2.5 showed that some reasoning-oriented models may spend max_tokens on reasoning first. If you set a very small max_tokens, message.content can be empty while message.reasoning_content is populated and finish_reason may be length.
Streaming
Set "stream": true to receive a streaming response:
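A minimal sketch of reading a streamed response, assuming OpenAI-style server-sent events: each event is a `data: <json>` line carrying `choices[0].delta.content`, and the stream ends with `data: [DONE]`. It also handles the in-stream error shape `{"error":{"message":"…","type":"api_error"}}` described under Error handling. `parseSseText` is a hypothetical helper and the sample stream is illustrative.

```typescript
interface StreamResult {
  text: string;   // concatenated delta.content fragments
  done: boolean;  // true once the [DONE] sentinel was seen
  error?: string; // message from an in-stream error event, if any
}

// Parse a complete SSE payload that has already been read into a string.
function parseSseText(raw: string): StreamResult {
  const result: StreamResult = { text: "", done: false };
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      result.done = true;
      break;
    }
    const chunk = JSON.parse(payload);
    if (chunk.error) {
      result.error = String(chunk.error.message);
      break;
    }
    const piece = chunk.choices?.[0]?.delta?.content;
    if (typeof piece === "string") result.text += piece;
  }
  return result;
}

// Illustrative stream, not captured from a real call.
const rawStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");
const streamed = parseSseText(rawStream);
```

This version parses a fully buffered string to keep the logic visible. A real client reading the response body incrementally would need to buffer partial lines across network chunks before parsing them.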
Tool calling
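A tool-calling request and the decoding of its result can be sketched as follows, assuming the OpenAI tools format (a `tools` array of function definitions in the request, `message.tool_calls` in the response). `get_weather` is a made-up tool, the chosen route is only assumed to support tools, and `decodeToolCalls` is a hypothetical helper.

```typescript
// OpenAI-style tool definition for an invented example tool.
const toolDefinitions = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const toolRequestBody = {
  model: "route/glm-5", // assumption: this route is tool-capable
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
  tools: toolDefinitions,
  tool_choice: "auto",
};

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Decode tool calls from an assistant message into name plus parsed arguments.
function decodeToolCalls(message: { tool_calls?: ToolCall[] }) {
  return (message.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  }));
}

// Illustrative assistant message, not captured from a real call.
const decodedCalls = decodeToolCalls({
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
  ],
});
```

After running the tool, the usual OpenAI-style flow is to append a message with role "tool" and the matching tool_call_id, then send the conversation again.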
routing.run supports tool calling (function calling) with compatible models.
Error handling
Non-streaming failures from routing logic use plain text bodies and X-Error-Code (see Authentication). Streaming failures may emit an SSE data: line with JSON {"error":{"message":"…","type":"api_error"}}.
| Status | X-Error-Code (typical) | Meaning |
|---|---|---|
| 400 | INVALID_MODEL | Unknown route/… id |
| 401 | AUTHENTICATION_ERROR | Bad or missing rk_ / JWT on inference routes |
| 403 | MODEL_NOT_ALLOWED | Plan cannot call this model |
| 429 | DAILY_REQUEST_LIMIT_EXCEEDED | Daily request cap |
| 502 | PROVIDER_ERROR | Every upstream in the routing chain failed |
| 504 | PROVIDER_TIMEOUT | Upstream timeouts exhausted |
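The table can be turned into a small decision helper. In the sketch below the error codes come from the table, but the suggested actions (for example retrying on 502 and 504) are assumptions rather than documented routing.run guidance, and both helper names are hypothetical.

```typescript
type FailureAction =
  | "fix_request"
  | "check_credentials"
  | "change_model_or_plan"
  | "wait_for_reset"
  | "retry_with_backoff"
  | "unknown";

// Map a failed response to a coarse next step.
function classifyFailure(httpStatus: number, errorCode: string | null): FailureAction {
  switch (errorCode) {
    case "INVALID_MODEL": return "fix_request";
    case "AUTHENTICATION_ERROR": return "check_credentials";
    case "MODEL_NOT_ALLOWED": return "change_model_or_plan";
    case "DAILY_REQUEST_LIMIT_EXCEEDED": return "wait_for_reset";
    case "PROVIDER_ERROR":
    case "PROVIDER_TIMEOUT": return "retry_with_backoff";
  }
  // Header missing or unrecognized: fall back to the status code alone.
  if (httpStatus === 502 || httpStatus === 504) return "retry_with_backoff";
  if (httpStatus === 401) return "check_credentials";
  return "unknown";
}

// Minimal structural type so this works with a fetch Response or a test double.
interface FailedResponse {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}

// Read the plain-text body and X-Error-Code header of a non-streaming failure.
async function describeFailure(res: FailedResponse) {
  const code = res.headers.get("X-Error-Code");
  const message = await res.text();
  return { status: res.status, code, message, action: classifyFailure(res.status, code) };
}
```

Because non-streaming failures carry plain text bodies, the helper calls `text()` rather than `json()` on the response.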