POST /v1/chat/completions follows the OpenAI Chat Completions format. Point your OpenAI SDK's base_url at https://api.routing.run/v1.
This is the recommended default endpoint for apps, SDKs, and coding agents.

Base URL

Primary:
https://api.routing.run/v1
Secondary:
https://ai.routing.sh/v1
Use the secondary endpoint if https://api.routing.run is slow or returning errors. It is hosted by the routing.run team and supports the same API keys and paths.
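
One way to apply this failover in client code is to try each base URL in order. The sketch below is an assumption about how you might structure it, not an official routing.run client: `call_with_fallback` and `send` are hypothetical names, and which exceptions count as "slow or returning errors" is left to the caller.

```python
from typing import Callable, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")

# Base URLs from this page, in order of preference.
BASE_URLS = (
    "https://api.routing.run/v1",
    "https://ai.routing.sh/v1",
)


def call_with_fallback(
    send: Callable[[str], T],
    base_urls: Iterable[str] = BASE_URLS,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call send(base_url) for each URL until one succeeds.

    `send` is a hypothetical callable supplied by you. It should perform the
    request against the given base URL and raise one of `retry_on` when the
    endpoint is unreachable or too slow.
    """
    last_error = None
    for base_url in base_urls:
        try:
            return send(base_url)
        except retry_on as exc:
            last_error = exc  # remember the failure and try the next endpoint
    if last_error is None:
        raise ValueError("base_urls must not be empty")
    raise last_error
```

With the OpenAI SDK, `send` could construct `OpenAI(api_key=..., base_url=base_url)` and issue the request; in that case you would likely pass the SDK's own connection and timeout exception types as `retry_on`.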

Quick setup

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ROUTING_RUN_API_KEY"],
    base_url="https://api.routing.run/v1",
)

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a staff engineer reviewing a pull request."},
        {"role": "user", "content": "List concrete issues in this diff: …"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

Request

curl -sS -X POST https://api.routing.run/v1/chat/completions \
  -H "X-API-Key: ${ROUTING_RUN_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "route/minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Chat models

| Route | Provider | Context |
| --- | --- | --- |
| route/minimax-m2.5 | minimax/opencode | 100k |
| route/minimax-m2.7 | opencode/minimax | 100k |
| route/kimi-k2.5 | crof | 131k |
| route/kimi-k2.6 | crof | 131k |
| route/glm-5 | crof | 200k |
| route/deepseek-v3.2 | crof/chutes | 163k |
| route/deepseek-v4-pro | crof | 163k |
| route/qwen3.5-plus | opencode | 100k |
| route/mistral-large-3 | routing-inference | See model metadata |
| route/mistral-medium-2505 | routing-inference | See model metadata |
| route/mistral-small-2503 | routing-inference | See model metadata |

Mistral chat and summarization

Use Mistral models through the same POST /v1/chat/completions endpoint.

Chat request

{
  "model": "route/mistral-large-3",
  "messages": [
    {
      "role": "user",
      "content": "Write a concise product announcement for routing.run embeddings."
    }
  ]
}

Summarization request

{
  "model": "route/mistral-medium-2505",
  "messages": [
    {
      "role": "system",
      "content": "Summarize the user's text into 5 bullet points."
    },
    {
      "role": "user",
      "content": "<long document text here>"
    }
  ],
  "max_tokens": 700
}

Mistral models

| Model | Use case |
| --- | --- |
| route/mistral-large-3 | Premium chat completion and writing |
| route/mistral-medium-2505 | Summarization and balanced generation |
| route/mistral-small-2503 | Fast summarization and lightweight generation |

Mistral models are available on Premium, Max, and Ultra plan tiers.

TypeScript example

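// Assumptions: Node.js 18+ (global fetch) and the key stored in ROUTING_RUN_API_KEY.
const API_BASE_URL = 'https://api.routing.run'
const ROUTING_API_KEY = process.env.ROUTING_RUN_API_KEY ?? ''

async function summarizeWithMistral(text: string) {
  const response = await fetch(`${API_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': ROUTING_API_KEY,
    },
    body: JSON.stringify({
      model: 'route/mistral-medium-2505',
      messages: [
        { role: 'system', content: 'Summarize the user text into 5 bullet points.' },
        { role: 'user', content: text },
      ],
      max_tokens: 700,
    }),
  })

  if (!response.ok) {
    // X-Error-Code is assumed here to arrive as a response header.
    const code = response.headers.get('X-Error-Code') ?? 'unknown'
    throw new Error(`Mistral summarization request failed: ${response.status} (${code})`)
  }

  // Resolves to the parsed chat completion object.
  return response.json()
}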

Response

{
  "id": "chatcmpl_01J8rQvN4pK2mL9xYz3wAbcDef",
  "object": "chat.completion",
  "created": 1744701234,
  "model": "route/deepseek-v3.2",
  "latency_ms": 842,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Line 42: possible null dereference …",
        "reasoning_content": "Internal reasoning trace may appear here on some models"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}
latency_ms, provider, and (on some models) message.reasoning_content are routing.run extensions to the OpenAI completion object. The example above does not include provider.
Live checks against route/glm-5.1-precision, route/qwen3.6-plus, and route/kimi-k2.5 showed that some reasoning-oriented models spend max_tokens on reasoning first. With a very small max_tokens, message.content can come back empty while message.reasoning_content is populated, and finish_reason may be length.
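
A defensive way to read the reply under that behavior is sketched below. It works on the parsed JSON body as a plain dict; `read_reply` is a hypothetical helper, and whether to retry with a larger max_tokens is left to the caller.

```python
from typing import Any, Dict, Tuple


def read_reply(body: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (text, truncated_in_reasoning) for a chat completion body.

    truncated_in_reasoning is True when the visible content is empty, the
    model produced reasoning_content, and finish_reason is "length".
    """
    choice = body["choices"][0]
    message = choice.get("message", {})
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or ""
    hit_limit = choice.get("finish_reason") == "length"
    return content, (not content and bool(reasoning) and hit_limit)
```

If the second value is True, the token budget was probably consumed before any visible answer was written, so raising max_tokens is a reasonable next step.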

Streaming

Set "stream": true to receive a streaming response:
stream = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "Stream a short design for a rate limiter."}],
    stream=True,
)

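for chunk in stream:
    # Guard against chunks with no choices (for example, a usage-only chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)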

Tool calling

routing.run supports tool calling (function calling) with compatible models:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
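
The request above only lets the model ask for a tool. A minimal sketch of the follow-up step is shown below, assuming the response follows the OpenAI tool-call shape (message.tool_calls with JSON-encoded function.arguments): run each requested function locally and build role "tool" messages to append to the conversation. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and the `get_weather` body is a placeholder.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> Dict[str, Any]:
    # Placeholder implementation; a real tool would call a weather service.
    return {"location": location, "forecast": "unknown"}


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute each tool call in an assistant message given as a dict.

    Returns the role "tool" messages to send back, one per call.
    """
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            output = {"error": f"unknown tool: {name}"}
        else:
            output = handler(**args)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(output),
            }
        )
    return results
```

To continue the conversation, append the assistant message and the returned tool messages to messages, then call client.chat.completions.create again with the same tools.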

Error handling

Non-streaming failures from routing logic use plain text bodies and X-Error-Code (see Authentication). Streaming failures may emit an SSE data: line with JSON {"error":{"message":"…","type":"api_error"}}.

| Status | X-Error-Code (typical) | Meaning |
| --- | --- | --- |
| 400 | INVALID_MODEL | Unknown route/… id |
| 401 | AUTHENTICATION_ERROR | Bad or missing rk_ / JWT on inference routes |
| 403 | MODEL_NOT_ALLOWED | Plan cannot call this model |
| 429 | DAILY_REQUEST_LIMIT_EXCEEDED | Daily request cap |
| 502 | PROVIDER_ERROR | Every upstream in the routing chain failed |
| 504 | PROVIDER_TIMEOUT | Upstream timeouts exhausted |
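
A client might act on these codes roughly as follows. This is a sketch under assumptions: treating only the upstream failures (PROVIDER_ERROR, PROVIDER_TIMEOUT) as retryable is a client-side policy choice rather than something this page specifies, and `is_retryable` and `parse_sse_error` are hypothetical helper names.

```python
import json
from typing import Optional

# Assumption: only upstream failures are worth an automatic retry.
RETRYABLE_CODES = {"PROVIDER_ERROR", "PROVIDER_TIMEOUT"}


def is_retryable(status: int, error_code: Optional[str]) -> bool:
    """Decide whether to retry based on HTTP status and X-Error-Code."""
    if error_code in RETRYABLE_CODES:
        return True
    # Fall back to the status alone when the code is missing.
    return error_code is None and status in (502, 504)


def parse_sse_error(line: str) -> Optional[str]:
    """Return the error message from an SSE `data:` line, or None.

    Lines whose payload is not JSON, or that carry a normal chunk,
    return None.
    """
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if isinstance(event, dict) and isinstance(event.get("error"), dict):
        return event["error"].get("message")
    return None
```

Authentication, plan, and daily-limit errors are left non-retryable here because repeating the same request would not change the outcome.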