OpenAI-compatible API - routing.run docs

POST /v1/chat/completions follows the OpenAI Chat Completions request and response format. To use it, point your OpenAI SDK's base_url at routing.run.

This is the recommended default endpoint for apps, SDKs, and coding agents.

Base URL

Primary:

https://api.routing.run/v1

Secondary:

https://ai.routing.sh/v1

Use the secondary endpoint if https://api.routing.run is slow or returning errors. It is hosted by the routing.run team and supports the same API keys and paths.
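
The fallback described above can be automated on the client side. The following is a minimal sketch, not an official routing.run client: `call_with_fallback` and `make_request` are hypothetical names introduced here, and the policy (try the primary, then the secondary, on any exception) is an assumption you may want to narrow.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Base URLs from this section, in priority order.
BASE_URLS = (
    "https://api.routing.run/v1",
    "https://ai.routing.sh/v1",
)


def call_with_fallback(
    make_request: Callable[[str], T],
    base_urls: Sequence[str] = BASE_URLS,
) -> T:
    """Try each base URL in order and return the first successful result.

    `make_request` is a hypothetical caller-supplied function that takes a
    base URL, performs the request, and raises an exception on failure.
    """
    last_error: Optional[Exception] = None
    for base_url in base_urls:
        try:
            return make_request(base_url)
        except Exception as exc:  # assumption: narrow this to your client's error types
            last_error = exc
    raise RuntimeError("all base URLs failed") from last_error
```

With the OpenAI SDK, `make_request` could build a client with `base_url=base_url` and call `client.chat.completions.create(...)`. Because both endpoints accept the same API keys and paths, nothing else in the request needs to change.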

Quick setup

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ROUTING_RUN_API_KEY"],
    base_url="https://api.routing.run/v1",
)

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a staff engineer reviewing a pull request."},
        {"role": "user", "content": "List concrete issues in this diff: …"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

Request

curl -sS -X POST https://api.routing.run/v1/chat/completions \
  -H "X-API-Key: ${ROUTING_RUN_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "route/minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Chat models

Route	Provider	Context
`route/minimax-m2.5`	minimax/opencode	100k
`route/minimax-m2.7`	opencode/minimax	100k
`route/kimi-k2.5`	crof	131k
`route/kimi-k2.6`	crof	131k
`route/glm-5`	crof	200k
`route/deepseek-v3.2`	crof/chutes	163k
`route/deepseek-v4-pro`	crof	163k
`route/qwen3.5-plus`	opencode	100k
`route/mistral-large-3`	routing-inference	See model metadata
`route/mistral-medium-2505`	routing-inference	See model metadata
`route/mistral-small-2503`	routing-inference	See model metadata

Mistral chat and summarization

Use Mistral models through the same POST /v1/chat/completions endpoint.

Chat request

{
  "model": "route/mistral-large-3",
  "messages": [
    {
      "role": "user",
      "content": "Write a concise product announcement for routing.run embeddings."
    }
  ]
}

Summarization request

{
  "model": "route/mistral-medium-2505",
  "messages": [
    {
      "role": "system",
      "content": "Summarize the user's text into 5 bullet points."
    },
    {
      "role": "user",
      "content": "<long document text here>"
    }
  ],
  "max_tokens": 700
}

Mistral models

Model	Use case
`route/mistral-large-3`	Premium chat completion and writing
`route/mistral-medium-2505`	Summarization and balanced generation
`route/mistral-small-2503`	Fast summarization and lightweight generation

Mistral models are available on Premium, Max, and Ultra plan tiers.

TypeScript example

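// Host only: the /v1 prefix is appended in the fetch call below.
const API_BASE_URL = 'https://api.routing.run'
// Assumes the key is exported as ROUTING_RUN_API_KEY, as in the Python quick setup.
const ROUTING_API_KEY = process.env.ROUTING_RUN_API_KEY ?? ''

async function summarizeWithMistral(text: string) {
  const response = await fetch(`${API_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': ROUTING_API_KEY,
    },
    body: JSON.stringify({
      model: 'route/mistral-medium-2505',
      messages: [
        { role: 'system', content: 'Summarize the user text into 5 bullet points.' },
        { role: 'user', content: text },
      ],
      max_tokens: 700,
    }),
  })

  if (!response.ok) {
    // Surface the HTTP status and X-Error-Code header so failures are diagnosable.
    const code = response.headers.get('X-Error-Code') ?? 'unknown'
    throw new Error(`Mistral summarization request failed: ${response.status} (${code})`)
  }

  // Resolves to the parsed chat completion object.
  return response.json()
}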

Request body

Response

{
  "id": "chatcmpl_01J8rQvN4pK2mL9xYz3wAbcDef",
  "object": "chat.completion",
  "created": 1744701234,
  "model": "route/deepseek-v3.2",
  "latency_ms": 842,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Line 42: possible null dereference …",
        "reasoning_content": "Internal reasoning trace may appear here on some models"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}

latency_ms, provider, and (on some models) message.reasoning_content are routing.run extensions on top of the OpenAI completion object. The example above does not show a provider field.

Live checks against route/glm-5.1-precision, route/qwen3.6-plus, and route/kimi-k2.5 showed that some reasoning-oriented models may spend their max_tokens budget on reasoning before producing an answer. If you set a very small max_tokens, message.content can be empty while message.reasoning_content is populated, and finish_reason may be "length".
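
One way to detect this case in client code is to inspect the parsed completion. This is a minimal sketch under stated assumptions: `reasoning_exhausted_budget` is a hypothetical helper, and it expects the completion as a plain dict (for example the parsed JSON body, or `response.model_dump()` from the OpenAI Python SDK).

```python
from typing import Any, Mapping


def reasoning_exhausted_budget(completion: Mapping[str, Any]) -> bool:
    """Return True when the first choice has no visible answer, did produce
    reasoning, and stopped because it hit the token limit."""
    choices = completion.get("choices") or []
    if not choices:
        return False
    choice = choices[0]
    message = choice.get("message") or {}
    # Treat missing, None, and whitespace-only strings as empty.
    has_content = bool((message.get("content") or "").strip())
    has_reasoning = bool((message.get("reasoning_content") or "").strip())
    return (
        not has_content
        and has_reasoning
        and choice.get("finish_reason") == "length"
    )
```

When the helper returns True, a reasonable reaction is to retry with a larger max_tokens; how much larger depends on the model and is not specified here.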

Streaming

Set "stream": true (stream=True in the Python SDK) to receive the response as incremental chunks:

stream = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "Stream a short design for a rate limiter."}],
    stream=True,
)

for chunk in stream:
    # Some chunks can arrive with an empty choices list; skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool calling

routing.run supports tool calling (function calling) with compatible models:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
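
The request above only asks the model to choose a tool; the caller still has to run it and send the result back. The sketch below assumes routing.run returns tool calls in the standard OpenAI Chat Completions shape (`tool_calls` entries with `id`, `function.name`, and a JSON-encoded `function.arguments` string). `run_tool_calls`, `TOOL_REGISTRY`, and the `get_weather` body are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Mapping


def get_weather(location: str) -> str:
    """Hypothetical stand-in; a real implementation would call a weather API."""
    return f"Weather lookup for {location} is not implemented in this sketch."


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Execute each tool call in an assistant message and build the
    role "tool" messages to append to the follow-up request."""
    results: List[Dict[str, str]] = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = TOOL_REGISTRY.get(name)
        output = handler(**args) if handler else f"Unknown tool: {name}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results
```

With the OpenAI Python SDK, the input could be `response.choices[0].message.model_dump()`. The follow-up request would then contain the original messages, the assistant message with its tool calls, and the messages returned here.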

Error handling

Non-streaming failures from routing logic use plain text bodies and X-Error-Code (see Authentication). Streaming failures may emit an SSE data: line with JSON {"error":{"message":"…","type":"api_error"}}.

Status	`X-Error-Code` (typical)	Meaning
400	`INVALID_MODEL`	Unknown `route/…` id
401	`AUTHENTICATION_ERROR`	Bad or missing `rk_` / JWT on inference routes
403	`MODEL_NOT_ALLOWED`	Plan cannot call this model
429	`DAILY_REQUEST_LIMIT_EXCEEDED`	Daily request cap
502	`PROVIDER_ERROR`	Every upstream in the routing chain failed
504	`PROVIDER_TIMEOUT`	Upstream timeouts exhausted
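
The table and the streaming note above can be turned into two small client-side helpers. This is a sketch, not routing.run's own client logic: `is_retryable` and `parse_sse_error` are hypothetical names, and the retry policy (retry only upstream failures) is an assumption made for illustration.

```python
import json
from typing import Optional

# Codes from the table above that indicate an upstream problem.
RETRYABLE_CODES = {"PROVIDER_ERROR", "PROVIDER_TIMEOUT"}


def is_retryable(status: int, error_code: Optional[str]) -> bool:
    """Assumed policy: retry only upstream failures (502/504).

    Errors such as INVALID_MODEL, AUTHENTICATION_ERROR, MODEL_NOT_ALLOWED,
    and DAILY_REQUEST_LIMIT_EXCEEDED will not succeed on an immediate retry.
    """
    return status in (502, 504) or error_code in RETRYABLE_CODES


def parse_sse_error(line: str) -> Optional[str]:
    """Return the error message if an SSE line carries an error payload,
    otherwise None."""
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        # Not JSON (for example an empty or sentinel line); not an error event.
        return None
    error = event.get("error") if isinstance(event, dict) else None
    if isinstance(error, dict):
        return str(error.get("message", ""))
    return None
```

For non-streaming responses, `status` is the HTTP status code and `error_code` is the value of the X-Error-Code header. For streams, `parse_sse_error` can be applied to each raw line before normal chunk handling.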
