POST /v1/chat/completions is compatible with the OpenAI Chat Completions API. To use it, point the OpenAI SDK's base_url at routing.run.

Base URL

https://api.routing.run/v1

Quick setup

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ROUTING_RUN_API_KEY"],
    base_url="https://api.routing.run/v1",
)

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a staff engineer reviewing a pull request."},
        {"role": "user", "content": "List concrete issues in this diff: …"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

Response

{
  "id": "chatcmpl_01J8rQvN4pK2mL9xYz3wAbcDef",
  "object": "chat.completion",
  "created": 1744701234,
  "model": "route/deepseek-v3.2",
  "latency_ms": 842,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Line 42: possible null dereference …"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}
latency_ms (round-trip latency in milliseconds) and an internal upstream-id string field are routing.run extensions to the standard OpenAI chat completion object.
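When working with the raw JSON rather than the SDK's typed objects, it is safest to read the extension field with a plain .get(), so the same code keeps working against OpenAI-compatible backends that omit it. A minimal sketch using an abridged version of the example payload above:

```python
import json

# Abridged copy of the example response above.
payload = json.loads("""
{
  "id": "chatcmpl_01J8rQvN4pK2mL9xYz3wAbcDef",
  "object": "chat.completion",
  "model": "route/deepseek-v3.2",
  "latency_ms": 842,
  "usage": {"prompt_tokens": 128, "completion_tokens": 256, "total_tokens": 384}
}
""")

# .get() returns None instead of raising if a backend strips the extension.
latency = payload.get("latency_ms")
print(latency)  # 842
```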

Streaming

Set "stream": true to receive a streaming response:
stream = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "Stream a short design for a rate limiter."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
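If you also need the assembled text after streaming (for logging or storage, say), collect the deltas as they arrive. A small sketch, assuming the standard OpenAI chunk shape in which delta.content may be None (for example on the role-only first chunk):

```python
def accumulate(delta_contents):
    """Rebuild the full completion text from streamed delta.content values."""
    parts = []
    for content in delta_contents:
        if content:  # skip None deltas
            parts.append(content)
    return "".join(parts)

# Simulated delta contents, as they might arrive from the stream above:
print(accumulate([None, "A rate ", "limiter ", "sketch."]))
```

In the real loop you would append `chunk.choices[0].delta.content` to a list as you print it, then join at the end.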

Tool calling

routing.run supports tool calling (function calling) with compatible models:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

Error handling

Non-streaming failures originating in the routing layer use plain-text bodies plus an X-Error-Code header (see Authentication). Streaming failures may instead emit an SSE data: line containing JSON of the form {"error":{"message":"…","type":"api_error"}}.
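Detecting that error frame in a stream is a matter of parsing the data: line. A minimal sketch, assuming only the {"error": {...}} shape described above:

```python
import json

def parse_sse_error(line):
    """Return the error message from a streaming failure line, or None
    if the line is not an SSE error frame."""
    if not line.startswith("data: "):
        return None
    try:
        payload = json.loads(line[len("data: "):])
    except json.JSONDecodeError:
        return None
    err = payload.get("error")
    return err.get("message") if err else None

print(parse_sse_error(
    'data: {"error":{"message":"upstream failed","type":"api_error"}}'
))  # upstream failed
```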
Status | X-Error-Code (typical)       | Meaning
-------|------------------------------|--------------------------------------------
400    | INVALID_MODEL                | Unknown route/… id
401    | AUTHENTICATION_ERROR         | Bad or missing rk_ / JWT on inference routes
403    | MODEL_NOT_ALLOWED            | Plan cannot call this model
429    | DAILY_REQUEST_LIMIT_EXCEEDED | Daily request cap reached
502    | PROVIDER_ERROR               | Every upstream in the routing chain failed
504    | PROVIDER_TIMEOUT             | Upstream timeouts exhausted
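A retry policy falls out of this table: 502/504 mean the routing layer already exhausted its upstreams this attempt but a fresh attempt may succeed, while 400/401/403 (and a daily cap that will not reset until tomorrow) cannot be fixed by retrying. This classification is an assumption on our part, not routing.run guidance:

```python
# Assumed-retryable codes; adjust to your own tolerance for duplicate work.
RETRYABLE_CODES = {"PROVIDER_ERROR", "PROVIDER_TIMEOUT"}

def should_retry(status, x_error_code):
    """Decide whether a failed request is worth retrying, based on the
    status/X-Error-Code table above."""
    return status in (502, 504) and x_error_code in RETRYABLE_CODES

print(should_retry(502, "PROVIDER_ERROR"))                 # True
print(should_retry(401, "AUTHENTICATION_ERROR"))           # False
print(should_retry(429, "DAILY_REQUEST_LIMIT_EXCEEDED"))   # False
```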