POST /v1/chat/completions is compatible with the OpenAI Chat Completions API. To use it, point the OpenAI SDK's base_url at routing.run.
Base URL
https://api.routing.run/v1
Quick setup
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ROUTING_RUN_API_KEY"],
    base_url="https://api.routing.run/v1",
)
response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a staff engineer reviewing a pull request."},
        {"role": "user", "content": "List concrete issues in this diff: …"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
Request
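For clients that do not use the OpenAI SDK, the same call can be sketched as a plain HTTP POST to the base URL above. The example below uses only the Python standard library and assumes the key is sent as a Bearer token in the Authorization header, which is how the OpenAI SDK transmits api_key. The helper name build_chat_request is hypothetical and not part of any routing.run client.

```python
import json
import urllib.request

BASE_URL = "https://api.routing.run/v1"


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request.

    Extra keyword arguments such as temperature are passed through
    as top-level fields of the JSON body.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: Bearer auth, matching what the OpenAI SDK sends.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen(req, timeout=60) sends it; the response body is the JSON object shown under Response.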
Response
{
  "id": "chatcmpl_01J8rQvN4pK2mL9xYz3wAbcDef",
  "object": "chat.completion",
  "created": 1744701234,
  "model": "route/deepseek-v3.2",
  "latency_ms": 842,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Line 42: possible null dereference …"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}
latency_ms (round-trip timing) and an internal upstream-id string field are routing.run extensions on top of the OpenAI completion object.
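Since these extensions sit outside the OpenAI schema, a client may prefer to separate them from the standard fields rather than hard-code their names. The following is a minimal sketch that works on the decoded JSON body; the helper name extension_fields is hypothetical, and the set of standard keys reflects the top-level fields of the OpenAI chat.completion object as commonly documented, so treat it as an assumption to adjust.

```python
# Top-level keys defined by the OpenAI chat.completion object (assumed list).
OPENAI_COMPLETION_KEYS = {
    "id",
    "object",
    "created",
    "model",
    "choices",
    "usage",
    "system_fingerprint",
    "service_tier",
}


def extension_fields(completion):
    """Return the top-level fields that are not part of the OpenAI object."""
    return {
        key: value
        for key, value in completion.items()
        if key not in OPENAI_COMPLETION_KEYS
    }
```

Applied to the sample response above, this returns a dict containing latency_ms; any other routing.run extension present in a real response would appear alongside it.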
Streaming
Set "stream": true (stream=True in the Python SDK) to receive the response incrementally:
stream = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "Stream a short design for a rate limiter."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text (for example role-only or final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Tool calling
routing.run supports tool calling (function calling) with compatible models:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]
response = client.chat.completions.create(
    model="route/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
Error handling
Non-streaming failures raised by the routing logic return a plain-text body and an X-Error-Code response header (see Authentication). Streaming failures may instead emit an SSE data: line containing JSON of the form {"error":{"message":"…","type":"api_error"}}.
| Status | X-Error-Code (typical) | Meaning |
|---|---|---|
| 400 | INVALID_MODEL | Unknown route/… id |
| 401 | AUTHENTICATION_ERROR | Bad or missing rk_ / JWT on inference routes |
| 403 | MODEL_NOT_ALLOWED | Plan cannot call this model |
| 429 | DAILY_REQUEST_LIMIT_EXCEEDED | Daily request cap |
| 502 | PROVIDER_ERROR | Every upstream in the routing chain failed |
| 504 | PROVIDER_TIMEOUT | Upstream timeouts exhausted |
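A client can use the table above to decide whether a retry is worthwhile and to spot errors inside a stream. The sketch below is one possible policy, not routing.run's recommendation: it assumes that only upstream failures (502 and 504) are worth retrying, and that a streamed error arrives as a single data: line as described above. The names should_retry and parse_sse_error are hypothetical.

```python
import json

# Assumption: upstream failures may succeed on a later attempt, while
# request, auth, plan, and daily-limit errors will not.
RETRYABLE_CODES = {"PROVIDER_ERROR", "PROVIDER_TIMEOUT"}
RETRYABLE_STATUSES = {502, 504}


def should_retry(status, error_code=None):
    """Decide whether a failed non-streaming call is worth retrying."""
    if error_code is not None:
        return error_code in RETRYABLE_CODES
    # Fall back to the status when the X-Error-Code header is absent.
    return status in RETRYABLE_STATUSES


def parse_sse_error(line):
    """Return the error message carried by an SSE line, or None."""
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return None
    error = event.get("error") if isinstance(event, dict) else None
    if isinstance(error, dict):
        return error.get("message")
    return None
```

With the OpenAI Python SDK, the status and header are typically available on the raised openai.APIStatusError as status_code and response.headers.get("X-Error-Code"), which can be passed straight to should_retry.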