
LLM Proxy API

The LLM proxy exposes standard-compatible endpoints. Any client that speaks the OpenAI or Anthropic protocol can connect without modification.

Base URL: http://localhost:3000/v1

Authentication: Authorization: Bearer sk-lr-YOUR_PROJECT_TOKEN
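Since the proxy speaks plain HTTP, any client can construct a request directly. A minimal sketch using Python's standard library (the token value is a placeholder, as in the header above):

```python
import json
import urllib.request

# Placeholder project token -- substitute your own.
TOKEN = "sk-lr-YOUR_PROJECT_TOKEN"
BASE_URL = "http://localhost:3000/v1"

payload = {
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send this to a running proxy.
```

Official OpenAI and Anthropic SDKs work the same way: point their base URL at the proxy and pass the project token as the API key.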


Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Accepts the same request body as the OpenAI API.

Request

{
  "model": "gpt-5-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1024
}

The model field can be:

  • A specific model ID registered in Routerly (e.g. gpt-5-mini)
  • Any value — Routerly will use its routing policies to pick the best model regardless

Response (non-streaming)

Standard OpenAI ChatCompletion object, with an additional header:

x-routerly-trace-id: 018f3c2a-4b5d-7e8f-9012-34567890abcd
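The trace id is useful for logging and correlating requests. HTTP header names are case-insensitive, so normalize before lookup; a minimal sketch:

```python
def get_trace_id(headers):
    """Return the x-routerly-trace-id header value, matching case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-routerly-trace-id":
            return value
    return None

headers = {
    "Content-Type": "application/json",
    "X-Routerly-Trace-Id": "018f3c2a-4b5d-7e8f-9012-34567890abcd",
}
```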

Response (streaming)

When "stream": true, the response is a Server-Sent Events stream. Each event has one of the following types:

SSE data prefix                 Description
data: {"type":"trace",...}      Routing decision metadata (first event)
data: {"type":"content",...}    Token chunk from the model
data: [DONE]                    End of stream

The trace event includes the selected model, policy scores, and request cost estimate.
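The trace event body is plain JSON after the data: prefix, so it decodes with a standard JSON parser. A sketch using the field names from the example in Streaming Protocol Details below:

```python
import json

line = ('data: {"type":"trace","model":"gpt-5-mini","provider":"openai",'
        '"policies":["health","cheapest"],"costEstimate":0.000025}')

# Strip the SSE "data: " prefix, then decode the JSON body.
trace = json.loads(line[len("data: "):])
```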


Responses API

POST /v1/responses

OpenAI Responses API compatible endpoint. Supports stateful multi-turn conversations via previous_response_id.

Request

{
  "model": "gpt-5-mini",
  "input": "Tell me a joke.",
  "stream": false
}
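Multi-turn state is carried by echoing the id of the previous response into the next request. A sketch of the two payloads (the response id resp_abc123 is an illustrative value, not one the proxy is guaranteed to produce):

```python
# First turn: no previous_response_id.
first_request = {"model": "gpt-5-mini", "input": "Tell me a joke."}

# Suppose the proxy returned a Response object with id "resp_abc123"
# (illustrative). The follow-up turn references it to continue the thread:
second_request = {
    "model": "gpt-5-mini",
    "input": "Explain why that is funny.",
    "previous_response_id": "resp_abc123",
}
```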

Response

Standard OpenAI Response object structure.


Anthropic Messages

POST /v1/messages

Anthropic Messages API compatible endpoint. Use this with the Anthropic SDK by setting base_url to http://localhost:3000.

Request

{
  "model": "claude-haiku-4-5",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}

Response

Standard Anthropic Message object.


Count Tokens

POST /v1/messages/count_tokens

Anthropic-compatible token counting endpoint. Returns the number of input tokens for a given message set without making an inference call.

Request

{
  "model": "claude-haiku-4-5",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}

Response

{ "input_tokens": 10 }

Project-Scoped Proxy

The same endpoints are available scoped to a specific project:

POST /projects/{slug}/v1/chat/completions
POST /projects/{slug}/v1/responses
POST /projects/{slug}/v1/messages

The project slug in the URL takes precedence over the slug inferred from the Bearer token. Use this when one token has access to multiple projects.
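Building the scoped URL is a simple substitution into the path template above; a sketch (the slug acme is illustrative):

```python
def project_url(base, slug, endpoint):
    """Build a project-scoped proxy URL: {base}/projects/{slug}/v1/{endpoint}."""
    return f"{base}/projects/{slug}/v1/{endpoint}"

url = project_url("http://localhost:3000", "acme", "chat/completions")
```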


Streaming Protocol Details

Routerly extends the standard SSE stream with a trace event at the start:

data: {"type":"trace","model":"gpt-5-mini","provider":"openai","policies":["health","cheapest"],"costEstimate":0.000025}

data: {"type":"content","delta":"Hello"}

data: {"type":"content","delta":" there"}

data: [DONE]

Clients that skip the initial trace event and consume only the data: lines that follow will receive standard OpenAI delta chunks and need no modification.
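A client loop over this stream only needs to branch on the event type. A minimal parser over the example lines above:

```python
import json

stream_lines = [
    'data: {"type":"trace","model":"gpt-5-mini","provider":"openai",'
    '"policies":["health","cheapest"],"costEstimate":0.000025}',
    'data: {"type":"content","delta":"Hello"}',
    'data: {"type":"content","delta":" there"}',
    'data: [DONE]',
]

trace = None
text = ""
for line in stream_lines:
    body = line[len("data: "):]
    if body == "[DONE]":          # end-of-stream sentinel, not JSON
        break
    event = json.loads(body)
    if event["type"] == "trace":  # routing metadata, always the first event
        trace = event
    elif event["type"] == "content":
        text += event["delta"]    # accumulate token chunks
```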