Inference

Create chat completions by sending messages to a model. The platform routes your request to the best available provider based on your routing strategy. Authenticate with an API key or MPP payment credential.

Streaming

Streaming is enabled by default ("stream": true). Tokens are delivered as Server-Sent Events.

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":2,"total_tokens":12}}

data: [DONE]

Usage statistics are included in the final chunk. The platform sets stream_options.include_usage automatically.
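A minimal sketch of consuming the chunks above, assuming the response body is already available as an iterable of text lines. The helper name collect_stream is hypothetical and not part of the platform; it concatenates delta.content values and keeps the usage object from the final chunk.

```python
import json

def collect_stream(lines):
    """Accumulate text, finish_reason and usage from SSE 'data:' lines."""
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        # usage appears only on the final chunk; keep the last one seen
        usage = chunk.get("usage") or usage
    return "".join(parts), finish_reason, usage
```

This sketch ignores SSE event names, so it treats every data line as a completion chunk; payloads without a choices array contribute no text.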

MPP Streaming Events

When paying via a Tempo session, the stream may include additional payment events:

Balance exhausted — server requests a new voucher:

event: payment-need-voucher
data: {"channelId":"0x6d0f...","requiredCumulative":"250025","acceptedCumulative":"250000"}

Sign a new voucher with a higher cumulative amount to resume delivery. The stream closes after 60 seconds if no voucher is received.

Completion — server confirms payment:

event: payment-receipt
data: {"challengeId":"...","method":"tempo","reference":"0x...","status":"success"}

For non-streaming requests, the receipt is in the Payment-Receipt HTTP response header instead.
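A rough sketch of telling the payment events apart from ordinary chunks, assuming standard SSE framing (an optional event: line, one or more data: lines, then a blank line). The function names parse_sse_events and voucher_shortfall are hypothetical, and voucher signing itself is not shown because this page does not specify it.

```python
import json

def parse_sse_events(lines):
    """Yield (event_name, data) pairs; unnamed events default to 'message'."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        yield event, "\n".join(data)  # flush an unterminated final event

def voucher_shortfall(data):
    """Gap between the required and accepted cumulative amounts."""
    info = json.loads(data)
    # amounts arrive as decimal strings, so convert before subtracting
    return int(info["requiredCumulative"]) - int(info["acceptedCumulative"])
```

For the payment-need-voucher example above, the shortfall is 250025 - 250000 = 25, so the next voucher would need a cumulative amount of at least 250025.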

Non-Streaming

Set "stream": false to receive the complete response as a single JSON object with a usage field.
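As an illustration, a non-streaming call with API-key auth could be assembled as follows. BASE_URL is a placeholder (the real host is not given on this page), and build_request and extract_reply are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, an assumption

def build_request(api_key, model, messages):
    """Prepare a POST to /v1/chat/completions with streaming disabled."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def extract_reply(response_body):
    """Return (assistant_text, usage) from a complete JSON response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"], data["usage"]
```

The prepared request can be sent with urllib.request.urlopen(req), and the decoded body passed to extract_reply.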

Routing

Control provider selection with routing_strategy and price ceilings. See the Routing Guide. Routing works identically for both payment rails.

Endpoint Reference

POST /v1/chat/completions (auth: api-key | payment)

Create a chat completion. Accepts Bearer (API key) or Payment (MPP credential) auth.

Parameters

Name	Type	Req	Description
model	string	Yes	Model identifier.
messages	array	Yes	Message objects with role and content.
stream	boolean	No	Enable SSE streaming. Default: `true`
max_tokens	integer	No	Max tokens to generate. Default: `4096`
temperature	number	No	Sampling temperature (0-2). Default: `1`
top_p	number	No	Nucleus sampling.
stop	string | array	No	Stop sequences.
frequency_penalty	number	No	Penalize repeated tokens.
presence_penalty	number	No	Penalize present tokens.
timeout_ms	integer	No	Timeout in ms (max 30000). Default: `30000`
routing_strategy	string	No	'cheapest', 'fastest', or 'balanced'. Default: `balanced`
allow_external	boolean	No	Include OpenRouter. Default: `true` for balanced/fastest, `false` for cheapest.
max_input_price_per_million	integer	No	Max input price in cents per million tokens. Must be set together with max_output_price_per_million.
max_output_price_per_million	integer	No	Max output price in cents per million tokens. Must be set together with max_input_price_per_million.
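The constraints in the table can be checked on the client before sending. The sketch below is one possible approach, not the platform's own validation; build_payload is a hypothetical helper, and how the server responds to violations is not stated here.

```python
def build_payload(model, messages, *, stream=True, routing_strategy="balanced",
                  timeout_ms=30000, max_input_price_per_million=None,
                  max_output_price_per_million=None):
    """Assemble a request body, enforcing the documented parameter rules."""
    if routing_strategy not in ("cheapest", "fastest", "balanced"):
        raise ValueError("unknown routing_strategy: %r" % routing_strategy)
    if not 0 < timeout_ms <= 30000:
        raise ValueError("timeout_ms must be between 1 and 30000")
    # the two price ceilings are only valid as a pair
    if (max_input_price_per_million is None) != (max_output_price_per_million is None):
        raise ValueError("set both price ceilings or neither")
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "routing_strategy": routing_strategy,
        "timeout_ms": timeout_ms,
    }
    if max_input_price_per_million is not None:
        payload["max_input_price_per_million"] = max_input_price_per_million
        payload["max_output_price_per_million"] = max_output_price_per_million
    return payload
```

For example, build_payload("qwen/qwen3.5-9b", msgs, routing_strategy="cheapest", max_input_price_per_million=50, max_output_price_per_million=150) caps both prices, while passing only one ceiling raises ValueError.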

Request

{
  "model": "qwen/qwen3.5-9b",
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "stream": false
}

Response

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen/qwen3.5-9b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}

Errors

400	Bad request
401	Invalid API key
402	Payment required. Stripe: billing issue. MPP: payment challenge, sent in the WWW-Authenticate: Payment header.
502	All providers failed
503	No providers available
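A small sketch of mapping these statuses to client-side labels, including the two meanings of 402. The label strings and the classify_error name are illustrative assumptions; headers is assumed to be a plain dict of response headers.

```python
def classify_error(status, headers):
    """Map an HTTP status (and headers, for 402) to a short label."""
    # header names are case-insensitive, so normalize before lookup
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 402:
        challenge = lowered.get("www-authenticate", "")
        if challenge.strip().lower().startswith("payment"):
            return "mpp_payment_challenge"
        return "billing_issue"
    labels = {
        400: "bad_request",
        401: "invalid_api_key",
        502: "all_providers_failed",
        503: "no_providers_available",
    }
    return labels.get(status, "unknown")
```

A client paying via MPP would typically answer an mpp_payment_challenge by retrying with a Payment credential, whereas billing_issue needs account-level action.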