Inference
Create chat completions by sending messages to a model. The platform routes your request to the best available provider based on your routing strategy. Authenticate with an API key or MPP payment credential.
Streaming
Streaming is enabled by default ("stream": true). Tokens are delivered as Server-Sent Events.
```
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":2,"total_tokens":12}}
data: [DONE]
```

Usage statistics are included in the final chunk. The platform sets stream_options.include_usage automatically.
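Consuming the stream amounts to reading `data:` lines, decoding each JSON chunk, and stopping at the `[DONE]` sentinel. A minimal parsing sketch (the helper name is illustrative, not part of the platform API):

```python
import json

def parse_sse_stream(lines):
    """Accumulate assistant text from raw SSE 'data:' lines.

    Returns (full_text, usage_dict). Stops at the [DONE] sentinel;
    the usage object, if present, arrives in the final content chunk.
    """
    text_parts = []
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event:/comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "usage" in chunk:
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            # the final chunk carries an empty delta with finish_reason "stop"
            text_parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(text_parts), usage

# Example using the chunks shown above:
raw = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":2,"total_tokens":12}}',
    'data: [DONE]',
]
text, usage = parse_sse_stream(raw)
# text == "Hello", usage["total_tokens"] == 12
```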
MPP Streaming Events
When paying via Tempo session, the stream may include additional payment events:
Balance exhausted — server requests a new voucher:
```
event: payment-need-voucher
data: {"channelId":"0x6d0f...","requiredCumulative":"250025","acceptedCumulative":"250000"}
```

Sign a new voucher with a higher cumulative amount to resume delivery. The stream closes after 60 seconds if no voucher is received.
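The client's job on this event is to sign a voucher whose cumulative amount covers at least `requiredCumulative`. A sketch of that decision, assuming amounts arrive as decimal strings (the helper name and the signing step are illustrative):

```python
def next_voucher_amount(event: dict) -> int:
    """Given a payment-need-voucher event payload, return the cumulative
    amount the client should sign to resume delivery.

    The new voucher must cover at least requiredCumulative, which is
    always higher than the amount the server has already accepted.
    """
    required = int(event["requiredCumulative"])
    accepted = int(event["acceptedCumulative"])
    assert required > accepted, "server asked for a voucher it already holds"
    return required

event = {
    "channelId": "0x6d0f...",
    "requiredCumulative": "250025",
    "acceptedCumulative": "250000",
}
amount = next_voucher_amount(event)
# amount == 250025; sign a voucher for this cumulative total and send it
# back within 60 seconds, or the stream closes.
```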
Completion — server confirms payment:
```
event: payment-receipt
data: {"challengeId":"...","method":"tempo","reference":"0x...","status":"success"}
```

For non-streaming requests, the receipt is returned in the Payment-Receipt HTTP response header instead.
Non-Streaming
Set "stream": false to receive the complete response as a single JSON object with a usage field.
Routing
Control provider selection with routing_strategy and price ceilings. See the Routing Guide. Routing works identically for both payment rails.
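A sketch of assembling the routing fields of a request body, enforcing the rule that the two price ceilings must be set together (the helper function is illustrative, not part of any SDK):

```python
def build_routing_options(strategy="balanced", max_input_cents=None, max_output_cents=None):
    """Assemble the routing-related fields of a chat-completion request.

    Price ceilings are cents per million tokens and must be set together.
    """
    if (max_input_cents is None) != (max_output_cents is None):
        raise ValueError("set both price ceilings or neither")
    opts = {"routing_strategy": strategy}
    if max_input_cents is not None:
        opts["max_input_price_per_million"] = max_input_cents
        opts["max_output_price_per_million"] = max_output_cents
    return opts

payload = {
    "model": "qwen/qwen3.5-9b",
    "messages": [{"role": "user", "content": "Hello"}],
}
# Route to the cheapest provider charging at most 50¢/M input, 150¢/M output:
payload.update(build_routing_options("cheapest", max_input_cents=50, max_output_cents=150))
```

The same payload works on either payment rail, since routing is independent of authentication.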
Endpoint Reference
/v1/chat/completions (auth: api-key | payment)

Create a chat completion. Accepts Bearer (API key) or Payment (MPP credential) auth.
Parameters
| Name | Type | Req | Description |
|---|---|---|---|
| model | string | Yes | Model identifier. |
| messages | array | Yes | Message objects with role and content. |
| stream | boolean | No | Enable SSE streaming. Default: true |
| max_tokens | integer | No | Max tokens to generate. Default: 4096 |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| top_p | number | No | Nucleus sampling. |
| stop | string|array | No | Stop sequences. |
| frequency_penalty | number | No | Penalize repeated tokens. |
| presence_penalty | number | No | Penalize present tokens. |
| timeout_ms | integer | No | Timeout in ms (max 30000). Default: 30000 |
| routing_strategy | string | No | 'cheapest', 'fastest', or 'balanced'. Default: balanced |
| allow_external | boolean | No | Include OpenRouter. Default: true for balanced/fastest, false for cheapest. |
| max_input_price_per_million | integer | No | Max input price (cents/M tokens). Must be set together with max_output_price_per_million. |
| max_output_price_per_million | integer | No | Max output price (cents/M tokens). Must be set together with max_input_price_per_million. |
Request

```json
{
  "model": "qwen/qwen3.5-9b",
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "stream": false
}
```

Response

```json
{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen/qwen3.5-9b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
```

Errors
| Status | Description |
|---|---|
| 400 | Bad request |
| 401 | Invalid API key |
| 402 | Stripe: billing issue / MPP: payment challenge (WWW-Authenticate: Payment) |
| 502 | All providers failed |
| 503 | No providers available |
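Note that 402 is overloaded: its meaning depends on the payment rail, distinguished by the WWW-Authenticate header. A sketch of mapping status codes to client actions (the function and action names are illustrative):

```python
def classify_error(status: int, www_authenticate: str = "") -> str:
    """Map a platform error response to a coarse client action."""
    if status == 402 and www_authenticate.startswith("Payment"):
        return "answer-payment-challenge"  # MPP rail: sign credential, retry
    actions = {
        400: "fix-request",
        401: "check-api-key",
        402: "resolve-billing",   # Stripe rail: billing issue
        502: "retry-later",       # all providers failed
        503: "retry-later",       # no providers available
    }
    return actions.get(status, "raise")
```

Both 502 and 503 are candidates for retry, possibly with a different routing_strategy.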