vram.supply

Inference

Create chat completions by sending messages to a model. The platform routes your request to the best available provider based on your routing strategy. Authenticate with an API key or MPP payment credential.

Streaming

Streaming is enabled by default ("stream": true). Tokens are delivered as Server-Sent Events.

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":2,"total_tokens":12}}

data: [DONE]

Usage statistics are included in the final chunk. The platform sets stream_options.include_usage automatically.
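
The chunk format above can be consumed with a small line-based reader. This is a minimal sketch, not an official client: the function name collect_stream is hypothetical, and it assumes each event arrives as a single "data:" line, as in the sample. It joins the delta content and keeps the usage object from the final chunk.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join delta content from SSE `data:` lines and return (text, usage)."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]  # carried by the final chunk
    return "".join(parts), usage
```

Feeding it the three sample lines above yields the text "Hello" and the usage object from the second chunk. Payloads without a choices array are ignored by this sketch.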

MPP Streaming Events

When paying via a Tempo session, the stream may include additional payment events:

When the balance is exhausted, the server requests a new voucher:

event: payment-need-voucher
data: {"channelId":"0x6d0f...","requiredCumulative":"250025","acceptedCumulative":"250000"}

Sign a new voucher with a higher cumulative amount to resume delivery. The stream closes after 60 seconds if no voucher is received.

On completion, the server confirms payment:

event: payment-receipt
data: {"challengeId":"...","method":"tempo","reference":"0x...","status":"success"}

For non-streaming requests, the receipt is in the Payment-Receipt HTTP response header instead.
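
Because payment events carry an "event:" field, a client needs to track event names as well as data. The sketch below groups SSE lines into (event name, data) pairs and computes the gap between the two cumulative amounts in a payment-need-voucher event. The helper names are hypothetical, and treating requiredCumulative minus acceptedCumulative as the amount still to be covered is an assumption. Voucher signing itself is not shown.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data); events with no `event:` field are named 'message'."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        yield event, "\n".join(data)


def voucher_shortfall(data: str) -> int:
    """Difference between the required and accepted cumulative amounts (assumed meaning)."""
    body = json.loads(data)
    return int(body["requiredCumulative"]) - int(body["acceptedCumulative"])
```

A caller could branch on the event name: "payment-need-voucher" triggers signing a new voucher, "payment-receipt" can be stored, and "message" events carry the completion chunks.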

Non-Streaming

Set "stream": false to receive the complete response as a single JSON object with a usage field.

Routing

Control provider selection with routing_strategy and price ceilings. See the Routing Guide. Routing works identically for both payment rails.

Endpoint Reference

POST /v1/chat/completions
Auth: api-key | payment

Create a chat completion. Accepts Bearer (API key) or Payment (MPP credential) auth.

Parameters

Name                          Type          Req  Description
model                         string        Yes  Model identifier.
messages                      array         Yes  Message objects with role and content.
stream                        boolean       No   Enable SSE streaming. Default: true
max_tokens                    integer       No   Max tokens to generate. Default: 4096
temperature                   number        No   Sampling temperature (0-2). Default: 1
top_p                         number        No   Nucleus sampling.
stop                          string|array  No   Stop sequences.
frequency_penalty             number        No   Penalize repeated tokens.
presence_penalty              number        No   Penalize present tokens.
timeout_ms                    integer       No   Timeout in milliseconds (max 30000). Default: 30000
routing_strategy              string        No   'cheapest', 'fastest', or 'balanced'. Default: balanced
allow_external                boolean       No   Include OpenRouter. Default: true for balanced/fastest, false for cheapest.
max_input_price_per_million   integer       No   Max input price (cents/M tokens). Must be set together with max_output_price_per_million.
max_output_price_per_million  integer       No   Max output price (cents/M tokens). Must be set together with max_input_price_per_million.
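
The constraints in the table can be checked on the client before a request is sent. The following is a sketch only: check_params is a hypothetical helper, the server's actual validation may differ, and the rule that the two price ceilings must appear together is read from the "Both must be set" note.

```python
from typing import Any, Dict, List

ROUTING_STRATEGIES = {"cheapest", "fastest", "balanced"}


def check_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for required in ("model", "messages"):
        if required not in params:
            problems.append(f"missing required field: {required}")

    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    timeout_ms = params.get("timeout_ms")
    if timeout_ms is not None and timeout_ms > 30000:
        problems.append("timeout_ms must not exceed 30000")

    strategy = params.get("routing_strategy")
    if strategy is not None and strategy not in ROUTING_STRATEGIES:
        problems.append("routing_strategy must be cheapest, fastest, or balanced")

    # Price ceilings are only meaningful as a pair (assumed from the table note).
    has_input = "max_input_price_per_million" in params
    has_output = "max_output_price_per_million" in params
    if has_input != has_output:
        problems.append("price ceilings must be set together")
    return problems
```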

Request

{
  "model": "qwen/qwen3.5-9b",
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "stream": false
}
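
The request body above could be sent with Bearer (API key) auth roughly as follows. This sketch uses only the Python standard library and stops short of the network call. BASE_URL is a placeholder, not the platform's real host, and build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder host for illustration only; substitute the platform's real base URL.
BASE_URL = "https://api.example.invalid"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. MPP payment auth uses a different credential and is not covered by this sketch.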

Response

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen/qwen3.5-9b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
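
Reading the assistant text and token count out of a response shaped like the one above takes only a few lines. In this sketch, first_reply is a hypothetical helper that assumes at least one choice and a usage field are present, as in the example.

```python
import json
from typing import Tuple


def first_reply(response_text: str) -> Tuple[str, int]:
    """Return (assistant content of the first choice, total tokens used)."""
    body = json.loads(response_text)
    choice = body["choices"][0]
    return choice["message"]["content"], body["usage"]["total_tokens"]
```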

Errors

400  Bad request
401  Invalid API key
402  Stripe: billing issue / MPP: payment challenge (WWW-Authenticate: Payment)
502  All providers failed
503  No providers available
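
One way a client might branch on these status codes is sketched below. The returned labels and the classify_error name are hypothetical. Treating 502 and 503 as worth retrying later is an assumption, since the list above does not specify retry behavior. The 402 case is split by checking whether the WWW-Authenticate header starts with the Payment scheme.

```python
from typing import Mapping


def classify_error(status: int, headers: Mapping[str, str]) -> str:
    """Map an error status to a coarse client-side action label."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 402:
        challenge = lowered.get("www-authenticate", "")
        if challenge.strip().lower().startswith("payment"):
            return "payment_challenge"  # MPP: answer the challenge
        return "billing_issue"  # Stripe: billing problem
    if status in (502, 503):
        return "retry_later"  # assumption: provider failures may be transient
    if status == 401:
        return "check_api_key"
    if status == 400:
        return "fix_request"
    return "unknown"
```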