Routing

When you make an inference request, the platform picks the best available provider. You can control how "best" is defined using routing strategies, price ceilings, and external provider preferences.

Routing Strategies

Set routing_strategy in the request body, or set a per-user default via PATCH /v1/settings/provider. Priority: per-request → per-user default → platform default (balanced).
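
The priority order above can be sketched as a small lookup. This is only an illustration of the documented precedence, not the platform's code; the function name resolve_strategy and the use of None for "not set" are assumptions.

```python
from typing import Optional

# Platform-wide fallback, as stated in the priority order above.
PLATFORM_DEFAULT = "balanced"


def resolve_strategy(request_strategy: Optional[str],
                     user_default: Optional[str]) -> str:
    """Return the strategy that applies to one request.

    Hypothetical helper: None stands for "not set" at that level.
    """
    # A per-request value wins over everything else.
    if request_strategy is not None:
        return request_strategy
    # Otherwise fall back to the per-user default, if any.
    if user_default is not None:
        return user_default
    # Nothing set anywhere: use the platform default.
    return PLATFORM_DEFAULT
```

For example, resolve_strategy(None, "cheapest") returns "cheapest", while resolve_strategy("fastest", "cheapest") returns "fastest".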

Strategy	Behaviour	Best for
cheapest	Pools providers across types instead of preferring one type. External providers (OpenRouter) are excluded by default. Picks randomly from the providers within 20% of the lowest combined price.	Batch jobs, cost-sensitive workloads
fastest	Prefers native GPU providers (lowest latency), then quota providers, then external. Random selection within the preferred type.	Interactive chat, latency-sensitive apps
balanced	Default. Prefers native > quota > external, then uses price-band selection within the same type. Good balance of cost and performance.	General use (default)
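
The type preference used by fastest (native, then quota, then external) might look roughly like the sketch below. The Provider class, its kind field, and the function names are hypothetical, and the real router may weigh other signals that this page does not describe.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Preference order described for "fastest" and "balanced".
TYPE_PREFERENCE = ("native", "quota", "external")


@dataclass(frozen=True)
class Provider:
    name: str
    kind: str  # assumed to be one of "native", "quota", "external"


def preferred_group(providers: List[Provider]) -> List[Provider]:
    """Return the providers of the most-preferred type that is present."""
    for kind in TYPE_PREFERENCE:
        group = [p for p in providers if p.kind == kind]
        if group:
            return group
    return []


def pick_fastest(providers: List[Provider]) -> Optional[Provider]:
    """Pick randomly within the preferred type, or None if there is no provider."""
    group = preferred_group(providers)
    return random.choice(group) if group else None
```

Under the same assumptions, balanced would reuse preferred_group and then apply price-band selection to the returned group instead of a plain random choice.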

Price-Band Selection

When using cheapest (or within a type for balanced), the platform doesn't pick the single cheapest provider. Instead, it defines a "price band": all providers whose combined rate (input price plus output price per million tokens) is within 20% of the cheapest combined rate. It then picks randomly within that band.

This prevents a single provider from capturing all traffic with a 1-cent undercut, while still giving cheaper providers more traffic overall.

Worked example

Three providers serve the same model:

Provider A: $0.10 input + $0.15 output = $0.25/M combined
Provider B: $0.12 input + $0.16 output = $0.28/M combined
Provider C: $0.15 input + $0.20 output = $0.35/M combined

Band ceiling = $0.25 × 1.20 = $0.30/M

→ Providers A and B are in the band. Provider C is excluded.

→ Random selection between A and B.
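
The worked example can be reproduced with a short sketch. The Offer class and function names are hypothetical, prices are in dollars per million tokens as in the example, and treating the band ceiling as inclusive is an assumption this page does not confirm.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

BAND_WIDTH = 0.20  # providers within 20% of the cheapest combined rate


@dataclass(frozen=True)
class Offer:
    name: str
    input_price: float   # dollars per million input tokens
    output_price: float  # dollars per million output tokens

    @property
    def combined(self) -> float:
        return self.input_price + self.output_price


def price_band(offers: List[Offer], width: float = BAND_WIDTH) -> List[Offer]:
    """Return the offers whose combined rate is within the band."""
    if not offers:
        return []
    cheapest = min(o.combined for o in offers)
    ceiling = cheapest * (1 + width)
    # Assumption: an offer exactly at the ceiling counts as in the band.
    return [o for o in offers if o.combined <= ceiling]


def pick_in_band(offers: List[Offer]) -> Optional[Offer]:
    """Pick uniformly at random among the offers inside the band."""
    band = price_band(offers)
    return random.choice(band) if band else None


offers = [
    Offer("A", 0.10, 0.15),
    Offer("B", 0.12, 0.16),
    Offer("C", 0.15, 0.20),
]
band_names = [o.name for o in price_band(offers)]
```

Here band_names holds A and B, matching the example: C's combined rate of $0.35/M is above the $0.30/M ceiling.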

Price Ceiling

Set the maximum price you're willing to pay using max_input_price_per_million and max_output_price_per_million. Both values are in cents per million tokens, so 20 means $0.20 per million tokens.

Both fields must be set. If only one is set, the ceiling is silently ignored.
Applied to all provider types: native, quota, and external (OpenRouter).
If no provider qualifies, you get a 503 with the message "No providers available within your price constraints."

{
  "model": "qwen/qwen3.5-9b",
  "messages": [{"role": "user", "content": "Hello"}],
  "routing_strategy": "cheapest",
  "max_input_price_per_million": 20,
  "max_output_price_per_million": 30
}
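
The ceiling rules above could be sketched as a filter like the one below. The Offer class, the exception type, and the function name are hypothetical. Comparing each ceiling against its own price and treating the limits as inclusive are assumptions, not documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    name: str
    input_cents: float   # cents per million input tokens
    output_cents: float  # cents per million output tokens


class NoProviderAvailable(Exception):
    """Stands in for the 503 response described above."""


def apply_price_ceiling(offers: List[Offer],
                        max_input: Optional[float],
                        max_output: Optional[float]) -> List[Offer]:
    # The ceiling only applies when both fields are set.
    if max_input is None or max_output is None:
        return list(offers)
    # Assumption: a price equal to the ceiling still qualifies.
    kept = [o for o in offers
            if o.input_cents <= max_input and o.output_cents <= max_output]
    if not kept:
        raise NoProviderAvailable(
            "No providers available within your price constraints.")
    return kept
```

With the request body above (20 and 30), an offer priced at 10 input and 15 output cents would pass, while one priced at 25 and 30 would be filtered out.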

External Provider Policy

External providers (currently OpenRouter) are included automatically for balanced and fastest, and excluded automatically for cheapest.

Strategy	External providers
cheapest	Excluded. OpenRouter currently has 0% markup, so routing to it for "cheapest" would subsidize the request.
fastest / balanced	Included.

Payment Rail Filtering

Before strategy, price, and type selection, routing filters by the buyer's payment rail. A buyer paying via Stripe card is only routed to providers with accepts_stripe = true, and an MPP buyer only to providers with accepts_tempo = true.

Buyer auth	Payment rail	Provider filter
Bearer sk-...	Stripe	`accepts_stripe = true`
Payment eyJ...	Tempo (MPP)	`accepts_tempo = true`

If no provider on the buyer's rail serves the requested model, the request returns 503. Providers who enable both rails are routable by both buyer types — see Payouts for how to set up rails and PATCH /v1/settings/provider/rails to toggle them.
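
As a rough illustration of the table above, the rail can be derived from the Authorization scheme and then used to filter providers. The function names and the dict-based provider records are hypothetical. Only the accepts_stripe and accepts_tempo flags come from this page.

```python
from typing import Dict, List

# Authorization scheme -> provider flag that must be true for that rail.
RAIL_FLAG_BY_SCHEME = {
    "Bearer": "accepts_stripe",   # Stripe buyers (Bearer sk-...)
    "Payment": "accepts_tempo",   # Tempo (MPP) buyers (Payment eyJ...)
}


def rail_flag(authorization: str) -> str:
    """Return the provider flag required for this Authorization header value."""
    scheme = authorization.partition(" ")[0]
    if scheme not in RAIL_FLAG_BY_SCHEME:
        raise ValueError(f"unsupported authorization scheme: {scheme!r}")
    return RAIL_FLAG_BY_SCHEME[scheme]


def filter_by_rail(providers: List[Dict], authorization: str) -> List[Dict]:
    """Keep only providers that accept the buyer's payment rail."""
    flag = rail_flag(authorization)
    return [p for p in providers if p.get(flag) is True]
```

An empty result from filter_by_rail corresponds to the 503 case: no provider on the buyer's rail serves the model.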

Per-User Default

Set a default routing strategy for all your requests via the Settings API:

curl -X PATCH https://api.vram.supply/v1/settings/provider \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"input_price_per_million": 10, "output_price_per_million": 15, "availability": "always_available", "routing_strategy": "cheapest"}'

A per-request routing_strategy always overrides the per-user default.