API Documentation
Drop-in OpenAI-compatible inference. Pay with an API key or let your agent pay per token.
Quick Start
With API key
curl https://api.vram.supply/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/qwen3.5-9b",
"messages": [{"role": "user", "content": "Hello"}]
}'
With MPP (no account needed)
# No auth → 402 challenge → pay → retry
npx mppx https://api.vram.supply/v1/chat/completions \
-d '{
"model": "qwen/qwen3.5-9b",
"messages": [{"role": "user", "content": "Hello"}]
}'
Works with any OpenAI-compatible client
Any tool that supports a custom OpenAI base URL works out of the box. No SDK, no special integration.
Python OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="https://api.vram.supply/v1", api_key="sk-...")
resp = client.chat.completions.create(model="qwen/qwen3.5-9b",
                                      messages=[{"role": "user", "content": "Hello"}])
curl
curl https://api.vram.supply/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-d '{"model": "qwen/qwen3.5-9b", "messages": [{"role": "user", "content": "Hello"}]}'
Aider
aider --openai-api-base https://api.vram.supply/v1 --openai-api-key sk-...
Continue (VS Code)
{ "apiBase": "https://api.vram.supply/v1", "apiKey": "sk-..." }
Explore the API
Using the API
Make inference requests, control routing and cost, manage API keys.
- → Inference & Streaming
- → Routing Strategies
- → Models
Account & Payments
API keys, card billing, MPP agent payments, and settings.
- → API Keys & MPP Auth
- → Billing (Stripe & Tempo)
- → Settings
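The MPP agent-payment flow from the quick start (no auth → 402 challenge → pay → retry) can be sketched as plain control flow. This is a minimal sketch, not the real protocol: `send` and `pay` are hypothetical stand-ins for an HTTP client and an MPP wallet, and the actual wire mechanics live in the `mppx` client.

```python
def with_mpp_retry(send, pay):
    """Sketch of the MPP flow: try unauthenticated, and on a 402
    challenge pay it and retry with the resulting payment proof.

    send(proof) -> (status, body) performs the HTTP request;
    pay(challenge) -> proof settles the challenge. Both are
    assumed interfaces, not part of any real SDK.
    """
    status, body = send(None)        # first attempt, no credentials
    if status == 402:                # server answers with a payment challenge
        proof = pay(body)            # agent pays per token
        status, body = send(proof)   # retry, carrying the payment proof
    return status, body
```

A key-authenticated request never hits the 402 branch, so the same shape covers both payment modes.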
Providing
Serve models, sell quota, and get paid.
- → GPU Provider API
- → Quota Selling
- → Payouts