vram.supply

Providing

Earn by serving inference on the vram.supply network. Install the CLI agent, register your GPU, and start receiving requests.

Provider Lifecycle

  1. Set up payouts — choose at least one:
    • Passkey signup — wallet auto-derived, zero extra steps.
    • Stripe Connect — fiat payouts via Stripe Express onboarding.
    • Tempo wallet — USDC payouts via manual wallet verification.

    See Payouts for details on each option.

  2. Install the CLI — curl -fsSL https://vram.supply/install.sh | sh
  3. Register — POST /v1/providers/register with your model, pricing, and endpoint URL.
  4. Serve — the platform routes requests to your endpoint. Send heartbeats to stay online.
  5. Earn — 95% of revenue (5% platform fee). Payouts: daily (Stripe) or every 15 minutes (Tempo).
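The revenue split in step 5 can be sketched as a small helper — a hypothetical function, assuming the 5% fee is applied per settlement in integer cents (the platform's actual rounding rules may differ):

```python
def provider_payout(gross_cents: int, fee_bps: int = 500) -> int:
    """Provider share after the platform fee (5% = 500 basis points).

    Hypothetical sketch; not the platform's actual settlement code.
    """
    fee = gross_cents * fee_bps // 10_000  # floor the fee in cents
    return gross_cents - fee

# A $10.00 settlement pays out $9.50 to the provider.
print(provider_payout(1000))  # 950
```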

Payment Rails

Each provider instance declares which buyer payment rails it accepts via two independent flags: accepts_stripe and accepts_tempo. This determines which buyers can be routed to you.

Flag           | Buyer type routed to you   | Prerequisite
accepts_stripe | Buyers using Bearer sk-... | Stripe Connect onboarding complete
accepts_tempo  | Buyers using Payment (MPP) | Verified Tempo wallet address

Providers who enable both rails maximise their traffic. Toggle rails via PATCH /v1/settings/provider/rails. At least one rail must remain enabled. Passkey-signup providers default to accepts_tempo = true. See Routing for how rail filtering works during provider selection.
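Rail filtering during routing amounts to a simple flag check. A minimal sketch, assuming the routing layer sees each instance's two flags (the `Provider` shape and `eligible` helper here are illustrative, not the platform's internals):

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    accepts_stripe: bool
    accepts_tempo: bool

def eligible(providers: list[Provider], buyer_rail: str) -> list[Provider]:
    """Keep only providers that accept the buyer's payment rail.

    buyer_rail is 'stripe' for Bearer sk-... buyers and 'tempo' for
    Payment (MPP) buyers. Hypothetical sketch of the routing filter.
    """
    flag = {"stripe": "accepts_stripe", "tempo": "accepts_tempo"}[buyer_rail]
    return [p for p in providers if getattr(p, flag)]

pool = [
    Provider("a", accepts_stripe=True, accepts_tempo=False),
    Provider("b", accepts_stripe=True, accepts_tempo=True),
]
print([p.name for p in eligible(pool, "tempo")])   # ['b']
print([p.name for p in eligible(pool, "stripe")])  # ['a', 'b']
```

Enabling both flags makes an instance eligible for every buyer, which is why dual-rail providers see the most traffic.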

Model Resolution

When you run vramsupply serve --model "org/model" --quant Q4_K_M, the CLI automatically finds a GGUF repository on HuggingFace, downloads the matching quantization file, and starts serving. The canonical model ID (e.g., qwen/qwen3.5-9b) is used for marketplace identity, while the resolved GGUF repo is used for file verification. You can also pass a local GGUF path directly with --model ./my-model.gguf.

Heartbeat & Health

The platform runs health checks every 60 seconds. Your agent should also send heartbeats via POST /v1/providers/heartbeat. If a provider fails a health check, it's marked offline and stops receiving requests.
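The online/offline decision can be modeled as a last-heartbeat window check. A minimal sketch, assuming a provider is offline once no heartbeat has arrived within the 60-second window (the class and its method names are illustrative; the real check runs server-side):

```python
class HealthTracker:
    """Marks a provider offline if no heartbeat arrives in the window.

    Hypothetical sketch of the platform's server-side health check.
    """
    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.last_beat: float | None = None

    def heartbeat(self, now: float) -> None:
        self.last_beat = now

    def is_online(self, now: float) -> bool:
        # Offline until the first heartbeat, or after the window lapses.
        return (self.last_beat is not None
                and now - self.last_beat <= self.window_s)

t = HealthTracker()
t.heartbeat(0.0)
print(t.is_online(30.0), t.is_online(120.0))  # True False
```

Your agent should therefore post heartbeats comfortably more often than once per minute.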

Pricing

All providers set their own token pricing at registration time. Prices are specified as input_price_per_million and output_price_per_million in cents per million tokens.

  • CLI agent — set via --input-price / --output-price flags, or VRAM_SUPPLY_INPUT_PRICE / VRAM_SUPPLY_OUTPUT_PRICE env vars. Defaults: 100 / 200.
  • Browser provider — set in the pricing card on the provide page before starting.
  • Mobile (iOS / Android) — set via the pricing card on the providing screen. Suggested defaults come from the model catalog.
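With prices in cents per million tokens, the cost of a single request works out as below — a sketch using the CLI agent's defaults of 100/200 (rounding behavior is an assumption):

```python
def request_cost_cents(input_tokens: int, output_tokens: int,
                       input_price_per_million: int = 100,
                       output_price_per_million: int = 200) -> float:
    """Cost of one request in cents at the CLI defaults.

    Hypothetical helper; the platform's actual metering may round
    differently.
    """
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# 2,000 input + 500 output tokens at the defaults costs 0.3 cents.
print(request_cost_cents(2000, 500))  # 0.3
```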

Browser Provider

Chrome users can serve inference directly from a browser tab using Chrome's built-in Prompt API. No GPU or CLI install required. Visit the Provide page, select the browser lane, set your pricing, and click Start Serving. Keep the tab open — closing it stops serving.

Market Demand

Use GET /v1/providers/demand to see which models have demand: requests in the last 24h, online provider count, and price ranges.
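One way to act on this data is to rank models by demand per online provider. A sketch, assuming the response rows carry fields like `requests_24h` and `online_providers` (these field names are assumptions, not the documented schema):

```python
def underserved(demand: list[dict], top: int = 1) -> list[str]:
    """Rank models by requests per online provider, highest first.

    Hypothetical row shape for GET /v1/providers/demand; the field
    names here are assumptions.
    """
    def ratio(row: dict) -> float:
        return row["requests_24h"] / max(row["online_providers"], 1)
    return [r["model"] for r in sorted(demand, key=ratio, reverse=True)][:top]

rows = [
    {"model": "a", "requests_24h": 900, "online_providers": 9},
    {"model": "b", "requests_24h": 400, "online_providers": 1},
]
print(underserved(rows))  # ['b']
```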

Endpoint Reference

POST /v1/providers/register — API Key

Register provider instance.

Parameters

Name                     | Type    | Req | Description
provider_type            | string  | No  | 'agent' (default) or 'browser'.
endpoint_url             | string  | Yes | Public endpoint. Required for agent providers.
model                    | string  | Yes | Model ID.
input_price_per_million  | integer | Yes | Input price (cents/M tokens).
output_price_per_million | integer | Yes | Output price (cents/M tokens).
context_length_offered   | integer | Yes | Context window. Required for agent providers.
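A registration body built from the parameters above might look like this — all values are illustrative, and the endpoint URL is a placeholder:

```python
import json

# Hypothetical payload for POST /v1/providers/register, following the
# parameter table above. Values are examples, not real endpoints.
payload = {
    "provider_type": "agent",
    "endpoint_url": "https://gpu.example.com:8080",
    "model": "qwen/qwen3.5-9b",
    "input_price_per_million": 100,
    "output_price_per_million": 200,
    "context_length_offered": 32768,
}
body = json.dumps(payload)
print(json.loads(body)["model"])  # qwen/qwen3.5-9b
```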
POST /v1/providers/heartbeat — API Key

Provider heartbeat.

DELETE /v1/providers/:id — API Key

Deregister provider.

GET /v1/providers/demand — API Key

Market demand data.