Consumers never talk to orchestrators directly. All traffic flows through gateways, which handle routing, auth, pricing, and QoS.
Models best suited for orchestrators
Livepeer orchestrators are best for low-latency, real-time inference - especially:
- Diffusion models (Stable Diffusion, StreamDiffusion)
- Image-to-image and video-to-video models
- ControlNet pipelines
- Vision models (depth, pose, segmentation)
- ComfyUI-based DAG workflows
- Real-time video AI effects
What you do and don’t need to build
As an orchestrator operator, you focus on running excellent inference infrastructure. You do not need to build:
- A marketplace
- Authentication systems
- Billing infrastructure
- Service discovery
- Brand or user trust
Setup steps
Run an orchestrator node
Install go-livepeer with AI enabled.

Requirements:
- GPU (RTX 3090 / 4090 / A100 / L4 or equivalent)
- CUDA + NVIDIA drivers
- Docker strongly recommended

Once running, the orchestrator advertises its hardware specs, supported capabilities, pricing, and service endpoints.

Full orchestrator setup guide
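A launch might look like the following sketch. The flag names and image tag are illustrative and may have changed between releases - confirm everything against the full orchestrator setup guide before using it:

```bash
# Illustrative only: flag names and image tag may differ in current releases.
docker run --gpus all -v ~/.lpData:/root/.lpData livepeer/go-livepeer:ai-video \
  -orchestrator \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir /root/.lpData/models \
  -nvidia all \
  -serviceAddr 0.0.0.0:8936
```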
Install an AI runtime
Livepeer does not mandate a single runtime. The two common approaches:

Option A - ComfyUI (most common)
- Load models via .safetensors
- Build inference pipelines as DAGs
- Serve real-time inference via ComfyStream

Option B - Custom inference server
- Torch / TensorRT / ONNX
- HTTP or gRPC interface
- Wrapped by Livepeer's AI Worker interface

Either way, you control: which models are loaded, VRAM usage and batching, precision (fp16 / int8), warm starts and caching.
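For Option B, the shape of a custom server can be sketched in a few lines. This is a generic HTTP wrapper, not Livepeer's actual AI Worker interface - the endpoint, JSON payload schema, and `run_model` stub are assumptions for illustration:

```python
# Minimal sketch of a custom HTTP inference server. The payload schema and
# run_model stub are hypothetical; the real interface is defined by go-livepeer.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(payload):
    # Placeholder for real inference (Torch / TensorRT / ONNX).
    return {"output": f"processed:{payload.get('input', '')}"}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_model(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep stdout quiet; real servers would log structured metrics.
        pass


def serve(port=8600):
    """Start the server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Whatever the transport, keeping the inference call behind one function (`run_model` here) makes it easy to swap Torch for TensorRT or ONNX without touching the wrapper.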
Advertise AI capabilities
Your orchestrator declares machine-readable capability descriptors. These are not marketing labels - gateways use them to route jobs.

Edit your aiModels.json. Each entry describes a capability, what it does, and the example models that serve it.
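A minimal aiModels.json might look like this sketch. The pipeline names, model IDs, and price are illustrative, and the field names follow go-livepeer AI conventions but should be checked against the current schema:

```json
[
  {
    "pipeline": "image-to-image",
    "model_id": "timbrooks/instruct-pix2pix",
    "price_per_unit": 4456287,
    "warm": true
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt",
    "warm": false
  }
]
```

Marking a model `warm` keeps it resident in VRAM for low-latency serving; cold models are loaded on demand.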
Set pricing
Configure pricing for your AI capabilities. Pricing is advertised off-chain and settled via Livepeer tickets.

You can configure:
- Per-request, per-frame, or per-second pricing
- Optional surge or priority pricing
- Model-specific pricing
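As a back-of-envelope sketch (all numbers are hypothetical, and the protocol's actual unit accounting is authoritative, not this formula), converting a target USD price per second of video into a per-frame price in wei might look like:

```python
WEI_PER_ETH = 10**18  # 1 ETH = 10^18 wei

def price_per_frame_wei(usd_per_second: float, fps: int, eth_usd: float) -> int:
    """Convert a target USD price per second of output video into wei per frame."""
    eth_per_second = usd_per_second / eth_usd
    return int(eth_per_second * WEI_PER_ETH / fps)

# e.g. $0.001 per second of video at 25 fps, with ETH at $2000:
# price_per_frame_wei(0.001, 25, 2000.0) -> 20_000_000_000 wei per frame
```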
Expose capabilities via gateways
Once running, gateways automatically:
- Discover your orchestrator
- Route jobs based on cost, performance, and capability match
- Handle retries, auth, and aggregation
- Expose stable APIs to developers

To verify, query the /getNetworkCapabilities endpoint on a gateway. You should see your orchestrator's address, capabilities, and pricing in the response.

Hosting multiple models on one orchestrator
A single GPU node can host multiple models, but VRAM is finite. Common patterns:
- Model multiplexing (cold swap): load one model at a time and swap on request. Higher latency per swap, lower VRAM pressure. Suitable for less frequently requested models.
- Warm multi-model (concurrent): keep multiple small models resident simultaneously. Works well for depth + segmentation + ControlNet combinations where each is lightweight.
- Dedicated GPU per model: for high-throughput production, assign one GPU per capability. Eliminates swap overhead entirely.

How developers consume models you host
From a developer's perspective, consumption looks like:
- HTTP API calls (image/video in → output out)
- WebRTC streams for real-time video
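For the HTTP path, a request to a gateway might look like the following curl sketch. The host, endpoint path, and form fields are assumptions based on common gateway conventions, not a documented API:

```bash
# Hypothetical gateway endpoint and fields - check your gateway's API reference.
curl -X POST https://gateway.example.com/image-to-image \
  -F model_id="timbrooks/instruct-pix2pix" \
  -F prompt="make it look like nighttime" \
  -F image=@input.png
```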
Existing frontends that consume models through gateways include:
- Daydream - generative AI video platform
- StreamDiffusionTD - real-time diffusion via TouchDesigner
- ComfyStream - browser-based ComfyUI pipelines
- OBS plugins - live stream AI effects
If your orchestrator advertises a capability these frontends need - for example, image-to-image with competitive latency and pricing - you'll receive their jobs.
Partnering with gateways
Instead of competing with gateways, orchestrators can partner with them for guaranteed routing and exclusive pipeline access. This is a commercial arrangement outside the protocol. Relevant contacts: Cloud SPE operators, Daydream gateway team. See Gateway Providers.

See also
BYOC
Developer guide to building BYOC containers - how to structure your inference server.
Model Support
Full model family compatibility matrix - which models work on Livepeer and why.
ComfyStream
How ComfyStream works as the AI pipeline runtime for orchestrator nodes.
Orchestrator quickstart
Start here if you haven’t run a Livepeer orchestrator node yet.