# Add a Hugging Face Model to Livepeer

> Configure an existing Livepeer pipeline to serve a Hugging Face model. Declare the model, pre-download weights, restart the orchestrator, and verify end-to-end through your own self-hosted gateway.

Your Hugging Face model already fits one of the ten built-in Livepeer pipelines. You declare it, pre-download the weights, restart the Orchestrator with the AI flags, and verify through your own self-hosted Gateway. No Studio. No Daydream. No code written.

***

By the end of this tutorial, a Hugging Face model is running on your Livepeer Orchestrator, advertised to the network, and callable through a Gateway you operate. The example model is `SG161222/RealVisXL_V4.0_Lightning`, served through the `text-to-image` pipeline.

**What you will verify:**

* `aiModels.json` parses cleanly at Orchestrator startup
* The runner container loads the model into VRAM
* The model is advertised on `tools.livepeer.cloud/ai/network-capabilities`
* A request through your self-hosted Gateway returns a successful inference result

## Scope and intent

This is the simplest path: your model conforms to one of the ten pipeline shapes the Livepeer AI worker supports out of the box. The runner does the model loading, inference, and response formatting. You only declare the model and the price.

This is the right tutorial if your model is, for example, an SDXL fine-tune, a BLIP variant, or a Whisper variant. It is not the right tutorial if:

* your model needs custom Python code (preprocessing, postprocessing, novel architecture, or non-standard input or output shape). See the custom pipeline path.
* your model ships as an arbitrary container with its own protocol. See the BYOC path.
* your model is an LLM you want to run via Ollama instead of the standard `livepeer/ai-runner` image. The same overall flow applies but the runner image and `aiModels.json` entry differ. See the LLM variant note at the end.

## Built-in pipelines

The Livepeer AI worker ships with a fixed set of pipeline implementations under [`livepeer/ai-worker/runner/src/runner/pipelines/`](https://github.com/livepeer/ai-worker/tree/main/runner/src/runner/pipelines).
Each file defines the input schema, the output schema, and the model-loading conventions for one class of inference task.

| Pipeline | Input | Output | Typical model class |
| --- | --- | --- | --- |
| `text-to-image` | Text prompt + sampling params | Image | SDXL, SD 1.5, Lightning variants |
| `image-to-image` | Image + prompt + params | Image | SDXL img2img, ControlNet wrappers |
| `image-to-video` | Image + params | Short video | Stable Video Diffusion class |
| `image-to-text` | Image | Caption text | BLIP, captioning VLMs |
| `audio-to-text` | Audio bytes | Transcript text | Whisper variants |
| `text-to-speech` | Text + voice params | Audio bytes | TTS models (text in, audio out) |
| `upscale` | Image | Higher-resolution image | Diffusion upscalers |
| `segment-anything-2` | Image + prompt mask | Segmentation mask | SAM2 variants |
| `llm` | Chat messages | Completion | Ollama-supported LLMs |
| `live-video-to-video` | WebRTC stream | WebRTC stream | Real-time pipelines via ComfyStream |

If your model fits the input and output shape of one of these, take this tutorial. If not, the model needs either a custom pipeline or a BYOC container.

## Prerequisites

Every requirement below is mandatory. Stop here if any is not in place.

| Requirement | Notes |
| --- | --- |
| Active Orchestrator on Arbitrum One | Registered on-chain with a reachable `serviceAddr`, in the Active Set. Verify on `explorer.livepeer.org`. |
| NVIDIA GPU with 24 GB VRAM minimum | RealVisXL is an SDXL fine-tune. SDXL inference at fp16 needs roughly 12 GB for the UNet; 24 GB is the sensible floor with VAE, scheduler state, and warm-load headroom. |
| Docker with NVIDIA Container Toolkit | The AI worker runs each pipeline in a container with GPU passthrough. Verify: `docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi`. |
| `go-livepeer` build with AI worker mode | Built from `master` or a release containing the `-aiWorker`, `-aiModels`, and `-aiModelsDir` flags. |
| Disk for model weights | Fast disk, at least 50 GB free. |

## Step 1: Choose the model directory

Pick a host path for model weights.
The AI worker mounts this path into the runner container at `/models`.

```bash icon="terminal" title="export-model-dir.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
export LP_AI_MODELS_DIR=/data/livepeer-ai-models
mkdir -p "$LP_AI_MODELS_DIR"
```

This is the path you pass to `go-livepeer` via `-aiModelsDir`. The runner reads weights from `/models` inside the container, which maps to this directory on the host.

## Step 2: Declare the model in aiModels.json

Create an `aiModels.json` file. The Orchestrator parses this file at startup and advertises every pipeline it lists.

```json icon="code" title="aiModels.json" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "price_per_unit": 4768371,
    "pixels_per_unit": 1,
    "currency": "wei",
    "warm": true
  }
]
```

Each field, grounded in the schema parsed by `go-livepeer`:

| Field | Definition |
| --- | --- |
| `pipeline` | One of the canonical pipeline names (hyphenated form). Source: keys in `livePipelineToImage` in [`livepeer/go-livepeer/ai/worker/docker.go`](https://github.com/livepeer/go-livepeer/blob/master/ai/worker/docker.go). |
| `model_id` | The Hugging Face repository identifier as it appears in the URL `huggingface.co/<org>/<model>`. Used by the runner as both the download target and the inference-routing key. |
| `price_per_unit` / `pixels_per_unit` | Together set the rate. For pixel-priced pipelines, the rate is `price_per_unit / pixels_per_unit` wei per pixel. The wei figure is illustrative; set yours by comparing live rates on `tools.livepeer.cloud/ai/network-capabilities`. |
| `currency` | `"wei"`. Settlement uses Arbitrum-native ETH denominated in wei. |
| `warm` | `true` keeps the model in VRAM continuously, eliminating cold-start latency. `false` lazy-loads on first request, adding tens of seconds to the first job for SDXL-class models. Orchestrators competing on latency advertise warm models. |
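
The rate arithmetic in the field table can be sanity-checked before you restart the Orchestrator. The following is a minimal sketch, not part of `go-livepeer`: `check_entry`, `cost_per_image_wei`, and `REQUIRED_FIELDS` are hypothetical helper names, and the checks only mirror the field descriptions in this step, not the real parser.

```python
import json
from fractions import Fraction

# Field names taken from the aiModels.json example in this step.
REQUIRED_FIELDS = ("pipeline", "model_id", "price_per_unit", "pixels_per_unit")


def check_entry(entry: dict) -> Fraction:
    """Return the rate in wei per pixel for one aiModels.json entry.

    Hypothetical helper: the checks mirror this tutorial's field table,
    not the parser inside go-livepeer.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if entry["pixels_per_unit"] <= 0:
        raise ValueError("pixels_per_unit must be positive")
    # Rate = price_per_unit / pixels_per_unit, kept exact with Fraction.
    return Fraction(entry["price_per_unit"], entry["pixels_per_unit"])


def cost_per_image_wei(entry: dict, width: int, height: int) -> int:
    """Price of one width x height image in wei, rounded up."""
    total = check_entry(entry) * width * height
    return -(-total.numerator // total.denominator)  # ceiling division


if __name__ == "__main__":
    config = json.loads("""
    [
      {
        "pipeline": "text-to-image",
        "model_id": "SG161222/RealVisXL_V4.0_Lightning",
        "price_per_unit": 4768371,
        "pixels_per_unit": 1,
        "currency": "wei",
        "warm": true
      }
    ]
    """)
    for entry in config:
        wei = cost_per_image_wei(entry, 1024, 1024)
        print(entry["model_id"], wei, "wei per 1024x1024 image")
```

With the illustrative price above, a 1024 x 1024 image comes to roughly 5 x 10^12 wei, or about 0.000005 ETH. Running the same arithmetic against rates shown on the capabilities dashboard gives a quick like-for-like comparison.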
## Step 3: Pre-download the model weights

The model needs to land on disk before the runner starts. Otherwise warm load fails and lazy load stalls the first request.

The canonical script in `livepeer/ai-worker` is [`runner/dl_checkpoints.sh`](https://github.com/livepeer/ai-worker/blob/main/runner/dl_checkpoints.sh). It reads pipeline names from environment variables, calls `huggingface_hub.snapshot_download` for each model, and places weights at `$MODEL_DIR/<org>/<model>/`.

```bash icon="terminal" title="download-weights.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
git clone https://github.com/livepeer/ai-worker.git
cd ai-worker

docker run --rm \
  -v "$LP_AI_MODELS_DIR:/models" \
  -v "$(pwd)/runner:/runner" \
  -e MODEL_DIR=/models \
  -e PIPELINE=text-to-image \
  -e MODEL_ID=SG161222/RealVisXL_V4.0_Lightning \
  livepeer/ai-runner:latest \
  bash /runner/dl_checkpoints.sh
```

The command:

1. Mounts your host model directory at `/models` inside the container
2. Mounts the `runner/` directory so the script and helpers are available
3. Sets `MODEL_DIR=/models` so the script knows where to write
4. Sets `PIPELINE` and `MODEL_ID` so the script knows what to fetch
5. Runs the script, which uses `huggingface_hub` (already installed in the runner image) to pull the weights

Verify the download:

```bash icon="terminal" title="verify-weights.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
ls -la "$LP_AI_MODELS_DIR/SG161222/RealVisXL_V4.0_Lightning/"
```

Expect SDXL's standard layout: `model_index.json`, `unet/`, `vae/`, `text_encoder/`, `text_encoder_2/`, `tokenizer/`, `tokenizer_2/`, `scheduler/`.

If the directory is empty or partial, re-run the command. `huggingface_hub` resumes partial downloads.
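
The layout check above can also be scripted so that a partial download is caught before the Orchestrator restarts. This is a small sketch under the assumption that weights sit at `<models dir>/<model_id>/`, as in the `ls` command in this step; `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the check only tests that each entry exists, not that the files inside are complete.

```python
import os
from pathlib import Path

# Entries this tutorial lists as SDXL's standard layout.
EXPECTED_ENTRIES = (
    "model_index.json",
    "unet",
    "vae",
    "text_encoder",
    "text_encoder_2",
    "tokenizer",
    "tokenizer_2",
    "scheduler",
)


def missing_entries(models_dir: str, model_id: str) -> list[str]:
    """List expected files or directories absent from the model folder."""
    model_path = Path(models_dir) / model_id
    return [name for name in EXPECTED_ENTRIES if not (model_path / name).exists()]


if __name__ == "__main__":
    # Falls back to the directory chosen in Step 1 when the variable is unset.
    models_dir = os.environ.get("LP_AI_MODELS_DIR", "/data/livepeer-ai-models")
    missing = missing_entries(models_dir, "SG161222/RealVisXL_V4.0_Lightning")
    print("layout complete" if not missing else f"missing: {missing}")
```

An empty result only means the top-level layout is present; a truncated weight file inside `unet/` would still pass, so a failed warm load remains the final signal.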
## Step 4: Start the Orchestrator with the new model

Stop your existing `go-livepeer` Orchestrator and restart with the AI flags:

```bash icon="terminal" title="start-orchestrator.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
livepeer \
  -orchestrator \
  -transcoder \
  -nvidia all \
  -aiWorker \
  -aiModels /path/to/aiModels.json \
  -aiModelsDir "$LP_AI_MODELS_DIR" \
  -ethUrl <arbitrum-rpc-url> \
  -serviceAddr <public-host>:<port> \
  -pricePerUnit 0
```

The relevant flags, defined in [`livepeer/go-livepeer/cmd/livepeer/livepeer.go`](https://github.com/livepeer/go-livepeer/blob/master/cmd/livepeer/livepeer.go):

| Flag | Purpose |
| --- | --- |
| `-aiWorker` | Declares this node serves AI inference jobs. Without this flag, even a perfectly configured `aiModels.json` is ignored. |
| `-aiModels` | Path to your `aiModels.json` file. |
| `-aiModelsDir` | The host directory you populated in Step 3. Mounted into runner containers at `/models`. |
| `-nvidia all` | GPU exposure for both transcoding and AI workers. Use a GPU index (for example `-nvidia 0`) to pin AI to a specific card. |

At startup, `go-livepeer`:

1. Parses `aiModels.json`
2. For each entry with `warm: true`, looks up the runner image from the pipeline-to-image map in `livepeer/go-livepeer/ai/worker/docker.go`, pulls it if absent, and starts a container
3. Mounts `$LP_AI_MODELS_DIR` into the container at `/models`
4. Waits for the runner's `/health` endpoint to report ready
5. Begins advertising the pipeline, model, and price as a capability

Watch the logs. A successful warm load shows a runner-container start, a model-load log line, and a "capability advertised" or equivalent message. Source for the runner's health and readiness contract: [`livepeer/ai-worker/runner/src/runner/main.py`](https://github.com/livepeer/ai-worker/blob/main/runner/src/runner/main.py) (FastAPI app definition).

## Step 5: Verify on the network capabilities tool

Open [`tools.livepeer.cloud/ai/network-capabilities`](https://tools.livepeer.cloud/ai/network-capabilities) in a browser.
This dashboard reads live capability advertisements from active Orchestrators on the network.

Find your Orchestrator address. You should see:

* the `text-to-image` pipeline listed under your Orchestrator
* `SG161222/RealVisXL_V4.0_Lightning` listed under that pipeline
* a warm indicator, if the dashboard surfaces it

If your Orchestrator is not in the list, the model is not visible to the network. The three usual causes:

* **Orchestrator not in the Active Set.** Confirm on [`explorer.livepeer.org`](https://explorer.livepeer.org) that your address shows as active. Capability advertisement requires on-chain registration with sufficient stake.
* **Runner container exited.** Check `docker ps -a` for an exited container, then `docker logs <container-id>` for the failure reason. The most common is CUDA out-of-memory at warm load.
* **AI worker not enabled or config not parsed.** `go-livepeer` was started without `-aiWorker`, or `aiModels.json` did not parse. Check the Orchestrator startup logs for parse errors.

Resolve any of these before continuing.

## Step 6: Send a test inference request

Two paths verify the model end-to-end without touching Studio or Daydream. Use both in order: localhost first, Gateway second.

### Step 6a: Hit the runner directly on localhost

The runner is a FastAPI service. Source: [`livepeer/ai-worker/runner/src/runner/main.py`](https://github.com/livepeer/ai-worker/blob/main/runner/src/runner/main.py). The Orchestrator runs it on a port internal to the host (printed in startup logs as the AI worker port).
```bash icon="terminal" title="runner-direct.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
curl -X POST http://localhost:<runner-port>/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output result.json
```

The four-step inference and low guidance scale follow the SDXL Lightning recommendations on the model card at [`huggingface.co/SG161222/RealVisXL_V4.0_Lightning`](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning).

A successful response is a JSON object with an `images` array. Each image is base64-encoded or referenced by URL depending on runner version. Inspect the first 200 characters of the first image entry:

```bash icon="terminal" title="inspect-output.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Print the URL when the entry is an object, or the raw string otherwise.
jq -r '.images[0] | if type == "object" then .url else . end' result.json | head -c 200
```

This step confirms the model is loaded and inference works. It does not confirm that the model is reachable through the Livepeer Network. That is Step 6b.

### Step 6b: Self-hosted Gateway test

`go-livepeer` runs as a Gateway when started with `-gateway`. On a separate process or machine:

```bash icon="terminal" title="start-gateway.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
livepeer \
  -gateway \
  -httpAddr 0.0.0.0:8935 \
  -orchAddr <orchestrator-host>:<port> \
  -ethUrl <arbitrum-rpc-url>
```

The `-orchAddr` flag pins discovery to your own Orchestrator, removing the variability of network-wide selection. This is what makes the test deterministic: the Gateway can only route to your node.

Then send the inference request to the Gateway:

```bash icon="terminal" title="gateway-request.sh" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output gateway-result.json
```

The Gateway handles discovery, capability matching, ticket-based payment, and routing to the Orchestrator. The response includes the inference output and a settlement record for the probabilistic micropayment ticket. A successful response means your model is reachable across the protocol layer through your own infrastructure.

The Livepeer Cloud Community Gateway is a free public Gateway maintained by the Cloud SPE (Titan Node). Sending a request to it tests routing from outside your own infrastructure. The downside is non-determinism: it selects an Orchestrator from the Active Set and may not select yours. Use it only as a cross-check after Step 6b succeeds, never as the primary verification.

## Step 7: Confirm the loop is closed

The tutorial is complete when all four are observable:

1. `aiModels.json` declares the model and `go-livepeer` parsed it cleanly at startup (Orchestrator logs)
2. The runner container is running and the model is loaded into VRAM (`docker ps`, `nvidia-smi`)
3. The Orchestrator advertises the model on `tools.livepeer.cloud/ai/network-capabilities`
4. A request through your self-hosted Gateway returns a successful inference result

If any one of these is missing, the model is not yet on the network. Resolve it before relying on the path for paid traffic.

## Operational notes

Setting a price per pixel well above the network median can leave your Orchestrator with no jobs. Gateway selection in `go-livepeer` filters by price competitiveness.
Compare against the rates visible on the network capabilities dashboard before going live.

`warm: true` holds the model in VRAM continuously. SDXL-class models occupy roughly 12 GB; on a 24 GB card you can warm one SDXL model plus, perhaps, a smaller pipeline like `image-to-text` (4 GB floor per `Salesforce/blip-image-captioning-large`), but not two SDXL variants. Cold models (`warm: false`) are loaded into VRAM on first request; price them lower because the cold-start latency makes them less attractive to Gateways.

To serve a different model, replace the `model_id` in `aiModels.json` and the `MODEL_ID` in the download command with your chosen model. The pipeline name stays the same as long as the model fits the same I/O shape. For example, swapping `SG161222/RealVisXL_V4.0_Lightning` for `ByteDance/SDXL-Lightning` (also a `text-to-image` model) requires no other changes.

## LLM variant via Ollama

LLM models follow the same overall flow but use a different runner image. The Cloud SPE maintains [`tztcloud/livepeer-ollama-runner`](https://hub.docker.com/r/tztcloud/livepeer-ollama-runner), which wraps Ollama for OpenAI-compatible completions.

The `aiModels.json` entry for an LLM:

```json icon="code" title="aiModels-llm.json" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
{
  "pipeline": "llm",
  "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "price_per_unit": 1,
  "pixels_per_unit": 1000000,
  "currency": "wei",
  "warm": true
}
```

The model identifier is the Hugging Face repo for documentation purposes; the actual model pull happens through Ollama's tagging system (`ollama pull llama3.1:8b`) inside the Ollama runner container. The mapping between the HF identifier and the Ollama tag for each LLM is the only piece that does not generalise from the standard runner. Reference: the Ollama tag library at [`ollama.com/library`](https://ollama.com/library).
Otherwise the pattern is identical: declare in `aiModels.json`, ensure the runner image is available, restart `go-livepeer`, verify on the capabilities tool, and test through your self-hosted Gateway with an OpenAI-compatible chat completion request.

## Troubleshooting

* **Runner container fails to start or exits.** Run `docker logs <container-id>`. Three common causes: model files missing or partial (re-run Step 3); CUDA out-of-memory at load (insufficient VRAM, so downgrade to `warm: false` or pick a smaller variant); image pull failed (check Docker Hub connectivity).
* **Orchestrator missing from the capabilities dashboard.** Check on [`explorer.livepeer.org`](https://explorer.livepeer.org) that your Orchestrator is in the Active Set. Capability advertisement requires on-chain registration with sufficient stake.
* **Gateway cannot reach the Orchestrator.** Confirm `serviceAddr` is reachable from outside your network. Open the relevant port at the firewall, confirm DNS, and confirm the Orchestrator is binding to a public interface instead of `localhost`.
* **Poor output quality.** Check that you are using the SDXL Lightning recommended sampling (4 steps, low guidance). Different SDXL fine-tunes have different recommended schedulers and step counts. Consult the model card.
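
For the `serviceAddr` reachability item above, a plain TCP connection test is often enough to separate a firewall or binding problem from a protocol problem. The sketch below is illustrative only: `is_reachable` is a hypothetical helper, the host and port are placeholder values, and it needs to run from a machine outside your network to say anything about external reachability.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical values: substitute the host and port from your -serviceAddr.
    print(is_reachable("orchestrator.example.com", 8935))
```

A `True` result only shows that the port accepts connections; it does not confirm that the Orchestrator answers Gateway requests, which is what Step 6b verifies.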
## Sources

Every claim in this tutorial is grounded in one of the following readable references:

* [`github.com/livepeer/ai-worker`](https://github.com/livepeer/ai-worker) – runner architecture, pipeline implementations, `dl_checkpoints.sh`
* [`livepeer/ai-worker/runner/src/runner/pipelines`](https://github.com/livepeer/ai-worker/tree/main/runner/src/runner/pipelines) – supported pipeline list and their I/O shapes
* [`livepeer/ai-worker/runner/dl_checkpoints.sh`](https://github.com/livepeer/ai-worker/blob/main/runner/dl_checkpoints.sh) – model download script, environment variables, HF integration
* [`livepeer/ai-worker/runner/src/runner/main.py`](https://github.com/livepeer/ai-worker/blob/main/runner/src/runner/main.py) – FastAPI app, `/health` endpoint, port binding
* [`github.com/livepeer/go-livepeer`](https://github.com/livepeer/go-livepeer) – Orchestrator, Gateway, AI worker mode
* [`livepeer/go-livepeer/cmd/livepeer/livepeer.go`](https://github.com/livepeer/go-livepeer/blob/master/cmd/livepeer/livepeer.go) – flag definitions for `-aiWorker`, `-aiModels`, `-aiModelsDir`, `-gateway`, `-orchAddr`, `-serviceAddr`
* [`livepeer/go-livepeer/ai/worker/docker.go`](https://github.com/livepeer/go-livepeer/blob/master/ai/worker/docker.go) – pipeline-to-image map keyed on canonical pipeline name strings
* [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) – `snapshot_download` semantics
* [`huggingface.co/SG161222/RealVisXL_V4.0_Lightning`](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning) – model card, recommended sampling
* [`hub.docker.com/r/livepeer/ai-runner`](https://hub.docker.com/r/livepeer/ai-runner) – runner image, tags
* [`hub.docker.com/r/tztcloud/livepeer-ollama-runner`](https://hub.docker.com/r/tztcloud/livepeer-ollama-runner) – Ollama-based LLM runner
* [`tools.livepeer.cloud/ai/network-capabilities`](https://tools.livepeer.cloud/ai/network-capabilities) – live capability dashboard
* [`explorer.livepeer.org`](https://explorer.livepeer.org) – Orchestrator active-set status

Your model is now running on the Livepeer Network, advertised to Gateways, and callable through your self-hosted Gateway. For custom architectures that do not fit a native pipeline, see the [advanced paths](/v2/developers/build/tutorials/huggingface-to-livepeer-advanced).

## AI agent prompt

```text theme={"theme":{"light":"github-light","dark":"dark-plus"}}
Complete the "Add a Hugging Face Model to Livepeer" tutorial for a model that fits an existing Livepeer AI pipeline. Use placeholders for MODEL_ID=, PIPELINE=, LP_AI_MODELS_DIR=/data/livepeer-ai-models, ORCH_SERVICE_ADDR=, ORCH_ETH_ADDR=, GATEWAY_PORT=8935, and ORCH_ADDR=. Clone livepeer/ai-worker only for the checkpoint script, use livepeer/ai-runner images, write aiModels.json, pre-download weights, start go-livepeer with -aiWorker -aiModels -aiModelsDir, verify the runner container and tools.livepeer.cloud capability listing, then start a self-hosted go-livepeer -gateway pinned to the orchestrator and send a test inference request. Do not use Studio or Daydream.
```

## Related pages

* Three structurally different paths: existing pipeline, custom pipeline, BYOC.
* Local end-to-end pipeline: Gateway routes inference to Orchestrator and the result returns through the full pipeline.
* Live video-to-video pipeline: continuous WebRTC stream in, transformed stream out.
* Stand up a ComfyStream pipeline for real-time AI workloads.