} ; };

Realtime AI is Livepeer's highest-value compute offering. A continuous stream of video frames enters the Orchestrator, AI transforms each frame, and a processed stream exits. This is fundamentally different from batch inference: there is no request-response cycle.

***

This tutorial sets up a working `live-video-to-video` pipeline using the Cascade architecture and ComfyStream. By the end, a live video stream enters the Orchestrator, StreamDiffusion transforms each frame in a continuous low-latency pipeline, and the output stream is viewable.

Estimated time: **3 hours** (most of it is spent downloading models).

**What you will verify:**

* The `livepeer/ai-runner:live-base` container starts cleanly with GPU access
* The `live-video-to-video` pipeline registers at `tools.livepeer.cloud/ai/network-capabilities`
* A test stream sends successfully and the transformed output is visible

## How realtime AI differs from batch

| | Batch AI | Realtime AI (Cascade) |
| --- | --- | --- |
| **Input** | Discrete file or prompt | Continuous WebRTC video stream |
| **Output** | Result returned once | Continuous processed stream |
| **Latency target** | Seconds per request | \<100 ms per frame |
| **Runtime** | `livepeer/ai-runner` | `livepeer/ai-runner:live-base` |
| **Min VRAM** | 4–24 GB | 24 GB recommended |

At 30 fps, the frame budget is 33 ms (1000 ms / 30 frames). To keep up with the incoming stream, the pipeline must receive, process, and emit each frame within that window. StreamDiffusion is purpose-built for this. Stream batching, residual classifier-free guidance (CFG), and stochastic similarity filtering together sustain 30+ fps on an RTX 4090.

## Prerequisites

| Requirement | Notes |
| --- | --- |
| NVIDIA GPU, 24 GB VRAM | RTX 4090 strongly recommended. RTX 3090 works with less headroom. A100/H100 for production multi-stream. Cards below 24 GB are typically insufficient. |
| CUDA 12.0+ drivers | `nvidia-smi` shows the driver and CUDA version. Minimum driver: `525.60.13`. |
| Docker + NVIDIA Container Toolkit | Verify: `docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi` |
| go-livepeer running with `-aiWorker` | Use the AI Earning Quickstart for the base AI node setup, then return here for the live pipeline. |
| 30 GB free disk space | StreamDiffusion model weights + ComfyStream dependencies |
| FFmpeg | For sending a test RTMP stream. Install: `apt-get install ffmpeg` |
| CPU: 8+ cores | WebRTC frame encode/decode is CPU-bound |

GPUs below 24 GB VRAM (RTX 3080 10 GB, RTX 3060 12 GB) are typically insufficient for live-video inference at acceptable frame rates. On these cards, StreamDiffusion's stream batch buffers, model weights, and ControlNet adapters together exhaust the available VRAM.

## Step 1: Verify GPU and Docker access

```bash icon="terminal" filename="verify-gpu" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
nvidia-smi
```

Note the GPU name, total VRAM, and driver version. The driver must be `525.60.13` or newer.

Confirm Docker GPU access:

```bash icon="terminal" filename="verify-docker-gpu" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
```

The GPU table should appear in the container output. If it does not, re-install or reconfigure the NVIDIA Container Toolkit. Then re-run the command until it succeeds.

## Step 2: Pull the live-base AI Runner image

The `live-base` image is separate from the standard `livepeer/ai-runner` image used for batch pipelines.
It includes the ComfyStream, ComfyUI, and StreamDiffusion dependencies:

```bash icon="terminal" filename="pull-live-runner" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker pull livepeer/ai-runner:live-base
```

Verify the image is available:

```bash icon="terminal" filename="verify-live-runner" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker images | grep "ai-runner.*live-base"
```

Verify CUDA works inside the container:

```bash icon="terminal" filename="verify-cuda-in-container" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run --gpus all --rm livepeer/ai-runner:live-base \
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0)}')"
```

Expected output on an RTX 4090 (the GPU name matches your card): `CUDA available: True, GPU: NVIDIA GeForce RTX 4090`

## Step 3: Download ComfyStream model weights

ComfyStream needs its model weights in place before the container starts. Clone the ComfyStream repository and install its requirements:

```bash icon="terminal" filename="clone-comfystream" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
git clone https://github.com/livepeer/comfystream
cd comfystream
pip install -r requirements.txt
```

Download StreamDiffusion and the base models:

```bash icon="terminal" filename="download-models" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
python scripts/download_models.py
```

This downloads approximately 15 to 20 GB, so wait for it to complete. The models are downloaded to the directory that is later mounted into the AI Runner container via `-aiModelsDir`.

Verify the download:

```bash icon="terminal" filename="verify-models" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
ls -lh ~/.lpData/models/ | head -20
```

## Step 4: Configure aiModels.json for live pipeline

For the live pipeline, `model_id` names the ComfyUI workflow or pipeline. The underlying models load inside the ComfyStream container.
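The entry this step writes has four fields. A small check can catch mistakes such as a quoted price or a missing `warm` flag before the node restarts. The following is a minimal sketch, assuming Python 3.9+. The helper name `check_ai_models` is hypothetical, and so are the rules it enforces:

* a JSON array of entries
* string `pipeline` and `model_id` fields
* a positive numeric `price_per_unit`
* a boolean `warm`
* at least one warm `live-video-to-video` entry

These rules are this tutorial's assumptions, not go-livepeer's own validation logic.

```python filename="check_ai_models.py" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
import json

# Fields this tutorial's entries use, mapped to the Python types json.loads produces.
# Hypothetical sanity rules for this tutorial, not go-livepeer's schema.
REQUIRED_FIELDS = {"pipeline": str, "model_id": str, "price_per_unit": (int, float)}


def check_ai_models(text: str) -> list[str]:
    """Return a list of problems found in an aiModels.json document (empty if none)."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level must be a JSON array of pipeline entries"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected a JSON object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            value = entry.get(field)
            # bool is a subclass of int in Python, so reject true/false explicitly
            if isinstance(value, bool) or not isinstance(value, expected_type):
                problems.append(f"entry {i}: missing or invalid '{field}'")
        price = entry.get("price_per_unit")
        if isinstance(price, (int, float)) and not isinstance(price, bool) and price <= 0:
            problems.append(f"entry {i}: 'price_per_unit' must be positive")
        if "warm" in entry and not isinstance(entry["warm"], bool):
            problems.append(f"entry {i}: 'warm' must be true or false")

    # The live pipeline in this tutorial is expected to be preloaded ("warm": true)
    has_warm_live = any(
        isinstance(e, dict)
        and e.get("pipeline") == "live-video-to-video"
        and e.get("warm") is True
        for e in entries
    )
    if not has_warm_live:
        problems.append("no warm live-video-to-video entry found")
    return problems
```

Call it with `check_ai_models(Path("~/.lpData/aiModels.json").expanduser().read_text())`, after `from pathlib import Path`. The entry written in this step should produce an empty list. A quoted `"500"` price would instead report `entry 0: missing or invalid 'price_per_unit'`.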
```bash icon="terminal" filename="write-aimodels" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
cat > ~/.lpData/aiModels.json << 'EOF'
[
  {
    "pipeline": "live-video-to-video",
    "model_id": "streamdiffusion",
    "price_per_unit": 500,
    "warm": true
  }
]
EOF
```

For the live pipeline, `price_per_unit` is charged per frame. Batch pipelines charge per pixel or per millisecond instead. Set a value at or below the caps that Gateways configure with `-maxPricePerCapability` for `live-video-to-video`. Check current rates at [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities).

## Step 5: Start go-livepeer with live AI flags

If an AI node is already running, stop it and restart it so it picks up the updated `aiModels.json`. For a fresh setup, start the Orchestrator with:

```bash icon="terminal" filename="start-orchestrator" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run -d \
  --name livepeer-orchestrator \
  -v ~/.lpData/:/root/.lpData/ \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network host \
  --gpus all \
  livepeer/go-livepeer:latest \
  -network arbitrum-one-mainnet \
  -ethUrl https://arb-mainnet.g.alchemy.com/v2/YOUR_API_KEY \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -pricePerUnit 1000 \
  -serviceAddr YOUR_PUBLIC_IP:8935 \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir /root/.lpData/models \
  -v 6
```

Wait for the live runner container to start and the pipeline to warm.
This takes longer than for batch pipelines, because ComfyStream loads the full ComfyUI environment and the StreamDiffusion model stack:

```bash icon="terminal" filename="watch-startup" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker logs -f livepeer-orchestrator 2>&1 | grep -i "live\|cascade\|pipeline\|warm\|error"
```

Expected after a typical 5 to 10 minute warm-up:

```text icon="terminal" title="Expected live-runner startup log" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
Starting AI worker
Starting live-video-to-video pipeline: streamdiffusion
ComfyStream container started
Warm model loaded: streamdiffusion
```

Check that the live runner container is running:

```bash icon="terminal" filename="check-containers" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker ps | grep livepeer
```

Two containers should be running: `livepeer-orchestrator` and the AI Runner container for the live pipeline.

## Step 6: Set up the Gateway for live routing

For this local test, start an off-chain Gateway that routes `live-video-to-video` jobs to the local Orchestrator. The Orchestrator serves HTTPS on its service port, so `-orchAddr` uses `https://`:

```bash icon="terminal" filename="start-gateway" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run -d \
  --name livepeer-gateway-live \
  -v ~/.lpData-gateway-live/:/root/.lpData/ \
  --network host \
  livepeer/go-livepeer:latest \
  -gateway \
  -cliAddr 127.0.0.1:7936 \
  -httpAddr 0.0.0.0:8936 \
  -rtmpAddr 0.0.0.0:1935 \
  -orchAddr https://127.0.0.1:8935 \
  -httpIngest \
  -remoteSignerAddr https://signer.eliteencoder.net \
  -network offchain
```

Verify that the Gateway started:

```bash icon="terminal" filename="check-gateway" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker logs livepeer-gateway-live 2>&1 | grep -i "started\|gateway\|rtmp\|http" | head -10
```

Expected:

```text icon="terminal" title="Expected gateway startup log" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
Gateway started on :8936
RTMP server listening on :1935
```

For a production Gateway
routing live AI jobs on-chain, configure `-maxPricePerCapability` with a cap for `live-video-to-video`. The Gateway routes only to Orchestrators priced at or below this cap, regardless of hardware capability.

## Step 7: Send a test stream

Send a test RTMP stream through the Gateway using FFmpeg. This simulates a camera or OBS stream:

```bash icon="terminal" filename="send-test-stream" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Generate a synthetic test pattern plus a 440 Hz tone and stream it via RTMP in real time (-re)
ffmpeg \
  -re \
  -f lavfi -i "testsrc=size=512x512:rate=30" \
  -f lavfi -i "sine=frequency=440:sample_rate=44100" \
  -vcodec libx264 \
  -preset ultrafast \
  -tune zerolatency \
  -b:v 2000k \
  -acodec aac \
  -f flv \
  rtmp://localhost:1935/live/test-stream-key
```

This streams a synthetic 512x512 test pattern at 30 fps, which the Gateway should hand to the `live-video-to-video` pipeline. Keep it running while you check the output.

In a second terminal, watch the Orchestrator process frames:

```bash icon="terminal" filename="watch-orchestrator" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker logs -f livepeer-orchestrator 2>&1 | grep -i "frame\|stream\|cascade\|inference" | head -20
```

Expected:

```text icon="terminal" title="Expected frame-processing log" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
Received live stream: test-stream-key
Dispatching to live-video-to-video pipeline
Processing frame 0
Processing frame 1
...
```

## Step 8: Verify the transformed output

Retrieve the processed output stream from the Gateway:

```bash icon="terminal" filename="get-output" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Pull the transformed output HLS manifest; --fail exits non-zero instead of saving an HTTP error page
curl --fail -o output-manifest.m3u8 http://localhost:8936/hls/test-stream-key/index.m3u8
```

A non-empty manifest confirms the live pipeline is processing frames and delivering output.
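A non-empty file is only a first signal. A stronger check confirms that the file really is an HLS playlist and counts what it references. The sketch below assumes Python 3.9+ and relies only on the playlist rules in RFC 8216:

* the first line must be `#EXTM3U`
* master playlists list variants with `#EXT-X-STREAM-INF`
* media playlists list segments with `#EXTINF:<duration>,`

The function name `summarize_manifest` is hypothetical. This page does not say whether the Gateway's `index.m3u8` is a master or a media playlist, so the function handles both.

```python filename="summarize_manifest.py" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
def summarize_manifest(text: str) -> dict:
    """Classify an HLS playlist and count the variants or segments it lists."""
    # Drop a UTF-8 byte-order mark if present, then ignore blank lines
    lines = [line.strip() for line in text.lstrip("\ufeff").splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an HLS playlist: first line is not #EXTM3U")

    # Master playlists point at variant playlists via #EXT-X-STREAM-INF tags
    variants = sum(1 for line in lines if line.startswith("#EXT-X-STREAM-INF:"))
    # Media segments look like "#EXTINF:<duration>,[<title>]"
    durations = [
        float(line[len("#EXTINF:"):].split(",", 1)[0])
        for line in lines
        if line.startswith("#EXTINF:")
    ]
    return {
        "kind": "master" if variants else "media",
        "variants": variants,
        "segments": len(durations),
        "seconds": sum(durations),
    }
```

For example, run `print(summarize_manifest(Path("output-manifest.m3u8").read_text()))` after `from pathlib import Path`. For a media playlist, fetching the manifest again a few seconds later, while FFmpeg is still streaming, should list newer segments. For a master playlist, fetch the variant URI it lists and check that playlist the same way.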
To watch the output stream, open the manifest URL in `ffplay`, VLC, or another HLS-capable player:

```bash icon="terminal" filename="view-output" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
ffplay http://localhost:8936/hls/test-stream-key/index.m3u8
```

Check the network registration: open [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities) and search for the Orchestrator address. The `live-video-to-video` pipeline should appear with **Warm** status.

**Latency check:** Monitor frame processing times in the Orchestrator logs. At 30 fps, each frame should be processed in under 33 ms. Frame times that repeatedly exceed 33 ms mean the pipeline is falling behind the incoming stream:

```bash icon="terminal" filename="check-latency" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker logs livepeer-orchestrator 2>&1 | grep -i "frame.*ms\|latency\|processing time" | tail -20
```

## Troubleshooting

**Frames dropping or high latency:**

* The model is running too slowly for the target fps. StreamDiffusion at 2 steps is the minimum viable configuration for 30 fps on an RTX 4090. Try reducing the output resolution.
* VRAM OOM: reduce `stream_batch_size` in the StreamDiffusion config.
* CPU bottleneck: WebRTC frame encode/decode is CPU-bound. Monitor CPU usage with `htop`.

**Pipeline not registering:**

* Confirm `live-video-to-video` appears at [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities).
* Verify the live runner container is running: `docker ps --filter ancestor=livepeer/ai-runner:live-base`.
* Check that the container started cleanly by running `docker logs` with the runner container's name or ID from the previous command.

**ComfyStream container failing to start:**

```bash icon="terminal" filename="debug-container" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run --gpus all --rm livepeer/ai-runner:live-base \
  python -c "import torch; print(torch.cuda.is_available())"
```

This command should return `True`.
Any other result means CUDA is unavailable inside the container. In that case, re-install the NVIDIA Container Toolkit.

## What happened

The Cascade architecture processed a live stream end-to-end:

1. **FFmpeg** sent an RTMP stream to the Gateway at `:1935`.
2. **The Gateway** matched the `live-video-to-video` capability and routed the stream to the Orchestrator at `:8935`.
3. **The Orchestrator** dispatched the stream to the `livepeer/ai-runner:live-base` container.
4. **ComfyStream** received each frame via WebRTC, ran it through the StreamDiffusion workflow, and emitted the processed frame.
5. **The Orchestrator** collected the processed frames and returned the output stream through the Gateway.
6. **The HLS output** became available at the Gateway's `/hls/` endpoint.

Live streams are paid on an interval basis rather than per frame. The Gateway sends probabilistic micropayment (PM) tickets at a configurable interval (`-livePaymentInterval`, default 5 seconds), which reduces payment overhead for continuous streams.

## Related pages

* Full reference for Cascade architecture, ComfyStream workflows, ControlNet variants, and multi-stream capacity.
* Batch inference end-to-end: the alternative pipeline for request-response AI workloads.
* VRAM budgeting for realtime workloads and the one-warm-model-per-GPU constraint.
* Production combined setup with port allocation and pricing alignment.