# AI and agents on Livepeer

> The three AI pipeline categories on the Livepeer Network (batch inference, real-time AI, and LLM), with the infrastructure, GPU requirements, and developer entry points for each.

export const TableCell = ({children, align = "left", header = false, style = {}, className = "", ...rest}) => {
  const Component = header ? "th" : "td";
  return <Component className={className} style={{
    padding: "0.75rem 1rem",
    textAlign: align,
    border: header ? "none" : "1px solid var(--lp-color-border-default)",
    ...style
  }} {...rest}>
      {children}
    </Component>;
};

export const TableRow = ({children, header = false, hover = false, style = {}, className = "", ...rest}) => {
  // String.prototype.substr is deprecated; slice(2, 11) yields the same characters.
  const rowId = `table-row-${Math.random().toString(36).slice(2, 11)}`;
  return <>
      {hover && <style>{`
          #${rowId}:hover {
            background-color: var(--lp-color-bg-card);
          }
        `}</style>}
      <tr id={rowId} className={className} style={{
    ...header && ({
      backgroundColor: "var(--lp-color-accent-strong)",
      color: "var(--lp-color-on-accent)",
      fontWeight: "bold"
    }),
    ...style
  }} {...rest}>
        {children}
      </tr>
    </>;
};

export const StyledTable = ({children, variant = "default", style = {}, className = "", ...rest}) => {
  const wrapperVariants = {
    default: {
      border: "1px solid var(--lp-color-border-default)",
      backgroundColor: "var(--lp-color-bg-card)",
      overflow: "hidden"
    },
    bordered: {
      border: "2px solid var(--lp-color-accent)",
      backgroundColor: "var(--lp-color-bg-page)",
      overflow: "hidden"
    },
    minimal: {
      border: "none",
      backgroundColor: "transparent",
      overflow: "visible"
    }
  };
  return <div data-docs-styled-table-shell className={className} style={{
    width: "100%",
    padding: 0,
    margin: 0,
    ...wrapperVariants[variant],
    ...style
  }} {...rest}>
      <table data-docs-styled-table style={{
    width: "100%",
    borderCollapse: "collapse",
    borderSpacing: 0,
    margin: 0,
    backgroundColor: "transparent"
  }}>
        {children}
      </table>
    </div>;
};

export const CustomDivider = ({color = "var(--lp-color-border-default)", middleText = "", spacing = "default", style = {}, className = "", ...rest}) => {
  const spacingPresets = {
    default: {
      margin: "24px 0"
    },
    overlap: {
      margin: "-1rem 0 -1rem 0"
    },
    tight: {
      margin: "0 0 -1rem 0"
    },
    section: {
      margin: "0 0 -2rem 0"
    },
    sectionOverlap: {
      margin: "-1rem 0 -2rem 0"
    },
    deepOverlap: {
      margin: "-1rem 0 -1.5rem 0"
    }
  };
  const spacingStyle = spacingPresets[spacing] || spacingPresets.default;
  return <div role="separator" aria-orientation="horizontal" className={className} style={{
    display: "flex",
    alignItems: "center",
    ...spacingStyle,
    fontSize: style?.fontSize || "16px",
    height: "fit-content",
    ...style
  }} {...rest}>
      <span style={{
    marginRight: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
      </span>
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      {middleText && <>
          <Icon icon="circle" size={2} />
          <span style={{
    margin: "0 8px",
    fontWeight: "bold",
    color: color,
    opacity: 0.7
  }}>
            {middleText}
          </span>
          <Icon icon="circle" size={2} />
        </>}
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      <span style={{
    marginLeft: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <span style={{
    display: "inline-block",
    transform: "scaleX(-1)"
  }}>
          <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
        </span>
      </span>
    </div>;
};

The Livepeer Network supports three distinct categories of AI pipeline. Each category works differently at the protocol level: different connection models, different billing, different GPU requirements. Understanding which category fits your use case before building prevents rework.

**Constraint:** Livepeer AI pipelines run on GPU capacity contributed by independent Orchestrators, so availability and latency depend on the Orchestrator set at any given time. The community Gateway at `dream-gateway.livepeer.cloud` routes each request to the best available Orchestrator and is intended for development. Production applications should use a self-hosted Gateway or a Gateway provider to control routing.

<CustomDivider />

## Pipeline categories at a glance

<StyledTable variant="bordered">
  <thead>
    <TableRow header>
      <TableCell header>Category</TableCell>
      <TableCell header>What it does</TableCell>
      <TableCell header>Best for</TableCell>
      <TableCell header>Primary tool</TableCell>
    </TableRow>
  </thead>

  <tbody>
    <TableRow>
      <TableCell>**Batch AI**</TableCell>
      <TableCell>One request in, one inference result out</TableCell>
      <TableCell>Image generation, transcription, upscaling, captioning</TableCell>
      <TableCell>AI Gateway API</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>**Real-time AI**</TableCell>
      <TableCell>Persistent stream with continuous frame-by-frame output</TableCell>
      <TableCell>Live video transformation, VTuber avatars, generative overlays</TableCell>
      <TableCell>ComfyStream, PyTrickle, Stream Pack</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>**LLM**</TableCell>
      <TableCell>Text in, text out (OpenAI-compatible)</TableCell>
      <TableCell>Chatbots, agents, copilots, text inference</TableCell>
      <TableCell>LLM API (Ollama-based)</TableCell>
    </TableRow>
  </tbody>
</StyledTable>

<CustomDivider />

## Batch AI pipelines

Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an Orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released.
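The request-and-response model can be sketched in Python as one JSON POST per job. This is a minimal sketch, not the Gateway's documented client: it assumes the pipeline name is appended to the Gateway base URL as the path, and the payload field names (`model_id`, `prompt`) and the placeholder model ID are illustrative assumptions. Check the AI quickstart for the exact request schema.

```python
import json
import urllib.request

# The community Gateway named on this page, used as the base URL.
GATEWAY_URL = "https://dream-gateway.livepeer.cloud"


def build_batch_request(
    pipeline: str, payload: dict, gateway_url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one batch job.

    Assumption: the pipeline name is the URL path, e.g. /text-to-image.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{gateway_url.rstrip('/')}/{pipeline}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_batch_job(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON result: one request, one response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload: the field names and model ID are placeholders.
request = build_batch_request(
    "text-to-image",
    {"model_id": "example-org/example-model", "prompt": "a lighthouse at dusk"},
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the network call in one place, so the same builder can serve any batch pipeline in the table below.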

<StyledTable variant="bordered">
  <thead>
    <TableRow header>
      <TableCell header>Pipeline</TableCell>
      <TableCell header>What it does</TableCell>
      <TableCell header>Min VRAM</TableCell>
    </TableRow>
  </thead>

  <tbody>
    <TableRow>
      <TableCell>`text-to-image`</TableCell>
      <TableCell>Generate images from text prompts</TableCell>
      <TableCell>24 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`image-to-image`</TableCell>
      <TableCell>Style transfer, enhancement, img2img</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`image-to-video`</TableCell>
      <TableCell>Animate images into video clips</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`image-to-text`</TableCell>
      <TableCell>Generate captions or descriptions for images</TableCell>
      <TableCell>4 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`audio-to-text`</TableCell>
      <TableCell>Speech recognition (ASR) with timestamps</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`text-to-speech`</TableCell>
      <TableCell>Generate natural speech from text</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`upscale`</TableCell>
      <TableCell>Upscale low-resolution images without distortion</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>`segment-anything-2`</TableCell>
      <TableCell>Promptable visual segmentation for images and video</TableCell>
      <TableCell>\~16 GB</TableCell>
    </TableRow>
  </tbody>
</StyledTable>

Orchestrators keep one model per pipeline "warm" in GPU memory. Requesting a model that no Orchestrator currently has warm still works, but the first response is slower while the model loads (30 seconds to 5 minutes depending on model size). Warm model availability per pipeline is listed on the [model support](/v2/developers/build/ai-and-agents/model-support) page.
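Because a cold model can take up to about five minutes to load, client timeouts need to allow for it. The sketch below picks a timeout depending on whether the requested model is known to be warm. The warm-model set, the model IDs, and the 60-second warm budget are hypothetical values for illustration; only the five-minute upper bound comes from the text above.

```python
# Upper bound on model load time stated above (5 minutes).
COLD_START_TIMEOUT_S = 300.0
# Assumption: an illustrative budget for a model that is already warm.
WARM_TIMEOUT_S = 60.0


def request_timeout_s(model_id: str, warm_models: set[str]) -> float:
    """Return a client timeout that leaves room for a cold start when needed."""
    if model_id in warm_models:
        return WARM_TIMEOUT_S
    return COLD_START_TIMEOUT_S


# Hypothetical warm set; in practice, take it from the model support page.
warm = {"example-org/warm-model"}
print(request_timeout_s("example-org/warm-model", warm))
print(request_timeout_s("example-org/other-model", warm))
```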

**Where to start:** [AI quickstart](/v2/developers/build/ai-and-agents/ai-jobs-direct-quickstart)

<CustomDivider />

## Real-time AI

Real-time AI on Livepeer is built around the `live-video-to-video` pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out at sub-second latency.

The infrastructure model differs from batch processing in four ways:

* **Connection:** Persistent WebRTC or trickle stream, not request/response
* **Billing:** Per second of compute time (confirmed in the go-livepeer `LivePaymentSender` interface)
* **GPU assignment:** Dedicated to your stream for its full duration
* **Output:** Continuous frame-by-frame results, not a single returned asset
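Per-second billing means stream cost scales with duration, not with the number of requests. A rough estimate can be sketched as below. The price is a hypothetical caller-supplied value in the smallest currency unit, and rounding partial seconds up is an assumption, not documented behaviour.

```python
import math


def estimate_stream_cost(duration_s: float, price_per_second: int) -> int:
    """Estimate the cost of a stream billed per second of compute time.

    price_per_second is a hypothetical price in the smallest currency unit.
    Assumption: a partial second is billed as a full second.
    """
    if duration_s < 0 or price_per_second < 0:
        raise ValueError("duration and price must be non-negative")
    billable_seconds = math.ceil(duration_s)
    return billable_seconds * price_per_second


# A 90.5-second stream at a hypothetical price of 1000 units per second.
print(estimate_stream_cost(90.5, 1000))
```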

### Developer tools for real-time AI

Three tools serve different real-time AI use cases:

**ComfyStream** (`livepeer/comfystream`) is the primary tool for building real-time AI pipelines. It turns ComfyUI's node-graph workflow editor into a real-time inference engine for live video. Supported models include StreamDiffusion, ControlNet, IPAdapter, FaceID, LoRA, Whisper (audio), Gemma (video understanding), and SuperResolution. See [ComfyStream overview](/v2/developers/build/ai-and-agents/realtime-ai/ComfyStream/overview).

**PyTrickle** (`livepeer/pytrickle`) is the Python SDK for building custom real-time processing services outside ComfyUI. Subclass `FrameProcessor`, implement `process_frame()`, and PyTrickle handles the trickle protocol transport, session management, and frame serialisation. See [PyTrickle overview](/v2/developers/build/ai-and-agents/realtime-ai/pytrickle/overview).
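The subclass-and-override pattern described for PyTrickle looks roughly like the sketch below. The `FrameProcessor` class here is a local stand-in defined in the example itself, not imported from `pytrickle`. The real base class may differ in constructor arguments, frame type, and whether `process_frame()` is synchronous, so treat every signature here as an assumption and consult the PyTrickle overview for the actual API.

```python
from abc import ABC, abstractmethod

Frame = list[list[int]]  # Hypothetical frame type: rows of 8-bit grayscale pixels.


class FrameProcessor(ABC):
    """Local stand-in for the base class named above; NOT the pytrickle class."""

    @abstractmethod
    def process_frame(self, frame: Frame) -> Frame:
        """Transform one frame and return the result."""


class InvertProcessor(FrameProcessor):
    """Example processor that inverts every pixel of a grayscale frame."""

    def process_frame(self, frame: Frame) -> Frame:
        return [[255 - pixel for pixel in row] for row in frame]


processor = InvertProcessor()
print(processor.process_frame([[0, 128], [255, 64]]))  # [[255, 127], [0, 191]]
```

In this pattern the application code contains only the per-frame transformation; transport, sessions, and serialisation stay in the SDK.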

**ComfyUI-Stream-Pack** (`livepeer/ComfyUI-Stream-Pack`) provides custom ComfyUI nodes for live video and audio input: `LoadTensor` and `LoadAudioTensor` nodes that feed real-time media into ComfyUI workflows. See [Stream Pack overview](/v2/developers/build/ai-and-agents/ai-stream-pack/overview).

### VTuber and agent avatar infrastructure

VTuber avatar generation requires sub-100 ms latency, face/body tracking input, and a real-time diffusion pipeline running at 20+ FPS. Livepeer's real-time AI infrastructure supports this via ComfyStream.
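These two targets translate into simple per-frame arithmetic: sustaining 20 FPS means emitting a frame every 50 ms on average, and the end-to-end path must add up to less than 100 ms. The sketch below works through the numbers. The stage names and latencies in the usage example are hypothetical, not measurements.

```python
def frame_interval_ms(fps: float) -> float:
    """Average time available per frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_latency_target(stage_latencies_ms: list[float], target_ms: float = 100.0) -> bool:
    """Check whether the summed per-stage latencies stay under the target."""
    return sum(stage_latencies_ms) < target_ms


print(frame_interval_ms(20))  # 50.0
print(frame_interval_ms(25))  # 40.0

# Hypothetical stages: capture, network round trip, inference.
print(meets_latency_target([10.0, 30.0, 45.0]))  # True
```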

The **Agent SPE** (treasury-funded Special Purpose Entity, approved April 2025 with 30,000 LPT) built the first production VTuber and AI avatar pipeline on Livepeer, delivering:

* A real-time agent avatar generation pipeline using ComfyStream and StreamDiffusion
* A Livepeer model provider plugin for the [Eliza](https://github.com/elizaos/eliza) agent framework (ai16z), enabling Eliza agents to route LLM inference through the Livepeer Network

**Technical path for VTuber / avatar products:**

1. ComfyStream as the real-time inference engine
2. `live-video-to-video` pipeline type via the AI Gateway
3. StreamDiffusion custom nodes from ComfyUI-Stream-Pack for diffusion-based avatar transformation
4. GPU requirements: NVIDIA RTX 3090 or better; RTX 4090 recommended for 25 FPS

**Where to start for real-time AI:** [ComfyStream quickstart](/v2/developers/build/ai-and-agents/realtime-ai/ComfyStream/ComfyStream-quickstart)

**Where to start for AI agents:** [Eliza Livepeer plugin tutorial](/v2/developers/build/tutorials/eliza-livepeer-plugin)

<Note>
  Real-time AI requires a dedicated GPU for the duration of the stream. At peak network load, Orchestrator availability for `live-video-to-video` is lower than for batch pipelines. Test under expected concurrency before production launch.
</Note>

<CustomDivider />

## LLM pipeline

The LLM pipeline brings text inference to the Livepeer Network using an Ollama-based runner with an OpenAI-compatible API. From a developer's perspective, it works like any OpenAI-compatible chat completions endpoint, except that requests route to decentralised GPU Orchestrators instead of a centralised cloud provider.

The LLM pipeline is currently in beta. It runs on a wider range of GPU hardware than diffusion-based batch pipelines: an Orchestrator needs as little as 8 GB of VRAM to serve LLM workloads.

```bash icon="terminal" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
curl -X POST https://dream-gateway.livepeer.cloud/llm \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain Livepeer in one sentence."}
    ]
  }'
```

Supported models include `meta-llama/Meta-Llama-3.1-8B-Instruct` (warm, 8 GB VRAM), `mistralai/Mistral-7B-Instruct-v0.3`, `google/gemma-2-9b-it`, and `Qwen/Qwen2.5-7B-Instruct`. Any Ollama-compatible model works, but a cold-start delay applies to any model not currently loaded on an Orchestrator.
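The same request can be sketched in Python using only the standard library. The endpoint and request body mirror the curl example above. The response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`), which follows from the API being described as OpenAI-compatible but is not confirmed field by field here. The 300-second default timeout is an assumption chosen to leave room for a cold start.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
LLM_URL = "https://dream-gateway.livepeer.cloud/llm"


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build a chat completions POST with a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        LLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of a response.

    Assumption: OpenAI-style layout, choices[0].message.content.
    """
    return response_body["choices"][0]["message"]["content"]


def ask(model: str, user_message: str, timeout_s: float = 300.0) -> str:
    """Send one chat request and return the reply text."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_reply(json.loads(response.read().decode("utf-8")))
```

Calling `ask("meta-llama/Meta-Llama-3.1-8B-Instruct", "Explain Livepeer in one sentence.")` would issue the same request as the curl command.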

The **LLM SPE** built and maintains this pipeline. The **Cloud SPE** provides managed Gateway access to it for production use.

**Where to start:** [AI quickstart](/v2/developers/build/ai-and-agents/ai-jobs-direct-quickstart) for the LLM endpoint; [Eliza Livepeer plugin tutorial](/v2/developers/build/tutorials/eliza-livepeer-plugin) for the agent integration path.

<CustomDivider />

## Choose your path

<StyledTable variant="bordered">
  <thead>
    <TableRow header>
      <TableCell header>If your workload is...</TableCell>
      <TableCell header>Use</TableCell>
      <TableCell header>Latency</TableCell>
      <TableCell header>Setup complexity</TableCell>
    </TableRow>
  </thead>

  <tbody>
    <TableRow>
      <TableCell>Generating images or video on demand</TableCell>
      <TableCell>Batch AI (text-to-image, image-to-video)</TableCell>
      <TableCell>Seconds</TableCell>
      <TableCell>Low</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Processing audio to text</TableCell>
      <TableCell>Batch AI (audio-to-text)</TableCell>
      <TableCell>Seconds</TableCell>
      <TableCell>Low</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Captioning or analysing images</TableCell>
      <TableCell>Batch AI (image-to-text, segment-anything-2)</TableCell>
      <TableCell>Seconds</TableCell>
      <TableCell>Low</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Live video transformation, avatars, VTubers, overlays</TableCell>
      <TableCell>Real-time AI (live-video-to-video via ComfyStream)</TableCell>
      <TableCell>Sub-second</TableCell>
      <TableCell>Medium to high</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Text/code inference, chatbots, agents</TableCell>
      <TableCell>LLM pipeline</TableCell>
      <TableCell>Seconds</TableCell>
      <TableCell>Low to medium</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Custom AI model or pipeline</TableCell>
      <TableCell>Real-time AI + BYOC (bring your own container)</TableCell>
      <TableCell>Sub-second</TableCell>
      <TableCell>High</TableCell>
    </TableRow>
  </tbody>
</StyledTable>

The key question: does your application transform a live stream continuously, or process one piece of media at a time? Continuous live transformation requires real-time AI. One-at-a-time processing uses batch AI. Text inference uses the LLM pipeline.
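The decision rule in the paragraph above can be written as a small function. This is only an illustration of the reasoning. The enum and parameter names are hypothetical and do not correspond to any Livepeer API.

```python
from enum import Enum


class Pipeline(Enum):
    BATCH = "Batch AI"
    REALTIME = "Real-time AI"
    LLM = "LLM pipeline"


def choose_pipeline(*, continuous_live_stream: bool, text_only: bool) -> Pipeline:
    """Map the two questions from the text above onto a pipeline category."""
    if continuous_live_stream:
        # Continuous live transformation requires real-time AI.
        return Pipeline.REALTIME
    if text_only:
        # Text in, text out uses the LLM pipeline.
        return Pipeline.LLM
    # One piece of media at a time uses batch AI.
    return Pipeline.BATCH


print(choose_pipeline(continuous_live_stream=True, text_only=False).value)
print(choose_pipeline(continuous_live_stream=False, text_only=True).value)
print(choose_pipeline(continuous_live_stream=False, text_only=False).value)
```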

The [AI quickstart](/v2/developers/build/ai-and-agents/ai-jobs-direct-quickstart) covers the batch and LLM paths. The [ComfyStream quickstart](/v2/developers/build/ai-and-agents/realtime-ai/ComfyStream/ComfyStream-quickstart) covers the real-time path.
