Use cases¶

Where the eyes earn their keep. The through-line: anything that renders to pixels can be graded before it's shipped — by one agent, a CI pipeline, or a whole swarm.

A swarm of agents (the main event)¶

A fleet of agents generating UIs, charts, dashboards or documents in parallel — each one needs to see its output before claiming done, and the orchestrator needs every verdict in the same language to decide what advances.

Run eyes as a service (scaling) and point the whole swarm at it; or embed the library in each worker. Single-shot grading (check/analyze/conform) scales horizontally with zero coordination.
Every worker returns the same agentsensory Report/Handoff, so a coordinator (or a brain like Verel) aggregates verdicts on one bus — vision alongside tests, lint and types — and only PASS work compounds.
See Swarms & scaling for topologies and the fan-out example.

An agent that self-corrects before claiming done¶

The single-agent core, and the foundation of the swarm case. The agent writes a UI, renders and looks at it, gets grounded issues, fixes them, and re-renders until the verdict is PASS — instead of confidently shipping breakage it can't perceive.

Drop the agent contract into the system prompt, or use the MCP tools or Claude Code Skill so the agent calls the eyes mid-task.
The self-correcting loop (loop / LoopSession) automates the render→see→fix→re-render cycle.

A visual gate in CI¶

Fail the build when the page actually looks wrong — not when a snapshot pixel-diffs (those flake on fonts/timing). check is deterministic and needs no API key:

agentvision check dist/index.html --full-page --quiet   # exit 2 on FAIL

Add a vision backend and conform to gate on intent ("a checkout button is visible"), not just defects. See Workflows & agents for the GitHub Action and pre-commit hook.

Documents, decks, and PDFs¶

Point any command at a .pdf or an Office/OpenDocument file (.docx/.pptx/.xlsx/.odt/…) and it's rasterized per page and graded like a screenshot — so a generated report or slide deck gets the same FAIL/PASS treatment as a web page. (Office conversion is on for local use, off by default on the REST service — it's a large attack surface.)

PowerPoint decks get an offline slide inspector — agentvision check deck.pptx runs key-free and no-egress, flagging unreadable text (low / dark-on-dark contrast on the rendered pixels), clipped/truncated text, off-slide shapes, and overlapping boxes, each tagged [slide N]. Add --no-cache for a confidential deck (nothing is written to disk). See the check command.

Streaming, loading, and liveness¶

A glance can't tell a chart that's still loading from one that's broken. watch verifies an artifact over time — frames across an interval — to confirm playback, a loading→loaded transition, or that a live dashboard actually updates. See Streaming / temporal.

Visual regression against a baseline¶

Capture a named baseline, then gate future renders against it with a structural SSIM diff:

agentvision baseline dist/index.html --name home
agentvision regress dist/index.html --name home      # fails if it drifted

Useful as a cheap, key-free guardrail alongside the semantic checks.

Generative loops (image/asset generation)¶

When the artifact is generated (not hand-written), generate runs generate → see → grade vs intent → refine prompt → regenerate, so an image/asset pipeline converges on what you actually asked for rather than stopping at the first plausible output.

Ready to try these? Try it yourself · 5-minute tutorial · Real-world scenarios.