
Python API

Everything the CLI does is available as a library. The high-level entry points are async.

import asyncio
from agentvision import analyze, load_settings

async def main():
    report = await analyze("dist/index.html", settings=load_settings(vision_backend="local"))
    print(report.verdict, [i.message for i in report.issues])

asyncio.run(main())

Core functions

analyze async

analyze(source: str, *, settings: Settings | None = None, backend: str | None = None, instructions: str | None = None, expected: str | None = None, brief: Brief | None = None, use_ocr: bool = True, source_type: str = 'auto', viewport: Viewport | None = None, full_page: bool | None = None, wait_for: str | None = None, out_dir: Path | None = None) -> Report

Full visual analysis: structural grounding + vision-backend critique.

When brief is given, the render is also graded for intent conformance — does it match what the agent set out to build — and the verdict is gated on it.

check async

check(source: str, *, settings: Settings | None = None, brief: Brief | None = None, source_type: str = 'auto', viewport: Viewport | None = None, full_page: bool | None = None, wait_for: str | None = None, use_ocr: bool = True, out_dir: Path | None = None) -> Report

Classic checks only, including OCR spell-check: no LLM, no API key, no network egress.

With a brief, check also grades text requirements deterministically via OCR; non-text requirements are reported as uncertain, because the offline path cannot judge visual intent.
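As an illustration of the offline path with a brief, the sketch below builds a Brief with from_inputs and passes it to check. It assumes that check and Brief can be imported from the package root in the same way as analyze; the brief text, the expectation string, and the helper name offline_check are made up for the example. The import is deferred into the function so the module still loads where agentvision is not installed.

```python
# Hypothetical expectation for illustration; from_inputs takes a list of strings.
EXPECTATIONS = ["the title reads AgentVision"]


async def offline_check(path: str):
    """Run the no-LLM checks against `path`, graded against a small brief."""
    # ASSUMPTION: `check` and `Brief` are exported from the package root,
    # as `analyze` and `load_settings` are. Adjust the import if they live elsewhere.
    from agentvision import Brief, check

    brief = Brief.from_inputs(text="Landing page for AgentVision", expect=EXPECTATIONS)
    report = await check(path, brief=brief)

    # `verdict`, `issues`, and `message` are the fields used in the intro example.
    print(report.verdict)
    for issue in report.issues:
        print("-", issue.message)
    return report
```

From synchronous code this would be driven with asyncio.run(offline_check("dist/index.html")).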

watch async

watch(source: str, *, settings: Settings | None = None, backend: str | None = None, frames: int | None = None, interval_ms: int | None = None, brief: Brief | None = None, instructions: str | None = None, use_vision: bool = True, source_type: str = 'auto', out_dir: Path | None = None) -> Report

Watch source over time and report temporal behavior (playback/loading/liveness).

render async

render(source: str, *, settings: Settings | None = None, source_type: str = 'auto', viewports: list[Viewport] | None = None, full_page: bool | None = None, wait_for: str | None = None, device_scale: float | None = None, settle_ms: int | None = None, freeze: bool | None = None, out_dir: Path | None = None) -> RenderResult

Render source and return image(s) plus trustworthy DOM/CV signals.

compute_diff

compute_diff(baseline_path: str | Path, candidate_path: str | Path, out_path: str | Path | None = None) -> DiffResult

Sessions

LoopSession

Drive the visual feedback loop for one artifact across iterations.

Agents call iterate after each fix attempt (optionally passing an updated source). The session persists its state under the workspace so that it can be resumed.

run async

run(max_iter: int = 5) -> list[IterationResult]

Convenience: iterate the SAME source up to max_iter times.

Useful for demonstrating stuck-detection on an unchanged artifact. Real agents drive iterate themselves, editing the source between calls.

IterationResult

Bases: BaseModel

GenerativeLoopSession

Drive generate → perceive → refine until the output matches the brief.

Intent

Brief

Bases: BaseModel

The intended product — the thought the render is graded against.

from_inputs classmethod

from_inputs(*, text: str | None = None, expect: list[str] | None = None, reference_image: str | None = None) -> Brief

Build a brief from CLI/REST-style inputs (--brief + repeated --expect).

IntentClaim

Bases: BaseModel

A single checkable visual requirement extracted from the intent.

parse classmethod

parse(raw: str, *, default: Importance = Importance.MUST) -> IntentClaim

Parse "must: the title reads AgentVision" → claim + importance.
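The parsing rule described above can be sketched with stand-in types. This is not the library's implementation: Claim and parse_claim are hypothetical names, the SHOULD level is assumed purely for illustration (only MUST appears in the signature), and falling back to the default when the prefix is not a known importance is a guess at sensible behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Importance(str, Enum):
    MUST = "must"
    SHOULD = "should"  # hypothetical second level, for illustration only


@dataclass(frozen=True)
class Claim:
    """Stand-in for IntentClaim: requirement text plus its importance."""
    text: str
    importance: Importance


def parse_claim(raw: str, *, default: Importance = Importance.MUST) -> Claim:
    """Split an optional 'importance:' prefix off a requirement string."""
    prefix, sep, rest = raw.partition(":")
    if sep:
        try:
            # Enum lookup by value: "must" -> Importance.MUST
            return Claim(rest.strip(), Importance(prefix.strip().lower()))
        except ValueError:
            pass  # the text before the colon is not an importance level
    # No recognized prefix: keep the whole string and apply the default.
    return Claim(raw.strip(), default)


print(parse_claim("must: the title reads AgentVision"))
```

Under this sketch a string such as "Note: dark theme" keeps its colon and receives the default importance, since "note" is not an importance level.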

The report contract

Report

Bases: BaseModel

to_handoff

to_handoff()

Distill this Report into a Handoff (agentvision.models.handoff.Handoff).

The afferent signal an agent's reasoning/memory layer ("the brain") acts on.

issue_signature

issue_signature() -> frozenset[tuple[str, str]]

Identity of the issue set — used for loop progress/stuck detection.

Deliberately ignores bbox/severity drift; two iterations are "the same" if they flag the same (kind, message) pairs.
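A minimal sketch of why a (kind, message) signature is useful for stuck detection, using a stand-in issue type rather than the library's Issue model. SketchIssue and is_stuck are hypothetical, and the rule "stuck when the last two signatures are identical and non-empty" is an assumption; the actual loop may use a different criterion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SketchIssue:
    """Stand-in for Issue with only the fields the signature discussion mentions."""
    kind: str
    message: str
    severity: str = "warning"
    bbox: Optional[tuple] = None


def issue_signature(issues: Iterable[SketchIssue]) -> frozenset[tuple[str, str]]:
    """Identity of an issue set: (kind, message) pairs, ignoring bbox and severity."""
    return frozenset((i.kind, i.message) for i in issues)


def is_stuck(signatures: list[frozenset], window: int = 2) -> bool:
    """ASSUMED rule: the last `window` signatures are identical and non-empty."""
    if window < 2 or len(signatures) < window:
        return False
    recent = signatures[-window:]
    return bool(recent[0]) and all(sig == recent[0] for sig in recent)


# Same issue reported twice; only severity and bbox drifted between iterations.
first = [SketchIssue("overflow", "text clipped in header", "error", (0, 0, 10, 10))]
second = [SketchIssue("overflow", "text clipped in header", "warning", (2, 1, 12, 11))]
print(is_stuck([issue_signature(first), issue_signature(second)]))
```

Because the signature is a frozenset, it is hashable and order-independent, so two iterations compare equal regardless of the order in which issues were reported.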

Issue

Bases: BaseModel

Conformance

Bases: BaseModel

How well the render matches the intended product (the brief / checklist).

score property

score: float

Fraction of claims satisfied (0..1); 1.0 when there are no claims.

matches_intent

matches_intent() -> bool

True only when no must requirement is violated or left uncertain.
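The two rules above (score and matches_intent) can be sketched with stand-in classes. The field names importance and status, and their string values, are assumptions chosen to mirror the wording of the descriptions; the real ClaimResult model may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchClaimResult:
    """Stand-in for ClaimResult; field names are assumed."""
    text: str
    importance: str  # "must" or a weaker level
    status: str      # "satisfied", "violated", or "uncertain"


@dataclass
class SketchConformance:
    claims: list = field(default_factory=list)

    @property
    def score(self) -> float:
        """Fraction of claims satisfied; 1.0 when there are no claims."""
        if not self.claims:
            return 1.0
        satisfied = sum(1 for c in self.claims if c.status == "satisfied")
        return satisfied / len(self.claims)

    def matches_intent(self) -> bool:
        """True only when no must claim is violated or left uncertain."""
        return not any(
            c.importance == "must" and c.status in ("violated", "uncertain")
            for c in self.claims
        )


mixed = SketchConformance([
    SketchClaimResult("title reads AgentVision", "must", "satisfied"),
    SketchClaimResult("logo in footer", "should", "violated"),
])
print(mixed.score, mixed.matches_intent())
```

In this sketch a violated non-must claim lowers the score without blocking matches_intent, while a single uncertain must claim blocks it even if every other claim is satisfied.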

ClaimResult

Bases: BaseModel

One requirement (a piece of the thought) graded against the render.

The handoff

Handoff

Bases: BaseModel

The afferent signal: what the eyes saw, shaped for the brain to act on.

from_report classmethod

from_report(report: Report) -> Handoff

Distill a full Report into the actionable signal.

Backends

VisionBackend

Bases: Protocol

complete_text async

complete_text(system: str, user: str) -> str

Text-only completion (no image) — for checklist extraction + prompt refinement.

Backends that cannot do this (e.g. the offline local backend) return "".
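A sketch of how a caller might cope with a backend that returns "" from complete_text. Only the complete_text signature comes from the reference above; TextCompleter, OfflineBackend, CannedBackend, extract_checklist, and the prompt string are hypothetical names written for this example.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextCompleter(Protocol):
    """Just the text-only part of the backend protocol."""
    async def complete_text(self, system: str, user: str) -> str: ...


class OfflineBackend:
    """Cannot do text completion, so it returns the empty string."""
    async def complete_text(self, system: str, user: str) -> str:
        return ""


class CannedBackend:
    """Returns a fixed reply; useful for exercising callers without a model."""
    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def complete_text(self, system: str, user: str) -> str:
        return self.reply


async def extract_checklist(backend: TextCompleter, brief_text: str) -> list[str]:
    """One requirement per non-blank line; an empty reply yields an empty list."""
    raw = await backend.complete_text("List one visual requirement per line.", brief_text)
    return [line.strip() for line in raw.splitlines() if line.strip()]


print(asyncio.run(extract_checklist(OfflineBackend(), "A landing page")))
```

Treating "" as "no checklist available" lets the caller decide on a fallback, such as grading against the raw brief text, instead of failing on an offline backend.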

AnalysisRequest

Bases: BaseModel

Configuration

Settings

Bases: BaseSettings

Runtime settings. Environment prefix: AGENTVISION_.

Provider API keys use their conventional env names (not the prefix) so they match what the provider SDKs already expect.

load_settings

load_settings(**overrides) -> Settings

Build a Settings object, applying any explicit overrides last.
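The precedence implied above (defaults, then AGENTVISION_-prefixed environment variables, then explicit overrides last) can be sketched with the standard library alone. SketchSettings and load_sketch_settings are stand-ins, the default value "auto" is invented, and the mapping of the vision_backend field to AGENTVISION_VISION_BACKEND is an assumption based on the stated prefix.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

ENV_PREFIX = "AGENTVISION_"


@dataclass(frozen=True)
class SketchSettings:
    vision_backend: str = "auto"  # hypothetical default value


def load_sketch_settings(
    environ: Optional[Mapping[str, str]] = None, **overrides
) -> SketchSettings:
    """Resolve settings as: defaults < prefixed environment < explicit overrides."""
    env = os.environ if environ is None else environ
    values = {}
    for f in fields(SketchSettings):
        key = ENV_PREFIX + f.name.upper()  # ASSUMED naming: prefix + upper-cased field
        if key in env:
            values[f.name] = env[key]
    values.update(overrides)  # explicit overrides are applied last
    return SketchSettings(**values)


fake_env = {"AGENTVISION_VISION_BACKEND": "remote"}
print(load_sketch_settings(fake_env, vision_backend="local").vision_backend)
```

Passing the environment as a mapping keeps the sketch easy to test without mutating os.environ.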