Python API¶
Everything the CLI does is available as a library. The high-level entry points are async.
import asyncio

from agentvision import analyze, load_settings


async def main():
    # Full analysis of a local build, using the "local" vision backend.
    report = await analyze("dist/index.html", settings=load_settings(vision_backend="local"))
    print(report.verdict, [i.message for i in report.issues])


asyncio.run(main())
Core functions¶
analyze async ¶
analyze(source: str, *, settings: Settings | None = None, backend: str | None = None, instructions: str | None = None, expected: str | None = None, brief: Brief | None = None, use_ocr: bool = True, source_type: str = 'auto', viewport: Viewport | None = None, full_page: bool | None = None, wait_for: str | None = None, out_dir: Path | None = None) -> Report
Full visual analysis: structural grounding + vision-backend critique.
When brief is given, the render is also graded for intent conformance (whether it
matches what the agent set out to build), and the verdict is gated on that grade.
check async ¶
check(source: str, *, settings: Settings | None = None, brief: Brief | None = None, source_type: str = 'auto', viewport: Viewport | None = None, full_page: bool | None = None, wait_for: str | None = None, use_ocr: bool = True, out_dir: Path | None = None) -> Report
Classic checks only — no LLM, no API key, no egress (incl. OCR spell-check).
With a brief, also grades text requirements deterministically via OCR; non-text
requirements are reported uncertain (the offline path cannot judge visual intent).
watch async ¶
watch(source: str, *, settings: Settings | None = None, backend: str | None = None, frames: int | None = None, interval_ms: int | None = None, brief: Brief | None = None, instructions: str | None = None, use_vision: bool = True, source_type: str = 'auto', out_dir: Path | None = None) -> Report
Watch source over time and report temporal behavior (playback/loading/liveness).
render async ¶
render(source: str, *, settings: Settings | None = None, source_type: str = 'auto', viewports: list[Viewport] | None = None, full_page: bool | None = None, wait_for: str | None = None, device_scale: float | None = None, settle_ms: int | None = None, freeze: bool | None = None, out_dir: Path | None = None) -> RenderResult
Render source and return image(s) plus trustworthy DOM/CV signals.
compute_diff ¶
compute_diff(baseline_path: str | Path, candidate_path: str | Path, out_path: str | Path | None = None) -> DiffResult
Sessions¶
LoopSession ¶
Drive the visual feedback loop for one artifact across iterations.
Agents call iterate() after each fix attempt (optionally passing an updated
source). The session persists state under the workspace so it can be resumed.
run async ¶
Convenience: iterate the SAME source up to max_iter times.
Useful for demonstrating stuck-detection on an unchanged artifact. Real agents
drive iterate() themselves, editing the source between calls.
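The stuck-detection that run demonstrates can be pictured with a small, self-contained sketch. It does not use the real LoopSession. The helper run_until_stuck, the "clean"/"stuck"/"exhausted" labels, and the idea that each iteration yields a set of (kind, message) pairs are assumptions made for illustration, loosely based on the issue_signature description under Report.

```python
from typing import Callable

# Assumed shape: one iteration's issues reduced to (kind, message) pairs.
Signature = frozenset[tuple[str, str]]


def run_until_stuck(iterate: Callable[[], Signature], max_iter: int) -> tuple[str, int]:
    """Call iterate() up to max_iter times; stop early when clean or stuck."""
    previous: Signature | None = None
    for attempt in range(1, max_iter + 1):
        current = iterate()
        if not current:
            return "clean", attempt   # no issues left
        if current == previous:
            return "stuck", attempt   # same issues as the previous iteration
        previous = current
    return "exhausted", max_iter


def unchanged_artifact() -> Signature:
    # Hypothetical artifact that is never edited, so it always reports the same issue.
    return frozenset({("overflow", "Title is clipped")})


print(run_until_stuck(unchanged_artifact, max_iter=5))  # ('stuck', 2)
```

An unchanged source repeats its issue set on the second pass, which is why iterating the same source is a convenient way to see the stuck signal.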
IterationResult ¶
Bases: BaseModel
GenerativeLoopSession ¶
Drive generate → perceive → refine until the output matches the brief.
Intent¶
Brief ¶
Bases: BaseModel
The intended product — the thought the render is graded against.
from_inputs classmethod ¶
from_inputs(*, text: str | None = None, expect: list[str] | None = None, reference_image: str | None = None) -> Brief
Build a brief from CLI/REST-style inputs (--brief + repeated --expect).
IntentClaim ¶
Bases: BaseModel
A single checkable visual requirement extracted from the intent.
parse classmethod ¶
Parse "must: the title reads AgentVision" → claim + importance.
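As a rough illustration of that mapping, the sketch below splits an importance prefix off a requirement string. It is not the library's parser. The Claim class, the parse_claim function, the "should" default, and the set of recognised prefixes are hypothetical, since only the "must:" form appears in the documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for IntentClaim; the real fields may differ."""
    text: str
    importance: str


# Assumed vocabulary: only "must" is shown in the docs.
KNOWN_IMPORTANCE = ("must", "should")


def parse_claim(raw: str, default: str = "should") -> Claim:
    prefix, sep, rest = raw.partition(":")
    level = prefix.strip().lower()
    if sep and level in KNOWN_IMPORTANCE:
        return Claim(text=rest.strip(), importance=level)
    # No recognised prefix: keep the whole string as the claim text.
    return Claim(text=raw.strip(), importance=default)


claim = parse_claim("must: the title reads AgentVision")
print(claim.importance, "|", claim.text)  # must | the title reads AgentVision
```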
The report contract¶
Report ¶
Bases: BaseModel
to_handoff ¶
Distill this Report into a Handoff (agentvision.models.handoff.Handoff).
The afferent signal an agent's reasoning/memory layer ("the brain") acts on.
issue_signature ¶
Identity of the issue set — used for loop progress/stuck detection.
Deliberately ignores bbox/severity drift; two iterations are "the same" if they flag the same (kind, message) pairs.
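A minimal sketch of that idea follows, using a stand-in Issue dataclass rather than the real model. The field names severity and bbox and the free function form are assumptions; only the (kind, message) pairing comes from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in; the real Issue model is not shown on this page."""
    kind: str
    message: str
    severity: str = "warning"
    bbox: tuple[int, int, int, int] | None = None


def issue_signature(issues: list[Issue]) -> frozenset[tuple[str, str]]:
    # Keep only (kind, message); severity and bbox are dropped on purpose.
    return frozenset((i.kind, i.message) for i in issues)


first = [Issue("overflow", "Title is clipped", "error", (0, 0, 320, 40))]
second = [Issue("overflow", "Title is clipped", "warning", (0, 2, 318, 41))]
third = [Issue("contrast", "Body text is hard to read")]

same = issue_signature(first) == issue_signature(second)
different = issue_signature(first) != issue_signature(third)
print(same, different)  # True True
```

Because the signature is a set, the order in which issues are reported does not affect the comparison either.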
Issue ¶
Bases: BaseModel
Conformance ¶
ClaimResult ¶
Bases: BaseModel
One requirement (a piece of the thought) graded against the render.
The handoff¶
Handoff ¶
Bases: BaseModel
The afferent signal: what the eyes saw, shaped for the brain to act on.
from_report classmethod ¶
Distill a full Report into the actionable signal.
Backends¶
VisionBackend ¶
Bases: Protocol
complete_text async ¶
Text-only completion (no image) — for checklist extraction + prompt refinement.
Backends that cannot do this (e.g. the offline local backend) return "".
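One way a caller might cope with the empty-string contract is sketched below. The TextCompleter protocol is a hypothetical one-method slice of VisionBackend with an assumed signature, OfflineBackend is a stand-in, and falling back to the raw brief text is an illustrative choice, not documented behavior.

```python
import asyncio
from typing import Protocol


class TextCompleter(Protocol):
    """Hypothetical slice of VisionBackend; the real signature may differ."""

    async def complete_text(self, prompt: str) -> str: ...


class OfflineBackend:
    """Stand-in for a backend that cannot do text-only completion."""

    async def complete_text(self, prompt: str) -> str:
        return ""


async def extract_checklist(backend: TextCompleter, brief_text: str) -> list[str]:
    reply = await backend.complete_text(f"List the visual requirements in: {brief_text}")
    if not reply:
        # Empty reply means "not supported": treat the brief as a single requirement.
        return [brief_text]
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]


checklist = asyncio.run(extract_checklist(OfflineBackend(), "Landing page with a centered title"))
print(checklist)  # ['Landing page with a centered title']
```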
AnalysisRequest ¶
Bases: BaseModel
Configuration¶
Settings ¶
Bases: BaseSettings
Runtime settings. Environment prefix: AGENTVISION_.
Provider API keys use their conventional env names (not the prefix) so they match what the provider SDKs already expect.
load_settings ¶
Build a Settings object, applying any explicit overrides last.
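The precedence implied here (defaults, then AGENTVISION_-prefixed environment variables, then explicit overrides) can be sketched with plain dictionaries. This is not the real implementation: load_settings_sketch, the placeholder default value, and the mapping of AGENTVISION_VISION_BACKEND onto the vision_backend field are assumptions for illustration.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

ENV_PREFIX = "AGENTVISION_"
DEFAULTS: dict[str, Any] = {"vision_backend": "unset"}  # placeholder default, assumed


def load_settings_sketch(env: Mapping[str, str] | None = None, **overrides: Any) -> dict[str, Any]:
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    for name, value in env.items():
        if name.startswith(ENV_PREFIX):
            # AGENTVISION_VISION_BACKEND -> vision_backend
            settings[name[len(ENV_PREFIX):].lower()] = value
    settings.update(overrides)  # explicit overrides are applied last
    return settings


env = {"AGENTVISION_VISION_BACKEND": "from-env", "UNRELATED_VAR": "ignored"}
print(load_settings_sketch(env))                          # {'vision_backend': 'from-env'}
print(load_settings_sketch(env, vision_backend="local"))  # {'vision_backend': 'local'}
```

Variables without the prefix are skipped by this scan, which mirrors the note that provider API keys are read under their own conventional names instead.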