Intent conformance — match the thought, not just avoid defects¶
By default AgentVision is a defect detector: it asks "is anything broken?" (overflow, low contrast, broken images, typos). That's necessary but not sufficient. An artifact can be flawless and still be the wrong thing — a clean infographic that shows the wrong stages, a dashboard missing the panel you asked for, a generated image that ignored half the prompt.
Conformance adds the missing question: "does this match what I set out to build?" You give AgentVision the intent (the thought), it forms an explicit checklist, grades the render against it, and gates the verdict so PASS means "matches intent", not merely "defect-free".
Three ways to express intent (combine freely)¶
| Source | How | Best for | Needs |
|---|---|---|---|
| Brief | --brief "a 4-stage infographic, dark theme, title 'AgentVision'" |
quick, natural | a vision/LLM backend (extracts the checklist) |
| Explicit checklist | --expect 'must: title reads "AgentVision"' (repeatable) |
determinism, CI, no-key | nothing (text claims grade via OCR) |
| Reference image | --reference target.png |
"make it look like this" | a vision backend (compares the two images) |
Explicit claims take an importance prefix: must: (default — violation fails),
should: (violation warns), nice: (never escalates). Put exact required text in
quotes — those claims are graded deterministically against OCR, independent of any model.
CLI¶
# Grade an artifact against intent
agentvision conform ./infographic.png \
--brief "launch infographic for AgentVision" \
--expect 'must: title reads "AgentVision"' \
--expect 'should: shows 4 stages left to right' \
--backend anthropic
# Any analyze/loop call can be conformance-aware
agentvision analyze ./dashboard.html --expect 'must: a "Checkout" button is visible'
agentvision loop ./page.html --brief "pricing page with 3 tiers" --max-iter 3
A conform run prints a per-requirement breakdown (✓ satisfied, ✗ violated,
? uncertain) and exits non-zero on FAIL.
The generative loop (AI images / infographics)¶
For a hand-written page the agent edits code between iterations. For a generated
artifact the fix is a better prompt. agentvision generate closes that loop:
The generator is your callable — AgentVision never bundles an image-gen dependency:
# mypkg/gen.py
def make_image(prompt: str) -> str: # returns a path to the produced image
img = my_image_model.generate(prompt) # OpenAI gpt-image, Qwen-Image, local SD, …
img.save("/tmp/out.png")
return "/tmp/out.png"
agentvision generate --generator mypkg.gen:make_image \
--brief "minimalist infographic: render → see → fix, dark background, no typos" \
--expect 'must: spelled correctly' --max-iter 4 -o final.png
Library form:
from agentvision import Brief, GenerativeLoopSession
brief = Brief.from_inputs(text="…", expect=['must: title reads "AgentVision"'])
session = GenerativeLoopSession(brief, make_image, backend="anthropic")
history = await session.run(max_iter=4)
print(session.stop_reason) # "matched intent" | "stuck" | "max-iter" | "cannot refine"
How a requirement is graded (and why you can trust it)¶
In increasing order of trust:
- Vision — the backend emits an
intent_mismatchissue (citing[#N]) for each unmet requirement. Advisory, like all vision findings. - OCR text-presence — quoted-text requirements are decided by what Tesseract actually reads in the render. Model-independent ground truth; overrides the vision call.
- Verdict gate — a violated
mustfails regardless of how the model scored it.
Honesty / limits¶
- The offline
localbackend can't judge visual intent. It grades quoted-text claims via OCR and reports everything else asuncertain— never a false PASS. - LLM-extracted checklists can over-specify. Extracted claims are surfaced; only
mustcan fail; use the explicit checklist when you want full determinism. - Don't grade a generated image with the same model that produced it if you can avoid it
(it tends to rate itself kindly) — pick a different
--backendfor perception.