Skip to content

FAQ & troubleshooting

Do I need an API key?

No. agentvision check and --backend local run fully offline (DOM geometry, computed-style WCAG contrast, OCR/typos, broken-image & console capture) — no key, no network. A key only adds the vision-LLM semantic critique on top.

Which backend should I use?

  • local — CI, air-gapped, fast structural checks; honest "uncertain" instead of guessing.
  • anthropic (default), openai, gemini — semantic critique + intent grading.
  • ollama — self-hosted/OSS multimodal (local or Ollama Cloud).

See Backends. The deterministic findings (DOM/OCR/CV) are precise everywhere; vision-model findings are advisory.

Chromium won't launch

agentvision doctor          # attempts a real launch + lists every missing system lib
agentvision doctor --fix    # installs the Chromium browser binary

On bare RHEL/CentOS, playwright install-deps is apt-only — doctor prints the exact dnf line. Or use the bundled Dockerfile, which bakes the deps in.

My live dashboard / SPA times out

Polling and websocket pages never go network-idle. The default --nav-wait is already load (networkidle is bounded), but if you still hit a timeout: add --settle-ms 800, keep --freeze on for canvas/WebGL, and raise --render-timeout. See the field report.

Why does a requirement come back "uncertain"?

The offline local backend can't judge visual/semantic intent — it grades quoted-text claims deterministically (OCR/DOM) and reports everything else as uncertain rather than a false PASS. Use a vision backend for semantic claims. See Conformance.

Are the bounding boxes pixel-accurate?

DOM/OCR/CV boxes are precise (bbox_precise: true). Vision-model boxes are advisory (bbox_precise: false) — never marketed as pixel-accurate.

How do I keep cost / tokens down?

Use local where you can; oversized screenshots are auto-downscaled before the LLM, and full-coverage tiles are content-aware and capped (max_vision_tiles). The default model is the cheap/fast claude-haiku-4-5.

What gets sent where? (privacy)

Only the cloud backends send anything: a screenshot (base64) to the chosen provider. local sends nothing. API keys are read once and never logged or written to any cache/report. See Configuration.

How is this different from Percy / Playwright / Applitools?

It's not human-reviewed visual regression and not browser automation. It's a machine-graded critique loop an agent consumes to self-correct before claiming done — a verdict + actionable, coordinate-grounded issues. It complements those tools.

It found something that isn't real / missed something

Vision findings are advisory and cross-checked against DOM/OCR ground truth (a "missing" claim is suppressed when the element provably exists). If a grounded (DOM/OCR/CV) finding is wrong, please open an issue with the artifact — that's the trustworthy path and we want it airtight.