Developer guide

How to use Verel as a library, a CLI, a CI gate, and an MCP server. Every example here runs against the real API; the ones that need a model say so.

The one idea underneath all of it: an agent's output is a hypothesis until a grader returns a verdict. You compose graders (any sense — tests, lint, types, vision, perf, security) into a Report, reduce them with gate(), and only verified work is allowed to compound.


Install

pip install verel                 # core (only dependency: pydantic)
pip install "verel[dev]"          # + pytest / ruff / mypy graders (the CI gate)
pip install "verel[sight]"        # + AgentVision eyes (visual gating + temporal watch)
pip install "verel[container]"    # + seccomp-bpf for the bwrap tool sandbox
pip install "verel[mem0]"         # + the rented mem0 memory backend
pip install "verel[mcp]"          # + the MCP server

| Extra | Pulls in | Enables |
|---|---|---|
| dev | pytest, ruff, mypy | the Python test/lint/type graders |
| sight | agentvision[render] | verel.senses: DOM/contrast/OCR vision + watch |
| container | pyseccomp | the seccomp syscall filter on the bwrap tool runner |
| mem0 | mem0ai, chromadb | Mem0Memory as the MemoryView backend |
| mcp | mcp, anyio | verel-mcp (Cursor / Claude / any MCP host) |

Verify your environment:

verel doctor

Configure the LLM

Anything that authors or judges with a model uses the provider seam in verel.agents.llm. Default is Ollama Cloud; OpenAI is the bundled fallback.

export VEREL_LLM_PROVIDER=ollama        # default; or: openai
export VEREL_CODER_MODEL=qwen3-coder:480b

Keys resolve from an env var first, then ~/.config/<provider>/key:

| Provider | Env var | Key file | Default model |
|---|---|---|---|
| ollama | OLLAMA_API_KEY | ~/.config/ollama/key | qwen3-coder:480b |
| openai | OPENAI_API_KEY | ~/.config/OpenAI/key | gpt-4o-mini |
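
The lookup order in the table can be sketched in a few lines. This is an illustration, not Verel's resolver: `resolve_key`, its parameters, and the whitespace stripping are assumptions made for the example; only the env-var names and key-file directories come from the table.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Env-var names and key-file directories copied from the table above.
ENV_VARS = {"ollama": "OLLAMA_API_KEY", "openai": "OPENAI_API_KEY"}
KEY_DIRS = {"ollama": "ollama", "openai": "OpenAI"}


def resolve_key(provider: str,
                env: Optional[Mapping[str, str]] = None,
                config_root: Optional[Path] = None) -> Optional[str]:
    """Hypothetical helper: env var first, then <config_root>/<dir>/key."""
    env = os.environ if env is None else env
    config_root = Path.home() / ".config" if config_root is None else config_root

    value = env.get(ENV_VARS[provider], "").strip()
    if value:
        return value                      # the env var wins when set

    key_file = config_root / KEY_DIRS[provider] / "key"
    if key_file.is_file():
        return key_file.read_text().strip() or None
    return None                           # no key found anywhere
```

Passing `env` and `config_root` explicitly keeps the lookup testable without touching the real environment or home directory.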

Everything that calls a model takes an injectable chat function, so unit tests (and the offline examples in examples/) run with no key at all.
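
To show why an injectable chat function removes the need for a key in tests, here is a minimal sketch. The `ChatFn` shape (prompt string in, reply string out), `summarize_failures`, and `fake_chat` are all hypothetical names for this example; the signature Verel actually expects may differ.

```python
from typing import Callable, List

# Assumed shape for the example: a chat function maps a prompt to a reply.
ChatFn = Callable[[str], str]


def summarize_failures(failures: List[str], chat: ChatFn) -> str:
    """Ask whatever model was injected for a one-line summary."""
    prompt = "Summarize these failures in one line:\n" + "\n".join(failures)
    return chat(prompt).strip()


def fake_chat(prompt: str) -> str:
    """Offline stand-in: deterministic reply, no network, no key."""
    n = len(prompt.splitlines()) - 1      # lines after the instruction line
    return f"{n} failure(s) reported\n"
```

A unit test passes `fake_chat`; production code passes a function that calls the configured provider. The caller never needs to know which one it received.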


The surfaces

| Surface | Entry point | Use it for |
|---|---|---|
| Library | import verel | building your own harness on the verdict bus |
| CLI | verel … | doctor · loop · fleet · heal · ci |
| CI CLI / git hook | verel-ci … | a verdict-bus gate in CI or a pre-commit hook |
| MCP server | verel-mcp | exposing gate / recall / build-tool / ci-check to an MCP host |
| GitHub Action | amitpatole/verel@v0.32.0 | failing a build on a FAIL verdict |
| pre-commit | .pre-commit-hooks.yaml | gating commits |

CLI reference

verel doctor                              # environment + key check
verel version
verel loop <artifact> [--backend local] [--max-iter 5]    # ultracode visual loop (needs sight + LLM)
verel fleet "<goal>" --artifacts a.html b.html            # LLM manager fan-out
verel heal --repo . [--max-rounds 3]                      # self-healing CI (needs LLM)
verel ci <args…>                                          # delegates to verel-ci
verel-ci check    --repo .   [--no-lint]   # run the inner-loop stage; print the verdict
verel-ci precommit --repo .                # pre-commit stage; non-zero exit aborts the commit
verel-ci install  --repo .                 # install the native git pre-commit hook

GitHub Action & pre-commit

# .github/workflows/verify.yml
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: amitpatole/verel@v0.32.0
        with:
          repo: .
          install: "-e .[dev]"      # your project deps so its tests import

# .pre-commit-config.yaml
- repo: https://github.com/amitpatole/verel
  rev: v0.32.0
  hooks: [{ id: verel-precommit }]

The verdict bus (verel.verdict)

The contract every sense speaks. A grader emits a Report of Issues; gate() reduces a set of reports to one pass / warn / fail.

from verel.verdict import Issue, IssueKind, Report, Severity, GraderKind, Verdict, gate, assign

report = assign(Report(                       # assign() stamps stable fingerprints
    verdict=Verdict.FAIL,
    summary="2 type errors",
    grader=GraderKind.TYPECHECK,
    issues=[Issue(kind=IssueKind.OTHER, severity=Severity.ERROR,
                  source=GraderKind.TYPECHECK, message="Incompatible return type",
                  locator="app.py:42")],
))

result = gate([report], required={GraderKind.TYPECHECK})
print(result.verdict)            # Verdict.FAIL

Load-bearing rules gate() enforces:

  • Advisory ceiling: per-issue trust keys off Issue.source. Precise sources (TEST/LINT/TYPECHECK/DOM/CV/OCR/SECURITY/PERF) gate at full severity; advisory ones (VISION/LLM_JUDGE) are clamped to at most warn.
  • Attestation: a required grader must carry a signed run_receipt proving it ran the frozen suite over the changed files. A hollow PASS (issues=[] with no receipt) fails.
  • Stuck vs. progress: progressed(curr, prev) is strict shrinkage of the gating-failure set; pure churn or growth is not progress.
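
Two of these rules are easy to see in miniature. The sketch below is a standalone toy, not the code behind gate(): `Level`, `Finding`, and `reduce_findings` are hypothetical names, attestation is left out entirely, and "strict shrinkage" is read as "proper subset", which is one plausible interpretation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, Iterable


class Level(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2


# Advisory sources named in the list above; all others are treated as precise.
ADVISORY = frozenset({"VISION", "LLM_JUDGE"})


@dataclass(frozen=True)
class Finding:
    source: str        # e.g. "TYPECHECK", "VISION"
    level: Level       # severity the grader asked for
    fingerprint: str   # stable identity of the issue


def effective_level(f: Finding) -> Level:
    """Clamp advisory findings to at most WARN; precise ones pass through."""
    return min(f.level, Level.WARN) if f.source in ADVISORY else f.level


def reduce_findings(findings: Iterable[Finding]) -> Level:
    """Worst effective level wins (attestation is not modeled here)."""
    return max((effective_level(f) for f in findings), default=Level.PASS)


def progressed(curr: FrozenSet[str], prev: FrozenSet[str]) -> bool:
    """Assumed reading of strict shrinkage: curr is a proper subset of prev."""
    return curr < prev
```

Under this reading, swapping one failure for a different one is churn: the set is the same size but not a subset, so `progressed` returns False.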

Agent-run CI/CD (verel.ci)

Tests, lint, and types as first-class graders, across Python · JS/TS · Go, plus perf and security. Stages compose graders and gate them with attestation + failure-memory.

from verel.ci import inner_loop_stage, premerge_stage, run_stage

res = run_stage(inner_loop_stage(".", language="python", with_lint=True, with_types=True))
print(res.verdict, [r.grader.value for r in res.reports])

Pick a language; add precise senses:

from verel.ci import premerge_stage, perf_spec, run_stage

stage = premerge_stage(
    ".", language="go",            # python | js | go
    security=True,                 # SAST (bandit) / dependency audit (npm)
    perf=perf_spec(".", ["./bench"], budgets={"p95_ms": 150}),  # regression past budget gates
)
res = run_stage(stage)

Each GraderSpec carries its own parser, so pytest, go test -json, and a TAP runner — all GraderKind.TEST — coexist on one bus. Language toolchains live in verel.ci.LANGS; the graders are pytest_spec/ruff_spec/mypy_spec, jstest_spec/eslint_spec/tsc_spec, gotest_spec/govet_spec, plus bandit_spec/npm_audit_spec/perf_spec.

Self-healing

from verel.ci import inner_loop_stage, self_heal

result = self_heal(".", inner_loop_stage(".", with_lint=False))   # needs an LLM key
print(result.healed, result.terminated_on)

On a failure the ci-medic classifies each issue (retry / regen-lockfile / quarantine-flaky / fix-branch) and, for genuine regressions, invokes the code-fixer — re-gating each round until the graders pass or it escalates.
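
The re-gating loop at the heart of that description can be sketched on its own. This is a simplified illustration under stated assumptions: `heal_loop`, `Gate`, and `Fixer` are hypothetical names, issue classification is omitted, and the return values merely echo the idea of "healed" versus "escalated".

```python
from typing import Callable, List, Tuple

Gate = Callable[[], List[str]]          # returns the current failure messages
Fixer = Callable[[List[str]], None]     # attempts a fix for those failures


def heal_loop(gate: Gate, fix: Fixer, max_rounds: int = 3) -> Tuple[bool, str]:
    """Re-gate after each fix attempt; stop on pass or when rounds run out."""
    failures = gate()
    for _ in range(max_rounds):
        if not failures:
            return True, "pass"
        fix(failures)
        failures = gate()               # never trust the fixer: grade again
    return (True, "pass") if not failures else (False, "escalated")
```

The fixer's own claim of success is never consulted; only a fresh run of the gate can end the loop with a pass.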

Verdict-driven rollback

The agent proposes; a deterministic engine authorizes, and only on precise gating evidence (never an advisory opinion). The rollback itself is a safe git revert, never a history rewrite.

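from verel.ci import RollbackExecutor, RollbackProposal

# Assumes repo (the checkout path), proposal (a RollbackProposal), and reports (the graded Reports) already exist.
outcome = RollbackExecutor().maybe_rollback(repo, proposal, reports)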
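
The authorization rule itself is small enough to sketch. The code below is a toy under stated assumptions, not RollbackExecutor: `Evidence`, `authorize_rollback`, and `revert_command` are hypothetical names, and the advisory set is taken from the gate() rules earlier in this guide.

```python
from dataclasses import dataclass
from typing import Iterable, List

ADVISORY_SOURCES = frozenset({"VISION", "LLM_JUDGE"})


@dataclass(frozen=True)
class Evidence:
    source: str     # grader kind that produced the finding
    failed: bool    # True if the finding is a gating failure


def authorize_rollback(evidence: Iterable[Evidence]) -> bool:
    """Authorize only if at least one precise source reports a failure."""
    return any(e.failed and e.source not in ADVISORY_SOURCES for e in evidence)


def revert_command(commit: str) -> List[str]:
    """git revert adds a new commit that undoes `commit`; history is kept."""
    return ["git", "revert", "--no-edit", commit]
```

Because the decision is a pure function of the evidence, the same inputs always produce the same authorization, whatever the proposing agent argues.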

The brain — memory that compounds (verel.memory)

A trust layer over a swappable backend (LocalMemory, zero-dep sqlite; or mem0). Each record carries two orthogonal quantities: epistemic_confidence (belief; moved only by corroborate/contradict) and retrieval_strength (reachability; decays, resets on recall).

from verel.memory import LocalMemory, MemoryRecord, MemoryKind
from verel.memory.view import make_key

mem = LocalMemory()                                   # or LocalMemory(embedder=OpenAIEmbedder())
mem.write(MemoryRecord(kind=MemoryKind.FACT, subject="auth", predicate="uses",
                       text="sessions are JWT, 15-min expiry", scope="repo:app",
                       subj_pred_key=make_key("auth", "uses", "repo:app")))
hits = mem.recall("how does login work", scope="repo:app", k=3)
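
To make the separation between belief and reachability concrete, here is a standalone toy model. It is not MemoryRecord: the `Trace` class, the step sizes, and the exponential half-life decay are assumptions chosen for illustration; only the two field names and which operations move them come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    epistemic_confidence: float = 0.5   # belief; only evidence moves it
    retrieval_strength: float = 1.0     # reachability; decays with time

    def corroborate(self, step: float = 0.1) -> None:
        self.epistemic_confidence = min(1.0, self.epistemic_confidence + step)

    def contradict(self, step: float = 0.1) -> None:
        self.epistemic_confidence = max(0.0, self.epistemic_confidence - step)

    def decay(self, elapsed_days: float, half_life_days: float = 30.0) -> None:
        """Assumed exponential decay of reachability; belief is untouched."""
        self.retrieval_strength *= 0.5 ** (elapsed_days / half_life_days)

    def recall(self) -> None:
        """Recall resets reachability; belief is untouched."""
        self.retrieval_strength = 1.0
```

A record that has not been recalled for a long time becomes hard to reach without becoming less believed, and recalling it often does not make it more true.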

Trust is earned, never asserted — a candidate reaches verified only by passing a held-out, agent-inaccessible eval (with a leakage canary):

from verel.memory import PromotionGate, HeldOutCorpus, EvalCase
corpus = HeldOutCorpus([
    EvalCase(text="a card overflows the viewport on a 320px screen",
             covers_kind="overflow", label="prevent"),   # "prevent" | "allow"
])
gate = PromotionGate(mem, corpus)        # ratifies candidates → verified via the bus

Consolidation: episodes → rules → schemas

from verel.memory import consolidate_failures, induce_hierarchy, consolidate_across_scopes

# recurring FAILUREs in a scope → candidate, structured DesignRules (condition → action)
rules = consolidate_failures(mem, scope="repo:app", min_cluster=2)

# rules → order-2 principles → order-3 meta-principles, until the corpus stops supporting more
levels = induce_hierarchy(mem, scope="repo:app", min_size=2)

# a pattern recurring across several repos → one `global` rule (records detail['spans'])
glob = consolidate_across_scopes(mem, ["repo:a", "repo:b"], min_scopes=2)

Contradiction-driven revision

Consolidation can be wrong. A new failure in a rule's domain that the rule failed to prevent is a counterexample: the rule is weakened, and once enough accumulate it's split into a narrowed rule + an exception — and the split propagates up the schema hierarchy so principles above stop over-claiming.

from verel.memory import revise_with_counterexample, contradicts

if contradicts(rule, new_failure):
    rev = revise_with_counterexample(mem, rule, new_failure)   # needs an LLM for the split
    print(rev.action)            # "weakened" | "split" | "rejected"
    print(rev.propagated)        # schemas above the rule that were re-derived
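
The weaken-then-split policy can be sketched without a model. This toy is an assumption-laden illustration, not revise_with_counterexample: `Rule`, `revise`, the threshold of three, and the fixed penalty are hypothetical, the "split" here merely records exceptions instead of asking an LLM to narrow the rule, and propagation up the hierarchy is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    condition: str
    confidence: float = 0.5
    counterexamples: List[str] = field(default_factory=list)
    exceptions: List[str] = field(default_factory=list)


def revise(rule: Rule, counterexample: str, split_after: int = 3,
           penalty: float = 0.1) -> str:
    """Weaken on each counterexample; split once `split_after` accumulate."""
    rule.counterexamples.append(counterexample)
    rule.confidence = max(0.0, rule.confidence - penalty)
    if len(rule.counterexamples) < split_after:
        return "weakened"
    # Narrow the rule: accumulated counterexamples become explicit exceptions.
    rule.exceptions.extend(rule.counterexamples)
    rule.counterexamples.clear()
    return "split"
```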

Everything starts candidate / inferred; height, breadth, and survival never confer trust.

mem0 backend

from verel.memory import make_ollama_mem0      # needs verel[mem0]
mem = make_ollama_mem0()                        # same MemoryView Protocol; recall is semantic

Tool-smith — agents build their own tools (verel.toolsmith)

detect → scaffold → test → register → reuse. A tool is admitted only on a passing attested eval; reuse re-verifies against the new spec's cases (a close match isn't trusted blindly).

from verel.memory import LocalMemory
from verel.toolsmith import ToolRegistry, ToolSmith, ToolSpec, ToolCase, SideEffect

smith = ToolSmith(ToolRegistry(LocalMemory()), isolation="container")   # needs an LLM key
res = smith.build(ToolSpec(
    name="slugify", capability="convert a title to a url slug",
    side_effect=SideEffect.READ_ONLY,
    cases=[ToolCase(args=["Hello World"], expected="hello-world")],
))
print(res.trust, res.registered)

Isolation tiers

Untrusted, agent-authored code runs in a separate trust domain. From weakest to strongest:

| isolation= | What it is |
|---|---|
| "subprocess" | fresh interpreter + rlimits + wall-clock timeout (dependency-free) |
| "container" | bwrap namespace sandbox: no network, read-only fs, ephemeral /tmp, cleared env |
| "container" + verel[container] | …plus a seccomp-bpf filter |

Three seccomp profiles (run_container(..., seccomp_profile=…)):

  • denylist (default) — EPERM on dangerous syscalls; safe for arbitrary tools.
  • allowlist — default-deny; only pure-compute syscalls (no network/subprocess/threads).
  • capability — the tightest: only the syscalls a tool exercised while passing its eval (learn with learn_syscall_profile, then enforce via tool.syscall_policy).
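
The difference between the three profiles comes down to what happens to a syscall nobody listed. The sketch below models only that decision in plain Python; it builds no real seccomp filter, `allowed` is a hypothetical function, and the two syscall sets are tiny illustrative samples, not the contents of Verel's profiles.

```python
from typing import FrozenSet

# Illustrative samples only; real profiles are far longer.
DANGEROUS = frozenset({"ptrace", "mount", "reboot", "kexec_load"})
PURE_COMPUTE = frozenset({"read", "write", "brk", "mmap", "exit_group"})


def allowed(profile: str, syscall: str,
            learned: FrozenSet[str] = frozenset()) -> bool:
    """Would `syscall` be permitted under `profile`?"""
    if profile == "denylist":      # default-allow, minus the dangerous set
        return syscall not in DANGEROUS
    if profile == "allowlist":     # default-deny, plus a fixed compute set
        return syscall in PURE_COMPUTE
    if profile == "capability":    # default-deny, plus what the eval exercised
        return syscall in learned
    raise ValueError(f"unknown profile: {profile}")
```

An unlisted syscall such as `socket` passes under denylist but is refused under the other two, which is why capability is the tightest: its permitted set is exactly what one tool demonstrably needed.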

The fleet — agents managing agents (verel.fleet)

A single-writer scheduler over a Task DAG, every node gated by the bus.

import asyncio
from verel.fleet import Scheduler, Task, WorkerResult
from verel.verdict import Verdict

async def worker(task):                       # your agent; returns a graded result
    ...
    return WorkerResult(verdict=Verdict.PASS)

tasks = [Task(id="a"), Task(id="b", deps=["a"])]
state = asyncio.run(Scheduler(worker, concurrency=4).run(tasks))

Barriers (all / k_of_n / optional), retry → quarantine, a hard budget lease, and WAL-based crash resume are all on Task / Scheduler.
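
As a rough picture of how the three barrier kinds might differ, here is a standalone sketch. It is not the Scheduler's logic: `barrier_open` is a hypothetical function, and the semantics chosen for each kind (especially "optional", read here as "wait for deps to finish but ignore their verdicts") are assumptions.

```python
from typing import Mapping, Sequence


def barrier_open(kind: str, deps: Sequence[str],
                 passed: Mapping[str, bool], k: int = 1) -> bool:
    """Decide whether a task may start, given its deps' results so far.

    `passed` maps a finished dep id to whether it passed; unfinished deps
    are absent from the mapping.
    """
    finished = [d for d in deps if d in passed]
    wins = sum(1 for d in finished if passed[d])
    if kind == "all":
        return wins == len(deps)             # every dep finished and passed
    if kind == "k_of_n":
        return wins >= k                     # enough passes, others may lag
    if kind == "optional":
        return len(finished) == len(deps)    # assumed: verdicts are ignored
    raise ValueError(f"unknown barrier: {kind}")
```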

Concurrent managers (fencing)

More than one scheduler can share a task store safely via fencing leases — a stale leader's writes are rejected:

from verel.fleet import Scheduler, InMemoryLeaseStore   # or SqliteLeaseStore for cross-process
store = InMemoryLeaseStore()
s1 = Scheduler(worker, leases=store, owner="m1")
s2 = Scheduler(worker, leases=store, owner="m2")          # each task runs exactly once
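
The fencing idea is a general technique and can be shown with a toy store. This is not InMemoryLeaseStore: `FencedStore` and its methods are hypothetical, and a real lease would also expire. The point is only that every grant bumps a token and writes carrying an older token are refused.

```python
from typing import Dict


class FencedStore:
    """Toy lease authority: each grant bumps a monotonically increasing token."""

    def __init__(self) -> None:
        self._token = 0
        self.holder = ""
        self.data: Dict[str, str] = {}

    def acquire(self, owner: str) -> int:
        """Grant (or take over) the lease and return its fencing token."""
        self._token += 1
        self.holder = owner
        return self._token

    def write(self, token: int, key: str, value: str) -> bool:
        """Accept the write only if it carries the newest token."""
        if token != self._token:
            return False            # stale leader: rejected
        self.data[key] = value
        return True
```

A manager that paused, lost its lease, and woke up still believing it was leader cannot corrupt state: its token is old, so the store refuses the write.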

Across machines, put the lease authority behind the HTTP control plane:

from verel.fleet import ControlPlaneServer, RemoteLeaseStore
srv = ControlPlaneServer("/var/lib/verel/leases.db", auth_token="…").start()
sched = Scheduler(worker, leases=RemoteLeaseStore(srv.url, auth_token="…"), owner="host-1")

Multi-repo + atomic sagas

from verel.fleet import plan_multi_repo, CrossDep, run_saga, SagaStep, git_revert_head

dag = plan_multi_repo({"api": api_tasks, "client": client_tasks},
                      [CrossDep(to_repo="client", dependent="ship", from_repo="api", needs="build")])

# all-or-nothing across repos: a failure compensates the repos that already landed, in reverse
res = run_saga([SagaStep("api",    forward_api,    lambda _r: git_revert_head("/repos/api")),
                SagaStep("client", forward_client, lambda _r: git_revert_head("/repos/client"))])
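
The compensation order is the part of a saga that is easiest to get wrong, so here is the pattern in isolation. This is a generic sketch, not run_saga: `run_steps`, the `Step` tuple, and the log format are hypothetical.

```python
from typing import Callable, List, Tuple

# (name, forward, compensate)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_steps(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run forwards in order; on failure, undo completed steps in reverse."""
    done: List[Step] = []
    log: List[str] = []
    for step in steps:
        name, forward, _ = step
        try:
            forward()
        except Exception:
            log.append(f"fail:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()                 # newest landed step first
                log.append(f"undo:{prev_name}")
            return False, log
        done.append(step)
        log.append(f"ok:{name}")
    return True, log
```

The failing step is not compensated, since it never landed; only the steps before it are undone, newest first.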

A git pre-receive fencing sink (write_pre_receive_hook) extends the fence to pushes: a push carrying a stale token is refused at the remote.


Eyes / senses (verel.senses) — needs verel[sight]

AgentVision as a grounded perception sense on the same bus.

from verel.senses import perceive, watch
percept = perceive("dist/index.html")           # DOM / contrast / OCR, + intent conformance
clip = watch("https://app.local/player")         # temporal: playback / loading / liveness

A precise visual failure (overflow, clipped, missing element) gates; the advisory vision-LLM opinion is clamped to warn.


Skill registry (verel.registry)

Content-addressed, signed skill artifacts — and the rule that keeps the flywheel honest: trust does not travel. A fetched skill enters as a candidate and only becomes verified by passing the importer's OWN held-out eval.

from verel.registry import export_skill, import_skill, PublicRegistry

art = export_skill(verified_tool, origin="tenant:A")
PublicRegistry("/srv/skills").publish(art)        # verifies the signature; refuses a tamper
res = import_skill(art, into=my_registry, target_cases=my_cases)
print(res.reverified)                             # True only if it passed MY eval
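
"Content-addressed, signed, and trust does not travel" can be illustrated with the standard library alone. The sketch below is not the verel.registry implementation: `Artifact`, `export`, `verify`, and `import_artifact` are hypothetical, and a shared-key HMAC stands in for whatever signature scheme the registry really uses.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    payload: bytes
    content_hash: str        # sha256 of the payload; doubles as the address
    signature: str           # HMAC over the hash (stand-in for a signature)
    trust: str = "candidate"


def export(payload: bytes, key: bytes) -> Artifact:
    digest = hashlib.sha256(payload).hexdigest()
    sig = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return Artifact(payload, digest, sig)


def verify(art: Artifact, key: bytes) -> bool:
    """Reject a tampered payload or a forged signature."""
    digest = hashlib.sha256(art.payload).hexdigest()
    expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return digest == art.content_hash and hmac.compare_digest(expected, art.signature)


def import_artifact(art: Artifact, key: bytes,
                    local_eval: Callable[[bytes], bool]) -> Artifact:
    """Incoming trust is ignored: only the importer's own eval can verify."""
    if not verify(art, key):
        raise ValueError("signature or hash mismatch")
    trust = "verified" if local_eval(art.payload) else "candidate"
    return Artifact(art.payload, art.content_hash, art.signature, trust)
```

Note that `import_artifact` never reads `art.trust`: whatever the exporter claimed, the result depends only on the integrity check and the local eval.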

Host it over HTTP for cross-machine sharing:

from verel.registry import RegistryServer, RemoteRegistry
srv = RegistryServer("/srv/skills", auth_token="…").start()
remote = RemoteRegistry(srv.url, auth_token="…")
import_skill(remote.get(content_hash), into=my_registry, target_cases=my_cases)

Whether a public registry is even a moat is measured, not assumed — measure_transfer (the H2 experiment) re-verifies skills across tenants. See H2 results.


Cookbook

Runnable, mostly offline — see the examples/ directory:

| Want to… | Run |
|---|---|
| gate a repo on tests+lint+types | verel-ci check --repo . |
| self-heal failing tests | python examples/demo_selfheal.py |
| grade Python/JS/Go + perf + security on one bus | python examples/demo_polyglot_ci.py |
| consolidate failures → rules → schema → revise | python examples/demo_consolidation.py |
| sandbox a tool to only the syscalls it earned | python examples/demo_capability_jail.py |
| run concurrent managers + a multi-repo saga | python examples/demo_distributed_fleet.py |
| publish a skill and have another tenant re-verify | python examples/demo_hosted_registry.py |
| measure cross-tenant skill transfer (live) | python examples/run_h2.py |

See also the Architecture & roadmap for how the organs fit together.