Developer guide¶
How to use Verel as a library, a CLI, a CI gate, and an MCP server. Every example here runs against the real API; the ones that need a model say so.
The one idea underneath all of it: an agent's output is a hypothesis until a grader returns a
verdict. Each grader (any sense: tests, lint, types, vision, perf, security) emits a Report,
gate() reduces the reports to a single verdict, and only verified work is allowed to compound.
Install¶
pip install verel # core (only dependency: pydantic)
pip install "verel[dev]" # + pytest / ruff / mypy graders (the CI gate)
pip install "verel[sight]" # + AgentVision eyes (visual gating + temporal watch)
pip install "verel[container]" # + seccomp-bpf for the bwrap tool sandbox
pip install "verel[mem0]" # + the rented mem0 memory backend
pip install "verel[mcp]" # + the MCP server
| Extra | Pulls in | Enables |
|---|---|---|
| dev | pytest, ruff, mypy | the Python test/lint/type graders |
| sight | agentvision[render] | verel.senses: DOM/contrast/OCR vision + watch |
| container | pyseccomp | the seccomp syscall filter on the bwrap tool runner |
| mem0 | mem0ai, chromadb | Mem0Memory as the MemoryView backend |
| mcp | mcp, anyio | verel-mcp (Cursor / Claude / any MCP host) |
Verify your environment:
verel doctor
Configure the LLM¶
Anything that authors or judges with a model uses the provider seam in verel.agents.llm. Default
is Ollama Cloud; OpenAI is the bundled fallback.
export VEREL_LLM_PROVIDER=ollama # default; or: openai
export VEREL_CODER_MODEL=qwen3-coder:480b
Keys resolve from an env var first, then ~/.config/<provider>/key:
| Provider | Env var | Key file | Default model |
|---|---|---|---|
| ollama | OLLAMA_API_KEY | ~/.config/ollama/key | qwen3-coder:480b |
| openai | OPENAI_API_KEY | ~/.config/OpenAI/key | gpt-4o-mini |
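The lookup order described above (environment variable first, then the key file) can be sketched as follows. This is a minimal illustration and not Verel's actual resolver: `resolve_key`, `ENV_VARS`, `KEY_DIRS`, and the `config_root` parameter are hypothetical names, and only the resolution order and the table values come from this guide.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical tables mirroring the documented env vars and key-file directories.
ENV_VARS = {"ollama": "OLLAMA_API_KEY", "openai": "OPENAI_API_KEY"}
KEY_DIRS = {"ollama": "ollama", "openai": "OpenAI"}


def resolve_key(provider: str,
                env: Optional[Mapping[str, str]] = None,
                config_root: Optional[Path] = None) -> Optional[str]:
    """Return the key for `provider`: env var first, then the key file, else None."""
    env = os.environ if env is None else env
    config_root = Path.home() / ".config" if config_root is None else config_root

    value = env.get(ENV_VARS[provider], "").strip()
    if value:
        return value

    key_file = config_root / KEY_DIRS[provider] / "key"
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

Passing `env` and `config_root` explicitly keeps the function testable without touching the real environment or home directory.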
Everything that calls a model takes an injectable chat function, so unit tests (and the
offline examples in examples/) run with no key at all.
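To illustrate why an injectable chat function makes offline tests possible, here is a small sketch. The `ChatFn` signature (prompt in, reply out), `judge_summary`, and `fake_chat` are assumptions made for this example; the real seam in verel.agents.llm may take different arguments.

```python
from typing import Callable

ChatFn = Callable[[str], str]  # hypothetical signature: prompt in, reply out


def judge_summary(diff: str, chat: ChatFn) -> str:
    """Ask the model for a verdict word; the model call itself is injected."""
    reply = chat(f"Answer PASS or FAIL only.\n\n{diff}")
    word = reply.strip().upper()
    # Anything that is not a clean PASS/FAIL is downgraded to WARN.
    return word if word in {"PASS", "FAIL"} else "WARN"


def fake_chat(prompt: str) -> str:
    """Offline stand-in: deterministic, needs no key and no network."""
    return "pass" if "fix" in prompt else "unsure"
```

A test can call `judge_summary("fix typo", fake_chat)` and get a stable answer, while production code passes a function that talks to the configured provider.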
The surfaces¶
| Surface | Entry point | Use it for |
|---|---|---|
| Library | import verel | building your own harness on the verdict bus |
| CLI | verel … | doctor · loop · fleet · heal · ci |
| CI CLI / git hook | verel-ci … | a verdict-bus gate in CI or a pre-commit hook |
| MCP server | verel-mcp | exposing gate / recall / build-tool / ci-check to an MCP host |
| GitHub Action | amitpatole/verel@v0.32.0 | failing a build on a FAIL verdict |
| pre-commit | .pre-commit-hooks.yaml | gating commits |
CLI reference¶
verel doctor # environment + key check
verel version
verel loop <artifact> [--backend local] [--max-iter 5] # ultracode visual loop (needs sight + LLM)
verel fleet "<goal>" --artifacts a.html b.html # LLM manager fan-out
verel heal --repo . [--max-rounds 3] # self-healing CI (needs LLM)
verel ci <args…> # delegates to verel-ci
verel-ci check --repo . [--no-lint] # run the inner-loop stage; print the verdict
verel-ci precommit --repo . # pre-commit stage; non-zero exit aborts the commit
verel-ci install --repo . # install the native git pre-commit hook
GitHub Action & pre-commit¶
# .github/workflows/verify.yml
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: amitpatole/verel@v0.32.0
        with:
          repo: .
          install: "-e .[dev]"   # your project deps so its tests import

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/amitpatole/verel
    rev: v0.32.0
    hooks: [{ id: verel-precommit }]
The verdict bus (verel.verdict)¶
The contract every sense speaks. A grader emits a Report of Issues; gate() reduces a set of
reports to one pass / warn / fail.
from verel.verdict import Issue, IssueKind, Report, Severity, GraderKind, Verdict, gate, assign
report = assign(Report( # assign() stamps stable fingerprints
verdict=Verdict.FAIL,
summary="2 type errors",
grader=GraderKind.TYPECHECK,
issues=[Issue(kind=IssueKind.OTHER, severity=Severity.ERROR,
source=GraderKind.TYPECHECK, message="Incompatible return type",
locator="app.py:42")],
))
result = gate([report], required={GraderKind.TYPECHECK})
print(result.verdict) # Verdict.FAIL
Load-bearing rules gate() enforces:
- Advisory ceiling: per-issue trust keys off Issue.source. Precise sources (TEST/LINT/TYPECHECK/DOM/CV/OCR/SECURITY/PERF) gate at full severity; advisory ones (VISION/LLM_JUDGE) are clamped to at most warn.
- Attestation: a required grader must carry a signed run_receipt proving it ran the frozen suite over the changed files. A hollow PASS, issues=[] with no receipt fails.
- Stuck vs. progress: progressed(curr, prev) is strict shrinkage of the gating-failure set; pure churn or growth is not progress.
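The advisory ceiling and the progress rule can be sketched in plain Python. This is an independent illustration, not the code behind verel.verdict: `Finding`, `effective`, `reduce_verdict`, and `gating_failures` are hypothetical names, attestation is left out, and reading "strict shrinkage" as a proper-subset check on fingerprints is an assumption.

```python
from dataclasses import dataclass

ADVISORY = {"vision", "llm_judge"}        # sources clamped to at most warn
RANK = {"pass": 0, "warn": 1, "fail": 2}  # ordering used to pick the worst verdict


@dataclass(frozen=True)
class Finding:
    source: str       # e.g. "test", "lint", "vision"
    severity: str     # "warn" or "fail"
    fingerprint: str  # stable id for the issue


def effective(f: Finding) -> str:
    """Clamp advisory sources to warn; precise sources keep their severity."""
    if f.source in ADVISORY and RANK[f.severity] > RANK["warn"]:
        return "warn"
    return f.severity


def reduce_verdict(findings: list[Finding]) -> str:
    """Worst effective severity wins; no findings means pass."""
    return max((effective(f) for f in findings), key=RANK.__getitem__, default="pass")


def gating_failures(findings: list[Finding]) -> set[str]:
    """Fingerprints of the findings that still gate after clamping."""
    return {f.fingerprint for f in findings if effective(f) == "fail"}


def progressed(curr: list[Finding], prev: list[Finding]) -> bool:
    """Assumed reading: the current failure set is a proper subset of the previous one."""
    return gating_failures(curr) < gating_failures(prev)
```

Under this reading, swapping one failure for a different one (churn) is not progress, because the new set is not contained in the old one.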
Agent-run CI/CD (verel.ci)¶
Tests, lint, and types as first-class graders, across Python · JS/TS · Go, plus perf and security. Stages compose graders and gate them with attestation + failure-memory.
from verel.ci import inner_loop_stage, premerge_stage, run_stage
res = run_stage(inner_loop_stage(".", language="python", with_lint=True, with_types=True))
print(res.verdict, [r.grader.value for r in res.reports])
Pick a language; add precise senses:
from verel.ci import premerge_stage, perf_spec, run_stage
stage = premerge_stage(
".", language="go", # python | js | go
security=True, # SAST (bandit) / dependency audit (npm)
perf=perf_spec(".", ["./bench"], budgets={"p95_ms": 150}), # regression past budget gates
)
res = run_stage(stage)
Each GraderSpec carries its own parser, so pytest, go test -json, and a TAP runner — all
GraderKind.TEST — coexist on one bus. Language toolchains live in verel.ci.LANGS; the graders
are pytest_spec/ruff_spec/mypy_spec, jstest_spec/eslint_spec/tsc_spec,
gotest_spec/govet_spec, plus bandit_spec/npm_audit_spec/perf_spec.
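The idea that each spec carries its own parser, so different runners can share one grader kind, can be sketched like this. `Spec`, `run_on_bus`, and the report dictionary are hypothetical stand-ins and do not reflect the real GraderSpec type; the two parsers rely only on the public `go test -json` event format and the TAP `not ok` convention.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Spec:
    """Hypothetical stand-in for a grader spec: a tool name plus its own parser."""
    name: str
    parse: Callable[[str], list[str]]  # raw runner output -> failing test ids


def parse_go_test_json(output: str) -> list[str]:
    """`go test -json` emits one JSON event per line; collect failed tests."""
    failed = []
    for line in output.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Package-level failures have no "Test" field and are skipped here.
        if event.get("Action") == "fail" and "Test" in event:
            failed.append(f'{event["Package"]}.{event["Test"]}')
    return failed


def parse_tap(output: str) -> list[str]:
    """TAP marks a failing test with a line starting with `not ok`."""
    return [line[len("not ok"):].strip()
            for line in output.splitlines() if line.startswith("not ok")]


def run_on_bus(spec: Spec, output: str) -> dict:
    """Every spec yields the same report shape, whatever the runner."""
    failures = spec.parse(output)
    return {"grader": "test", "tool": spec.name,
            "verdict": "fail" if failures else "pass", "failures": failures}
```

Because the parser travels with the spec, the code that reduces reports never needs to know which runner produced them.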
Self-healing¶
from verel.ci import inner_loop_stage, self_heal
result = self_heal(".", inner_loop_stage(".", with_lint=False)) # needs an LLM key
print(result.healed, result.terminated_on)
On a failure, the ci-medic classifies each issue (retry / regen-lockfile / quarantine-flaky / fix-branch) and, for genuine regressions, invokes the code-fixer. It re-gates after each round until the graders pass or it escalates.
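The re-gate-each-round shape of that loop can be sketched without any model. This is a simplified illustration under stated assumptions: `heal_loop` is a hypothetical name, the `grade` and `fix` callables are injected, and the "stuck" exit (stop when a round does not strictly shrink the failure set) is an assumption about what escalation might look like, not a description of self_heal.

```python
from typing import Callable


def heal_loop(grade: Callable[[], list[str]],
              fix: Callable[[list[str]], None],
              max_rounds: int = 3) -> tuple[bool, str]:
    """Re-gate after each fix attempt; stop on pass, on no progress, or on budget."""
    failures = grade()
    for _ in range(max_rounds):
        if not failures:
            return True, "pass"
        fix(failures)
        after = grade()
        # Escalate if the round did not strictly shrink the failure set.
        if after and not set(after) < set(failures):
            return False, "stuck"
        failures = after
    return (True, "pass") if not failures else (False, "max_rounds")
```

The second element of the result plays the role that `terminated_on` plays in the example above: it records why the loop stopped.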
Verdict-driven rollback¶
The agent proposes; a deterministic engine authorizes — and only on precise gating evidence
(never an advisory opinion), performing a safe git revert (never a history rewrite).
from verel.ci import RollbackExecutor, RollbackProposal
outcome = RollbackExecutor().maybe_rollback(repo, proposal, reports)
The brain — memory that compounds (verel.memory)¶
A trust layer over a swappable backend (LocalMemory, zero-dep sqlite; or mem0). Each record
carries two orthogonal quantities — epistemic_confidence (belief; moved only by
corroborate/contradict) and retrieval_strength (reachability; decays, resets on recall).
from verel.memory import LocalMemory, MemoryRecord, MemoryKind
from verel.memory.view import make_key
mem = LocalMemory() # or LocalMemory(embedder=OpenAIEmbedder())
mem.write(MemoryRecord(kind=MemoryKind.FACT, subject="auth", predicate="uses",
text="sessions are JWT, 15-min expiry", scope="repo:app",
subj_pred_key=make_key("auth", "uses", "repo:app")))
hits = mem.recall("how does login work", scope="repo:app", k=3)
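To make the two orthogonal quantities concrete, the sketch below keeps them as separate fields that are updated by separate operations. It is not the MemoryRecord implementation: `Trace`, the exponential half-life curve, and the step sizes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    epistemic_confidence: float = 0.5  # belief; only evidence moves it
    retrieval_strength: float = 1.0    # reachability; decays with time
    last_recall: float = 0.0           # time of the last recall, in days

    def strength_at(self, now: float, half_life: float = 7.0) -> float:
        """Exponential decay since the last recall (the curve is an assumption)."""
        return self.retrieval_strength * 0.5 ** ((now - self.last_recall) / half_life)

    def recall(self, now: float) -> None:
        """Recalling resets reachability; belief is untouched."""
        self.retrieval_strength = 1.0
        self.last_recall = now

    def corroborate(self, step: float = 0.1) -> None:
        """Supporting evidence raises belief, capped at 1.0."""
        self.epistemic_confidence = min(1.0, self.epistemic_confidence + step)

    def contradict(self, step: float = 0.2) -> None:
        """Contrary evidence lowers belief, floored at 0.0."""
        self.epistemic_confidence = max(0.0, self.epistemic_confidence - step)
```

The point of the separation is that a record nobody has looked at for a month becomes harder to reach without becoming any less believed, and a frequently recalled record does not become more believed just by being recalled.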
Trust is earned, never asserted — a candidate reaches verified only by passing a held-out,
agent-inaccessible eval (with a leakage canary):
from verel.memory import PromotionGate, HeldOutCorpus, EvalCase
corpus = HeldOutCorpus([
EvalCase(text="a card overflows the viewport on a 320px screen",
covers_kind="overflow", label="prevent"), # "prevent" | "allow"
])
gate = PromotionGate(mem, corpus) # ratifies candidates → verified via the bus
Consolidation: episodes → rules → schemas¶
from verel.memory import consolidate_failures, induce_hierarchy, consolidate_across_scopes
# recurring FAILUREs in a scope → candidate, structured DesignRules (condition → action)
rules = consolidate_failures(mem, scope="repo:app", min_cluster=2)
# rules → order-2 principles → order-3 meta-principles, until the corpus stops supporting more
levels = induce_hierarchy(mem, scope="repo:app", min_size=2)
# a pattern recurring across several repos → one `global` rule (records detail['spans'])
glob = consolidate_across_scopes(mem, ["repo:a", "repo:b"], min_scopes=2)
Contradiction-driven revision¶
Consolidation can be wrong. A new failure in a rule's domain that the rule failed to prevent is a counterexample: the rule is weakened, and once enough counterexamples accumulate it is split into a narrowed rule plus an exception. The split propagates up the schema hierarchy so that principles above stop over-claiming.
from verel.memory import revise_with_counterexample, contradicts
if contradicts(rule, new_failure):
    rev = revise_with_counterexample(mem, rule, new_failure)  # needs an LLM for the split
    print(rev.action)      # "weakened" | "split" | "rejected"
    print(rev.propagated)  # schemas above the rule that were re-derived
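The weaken-then-split policy can be sketched with a counter. This is a toy model under loud assumptions: `Rule`, `revise`, the `split_after` threshold, and the fixed penalty are hypothetical, and the real split is authored by an LLM, whereas here the accumulated counterexamples are simply recorded as exceptions.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    condition: str
    confidence: float = 0.5
    counterexamples: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)


def revise(rule: Rule, failure: str, split_after: int = 3, penalty: float = 0.1) -> str:
    """Weaken on each counterexample; split once `split_after` have accumulated."""
    rule.counterexamples.append(failure)
    rule.confidence = max(0.0, rule.confidence - penalty)
    if len(rule.counterexamples) < split_after:
        return "weakened"
    # Narrow the rule: the accumulated counterexamples become explicit exceptions.
    rule.exceptions.extend(rule.counterexamples)
    rule.counterexamples.clear()
    return "split"
```

Propagation up the schema hierarchy is not modeled here; in this sketch a split only changes the rule itself.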
Everything starts candidate / inferred; height, breadth, and survival never confer trust.
mem0 backend¶
from verel.memory import make_ollama_mem0 # needs verel[mem0]
mem = make_ollama_mem0() # same MemoryView Protocol; recall is semantic
Tool-smith — agents build their own tools (verel.toolsmith)¶
detect → scaffold → test → register → reuse. A tool is admitted only on a passing attested eval;
reuse re-verifies against the new spec's cases (a close match isn't trusted blindly).
from verel.memory import LocalMemory
from verel.toolsmith import ToolRegistry, ToolSmith, ToolSpec, ToolCase, SideEffect
smith = ToolSmith(ToolRegistry(LocalMemory()), isolation="container") # needs an LLM key
res = smith.build(ToolSpec(
name="slugify", capability="convert a title to a url slug",
side_effect=SideEffect.READ_ONLY,
cases=[ToolCase(args=["Hello World"], expected="hello-world")],
))
print(res.trust, res.registered)
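The "reuse re-verifies" rule can be sketched as a lookup that only returns a registered tool if it passes the new spec's cases. `passes`, `reuse_or_none`, and the `Case` alias are hypothetical names for this illustration; the real ToolSmith also attests the eval and runs the tool in isolation, which this sketch does not attempt.

```python
from typing import Any, Callable, Optional

Case = tuple[tuple, Any]  # (positional args, expected return value)


def passes(tool: Callable[..., Any], cases: list[Case]) -> bool:
    """A tool passes only if every case returns exactly the expected value."""
    try:
        return all(tool(*args) == expected for args, expected in cases)
    except Exception:
        # A crashing tool fails the eval rather than crashing the caller.
        return False


def reuse_or_none(candidates: list[Callable[..., Any]],
                  cases: list[Case]) -> Optional[Callable[..., Any]]:
    """Return the first registered tool that passes the NEW spec's cases, else None."""
    for tool in candidates:
        if passes(tool, cases):
            return tool
    return None
```

A `None` result is the signal to scaffold a fresh tool: a near match that fails even one of the new cases is treated the same as no match.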
Isolation tiers¶
Untrusted, agent-authored code runs in a separate trust domain. From weakest to strongest:
| isolation= | What it is |
|---|---|
| "subprocess" | fresh interpreter + rlimits + wall-clock timeout (dependency-free) |
| "container" | bwrap namespace sandbox: no network, read-only fs, ephemeral /tmp, cleared env |
| "container" + verel[container] | …plus a seccomp-bpf filter |
Three seccomp profiles (run_container(..., seccomp_profile=…)):
- denylist (default): EPERM on dangerous syscalls; safe for arbitrary tools.
- allowlist: default-deny; only pure-compute syscalls (no network/subprocess/threads).
- capability: the tightest; only the syscalls a tool exercised while passing its eval (learn with learn_syscall_profile, then enforce via tool.syscall_policy).
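The difference between the three profiles comes down to the default decision and the set consulted. The sketch below models that as plain set membership; it does not build a seccomp filter, the `allowed` function is hypothetical, and the two syscall sets are short illustrative samples rather than the lists Verel ships.

```python
DANGEROUS = frozenset({"ptrace", "mount", "reboot", "kexec_load"})        # illustrative only
PURE_COMPUTE = frozenset({"read", "write", "mmap", "brk", "exit_group"})  # illustrative only


def allowed(profile: str, syscall: str, learned: frozenset = frozenset()) -> bool:
    """Decide whether `syscall` is permitted under one of the three profiles."""
    if profile == "denylist":    # default-allow, block a known-dangerous set
        return syscall not in DANGEROUS
    if profile == "allowlist":   # default-deny, permit a fixed compute set
        return syscall in PURE_COMPUTE
    if profile == "capability":  # default-deny, permit only what the eval exercised
        return syscall in learned
    raise ValueError(f"unknown profile: {profile}")
```

Under the capability profile, a tool whose eval only ever called `write` cannot later call `read`, even though `read` would pass the allowlist.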
The fleet — agents managing agents (verel.fleet)¶
A single-writer scheduler over a Task DAG, every node gated by the bus.
import asyncio
from verel.fleet import Scheduler, Task, WorkerResult
from verel.verdict import Verdict
async def worker(task):  # your agent; returns a graded result
    ...
    return WorkerResult(verdict=Verdict.PASS)
tasks = [Task(id="a"), Task(id="b", deps=["a"])]
state = asyncio.run(Scheduler(worker, concurrency=4).run(tasks))
Barriers (all / k_of_n / optional), retry → quarantine, a hard budget lease, and WAL-based
crash resume are all on Task / Scheduler.
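The core of gating every node of a DAG can be sketched with asyncio alone. This is a reduced illustration, not the Scheduler: `run_dag` is a hypothetical function, the graph is assumed to be acyclic with every dependency present as a key, and barriers, retries, budgets, and the WAL are all left out.

```python
import asyncio
from typing import Awaitable, Callable


async def run_dag(deps: dict[str, list[str]],
                  worker: Callable[[str], Awaitable[bool]],
                  concurrency: int = 4) -> dict[str, str]:
    """Run each task after its deps pass; skip anything downstream of a failure."""
    sem = asyncio.Semaphore(concurrency)
    futures: dict[str, asyncio.Task] = {}

    async def run(task_id: str) -> str:
        # Wait for every dependency's gated outcome before starting.
        for dep in deps[task_id]:
            if await futures[dep] != "pass":
                return "skipped"
        async with sem:  # cap the number of workers running at once
            return "pass" if await worker(task_id) else "fail"

    # All tasks are created before any of them runs, so lookups in `futures` are safe.
    for task_id in deps:
        futures[task_id] = asyncio.create_task(run(task_id))
    results = await asyncio.gather(*futures.values())
    return dict(zip(futures, results))
```

For example, with `{"a": [], "b": ["a"], "c": ["b"]}` and a worker that fails only on "b", task "c" is never started because its dependency did not pass.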
Concurrent managers (fencing)¶
More than one scheduler can share a task store safely via fencing leases — a stale leader's writes are rejected:
from verel.fleet import Scheduler, InMemoryLeaseStore # or SqliteLeaseStore for cross-process
store = InMemoryLeaseStore()
s1 = Scheduler(worker, leases=store, owner="m1")
s2 = Scheduler(worker, leases=store, owner="m2") # each task runs exactly once
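The way a fencing lease rejects a stale leader can be sketched with a monotonically increasing token. `FencedStore` is a hypothetical toy and not InMemoryLeaseStore: it has no expiry and no locking, and it only shows the check that makes a late write from a superseded manager harmless.

```python
class FencedStore:
    """Toy lease authority: each new lease gets a larger token; older tokens are stale."""

    def __init__(self) -> None:
        self._token = 0
        self.holder = None
        self.data: dict[str, str] = {}

    def acquire(self, owner: str) -> int:
        """Grant leadership to `owner` and return its fencing token."""
        self._token += 1
        self.holder = owner
        return self._token

    def write(self, token: int, key: str, value: str) -> bool:
        """Accept the write only if it carries the newest token."""
        if token != self._token:
            return False
        self.data[key] = value
        return True
```

If manager m1 pauses after acquiring token 1 and m2 then acquires token 2, any write m1 attempts on waking carries token 1 and is refused, so the store never sees two leaders at once.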
Across machines, put the lease authority behind the HTTP control plane:
from verel.fleet import ControlPlaneServer, RemoteLeaseStore
srv = ControlPlaneServer("/var/lib/verel/leases.db", auth_token="…").start()
sched = Scheduler(worker, leases=RemoteLeaseStore(srv.url, auth_token="…"), owner="host-1")
Multi-repo + atomic sagas¶
from verel.fleet import plan_multi_repo, CrossDep, run_saga, SagaStep, git_revert_head
dag = plan_multi_repo({"api": api_tasks, "client": client_tasks},
[CrossDep(to_repo="client", dependent="ship", from_repo="api", needs="build")])
# all-or-nothing across repos: a failure compensates the repos that already landed, in reverse
res = run_saga([SagaStep("api", forward_api, lambda _r: git_revert_head("/repos/api")),
SagaStep("client", forward_client, lambda _r: git_revert_head("/repos/client"))])
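The all-or-nothing behavior described in the comment above (compensate what already landed, in reverse order) can be sketched generically. `run_all_or_nothing` and the `Step` tuple are hypothetical and independent of run_saga and SagaStep; the compensations here are plain callables rather than git reverts.

```python
from typing import Callable

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, forward, compensate)


def run_all_or_nothing(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run forwards in order; on failure, compensate completed steps in reverse."""
    done: list[Step] = []
    log: list[str] = []
    for step in steps:
        name, forward, _ = step
        try:
            forward()
        except Exception:
            log.append(f"fail:{name}")
            # Undo the steps that already landed, most recent first.
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"undo:{prev_name}")
            return False, log
        done.append(step)
        log.append(f"ok:{name}")
    return True, log
```

Reverse order matters because later steps may depend on earlier ones: the client change is reverted before the API change it was built against.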
A git pre-receive fencing sink (write_pre_receive_hook) extends the fence to pushes: a push
carrying a stale token is refused at the remote.
Eyes / senses (verel.senses) — needs verel[sight]¶
AgentVision as a grounded perception sense on the same bus.
from verel.senses import perceive, watch
percept = perceive("dist/index.html") # DOM / contrast / OCR, + intent conformance
clip = watch("https://app.local/player") # temporal: playback / loading / liveness
A precise visual failure (overflow, clipped, missing element) gates; the advisory vision-LLM
opinion is clamped to warn.
Skill registry (verel.registry)¶
Content-addressed, signed skill artifacts — and the rule that keeps the flywheel honest: trust
does not travel. A fetched skill enters as a candidate and only becomes verified by passing
the importer's OWN held-out eval.
from verel.registry import export_skill, import_skill, PublicRegistry
art = export_skill(verified_tool, origin="tenant:A")
PublicRegistry("/srv/skills").publish(art) # verifies the signature; refuses a tamper
res = import_skill(art, into=my_registry, target_cases=my_cases)
print(res.reverified) # True only if it passed MY eval
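The combination of content addressing, signature checking, and local re-verification can be sketched with the standard library. This is an illustration only: `content_hash`, `sign`, `verify`, and `import_artifact` are hypothetical names, and the shared-key HMAC is an assumption made to keep the example dependency-free, since the guide does not say which signature scheme the registry uses.

```python
import hashlib
import hmac
import json


def content_hash(payload: dict) -> str:
    """Address an artifact by the SHA-256 of its canonical JSON encoding."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def sign(payload: dict, key: bytes) -> str:
    """Sign the content hash with a shared key (HMAC is an assumption here)."""
    return hmac.new(key, content_hash(payload).encode("ascii"), hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison; any change to the payload changes the hash."""
    return hmac.compare_digest(sign(payload, key), signature)


def import_artifact(payload: dict, signature: str, key: bytes, local_eval) -> str:
    """A valid signature only admits a candidate; the importer's eval decides trust."""
    if not verify(payload, signature, key):
        return "rejected"
    return "verified" if local_eval(payload) else "candidate"
```

The signature answers "is this the artifact that was published?", while the importer's own eval answers "does it work for me?". Only the second question can promote the skill.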
Host it over HTTP for cross-machine sharing:
from verel.registry import RegistryServer, RemoteRegistry
srv = RegistryServer("/srv/skills", auth_token="…").start()
remote = RemoteRegistry(srv.url, auth_token="…")
import_skill(remote.get(content_hash), into=my_registry, target_cases=my_cases)
Whether a public registry is even a moat is measured, not assumed — measure_transfer (the H2
experiment) re-verifies skills across tenants. See H2 results.
Cookbook¶
Runnable, mostly offline — see the examples/ directory:
| Want to… | Run |
|---|---|
| gate a repo on tests+lint+types | verel-ci check --repo . |
| self-heal failing tests | python examples/demo_selfheal.py |
| grade Python/JS/Go + perf + security on one bus | python examples/demo_polyglot_ci.py |
| consolidate failures → rules → schema → revise | python examples/demo_consolidation.py |
| sandbox a tool to only the syscalls it earned | python examples/demo_capability_jail.py |
| run concurrent managers + a multi-repo saga | python examples/demo_distributed_fleet.py |
| publish a skill and have another tenant re-verify | python examples/demo_hosted_registry.py |
| measure cross-tenant skill transfer (live) | python examples/run_h2.py |
See also the Architecture & roadmap for how the organs fit together.