Skip to content

Real-world scenarios

Six situations a team actually hits — and what Verel does about them. Every block below is real captured output from a runnable script in examples/; nothing here is mocked up. Run any of them yourself:

pip install verel
python examples/demo_selfheal.py

The throughline: an agent never decides "done" — a grader does. Each scenario shows that rule holding under a different kind of pressure.


1. Your CI went red — and fixed itself

The situation. A push breaks the test suite. Normally someone gets paged, reads the failure, patches the code, and re-runs. Verel closes that loop: the real pytest grader fails, an agent patches the source (never the tests), and the stage re-gates until the graders themselves go green. The agent proposes; the verdict bus disposes.

from verel.ci import inner_loop_stage, self_heal
result = self_heal(".", inner_loop_stage(".", with_lint=False))
print(result.healed, result.terminated_on)
── Self-healing CI (real pytest grader + Ollama code-fixer) ──
  round 1: verdict=fail  medic=['fix_branch']  patched=['mathx.py', 'strx.py']
  round 2: verdict=pass  medic=[]  patched=[]

healed=True  terminated_on=passed
Result: PASS — agent healed failing CI to green; graders decided done

The fix landed in the source under test; the tests were never touched. terminated_on=passed means the loop stopped because the graders went green, not because the agent claimed success.

python examples/demo_selfheal.py — uses a live LLM code-fixer (Ollama by default; set VEREL_LLM_PROVIDER=openai to switch). The grader is real pytest.


2. A bad merge slipped through — caught at canary, reverted on precise evidence

The situation. A change passes review and merges, but it's wrong. Verel runs the merged code through a canary grader; on a precise gating failure it performs a deterministic git revert back to the last good HEAD — and, crucially, it refuses to do anything destructive when the only evidence is advisory (a vision or LLM hunch).

from verel.ci import canary_stage, rollback_engine
# canary fails on HEAD → engine reverts to HEAD~1; then an advisory-only failure is offered
── Canary on the merged code (HEAD=c2852b1, VALUE=999) ──
  canary verdict=fail  rolled_back=True
  policy: authorized: 1 precise gating failure(s) justify rollback to HEAD~1
  reverted c2852b1 → new HEAD 59ce62e
  app.py now: VALUE = 1

── An ADVISORY-only failure must NOT trigger a destructive revert ──
  executed=False  reason=denied: only ADVISORY graders support this rollback — destructive action refused
  HEAD unchanged: True

Result: PASS — bad merge auto-reverted on precise evidence; advisory-only refused

The rollback engine acts on hard, reproducible evidence and never on advisory signals — so an agent's opinion can inform a human but can't trigger a git revert on its own.

python examples/demo_canary_rollback.py


3. One fix, the whole fleet — concurrent agents that can't collide or half-apply

The situation. You point a fleet of agents at a backlog spanning several repos. Two risks: two managers grabbing the same task (double work, races), and a multi-repo change landing in repo A but failing in repo B (a half-applied mess). Verel fences managers with leases (a stale leader's writes are rejected — even at the git remote) and commits cross-repo work as an atomic saga that compensates everything already landed if any repo fails.

8 tasks across 2 concurrent managers — ran once each: True
  work split by lease: {'m1': 5, 'm2': 3}

stale leader A fenced off: stale token for 'deploy': 1 < current 2 — write …
  current leader B's write accepted: outcome=passed

cross-repo DAG: ['api::migrate', 'api::build', 'client::ship']
  client shipped only after api built: True

cross-repo saga (client fails):
  committed=[] compensated=['commit:api'] failed=['commit:client']
  repos left landed: [] (empty → atomic: nothing half-applied)

Every task ran exactly once; a deposed manager's writes were refused; and when one repo failed, the saga rolled the others back so nothing was left half-applied.

python examples/demo_distributed_fleet.py


4. A polyglot monorepo — one gate for Python, JS, Go, perf, and security

The situation. A real repo isn't one language. You want one pass/fail signal across pytest, jest, go test, lint, type-checkers, a perf budget, and a security scanner — not eight dashboards. Verel maps every sense onto one verdict schema, so a single gate (and a single stuck/progress signal) covers them all.

Go inner-loop: FAIL
  test      [-] 1 issue(s) — TestLogin failed

JS pre-merge: FAIL
  test      [-] 1 issue(s) — submit posts the form
  typecheck [-] 0 issue(s)

Python pre-merge + perf + security: FAIL
  security  [-] 1 issue(s) — B602 subprocess with shell=True
  perf      [-] 1 issue(s) — p95_ms 240 exceeds budget 150

All senses share one schema, one gate, one stuck/progress signal.

A failing Go test, a broken JS form, a shell-injection finding, and a blown latency budget all speak the same verdict language — so "is this mergeable?" is one question with one answer.

python examples/demo_polyglot_ci.py


5. An agent built its own tool — and got jailed to exactly what it earned

The situation. An agent needs a capability you don't have a tool for. Verel lets it detect → scaffold → test → register the tool — but only admits it on a passing held-out eval, and then runs it under a capability jail: the tool may use only the syscalls it actually exercised while passing that eval (learned via strace). Anything it never earned is refused at the kernel — even syscalls a normal allow-list would have permitted.

learned 28 syscalls → enforced 71 (allow-list jail would permit 83)
  denied here but allowed by the allow-list jail: ['clock_nanosleep', 'epoll_wait', 'nanosleep', 'pipe2', 'select', 'sysinfo', ...]

verified tool under its capability jail: 5
pipe() under the ALLOW-LIST jail: 5
pipe() under the CAPABILITY jail: REFUSED — [Errno 1] Operation not permitted
socket() under the CAPABILITY jail: REFUSED — [Errno 1] Operation not permitted
subprocess under the CAPABILITY jail: REFUSED — [Errno 1] Operation not permitted

The tool that only ever did arithmetic can't open a socket or spawn a subprocess — not by policy review, but because it never earned those syscalls on the eval that admitted it.

python examples/demo_capability_jail.py (Linux + bwrap for the real container; the jail profile is learned from the tool's own passing run)


6. A shared team brain — compounding, un-poisonable, and crash-tolerant

The situation. A fleet of agents keeps relearning the same lessons, and you don't want one noisy (or malicious) agent poisoning everyone's memory — or a single memory node being a SPOF. Verel's shared brain lets agents recall down a scope lattice (self → team → org → global) and graduate verified beliefs up; a peer's claim re-verifies before it's trusted (trust never travels), authors earn reputation, and the store runs as a leader-fenced HA cluster that survives node loss.

── Cross-agent trust: trust does not travel; authors earn reputation ──
  agent-A's belief (my check passes): VERIFIED locally
  agent-B's belief (my check fails):  stayed CANDIDATE — trust did not travel
  reputations → agent-A prior=0.92 (10, 10), agent-B prior=0.33 (3, 10)

── Replicated HA: leader-fenced, fault-tolerant, no split-brain ──
  leader A wrote despite a dead follower — status ReplicationStatus(acks=2, lagging=1, quorum=1)
  failover: B is now leader (token 2); A is fenced out.
  deposed leader A refused: NotLeaderError — no split-brain.
  quorum read: with the leader DOWN, a read still returns 'restart the worker pool'
               (freshest by version) from a surviving replica — strong reads couldn't.

A bad actor's belief stays a candidate until your check agrees; a crashed leader is fenced out with no split-brain; and a point read survives the leader being down by reading the freshest copy from a surviving replica.

python examples/demo_shared_brain.py


Run them all

python examples/demo_selfheal.py          # 1 · red CI heals itself (live LLM + real pytest)
python examples/demo_canary_rollback.py   # 2 · bad merge auto-reverted on precise evidence
python examples/demo_distributed_fleet.py # 3 · fenced concurrent managers + atomic cross-repo saga
python examples/demo_polyglot_ci.py       # 4 · Python/JS/Go + perf + security on one gate
python examples/demo_capability_jail.py   # 5 · a tool jailed to the syscalls it earned
python examples/demo_shared_brain.py      # 6 · shared brain — un-poisonable, HA, crash-tolerant

More feature-level demos (consolidation into structured rules, the tool-smith lifecycle, semantic recall, the H2 cross-tenant transfer experiment) live in examples/.