Technical Deep-Dive · 2026-06-24

Omnigent

A first-party Databricks open-source meta-harness: one orchestration layer over Claude Code, Codex, Cursor, Pi & your own agents — with real governance, sessions that follow you across devices, and cross-vendor review.

568k lines of code (Py+TS)

~4.7k stars in ~2 weeks

19/19 policy + engine checks pass

alpha · v0.2.0 · moves hourly

Verdict in one line: Adopt selectively now — strong as a local multi-agent orchestrator + guardrail; pilot the server/collab; the risk is maturity & velocity, not safety.

The thesis

Not another coding agent — the layer above them

You already juggle several agents, each with its own CLI, auth, and session model, and none of them share governance. Omnigent turns the harness into a swappable backend.

Swap or combine harnesses from one YAML — no rewrite.
Supervise multiple agents: one vendor reviews another's diff.
Govern: pause on risky actions, cap spend, limit tools.
Sessions follow you: terminal → browser → phone, in sync.
Run anywhere: local host or disposable cloud sandboxes.

The flagship example: 🐙 Polly

A coding orchestrator that writes no code itself. It plans, delegates to Claude Code / Codex / Pi sub-agents in parallel git worktrees, then routes each diff to a reviewer from a different vendor. You merge.

Cross-vendor review is the whole point: a PR from claude_code is reviewed by codex or pi, and vice versa.

Repository profile & trust

Who's behind it: Databricks, in the open

Field	Detail
Author	Databricks, Inc. (PyPI + NOTICE)
Lineage	Matei Zaharia (Spark/MLflow), Corey Zumar (MLflow), Yuan Tang (Argo/Kubeflow, Red Hat)
License	Apache-2.0 — commercial-safe
Created	2026-06-11 · launched at Data+AI Summit
Languages	Python 378k · TypeScript 93k · Swift
Tests	964 Python test files

4,690 ⭐ stars

~1,150 PRs in 2 weeks

0 covert analytics / phone-home

0 hardcoded secrets found

Stars are organic-but-promoted (corporate launch curve, not bots). "Meta-harness" is partly category-creation marketing — but backed by real engineering and a real OSS pedigree.

Architecture · four cooperating layers

How it fits together

Clients: CLI / REPL · Web UI (mobile-first) · macOS app (Swift); one session, every device

▼ SSE events · WebSocket tunnels

Server: FastAPI (58 paths / 78 ops) · Postgres/SQLite (12 tables · 43 migrations) · Auth (argon2id · JWT · OIDC · RBAC) · Policy engine (20 handlers · 6 phases · 3 levels)

▼ harness-agnostic agent YAML · policy gates every action

Meta-harness: SDK executors (claude-sdk · codex · pi · openai-agents · antigravity) · Native CLI executors (claude/codex/cursor/pi-native · pexpect+tmux) · OS sandbox (bwrap / seatbelt + secretless cred proxy)

▼ runs on

Hosts: Local host · Cloud sandboxes (Modal · Daytona · Islo · E2B · OpenShell)

Tool bridge is uniform across every harness: (name, args) → result, whether the vendor exposes tools via MCP, SDK callbacks, or REST.
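To make the "(name, args) → result" idea concrete, here is a minimal sketch of a uniform tool bridge with thin per-vendor shims. Every name here (ToolBridge, from_mcp_request, from_sdk_callback, from_rest_body) is hypothetical and not Omnigent's API; only the MCP shape (tools/call params carrying "name" and "arguments") is a real convention, while the SDK-callback and REST payload shapes are assumptions.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]


class ToolBridge:
    """Hypothetical registry mapping tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        # The single (name, args) -> result entry point every adapter funnels into.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**args)


def from_mcp_request(bridge: ToolBridge, params: Dict[str, Any]) -> Any:
    # MCP tools/call params carry "name" and "arguments".
    return bridge.call(params["name"], params.get("arguments", {}))


def from_sdk_callback(bridge: ToolBridge, tool_name: str, tool_input: Dict[str, Any]) -> Any:
    # Assumed SDK-style callback that hands over (tool_name, tool_input).
    return bridge.call(tool_name, tool_input)


def from_rest_body(bridge: ToolBridge, body: Dict[str, Any]) -> Any:
    # Assumed REST payload shape: {"tool": ..., "args": {...}}.
    return bridge.call(body["tool"], body.get("args", {}))


bridge = ToolBridge()
bridge.register("add", lambda a, b: a + b)
```

Whichever transport a vendor uses, the shim only translates its wire format; the tool implementation is written once.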

How we tested

Five model-free, reproducible experiments

No API keys, no live LLM calls — so results are deterministic and anyone can re-run them (setup.sh && run_all.sh). We installed the real PyPI wheel and exercised the shipped code paths.

1 · Install footprint: time, disk, deps, and the Python constraint, from PyPI.

2 · Policy logic: do the builtin policies return the right ALLOW/DENY/ASK?

3 · Harness swap: does one identical agent body validate across backends?

4 · CLI + registry: does the shipped 0.2.0 work, and what does it really expose?

5 · Engine composition: does the real PolicyEngine.evaluate() enforce stacking?

⚠ Honest limit

No live end-to-end agent turn (no creds / no bwrap here). Cost/latency/quality of live orchestration are flagged unverified, not claimed.

Result · Experiment 1

Install: fast to try, heavy to vendor

~6s install (uv + py3.12)

458 MB total venv size

98 transitive deps

225 MB for the bundled Claude SDK alone (almost half the venv)

Python 3.12+ required; a common system Python 3.11 fails unless you supply 3.12 via uv or pyenv.
Also needs Node 22+ (CLI harnesses) and, on Linux, bubblewrap for native sandboxing.
CLI works out of the box: omni --version → 0.2.0 (4/4 smoke checks pass).
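The three requirements above can be checked before installing. This is a minimal preflight sketch of our own, not something Omnigent ships; the function names are hypothetical, and it only encodes the constraints stated above (Python 3.12+, Node 22+, bwrap on Linux).

```python
import shutil
import subprocess
import sys
from typing import List, Optional, Tuple


def preflight(python: Tuple[int, int], node_major: Optional[int],
              platform: str, has_bwrap: bool) -> List[str]:
    """Return human-readable problems; an empty list means ready to install."""
    problems = []
    if python < (3, 12):
        problems.append(f"Python {python[0]}.{python[1]} found; 3.12+ required")
    if node_major is None or node_major < 22:
        problems.append("Node 22+ required for CLI harnesses")
    if platform.startswith("linux") and not has_bwrap:
        problems.append("bubblewrap (bwrap) missing; native sandboxing unavailable")
    return problems


def node_major_version() -> Optional[int]:
    """Parse `node --version` output such as 'v22.3.0'; None if node is absent."""
    if shutil.which("node") is None:
        return None
    out = subprocess.run(["node", "--version"], capture_output=True, text=True).stdout
    try:
        return int(out.strip().lstrip("v").split(".")[0])
    except ValueError:
        return None


if __name__ == "__main__":
    issues = preflight(sys.version_info[:2], node_major_version(),
                       sys.platform, shutil.which("bwrap") is not None)
    print("\n".join(issues) or "ready")
```

Keeping the decision logic in a pure function makes it testable without touching the real environment.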

Dependency CVEs (pip-audit)

97 / 98 packages clean. One transitive issue:

starlette 0.52.1: 6 CVEs, fixed in ≥1.0.1, but Omnigent's <1 pin blocks the upgrade.

Server-side only. Patch before exposing a public multi-user server; irrelevant for local CLI use.
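Why the pin blocks the fix reduces to an empty interval: no version is both below 1 and at or above 1.0.1. The sketch below uses plain integer tuples as a simplification; real resolvers such as pip apply the full PEP 440 rules (pre-releases, epochs, and so on), and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

Version = Tuple[int, ...]


def parse(v: str) -> Version:
    """'0.52.1' -> (0, 52, 1); numeric release segments only, no pre/post tags."""
    return tuple(int(part) for part in v.split("."))


def allowed_by_pin(v: Version) -> bool:
    return v < parse("1")          # Omnigent's cap as reported: starlette < 1


def includes_fix(v: Version) -> bool:
    return v >= parse("1.0.1")     # first fixed release reported by pip-audit


def resolvable(candidates: Iterable[str]) -> List[str]:
    """Versions that satisfy the pin AND carry the fix (always empty here)."""
    return [c for c in candidates if allowed_by_pin(parse(c)) and includes_fix(parse(c))]
```

Until the cap is relaxed upstream, the only ways out are overriding the pin yourself or keeping the server off the public network.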

Result · Experiments 2 & 5 · the standout

Governance is real and enforced

Exp 2 — decision logic 15/15 ✓

cost_budget downgrades expensive models over a hard cap, ASKs at soft thresholds, fails closed on an unknown model.

max_tool_calls counts & DENYs past the limit. ask_on_os_tools ASKs before shell/file across all vendors' tool names.
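For readers who want the behaviours above in executable form, here is a hypothetical re-creation of the three policies. It is not Omnigent's code: the price table, thresholds, fallback model, OS tool names, and function signatures are all illustrative assumptions; only the ALLOW/ASK/DENY outcomes mirror what Experiment 2 checked. The downgrade step is our guess at one way "downgrade over a hard cap" could work.

```python
from enum import Enum
from typing import Dict, Tuple


class Verdict(Enum):
    ALLOW = "ALLOW"
    ASK = "ASK"
    DENY = "DENY"


# Assumed per-call USD prices; a model missing from this table is "unknown".
PRICES: Dict[str, float] = {"opus": 0.50, "sonnet": 0.10, "haiku": 0.01}


def cost_budget(model: str, spent: float, soft: float, hard: float,
                fallback: str = "haiku") -> Tuple[Verdict, str]:
    """Return (verdict, model to actually use)."""
    if model not in PRICES:
        return Verdict.DENY, model            # fail closed on unknown models
    projected = spent + PRICES[model]
    if projected > hard:
        if fallback != model and spent + PRICES[fallback] <= hard:
            return Verdict.ALLOW, fallback    # downgrade instead of blocking
        return Verdict.DENY, model            # even the cheap model is over the cap
    if projected > soft:
        return Verdict.ASK, model             # soft threshold: ask a human
    return Verdict.ALLOW, model


def max_tool_calls(calls_so_far: int, limit: int) -> Verdict:
    # DENY once the upcoming call would exceed the limit.
    return Verdict.DENY if calls_so_far >= limit else Verdict.ALLOW


# Assumed vendor spellings of shell/file tools; the real list lives in Omnigent.
OS_TOOLS = {"bash", "shell", "exec_command", "write", "edit", "write_file"}


def ask_on_os_tools(tool_name: str) -> Verdict:
    return Verdict.ASK if tool_name.lower() in OS_TOOLS else Verdict.ALLOW
```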

Exp 5 — real engine 4/4 ✓

Driving the actual PolicyEngine.evaluate() with stacked policies: ASK bubbles up; an over-budget DENY short-circuits the ASK; DENY beats ASK regardless of order ("stricter-wins").

Scenario (stacked)	Verdict
under budget, cheap tool	ALLOW
Bash tool, under budget	ASK
over budget on Opus + Bash	DENY
ASK declared first, then DENY	DENY
agent tries to add a policy	ASK
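The "stricter-wins" rule behind this table can be sketched in a few lines: order verdicts ALLOW < ASK < DENY, keep the strictest seen so far, and stop at the first DENY. This is a hypothetical illustration, not the real PolicyEngine.evaluate(); the three toy policies and the action dictionary keys are assumptions chosen to reproduce the five rows above.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Verdict(IntEnum):
    ALLOW = 0
    ASK = 1
    DENY = 2


Action = Dict[str, object]
Policy = Callable[[Action], Verdict]


def evaluate(policies: List[Policy], action: Action) -> Verdict:
    result = Verdict.ALLOW
    for policy in policies:
        verdict = policy(action)
        if verdict == Verdict.DENY:
            return Verdict.DENY           # short-circuit: nothing overrides a DENY
        result = max(result, verdict)     # otherwise the stricter verdict bubbles up
    return result


def ask_on_bash(action: Action) -> Verdict:
    return Verdict.ASK if action.get("tool") == "Bash" else Verdict.ALLOW


def budget(action: Action) -> Verdict:
    return Verdict.DENY if action.get("over_budget") else Verdict.ALLOW


def add_policy_gate(action: Action) -> Verdict:
    # Unconditional: an agent never adds a policy without human approval.
    return Verdict.ASK if action.get("kind") == "add_policy" else Verdict.ALLOW


STACK = [ask_on_bash, budget, add_policy_gate]   # ASK-producing policy declared first
```

Because DENY is both the maximum and a short-circuit, the declaration order of policies cannot change the outcome, which is what the "ASK declared first, then DENY" row checks.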

20 registered policy handlers ship, covering cost, safety, PII, github, google, working-dir, risk-score, routing, and CEL, plus 4 orchestration handlers. Agents cannot weaken their own policies: stricter-session-wins applies, and an unconditional gate turns any agent attempt to add a policy into an ASK.

Result · Experiments 3 & 4

"Swap harnesses without rewriting" — real, for 11 backends

Swapping only executor.harness on one identical agent body, the released 0.2.0 validator accepts 11 canonical harnesses:

claude-sdk, claude-native, codex, codex-native, cursor, cursor-native, pi, pi-native, openai-agents, antigravity, open-responses

Advertised but not yet in the stable release's validator:

opencode-native, goose-native, qwen, copilot, hermes, databricks

The skew, precisely

Wheel is internally consistent — its own bundled polly/debby validate.

main is ahead of release — main/polly & main/debby fail on 0.2.0 because they declare an opencode-native sub-agent it doesn't accept.

So the abstraction itself is clean and genuine; the gap is that the README/main run ahead of what the pinned release ships. Cursor is supported today.
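Experiment 3 and the main/release skew can be mirrored with a small validator. The accepted set is copied from the 11 harnesses listed above; the spec layout (name, executor.harness, a sub_agents list) and the validate/with_harness helpers are hypothetical and not Omnigent's schema code.

```python
import copy
from typing import Any, Dict, FrozenSet, List

RELEASE_0_2_0: FrozenSet[str] = frozenset({
    "claude-sdk", "claude-native", "codex", "codex-native", "cursor",
    "cursor-native", "pi", "pi-native", "openai-agents", "antigravity",
    "open-responses",
})


def validate(spec: Dict[str, Any], accepted: FrozenSet[str] = RELEASE_0_2_0) -> List[str]:
    """Return errors for an agent and, recursively, its sub-agents (empty = valid)."""
    errors = []
    harness = spec["executor"]["harness"]
    if harness not in accepted:
        errors.append(f"{spec['name']}: unsupported harness {harness!r}")
    for sub in spec.get("sub_agents", []):
        errors.extend(validate(sub, accepted))
    return errors


def with_harness(spec: Dict[str, Any], harness: str) -> Dict[str, Any]:
    """Copy the agent body and change only executor.harness."""
    swapped = copy.deepcopy(spec)
    swapped["executor"]["harness"] = harness
    return swapped


BASE = {"name": "reviewer", "prompt": "Review the diff.", "executor": {"harness": "claude-sdk"}}

# A main-style orchestrator that declares an opencode-native sub-agent.
MAIN_POLLY = {
    "name": "polly",
    "executor": {"harness": "claude-sdk"},
    "sub_agents": [with_harness(BASE, "codex"),
                   {"name": "oc-worker", "executor": {"harness": "opencode-native"}}],
}
```

Recursing into sub-agents is what makes a single unsupported child fail the whole orchestrator, matching how main/polly fails on 0.2.0.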

Security scan

No backdoors. The risk is operational, not malicious

Strong

Secretless credential proxy — real keys stay in parent; sandboxed child sees placeholders.
Fail-loud sandbox — bwrap missing → raises, no silent no-iso fallback.
Auth: argon2id, JWT, OIDC, single-use invites, 3-level RBAC.
No covert telemetry — only an opt-out PyPI update check.
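The first two strengths can be illustrated together. Below is a minimal in-memory sketch of a secretless credential proxy (the parent holds real keys and swaps placeholders on outbound headers) and of a fail-loud sandbox check. All names, including the "omni-placeholder-" prefix, are hypothetical; this only illustrates the substitution step and the refuse-to-run behaviour, not how Omnigent wires either into processes or the network.

```python
import secrets
import shutil
from typing import Callable, Dict, Optional


class CredentialProxy:
    """Lives in the parent; the sandboxed child only ever sees placeholders."""

    def __init__(self) -> None:
        self._real: Dict[str, str] = {}

    def placeholder_for(self, real_key: str) -> str:
        token = "omni-placeholder-" + secrets.token_hex(8)
        self._real[token] = real_key
        return token  # this string is what goes into the child's environment

    def forward_headers(self, headers: Dict[str, str]) -> Dict[str, str]:
        """Swap placeholders for real keys as a request leaves the sandbox."""
        out = {}
        for name, value in headers.items():
            for token, real in self._real.items():
                value = value.replace(token, real)
            out[name] = value
        return out


def require_sandbox(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Fail loudly instead of silently running without isolation."""
    path = which("bwrap")
    if path is None:
        raise RuntimeError("bubblewrap (bwrap) not found; refusing to run unsandboxed")
    return path
```

Even if a compromised child dumps its environment, it leaks only a random token that is meaningless outside the parent's mapping.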

Watch-outs

Isolation is opt-in per spec: the flagship Polly runs its sub-agents with sandbox: none, relying on worktree + policy isolation instead.
starlette CVE in the server stack, blocked by a version cap.
curl|sh installer — standard, but prefer pip/uv install.
Public server = real ops (Postgres, TLS, secrets, OIDC).

Full detail in security/scan_report.md. Bottom line: safe to run locally today; harden the deps + sandboxing before exposing a shared server.

Maturity signal

It moves faster than you can audit it

During our ~40-minute study, main advanced from PR #1080 → #1150.

And PR #1150 was literally:

fix(polly): drop the opencode sub-agent to stay loadable on older clients

— i.e. the maintainers shipped a fix for the exact main↔release skew our Experiment 3 surfaced, within the hour. Great for confidence in the team; a clear signal to pin a version and treat the README as ahead of the release.

70 PR numbers advanced in ~40 min

v0.2.0 stable, vs 0.3.0.dev0 on main

alpha: self-declared status

Should you adopt it?

Where Omnigent fits your workflow

Use now: local multi-agent orchestration + policy guardrails

Use now: cross-vendor review (Polly), where one model checks another

Pilot: server · phone · live collaboration (alpha-operational)

Wait: Copilot / Goose / Qwen / OpenCode harnesses (main-only)

Hold: if you need a stable API today, or cannot run Py3.12 / Postgres

Ratings reflect the strength of the evidence behind each recommendation. "Use now" items are the ones our experiments directly validated.

Bottom line

Adopt selectively — the engineering is real, the risk is velocity

If you run more than one coding agent and want shared spend caps, approvals, and cross-vendor review — start today, locally.

uv tool install omnigent (Python 3.12)
omnigent run examples/polly/
Add a cost_budget + ask_on_os_tools policy.
Pin the version. Re-check the harness list each release.

Scorecard

Governance engine	★★★★★ proven
Harness abstraction	★★★★☆ clean
Security posture	★★★★☆ solid
Trust / provenance	★★★★★ Databricks
Maturity / stability	★★☆☆☆ alpha
Footprint / friction	★★★☆☆ heavy

Full paper, experiments & raw logs in this repo's omnigent/ folder.