Technical Deep-Dive · 2026-06-24

Omnigent

A first-party Databricks open-source meta-harness: one orchestration layer over Claude Code, Codex, Cursor, Pi & your own agents — with real governance, sessions that follow you across devices, and cross-vendor review.

568k lines of code (Py+TS)
~4.7k stars in ~2 weeks
19/19 policy + engine checks pass
alpha (v0.2.0 · moves hourly)

Verdict in one line: Adopt selectively now — strong as a local multi-agent orchestrator + guardrail; pilot the server/collab; the risk is maturity & velocity, not safety.

The thesis

Not another coding agent — the layer above them

You already juggle several agents, each with its own CLI, auth, session model, and no shared governance. Omnigent makes the harness a swappable backend.

  • Swap or combine harnesses from one YAML — no rewrite.
  • Supervise multiple agents: one vendor reviews another's diff.
  • Govern: pause on risky actions, cap spend, limit tools.
  • Sessions follow you: terminal → browser → phone, in sync.
  • Run anywhere: local host or disposable cloud sandboxes.

The flagship example: 🐙 Polly

A coding orchestrator that writes no code itself. It plans, delegates to Claude Code / Codex / Pi sub-agents in parallel git worktrees, then routes each diff to a reviewer from a different vendor. You merge.

Cross-vendor review is the whole point: claude_code's PR → reviewed by codex or pi, and vice-versa.
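
A minimal sketch of the cross-vendor routing rule, to make the idea concrete. This is not Polly's actual code: the VENDOR_OF grouping, the Diff type, and pick_reviewer are hypothetical names introduced for illustration; only the harness names come from the text above.

```python
from dataclasses import dataclass

# Hypothetical mapping from harness name to vendor (an assumption for this
# sketch; the real project may group harnesses differently).
VENDOR_OF = {
    "claude_code": "anthropic",
    "codex": "openai",
    "pi": "pi",
}


@dataclass(frozen=True)
class Diff:
    author: str  # harness that produced the diff
    branch: str  # worktree branch holding the change


def pick_reviewer(diff: Diff, available: list[str]) -> str:
    """Return the first available harness whose vendor differs from the author's."""
    author_vendor = VENDOR_OF[diff.author]
    for candidate in available:
        if VENDOR_OF[candidate] != author_vendor:
            return candidate
    raise LookupError(f"no cross-vendor reviewer available for {diff.author}")
```

With claude_code, codex, and pi all available, a claude_code diff would go to codex under this rule, and a codex diff would go to claude_code.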

Repository profile & trust

Who's behind it: Databricks, in the open

Field       Detail
Author      Databricks, Inc. (PyPI + NOTICE)
Lineage     Matei Zaharia (Spark/MLflow), Corey Zumar (MLflow), Yuan Tang (Argo/Kubeflow, Red Hat)
License     Apache-2.0, commercial-safe
Created     2026-06-11 · launched at Data+AI Summit
Languages   Python 378k · TypeScript 93k · Swift
Tests       964 Python test files

4,690 ⭐ stars
~1,150 PRs in 2 weeks
0 covert analytics / phone-home
0 hardcoded secrets found

Stars are organic-but-promoted (corporate launch curve, not bots). "Meta-harness" is partly category-creation marketing — but backed by real engineering and a real OSS pedigree.

Architecture · four cooperating layers

How it fits together

clients (one session, every device)
  CLI / REPL
  Web UI (mobile-first)
  macOS app (Swift)
▼ SSE events · WebSocket tunnels
server
  FastAPI: 58 paths / 78 ops
  Postgres/SQLite: 12 tables · 43 migrations
  Auth: argon2id · JWT · OIDC · RBAC
  Policy engine: 20 handlers · 6 phases · 3 levels
▼ harness-agnostic agent YAML · policy gates every action
meta-harness
  SDK executors: claude-sdk · codex · pi · openai-agents · antigravity
  Native CLI executors: claude/codex/cursor/pi-native · pexpect+tmux
  OS sandbox: bwrap / seatbelt + secretless cred proxy
▼ runs on
hosts
  Local host
  Cloud sandboxes: Modal · Daytona · Islo · E2B · OpenShell

Tool bridge is uniform across every harness: (name, args) → result, whether the vendor exposes tools via MCP, SDK callbacks, or REST.
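
One way to picture a uniform (name, args) → result bridge is a small adapter interface. The sketch below is illustrative only: ToolBridge, CallbackBridge, and RenamingBridge are hypothetical names, not Omnigent's API, and the result shape is an assumption.

```python
from typing import Any, Callable, Protocol

ToolResult = dict[str, Any]


class ToolBridge(Protocol):
    """The single shape every harness adapter is assumed to expose."""

    def call(self, name: str, args: dict[str, Any]) -> ToolResult: ...


class CallbackBridge:
    """Adapter for a harness that exposes tools as in-process callbacks."""

    def __init__(self, callbacks: dict[str, Callable[..., Any]]) -> None:
        self._callbacks = callbacks

    def call(self, name: str, args: dict[str, Any]) -> ToolResult:
        if name not in self._callbacks:
            return {"ok": False, "error": f"unknown tool: {name}"}
        return {"ok": True, "value": self._callbacks[name](**args)}


class RenamingBridge:
    """Wraps another bridge and maps vendor-specific tool names onto shared ones."""

    def __init__(self, inner: ToolBridge, aliases: dict[str, str]) -> None:
        self._inner = inner
        self._aliases = aliases

    def call(self, name: str, args: dict[str, Any]) -> ToolResult:
        # Unaliased names pass through unchanged.
        return self._inner.call(self._aliases.get(name, name), args)
```

An MCP or REST adapter would implement the same call method, so policy checks can sit in front of the bridge without knowing which transport is behind it.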

How we tested

Five model-free, reproducible experiments

No API keys, no live LLM calls — so results are deterministic and anyone can re-run them (setup.sh && run_all.sh). We installed the real PyPI wheel and exercised the shipped code paths.

1 · Install footprint

Time, disk, deps, Python constraint from PyPI.

2 · Policy logic

Do the builtin policies return the right ALLOW/DENY/ASK?

3 · Harness swap

Does one identical agent body validate across backends?

4 · CLI + registry

Does the shipped 0.2.0 work? What does it really expose?

5 · Engine composition

Does the real PolicyEngine.evaluate() enforce stacking?

⚠ Honest limit

No live end-to-end agent turn (no creds / no bwrap here). Cost/latency/quality of live orchestration are flagged unverified, not claimed.

Result · Experiment 1

Install: fast to try, heavy to vendor

~6s install (uv + py3.12)
458 MB total venv size
98 transitive deps
225 MB bundled Claude SDK alone
  • Python 3.12+ required — the common system 3.11 fails without uv/pyenv.
  • Also needs Node 22+ (CLI harnesses) and, on Linux, bubblewrap for native sandboxing.
  • CLI works out of the box: omni --version → 0.2.0 (4/4 smoke checks pass).
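
The prerequisites above can be checked before installing. This is a hedged sketch, not a script shipped with Omnigent: preflight is a hypothetical helper, it only checks that node is on PATH (not that it is version 22+), and it assumes the bubblewrap binary is named bwrap.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence


def preflight(
    python_version: Optional[Sequence[int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    platform: str = sys.platform,
) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems: list[str] = []
    if python_version is not None:
        version = tuple(python_version)
    else:
        version = tuple(sys.version_info[:2])
    if version < (3, 12):
        problems.append("Python 3.12+ is required")
    # Presence check only; the Node version itself is not inspected here.
    if which("node") is None:
        problems.append("node not found on PATH (CLI harnesses need Node 22+)")
    if platform.startswith("linux") and which("bwrap") is None:
        problems.append("bwrap not found on PATH (needed for native sandboxing on Linux)")
    return problems
```

Calling preflight() with no arguments inspects the current interpreter and PATH; the parameters exist so the check can be exercised without touching the real machine.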

Dependency CVEs (pip-audit)

97 / 98 packages clean. One transitive issue:

starlette 0.52.1 — 6 CVEs, fix ≥1.0.1, blocked by Omnigent's <1 pin.

Server-side only. Patch before exposing a public multi-user server; irrelevant for local CLI use.
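
A server deployment could refuse to start on an unpatched starlette. The sketch below is an assumption-laden illustration: needs_patch and parse_version are hypothetical helpers, the parsing is deliberately simple (numeric major.minor.patch only), and the fixed version 1.0.1 is taken from the audit result above.

```python
import re

FIXED_IN = (1, 0, 1)  # first fixed starlette version, per the pip-audit result


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse the leading numeric major.minor.patch; missing parts count as 0."""
    parts: list[int] = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def needs_patch(installed: str) -> bool:
    """True when the installed version is older than the first fixed release."""
    return parse_version(installed) < FIXED_IN
```

The installed version string can be read with importlib.metadata.version("starlette") from the standard library.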

Result · Experiments 2 & 5 · the standout

Governance is real and enforced

Exp 2 — decision logic 15/15 ✓

cost_budget downgrades expensive models over a hard cap, ASKs at soft thresholds, fails closed on an unknown model.

max_tool_calls counts & DENYs past the limit. ask_on_os_tools ASKs before shell/file across all vendors' tool names.
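
To make the decision logic concrete, here is a simplified sketch of the two checks. It is not the shipped implementation: the price table and model names are hypothetical, the function signatures are assumptions, and the model-downgrade path is left out so that exceeding the hard cap simply returns DENY.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical per-call prices in USD; real pricing tables will differ.
PRICE_PER_CALL = {"small-model": 0.01, "large-model": 0.25}


def cost_budget(model: str, spent: float, soft_limit: float, hard_cap: float) -> Verdict:
    """Gate a call on projected spend; an unknown model fails closed."""
    price = PRICE_PER_CALL.get(model)
    if price is None:
        return Verdict.DENY  # fail closed: cost cannot be estimated
    projected = spent + price
    if projected > hard_cap:
        return Verdict.DENY
    if projected > soft_limit:
        return Verdict.ASK
    return Verdict.ALLOW


def max_tool_calls(calls_so_far: int, limit: int) -> Verdict:
    """Deny once the number of calls already made has reached the limit."""
    return Verdict.DENY if calls_so_far >= limit else Verdict.ALLOW
```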

Exp 5 — real engine 4/4 ✓

Driving the actual PolicyEngine.evaluate() with stacked policies: ASK bubbles up; an over-budget DENY short-circuits the ASK; DENY beats ASK regardless of order ("stricter-wins").

Scenario (stacked)              Verdict
under budget, cheap tool        ALLOW
Bash tool, under budget         ASK
over budget on Opus + Bash      DENY
ASK declared first, then DENY   DENY
agent tries to add a policy     ASK
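
The stacking behaviour in these scenarios can be modelled as a severity ordering. The sketch below is not PolicyEngine.evaluate() itself: Verdict and combine are hypothetical names, and returning ALLOW when no policy is configured is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Iterable


class Verdict(IntEnum):
    # Higher value means stricter.
    ALLOW = 0
    ASK = 1
    DENY = 2


def combine(verdicts: Iterable[Verdict]) -> Verdict:
    """Stricter-wins: return the strictest verdict, stopping early on DENY."""
    result = Verdict.ALLOW  # assumed default when no policy objects
    for verdict in verdicts:
        if verdict is Verdict.DENY:
            return verdict  # nothing can outrank DENY, so stop here
        result = max(result, verdict)
    return result
```

Because the result depends only on severity, declaring ASK before DENY or DENY before ASK gives the same outcome.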

20 registered policy handlers ship (cost, safety, PII, github, google, working-dir, risk-score, routing, CEL + 4 orchestration). Agents cannot weaken their own policies — stricter-session-wins + an unconditional add-policy gate.

Result · Experiments 3 & 4

"Swap harnesses without rewriting" — real, for 11 backends

Swapping only executor.harness on one identical agent body, the released 0.2.0 validator accepts 11 canonical harnesses:

claude-sdk · claude-native · codex · codex-native · cursor · cursor-native · pi · pi-native · openai-agents · antigravity · open-responses

Advertised but not yet in the stable release's validator:

opencode-native · goose-native · qwen · copilot · hermes · databricks

The skew, precisely

Wheel is internally consistent — its own bundled polly/debby validate.

main is ahead of release: main/polly and main/debby fail on 0.2.0 because they declare an opencode-native sub-agent it doesn't accept.

So the abstraction itself is clean and genuine; the gap is that the README/main run ahead of what the pinned release ships. Cursor is supported today.
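
A pre-deployment check for this skew might look like the following. It is a sketch, not the released validator: the two name sets are copied from the lists above, the spec is modelled as a plain dict with an executor.harness key, and swap_harness and harness_status are hypothetical helpers.

```python
import copy

# Harnesses the released 0.2.0 validator accepted in our run.
RELEASED = frozenset({
    "claude-sdk", "claude-native", "codex", "codex-native",
    "cursor", "cursor-native", "pi", "pi-native",
    "openai-agents", "antigravity", "open-responses",
})

# Advertised on main but rejected by the 0.2.0 validator.
MAIN_ONLY = frozenset({
    "opencode-native", "goose-native", "qwen",
    "copilot", "hermes", "databricks",
})


def swap_harness(spec: dict, harness: str) -> dict:
    """Return a copy of the spec with only executor.harness changed."""
    swapped = copy.deepcopy(spec)
    swapped.setdefault("executor", {})["harness"] = harness
    return swapped


def harness_status(spec: dict) -> str:
    """Classify a spec as 'released', 'main-only', or 'unknown'."""
    harness = spec.get("executor", {}).get("harness")
    if harness in RELEASED:
        return "released"
    if harness in MAIN_ONLY:
        return "main-only"
    return "unknown"
```

Running harness_status over every agent spec in a repository before pinning a release flags any sub-agent that only loads on main.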

Security scan

No backdoors. The risk is operational, not malicious

Strong

  • Secretless credential proxy — real keys stay in parent; sandboxed child sees placeholders.
  • Fail-loud sandbox: if bwrap is missing it raises; there is no silent fallback to running without isolation.
  • Auth: argon2id, JWT, OIDC, single-use invites, 3-level RBAC.
  • No covert telemetry — only an opt-out PyPI update check.
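
The secretless-proxy idea can be illustrated in a few lines. This is a conceptual sketch under stated assumptions, not Omnigent's implementation: the placeholder format, child_env, and resolve are all hypothetical.

```python
def placeholder_for(name: str) -> str:
    """Hypothetical placeholder format; the real proxy may use something else."""
    return f"__SECRET_{name}__"


def child_env(parent_env: dict[str, str], secret_names: list[str]) -> dict[str, str]:
    """Copy the environment for a sandboxed child, masking the named secrets."""
    env = dict(parent_env)
    for name in secret_names:
        if name in env:
            env[name] = placeholder_for(name)
    return env


def resolve(text: str, parent_env: dict[str, str], secret_names: list[str]) -> str:
    """Parent-side step: swap placeholders for real values on an outgoing request."""
    for name in secret_names:
        if name in parent_env:
            text = text.replace(placeholder_for(name), parent_env[name])
    return text
```

The child only ever holds the placeholder string; the real value is substituted in the parent process, outside the sandbox, when the request leaves.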

Watch-outs

  • Isolation is opt-in per spec: flagship Polly runs its sub-agents with sandbox: none (worktree + policy isolation instead).
  • starlette CVE in the server stack, blocked by a version cap.
  • curl|sh installer — standard, but prefer pip/uv install.
  • Public server = real ops (Postgres, TLS, secrets, OIDC).

Full detail in security/scan_report.md. Bottom line: safe to run locally today; harden the deps + sandboxing before exposing a shared server.

Maturity signal

It moves faster than you can audit it

During our ~40-minute study, main advanced from PR #1080 → #1150.

And PR #1150 was literally:

fix(polly): drop the opencode sub-agent to stay loadable on older clients

— i.e. the maintainers shipped a fix for the exact main↔release skew our Experiment 3 surfaced, within the hour. Great for confidence in the team; a clear signal to pin a version and treat the README as ahead of the release.

70 PR numbers advanced in ~40 min
v0.2.0 stable (vs 0.3.0.dev0 on main)
alpha: self-declared status

Should you adopt it?

Where Omnigent fits your workflow

Use now: local multi-agent orchestration + policy guardrails
Use now: cross-vendor review (Polly), where one model checks another
Pilot: server · phone · live collaboration (alpha-operational)
Wait: Copilot / Goose / Qwen / OpenCode harnesses (main-only)
Hold: if you need a stable API today, or cannot run Python 3.12 / Postgres

Recommendations run from strongest to weakest evidence. "Use now" items are the ones our experiments directly validated.

Bottom line

Adopt selectively — the engineering is real, the risk is velocity

If you run more than one coding agent and want shared spend caps, approvals, and cross-vendor review — start today, locally.

  • uv tool install omnigent (Python 3.12)
  • omnigent run examples/polly/
  • Add a cost_budget + ask_on_os_tools policy.
  • Pin the version. Re-check the harness list each release.

Scorecard

Governance engine      ★★★★★  proven
Harness abstraction    ★★★★☆  clean
Security posture       ★★★★☆  solid
Trust / provenance     ★★★★★  Databricks
Maturity / stability   ★★☆☆☆  alpha
Footprint / friction   ★★★☆☆  heavy

Full paper, experiments & raw logs in this repo's omnigent/ folder.
