A first-party Databricks open-source meta-harness: one orchestration layer over Claude Code, Codex, Cursor, Pi & your own agents — with real governance, sessions that follow you across devices, and cross-vendor review.
Verdict in one line: Adopt selectively now — strong as a local multi-agent orchestrator + guardrail; pilot the server/collab; the risk is maturity & velocity, not safety.
You already juggle several agents, each with its own CLI, auth, session model, and no shared governance. Omnigent makes the harness a swappable backend.
A coding orchestrator that writes no code itself. It plans, delegates to Claude Code / Codex / Pi sub-agents in parallel git worktrees, then routes each diff to a reviewer from a different vendor. You merge.
Cross-vendor review is the whole point: claude_code's PR → reviewed by codex or pi, and vice-versa.
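The routing rule can be illustrated with a minimal sketch. The harness names come from the text above; the `VENDOR_OF` mapping and the `pick_reviewer` helper are hypothetical and are not Omnigent's API (in particular, the vendor label for `pi` is a placeholder).

```python
# Hypothetical vendor map: harness names are from the text, the mapping is assumed.
VENDOR_OF = {
    "claude_code": "anthropic",
    "codex": "openai",
    "pi": "pi",  # placeholder vendor label
}


def pick_reviewer(author: str, available: list[str]) -> str:
    """Return the first available harness whose vendor differs from the author's."""
    author_vendor = VENDOR_OF[author]
    for candidate in available:
        if VENDOR_OF[candidate] != author_vendor:
            return candidate
    raise LookupError(f"no cross-vendor reviewer available for {author!r}")
```

The only point of the sketch is the invariant: a diff is never reviewed by the harness family that wrote it.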
| Field | Detail |
|---|---|
| Author | Databricks, Inc. (PyPI + NOTICE) |
| Lineage | Matei Zaharia (Spark/MLflow), Corey Zumar (MLflow), Yuan Tang (Argo/Kubeflow, Red Hat) |
| License | Apache-2.0 — commercial-safe |
| Created | 2026-06-11 · launched at Data+AI Summit |
| Languages | Python 378k · TypeScript 93k · Swift |
| Tests | 964 Python test files |
Stars are organic-but-promoted (corporate launch curve, not bots). "Meta-harness" is partly category-creation marketing — but backed by real engineering and a real OSS pedigree.
Tool bridge is uniform across every harness: (name, args) → result, whether the vendor exposes tools via MCP, SDK callbacks, or REST.
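As a rough illustration of the (name, args) → result shape, here is a minimal sketch. `ToolBridge`, `register`, and `call` are hypothetical names chosen for this example, not the project's actual classes; the lambdas stand in for tools that would really be reached over MCP, an SDK callback, or REST.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[[Dict[str, Any]], Any]


class ToolBridge:
    """Hypothetical uniform bridge: every tool is invoked as (name, args) -> result."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        # Each transport adapter registers a plain callable with the same shape.
        self._tools[name] = fn

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](args)


bridge = ToolBridge()
bridge.register("echo", lambda args: args["text"])         # stand-in for an MCP tool
bridge.register("add", lambda args: args["a"] + args["b"])  # stand-in for an SDK callback
```

Callers see one calling convention, so policies and logging can hook `call` once instead of once per vendor.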
The experiments use no API keys and make no live LLM calls, so results are deterministic and anyone can re-run them (setup.sh && run_all.sh). We installed the real PyPI wheel and exercised the shipped code paths.
What does installation cost? Time, disk, dependencies, and the Python version constraint from PyPI.
Do the builtin policies return the right ALLOW/DENY/ASK?
Does one identical agent body validate across backends?
Does the shipped 0.2.0 work? What does it really expose?
Does the real PolicyEngine.evaluate() enforce stacking?
We did not run a live end-to-end agent turn (no credentials and no bwrap in this environment). Cost, latency, and quality of live orchestration are flagged as unverified, not claimed.
97 / 98 packages clean. One transitive issue:
starlette 0.52.1 — 6 CVEs, fix ≥1.0.1, blocked by Omnigent's <1 pin.
Server-side only. Patch before exposing a public multi-user server; irrelevant for local CLI use.
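Why the fix is blocked can be checked with a little arithmetic. This is a simplified sketch that compares plain dotted numeric versions as tuples; real resolvers follow PEP 440, and the helper names here are made up for illustration.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '0.52.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


PIN_UPPER = parse("1")        # Omnigent's "<1" pin on starlette
INSTALLED = parse("0.52.1")   # the version flagged by the scan
FIXED_FROM = parse("1.0.1")   # the first version carrying the fixes


def satisfies_pin(version: tuple[int, ...]) -> bool:
    # Tuple comparison: (1, 0, 1) is not less than (1,), so 1.0.1 violates "<1".
    return version < PIN_UPPER
```

The installed version satisfies the pin while the fixed version does not, so the upgrade cannot happen until the pin itself is relaxed.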
cost_budget downgrades expensive models over a hard cap, ASKs at soft thresholds, fails closed on an unknown model.
max_tool_calls counts tool calls and DENYs once the limit is exceeded. ask_on_os_tools ASKs before any shell or file operation, matching every vendor's tool names.
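The three behaviours described above can be sketched as pure functions. This is an illustration only, not the shipped policy code: the function signatures, the `PRICE_TIER` table, the `OS_TOOLS` set, and the choice to model the hard-cap case as a plain DENY (rather than an actual model downgrade) are all assumptions.

```python
from enum import IntEnum


class Verdict(IntEnum):
    ALLOW = 0
    ASK = 1
    DENY = 2


# Hypothetical lookup tables; the real policies carry their own.
PRICE_TIER = {"opus": "expensive", "haiku": "cheap"}
OS_TOOLS = {"Bash", "shell", "write_file"}


def cost_budget(model: str, spent: float, soft_cap: float, hard_cap: float) -> Verdict:
    if model not in PRICE_TIER:
        return Verdict.DENY  # fail closed on an unknown model
    if spent >= hard_cap and PRICE_TIER[model] == "expensive":
        return Verdict.DENY  # assumption: the sketch refuses instead of downgrading
    if spent >= soft_cap:
        return Verdict.ASK   # soft threshold: ask a human
    return Verdict.ALLOW


def max_tool_calls(calls_so_far: int, limit: int) -> Verdict:
    # calls_so_far counts completed calls; the next call is denied at the limit.
    return Verdict.DENY if calls_so_far >= limit else Verdict.ALLOW


def ask_on_os_tools(tool_name: str) -> Verdict:
    return Verdict.ASK if tool_name in OS_TOOLS else Verdict.ALLOW
```

Keeping each policy a small function of its inputs is what makes the verdicts reproducible without API keys or live model calls.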
Driving the actual PolicyEngine.evaluate() with stacked policies: ASK bubbles up; an over-budget DENY short-circuits the ASK; DENY beats ASK regardless of order ("stricter-wins").
| Scenario (stacked) | Verdict |
|---|---|
| under budget, cheap tool | ALLOW |
| Bash tool, under budget | ASK |
| over budget on Opus + Bash | DENY |
| ASK declared first, then DENY | DENY |
| agent tries to add a policy | ASK |
20 registered policy handlers ship (cost, safety, PII, github, google, working-dir, risk-score, routing, CEL + 4 orchestration). Agents cannot weaken their own policies — stricter-session-wins + an unconditional add-policy gate.
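The "stricter-wins" stacking rule can be sketched as an ordering over verdicts. This is a minimal model, not the real PolicyEngine.evaluate(): the `Verdict` enum, the `combine` helper, and the choice that an empty stack yields ALLOW are assumptions made for the example.

```python
from enum import IntEnum
from typing import Iterable


class Verdict(IntEnum):
    # Ordered by strictness so that comparison expresses "stricter wins".
    ALLOW = 0
    ASK = 1
    DENY = 2


def combine(verdicts: Iterable[Verdict]) -> Verdict:
    """Fold a stack of policy verdicts; the strictest one decides."""
    result = Verdict.ALLOW  # assumption: an empty stack allows
    for verdict in verdicts:
        if verdict is Verdict.DENY:
            return Verdict.DENY  # short-circuit: nothing can override a DENY
        result = max(result, verdict)
    return result
```

Because the result depends only on the strictest verdict present, declaration order cannot change the outcome, which matches the "ASK declared first, then DENY" row in the table.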
Swapping only executor.harness on one identical agent body, the released 0.2.0 validator accepts 11 canonical harnesses:
Advertised but not yet in the stable release's validator:
Wheel is internally consistent — its own bundled polly/debby validate.
main is ahead of release — main/polly & main/debby fail on 0.2.0 because they declare an opencode-native sub-agent it doesn't accept.
So the abstraction itself is clean and genuine; the gap is that the README/main run ahead of what the pinned release ships. Cursor is supported today.
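The shape of that experiment, and of the main-versus-release skew, can be sketched as follows. Everything here is hypothetical: the spec layout, the `validate_agent` and `with_harness` helpers, and the `ACCEPTED_HARNESSES` set, which is a small illustrative subset and not the release's actual list of 11.

```python
# Hypothetical subset of accepted harness identifiers (not the real list of 11).
ACCEPTED_HARNESSES = {"claude_code", "codex", "cursor", "pi"}


def validate_agent(spec: dict) -> list[str]:
    """Return validation errors; an empty list means the spec is accepted."""
    harnesses = [spec["executor"]["harness"]]
    harnesses += [sub["harness"] for sub in spec.get("sub_agents", [])]
    return [f"unsupported harness: {h}" for h in harnesses if h not in ACCEPTED_HARNESSES]


def with_harness(spec: dict, harness: str) -> dict:
    """Copy the spec, changing only executor.harness."""
    return {**spec, "executor": {**spec["executor"], "harness": harness}}


base = {"name": "demo", "executor": {"harness": "claude_code"}, "sub_agents": []}

# An agent definition from a newer branch that declares a sub-agent the
# pinned validator does not know about.
newer = {**base, "sub_agents": [{"harness": "opencode-native"}]}
```

The same body validates under each accepted backend, while a definition written against a newer allow-list fails on the older validator. That is the practical argument for pinning a version.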
Full detail in security/scan_report.md. Bottom line: safe to run locally today; harden the deps + sandboxing before exposing a shared server.
During our ~40-minute study, main advanced from PR #1080 → #1150.
And PR #1150 was literally:
fix(polly): drop the opencode sub-agent to stay loadable on older clients
— i.e. the maintainers shipped a fix for the exact main↔release skew our Experiment 3 surfaced, within the hour. Great for confidence in the team; a clear signal to pin a version and treat the README as ahead of the release.
Each rating reflects the strength of the evidence-backed recommendation. "Use now" items are the ones our experiments directly validated.
If you run more than one coding agent and want shared spend caps, approvals, and cross-vendor review — start today, locally.
| Dimension | Rating |
|---|---|
| Governance engine | ★★★★★ proven |
| Harness abstraction | ★★★★☆ clean |
| Security posture | ★★★★☆ solid |
| Trust / provenance | ★★★★★ Databricks |
| Maturity / stability | ★★☆☆☆ alpha |
| Footprint / friction | ★★★☆☆ heavy |
Full paper, experiments & raw logs in this repo's omnigent/ folder.