The Harness Above the Harnesses

Two weeks old, 568,000 lines of code, and the pedigree of Spark and MLflow behind it: Omnigent is what happens when Databricks decides the interesting layer isn’t another coding agent, but the thing that governs all of them. It abstracts agent “harnesses” — Claude Code, Codex, Cursor, Pi — behind a single declarative YAML spec, wraps every action in a policy engine, and makes sessions portable across laptop, browser, and phone.

The Premise

The claim is bold category creation: a meta-harness. Your agent spec contains zero harness-specific code; one field swaps the backend. Above it sits a governance layer that gates every action with one of three decisions (ALLOW, DENY, or ASK) and, crucially, that agents cannot weaken from the inside. Whether that claim survives contact with the code is exactly what the study tested.
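To make the "one field swaps the backend" idea concrete, here is a minimal sketch in Python rather than YAML. The field names (name, harness, instructions, tools) and the harness identifiers are illustrative assumptions, not Omnigent's actual schema.

```python
from dataclasses import dataclass, replace


# Hypothetical spec shape: field names are illustrative, not Omnigent's schema.
@dataclass(frozen=True)
class AgentSpec:
    name: str
    harness: str              # the single backend selector
    instructions: str
    tools: tuple[str, ...]


spec = AgentSpec(
    name="reviewer",
    harness="claude-code",    # assumed identifier
    instructions="Review the diff and flag risky changes.",
    tools=("read_file", "run_tests"),
)

# Swapping the backend touches one field; everything else carries over unchanged.
swapped = replace(spec, harness="codex")
```

The point of the sketch is the shape, not the names: nothing in the spec besides the selector would need to know which vendor runs it.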

The main branch advanced 70 PR numbers during the 40-minute study.

The Machine

Four cooperating layers. Clients (CLI, mobile-first web UI, native macOS app) attach to one session. A FastAPI server over Postgres streams events over SSE and WebSockets; it exposes 58 API paths and real auth with argon2id, JWT, OIDC, and three-level RBAC. The policy engine registers twenty handlers across six phases and composes their results so that the stricter decision wins, with DENY short-circuiting evaluation. And the meta-harness itself runs nineteen executor modules, either in-process via vendor SDKs or by driving the real vendor CLI through terminal emulation inside an OS sandbox, with a secretless credential proxy so that child agents only ever see synthetic placeholders.
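Stricter-wins composition with a DENY short-circuit can be sketched in a few lines. This is an illustration of the technique, not Omnigent's engine: the handler signature, the two example handlers, and the choice to return ALLOW when no handler objects are all assumptions.

```python
from enum import IntEnum
from typing import Callable, Iterable


class Decision(IntEnum):
    # Ordered by strictness so that max() picks the stricter decision.
    ALLOW = 0
    ASK = 1
    DENY = 2


Handler = Callable[[dict], Decision]


def evaluate(handlers: Iterable[Handler], action: dict) -> Decision:
    """Combine handler decisions: stricter wins, DENY stops evaluation."""
    result = Decision.ALLOW  # assumption: no objection means ALLOW
    for handler in handlers:
        decision = handler(action)
        if decision is Decision.DENY:
            return Decision.DENY  # short-circuit: later handlers never run
        result = max(result, decision)
    return result


# Hypothetical handlers, for illustration only.
def ask_on_shell(action: dict) -> Decision:
    return Decision.ASK if action.get("tool") == "shell" else Decision.ALLOW


def deny_destructive(action: dict) -> Decision:
    return Decision.DENY if "rm -rf" in action.get("command", "") else Decision.ALLOW


risky = {"tool": "shell", "command": "rm -rf /tmp/build"}
print(evaluate([ask_on_shell, deny_destructive], risky).name)
print(evaluate([deny_destructive, ask_on_shell], risky).name)
```

Because the result depends only on the strictest decision, declaration order does not change the outcome; it only changes how early evaluation stops.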

The Test Drive

Five deterministic, model-free experiments. Install: six seconds with uv, at the price of a 458 MB virtualenv — 225 MB of which is the bundled Claude Agent SDK. The policy decision suite went 15 for 15, including failing closed on an unknown model. Engine composition went 4 for 4, confirming that a DENY beats an ASK even when the ASK is declared first — the strongest evidence for the governance thesis. The one blemish: version skew. The flagship examples on main fail validation against the stable 0.2.0 release, because the README advertises harnesses the shipped validator rejects. Tellingly, a fix for the exact skew the study surfaced landed upstream within the hour.
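"Failing closed on an unknown model" is worth spelling out, since it is the property that makes an allowlist trustworthy. A minimal sketch, assuming a simple lookup table; the model names and the function name are hypothetical and do not come from Omnigent's test suite.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical rule table; the model identifiers are placeholders.
MODEL_RULES = {
    "model-a": Decision.ALLOW,
    "model-b": Decision.ASK,
}


def decide_model(model: str) -> Decision:
    # Fail closed: anything the table does not recognise is denied,
    # rather than falling through to a permissive default.
    return MODEL_RULES.get(model, Decision.DENY)


print(decide_model("model-a").name)
print(decide_model("never-heard-of-it").name)
```

The design choice is the default argument: a permissive fallback would let a typo or a newly released model bypass the policy without anyone noticing.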

The Fine Print

The scan found no critical issues and no backdoors — and some genuinely strong engineering: a fail-loud sandbox that raises rather than silently degrading, the secretless credential proxy, and no third-party analytics at all. Two things to watch: a starlette dependency pin that blocks the fix for six known vulnerabilities in the server path, and the fact that sandboxing is opt-in per spec — the flagship orchestrator runs with sandbox: none, isolated instead by git worktrees and a blast-radius policy.
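The difference between a fail-loud sandbox and one that silently degrades is easiest to see in code. The sketch below assumes a sandbox backend that is located as a binary on the PATH. The mode name "strict", the exception name, and the function are hypothetical; only the explicit opt-out value "none" appears in the source.

```python
import shutil


class SandboxUnavailable(RuntimeError):
    """Raised when a requested sandbox backend cannot be found."""


def resolve_sandbox(mode: str, backend_binary: str) -> str:
    """Return the sandbox backend path, or 'none' for an explicit opt-out."""
    if mode == "none":
        # Explicit opt-out: the spec author chose to run unsandboxed.
        return "none"
    path = shutil.which(backend_binary)
    if path is None:
        # Fail loud: refuse to run rather than quietly dropping isolation.
        raise SandboxUnavailable(
            f"sandbox mode {mode!r} requested but {backend_binary!r} was not found"
        )
    return path
```

Running unsandboxed is then always a visible decision in the spec, never a side effect of a missing dependency.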

Agents cannot silently weaken their own governance.

The Verdict

“Meta-harness” is partly marketing, but it is marketing backed by real engineering. The orchestration and policy layers are adoptable today; the cross-device server story deserves a pilot, not a standard. The dominant risk is not safety — it is that the project is moving at multiple PRs per hour, and what you adopt on Monday may be renamed by Friday.