The Workflow That Compounds

Twenty-one thousand stars in nine months is usually a sign of a good demo. What Every Inc. has instead is a good habit — packaged, versioned, and shipped as software. The compound-engineering plugin is the most-starred concrete implementation of an idea that inverts how engineering teams think about debt: every unit of work should make the next one cheaper.

The Premise

The repo is really two products. The first is a methodology plugin: 39 skills and 43 agent personas that encode Every’s internal loop of strategize, ideate, plan, execute, review, and then compound. To compound is to write what was learned into a reusable artifact so that neither humans nor future agents have to relearn it. The house arithmetic is 80% planning and review, 20% execution. The second product is a roughly 8,900-line TypeScript CLI that converts the whole plugin for ten-plus agent platforms from a single source, which solves the second-order problem of a fragmenting coding-agent landscape.

Each unit of engineering work should make subsequent units easier.

The Machine

The converter is a textbook transpiler: parse the Claude-native plugin into a typed intermediate representation, convert through per-target semantic mappers, write with per-target writers. Crucially, the mappings are explicit lookup tables, not naming conventions — tools, permissions, hooks, and model aliases each get deliberate translations. Installs are non-destructive: removed artifacts are moved to timestamped backups, and existing configs are deep-merged rather than overwritten.
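Two of the ideas in that paragraph, explicit lookup tables and deep-merged configs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the plugin's actual code: the target names, the table entries, and the functions mapTool and deepMerge are all hypothetical.

```typescript
// Hypothetical targets and table entries, invented for illustration only.
type Target = "targetA" | "targetB";

// Explicit lookup table: every source tool name gets a deliberate translation.
const TOOL_MAP: Record<Target, Record<string, string>> = {
  targetA: { Read: "read", Bash: "bash" },
  targetB: { Read: "read_file", Bash: "shell" },
};

// Translate one tool name, failing loudly instead of guessing by convention.
function mapTool(target: Target, tool: string): string {
  const table = TOOL_MAP[target];
  if (!Object.prototype.hasOwnProperty.call(table, tool)) {
    throw new Error(`No explicit mapping for tool "${tool}" on ${target}`);
  }
  return table[tool];
}

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Deep merge: nested objects are merged key by key, so settings the user
// already has survive; scalars and arrays from `incoming` replace old values.
function deepMerge(existing: JsonObject, incoming: JsonObject): JsonObject {
  const result: JsonObject = { ...existing };
  for (const key of Object.keys(incoming)) {
    const current = result[key];
    const next = incoming[key];
    result[key] = isObject(current) && isObject(next) ? deepMerge(current, next) : next;
  }
  return result;
}
```

The design point is the thrown error in mapTool: a table with no entry stops the conversion, whereas a naming convention would silently emit a tool name the target may not understand.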

The reusable prize is ce-code-review: fourteen reviewer personas selected by judgment of the diff rather than keyword matching, discrete confidence anchors with a default gate that suppresses findings below 75, autofix classes that separate what may be applied automatically from what needs a human, and evidence dossiers with progressive disclosure. Around it runs the compounding memory loop — solved problems become YAML-frontmatter “Learnings,” retrieved grep-first inside future reviews. The repo ships 31 of its own.
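The gate and the autofix split described above amount to a small triage step. The sketch below assumes a five-point anchor scale (0, 25, 50, 75, 100) and two autofix labels. Only the threshold of 75 comes from the review, and every type and function name here is hypothetical.

```typescript
// Assumed labels; the review says only that auto-applicable fixes are
// separated from those needing a human.
type AutofixClass = "safe_auto" | "needs_human";

interface Finding {
  persona: string;
  message: string;
  confidence: 0 | 25 | 50 | 75 | 100; // assumed discrete anchor values
  autofix: AutofixClass;
}

interface Triage {
  autoApply: Finding[];
  forHuman: Finding[];
  suppressed: number;
}

const DEFAULT_GATE = 75; // findings below this are suppressed

function triage(findings: Finding[], gate: number = DEFAULT_GATE): Triage {
  const kept = findings.filter((f) => f.confidence >= gate);
  return {
    autoApply: kept.filter((f) => f.autofix === "safe_auto"),
    forHuman: kept.filter((f) => f.autofix === "needs_human"),
    suppressed: findings.length - kept.length,
  };
}
```

Discrete anchors make the gate predictable: a finding is either at or above 75 or it is not, with no debate over whether 0.74 rounds up.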

The Test Drive

The study ran the full test suite: 1,669 of 1,678 tests pass in 14 seconds. All nine failures come from tests that need live GitHub access, which the sandbox blocks; none points to a code defect. A real multi-target conversion of the live plugin succeeded against OpenCode, Codex, and Gemini, with one skill correctly excluded by platform filtering. The sharpest probe: a reviewer persona declared as model: inherit converts to OpenCode with an inferred temperature: 0.1, which is genuine per-target semantic remapping rather than a passthrough copy.
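To make the difference between remapping and passthrough concrete, here is a hedged sketch of what such a mapper could look like. The rule that produces 0.1 is not described in this review, so the name-based heuristic below is purely an assumption, as are the type names, the function name, and the example persona name.

```typescript
interface SourceAgent {
  name: string;
  model: string; // "inherit" means "use whatever the host session uses"
}

interface TargetAgent {
  name: string;
  model?: string;
  temperature?: number;
}

// Assumed heuristic: reviewer-style personas get a low temperature.
function toTargetAgent(agent: SourceAgent): TargetAgent {
  const out: TargetAgent = { name: agent.name };
  // "inherit" has no literal equivalent here, so the field is omitted
  // rather than copied through verbatim.
  if (agent.model !== "inherit") {
    out.model = agent.model;
  }
  if (/review/i.test(agent.name)) {
    out.temperature = 0.1;
  }
  return out;
}
```

A passthrough copy would have emitted model: inherit unchanged and no temperature at all. The remapped output carries a field the source never declared.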

The Fine Print

The security scan came back about as clean as scans come: two runtime dependencies, no dynamic code execution, no runtime network calls beyond git, path-traversal hardening enforced by CI tests. The one thing worth watching is which global config roots get written when converting to all targets at once.
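For readers unfamiliar with path-traversal hardening, the core check is small. The following is a minimal sketch using plain string handling. The function name resolveInside is hypothetical, and the plugin's real guard may work differently (for instance, this version does not consider symlinks).

```typescript
// Join `relative` onto `root`, rejecting any path that would land outside it.
function resolveInside(root: string, relative: string): string {
  // Reject absolute paths, both POSIX style and Windows drive style.
  if (/^([\\/]|[A-Za-z]:[\\/])/.test(relative)) {
    throw new Error(`Absolute path not allowed: ${relative}`);
  }
  const parts: string[] = [];
  for (const segment of relative.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      // Popping past the top would escape the root.
      if (parts.length === 0) {
        throw new Error(`Path escapes root: ${relative}`);
      }
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return [root.replace(/\/+$/, "")].concat(parts).join("/");
}
```

A CI test for a guard like this typically feeds it inputs such as ../etc/passwd and asserts that it throws, which is the kind of enforcement the scan describes.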

The evidence is testimony, not measurement — the single biggest gap.

The Verdict

What’s missing is proof that the compounding actually compounds: the efficacy evidence is testimony from its authors, not measurement. That is the caveat behind our rating — and the only one. The codebase is production-grade, dependency-light, and defensively engineered, and its discipline of treating prompts as compiled, tested, versioned source code is more instructive than most agent frameworks carrying ten times the code.