CRAFT Nº 003
The Workflow That Compounds
Every Inc.'s compound-engineering plugin bets that prompts deserve the same rigor as code — compiled, tested, versioned, and shipped to ten agent platforms from one source.
Twenty-one thousand stars in nine months is usually a sign of a good demo. What Every Inc. has instead is a good habit — packaged, versioned, and shipped as software. The compound-engineering plugin is the most-starred concrete implementation of an idea that inverts how engineering teams think about debt: every unit of work should make the next one cheaper.
The Premise
The repo is really two products. The first is a methodology plugin: 39 skills and 43 agent personas that encode Every’s internal loop of strategize, ideate, plan, execute, review, then compound. To compound is to write what was learned into a reusable artifact so that neither humans nor future agents have to relearn it. The house arithmetic is 80% planning and review, 20% execution. The second product is a ~8,900-line TypeScript CLI that converts the whole plugin for ten-plus agent platforms from a single source, which addresses the second-order problem of a fragmenting coding-agent landscape.
Each unit of engineering work should make subsequent units easier.
The Machine
The converter is a textbook transpiler: parse the Claude-native plugin into a typed intermediate representation, convert through per-target semantic mappers, write with per-target writers. Crucially, the mappings are explicit lookup tables, not naming conventions — tools, permissions, hooks, and model aliases each get deliberate translations. Installs are non-destructive: removed artifacts are moved to timestamped backups, and existing configs are deep-merged rather than overwritten.
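The two ideas in that paragraph, explicit translation tables and merge-not-overwrite installs, can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the plugin's actual code: the table entries, the target tool names, and the `mapTool` and `deepMerge` helpers are all hypothetical.

```typescript
// Hypothetical sketch: names and table entries are illustrative assumptions,
// not taken from the compound-engineering source.
type Target = "opencode" | "codex";

// Explicit lookup table: each source tool gets a deliberate per-target translation.
const TOOL_MAP: Record<Target, Record<string, string>> = {
  opencode: { Read: "read", Bash: "bash" },
  codex: { Read: "read_file", Bash: "shell" },
};

// Fail loudly on an unmapped tool instead of guessing from a naming convention.
function mapTool(target: Target, tool: string): string {
  const mapped = TOOL_MAP[target][tool];
  if (mapped === undefined) {
    throw new Error(`No mapping for tool "${tool}" on target "${target}"`);
  }
  return mapped;
}

type Json = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Deep-merge incoming config into existing config: nested objects are merged
// key by key; scalars and arrays from `incoming` replace the existing value.
function deepMerge(existing: Json, incoming: Json): Json {
  const result: Json = { ...existing };
  for (const [key, value] of Object.entries(incoming)) {
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value)
        ? deepMerge(current, value)
        : value;
  }
  return result;
}
```

The point of the table is that a missing entry is an error rather than a silent passthrough, and the point of the merge is that keys the converter does not own survive an install untouched.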
The reusable prize is ce-code-review: fourteen reviewer personas selected by judgment of the
diff rather than keyword matching, discrete confidence anchors with a default gate that
suppresses findings below 75, autofix classes that separate what may be applied automatically
from what needs a human, and evidence dossiers with progressive disclosure. Around it runs the
compounding memory loop — solved problems become YAML-frontmatter “Learnings,” retrieved
grep-first inside future reviews. The repo ships 31 of its own.
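A confidence gate plus autofix classes amounts to a filter followed by a partition. The sketch below assumes a particular set of anchor values and two class names purely for illustration; only the default gate of 75 comes from the description above, and `Finding`, `triage`, and the field names are hypothetical.

```typescript
// Hypothetical sketch: anchor values, class names, and field names are
// assumptions for illustration. Only the default gate of 75 is from the text.
type Confidence = 0 | 25 | 50 | 75 | 100; // assumed discrete anchors
type AutofixClass = "auto" | "manual"; // assumed class names

interface Finding {
  reviewer: string;
  message: string;
  confidence: Confidence;
  autofix: AutofixClass;
}

const DEFAULT_GATE = 75;

// Suppress findings below the gate, then split the survivors by who may apply the fix.
function triage(findings: Finding[], gate: number = DEFAULT_GATE) {
  const kept = findings.filter((f) => f.confidence >= gate);
  return {
    applyAutomatically: kept.filter((f) => f.autofix === "auto"),
    needsHuman: kept.filter((f) => f.autofix === "manual"),
    suppressed: findings.length - kept.length,
  };
}
```

Discrete anchors keep reviewers from emitting arbitrary scores like 73 or 77, so a gate at 75 cleanly separates the anchors on either side of it.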
The Test Drive
The study ran the full test suite: 1,669 of 1,678 tests passed in 14 seconds. The nine failures
all need live GitHub access, which the sandbox blocks; none is a code defect. A real multi-target
conversion of the live plugin succeeded against OpenCode, Codex, and Gemini, with one skill
correctly excluded by platform filtering. The sharpest probe: a reviewer persona declared as
model: inherit converts to OpenCode with an inferred temperature: 0.1 — genuine per-target
semantic remapping, not a passthrough copy.
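To make the difference between remapping and passthrough concrete, here is one way such a per-target mapper could look. The text reports only the observed result (a reviewer persona with model: inherit gaining temperature: 0.1 on OpenCode); how the real converter infers it is not stated, so the role heuristic, the fallback value, the omission of the model field, and every name below are assumptions.

```typescript
// Hypothetical sketch: field names, the role heuristic, and the fallback
// temperature are assumptions. Only the reviewer result (model: inherit in,
// temperature: 0.1 out) is reported in the text.
interface SourceAgent {
  name: string;
  description: string;
  model: string; // e.g. "inherit"
}

interface OpenCodeAgent {
  name: string;
  model?: string;
  temperature: number;
}

function inferTemperature(agent: SourceAgent): number {
  // Assumed heuristic: review-style personas get a low temperature.
  const text = `${agent.name} ${agent.description}`;
  return /review/i.test(text) ? 0.1 : 0.3;
}

function toOpenCode(agent: SourceAgent): OpenCodeAgent {
  const out: OpenCodeAgent = {
    name: agent.name,
    temperature: inferTemperature(agent),
  };
  // Assumption: "inherit" has no literal equivalent, so the model field is omitted.
  if (agent.model !== "inherit") {
    out.model = agent.model;
  }
  return out;
}
```

A passthrough copy would emit `model: "inherit"` verbatim and no temperature at all; a semantic mapper decides what the source setting means on the target and writes that instead.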
The Fine Print
The security scan came back about as clean as scans come: two runtime dependencies, no dynamic code execution, no runtime network calls beyond git, and path-traversal hardening enforced by CI tests. The one thing worth watching is which global config roots get written when converting to all targets at once.
The evidence is testimony, not measurement — the single biggest gap.
The Verdict
What’s missing is proof that the compounding actually compounds: the efficacy evidence is testimony from its authors, not measurement. That is the caveat behind our rating — and the only one. The codebase is production-grade, dependency-light, and defensively engineered, and its discipline of treating prompts as compiled, tested, versioned source code is more instructive than most agent frameworks carrying ten times the code.