Research Deck · v3.12.0
Technical Deep Dive · EveryInc

Compound Engineering Plugin

A cross-platform AI coding-agent plugin — 39 skills & 43 agents implementing a "compound engineering" workflow — shipped with a TypeScript converter CLI that retargets it to 10+ agent platforms from one source.

21.2k GitHub stars · 1,557 forks · MIT license · v3.12.0 released
github.com/EveryInc/compound-engineering-plugin  ·  analyzed 2026-06-13
Repository Profile

Who built it, and what's inside

Field | Details
Creators | Every Inc. (every.to); maintained by Kieran Klaassen (@kieranklaassen) & Trevin Chow (@tmchow, ~60% of commits). Productizes Every's internal "how we code with agents" methodology.
Language | TypeScript on the Bun runtime (CLI). Product content is Markdown + YAML frontmatter; co-located Python/Bash skill scripts.
Stats | 21,169 ★ · 1,557 forks · 95 open issues · created Oct 2025 · actively pushed (daily).
Lines of code | CLI: ~8,900 LOC TypeScript across ~46 src/ files + ~55 test files (1,678 tests). Plugin: 39 skill dirs, 43 agent definitions, ~190 reference/asset files.
License | MIT: permissive, commercial-friendly, no copyleft obligations.
Security | Low risk. 2 runtime deps · no eval/secrets · hardened path & symlink handling. (full report → slide 11)
Sources: GitHub API · git shortlog · package.json · cloc-equivalent line counts
What it actually is

One repo, two products

The name says "plugin," but the repository ships two tightly-coupled artifacts. Understanding the split is the key to the whole codebase.

PRODUCT The methodology plugin

plugins/compound-engineering/ — 39 user-invoked Skills (slash commands) and 43 dispatched Agents (subagents) that encode an opinionated engineering loop: plan deeply, review rigorously, and codify every lesson so the next task is easier.

INFRA The converter CLI

src/ — a Bun/TypeScript compiler that parses the Claude-format plugin once and emits native bundles for Codex, OpenCode, Gemini, Pi, Kiro, Copilot, Droid & Qwen. Authored once, runs everywhere.

Thesis  The plugin is the payload; the CLI is the distribution layer that frees the payload from any single vendor.

Architecture · the CLI

A three-stage Parse → Convert → Write compiler

Like a transpiler: one parser, an intermediate representation (the Bundle), and a writer per target. Adding a platform = one converter + one writer, nothing else touched.

Stage 1 · Parse: parsers/claude.ts reads the manifest, agents, skills, hooks & MCP into a typed ClaudePlugin.
Stage 2 · Convert: converters/claude-to-* map tools, models, hooks & permissions into a per-target Bundle (in-memory IR).
Stage 3 · Write: targets/*.ts emit each Bundle to the platform's real paths with merge semantics.
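
The three stages can be sketched in a few lines. This is a minimal illustration rather than the repository's code: the `ClaudePlugin` and `Bundle` shapes, the `Target` interface, the `demo` target and the `run` helper are all assumptions made for the example, and the write stage only reports paths instead of touching the disk.

```typescript
// Hypothetical shapes: the real ClaudePlugin and Bundle types are not shown in the deck.
type ClaudeAgent = { name: string; body: string };
type ClaudePlugin = { name: string; agents: ClaudeAgent[] };
type Bundle = { files: Record<string, string> }; // relative path -> contents (the IR)

interface Target {
  convert(plugin: ClaudePlugin): Bundle;           // stage 2
  write(bundle: Bundle, outDir: string): string[]; // stage 3 (dry run: returns paths)
}

// Stage 1: validate raw input into a typed plugin. A real parser would read files.
function parse(raw: { name?: string; agents?: ClaudeAgent[] }): ClaudePlugin {
  if (!raw.name) throw new Error("manifest is missing a name");
  return { name: raw.name, agents: raw.agents ?? [] };
}

// A toy target that emits one Markdown file per agent.
const markdownTarget: Target = {
  convert(plugin) {
    const files: Record<string, string> = {};
    for (const agent of plugin.agents) {
      files[`agents/${agent.name}.md`] = agent.body;
    }
    return { files };
  },
  write(bundle, outDir) {
    return Object.keys(bundle.files).map((rel) => `${outDir}/${rel}`);
  },
};

// Registry keyed by target name; adding a platform means adding one entry here.
const targets = new Map<string, Target>([["demo", markdownTarget]]);

function run(raw: { name?: string; agents?: ClaudeAgent[] }, to: string, outDir: string): string[] {
  const target = targets.get(to);
  if (target === undefined) throw new Error(`unknown target: ${to}`);
  return target.write(target.convert(parse(raw)), outDir);
}
```

Because the parser never learns about targets and targets never re-read the source, each stage can be tested in isolation.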

Explicit, not magic

Tools, permissions, hook events & model names are mapped by lookup tables — never by convention. targets/index.ts is the registry that drives --to and --also.
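
A lookup-table mapping of this kind might look like the sketch below. The table contents are placeholders (the right-hand tool names do not belong to any real platform), and `mapTools` is a hypothetical helper. The point is only that unknown names are reported rather than guessed.

```typescript
// Hypothetical table: right-hand names are placeholders, not a real platform's tools.
// `null` records an explicit decision that the target has no equivalent.
const TOOL_MAP = new Map<string, string | null>([
  ["Read", "read_file"],
  ["Grep", "search"],
  ["Bash", null],
]);

type MapResult = { mapped: string[]; dropped: string[]; unknown: string[] };

function mapTools(tools: string[]): MapResult {
  const result: MapResult = { mapped: [], dropped: [], unknown: [] };
  for (const tool of tools) {
    const target = TOOL_MAP.get(tool);
    if (target === undefined) {
      result.unknown.push(tool); // not in the table: surface it, never derive a name
    } else if (target === null) {
      result.dropped.push(tool); // in the table, deliberately unsupported
    } else {
      result.mapped.push(target);
    }
  }
  return result;
}
```

Distinguishing "dropped" from "unknown" lets a converter warn about the first and fail on the second.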

Non-destructive

Installs track an install-manifest; removed files move to legacy-backup/<ts>/. opencode.json is deep-merged with a .bak, never clobbered.
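
A non-clobbering deep merge can be sketched as follows. The conflict policy here (the user's existing value wins, arrays are not merged) is an assumption for illustration; the deck does not state the CLI's actual policy, and writing the .bak copy is left out.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `incoming` into `existing` without mutating either argument.
// Assumed policy: nested objects merge key by key; on any other conflict
// the value already present in the user's config is kept.
function deepMerge(existing: Json, incoming: Json): Json {
  if (!isObject(existing) || !isObject(incoming)) return existing;
  const out: { [key: string]: Json } = { ...existing };
  for (const key of Object.keys(incoming)) {
    const next = incoming[key];
    if (next === undefined) continue;
    const current = existing[key];
    out[key] = current === undefined ? next : deepMerge(current, next);
  }
  return out;
}
```

With this policy, a plugin can add its own entries under a shared key while any setting the user already chose stays untouched.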

Safe by default

ASCII path sanitization, traversal guards, and symlink-ownership checks (only unlink CE-managed links) are enforced by dedicated test suites.
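
Two of these guards can be approximated in plain TypeScript. The exact rules below are assumptions, not the repository's: `sanitizeSegment` reduces a name to a conservative ASCII slug, and `resolveInside` rejects POSIX-style relative paths that would escape a root. Symlink-ownership checks need filesystem access and are not shown.

```typescript
// Reduce an arbitrary name to a conservative ASCII slug (hypothetical rule set).
function sanitizeSegment(name: string): string {
  const slug = name
    .normalize("NFKD")                  // split accented letters into base + mark
    .replace(/[^\x20-\x7E]/g, "")       // drop everything outside printable ASCII
    .replace(/[^A-Za-z0-9._-]+/g, "-")  // collapse other characters into a dash
    .replace(/^[-.]+|[-.]+$/g, "");     // no leading or trailing dots/dashes
  if (slug === "") throw new Error(`cannot sanitize: ${JSON.stringify(name)}`);
  return slug;
}

// Join a relative path onto a root, refusing absolute paths and ".." escapes.
function resolveInside(root: string, relative: string): string {
  if (relative.startsWith("/")) throw new Error("absolute paths are not allowed");
  const parts: string[] = [];
  for (const part of relative.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (parts.length === 0) throw new Error(`path escapes root: ${relative}`);
      parts.pop();
    } else {
      parts.push(part);
    }
  }
  return [root.replace(/\/+$/, ""), ...parts].join("/");
}
```

For example, `resolveInside("/out", "skills/../agents/a.md")` yields `/out/agents/a.md`, while `resolveInside("/out", "../secrets")` throws.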

Architecture · the methodology

The compounding loop

"80% planning & review, 20% execution." Each skill hands a durable artifact to the next; the cycle closes by writing the lesson back into the repo.

/ce-strategy (STRATEGY.md) → /ce-ideate → /ce-brainstorm (requirements) → /ce-plan (plan doc) → /ce-work (code) → /ce-code-review → /ce-compound (learning) ↩

WHAT vs HOW vs DO

brainstorm defines what to build → plan defines how → work executes. Each reads the prior artifact; none is required, but each sharpens the next.

The read-side companion

/ce-product-pulse reports what users actually experienced over a window, feeding real signal back into the next strategy and brainstorm.

Supporting skills: ce-debug · ce-doc-review · ce-simplify-code · ce-commit · ce-resolve-pr-feedback · ce-setup · ce-dhh-rails-style · ce-frontend-design · ce-gemini-imagegen · …

The interesting engineering

How a skill orchestrates a panel of agents

/ce-code-review is the showcase: it spawns parallel reviewer subagents that return structured JSON, then merges, deduplicates & gates their findings. Six reusable primitives make it work:

🎭 Reviewer personas

14 single-lens reviewers. 4 always-on (correctness, testing, maintainability, standards) + conditionals (security, performance, migrations…) selected by agent judgment of the diff, not keyword match.

📊 Confidence anchors

Findings self-score on a discrete 0/25/50/75/100 scale, each tied to a behavioral test. Default gate suppresses <75 — killing "confident false positives."
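
The merge, dedup and gate step over anchored scores could be sketched like this. The `Finding` shape and the duplicate key (same file, line and case-insensitive title) are assumptions; only the 0/25/50/75/100 scale and the default threshold of 75 come from the deck.

```typescript
type Confidence = 0 | 25 | 50 | 75 | 100;

// Hypothetical finding shape; the plugin's real JSON schema is not shown in the deck.
type Finding = {
  reviewer: string;
  file: string;
  line: number;
  title: string;
  confidence: Confidence;
};

// Collapse duplicates across reviewers (keeping the most confident copy),
// then suppress everything below the threshold.
function mergeAndGate(batches: Finding[][], threshold: Confidence = 75): Finding[] {
  const best = new Map<string, Finding>();
  for (const batch of batches) {
    for (const finding of batch) {
      // Assumed identity rule: same file, line and case-insensitive title.
      const key = `${finding.file}:${finding.line}:${finding.title.toLowerCase()}`;
      const seen = best.get(key);
      if (seen === undefined || finding.confidence > seen.confidence) {
        best.set(key, finding);
      }
    }
  }
  return Array.from(best.values()).filter((f) => f.confidence >= threshold);
}
```

A discrete scale keeps the gate a plain comparison: a reviewer either reached the 75 anchor or did not.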

🔧 Autofix classes

Every finding is tagged gated_auto / manual / advisory — encoding how safely a fix may be applied before any code is touched.

🎚️ Model tiers

Semantic tiers — extraction / generation / ceiling — named per agent so model IDs never hardcode. High-stakes personas inherit the ceiling model; scouts run cheap.
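
One way to keep model IDs out of agent definitions is a two-step lookup, sketched below. The tier names come from the deck; the agent names, the fallback to `generation`, and the model IDs in the usage note are placeholders invented for the example.

```typescript
type Tier = "extraction" | "generation" | "ceiling";

// Agents declare a tier, never a model ID. These agent names are hypothetical.
const AGENT_TIERS = new Map<string, Tier>([
  ["example-scout", "extraction"],
  ["example-writer", "generation"],
  ["example-security-reviewer", "ceiling"],
]);

// A single binding per environment maps tiers to concrete model IDs.
type TierBinding = Record<Tier, string>;

function resolveModel(agent: string, binding: TierBinding): string {
  // Assumed fallback: agents without a declared tier use the middle tier.
  const tier = AGENT_TIERS.get(agent) ?? "generation";
  return binding[tier];
}
```

Calling `resolveModel("example-scout", { extraction: "small-model", generation: "mid-model", ceiling: "large-model" })` returns `"small-model"`. Swapping models then means editing one binding, not every agent.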

📁 Evidence dossiers

Bulky JSON is written to /tmp/…/run-id/; the orchestrator carries only a compact gist, loading detail from disk only when validating. Scales to many agents without context blowup.
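
The gist-plus-dossier split can be illustrated with an in-memory stand-in for the disk. Everything named here (`Dossier`, `Gist`, `DossierStore`, `MemoryStore`, `fileDossier`, `loadDossier`) is hypothetical; a real implementation would write to the run directory instead of a Map.

```typescript
type Dossier = {
  agent: string;
  findings: { title: string; confidence: number; evidence: string }[];
};

// The only thing the orchestrator keeps in context.
type Gist = { agent: string; dossierPath: string; findingCount: number; topTitle: string | null };

// Storage interface so the sketch runs without a real filesystem.
interface DossierStore {
  write(path: string, body: string): void;
  read(path: string): string;
}

class MemoryStore implements DossierStore {
  private files = new Map<string, string>();
  write(path: string, body: string): void {
    this.files.set(path, body);
  }
  read(path: string): string {
    const body = this.files.get(path);
    if (body === undefined) throw new Error(`no dossier at ${path}`);
    return body;
  }
}

// Persist the bulky payload and return a compact summary pointing at it.
function fileDossier(store: DossierStore, runDir: string, dossier: Dossier): Gist {
  const dossierPath = `${runDir}/${dossier.agent}.json`;
  store.write(dossierPath, JSON.stringify(dossier));
  const top = dossier.findings.slice().sort((a, b) => b.confidence - a.confidence)[0];
  return {
    agent: dossier.agent,
    dossierPath,
    findingCount: dossier.findings.length,
    topTitle: top ? top.title : null,
  };
}

// Load full detail only when a finding actually needs validating.
function loadDossier(store: DossierStore, gist: Gist): Dossier {
  return JSON.parse(store.read(gist.dossierPath)) as Dossier;
}
```

The orchestrator's context then grows with the number of agents, not with the size of their evidence.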

🧩 Load stubs

Conditional logic lives in references/ and is pulled in on demand, keeping SKILL.md lean at session-load time — progressive disclosure for prompts.

The core idea, mechanized

Knowledge that compounds

The differentiator isn't the agents — it's the memory loop. /ce-compound turns every solved problem into a queryable Learning in docs/solutions/ with structured YAML frontmatter.

  • Two tracks: bug (build_error, test_failure, security_issue…) and knowledge (best_practice, convention, architecture_pattern…).
  • Grep-first retrieval: ce-learnings-researcher filters by frontmatter, then reads only strong matches — scales to hundreds of docs.
  • Self-maintaining: /ce-compound-refresh classifies stale docs as Keep / Update / Consolidate / Replace / Delete against the live code.
  • Closed loop: reviews auto-spawn ce-learnings-researcher, so past lessons resurface inside future work.
    # docs/solutions/workflow/…md
    title: Plugin Versioning Requirements
    category: workflow
    problem_type: workflow_issue
    module: plugin-development
    component: documentation
    severity: process
    tags: [versioning, changelog, readme]
    date: 2026-03-17
    ---
    # Plugin Versioning and Documentation…

31 learnings already committed in-repo: the plugin dogfoods its own mechanism.
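
Frontmatter-first filtering can be sketched as below. This is not the agent's actual procedure: `parseFrontmatter` is a deliberately tiny parser that handles only flat `key: value` pairs and inline `[a, b]` lists, it assumes the block is delimited by `---` lines at the top of the file, and the document paths used with it are made up.

```typescript
type Frontmatter = Record<string, string | string[]>;

// Not a full YAML parser: flat `key: value` pairs and inline `[a, b]` lists only.
function parseFrontmatter(doc: string): Frontmatter {
  const match = /^---\n([\s\S]*?)\n---/.exec(doc);
  const meta: Frontmatter = {};
  if (match === null) return meta;
  for (const line of (match[1] ?? "").split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    meta[key] =
      raw.startsWith("[") && raw.endsWith("]")
        ? raw.slice(1, -1).split(",").map((s) => s.trim()).filter((s) => s !== "")
        : raw;
  }
  return meta;
}

type Doc = { path: string; text: string };
type Query = { problemType?: string; tags?: string[] };

// Return only the paths whose metadata matches; bodies are never inspected.
function findLearnings(docs: Doc[], query: Query): string[] {
  return docs
    .filter((doc) => {
      const meta = parseFrontmatter(doc.text);
      if (query.problemType !== undefined && meta["problem_type"] !== query.problemType) {
        return false;
      }
      const tags = meta["tags"];
      const tagList = Array.isArray(tags) ? tags : [];
      return (query.tags ?? []).every((t) => tagList.indexOf(t) !== -1);
    })
    .map((doc) => doc.path);
}
```

Only the returned paths would then be read in full, which keeps the cost of a lookup roughly proportional to the number of strong matches rather than the size of the store.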
Write once, run everywhere

One source → 10+ agent platforms

Conversion is genuine semantic remapping, not file copying. Capabilities degrade gracefully where a target lacks a primitive (e.g. among the converted targets, hooks survive only on OpenCode).

Target | Agents | Skills | Hooks | MCP / Perms
Claude Code | native | native | full | full
OpenCode | .md + inferred temp | copied dirs | → TS plugin | merged into opencode.json
Codex | .toml custom agents | native + Bun step | suppressed | suppressed (ADR)
Gemini CLI | .md | copied dirs | suppressed | mcp.json
Pi / Kiro | .md / steering YAML | copied dirs | n/a | mcporter.json / mcp.json
Copilot · Droid · Qwen | native (Claude-compatible) | native | suppressed | config

Model mapping  model: sonnet → anthropic/claude-sonnet-4-6 for provider-prefixed targets; subagents drop the field and inherit the session model.
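
The model-mapping rule can be written as a small function. Only the `sonnet` row is taken from the deck. The `mapModel` helper, its options, and the choice to treat `model: inherit` like an absent field are assumptions for illustration.

```typescript
// Only this row comes from the deck; a real table would list more aliases.
const MODEL_ALIASES = new Map<string, string>([
  ["sonnet", "anthropic/claude-sonnet-4-6"],
]);

type AgentFrontmatter = { name: string; model?: string };

function mapModel(
  agent: AgentFrontmatter,
  opts: { providerPrefixed: boolean; isSubagent: boolean },
): AgentFrontmatter {
  const { model, ...rest } = agent;
  // Subagents omit the field so the session model applies.
  // Assumption: "inherit" and a missing field are handled the same way.
  if (opts.isSubagent || model === undefined || model === "inherit") return rest;
  if (!opts.providerPrefixed) return { ...rest, model };
  const mapped = MODEL_ALIASES.get(model);
  if (mapped === undefined) throw new Error(`no mapping for model alias: ${model}`);
  return { ...rest, model: mapped };
}
```

Failing on an unmapped alias, rather than passing it through, matches the deck's "lookup tables, never convention" stance.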

Experiments & Results

I ran it. It holds up.

1669/1678 tests pass (Bun 1.3.11, 14.1s)
9 failures, all network/git-clone, not code
99.5% offline pass rate · 4,468 assertions

Real conversion of the live plugin

bun run … convert --to {opencode,gemini,codex} all succeeded:

OpenCode → 43 agents · 38 skills · merged config
Codex → 43 agents · native skill tree
Gemini → 43 agents · 189 skill files

(38 vs 39 skills: one is excluded by ce_platforms filtering — exactly as designed.)

Conversion is semantic, proven

ce-correctness-reviewer, source (Claude):
    model: inherit
    tools: Read,Grep,…
    color: blue
↓ converted to OpenCode:
    mode: subagent
    temperature: 0.1   ← inferred: reviewer = deterministic

The converter infers a low temperature for review personas — not a passthrough copy.
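
A heuristic of this kind might be as simple as the sketch below. The deck shows only the outcome (a reviewer agent receiving `temperature: 0.1`); the name-based rule and both function names are guesses for illustration, not the converter's logic.

```typescript
type ClaudeAgent = { name: string; description?: string };
type OpenCodeAgent = { mode: "subagent"; temperature?: number };

// Assumed heuristic: anything that looks like a reviewer should be near-deterministic.
function inferTemperature(agent: ClaudeAgent): number | undefined {
  const text = `${agent.name} ${agent.description ?? ""}`.toLowerCase();
  if (/\breview(er)?\b/.test(text)) return 0.1;
  return undefined; // no opinion: leave the platform default in place
}

function toOpenCode(agent: ClaudeAgent): OpenCodeAgent {
  const temperature = inferTemperature(agent);
  return temperature === undefined
    ? { mode: "subagent" }
    : { mode: "subagent", temperature };
}
```

Here `toOpenCode({ name: "ce-correctness-reviewer" })` produces `{ mode: "subagent", temperature: 0.1 }`, while an agent with no reviewer signal gets no temperature field at all.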

Full logs: results/logs/experiments.md
Engineering judgment worth stealing

What's genuinely notable

Self-contained skill units

Every skill is a portable directory — no cross-skill imports. That single constraint is what makes the converter possible at all.

Prompts as compiler input

Markdown skills are treated as source code: linted by contract tests (ce- prefix, shell safety, relative paths) and versioned via semantic-release.
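
A contract check over skill Markdown could look like this sketch. It covers two of the three rules named above (the ce- prefix and relative paths) under assumed definitions; the `Skill` shape and `lintSkill` are hypothetical, and shell-safety checks are left out because the deck does not say what they test.

```typescript
type Skill = { dir: string; body: string };

// Returns a list of human-readable problems; an empty list means the skill passes.
function lintSkill(skill: Skill): string[] {
  const problems: string[] = [];

  // Rule 1: skill directories carry the ce- prefix.
  if (!/^ce-[a-z0-9-]+$/.test(skill.dir)) {
    problems.push(`${skill.dir}: directory name must start with "ce-"`);
  }

  // Rule 2 (assumed definition): Markdown link targets stay inside the skill directory.
  const link = /\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = link.exec(skill.body)) !== null) {
    const target = m[1] ?? "";
    if (/^[a-z]+:\/\//i.test(target)) continue; // external URL, out of scope here
    const escapes = target.split("/").indexOf("..") !== -1;
    if (target.startsWith("/") || target.startsWith("~") || escapes) {
      problems.push(`${skill.dir}: link "${target}" is not a path inside the skill`);
    }
  }
  return problems;
}
```

Run over every skill directory in a test suite, a check like this turns the "no cross-skill imports" constraint into something CI can enforce.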

Structure as cost control

When a platform can't pick models per-agent, cost falls back to read budgets + output caps — graceful degradation baked into the design.

"Guardrails, not a controller"

Skills give an intelligent agent hard rules + judgment room — deliberately under-prescribed, the opposite of brittle scripted flows.

Dogfooded memory

31 real Learnings already in-repo, including one about the team's own release-version drift. The methodology debugged itself.

Two-output dispatch

Agents return a compact gist + a disk dossier — the pattern that lets one orchestrator fan out to a dozen agents without drowning in context.

Security & Limitations

The honest column

Security: low risk

  • 2 runtime deps (citty, js-yaml) — tiny supply-chain surface, no postinstall.
  • No eval / new Function; Bun.spawn only for git with array args (no shell injection).
  • No secrets committed; secrets.ts is itself a detector that warns on MCP env vars.
  • Hardened FS: path sanitization, traversal & symlink-ownership guards, all CI-tested.

Limitations & concerns

  • Opinionated by design — Rails/Hotwire & Every's conventions are baked in; not all skills are stack-neutral.
  • Converter targets drift — non-Claude formats evolve; Codex still needs a manual Bun step for agents.
  • Pinned model aliases (claude-opus-4-6…) are a maintenance burden, updated by hand.
  • Efficacy is unmeasured — "compounding" is a credible design, but the repo ships no benchmark proving teams actually get faster.
The single biggest gap is evidence of outcome: the architecture for compounding is real and well-built, but its productivity claim rests on testimony, not measurement.
Conclusion & Recommendation

Should you adopt it?

Yes — as a reference architecture, and as a daily driver if you live on a supported platform.

✅ Adopt if…

  • You want a rigorous, opinionated agent workflow out of the box.
  • You value a persistent learning store over one-shot prompting.
  • You're on Claude Code, Codex, Cursor, OpenCode or Gemini.

⚠️ Borrow the patterns if…

  • You're building your own agent harness — the persona/anchor/dossier/tier patterns are the real prize.
  • Your stack is far from Rails and you'd fight the defaults.

Bottom line  A production-grade, defensively-engineered codebase whose prompt-as-source-code discipline is more instructive than most "agent framework" repos with 10× the code. The methodology's payoff is plausible and well-architected — verify it on your own work before betting the team on it.

Deck + paper.md + security report committed to research workspace · EveryInc/compound-engineering-plugin @ v3.12.0