Technical Deep Dive · EveryInc

Compound Engineering Plugin

A cross-platform AI coding-agent plugin — 39 skills & 43 agents implementing a "compound engineering" workflow — shipped with a TypeScript converter CLI that retargets it to 10+ agent platforms from one source.

21.2k GitHub stars · 1,557 forks · MIT license · v3.12.0 released

github.com/EveryInc/compound-engineering-plugin · analyzed 2026-06-13

Repository Profile

Who built it, and what's inside

Field	Details
Creators	Every Inc. (every.to) — maintained by Kieran Klaassen (@kieranklaassen) & Trevin Chow (@tmchow, ~60% of commits). Productizes Every's internal "how we code with agents" methodology.
Language	TypeScript on the Bun runtime (CLI). Product content is Markdown + YAML frontmatter; co-located Python/Bash skill scripts.
Stats	21,169 ★ · 1,557 forks · 95 open issues · created Oct 2025 · actively pushed (daily).
Lines of code	CLI: ~8,900 LOC TypeScript across ~46 `src/` files + ~55 test files (1,678 tests). Plugin: 39 skill dirs, 43 agent definitions, ~190 reference/asset files.
License	MIT — permissive, commercial-friendly, no copyleft obligations.
Security	Low risk. 2 runtime deps · no eval/secrets · hardened path & symlink handling. (full report → slide 11)

Sources: GitHub API · git shortlog · package.json · cloc-equivalent line counts

What it actually is

One repo, two products

The name says "plugin," but the repository ships two tightly coupled artifacts. Understanding the split is the key to the whole codebase.

PRODUCT: The methodology plugin

plugins/compound-engineering/ — 39 user-invoked Skills (slash commands) and 43 dispatched Agents (subagents) that encode an opinionated engineering loop: plan deeply, review rigorously, and codify every lesson so the next task is easier.

INFRA: The converter CLI

src/ — a Bun/TypeScript compiler that parses the Claude-format plugin once and emits native bundles for Codex, OpenCode, Gemini, Pi, Kiro, Copilot, Droid & Qwen. Authored once, runs everywhere.

Thesis: The plugin is the payload; the CLI is the distribution layer that frees the payload from any single vendor.

Architecture · the CLI

A three-stage Parse → Convert → Write compiler

Like a transpiler: one parser, an intermediate representation (the Bundle), and a writer per target. Adding a platform = one converter + one writer, nothing else touched.

stage 1

Parse

parsers/claude.ts reads the manifest, agents, skills, hooks & MCP into a typed ClaudePlugin.

→

stage 2

Convert

converters/claude-to-* map tools, models, hooks & permissions into a per-target Bundle (in-memory IR).

→

stage 3

Write

targets/*.ts emit each Bundle to the platform's real paths with merge semantics.
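
The three stages above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the repository's code: the names `ClaudePluginSketch`, `Bundle`, `TargetHandler`, `registry` and `runPipeline`, the `agents/<name>.md` output layout, and the idea of a writer returning paths instead of touching disk are all hypothetical.

```typescript
// Hypothetical shapes: the real ClaudePlugin and Bundle types carry far more.
interface AgentDef { name: string; body: string }
interface ClaudePluginSketch { name: string; agents: AgentDef[] }

// The Bundle is the in-memory intermediate representation for one target.
interface Bundle { target: string; files: Record<string, string> }

// One converter plus one writer per platform.
interface TargetHandler {
  convert(plugin: ClaudePluginSketch): Bundle;
  write(bundle: Bundle): string[]; // returns the paths it would emit
}

// Stage 1 (Parse), reduced to building a typed plugin from raw input.
function parse(raw: { name: string; agents: Record<string, string> }): ClaudePluginSketch {
  const agents = Object.keys(raw.agents).map((name) => ({ name, body: raw.agents[name] }));
  return { name: raw.name, agents };
}

// Registry keyed by target name; adding a platform means adding one entry.
const registry: Record<string, TargetHandler> = {
  opencode: {
    convert(plugin) {
      const files: Record<string, string> = {};
      for (const agent of plugin.agents) files[`agents/${agent.name}.md`] = agent.body;
      return { target: "opencode", files };
    },
    write(bundle) {
      return Object.keys(bundle.files).sort();
    },
  },
};

// Stages 2 and 3: look the target up, convert, then write.
function runPipeline(raw: { name: string; agents: Record<string, string> }, to: string): string[] {
  const handler = registry[to];
  if (!handler) throw new Error(`Unknown target: ${to}`);
  return handler.write(handler.convert(parse(raw)));
}
```

The point of the shape is that `parse` never knows about targets and each handler never knows about the others, which is what keeps a new platform to a single registry entry.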

Explicit, not magic

Tools, permissions, hook events & model names are mapped by lookup tables — never by convention. targets/index.ts is the registry that drives --to and --also.
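
To make "lookup tables, never convention" concrete, here is a hedged sketch of table-driven tool mapping. The table contents and the names `TOOL_MAP` and `mapTools` are assumptions for illustration; the actual tables in the converters are not reproduced here.

```typescript
// Hypothetical mapping table: the target-side names are invented for illustration.
const TOOL_MAP: Record<string, string> = {
  Read: "read",
  Grep: "grep",
  Bash: "bash",
};

interface MappedTools { mapped: string[]; unmapped: string[] }

// Every tool is resolved through the table; nothing is guessed from naming convention.
function mapTools(sourceTools: string[]): MappedTools {
  const result: MappedTools = { mapped: [], unmapped: [] };
  for (const tool of sourceTools) {
    if (Object.prototype.hasOwnProperty.call(TOOL_MAP, tool)) {
      result.mapped.push(TOOL_MAP[tool]);
    } else {
      // Surfaced to the caller instead of being passed through silently.
      result.unmapped.push(tool);
    }
  }
  return result;
}
```

Returning the unmapped names separately is one way a converter could warn about, or deliberately suppress, capabilities a target lacks.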

Non-destructive

Installs track an install-manifest; removed files move to legacy-backup/<ts>/. opencode.json is deep-merged with a .bak, never clobbered.
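
A minimal sketch of a non-clobbering config merge, kept free of disk I/O. The conflict rule (existing user values win over incoming ones) is an assumption made for this example, as are the names `deepMerge` and `mergeConfigText`; the source only states that `opencode.json` is deep-merged and backed up.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: on a scalar or array conflict the existing (user) value wins;
// only keys the user has not set are filled in from the incoming config.
function deepMerge(existing: Json, incoming: Json): Json {
  if (!isObject(existing) || !isObject(incoming)) return existing;
  const merged: { [key: string]: Json } = { ...existing };
  for (const key of Object.keys(incoming)) {
    merged[key] = Object.prototype.hasOwnProperty.call(existing, key)
      ? deepMerge(existing[key], incoming[key])
      : incoming[key];
  }
  return merged;
}

// Returns the text to save as the .bak alongside the merged result.
function mergeConfigText(existingText: string, incoming: Json): { backup: string; merged: string } {
  const merged = deepMerge(JSON.parse(existingText) as Json, incoming);
  return { backup: existingText, merged: JSON.stringify(merged, null, 2) };
}
```

Keeping the original text verbatim as the backup means a user can always restore exactly what they had, regardless of how the merge rule evolves.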

Safe by default

ASCII path sanitization, traversal guards, and symlink-ownership checks (only unlink CE-managed links) are enforced by dedicated test suites.
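
The first two guards can be illustrated with pure string functions. This is a sketch under assumptions: the allow-list, the function names `sanitizeName` and `resolveInside`, and the POSIX-only path handling are hypothetical, and symlink-ownership checks are not covered because they need real filesystem calls.

```typescript
// Reduce an arbitrary name to a conservative ASCII slug (assumed rule set).
function sanitizeName(name: string): string {
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, "-") // anything outside the allow-list becomes a dash
    .replace(/^[-.]+|[-.]+$/g, ""); // no leading or trailing dots or dashes
  if (slug.length === 0) throw new Error(`Name sanitizes to empty: ${JSON.stringify(name)}`);
  return slug;
}

// Resolve a relative POSIX-style path under root, rejecting anything that escapes it.
function resolveInside(root: string, relative: string): string {
  if (relative.charAt(0) === "/") throw new Error(`Absolute path rejected: ${relative}`);
  const parts: string[] = [];
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) throw new Error(`Path escapes root: ${relative}`);
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return [root.replace(/\/+$/, "")].concat(parts).join("/");
}
```

For example, `resolveInside("/out", "../secrets")` throws, while a path that dips into `..` but stays under the root is normalized and accepted.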

Architecture · the methodology

The compounding loop

"80% planning & review, 20% execution." Each skill hands a durable artifact to the next; the cycle closes by writing the lesson back into the repo.

/ce-strategy · STRATEGY.md → /ce-ideate → /ce-brainstorm · requirements → /ce-plan · plan doc → /ce-work · code → /ce-code-review → /ce-compound · learning ↩

WHAT vs HOW vs DO

brainstorm defines what to build → plan defines how → work executes. Each reads the prior artifact; none is required, but each sharpens the next.

The read-side companion

/ce-product-pulse reports what users actually experienced over a window, feeding real signal back into the next strategy and brainstorm.

Supporting skills: ce-debug · ce-doc-review · ce-simplify-code · ce-commit · ce-resolve-pr-feedback · ce-setup · ce-dhh-rails-style · ce-frontend-design · ce-gemini-imagegen · …

The interesting engineering

How a skill orchestrates a panel of agents

/ce-code-review is the showcase: it spawns parallel reviewer sub-agents that return structured JSON, then merges, deduplicates & gates their findings. Six reusable primitives make it work:

🎭 Reviewer personas

14 single-lens reviewers. 4 always-on (correctness, testing, maintainability, standards) + conditionals (security, performance, migrations…) selected by agent judgment of the diff, not keyword match.

📊 Confidence anchors

Findings self-score on a discrete 0/25/50/75/100 scale, each tied to a behavioral test. Default gate suppresses <75 — killing "confident false positives."

🔧 Autofix classes

Every finding is tagged gated_auto / manual / advisory — encoding how safely a fix may be applied before any code is touched.
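
The merge, dedup and gate step can be sketched as follows. The `Finding` field names and the dedup key (same file, line and title) are assumptions for illustration; only the 0/25/50/75/100 scale, the default gate of 75, and the three autofix classes come from the description above.

```typescript
type Confidence = 0 | 25 | 50 | 75 | 100;
type AutofixClass = "gated_auto" | "manual" | "advisory";

// Hypothetical finding shape returned by each reviewer sub-agent.
interface Finding {
  reviewer: string;
  file: string;
  line: number;
  title: string;
  confidence: Confidence;
  autofix: AutofixClass;
}

// Merge reviewer outputs, keep the highest-confidence copy of each duplicate,
// then drop anything below the gate.
function mergeFindings(perReviewer: Finding[][], gate: Confidence = 75): Finding[] {
  const best: Record<string, Finding> = {};
  for (const findings of perReviewer) {
    for (const finding of findings) {
      // Assumed dedup key: same location and same title, case-insensitive.
      const key = `${finding.file}:${finding.line}:${finding.title.toLowerCase()}`;
      const current = best[key];
      if (current === undefined || finding.confidence > current.confidence) best[key] = finding;
    }
  }
  return Object.keys(best)
    .map((key) => best[key])
    .filter((finding) => finding.confidence >= gate);
}
```

Because the scale is discrete, the gate is a simple comparison rather than a tuned threshold, which keeps the suppression rule easy to reason about.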

🎚️ Model tiers

Semantic tiers (extraction / generation / ceiling) are named per agent, so model IDs are never hardcoded. High-stakes personas inherit the ceiling model; scouts run cheap.

📁 Evidence dossiers

Bulky JSON is written to /tmp/…/run-id/; the orchestrator carries only a compact gist, loading detail from disk only when validating. Scales to many agents without context blowup.
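
A hedged sketch of the gist-plus-dossier pattern. The `Gist` fields, the `<runDir>/<agent>.json` layout, and the injected `writeFile` callback are assumptions chosen to keep the example self-contained; they are not taken from the plugin.

```typescript
// Compact summary the orchestrator keeps in context (hypothetical fields).
interface Gist { agent: string; findingCount: number; maxConfidence: number; dossierPath: string }

// Full detail that only lives on disk.
interface RawFinding { title: string; confidence: number; evidence: string }

// `writeFile` is injected so the sketch performs no real disk I/O.
function dispatchResult(
  runDir: string,
  agent: string,
  findings: RawFinding[],
  writeFile: (path: string, contents: string) => void
): Gist {
  const dossierPath = `${runDir}/${agent}.json`;
  writeFile(dossierPath, JSON.stringify(findings, null, 2)); // bulky detail goes to disk
  let maxConfidence = 0;
  for (const finding of findings) maxConfidence = Math.max(maxConfidence, finding.confidence);
  // Only the summary and a pointer to the dossier are returned.
  return { agent, findingCount: findings.length, maxConfidence, dossierPath };
}
```

The orchestrator's context then grows with the number of agents, not with the volume of evidence each one collected.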

🧩 Load stubs

Conditional logic lives in references/ and is pulled in on demand, keeping SKILL.md lean at session-load time — progressive disclosure for prompts.

The core idea, mechanized

Knowledge that compounds

The differentiator isn't the agents — it's the memory loop. /ce-compound turns every solved problem into a queryable Learning in docs/solutions/ with structured YAML frontmatter.

Two tracks: bug (build_error, test_failure, security_issue…) and knowledge (best_practice, convention, architecture_pattern…).
Grep-first retrieval: ce-learnings-researcher filters by frontmatter, then reads only strong matches — scales to hundreds of docs.
Self-maintaining: /ce-compound-refresh classifies stale docs as Keep / Update / Consolidate / Replace / Delete against the live code.
Closed loop: reviews auto-spawn ce-learnings-researcher, so past lessons resurface inside future work.

# docs/solutions/workflow/…md
title: Plugin Versioning Requirements
category: workflow
problem_type: workflow_issue
module: plugin-development
component: documentation
severity: process
tags: [versioning, changelog, readme]
date: 2026-03-17
---
# Plugin Versioning and Documentation…

# 31 learnings already committed in-repo:
# the plugin dogfoods its own mechanism.
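
Grep-first retrieval over frontmatter like the example above might look like this. It is a minimal sketch, not the behavior of ce-learnings-researcher: `parseFrontmatter` handles only flat `key: value` lines and inline `[a, b]` lists rather than full YAML, and the query shape in `matchesQuery` is hypothetical.

```typescript
type Frontmatter = Record<string, string | string[]>;

// Minimal parser for flat `key: value` frontmatter; not a general YAML parser.
function parseFrontmatter(doc: string): Frontmatter {
  const meta: Frontmatter = {};
  const lines = doc.split("\n");
  let start = 0;
  if (lines[0].trim() === "---") start = 1; // optional opening delimiter
  for (let i = start; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "---") break; // end of frontmatter
    const colon = line.indexOf(":");
    if (colon === -1 || line.charAt(0) === "#") continue; // skip comments and non-pairs
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    const list = /^\[(.*)\]$/.exec(value);
    meta[key] = list
      ? list[1].split(",").map((item) => item.trim()).filter((item) => item !== "")
      : value;
  }
  return meta;
}

// Cheap first pass: match on metadata only, before any document body is read.
function matchesQuery(meta: Frontmatter, query: { module?: string; tag?: string }): boolean {
  if (query.module !== undefined && meta["module"] !== query.module) return false;
  if (query.tag !== undefined) {
    const tags = meta["tags"];
    if (!Array.isArray(tags) || tags.indexOf(query.tag) === -1) return false;
  }
  return true;
}
```

Only documents that pass `matchesQuery` would then be opened in full, which is what lets the store grow to hundreds of entries without every lookup reading every body.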

Write once, run everywhere

One source → 10+ agent platforms

Conversion is genuine semantic remapping. Capabilities degrade gracefully where a target lacks a primitive (e.g. among the converted targets, hooks survive only on OpenCode).

Target	Agents	Skills	Hooks	MCP / Perms
Claude Code	native	native	full	full
OpenCode	.md + inferred temp	copied dirs	→ TS plugin	merged into `opencode.json`
Codex	.toml custom agents	native + Bun step	suppressed	suppressed (ADR)
Gemini CLI	.md	copied dirs	suppressed	`mcp.json`
Pi / Kiro	.md / steering YAML	copied dirs	n/a	`mcporter.json` / `mcp.json`
Copilot · Droid · Qwen	native (Claude-compatible)	native	suppressed	config

Model mapping: `model: sonnet` → `anthropic/claude-sonnet-4-6` for provider-prefixed targets; subagents drop the field and inherit the session model.
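
The model-mapping rule can be sketched as a small function. Only the `sonnet` alias and the `anthropic/` prefix come from the text above; the names `MODEL_ALIASES`, `ModelOptions` and `mapModel`, the treatment of `inherit`, and the pass-through for unknown names are assumptions.

```typescript
// Alias table; only the sonnet entry is taken from the description above.
const MODEL_ALIASES: Record<string, string> = {
  sonnet: "claude-sonnet-4-6",
};

interface ModelOptions { providerPrefixed: boolean; isSubagent: boolean }

// Returns the model string to emit, or undefined to omit the field entirely.
function mapModel(model: string | undefined, options: ModelOptions): string | undefined {
  if (options.isSubagent) return undefined; // subagents inherit the session model
  if (model === undefined || model === "inherit") return undefined; // assumed: nothing to pin
  const resolved = Object.prototype.hasOwnProperty.call(MODEL_ALIASES, model)
    ? MODEL_ALIASES[model]
    : model; // assumed: unknown names pass through unchanged
  return options.providerPrefixed ? `anthropic/${resolved}` : resolved;
}
```

Concentrating the pinned IDs in one table is also what makes the hand-maintained aliases tolerable: a model bump touches one entry rather than every agent file.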

Experiments & Results

I ran it. It holds up.

1669/1678 tests pass (Bun 1.3.11, 14.1s)

9 failures, all network/git-clone, not code

99.5% offline pass rate · 4,468 assertions

Real conversion of the live plugin

bun run … convert --to {opencode,gemini,codex} all succeeded:

OpenCode → 43 agents · 38 skills · merged config
Codex → 43 agents · native skill tree
Gemini → 43 agents · 189 skill files

(38 vs 39 skills: one is excluded by ce_platforms filtering — exactly as designed.)
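
Platform filtering of this kind could be as small as the sketch below. The exact semantics of `ce_platforms` are not spelled out above, so the rule used here (an absent list means "emit everywhere") and the names `SkillMeta` and `skillsForTarget` are assumptions.

```typescript
// Hypothetical skill metadata; only the ce_platforms key name comes from the text above.
interface SkillMeta { name: string; ce_platforms?: string[] }

// Assumption: a skill with no ce_platforms list is emitted for every target.
function skillsForTarget(skills: SkillMeta[], target: string): SkillMeta[] {
  return skills.filter(
    (skill) => skill.ce_platforms === undefined || skill.ce_platforms.indexOf(target) !== -1
  );
}
```

Under that assumption, a skill restricted to one platform simply drops out of every other target's bundle, which would account for a count that is one lower than the source.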

Conversion is semantic, proven

ce-correctness-reviewer, source (Claude):
model: inherit
tools: Read,Grep,…
color: blue

↓ converted to OpenCode:
mode: subagent
temperature: 0.1 ← inferred: reviewer = deterministic

The converter infers a low temperature for review personas — not a passthrough copy.
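
One way such an inference could work is sketched below. The name-suffix heuristic is invented for illustration and is not the converter's actual rule; only the `mode: subagent` and `temperature: 0.1` outputs for a reviewer are taken from the example above.

```typescript
interface ClaudeAgentMeta { name: string; model?: string; tools?: string; color?: string }
interface OpenCodeAgentMeta { mode: "subagent"; temperature?: number }

// Assumed heuristic: agents whose name ends in "-reviewer" should be deterministic.
function convertAgentMeta(source: ClaudeAgentMeta): OpenCodeAgentMeta {
  const converted: OpenCodeAgentMeta = { mode: "subagent" };
  if (/-reviewer$/.test(source.name)) converted.temperature = 0.1;
  // Otherwise the field stays unset so the platform default applies.
  return converted;
}
```

Whatever the real rule is, the output carries a field (`temperature`) that has no counterpart in the source, which is the sense in which the conversion is semantic rather than a copy.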

Full logs: results/logs/experiments.md

Engineering judgment worth stealing

What's genuinely notable

Self-contained skill units

Every skill is a portable directory — no cross-skill imports. That single constraint is what makes the converter possible at all.

Prompts as compiler input

Markdown skills are treated as source code: linted by contract tests (ce- prefix, shell safety, relative paths) and versioned via semantic-release.

Structure as cost control

When a platform can't pick models per-agent, cost falls back to read budgets + output caps — graceful degradation baked into the design.

"Guardrails, not a controller"

Skills give an intelligent agent hard rules + judgment room — deliberately under-prescribed, the opposite of brittle scripted flows.

Dogfooded memory

31 real Learnings already in-repo, including one about the team's own release-version drift. The methodology debugged itself.

Two-output dispatch

Agents return a compact gist + a disk dossier — the pattern that lets one orchestrator fan out to a dozen agents without drowning in context.

Security & Limitations

The honest column

Security: low risk

2 runtime deps (citty, js-yaml) — tiny supply-chain surface, no postinstall.
No eval / new Function; Bun.spawn only for git with array args (no shell injection).
No secrets committed — secrets.ts is itself a detector that warns on MCP env vars.
Hardened FS: path sanitization, traversal & symlink-ownership guards, all CI-tested.

Limitations & concerns

Opinionated by design — Rails/Hotwire & Every's conventions are baked in; not all skills are stack-neutral.
Converter targets drift — non-Claude formats evolve; Codex still needs a manual Bun step for agents.
Pinned model aliases (claude-opus-4-6…) are a maintenance burden, updated by hand.
Efficacy is unmeasured — "compounding" is a credible design, but the repo ships no benchmark proving teams actually get faster.

The single biggest gap is evidence of outcome: the architecture for compounding is real and well-built, but its productivity claim rests on testimony, not measurement.

Conclusion & Recommendation

Should you adopt it?

Yes — as a reference architecture, and as a daily driver if you live on a supported platform.

✅ Adopt if…

You want a rigorous, opinionated agent workflow out of the box.
You value a persistent learning store over one-shot prompting.
You're on Claude Code, Codex, Cursor, OpenCode or Gemini.

⚠️ Borrow the patterns if…

You're building your own agent harness — the persona/anchor/dossier/tier patterns are the real prize.
Your stack is far from Rails and you'd fight the defaults.

Bottom line: A production-grade, defensively engineered codebase whose prompt-as-source-code discipline is more instructive than most "agent framework" repos with 10× the code. The methodology's payoff is plausible and well-architected, but verify it on your own work before betting the team on it.

Deck + paper.md + security report committed to research workspace · EveryInc/compound-engineering-plugin @ v3.12.0
