TECHNICAL DECISION BRIEF · JUNE 2026

Choosing an Agent Framework

LangChain · LangGraph · Claude Agent SDK · OpenAI Agents SDK · Flue · Pydantic AI · Sandcastle · NVIDIA NeMo Agent Toolkit

A hands-on technical deep dive into what you are actually choosing between when you build & deploy an agent — viewed through a hybrid NVIDIA lens (self-hosted GPU/NIM and cloud APIs).

8 tools · 12 decision axes · decision tree · hybrid deploy recommendations

Read this first

These eight are not the same kind of thing

The #1 selection mistake is comparing tools that live at different layers of the stack. Place each one correctly and the choice gets easy.

① Integration / building blocks

Model wrappers, tools, retrievers, vector stores, prompt chains. The "glue".

LangChain

→

② Orchestration / control flow

How the agent loop runs: state, branching, durability, multi-agent.

LangGraph · OpenAI Agents SDK · Pydantic AI

→

③ Harness (full autonomy)

Batteries-included loop: sandbox, tools, skills, sessions, subagents, context mgmt.

Claude Agent SDK · Flue

→

④ Fleet orchestration

Runs N harnesses as units: container sandboxes, git worktrees/branch strategies, parallel fan-out, session fork/resume.

Sandcastle

NVIDIA NeMo Agent Toolkit sits sideways across the stack rather than in any one layer. It is framework-agnostic: an observability, profiling, evaluation & optimization layer that wraps whatever you build with the tools above (LangChain, LangGraph, CrewAI, custom). It is a complement, not a competitor. The real question for you is rarely "which one"; it's "which orchestration/harness layer, instrumented by NeMo, served by NIM."
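
The layering above can be written down as a small lookup, which makes the "don't compare across layers" rule mechanical. This is only an illustrative sketch: the layer assignments come from this brief, while the names `LAYER_OF` and `areAlternatives` are made up for the example.

```typescript
// Layer numbers follow the stack described above; NeMo is cross-cutting.
type Layer = 1 | 2 | 3 | 4 | "cross-cutting";

const LAYER_OF = new Map<string, Layer>([
  ["LangChain", 1],
  ["LangGraph", 2],
  ["OpenAI Agents SDK", 2],
  ["Pydantic AI", 2],
  ["Claude Agent SDK", 3],
  ["Flue", 3],
  ["Sandcastle", 4],
  ["NVIDIA NeMo Agent Toolkit", "cross-cutting"],
]);

// Two tools are direct alternatives only when they sit at the same numbered layer.
// A cross-cutting tool complements the others and competes with none of them.
function areAlternatives(a: string, b: string): boolean {
  const layerA = LAYER_OF.get(a);
  const layerB = LAYER_OF.get(b);
  if (layerA === undefined) throw new Error(`Unknown tool: ${a}`);
  if (layerB === undefined) throw new Error(`Unknown tool: ${b}`);
  return layerA !== "cross-cutting" && layerA === layerB;
}
```

With this table, `areAlternatives("LangChain", "LangGraph")` is false (they compose), while `areAlternatives("LangGraph", "Pydantic AI")` is true.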

The selection framework

12 axes that should drive the decision

Score each candidate against the axes that matter for your specific agent — not all twelve weigh equally.

🧩

1 · Abstraction layer
Glue vs orchestration vs full harness — what do you actually need built for you?

🔓

2 · Model portability / lock-in
Model-agnostic or vendor-tied? Decisive for hybrid + multi-model strategy.

🏠

3 · Self-hosting & on-prem fit
Can it drive an OpenAI-compatible endpoint (NIM) on your own GPUs / air-gapped?

🎚️

4 · Control vs convenience
Low-level graph (you decide everything) vs opinionated (fewer decisions, faster).

💾

5 · State, durability & HITL
Checkpointing, resume-after-failure, human-in-the-loop, long-running jobs.

🤝

6 · Multi-agent model
Handoffs vs typed graph vs subagents — how teams of agents coordinate.

🔭

7 · Observability & eval
Tracing, token/latency profiling, evals. OTel-native vs proprietary dashboards.

🚀

8 · Deployment targets
Long-running server, serverless, edge, CI/CD, K8s. Where does it run?

🛡️

9 · Security & sandboxing
Permission gates, sandboxed tool execution, secrets handling, isolation.

🐍

10 · Language & team fit
Python vs TypeScript — match your team & existing services.

🌐

11 · Maturity & ecosystem
Integrations, community, docs, stability of the API surface over time.

💰

12 · Cost & operational model
Library you host vs managed platform; token credits; infra overhead.
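
One way to act on "not all twelve weigh equally" is a weighted average per candidate. The sketch below assumes you score each axis from 0 to 5 and assign your own weights; the function name and the scale are illustrative choices, not part of any framework.

```typescript
// Weighted average of per-axis scores. Axes with weight <= 0 are ignored,
// and an axis missing from `scores` counts as 0.
function weightedScore(
  weights: Record<string, number>,
  scores: Record<string, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const [axis, weight] of Object.entries(weights)) {
    if (weight <= 0) continue; // this axis does not matter for this agent
    total += weight * (scores[axis] ?? 0);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Hypothetical weights for an on-prem, regulated agent.
const weights = { selfHosting: 3, durability: 2, languageFit: 1 };

// Hypothetical scores for two unnamed candidates (0 = poor, 5 = excellent).
const candidateA = weightedScore(weights, { selfHosting: 5, durability: 5, languageFit: 3 });
const candidateB = weightedScore(weights, { selfHosting: 0, durability: 3, languageFit: 5 });
```

Here candidate A comes out ahead because the heavily weighted self-hosting axis dominates, even though candidate B wins on language fit.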

LangChain · Python · JS/TS

v1.0 · Integration layer · LangChain Inc (Harrison Chase) · MIT

Layer ① — building blocks

🧠 What it is

The original, broadest toolkit: standardized chat-model wrappers, tools, retrievers, vector-store connectors, output parsers, and LCEL chains. v1.0 added a standard create_agent + middleware.

✅ Strengths

Unmatched ecosystem & integrations
Fastest path from idea → prototype
Swap models/vector DBs trivially
Model-agnostic by design

⚠️ Watch-outs

History of churn / leaky abstractions
Deep call stacks → hard to debug
Real agent control now lives in LangGraph

🎯 Choose when

You're prototyping, doing RAG, or need to integrate many models/data sources fast and don't yet need complex stateful control flow.

🔭 Observability

LangSmith (first-party tracing & eval). OTel export available. Also instrumentable by NeMo Agent Toolkit.

🟩 NVIDIA / hybrid fit

Strong. langchain-nvidia-ai-endpoints talks to NIM microservices on your own GPUs and hosted NIM/cloud APIs — true hybrid.

LangGraph · Python · JS/TS

v1.2 (May 2026) · Orchestration · LangChain Inc · MIT

Layer ② — control flow

🧠 What it is

Low-level graph orchestration: nodes (agents/functions), edges (incl. conditional routing), and a typed shared state object flowing through. Models the agent loop explicitly.
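
To make the node/edge/state vocabulary concrete, here is a hand-rolled toy runner. It is not the LangGraph API; every name in it (`runGraph`, `Graph`, `END`) is invented for illustration, and it only shows the shape of the idea: nodes transform a typed state, and a router per node picks the next node.

```typescript
type State = { attempts: number; approved: boolean; log: string[] };
type NodeFn = (state: State) => State;
type Router = (state: State) => string; // returns the next node name, or END

const END = "__end__";

interface Graph {
  nodes: Record<string, NodeFn>;
  edges: Record<string, Router>;
}

// Runs nodes until a router returns END or the step limit is reached.
function runGraph(graph: Graph, start: string, initial: State, maxSteps = 20): State {
  let state = initial;
  let current = start;
  for (let step = 0; step < maxSteps && current !== END; step++) {
    const node = graph.nodes[current];
    const route = graph.edges[current];
    if (!node || !route) throw new Error(`Unknown node: ${current}`);
    state = node(state);
    current = route(state);
  }
  return state;
}

const graph: Graph = {
  nodes: {
    draft: (s) => ({ ...s, attempts: s.attempts + 1, log: [...s.log, "draft"] }),
    review: (s) => ({ ...s, approved: s.attempts >= 2, log: [...s.log, "review"] }),
  },
  edges: {
    draft: () => "review",
    // Conditional edge: loop back to drafting until the review approves.
    review: (s) => (s.approved ? END : "draft"),
  },
};

const result = runGraph(graph, "draft", { attempts: 0, approved: false, log: [] });
```

The cycle (draft, review, draft, review) is explicit in the edges, which is what makes this style of control flow inspectable.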

✅ Strengths

Deterministic, inspectable control flow
Durable execution + checkpointing
HITL interrupts, pause/resume, time-travel
Best for cyclic, branching, long-running
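
Durable execution with checkpointing can be pictured as "save state after every step, resume from the last save". The sketch below is a simplified in-memory illustration under that assumption; `MemoryCheckpointer` and `runDurable` are hypothetical names, not LangGraph classes, and a real deployment would persist to a database.

```typescript
type Checkpoint<S> = { nextStep: number; state: S };

// In-memory stand-in for a persistent checkpoint store, keyed by thread id.
class MemoryCheckpointer<S> {
  private store = new Map<string, Checkpoint<S>>();
  save(threadId: string, checkpoint: Checkpoint<S>): void {
    this.store.set(threadId, checkpoint);
  }
  load(threadId: string): Checkpoint<S> | undefined {
    return this.store.get(threadId);
  }
}

// Runs steps in order, checkpointing after each one. If a step throws,
// calling runDurable again with the same threadId resumes at the failed step.
function runDurable<S>(
  threadId: string,
  steps: Array<(state: S) => S>,
  initial: S,
  checkpointer: MemoryCheckpointer<S>,
): S {
  const saved = checkpointer.load(threadId);
  let state = saved ? saved.state : initial;
  let next = saved ? saved.nextStep : 0;
  for (const step of steps.slice(next)) {
    state = step(state); // may throw; the previous checkpoint stays intact
    next += 1;
    checkpointer.save(threadId, { nextStep: next, state });
  }
  return state;
}
```

A human-in-the-loop pause fits the same shape: a step that throws (or returns) while waiting for approval leaves the checkpoint in place, and the run continues from there once the approval arrives.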

⚠️ Watch-outs

Steepest learning curve here
Graph boilerplate for simple agents
You own more of the design

🎯 Choose when

Production agents in regulated / high-stakes settings needing audit trails, approvals, retries, and resumability. Finance, healthcare, complex tool pipelines.

🔭 Observability

LangSmith + checkpointer state inspection. Pairs with Temporal for heavyweight durability. NeMo-instrumentable.

🟩 NVIDIA / hybrid fit

Strong. Same NVIDIA endpoint connector as LangChain; durable graphs map well to on-prem, long-running enterprise jobs. A top on-prem pick.

Claude Agent SDK · Python · TS

Renamed from Claude Code SDK (Sep 2025) · Anthropic · Harness

Layer ③ — full harness

🧠 What it is

The exact harness that powers Claude Code, made programmable: agent loop, automatic context management / compaction, built-in tools (file edit, bash, web), permissions, persistent sessions, subagents, first-class MCP.
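
Context compaction is the part of a harness that is easiest to underestimate. The following is a generic sketch of the idea, not how the Claude Agent SDK implements it: once the history exceeds a token budget, older turns are replaced by a summary and the most recent turns are kept verbatim. The 4-characters-per-token estimate and all names are assumptions for illustration.

```typescript
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Rough heuristic (about 4 characters per token); a real harness would use a tokenizer.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

// Returns the history unchanged if it fits the budget; otherwise replaces
// everything except the last `keepRecent` messages with one summary message.
function compact(
  history: Message[],
  budget: number,
  keepRecent: number,
  summarize: (older: Message[]) => string,
): Message[] {
  const total = history.reduce((sum, m) => sum + estimateTokens(m), 0);
  if (total <= budget || history.length <= keepRecent) return history;
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  const summary: Message = {
    role: "system",
    content: `Summary of earlier turns: ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

In practice `summarize` would itself be a model call; it is passed in as a function here so the sketch stays self-contained.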

✅ Strengths

Best-in-class autonomous loop out of the box
Context compaction handled for you
Superb for coding / computer-use agents
HITL checkpoints + sandboxed tools

⚠️ Watch-outs

Claude-only models (Anthropic API, Bedrock, Vertex)
Cannot drive self-hosted open models
Subscription credit model for SDK usage

🎯 Choose when

You want top-tier autonomy fast, Claude is your model, and the work is agentic coding, file/computer manipulation, or research loops.

🔭 Observability

SDK message stream (incl. parent_tool_use_id for subagents) + your own OTel. No managed dashboard equivalent to LangSmith.

🟨 NVIDIA / hybrid fit

Cloud-leaning. Excellent if Claude-via-API is allowed. Not a fit for self-hosted-on-NVIDIA-GPU models — Claude isn't self-hostable. Pick for the cloud side of a hybrid only.

OpenAI Agents SDK · Python · JS/TS

Successor to Swarm (early 2025) · OpenAI · Orchestration

Layer ② — control flow

🧠 What it is

A small, opinionated SDK on four primitives: Agents, Tools, Handoffs (agent→agent transfer of full context), and Guardrails (input/output validation). Plus Sessions & built-in Tracing.
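
The handoff and guardrail primitives can be sketched without the SDK. The toy loop below is not the OpenAI Agents SDK API; `Agent`, `Guardrail`, and `runWithHandoffs` are hypothetical names. It shows the two behaviours described above: input is validated before any agent sees it, and a handoff passes the whole conversation to the next agent.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

interface Agent {
  name: string;
  // Either answers, or names another agent to take over the conversation.
  respond(history: Msg[]): { reply: string } | { handoffTo: string };
}

// A guardrail returns null when the input is acceptable, or a rejection reason.
type Guardrail = (input: string) => string | null;

function runWithHandoffs(
  agents: Map<string, Agent>,
  entry: string,
  input: string,
  guardrails: Guardrail[],
  maxHandoffs = 5,
): { agent: string; reply: string } {
  for (const check of guardrails) {
    const reason = check(input);
    if (reason !== null) return { agent: "guardrail", reply: `Blocked: ${reason}` };
  }
  const history: Msg[] = [{ role: "user", content: input }];
  let current = entry;
  for (let hop = 0; hop <= maxHandoffs; hop++) {
    const agent = agents.get(current);
    if (!agent) throw new Error(`Unknown agent: ${current}`);
    const out = agent.respond(history);
    if ("reply" in out) return { agent: agent.name, reply: out.reply };
    current = out.handoffTo; // the full history travels with the handoff
  }
  throw new Error("Too many handoffs");
}
```

Compared with a graph, routing here lives inside each agent's decision to hand off, which is why the pattern is quick to ship but harder to constrain for complex branching.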

✅ Strengths

Minimal, fast to ship, few decisions
Handoffs = clean multi-agent pattern
Guardrails for prompt-injection/policy
Tracing built in

⚠️ Watch-outs

Tuned for OpenAI models first
Handoffs less flexible than graph routing
Less control for complex branching

🎯 Choose when

You want a lightweight, opinionated multi-agent setup and value speed over fine-grained control. Great default for "teams of agents" with clean handoffs.

🔭 Observability

First-party tracing dashboard; OTel-compatible exporters; NeMo-instrumentable.

🟩 NVIDIA / hybrid fit

Good, and underrated. It can speak the OpenAI Chat Completions protocol (the SDK defaults to the Responses API, so select the Chat Completions mode), and NIM exposes an OpenAI-compatible endpoint. Point base_url at your NIM to run self-hosted models with the same agent code.
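
The base_url swap is easiest to see at the HTTP level. The sketch below builds a Chat Completions request for two endpoints and sends nothing. The NIM hostname, API key placeholders, and cloud model name are hypothetical; the NIM model id is only an example, so check what your deployment actually serves.

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface Endpoint { baseUrl: string; apiKey: string; model: string }

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds an OpenAI-compatible Chat Completions request for any base URL.
function buildChatRequest(endpoint: Endpoint, messages: ChatMessage[]): ChatRequest {
  const base = endpoint.baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${endpoint.apiKey}`,
      },
      body: JSON.stringify({ model: endpoint.model, messages }),
    },
  };
}

// Cloud lane (placeholder key and model name).
const cloud: Endpoint = {
  baseUrl: "https://api.openai.com/v1",
  apiKey: "CLOUD_KEY_PLACEHOLDER",
  model: "cloud-model-placeholder",
};

// Self-hosted lane: hypothetical internal NIM host; model id is illustrative.
const nim: Endpoint = {
  baseUrl: "http://nim.internal.example:8000/v1",
  apiKey: "not-used",
  model: "meta/llama-3.1-8b-instruct",
};
```

Calling `fetch(req.url, req.init)` with either request is the only step that differs by environment; the message-building code is shared.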

Flue · TypeScript only

Agent harness framework · Fred K. Schott / withastro · Apache-2.0 · ~4.9k★

Layer ③ — full harness

🧠 What it is

"Claude Code, but 100% headless & programmable." A runtime-agnostic TypeScript harness — think Astro/Next.js, but for agents. Core idea: Agent = Model + Harness. Ships sandbox, durable execution, subagents, tools, skills, sessions, and MCP.

✅ Strengths

Harness-grade autonomy, write-once
Deploy anywhere: Node, Cloudflare, Vercel, Fly, GitHub Actions, GitLab CI
Multi-provider: Anthropic, OpenAI, Gemini, Kimi, OpenRouter
Built-in sandbox + durability

⚠️ Watch-outs

Young (2026) — smaller ecosystem
TypeScript only — no Python
Smaller community vs LangChain/OpenAI

⌨️ The model (from the docs)

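// agents/triage.ts
import { createAgent } from 'flue';

// Define the agent and export it as the module default.
export default createAgent(() => ({
  model: 'anthropic/claude-sonnet-4-6', // provider/model specifier
  instructions: 'Triage incoming tickets.', // standing instructions for the agent
}));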

Agent = a continuing assistant with sessions. Workflow = a bounded job that runs once and returns a result (init(agent) → session → result). CLI: flue dev · flue connect · flue build.

🟩 NVIDIA / hybrid fit

Conditional. Edge/CI deploy targets are a unique strength for distributed/event-driven agents. For self-hosted NVIDIA models, route via an OpenAI-compatible provider/OpenRouter to your NIM. Best when your stack is TypeScript-first.

Pydantic AI · Python only

Type-safe agent framework · Pydantic team (Samuel Colvin) · MIT

Layer ② — control flow

🧠 What it is

Agent framework built on type safety: tools are registered via decorators, and Pydantic validates tool arguments before a tool runs and structured outputs after the model returns. pydantic-graph adds typed state machines for durable execution & HITL.
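
The validate-then-retry idea is language-independent, so here it is sketched in TypeScript rather than with Pydantic AI's own Python API. All names (`Invoice`, `parseInvoice`, `extractWithRetry`) are hypothetical, and `callModel` is a synchronous stub standing in for a real (asynchronous) model call.

```typescript
interface Invoice { vendor: string; total: number }

type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Validates raw model output against the Invoice shape at runtime.
function parseInvoice(raw: string): Parsed<Invoice> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const record = data as Record<string, unknown>;
  const vendor = record["vendor"];
  const total = record["total"];
  if (typeof vendor !== "string") return { ok: false, error: "vendor must be a string" };
  if (typeof total !== "number" || !Number.isFinite(total)) {
    return { ok: false, error: "total must be a finite number" };
  }
  return { ok: true, value: { vendor, total } };
}

// Calls the model, validates, and feeds the validation error back on failure.
function extractWithRetry(
  callModel: (feedback: string | null) => string,
  maxAttempts = 3,
): Invoice {
  let feedback: string | null = null;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const parsed = parseInvoice(callModel(feedback));
    if (parsed.ok) return parsed.value;
    feedback = parsed.error;
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${feedback}`);
}
```

The point of the pattern is that downstream code only ever receives an `Invoice` that passed validation, never a raw string it has to trust.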

✅ Strengths

Static type checking plus runtime validation; fewer prod bugs
Reliable structured outputs
Clean, lightweight, "FastAPI feel"
Model-agnostic + Pydantic AI Gateway

⚠️ Watch-outs

Smaller ecosystem than LangChain
Python only
Newer than incumbents

🎯 Choose when

You want production discipline, validated structured I/O, and clean engineering — especially data-extraction / API-shaped agents where schema correctness matters.

🔭 Observability

Logfire — OpenTelemetry-native, traces the whole app not just the LLM layer. Best-in-class for enterprise OTel pipelines.

🟩 NVIDIA / hybrid fit

Strong. Model-agnostic incl. OpenAI-compatible NIM endpoints; OTel-native traces drop straight into enterprise/on-prem observability. Excellent on-prem candidate.

Sandcastle · TypeScript only

Coding-agent fleet orchestration · Matt Pocock (AI Hero / Total TypeScript) · MIT · v0.8.0 · ~5.9k★

Layer ④ — fleet orchestration

🧠 What it is

Not an agent framework — it implements no loop. It orchestrates existing harnesses as plug-in providers (claudeCode(), codex(), cursor(), copilot(), opencode()) inside Docker/Podman containers or Vercel microVMs, with git worktrees, branch strategies (commits patch back to host), parallel fan-out, and session capture/resume/fork.
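
The core of a safe fan-out is that every task gets its own branch and its own worktree, so parallel agents never write to the same checkout. The planning step can be sketched as below. This is not Sandcastle's API; `planFanOut`, the `agent/` branch prefix, and the directory layout are assumptions, and the provider strings are plain labels rather than calls.

```typescript
interface Task { id: string; prompt: string }

interface Assignment {
  taskId: string;
  branch: string;   // one branch per task
  worktree: string; // one working directory per task
  provider: string; // which harness runs the task
}

// Turns an arbitrary task id into a branch-safe name.
const slug = (id: string): string =>
  id.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

// Assigns each task an isolated branch and worktree, and spreads tasks
// round-robin across the available harness providers.
function planFanOut(tasks: Task[], providers: string[], baseDir: string): Assignment[] {
  if (providers.length === 0) throw new Error("Need at least one provider");
  const seen = new Set<string>();
  return tasks.map((task, i) => {
    const name = slug(task.id);
    if (name === "" || seen.has(name)) {
      throw new Error(`Task id "${task.id}" does not yield a unique branch name`);
    }
    seen.add(name);
    return {
      taskId: task.id,
      branch: `agent/${name}`,
      worktree: `${baseDir}/${name}`,
      provider: providers[i % providers.length] as string,
    };
  });
}
```

Each assignment would then be handed to a container or microVM runner; rejecting duplicate branch names up front avoids two agents racing on the same branch.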

✅ Strengths

Safe parallel "AFK" coding-agent fleets
Harnesses are swappable — hedge against harness lock-in
Git-native: results land as branches/PRs
Tiny footprint: 7 npm pkgs / 15 MB (measured) — no model SDK, no loop

⚠️ Watch-outs

Coding agents only — not general agents
v0.8.0, pre-1.0 (2026)
Single high-profile maintainer, not company-backed
Needs Docker/Podman or Vercel sandbox infra

🎯 Choose when

You already use a harness (Claude Code, Codex…) and want to run N of them in parallel, sandboxed, with git lifecycle managed — task fan-out, review pipelines, planner/implementer/reviewer flows.

🔭 Observability

Session capture and lifecycle hooks; otherwise inherits whatever the underlying harness emits. Not an observability product.

🟨 NVIDIA / hybrid fit

Inherits from the harness. With Claude Code providers it's cloud-Claude; with opencode() custom models, local/NIM-backed models are reachable. Think of it as Kubernetes-for-harnesses: the layer above ③, not a competitor to ①–②.

NVIDIA NeMo Agent Toolkit · Python

Formerly AgentIQ / Agent Intelligence Toolkit · NVIDIA · Apache-2.0 (open source)

⊥ Cross-cutting layer

🧠 What it is

Not a competing framework. A framework-agnostic layer that works around LangChain, LlamaIndex, CrewAI, Semantic Kernel, Google ADK & custom Python agents — adding instrumentation, profiling, eval & optimization.

✅ Strengths

Profiler: token/latency/bottleneck down to tool & agent level
Optimization, evaluation, scaling
Integrates Phoenix, LangSmith, Weave, Langfuse, OTel
Native NVIDIA stack (NIM, Triton, Dynamo)

⚠️ Watch-outs

Doesn't replace your authoring framework
Python-centric
Most valuable once you're on NVIDIA infra

🎯 Choose when

You're standardizing agents on NVIDIA infrastructure and need a single pane for profiling, cost/latency optimization, eval & observability regardless of which framework each team picked.

🔭 Observability

This is the observability/profiling play. OTel + major platforms; measures what NIM/GPU serving actually costs you.
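
"Down to tool level" profiling boils down to aggregating spans per tool. The sketch below is a generic illustration of that aggregation, not the NeMo Agent Toolkit profiler or its data model; the `Span` fields and function names are assumptions.

```typescript
interface Span {
  tool: string;
  latencyMs: number;
  promptTokens: number;
  completionTokens: number;
}

interface ToolStats { calls: number; totalLatencyMs: number; totalTokens: number }

// Sums call count, latency, and tokens for each tool across all spans.
function profileByTool(spans: Span[]): Map<string, ToolStats> {
  const stats = new Map<string, ToolStats>();
  for (const span of spans) {
    const current = stats.get(span.tool) ?? { calls: 0, totalLatencyMs: 0, totalTokens: 0 };
    current.calls += 1;
    current.totalLatencyMs += span.latencyMs;
    current.totalTokens += span.promptTokens + span.completionTokens;
    stats.set(span.tool, current);
  }
  return stats;
}

// Returns the tool with the highest total latency, or undefined if there are no spans.
function slowestTool(stats: Map<string, ToolStats>): string | undefined {
  let worst: string | undefined;
  let worstMs = -1;
  stats.forEach((s, tool) => {
    if (s.totalLatencyMs > worstMs) {
      worstMs = s.totalLatencyMs;
      worst = tool;
    }
  });
  return worst;
}
```

Feeding this from OTel spans emitted by any of the frameworks above gives a first-order answer to "where do the tokens and the seconds go".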

🟩 NVIDIA / hybrid fit

Native — the NVIDIA glue. Pair it with whichever orchestration/harness you choose. For an NVIDIA team this is less "alternative" and more "default companion."

Side by side

The comparison matrix

Key decision axes, all eight tools side by side.

Framework	Layer	Lang	Model-agnostic	Self-host / NIM	Durability + HITL	Multi-agent	Deploy reach	Maturity	Best-fit verdict
LangChain	① Glue	Py/TS	Yes	Yes (NV endpoints)	Via LangGraph	Via LangGraph	Broad (lib)	High	Prototyping, RAG, integrations
LangGraph	② Orchestr.	Py/TS	Yes	Yes (NV endpoints)	Best-in-class	Typed graph	Server/Platform	High	Regulated, durable, complex flows
Claude Agent SDK	③ Harness	Py/TS	Claude only	No (cloud Claude)	Checkpoints	Subagents	Self-host lib	Newer	Top autonomy, Claude, coding/CUA
OpenAI Agents SDK	② Orchestr.	Py/TS	OpenAI-first	Yes (OpenAI-compat → NIM)	Sessions	Handoffs	Server/serverless	High	Lightweight multi-agent, fast ship
Flue	③ Harness	TS only	Yes (multi)	Via OpenAI-compat/OpenRouter	Durable + sandbox	Subagents	Edge/CI/serverless ★	Young (2026)	TS-first, edge/CI autonomous agents
Pydantic AI	② Orchestr.	Py only	Yes	Yes (OpenAI-compat → NIM)	pydantic-graph	Graph/delegation	Server	Newer	Type-safe, structured I/O, OTel
Sandcastle	④ Fleet orch.	TS only	Via harness choice	Inherits harness (opencode → local models)	Session resume/fork	Parallel fan-out of harnesses	Docker/Podman/Vercel µVMs	v0.8.0 (2026)	Sandboxed fleets of coding agents
NeMo Agent Toolkit	⊥ Cross-cut	Py	N/A (wraps all)	Native (NIM/Triton)	Inherits	Profiles teams	NVIDIA infra	Evolving	Instrument/optimize on NVIDIA infra

★ The self-host column is the hybrid decision-maker. Anything that speaks the OpenAI-compatible protocol can point at a NIM endpoint on your own GPUs by swapping a base_url. Claude Agent SDK is the one true cloud-only entry.

Your context · hybrid self-hosted + cloud

The NVIDIA hybrid deployment lens

At NVIDIA, model portability and on-prem GPU serving usually outrank raw dev velocity. Here's how that reshapes the ranking.

🏠 Self-hosted side (NIM on your GPUs)

NIM microservices expose an OpenAI-compatible API. Any framework that lets you set a custom base_url runs your self-hosted models with zero code rewrite.

Best fits: LangGraph · Pydantic AI · OpenAI Agents SDK · LangChain (native NV connector)
Conditional: Flue (via OpenAI-compatible / OpenRouter provider)
Does NOT apply: Claude Agent SDK (Claude can't be self-hosted)

☁️ Cloud-API side

When external APIs are allowed, optimize for capability & velocity.

Max autonomy: Claude Agent SDK (Claude) · Flue (any provider)
Fast multi-agent: OpenAI Agents SDK
Flue and OpenAI Agents SDK remain hybrid-capable (same code, different endpoint); Claude Agent SDK covers the cloud side only

🔭 The cross-cutting requirement

NeMo Agent Toolkit instruments whatever you pick — profiling token/GPU cost, latency, bottlenecks, and evals across both sides of the hybrid. Treat it as a mandatory companion, not an alternative.

🔐 Data-sovereignty / air-gap

If data can't leave the building: model-agnostic + OpenAI-compatible + OTel-native is the filter. Winners: LangGraph or Pydantic AI, served by NIM, instrumented by NeMo. Cloud-only SDKs are out.
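
In a hybrid setup the sovereignty rule usually ends up as a small routing policy in front of the model client. A minimal sketch, assuming a two-level data classification and two OpenAI-compatible endpoints; the type names and the policy itself are illustrative, not a prescribed NVIDIA pattern.

```typescript
type DataClass = "restricted" | "unrestricted";

interface Endpoints { selfHosted: string; cloud: string }
interface Route { lane: "self-hosted" | "cloud"; baseUrl: string }

// Restricted data, or an environment where cloud APIs are not allowed,
// always stays on the self-hosted endpoint. Everything else may use cloud.
function pickRoute(data: DataClass, cloudAllowed: boolean, endpoints: Endpoints): Route {
  if (data === "restricted" || !cloudAllowed) {
    return { lane: "self-hosted", baseUrl: endpoints.selfHosted };
  }
  return { lane: "cloud", baseUrl: endpoints.cloud };
}
```

Because the decision only changes a base URL, it composes with any framework in this brief that accepts a custom OpenAI-compatible endpoint.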

Make the call

Decision tree

Walk top to bottom. The first matching branch is your default starting point.

Q1 · Must models run self-hosted / air-gapped on NVIDIA GPUs?

If YES → eliminate Claude Agent SDK. Use a model-agnostic framework against a NIM endpoint. → go to Q2. If NO (cloud OK) → all seven authoring options stay in play (NeMo is an overlay either way) → go to Q2.

Q2 · Python or TypeScript shop?

TypeScript → Flue (harness) or LangGraph.js / OpenAI Agents JS. Python → continue to Q3.

Q3 · How much control-flow complexity?

Complex/cyclic/regulated, needs durability+HITL+audit → LangGraph. Lightweight multi-agent, ship fast → OpenAI Agents SDK. Type-safe structured I/O → Pydantic AI.

Q4 · Need a full autonomous harness (sandbox, file/computer use, coding loops)?

YES + Claude + cloud OK → Claude Agent SDK. YES + TS + portable/edge → Flue. YES + Python + open models → LangGraph "deep agent" pattern. Need N coding harnesses in parallel → Sandcastle on top of your harness.

Q5 · Just integrating models/data fast / RAG prototype?

YES → LangChain (graduate to LangGraph when control flow grows).

Always-on overlay: whichever you pick, instrument with NeMo Agent Toolkit for profiling, eval & cost/latency optimization on NVIDIA infra.
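
The tree can be encoded as a function, which is handy for an internal intake form. This is one possible reading of it, not the only one: it checks the harness question first so that it is not shadowed by the language question, and the `Requirements` fields are names invented for this sketch.

```typescript
interface Requirements {
  selfHostedOnly: boolean;          // models must run self-hosted / air-gapped
  language: "python" | "typescript";
  needsHarness: boolean;            // sandbox, file/computer use, coding loops
  claudeAllowed: boolean;           // cloud-hosted Claude is acceptable
  parallelCodingFleet: boolean;     // N coding harnesses in parallel
  controlFlow: "complex" | "lightweight-multi-agent" | "structured-io" | "prototype";
}

const OVERLAY = "NeMo Agent Toolkit (overlay)";

function recommend(r: Requirements): string[] {
  const picks: string[] = [];
  if (r.needsHarness) {
    // Self-hosting rules out Claude Agent SDK regardless of preference.
    if (!r.selfHostedOnly && r.claudeAllowed) picks.push("Claude Agent SDK");
    else if (r.language === "typescript") picks.push("Flue");
    else picks.push('LangGraph ("deep agent" pattern)');
    if (r.parallelCodingFleet) picks.push("Sandcastle");
  } else if (r.language === "typescript") {
    picks.push("LangGraph.js", "OpenAI Agents SDK (JS)");
  } else {
    switch (r.controlFlow) {
      case "complex": picks.push("LangGraph"); break;
      case "lightweight-multi-agent": picks.push("OpenAI Agents SDK"); break;
      case "structured-io": picks.push("Pydantic AI"); break;
      case "prototype": picks.push("LangChain"); break;
    }
  }
  picks.push(OVERLAY); // always instrument, whatever was picked
  return picks;
}
```

Treat the output as a default starting point, as the tree itself does; scoring the shortlisted candidates against the axes that matter to you is still the deciding step.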

Bottom line

Recommendations by scenario (NVIDIA, hybrid)

🏭 Production, on-prem, regulated

LangGraph + NIM + NeMo Agent Toolkit. Durable, auditable, model-agnostic, fully self-hostable. The safest enterprise default.

🧪 Structured/data-extraction agents

Pydantic AI + NIM + Logfire/NeMo. Type-safe validated I/O, OTel-native, clean to operate on-prem.

🤹 Lightweight multi-agent, fast ship

OpenAI Agents SDK with base_url → NIM for self-host, or OpenAI for cloud. Same code, both sides.

🤖 Max autonomy / coding & computer-use

Claude Agent SDK (if cloud-Claude allowed) for the cloud lane; Flue if you want TS + portable/edge + multi-provider. Scale to parallel fleets by orchestrating harnesses with Sandcastle.

⚡ Prototype / RAG / explore

LangChain for speed; migrate the agent loop to LangGraph once it hardens.

🌐 Distributed / event-driven / CI agents (TS)

Flue — uniquely deploys to Cloudflare/Vercel/Fly/GitHub Actions/GitLab CI with a real harness.

One-line answer: For most NVIDIA work, start with LangGraph (or Pydantic AI) on NIM, instrumented by NeMo Agent Toolkit. Reach for Claude Agent SDK / Flue when you specifically need a turnkey autonomous harness; reach for OpenAI Agents SDK when you want minimal multi-agent glue.

Don't get burned · references

Anti-patterns & sources

Common selection mistakes

Comparing across layers. "LangChain vs LangGraph" is a false choice — they compose.
Ignoring lock-in early. Picking a vendor-tied SDK, then needing on-prem open models later = rewrite.
Skipping the OpenAI-compat trick. Most "can't self-host" worries vanish by pointing base_url at NIM.
Treating NeMo as a competitor. It's the instrumentation layer — adopt it alongside, not instead.
Over-engineering. Don't reach for a graph/harness when a single agent loop suffices.
Forgetting observability. It is a Day-1 axis, not a Day-90 retrofit.
Betting prod on the newest tool. Flue is exciting but young; weigh ecosystem risk.

Primary sources

Flue — flueframework.com · github.com/withastro/flue (Apache-2.0)
Claude Agent SDK — code.claude.com/docs/en/agent-sdk · anthropics/claude-agent-sdk-python
OpenAI Agents SDK — openai.github.io/openai-agents-python · developers.openai.com
LangChain / LangGraph 1.x — docs.langchain.com · langchain-ai/langgraph (v1.2, May 2026)
Pydantic AI / Logfire — ai.pydantic.dev · pydantic.dev/logfire
Sandcastle — github.com/mattpocock/sandcastle (MIT) · npm @ai-hero/sandcastle
NVIDIA NeMo Agent Toolkit — developer.nvidia.com/nemo-agent-toolkit · github.com/NVIDIA/NeMo-Agent-Toolkit
NVIDIA NIM (OpenAI-compatible serving) — docs.nvidia.com / build.nvidia.com

Validity window: this space moves monthly. Versions/credit models cited as of June 2026. Re-verify model specifiers, pricing, and self-host connectors before committing.