TECHNICAL DECISION BRIEF · JUNE 2026

Choosing an Agent Framework

LangChain · LangGraph · Claude Agent SDK · OpenAI Agents SDK · Flue · Pydantic AI · Sandcastle · NVIDIA NeMo Agent Toolkit

A hands-on technical deep dive into what you are actually choosing between when you build & deploy an agent — viewed through a hybrid NVIDIA lens (self-hosted GPU/NIM and cloud APIs).

8 tools · 12 decision axes · decision tree · hybrid deploy · recommendations

Read this first

These eight are not the same kind of thing

The #1 selection mistake is comparing tools that live at different layers of the stack. Place each one correctly and the choice gets easy.

① Integration / building blocks

Model wrappers, tools, retrievers, vector stores, prompt chains. The "glue".

LangChain

② Orchestration / control flow

How the agent loop runs: state, branching, durability, multi-agent.

LangGraph · OpenAI Agents SDK · Pydantic AI

③ Harness (full autonomy)

Batteries-included loop: sandbox, tools, skills, sessions, subagents, context mgmt.

Claude Agent SDK · Flue

④ Fleet orchestration

Runs N harnesses as units: container sandboxes, git worktrees/branch strategies, parallel fan-out, session fork/resume.

Sandcastle
NVIDIA NeMo Agent Toolkit sits sideways across all four layers. It is framework-agnostic: an observability, profiling, evaluation & optimization layer that wraps whatever you build above (LangChain, LangGraph, CrewAI, custom). It is a complement, not a competitor. The real question for you is rarely "which one"; it's "which orchestration/harness layer, instrumented by NeMo, served by NIM."
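The layering above can be written down as data, which makes the "same layer or not" check mechanical. This is a minimal sketch: the layer assignments come from this brief, while the type name, the map, and the `relationship` helper are illustrative names, not part of any tool.

```typescript
// Layer assignments copied from the brief; names of types and helpers are illustrative.
type Layer = "integration" | "orchestration" | "harness" | "fleet" | "cross-cutting";

const LAYER_OF: Record<string, Layer> = {
  "LangChain": "integration",
  "LangGraph": "orchestration",
  "OpenAI Agents SDK": "orchestration",
  "Pydantic AI": "orchestration",
  "Claude Agent SDK": "harness",
  "Flue": "harness",
  "Sandcastle": "fleet",
  "NVIDIA NeMo Agent Toolkit": "cross-cutting",
};

// Two tools are direct alternatives only when they occupy the same layer;
// tools on different layers are candidates for composition instead.
function relationship(a: string, b: string): "alternatives" | "composable" {
  const la = LAYER_OF[a];
  const lb = LAYER_OF[b];
  if (la === undefined || lb === undefined) {
    throw new Error(`unknown tool: ${la === undefined ? a : b}`);
  }
  return la === lb ? "alternatives" : "composable";
}
```

For example, `relationship("LangChain", "LangGraph")` yields `"composable"`, while `relationship("LangGraph", "Pydantic AI")` yields `"alternatives"`.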
The selection framework

12 axes that should drive the decision

Score each candidate against the axes that matter for your specific agent — not all twelve weigh equally.

🧩
1 · Abstraction layer
Glue vs orchestration vs full harness — what do you actually need built for you?
🔓
2 · Model portability / lock-in
Model-agnostic or vendor-tied? Decisive for hybrid + multi-model strategy.
🏠
3 · Self-hosting & on-prem fit
Can it drive an OpenAI-compatible endpoint (NIM) on your own GPUs / air-gapped?
🎚️
4 · Control vs convenience
Low-level graph (you decide everything) vs opinionated (fewer decisions, faster).
💾
5 · State, durability & HITL
Checkpointing, resume-after-failure, human-in-the-loop, long-running jobs.
🤝
6 · Multi-agent model
Handoffs vs typed graph vs subagents — how teams of agents coordinate.
🔭
7 · Observability & eval
Tracing, token/latency profiling, evals. OTel-native vs proprietary dashboards.
🚀
8 · Deployment targets
Long-running server, serverless, edge, CI/CD, K8s. Where does it run?
🛡️
9 · Security & sandboxing
Permission gates, sandboxed tool execution, secrets handling, isolation.
🐍
10 · Language & team fit
Python vs TypeScript — match your team & existing services.
🌐
11 · Maturity & ecosystem
Integrations, community, docs, stability of the API surface over time.
💰
12 · Cost & operational model
Library you host vs managed platform; token credits; infra overhead.
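One way to act on "not all twelve weigh equally" is a weighted average over only the axes you care about. The sketch below is an assumption about how you might score, not a method from any of the tools; the axis keys, the 0 to 5 scale, and the two candidates are hypothetical placeholders.

```typescript
// Maps an axis name to a number: a weight, or a score on an assumed 0..5 scale.
type AxisValues = Record<string, number>;

// Weighted average over the axes that carry a weight; unscored axes count as 0.
function weightedScore(scores: AxisValues, weights: AxisValues): number {
  let total = 0;
  let weightSum = 0;
  for (const [axis, weight] of Object.entries(weights)) {
    total += weight * (scores[axis] ?? 0);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Hypothetical weights for an air-gapped project: self-hosting dominates.
const exampleWeights: AxisValues = { selfHosting: 3, durability: 2, languageFit: 1 };

// Hypothetical candidates; the numbers are made up for illustration.
const candidateA: AxisValues = { selfHosting: 5, durability: 5, languageFit: 3 };
const candidateB: AxisValues = { selfHosting: 0, durability: 3, languageFit: 5 };

const ranked = [
  { name: "candidateA", score: weightedScore(candidateA, exampleWeights) },
  { name: "candidateB", score: weightedScore(candidateB, exampleWeights) },
].sort((x, y) => y.score - x.score);
```

Leaving an axis out of the weights removes it from the decision entirely, which is usually what you want for axes that do not matter to a given agent.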

LangChain Python JS/TS

v1.0 · Integration layer · LangChain Inc (Harrison Chase) · MIT
Layer ① — building blocks

🧠 What it is

The original, broadest toolkit: standardized chat-model wrappers, tools, retrievers, vector-store connectors, output parsers, and LCEL chains. v1.0 added a standard create_agent + middleware.

✅ Strengths

  • Unmatched ecosystem & integrations
  • Fastest path from idea → prototype
  • Swap models/vector DBs trivially
  • Model-agnostic by design

⚠️ Watch-outs

  • History of churn / leaky abstractions
  • Deep call stacks → hard to debug
  • Real agent control now lives in LangGraph

🎯 Choose when

You're prototyping, doing RAG, or need to integrate many models/data sources fast and don't yet need complex stateful control flow.

🔭 Observability

LangSmith (first-party tracing & eval). OTel export available. Also instrumentable by NeMo Agent Toolkit.

🟩 NVIDIA / hybrid fit

Strong. langchain-nvidia-ai-endpoints talks to NIM microservices on your own GPUs and hosted NIM/cloud APIs — true hybrid.

LangGraph Python JS/TS

v1.2 (May 2026) · Orchestration · LangChain Inc · MIT
Layer ② — control flow

🧠 What it is

Low-level graph orchestration: nodes (agents/functions), edges (incl. conditional routing), and a typed shared state object flowing through. Models the agent loop explicitly.
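To make the node/edge/state vocabulary concrete, here is a hand-rolled sketch of a cyclic graph with one conditional edge. It is not the LangGraph API; the `State` shape, the node names, and `runGraph` are all invented for illustration.

```typescript
// Typed shared state that flows through every node (illustrative shape).
interface State { draft: string; attempts: number; approved: boolean }

type NodeFn = (s: State) => State;
type Router = (s: State) => string; // returns the next node name, or "END"

const nodes: Record<string, NodeFn> = {
  write: (s) => ({ ...s, draft: `draft v${s.attempts + 1}`, attempts: s.attempts + 1 }),
  review: (s) => ({ ...s, approved: s.attempts >= 2 }), // stand-in for a real check
};

const edges: Record<string, Router> = {
  write: () => "review",
  review: (s) => (s.approved ? "END" : "write"), // conditional edge closes the cycle
};

// Runs nodes until a router returns "END" or the step limit is reached.
function runGraph(start: string, initial: State, maxSteps = 20): { state: State; trace: string[] } {
  let state = initial;
  let current = start;
  const trace: string[] = [];
  for (let step = 0; step < maxSteps && current !== "END"; step++) {
    const node = nodes[current];
    const route = edges[current];
    if (!node || !route) throw new Error(`unknown node: ${current}`);
    state = node(state);
    trace.push(current);
    current = route(state);
  }
  return { state, trace };
}
```

Because routing is an explicit function of state, the path an agent took can be read back from `trace`, which is the property the "inspectable control flow" claim rests on.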

✅ Strengths

  • Deterministic, inspectable control flow
  • Durable execution + checkpointing
  • HITL interrupts, pause/resume, time-travel
  • Best for cyclic, branching, long-running
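The durability claim boils down to saving state after each step so a failed run can resume instead of restarting. The following is a minimal sketch of that idea using an in-memory `Map` as a stand-in for a durable store; `Checkpoint`, `runWithCheckpoints`, and the run ID are hypothetical names, and a real checkpointer would persist to a database.

```typescript
// What gets saved after each completed step (illustrative shape).
interface Checkpoint<S> { nextStep: number; state: S }

function runWithCheckpoints<S>(
  steps: Array<(s: S) => S>,
  initial: S,
  store: Map<string, Checkpoint<S>>, // stand-in for durable storage
  runId: string,
): S {
  // Resume from the last checkpoint if this run was interrupted earlier.
  const saved = store.get(runId);
  let state = saved ? saved.state : initial;
  for (let i = saved ? saved.nextStep : 0; i < steps.length; i++) {
    state = steps[i](state); // may throw; earlier progress is already saved
    store.set(runId, { nextStep: i + 1, state });
  }
  return state;
}
```

Calling `runWithCheckpoints` again with the same `runId` after a failure skips the steps that already completed. A human-in-the-loop pause can be modelled the same way: stop after a step, wait for approval, then call again.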

⚠️ Watch-outs

  • Steepest learning curve here
  • Graph boilerplate for simple agents
  • You own more of the design

🎯 Choose when

Production agents in regulated / high-stakes settings needing audit trails, approvals, retries, and resumability. Finance, healthcare, complex tool pipelines.

🔭 Observability

LangSmith + checkpointer state inspection. Pairs with Temporal for heavyweight durability. NeMo-instrumentable.

🟩 NVIDIA / hybrid fit

Strong. Same NVIDIA endpoint connector as LangChain; durable graphs map well to on-prem, long-running enterprise jobs. A top on-prem pick.

Claude Agent SDK Python TS

Renamed from Claude Code SDK (Sep 2025) · Anthropic · Harness
Layer ③ — full harness

🧠 What it is

The exact harness that powers Claude Code, made programmable: agent loop, automatic context management / compaction, built-in tools (file edit, bash, web), permissions, persistent sessions, subagents, first-class MCP.
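For readers unfamiliar with context compaction, the general idea is to replace older turns with a summary once the history exceeds a budget. The sketch below illustrates that idea only; it is not how the Claude Agent SDK implements it. The word-count token estimate and the `summarize` callback are assumptions.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string }

// Crude token estimate (assumption): one token per whitespace-separated word.
const estimateTokens = (m: Message): number =>
  m.content.split(/\s+/).filter(Boolean).length;

// Keeps the newest messages that fit the budget and summarizes the rest.
// The summary itself is not counted against the budget in this sketch.
function compact(
  history: Message[],
  budget: number,
  summarize: (old: Message[]) => string, // e.g. a model call in a real system
): Message[] {
  const total = history.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= budget) return history;
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  const dropped = history.slice(0, history.length - kept.length);
  return [{ role: "system", content: summarize(dropped) }, ...kept];
}
```

With a harness, this bookkeeping is done for you; with an orchestration-layer framework you would typically own something like it yourself.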

✅ Strengths

  • Best-in-class autonomous loop out of the box
  • Context compaction handled for you
  • Superb for coding / computer-use agents
  • HITL checkpoints + sandboxed tools

⚠️ Watch-outs

  • Claude-only models (Anthropic API, Bedrock, Vertex)
  • Cannot drive self-hosted open models
  • Subscription credit model for SDK usage

🎯 Choose when

You want top-tier autonomy fast, Claude is your model, and the work is agentic coding, file/computer manipulation, or research loops.

🔭 Observability

SDK message stream (incl. parent_tool_use_id for subagents) + your own OTel. No managed dashboard equivalent to LangSmith.

🟨 NVIDIA / hybrid fit

Cloud-leaning. Excellent if Claude-via-API is allowed. Not a fit for self-hosted-on-NVIDIA-GPU models — Claude isn't self-hostable. Pick for the cloud side of a hybrid only.

OpenAI Agents SDK Python JS/TS

Successor to Swarm (early 2025) · OpenAI · Orchestration
Layer ② — control flow

🧠 What it is

A small, opinionated SDK on four primitives: Agents, Tools, Handoffs (agent→agent transfer of full context), and Guardrails (input/output validation). Plus Sessions & built-in Tracing.
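The handoff and guardrail primitives can be illustrated without the SDK. The sketch below is hand-rolled and is not the OpenAI Agents SDK API: `Agent`, `Outcome`, `inputGuardrail`, and `runWithHandoffs` are invented names, and the guardrail is a toy pattern match rather than a real policy check.

```typescript
interface Turn { role: "user" | "assistant"; content: string }
type Outcome = { kind: "final"; output: string } | { kind: "handoff"; to: string };
interface Agent { name: string; run: (history: Turn[]) => Outcome }

// Input guardrail: a toy policy check applied before any agent sees the input.
const inputGuardrail = (text: string): boolean =>
  !/ignore (all )?previous instructions/i.test(text);

function runWithHandoffs(
  agents: Record<string, Agent>,
  start: string,
  input: string,
  maxHops = 5,
): string {
  if (!inputGuardrail(input)) throw new Error("input rejected by guardrail");
  const history: Turn[] = [{ role: "user", content: input }];
  let current = start;
  for (let hop = 0; hop <= maxHops; hop++) {
    const agent = agents[current];
    if (!agent) throw new Error(`unknown agent: ${current}`);
    const outcome = agent.run(history);
    if (outcome.kind === "final") return outcome.output;
    // Handoff: the next agent receives the full conversation so far.
    history.push({ role: "assistant", content: `[${agent.name} handed off to ${outcome.to}]` });
    current = outcome.to;
  }
  throw new Error("too many handoffs");
}
```

The key contrast with graph routing is that the agent itself decides where control goes next, and the whole history travels with it.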

✅ Strengths

  • Minimal, fast to ship, few decisions
  • Handoffs = clean multi-agent pattern
  • Guardrails for prompt-injection/policy
  • Tracing built in

⚠️ Watch-outs

  • Tuned for OpenAI models first
  • Handoffs less flexible than graph routing
  • Less control for complex branching

🎯 Choose when

You want a lightweight, opinionated multi-agent setup and value speed over fine-grained control. Great default for "teams of agents" with clean handoffs.

🔭 Observability

First-party tracing dashboard; OTel-compatible exporters; NeMo-instrumentable.

🟩 NVIDIA / hybrid fit

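Good, and underrated. The SDK can speak the OpenAI Chat Completions protocol, and NIM exposes an OpenAI-compatible endpoint. Point base_url at your NIM to run self-hosted models with the same agent code. Because the SDK defaults to OpenAI's Responses API, switch it to Chat Completions mode when targeting a non-OpenAI endpoint.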
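The base_url swap is easiest to see at the HTTP level. This sketch builds an OpenAI-style Chat Completions request from a config object, so only the config differs between the cloud and self-hosted cases. The host, port, API-key placeholders, and model IDs are assumptions to replace with your own values; `buildChatRequest` is an illustrative helper, not an SDK function.

```typescript
interface EndpointConfig { baseUrl: string; apiKey: string; model: string }
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Builds an OpenAI-style Chat Completions request. The calling code is the
// same for every endpoint; only the config changes.
function buildChatRequest(cfg: EndpointConfig, messages: ChatMessage[]) {
  return {
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${cfg.apiKey}`,
      },
      body: JSON.stringify({ model: cfg.model, messages }),
    },
  };
}

// Assumed example values: adjust host, port, key, and model to your deployment.
const cloudEndpoint: EndpointConfig = {
  baseUrl: "https://api.openai.com/v1",
  apiKey: "YOUR_API_KEY",
  model: "gpt-4o-mini",
};
const selfHostedEndpoint: EndpointConfig = {
  baseUrl: "http://localhost:8000/v1/",
  apiKey: "not-used",
  model: "meta/llama-3.1-8b-instruct",
};
```

A caller would pass the result to `fetch(req.url, req.init)`. Frameworks that accept a custom base URL are doing the equivalent of swapping `cloudEndpoint` for `selfHostedEndpoint`.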

Flue TypeScript only

Agent harness framework · Fred K. Schott / withastro · Apache-2.0 · ~4.9k★
Layer ③ — full harness

🧠 What it is

"Claude Code, but 100% headless & programmable." A runtime-agnostic TypeScript harness — think Astro/Next.js, but for agents. Core idea: Agent = Model + Harness. Ships sandbox, durable execution, subagents, tools, skills, sessions, and MCP.

✅ Strengths

  • Harness-grade autonomy, write-once
  • Deploy anywhere: Node, Cloudflare, Vercel, Fly, GitHub Actions, GitLab CI
  • Multi-provider: Anthropic, OpenAI, Gemini, Kimi, OpenRouter
  • Built-in sandbox + durability

⚠️ Watch-outs

  • Young (2026) — smaller ecosystem
  • TypeScript only — no Python
  • Smaller community vs LangChain/OpenAI

⌨️ The model (from the docs)

// agents/triage.ts
import { createAgent } from 'flue';
export default createAgent(() => ({
  model: 'anthropic/claude-sonnet-4-6',
  instructions: 'Triage incoming tickets.',
}));

Agent = continuing assistant w/ sessions. Workflow = bounded job that runs once & returns a result (init(agent) → session → result). CLI: flue dev · flue connect · flue build.

🟩 NVIDIA / hybrid fit

Conditional. Edge/CI deploy targets are a unique strength for distributed/event-driven agents. For self-hosted NVIDIA models, route via an OpenAI-compatible provider/OpenRouter to your NIM. Best when your stack is TypeScript-first.

Pydantic AI Python only

Type-safe agent framework · Pydantic team (Samuel Colvin) · MIT
Layer ② — control flow

🧠 What it is

Agent framework built on type safety: tools via decorators, structured outputs validated by Pydantic (inputs before send, outputs after). pydantic-graph adds typed state machines for durable execution & HITL.
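The validate-then-retry pattern behind "reliable structured outputs" looks roughly like the sketch below. Pydantic AI itself is Python-only, so this is a TypeScript illustration of the idea, not its API: the `Invoice` shape, the hand-written validator, and the `callModel` callback are hypothetical.

```typescript
interface Invoice { vendor: string; totalCents: number }

// Hand-written validator standing in for a schema library.
function parseInvoice(raw: string): Invoice {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("expected an object");
  const { vendor, totalCents } = data as Record<string, unknown>;
  if (typeof vendor !== "string" || vendor.length === 0) {
    throw new Error("vendor must be a non-empty string");
  }
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents)) {
    throw new Error("totalCents must be an integer");
  }
  return { vendor, totalCents };
}

// Calls the model, validates the reply, and feeds the validation error back
// to the model on failure so the next attempt can correct it.
function extractWithRetry(callModel: (feedback?: string) => string, maxAttempts = 3): Invoice {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parseInvoice(callModel(feedback));
    } catch (err) {
      feedback = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`no valid output after ${maxAttempts} attempts: ${feedback}`);
}
```

Downstream code only ever sees a value that passed validation, which is where the "fewer prod bugs" claim comes from.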

✅ Strengths

  • Static type-checking plus runtime validation; fewer prod bugs
  • Reliable structured outputs
  • Clean, lightweight, "FastAPI feel"
  • Model-agnostic + Pydantic AI Gateway

⚠️ Watch-outs

  • Smaller ecosystem than LangChain
  • Python only
  • Newer than incumbents

🎯 Choose when

You want production discipline, validated structured I/O, and clean engineering — especially data-extraction / API-shaped agents where schema correctness matters.

🔭 Observability

Logfire — OpenTelemetry-native, traces the whole app not just the LLM layer. Best-in-class for enterprise OTel pipelines.

🟩 NVIDIA / hybrid fit

Strong. Model-agnostic incl. OpenAI-compatible NIM endpoints; OTel-native traces drop straight into enterprise/on-prem observability. Excellent on-prem candidate.

Sandcastle TypeScript only

Coding-agent fleet orchestration · Matt Pocock (AI Hero / Total TypeScript) · MIT · v0.8.0 · ~5.9k★
Layer ④ — fleet orchestration

🧠 What it is

Not an agent framework — it implements no loop. It orchestrates existing harnesses as plug-in providers (claudeCode(), codex(), cursor(), copilot(), opencode()) inside Docker/Podman containers or Vercel microVMs, with git worktrees, branch strategies (commits patch back to host), parallel fan-out, and session capture/resume/fork.
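The fan-out pattern described here can be sketched independently of Sandcastle. The code below is not Sandcastle's API: `runInSandbox` is a hypothetical stand-in for "launch one harness in one container on one branch", and the `agent/<id>` branch naming is an assumption.

```typescript
interface Task { id: string; prompt: string }
interface RunResult { taskId: string; branch: string; ok: boolean; detail: string }

// Runs every task concurrently, each on its own branch. A failure in one
// task is captured in its result and does not cancel the others.
async function fanOut(
  tasks: Task[],
  runInSandbox: (task: Task, branch: string) => Promise<string>, // hypothetical runner
): Promise<RunResult[]> {
  return Promise.all(
    tasks.map(async (task): Promise<RunResult> => {
      const branch = `agent/${task.id}`;
      try {
        const detail = await runInSandbox(task, branch);
        return { taskId: task.id, branch, ok: true, detail };
      } catch (err) {
        return { taskId: task.id, branch, ok: false, detail: String(err) };
      }
    }),
  );
}
```

Isolating each run on its own branch is what lets results land as separate branches or PRs that can be reviewed and merged independently.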

✅ Strengths

  • Safe parallel "AFK" coding-agent fleets
  • Harnesses are swappable — hedge against harness lock-in
  • Git-native: results land as branches/PRs
  • Tiny footprint: 7 npm pkgs / 15 MB (measured) — no model SDK, no loop

⚠️ Watch-outs

  • Coding agents only — not general agents
  • v0.8.0, pre-1.0 (2026)
  • Single high-profile maintainer, not company-backed
  • Needs Docker/Podman or Vercel sandbox infra

🎯 Choose when

You already use a harness (Claude Code, Codex…) and want to run N of them in parallel, sandboxed, with git lifecycle managed — task fan-out, review pipelines, planner/implementer/reviewer flows.

🔭 Observability

Session capture and lifecycle hooks; otherwise inherits whatever the underlying harness emits. Not an observability product.

🟨 NVIDIA / hybrid fit

Inherits from the harness. With Claude Code providers it's cloud-Claude; with opencode() custom models, local/NIM-backed models are reachable. Think of it as Kubernetes-for-harnesses: the layer above ③, not a competitor to ①–②.

NVIDIA NeMo Agent Toolkit Python

Formerly AgentIQ / Agent Intelligence Toolkit · NVIDIA · Apache-2.0 (open source)
⊥ Cross-cutting layer

🧠 What it is

Not a competing framework. A framework-agnostic layer that works around LangChain, LlamaIndex, CrewAI, Semantic Kernel, Google ADK & custom Python agents — adding instrumentation, profiling, eval & optimization.

✅ Strengths

  • Profiler: token/latency/bottleneck down to tool & agent level
  • Optimization, evaluation, scaling
  • Integrates Phoenix, LangSmith, Weave, Langfuse, OTel
  • Native NVIDIA stack (NIM, Triton, Dynamo)
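To show what tool-level and agent-level profiling means in practice, here is a minimal sketch of a wrapper that records latency and token counts per named step. It is not the NeMo Agent Toolkit API (which is Python); `createProfiler`, `measure`, and `summary` are invented names, and the injected clock exists only to make the example deterministic.

```typescript
interface Span { name: string; ms: number; tokens: number }
interface Totals { calls: number; ms: number; tokens: number }

function createProfiler(now: () => number = () => Date.now()) {
  const spans: Span[] = [];
  return {
    spans,
    // Wraps one call and records its latency plus the token count it reports.
    measure<T>(name: string, fn: () => { value: T; tokens: number }): T {
      const start = now();
      const { value, tokens } = fn();
      spans.push({ name, ms: now() - start, tokens });
      return value;
    },
    // Aggregates per step name so the slowest or most token-hungry step stands out.
    summary(): Map<string, Totals> {
      const totals = new Map<string, Totals>();
      for (const s of spans) {
        const t = totals.get(s.name) ?? { calls: 0, ms: 0, tokens: 0 };
        totals.set(s.name, { calls: t.calls + 1, ms: t.ms + s.ms, tokens: t.tokens + s.tokens });
      }
      return totals;
    },
  };
}
```

Because the wrapper sits around the call rather than inside it, the same instrumentation applies regardless of which authoring framework produced the step.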

⚠️ Watch-outs

  • Doesn't replace your authoring framework
  • Python-centric
  • Most valuable once you're on NVIDIA infra

🎯 Choose when

You're standardizing agents on NVIDIA infrastructure and need a single pane for profiling, cost/latency optimization, eval & observability regardless of which framework each team picked.

🔭 Observability

This is the observability/profiling play. OTel + major platforms; measures what NIM/GPU serving actually costs you.

🟩 NVIDIA / hybrid fit

Native — the NVIDIA glue. Pair it with whichever orchestration/harness you choose. For an NVIDIA team this is less "alternative" and more "default companion."

Side by side

The comparison matrix

Same axes, all eight. green=strong · amber=partial/conditional · red=weak/N-A

Framework | Layer | Lang | Model-agnostic | Self-host / NIM | Durability + HITL | Multi-agent | Deploy reach | Maturity | Best-fit verdict
LangChain | ① Glue | Py/TS | Yes | Yes (NV endpoints) | Via LangGraph | Via LangGraph | Broad (lib) | High | Prototyping, RAG, integrations
LangGraph | ② Orchestr. | Py/TS | Yes | Yes (NV endpoints) | Best-in-class | Typed graph | Server/Platform | High | Regulated, durable, complex flows
Claude Agent SDK | ③ Harness | Py/TS | Claude only | No (cloud Claude) | Checkpoints | Subagents | Self-host lib | Newer | Top autonomy, Claude, coding/CUA
OpenAI Agents SDK | ② Orchestr. | Py/TS | OpenAI-first | Yes (OpenAI-compat → NIM) | Sessions | Handoffs | Server/serverless | High | Lightweight multi-agent, fast ship
Flue | ③ Harness | TS only | Yes (multi) | Via OpenAI-compat/OpenRouter | Durable + sandbox | Subagents | Edge/CI/serverless ★ | Young (2026) | TS-first, edge/CI autonomous agents
Pydantic AI | ② Orchestr. | Py only | Yes | Yes (OpenAI-compat → NIM) | pydantic-graph | Graph/delegation | Server | Newer | Type-safe, structured I/O, OTel
Sandcastle | ④ Fleet orch. | TS only | Via harness choice | Inherits harness (opencode → local models) | Session resume/fork | Parallel fan-out of harnesses | Docker/Podman/Vercel µVMs | v0.8.0 (2026) | Sandboxed fleets of coding agents
NeMo Agent Toolkit | ⊥ Cross-cut | Py | N/A (wraps all) | Native (NIM/Triton) | Inherits | Profiles teams | NVIDIA infra | Evolving | Instrument/optimize on NVIDIA infra
The self-host column is the hybrid decision-maker. Anything that speaks the OpenAI-compatible protocol can point at a NIM endpoint on your own GPUs by swapping a base_url. Claude Agent SDK is the one true cloud-only entry.
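Treating the self-host column as a filter can be sketched as follows. The values are transcribed from the matrix, with one simplification that is an assumption of this sketch: "Via ..." and "Inherits ..." entries are encoded as `"conditional"`. NeMo Agent Toolkit is left out because it is an overlay rather than an authoring choice.

```typescript
type SelfHost = "yes" | "conditional" | "no";
interface Row { framework: string; lang: "Py/TS" | "TS only" | "Py only"; selfHost: SelfHost }

// Transcribed from the matrix; "conditional" is this sketch's encoding of
// the "Via ..." and "Inherits ..." cells.
const MATRIX: Row[] = [
  { framework: "LangChain", lang: "Py/TS", selfHost: "yes" },
  { framework: "LangGraph", lang: "Py/TS", selfHost: "yes" },
  { framework: "Claude Agent SDK", lang: "Py/TS", selfHost: "no" },
  { framework: "OpenAI Agents SDK", lang: "Py/TS", selfHost: "yes" },
  { framework: "Flue", lang: "TS only", selfHost: "conditional" },
  { framework: "Pydantic AI", lang: "Py only", selfHost: "yes" },
  { framework: "Sandcastle", lang: "TS only", selfHost: "conditional" },
];

// Returns the frameworks that survive a self-hosting requirement.
function selfHostCandidates(rows: Row[], allowConditional: boolean): string[] {
  return rows
    .filter((r) => r.selfHost === "yes" || (allowConditional && r.selfHost === "conditional"))
    .map((r) => r.framework);
}
```

With `allowConditional` set to `false`, the result is LangChain, LangGraph, OpenAI Agents SDK, and Pydantic AI; Claude Agent SDK is excluded either way.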
Your context · hybrid self-hosted + cloud

The NVIDIA hybrid deployment lens

At NVIDIA, model portability and on-prem GPU serving usually outrank raw dev velocity. Here's how that reshapes the ranking.

🏠 Self-hosted side (NIM on your GPUs)

NIM microservices expose an OpenAI-compatible API. Any framework that lets you set a custom base_url runs your self-hosted models with zero code rewrite.

  • Best fits: LangGraph · Pydantic AI · OpenAI Agents SDK · LangChain (native NV connector)
  • Conditional: Flue (via OpenAI-compatible / OpenRouter provider)
  • Does NOT apply: Claude Agent SDK (Claude can't be self-hosted)

☁️ Cloud-API side

When external APIs are allowed, optimize for capability & velocity.

  • Max autonomy: Claude Agent SDK (Claude) · Flue (any provider)
  • Fast multi-agent: OpenAI Agents SDK
  • All of the above except Claude Agent SDK remain hybrid-capable: same code, different endpoint

🔭 The cross-cutting requirement

NeMo Agent Toolkit instruments whatever you pick — profiling token/GPU cost, latency, bottlenecks, and evals across both sides of the hybrid. Treat it as a mandatory companion, not an alternative.

🔐 Data-sovereignty / air-gap

If data can't leave the building: model-agnostic + OpenAI-compatible + OTel-native is the filter. Winners: LangGraph or Pydantic AI, served by NIM, instrumented by NeMo. Cloud-only SDKs are out.
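In a hybrid setup, the "can this data leave the building" rule can be enforced in one small routing function. This is a sketch under assumptions: the three sensitivity levels, the endpoint URLs, and the `pickLane` helper are hypothetical and would come from your own policy.

```typescript
type Sensitivity = "public" | "internal" | "restricted";
interface Lane { name: "cloud" | "self-hosted"; baseUrl: string }

// Hypothetical endpoints; replace with your own cloud API and NIM service.
const CLOUD_LANE: Lane = { name: "cloud", baseUrl: "https://api.example.com/v1" };
const SELF_HOSTED_LANE: Lane = { name: "self-hosted", baseUrl: "http://nim.internal.example:8000/v1" };

// Restricted data always stays on the self-hosted lane, whatever the cloud
// policy says. Everything else follows the cloudAllowed switch.
function pickLane(sensitivity: Sensitivity, cloudAllowed: boolean): Lane {
  if (!cloudAllowed || sensitivity === "restricted") return SELF_HOSTED_LANE;
  return CLOUD_LANE;
}
```

Setting `cloudAllowed` to `false` globally models the air-gapped case: every request resolves to the self-hosted lane, so only frameworks that accept a custom base URL remain usable.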

Make the call

Decision tree

Walk top to bottom. The first matching branch is your default starting point.

Q1
Must models run self-hosted / air-gapped on NVIDIA GPUs?

If YES → eliminate Claude Agent SDK. Use a model-agnostic framework against a NIM endpoint. → go Q2.   If NO (cloud OK) → all eight stay in play → go Q2.

Q2
Python or TypeScript shop?

TypeScript → Flue (harness) or LangGraph.js / OpenAI Agents JS.   Python → continue.

Q3
How much control flow complexity?

Complex/cyclic/regulated, needs durability+HITL+audit → LangGraph.   Lightweight multi-agent, ship fast → OpenAI Agents SDK.   Type-safe structured I/O → Pydantic AI.

Q4
Need a full autonomous harness (sandbox, file/computer use, coding loops)?

YES + Claude + cloud OK → Claude Agent SDK.   YES + TS + portable/edge → Flue.   YES + Python + open models → LangGraph "deep agent" pattern.   Need N coding harnesses in parallel → Sandcastle on top of your harness.

Q5
Just integrating models/data fast / RAG prototype?

YES → LangChain (graduate to LangGraph when control flow grows).

Always-on overlay: whichever you pick, instrument with NeMo Agent Toolkit for profiling, eval & cost/latency optimization on NVIDIA infra.
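The tree can be encoded as a function, which forces its ambiguities into the open. The sketch below is one possible reading, not the definitive one: it checks the harness question (Q4) before the language and control-flow questions, and the `Needs` fields and `Workload` labels are names invented for this illustration.

```typescript
type Workload =
  | "autonomous-harness"
  | "complex-regulated"
  | "lightweight-multi-agent"
  | "structured-io"
  | "prototype-rag";

interface Needs {
  selfHostedOnly: boolean;        // Q1
  language: "python" | "typescript"; // Q2
  claudeIsYourModel: boolean;     // part of Q4
  workload: Workload;             // Q3, Q4, Q5
}

// Q3 and Q5 defaults for Python shops, taken from the tree.
const PYTHON_DEFAULTS: Record<Exclude<Workload, "autonomous-harness">, string> = {
  "complex-regulated": "LangGraph",
  "lightweight-multi-agent": "OpenAI Agents SDK",
  "structured-io": "Pydantic AI",
  "prototype-rag": "LangChain",
};

function recommend(n: Needs): { start: string; overlay: string } {
  const overlay = "NeMo Agent Toolkit"; // always-on overlay
  const workload = n.workload;
  // Q1: a self-hosted requirement rules out the Claude-only harness.
  const claudeOk = !n.selfHostedOnly && n.claudeIsYourModel;
  // Q4, checked first in this reading: a harness requirement overrides Q2/Q3.
  if (workload === "autonomous-harness") {
    if (claudeOk) return { start: "Claude Agent SDK", overlay };
    const start = n.language === "typescript" ? "Flue" : 'LangGraph "deep agent" pattern';
    return { start, overlay };
  }
  // Q2: TypeScript shops without a harness requirement.
  if (n.language === "typescript") return { start: "LangGraph.js or OpenAI Agents JS", overlay };
  return { start: PYTHON_DEFAULTS[workload], overlay };
}
```

Sandcastle is deliberately absent from the return value: per the tree, it is added on top of whichever harness you pick when you need several coding harnesses in parallel.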
Bottom line

Recommendations by scenario (NVIDIA, hybrid)

🏭 Production, on-prem, regulated

LangGraph + NIM + NeMo Agent Toolkit. Durable, auditable, model-agnostic, fully self-hostable. The safest enterprise default.

🧪 Structured/data-extraction agents

Pydantic AI + NIM + Logfire/NeMo. Type-safe validated I/O, OTel-native, clean to operate on-prem.

🤹 Lightweight multi-agent, fast ship

OpenAI Agents SDK with base_url → NIM for self-host, or OpenAI for cloud. Same code, both sides.

🤖 Max autonomy / coding & computer-use

Claude Agent SDK (if cloud-Claude allowed) for the cloud lane; Flue if you want TS + portable/edge + multi-provider. Scale to parallel fleets by orchestrating harnesses with Sandcastle.

⚡ Prototype / RAG / explore

LangChain for speed; migrate the agent loop to LangGraph once it hardens.

🌐 Distributed / event-driven / CI agents (TS)

Flue — uniquely deploys to Cloudflare/Vercel/Fly/GitHub Actions/GitLab CI with a real harness.

One-line answer: For most NVIDIA work, start with LangGraph (or Pydantic AI) on NIM, instrumented by NeMo Agent Toolkit. Reach for Claude Agent SDK / Flue when you specifically need a turnkey autonomous harness; reach for OpenAI Agents SDK when you want minimal multi-agent glue.
Don't get burned · references

Anti-patterns & sources

Common selection mistakes

  • Comparing across layers. "LangChain vs LangGraph" is a false choice — they compose.
  • Ignoring lock-in early. Picking a vendor-tied SDK, then needing on-prem open models later = rewrite.
  • Skipping the OpenAI-compat trick. Most "can't self-host" worries vanish by pointing base_url at NIM.
  • Treating NeMo as a competitor. It's the instrumentation layer — adopt it alongside, not instead.
  • Over-engineering. Don't reach for a graph/harness when a single agent loop suffices.
  • Forgetting observability is a Day-1 axis, not a Day-90 retrofit.
  • Betting prod on the newest tool. Flue is exciting but young; weigh ecosystem risk.

Primary sources

  • Flue — flueframework.com · github.com/withastro/flue (Apache-2.0)
  • Claude Agent SDK — code.claude.com/docs/en/agent-sdk · anthropics/claude-agent-sdk-python
  • OpenAI Agents SDK — openai.github.io/openai-agents-python · developers.openai.com
  • LangChain / LangGraph 1.x — docs.langchain.com · langchain-ai/langgraph (v1.2, May 2026)
  • Pydantic AI / Logfire — ai.pydantic.dev · pydantic.dev/logfire
  • Sandcastle — github.com/mattpocock/sandcastle (MIT) · npm @ai-hero/sandcastle
  • NVIDIA NeMo Agent Toolkit — developer.nvidia.com/nemo-agent-toolkit · github.com/NVIDIA/NeMo-Agent-Toolkit
  • NVIDIA NIM (OpenAI-compatible serving) — docs.nvidia.com / build.nvidia.com
Validity window: this space moves monthly. Versions/credit models cited as of June 2026. Re-verify model specifiers, pricing, and self-host connectors before committing.