LangChain · LangGraph · Claude Agent SDK · OpenAI Agents SDK · Flue · Pydantic AI · Sandcastle · NVIDIA NeMo Agent Toolkit
A hands-on technical deep dive into what you are actually choosing between when you build & deploy an agent — viewed through a hybrid NVIDIA lens (self-hosted GPU/NIM and cloud APIs).
Use ← → arrows or Space to navigate · press F for fullscreen
The #1 selection mistake is comparing tools that live at different layers of the stack. Place each one correctly and the choice gets easy.
Model wrappers, tools, retrievers, vector stores, prompt chains. The "glue".
How the agent loop runs: state, branching, durability, multi-agent.
Batteries-included loop: sandbox, tools, skills, sessions, subagents, context mgmt.
Runs N harnesses as units: container sandboxes, git worktrees/branch strategies, parallel fan-out, session fork/resume.
Score each candidate against the axes that matter for your specific agent — not all twelve weigh equally.
The original, broadest toolkit: standardized chat-model wrappers, tools, retrievers, vector-store connectors, output parsers, and LCEL chains. v1.0 added a standard create_agent + middleware.
You're prototyping, doing RAG, or need to integrate many models/data sources fast and don't yet need complex stateful control flow.
LangSmith (first-party tracing & eval). OTel export available. Also instrumentable by NeMo Agent Toolkit.
Strong. langchain-nvidia-ai-endpoints talks to NIM microservices on your own GPUs and hosted NIM/cloud APIs — true hybrid.
Low-level graph orchestration: nodes (agents/functions), edges (incl. conditional routing), and a typed shared state object flowing through. Models the agent loop explicitly.
Production agents in regulated / high-stakes settings needing audit trails, approvals, retries, and resumability. Finance, healthcare, complex tool pipelines.
LangSmith + checkpointer state inspection. Pairs with Temporal for heavyweight durability. NeMo-instrumentable.
Strong. Same NVIDIA endpoint connector as LangChain; durable graphs map well to on-prem, long-running enterprise jobs. A top on-prem pick.
The exact harness that powers Claude Code, made programmable: agent loop, automatic context management / compaction, built-in tools (file edit, bash, web), permissions, persistent sessions, subagents, first-class MCP.
You want top-tier autonomy fast, Claude is your model, and the work is agentic coding, file/computer manipulation, or research loops.
SDK message stream (incl. parent_tool_use_id for subagents) + your own OTel. No managed dashboard equivalent to LangSmith.
Cloud-leaning. Excellent if Claude-via-API is allowed. Not a fit for self-hosted-on-NVIDIA-GPU models — Claude isn't self-hostable. Pick for the cloud side of a hybrid only.
A small, opinionated SDK on four primitives: Agents, Tools, Handoffs (agent→agent transfer of full context), and Guardrails (input/output validation). Plus Sessions & built-in Tracing.
You want a lightweight, opinionated multi-agent setup and value speed over fine-grained control. Great default for "teams of agents" with clean handoffs.
First-party tracing dashboard; OTel-compatible exporters; NeMo-instrumentable.
Good — underrated. It speaks the OpenAI Chat-Completions protocol, and NIM exposes an OpenAI-compatible endpoint. Point base_url at your NIM to run self-hosted models with the same code.
"Claude Code, but 100% headless & programmable." A runtime-agnostic TypeScript harness — think Astro/Next.js, but for agents. Core idea: Agent = Model + Harness. Ships sandbox, durable execution, subagents, tools, skills, sessions, and MCP.
// agents/triage.ts import { createAgent } from 'flue'; export default createAgent(() => ({ model: 'anthropic/claude-sonnet-4-6', instructions: 'Triage incoming tickets.', }));
Agent = continuing assistant w/ sessions. Workflow = bounded job that runs once & returns a result (init(agent) → session → result). CLI: flue dev · flue connect · flue build.
Conditional. Edge/CI deploy targets are a unique strength for distributed/event-driven agents. For self-hosted NVIDIA models, route via an OpenAI-compatible provider/OpenRouter to your NIM. Best when your stack is TypeScript-first.
Agent framework built on type safety: tools via decorators, structured outputs validated by Pydantic (inputs before send, outputs after). pydantic-graph adds typed state machines for durable execution & HITL.
You want production discipline, validated structured I/O, and clean engineering — especially data-extraction / API-shaped agents where schema correctness matters.
Logfire — OpenTelemetry-native, traces the whole app not just the LLM layer. Best-in-class for enterprise OTel pipelines.
Strong. Model-agnostic incl. OpenAI-compatible NIM endpoints; OTel-native traces drop straight into enterprise/on-prem observability. Excellent on-prem candidate.
Not an agent framework — it implements no loop. It orchestrates existing harnesses as plug-in providers (claudeCode(), codex(), cursor(), copilot(), opencode()) inside Docker/Podman containers or Vercel microVMs, with git worktrees, branch strategies (commits patch back to host), parallel fan-out, and session capture/resume/fork.
You already use a harness (Claude Code, Codex…) and want to run N of them in parallel, sandboxed, with git lifecycle managed — task fan-out, review pipelines, planner/implementer/reviewer flows.
Session capture and lifecycle hooks; otherwise inherits whatever the underlying harness emits. Not an observability product.
Inherits from the harness. With Claude Code providers it's cloud-Claude; with opencode() custom models, local/NIM-backed models are reachable. Think of it as Kubernetes-for-harnesses: the layer above ③, not a competitor to ①–②.
Not a competing framework. A framework-agnostic layer that works around LangChain, LlamaIndex, CrewAI, Semantic Kernel, Google ADK & custom Python agents — adding instrumentation, profiling, eval & optimization.
You're standardizing agents on NVIDIA infrastructure and need a single pane for profiling, cost/latency optimization, eval & observability regardless of which framework each team picked.
This is the observability/profiling play. OTel + major platforms; measures what NIM/GPU serving actually costs you.
Native — the NVIDIA glue. Pair it with whichever orchestration/harness you choose. For an NVIDIA team this is less "alternative" and more "default companion."
Same axes, all eight. green=strong · amber=partial/conditional · red=weak/N-A
| Framework | Layer | Lang | Model-agnostic | Self-host / NIM | Durability + HITL | Multi-agent | Deploy reach | Maturity | Best-fit verdict |
|---|---|---|---|---|---|---|---|---|---|
| LangChain | ① Glue | Py/TS | Yes | Yes (NV endpoints) | Via LangGraph | Via LangGraph | Broad (lib) | High | Prototyping, RAG, integrations |
| LangGraph | ② Orchestr. | Py/TS | Yes | Yes (NV endpoints) | Best-in-class | Typed graph | Server/Platform | High | Regulated, durable, complex flows |
| Claude Agent SDK | ③ Harness | Py/TS | Claude only | No (cloud Claude) | Checkpoints | Subagents | Self-host lib | Newer | Top autonomy, Claude, coding/CUA |
| OpenAI Agents SDK | ② Orchestr. | Py/TS | OpenAI-first | Yes (OpenAI-compat → NIM) | Sessions | Handoffs | Server/serverless | High | Lightweight multi-agent, fast ship |
| Flue | ③ Harness | TS only | Yes (multi) | Via OpenAI-compat/OpenRouter | Durable + sandbox | Subagents | Edge/CI/serverless ★ | Young (2026) | TS-first, edge/CI autonomous agents |
| Pydantic AI | ② Orchestr. | Py only | Yes | Yes (OpenAI-compat → NIM) | pydantic-graph | Graph/delegation | Server | Newer | Type-safe, structured I/O, OTel |
| Sandcastle | ④ Fleet orch. | TS only | Via harness choice | Inherits harness (opencode → local models) | Session resume/fork | Parallel fan-out of harnesses | Docker/Podman/Vercel µVMs | v0.8.0 (2026) | Sandboxed fleets of coding agents |
| NeMo Agent Toolkit | ⊥ Cross-cut | Py | N/A (wraps all) | Native (NIM/Triton) | Inherits | Profiles teams | NVIDIA infra | Evolving | Instrument/optimize on NVIDIA infra |
base_url. Claude Agent SDK is the one true cloud-only entry.At NVIDIA, model portability and on-prem GPU serving usually outrank raw dev velocity. Here's how that reshapes the ranking.
NIM microservices expose an OpenAI-compatible API. Any framework that lets you set a custom base_url runs your self-hosted models with zero code rewrite.
When external APIs are allowed, optimize for capability & velocity.
NeMo Agent Toolkit instruments whatever you pick — profiling token/GPU cost, latency, bottlenecks, and evals across both sides of the hybrid. Treat it as a mandatory companion, not an alternative.
If data can't leave the building: model-agnostic + OpenAI-compatible + OTel-native is the filter. Winners: LangGraph or Pydantic AI, served by NIM, instrumented by NeMo. Cloud-only SDKs are out.
Walk top to bottom. The first matching branch is your default starting point.
If YES → eliminate Claude Agent SDK. Use a model-agnostic framework against a NIM endpoint. → go Q2. If NO (cloud OK) → all seven in play → go Q2.
TypeScript → Flue (harness) or LangGraph.js / OpenAI Agents JS. Python → continue.
Complex/cyclic/regulated, needs durability+HITL+audit → LangGraph. Lightweight multi-agent, ship fast → OpenAI Agents SDK. Type-safe structured I/O → Pydantic AI.
YES + Claude + cloud OK → Claude Agent SDK. YES + TS + portable/edge → Flue. YES + Python + open models → LangGraph "deep agent" pattern. Need N coding harnesses in parallel → Sandcastle on top of your harness.
YES → LangChain (graduate to LangGraph when control flow grows).
LangGraph + NIM + NeMo Agent Toolkit. Durable, auditable, model-agnostic, fully self-hostable. The safest enterprise default.
Pydantic AI + NIM + Logfire/NeMo. Type-safe validated I/O, OTel-native, clean to operate on-prem.
OpenAI Agents SDK with base_url → NIM for self-host, or OpenAI for cloud. Same code, both sides.
Claude Agent SDK (if cloud-Claude allowed) for the cloud lane; Flue if you want TS + portable/edge + multi-provider. Scale to parallel fleets by orchestrating harnesses with Sandcastle.
LangChain for speed; migrate the agent loop to LangGraph once it hardens.
Flue — uniquely deploys to Cloudflare/Vercel/Fly/GitHub Actions/GitLab CI with a real harness.
base_url at NIM.