Apr 12, 2026

LangGraph and Agent Frameworks: Using the Right Tool for the Job

There's a common trap when building AI-powered pipelines: reaching for an agentic framework because the problem feels “intelligent,” even when the solution is fundamentally deterministic. This post walks through a document ingestion system where that mistake shows up—and what the right mental model looks like.


The System: Ingesting Documents at Scale

The pipeline processes documents at scale—loading files from storage, extracting structured metadata via an LLM, enriching that metadata against external systems, and indexing everything into a vector store and document store for downstream retrieval.

The flow looks like this:

Object storage / local filesystem

list_documents

[per document]
load → classify → chunk → embed → extract_metadata → enrich → store → archive

Simple enough on paper. The complexity comes from two questions:

  1. How do you orchestrate deterministic steps cleanly?
  2. Where does the LLM fit in—and how?

The system uses two patterns to answer these: a graph-based workflow engine for orchestration and agent-based execution for LLM-driven tasks. Understanding when to use each is key.


LangGraph: When the Path Is Known

LangGraph is a workflow library from the LangChain ecosystem. Its core primitive is a directed graph whose nodes are Python functions and whose edges define the allowed transitions. State flows through the graph as a typed dictionary.

Here’s a simplified version of the ingestion graph:

from typing import TypedDict

from langgraph.graph import END, StateGraph

# State passed between nodes; the fields shown here are illustrative.
class IngestState(TypedDict, total=False):
    path: str
    doc_type: str
    chunks: list
    metadata: dict

workflow = StateGraph(IngestState)

workflow.add_node("load_document", load_document)
workflow.add_node("classify_document", classify_document)
workflow.add_node("chunk_document", chunk_document)
workflow.add_node("embed_chunks", embed_chunks_node)
workflow.add_node("extract_metadata", extract_metadata_node)
workflow.add_node("enrich_metadata", enrich_metadata_node)
workflow.add_node("store_embeddings", store_embeddings_node)
workflow.add_node("store_summary", store_summary_node)
workflow.add_node("archive_document", archive_document)
workflow.add_node("skip_document", skip_document)

workflow.set_entry_point("load_document")
workflow.add_edge("load_document", "classify_document")

workflow.add_conditional_edges(
    "classify_document",
    should_process,
    {"process": "chunk_document", "skip": "skip_document"},
)

workflow.add_edge("chunk_document", "embed_chunks")
workflow.add_edge("embed_chunks", "extract_metadata")
workflow.add_edge("extract_metadata", "enrich_metadata")
workflow.add_edge("enrich_metadata", "store_embeddings")
workflow.add_edge("store_embeddings", "store_summary")
workflow.add_edge("store_summary", "archive_document")
workflow.add_edge("archive_document", END)
workflow.add_edge("skip_document", END)

graph = workflow.compile()

What this gives you:

  • Explicit control flow: Every transition is defined in code.
  • Typed state management: Each node declares inputs and outputs.
  • Deterministic branching: Conditions are pure Python—no LLM needed.
  • Composability: Easy to wrap per-document flows into batch processing.

Mental model: Use LangGraph when you know the answer to “what happens next?”
If the pipeline topology is fixed, a deterministic DAG is the right tool.
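That composability point can be made concrete. Here is a hypothetical batch driver (not from the pipeline above); any object exposing an invoke method works, including a compiled LangGraph graph:

```python
# Hypothetical batch driver: runs the compiled per-document graph
# over every listed file and collects the final state of each run.
def ingest_batch(graph, paths):
    results = []
    for path in paths:
        # Each invoke() runs load -> classify -> ... -> archive/skip
        # for one document.
        results.append(graph.invoke({"path": path}))
    return results
```

The per-document flow stays a pure function of its input state, so batching is just a loop around it.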


Agent Frameworks: When the LLM Decides the Path

Agent frameworks introduce a different execution model: the LLM drives control flow by choosing tools, interpreting results, and deciding what to do next.

The Right Use: Orchestrator with Tools

At query time, an orchestrator agent can route user questions to specialized downstream components, each exposed as a tool.

Example pattern:

def build_tools():
    return [
        make_tool("query_domain_a"),
        make_tool("query_domain_b"),
        make_tool("synthesize_results"),
    ]
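make_tool itself isn't shown above; one hypothetical shape, assuming a tool is just a named callable the agent framework can register:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool wrapper: a name the LLM can select by,
# plus the function invoked when it does.
@dataclass
class Tool:
    name: str
    fn: Callable[[str], str]

def make_tool(name: str) -> Tool:
    # In a real system this would look up the backend behind `name`;
    # here it just echoes, to illustrate the factory pattern.
    return Tool(name=name, fn=lambda query: f"{name}: {query}")
```

Real agent frameworks add schemas and descriptions on top, but the essential contract is the same: names the LLM chooses between, and callables that run when chosen.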

At runtime, the LLM decides:

  • Should it call one tool or multiple?
  • Does it need to combine results?
  • Does it need to resolve entities first?

This kind of routing depends on semantic understanding, not deterministic rules.

No static DAG can reliably express this.

Mental model: Use an agent when the path depends on meaning the LLM must interpret.


A Valid Use Case: Enrichment with Tool Interaction

In the enrichment step, an agent can call external systems (e.g., registries or APIs), interpret responses, and resolve ambiguity.

agent = Agent(
    model=model,
    system_prompt=prompt,
    tools=tools,
)

response = agent(prompt)

This is justified when:

  • Tool results may be ambiguous
  • Multiple calls may be needed
  • The LLM must reason about correctness

It's worth monitoring, though: if in practice the agent always makes exactly one tool call, a simpler pattern may serve better.
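One lightweight way to check this is to record the tool-call count of each enrichment run; a sketch (the instrumentation is hypothetical, not from the pipeline):

```python
from collections import Counter

# Distribution of tool calls per enrichment run.
call_counts = Counter()

def record_run(num_tool_calls: int) -> None:
    call_counts[num_tool_calls] += 1

def always_single_call() -> bool:
    # True when every observed run made exactly one tool call,
    # which suggests the agent loop is unnecessary overhead.
    return set(call_counts) == {1}
```

If always_single_call() stays true in production, the enrichment step is a candidate for the structured-call pattern described below.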


The Anti-Pattern: Agent as a Thin Wrapper

A common mistake is using an agent for simple, single-step tasks:

agent = Agent(
    model=model,
    system_prompt=prompt,
)

response = agent(chunk)
parsed = parse_json(response)

No tools. No iteration. No decision-making.

This is just a prompt → JSON call with unnecessary overhead.

Problems:

  • Added latency from agent loop setup
  • Repeated overhead for each chunk
  • Fragile parsing logic
  • No strong structure guarantees

The Better Approach: Structured LLM Calls

Use direct structured output instead:

from langchain_core.messages import HumanMessage, SystemMessage

llm = SomeLLM(model="...", temperature=0.2)
chain = llm.with_structured_output(MySchema)

result = chain.invoke([
    SystemMessage(content=system_prompt),
    HumanMessage(content=chunk),
])

Benefits:

  • Strong typing via schema validation
  • No manual parsing
  • Lower latency
  • Simpler execution model
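MySchema isn't defined in the snippet above; one hypothetical shape, using a stdlib TypedDict (which with_structured_output also accepts as an output schema, alongside Pydantic models). The fields are illustrative:

```python
from typing import TypedDict

# Hypothetical metadata schema for the extraction step.
class MySchema(TypedDict):
    title: str
    doc_type: str
    keywords: list[str]
```

The model is then constrained to return exactly these fields, and the chain hands back a validated object instead of raw text to parse.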

The Decision Framework

Does control flow depend on meaning the LLM must interpret?
│
├─ NO → Use LangGraph (or plain code)
│        Fixed steps, deterministic branching
│        Examples: ETL, document pipelines
│
└─ YES → Does the LLM need tools or iteration?
   │
   ├─ NO → Use direct structured LLM call
   │        Prompt → structured output
   │        Examples: extraction, classification
   │
   └─ YES → Use an agent
             Tool selection + reasoning loop
             Examples: routing, research, disambiguation

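The decision tree above fits in a tiny helper (illustrative only; the names are mine, not an API):

```python
# Encodes the two-question decision framework as plain code.
def choose_pattern(semantic_control_flow: bool,
                   needs_tools_or_iteration: bool) -> str:
    if not semantic_control_flow:
        return "langgraph"            # fixed steps, deterministic branching
    if not needs_tools_or_iteration:
        return "structured_llm_call"  # prompt -> structured output
    return "agent"                    # tool selection + reasoning loop
```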
When each layer does only its job, the system becomes simpler, faster, and easier to reason about.
