Apr 12, 2026

Beyond Last-K Turns: Building Memory That Actually Thinks

Every multi-turn AI agent needs memory. The simplest implementation is obvious: load the last N turns of conversation before each call, then append the new turn after.
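That naive last-K approach fits in a few lines; a minimal sketch (class and method names here are illustrative, not from any particular framework):

```python
from collections import deque


class LastKMemory:
    """Naive conversation memory: keep only the most recent K turns."""

    def __init__(self, k: int = 10):
        self.turns = deque(maxlen=k)  # older turns silently fall off the front

    def load(self) -> list:
        # Returned before each model call as the conversation context.
        return list(self.turns)

    def append(self, role: str, content: str) -> None:
        # Called after each call to record the new turn.
        self.turns.append({"role": role, "content": content})
```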

LangGraph and Agent Frameworks: Using the Right Tool for the Job

There's a common trap when building AI-powered pipelines: reaching for an agentic framework because the problem feels “intelligent,” even when the solution is fundamentally deterministic. This post walks through a document ingestion system where that mistake shows up—and what the right mental model looks like.


The System: Ingesting Documents at Scale

The pipeline processes documents at scale—loading files from storage, extracting structured metadata via an LLM, enriching that metadata against external systems, and indexing everything into a vector store and document store for downstream retrieval.

The flow looks like this:

Object storage / local filesystem

list_documents

[per document]
load → classify → chunk → embed → extract_metadata → enrich → store → archive

Simple enough on paper. The complexity comes from two questions:

  1. How do you orchestrate deterministic steps cleanly?
  2. Where does the LLM fit in—and how?

The system uses two patterns to answer these: a graph-based workflow engine for orchestration and agent-based execution for LLM-driven tasks. Understanding when to use each is key.


LangGraph: When the Path Is Known

LangGraph is a workflow engine built on top of LangChain. Its core primitive is a directed graph where nodes are Python functions and edges define allowed transitions. State flows through the graph as a typed dictionary.

Here’s a simplified version of the ingestion graph:

from typing import TypedDict

from langgraph.graph import END, StateGraph

class IngestionState(TypedDict, total=False):
    # Fields are illustrative; the real state carries document payloads and metadata.
    path: str
    chunks: list
    metadata: dict

workflow = StateGraph(IngestionState)

workflow.add_node("load_document", load_document)
workflow.add_node("classify_document", classify_document)
workflow.add_node("chunk_document", chunk_document)
workflow.add_node("embed_chunks", embed_chunks_node)
workflow.add_node("extract_metadata", extract_metadata_node)
workflow.add_node("enrich_metadata", enrich_metadata_node)
workflow.add_node("store_embeddings", store_embeddings_node)
workflow.add_node("store_summary", store_summary_node)
workflow.add_node("archive_document", archive_document)
workflow.add_node("skip_document", skip_document)

workflow.set_entry_point("load_document")
workflow.add_edge("load_document", "classify_document")

workflow.add_conditional_edges(
    "classify_document",
    should_process,
    {"process": "chunk_document", "skip": "skip_document"},
)

workflow.add_edge("chunk_document", "embed_chunks")
workflow.add_edge("embed_chunks", "extract_metadata")
workflow.add_edge("extract_metadata", "enrich_metadata")
workflow.add_edge("enrich_metadata", "store_embeddings")
workflow.add_edge("store_embeddings", "store_summary")
workflow.add_edge("store_summary", "archive_document")
workflow.add_edge("archive_document", END)
workflow.add_edge("skip_document", END)

graph = workflow.compile()

What this gives you:

  • Explicit control flow: Every transition is defined in code.
  • Typed state management: Each node declares inputs and outputs.
  • Deterministic branching: Conditions are pure Python—no LLM needed.
  • Composability: Easy to wrap per-document flows into batch processing.

Mental model: Use LangGraph when you know the answer to “what happens next?”
If the pipeline topology is fixed, a deterministic DAG is the right tool.
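The composability point is concrete: once compiled, the graph is just a callable, so batch processing is a plain loop around it. A sketch (the error-handling policy and the `path` state key are illustrative assumptions):

```python
def run_batch(graph, paths):
    """Run the per-document graph over a batch, collecting results."""
    results = []
    for path in paths:
        # Each document gets a fresh state; one failure doesn't stop the batch.
        try:
            results.append(graph.invoke({"path": path}))
        except Exception as exc:
            results.append({"path": path, "error": str(exc)})
    return results
```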


Agent Frameworks: When the LLM Decides the Path

Agent frameworks introduce a different execution model: the LLM drives control flow by choosing tools, interpreting results, and deciding what to do next.

The Right Use: Orchestrator with Tools

At query time, an orchestrator agent can route user questions to specialized downstream components, each exposed as a tool.

Example pattern:

def build_tools():
    return [
        make_tool("query_domain_a"),
        make_tool("query_domain_b"),
        make_tool("synthesize_results"),
    ]

At runtime, the LLM decides:

  • Should it call one tool or multiple?
  • Does it need to combine results?
  • Does it need to resolve entities first?

This kind of routing depends on semantic understanding, not deterministic rules.

No static DAG can reliably express this.

Mental model: Use an agent when the path depends on meaning the LLM must interpret.
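Under the hood, the agent execution model is a loop the framework runs for you. A minimal sketch, with a stubbed `llm_decide` standing in for the model's tool-choice step (all names here are illustrative):

```python
def run_agent(llm_decide, tools, question, max_steps=5):
    """Minimal agent loop: the LLM picks the next tool until it can answer."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = llm_decide(history)  # a tool call or a final answer
        if decision["type"] == "final":
            return decision["content"]
        # Execute the chosen tool and feed the result back into the context.
        result = tools[decision["tool"]](decision["args"])
        history.append({"role": "tool", "tool": decision["tool"], "content": result})
    raise RuntimeError("agent did not converge")
```

The branching lives inside `llm_decide`, not in the graph topology, which is exactly why a static DAG cannot express it.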


A Valid Use Case: Enrichment with Tool Interaction

In the enrichment step, an agent can call external systems (e.g., registries or APIs), interpret responses, and resolve ambiguity.

agent = Agent(
    model=model,
    system_prompt=system_prompt,
    tools=tools,
)

response = agent(enrichment_request)

This is justified when:

  • Tool results may be ambiguous
  • Multiple calls may be needed
  • The LLM must reason about correctness

It’s worth monitoring in production, though: if the agent ends up always making exactly one tool call, a direct structured call may be the better pattern.


The Anti-Pattern: Agent as a Thin Wrapper

A common mistake is using an agent for simple, single-step tasks:

agent = Agent(
    model=model,
    system_prompt=prompt,
)

response = agent(chunk)
parsed = parse_json(response)

No tools. No iteration. No decision-making.

This is just a prompt → JSON call with unnecessary overhead.

Problems:

  • Added latency from agent loop setup
  • Repeated overhead for each chunk
  • Fragile parsing logic
  • No strong structure guarantees

The Better Approach: Structured LLM Calls

Use direct structured output instead:

from langchain_core.messages import HumanMessage, SystemMessage

# SomeLLM stands in for any chat model that supports structured output.
llm = SomeLLM(model="...", temperature=0.2)
chain = llm.with_structured_output(MySchema)  # MySchema: Pydantic model or TypedDict

result = chain.invoke([
    SystemMessage(content=system_prompt),
    HumanMessage(content=chunk),
])

Benefits:

  • Strong typing via schema validation
  • No manual parsing
  • Lower latency
  • Simpler execution model
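The `MySchema` above can be a Pydantic model or a `TypedDict`; a hypothetical metadata schema (field names are illustrative) might look like:

```python
from typing import TypedDict


class MySchema(TypedDict):
    """Structured metadata extracted from a document chunk."""
    title: str          # document title, if present
    doc_type: str       # e.g. "invoice", "contract", "report"
    entities: list[str] # named entities found in the chunk
```

`with_structured_output` uses the schema to constrain the model's output, so validation replaces manual JSON parsing.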

The Decision Framework

Does control flow depend on meaning the LLM must interpret?

├─ NO → Use LangGraph (or plain code)
│       Fixed steps, deterministic branching
│       Examples: ETL, document pipelines
│
└─ YES → Does the LLM need tools or iteration?

    ├─ NO → Use direct structured LLM call
    │       Prompt → structured output
    │       Examples: extraction, classification
    │
    └─ YES → Use an agent
             Tool selection + reasoning loop
             Examples: routing, research, disambiguation

When each layer does only its job, the system becomes simpler, faster, and easier to reason about.

Apr 11, 2026

Managing Tool Output: Avoiding Context Explosion in Agent Systems


While reviewing and optimizing agent execution, another important issue surfaced:

👉 Tool outputs can silently bloat the context

Even with perfect planning and parallel execution, performance can degrade if the data flowing into the model is too large.


🧠 The Problem: Context Growth Over Cycles

In agent workflows, especially with chaining:

Cycle 1 → tool output  
Cycle 2 → tool output + previous data  
Cycle 3 → tool output + accumulated data  

👉 Context keeps growing with each step


🚨 Why this is a problem

  • Large payloads (nested JSON, unused fields)

  • Duplicate data across steps

  • Irrelevant fields carried forward

Impact

  • Increased token usage

  • Slower LLM response time

  • Higher cost

  • Greater chance of confusion or incorrect field usage


🔍 Root Cause

Tools typically return:

  • full API responses

  • deeply nested structures

  • more data than required

The LLM then:

  • has to sift through everything

  • often carries forward unnecessary data


🚀 Improvements

1. Let the LLM discard unnecessary data (lightweight fix)

Instruct the model to:

  • extract only required fields

  • ignore irrelevant data

👉 Helps, but not always reliable for large payloads
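This fix is purely prompt-side; one way to apply it is to append a standing instruction to the system prompt (the wording below is illustrative, not a tested prompt):

```python
TOOL_OUTPUT_POLICY = (
    "After each tool call, extract only the fields needed to answer the "
    "user's question and restate them briefly. Do not copy raw tool JSON "
    "into your reasoning or your final answer."
)


def with_output_policy(system_prompt: str) -> str:
    """Append the tool-output discipline instruction to a system prompt."""
    return system_prompt + "\n\n" + TOOL_OUTPUT_POLICY
```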


2. Add intelligence at the tool layer (stronger fix)

Instead of returning raw responses:

  • Return only relevant fields

  • Flatten nested structures

  • Provide clean, minimal data

👉 Similar to how GraphQL works:

  • client specifies what it needs

  • response includes only that
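At the tool layer this can be as simple as projecting the raw API response onto the fields the agent actually needs. A sketch, assuming a plain nested-dict payload (the dotted-path convention and field names are hypothetical):

```python
def project_fields(payload: dict, fields: list[str]) -> dict:
    """Flatten a nested API response down to only the requested dotted paths."""
    out = {}
    for path in fields:
        node = payload
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None  # path missing: drop it rather than fail
                break
            node = node[key]
        if node is not None:
            out[path] = node
    return out
```

A tool then returns `project_fields(api_response, ["order.id", "order.status"])` instead of the full payload, mirroring GraphQL's client-specified selection.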


✅ Target Pattern

Tool → minimal structured output → LLM → format response

Instead of:

Tool → large raw JSON → LLM → filter + format

🎯 Final Thought

Efficient agents don’t just call the right tools —
they also control what data comes back from them.


From Reactive Chaos to Planned Parallelism: Optimizing a Bedrock Agent

Reviewing a Bedrock Agent: From “Works” to “Works Efficiently”

I recently reviewed a Bedrock agent that was functionally correct — it answered queries, used tools properly, and produced accurate results.

👉 But it wasn’t efficient.

This is a quick summary of what was happening and what improved.


🧠 Initial Behavior: Reactive Execution

The agent followed a step-by-step loop:

Cycle 1 → resolve date via tool  
Cycle 2 → fetch primary data  
Cycle 3 → fetch additional data  
Cycle 4 → fetch metadata  
Cycle 5 → final response  

What was happening?

  • No upfront planning

  • Data discovered incrementally

  • Each missing piece triggered another call

  • All operations were sequential


⏱️ Why a time tool existed

The agent handled queries like:

“first 10 days of last month”

Since LLMs don’t reliably know the current date, a time tool was added to:

  • ensure correct date calculations

  • avoid inconsistent outputs

👉 It solved correctness
👉 But added an extra cycle every time


🚨 Core Issue

The agent was reactive instead of planned

Do something → realize missing data → do more → repeat

Instead of:

Understand everything → execute once

🚀 Improvements

1. Provide current date directly

  • Removed dependency on time tool

  • Eliminated one full cycle
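Injecting the date is a one-line change to prompt construction; a sketch (the prompt wording is illustrative):

```python
from datetime import date


def build_system_prompt(base_prompt: str) -> str:
    """Prepend today's date so the model never needs a time tool."""
    return f"Today's date is {date.today().isoformat()}.\n\n{base_prompt}"
```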


2. Upfront planning

The agent now:

  • identifies all required data

  • plans execution before acting


3. Parallel execution

Independent data is now fetched together instead of sequentially
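Issuing the independent calls together can be sketched with `asyncio.gather`; this is a generic pattern, not the Bedrock API itself, and the tool names are illustrative:

```python
import asyncio


async def fetch_all(tools: dict, requests: list) -> list:
    """Run independent tool calls concurrently instead of one per cycle."""
    tasks = [tools[name](args) for name, args in requests]
    return await asyncio.gather(*tasks)
```

Dependent data is then fetched in a second round, using the results of the first.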


4. Dependency awareness

  • Independent data → parallel

  • Dependent data → separate step only when required


✅ Final Execution Patterns

No dependency

Cycle 1 → fetch all data (parallel)  
Cycle 2 → final response  

With dependency

Cycle 1 → fetch base data (parallel)  
Cycle 2 → fetch dependent data  
Cycle 3 → final response