Apr 11, 2026

Managing Tool Output: Avoiding Context Explosion in Agent Systems

 

While reviewing and optimizing agent execution, another important issue surfaced:

👉 Tool outputs can silently bloat the context

Even with perfect planning and parallel execution, performance can degrade if the data flowing into the model is too large.


🧠 The Problem: Context Growth Over Cycles

In agent workflows, especially with chaining:

Cycle 1 → tool output  
Cycle 2 → tool output + previous data  
Cycle 3 → tool output + accumulated data  

👉 Context keeps growing with each step
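The growth is easy to reproduce. Here is a minimal Python sketch of the accumulation (the 2 KB payload sizes are illustrative, not from a real system): each cycle re-sends the full message history, so the payload compounds.

```python
# Minimal sketch: tool outputs accumulate in the message history,
# and every cycle re-sends everything that came before.

def run_cycles(tool_outputs):
    """Append each tool output to the history and track total context size."""
    messages = []
    sizes = []
    for output in tool_outputs:
        messages.append({"role": "tool", "content": output})
        # The model sees the whole history, so the payload grows each cycle.
        sizes.append(sum(len(m["content"]) for m in messages))
    return sizes

# Each cycle returns ~2 KB of raw JSON; the context compounds linearly.
growth = run_cycles(["x" * 2000, "y" * 2000, "z" * 2000])
print(growth)  # → [2000, 4000, 6000]
```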


🚨 Why this is a problem

  • Large payloads (nested JSON, unused fields)

  • Duplicate data across steps

  • Irrelevant fields carried forward

Impact

  • Increased token usage

  • Slower LLM response time

  • Higher cost

  • Greater chance of confusion or incorrect field usage


🔍 Root Cause

Tools typically return:

  • full API responses

  • deeply nested structures

  • more data than required

The LLM then:

  • has to sift through everything

  • often carries forward unnecessary data


🚀 Improvements

1. Let the LLM discard unnecessary data (lightweight fix)

Instruct the model to:

  • extract only required fields

  • ignore irrelevant data

👉 Helps, but not always reliable for large payloads
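As a rough sketch, this kind of instruction can simply be appended to the system prompt. The wording and helper name below are hypothetical, not a prescribed format:

```python
# Sketch of the lightweight fix: a system-prompt rule telling the model
# to keep only the fields it needs. Wording is illustrative.

EXTRACTION_INSTRUCTION = (
    "When a tool returns data, extract only the fields required to "
    "answer the user's question. Do not repeat or carry forward any "
    "other fields in later steps."
)

def build_system_prompt(base_prompt: str) -> str:
    """Append the extraction rule to an existing system prompt."""
    return f"{base_prompt}\n\n{EXTRACTION_INSTRUCTION}"

print(build_system_prompt("You are a data-retrieval agent."))
```

Because this relies on the model following instructions, it degrades on very large payloads, which is exactly why the tool-layer fix below is stronger.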


2. Add intelligence at the tool layer (stronger fix)

Instead of returning raw responses:

  • Return only relevant fields

  • Flatten nested structures

  • Provide clean, minimal data

👉 Similar to how GraphQL works:

  • client specifies what it needs

  • response includes only that
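A minimal sketch of this tool-layer filtering in Python. The API response shape and field names here are hypothetical; the point is that the tool whitelists and flattens before anything reaches the model:

```python
# Sketch: the tool flattens nesting and returns only requested fields,
# GraphQL-style, so the model never sees the raw payload.

def flatten(d: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    items = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten(v, key, sep))
        else:
            items[key] = v
    return items

def minimal_tool_response(raw: dict, fields: list[str]) -> dict:
    """Return only the fields the caller asked for."""
    flat = flatten(raw)
    return {f: flat[f] for f in fields if f in flat}

raw_api_response = {
    "id": 42,
    "user": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
    "meta": {"etag": "abc", "trace_id": "xyz"},  # noise the model never needs
}
print(minimal_tool_response(raw_api_response, ["id", "user.name"]))
# → {'id': 42, 'user.name': 'Ada'}
```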


✅ Target Pattern

Tool → minimal structured output → LLM → format response

Instead of:

Tool → large raw JSON → LLM → filter + format

🎯 Final Thought

Efficient agents don’t just call the right tools —
they also control what data comes back from them.


From Reactive Chaos to Planned Parallelism: Optimizing a Bedrock Agent

Reviewing a Bedrock Agent: From “Works” to “Works Efficiently”

I recently reviewed a Bedrock agent that was functionally correct — it answered queries, used tools properly, and produced accurate results.

👉 But it wasn’t efficient.

This is a quick summary of what was happening and what improved.


🧠 Initial Behavior: Reactive Execution

The agent followed a step-by-step loop:

Cycle 1 → resolve date via tool  
Cycle 2 → fetch primary data  
Cycle 3 → fetch additional data  
Cycle 4 → fetch metadata  
Cycle 5 → final response  

What was happening?

  • No upfront planning

  • Data discovered incrementally

  • Each missing piece triggered another call

  • All operations were sequential


⏱️ Why a time tool existed

The agent handled queries like:

“first 10 days of last month”

Since LLMs don’t reliably know the current date, a time tool was added to:

  • ensure correct date calculations

  • avoid inconsistent outputs

👉 It solved the correctness problem
👉 But it added an extra cycle every time


🚨 Core Issue

The agent was reactive instead of planned

Do something → realize missing data → do more → repeat

Instead of:

Understand everything → execute once

🚀 Improvements

1. Provide current date directly

  • Removed dependency on time tool

  • Eliminated one full cycle
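A sketch of this fix, assuming the prompt is assembled in Python (helper names and prompt wording are illustrative). Relative ranges like "first 10 days of last month" can also be resolved deterministically in code rather than by the model:

```python
from datetime import date, timedelta

def with_current_date(base_prompt: str, today: date) -> str:
    """Inject today's date into the prompt, removing the time-tool cycle."""
    return f"{base_prompt}\n\nToday's date is {today.isoformat()}."

def first_n_days_of_last_month(today: date, n: int = 10):
    """Resolve 'first N days of last month' without asking the model."""
    first_of_this_month = today.replace(day=1)
    last_month_end = first_of_this_month - timedelta(days=1)
    start = last_month_end.replace(day=1)
    return start, start + timedelta(days=n - 1)

start, end = first_n_days_of_last_month(date(2026, 4, 11))
print(start, end)  # → 2026-03-01 2026-03-10
```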


2. Upfront planning

The agent now:

  • identifies all required data

  • plans execution before acting


3. Parallel execution

Independent data is now fetched together instead of sequentially


4. Dependency awareness

  • Independent data → parallel

  • Dependent data → separate step only when required
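These two patterns can be sketched with `asyncio`. The `fetch_*` coroutines below are hypothetical stand-ins for real tool calls; the structure is what matters — independent fetches run together, and the dependent fetch waits only for its input:

```python
import asyncio

async def fetch_primary():
    await asyncio.sleep(0.01)          # stand-in for an API call
    return {"account_id": "acct-1"}

async def fetch_metadata():
    await asyncio.sleep(0.01)
    return {"region": "us-east-1"}

async def fetch_dependent(account_id: str):
    await asyncio.sleep(0.01)
    return {"details_for": account_id}

async def run():
    # Cycle 1: independent data fetched in parallel.
    primary, metadata = await asyncio.gather(fetch_primary(), fetch_metadata())
    # Cycle 2: dependent data, only once its input exists.
    details = await fetch_dependent(primary["account_id"])
    return {**primary, **metadata, **details}

print(asyncio.run(run()))
# → {'account_id': 'acct-1', 'region': 'us-east-1', 'details_for': 'acct-1'}
```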


✅ Final Execution Patterns

No dependency

Cycle 1 → fetch all data (parallel)  
Cycle 2 → final response  

With dependency

Cycle 1 → fetch base data (parallel)  
Cycle 2 → fetch dependent data  
Cycle 3 → final response