Apr 11, 2026

Managing Tool Output: Avoiding Context Explosion in Agent Systems

 

While reviewing and optimizing agent execution, another important issue surfaced:

👉 Tool outputs can silently bloat the context

Even with perfect planning and parallel execution, performance can degrade if the data flowing into the model is too large.


🧠 The Problem: Context Growth Over Cycles

In agent workflows, especially with chaining:

Cycle 1 → tool output  
Cycle 2 → tool output + previous data  
Cycle 3 → tool output + accumulated data  

👉 Context keeps growing with each step
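The growth is easy to reproduce. Here is a minimal Python sketch of the accumulation (the 2 KB payload sizes are illustrative, not from a real system): each cycle re-sends the full message history, so the payload compounds.

```python
# Minimal sketch: tool outputs accumulate in the message history,
# and every cycle re-sends everything that came before.

def run_cycles(tool_outputs):
    """Append each tool output to the history and track total context size."""
    messages = []
    sizes = []
    for output in tool_outputs:
        messages.append({"role": "tool", "content": output})
        # The model sees the whole history, so the payload grows each cycle.
        sizes.append(sum(len(m["content"]) for m in messages))
    return sizes

# Each cycle returns ~2 KB of raw JSON; the context compounds linearly.
growth = run_cycles(["x" * 2000, "y" * 2000, "z" * 2000])
print(growth)  # → [2000, 4000, 6000]
```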


🚨 Why this is a problem

  • Large payloads (nested JSON, unused fields)

  • Duplicate data across steps

  • Irrelevant fields carried forward

Impact

  • Increased token usage

  • Slower LLM response time

  • Higher cost

  • Greater chance of confusion or incorrect field usage


🔍 Root Cause

Tools typically return:

  • full API responses

  • deeply nested structures

  • more data than required

The LLM then:

  • has to sift through everything

  • often carries forward unnecessary data


🚀 Improvements

1. Let the LLM discard unnecessary data (lightweight fix)

Instruct the model to:

  • extract only required fields

  • ignore irrelevant data

👉 Helps, but not always reliable for large payloads
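As a rough sketch, this kind of instruction can simply be appended to the system prompt. The wording and helper name below are hypothetical, not a prescribed format:

```python
# Sketch of the lightweight fix: a system-prompt rule telling the model
# to keep only the fields it needs. Wording is illustrative.

EXTRACTION_INSTRUCTION = (
    "When a tool returns data, extract only the fields required to "
    "answer the user's question. Do not repeat or carry forward any "
    "other fields in later steps."
)

def build_system_prompt(base_prompt: str) -> str:
    """Append the extraction rule to an existing system prompt."""
    return f"{base_prompt}\n\n{EXTRACTION_INSTRUCTION}"

print(build_system_prompt("You are a data-retrieval agent."))
```

Because this relies on the model following instructions, it degrades on very large payloads, which is exactly why the tool-layer fix below is stronger.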


2. Add intelligence at the tool layer (stronger fix)

Instead of returning raw responses:

  • Return only relevant fields

  • Flatten nested structures

  • Provide clean, minimal data

👉 Similar to how GraphQL works:

  • client specifies what it needs

  • response includes only that
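A minimal sketch of this tool-layer filtering in Python. The API response shape and field names here are hypothetical; the point is that the tool whitelists and flattens before anything reaches the model:

```python
# Sketch: the tool flattens nesting and returns only requested fields,
# GraphQL-style, so the model never sees the raw payload.

def flatten(d: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    items = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten(v, key, sep))
        else:
            items[key] = v
    return items

def minimal_tool_response(raw: dict, fields: list[str]) -> dict:
    """Return only the fields the caller asked for."""
    flat = flatten(raw)
    return {f: flat[f] for f in fields if f in flat}

raw_api_response = {
    "id": 42,
    "user": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
    "meta": {"etag": "abc", "trace_id": "xyz"},  # noise the model never needs
}
print(minimal_tool_response(raw_api_response, ["id", "user.name"]))
# → {'id': 42, 'user.name': 'Ada'}
```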


✅ Target Pattern

Tool → minimal structured output → LLM → format response

Instead of:

Tool → large raw JSON → LLM → filter + format

🎯 Final Thought

Efficient agents don’t just call the right tools —
they also control what data comes back from them.


From Reactive Chaos to Planned Parallelism: Optimizing a Bedrock Agent

Reviewing a Bedrock Agent: From “Works” to “Works Efficiently”

I recently reviewed a Bedrock agent that was functionally correct — it answered queries, used tools properly, and produced accurate results.

👉 But it wasn’t efficient.

This is a quick summary of what was happening and what improved.


🧠 Initial Behavior: Reactive Execution

The agent followed a step-by-step loop:

Cycle 1 → resolve date via tool  
Cycle 2 → fetch primary data  
Cycle 3 → fetch additional data  
Cycle 4 → fetch metadata  
Cycle 5 → final response  

What was happening?

  • No upfront planning

  • Data discovered incrementally

  • Each missing piece triggered another call

  • All operations were sequential


⏱️ Why a time tool existed

The agent handled queries like:

“first 10 days of last month”

Since LLMs don’t reliably know the current date, a time tool was added to:

  • ensure correct date calculations

  • avoid inconsistent outputs

👉 It solved the correctness problem
👉 But it added an extra cycle every time


🚨 Core Issue

The agent was reactive instead of planned

Do something → realize missing data → do more → repeat

Instead of:

Understand everything → execute once

🚀 Improvements

1. Provide current date directly

  • Removed dependency on time tool

  • Eliminated one full cycle
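A sketch of this fix, assuming the prompt is assembled in Python (helper names and prompt wording are illustrative). Relative ranges like "first 10 days of last month" can also be resolved deterministically in code rather than by the model:

```python
from datetime import date, timedelta

def with_current_date(base_prompt: str, today: date) -> str:
    """Inject today's date into the prompt, removing the time-tool cycle."""
    return f"{base_prompt}\n\nToday's date is {today.isoformat()}."

def first_n_days_of_last_month(today: date, n: int = 10):
    """Resolve 'first N days of last month' without asking the model."""
    first_of_this_month = today.replace(day=1)
    last_month_end = first_of_this_month - timedelta(days=1)
    start = last_month_end.replace(day=1)
    return start, start + timedelta(days=n - 1)

start, end = first_n_days_of_last_month(date(2026, 4, 11))
print(start, end)  # → 2026-03-01 2026-03-10
```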


2. Upfront planning

The agent now:

  • identifies all required data

  • plans execution before acting


3. Parallel execution

Independent data is now fetched together instead of sequentially


4. Dependency awareness

  • Independent data → parallel

  • Dependent data → separate step only when required
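These two patterns can be sketched with `asyncio`. The `fetch_*` coroutines below are hypothetical stand-ins for real tool calls; the structure is what matters — independent fetches run together, and the dependent fetch waits only for its input:

```python
import asyncio

async def fetch_primary():
    await asyncio.sleep(0.01)          # stand-in for an API call
    return {"account_id": "acct-1"}

async def fetch_metadata():
    await asyncio.sleep(0.01)
    return {"region": "us-east-1"}

async def fetch_dependent(account_id: str):
    await asyncio.sleep(0.01)
    return {"details_for": account_id}

async def run():
    # Cycle 1: independent data fetched in parallel.
    primary, metadata = await asyncio.gather(fetch_primary(), fetch_metadata())
    # Cycle 2: dependent data, only once its input exists.
    details = await fetch_dependent(primary["account_id"])
    return {**primary, **metadata, **details}

print(asyncio.run(run()))
# → {'account_id': 'acct-1', 'region': 'us-east-1', 'details_for': 'acct-1'}
```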


✅ Final Execution Patterns

No dependency

Cycle 1 → fetch all data (parallel)  
Cycle 2 → final response  

With dependency

Cycle 1 → fetch base data (parallel)  
Cycle 2 → fetch dependent data  
Cycle 3 → final response