May 18, 2026

🚦 Controlling Tool Output: Response Field Projection in Agent Workflows

One of the less obvious performance problems in agentic systems isn’t which tool gets called. It’s how much data comes back from it.

As agent workflows become more sophisticated, context growth can quietly become one of the biggest drivers of:

  • latency
  • token cost
  • reasoning instability

🧠 The Problem: Context Growth Across Cycles

In chained workflows, tool responses accumulate across reasoning cycles:

Cycle 1 → tool response (2,000 tokens)
Cycle 2 → tool response + prior context (4,500 tokens)
Cycle 3 → accumulated context (8,000+ tokens)
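
To make the arithmetic concrete, here is a small sketch of how context accumulates when every raw response stays in the conversation. The per-cycle response sizes (2,000, 2,500, and 3,500 tokens) are assumed values chosen to match the figures above, not measurements, and the sketch ignores the prompt itself and any caching.

```python
from itertools import accumulate

# Assumed size of each raw tool response, in tokens (illustrative only).
response_tokens = [2_000, 2_500, 3_500]

# Context the model must read at each cycle: every response returned so far.
context_per_cycle = list(accumulate(response_tokens))

# Tokens read across the whole run, since earlier responses are re-read each cycle.
total_processed = sum(context_per_cycle)

for cycle, size in enumerate(context_per_cycle, start=1):
    print(f"Cycle {cycle}: {size:,} tokens in context")
print(f"Total read across cycles: {total_processed:,} tokens")
```

Under these assumptions, 8,000 tokens of tool output turn into 14,500 tokens of model input, because each response is read again on every later cycle.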

Most enterprise APIs are designed for systems integration, not LLM efficiency.

A financial data endpoint may return:

  • dozens of fields per record
  • nested metadata
  • audit attributes
  • internal identifiers
  • unused fields

But the agent may only need two or three fields to answer the user’s question.

When raw responses flow into the model unfiltered:

  • 📈 token usage grows every cycle
  • 🐢 latency increases as context expands
  • ⚠️ field-selection mistakes become more common
  • 🧾 prompt-level filtering comes too late: the full payload is already in context, and already paid for, by the time the model reads the instruction to ignore fields

A simple lookup can easily turn into thousands of unnecessary tokens.


🚀 The Fix: Projection at the Tool Layer

Instead of relying on the LLM to discard unnecessary fields after receiving the response, we moved the optimization into the tool layer itself.

We added a response_fields parameter to the HTTP request tool.

The agent specifies exactly which fields it needs before making the request, and the tool filters the response before returning it to the model.

Instead of:

Tool → large raw JSON → LLM → filter + reason + respond

We now use:

Tool → projected response → LLM → respond

The projection supports:

  • arrays and nested objects
  • dot-notation field selection
  • graceful fallback to full responses when projection is unavailable

✅ Minimal payloads
✅ Smaller context
✅ Faster reasoning
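
A minimal sketch of what such a projection could look like, assuming the response is already parsed JSON. The function names are hypothetical, and the fallback rule (return the full body when no fields are requested, the body is not structured, or nothing matches) is my reading of "projection is unavailable", not a description of the actual tool.

```python
from typing import Any, Optional


def build_field_tree(fields: list[str]) -> dict:
    """Turn dot-notation paths into a nested dict; an empty dict marks a leaf."""
    tree: dict = {}
    for path in fields:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def apply_tree(data: Any, tree: dict) -> Any:
    """Keep only the parts of `data` named in `tree`."""
    if not tree:
        return data  # leaf: keep the whole value
    if isinstance(data, list):
        # Arrays: apply the same selection to every element.
        return [apply_tree(item, tree) for item in data]
    if isinstance(data, dict):
        return {key: apply_tree(data[key], sub) for key, sub in tree.items() if key in data}
    return data  # path goes deeper than the data does; keep the scalar


def is_empty(value: Any) -> bool:
    """True when a projection matched nothing (only empty containers remain)."""
    if isinstance(value, dict):
        return all(is_empty(v) for v in value.values())
    if isinstance(value, list):
        return all(is_empty(v) for v in value)
    return False


def project_response(body: Any, response_fields: Optional[list[str]]) -> Any:
    """Project a parsed JSON body, falling back to the full body when unsure."""
    if not response_fields or not isinstance(body, (dict, list)):
        return body  # nothing requested, or not structured JSON
    projected = apply_tree(body, build_field_tree(response_fields))
    return body if is_empty(projected) else projected


# Toy payload with made-up field names.
raw = {
    "results": [
        {"ticker": "ABC", "price": {"open": 100.0, "close": 101.5}, "audit": {"by": "sys"}},
        {"ticker": "XYZ", "price": {"open": 56.0, "close": 55.2}, "audit": {"by": "sys"}},
    ],
    "meta": {"request_id": "r-1"},
}
slim = project_response(raw, ["results.ticker", "results.price.close"])
print(slim)
```

The projection keeps the shape of the original response, so each ticker stays paired with its closing price. If both a parent path and one of its children are requested, the narrower path wins in this sketch.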


🧩 Closing the Loop

Projection only works if the agent knows which fields to request.

That knowledge can come from:

  • system prompts
  • tool metadata
  • endpoint descriptions
  • execution guidance
  • field-level documentation

The important part is that the agent identifies required fields before making the tool call instead of reasoning over a large payload afterward.

This shifts optimization from post-processing responses to controlling responses at the source.
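
One way to hand the agent that knowledge is to put the field catalog directly into the tool metadata. The sketch below builds a JSON-Schema-style tool definition from field-level documentation. The tool name, the field paths, and the overall definition shape are assumptions for illustration; the exact format depends on the agent framework in use.

```python
import json

# Hypothetical field-level documentation for one endpoint.
FIELD_DOCS = {
    "results.ticker": "Instrument symbol",
    "results.price.close": "Closing price for the trading day",
    "results.volume": "Shares traded",
}


def build_tool_definition(field_docs: dict[str, str]) -> dict:
    """Return a tool definition that advertises which fields can be requested."""
    catalog = "; ".join(f"{path} ({doc})" for path, doc in field_docs.items())
    return {
        "name": "http_request",  # assumed tool name
        "description": "Call an HTTP endpoint and return only the requested fields.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string"},
                "response_fields": {
                    "type": "array",
                    # Restricting to known paths is a design choice, not a requirement.
                    "items": {"type": "string", "enum": sorted(field_docs)},
                    "description": f"Dot-notation paths to keep. Available: {catalog}",
                },
            },
            # response_fields stays optional so the tool can fall back to full responses.
            "required": ["url"],
        },
    }


tool = build_tool_definition(FIELD_DOCS)
print(json.dumps(tool, indent=2))
```

Because the catalog sits in the parameter description, the model sees the available fields at the moment it fills in the call, before any payload exists.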

🔄 Execution Pattern

Execution guidance
→ agent identifies required fields
→ tool-level response projection
→ minimal structured output into LLM context

Rather than continuously expanding context across cycles, the agent keeps context compact and purpose-driven.


📊 Results

Queries that previously returned thousands of tokens per cycle now return only a fraction of that.

For multi-step workflows, this:

  • 💰 reduces token consumption
  • ⚡ lowers latency
  • 📏 stabilizes context growth
  • 🎯 improves reasoning reliability
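
To check the effect on your own endpoints, a rough approach is to compare serialized sizes before and after projection. This sketch uses character count of compact JSON as a crude stand-in for tokens; real token counts depend on the model tokenizer. The payloads are toy data, not results from the system described here.

```python
import json
from typing import Any


def serialized_size(payload: Any) -> int:
    """Length of the compact JSON encoding, a rough proxy for token count."""
    return len(json.dumps(payload, separators=(",", ":")))


def reduction_percent(raw: Any, projected: Any) -> float:
    """Percentage by which the projected payload is smaller than the raw one."""
    raw_size = serialized_size(raw)
    return 100.0 * (raw_size - serialized_size(projected)) / raw_size


# Toy payloads with made-up field names.
raw_payload = {
    "results": [
        {
            "ticker": "ABC",
            "price": {"open": 100.0, "close": 101.5},
            "audit": {"created_by": "system", "revision": 7},
            "internal_id": "a1b2c3",
        }
    ]
}
projected_payload = {"results": [{"ticker": "ABC", "price": {"close": 101.5}}]}

saving = reduction_percent(raw_payload, projected_payload)
print(f"Projected payload is {saving:.0f}% smaller by character count")
```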

The pattern is similar to GraphQL: the client declares what it needs, and only that data comes back.

In this case, the “client” is the LLM itself.


🎯 Final Thought

Efficient agents don’t just call the right tools.

They also control what data comes back from them.

In many production systems, optimizing tool output has a larger impact on performance and reliability than changing the model itself.

May 14, 2026

Reviewing an Agent: From “Works” to “Works Efficiently”

I recently reviewed an agent that was functionally correct: it answered queries, used tools properly, and produced accurate results.

👉 But it wasn’t efficient.

This is a quick summary of what was happening and what improved.

🧠 Initial Behavior: Reactive Execution

The agent followed a step-by-step execution loop:

Cycle 1 → resolve date via tool
Cycle 2 → fetch primary data
Cycle 3 → fetch additional data
Cycle 4 → fetch metadata
Cycle 5 → final response

What was happening?

  • No upfront planning
  • Data discovered incrementally
  • Each missing piece triggered another tool call
  • All operations executed sequentially

The agent was technically correct, but operationally inefficient.


⏱️ Why a Time Tool Existed

The agent handled queries like:

“first 10 days of last month”

Since LLMs don’t reliably know the current date, a time tool was added to:

  • ensure correct date calculations
  • avoid inconsistent outputs

👉 It solved the correctness problem
👉 But it added an extra execution cycle to every query
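
To see why the current date matters, here is what resolving that phrase involves once "today" is known. The helper name and the reference dates are assumptions for illustration; a function like this can also serve as a test oracle for the agent's answer.

```python
from datetime import date, timedelta


def first_days_of_last_month(today: date, days: int = 10) -> tuple[date, date]:
    """Return the inclusive (start, end) range for the first `days` days of last month."""
    first_of_this_month = today.replace(day=1)
    # Step back one day to land in the previous month, then snap to its first day.
    last_month_start = (first_of_this_month - timedelta(days=1)).replace(day=1)
    return last_month_start, last_month_start + timedelta(days=days - 1)


start, end = first_days_of_last_month(date(2026, 5, 14))
print(start.isoformat(), "to", end.isoformat())
```

The calculation itself is simple. What the model lacks is a trustworthy value for `today`, and a wrong guess shifts the entire range.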


🚨 Core Issue

The agent was reactive instead of planned.

Execution looked like:

Do something → discover missing data → do more → repeat

Instead of:

Understand requirements → plan execution → execute efficiently

🚀 Improvements

1. Provide Current Date Directly

We removed the dependency on the time tool by injecting the current date into the agent’s context.

✅ Eliminated one full execution cycle.
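
A minimal sketch of the injection, assuming the date is added to the system prompt when each request is built. The function name and prompt wording are hypothetical.

```python
from datetime import date


def build_system_prompt(base_prompt: str, today: date) -> str:
    """Prepend the current date so relative ranges can be resolved without a tool call."""
    return f"Current date: {today.isoformat()}\n\n{base_prompt}"


# In production `today` would come from date.today(); it is fixed here for repeatability.
prompt = build_system_prompt("You answer questions about account activity.", date(2026, 5, 14))
print(prompt)
```

The date has to be computed per request. Baking it into a static prompt template would reintroduce the stale-date problem the time tool was meant to solve.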


2. Add Upfront Planning

The agent now:

  • identifies required data first
  • plans execution before calling tools
  • understands dependencies early

3. Parallelize Independent Calls

Independent data fetches now execute together instead of sequentially.

This reduced unnecessary waiting between cycles.
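
The difference can be sketched with `asyncio.gather`. The `fetch` coroutine below is a stand-in for a real tool call, and the 0.1-second delay is an assumed latency, so the timings are illustrative only.

```python
import asyncio
import time


async def fetch(name: str, delay: float = 0.1) -> str:
    """Stand-in for a tool call; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return f"{name}-data"


async def fetch_sequential() -> list[str]:
    # Each call waits for the previous one to finish.
    return [await fetch("primary"), await fetch("additional"), await fetch("metadata")]


async def fetch_parallel() -> list[str]:
    # Independent calls run concurrently; results come back in argument order.
    return await asyncio.gather(fetch("primary"), fetch("additional"), fetch("metadata"))


def timed(coro_fn):
    """Run a coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(fetch_sequential)
par_result, par_time = timed(fetch_parallel)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Sequential time grows with the number of calls, while parallel time is bounded by the slowest single call.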


4. Add Dependency Awareness

Execution flow became smarter:

  • independent data → parallel execution
  • dependent data → delayed until required
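
One way to express this rule is to describe each fetch with the steps it depends on and group them into cycles. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The step names and the `plan_cycles` helper are hypothetical, and this is not necessarily how the reviewed agent plans internally.

```python
from graphlib import TopologicalSorter


def plan_cycles(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into cycles; each cycle holds steps whose dependencies are done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    cycles: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        cycles.append(ready)
        sorter.done(*ready)  # unlock steps that were waiting on these
    return cycles


# No dependencies: everything fits in one parallel cycle.
independent = plan_cycles({"primary": set(), "additional": set(), "metadata": set()})

# "metadata" needs the result of "primary", so it is delayed to a second cycle.
dependent = plan_cycles({"primary": set(), "additional": set(), "metadata": {"primary"}})

print(independent)
print(dependent)
```

Steps within a cycle can run in parallel, and the number of cycles is set by the longest dependency chain rather than by the number of fetches.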

✅ Final Execution Patterns

No dependency

Cycle 1 → fetch all data (parallel)
Cycle 2 → final response

With dependency

Cycle 1 → fetch base data (parallel)
Cycle 2 → fetch dependent data
Cycle 3 → final response

🎯 Final Thought

Many agents already “work.”

The bigger challenge is making them:

  • efficient
  • predictable
  • low-latency
  • cost-aware

In agent systems, execution planning often matters as much as model quality.