May 18, 2026

🚦 Controlling Tool Output: Response Field Projection in Agent Workflows

One of the less obvious performance problems in agentic systems isn’t which tool gets called; it’s how much data comes back from it.

As agent workflows become more sophisticated, context growth can quietly become one of the biggest drivers of:

  • latency
  • token cost
  • reasoning instability

🧠 The Problem: Context Growth Across Cycles

In chained workflows, tool responses accumulate across reasoning cycles:

Cycle 1 → tool response (2,000 tokens)
Cycle 2 → tool response + prior context (4,500 tokens)
Cycle 3 → accumulated context (8,000+ tokens)
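
To make the accumulation concrete, here is a minimal sketch that models context size per cycle. The 2,000-token response size comes from the example above; the 500-token per-cycle overhead and the 150-token projected response are illustrative assumptions, not measurements. It also assumes the model re-reads the full context on every cycle, with no prompt caching.

```python
def cumulative_context(response_tokens, overhead_per_cycle=0):
    """Return the context size the model sees at each cycle.

    response_tokens: tokens added by the tool response in each cycle.
    overhead_per_cycle: assumed extra tokens (messages, reasoning) per cycle.
    """
    sizes = []
    total = 0
    for tokens in response_tokens:
        total += tokens + overhead_per_cycle
        sizes.append(total)
    return sizes


# Illustrative numbers only: three cycles of raw vs. trimmed tool responses.
raw = cumulative_context([2000, 2000, 2000], overhead_per_cycle=500)
projected = cumulative_context([150, 150, 150], overhead_per_cycle=500)

print(raw)        # [2500, 5000, 7500]
print(projected)  # [650, 1300, 1950]

# Input tokens processed in total if the whole context is re-read each cycle.
print(sum(raw), sum(projected))  # 15000 3900
```

Because every cycle pays for everything that came before it, a saving on one tool response is paid back again in each later cycle.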

Most enterprise APIs are designed for systems integration, not LLM efficiency.

A financial data endpoint may return:

  • dozens of fields per record
  • nested metadata
  • audit attributes
  • internal identifiers
  • unused fields

But the agent may only need two or three fields to answer the user’s question.

When raw responses flow into the model unfiltered:

  • 📈 token usage grows every cycle
  • 🐢 latency increases as context expands
  • ⚠️ field-selection mistakes become more common
  • 🧾 prompt-level filtering stops helping, because the full payload is already in the context (and already paid for) by the time the model acts on the instruction

A simple lookup can easily turn into thousands of unnecessary tokens.
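
A rough way to see the gap is to compare the serialized size of a full record with the size of the fields actually needed. The record below is hypothetical and only loosely modelled on the field types listed above. Character count is used as a crude stand-in for token count, which depends on the tokenizer.

```python
import json

# Hypothetical record: a couple of useful fields buried in integration detail.
raw_record = {
    "id": "acct-001",
    "balance": 1523.75,
    "currency": "USD",
    "metadata": {"source": "ledger", "region": "us-east", "schema_version": 7},
    "audit": {"created_by": "svc-batch", "updated_by": "svc-batch", "revision": 42},
    "internal_ref": "9f1c2e",
    "legacy_code": None,
}

# The only fields the agent needs to answer "what is the balance?"
needed = {key: raw_record[key] for key in ("balance", "currency")}

raw_size = len(json.dumps(raw_record))
needed_size = len(json.dumps(needed))

print(raw_size, needed_size)
print(f"needed share of payload: {needed_size / raw_size:.0%}")
```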


🚀 The Fix: Projection at the Tool Layer

Instead of relying on the LLM to discard unnecessary fields after receiving the response, we moved the optimization into the tool layer itself.

We added a response_fields parameter to the HTTP request tool.

The agent specifies exactly which fields it needs before making the request, and the tool filters the response before returning it to the model.
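
As a sketch of what such a tool definition might look like, here is a generic JSON-Schema-style description written as a Python dict. Only the response_fields name comes from this post; the tool name, the url and method parameters, the example URL, and the "omit to get everything" behaviour are assumptions for illustration, not a specific vendor's tool format.

```python
import json

# Hypothetical tool definition in a generic JSON-Schema-style layout.
HTTP_REQUEST_TOOL = {
    "name": "http_request",
    "description": "Call an HTTP endpoint and return the JSON response.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "method": {"type": "string", "enum": ["GET", "POST"]},
            "response_fields": {
                "type": "array",
                "items": {"type": "string"},
                "description": (
                    "Dot-notation paths to keep, e.g. 'account.balance'. "
                    "Assumed here: omit to receive the full response."
                ),
            },
        },
        "required": ["url"],
    },
}

# What a tool call from the agent could look like under this schema.
example_call = {
    "url": "https://api.example.com/accounts/123",
    "response_fields": ["account.balance", "account.currency"],
}

print(json.dumps(example_call, indent=2))
```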

Instead of:

Tool → large raw JSON → LLM → filter + reason + respond

We now use:

Tool → projected response → LLM → respond

The projection supports:

  • arrays and nested objects
  • dot-notation field selection
  • graceful fallback to full responses when projection is unavailable
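
The three capabilities above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the function names are hypothetical, and "fallback" is interpreted here as returning the response untouched when no fields are requested or the body is not a JSON object or array.

```python
from typing import Any, Optional, Sequence

_MISSING = object()  # marks a path that does not exist in the data


def _build_tree(fields: Sequence[str]) -> dict:
    """Turn ['a.b', 'a.c'] into {'a': {'b': {}, 'c': {}}}."""
    tree: dict = {}
    for path in fields:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def _apply(data: Any, tree: dict) -> Any:
    if not tree:                      # end of a path: keep the whole value
        return data
    if isinstance(data, list):        # arrays: project every element
        items = (_apply(item, tree) for item in data)
        return [item for item in items if item is not _MISSING]
    if isinstance(data, dict):        # nested objects: descend key by key
        out = {}
        for key, subtree in tree.items():
            if key in data:
                value = _apply(data[key], subtree)
                if value is not _MISSING:
                    out[key] = value
        return out
    return _MISSING                   # scalar where a deeper path was requested


def project(data: Any, fields: Optional[Sequence[str]]) -> Any:
    """Keep only the requested dot-notation paths; otherwise return data as is."""
    if not fields or not isinstance(data, (dict, list)):
        return data                   # fallback: full response
    return _apply(data, _build_tree(fields))


response = {
    "account": {"id": "123", "balance": 1523.75, "audit": {"rev": 42}},
    "transactions": [
        {"id": "t1", "amount": -20.0, "meta": {"channel": "web"}},
        {"id": "t2", "amount": 75.5, "meta": {"channel": "pos"}},
    ],
}

print(project(response, ["account.balance", "transactions.amount"]))
# {'account': {'balance': 1523.75}, 'transactions': [{'amount': -20.0}, {'amount': 75.5}]}
```

One limitation of this sketch: if both a parent path and one of its children are requested (for example "account" and "account.balance"), the narrower path wins.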

✅ Minimal payloads
✅ Smaller context
✅ Faster reasoning


🧩 Closing the Loop

Projection only works if the agent knows which fields to request.

That knowledge can come from:

  • system prompts
  • tool metadata
  • endpoint descriptions
  • execution guidance
  • field-level documentation

The important part is that the agent identifies required fields before making the tool call instead of reasoning over a large payload afterward.
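
One possible way to supply that knowledge is a small field catalog attached to the tool metadata. The sketch below is an assumption about how this could be done; the catalog contents and both helper names are hypothetical. It renders field-level documentation into text the agent can read and flags requested paths that are not in the catalog.

```python
# Hypothetical field-level documentation for one endpoint.
FIELD_DOCS = {
    "account.balance": "Current balance in the account currency",
    "account.currency": "ISO 4217 currency code",
    "transactions.amount": "Signed transaction amount",
}


def describe_fields(field_docs):
    """Render the catalog as text for a system prompt or tool description."""
    lines = ["Available response_fields:"]
    lines += [f"- {path}: {doc}" for path, doc in sorted(field_docs.items())]
    return "\n".join(lines)


def unknown_fields(requested, field_docs):
    """Return requested paths that are not documented in the catalog."""
    return [path for path in requested if path not in field_docs]


print(describe_fields(FIELD_DOCS))
print(unknown_fields(["account.balance", "account.owner"], FIELD_DOCS))
# ['account.owner']
```

Checking requests against the catalog gives the agent a short, cheap error to correct instead of a silently empty projection.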

This shifts optimization from post-processing responses to controlling responses at the source.

🔄 Execution Pattern

Execution guidance
→ agent identifies required fields
→ tool-level response projection
→ minimal structured output into LLM context

Rather than continuously expanding context across cycles, the agent keeps context compact and purpose-driven.
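
The pattern above, sketched end to end for a single cycle. Everything here is a simplified stand-in: the guidance table, the hard-coded fetch function, the example URL, and the top-level-only projection are assumptions made to keep the example short and runnable without a network.

```python
import json

# Hypothetical execution guidance: which fields each kind of task needs.
GUIDANCE = {"balance_lookup": ["balance", "currency"]}


def fake_fetch(url):
    """Stand-in for a real HTTP call; returns a hard-coded payload."""
    return {
        "id": "123",
        "balance": 1523.75,
        "currency": "USD",
        "audit": {"rev": 42},
        "internal_ref": "9f1c2e",
    }


def http_request(url, response_fields=None, fetch=fake_fetch):
    """Fetch, then project top-level fields before anything reaches the model."""
    body = fetch(url)
    if not response_fields or not isinstance(body, dict):
        return body  # fall back to the full response
    return {key: body[key] for key in response_fields if key in body}


def run_cycle(context, task, url):
    fields = GUIDANCE.get(task)                         # agent identifies fields
    result = http_request(url, response_fields=fields)  # tool-level projection
    context.append({"role": "tool", "content": json.dumps(result)})
    return context                                      # minimal output in context


context = run_cycle([], "balance_lookup", "https://api.example.com/accounts/123")
print(context[-1]["content"])  # {"balance": 1523.75, "currency": "USD"}
```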


📊 Results

Queries that previously returned thousands of tokens per cycle now return only a fraction of that.

For multi-step workflows, this:

  • 💰 reduces token consumption
  • ⚡ lowers latency
  • 📏 stabilizes context growth
  • 🎯 improves reasoning reliability

The pattern is similar to GraphQL:
the client declares what it needs, and only that data comes back.

In this case, the “client” is the LLM itself.


🎯 Final Thought

Efficient agents don’t just call the right tools.

They also control what data comes back from them.

In many production systems, optimizing tool output can have a larger impact on performance and reliability than changing the model itself.
