May 30, 2026

🌪️ Chaos Engineering for Disaster Recovery: Proving Multi-Region Failover Actually Works

Most disaster recovery architectures are built with a hidden assumption:

The failover process will work when we need it.

The problem is that assumptions don't survive outages.

Infrastructure changes.
Deployments drift.
Permissions break.
Health checks evolve.
Automation silently fails.

A disaster recovery strategy is only as good as the last time it was tested.

That's why we built chaos engineering directly into our multi-region architecture.

The Architecture

Like many organizations, we run a primary AWS region that handles all production traffic.

The secondary region is fully provisioned but runs with zero application tasks during normal operation.

When the primary region becomes unhealthy:

CloudWatch detects degradation
Lambda initiates recovery
The secondary region scales up
Health checks begin passing
Route 53 shifts traffic

The entire process is automated and completes in roughly 10 minutes.
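
To make the "secondary region scales up" step concrete, here is a minimal sketch of what the recovery function might do. It assumes the application runs as an ECS service (the post only says "application tasks"). The function name, the polling values, and the idea of passing the client in are all illustrative, not the actual implementation. In a real Lambda the client would likely be a boto3 ECS client created for the secondary region.

```python
import time


def scale_up_secondary(ecs, cluster, service, desired_count,
                       timeout_s=600, poll_s=15, sleep=time.sleep):
    """Scale the standby service up and wait until its tasks are running.

    `ecs` is expected to behave like a boto3 ECS client, e.g.
    boto3.client("ecs", region_name=<secondary region>) (assumption).
    Returns True once enough tasks are running, False on timeout.
    """
    # Ask ECS to start tasks in the otherwise idle secondary region.
    ecs.update_service(cluster=cluster, service=service,
                       desiredCount=desired_count)

    waited = 0
    while waited <= timeout_s:
        described = ecs.describe_services(cluster=cluster, services=[service])
        running = described["services"][0]["runningCount"]
        if running >= desired_count:
            return True
        sleep(poll_s)
        waited += poll_s
    return False
```

Route 53 is not called in this sketch. With health-check-based failover routing, traffic presumably shifts on its own once the secondary region's health checks start passing.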

This isn't designed for instant failover.

It's designed to provide a balance between resilience and cost efficiency.

The Real Challenge Isn't Failover

The real challenge is confidence.

Most teams test disaster recovery once during implementation and then assume it continues to work forever.

But recovery paths are software.

And software breaks.

The critical question becomes:

How do you know your failover automation still works six months from now?

Enter Chaos Engineering

Once a month we intentionally trigger a failover event in production.

Not a simulation.

A real failover.

We reduce capacity in the primary region and allow the system to respond naturally.

Alarms fire.
Recovery automation executes.
The secondary region activates.
Route 53 redirects traffic.
Production traffic runs from the backup region.

Several hours later we restore the primary region and validate failback behavior.

What Gets Validated

Each exercise validates the entire recovery chain:

✅ CloudWatch alarms

✅ Lambda execution

✅ Auto-scaling behavior

✅ Route 53 failover

✅ Application startup

✅ Service dependencies

✅ Recovery procedures

Instead of testing components individually, we're testing the complete system under real conditions.

The Detail That Prevents Downtime

One implementation detail made these exercises much safer.

During chaos testing, we don't scale the primary region to zero.

Instead, we reduce capacity by a single task.

That leaves enough healthy capacity to continue serving traffic while the secondary region comes online.

As DNS transitions occur, users continue receiving responses.

The recovery path is exercised without creating customer-visible downtime.
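
The safety rule above is small enough to encode as a guard. This is a minimal sketch, with a hypothetical function name and a hypothetical `min_remaining` floor: the drill removes exactly one task and refuses to run if that would leave the primary region with too little capacity.

```python
def drill_task_count(current_count: int, min_remaining: int = 1) -> int:
    """Return the primary task count to use during a chaos drill.

    The drill removes a single task. It never scales to zero: if removing
    one task would drop below `min_remaining`, the drill is refused.
    """
    target = current_count - 1
    if target < min_remaining:
        raise ValueError(
            f"refusing drill: {current_count} task(s) running, "
            f"need at least {min_remaining} left after removing one"
        )
    return target
```

Raising instead of silently clamping is a deliberate choice in this sketch: a drill that cannot run safely should be visible, not skipped quietly.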

Why This Matters

The biggest risk in disaster recovery isn't infrastructure failure.

It's recovery procedures that haven't been tested recently.

A recovery plan sitting in a wiki isn't resilience.

A recovery plan executed successfully every month is.

🎯 Final Thought

Most organizations invest heavily in disaster recovery infrastructure.

Far fewer invest in continuously validating it.

Our multi-region architecture is intentionally cost-optimized, with the secondary region sitting idle most of the time.

But the real value isn't the architecture.

It's the confidence that comes from proving every month that failover still works.

Because in disaster recovery, the question isn't:

"Do we have a failover plan?"

It's:

"When was the last time we proved it actually works?"

May 18, 2026

🚦 Controlling Tool Output: Response Field Projection in Agent Workflows

One of the less obvious performance problems in agentic systems isn’t which tool gets called. It’s how much data comes back from it.

As agent workflows become more sophisticated, context growth can quietly become one of the biggest drivers of:

latency
token cost
reasoning instability

🧠 The Problem: Context Growth Across Cycles

In chained workflows, tool responses accumulate across reasoning cycles:


Cycle 1 → tool response (2,000 tokens)
Cycle 2 → tool response + prior context (4,500 tokens)
Cycle 3 → accumulated context (8,000+ tokens)

Most enterprise APIs are designed for systems integration, not LLM efficiency.

A financial data endpoint may return:

dozens of fields per record
nested metadata
audit attributes
internal identifiers
unused fields

But the agent may only need two or three fields to answer the user’s question.

When raw responses flow into the model unfiltered:

📈 token usage grows every cycle
🐢 latency increases as context expands
⚠️ field-selection mistakes become more common
🧾 prompt-level filtering becomes ineffective because the full response is already in context, and already paid for, before the model can act on the instruction

A simple lookup can easily turn into thousands of unnecessary tokens.
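
The growth is just a running sum. As a small sketch, the function below (a hypothetical helper) shows how context size builds up when every tool response stays in context. The per-cycle response sizes are illustrative values chosen to reproduce the totals in the example above.

```python
from itertools import accumulate


def context_sizes(response_tokens, base_tokens=0):
    """Context size at each cycle when every tool response is kept."""
    return [base_tokens + total for total in accumulate(response_tokens)]


# Illustrative per-cycle response sizes, not measurements.
raw_responses = [2000, 2500, 3500]
print(context_sizes(raw_responses))  # [2000, 4500, 8000]
```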

🚀 The Fix: Projection at the Tool Layer

Instead of relying on the LLM to discard unnecessary fields after receiving the response, we moved the optimization into the tool layer itself.

We added a response_fields parameter to the HTTP request tool.

The agent specifies exactly which fields it needs before making the request, and the tool filters the response before returning it to the model.

Instead of:


Tool → large raw JSON → LLM → filter + reason + respond

We now use:


Tool → projected response → LLM → respond

The projection supports:

arrays and nested objects
dot-notation field selection
graceful fallback to full responses when projection is unavailable

✅ Minimal payloads
✅ Smaller context
✅ Faster reasoning
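
Here is a minimal sketch of what such a projection could look like. It is not the actual tool code. The function names are hypothetical, and it assumes "fallback" means: return the full response when no fields are requested, when the field list is malformed, or when nothing matches.

```python
from typing import Any, Optional, Sequence

_MISSING = object()  # marks "this path does not exist here"


def _field_tree(fields: Sequence[str]) -> dict:
    """Turn ["a.b", "a.c"] into {"a": {"b": {}, "c": {}}}."""
    tree: dict = {}
    for field in fields:
        node = tree
        for part in field.split("."):
            node = node.setdefault(part, {})
    return tree


def _apply(value: Any, tree: dict) -> Any:
    if not tree:
        return value  # end of a requested path: keep the whole value
    if isinstance(value, list):
        # Apply the same selection to every element of an array.
        items = (_apply(item, tree) for item in value)
        return [item for item in items if item is not _MISSING]
    if isinstance(value, dict):
        out = {}
        for key, subtree in tree.items():
            if key in value:
                picked = _apply(value[key], subtree)
                if picked is not _MISSING:
                    out[key] = picked
        return out
    return _MISSING  # a nested path was requested on a scalar


def project(payload: Any, response_fields: Optional[Sequence[str]] = None) -> Any:
    """Keep only the requested dot-notation fields of a JSON-like payload."""
    if not response_fields:
        return payload
    try:
        result = _apply(payload, _field_tree(response_fields))
    except (AttributeError, TypeError):
        return payload  # malformed field list: fall back to the full response
    if result is _MISSING or result in ({}, []):
        return payload  # nothing matched: fall back to the full response
    return result
```

For example, requesting `["results.ticker", "results.price.close"]` on a payload whose `results` array holds full records returns the same array with only `ticker` and `price.close` in each record.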

🧩 Closing the Loop

Projection only works if the agent knows which fields to request.

That knowledge can come from:

system prompts
tool metadata
endpoint descriptions
execution guidance
field-level documentation

The important part is that the agent identifies required fields before making the tool call instead of reasoning over a large payload afterward.
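
One way to give the agent that knowledge is to keep field-level documentation next to the tool and render it into the tool description. The sketch below is purely illustrative: the field paths, descriptions, and helper names are made up for the example.

```python
# Hypothetical field documentation for one endpoint.
KNOWN_FIELDS = {
    "results.ticker": "Instrument symbol",
    "results.price.close": "Closing price for the period",
    "results.price.open": "Opening price for the period",
}


def describe_fields(known_fields: dict) -> str:
    """Render field docs as text that can be appended to a tool description."""
    lines = [f"- {path}: {doc}" for path, doc in sorted(known_fields.items())]
    return "Available response_fields:\n" + "\n".join(lines)


def unknown_fields(requested: list, known_fields: dict) -> list:
    """Return requested fields that are not documented, in request order."""
    return [field for field in requested if field not in known_fields]
```

Checking requested fields against the documented ones before the call gives the tool a cheap way to catch a misspelled or invented field name early.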

This shifts optimization from post-processing responses to controlling responses at the source.

🔄 Execution Pattern


Execution guidance
    → agent identifies required fields
        → tool-level response projection
            → minimal structured output into LLM context

Rather than continuously expanding context across cycles, the agent keeps context compact and purpose-driven.

📊 Results

Queries that previously returned thousands of tokens per cycle now return only a fraction of that.

For multi-step workflows, this:

💰 reduces token consumption
⚡ lowers latency
📏 stabilizes context growth
🎯 improves reasoning reliability

The pattern is similar to GraphQL:
the client declares what it needs, and only that data comes back.

In this case, the “client” is the LLM itself.

🎯 Final Thought

Efficient agents don’t just call the right tools.

They also control what data comes back from them.

In many production systems, optimizing tool output can have a larger impact on performance and reliability than changing the model itself.

May 14, 2026

Reviewing an Agent: From “Works” to “Works Efficiently”

I recently reviewed an agent that was functionally correct: it answered queries, used tools properly, and produced accurate results.

👉 But it wasn’t efficient.

This is a quick summary of what was happening and what improved.

🧠 Initial Behavior: Reactive Execution

The agent followed a step-by-step execution loop:


Cycle 1 → resolve date via tool
Cycle 2 → fetch primary data
Cycle 3 → fetch additional data
Cycle 4 → fetch metadata
Cycle 5 → final response

What was happening?

No upfront planning
Data discovered incrementally
Each missing piece triggered another tool call
All operations executed sequentially

The agent was technically correct, but operationally inefficient.

⏱️ Why a Time Tool Existed

The agent handled queries like:

“first 10 days of last month”

Since LLMs don’t reliably know the current date, a time tool was added to:

ensure correct date calculations
avoid inconsistent outputs

👉 It solved correctness
👉 But added an extra execution cycle every time
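
For reference, the calculation behind a query like this is simple once the current date is known. A minimal sketch, with a hypothetical function name:

```python
from datetime import date, timedelta


def first_days_of_last_month(today: date, days: int = 10):
    """Return (start, end) dates for the first `days` days of last month."""
    first_of_this_month = today.replace(day=1)
    last_of_prev_month = first_of_this_month - timedelta(days=1)
    start = last_of_prev_month.replace(day=1)
    # Clamp so a large `days` value never spills into the current month.
    end = min(start + timedelta(days=days - 1), last_of_prev_month)
    return start, end


print(first_days_of_last_month(date(2026, 5, 14)))
# (datetime.date(2026, 4, 1), datetime.date(2026, 4, 10))
```

Going back one day from the first of the current month handles the January case without special-casing the year.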

🚨 Core Issue

The agent was reactive instead of planned.

Execution looked like:


Do something → discover missing data → do more → repeat

Instead of:


Understand requirements → plan execution → execute efficiently

🚀 Improvements

1. Provide Current Date Directly

We removed the dependency on the time tool by injecting the current date into the agent's context on every request.

✅ Eliminated one full execution cycle.
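
Injecting the date can be as simple as appending it to the system prompt when the request is built. A minimal sketch, assuming a plain-text system prompt; the function name and wording are hypothetical:

```python
from datetime import date


def build_system_prompt(base_prompt: str, today: date) -> str:
    """Append the current date so the model never has to look it up."""
    return f"{base_prompt}\n\nCurrent date: {today.isoformat()}"


prompt = build_system_prompt("You are a reporting assistant.", date(2026, 5, 14))
print(prompt.splitlines()[-1])  # Current date: 2026-05-14
```

An ISO date avoids day/month ambiguity. The caller passes `today` in rather than reading the clock inside the function, which keeps the prompt reproducible in tests.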

2. Add Upfront Planning

The agent now:

identifies required data first
plans execution before calling tools
understands dependencies early

3. Parallelize Independent Calls

Independent data fetches now execute together instead of sequentially.

This reduced unnecessary waiting between cycles.
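
As a rough illustration of the idea, independent fetches can be issued together with `asyncio.gather`. The `fetch` coroutine below is a stand-in that sleeps instead of calling a real tool, and the source names are made up.

```python
import asyncio


async def fetch(name: str, delay: float) -> dict:
    """Stand-in for a tool call; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return {"source": name}


async def fetch_all() -> list:
    # All three calls run concurrently; total wait is roughly the slowest
    # call, not the sum. Results come back in the order the calls are listed.
    return await asyncio.gather(
        fetch("primary", 0.05),
        fetch("additional", 0.05),
        fetch("metadata", 0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(fetch_all()))
```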

4. Add Dependency Awareness

Execution flow became smarter:

independent data → parallel execution
dependent data → delayed until required
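
One way to express this rule in code is to group calls into cycles from a dependency map. This is a sketch using Python's standard `graphlib`, not the agent's actual planner; the call names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_cycles(dependencies: dict) -> list:
    """Group calls into cycles.

    `dependencies` maps each call to the set of calls it depends on.
    Every cycle contains the calls whose dependencies are already done,
    so everything inside one cycle can run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    cycles = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        cycles.append(ready)
        sorter.done(*ready)
    return cycles


plan = execution_cycles({
    "primary": set(),
    "additional": set(),
    "metadata": {"primary"},  # needs an ID from the primary data
})
print(plan)  # [['additional', 'primary'], ['metadata']]
```

With no dependencies the plan collapses to a single fetch cycle. With one dependent call it becomes two fetch cycles, each followed in practice by whatever the agent does next.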

✅ Final Execution Patterns

No dependency


Cycle 1 → fetch all data (parallel)
Cycle 2 → final response

With dependency


Cycle 1 → fetch base data (parallel)
Cycle 2 → fetch dependent data
Cycle 3 → final response

🎯 Final Thought

Many agents already “work.”

The bigger challenge is making them:

efficient
predictable
low-latency
cost-aware

In agent systems, execution planning often matters as much as model quality.