Jun 29, 2025

Agentic AI Framework

Building an AI agent that can reason and use tools requires more than just a powerful LLM. Ever wondered what's happening behind the scenes of a conversational AI agent? This post breaks down four approaches (OpenAI Responses API, AWS Bedrock Agents, Strands, and LangGraph) to help you understand where your agent's conversation state lives, how tools are managed, and which solution offers the most flexibility.

OpenAI Responses API

  1. The OpenAI Responses API is a unified interface for building powerful, agent-like applications. It's an evolution of Chat Completions, which has no server-side state and therefore forces you to resend the conversation history on every call.
    from openai import OpenAI

    client = OpenAI()

    # First turn: send the system prompt, the user question, and the tool schemas.
    resp1 = client.responses.create(
        model="gpt-4o-mini",
        input=[
            {
                "role": "system",
                "content": "You are a helpful assistant. Use tools when needed.",
            },
            {"role": "user", "content": user_question},
        ],
        tools=tools,
        parallel_tool_calls=True,
    )
    
  2. Conversation state lives on OpenAI’s servers (for OpenAI-hosted models) when you pass previous_response_id. If you point your OpenAI client at a proxy (e.g., LiteLLM), state (if any) is maintained by the proxy, not OpenAI. Pass previous_response_id to link to prior turns without resending them.
    # Follow-up turn: send only the new tool outputs, linked to the previous response.
    resp2 = client.responses.create(
        model="gpt-4o-mini",
        input=tool_outputs,
        tools=tools,  # keep tools if follow-up calls might happen
        previous_response_id=resp1.id,  # <— important
    )
  3. Each turn re-bills the effective prompt (prior items + new items). You may get prompt caching discounts for repeated prefixes.
  4. You execute tools and feed results back in a follow-up call; a minimal loop sketch follows this list.

    request → model returns function_call → you run the tool(s) → send function_call_output → repeat until no more tool calls → final answer.
      if response_output["type"] == "function_call":
          # The model wants a tool run; arguments arrive as a JSON string.
          function_name = response_output["name"]
          function_args = json.loads(response_output["arguments"])
      elif response_output["type"] == "message":
          # An assistant message with content blocks (the final answer).
  5. SDK provides built-in tracing & run history
  6. Native only to OpenAI (and Azure OpenAI). For other LLMs you’d need a proxy that emulates the Responses API.
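
Putting points 2 and 4 together, a minimal tool loop might look like the sketch below. It assumes you have already built input_items (the first-turn messages), a tools list, and a run_tool dispatcher of your own; those names are placeholders, not part of the API.

    from openai import OpenAI
    import json

    client = OpenAI()

    # First request: the model either answers directly or asks for tool calls.
    response = client.responses.create(model="gpt-4o-mini", input=input_items, tools=tools)

    while True:
        tool_outputs = []
        for item in response.output:
            if item.type == "function_call":
                # Run the requested tool; arguments arrive as a JSON string.
                result = run_tool(item.name, json.loads(item.arguments))
                tool_outputs.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps(result),
                })
        if not tool_outputs:
            break  # no more tool calls: response.output_text holds the final answer
        # Follow-up request: send only the tool outputs, linked to the prior turn.
        response = client.responses.create(
            model="gpt-4o-mini",
            input=tool_outputs,
            tools=tools,
            previous_response_id=response.id,
        )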


AWS Bedrock Agents

  1. Fully managed AWS service, configured via console
  2. When you invoke a Bedrock Agent (via API or console), AWS establishes a runtime session for that user/conversation. The conversation history (prior user inputs, model responses, tool invocations, intermediate results) is stored on AWS infrastructure associated with that sessionId. When the agent calls a tool, the outputs are persisted in session state. A minimal invoke sketch follows this list.
  3. Pay-per-use on AWS (per token + infra integration). Costs tied to Bedrock pricing. AWS runtime decides what minimal state to pass back into the LLM — e.g., compressed summaries, selected tool outputs, prior reasoning steps. You don’t control (or see) the exact serialization, but the idea is that AWS optimizes the context window management for you.
  4. Tools are configured in AWS console (e.g., Lambdas, Step Functions). Execution handled natively by Bedrock runtime.
  5. Integrated with AWS CloudWatch/X-Ray
  6. Bedrock-hosted models only (Anthropic Claude, Meta Llama, Mistral, etc.)
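
For point 2, invoking an agent you configured in the console looks roughly like the sketch below (boto3 bedrock-agent-runtime client; the agent and alias IDs are placeholders).

    import boto3

    runtime = boto3.client("bedrock-agent-runtime")

    # AWS keeps the conversation state server-side, keyed by sessionId.
    response = runtime.invoke_agent(
        agentId="AGENT_ID",             # placeholder
        agentAliasId="AGENT_ALIAS_ID",  # placeholder
        sessionId="user-123-session",   # reuse the same id to continue a conversation
        inputText="What is the status of my last order?",
    )

    # The completion arrives as an event stream of chunks.
    answer = ""
    for event in response["completion"]:
        if "chunk" in event:
            answer += event["chunk"]["bytes"].decode("utf-8")
    print(answer)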

Strands Agent

  1. A Python agent runtime (SDK) that runs in your process. You bring any model (OpenAI, Bedrock via LiteLLM, etc.), and Strands orchestrates prompts, tools, and streaming.
        from strands import Agent

        agent = Agent(
            model=model, tools=tools, system_prompt=system_prompt
        )
        answer = agent(user_input)
        
  2. Conversation state lives in your app. Strands holds the working memory/trace during a run; you decide what to persist (DB, Redis, files). If your underlying model/proxy also supports server state, you can choose to use it, but Strands doesn’t require it.
  3. You only pay for tokens you actually send to the underlying model. No separate “state storage” cost; total cost depends on how much context you include.
  4. You register functions (schemas); Strands drives the reason→act→observe loop, runs tools, and feeds results back to the model. Parallelization is under your control. A tool-registration sketch follows this list.
  5. Rich observability (structured logs, OpenTelemetry)
  6. Vendor-agnostic. Use OpenAI, Bedrock (Claude), local models, etc. via adapters (e.g., LiteLLMModel). 
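
For point 4, registering a tool is just decorating a Python function. A minimal sketch with the Strands @tool decorator; get_weather is a made-up example tool, and model is whatever model adapter you configured.

    from strands import Agent, tool

    @tool
    def get_weather(city: str) -> str:
        """Return the current weather for a city."""  # the docstring becomes the tool description
        return f"It is sunny in {city}."

    agent = Agent(
        model=model,
        tools=[get_weather],
        system_prompt="You are a helpful assistant. Use tools when needed.",
    )
    answer = agent("What's the weather in Paris?")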

LangGraph

  1. A Python agent graph runtime (SDK) that runs in your process. You model the agent as a graph (nodes = LLM/tool/human steps; edges = control flow). Use prebuilt agents like create_react_agent or compose your own nodes/routers. Works fine with OpenAI, Bedrock, local LLMs, etc.
  2. Conversation state lives in your app via LangGraph checkpointers. You pass a thread_id and a checkpointer (in-memory, SQLite/Postgres, or the hosted LangGraph Platform). LangGraph restores prior turns/working memory automatically. If your model/proxy has server state, you can use it, but LangGraph doesn’t require it—you choose what to persist (messages, summaries, tool outputs). A checkpointer sketch follows this list.
    from langgraph.graph import StateGraph, MessagesState, START, END
    from langgraph.prebuilt import create_react_agent

    # The graph state: MessagesState is a prebuilt schema holding the message list.
    AgentState = MessagesState

    agent = create_react_agent(
        model=llm,
        tools=tools,
        prompt="You are a helpful assistant. Use the tools provided to answer questions. If you don't know the answer, use your tools.",
    )

    # Define the graph with a state machine.
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent)
    workflow.add_edge(START, "agent")
    workflow.add_edge(
        "agent", END
    )  # In a simple case, the agent node can go directly to END

    # Compile the graph
    app = workflow.compile()
  3. You only pay for tokens sent to the underlying model. There’s no separate “state storage” cost from LangGraph itself. Your total cost depends on how much context you rehydrate per turn (and any DB/Platform you choose for persistence).
  4. You register tools (functions/schemas) and LangGraph drives the reason → act → observe loop. Tools can be simple Python callables or LangChain @tools. Prebuilt ReAct agents or custom graphs will invoke tools and feed results back to the model, with support for loops, branching, retries, timeouts, and parallelization via concurrent branches/map nodes—under your control.
  5. Rich observability. LangGraph Studio (local or Platform) provides a visual graph, step-level inputs/outputs, token/cost traces, and checkpoint “time-travel” to replay from any step. Plays well with your logging/metrics stack.
  6. Vendor-agnostic. Use OpenAI, AWS Bedrock (Claude), Google, local/Ollama, etc., through LangChain adapters; swap models without rewriting your graph.
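
For point 2, persistence is just a checkpointer plus a thread_id. A minimal sketch reusing the workflow compiled above, with the in-memory checkpointer (swap in SQLite/Postgres for durable storage):

    from langgraph.checkpoint.memory import MemorySaver

    # Compile the same graph with a checkpointer so state survives across turns.
    app = workflow.compile(checkpointer=MemorySaver())

    config = {"configurable": {"thread_id": "user-123"}}  # one thread per conversation
    app.invoke({"messages": [("user", "Hi, I'm Alice.")]}, config)
    result = app.invoke({"messages": [("user", "What's my name?")]}, config)  # prior turns restored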

What “good” looks like for an agent framework

Your bar should be simple: pick the stack that’s composable, observable, portable, and cheap to change. 

  • Plug-and-play with the rest of your stack
    Must integrate cleanly with eval (Ragas/DeepEval), observability (Langfuse/Helicone/OTel), and your data/vectors (pgvector, Weaviate, Pinecone, Redis), without adapters that fight each other.
  • Standard, first-class observability
    Step-level traces, token/cost accounting, latency/error breakdowns, replay/time-travel, export to OpenTelemetry. If you can’t answer “what happened and why?” in one place, it won’t survive prod.
  • Model-agnostic
    Swap OpenAI ↔ Bedrock/Anthropic ↔ local (Ollama/vLLM) with minimal code changes.
    • Model routing / cascades - Use a small/fast model for easy cases; fall back to Claude/GPT only when needed (a rough sketch follows this list).
    • Distillation - Have a big model generate labeled data, then train an open, smaller model (e.g., 7–13B) on it. You own/serve this smaller model (and can quantize it) for big savings.
  • Cloud-agnostic
    Run anywhere (local, k8s, AWS/Azure/GCP). No hard vendor lock-in for core logic. If you leave a cloud, your agent should come with you.
  • Lightweight & composable (“LEGO-style”)
    Small primitives you can rearrange. Clear boundaries between reason → act → observe, easy to add/remove tools, and simple to test.
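
For the routing/cascades point above, the idea in code is only a few lines. A rough, framework-free sketch; call_model, the confidence check, and the model names are placeholders, not any particular SDK.

    SMALL_MODEL = "gpt-4o-mini"      # cheap, fast first attempt
    LARGE_MODEL = "claude-sonnet"    # placeholder name for the expensive fallback

    def answer(question: str) -> str:
        # Route easy cases to the small model; escalate only when needed.
        draft = call_model(SMALL_MODEL, question)           # hypothetical client call
        if draft.confidence < 0.7:
            return call_model(LARGE_MODEL, question).text   # fall back to the big model
        return draft.text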



Jun 22, 2025

How AI Agents Work: LLMs, Tool Use, MCP, and Agent Frameworks

 As large language models (LLMs) become increasingly capable, AI agents have emerged as powerful systems that combine language understanding with real-world action. But what exactly is an AI agent? How do LLMs fit into the picture? And how can developers build agents that are modular, secure, and adaptable?

Let’s break it down—from the fundamentals of LLM-powered agents to protocols like MCP and frameworks like Strands and LangGraph.


What Is an AI Agent?

An AI agent is a system designed to execute tasks on behalf of a user. It combines a reasoning engine (typically an LLM) with an action layer (tools, APIs, databases, etc.) to understand instructions and carry out operations.

In this setup, the LLM acts as the agent’s “brain.” It interprets the user’s goal, breaks it down into logical steps, and decides which tools are needed to fulfill the task. The agent, in turn, sends the user’s goal to the LLM—along with a list of available tools such as vector search APIs, HTTP endpoints, or email services.

The LLM plans the workflow and returns instructions: which tool to call, what parameters to pass, and in what sequence to proceed. The agent executes those tool calls, collects results, and loops back to the LLM for further planning. This iterative loop continues until the task is fully completed.

Importantly, agents maintain context over time—tracking prior steps, user input, and intermediate outputs—enabling them to handle complex, multi-turn tasks with coherence and adaptability.
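
To make the loop concrete, here is a minimal sketch in Python. call_llm and run_tool are hypothetical helpers standing in for your model client and tool dispatcher; the point is the plan → act → observe cycle described above.

    def run_agent(goal: str, tools: list) -> str:
        # Hypothetical helpers: call_llm(messages, tools) returns either a tool
        # request or a final answer; run_tool(name, args) executes the tool.
        messages = [{"role": "user", "content": goal}]
        while True:
            step = call_llm(messages, tools)                    # LLM plans the next step
            if step["type"] == "tool_call":
                result = run_tool(step["name"], step["args"])   # agent executes the tool
                messages.append({"role": "tool", "content": str(result)})  # observe, then loop
            else:
                return step["content"]                          # task complete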


Strands: A Model-Driven Agent Framework

The Strands Agent follows a model-driven approach, where the LLM is in charge of the logic and flow.

Instead of writing hardcoded logic like if this, then do that, the developer provides the LLM with:

  • Clear system and user prompts

  • A list of tools the agent can access

  • The overall task context

The LLM uses its reasoning and planning capabilities to decide which tools to call, how to call them, and in what sequence. This makes the agent dynamic and adaptive, rather than rigidly tied to predefined control paths.

In Strands, the agent's core responsibility is to execute tool calls, maintain memory, and facilitate the LLM's decisions. The LLM, in turn, drives the workflow using instructions encoded in each step.


A Program-Driven Flow Engine (LangGraph)

LangGraph is a state machine framework built on top of LangChain that allows developers to define agent workflows as directed graphs. Each node in the graph represents a function—often LLM-powered—and edges define how data flows from one node to the next.

By default, LangGraph follows a program-driven approach. The developer defines:

  • The graph structure (workflow)

  • The behavior of each node (LLM call, tool call, decision logic)

  • The conditions that determine transitions between nodes

While LLMs can be used inside nodes to reason or generate text, they do not control the overall execution flow. That logic is handled programmatically, making LangGraph ideal for scenarios where control, reliability, and testing are critical.
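
As a sketch of that program-driven control, here is a small graph where a plain Python router, not the LLM, decides which node runs next. The node names and routing heuristic are illustrative only.

    from langgraph.graph import StateGraph, MessagesState, START, END

    def classify(state: MessagesState) -> dict:
        return {}  # e.g., an LLM call that tags the request; no state update in this sketch

    def handle_search(state: MessagesState) -> dict:
        return {}  # tool call handled in plain code

    def handle_chat(state: MessagesState) -> dict:
        return {}  # direct LLM answer

    def route(state: MessagesState) -> str:
        # Programmatic decision logic; the LLM does not choose the path.
        last = state["messages"][-1].content
        return "search" if "look up" in last.lower() else "chat"

    workflow = StateGraph(MessagesState)
    workflow.add_node("classify", classify)
    workflow.add_node("search", handle_search)
    workflow.add_node("chat", handle_chat)
    workflow.add_edge(START, "classify")
    workflow.add_conditional_edges("classify", route, {"search": "search", "chat": "chat"})
    workflow.add_edge("search", END)
    workflow.add_edge("chat", END)
    app = workflow.compile()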


Program-Driven Agents Without a Framework

Not all agents need a dedicated framework. In many cases, developers can build lightweight, program-driven agents using plain code and selective use of LLMs.

In this approach:

  • The developer writes the full control logic

  • The LLM is used at specific points—for summarization, classification, interpretation, etc.

  • All tool interactions (e.g., API calls, database queries) are handled directly in code

  • The LLM does not control which tools to use or what happens next

This model gives developers maximum control and is well-suited for building LLM-in-the-loop systems, where the language model acts more like a helper than a planner.
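
A minimal sketch of this pattern, assuming an OpenAI client and a hypothetical fetch_orders helper: the code owns the control flow and calls the LLM at exactly one point, to classify the request.

    from openai import OpenAI

    client = OpenAI()

    def handle_request(user_text: str) -> str:
        # 1. The LLM is used at a single point: classify the request.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Classify the request as 'order_status' or 'other'. Reply with one word."},
                {"role": "user", "content": user_text},
            ],
        )
        intent = resp.choices[0].message.content.strip()

        # 2. All tool interactions are plain code; the LLM never picks the next step.
        if intent == "order_status":
            return str(fetch_orders(user_text))  # hypothetical database/API call
        return "Sorry, I can only help with order status questions."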


MCP: A Standard Protocol for Tool Use

As agents grow more sophisticated, one challenge becomes clear: How do you standardize how agents call tools, especially across different frameworks or LLMs?

That’s where MCP (Model Context Protocol) comes in.

MCP standardizes how an AI agent interacts with tools, providing a consistent interface for invoking external systems. Whether the agent is built in Python, JavaScript, or another environment—and whether it uses GPT-4, Claude, or another LLM—MCP allows all of them to access the same tools in a uniform way.

An MCP server can also enforce important security and operational rules: access controls, rate limits, input/output validation, and more. Developers can build reusable libraries of MCP-compatible tools that plug seamlessly into any agent. Without MCP, every tool would require a custom integration, making development slower, error-prone, and hard to scale.
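
As a sketch using the MCP Python SDK's FastMCP server (the get_weather tool is a made-up example): any MCP-capable agent can discover and call this tool through the same protocol, regardless of which LLM or framework it uses.

    from mcp.server.fastmcp import FastMCP

    # One MCP server can expose many tools behind a single, standard interface.
    mcp = FastMCP("demo-tools")

    @mcp.tool()
    def get_weather(city: str) -> str:
        """Return a (fake) weather report for the given city."""
        return f"It is sunny in {city}."

    if __name__ == "__main__":
        mcp.run()  # serves the tool over stdio by default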