As large language models (LLMs) become increasingly capable, AI agents have emerged as powerful systems that combine language understanding with real-world action. But what exactly is an AI agent? How do LLMs fit into the picture? And how can developers build agents that are modular, secure, and adaptable?
Let’s break it down—from the fundamentals of LLM-powered agents to protocols like MCP and frameworks like Strands and LangGraph.
What Is an AI Agent?
An AI agent is a system designed to execute tasks on behalf of a user. It combines a reasoning engine (typically an LLM) with an action layer (tools, APIs, databases, etc.) to understand instructions and carry out operations.
In this setup, the LLM acts as the agent’s “brain.” It interprets the user’s goal, breaks it down into logical steps, and decides which tools are needed to fulfill the task. The agent, in turn, sends the user’s goal to the LLM—along with a list of available tools such as vector search APIs, HTTP endpoints, or email services.
The LLM plans the workflow and returns instructions: which tool to call, what parameters to pass, and in what sequence to proceed. The agent executes those tool calls, collects results, and loops back to the LLM for further planning. This iterative loop continues until the task is fully completed.
Importantly, agents maintain context over time—tracking prior steps, user input, and intermediate outputs—enabling them to handle complex, multi-turn tasks with coherence and adaptability.
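To make that loop concrete, here is a minimal sketch of the plan-and-act cycle in plain Python. The call_llm function and the tools dictionary are hypothetical placeholders standing in for a real model client and real integrations, not any specific SDK:

```python
# Hypothetical tool registry: names the LLM can choose from.
TOOLS = {
    "search_docs": lambda query: f"Top results for {query!r}...",
    "send_email":  lambda to, body: f"Email sent to {to}",
}

def run_agent(user_goal: str, max_steps: int = 10) -> str:
    # The message history is the agent's working context:
    # the original goal, prior steps, and intermediate outputs.
    history = [{"role": "user", "content": user_goal}]

    for _ in range(max_steps):
        # Ask the LLM to plan the next step, given the history
        # so far and the list of available tools.
        # call_llm() is a placeholder for your model client.
        decision = call_llm(history, tools=list(TOOLS))

        if decision["type"] == "final_answer":
            return decision["content"]

        # The LLM requested a tool call: execute it, then feed
        # the result back into context for the next planning round.
        result = TOOLS[decision["tool"]](**decision["arguments"])
        history.append({"role": "tool", "content": str(result)})

    return "Stopped: step budget exhausted."
```

The loop terminates either when the LLM signals a final answer or when the step budget runs out, which guards against the model planning indefinitely.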
Strands: A Model-Driven Agent Framework
Strands follows a model-driven approach, in which the LLM is in charge of the logic and flow.
Instead of writing hardcoded logic like "if this, then do that," the developer provides the LLM with:
- Clear system and user prompts
- A list of tools the agent can access
- The overall task context
The LLM uses its reasoning and planning capabilities to decide which tools to call, how to call them, and in what sequence. This makes the agent dynamic and adaptive, rather than rigidly tied to predefined control paths.
In Strands, the agent's core responsibilities are to execute tool calls, maintain memory, and carry out the LLM's decisions. The LLM, in turn, drives the workflow, issuing instructions at each step.
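As an illustration, here is roughly what this looks like with the Strands Agents SDK, following its documented quickstart pattern. Treat it as a sketch rather than a version-pinned example:

```python
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# No control flow is written here: the model reads the system
# prompt and the tool descriptions, then decides on its own
# whether, how, and in what order to call word_count.
agent = Agent(
    system_prompt="You are a concise writing assistant.",
    tools=[word_count],
)

agent("How many words are in 'Strands keeps agents simple'?")
```

Note that the developer never writes branching logic; adding a capability means registering another tool, and the model decides when to use it.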
LangGraph: A Program-Driven Flow Engine
LangGraph is a state machine framework built on top of LangChain that allows developers to define agent workflows as directed graphs. Each node in the graph represents a function—often LLM-powered—and edges define how data flows from one node to the next.
By default, LangGraph follows a program-driven approach. The developer defines:
- The graph structure (workflow)
- The behavior of each node (LLM call, tool call, decision logic)
- The conditions that determine transitions between nodes
While LLMs can be used inside nodes to reason or generate text, they do not control the overall execution flow. That logic is handled programmatically, making LangGraph ideal for scenarios where control, reliability, and testing are critical.
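Here is a minimal sketch of that program-driven style using LangGraph's StateGraph API. The node bodies are stubs; in a real graph, they would call an LLM or an external tool:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    category: str
    answer: str

def classify(state: State) -> dict:
    # In practice this node might call an LLM; here it's a stub.
    is_tech = "error" in state["question"].lower()
    return {"category": "tech" if is_tech else "general"}

def tech_answer(state: State) -> dict:
    return {"answer": "Routing to the tech support playbook..."}

def general_answer(state: State) -> dict:
    return {"answer": "Routing to the general FAQ..."}

# The developer, not the model, defines the workflow:
graph = StateGraph(State)
graph.add_node("classify", classify)
graph.add_node("tech", tech_answer)
graph.add_node("general", general_answer)
graph.add_edge(START, "classify")
# Transitions between nodes are explicit, testable conditions.
graph.add_conditional_edges("classify", lambda s: s["category"],
                            {"tech": "tech", "general": "general"})
graph.add_edge("tech", END)
graph.add_edge("general", END)

app = graph.compile()
print(app.invoke({"question": "I got an error installing the SDK"}))
```

Because the routing function is ordinary code, each path through the graph can be unit-tested without involving a model at all.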
Program-Driven Agents Without a Framework
Not all agents need a dedicated framework. In many cases, developers can build lightweight, program-driven agents using plain code and selective use of LLMs.
In this approach:
- The developer writes the full control logic
- The LLM is used at specific points, such as summarization, classification, or interpretation
- All tool interactions (e.g., API calls, database queries) are handled directly in code
- The LLM does not control which tools to use or what happens next
This model gives developers maximum control and is well-suited for building LLM-in-the-loop systems, where the language model acts more like a helper than a planner.
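For example, a support-ticket triager might consult the LLM only as a classifier, with everything else in ordinary code. In this sketch, call_llm and the downstream helper functions are hypothetical placeholders for your model client and internal APIs:

```python
def triage_ticket(ticket_text: str) -> str:
    # The LLM is consulted at exactly one point: classification.
    # call_llm() is a placeholder for your model client of choice.
    label = call_llm(
        "Classify this support ticket as 'billing' or 'technical':\n"
        f"{ticket_text}\nAnswer with one word."
    ).strip().lower()

    # Everything below is plain code: the model never chooses
    # which tool runs or what happens next.
    if label == "billing":
        return create_billing_case(ticket_text)     # direct API call
    elif label == "technical":
        return open_engineering_issue(ticket_text)  # direct API call
    else:
        return escalate_to_human(ticket_text)       # safe fallback
```

The explicit fallback branch is the point: even an unexpected model output lands on a code path the developer chose in advance.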
MCP: A Standard Protocol for Tool Use
As agents grow more sophisticated, one challenge becomes clear: how do you standardize the way agents call tools, especially across different frameworks and LLMs?
That’s where MCP (Model Context Protocol) comes in.
MCP standardizes how an AI agent interacts with tools, providing a consistent interface for invoking external systems. Whether the agent is built in Python, JavaScript, or another environment—and whether it uses GPT-4, Claude, or another LLM—MCP allows all of them to access the same tools in a uniform way.
An MCP server can also enforce important security and operational rules: access controls, rate limits, input/output validation, and more. Developers can build reusable libraries of MCP-compatible tools that plug seamlessly into any agent. Without MCP, every tool would require a custom integration, making development slower, error-prone, and hard to scale.
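As a sketch of what exposing a tool looks like, here is a tiny MCP server built with the FastMCP helper from the official Python SDK (details may vary by SDK version):

```python
from mcp.server.fastmcp import FastMCP

# Any MCP-compatible agent, regardless of framework or model,
# can discover and call the tools this server exposes.
mcp = FastMCP("utility-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # Serves the tools over stdio by default; access controls,
    # rate limits, and validation would be layered on here
    # in a production deployment.
    mcp.run()
```

Because the server, not each agent, owns the tool definitions, the same word_count tool can back a Strands agent, a LangGraph node, or a hand-rolled script without any per-agent integration work.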