When building production-ready AI agents, the model is just one part of the story. Equally important are the evaluation frameworks, logging/observability tools, lightweight client libraries, and prompt orchestration frameworks.
Here are the key packages I use, what they do, and other similar options in the ecosystem.
RAGAS (Retrieval-Augmented Generation Assessment)
- What it is:
Ragas is a framework for evaluating retrieval-augmented generation (RAG) systems. It provides automatic metrics for faithfulness, answer relevance, retrieval precision/recall, and more.
- Use case:
When testing RAG pipelines, I can automatically score how well my system retrieves documents and whether the model’s answers stick to the evidence (see the sketch after this list).
- Similar libraries:
  - DeepEval – generic LLM evaluation toolkit.
  - TruLens – for evaluating and monitoring LLM apps (especially RAG).
  - Evalchemy – simpler eval DSL for agents and RAG pipelines.
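As an illustration, here is a minimal sketch of scoring a single RAG interaction with Ragas. It assumes the v0.1-style `ragas.evaluate` API, a Hugging Face `datasets.Dataset` with the classic column names, and an OpenAI key in the environment for the LLM-judged metrics; the question, answer, and contexts are made up for the example.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,        # does the answer stick to the retrieved evidence?
    answer_relevancy,    # does the answer actually address the question?
    context_precision,   # are the retrieved chunks relevant?
    context_recall,      # did retrieval cover the ground-truth answer?
)

# Hypothetical pipeline outputs collected during a test run.
eval_data = Dataset.from_dict({
    "question": ["What is the refund window for annual plans?"],
    "answer": ["Annual plans can be refunded within 30 days of purchase."],
    "contexts": [[
        "Refund policy: annual subscriptions are refundable within 30 days.",
        "Monthly subscriptions are non-refundable.",
    ]],
    "ground_truth": ["Annual plans are refundable within 30 days of purchase."],
})

# Each metric is scored by an LLM judge (OPENAI_API_KEY must be set).
result = evaluate(
    eval_data,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.97, ...}
```

In a real test suite you would run this over a batch of questions and fail the build when, say, faithfulness drops below a threshold.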
Langfuse
- What it is:
Langfuse is an observability and logging platform for LLM applications. It captures traces, spans, prompts, model outputs, and tool calls, and lets you replay and analyze runs.
- Use case:
I use it to debug agent workflows, track cost and latency, and visualize multi-tool execution. It’s like “OpenTelemetry for LLMs” (see the sketch below).
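Here is a rough sketch of what that tracing looks like. It assumes the v2-style Python SDK with its `@observe` decorator and the `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` / `LANGFUSE_HOST` variables set in the environment; the retrieval and generation functions are stand-ins, not a real agent.

```python
from langfuse.decorators import observe, langfuse_context

@observe()  # recorded as a child span of whatever called it
def retrieve_docs(query: str) -> list[str]:
    # Stand-in for a vector-store lookup.
    return ["Annual subscriptions are refundable within 30 days."]

@observe()  # becomes the root trace when called directly
def answer_question(query: str) -> str:
    docs = retrieve_docs(query)
    # Stand-in for an LLM call; in a real app you would also log the
    # generation (model, tokens) so Langfuse can aggregate cost.
    answer = f"Based on {len(docs)} document(s): refunds are allowed within 30 days."
    langfuse_context.update_current_trace(
        user_id="demo-user",
        tags=["refund-flow"],
    )
    return answer

if __name__ == "__main__":
    print(answer_question("What is the refund window?"))
    langfuse_context.flush()  # send buffered events before the process exits
```

Every decorated call shows up as a nested trace in the Langfuse UI, which is what makes multi-tool agent runs easy to replay.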
LiteLLM
- What it is:
LiteLLM is a unified API wrapper for more than 100 LLM providers (OpenAI, Anthropic, Bedrock, Ollama, Azure, etc.).
- Use case:
It lets me switch between models (Claude, GPT-4, Llama, etc.) without rewriting my code, and it also supports rate limiting, retries, logging, and cost tracking (see the sketch below).
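A minimal sketch of what that looks like: `litellm.completion` takes OpenAI-style messages and a provider-prefixed model string, so swapping providers is a one-line change. The model names below are just examples and assume the corresponding API keys (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a local Ollama server are available.

```python
import litellm

messages = [{"role": "user", "content": "Summarize the refund policy in one sentence."}]

# Same call shape for every provider; only the model string changes.
for model in [
    "gpt-4o",                                # OpenAI
    "anthropic/claude-3-5-sonnet-20240620",  # Anthropic
    "ollama/llama3",                         # local model via Ollama
]:
    response = litellm.completion(
        model=model,
        messages=messages,
        num_retries=2,  # built-in retry handling
    )
    # Responses follow the OpenAI schema regardless of provider.
    print(model, "->", response.choices[0].message.content)
```

For cost tracking, LiteLLM also ships a `completion_cost` helper that estimates the spend of a response object.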