Building an intelligent retrieval system with decision-making capabilities
Agentic Retrieval-Augmented Generation (RAG) enhances traditional RAG systems by adding decision-making capabilities. While standard RAG passively retrieves information from a predefined database, Agentic RAG can reason about each query, choose among multiple information sources, and fall back to an alternative source when the first one comes up short.
Figure 1: Basic Agentic RAG Architecture
This tutorial builds a basic Agentic RAG system that can retrieve information from two sources: a local vector store built from a PDF document, and live web search via the Tavily API.
The agent will intelligently decide which source to query based on the nature of the information needed, making it more versatile than traditional RAG systems.
Our Agentic RAG implementation follows these key steps: load and chunk a PDF document, embed the chunks into a FAISS vector store, add a web search tool, wrap both retrieval paths as agent tools, and drive the agent with a prompt that tells it which source to prefer.
Note: The key difference from standard RAG is the agent's ability to choose between information sources based on the query and available information.
We start by loading a PDF document (Tesla's Q3 report) and preparing it for indexing:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/tesla_q3.pdf")
documents = loader.load()
# split documents
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
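To see what chunk_size and chunk_overlap control, here is a naive fixed-size splitter; it is a simplification, since the real RecursiveCharacterTextSplitter also tries to break on paragraph, line, and word boundaries before falling back to raw characters:

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Naive fixed-size splitter illustrating chunk_size / chunk_overlap.

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("a" * 1200, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # → [500, 500, 200]
```

With chunk_overlap=0, as in this tutorial, chunks are simply disjoint 500-character windows; a nonzero overlap would repeat the tail of each chunk at the head of the next, which helps when an answer straddles a boundary.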
For efficient retrieval, we need to convert these document chunks into vector embeddings:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)
We're using the BGE-small embedding model from BAAI, which offers a good balance between performance and efficiency for document retrieval tasks.
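The normalize_embeddings=True option scales every embedding to unit length, which makes cosine similarity reduce to a plain dot product. A toy illustration of that property (pure Python, not the actual embedding model):

```python
import math

def normalize(v):
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = normalize([3.0, 4.0]), normalize([4.0, 3.0])
# For unit vectors, the dot product IS the cosine similarity.
print(round(dot(a, a), 6))  # → 1.0
print(round(dot(a, b), 6))  # → 0.96
```

This is why normalized embeddings pair well with similarity indexes: the index only needs fast dot products to rank documents by cosine similarity.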
Next, we create a vector database to store our document embeddings:
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(documents, embeddings)
# create retriever
retriever = vectorstore.as_retriever()
We're using FAISS (Facebook AI Similarity Search), which is optimized for efficient similarity search in high-dimensional spaces. The retriever provides a simple interface for querying the vector store.
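Conceptually, the retriever embeds the query and returns the stored chunks whose vectors score highest against it. A minimal sketch of that top-k search (pure Python; FAISS does the same ranking with approximate-nearest-neighbor indexes at scale):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k highest dot-product matches — the operation FAISS
    accelerates over large collections of stored embeddings."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: dot(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], docs))  # → [0, 2]
```

In LangChain you can control how many chunks come back with, for example, vectorstore.as_retriever(search_kwargs={"k": 4}).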
If needed, you can save your vector store locally:
# saving the vectorstore (commented out in the notebook)
# vectorstore.save_local("vectorstore.db")
To handle queries that can't be answered using our vector store, we integrate a web search capability:
from langchain_community.tools.tavily_search import TavilySearchResults
web_search_tool = TavilySearchResults(k=10)
The Tavily search API provides web search functionality; the k=10 parameter asks for the top 10 results per query.
We can test the web search functionality directly:
# Sample search (commented out in the notebook)
# web_search_tool.run("Tesla stock market summary for Q3?")
For our agent's reasoning and response generation, we need a powerful language model:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp")
We're using Google's Gemini model, specifically the "gemini-2.0-flash-exp" variant, which offers a good balance between speed and capability for agent-based systems.
Now we define the core functions our agent will use to retrieve information:
# define vector search
from langchain.chains import RetrievalQA
def vector_search(query: str):
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    return qa_chain.run(query)
# define web search
def web_search(query: str):
    return web_search_tool.run(query)
These functions encapsulate two different retrieval strategies: vector_search answers from the locally indexed Tesla report via a RetrievalQA chain, while web_search fetches live results from the web.
To make these functions accessible to our agent, we need to define them as tools:
from langchain.tools import tool
@tool
def vector_search_tool(query: str) -> str:
    """Tool for searching the vector store."""
    return vector_search(query)

@tool
def web_search_tool_func(query: str) -> str:
    """Tool for performing web search."""
    return web_search(query)
# define tools for the agent
from langchain.agents import Tool
tools = [
    Tool(
        name="VectorStoreSearch",
        func=vector_search_tool,
        description="Use this to search the vector store for information."
    ),
    Tool(
        name="WebSearch",
        func=web_search_tool_func,
        description="Use this to perform a web search for information."
    ),
]
The @tool decorator transforms our functions into LangChain tools. We then wrap these in Tool objects with names and descriptions that help the agent understand when to use each tool.
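The agent never sees Python functions directly; it sees a text rendering of the tool names, descriptions, and argument schemas injected into the {tools} slot of the system prompt. A rough sketch of what that rendering looks like (a simplified stand-in for LangChain's render_text_description_and_args):

```python
def render_tools(tools):
    """Rough sketch of the {tools} prompt text: one
    'name: description, args: ...' line per tool."""
    lines = []
    for name, description, args in tools:
        lines.append(f"{name}: {description}, args: {args}")
    return "\n".join(lines)

tools = [
    ("VectorStoreSearch", "Use this to search the vector store for information.",
     {"query": {"type": "string"}}),
    ("WebSearch", "Use this to perform a web search for information.",
     {"query": {"type": "string"}}),
]
print(render_tools(tools))
```

Because this text is all the model has to go on, the quality of the tool descriptions directly drives how well the agent picks between them.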
The agent's behavior is guided by a system prompt that defines its operational logic:
# define system prompt
system_prompt = """Respond to the human as helpfully and accurately as possible. You have access to the following tools: {tools}
Always try the \"VectorStoreSearch\" tool first. Only use \"WebSearch\" if the vector store does not contain the required information.
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action per $JSON_BLOB, as shown:
```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
```
Begin! Reminder to ALWAYS respond with a valid json blob of a single action.
Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation"""
# human prompt
human_prompt = """{input}
{agent_scratchpad}
(reminder to always respond in a JSON blob)"""
This system prompt is crucial as it defines the tool-priority rule (vector store first, web search only as a fallback), the JSON blob format for tool calls, and the Thought/Action/Observation loop the agent must follow.
Key Point: The instruction "Always try the VectorStoreSearch tool first" establishes a priority order for information retrieval, directing the agent to prefer local knowledge before searching externally.
Now we assemble the complete agent chain that will process queries:
# create prompt template
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", human_prompt),
    ]
)
# tool render
from langchain.tools.render import render_text_description_and_args
prompt = prompt.partial(
    tools=render_text_description_and_args(list(tools)),
    tool_names=", ".join([t.name for t in tools]),
)
# create rag chain
from langchain.schema.runnable import RunnablePassthrough
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents.format_scratchpad import format_log_to_str
chain = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_log_to_str(x["intermediate_steps"]),
    )
    | prompt
    | llm
    | JSONAgentOutputParser()
)
# create agent
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(
    agent=chain,
    tools=tools,
    handle_parsing_errors=True,
    verbose=True
)
This chain connects all the components: the scratchpad formatter injects prior tool calls and observations, the prompt combines them with the system instructions, the LLM decides the next action, and the JSON parser turns its output into a tool call or final answer.
The AgentExecutor manages the execution flow, handling the back-and-forth between thinking, tool usage, and final answer generation.
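That execution flow can be sketched as a plain Python loop. This is a toy with a mocked agent and tool, not LangChain's actual implementation, but it shows the Thought/Action/Observation cycle the AgentExecutor manages:

```python
def run_agent(agent_step, tools, query, max_steps=5):
    """Sketch of the AgentExecutor loop: ask the agent for an action,
    run the chosen tool, feed the observation back, until 'Final Answer'."""
    steps = []  # (action, action_input, observation) history
    for _ in range(max_steps):
        action, action_input = agent_step(query, steps)
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)
        steps.append((action, action_input, observation))
    raise RuntimeError("agent did not finish within max_steps")

# Mock agent: search first, then answer from the last observation.
def mock_agent(query, steps):
    if not steps:
        return "VectorStoreSearch", query
    return "Final Answer", f"Based on the report: {steps[-1][2]}"

# Mock tool returning a placeholder instead of a real retrieved chunk.
tools = {"VectorStoreSearch": lambda q: "Total automotive revenues were $X."}
print(run_agent(mock_agent, tools, "Total automotive revenues Q3-2024"))
```

The max_steps cap mirrors AgentExecutor's iteration limit: it prevents an agent that never emits "Final Answer" from looping forever.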
Let's test our agent with a couple of example queries:
# Query about information in the vector store (Tesla Q3 report)
agent_executor.invoke({"input": "Total automotive revenues Q3-2024"})
Since this information is likely in the Tesla Q3 report, the agent should use the VectorStoreSearch tool.
# Query about information likely not in the vector store
agent_executor.invoke({"input": "Tesla stock market summary for 2024?"})
Since this information is broader and likely not in the Q3 report, the agent may need to fall back to the WebSearch tool.
Note: The verbose=True parameter lets us see the agent's reasoning process and tool selection decisions during execution.
For production use, we can create a non-verbose version of the agent and process multiple queries:
# create agent with verbose=False for production
agent_output = AgentExecutor(
    agent=chain,
    tools=tools,
    handle_parsing_errors=True,
    verbose=False
)
# Create dataset
question = [
    "What milestones did the Shanghai factory achieve in Q3 2024?",
    "Tesla stock market summary for 2024?"
]
response = []
contexts = []
# Inference
for query in question:
    vector_contexts = retriever.get_relevant_documents(query)
    if vector_contexts:
        context_texts = [doc.page_content for doc in vector_contexts]
        contexts.append(context_texts)
    else:
        print(f"[DEBUG] No relevant information in vector store for query: {query}. Falling back to web search.")
        web_results = web_search_tool.run(query)
        contexts.append([web_results])
    # Get the agent response
    result = agent_output.invoke({"input": query})
    response.append(result['output'])
# To dict
data = {
    "query": question,
    "response": response,
    "context": contexts,
}
This batch processing approach collects, for each query, both the retrieved context and the agent's final answer, keeping them aligned in parallel lists.
The resulting data dictionary could be used for evaluation, logging, or further processing of the agent's responses.
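For evaluation or logging, the parallel-list layout is often reshaped into one record per query. A small sketch of that conversion (the helper name to_records is illustrative, not part of any library):

```python
def to_records(data):
    """Turn the parallel-list dict into one record per query — a convenient
    shape for logging or feeding an evaluation framework."""
    return [
        {"query": q, "response": r, "context": c}
        for q, r, c in zip(data["query"], data["response"], data["context"])
    ]

# Placeholder values standing in for real agent outputs.
data = {
    "query": ["What milestones did the Shanghai factory achieve in Q3 2024?"],
    "response": ["(agent answer)"],
    "context": [["(retrieved chunk)"]],
}
print(to_records(data)[0]["query"])
```

Each record then carries everything needed to judge one answer: the question, the contexts the agent had available, and what it finally said.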
The Agentic RAG approach offers several advantages over traditional RAG systems: it selects the retrieval source dynamically per query, falls back to web search when the local index lacks the answer, and exposes its reasoning through the Thought/Action/Observation trace.
Advanced Consideration: This basic implementation can be extended with additional tools, better fallback strategies, and more sophisticated reasoning about the quality and relevance of retrieved information.
We've built a basic Agentic RAG system that intelligently decides between local and web-based information retrieval. This approach can be extended in several ways: adding more tools and data sources, refining the fallback strategy, and reasoning explicitly about the quality and relevance of retrieved results.
Agentic RAG represents an evolution in retrieval-augmented generation, providing more flexible and powerful information retrieval capabilities that combine the strengths of both local knowledge bases and external information sources.