How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval

In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. This system is designed to do more than just retrieve documents; it actively decides when retrieval is needed, selects the best retrieval strategy, and synthesizes responses with contextual awareness. By combining embeddings, FAISS indexing, and a mock LLM, we create a practical demonstration of how agentic decision-making can elevate the standard RAG pipeline into something more adaptive and intelligent.

Understanding the Target Audience

The target audience for this tutorial includes AI developers, data scientists, and business managers interested in leveraging advanced AI systems for improved information retrieval. Their key characteristics are:

  • Pain Points: Difficulty in effectively retrieving relevant information from large datasets, lack of adaptability in existing systems, and the need for transparency in AI decision-making.
  • Goals: To implement advanced AI systems that enhance decision-making processes, improve operational efficiency, and provide accurate, context-aware responses.
  • Interests: Innovative AI techniques, practical applications of machine learning, and strategies for integrating AI into business operations.
  • Communication Preferences: Technical documentation, code examples, hands-on tutorials, and clear explanations of AI concepts and their business implications.

Building the Agentic RAG System

We set up the foundation of our Agentic RAG system by defining a mock LLM to simulate decision-making, creating a retrieval strategy enum, and designing a Document dataclass to structure and manage our knowledge base efficiently.

import re
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum

class MockLLM:
    """Simulates agentic decision-making with simple keyword matching.

    Keyword checks run against the quoted query only, so instruction text in a
    prompt template cannot trigger a branch by accident. The max_tokens argument
    is accepted for interface parity with a real LLM but is ignored here.
    """
    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        prompt_lower = prompt.lower()
        # Extract the quoted query; fall back to the whole prompt if none is found
        match = re.search(r'"([^"]*)"', prompt)
        query_lower = match.group(1).lower() if match else prompt_lower
        
        if "decide whether to retrieve" in prompt_lower:
            if any(word in query_lower for word in ["specific", "recent", "data", "facts", "when", "who", "what"]):
                return "RETRIEVE: The query requires specific factual information that needs to be retrieved."
            else:
                return "NO_RETRIEVE: This is a general question that can be answered with existing knowledge."
        
        elif "choose retrieval strategy" in prompt_lower:
            if "comparison" in query_lower or "versus" in query_lower:
                return "STRATEGY: multi_query - Need to retrieve information about multiple entities for comparison."
            elif "recent" in query_lower or "latest" in query_lower:
                return "STRATEGY: temporal - Focus on recent information."
            else:
                return "STRATEGY: semantic - Standard semantic similarity search."
        
        elif "synthesize" in prompt_lower and "context:" in prompt_lower:
            return "Based on the retrieved information, here's a comprehensive answer that combines multiple sources and provides specific details with proper context."
        
        return "This is a mock response. In practice, use a real LLM like OpenAI's GPT or similar."

class RetrievalStrategy(Enum):
    SEMANTIC = "semantic"
    MULTI_QUERY = "multi_query"
    TEMPORAL = "temporal"
    HYBRID = "hybrid"

@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None
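
The MockLLM keeps this walkthrough self-contained, but because it exposes a single generate() method, swapping in a real model is a one-class change. Below is a hedged sketch assuming the official openai Python package (v1+) with OPENAI_API_KEY set in the environment; the model name is illustrative, not prescribed by this tutorial.

# Hypothetical drop-in replacement for MockLLM backed by the OpenAI API.
# Assumes `pip install openai` (v1+) and OPENAI_API_KEY in the environment;
# the model name is illustrative only.
from openai import OpenAI

class OpenAILLM:
    """Exposes the same generate() interface as MockLLM, so it can be swapped in directly."""
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content.strip()

Swapping it in would then be a single line (self.llm = OpenAILLM()) in the system's constructor below; the agentic prompts in this tutorial are written so a real model can answer in the same RETRIEVE:/STRATEGY: format the mock emits.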

Core Components of the Agentic RAG System

We build the core of our Agentic RAG system by initializing the embedding model, setting up the FAISS index, and adding documents by encoding their contents into vectors. This enables fast and accurate semantic retrieval from our knowledge base.

class AgenticRAGSystem:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.llm = MockLLM()
        self.documents: List[Document] = []
        self.index: Optional[faiss.Index] = None
        
    def add_documents(self, documents: List[Dict[str, Any]]) -> None:
        # Wrap raw dicts in Document objects, falling back to list position for a missing id
        for i, doc in enumerate(documents):
            doc_obj = Document(
                id=doc.get('id', str(i)),
                content=doc['content'],
                metadata=doc.get('metadata', {})
            )
            self.documents.append(doc_obj)
        
        # Re-encode the full corpus and rebuild the index from scratch on every call
        contents = [doc.content for doc in self.documents]
        embeddings = self.encoder.encode(contents, show_progress_bar=True)
        
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        
        # Inner product over L2-normalized vectors is equivalent to cosine similarity
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)
        
        faiss.normalize_L2(embeddings)
        self.index.add(embeddings.astype('float32'))
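
To make the indexing step concrete, here is a small hypothetical corpus; the ids, dates, and contents below are illustrative, not part of the tutorial's dataset.

# Hypothetical sample corpus; ids, dates, and contents are illustrative only
rag = AgenticRAGSystem()
rag.add_documents([
    {"id": "agentic-rag", "content": "Recent RAG systems add an agentic planning step before retrieval.",
     "metadata": {"date": "2024-03-01"}},
    {"id": "faiss-basics", "content": "FAISS enables fast nearest-neighbor search over dense embeddings.",
     "metadata": {"date": "2022-11-20"}},
])
print(f"Indexed {len(rag.documents)} documents (dim={rag.index.d})")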

Decision-Making and Retrieval Strategy Selection

We give our agent the ability to think before it fetches. First, we determine if a query truly requires retrieval, then we select the most suitable strategy: semantic, multi-query, temporal, or hybrid. This allows us to target the correct context with clear reasoning for each step.

    def decide_retrieval(self, query: str) -> bool:
        decision_prompt = f"""
        Analyze the following query and decide whether to retrieve information:
        Query: "{query}"
        
        Decide whether to retrieve information from the knowledge base.
        Consider if this needs specific facts, recent data, or can be answered generally.
        
        Respond with either:
        RETRIEVE: [reason] or NO_RETRIEVE: [reason]
        """
        
        response = self.llm.generate(decision_prompt)
        # strip() guards against leading whitespace in a real LLM's response
        return response.strip().startswith("RETRIEVE:")
    
    def choose_strategy(self, query: str) -> RetrievalStrategy:
        strategy_prompt = f"""
        Choose the best retrieval strategy for this query:
        Query: "{query}"
        
        Available strategies:
        - semantic: Standard similarity search
        - multi_query: Multiple related queries (for comparisons)
        - temporal: Focus on recent information
        - hybrid: Combination approach
        
        Choose retrieval strategy and explain why.
        Respond with: STRATEGY: [strategy_name] - [reasoning]
        """
        
        response = self.llm.generate(strategy_prompt)
        
        if "multi_query" in response.lower():
            strategy = RetrievalStrategy.MULTI_QUERY
        elif "temporal" in response.lower():
            strategy = RetrievalStrategy.TEMPORAL
        elif "hybrid" in response.lower():
            strategy = RetrievalStrategy.HYBRID
        else:
            strategy = RetrievalStrategy.SEMANTIC
        
        return strategy
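
The tutorial splits AgenticRAGSystem across blocks; assuming the two methods above are added to the class, the decision layer can be sanity-checked with a few illustrative queries, reusing the rag instance from the indexing sketch earlier.

# Exercise each branch of the mock decision logic; queries are illustrative
for q in ["What is the comparison between FAISS and Annoy?",
          "What are the latest developments in RAG?",
          "Tell me a joke"]:
    if rag.decide_retrieval(q):
        print(f"{q!r} -> retrieve via {rag.choose_strategy(q).value}")
    else:
        print(f"{q!r} -> answer directly, no retrieval")

With the mock LLM, the first query triggers the multi_query strategy ("comparison"), the second triggers temporal ("latest"), and the third skips retrieval entirely.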

Document Retrieval and Response Synthesis

We now implement how the system fetches and uses knowledge. We perform semantic search, branch into multi-query expansion or temporal re-ranking when needed, deduplicate results, and then synthesize a focused answer from the retrieved context. This keeps retrieval efficient, transparent, and tightly aligned with the query's intent.

    def retrieve_documents(self, query: str, strategy: RetrievalStrategy, k: int = 3) -> List[Document]:
        if not self.index:
            return []
        
        if strategy == RetrievalStrategy.MULTI_QUERY:
            # Expand into related sub-queries, then deduplicate hits by document id
            queries = [query, f"advantages of {query}", f"disadvantages of {query}"]
            all_docs = []
            for q in queries:
                docs = self._semantic_search(q, k=2)
                all_docs.extend(docs)
            seen_ids = set()
            unique_docs = []
            for doc in all_docs:
                if doc.id not in seen_ids:
                    unique_docs.append(doc)
                    seen_ids.add(doc.id)
            return unique_docs[:k]
        
        elif strategy == RetrievalStrategy.TEMPORAL:
            # Over-fetch, then re-rank by date; ISO strings (YYYY-MM-DD) sort correctly as text
            docs = self._semantic_search(query, k=k*2)
            docs_with_dates = [(doc, doc.metadata.get('date', '1900-01-01')) for doc in docs]
            docs_with_dates.sort(key=lambda x: x[1], reverse=True)
            return [doc for doc, _ in docs_with_dates[:k]]
        
        else:
            return self._semantic_search(query, k=k)
    
    def _semantic_search(self, query: str, k: int) -> List[Document]:
        query_embedding = self.encoder.encode([query])
        faiss.normalize_L2(query_embedding)
        
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        
        results = []
        for score, idx in zip(scores[0], indices[0]):
            # FAISS pads results with -1 when k exceeds the number of indexed vectors
            if 0 <= idx < len(self.documents):
                results.append(self.documents[idx])
        
        return results
    
    def synthesize_response(self, query: str, retrieved_docs: List[Document]) -> str:
        if not retrieved_docs:
            return self.llm.generate(f"Answer this query: {query}")
        
        context = "\n\n".join([f"Document {i+1}: {doc.content}" for i, doc in enumerate(retrieved_docs)])
        
        synthesis_prompt = f"""
        Query: {query}
        
        Context: {context}
        
        Synthesize a comprehensive answer using the provided context.
        Be specific and reference the information sources when relevant.
        """
        
        return self.llm.generate(synthesis_prompt, max_tokens=200)
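
Continuing with the same hypothetical rag instance, we can exercise a strategy-specific path directly; the query string below is illustrative.

# Temporal retrieval: over-fetch, then re-rank by the 'date' metadata field
docs = rag.retrieve_documents("latest RAG techniques", RetrievalStrategy.TEMPORAL, k=2)
print([d.id for d in docs])  # newest first: ['agentic-rag', 'faiss-basics'] for the sample corpus
print(rag.synthesize_response("latest RAG techniques", docs))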

Final Integration and Execution

We bring all the parts together into a single pipeline. When we run a query, we first determine whether retrieval is necessary, then select the appropriate strategy, fetch documents accordingly, and finally synthesize a response grounded in the retrieved context. Because each decision happens as an explicit step, the system behaves in an agentic and explainable way.

    def query(self, query: str) -> str:
        if not self.decide_retrieval(query):
            return self.llm.generate(f"Answer this query: {query}")
        
        strategy = self.choose_strategy(query)
        
        retrieved_docs = self.retrieve_documents(query, strategy)
        
        response = self.synthesize_response(query, retrieved_docs)
        
        return response
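
With everything wired together, a single call now exercises the full agentic loop, continuing the hypothetical rag instance built earlier.

# Full agentic loop: decide -> choose strategy -> retrieve -> synthesize
answer = rag.query("What are the latest agentic RAG techniques?")
print(answer)

With the mock LLM, this query triggers retrieval ("what"), selects the temporal strategy ("latest"), and synthesizes an answer from the newest documents in the index.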

Conclusion

In conclusion, we see how agent-driven retrieval decisions, dynamic strategy selection, and transparent reasoning come together to form an advanced Agentic RAG workflow. This system highlights the potential of adding agency to RAG, making information retrieval smarter, more targeted, and more human-like in its adaptability. This foundation allows for future extensions with real LLMs, larger knowledge bases, and more sophisticated strategies.
