
Chapter 10.3: Retrieval Pipelines

Modern RAG systems require retrieval pipelines that go beyond simple vector similarity search. A retrieval pipeline combines multiple search techniques, applies filtering and reranking, and orchestrates multi-step workflows to find the most relevant information for a given query. This chapter explores the architecture and components that make retrieval systems production-ready.

The Evolution of Retrieval

Early RAG systems relied solely on semantic similarity through vector search. Real-world applications exposed the limits of that approach: queries that hinge on exact keywords, time-sensitive content, metadata constraints, and uneven result quality all called for more than vector search alone. Several techniques emerged in response:

  • Hybrid Search: Combining vector similarity with traditional keyword search (BM25)
  • Multi-Stage Retrieval: Initial broad retrieval followed by precise reranking
  • Dynamic Filtering: Context-aware filtering based on metadata, time, or user preferences
  • Query Enhancement: Expanding or reformulating queries for better retrieval
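As an illustration of the hybrid search idea in the list above, here is a minimal sketch that merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is one common fusion method, chosen here as an assumption; the chapter does not prescribe a specific one. The document IDs, the two result lists, and the function name `reciprocal_rank_fusion` are hypothetical, and the constant k = 60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it
    appears in, where rank starts at 1 for the top result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID
    # so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (here `doc_a` and `doc_b`) rise above documents found by only one method. Because the fusion uses ranks rather than raw scores, BM25 scores and vector similarities never need to be put on a common scale. The fused list could then feed a reranking stage, as in multi-stage retrieval.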

Hybrid Search Architecture

Let A_0 be the initial action or output generated by the agent for a given query Q. The agent then enters a reflection step.

Let Reflect(A_i, Outcome_i) be the reflection on action A_i given the outcome of that action, Outcome_i. It produces a critique, or a set of suggested improvements, C_i.

C_i = Reflect(A_i, Outcome_i)

The agent then generates a new, corrected action A_{i+1} by incorporating this critique.

A_{i+1} = Generate(Q, A_i, C_i)

This process can be repeated for a fixed number of steps or until the output is deemed satisfactory by an internal or external evaluator. This iterative refinement is key to improving the quality of the agent's responses.
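The loop defined by the two equations can be sketched as follows. This is a minimal illustration, not a definitive implementation: the callables `generate`, `reflect`, and `evaluate`, the `max_steps` parameter, and the `{"ok": ...}` outcome format are all hypothetical stand-ins for whatever model calls and evaluators a real agent would use.

```python
def refine(query, generate, reflect, evaluate, max_steps=3):
    """Iteratively improve an action using critique feedback.

    generate(query, previous_action, critique) -> action
    evaluate(action) -> outcome dict with a boolean "ok" field
    reflect(action, outcome) -> critique
    """
    action = generate(query, None, None)            # A_0
    for _ in range(max_steps):
        outcome = evaluate(action)                  # Outcome_i
        if outcome["ok"]:                           # evaluator is satisfied
            break
        critique = reflect(action, outcome)         # C_i = Reflect(A_i, Outcome_i)
        action = generate(query, action, critique)  # A_{i+1} = Generate(Q, A_i, C_i)
    return action


# Toy stand-ins: the "agent" forgets the question mark, the evaluator
# notices, and the critique is simply the text to append.
def toy_generate(query, previous, critique):
    return query if previous is None else previous + critique


def toy_evaluate(action):
    return {"ok": action.endswith("?")}


def toy_reflect(action, outcome):
    return "?"


result = refine("What is BM25", toy_generate, toy_reflect, toy_evaluate)
print(result)
```

The `max_steps` bound corresponds to the fixed number of steps, and the `evaluate` check corresponds to the evaluator deeming the output satisfactory. Whichever condition is met first ends the loop.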

Visualization: The Reflection and Correction Loop

The D3.js visualization below shows an agent producing an initial output. It then enters a reflection phase, critiques its own work, and generates a revised, improved output. This cycle demonstrates the agent's ability to learn from its experience.