Retrieval, Semantic Search, Reranking | RAG

Retrieval in RAG Systems

After storing embeddings, you need to retrieve relevant chunks to answer user queries.

Mastra provides flexible retrieval options with support for semantic search, filtering, and re-ranking.

How Retrieval Works

The user’s query is converted to an embedding using the same model that produced the document embeddings.
This embedding is compared to the stored embeddings using vector similarity.
The most similar chunks are retrieved and can optionally be:

Filtered by metadata
Re-ranked for better relevance
Processed through a knowledge graph
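
The comparison step above can be sketched in plain TypeScript. This is a minimal illustration that assumes cosine similarity as the metric and a hypothetical in-memory array of chunks (`StoredChunk`, `topKSimilar`, and the sample vectors are made up for the example). A real vector store typically uses an index rather than scoring every chunk.

```typescript
// Hypothetical in-memory chunk store; a real system would query a vector database.
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query, sort by score, and keep the best topK.
function topKSimilar(queryEmbedding: number[], chunks: StoredChunk[], topK: number) {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy three-dimensional embeddings, chosen only to make the ranking obvious.
const demoChunks: StoredChunk[] = [
  { text: "about climate", embedding: [1, 0, 0] },
  { text: "about crops", embedding: [0.8, 0.6, 0] },
  { text: "about sports", embedding: [0, 0, 1] },
];

const demoTop = topKSimilar([1, 0, 0], demoChunks, 2);
console.log(demoTop);
```

The query and the documents must be embedded with the same model because scores are only meaningful between vectors from the same embedding space.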

Basic Retrieval

The simplest approach is direct semantic search. This method uses vector similarity to find chunks that are semantically similar to the query:


import { openai } from "@ai-sdk/openai";
import { embed } from "ai";
import { PgVector } from "@mastra/pg";
 
// Convert query to embedding
const { embedding } = await embed({
  value: "What are the main points in the article?",
  model: openai.embedding("text-embedding-3-small"),
});
 
// Query vector store
const pgVector = new PgVector(process.env.POSTGRES_CONNECTION_STRING);
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
});
 
// Display results
console.log(results);

Results include both the text content and a similarity score:


[
  {
    text: "Climate change poses significant challenges...",
    score: 0.89,
    metadata: { source: "article1.txt" },
  },
  {
    text: "Rising temperatures affect crop yields...",
    score: 0.82,
    metadata: { source: "article1.txt" },
  },
  // ... more results
];

For an example of how to use the basic retrieval method, see the Retrieve Results example.
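
Once results come back, a common next step is to turn them into context for a prompt. The sketch below assumes the result shape shown in the sample output above. `buildContext` and its `minScore` cutoff are hypothetical helpers for illustration, not part of Mastra.

```typescript
// Result shape taken from the sample output above.
interface RetrievedChunk {
  text: string;
  score: number;
  metadata: { source?: string };
}

// Hypothetical helper: drop weak matches, then number the remaining chunks
// and label each with its source so an answer can point back to it.
function buildContext(results: RetrievedChunk[], minScore = 0.5): string {
  return results
    .filter((r) => r.score >= minScore)
    .map((r, i) => `[${i + 1}] (${r.metadata.source ?? "unknown"}) ${r.text}`)
    .join("\n");
}

const sampleResults: RetrievedChunk[] = [
  {
    text: "Climate change poses significant challenges...",
    score: 0.89,
    metadata: { source: "article1.txt" },
  },
  {
    text: "Rising temperatures affect crop yields...",
    score: 0.82,
    metadata: { source: "article1.txt" },
  },
  { text: "Unrelated passage", score: 0.31, metadata: {} },
];

const context = buildContext(sampleResults);
console.log(context);
```

The 0.5 cutoff is arbitrary. A suitable threshold depends on the embedding model and the data, so it is worth tuning against real queries.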

Advanced Retrieval Options

Metadata Filtering

Filter results based on metadata fields to narrow down the search space. This is useful when you have documents from different sources, time periods, or with specific attributes. Mastra provides a unified MongoDB-style query syntax that works across all supported vector stores.

For detailed information about available operators and syntax, see the Metadata Filters Reference.

Basic filtering examples:


// Simple equality filter
const bySource = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    source: "article1.txt",
  },
});
 
// Numeric comparison
const byMinPrice = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    price: { $gt: 100 },
  },
});
 
// Multiple conditions (all must match)
const byCategoryAndPrice = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    category: "electronics",
    price: { $lt: 1000 },
    inStock: true,
  },
});
 
// Array operations
const byTags = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    tags: { $in: ["sale", "new"] },
  },
});
 
// Logical operators
const byLogic = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    $or: [{ category: "electronics" }, { category: "accessories" }],
    $and: [{ price: { $gt: 50 } }, { price: { $lt: 200 } }],
  },
});

Common use cases for metadata filtering:

Filter by document source or type
Filter by date ranges
Filter by specific categories or tags
Filter by numerical ranges (e.g., price, rating)
Combine multiple conditions for precise querying
Filter by document attributes (e.g., language, author)

For an example of how to use metadata filtering, see the Hybrid Vector Search example.
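
To make the semantics of the filters in this section concrete, here is a small in-memory matcher for the operators used in the examples (`$gt`, `$lt`, `$in`, `$and`, `$or`). It is only an illustrative sketch: `matchesFilter` and the types are hypothetical, the array handling follows MongoDB's convention as an assumption, and this is not how Mastra or any vector store evaluates filters internally.

```typescript
type Primitive = string | number | boolean;
type Metadata = Record<string, Primitive | Primitive[]>;

// Only the operators that appear in the examples in this section.
interface FieldOperators {
  $gt?: number;
  $lt?: number;
  $in?: Primitive[];
}

interface Filter {
  $and?: Filter[];
  $or?: Filter[];
  [field: string]: Primitive | FieldOperators | Filter[] | undefined;
}

// Check one metadata value against a plain value (equality) or an operator object.
function matchesField(
  value: Primitive | Primitive[] | undefined,
  condition: Primitive | FieldOperators,
): boolean {
  if (typeof condition !== "object") {
    return value === condition;
  }
  const { $gt, $lt, $in } = condition;
  if ($gt !== undefined && !(typeof value === "number" && value > $gt)) return false;
  if ($lt !== undefined && !(typeof value === "number" && value < $lt)) return false;
  if ($in !== undefined) {
    // Assumption: an array field matches when any of its elements is listed.
    const candidates = Array.isArray(value) ? value : value === undefined ? [] : [value];
    if (!candidates.some((v) => $in.includes(v))) return false;
  }
  return true;
}

// Every top-level key must hold: fields are implicitly combined with AND.
function matchesFilter(metadata: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    if (condition === undefined) return true;
    if (key === "$and") {
      return (condition as Filter[]).every((f) => matchesFilter(metadata, f));
    }
    if (key === "$or") {
      return (condition as Filter[]).some((f) => matchesFilter(metadata, f));
    }
    return matchesField(metadata[key], condition as Primitive | FieldOperators);
  });
}

// Made-up product metadata for the demonstration.
const products: Metadata[] = [
  { category: "electronics", price: 120, tags: ["sale"] },
  { category: "accessories", price: 30, tags: ["new"] },
  { category: "books", price: 150, tags: ["sale"] },
];

const logicalFilter: Filter = {
  $or: [{ category: "electronics" }, { category: "accessories" }],
  $and: [{ price: { $gt: 50 } }, { price: { $lt: 200 } }],
};

const matched = products.filter((p) => matchesFilter(p, logicalFilter));
console.log(matched);
```

Only the first product passes: the accessory fails the price range and the book fails the category condition. This shows that top-level `$or` and `$and` keys must both hold.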

Vector Query Tool

Sometimes you want to give your agent the ability to query a vector database directly. The Vector Query Tool allows your agent to be in charge of retrieval decisions, combining semantic search with optional filtering and reranking based on the agent’s understanding of the user’s needs.


import { openai } from "@ai-sdk/openai";
import { createVectorQueryTool } from "@mastra/rag";
 
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: "pgVector",
  indexName: "embeddings",
  model: openai.embedding("text-embedding-3-small"),
});

When creating the tool, pay special attention to its name and description, because these tell the agent when and how to use the retrieval capabilities. For example, you might name it “SearchKnowledgeBase” and describe it as “Search through our documentation to find relevant information about X topic.”

This is particularly useful when:

Your agent needs to dynamically decide what information to retrieve
The retrieval process requires complex decision-making
You want the agent to combine multiple retrieval strategies based on context

For detailed configuration options and advanced usage, see the Vector Query Tool Reference.

Vector Store Prompts

Vector store prompts define the query patterns and filtering capabilities of each vector database implementation. When you implement filtering, the agent’s instructions must include the matching prompt so that the agent knows which operators and syntax are valid for that store.

Pg Vector


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { PGVECTOR_PROMPT } from "@mastra/pg";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${PGVECTOR_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Pinecone

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { PINECONE_PROMPT } from "@mastra/pinecone";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${PINECONE_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Qdrant

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { QDRANT_PROMPT } from "@mastra/qdrant";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${QDRANT_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Chroma

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { CHROMA_PROMPT } from "@mastra/chroma";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${CHROMA_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Astra

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { ASTRA_PROMPT } from "@mastra/astra";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${ASTRA_PROMPT}
  `,
  tools: { vectorQueryTool },
});

LibSQL

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { LIBSQL_PROMPT } from "@mastra/libsql";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${LIBSQL_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Upstash

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { UPSTASH_PROMPT } from "@mastra/upstash";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${UPSTASH_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Cloudflare

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { VECTORIZE_PROMPT } from "@mastra/vectorize";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${VECTORIZE_PROMPT}
  `,
  tools: { vectorQueryTool },
});

MongoDB

vector-store.ts


import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { MONGODB_PROMPT } from "@mastra/mongodb";
 
export const ragAgent = new Agent({
  name: 'RAG Agent',
  model: openai('gpt-4o-mini'),
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${MONGODB_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Re-ranking

Initial vector similarity search can sometimes miss nuanced relevance. Re-ranking is more computationally expensive than vector search, but it scores relevance more accurately. It improves results by:

Considering word order and exact matches
Applying more sophisticated relevance scoring
Using a method called cross-attention between query and documents

Here’s how to use re-ranking:


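import { openai } from "@ai-sdk/openai";
import { embed } from "ai";
import { rerank } from "@mastra/rag";
 
// Embed the query text (pgVector is the store created in the basic retrieval example)
const query = "What are the main points in the article?";
const { embedding: queryEmbedding } = await embed({
  value: query,
  model: openai.embedding("text-embedding-3-small"),
});
 
// Get initial results from vector search
const initialResults = await pgVector.query({
  indexName: "embeddings",
  queryVector: queryEmbedding,
  topK: 10,
});
 
// Re-rank the results
const rerankedResults = await rerank(
  initialResults,
  query,
  openai("gpt-4o-mini"),
);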

Note: For semantic scoring to work properly during re-ranking, each result must include the text content in its metadata.text field.

The re-ranked results combine vector similarity with semantic understanding to improve retrieval quality.
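
One way to picture that combination is as a weighted sum of the two scores. The sketch below is only illustrative: `combineScores`, the `ScoredChunk` shape, and the 0.6/0.4 weights are assumptions for the example and do not describe how rerank() computes its scores.

```typescript
interface ScoredChunk {
  text: string;
  vectorScore: number; // similarity score from the initial vector search
  semanticScore: number; // relevance score from a more expensive model, in [0, 1]
}

// Hypothetical weights; a real re-ranker may use different factors and values.
const defaultWeights = { semantic: 0.6, vector: 0.4 };

// Blend the two scores for each chunk, then sort by the blended score.
function combineScores(chunks: ScoredChunk[], weights = defaultWeights) {
  return chunks
    .map((c) => ({
      text: c.text,
      score: weights.semantic * c.semanticScore + weights.vector * c.vectorScore,
    }))
    .sort((a, b) => b.score - a.score);
}

// Chunk A wins on vector similarity alone, but B is judged far more relevant.
const rerankCandidates: ScoredChunk[] = [
  { text: "A", vectorScore: 0.9, semanticScore: 0.2 },
  { text: "B", vectorScore: 0.8, semanticScore: 0.9 },
];

const reordered = combineScores(rerankCandidates);
console.log(reordered);
```

With these numbers, B moves ahead of A even though A had the higher vector score. Re-ranking exists to produce this kind of reordering.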

For more details about re-ranking, see the rerank() method.

For an example of how to use the re-ranking method, see the Re-ranking Results example.

Graph-based Retrieval

For documents with complex relationships, graph-based retrieval can follow connections between chunks. This helps when:

Information is spread across multiple documents
Documents reference each other
You need to traverse relationships to find complete answers

Example setup:


import { openai } from "@ai-sdk/openai";
import { createGraphQueryTool } from "@mastra/rag";
 
const graphQueryTool = createGraphQueryTool({
  vectorStoreName: "pgVector",
  indexName: "embeddings",
  model: openai.embedding("text-embedding-3-small"),
  graphOptions: {
    threshold: 0.7,
  },
});

For more details about graph-based retrieval, see the GraphRAG class and the createGraphQueryTool() function.
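
To give a feel for what following connections between chunks means, the sketch below links chunks whose embeddings are similar enough and then walks outward from a starting chunk. It assumes that the threshold acts as a similarity cutoff for creating edges. `buildGraph`, `expandFrom`, and the toy vectors are hypothetical, and the traversal is a plain breadth-first walk rather than the GraphRAG class's actual algorithm.

```typescript
interface GraphNode {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Link every pair of chunks whose similarity meets the threshold.
function buildGraph(nodes: GraphNode[], threshold: number): Map<string, string[]> {
  const edges = new Map<string, string[]>();
  for (const node of nodes) edges.set(node.id, []);
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      if (cosine(nodes[i].embedding, nodes[j].embedding) >= threshold) {
        edges.get(nodes[i].id)!.push(nodes[j].id);
        edges.get(nodes[j].id)!.push(nodes[i].id);
      }
    }
  }
  return edges;
}

// Breadth-first expansion from a starting chunk, up to maxHops edges away.
function expandFrom(edges: Map<string, string[]>, startId: string, maxHops: number): string[] {
  const visited = new Set<string>([startId]);
  let frontier = [startId];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of edges.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}

// Toy embeddings: a is close to b, b is close to c, and d is unrelated.
const graphNodes: GraphNode[] = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0.8, 0.6] },
  { id: "c", embedding: [0.3, 0.95] },
  { id: "d", embedding: [-1, 0] },
];

const chunkGraph = buildGraph(graphNodes, 0.7);
const oneHop = expandFrom(chunkGraph, "a", 1);
const twoHops = expandFrom(chunkGraph, "a", 2);
console.log(oneHop, twoHops);
```

Chunk c is not similar enough to a to be retrieved directly, but it becomes reachable through b. This is how graph traversal can surface information that is spread across related chunks.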

For an example of how to use the graph-based retrieval method, see the Graph-based Retrieval example.