RAG (Retrieval-Augmented Generation) in Mastra

RAG in Mastra helps you enhance LLM outputs by incorporating relevant context from your own data sources, improving accuracy and grounding responses in real information.

Mastra’s RAG system provides:

Standardized APIs to process and embed documents
Support for multiple vector stores
Chunking and embedding strategies to improve retrieval quality
Observability for tracking embedding and retrieval performance

Example

To implement RAG, you process your documents into chunks, create embeddings, store them in a vector database, and then retrieve relevant context at query time.


import { embed, embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { PgVector } from "@mastra/pg";
import { MDocument } from "@mastra/rag";

// 1. Initialize document
const doc = MDocument.fromText(`Your document text here...`);

// 2. Create chunks
const chunks = await doc.chunk({
  strategy: "recursive",
  size: 512,
  overlap: 50,
});

// 3. Generate embeddings; we need to pass the text of each chunk
const { embeddings } = await embedMany({
  values: chunks.map((chunk) => chunk.text),
  model: openai.embedding("text-embedding-3-small"),
});

// 4. Store in vector database, using an index named "embeddings"
const pgVector = new PgVector({
  connectionString: process.env.POSTGRES_CONNECTION_STRING,
});
// The index dimension must match the embedding model's output size
// (1536 for text-embedding-3-small)
await pgVector.createIndex({
  indexName: "embeddings",
  dimension: 1536,
});
await pgVector.upsert({
  indexName: "embeddings",
  vectors: embeddings,
  // Store each chunk's text so query results can return it
  metadata: chunks.map((chunk) => ({ text: chunk.text })),
});

// 5. Query similar chunks: embed the query with the same model first
const { embedding: queryVector } = await embed({
  value: "What is this document about?", // example query text
  model: openai.embedding("text-embedding-3-small"),
});
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector,
  topK: 3,
});

console.log("Similar chunks:", results);

This example shows the essentials: initialize a document, create chunks, generate embeddings, store them, and query for similar content.
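Step 2 uses the `recursive` strategy. Recursive splitters generally try a coarse separator first (paragraph breaks), and fall back to finer ones (line breaks, spaces, and finally individual characters) only for pieces that are still too large. The following is a simplified sketch of that idea, not the `@mastra/rag` implementation: the function name `recursiveSplit`, the separator list, and the character-based size limit are assumptions, and chunk overlap is omitted to keep it short.

```typescript
// Simplified recursive splitter (hypothetical sketch, not @mastra/rag's code).
// Separators are tried from coarsest to finest; "" means "cut by characters".
const SEPARATORS = ["\n\n", "\n", " ", ""];

function recursiveSplit(
  text: string,
  maxSize: number,
  separators: string[] = SEPARATORS,
): string[] {
  if (text.length <= maxSize) return text.length > 0 ? [text] : [];
  const [sep, ...rest] = separators;

  // Last resort: cut the text into fixed-size character pieces.
  if (sep === undefined || sep === "") {
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxSize) {
      pieces.push(text.slice(i, i + maxSize));
    }
    return pieces;
  }

  const chunks: string[] = [];
  let current = "";
  for (const part of text.split(sep)) {
    const candidate = current === "" ? part : current + sep + part;
    if (candidate.length <= maxSize) {
      current = candidate; // keep packing parts into the current chunk
      continue;
    }
    if (current !== "") chunks.push(current);
    if (part.length <= maxSize) {
      current = part; // start a new chunk with this part
    } else {
      // This part alone is too big: split it with the next, finer separator.
      chunks.push(...recursiveSplit(part, maxSize, rest));
      current = "";
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

console.log(JSON.stringify(recursiveSplit("aaa bbb\n\nccc ddd eee", 8)));
// ["aaa bbb","ccc ddd","eee"]
```

Preferring coarse separators keeps paragraphs and sentences intact whenever they fit, which tends to produce chunks that read as coherent units.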

Document Processing

The basic building block of RAG is document processing. Documents can be chunked using various strategies (recursive, sliding window, etc.) and enriched with metadata. See the chunking and embedding doc.
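A sliding-window strategy cuts fixed-size windows that advance by `size - overlap` characters, so text near a boundary appears in both neighboring chunks. The sketch below is a hypothetical, character-based helper (`slidingWindowChunks`, `Chunk`, and the metadata fields are illustrative names, not Mastra APIs) that also shows metadata enrichment by recording each chunk's source and position.

```typescript
// Hypothetical sliding-window chunker with position metadata (not a Mastra API).
interface Chunk {
  text: string;
  metadata: { source: string; index: number; start: number; end: number };
}

function slidingWindowChunks(
  text: string,
  size: number,
  overlap: number,
  source: string,
): Chunk[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be in [0, size)");
  }
  const step = size - overlap; // how far the window advances each time
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + size, text.length);
    chunks.push({
      text: text.slice(start, end),
      metadata: { source, index: chunks.length, start, end },
    });
    if (end === text.length) break; // the window has reached the end of the text
  }
  return chunks;
}

const windows = slidingWindowChunks("abcdefghij", 4, 1, "demo.txt");
console.log(JSON.stringify(windows.map((c) => c.text))); // ["abcd","defg","ghij"]
```

Storing offsets and a source identifier alongside each chunk lets you trace a retrieved chunk back to its place in the original document.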

Vector Storage

Mastra supports multiple vector stores for embedding persistence and similarity search, including pgvector, Pinecone, Qdrant, and MongoDB. See the vector database doc.
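Whichever store you choose, similarity search follows the same pattern: upsert vectors with metadata under an index name, then return the `topK` stored vectors closest to a query vector. The in-memory class below is a hypothetical illustration of that pattern using cosine similarity and a brute-force scan. It is not the `PgVector` API (`InMemoryVectorStore` and its method signatures are assumptions), and production stores typically use an index structure rather than scanning every row; pgvector, for example, supports HNSW and IVFFlat indexes.

```typescript
// Hypothetical in-memory vector store (illustration only, not a Mastra API).
type Metadata = Record<string, unknown>;
interface Row { vector: number[]; metadata: Metadata }
interface Entry { id: string; vector: number[]; metadata?: Metadata }
interface QueryResult { id: string; score: number; metadata: Metadata }

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  // indexName -> (id -> row)
  private indexes = new Map<string, Map<string, Row>>();

  upsert(indexName: string, entries: Entry[]): void {
    const rows = this.indexes.get(indexName) ?? new Map<string, Row>();
    for (const { id, vector, metadata } of entries) {
      rows.set(id, { vector, metadata: metadata ?? {} }); // insert or overwrite by id
    }
    this.indexes.set(indexName, rows);
  }

  query(indexName: string, queryVector: number[], topK: number): QueryResult[] {
    const rows = this.indexes.get(indexName);
    if (!rows) return [];
    // Brute force: score every stored vector, then keep the best topK.
    return Array.from(rows.entries())
      .map(([id, row]) => ({
        id,
        score: cosineSimilarity(queryVector, row.vector),
        metadata: row.metadata,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.upsert("embeddings", [
  { id: "a", vector: [1, 0], metadata: { text: "about cats" } },
  { id: "b", vector: [0, 1], metadata: { text: "about dogs" } },
  { id: "c", vector: [1, 1], metadata: { text: "cats and dogs" } },
]);
const top = store.query("embeddings", [1, 0.1], 2);
console.log(JSON.stringify(top.map((r) => r.id))); // ["a","c"]
```

Returning the stored metadata with each hit is what lets a RAG pipeline put the matching chunk text into the prompt.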

Observability and Debugging

Mastra’s RAG system includes observability features to help you optimize your retrieval pipeline:

Track embedding generation performance and costs
Monitor chunk quality and retrieval relevance
Analyze query patterns and cache hit rates
Export metrics to your observability platform
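To make "tracking embedding and retrieval performance" concrete, here is a minimal timing wrapper that records the duration and success of each async pipeline step. It is a hypothetical sketch (`traced`, `spans`, and `SpanRecord` are illustrative names) and not Mastra's tracing API; in Mastra, traces are exported through its OpenTelemetry (OTel) configuration.

```typescript
// Hypothetical timing helper (not Mastra's tracing API).
interface SpanRecord {
  name: string;
  durationMs: number;
  ok: boolean;
}

// Collected timings; a real setup would export these to an observability backend.
const spans: SpanRecord[] = [];

async function traced<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  let ok = false;
  try {
    const result = await fn();
    ok = true;
    return result;
  } finally {
    // Runs on both success and failure, so errors are recorded too.
    spans.push({ name, durationMs: Date.now() - start, ok });
  }
}

// Usage with stand-in async steps in place of real embedding and query calls.
async function demo(): Promise<void> {
  await traced("embed", async () => [0.1, 0.2, 0.3]);
  await traced("query", async () => {
    throw new Error("vector store unavailable");
  }).catch(() => undefined); // ignore the error for the demo; its span is still recorded
  for (const s of spans) console.log(`${s.name}: ${s.durationMs} ms, ok=${s.ok}`);
}

demo();
```

Wrapping each step rather than the whole pipeline shows where time goes (embedding versus retrieval) and which step failed.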

See the OTel Configuration page for more details.

More resources

Chain of Thought RAG Example
All RAG Examples (including different chunking strategies, embedding models, and vector stores)