RAG (Retrieval-Augmented Generation) in Mastra
RAG in Mastra helps you enhance LLM outputs by incorporating relevant context from your own data sources, improving accuracy and grounding responses in real information.
Mastra’s RAG system provides:
- Standardized APIs to process and embed documents
- Support for multiple vector stores
- Chunking and embedding strategies for optimal retrieval
- Observability for tracking embedding and retrieval performance
Example
To implement RAG, you process your documents into chunks, create embeddings, store them in a vector database, and then retrieve relevant context at query time.
import { MDocument, embed, PgVector } from "@mastra/rag";
// 1. Initialize document
const doc = MDocument.fromText(`Your document text here...`);
// 2. Create chunks
const chunks = await doc.chunk({
  strategy: "recursive",
  size: 512,
  overlap: 50,
});
// 3. Generate embeddings
const { embeddings } = await embed(chunks, {
  provider: "OPEN_AI",
  model: "text-embedding-ada-002",
});
// 4. Store in vector database
const pgVector = new PgVector(process.env.POSTGRES_CONNECTION_STRING);
await pgVector.store(embeddings);
// 5. Query similar chunks
const results = await pgVector.query("Your search query", {
  topK: 3,
});
console.log("Similar chunks:", results);
This example shows the essentials: initialize a document, create chunks, generate embeddings, store them, and query for similar content.
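The example stops at retrieval. As a rough sketch of the generation half, the retrieved chunks can be folded into a prompt before calling an LLM. The `RetrievedChunk` shape and the `buildPrompt` helper are hypothetical and not part of `@mastra/rag`; the actual query result type may differ.

```typescript
// Hypothetical shape for a retrieved chunk; the result type returned by
// a real vector store query may look different.
interface RetrievedChunk {
  text: string;
  score: number;
}

// Build a grounded prompt: highest-scoring chunks first, numbered so the
// model can refer to them, followed by the user's question.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = [...chunks]
    .sort((a, b) => b.score - a.score)
    .map((chunk, i) => `[${i + 1}] ${chunk.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below.",
    "If the context is insufficient, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const groundedPrompt = buildPrompt("What is RAG?", [
  { text: "Chunking splits documents into passages.", score: 0.71 },
  { text: "RAG adds retrieved context to an LLM prompt.", score: 0.92 },
]);
console.log(groundedPrompt);
```

Numbering the chunks is a design choice that makes it easier to ask the model to cite which passage supports its answer.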
Document Processing
The basic building block of RAG is document processing. Documents can be chunked using various strategies (recursive, sliding window, etc.) and enriched with metadata. See the chunking and embedding doc.
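To make the `size` and `overlap` parameters concrete, here is a minimal sliding-window chunker over characters. It is an illustration only, not Mastra's implementation; `slidingWindowChunks` is a hypothetical helper, and real strategies typically split on tokens or separators rather than raw characters.

```typescript
// Illustrative sliding-window chunker (not Mastra's implementation).
// Each chunk holds up to `size` characters and shares `overlap`
// characters with the chunk before it.
function slidingWindowChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be in [0, size)");
  }

  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

const sample = "abcdefghijklmnopqrstuvwxyz";
const pieces = slidingWindowChunks(sample, 10, 3);
console.log(pieces);
// → [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.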
Vector Storage
Mastra supports multiple vector stores for embedding persistence and similarity search, including pgvector, Pinecone, and Qdrant. See the vector database doc.
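Conceptually, similarity search ranks stored vectors by how close they are to the query vector. The brute-force, in-memory sketch below uses cosine similarity to show the idea; `StoredVector`, `cosineSimilarity`, and `topK` are hypothetical names, and production vector stores generally rely on approximate indexes rather than a full scan.

```typescript
// Hypothetical record type for an in-memory index.
interface StoredVector {
  id: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored vector against the query and keep the k best.
function topK(query: number[], index: StoredVector[], k: number) {
  return index
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const vectorIndex: StoredVector[] = [
  { id: "a", vector: [1, 0, 0] },
  { id: "b", vector: [0.9, 0.1, 0] },
  { id: "c", vector: [0, 1, 0] },
];
const nearest = topK([1, 0, 0], vectorIndex, 2);
console.log(nearest);
```

Here `a` matches the query exactly, `b` points in nearly the same direction, and `c` is orthogonal, so a `k` of 2 returns `a` and then `b`.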
Observability and Debugging
Mastra’s RAG system includes observability features to help you optimize your retrieval pipeline:
- Track embedding generation performance and costs
- Monitor chunk quality and retrieval relevance
- Analyze query patterns and cache hit rates
- Export metrics to your observability platform
See the OTel Configuration page for more details.
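As a rough illustration of what tracking embedding performance and cost can involve, the sketch below aggregates per-call latency and token counts in memory. `EmbeddingStats` is a hypothetical helper, not a Mastra API, and the price per thousand tokens is a made-up figure for the arithmetic; a real setup would more likely export these values through OpenTelemetry.

```typescript
// Hypothetical record of a single embedding request.
interface EmbeddingCall {
  durationMs: number;
  tokens: number;
}

// In-memory aggregator for embedding latency and token usage.
class EmbeddingStats {
  private calls: EmbeddingCall[] = [];

  record(call: EmbeddingCall): void {
    this.calls.push(call);
  }

  // `pricePerThousandTokens` is supplied by the caller; no real pricing
  // is assumed here.
  summary(pricePerThousandTokens: number) {
    const totalTokens = this.calls.reduce((sum, c) => sum + c.tokens, 0);
    const totalMs = this.calls.reduce((sum, c) => sum + c.durationMs, 0);
    return {
      calls: this.calls.length,
      totalTokens,
      avgLatencyMs: this.calls.length === 0 ? 0 : totalMs / this.calls.length,
      estimatedCost: (totalTokens / 1000) * pricePerThousandTokens,
    };
  }
}

const stats = new EmbeddingStats();
stats.record({ durationMs: 120, tokens: 1200 });
stats.record({ durationMs: 80, tokens: 800 });

// Illustrative price only.
const report = stats.summary(0.1);
console.log(report);
```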
More Resources
- Chain of Thought RAG Example
- All RAG Examples (including different chunking strategies, embedding models, and vector stores)