1
 2// 1. Initialize document
 3const doc = MDocument.fromText("Your document text here...");
 4
 5// 2. Create chunks
 6const chunks = await doc.chunk({
 7  strategy: "recursive",
 8  size: 512,
 9  overlap: 50,
10});
11
12// 3. Generate embeddings; we need to pass the text of each chunk
13const { embeddings } = await embedMany({
14    values: chunks.map((c) => c.text),
15    model: openai.embeddings("text-embedding-3-small"),
16  })
17
18// 4. Store in vector database
19const pgVector = new pgVector({
20    connectionString: process.env.POSTGRES_CONNECTION_STRING,
21})

RAG Pipeline

The simplest way to build a full agent RAG pipeline in TypeScript

Mastra handles the complete RAG pipeline and enhances LLM outputs by incorporating relevant context from your own data sources. As a dedicated RAG platform, it replaces the need to stitch together chunking, embedding, storage and retrieval across separate tools and gives agents accurate, grounded responses.

Retrieving documents from mastra-db

Build your full RAG pipeline

Mastra provides standardized APIs for every step of the RAG pipeline in a single framework. Chunk documents using recursive or sliding window strategies, generate embeddings, store them in your preferred vector database and retrieve relevant context at query time. Mastra includes observability for tracking embedding and retrieval performance, making it a complete RAG service for production use.

.embed()

.query()

.rerank()

input

knowledge base

.embed()

embedding model

.query()

retrieval

vector
stores

.rerank()

llm

output

Chunking and Embedding

Split documents and transform into vectors

Split documents using recursive, sentence, or custom strategies. Then transform chunks into vector embeddings with OpenAI, Cohere, or any compatible provider. The RAG engine handles both steps through a single, typed API.

docs

Vector Database

Choose from supported vector store providers

Store embeddings where they fit your stack. Mastra supports pgVector, Pinecone, Qdrant and MongoDB out of the box.

docs

Retrieval

Retrieve using advanced metadata filtering

Query stored vectors with semantic search and advanced metadata filtering. Rerank results to surface the most relevant context before passing it to your LLM. Build a RAG app that returns precise, grounded answers.

docs

Give agents knowledge of your data

Mastra RAG gives agents access to relevant context from your own data sources at query time. Connect your vector store to an agent and Mastra retrieves semantically similar chunks to include in the LLM prompt, grounding agent responses in real information. This RAG platform makes it straightforward to build agents that stay accurate.

RAG as an Agent Tool

Register your knowledge base as a tool that agents can call on demand. The agent decides when retrieval is needed and incorporates the results into its reasoning. This is the core of any production RAG AI tool workflow.

docs

Traced Retrieval

See every query, lookup, and rerank step

Every query, embedding lookup, and rerank step is logged with full traces. See which documents were retrieved, how they scored, and whether the agent used them in its final response.

docs

Advanced RAG techniques

Mastra goes beyond standard retrieval with context engineering techniques for more accurate, grounded responses. ReAG enables models to reason directly over your documents rather than retrieving pre-embedded chunks. Graph RAG and agentic RAG extend context engineering with structured knowledge and agent-driven retrieval.

Reasoning-Augmented Generation

Reason directly over full documents

Instead of retrieving chunks and hoping the right context is included, ReAG lets agents reason directly over full documents. This RAG platform produces more context-aware answers for complex queries where standard retrieval falls short.

docs

Advanced Context Engineering

Shape context with memory, history, and RAG

Shape the context your agents receive by combining memory, conversation history, and RAG results. Control exactly what information enters the prompt and in what order, so the LLM focuses on what matters most.

docs

Frequently asked questions

How does RAG work in Mastra?

Mastra RAG enhances LLM outputs by incorporating relevant context from your own data sources. The pipeline chunks documents, generates embeddings, stores them in a vector database and retrieves relevant context at query time. Mastra provides standardized APIs for each step.

What vector databases does Mastra RAG support?

Mastra RAG supports multiple vector stores including pgvector, Pinecone, Qdrant and MongoDB. Configure your preferred vector database in the storage layer of your RAG pipeline. Mastra's standardized APIs work consistently across all supported vector stores.

What chunking strategies does Mastra RAG support?

Mastra RAG supports multiple document chunking strategies including recursive and sliding window approaches. Documents can be enriched with metadata during chunking. Configure chunk size and overlap to optimize embedding quality and retrieval accuracy.

How does Mastra RAG integrate with agents?

What is ReAG and how does Mastra support it?

Mastra supports Reasoning-Augmented Generation, or ReAG, which enables models to reason directly over your documents rather than retrieving pre-embedded chunks. Mastra also supports Graph RAG and agentic RAG for more advanced context engineering when standard retrieval does not provide sufficient accuracy.

Ship better agents

Build production RAG pipelines on the open-source framework for AI agents

Get Started Documentation