Build a RAG agent with Mastra and Elasticsearch

Last month, our friends at Elastic shipped native Elasticsearch vector store support to Mastra, plus a demo at elastic/mastra-elasticsearch-example: a RAG agent that answers questions over a corpus of 500 sci-fi movies.

Enrico Zimuel (Elastic) wrote up the integration tutorial: what to install, how to set up Elasticsearch, how to ingest the dataset. Here's the agent-side view of how the pieces fit together.

Why RAG agents need their own primitives

You can build a RAG flow without an agent. Take the user's question, embed it, search the vector store, stuff the results into a prompt, generate an answer. Done.
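To make that fixed pipeline concrete, here is a minimal sketch of the first three steps (embed, search, stuff the prompt). Everything in it is an assumption for illustration: toyEmbed is a bag-of-words stand-in for a real embedding model, the corpus is three made-up chunks, and the final generation call is left out.

```typescript
// Toy stand-ins: a real pipeline would call an embedding model and an LLM.
type Chunk = { id: string; text: string };

// Hypothetical embedding: word counts over a small fixed vocabulary.
function toyEmbed(text: string, vocabulary: string[]): number[] {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return vocabulary.map((term) => words.filter((w) => w === term).length);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Embed the question, rank every chunk, keep the top K, build the prompt.
function buildRagPrompt(
  question: string,
  corpus: Chunk[],
  vocabulary: string[],
  topK: number,
): { retrieved: Chunk[]; prompt: string } {
  const queryVector = toyEmbed(question, vocabulary);
  const retrieved = corpus
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryVector, toyEmbed(chunk.text, vocabulary)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
  const context = retrieved.map((c) => `- ${c.text}`).join("\n");
  const prompt = `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
  return { retrieved, prompt };
}

const sampleCorpus: Chunk[] = [
  { id: "m1", text: "A robot learns to love on a deserted planet" },
  { id: "m2", text: "Astronauts travel through a wormhole to find a new planet" },
  { id: "m3", text: "A hacker discovers reality is a simulation" },
];
const sampleVocabulary = ["robot", "planet", "wormhole", "hacker", "simulation", "astronauts"];

// ragResult.prompt is what the final step would send to the LLM.
const ragResult = buildRagPrompt("Which movie has a wormhole?", sampleCorpus, sampleVocabulary, 1);
```

Every step runs on every question, in the same order. Nothing here decides whether a search is worth doing.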

But a RAG agent does more. It needs memory across turns so a follow-up question keeps context. It needs retrieval as a tool the model decides when to call, not a hardcoded step in a pipeline. And it needs a structured loop (receive a message, decide whether to search, search if useful, synthesize an answer) that holds up over a conversation rather than a single turn.
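The loop described above can be sketched in a few lines. This is not Mastra's implementation: runTurn, scriptedModel, and stubSearch are hypothetical names, and the "model" is a scripted function standing in for an LLM so the control flow is easy to follow.

```typescript
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };

// On each step the model either asks for retrieval or answers.
type ModelDecision =
  | { kind: "tool_call"; query: string }
  | { kind: "answer"; text: string };

type ModelFn = (messages: ChatMessage[]) => ModelDecision;
type SearchFn = (query: string) => string[];

// One conversation turn. The thread array is the memory: it carries over between turns.
function runTurn(
  thread: ChatMessage[],
  userText: string,
  model: ModelFn,
  search: SearchFn,
  maxSteps = 3,
): string {
  thread.push({ role: "user", content: userText });
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(thread);
    if (decision.kind === "answer") {
      thread.push({ role: "assistant", content: decision.text });
      return decision.text;
    }
    // The model asked for retrieval: run the tool and feed the results back in.
    thread.push({ role: "tool", content: search(decision.query).join("\n") });
  }
  const fallback = "I could not find an answer.";
  thread.push({ role: "assistant", content: fallback });
  return fallback;
}

// Scripted stand-in for an LLM (an assumption of this sketch).
const scriptedModel: ModelFn = (messages) => {
  const last = messages[messages.length - 1];
  if (last.role === "tool") return { kind: "answer", text: `Based on: ${last.content}` };
  // Small talk needs no retrieval, so answer directly.
  if (/^(hi|hello|thanks)\b/i.test(last.content)) return { kind: "answer", text: "You're welcome." };
  // Otherwise search, using earlier user turns so follow-ups keep their context.
  const userTurns = messages.filter((m) => m.role === "user").map((m) => m.content);
  return { kind: "tool_call", query: userTurns.join(" ") };
};

const recordedQueries: string[] = [];
const stubSearch: SearchFn = (query) => {
  recordedQueries.push(query);
  return ["Alien (1979), directed by Ridley Scott"];
};

const conversation: ChatMessage[] = [];
runTurn(conversation, "Tell me about Alien", scriptedModel, stubSearch);
runTurn(conversation, "Who directed it?", scriptedModel, stubSearch);
const thirdReply = runTurn(conversation, "Thanks!", scriptedModel, stubSearch);
```

The second query includes the first question, which is what lets "it" resolve to the movie. The third turn never touches the search function at all.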

Mastra gives you those three primitives (agent, tool, memory), and Elastic's contribution is what makes the retrieval step plug in cleanly.

One agent, one vector tool, one memory store

The whole demo agent lives in a single file. From src/mastra/agents/elasticsearch-agent.ts:

import { Agent } from "@mastra/core/agent";
import { ElasticSearchVector } from "@mastra/elasticsearch";
import { createVectorQueryTool } from "@mastra/rag";
import { ModelRouterEmbeddingModel } from "@mastra/core/llm";
import { Memory } from "@mastra/memory";
 
const esVector = new ElasticSearchVector({
  id: "elasticsearch-vector",
  url: process.env.ELASTICSEARCH_URL!,
  auth: { apiKey: process.env.ELASTICSEARCH_API_KEY! },
});
 
const vectorQueryTool = createVectorQueryTool({
  vectorStore: esVector,
  indexName: process.env.ELASTICSEARCH_INDEX_NAME!,
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
});
 
export const elasticsearchAgent = new Agent({
  id: "elasticsearch-agent",
  name: "Elasticsearch Agent",
  instructions: `You are a helpful assistant that answers questions...`,
  model: "openai/gpt-5-nano",
  tools: { vectorQueryTool },
  memory: new Memory(),
});

Three things to note.

The chat model and the embedding model are both named by strings: "openai/gpt-5-nano" and "openai/text-embedding-3-small". Mastra's router parses the provider/model-name format and dispatches to the matching provider, so the same routing pattern works for chat and embeddings.
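As a rough illustration of the provider/model-name convention, here is how such a string could be split. parseModelId is a hypothetical helper written for this post, not Mastra's router code, and the validation rules are assumptions.

```typescript
type ModelRef = { provider: string; modelName: string };

// Split on the first slash only, so any later slashes stay in the model name.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model-name", got "${id}"`);
  }
  return { provider: id.slice(0, slash), modelName: id.slice(slash + 1) };
}

const chatRef = parseModelId("openai/gpt-5-nano");
const embeddingRef = parseModelId("openai/text-embedding-3-small");
```

Both strings resolve to the same provider, which is why one routing convention can cover the chat model and the embedding model.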

Memory is wired in with a single line: memory: new Memory(). With no arguments you get the default in-memory thread store, which persists per-conversation context across turns. If you want a durable backing store later, pass it to the Memory constructor and the rest of the agent code stays put.
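To show what "per-conversation context" means in practice, here is a small sketch of a thread store keyed by conversation. ThreadStore, createInMemoryThreadStore, and the threadId key are hypothetical names for this example and do not mirror Mastra's Memory API.

```typescript
type StoredMessage = { role: "user" | "assistant"; content: string };

// Hypothetical storage contract: a durable store could implement the same two methods.
interface ThreadStore {
  append(threadId: string, message: StoredMessage): void;
  read(threadId: string): StoredMessage[];
}

function createInMemoryThreadStore(): ThreadStore {
  const threads = new Map<string, StoredMessage[]>();
  return {
    append(threadId, message) {
      const existing = threads.get(threadId) ?? [];
      existing.push(message);
      threads.set(threadId, existing);
    },
    // Return a copy so callers cannot mutate the stored messages by accident.
    read: (threadId) => [...(threads.get(threadId) ?? [])],
  };
}

const threadStore = createInMemoryThreadStore();
threadStore.append("thread-a", { role: "user", content: "Recommend a time-travel movie" });
threadStore.append("thread-a", { role: "assistant", content: "Try Primer." });
threadStore.append("thread-b", { role: "user", content: "Any movies about Mars?" });
```

Each conversation only ever sees its own messages, and callers depend on the interface rather than on where the messages live.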

The agent's tools field is just an object map. The model decides when to invoke vectorQueryTool; the framework handles the round-trip.

createVectorQueryTool is the portable abstraction

The tool definition does the load-bearing work:

const vectorQueryTool = createVectorQueryTool({
  vectorStore: esVector,
  indexName: process.env.ELASTICSEARCH_INDEX_NAME!,
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
});

createVectorQueryTool (from @mastra/rag) is what makes the agent portable across vector backends. Swap ElasticSearchVector for PgVector, PineconeVector, QdrantVector, or any of the other vector stores Mastra supports without rewriting the agent itself. The tool is the abstraction; the store is the implementation.

What Elastic shipped makes ElasticSearchVector a first-class citizen alongside every other store, with the same construction shape and the same query interface. From the agent's perspective, Elasticsearch is just one more backend it can ask for context.
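The "tool is the abstraction, store is the implementation" idea can be shown with a stripped-down sketch. SketchVectorStore and createSketchQueryTool are hypothetical names, and the interface is far smaller than Mastra's real vector store contract. The point is only that the tool factory never needs to know which backend it was handed.

```typescript
type QueryHit = { id: string; text: string; score: number };

// Hypothetical minimal store contract for this sketch.
interface SketchVectorStore {
  query(indexName: string, queryVector: number[], topK: number): QueryHit[];
}

type StoredRow = { id: string; text: string; vector: number[] };

// Backend A: brute-force dot product over rows held in memory.
function createDotProductStore(rows: StoredRow[]): SketchVectorStore {
  return {
    query(_indexName, queryVector, topK) {
      return rows
        .map((row) => ({
          id: row.id,
          text: row.text,
          score: row.vector.reduce((sum, value, i) => sum + value * queryVector[i], 0),
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
    },
  };
}

// Backend B: returns precomputed hits, standing in for a remote service.
function createCannedStore(hits: QueryHit[]): SketchVectorStore {
  return { query: (_indexName, _queryVector, topK) => hits.slice(0, topK) };
}

// The tool depends only on the interface, so either backend plugs in unchanged.
function createSketchQueryTool(options: {
  vectorStore: SketchVectorStore;
  indexName: string;
  embed: (text: string) => number[];
  topK?: number;
}) {
  return {
    description: "Retrieve chunks relevant to a question",
    execute(queryText: string): QueryHit[] {
      const queryVector = options.embed(queryText);
      return options.vectorStore.query(options.indexName, queryVector, options.topK ?? 3);
    },
  };
}

// Toy two-dimensional embedding: [mentions "space", mentions "robot"].
const keywordEmbed = (text: string): number[] => [
  text.toLowerCase().includes("space") ? 1 : 0,
  text.toLowerCase().includes("robot") ? 1 : 0,
];

const storeA = createDotProductStore([
  { id: "a", text: "space opera", vector: [1, 0] },
  { id: "b", text: "robot uprising", vector: [0, 1] },
]);
const storeB = createCannedStore([{ id: "canned", text: "precomputed hit", score: 1 }]);

const toolA = createSketchQueryTool({ vectorStore: storeA, indexName: "movies", embed: keywordEmbed, topK: 1 });
const toolB = createSketchQueryTool({ vectorStore: storeB, indexName: "movies", embed: keywordEmbed, topK: 1 });
```

Switching from storeA to storeB changes one argument. The code that calls execute does not change.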

The synthesizer doesn't need the big model

The answer-generating model in Elastic's demo is gpt-5-nano. That's deliberate.

When retrieval has already done the heavy lifting, the LLM at the end of the pipeline doesn't need world-class reasoning. It needs to read the retrieved chunks and write a coherent answer.

The same logic shows up across well-built agent pipelines: pick the smallest model that can do the stage's job, and put the bigger models where they earn their keep. Because the model is a field on each agent, Mastra lets you make that choice explicitly, stage by stage.

Beyond stock vector search

The demo uses pure vector search: embed the query, find nearest neighbors, return chunks. That's the right starting point.

If accuracy matters at scale, Elasticsearch lets you go further without changing the agent file: hybrid search (combining lexical with vector for better retrieval on rare or specific terms) and reranking (running a separate model on the top results to reorder them). Both extend the same ElasticSearchVector abstraction; only the query plan changes.
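One common way to combine a lexical ranking with a vector ranking is reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (constant + rank) across the lists it appears in. The sketch below implements that formula in plain TypeScript. It does not call Elasticsearch, the document IDs are made up, and the constant of 60 is a conventional default assumed here rather than something taken from the demo.

```typescript
// Fuse several rankings (best first) into one list ordered by RRF score.
function reciprocalRankFusion(
  rankings: string[][],
  rankConstant = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rankConstant + rank));
    });
  }
  const fusedList: { id: string; score: number }[] = [];
  scores.forEach((score, id) => fusedList.push({ id, score }));
  return fusedList.sort((a, b) => b.score - a.score);
}

// Hypothetical results from a keyword query and a vector query over the same index.
const lexicalRanking = ["blade-runner", "alien", "solaris"];
const vectorRanking = ["alien", "arrival", "blade-runner"];

const fused = reciprocalRankFusion([lexicalRanking, vectorRanking]);
// Order: alien, blade-runner, arrival, solaris
```

Documents that both retrievers agree on rise to the top, while a document found by only one still makes the list. A reranker would then take the head of this fused list and reorder it with a separate model.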

The Elastic team has a deeper writeup on hybrid search if you want details on when and how to add it.

What this gives you

A working RAG agent in ~60 lines of TypeScript: memory across turns from one constructor, retrieval pluggable across every vector backend Mastra supports through a single tool, and a small synthesis model that's plenty for the stage's job.

Clone the demo, point it at an Elasticsearch instance (Elastic Cloud trial or local), and you can talk to your own data in a few minutes:

git clone https://github.com/elastic/mastra-elasticsearch-example

For the install steps, ingestion, and env config, read Enrico's post. For the agent-layer documentation, see the Mastra docs.