We're building a movie recommendation agent that searches a knowledge base of 20,000 films and remembers your preferences across sessions. The whole thing — vector search, storage, Observational Memory — runs on a single MongoDB Atlas cluster.
Why MongoDB?
The @mastra/mongodb package was contributed to the Mastra project by MongoDB's engineering team. It covers three integration surfaces from a single Atlas cluster:
Vector Search — MongoDBVector uses Atlas Vector Search for semantic similarity queries. It supports cosine, euclidean, and dot product distance metrics, and metadata filtering uses native MongoDB query syntax — $gt, $in, nested fields, arrays. If you already know how to query MongoDB, you already know how to filter vector results (a quick sketch follows below).
Storage — MongoDBStore manages workflow state, conversation threads, messages, and evaluation data. Mastra auto-creates the collections it needs.
Memory — Mastra's Observational Memory system — the framework's default — compresses conversations into prioritized observations and persists them in MongoDB. Your agent remembers what mattered from previous sessions and lets the rest fade. Prompt caching on the observation log keeps costs down over time.
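To make the filtering claim concrete, here's a sketch of the filter shape. The field names (year, genres, imdbRating) anticipate the movie metadata we'll store later in this post; the operators are ordinary MongoDB query syntax:

```typescript
// A sketch of a metadata filter for a vector query: range comparisons,
// set membership, and array fields all use native MongoDB operators.
const filter = {
  year: { $gt: 1989, $lt: 2000 }, // range comparison
  genres: { $in: ["Thriller"] },  // matches values in an array field
  imdbRating: { $gte: 7.5 },      // numeric threshold
};
```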
Requirements: MongoDB 7.0 or later with Atlas Search enabled.
Project Setup
Prerequisites: Node.js, a MongoDB Atlas cluster (free tier works), an OpenAI API key, and a Model API key from MongoDB Atlas to access Voyage AI models.
Scaffold a new Mastra project:
```bash
npx create-mastra@latest
```

Install the MongoDB package:

```bash
npm install @mastra/mongodb voyage-ai-provider
```

Create a .env file with your credentials:

```env
OPENAI_API_KEY=your-openai-key
VOYAGE_API_KEY=your-model-api-key
MONGODB_URI=mongodb+srv://user:password@cluster.mongodb.net
MONGODB_DATABASE=mastra_movies
```

Your project structure will look like this:
```
src/
├── mastra/
│   ├── agents/
│   │   └── movieAgent.ts
│   ├── tools/
│   │   └── movieTool.ts
│   └── index.ts
├── scripts/
│   └── embed.ts
└── .env
```

Ingesting Data: Embedding 20,000 Movies
We're using Voyage AI for embeddings via the Embedding and Reranking API on MongoDB Atlas (currently in preview).
We'll use MongoDB's sample_mflix dataset — roughly 20,000 movies with titles, plots, cast, directors, genres, year, and IMDb ratings. The goal is to embed the plot text so our agent can search it semantically, while preserving the metadata for filtering.
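If you want to pull the source documents yourself, here's a minimal sketch using the official MongoDB Node.js driver. It assumes the sample dataset is loaded on your cluster and skips movies without a plot:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI!);

// The sample dataset ships in the sample_mflix database.
const movies = await client
  .db("sample_mflix")
  .collection("movies")
  .find({ plot: { $exists: true, $ne: "" } })
  .project({
    title: 1,
    plot: 1,
    genres: 1,
    year: 1,
    cast: 1,
    directors: 1,
    "imdb.rating": 1,
  })
  .toArray();

await client.close();
```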
First, set up the vector store and create an index:
```typescript
import { MongoDBVector } from "@mastra/mongodb";

const vectorStore = new MongoDBVector({
  id: "mongodb-vector",
  uri: process.env.MONGODB_URI!,
  dbName: process.env.MONGODB_DATABASE!,
});

await vectorStore.createIndex({
  indexName: "movie_embeddings",
  dimension: 1024, // voyage-4-large
});
```

For each movie, we convert the plot into a document, chunk it, generate embeddings, and upsert with metadata:
```typescript
import { MDocument } from "@mastra/rag";
import { embedMany } from "ai";
import { createVoyage } from "voyage-ai-provider";

const voyage = createVoyage();

// Create a document from the movie plot
const doc = MDocument.fromText(movie.plot);

// Chunk with recursive strategy for overlap
const chunks = await doc.chunk({
  strategy: "recursive",
  size: 512,
  overlap: 50,
});

// Generate embeddings
const { embeddings } = await embedMany({
  values: chunks.map((chunk) => chunk.text),
  model: voyage.textEmbeddingModel("voyage-4-large"),
});

// Upsert vectors with metadata
await vectorStore.upsert({
  indexName: "movie_embeddings",
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text,
    title: movie.title,
    year: movie.year,
    genres: movie.genres,
    cast: movie.cast,
    director: movie.directors,
    imdbRating: movie.imdb?.rating,
  })),
});
```

The metadata is important. Storing fields like year, genres, and imdbRating alongside the vectors means the agent can pre-filter results — "horror movies from the 1960s" becomes a metadata filter before the semantic search runs.
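You can exercise that path directly, outside the agent. Here's a sketch assuming Mastra's standard vector store query method (indexName, queryVector, topK, filter), with the query text embedded by the same model used at ingestion; "horror movies from the 1960s" becomes a filter plus a semantic search:

```typescript
import { embed } from "ai";

// Embed the query text with the same model used for ingestion.
const { embedding } = await embed({
  value: "atmospheric horror in a small town",
  model: voyage.textEmbeddingModel("voyage-4-large"),
});

// Run a similarity search restricted by metadata: only horror
// films from the 1960s are considered as candidates.
const results = await vectorStore.query({
  indexName: "movie_embeddings",
  queryVector: embedding,
  topK: 5,
  filter: {
    genres: { $in: ["Horror"] },
    year: { $gte: 1960, $lt: 1970 },
  },
});
```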
For 20,000 movies, you'll want to batch the embedding calls to stay within API rate limits. The full ingestion script in the companion repo handles this with configurable batch sizes and progress tracking.
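As an illustration only, the core loop might look like the sketch below. BATCH_SIZE is an assumed value to tune against your limits, and per-plot chunking is omitted for brevity:

```typescript
const BATCH_SIZE = 100; // assumed; tune to your embedding API limits

for (let i = 0; i < movies.length; i += BATCH_SIZE) {
  const batch = movies.slice(i, i + BATCH_SIZE);

  // One embedding call per batch instead of per movie.
  const { embeddings } = await embedMany({
    values: batch.map((m) => m.plot),
    model: voyage.textEmbeddingModel("voyage-4-large"),
  });

  await vectorStore.upsert({
    indexName: "movie_embeddings",
    vectors: embeddings,
    metadata: batch.map((m) => ({
      text: m.plot,
      title: m.title,
      year: m.year,
      genres: m.genres,
    })),
  });

  // Simple progress tracking.
  console.log(`Embedded ${Math.min(i + BATCH_SIZE, movies.length)}/${movies.length} movies`);
}
```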
Building the Movie Agent with RAG
We need two things: a tool that searches the movie knowledge base, and an agent that uses it.
Mastra provides createVectorQueryTool — a pre-built tool that wraps the embed-query-retrieve cycle into something an agent can call. You point it at a vector store and an embedding model, and the agent handles the rest:
```typescript
import { createVectorQueryTool } from "@mastra/rag";
import { createVoyage } from "voyage-ai-provider";

const voyage = createVoyage();

const movieSearchTool = createVectorQueryTool({
  vectorStoreName: "mongoVector",
  indexName: "movie_embeddings",
  model: voyage.textEmbeddingModel("voyage-4-large"),
});
```

The vectorStoreName maps to the key you'll use when registering the vector store in your Mastra configuration. The @mastra/mongodb package also exports a MONGODB_PROMPT constant — a system prompt fragment that optimizes how the agent formulates queries against MongoDB's vector search. We'll use both when we define the agent in the next section.
First, register the vector store in your Mastra config:
```typescript
// src/mastra/index.ts
import { Mastra } from "@mastra/core";
import { MongoDBVector } from "@mastra/mongodb";
import { movieAgent } from "./agents/movieAgent";

export const mastra = new Mastra({
  agents: { movieAgent },
  vectors: {
    mongoVector: new MongoDBVector({
      id: "mongodb-vector",
      uri: process.env.MONGODB_URI!,
      dbName: process.env.MONGODB_DATABASE!,
    }),
  },
});
```

The mongoVector key here matches the vectorStoreName in the tool definition. Mastra resolves the connection at runtime.
Adding Observational Memory
The agent can search movies. Now we'll make it remember and wire both capabilities together.
Mastra's default memory system is Observational Memory. It works the way human memory does: you don't remember every word of a conversation — you remember what mattered. Observational Memory compresses raw message history into timestamped, prioritized observations and lets irrelevant details fade.
Under the hood, two lightweight background agents handle this. The observer activates when conversation tokens hit a threshold (30k by default), reviews the chat, and condenses it into observations tagged with priority levels. The reflector kicks in when observations themselves accumulate, merging duplicates and dropping low-priority details. The result is 5–40x compression of conversation history.
The same cluster that handles your vector search also persists the observation log. Here's the complete agent definition — RAG tool, model, instructions, and memory all in one place:
```typescript
// src/mastra/agents/movieAgent.ts
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { MongoDBStore, MONGODB_PROMPT } from "@mastra/mongodb";
import { createVectorQueryTool } from "@mastra/rag";
import { createVoyage } from "voyage-ai-provider";

const voyage = createVoyage();

const movieSearchTool = createVectorQueryTool({
  vectorStoreName: "mongoVector",
  indexName: "movie_embeddings",
  model: voyage.textEmbeddingModel("voyage-4-large"),
});

export const movieAgent = new Agent({
  id: "movie-agent",
  name: "Movie Agent",
  model: "openai/gpt-4.1-nano",
  instructions: `
    You are a movie recommendation agent with access to a database of 20,000 films.
    Use the movie search tool to find relevant movies based on the user's query.
    When recommending movies, include the title, year, and a brief reason.
    Remember the user's preferences and past recommendations to improve suggestions over time.
    ${MONGODB_PROMPT}
  `,
  tools: { movieSearchTool },
  memory: new Memory({
    storage: new MongoDBStore({
      id: "mongodb-storage",
      uri: process.env.MONGODB_URI!,
      dbName: process.env.MONGODB_DATABASE!,
    }),
    options: {
      observationalMemory: {
        model: "openai/gpt-4.1-mini",
      },
    },
  }),
});
```

We're using gpt-4.1-nano here because it's cost-effective for RAG — the knowledge base provides the factual grounding, and a smaller model handles the rest. Swap to any supported model by changing the string.
The memory configuration points a MongoDBStore at the same cluster and sets observationalMemory with openai/gpt-4.1-mini as the model for the observer and reflector. You can also use token-tiered routing to scale cost with input size. Here's the agent in action across two sessions:
```typescript
const agent = mastra.getAgentById("movie-agent");

// First session — user shares preferences
const response = await agent.stream(
  "I love psychological thrillers from the 90s — anything dark and cerebral",
  {
    memory: {
      thread: "user-123-movies",
      resource: "user-123",
    },
  }
);

// ... later, maybe a different session

const followUp = await agent.stream(
  "What else would I like?",
  {
    memory: {
      thread: "user-123-movies",
      resource: "user-123",
    },
  }
);
```

On the follow-up, the agent's observation log contains a high-priority entry like 🔴 User prefers dark, cerebral psychological thrillers from the 1990s. It searches the movie database with that context and recommends new films — without the user repeating themselves, and without the original messages needing to be in the context window.
As conversations grow longer, the observer and reflector keep the context manageable. Recommendations that were already given get marked as completed. Preferences that change get updated.
For agents that need to search raw message history in addition to observations, Observational Memory supports an optional retrieval mode that uses MongoDB as both the vector store and storage backend — keeping everything on a single cluster.
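The exact option names are worth checking against the Mastra docs, but the shape is roughly the sketch below. The vector and embedder options mirror how Mastra's Memory is configured for semantic recall, and are an assumption here:

```typescript
// Sketch only: treat the retrieval-mode option names as assumptions.
// The point is that storage and vector search share one Atlas cluster.
const memory = new Memory({
  storage: new MongoDBStore({
    id: "mongodb-storage",
    uri: process.env.MONGODB_URI!,
    dbName: process.env.MONGODB_DATABASE!,
  }),
  vector: new MongoDBVector({
    id: "mongodb-vector",
    uri: process.env.MONGODB_URI!,
    dbName: process.env.MONGODB_DATABASE!,
  }),
  embedder: voyage.textEmbeddingModel("voyage-4-large"),
  options: {
    observationalMemory: {
      model: "openai/gpt-4.1-mini",
    },
  },
});
```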
Running the Demo
Start Mastra Studio, the local development playground for testing agents:
```bash
npm run dev
```

Open http://localhost:4111 and select the Movie Agent from the sidebar. You'll see the chat interface on the left and agent configuration — instructions, tools, model settings — on the right.
Try a few queries:
- "Recommend sci-fi movies from the 1980s with practical effects"
- "What are the highest-rated films directed by Denis Villeneuve?"
- "I like slow-burn horror — what should I watch?"
Each query triggers the vector search tool. You can inspect the tool call, see what the agent retrieved from MongoDB, and trace how it synthesized the response. The tracing view shows every step: the embedding of the query, the vector search results, and the final generation.
Then test memory. Tell the agent something about your taste, keep chatting, and watch the observations form. In a long conversation, the observer compresses your message history into prioritized entries — visible in the trace. Start a new message in the same thread and ask for more recommendations. The agent recalls your preferences from the observation log, not from re-reading every prior message.
The complete working example is available in the companion repository.
Conclusion
We built an AI agent that searches 20,000 movies by meaning and remembers user preferences using Observational Memory — all backed by a single MongoDB Atlas cluster.
MongoDB Atlas serves two roles here: Atlas Vector Search and the Embedding and Reranking API power the RAG pipeline, and MongoDBStore persists the observation log that gives the agent long-term memory.
From here, you could add metadata filters for genre- or year-scoped queries, enable retrieval mode to search raw message history alongside observations, or use token-tiered routing to scale memory costs with conversation length. The @mastra/mongodb package and Mastra docs cover all three.