Streaming Working Memory (advanced)

This example demonstrates how to create an agent that maintains a todo list using working memory, even with minimal context. For a simpler introduction to working memory, see the basic working memory example.

Setup

Let’s break down how to create an agent with working memory capabilities. We’ll build a todo list manager that remembers tasks even with minimal context.

1. Setting up Memory

First, we’ll configure the memory system with a short context window since we’ll be using working memory to maintain state. Memory uses LibSQL storage by default, but you can use any other storage provider if needed:

import { Memory } from "@mastra/memory";
 
const memory = new Memory({
  options: {
    lastMessages: 1, // working memory means we can have a shorter context window and still maintain conversational coherence
    workingMemory: {
      enabled: true,
    },
  },
});
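
If you want to control where that state lives, you can also pass a storage adapter to Memory explicitly. The snippet below is a minimal sketch assuming the @mastra/libsql package and a local database file; the exact package name and constructor options may vary with your Mastra version:

import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql"; // assumed package; swap in your preferred storage provider

const memory = new Memory({
  // explicit storage for conversation history and working memory
  storage: new LibSQLStore({
    url: "file:./todo-memory.db", // hypothetical local database file
  }),
  options: {
    lastMessages: 1,
    workingMemory: {
      enabled: true,
    },
  },
});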

2. Defining the Working Memory Template

Next, we’ll define a template that shows the agent how to structure the todo list data. The template uses XML-like tags to represent the data structure. This helps the agent understand what information to track for each todo item.

const memory = new Memory({
  options: {
    lastMessages: 1,
    workingMemory: {
      enabled: true,
      template: `
<todolist>
  <info>this is an example list - replace it with whatever the user needs</info>
  <item status="ACTIVE" title="Example" due="Feb 7 3028" started="Feb 7 2025">example description</item>
</todolist>
`,
    },
  },
});

3. Creating the Todo List Agent

Finally, we’ll create an agent that uses this memory system. The agent’s instructions define how it should interact with users and manage the todo list.

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
 
const todoAgent = new Agent({
  name: "TODO Agent",
  instructions:
    "You are a helpful todolist AI agent. Help the user manage their todolist. If there is no list yet ask them what to add! If there is a list always print it out when the chat starts. For each item add emojis, dates, titles (with an index number starting at 1), descriptions, and statuses. For each piece of info add an emoji to the left of it. Also support subtask lists with bullet points inside a box. Help the user timebox each task by asking them how long it will take.",
  model: openai("gpt-4o-mini"),
  memory,
});

Note: The template and instructions are optional. When workingMemory.enabled is set to true, a default system message is automatically injected to help the agent understand how to use working memory.

Usage Example

The agent’s responses will contain XML-like <working_memory>$data</working_memory> tags that Mastra uses to automatically update the working memory. We’ll look at two ways to handle this:

Basic Usage

For simple cases, you can use maskStreamTags to hide the working memory updates from users:

import { randomUUID } from "crypto";
import { maskStreamTags } from "@mastra/core/utils";
 
// Start a conversation
const threadId = randomUUID();
const resourceId = "SOME_USER_ID";
 
// Add a new todo item
const response = await todoAgent.stream(
  "Add a task: Build a new feature for our app. It should take about 2 hours and needs to be done by next Friday.",
  {
    threadId,
    resourceId,
  },
);
 
// Process the stream, hiding working memory updates
for await (const chunk of maskStreamTags(
  response.textStream,
  "working_memory",
)) {
  process.stdout.write(chunk);
}
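
For comparison, iterating the textStream directly (without maskStreamTags) would print the raw <working_memory> block inline with the agent's reply, which you typically don't want end users to see:

// Unmasked: the <working_memory>...</working_memory> update block appears
// in the printed output. Note that a stream can only be consumed once, so
// run this against a fresh response rather than one you've already iterated.
for await (const chunk of response.textStream) {
  process.stdout.write(chunk);
}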

Advanced Usage with UI Feedback

For a better user experience, you can show loading states while working memory is being updated:

// Same imports and setup as above...
 
// Add lifecycle hooks to provide UI feedback
const maskedStream = maskStreamTags(response.textStream, "working_memory", {
  // Called when a working_memory tag starts
  onStart: () => showLoadingSpinner("Updating todo list..."),
  // Called when a working_memory tag ends
  onEnd: () => hideLoadingSpinner(),
  // Called with the content that was masked
  onMask: (chunk) => console.debug("Updated todo list:", chunk),
});
 
// Process the masked stream
for await (const chunk of maskedStream) {
  process.stdout.write(chunk);
}
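
The showLoadingSpinner and hideLoadingSpinner calls above are placeholders for whatever feedback mechanism your UI provides. For a terminal demo, a minimal sketch (purely illustrative, not part of the Mastra API) could look like this:

// Hypothetical terminal spinner helpers for the lifecycle hooks above
let spinnerInterval: NodeJS.Timeout | undefined;

function showLoadingSpinner(message: string) {
  const frames = ["|", "/", "-", "\\"];
  let i = 0;
  spinnerInterval = setInterval(() => {
    process.stdout.write(`\r${frames[i++ % frames.length]} ${message}`);
  }, 100);
}

function hideLoadingSpinner() {
  if (spinnerInterval) clearInterval(spinnerInterval);
  // overwrite the spinner line with spaces, then return the cursor
  process.stdout.write("\r" + " ".repeat(40) + "\r");
}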

The example demonstrates:

  1. Setting up a memory system with working memory enabled
  2. Creating a todo list template with structured XML
  3. Using maskStreamTags to hide memory updates from users
  4. Providing UI loading states during memory updates with lifecycle hooks

Even with only one message in context (lastMessages: 1), the agent maintains the complete todo list in working memory. Each time the agent responds, it updates the working memory with the current state of the todo list, ensuring persistence across interactions.
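
You can verify this by sending a follow-up message on the same thread later in the conversation. Even though the earlier messages have fallen out of the lastMessages: 1 window, the agent can still answer from working memory (the prompt below is illustrative):

// Later in the same thread: earlier messages are outside the context window,
// but the todo list persists in working memory
const followUp = await todoAgent.stream("What's on my todo list right now?", {
  threadId,
  resourceId,
});

for await (const chunk of maskStreamTags(followUp.textStream, "working_memory")) {
  process.stdout.write(chunk);
}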

To learn more about agent memory, including other memory types and storage options, check out the Memory documentation page.