Introducing Response Caching for Mastra Agents

You can now cut costs and reduce latency with cached responses.

Paul ScanlonPaul Scanlon·

May 15, 2026

·

2 min read

When an agent receives a request, its default behavior is to pass it to the LLM. Each round trip incurs a cost and takes time to resolve. With response caching, agents can handle identical requests and pull responses from cache, skipping the LLM call entirely.

When ResponseCache is configured on your agent, the first response is cached for a ttl (time-to-live) you set in seconds. Any identical requests within that window are pulled directly from cache.

Response caching works best when the same request repeats across users or sessions: prompt templates, suggested-prompt buttons, repeated agentic searches, or guardrail agents classifying the same input.

Get started

Add a ResponseCache to the agent's inputProcessors and pass it a cache storage backend:

note

Requires @mastra/core@1.33.0 or later, added in PR #16283.

 1import { Agent } from "@mastra/core/agent";
 2import { InMemoryServerCache } from "@mastra/core/cache";
 3import { ResponseCache } from "@mastra/core/processors";
 4
 5const cache = new InMemoryServerCache();
 6
 7export const searchAgent = new Agent({
 8  id: "search-agent",
 9  name: "Search Agent",
10  model: "anthropic/claude-sonnet-4-6",
11  instructions: "You answer questions concisely.",
12  inputProcessors: [
13    new ResponseCache({
14      cache,
15      // ttl = time to live
16      // 60 seconds = 1 minute
17      // 600 seconds / 60 = 10 minutes
18      ttl: 600
19    })
20  ]
21});

Use InMemoryServerCache for development and a custom cache backend for production.

The first request is passed to the LLM and the response is cached. Any subsequent identical requests are pulled directly from cache, skipping the LLM call.

 1// First request: Passed to the LLM, response written to cache.
 2await searchAgent.generate("What is the capital of France?");
 3
 4// Subsequent identical requests: Pulled from cache, no LLM call.
 5await searchAgent.generate("What is the capital of France?");

For more information and full configuration options, see:

ResponseCache is experimental. The API may change between releases.

Share:
Paul Scanlon
Paul ScanlonTechnical Product Marketing Manager

Paul Scanlon sits between Developer Education and Product Marketing at Mastra. Previously, he was a Technical Product Marketing Manager at Neon and worked in Developer Relations at Gatsby, where he created educational content and developer experiences.

All articles by Paul Scanlon