ResponseCache
ResponseCache is an input processor that caches LLM responses on the request/response boundary inside the agentic loop. It hooks into processLLMRequest (cache lookup; short-circuits on hit) and processLLMResponse (cache write on completion).
The cache key is derived from the resolved LanguageModelV2Prompt Mastra is about to send to the model — i.e. after memory has loaded and earlier input processors have transformed the prompt — so two users with different memory contexts produce different cache keys. Each step in an agentic tool loop is independently cached.
There is no agent-level option for response caching; register ResponseCache explicitly on inputProcessors. Per-call overrides flow through RequestContext via ResponseCache.context() and ResponseCache.applyContext().
Usage example
import { Agent } from '@mastra/core/agent'
import { InMemoryServerCache } from '@mastra/core/cache'
import { ResponseCache } from '@mastra/core/processors'
const cache = new InMemoryServerCache()
const agent = new Agent({
  name: 'Search Agent',
  instructions: 'You answer questions concisely.',
  model: 'openai/gpt-5',
  inputProcessors: [new ResponseCache({ cache, ttl: 600 })],
})
// First call hits the LLM and writes to the cache.
await agent.generate('What is the capital of France?')
// Second identical call replays the cached response.
await agent.generate('What is the capital of France?')
// Force a fresh call but still update the cache.
await agent.generate('What is the capital of France?', {
  requestContext: ResponseCache.context({ bust: true }),
})
See Response caching for the conceptual overview, scoping rules, and recommended deployment patterns.
Constructor parameters
cache: the ServerCache instance responses are stored in (for example InMemoryServerCache from @mastra/core/cache).
ttl?: time-to-live for cache entries, in seconds. Defaults to DEFAULT_RESPONSE_CACHE_TTL_SECONDS (300).
scope?: cache scope used when deriving keys; see the scoping rules in Response caching.
key?: custom key function. Receives ResponseCacheKeyInputs and replaces the default buildResponseCacheKey hash.
bust?: when true, skip the cache lookup but still write the fresh response to the cache.
agentId?: agent id included in the cache key.
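A sketch of a more fully configured instance follows. The ttl and agentId values are illustrative, and scope is omitted since its accepted values are covered in Response caching.
import { InMemoryServerCache } from '@mastra/core/cache'
import { ResponseCache } from '@mastra/core/processors'
// Illustrative configuration: a two-minute TTL and an explicit agent id for the cache key.
const processor = new ResponseCache({
  cache: new InMemoryServerCache(),
  ttl: 120,
  agentId: 'search-agent',
})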
Static helpers
ResponseCache exposes two static helpers for setting per-call overrides on a RequestContext. The helpers keep the underlying context key a private implementation detail — prefer them over reading/writing the raw key.
ResponseCache.context(options)
Build a fresh RequestContext preloaded with per-call response cache overrides.
await agent.stream('hello', {
  requestContext: ResponseCache.context({ key: 'custom', bust: true }),
})
ResponseCache.applyContext(requestContext, options)
Merge per-call response cache overrides into an existing RequestContext. Returns the same context for chaining.
const ctx = new RequestContext()
ctx.set('caller-meta', { userId: 'u-123' })
ResponseCache.applyContext(ctx, { bust: true })
await agent.stream('hello', { requestContext: ctx })
ResponseCacheContextOptions
The shape passed to ResponseCache.context() / ResponseCache.applyContext().
key?: per-call cache key override; pass a literal key (as in the example above) or a key function receiving ResponseCacheKeyInputs.
scope?: per-call scope override.
bust?: when true, skip the cache lookup for this call but still write the fresh response to the cache.
cache, ttl, and agentId are intentionally not overridable per call — they are instance-level concerns that should not vary per request.
ResponseCacheKeyInputs
The argument passed to a key function (constructor or per-call). All fields contribute to the deterministic hash by default.
agentId: id of the agent issuing the request.
scope?: the configured scope, if any.
model: the resolved model for the request.
prompt: the resolved LanguageModelV2Prompt about to be sent to the model.
stepNumber: the step index within the agentic tool loop; each step is cached independently by default.
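A minimal sketch of a fully custom key function, keyed on the agent, model, and prompt only (reusing the cache instance from the usage example above); serialising with JSON.stringify instead of the default hash is an assumption made for brevity.
const customKeyed = new ResponseCache({
  cache,
  // Ignore scope and stepNumber; JSON.stringify stands in for a real hash here.
  key: ({ agentId, model, prompt }) => JSON.stringify({ agentId, model, prompt }),
})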
Helper exports
buildResponseCacheKey(inputs): the deterministic hash used by default. Reuse it inside a custom key function to override individual fields while preserving the rest of the standard key shape.
DEFAULT_RESPONSE_CACHE_TTL_SECONDS: the default ttl (300 seconds).
RESPONSE_CACHE_CONTEXT_KEY: the RequestContext key the static helpers write to. Exposed for advanced cases (e.g. clearing the override mid-pipeline); prefer the helpers.
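To change a single field while keeping the standard key shape, a custom key function can delegate to buildResponseCacheKey, as in the sketch below; that the helper is exported from the same entry point as ResponseCache is an assumption.
import { ResponseCache, buildResponseCacheKey } from '@mastra/core/processors'
// Pin agentId so agents that resolve to identical prompts share cache entries;
// every other field still flows through the default hash.
const sharedAcrossAgents = new ResponseCache({
  cache,
  key: (inputs) => buildResponseCacheKey({ ...inputs, agentId: 'shared' }),
})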