ResponseCache

ResponseCache is an input processor that caches LLM responses at the request/response boundary inside the agentic loop. It hooks into `processLLMRequest` (cache lookup; short-circuits on a hit) and `processLLMResponse` (cache write on completion).

The cache key is derived from the resolved `LanguageModelV2Prompt` Mastra is about to send to the model — i.e. after memory has been loaded and earlier input processors have transformed the prompt — so two users with different memory contexts produce different cache keys. Each step in an agentic tool loop is cached independently.

There is no agent-level option for response caching; register `ResponseCache` explicitly on `inputProcessors`. Per-call overrides flow through `RequestContext` via `ResponseCache.context()` and `ResponseCache.applyContext()`.

Usage example

import { Agent } from '@mastra/core/agent'
import { InMemoryServerCache } from '@mastra/core/cache'
import { ResponseCache } from '@mastra/core/processors'

const cache = new InMemoryServerCache()

const agent = new Agent({
  name: 'Search Agent',
  instructions: 'You answer questions concisely.',
  model: 'openai/gpt-5',
  inputProcessors: [new ResponseCache({ cache, ttl: 600 })],
})

// First call hits the LLM and writes to the cache.
await agent.generate('What is the capital of France?')

// Second identical call replays the cached response.
await agent.generate('What is the capital of France?')

// Force a fresh call but still update the cache.
await agent.generate('What is the capital of France?', {
  requestContext: ResponseCache.context({ bust: true }),
})

See Response caching for the conceptual overview, scoping rules, and recommended deployment patterns.

Constructor parameters

cache: MastraServerCache (required)
The cache backend. Pass any `MastraServerCache` implementation: `InMemoryServerCache` for local development, `RedisCache` from `@mastra/redis` for production, or your own subclass for a custom backend.

ttl?: number = 300
Time-to-live, in seconds, for entries written by this processor. The default of 300 seconds (5 minutes) matches OpenRouter's reference implementation.

scope?: string | null
Tenant scope appended to the cache key. `null` opts out of scoping. When omitted, the processor falls back to the resource id resolved from the request context (`MASTRA_RESOURCE_ID_KEY`) for automatic per-user isolation. See the sketch after this list.

key?: string | (inputs: ResponseCacheKeyInputs) => string | Promise<string>
Override the auto-derived cache key. Pass a string to pin a key, or a function that receives `{ agentId, scope, model, prompt, stepNumber }` and returns a key. If the function throws, the processor falls back to the deterministic hash so the call still benefits from caching.

bust?: boolean = false
Force a cache miss on every call: skip the read but still write on completion. Useful for explicit refresh paths.

agentId?: string = 'mastra-response-cache'
Logical id used in the cache key namespace. Set it to the owning agent's id when you want cache entries scoped per agent.
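
For example, a per-tenant setup might look like the sketch below; `tenantId` is a hypothetical value resolved by your own request handling, and `cache` is the backend from the usage example above.

// Sketch: option names come from the table above; the values are illustrative.
const perTenantCache = new ResponseCache({
  cache,
  ttl: 3600,                    // keep entries for an hour instead of the 300s default
  scope: `tenant:${tenantId}`,  // share entries within a tenant rather than per resource id
  agentId: 'search-agent',      // namespace cache keys under the owning agent
  // key: 'search-agent:daily-briefing', // a string key pins every call to one entry
})

Passing `scope: null` instead would share entries across every caller of the agent.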

Static helpers

ResponseCache exposes two static helpers for setting per-call overrides on a `RequestContext`. The helpers treat the underlying context key as an implementation detail; prefer them over reading or writing the raw key directly.

ResponseCache.context(options)

Build a fresh RequestContext preloaded with per-call response cache overrides.

await agent.stream('hello', {
  requestContext: ResponseCache.context({ key: 'custom', bust: true }),
})

ResponseCache.applyContext(requestContext, options)

Merge per-call response cache overrides into an existing RequestContext. Returns the same context for chaining.

const ctx = new RequestContext()
ctx.set('caller-meta', { userId: 'u-123' })
ResponseCache.applyContext(ctx, { bust: true })
await agent.stream('hello', { requestContext: ctx })
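
Because `applyContext` returns the context it was passed, the call can also be chained inline:

// Build the context and apply the override in a single expression.
await agent.stream('hello', {
  requestContext: ResponseCache.applyContext(new RequestContext(), { bust: true }),
})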

ResponseCacheContextOptions

The shape passed to `ResponseCache.context()` / `ResponseCache.applyContext()`.

key?: string | (inputs: ResponseCacheKeyInputs) => string | Promise<string>
Overrides the auto-derived cache key for this request only.

scope?: string | null
Overrides the tenant scope for this request only. `null` opts out of scoping.

bust?: boolean
Skip the cache read but still write on completion.

`cache`, `ttl`, and `agentId` are intentionally not overridable per call — they are instance-level concerns that should not vary per request.
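
For example, a single request can opt out of scoping to read and write a shared entry:

// This one call bypasses per-user scoping and hits the shared, unscoped entry.
await agent.generate('What is the capital of France?', {
  requestContext: ResponseCache.context({ scope: null }),
})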

ResponseCacheKeyInputs

The argument passed to a `key` function (constructor or per-call). All fields contribute to the deterministic hash by default. A per-call example follows the list.

agentId: string
Logical processor id used to namespace the cache key.

scope?: string | null | undefined
Resolved scope for this request, or `null` when scoping is disabled.

model: { provider?: string; modelId?: string; specVersion?: string }
Provider/model identity. Different models produce different responses.

prompt: LanguageModelV2Prompt
The exact prompt the provider would receive, after memory has loaded and any prompt-modifying input processors have run.

stepNumber: number
0-indexed step number within the agentic loop. Greater than zero for tool steps.
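
As a sketch, a per-call key function can combine these inputs by hand; the key layout below is illustrative only.

// Hand-built key from the documented inputs. Note that it ignores `prompt`,
// so it is only safe when this call site always sends the same prompt.
await agent.generate('Summarize the onboarding guide', {
  requestContext: ResponseCache.context({
    key: ({ agentId, scope, model, stepNumber }) =>
      `${agentId}:${scope ?? 'global'}:${model.modelId ?? 'unknown'}:onboarding:step-${stepNumber}`,
  }),
})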

Helper exports

  • buildResponseCacheKey(inputs) — the deterministic hash used by default. Reuse it inside a custom key function to override individual fields while preserving the rest of the standard key shape; see the sketch below.
  • DEFAULT_RESPONSE_CACHE_TTL_SECONDS — the default ttl in seconds (300).
  • RESPONSE_CACHE_CONTEXT_KEY — the RequestContext key the static helpers write to. Exposed for advanced cases (e.g. clearing the override mid-pipeline); prefer the helpers.
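
For example, a key function can override a single field and delegate the rest to the default hash. The import path below is an assumption; check where your Mastra version exports these helpers.

// Sketch: reuse the default deterministic hash but zero out the step number,
// so tool-loop steps are distinguished only by their (differing) prompts.
import { buildResponseCacheKey, ResponseCache } from '@mastra/core/processors'

const processor = new ResponseCache({
  cache, // any MastraServerCache instance, as in the usage example
  key: (inputs) => buildResponseCacheKey({ ...inputs, stepNumber: 0 }),
})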