ContextualRecallMetric

The ContextualRecallMetric class evaluates how effectively an LLM’s response incorporates all relevant information from the provided context. It measures whether important information from the reference documents was successfully included in the response, focusing on completeness rather than precision.

Basic Usage

import { ContextualRecallMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4",
  apiKey: process.env.OPENAI_API_KEY
};
 
const metric = new ContextualRecallMetric(model, {
  context: [
    "Product features: cloud synchronization capability",
    "Offline mode available for all users",
    "Supports multiple devices simultaneously",
    "End-to-end encryption for all data"
  ]
});
 
const result = await metric.measure(
  "What are the key features of the product?",
  "The product includes cloud sync, offline mode, and multi-device support.",
);
 
console.log(result.score); // Score from 0-1

Constructor Parameters

model: ModelConfig
Configuration for the model used to evaluate contextual recall.

options: ContextualRecallMetricOptions
Configuration options for the metric.

ContextualRecallMetricOptions

scale?: number = 1
Maximum score value.

context: string[]
Array of reference documents or pieces of information to check against.

measure() Parameters

input: string
The original query or prompt.

output: string
The LLM's response to evaluate.

Returns

score: number
Recall score (0 to scale, default 0-1).

info: object
Object containing the reason for the score.

info.reason: string
Detailed explanation of the score.
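
For example, both fields can be read directly from a measure() call. A minimal sketch reusing the metric from Basic Usage; the output values shown are illustrative, since they depend on the evaluation model's judgment:

const { score, info } = await metric.measure(
  "What are the key features of the product?",
  "The product includes cloud sync and offline mode.",
);
 
console.log(score);       // e.g. 0.5 if two of the four context items were recalled
console.log(info.reason); // explanation generated by the evaluation model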

Scoring Details

The metric calculates recall by comparing the LLM’s response against the provided context documents:

Score interpretation:

  • 1.0: Perfect recall - all relevant information was included
  • 0.7-0.9: High recall - most relevant information was included
  • 0.4-0.6: Moderate recall - some relevant information was missed
  • 0.1-0.3: Low recall - significant information was missed
  • 0: No recall - failed to include any relevant information

The score is calculated as:

score = (number of correctly recalled items) / (total number of relevant items in context)
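
As a rough illustration of this formula (not the metric's internal implementation, which relies on the configured evaluation model to judge which context items were recalled), the scaled score can be sketched as:

// Hypothetical helper showing how the recall ratio maps to a score.
// The real metric uses the evaluation LLM to decide which context
// items count as "correctly recalled".
function recallScore(recalledItems: number, totalRelevantItems: number, scale = 1): number {
  if (totalRelevantItems === 0) return 0;
  return (recalledItems / totalRelevantItems) * scale;
}
 
recallScore(3, 4);      // 0.75 on the default 0-1 scale
recallScore(2, 4, 100); // 50 on a 0-100 scale, as in the example below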

Example with Custom Configuration

const metric = new ContextualRecallMetric(
  {
    provider: "OPEN_AI",
    name: "gpt-4",
    apiKey: process.env.OPENAI_API_KEY
  },
  {
    scale: 100, // Use 0-100 scale instead of 0-1
    context: [
      "All data is encrypted at rest and in transit",
      "Two-factor authentication (2FA) is mandatory",
      "Regular security audits are performed",
      "Incident response team available 24/7"
    ]
  }
);
 
const result = await metric.measure(
  "Summarize the company's security measures",
  "The company implements encryption for data protection and requires 2FA for all users.",
);
 
// Example output:
// {
//   score: 50, // Only half of the security measures were mentioned
//   info: {
//     reason: "The score is 50 because only half of the security measures were mentioned 
//           in the response. The response missed the regular security audits and incident 
//           response team information."
//   }
// }
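
Because a 0-100 scale reads as a percentage, it pairs naturally with a pass/fail threshold. A minimal sketch, assuming the result from the example above (the 75 cutoff is an arbitrary illustration, not a library default):

// Flag responses that recall less than 75% of the relevant context.
if (result.score < 75) {
  console.warn(`Low contextual recall: ${result.info.reason}`);
}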