Contextual Recall
This example demonstrates how to use Mastra’s Contextual Recall metric to evaluate how effectively responses incorporate information from provided context.
Overview
The example shows how to:
- Configure the Contextual Recall metric
- Evaluate context incorporation
- Analyze recall scores
- Handle different recall levels
Setup
Environment Setup
Make sure to set up your environment variables:
.env
OPENAI_API_KEY=your_api_key_here
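If nothing in your toolchain loads .env files for you (newer Node.js versions can via node --env-file; many runners cannot), one common approach is a side-effect import of the dotenv package at the top of the entry file, assuming dotenv is installed:
src/index.ts
import "dotenv/config"; // reads .env and populates process.env.OPENAI_API_KEY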
Dependencies
Import the necessary dependencies:
src/index.ts
import { openai } from "@ai-sdk/openai";
import { ContextualRecallMetric } from "@mastra/evals/llm";
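These imports assume the @mastra/evals and @ai-sdk/openai packages are already installed in your project (for example with npm install @mastra/evals @ai-sdk/openai).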
Example Usage
High Recall Example
Evaluate a response that includes all context information:
src/index.ts
const context1 = [
"Product features include cloud sync.",
"Offline mode is available.",
"Supports multiple devices.",
];
const metric1 = new ContextualRecallMetric(openai("gpt-4o-mini"), {
context: context1,
});
const query1 = "What are the key features of the product?";
const response1 =
"The product features cloud synchronization, offline mode support, and the ability to work across multiple devices.";
console.log("Example 1 - High Recall:");
console.log("Context:", context1);
console.log("Query:", query1);
console.log("Response:", response1);
const result1 = await metric1.measure(query1, response1);
console.log("Metric Result:", {
score: result1.score,
reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'All elements of the output are supported by the context.' }
Mixed Recall Example
Evaluate a response that includes some context information:
src/index.ts
const context2 = [
"Python is a high-level programming language.",
"Python emphasizes code readability.",
"Python supports multiple programming paradigms.",
"Python is widely used in data science.",
];
const metric2 = new ContextualRecallMetric(openai("gpt-4o-mini"), {
context: context2,
});
const query2 = "What are Python's key characteristics?";
const response2 =
"Python is a high-level programming language. It is also a type of snake.";
console.log("Example 2 - Mixed Recall:");
console.log("Context:", context2);
console.log("Query:", query2);
console.log("Response:", response2);
const result2 = await metric2.measure(query2, response2);
console.log("Metric Result:", {
score: result2.score,
reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.5, reason: 'Only half of the output is supported by the context.' }
Low Recall Example
Evaluate a response that misses most context information:
src/index.ts
const context3 = [
"The solar system has eight planets.",
"Mercury is closest to the Sun.",
"Venus is the hottest planet.",
"Mars is called the Red Planet.",
];
const metric3 = new ContextualRecallMetric(openai("gpt-4o-mini"), {
context: context3,
});
const query3 = "Tell me about the solar system.";
const response3 = "Jupiter is the largest planet in the solar system.";
console.log("Example 3 - Low Recall:");
console.log("Context:", context3);
console.log("Query:", query3);
console.log("Response:", response3);
const result3 = await metric3.measure(query3, response3);
console.log("Metric Result:", {
score: result3.score,
reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0, reason: 'None of the output is supported by the context.' }
Understanding the Results
The metric provides:
- A recall score between 0 and 1:
  - 1.0: Perfect recall - all context information used
  - 0.7-0.9: High recall - most context information used
  - 0.4-0.6: Mixed recall - some context information used
  - 0.1-0.3: Low recall - little context information used
  - 0.0: No recall - no context information used
- A detailed reason for the score, including analysis of:
  - Information incorporation
  - Missing context
  - Response completeness
  - Overall recall quality
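To handle different recall levels programmatically (the last step in the overview), one minimal sketch is to bucket the score into the ranges above. The labelRecall helper below is hypothetical, not part of @mastra/evals, and its thresholds simply follow the list above:

// Hypothetical helper: map a recall score onto the ranges listed above.
// Scores that fall between two ranges (e.g. 0.95) land in the lower bucket.
function labelRecall(score: number): "perfect" | "high" | "mixed" | "low" | "none" {
  if (score >= 1.0) return "perfect";
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "mixed";
  if (score > 0.0) return "low";
  return "none";
}

console.log(labelRecall(result1.score)); // "perfect" for Example 1
console.log(labelRecall(result2.score)); // "mixed" for Example 2
console.log(labelRecall(result3.score)); // "none" for Example 3

Adjust the thresholds to match your own quality bar; the metric itself only returns the raw score and reason.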
View Example on GitHub