
Contextual Recall

This example demonstrates how to use Mastra’s Contextual Recall metric to evaluate how effectively responses incorporate information from provided context.

Overview

The example shows how to:

  1. Configure the Contextual Recall metric
  2. Evaluate context incorporation
  3. Analyze recall scores
  4. Handle different recall levels

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
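
The examples below read this key through the OpenAI provider. If your runtime doesn't load .env files automatically, one common approach (an assumption here, not part of the original example) is to load it with the dotenv package at the top of your entry file:

src/index.ts
// Loads variables from .env into process.env (assumes the dotenv package is installed)
import 'dotenv/config';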

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { ContextualRecallMetric } from '@mastra/evals/llm';
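
If these packages aren't installed yet, add them with your package manager; the package names are taken directly from the imports above:

npm install @mastra/evals @ai-sdk/openai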

Example Usage

High Recall Example

Evaluate a response that includes all context information:

src/index.ts
const context1 = [
  'Product features include cloud sync.',
  'Offline mode is available.',
  'Supports multiple devices.',
];

const metric1 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context1,
});

const query1 = 'What are the key features of the product?';
const response1 =
  'The product features cloud synchronization, offline mode support, and the ability to work across multiple devices.';

console.log('Example 1 - High Recall:');
console.log('Context:', context1);
console.log('Query:', query1);
console.log('Response:', response1);

const result1 = await metric1.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});

// Example Output:
// Metric Result: { score: 1, reason: 'All elements of the output are supported by the context.' }

Mixed Recall Example

Evaluate a response that includes only some of the context information:

src/index.ts
const context2 = [
  'Python is a high-level programming language.',
  'Python emphasizes code readability.',
  'Python supports multiple programming paradigms.',
  'Python is widely used in data science.',
];

const metric2 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context2,
});

const query2 = 'What are Python\'s key characteristics?';
const response2 =
  'Python is a high-level programming language. It is also a type of snake.';

console.log('Example 2 - Mixed Recall:');
console.log('Context:', context2);
console.log('Query:', query2);
console.log('Response:', response2);

const result2 = await metric2.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});

// Example Output:
// Metric Result: { score: 0.5, reason: 'Only half of the output is supported by the context.' }

Low Recall Example

Evaluate a response that misses most context information:

src/index.ts
const context3 = [
  'The solar system has eight planets.',
  'Mercury is closest to the Sun.',
  'Venus is the hottest planet.',
  'Mars is called the Red Planet.',
];

const metric3 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context3,
});

const query3 = 'Tell me about the solar system.';
const response3 = 'Jupiter is the largest planet in the solar system.';

console.log('Example 3 - Low Recall:');
console.log('Context:', context3);
console.log('Query:', query3);
console.log('Response:', response3);

const result3 = await metric3.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});

// Example Output:
// Metric Result: { score: 0, reason: 'None of the output is supported by the context.' }

Understanding the Results

The metric provides:

  1. A recall score between 0 and 1:

    • 1.0: Perfect recall - all context information used
    • 0.7-0.9: High recall - most context information used
    • 0.4-0.6: Mixed recall - some context information used
    • 0.1-0.3: Low recall - little context information used
    • 0.0: No recall - no context information used
  2. Detailed reason for the score, including analysis of:

    • Information incorporation
    • Missing context
    • Response completeness
    • Overall recall quality
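
As a rough illustration of the score bands above, you could map a raw score to a qualitative label when post-processing results. The helper below is a hypothetical sketch using the thresholds listed in this section; describeRecall is not part of the Mastra API:

src/index.ts
// Hypothetical helper: maps a ContextualRecallMetric score to the
// qualitative bands described above. Not part of @mastra/evals.
function describeRecall(score: number): string {
  if (score >= 1) return 'perfect recall';
  if (score >= 0.7) return 'high recall';
  if (score >= 0.4) return 'mixed recall';
  if (score > 0) return 'low recall';
  return 'no recall';
}

console.log(describeRecall(result1.score)); // 'perfect recall' for Example 1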

View Example on GitHub