
Context Precision

This example demonstrates how to use Mastra’s Context Precision metric to evaluate how precisely responses use provided context information.

Overview

The example shows how to:

  1. Configure the Context Precision metric
  2. Evaluate context precision
  3. Analyze precision scores
  4. Handle different precision levels

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here

Dependencies

Import the necessary dependencies:

src/index.ts
```typescript
import { openai } from '@ai-sdk/openai';
import { ContextPrecisionMetric } from '@mastra/evals/llm';
```

Example Usage

High Precision Example

Evaluate a response where every context piece is relevant and used:

src/index.ts
```typescript
const context1 = [
  'Photosynthesis converts sunlight into energy.',
  'Plants use chlorophyll for photosynthesis.',
  'Photosynthesis produces oxygen as a byproduct.',
  'The process requires sunlight and chlorophyll.',
];

const metric1 = new ContextPrecisionMetric(openai('gpt-4o-mini'), {
  context: context1,
});

const query1 = 'What is photosynthesis and how does it work?';
const response1 =
  'Photosynthesis is a process where plants convert sunlight into energy using chlorophyll, producing oxygen as a byproduct.';

console.log('Example 1 - High Precision:');
console.log('Context:', context1);
console.log('Query:', query1);
console.log('Response:', response1);

const result1 = await metric1.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});

// Example Output:
// Metric Result: { score: 1, reason: 'The context uses all relevant information and does not include any irrelevant information.' }
```

Mixed Precision Example

Evaluate a response where only some of the context is relevant to the query:

src/index.ts
```typescript
const context2 = [
  'Volcanoes are openings in the Earth\'s crust.',
  'Volcanoes can be active, dormant, or extinct.',
  'Hawaii has many active volcanoes.',
  'The Pacific Ring of Fire has many volcanoes.',
];

const metric2 = new ContextPrecisionMetric(openai('gpt-4o-mini'), {
  context: context2,
});

const query2 = 'What are the different types of volcanoes?';
const response2 =
  'Volcanoes can be classified as active, dormant, or extinct based on their activity status.';

console.log('Example 2 - Mixed Precision:');
console.log('Context:', context2);
console.log('Query:', query2);
console.log('Response:', response2);

const result2 = await metric2.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});

// Example Output:
// Metric Result: { score: 0.5, reason: 'The context uses some relevant information and includes some irrelevant information.' }
```

Low Precision Example

Evaluate a response where most context is irrelevant:

src/index.ts
```typescript
const context3 = [
  'The Nile River is in Africa.',
  'The Nile is the longest river.',
  'Ancient Egyptians used the Nile.',
  'The Nile flows north.',
];

const metric3 = new ContextPrecisionMetric(openai('gpt-4o-mini'), {
  context: context3,
});

const query3 = 'Which direction does the Nile River flow?';
const response3 = 'The Nile River flows northward.';

console.log('Example 3 - Low Precision:');
console.log('Context:', context3);
console.log('Query:', query3);
console.log('Response:', response3);

const result3 = await metric3.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});

// Example Output:
// Metric Result: { score: 0.2, reason: 'The context only has one relevant piece, which is at the end.' }
```

Understanding the Results

The metric provides:

  1. A precision score between 0 and 1:

    • 1.0: Perfect precision - all context pieces are relevant and used
    • 0.7-0.9: High precision - most context pieces are relevant
    • 0.4-0.6: Mixed precision - some context pieces are relevant
    • 0.1-0.3: Low precision - few context pieces are relevant
    • 0.0: No precision - no context pieces are relevant
  2. Detailed reason for the score, including analysis of:

    • Relevance of each context piece
    • Usage in the response
    • Contribution to answering the query
    • Overall context usefulness
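
Note that position matters: in Example 3 the single relevant piece sits at the end of the context list, which drags the score down to 0.2. Mastra's actual metric relies on an LLM judge, but the position sensitivity can be illustrated with a simplified average-precision sketch (the `averagePrecision` helper below is hypothetical, not part of the Mastra API):

```typescript
// Simplified, position-weighted precision over relevance verdicts.
// verdicts[i] is true if context piece i was judged relevant.
// Illustrative sketch only — not Mastra's internal algorithm.
function averagePrecision(verdicts: boolean[]): number {
  let relevantSeen = 0;
  let sum = 0;
  verdicts.forEach((isRelevant, i) => {
    if (isRelevant) {
      relevantSeen += 1;
      // Precision at this position: relevant pieces so far / pieces so far.
      sum += relevantSeen / (i + 1);
    }
  });
  // Normalize by the number of relevant pieces; 0 if none were relevant.
  return relevantSeen === 0 ? 0 : sum / relevantSeen;
}

// All four pieces relevant (as in Example 1): perfect score.
console.log(averagePrecision([true, true, true, true])); // 1

// Only the last piece relevant (as in Example 3): heavily penalized.
console.log(averagePrecision([false, false, false, true])); // 0.25
```

Under this weighting, a relevant piece placed first would score far higher than the same piece placed last, which matches the reasoning in Example 3's output.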





View Example on GitHub