
ContextPrecisionMetric

The ContextPrecisionMetric class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses an LLM judge to assess each context piece's relevance and weights the score by position, so relevant context that appears earlier in the retrieval order counts for more.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score

Constructor Parameters

model: LanguageModel
Configuration for the model used to evaluate context relevance

options: ContextPrecisionMetricOptions
Configuration options for the metric

ContextPrecisionMetricOptions

scale?: number = 1
Maximum score value

context: string[]
Array of context pieces in their retrieval order
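
For example, to report scores on a 0-100 scale instead of the default 0-1, pass scale together with the required context array:

const metric = new ContextPrecisionMetric(model, {
  context: ["..."], // your retrieved context pieces
  scale: 100, // result.score will range from 0 to 100
});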

measure() Parameters

input: string
The original query or prompt

output: string
The generated response to evaluate

Returns

score: number
Precision score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

info.reason: string
Detailed explanation of the score

Scoring Details

The metric evaluates context precision through binary relevance assessment and Mean Average Precision (MAP) scoring.

Scoring Process

1. Assigns binary relevance scores:
   • Relevant context: 1
   • Irrelevant context: 0

2. Calculates Mean Average Precision (MAP):
   • Computes precision at each relevant position
   • Weights earlier positions more heavily
   • Normalizes the result to the configured scale

Final score: MAP × scale (see the sketch below)
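
The MAP step can be sketched in a few lines of TypeScript. This is a simplified reconstruction for intuition only; in the real metric the binary verdicts come from the judge model rather than being supplied by hand.

// Sketch: Mean Average Precision over binary relevance verdicts.
// verdicts[i] is 1 if context piece i was judged relevant, else 0.
function meanAveragePrecision(verdicts: number[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;
  verdicts.forEach((verdict, position) => {
    if (verdict === 1) {
      relevantSoFar += 1;
      // Precision at this position: relevant pieces seen / positions seen.
      precisionSum += relevantSoFar / (position + 1);
    }
  });
  return relevantSoFar === 0 ? 0 : (precisionSum / relevantSoFar) * scale;
}

// A relevant piece at position 0 contributes a full 1.0;
// the same piece alone at position 2 would contribute only 1/3.
console.log(meanAveragePrecision([1, 0, 1])); // (1/1 + 2/3) / 2 ≈ 0.83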

Score interpretation

(0 to scale, default 0-1)

  • 1.0: All relevant context in optimal order
  • 0.7-0.9: Mostly relevant context with good ordering
  • 0.4-0.6: Mixed relevance or suboptimal ordering
  • 0.1-0.3: Limited relevance or poor ordering
  • 0.0: No relevant context

Example with Analysis

import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Exercise strengthens the heart and improves blood circulation.",
    "A balanced diet is important for health.",
    "Regular physical activity reduces stress and anxiety.",
    "Exercise equipment can be expensive.",
  ],
});

const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health and mental wellbeing.",
);

// Example output:
// {
//   score: 0.75,
//   info: {
//     reason: "The score is 0.75 because the first and third contexts are highly relevant
//           to the benefits mentioned in the output, while the second and fourth contexts
//           are not directly related to exercise benefits. The relevant contexts are well-positioned
//           at the beginning and middle of the sequence."
//   }
// }