
ContextPrecisionMetric

The ContextPrecisionMetric class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to assess each context piece’s contribution and applies position-weighted scoring, so relevant context that appears earlier in the retrieval order counts more toward the final score.

Basic Usage

import { ContextPrecisionMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};
 
const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});
 
const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);
 
console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score

Constructor Parameters

model: ModelConfig
Configuration for the model used to evaluate context relevance.

options: ContextPrecisionMetricOptions
Configuration options for the metric.

ContextPrecisionMetricOptions

scale?: number = 1
Maximum score value.

context: string[]
Array of context pieces in their retrieval order.
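
For example, the optional scale setting changes the range of the returned score. A minimal sketch (reusing the model config from Basic Usage; the context strings are placeholders):

const scaledMetric = new ContextPrecisionMetric(model, {
  context: [
    "Relevant context piece one.",
    "Relevant context piece two.",
  ],
  scale: 100, // scores will be reported from 0-100 instead of 0-1
});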

measure() Parameters

input: string
The original query or prompt.

output: string
The generated response to evaluate.

Returns

score: number
Precision score (0 to scale, default 0-1).

info: object
Object containing the reason for the score.

info.reason: string
Detailed explanation of the score.
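
Sketched as a TypeScript shape (illustrative only; the package does not necessarily export a type with this name):

interface ContextPrecisionResult {
  score: number; // 0 to scale (default 0-1)
  info: {
    reason: string; // judge's explanation of the score
  };
}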

Scoring Details

The metric evaluates context precision through:

  • Individual assessment of each context piece’s relevance
  • Position-weighted scoring (earlier positions weighted more heavily)
  • Binary relevance verdicts (yes/no) with detailed reasoning (see the sketch after this list)
  • Consideration of context ordering
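
Conceptually, each context piece receives a per-piece judgment of roughly this shape (a hypothetical illustration, not a type exported by the package):

type Verdict = {
  verdict: "yes" | "no"; // binary relevance of this context piece
  reason: string; // the judge's reasoning for the verdict
};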

The final score is calculated using Mean Average Precision (MAP), illustrated in the sketch after these steps:

  1. Converts verdicts to binary scores (1 for relevant, 0 for not)
  2. Calculates precision at each position
  3. Weights earlier positions more heavily
  4. Normalizes to the configured scale (default 0-1)
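
A minimal sketch of this calculation (an illustrative helper, not Mastra's internal code; the library's exact weighting may differ):

function mapScore(verdicts: boolean[], scale = 1): number {
  const totalRelevant = verdicts.filter(Boolean).length;
  if (totalRelevant === 0) return 0; // no relevant context at all
  let relevantSeen = 0;
  let precisionSum = 0;
  verdicts.forEach((isRelevant, position) => {
    if (isRelevant) {
      relevantSeen++;
      // Precision at this position: relevant pieces so far / pieces seen so far.
      // Dividing by (position + 1) is what weights earlier positions more heavily.
      precisionSum += relevantSeen / (position + 1);
    }
  });
  return (precisionSum / totalRelevant) * scale; // normalize to the configured scale
}

For example, mapScore([true, true, false]) returns 1.0, while the same verdicts in a worse order, mapScore([false, true, true]), returns about 0.58.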

Score interpretation:

  • 1.0: All relevant context in optimal order
  • 0.7-0.9: Mostly relevant context with good ordering
  • 0.4-0.6: Mixed relevance or suboptimal ordering
  • 0.1-0.3: Limited relevance or poor ordering
  • 0: No relevant context

Example with Analysis

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Exercise strengthens the heart and improves blood circulation.",
    "A balanced diet is important for health.",
    "Regular physical activity reduces stress and anxiety.",
    "Exercise equipment can be expensive.",
  ],
});
 
const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health and mental wellbeing.",
);
 
// Example output:
// {
//   score: 0.75,
//   info: {
//     reason: "The score is 0.75 because the first and third contexts are highly relevant
//           to the benefits mentioned in the output, while the second and fourth contexts
//           are not directly related to exercise benefits. The relevant contexts are well-positioned
//           at the beginning and middle of the sequence."
//   }
// }