ContextPrecisionMetric
We recently released a new evals API called Scorers, with a more ergonomic API, more metadata stored for error analysis, and more flexibility in the data structures you can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.
The `ContextPrecisionMetric` class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to analyze each context piece's contribution and provides weighted scoring based on position.
Basic Usage
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score
```
Constructor Parameters

- model (LanguageModel): The model used to judge the relevance of each context piece.
- options (ContextPrecisionMetricOptions): Configuration for the metric.

ContextPrecisionMetricOptions

- scale? (number): Maximum score value. Defaults to 1.
- context (string[]): Array of context pieces in their retrieval order.
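For instance, the optional `scale` setting can report scores on a wider range. A minimal sketch, assuming `scale` acts as a multiplier on the final 0-1 score (parameter shapes taken from the Basic Usage example):

```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Sketch: report precision on a 0-100 range instead of the default 0-1.
const metric = new ContextPrecisionMetric(openai("gpt-4o-mini"), {
  scale: 100,
  context: [
    "Photosynthesis converts sunlight into chemical energy in plants.",
  ],
});
```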
measure() Parameters

- input (string): The original query or prompt.
- output (string): The generated response to evaluate.

Returns

- score (number): Precision score from 0 to the configured scale (default 0-1).
- info (object):
  - reason (string): Explanation of the score.
Scoring Details
The metric evaluates context precision through binary relevance assessment and Mean Average Precision (MAP) scoring.
Scoring Process
1. Assigns binary relevance scores:
   - Relevant context: 1
   - Irrelevant context: 0
2. Calculates Mean Average Precision:
   - Computes precision at each position
   - Weights earlier positions more heavily
   - Normalizes to configured scale

Final score: `Mean Average Precision * scale`
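The weighting behavior above can be sketched as plain Mean Average Precision over binary relevance verdicts. This is an illustrative simplification, not the library's internal implementation; the actual metric derives the verdicts from an LLM judge:

```typescript
// Mean Average Precision over binary relevance verdicts
// (1 = relevant, 0 = irrelevant), ordered by retrieval position.
function meanAveragePrecision(verdicts: number[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;
  verdicts.forEach((v, i) => {
    if (v === 1) {
      relevantSoFar += 1;
      // Precision at this position: relevant items seen / items seen.
      // Earlier positions divide by a smaller count, so they weigh more.
      precisionSum += relevantSoFar / (i + 1);
    }
  });
  const totalRelevant = relevantSoFar;
  return totalRelevant === 0 ? 0 : (precisionSum / totalRelevant) * scale;
}

// The same relevant items score higher when retrieved early:
console.log(meanAveragePrecision([1, 1, 0, 0])); // 1
console.log(meanAveragePrecision([0, 0, 1, 1])); // (1/3 + 2/4) / 2 ≈ 0.417
```

Note that ordering alone changes the score even when the set of relevant contexts is identical, which is why the interpretation bands below mention ordering as well as relevance.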
Score interpretation (0 to scale, default 0-1)
- 1.0: All relevant context in optimal order
- 0.7-0.9: Mostly relevant context with good ordering
- 0.4-0.6: Mixed relevance or suboptimal ordering
- 0.1-0.3: Limited relevance or poor ordering
- 0.0: No relevant context
Example with Analysis
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new ContextPrecisionMetric(model, {
context: [
"Exercise strengthens the heart and improves blood circulation.",
"A balanced diet is important for health.",
"Regular physical activity reduces stress and anxiety.",
"Exercise equipment can be expensive.",
],
});
const result = await metric.measure(
"What are the benefits of exercise?",
"Regular exercise improves cardiovascular health and mental wellbeing.",
);
// Example output:
// {
// score: 0.75,
// info: {
// reason: "The score is 0.75 because the first and third contexts are highly relevant
// to the benefits mentioned in the output, while the second and fourth contexts
// are not directly related to exercise benefits. The relevant contexts are well-positioned
// at the beginning and middle of the sequence."
// }
// }