ContextPrecisionMetric
We recently released a new evals API called Scorers, with a more ergonomic API, more metadata stored for error analysis, and more flexibility in the data structures you can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.
The `ContextPrecisionMetric` class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to analyze each context piece's contribution and provides weighted scoring based on position.
Basic Usage
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score
```
Constructor Parameters

- model (LanguageModel): The model used to judge the relevance of each context piece.
- options (ContextPrecisionMetricOptions): Configuration for the metric.

ContextPrecisionMetricOptions

- scale? (number): Maximum score value. Defaults to 1.
- context (string[]): Array of context pieces in their retrieval order.
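For instance, the optional `scale` setting can report scores on a wider range. A minimal sketch, assuming `scale` acts as a multiplier on the final 0-1 score (parameter shapes taken from the Basic Usage example):

```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Sketch: report precision on a 0-100 range instead of the default 0-1.
const metric = new ContextPrecisionMetric(openai("gpt-4o-mini"), {
  scale: 100,
  context: [
    "Photosynthesis converts sunlight into chemical energy in plants.",
  ],
});
```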
measure() Parameters

- input (string): The original query or prompt.
- output (string): The generated response to evaluate.

Returns

- score (number): Precision score from 0 to the configured scale (default 0-1).
- info (object):
  - reason (string): Explanation of the score.
Scoring Details
The metric evaluates context precision through binary relevance assessment and Mean Average Precision (MAP) scoring.
Scoring Process
1. Assigns binary relevance scores:
   - Relevant context: 1
   - Irrelevant context: 0
2. Calculates Mean Average Precision:
   - Computes precision at each position
   - Weights earlier positions more heavily
   - Normalizes to configured scale

Final score: `Mean Average Precision * scale`
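The weighting behavior above can be sketched as plain Mean Average Precision over binary relevance verdicts. This is an illustrative simplification, not the library's internal implementation; the actual metric derives the verdicts from an LLM judge:

```typescript
// Mean Average Precision over binary relevance verdicts
// (1 = relevant, 0 = irrelevant), ordered by retrieval position.
function meanAveragePrecision(verdicts: number[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;
  verdicts.forEach((v, i) => {
    if (v === 1) {
      relevantSoFar += 1;
      // Precision at this position: relevant items seen / items seen.
      // Earlier positions divide by a smaller count, so they weigh more.
      precisionSum += relevantSoFar / (i + 1);
    }
  });
  const totalRelevant = relevantSoFar;
  return totalRelevant === 0 ? 0 : (precisionSum / totalRelevant) * scale;
}

// The same relevant items score higher when retrieved early:
console.log(meanAveragePrecision([1, 1, 0, 0])); // 1
console.log(meanAveragePrecision([0, 0, 1, 1])); // (1/3 + 2/4) / 2 ≈ 0.417
```

Note that ordering alone changes the score even when the set of relevant contexts is identical, which is why the interpretation bands below mention ordering as well as relevance.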
Score interpretation (0 to scale, default 0-1)
- 1.0: All relevant context in optimal order
- 0.7-0.9: Mostly relevant context with good ordering
- 0.4-0.6: Mixed relevance or suboptimal ordering
- 0.1-0.3: Limited relevance or poor ordering
- 0.0: No relevant context
Example with Analysis
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new ContextPrecisionMetric(model, {
context: [
"Exercise strengthens the heart and improves blood circulation.",
"A balanced diet is important for health.",
"Regular physical activity reduces stress and anxiety.",
"Exercise equipment can be expensive.",
],
});
const result = await metric.measure(
"What are the benefits of exercise?",
"Regular exercise improves cardiovascular health and mental wellbeing.",
);
// Example output:
// {
// score: 0.75,
// info: {
// reason: "The score is 0.75 because the first and third contexts are highly relevant
// to the benefits mentioned in the output, while the second and fourth contexts
// are not directly related to exercise benefits. The relevant contexts are well-positioned
// at the beginning and middle of the sequence."
// }
// }