# ContextPrecisionMetric

The `ContextPrecisionMetric` class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to analyze each context piece's contribution and provides weighted scoring based on position.
## Basic Usage

```typescript
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score
```
## Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | `ModelConfig` | Configuration for the model used to evaluate context relevance |
| `options` | `ContextPrecisionMetricOptions` | Configuration options for the metric |
### ContextPrecisionMetricOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `scale?` | `number` | `1` | Maximum score value |
| `context` | `string[]` | (required) | Array of context pieces in their retrieval order |
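For instance, a larger `scale` maps the normalized score onto that range. A brief sketch, reusing the `model` object from Basic Usage with the context array shortened for brevity:

```typescript
// With scale: 100, scores are reported from 0 to 100 instead of 0 to 1.
const percentMetric = new ContextPrecisionMetric(model, {
  scale: 100,
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
  ],
});
```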
## measure() Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `input` | `string` | The original query or prompt |
| `output` | `string` | The generated response to evaluate |
## Returns

| Field | Type | Description |
| --- | --- | --- |
| `score` | `number` | Precision score (0 to `scale`, default 0-1) |
| `info` | `object` | Object containing the reason for the score |
| `info.reason` | `string` | Detailed explanation of the score |
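Put together, the result has the following shape (a reference sketch based on the fields above; the library's exported type name may differ):

```typescript
interface ContextPrecisionResult {
  score: number; // 0 to scale (default 0-1)
  info: {
    reason: string; // the judge's explanation of the score
  };
}
```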
## Scoring Details
The metric evaluates context precision through:
- Individual assessment of each context piece’s relevance
- Position-weighted scoring (earlier positions weighted more heavily)
- Binary relevance verdicts (yes/no) with detailed reasoning
- Consideration of context ordering
The final score is calculated using Mean Average Precision (MAP), as the sketch after this list illustrates:
- Converts verdicts to binary scores (1 for relevant, 0 for not)
- Calculates precision at each position
- Weights earlier positions more heavily
- Normalizes to the configured scale (default 0-1)
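The following is a minimal sketch of that calculation, assuming standard MAP over the binary verdicts described above (an illustration of the steps, not the library's internal code):

```typescript
function meanAveragePrecision(verdicts: boolean[], scale = 1): number {
  let relevantSeen = 0;
  let precisionSum = 0;

  verdicts.forEach((isRelevant, position) => {
    if (isRelevant) {
      relevantSeen += 1;
      // Precision at this position: relevant pieces seen so far divided by
      // total pieces seen so far. Hits at earlier positions divide by a
      // smaller count, which is how MAP weights them more heavily.
      precisionSum += relevantSeen / (position + 1);
    }
  });

  const totalRelevant = verdicts.filter(Boolean).length;
  if (totalRelevant === 0) return 0; // no relevant context

  return (precisionSum / totalRelevant) * scale;
}

// Verdicts [yes, no, yes] -> (1/1 + 2/3) / 2 ≈ 0.83
console.log(meanAveragePrecision([true, false, true]));
```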
Score interpretation:
- 1.0: All relevant context in optimal order
- 0.7-0.9: Mostly relevant context with good ordering
- 0.4-0.6: Mixed relevance or suboptimal ordering
- 0.1-0.3: Limited relevance or poor ordering
- 0: No relevant context
## Example with Analysis

```typescript
const metric = new ContextPrecisionMetric(model, {
  context: [
    "Exercise strengthens the heart and improves blood circulation.",
    "A balanced diet is important for health.",
    "Regular physical activity reduces stress and anxiety.",
    "Exercise equipment can be expensive.",
  ],
});

const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health and mental wellbeing.",
);

// Example output:
// {
//   score: 0.75,
//   info: {
//     reason: "The score is 0.75 because the first and third contexts are highly
//       relevant to the benefits mentioned in the output, while the second and
//       fourth contexts are not directly related to exercise benefits. The relevant
//       contexts are well-positioned at the beginning and middle of the sequence."
//   }
// }
```