ContextPrecisionMetric

The ContextPrecisionMetric class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to analyze each context piece's contribution and provides weighted scoring based on position.
Basic Usage
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score
```
Constructor Parameters

- `model` (`LanguageModel`): Configuration for the model used to evaluate context relevance.
- `options` (`ContextPrecisionMetricOptions`): Configuration options for the metric.
ContextPrecisionMetricOptions
scale?:
number
= 1
Maximum score value
context:
string[]
Array of context pieces in their retrieval order
measure() Parameters

- `input` (`string`): The original query or prompt.
- `output` (`string`): The generated response to evaluate.
Returns
score:
number
Precision score (0 to scale, default 0-1)
info:
object
Object containing the reason for the score
string
reason:
string
Detailed explanation of the score
Scoring Details
The metric evaluates context precision through binary relevance assessment and Mean Average Precision (MAP) scoring.
Scoring Process

- Assigns binary relevance scores:
  - Relevant context: 1
  - Irrelevant context: 0
- Calculates Mean Average Precision (as sketched below):
  - Computes precision at each position
  - Weights earlier positions more heavily
  - Normalizes to configured scale

Final score: `Mean Average Precision * scale`
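To make the weighting concrete, here is a minimal sketch of standard Mean Average Precision computed over binary relevance verdicts. This illustrates the general technique, not the library's internal implementation; the judge's verdicts and exact weighting may differ.

```typescript
// Standard Average Precision over binary relevance verdicts.
// verdicts[i] is 1 if context piece i was judged relevant, else 0.
function meanAveragePrecision(verdicts: number[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;

  verdicts.forEach((verdict, index) => {
    if (verdict === 1) {
      relevantSoFar += 1;
      // Precision at this position: relevant pieces seen so far
      // divided by total pieces seen so far. Relevant pieces at
      // earlier positions therefore contribute larger values.
      precisionSum += relevantSoFar / (index + 1);
    }
  });

  const totalRelevant = verdicts.filter((v) => v === 1).length;
  if (totalRelevant === 0) return 0; // no relevant context at all

  // Normalize to the configured scale (default 0-1).
  return (precisionSum / totalRelevant) * scale;
}

// Example: relevant pieces at positions 1 and 3 of four retrieved pieces.
console.log(meanAveragePrecision([1, 0, 1, 0])); // (1/1 + 2/3) / 2 ≈ 0.83
```

Note how moving a relevant piece earlier in the list raises the score: `[1, 1, 0, 0]` yields 1.0, while `[0, 0, 1, 1]` yields roughly 0.42 for the same set of relevant pieces.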
Score interpretation
(0 to scale, default 0-1)
- 1.0: All relevant context in optimal order
- 0.7-0.9: Mostly relevant context with good ordering
- 0.4-0.6: Mixed relevance or suboptimal ordering
- 0.1-0.3: Limited relevance or poor ordering
- 0.0: No relevant context
Example with Analysis
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Exercise strengthens the heart and improves blood circulation.",
    "A balanced diet is important for health.",
    "Regular physical activity reduces stress and anxiety.",
    "Exercise equipment can be expensive.",
  ],
});

const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health and mental wellbeing.",
);

// Example output:
// {
//   score: 0.75,
//   info: {
//     reason: "The score is 0.75 because the first and third contexts are highly relevant
//       to the benefits mentioned in the output, while the second and fourth contexts
//       are not directly related to exercise benefits. The relevant contexts are well-positioned
//       at the beginning and middle of the sequence."
//   }
// }
```