
ContextPrecisionMetric

The ContextPrecisionMetric class evaluates how relevant and precise the retrieved context nodes are for generating the expected output. It uses a judge-based system to analyze each context piece’s contribution and provides weighted scoring based on position.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Photosynthesis is a biological process used by plants to create energy from sunlight.",
    "Plants need water and nutrients from the soil to grow.",
    "The process of photosynthesis produces oxygen as a byproduct.",
  ],
});

const result = await metric.measure(
  "What is photosynthesis?",
  "Photosynthesis is the process by which plants convert sunlight into energy.",
);

console.log(result.score); // Precision score from 0-1
console.log(result.info.reason); // Explanation of the score

Constructor Parameters

model: LanguageModel
The model used to evaluate context relevance

options: ContextPrecisionMetricOptions
Configuration options for the metric

ContextPrecisionMetricOptions

scale?: number = 1
Maximum score value

context: string[]
Array of context pieces in their retrieval order
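For example, to report scores on a 0-100 range instead of the default 0-1, pass a custom scale (a minimal sketch; the context strings are placeholders):

const metric = new ContextPrecisionMetric(model, {
  context: [
    "First retrieved passage...",
    "Second retrieved passage...",
  ],
  scale: 100, // scores will range from 0 to 100 instead of 0 to 1
});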

measure() Parameters

input: string
The original query or prompt

output: string
The generated response to evaluate

Returns

score: number
Precision score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

reason: string
Detailed explanation of the score
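A minimal sketch of consuming the returned object; query and response are placeholder variables, and the 0.5 threshold is an arbitrary choice for illustration:

const result = await metric.measure(query, response);
if (result.score < 0.5) {
  // Surface the judge's explanation when precision is low
  console.warn(`Low context precision: ${result.info.reason}`);
}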

Scoring Details

The metric evaluates context precision through binary relevance assessment and Mean Average Precision (MAP) scoring.

Scoring Process

  1. Assigns binary relevance scores:

    • Relevant context: 1
    • Irrelevant context: 0
  2. Calculates Mean Average Precision:

    • Computes precision at each position
    • Weights earlier positions more heavily
    • Normalizes to configured scale

Final score: Mean Average Precision * scale
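The sketch below illustrates the standard binary-relevance Average Precision arithmetic described above. It is illustrative only: the library's internal implementation and judge prompts may differ, so exact scores can vary.

// verdicts[i] is 1 if the judge marked context piece i relevant, else 0
function averagePrecision(verdicts: number[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;

  verdicts.forEach((verdict, i) => {
    if (verdict === 1) {
      relevantSoFar += 1;
      // Precision at this position: relevant pieces seen / pieces seen.
      // Relevant pieces appearing earlier contribute larger terms, which
      // is how earlier positions are weighted more heavily.
      precisionSum += relevantSoFar / (i + 1);
    }
  });

  return relevantSoFar === 0 ? 0 : (precisionSum / relevantSoFar) * scale;
}

// [1, 0, 1, 0]: precision@1 = 1/1, precision@3 = 2/3
// → (1 + 2/3) / 2 ≈ 0.83 on the default 0-1 scale
console.log(averagePrecision([1, 0, 1, 0]));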

Score interpretation

(0 to scale, default 0-1)

  • 1.0: All relevant context in optimal order
  • 0.7-0.9: Mostly relevant context with good ordering
  • 0.4-0.6: Mixed relevance or suboptimal ordering
  • 0.1-0.3: Limited relevance or poor ordering
  • 0.0: No relevant context

Example with Analysis

import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextPrecisionMetric(model, {
  context: [
    "Exercise strengthens the heart and improves blood circulation.",
    "A balanced diet is important for health.",
    "Regular physical activity reduces stress and anxiety.",
    "Exercise equipment can be expensive.",
  ],
});

const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health and mental wellbeing.",
);

// Example output:
// {
//   score: 0.75,
//   info: {
//     reason: "The score is 0.75 because the first and third contexts are highly relevant
//       to the benefits mentioned in the output, while the second and fourth contexts
//       are not directly related to exercise benefits. The relevant contexts are well-positioned
//       at the beginning and middle of the sequence."
//   }
// }