ContextRelevancyMetric
This documentation refers to the legacy evals API. For the latest scorer features, see Scorers.
The ContextRelevancyMetric class evaluates the quality of your RAG (Retrieval-Augmented Generation) pipeline's retriever by measuring how relevant the retrieved context is to the input query. It uses an LLM-based evaluation system that first extracts statements from the context and then assesses their relevance to the input.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new ContextRelevancyMetric(model, {
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "The platform supports multiple languages",
    "Our offices are located in San Francisco",
  ],
});

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);
console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the relevancy assessment
Constructor Parameters
- model (LanguageModel): Configuration for the model used to evaluate context relevancy
- options (ContextRelevancyMetricOptions): Configuration options for the metric
ContextRelevancyMetricOptions
- scale? (number, default: 1): Maximum score value
- context (string[]): Array of retrieved context documents used to generate the response
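For reference, the options object corresponds roughly to the following shape (a sketch for illustration; consult the package's own type declarations for the authoritative definition):

// Approximate shape of ContextRelevancyMetricOptions (sketch, not the library's exact declaration)
interface ContextRelevancyMetricOptions {
  // Array of retrieved context documents used to generate the response
  context: string[];
  // Maximum score value (defaults to 1)
  scale?: number;
}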
measure() Parameters
- input (string): The original query or prompt
- output (string): The LLM's response to evaluate
Returns
- score (number): Context relevancy score (0 to scale, default 0-1)
- info (object): Object containing the reason for the score
  - reason (string): Detailed explanation of the relevancy assessment
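The promise returned by measure() resolves to an object of roughly the following shape (a sketch for illustration, not the library's exact declaration):

// Approximate shape of the measure() result (sketch)
interface ContextRelevancyResult {
  // Context relevancy score, from 0 up to the configured scale
  score: number;
  info: {
    // Detailed explanation of the relevancy assessment
    reason: string;
  };
}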
Scoring Details
The metric evaluates how well retrieved context matches the query through binary relevance classification.
Scoring Process
1. Extracts statements from context:
   - Breaks down context into meaningful units
   - Preserves semantic relationships
2. Evaluates statement relevance:
   - Assesses each statement against the query
   - Counts relevant statements
   - Calculates the relevance ratio

Final score: (relevant_statements / total_statements) * scale
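For illustration, here is a minimal sketch of that ratio calculation in TypeScript, assuming the LLM judge has already labeled each extracted statement as relevant or not (the Verdict type and computeRelevancyScore function are illustrative, not part of @mastra/evals):

// Sketch only: relevance verdicts are hardcoded here; the real metric
// obtains them from the LLM-based evaluation.
type Verdict = { statement: string; relevant: boolean };

function computeRelevancyScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const relevantCount = verdicts.filter((v) => v.relevant).length;
  return (relevantCount / verdicts.length) * scale;
}

// Example: 3 of 5 statements relevant -> 0.6 on the default 0-1 scale
const score = computeRelevancyScore([
  { statement: "Basic plan costs $10/month", relevant: true },
  { statement: "Pro plan includes advanced features at $30/month", relevant: true },
  { statement: "Enterprise plan has custom pricing", relevant: true },
  { statement: "Our company was founded in 2020", relevant: false },
  { statement: "We have offices worldwide", relevant: false },
]);
console.log(score); // 0.6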
Score interpretation
(0 to scale, default 0-1)
- 1.0: Perfect relevancy - all retrieved context is relevant
- 0.7-0.9: High relevancy - most context is relevant with few irrelevant pieces
- 0.4-0.6: Moderate relevancy - a mix of relevant and irrelevant context
- 0.1-0.3: Low relevancy - mostly irrelevant context
- 0.0: No relevancy - completely irrelevant context
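If you want to map a numeric score to one of the bands above in code, a small helper like this works (a hypothetical utility, not part of @mastra/evals):

// Hypothetical helper mapping a score to the interpretation bands above.
function interpretRelevancy(score: number, scale = 1): string {
  const normalized = score / scale;
  if (normalized >= 1) return "Perfect relevancy";
  if (normalized >= 0.7) return "High relevancy";
  if (normalized >= 0.4) return "Moderate relevancy";
  if (normalized >= 0.1) return "Low relevancy";
  return "No relevancy";
}

console.log(interpretRelevancy(60, 100)); // "Moderate relevancy"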
Example with Custom Configuration
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new ContextRelevancyMetric(model, {
  scale: 100, // Use 0-100 scale instead of 0-1
  context: [
    "Basic plan costs $10/month",
    "Pro plan includes advanced features at $30/month",
    "Enterprise plan has custom pricing",
    "Our company was founded in 2020",
    "We have offices worldwide",
  ],
});

const result = await metric.measure(
  "What are our pricing plans?",
  "We offer Basic, Pro, and Enterprise plans.",
);

// Example output:
// {
//   score: 60,
//   info: {
//     reason: "3 out of 5 statements are relevant to pricing plans. The statements about
//       company founding and office locations are not relevant to the pricing query."
//   }
// }