ContextRelevancyMetric
The ContextRelevancyMetric class evaluates the quality of your RAG (Retrieval-Augmented Generation) pipeline’s retriever by measuring how relevant the retrieved context is to the input query. It uses an LLM-based evaluation system that first extracts statements from the context and then assesses their relevance to the input.
Basic Usage
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "The platform supports multiple languages",
    "Our offices are located in San Francisco",
  ],
});

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the relevancy assessment
```
Constructor Parameters

- model (LanguageModel): Configuration for the model used to evaluate context relevancy
- options (ContextRelevancyMetricOptions): Configuration options for the metric

ContextRelevancyMetricOptions

- scale (number, optional, default: 1): Maximum score value
- context (string[]): Array of retrieved context documents used to generate the response
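As a quick orientation, the constructor and its options roughly correspond to the shape below. This is an illustrative sketch based on the tables above, not the actual type declarations shipped in @mastra/evals, so names and details may differ.

```typescript
// Illustrative sketch only; the real declarations in @mastra/evals may differ.
interface ContextRelevancyMetricOptions {
  scale?: number; // maximum score value, defaults to 1
  context: string[]; // retrieved context documents used to generate the response
}

// Usage: new ContextRelevancyMetric(model, options), where `model` is a
// LanguageModel such as openai("gpt-4o-mini") from @ai-sdk/openai.
```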
measure() Parameters

- input (string): The original query or prompt
- output (string): The LLM's response to evaluate

Returns

- score (number): Context relevancy score (0 to scale, default 0-1)
- info (object): Object containing the reason for the score
  - reason (string): Detailed explanation of the relevancy assessment
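Put together, measure() takes the input and output strings and resolves to an object with the fields documented above. The sketch below illustrates that shape; treat it as a summary of this page rather than the library's exact return type.

```typescript
// Sketch of the result shape documented above; field names follow this page.
interface ContextRelevancyResult {
  score: number; // 0 to `scale` (default 0-1)
  info: {
    reason: string; // explanation of the relevancy assessment
  };
}

// e.g. const { score, info } = await metric.measure(input, output);
```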
Scoring Details
The metric evaluates how well the retrieved context matches the query through binary relevance classification of each extracted statement.
Scoring Process

1. Extracts statements from context:
   - Breaks down context into meaningful units
   - Preserves semantic relationships
2. Evaluates statement relevance:
   - Assesses each statement against the query
   - Counts relevant statements
   - Calculates the relevance ratio

Final score: (relevant_statements / total_statements) * scale
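The arithmetic of the final step is straightforward. The sketch below assumes a list of per-statement relevance verdicts produced by the evaluator LLM; it is an illustration of the formula, not the library's internal code.

```typescript
// Illustration of the scoring formula above, not the library's internals.
// `verdicts` stands in for the evaluator LLM's per-statement judgments.
function computeRelevancyScore(
  verdicts: { statement: string; relevant: boolean }[],
  scale = 1,
): number {
  if (verdicts.length === 0) return 0; // assumption: no statements yields a score of 0
  const relevantCount = verdicts.filter((v) => v.relevant).length;
  return (relevantCount / verdicts.length) * scale;
}

// Example: 3 relevant statements out of 5 with scale 100 → 60
```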
Score interpretation
(0 to scale, default 0-1)
- 1.0: Perfect relevancy - all retrieved context is relevant
- 0.7-0.9: High relevancy - most context is relevant with few irrelevant pieces
- 0.4-0.6: Moderate relevancy - a mix of relevant and irrelevant context
- 0.1-0.3: Low relevancy - mostly irrelevant context
- 0.0: No relevancy - completely irrelevant context
Example with Custom Configuration
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  scale: 100, // Use 0-100 scale instead of 0-1
  context: [
    "Basic plan costs $10/month",
    "Pro plan includes advanced features at $30/month",
    "Enterprise plan has custom pricing",
    "Our company was founded in 2020",
    "We have offices worldwide",
  ],
});

const result = await metric.measure(
  "What are our pricing plans?",
  "We offer Basic, Pro, and Enterprise plans.",
);

// Example output:
// {
//   score: 60,
//   info: {
//     reason: "3 out of 5 statements are relevant to pricing plans. The statements about
//       company founding and office locations are not relevant to the pricing query."
//   }
// }
```