ContextRelevancyMetric
The ContextRelevancyMetric class evaluates the quality of your RAG (Retrieval-Augmented Generation) pipeline’s retriever by measuring how relevant the retrieved context is to the input query. It uses an LLM-based evaluation system that first extracts statements from the context and then assesses their relevance to the input.
Basic Usage
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "The platform supports multiple languages",
    "Our offices are located in San Francisco",
  ],
});

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the relevancy assessment
```
Constructor Parameters

- model (LanguageModel): Configuration for the model used to evaluate context relevancy
- options (ContextRelevancyMetricOptions): Configuration options for the metric

ContextRelevancyMetricOptions

- scale (number, optional, default: 1): Maximum score value
- context (string[]): Array of retrieved context documents used to generate the response
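As a quick orientation, the constructor and its options roughly correspond to the shape below. This is an illustrative sketch based on the tables above, not the actual type declarations shipped in @mastra/evals, so names and details may differ.

```typescript
// Illustrative sketch only; the real declarations in @mastra/evals may differ.
interface ContextRelevancyMetricOptions {
  scale?: number; // maximum score value, defaults to 1
  context: string[]; // retrieved context documents used to generate the response
}

// Usage: new ContextRelevancyMetric(model, options), where `model` is a
// LanguageModel such as openai("gpt-4o-mini") from @ai-sdk/openai.
```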
measure() Parameters

- input (string): The original query or prompt
- output (string): The LLM's response to evaluate

Returns

- score (number): Context relevancy score (0 to scale, default 0-1)
- info (object): Object containing the reason for the score
  - reason (string): Detailed explanation of the relevancy assessment
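Put together, measure() takes the input and output strings and resolves to an object with the fields documented above. The sketch below illustrates that shape; treat it as a summary of this page rather than the library's exact return type.

```typescript
// Sketch of the result shape documented above; field names follow this page.
interface ContextRelevancyResult {
  score: number; // 0 to `scale` (default 0-1)
  info: {
    reason: string; // explanation of the relevancy assessment
  };
}

// e.g. const { score, info } = await metric.measure(input, output);
```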
Scoring Details
The metric evaluates how well the retrieved context matches the query through binary relevance classification of each extracted statement.
Scoring Process

1. Extracts statements from context:
   - Breaks down context into meaningful units
   - Preserves semantic relationships
2. Evaluates statement relevance:
   - Assesses each statement against the query
   - Counts relevant statements
   - Calculates the relevance ratio

Final score: (relevant_statements / total_statements) * scale
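The arithmetic of the final step is straightforward. The sketch below assumes a list of per-statement relevance verdicts produced by the evaluator LLM; it is an illustration of the formula, not the library's internal code.

```typescript
// Illustration of the scoring formula above, not the library's internals.
// `verdicts` stands in for the evaluator LLM's per-statement judgments.
function computeRelevancyScore(
  verdicts: { statement: string; relevant: boolean }[],
  scale = 1,
): number {
  if (verdicts.length === 0) return 0; // assumption: no statements yields a score of 0
  const relevantCount = verdicts.filter((v) => v.relevant).length;
  return (relevantCount / verdicts.length) * scale;
}

// Example: 3 relevant statements out of 5 with scale 100 → 60
```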
Score interpretation
(0 to scale, default 0-1)
- 1.0: Perfect relevancy - all retrieved context is relevant
- 0.7-0.9: High relevancy - most context is relevant with few irrelevant pieces
- 0.4-0.6: Moderate relevancy - a mix of relevant and irrelevant context
- 0.1-0.3: Low relevancy - mostly irrelevant context
- 0.0: No relevancy - completely irrelevant context
Example with Custom Configuration
```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  scale: 100, // Use 0-100 scale instead of 0-1
  context: [
    "Basic plan costs $10/month",
    "Pro plan includes advanced features at $30/month",
    "Enterprise plan has custom pricing",
    "Our company was founded in 2020",
    "We have offices worldwide",
  ],
});

const result = await metric.measure(
  "What are our pricing plans?",
  "We offer Basic, Pro, and Enterprise plans.",
);

// Example output:
// {
//   score: 60,
//   info: {
//     reason: "3 out of 5 statements are relevant to pricing plans. The statements about
//       company founding and office locations are not relevant to the pricing query."
//   }
// }
```