ContextRelevancyMetric

The ContextRelevancyMetric class evaluates the quality of your RAG (Retrieval-Augmented Generation) pipeline’s retriever by measuring how relevant the retrieved context is to the input query. It uses an LLM-based evaluation system that first extracts statements from the context and then assesses their relevance to the input.

Basic Usage

```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "The platform supports multiple languages",
    "Our offices are located in San Francisco",
  ],
});

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the relevancy assessment
```

Constructor Parameters

  • model (LanguageModel): Configuration for the model used to evaluate context relevancy
  • options (ContextRelevancyMetricOptions): Configuration options for the metric

ContextRelevancyMetricOptions

  • scale? (number, default: 1): Maximum score value
  • context (string[]): Array of retrieved context documents used to generate the response
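
As a reference point, the options above correspond roughly to the following TypeScript shape. This is a sketch inferred from this page, not the type declaration exported by @mastra/evals/llm, and the interface name is hypothetical.

```typescript
// Sketch of the options shape, inferred from the parameters documented above.
// The actual exported type in @mastra/evals/llm may differ; the name is hypothetical.
interface ContextRelevancyMetricOptionsSketch {
  /** Maximum score value (defaults to 1) */
  scale?: number;
  /** Retrieved context documents used to generate the response */
  context: string[];
}

const options: ContextRelevancyMetricOptionsSketch = {
  context: ["All data is encrypted at rest and in transit"],
};
```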

measure() Parameters

  • input (string): The original query or prompt
  • output (string): The LLM's response to evaluate

Returns

  • score (number): Context relevancy score (0 to scale, default 0-1)
  • info (object): Object containing the reason for the score
    • reason (string): Detailed explanation of the relevancy assessment
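
Putting these fields together, the result of measure() can be sketched as the shape below. The interface name is hypothetical and the package's actual types may differ; it is shown only to make the nesting of info.reason explicit.

```typescript
// Sketch of the measure() result, inferred from the fields documented above.
// The interface name is hypothetical; the actual exported types may differ.
interface ContextRelevancyResultSketch {
  /** Context relevancy score, from 0 to the configured scale (default 0-1) */
  score: number;
  info: {
    /** Detailed explanation of the relevancy assessment */
    reason: string;
  };
}
```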

Scoring Details

The metric evaluates how well retrieved context matches the query through binary relevance classification.

Scoring Process

  1. Extracts statements from context:

    • Breaks down context into meaningful units
    • Preserves semantic relationships
  2. Evaluates statement relevance:

    • Assesses each statement against query
    • Counts relevant statements
    • Calculates relevance ratio

Final score: (relevant_statements / total_statements) * scale
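
As a worked example, if the evaluator extracts five statements from the context and judges three of them relevant to the query, the score is (3 / 5) * 1 = 0.6 on the default scale, or 60 with scale: 100. The counts below are hypothetical and only illustrate the arithmetic:

```typescript
// Hypothetical statement counts; the metric derives these internally via the LLM judge.
const relevantStatements = 3;
const totalStatements = 5;
const scale = 1;

// Final score: (relevant_statements / total_statements) * scale
const score = (relevantStatements / totalStatements) * scale;
console.log(score); // 0.6
```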

Score Interpretation

(0 to scale, default 0-1)

  • 1.0: Perfect relevancy - all retrieved context is relevant
  • 0.7-0.9: High relevancy - most context is relevant with few irrelevant pieces
  • 0.4-0.6: Moderate relevancy - a mix of relevant and irrelevant context
  • 0.1-0.3: Low relevancy - mostly irrelevant context
  • 0.0: No relevancy - completely irrelevant context
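
One common way to act on these ranges is to gate a RAG test on a minimum score. The snippet below reuses the metric from the Basic Usage example; it is an illustrative pattern, and the 0.7 threshold is an arbitrary choice rather than a value recommended by the library:

```typescript
// Illustrative quality gate; the 0.7 threshold is an arbitrary example value.
const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

if (result.score < 0.7) {
  throw new Error(`Low context relevancy (${result.score}): ${result.info.reason}`);
}
```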

Example with Custom Configuration

```typescript
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  scale: 100, // Use 0-100 scale instead of 0-1
  context: [
    "Basic plan costs $10/month",
    "Pro plan includes advanced features at $30/month",
    "Enterprise plan has custom pricing",
    "Our company was founded in 2020",
    "We have offices worldwide",
  ],
});

const result = await metric.measure(
  "What are our pricing plans?",
  "We offer Basic, Pro, and Enterprise plans.",
);

// Example output:
// {
//   score: 60,
//   info: {
//     reason: "3 out of 5 statements are relevant to pricing plans. The statements about
//       company founding and office locations are not relevant to the pricing query."
//   }
// }
```