
ContextRelevancyMetric

New Scorer API

We just released a new evals API called Scorers, with a more ergonomic API, more metadata stored for error analysis, and more flexibility in the data structures it can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.

The ContextRelevancyMetric class evaluates the quality of your RAG (Retrieval-Augmented Generation) pipeline’s retriever by measuring how relevant the retrieved context is to the input query. It uses an LLM-based evaluation system that first extracts statements from the context and then assesses their relevance to the input.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "The platform supports multiple languages",
    "Our offices are located in San Francisco",
  ],
});

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the relevancy assessment

Constructor Parameters

model: LanguageModel
    Configuration for the model used to evaluate context relevancy

options: ContextRelevancyMetricOptions
    Configuration options for the metric

ContextRelevancyMetricOptions

scale?: number = 1
    Maximum score value

context: string[]
    Array of retrieved context documents used to generate the response
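
Taken together, the options correspond roughly to the following TypeScript shape (an illustrative sketch, not the exact type declaration exported by @mastra/evals):

interface ContextRelevancyMetricOptions {
  // Maximum score value; defaults to 1
  scale?: number;
  // Retrieved context documents used to generate the response
  context: string[];
}

// Example: evaluate on a 0-100 scale with three retrieved chunks
const options: ContextRelevancyMetricOptions = {
  scale: 100,
  context: [
    "All data is encrypted at rest and in transit",
    "Two-factor authentication is mandatory",
    "Our offices are located in San Francisco",
  ],
};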

measure() Parameters

input: string
    The original query or prompt

output: string
    The LLM's response to evaluate

Returns

score: number
    Context relevancy score (0 to scale, default 0-1)

info: object
    Object containing the reason for the score

    reason: string
        Detailed explanation of the relevancy assessment
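
In practice, the return value can be consumed as a plain object (a minimal sketch; the threshold and log message are illustrative, not part of the library):

const result = await metric.measure(
  "What are our product's security features?",
  "Our product uses encryption and requires 2FA.",
);

// result has the shape { score: number, info: { reason: string } }
if (result.score < 0.5) {
  console.warn(`Low context relevancy: ${result.info.reason}`);
}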

Scoring Details

The metric evaluates how well retrieved context matches the query through binary relevance classification.

Scoring Process

  1. Extracts statements from context:

    • Breaks down context into meaningful units
    • Preserves semantic relationships
  2. Evaluates statement relevance:

    • Assesses each statement against query
    • Counts relevant statements
    • Calculates relevance ratio

Final score: (relevant_statements / total_statements) * scale
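
As a concrete illustration of this formula, the arithmetic can be sketched as a small helper (hypothetical, not part of @mastra/evals; it only reproduces the ratio above):

// One binary relevance verdict per extracted statement
function computeRelevancyScore(verdicts: boolean[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const relevant = verdicts.filter(Boolean).length;
  return (relevant / verdicts.length) * scale;
}

// 3 of 5 statements judged relevant on a 0-100 scale yields 60
console.log(computeRelevancyScore([true, true, true, false, false], 100));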

Score interpretation

(0 to scale, default 0-1)

  • 1.0: Perfect relevancy - all retrieved context is relevant
  • 0.7-0.9: High relevancy - most context is relevant with few irrelevant pieces
  • 0.4-0.6: Moderate relevancy - a mix of relevant and irrelevant context
  • 0.1-0.3: Low relevancy - mostly irrelevant context
  • 0.0: No relevancy - completely irrelevant context
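
If you want to map the numeric score back to one of these qualitative bands programmatically, a simple thresholding helper works (hypothetical sketch; the cutoffs between bands are a judgment call based on the ranges above):

function interpretRelevancy(score: number, scale = 1): string {
  const normalized = score / scale; // normalize to 0-1 regardless of configured scale
  if (normalized >= 1) return "Perfect relevancy";
  if (normalized >= 0.7) return "High relevancy";
  if (normalized >= 0.4) return "Moderate relevancy";
  if (normalized > 0) return "Low relevancy";
  return "No relevancy";
}

console.log(interpretRelevancy(60, 100)); // "Moderate relevancy"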

Example with Custom Configuration

import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new ContextRelevancyMetric(model, {
  scale: 100, // Use 0-100 scale instead of 0-1
  context: [
    "Basic plan costs $10/month",
    "Pro plan includes advanced features at $30/month",
    "Enterprise plan has custom pricing",
    "Our company was founded in 2020",
    "We have offices worldwide",
  ],
});

const result = await metric.measure(
  "What are our pricing plans?",
  "We offer Basic, Pro, and Enterprise plans.",
);

// Example output:
// {
//   score: 60,
//   info: {
//     reason: "3 out of 5 statements are relevant to pricing plans. The statements about
//              company founding and office locations are not relevant to the pricing query."
//   }
// }