Context Relevancy Evaluation
We recently released a new evals API called Scorers. It offers a more ergonomic interface, stores more metadata for error analysis, and is more flexible about the data structures it can evaluate. Migration is straightforward, and we will continue to support the existing Evals API.
Use ContextRelevancyMetric to evaluate how well the retrieved context aligns with the original query. The metric accepts a query and a response, and returns a score and an info object containing a reason.
Installation
npm install @mastra/evals
High relevancy example
In this example, the response uses only context that is directly relevant to the query. Every context item supports the answer, resulting in a perfect relevancy score.
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

const metric = new ContextRelevancyMetric(openai("gpt-4o-mini"), {
  context: [
    "Einstein won the Nobel Prize for his discovery of the photoelectric effect.",
    "He published his theory of relativity in 1905.",
    "His work revolutionized modern physics."
  ]
});

const query = "What were some of Einstein's achievements?";
const response =
  "Einstein won the Nobel Prize for discovering the photoelectric effect and published his groundbreaking theory of relativity.";

const result = await metric.measure(query, response);

console.log(result);
High relevancy output
The output receives a perfect score because all context statements directly contribute to answering the query without including any unrelated information.
{
  score: 1,
  info: {
    reason: "The score is 1 because the retrieval context directly addresses the input by highlighting Einstein's significant achievements, making it entirely relevant."
  }
}
Mixed relevancy example
In this example, the context mixes items that directly explain the answer with items that are unrelated or only loosely related to the query. The off-topic items lower the overall relevancy score.
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

const metric = new ContextRelevancyMetric(openai("gpt-4o-mini"), {
  context: [
    "Solar eclipses occur when the Moon blocks the Sun.",
    "The Moon moves between the Earth and Sun during eclipses.",
    "The Moon is visible at night.",
    "The Moon has no atmosphere."
  ]
});

const query = "What causes solar eclipses?";
const response =
  "Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.";

const result = await metric.measure(query, response);

console.log(result);
Mixed relevancy output
The output receives a mid-range score because the context contains relevant statements about the eclipse mechanism alongside unrelated facts that dilute the overall relevancy.
{
  score: 0.5,
  info: {
    reason: "The score is 0.5 because the retrieval context contains statements that are irrelevant to the input, such as 'The Moon is visible at night' and 'The Moon has no atmosphere', which do not explain the causes of solar eclipses. This lack of relevant information significantly lowers the contextual relevancy score."
  }
}
Low relevancy example
In this example, most of the context is unrelated to the query. Only one item is relevant, which results in a low relevancy score.
import { openai } from "@ai-sdk/openai";
import { ContextRelevancyMetric } from "@mastra/evals/llm";

const metric = new ContextRelevancyMetric(openai("gpt-4o-mini"), {
  context: [
    "The Great Barrier Reef is in Australia.",
    "Coral reefs need warm water to survive.",
    "Marine life depends on coral reefs.",
    "The capital of Australia is Canberra."
  ]
});

const query = "What is the capital of Australia?";
const response = "The capital of Australia is Canberra.";

const result = await metric.measure(query, response);

console.log(result);
Low relevancy output
The output receives a low score because only one context item is relevant to the query. The remaining items introduce unrelated information that does not support the response.
{
  score: 0.25,
  info: {
    reason: "The score is 0.25 because the retrieval context contains statements that are completely irrelevant to the input question about the capital of Australia. For instance, 'The Great Barrier Reef is in Australia' and 'Coral reefs need warm water to survive' do not provide any geographical or political information related to the capital, thus failing to address the inquiry."
  }
}
Metric configuration
You can create a ContextRelevancyMetric instance by providing a context array representing background information relevant to a query. You can also configure optional parameters such as scale to define the scoring range.
const metric = new ContextRelevancyMetric(openai("gpt-4o-mini"), {
  context: [""],
  scale: 1
});
See ContextRelevancyMetric for a full list of configuration options.
Understanding the results
ContextRelevancyMetric returns a result in the following shape:
{
  score: number,
  info: {
    reason: string
  }
}
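Because measure is asynchronous and returns this shape, the result can be consumed directly in evaluation scripts or tests. Below is a minimal sketch that reuses the metric, query, and response from the examples above; the 0.7 threshold is an arbitrary value chosen for illustration, not a library default:

const MIN_RELEVANCY = 0.7; // hypothetical quality bar, not part of @mastra/evals

const result = await metric.measure(query, response);

if (result.score < MIN_RELEVANCY) {
  // Surface the judge's explanation so the retrieval step can be debugged.
  console.warn(`Low context relevancy (${result.score}): ${result.info.reason}`);
}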
Relevancy score
A relevancy score between 0 and 1:
- 1.0: Perfect relevancy – all context directly relevant to query.
- 0.7–0.9: High relevancy – most context relevant to query.
- 0.4–0.6: Mixed relevancy – some context relevant to query.
- 0.1–0.3: Low relevancy – little context relevant to query.
- 0.0: No relevancy – no context relevant to query.
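If you want to report these bands alongside the raw score, a small helper can map one to the other. This is a sketch only; the band names and cutoffs simply mirror the list above and are not produced by the metric itself:

// Hypothetical helper that labels a score using the bands listed above.
function relevancyBand(score: number): "perfect" | "high" | "mixed" | "low" | "none" {
  if (score === 1.0) return "perfect";
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "mixed";
  if (score >= 0.1) return "low";
  return "none";
}

console.log(relevancyBand(0.5)); // "mixed", matching the solar eclipse example above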
Relevancy info
An explanation for the score, with details including:
- Relevance to input query.
- Statement extraction from context.
- Usefulness for response.
- Overall context quality.