AnswerRelevancyMetric

This documentation refers to the legacy evals API. For the latest scorer features, see Scorers.

The AnswerRelevancyMetric class evaluates how well an LLM's output answers or addresses the input query. It uses a judge-based system to determine relevancy and provides detailed scoring and reasoning.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new AnswerRelevancyMetric(model, {
  uncertaintyWeight: 0.3,
  scale: 1,
});

const result = await metric.measure(
  "What is the capital of France?",
  "Paris is the capital of France.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score

Constructor Parameters

model: LanguageModel
Configuration for the model used to evaluate relevancy

options?: AnswerRelevancyMetricOptions = { uncertaintyWeight: 0.3, scale: 1 }
Configuration options for the metric

AnswerRelevancyMetricOptions

uncertaintyWeight?: number = 0.3
Weight given to 'unsure' verdicts in scoring (0-1)

scale?: number = 1
Maximum score value

measure() Parameters

input: string
The original query or prompt

output: string
The LLM's response to evaluate

Returns

score: number
Relevancy score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

info.reason: string
Explanation of the score

Scoring Details

The metric evaluates relevancy through query-answer alignment, considering completeness, accuracy, and detail level.

Scoring Process

  1. Statement Analysis:

    • Breaks output into meaningful statements while preserving context
    • Evaluates each statement against query requirements

  2. Evaluates relevance of each statement:

    • "yes": Full weight for direct matches
    • "unsure": Partial weight (default: 0.3) for approximate matches
    • "no": Zero weight for irrelevant content

Final score: ((direct + uncertainty * partial) / total_statements) * scale, where direct is the number of "yes" verdicts, partial is the number of "unsure" verdicts, and uncertainty is the configured uncertaintyWeight.
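
The sketch below shows how this formula combines verdicts into a score. It is an illustration only: the Verdict type and computeRelevancyScore helper are assumptions made for this example, not the metric's internal API.

// Illustrative sketch of the scoring formula, not the metric's internal implementation
type Verdict = "yes" | "unsure" | "no";

function computeRelevancyScore(
  verdicts: Verdict[],
  uncertaintyWeight = 0.3,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0;
  const direct = verdicts.filter((v) => v === "yes").length; // full weight
  const partial = verdicts.filter((v) => v === "unsure").length; // partial weight
  return ((direct + uncertaintyWeight * partial) / verdicts.length) * scale;
}

// Three statements: two direct matches and one approximate match
computeRelevancyScore(["yes", "yes", "unsure"]); // (2 + 0.3 * 1) / 3 ≈ 0.77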

Score interpretation

(0 to scale, default 0-1)

  • 1.0: Perfect relevance - complete and accurate
  • 0.7-0.9: High relevance - minor gaps or imprecisions
  • 0.4-0.6: Moderate relevance - significant gaps
  • 0.1-0.3: Low relevance - major issues
  • 0.0: No relevance - incorrect or off-topic
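
If you want to map the numeric score back to these bands in code, a small helper such as the one below can be used. It is a convenience sketch that assumes the default 0-1 scale and is not part of @mastra/evals.

// Hypothetical helper: buckets a 0-1 relevancy score into the bands above
function interpretRelevancy(score: number): string {
  if (score >= 1.0) return "Perfect relevance";
  if (score >= 0.7) return "High relevance";
  if (score >= 0.4) return "Moderate relevance";
  if (score >= 0.1) return "Low relevance";
  return "No relevance";
}

interpretRelevancy(0.85); // "High relevance"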

Example with Custom Configuration

import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new AnswerRelevancyMetric(model, {
  uncertaintyWeight: 0.5, // Higher weight for uncertain verdicts
  scale: 5, // Use 0-5 scale instead of 0-1
});

const result = await metric.measure(
  "What are the benefits of exercise?",
  "Regular exercise improves cardiovascular health, builds strength, and boosts mental wellbeing.",
);

// Example output:
// {
//   score: 4.5,
//   info: {
//     reason: "The score is 4.5 out of 5 because the response directly addresses the query
//       with specific, accurate benefits of exercise. It covers multiple aspects
//       (cardiovascular, muscular, and mental health) in a clear and concise manner.
//       The answer is highly relevant and provides appropriate detail without
//       including unnecessary information."
//   }
// }