AnswerRelevancyMetric
New Scorer API
We just released a new evals API called Scorers, with a more ergonomic API, more metadata stored for error analysis, and more flexibility in the data structures you can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.
The AnswerRelevancyMetric class evaluates how well an LLM's output answers or addresses the input query. It uses a judge-based system to determine relevancy and provides detailed scoring and reasoning.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new AnswerRelevancyMetric(model, {
uncertaintyWeight: 0.3,
scale: 1,
});
const result = await metric.measure(
"What is the capital of France?",
"Paris is the capital of France.",
);
console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score
Constructor Parameters
model: LanguageModel
Configuration for the model used to evaluate relevancy.
options?: AnswerRelevancyMetricOptions = { uncertaintyWeight: 0.3, scale: 1 }
Configuration options for the metric.
AnswerRelevancyMetricOptions
uncertaintyWeight?: number = 0.3
Weight given to "unsure" verdicts in scoring (0-1).
scale?: number = 1
Maximum score value.
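Because both options have defaults, the options object can be omitted entirely; a minimal construction (assuming the defaults listed above) looks like this:
import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";
// Omitting options applies the defaults: { uncertaintyWeight: 0.3, scale: 1 }
const metric = new AnswerRelevancyMetric(openai("gpt-4o-mini"));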
measure() Parameters
input: string
The original query or prompt.
output: string
The LLM's response to evaluate.
Returns
score: number
Relevancy score (0 to scale, default 0-1).
info: object
Object containing the reason for the score.
info.reason: string
Explanation of the score.
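Taken together, the result has the shape sketched below. This is based only on the fields documented above; AnswerRelevancyResult is a hypothetical name for illustration, not a documented export.
// Sketch of the measure() result, from the fields above.
// "AnswerRelevancyResult" is a hypothetical name, not a library export.
interface AnswerRelevancyResult {
  score: number; // 0 to scale (default 0-1)
  info: {
    reason: string; // explanation of the score
  };
}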
Scoring Details
The metric evaluates relevancy through query-answer alignment, considering completeness, accuracy, and detail level.
Scoring Process
- Statement Analysis:
  - Breaks output into meaningful statements while preserving context
  - Evaluates each statement against query requirements
- Relevance Verdicts, one per statement:
  - "yes": Full weight for direct matches
  - "unsure": Partial weight (default: 0.3) for approximate matches
  - "no": Zero weight for irrelevant content
Final score: ((direct + uncertainty * partial) / total_statements) * scale
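To make the formula concrete, here is a worked example with hypothetical verdict counts (3 "yes", 1 "unsure", 1 "no" across 5 statements) at the default settings:
// Hypothetical verdict counts for a 5-statement output
const direct = 3;          // "yes" verdicts
const partial = 1;         // "unsure" verdicts
const totalStatements = 5; // includes 1 "no" verdict, which adds zero weight
const uncertaintyWeight = 0.3;
const scale = 1;
// ((direct + uncertainty * partial) / total_statements) * scale
const score = ((direct + uncertaintyWeight * partial) / totalStatements) * scale;
console.log(score); // 0.66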
Score Interpretation
(0 to scale, default 0-1)
- 1.0: Perfect relevance - complete and accurate
- 0.7-0.9: High relevance - minor gaps or imprecisions
- 0.4-0.6: Moderate relevance - significant gaps
- 0.1-0.3: Low relevance - major issues
- 0.0: No relevance - incorrect or off-topic
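In application code, these bands can drive a simple quality gate. The helper below is hypothetical (not part of @mastra/evals), and the band boundaries are approximated with thresholds:
// Hypothetical helper mapping a score to the interpretation bands above
function interpretRelevancy(score: number, scale = 1): string {
  const normalized = score / scale; // bands are defined on the 0-1 scale
  if (normalized >= 1.0) return "perfect relevance";
  if (normalized >= 0.7) return "high relevance";
  if (normalized >= 0.4) return "moderate relevance";
  if (normalized >= 0.1) return "low relevance";
  return "no relevance";
}
console.log(interpretRelevancy(4.5, 5)); // "high relevance"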
Example with Custom Configuration
import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new AnswerRelevancyMetric(model, {
uncertaintyWeight: 0.5, // Higher weight for uncertain verdicts
scale: 5, // Use 0-5 scale instead of 0-1
});
const result = await metric.measure(
"What are the benefits of exercise?",
"Regular exercise improves cardiovascular health, builds strength, and boosts mental wellbeing.",
);
// Example output:
// {
// score: 4.5,
// info: {
// reason: "The score is 4.5 out of 5 because the response directly addresses the query
// with specific, accurate benefits of exercise. It covers multiple aspects
// (cardiovascular, muscular, and mental health) in a clear and concise manner.
// The answer is highly relevant and provides appropriate detail without
// including unnecessary information."
// }
// }
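Note that because scale is set to 5 here, the score of 4.5 corresponds to 0.9 on the default 0-1 scale, placing this response in the high-relevance band described above.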