
HallucinationMetric

The HallucinationMetric evaluates whether an LLM's output is factually consistent with the provided context. It measures hallucination by identifying direct contradictions between the context and the output.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
 
const metric = new HallucinationMetric(model, {
  context: [
    "Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning in San Carlos, California.",
  ],
});
 
const result = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla was founded in 2004 by Elon Musk in California.",
);
 
console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score
 
// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two out of three statements from the context
//           (founding year and founders) were contradicted by the output, while the
//           location statement was not contradicted."
//   }
// }

Constructor Parameters

model: LanguageModel
Configuration for the model used to evaluate hallucination

options: HallucinationMetricOptions
Configuration options for the metric

HallucinationMetricOptions

scale?: number = 1
Maximum score value

context: string[]
Array of context pieces used as the source of truth
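
The optional scale parameter adjusts the score range. A minimal sketch using only the documented options, reusing the context from the Basic Usage example:

import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";

// Scores will range from 0 to 10 instead of the default 0 to 1
const scaledMetric = new HallucinationMetric(openai("gpt-4o-mini"), {
  scale: 10,
  context: [
    "Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning in San Carlos, California.",
  ],
});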

measure() Parameters

input: string
The original query or prompt

output: string
The LLM's response to evaluate

Returns

score: number
Hallucination score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

info.reason: string
Detailed explanation of the score and identified contradictions
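
As a quick usage sketch (reusing the metric from the Basic Usage example), the return value can be destructured directly:

const { score, info } = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla was founded in 2004 by Elon Musk in California.",
);

console.log(score); // number between 0 and scale
console.log(info.reason); // explanation of any contradictions found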

Scoring Details

The metric evaluates hallucination through contradiction detection and numerical precision analysis.

Scoring Process

  1. Analyzes factual content:
     • Extracts statements from context
     • Identifies numerical values
     • Maps statement relationships

  2. Analyzes output for contradictions:
     • Compares against context statements
     • Marks direct conflicts as contradictions
     • Evaluates numerical accuracy
     • Considers approximation context

  3. Calculates hallucination score:
     • Counts contradicted statements
     • Divides by total statements
     • Scales to configured range

Final score: (contradicted_statements / total_statements) * scale
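
A minimal sketch of this arithmetic (the helper name is hypothetical; the statement counts come from the judge model's analysis):

// Hypothetical helper illustrating the scoring formula above
function hallucinationScore(
  contradictedStatements: number,
  totalStatements: number,
  scale = 1,
): number {
  if (totalStatements === 0) return 0; // no statements means no contradictions
  return (contradictedStatements / totalStatements) * scale;
}

hallucinationScore(2, 3); // ≈ 0.67, matching the Tesla example above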

Important Considerations

  • Speculative language (“might”, “possibly”, “believe”) does not count as a contradiction
  • Additional information beyond context scope is allowed unless it directly conflicts
  • Empty outputs result in zero contradictions
  • Numerical evaluation considers:
    • Scale-appropriate precision
    • Contextual approximations
    • Explicit precision indicators
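
For example, hedged phrasing should not be penalized. A sketch reusing the Tesla metric from the Basic Usage example (the exact score depends on the judge model):

const result = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla might have been founded around 2003, possibly in California.",
);
// "might" and "possibly" mark these statements as speculative,
// so they should not be counted as contradictions.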

Score Interpretation

(0 to scale, default 0-1)

  • 1.0: Complete hallucination - contradicts all context statements
  • 0.75: High hallucination - contradicts 75% of context statements
  • 0.5: Moderate hallucination - contradicts half of context statements
  • 0.25: Low hallucination - contradicts 25% of context statements
  • 0.0: No hallucination - output aligns with all context statements

Note: The score represents the degree of hallucination; lower scores indicate better factual alignment with the provided context.
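
If you need to branch on the result programmatically, a small hypothetical helper mapping scores (on the default 0-1 scale) to the bands above; the labels mirror this list and are not part of the library API:

// Hypothetical interpretation helper; thresholds follow the bands above
function interpretHallucination(score: number): string {
  if (score >= 1.0) return "complete hallucination";
  if (score >= 0.75) return "high hallucination";
  if (score >= 0.5) return "moderate hallucination";
  if (score >= 0.25) return "low hallucination";
  return "no or minimal hallucination";
}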

Example with Analysis

import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
 
const metric = new HallucinationMetric(model, {
  context: [
    "OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, and others.",
    "The company launched with a $1 billion investment commitment.",
    "Elon Musk was an early supporter but left the board in 2018.",
  ],
});
 
const result = await metric.measure(
  "What are the key details about OpenAI?",
  "OpenAI was founded in 2015 by Elon Musk and Sam Altman with a $2 billion investment.",
);
 
// Example output:
// {
//   score: 0.33,
//   info: {
//     reason: "The score is 0.33 because one out of three statements from the context
//           was contradicted (the investment amount was stated as $2 billion instead
//           of $1 billion). The founding date was correct, and while the output's
//           description of founders was incomplete, it wasn't strictly contradictory."
//   }
// }