HallucinationMetric
The HallucinationMetric evaluates whether an LLM generates factually correct information by comparing its output against the provided context. This metric measures hallucination by identifying direct contradictions between the context and the output.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new HallucinationMetric(model, {
  context: [
    "Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning in San Carlos, California.",
  ],
});

const result = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla was founded in 2004 by Elon Musk in California.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two out of three statements from the context
//     (founding year and founders) were contradicted by the output, while the
//     location statement was not contradicted."
//   }
// }
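In practice, the score is usually compared against a threshold rather than inspected by hand. A minimal sketch of that pattern (the 0.25 threshold is an arbitrary value chosen for illustration, not a library default):

const maxAcceptableScore = 0.25; // arbitrary threshold for this example

if (result.score > maxAcceptableScore) {
  console.warn(`Possible hallucination detected: ${result.info.reason}`);
}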
Constructor Parameters
model: LanguageModel
Configuration for the model used to evaluate hallucination

options: HallucinationMetricOptions
Configuration options for the metric
HallucinationMetricOptions
scale?: number = 1
Maximum score value

context: string[]
Array of context pieces used as the source of truth
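As a quick illustration of these options, passing scale changes the upper bound of the returned score. A minimal sketch (the context string below is purely illustrative):

const scaledMetric = new HallucinationMetric(model, {
  scale: 10, // scores now range from 0 to 10 instead of 0 to 1
  context: ["The Eiffel Tower was completed in 1889."],
});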
measure() Parameters
input: string
The original query or prompt

output: string
The LLM's response to evaluate
Returns
score: number
Hallucination score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

info.reason: string
Detailed explanation of the score and identified contradictions
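Put together, the return values correspond to a result shape along these lines (a sketch for orientation only, not the library's exported type):

interface HallucinationMeasureResult {
  score: number; // 0 to scale, default 0-1
  info: {
    reason: string; // explanation of the score and identified contradictions
  };
}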
Scoring Details
The metric evaluates hallucination through contradiction detection and numerical precision analysis.
Scoring Process
1. Analyzes factual content:
   - Extracts statements from context
   - Identifies numerical values
   - Maps statement relationships
2. Analyzes output for contradictions:
   - Compares against context statements
   - Marks direct conflicts as contradictions
   - Evaluates numerical accuracy
   - Considers approximation context
3. Calculates hallucination score:
   - Counts contradicted statements
   - Divides by total statements
   - Scales to configured range
Final score: (contradicted_statements / total_statements) * scale
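The final calculation is simple arithmetic; a sketch of the formula above (which statements count as contradicted is decided by the judge model, not by this function):

function hallucinationScore(
  contradictedStatements: number,
  totalStatements: number,
  scale = 1,
): number {
  // Guard against an empty context (assumption made for this sketch)
  if (totalStatements === 0) return 0;
  return (contradictedStatements / totalStatements) * scale;
}

// e.g. two of three context statements contradicted, default scale:
// hallucinationScore(2, 3) ≈ 0.67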
Important Considerations
- Speculative language (“might”, “possibly”, “believe”) does not count as a contradiction (see the sketch after this list)
- Additional information beyond the context's scope is allowed unless it directly conflicts with the context
- Empty outputs result in zero contradictions
- Numerical evaluation considers:
  - Scale-appropriate precision
  - Contextual approximations
  - Explicit precision indicators
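For example, hedged wording is treated differently from a direct conflict. A sketch using the metric from the Basic Usage example (scores depend on the judge model, so none are asserted here):

// Speculative phrasing: not a direct conflict with the context
const hedgedResult = await metric.measure(
  "When was Tesla founded?",
  "Tesla might have been founded around 2003, possibly in California.",
);

// Direct contradiction of the context's founding year
const contradictoryResult = await metric.measure(
  "When was Tesla founded?",
  "Tesla was definitely founded in 2010.",
);

// The hedged response is expected to score lower (less hallucination) than the contradictory one.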
Score interpretation
(0 to scale, default 0-1)
- 1.0: Complete hallucination - contradicts all context statements
- 0.75: High hallucination - contradicts 75% of context statements
- 0.5: Moderate hallucination - contradicts half of context statements
- 0.25: Low hallucination - contradicts 25% of context statements
- 0.0: No hallucination - output aligns with all context statements
Note: The score represents the degree of hallucination - lower scores indicate better factual alignment with the provided context
Example with Analysis
import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new HallucinationMetric(model, {
  context: [
    "OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, and others.",
    "The company launched with a $1 billion investment commitment.",
    "Elon Musk was an early supporter but left the board in 2018.",
  ],
});
const result = await metric.measure(
  "What are the key details about OpenAI?",
  "OpenAI was founded in 2015 by Elon Musk and Sam Altman with a $2 billion investment.",
);
// Example output:
// {
//   score: 0.33,
//   info: {
//     reason: "The score is 0.33 because one out of three statements from the context
//     was contradicted (the investment amount was stated as $2 billion instead
//     of $1 billion). The founding date was correct, and while the output's
//     description of founders was incomplete, it wasn't strictly contradictory."
//   }
// }