HallucinationMetric
The HallucinationMetric evaluates whether an LLM generates factually correct information by comparing its output against the provided context. This metric measures hallucination by identifying direct contradictions between the context and the output.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new HallucinationMetric(model, {
  context: [
    "Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning in San Carlos, California.",
  ],
});

const result = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla was founded in 2004 by Elon Musk in California.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two out of three statements from the context
//     (founding year and founders) were contradicted by the output, while the
//     location statement was not contradicted."
//   }
// }
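In practice, the score is usually compared against a threshold rather than inspected by hand. A minimal sketch of that pattern (the 0.25 threshold is an arbitrary value chosen for illustration, not a library default):

const maxAcceptableScore = 0.25; // arbitrary threshold for this example

if (result.score > maxAcceptableScore) {
  console.warn(`Possible hallucination detected: ${result.info.reason}`);
}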
Constructor Parameters
model: LanguageModel
Configuration for the model used to evaluate hallucination

options: HallucinationMetricOptions
Configuration options for the metric
HallucinationMetricOptions
scale?: number = 1
Maximum score value

context: string[]
Array of context pieces used as the source of truth
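As a quick illustration of these options, passing scale changes the upper bound of the returned score. A minimal sketch (the context string below is purely illustrative):

const scaledMetric = new HallucinationMetric(model, {
  scale: 10, // scores now range from 0 to 10 instead of 0 to 1
  context: ["The Eiffel Tower was completed in 1889."],
});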
measure() Parameters
input: string
The original query or prompt

output: string
The LLM's response to evaluate
Returns
score: number
Hallucination score (0 to scale, default 0-1)

info: object
Object containing the reason for the score

info.reason: string
Detailed explanation of the score and identified contradictions
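Put together, the return values correspond to a result shape along these lines (a sketch for orientation only, not the library's exported type):

interface HallucinationMeasureResult {
  score: number; // 0 to scale, default 0-1
  info: {
    reason: string; // explanation of the score and identified contradictions
  };
}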
Scoring Details
The metric evaluates hallucination through contradiction detection and numerical precision analysis.
Scoring Process
1. Analyzes factual content:
   - Extracts statements from context
   - Identifies numerical values
   - Maps statement relationships
2. Analyzes output for contradictions:
   - Compares against context statements
   - Marks direct conflicts as contradictions
   - Evaluates numerical accuracy
   - Considers approximation context
3. Calculates hallucination score:
   - Counts contradicted statements
   - Divides by total statements
   - Scales to configured range
Final score: (contradicted_statements / total_statements) * scale
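The final calculation is simple arithmetic; a sketch of the formula above (which statements count as contradicted is decided by the judge model, not by this function):

function hallucinationScore(
  contradictedStatements: number,
  totalStatements: number,
  scale = 1,
): number {
  // Guard against an empty context (assumption made for this sketch)
  if (totalStatements === 0) return 0;
  return (contradictedStatements / totalStatements) * scale;
}

// e.g. two of three context statements contradicted, default scale:
// hallucinationScore(2, 3) ≈ 0.67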
Important Considerations
- Speculative language (“might”, “possibly”, “believe”) does not count as a contradiction (see the sketch after this list)
- Additional information beyond the context's scope is allowed unless it directly conflicts with the context
- Empty outputs result in zero contradictions
- Numerical evaluation considers:
  - Scale-appropriate precision
  - Contextual approximations
  - Explicit precision indicators
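For example, hedged wording is treated differently from a direct conflict. A sketch using the metric from the Basic Usage example (scores depend on the judge model, so none are asserted here):

// Speculative phrasing: not a direct conflict with the context
const hedgedResult = await metric.measure(
  "When was Tesla founded?",
  "Tesla might have been founded around 2003, possibly in California.",
);

// Direct contradiction of the context's founding year
const contradictoryResult = await metric.measure(
  "When was Tesla founded?",
  "Tesla was definitely founded in 2010.",
);

// The hedged response is expected to score lower (less hallucination) than the contradictory one.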
Score interpretation
(0 to scale, default 0-1)
- 1.0: Complete hallucination - contradicts all context statements
- 0.75: High hallucination - contradicts 75% of context statements
- 0.5: Moderate hallucination - contradicts half of context statements
- 0.25: Low hallucination - contradicts 25% of context statements
- 0.0: No hallucination - output aligns with all context statements
Note: The score represents the degree of hallucination - lower scores indicate better factual alignment with the provided context
Example with Analysis
import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new HallucinationMetric(model, {
  context: [
    "OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, and others.",
    "The company launched with a $1 billion investment commitment.",
    "Elon Musk was an early supporter but left the board in 2018.",
  ],
});
const result = await metric.measure(
  "What are the key details about OpenAI?",
  "OpenAI was founded in 2015 by Elon Musk and Sam Altman with a $2 billion investment.",
);
// Example output:
// {
//   score: 0.33,
//   info: {
//     reason: "The score is 0.33 because one out of three statements from the context
//     was contradicted (the investment amount was stated as $2 billion instead
//     of $1 billion). The founding date was correct, and while the output's
//     description of founders was incomplete, it wasn't strictly contradictory."
//   }
// }