HallucinationMetric
The HallucinationMetric evaluates whether an LLM generates factually correct information by comparing its output against the provided context. This metric measures hallucination by identifying direct contradictions between the context and the output.
Basic Usage
```typescript
import { HallucinationMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};

const metric = new HallucinationMetric(model, {
  context: [
    "Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning in San Carlos, California.",
  ],
});

const result = await metric.measure(
  "Tell me about Tesla's founding.",
  "Tesla was founded in 2004 by Elon Musk in California.",
);

console.log(result.score); // Score from 0-1
console.log(result.info.reason); // Explanation of the score

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two out of three statements from the context
//     (founding year and founders) were contradicted by the output, while the
//     location statement was not contradicted."
//   }
// }
```
Constructor Parameters
- `model` (`ModelConfig`): Configuration for the model used to evaluate hallucination
- `options` (`HallucinationMetricOptions`): Configuration options for the metric

HallucinationMetricOptions
- `scale?` (`number`, default: `1`): Maximum score value
- `context` (`string[]`): Array of context pieces used as the source of truth

measure() Parameters
- `input` (`string`): The original query or prompt
- `output` (`string`): The LLM's response to evaluate

Returns
- `score` (`number`): Hallucination score (0 to scale, default 0-1)
- `info` (`object`): Object containing the reason for the score
  - `reason` (`string`): Detailed explanation of the score and identified contradictions
Scoring Details
The metric evaluates hallucination through:
- Extracting key factual statements from each context piece
- Checking if the output contradicts any of these statements
- Calculating the ratio of contradicted statements to total statements
The scoring process:
- Each statement from the context is evaluated against the output
- A contradiction is marked when the output directly conflicts with a statement
- Score = (number of contradicted statements) / (total number of statements)
- Result is scaled to the configured range (default 0-1)
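The scoring steps above can be sketched as a standalone function. This is an illustrative sketch, not the library's internals: in practice the per-statement contradiction judgments come from the evaluation model, while here they are passed in as booleans.

```typescript
// Illustrative sketch of the scoring formula only.
// `contradicted` holds one boolean per context statement,
// as the evaluation model would judge them against the output.
function hallucinationScore(contradicted: boolean[], scale = 1): number {
  // No statements to check means nothing can be contradicted
  if (contradicted.length === 0) return 0;
  const hits = contradicted.filter(Boolean).length;
  // Ratio of contradicted statements, scaled to the configured range
  return (hits / contradicted.length) * scale;
}

// Two of three statements contradicted, default scale:
hallucinationScore([true, true, false]); // ≈ 0.67
```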
Important considerations:
- Numerical approximations are evaluated based on:
- Scale of the numbers involved
- Use of approximation terms (“about”, “around”, “approximately”)
- Context-appropriate precision
- Explicit precision markers (“exactly”, “precisely”)
- Speculative language (might, possibly, believe) does not constitute contradictions
- Additional information beyond context scope is not counted as contradictions unless it directly conflicts
- Empty outputs result in zero contradictions
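As a rough illustration of the speculative-language rule, a naive keyword check might look like the following. The actual metric relies on the evaluation model's judgment rather than pattern matching, and the word lists here are an assumption for demonstration only.

```typescript
// Illustrative heuristics only; the metric itself does not use regexes.
// Word lists are assumptions chosen to mirror the examples in the text above.
const SPECULATIVE = /\b(might|may|possibly|perhaps|believe|think)\b/i;
const APPROXIMATION = /\b(about|around|approximately|roughly)\b/i;

// Speculative statements are not counted as contradictions
function isSpeculative(text: string): boolean {
  return SPECULATIVE.test(text);
}

isSpeculative("Tesla might have been founded in 2004."); // true
```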
Score interpretation:
- 0.0: No hallucination - output doesn’t contradict any context statements
- 0.25: Low hallucination - contradicts 25% of context statements
- 0.5: Moderate hallucination - contradicts half of context statements
- 0.75: High hallucination - contradicts 75% of context statements
- 1.0: Complete hallucination - contradicts all context statements
Note: The score represents the degree of hallucination, so a lower score indicates better factual alignment with the provided context.
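The anchor points above can be turned into a small labeling helper. The band thresholds below are an illustrative assumption, since the metric itself only returns the numeric score.

```typescript
// Maps a 0-1 hallucination score to a qualitative label.
// Band boundaries are assumptions interpolated from the anchor points above.
function interpretHallucination(score: number): string {
  if (score === 0) return "no hallucination";
  if (score <= 0.25) return "low hallucination";
  if (score <= 0.5) return "moderate hallucination";
  if (score <= 0.75) return "high hallucination";
  return "complete hallucination";
}

interpretHallucination(0.5); // "moderate hallucination"
```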
Example with Analysis
```typescript
const metric = new HallucinationMetric(model, {
  context: [
    "OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, and others.",
    "The company launched with a $1 billion investment commitment.",
    "Elon Musk was an early supporter but left the board in 2018.",
  ],
});

const result = await metric.measure(
  "What are the key details about OpenAI?",
  "OpenAI was founded in 2015 by Elon Musk and Sam Altman with a $2 billion investment.",
);

// Example output:
// {
//   score: 0.33,
//   info: {
//     reason: "The score is 0.33 because one out of three statements from the context
//     was contradicted (the investment amount was stated as $2 billion instead
//     of $1 billion). The founding date was correct, and while the output's
//     description of founders was incomplete, it wasn't strictly contradictory."
//   }
// }
```