Hallucination Scorer

The createHallucinationScorer() function evaluates whether an LLM generates factually correct information by comparing its output against the provided context. The scorer measures hallucination by identifying direct contradictions between the context and the output, as well as claims the context does not support.

For a usage example, see the Hallucination Examples.

Parameters

The createHallucinationScorer() function accepts a single options object with the following properties:

model: LanguageModel
Configuration for the model used to evaluate hallucination.

scale: number = 1
Maximum score value.
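
A minimal configuration sketch; the model used here and the import path @mastra/evals/scorers/llm are assumptions, so adjust them to match your project:

```typescript
import { openai } from "@ai-sdk/openai";
import { createHallucinationScorer } from "@mastra/evals/scorers/llm"; // assumed import path

// Judge hallucination with an OpenAI model and report scores on the default 0-1 scale.
const scorer = createHallucinationScorer({
  model: openai("gpt-4o-mini"),
  scale: 1,
});
```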

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.

.run() Returns

runId: string
The id of the run (optional).

extractStepResult: object
Object with extracted claims: { claims: string[] }

extractPrompt: string
The prompt sent to the LLM for the extract step (optional).

analyzeStepResult: object
Object with verdicts: { verdicts: Array<{ statement: string, verdict: 'yes' | 'no', reason: string }> }

analyzePrompt: string
The prompt sent to the LLM for the analyze step (optional).

score: number
Hallucination score (0 to scale, default 0-1).

reason: string
Detailed explanation of the score and identified contradictions.

reasonPrompt: string
The prompt sent to the LLM for the reason step (optional).
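
A sketch of calling .run() and reading the LLM-specific fields. The exact input shape is defined in the MastraScorer reference; the shape below (user messages, an assistant output, and context supplied alongside) is an assumption for illustration:

```typescript
// Input shape assumed for illustration; see the MastraScorer reference
// for the authoritative .run() signature.
const result = await scorer.run({
  input: [{ role: "user", content: "How fast is the corporate network?" }],
  output: { role: "assistant", text: "The network reaches speeds of 10 Gbps." },
  additionalContext: {
    context: ["The corporate network supports speeds of up to 1 Gbps."],
  },
});

console.log(result.score);             // 0 to scale (0-1 by default)
console.log(result.reason);            // explanation of identified contradictions
console.log(result.analyzeStepResult); // { verdicts: [{ statement, verdict, reason }, ...] }
```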

Scoring Details

The scorer evaluates hallucination through contradiction detection and unsupported claim analysis.

Scoring Process

  1. Analyzes factual content:
    • Extracts statements from context
    • Identifies numerical values and dates
    • Maps statement relationships
  2. Analyzes output for hallucinations:
    • Compares against context statements
    • Marks direct conflicts as hallucinations
    • Identifies unsupported claims as hallucinations
    • Evaluates numerical accuracy
    • Considers approximation context
  3. Calculates hallucination score:
    • Counts hallucinated statements (contradictions and unsupported claims)
    • Divides by total statements
    • Scales to configured range

Final score: (hallucinated_statements / total_statements) * scale
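
As a sketch of that arithmetic (a hypothetical helper, not part of the library):

```typescript
// Hypothetical helper mirroring the formula above; not a library API.
function hallucinationScore(
  hallucinated: number,
  total: number,
  scale = 1,
): number {
  if (total === 0) return 0; // guard: with no statements to evaluate, the score is 0
  return (hallucinated / total) * scale;
}

hallucinationScore(3, 4); // 0.75 -> "high hallucination" on the default scale
```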

Important Considerations

  • Claims not present in context are treated as hallucinations
  • Subjective claims are hallucinations unless explicitly supported
  • Speculative language (“might”, “possibly”) about facts IN context is allowed
  • Speculative language about facts NOT in context is treated as hallucination
  • Empty outputs result in zero hallucinations
  • Numerical evaluation considers:
    • Scale-appropriate precision
    • Contextual approximations
    • Explicit precision indicators

Score interpretation

(0 to scale, default 0-1)

  • 1.0: Complete hallucination - contradicts all context statements
  • 0.75: High hallucination - contradicts 75% of context statements
  • 0.5: Moderate hallucination - contradicts half of context statements
  • 0.25: Low hallucination - contradicts 25% of context statements
  • 0.0: No hallucination - output aligns with all context statements

Note: The score represents the degree of hallucination; lower scores indicate better factual alignment with the provided context.