Hallucination Scorer

The createHallucinationScorer() function evaluates whether an LLM generates factually correct information by comparing its output against the provided context. This scorer measures hallucination by identifying direct contradictions between the context and the output.

Parameters

The createHallucinationScorer() function accepts a single options object with the following properties:

  • model (LanguageModel): Configuration for the model used to evaluate hallucination.
  • options.scale (number, default: 1): Maximum score value.
  • options.context (string[]): Static context strings to use as ground truth for hallucination detection.
  • options.getContext ((params: GetContextParams) => string[] | Promise<string[]>): A hook to dynamically resolve context at runtime. Takes priority over static context. Useful for live scoring where context (such as tool results) is only available when the scorer runs.

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.

GetContextParams

The getContext hook receives the following parameters:

  • run (GetContextRun): The scorer run containing input, output, runId, requestContext, and tracingContext.
  • results (Record<string, any>): Accumulated results from previous steps (e.g., preprocessStepResult with extracted claims).
  • score (number): The computed score. Only present when the hook is called from the generateReason step.
  • step ('analyze' | 'generateReason'): Which step is calling the hook. Useful for caching context between calls.
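
Because the hook runs once per step (first analyze, then generateReason), an expensive context lookup can be cached by runId so it only executes once per run. A minimal sketch, with GetContextParams simplified to the fields used here and a hypothetical loadContext() standing in for your real source (vector store, tool-result fetch, etc.):

```typescript
// Simplified shape of the hook's parameters (assumption: only the
// fields needed for caching are modeled here).
type GetContextParams = {
  run: { runId?: string };
  step: "analyze" | "generateReason";
};

const contextCache = new Map<string, string[]>();
let loads = 0; // counts how often the expensive lookup actually runs

// Hypothetical expensive lookup; replace with your real context source.
function loadContext(): string[] {
  loads++;
  return ["The first iPhone was announced on January 9, 2007."];
}

// The hook: resolve context once per run, reuse it for both steps.
function getContext({ run }: GetContextParams): string[] {
  const key = run.runId ?? "default";
  let ctx = contextCache.get(key);
  if (!ctx) {
    ctx = loadContext();
    contextCache.set(key, ctx);
  }
  return ctx;
}
```

A function like this can then be passed as options.getContext so both LLM steps see the same context without repeating the lookup.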

.run() Returns

  • runId (string, optional): The id of the run.
  • preprocessStepResult (object): The extracted claims: { claims: string[] }
  • preprocessPrompt (string, optional): The prompt sent to the LLM for the preprocess step.
  • analyzeStepResult (object): The verdicts: { verdicts: Array<{ statement: string, verdict: 'yes' | 'no', reason: string }> }
  • analyzePrompt (string, optional): The prompt sent to the LLM for the analyze step.
  • score (number): Hallucination score from 0 to scale (default 0-1).
  • reason (string): Detailed explanation of the score and identified contradictions.
  • generateReasonPrompt (string, optional): The prompt sent to the LLM for the generateReason step.
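
The verdict array in analyzeStepResult can be post-processed directly, for example to list the statements behind a high score. A sketch assuming only the result shape documented above, and assuming a 'yes' verdict marks a hallucinated statement (check the reason field if your verdict polarity differs):

```typescript
type Verdict = { statement: string; verdict: "yes" | "no"; reason: string };

// Collect the statements the analyze step flagged as hallucinated.
// Assumption: verdict "yes" means the statement contradicts or is
// unsupported by the context.
function hallucinatedStatements(result: { verdicts: Verdict[] }): string[] {
  return result.verdicts
    .filter((v) => v.verdict === "yes")
    .map((v) => v.statement);
}
```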

Scoring Details

The scorer evaluates hallucination through contradiction detection and unsupported claim analysis.

Scoring Process

  1. Analyzes factual content:
    • Extracts statements from context
    • Identifies numerical values and dates
    • Maps statement relationships
  2. Analyzes output for hallucinations:
    • Compares against context statements
    • Marks direct conflicts as hallucinations
    • Identifies unsupported claims as hallucinations
    • Evaluates numerical accuracy
    • Considers approximation context
  3. Calculates hallucination score:
    • Counts hallucinated statements (contradictions and unsupported claims)
    • Divides by total statements
    • Scales to configured range

Final score: (hallucinated_statements / total_statements) * scale
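
The calculation in step 3 reduces to a simple ratio. A sketch of the arithmetic, again assuming a 'yes' verdict marks a hallucinated statement:

```typescript
type Verdict = { statement: string; verdict: "yes" | "no"; reason: string };

// (hallucinated_statements / total_statements) * scale
function hallucinationScore(verdicts: Verdict[], scale = 1): number {
  // Empty outputs produce no statements, hence zero hallucinations.
  if (verdicts.length === 0) return 0;
  const hallucinated = verdicts.filter((v) => v.verdict === "yes").length;
  return (hallucinated / verdicts.length) * scale;
}
```

For example, one contradicted statement out of four yields 0.25 at the default scale.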

Important Considerations

  • Claims not present in context are treated as hallucinations
  • Subjective claims are hallucinations unless explicitly supported
  • Speculative language ("might", "possibly") about facts IN context is allowed
  • Speculative language about facts NOT in context is treated as hallucination
  • Empty outputs result in zero hallucinations
  • Numerical evaluation considers:
    • Scale-appropriate precision
    • Contextual approximations
    • Explicit precision indicators

Score interpretation

A hallucination score between 0 and 1:

  • 0.0: No hallucination — all claims match the context.
  • 0.3–0.4: Low hallucination — a few contradictions.
  • 0.5–0.6: Mixed hallucination — several contradictions.
  • 0.7–0.8: High hallucination — many contradictions.
  • 0.9–1.0: Complete hallucination — most or all claims contradict the context.

Note: The score measures the degree of hallucination, so lower scores indicate better factual alignment with the provided context.
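
The bands above can be collapsed into a small helper for dashboards or alerting. A sketch with labels taken from the table; the exact boundaries between bands are a judgment call since the table leaves gaps (e.g., 0.1-0.2):

```typescript
// Map a 0-1 hallucination score to the interpretation bands above.
function interpretHallucination(score: number): string {
  if (score === 0) return "none";
  if (score < 0.5) return "low";
  if (score < 0.7) return "mixed";
  if (score < 0.9) return "high";
  return "complete";
}
```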

Examples

Static Context

Use static context when you have known ground truth to compare against:

src/example-static-context.ts
import { createHallucinationScorer } from "@mastra/evals/scorers/prebuilt";

const scorer = createHallucinationScorer({
  model: "openai/gpt-4o",
  options: {
    context: [
      "The first iPhone was announced on January 9, 2007.",
      "It was released on June 29, 2007.",
      "Steve Jobs introduced it at Macworld.",
    ],
  },
});

Dynamic Context with getContext

Use getContext for live scoring scenarios where context comes from tool results:

src/example-dynamic-context.ts
import { createHallucinationScorer } from "@mastra/evals/scorers/prebuilt";
import { extractToolResults } from "@mastra/evals/scorers";

const scorer = createHallucinationScorer({
  model: "openai/gpt-4o",
  options: {
    getContext: ({ run }) => {
      // Extract tool results as context
      const toolResults = extractToolResults(run.output);
      return toolResults.map((t) =>
        JSON.stringify({ tool: t.toolName, result: t.result }),
      );
    },
  },
});

Live Scoring with Agent

Attach the scorer to an agent for live evaluation:

src/example-live-scoring.ts
import { Agent } from "@mastra/core/agent";
import { createHallucinationScorer } from "@mastra/evals/scorers/prebuilt";
import { extractToolResults } from "@mastra/evals/scorers";

const hallucinationScorer = createHallucinationScorer({
  model: "openai/gpt-4o",
  options: {
    getContext: ({ run }) => {
      const toolResults = extractToolResults(run.output);
      return toolResults.map((t) =>
        JSON.stringify({ tool: t.toolName, result: t.result }),
      );
    },
  },
});

const agent = new Agent({
  name: "my-agent",
  model: "openai/gpt-4o",
  instructions: "You are a helpful assistant.",
  evals: {
    scorers: [hallucinationScorer],
  },
});

Batch Evaluation with runEvals

src/example-batch-evals.ts
import { runEvals } from "@mastra/core/evals";
import { createHallucinationScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createHallucinationScorer({
  model: "openai/gpt-4o",
  options: {
    context: ["Known fact 1", "Known fact 2"],
  },
});

const result = await runEvals({
  data: [
    { input: "Tell me about topic A" },
    { input: "Tell me about topic B" },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.