
Faithfulness Scorer

The createFaithfulnessScorer() function evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies each one against the context, which makes it useful for measuring the reliability of RAG pipeline responses.

For a usage example, see the Faithfulness Examples.

Parameters

The createFaithfulnessScorer() function accepts a single options object with the following properties:

model:

LanguageModel
Configuration for the model used to evaluate faithfulness.

context:

string[]
Array of context chunks against which the output's claims will be verified.

scale:

number
= 1
The maximum score value. The final score will be normalized to this scale.

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.

.run() Returns

runId:

string
The ID of the run (optional).

extractStepResult:

string[]
Array of extracted claims from the output.

extractPrompt:

string
The prompt sent to the LLM for the extract step (optional).

analyzeStepResult:

object
Object with verdicts: { verdicts: Array<{ verdict: 'yes' | 'no' | 'unsure', reason: string }> }

analyzePrompt:

string
The prompt sent to the LLM for the analyze step (optional).

score:

number
A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.

reason:

string
A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.

reasonPrompt:

string
The prompt sent to the LLM for the reason step (optional).
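
As a rough aid for consuming these fields, the return shape can be sketched as a TypeScript interface. This is a sketch only: the names FaithfulnessRunResult, Verdict, and tallyVerdicts are hypothetical and not part of the library, the fields marked "(optional)" above are assumed to be optional properties, and the sample object is hand-written rather than real scorer output.

```typescript
// Hypothetical type names; field names and types follow the list above.
type Verdict = "yes" | "no" | "unsure";

interface FaithfulnessRunResult {
  runId?: string; // assumption: "(optional)" means the field may be absent
  extractStepResult: string[];
  extractPrompt?: string;
  analyzeStepResult: { verdicts: Array<{ verdict: Verdict; reason: string }> };
  analyzePrompt?: string;
  score: number;
  reason: string;
  reasonPrompt?: string;
}

// Hypothetical helper: counts how many verdicts of each kind a result contains.
function tallyVerdicts(result: FaithfulnessRunResult): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { yes: 0, no: 0, unsure: 0 };
  for (const { verdict } of result.analyzeStepResult.verdicts) {
    counts[verdict] += 1;
  }
  return counts;
}

// Hand-written sample data for illustration, not real scorer output.
const sample: FaithfulnessRunResult = {
  extractStepResult: ["claim A", "claim B", "claim C", "claim D"],
  analyzeStepResult: {
    verdicts: [
      { verdict: "yes", reason: "Stated in the context." },
      { verdict: "yes", reason: "Stated in the context." },
      { verdict: "no", reason: "Contradicts the context." },
      { verdict: "unsure", reason: "Not mentioned in the context." },
    ],
  },
  score: 0.5,
  reason: "2 of 4 claims are supported by the context.",
};

console.log(tallyVerdicts(sample));
```

Tallying verdicts this way can help when debugging a low score, since it separates contradicted claims from ones that were merely unverifiable.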

Scoring Details

The scorer evaluates faithfulness through claim verification against provided context.

Scoring Process

  1. Analyzes claims and context:
    • Extracts all claims (factual and speculative)
    • Verifies each claim against context
    • Assigns one of three verdicts:
      • “yes” - claim supported by context
      • “no” - claim contradicts context
      • “unsure” - claim unverifiable
  2. Calculates faithfulness score:
    • Counts supported claims
    • Divides by total claims
    • Scales to configured range

Final score: (supported_claims / total_claims) * scale
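
The formula above can be worked through with a small function. This is a minimal sketch, not the scorer's actual implementation: faithfulnessScore is a hypothetical name, and returning 0 when there are no claims is an assumption, since that case is not described here.

```typescript
// Hypothetical helper mirroring the formula above; not the library's code.
function faithfulnessScore(
  supportedClaims: number,
  totalClaims: number,
  scale = 1,
): number {
  // Assumption: an output with no claims scores 0 (also avoids dividing by zero).
  if (totalClaims === 0) return 0;
  return (supportedClaims / totalClaims) * scale;
}

// 3 of 4 claims supported, default scale of 1.
console.log(faithfulnessScore(3, 4)); // 0.75

// Same claims, scale of 10.
console.log(faithfulnessScore(3, 4, 10)); // 7.5
```

Only "yes" verdicts count toward supported_claims, so both "no" and "unsure" verdicts lower the score equally.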

Score interpretation

Scores range from 0 to scale (0 to 1 by default). The bands below assume the default scale:

  • 1.0: All claims supported by context
  • 0.7-0.9: Most claims supported, few unverifiable
  • 0.4-0.6: Mixed support with some contradictions
  • 0.1-0.3: Limited support, many contradictions
  • 0.0: No supported claims
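
The bands above could be turned into labels with a helper like the one below. This is a sketch under stated assumptions: interpretFaithfulness is a hypothetical name, the score is first divided by the scale so that non-default scales map onto the same bands, and values that fall between the listed ranges (for example 0.65 or 0.95) are assigned to the band whose lower bound they meet, which the list itself does not specify.

```typescript
// Hypothetical helper; the thresholds for in-between values are assumptions.
function interpretFaithfulness(score: number, scale = 1): string {
  const normalized = score / scale; // map any scale back onto 0..1
  if (normalized >= 1) return "All claims supported by context";
  if (normalized >= 0.7) return "Most claims supported, few unverifiable";
  if (normalized >= 0.4) return "Mixed support with some contradictions";
  if (normalized > 0) return "Limited support, many contradictions";
  return "No supported claims";
}

console.log(interpretFaithfulness(0.75)); // "Most claims supported, few unverifiable"
console.log(interpretFaithfulness(5, 10)); // "Mixed support with some contradictions"
```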