Faithfulness Scorer
The createFaithfulnessScorer() function evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies them against that context, which makes it well suited to measuring the reliability of RAG pipeline responses.
For a usage example, see the Faithfulness Examples.
Parameters
The createFaithfulnessScorer() function accepts a single options object with the following properties:

model: The language model used to extract claims from the output and judge them against the context.

context: An array of context strings that the output's claims are verified against.

scale: The maximum score value. Optional; defaults to 1, so scores fall in the 0-1 range.
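A minimal construction sketch follows. The import path and model id are assumptions based on typical Mastra usage; see the Faithfulness Examples for the canonical setup.

```typescript
import { openai } from "@ai-sdk/openai";
// Assumed import path; the scorer may be exported from a different
// subpath depending on your @mastra/evals version.
import { createFaithfulnessScorer } from "@mastra/evals/scorers/llm";

const scorer = createFaithfulnessScorer({
  // Judge model used to extract and verify claims.
  model: openai("gpt-4o-mini"),
  // Context the output's claims are checked against.
  context: [
    "The Eiffel Tower is located in Paris, France.",
    "It was completed in 1889 for the World's Fair.",
  ],
  // Optional: maximum score; defaults to 1 (scores in 0-1).
  scale: 1,
});
```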
This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.
.run() Returns
runId: The id of the scorer run.

extractStepResult: The list of claims extracted from the output during the extract step.

extractPrompt: The prompt sent to the model for the extract step.

analyzeStepResult: The per-claim verdicts ("yes", "no", or "unsure"), each with a supporting reason.

analyzePrompt: The prompt sent to the model for the analyze step.

score: The faithfulness score, from 0 to the configured scale (default 0-1).

reason: A human-readable explanation of the score.

reasonPrompt: The prompt sent to the model for the reason step.
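As a sketch of running the scorer and reading these fields (the exact run payload shape is defined in the MastraScorer reference; the input/output shape below is an assumption):

```typescript
// Hypothetical payload shape; consult the MastraScorer reference
// for the exact run() contract in your version.
const result = await scorer.run({
  input: [{ role: "user", content: "Tell me about the Eiffel Tower." }],
  output: { role: "assistant", text: "The Eiffel Tower was completed in 1889 in Paris." },
});

console.log(result.score);  // e.g. 1 when every extracted claim is supported
console.log(result.reason); // explanation of which claims were (un)supported
```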
Scoring Details
The scorer evaluates faithfulness through claim verification against provided context.
Scoring Process
- Analyzes claims and context:
  - Extracts all claims (factual and speculative)
  - Verifies each claim against the context
  - Assigns one of three verdicts:
    - "yes": the claim is supported by the context
    - "no": the claim contradicts the context
    - "unsure": the claim cannot be verified from the context
- Calculates the faithfulness score (see the sketch below):
  - Counts the supported claims
  - Divides by the total number of claims
  - Scales the result to the configured range
Final score: `(supported_claims / total_claims) * scale`
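To make the arithmetic concrete, here is a small standalone sketch of the formula. The function name and Verdict type are hypothetical, not part of the library:

```typescript
type Verdict = "yes" | "no" | "unsure";

// Mirrors the formula above: only "yes" verdicts count as supported.
function faithfulnessScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

// e.g. 3 of 4 claims supported -> 0.75 on the default 0-1 scale
console.log(faithfulnessScore(["yes", "yes", "yes", "unsure"])); // 0.75
```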
Score interpretation (0 to scale, default 0-1)
- 1.0: All claims supported by context
- 0.7-0.9: Most claims supported, few unverifiable
- 0.4-0.6: Mixed support with some contradictions
- 0.1-0.3: Limited support, many contradictions
- 0.0: No supported claims
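If you want to act on these bands programmatically, a trivial helper like the following can bucket scores on the default 0-1 scale (the function and its labels are illustrative, not part of the library):

```typescript
// Hypothetical helper mapping a 0-1 faithfulness score to the bands above.
function interpretFaithfulness(score: number): string {
  if (score >= 1.0) return "all claims supported";
  if (score >= 0.7) return "most claims supported, few unverifiable";
  if (score >= 0.4) return "mixed support with some contradictions";
  if (score > 0.0) return "limited support, many contradictions";
  return "no supported claims";
}
```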