Faithfulness Scorer
The createFaithfulnessScorer() function evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies each one against the context, which makes it well suited to measuring the reliability of RAG pipeline responses.
Parameters
The createFaithfulnessScorer() function accepts a single options object with the following properties:
model: The language model used to extract claims from the output and verify them against the context.
context: An array of context strings that claims are checked against. When not provided at creation time, context is typically drawn from the agent's tool calls or RAG retrieval (see the example below).
scale: The maximum score. The final score is scaled to this value; defaults to 1.
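For orientation, here is a minimal configuration sketch; the model identifier, context strings, and scale value are illustrative placeholders, not required settings:

import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";

// Placeholder values for illustration only.
const scorer = createFaithfulnessScorer({
  model: "openai/gpt-4o", // LLM used to extract and verify claims
  context: [
    "The Tesla Model 3 is a battery electric sedan.", // statements to verify against
  ],
  scale: 1, // maximum score
});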
This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.
.run() Returns
runId: The id of the scorer run.
preprocessStepResult: The claims extracted from the output during the preprocess step.
preprocessPrompt: The prompt sent to the LLM for the preprocess (claim extraction) step.
analyzeStepResult: The per-claim verdicts ("yes", "no", or "unsure") produced during the analyze step.
analyzePrompt: The prompt sent to the LLM for the analyze (claim verification) step.
score: A number between 0 and the configured scale, equal to the proportion of supported claims.
reason: A human-readable explanation of the score.
generateReasonPrompt: The prompt sent to the LLM to generate the reason.
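The sketch below relates these fields to the scoring process described in the next section; all values, and the inner shapes of the step results, are hypothetical and shown only for illustration:

// Hypothetical .run() result; values and nested shapes are illustrative only.
const exampleResult = {
  runId: "run_123",
  preprocessStepResult: { claims: ["The Model 3 is an electric sedan."] },
  preprocessPrompt: "...", // prompt used for claim extraction
  analyzeStepResult: {
    verdicts: [{ verdict: "yes", reason: "Stated directly in the context." }],
  },
  analyzePrompt: "...", // prompt used for claim verification
  score: 1, // (supported claims / total claims) * scale
  reason: "All claims are supported by the provided context.",
  generateReasonPrompt: "...", // prompt used to generate the reason
};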
Scoring Details
The scorer evaluates faithfulness through claim verification against provided context.
Scoring Process
- Analyzes claims and context:
  - Extracts all claims (factual and speculative)
  - Verifies each claim against the context
  - Assigns one of three verdicts:
    - "yes" - claim supported by context
    - "no" - claim contradicts context
    - "unsure" - claim unverifiable
- Calculates the faithfulness score:
  - Counts supported claims
  - Divides by total claims
  - Scales to the configured range
Final score: (supported_claims / total_claims) * scale
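As a plain TypeScript sketch of this calculation (illustrative only, not the library's internal implementation):

type Verdict = "yes" | "no" | "unsure";

function faithfulnessScore(verdicts: Verdict[], scale = 1): number {
  // Assumed edge case: no claims extracted yields a score of 0.
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

// e.g. faithfulnessScore(["yes", "yes", "unsure", "no"]) === 0.5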
Score interpretation
A faithfulness score between 0 and 1:
- 1.0: All claims are accurate and directly supported by the context.
- 0.7–0.9: Most claims are correct, with minor additions or omissions.
- 0.4–0.6: Some claims are supported, but others are unverifiable.
- 0.1–0.3: Most of the content is inaccurate or unsupported.
- 0.0: All claims are false or contradict the context.
Example
Evaluate agent responses for faithfulness to provided context:
import { runEvals } from "@mastra/core/evals";
import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

// Context is typically populated from agent tool calls or RAG retrieval
const scorer = createFaithfulnessScorer({
  model: "openai/gpt-4o",
});

const result = await runEvals({
  data: [
    {
      input: "Tell me about the Tesla Model 3.",
    },
    {
      input: "What are the key features of this electric vehicle?",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
For more details on runEvals, see the runEvals reference.
To add this scorer to an agent, see the Scorers overview guide.