# FaithfulnessMetric Reference

The `FaithfulnessMetric` in Mastra evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies them against the context, which makes it useful for measuring the reliability of RAG pipeline responses.
## Basic Usage

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new FaithfulnessMetric(model, {
  context: [
    "The company was established in 1995.",
    "Currently employs around 450-550 people.",
  ],
});

const result = await metric.measure(
  "Tell me about the company.",
  "The company was founded in 1995 and has 500 employees.",
);

console.log(result.score); // 1.0
console.log(result.info.reason); // "All claims are supported by the context."
```
## Constructor Parameters

- `model` (`LanguageModel`): Configuration for the model used to evaluate faithfulness.
- `options` (`FaithfulnessMetricOptions`): Additional options for configuring the metric.

### FaithfulnessMetricOptions

- `scale` (`number`, default: `1`): The maximum score value. The final score is normalized to this scale.
- `context` (`string[]`): Array of context chunks against which the output's claims are verified.
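For instance, here is a minimal sketch of passing the `scale` option so scores are reported on a 0-100 range instead of the default 0-1, using the constructor signature documented above:

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Same metric as in Basic Usage, but scores are normalized to 0-100
// via the `scale` option.
const metric = new FaithfulnessMetric(openai("gpt-4o-mini"), {
  scale: 100,
  context: ["The company was established in 1995."],
});

// An output whose claims are all supported would now score 100 instead of 1.0.
```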
## measure() Parameters

- `input` (`string`): The original query or prompt given to the LLM.
- `output` (`string`): The LLM's response to be evaluated for faithfulness.
## Returns

- `score` (`number`): A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.
- `info` (`object`): Object containing the reason for the score.
  - `reason` (`string`): A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.
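Taken together, the fields above imply a result shape along these lines. This is an illustrative type reconstructed from the documentation, not one exported by the library:

```typescript
// Assumed shape of the value resolved by metric.measure(),
// based on the fields documented above.
interface FaithfulnessResult {
  score: number; // 0 to `scale`; proportion of supported claims
  info: {
    reason: string; // which claims were supported, contradicted, or unsure
  };
}
```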
## Scoring Details

The metric evaluates faithfulness through claim verification against the provided context.

### Scoring Process

1. Analyzes claims and context:
   - Extracts all claims (factual and speculative)
   - Verifies each claim against the context
   - Assigns one of three verdicts:
     - "yes": the claim is supported by the context
     - "no": the claim contradicts the context
     - "unsure": the claim cannot be verified
2. Calculates the faithfulness score (see the sketch below):
   - Counts supported claims
   - Divides by the total number of claims
   - Scales the result to the configured range

Final score: `(supported_claims / total_claims) * scale`
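As a minimal sketch of that arithmetic (the `Verdict` type and function name are illustrative, not part of the library's API):

```typescript
// Illustrative only: recomputes the final score from a list of verdicts.
type Verdict = "yes" | "no" | "unsure";

function faithfulnessScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

// Two supported claims and one unsure claim: (2 / 3) * 1 ≈ 0.67
console.log(faithfulnessScore(["yes", "yes", "unsure"]).toFixed(2)); // "0.67"
```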
### Score interpretation

(0 to scale, default 0-1)

- 1.0: All claims supported by the context
- 0.7-0.9: Most claims supported, a few unverifiable
- 0.4-0.6: Mixed support with some contradictions
- 0.1-0.3: Limited support, many contradictions
- 0.0: No supported claims
## Advanced Example

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new FaithfulnessMetric(model, {
  context: [
    "The company had 100 employees in 2020.",
    "Current employee count is approximately 500.",
  ],
});

// Example with mixed claim types
const result = await metric.measure(
  "What's the company's growth like?",
  "The company has grown from 100 employees in 2020 to 500 now, and might expand to 1000 by next year.",
);

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two claims are supported by the context
//     (initial employee count of 100 in 2020 and current count of 500),
//     while the future expansion claim is marked as unsure as it cannot
//     be verified against the context."
//   }
// }
```