FaithfulnessMetric Reference
The FaithfulnessMetric in Mastra evaluates how factually accurate an LLM's output is compared to the provided context. It extracts claims from the output and verifies them against the context, making it essential for measuring the reliability of RAG pipeline responses.
Basic Usage
```typescript
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Configure the model used for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};

const metric = new FaithfulnessMetric(model, {
  context: [
    "The company was established in 1995.",
    "Currently employs around 450-550 people.",
  ],
});

const result = await metric.measure(
  "Tell me about the company.",
  "The company was founded in 1995 and has 500 employees.",
);

console.log(result.score); // 1.0
console.log(result.info.reason); // "All claims are supported by the context."
```
Constructor Parameters
- `model` (`ModelConfig`): Configuration for the model used to evaluate faithfulness.
- `options` (`FaithfulnessMetricOptions`): Additional options for configuring the metric.
FaithfulnessMetricOptions
- `scale` (`number`, default: `1`): The maximum score value. The final score will be normalized to this scale.
- `context` (`string[]`): Array of context chunks against which the output's claims will be verified.
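For example, setting `scale` changes the range of the returned score. A minimal sketch, reusing the `model` configuration from Basic Usage (the exact value of the reported score follows from the documented formula and is otherwise an assumption):

```typescript
// Scores are normalized to 0-10 instead of the default 0-1
const scaledMetric = new FaithfulnessMetric(model, {
  scale: 10,
  context: ["The company was established in 1995."],
});

const scaled = await scaledMetric.measure(
  "When was the company founded?",
  "The company was founded in 1995.",
);

console.log(scaled.score); // 10: the single claim is supported, so (1/1) * 10
```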
measure() Parameters
- `input` (`string`): The original query or prompt given to the LLM.
- `output` (`string`): The LLM's response to be evaluated for faithfulness.
Returns
- `score` (`number`): A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.
- `info` (`object`): Object containing the reason for the score.
  - `reason` (`string`): A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.
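For reference, the shape of the return value can be written as a TypeScript interface. This is purely illustrative; `FaithfulnessResult` is a hypothetical name, not a type exported by the library:

```typescript
// Hypothetical type for illustration; not exported by @mastra/evals
interface FaithfulnessResult {
  score: number; // between 0 and the configured scale
  info: {
    reason: string; // which claims were supported, contradicted, or unsure
  };
}
```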
Scoring Details
The FaithfulnessMetric evaluates the output by:
- Extracting all claims from the output (both factual and speculative)
- Verifying each claim against the provided context
- Calculating a score based on the proportion of supported claims
Claims can receive one of three verdicts:
- “yes” - The claim is supported by the context
- “no” - The claim contradicts the context
- “unsure” - The claim cannot be verified using the context (e.g., future predictions or claims outside the context scope)
The final score is calculated as: (number of supported claims / total number of claims) * scale
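The calculation can be sketched directly from this formula. In the sketch below, the `Verdict` type and `calculateScore` helper are hypothetical names for illustration, and they assume the per-claim verdicts have already been produced by the evaluation model:

```typescript
// Hypothetical helper mirroring the documented formula; not part of @mastra/evals
type Verdict = "yes" | "no" | "unsure";

function calculateScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0; // an empty output scores 0
  // Only "yes" verdicts count as supported; "no" and "unsure" do not
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

calculateScore(["yes", "yes", "unsure"]); // ≈ 0.67, two of three claims supported
```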
Score interpretation:
- 1.0: All claims are supported by the context
- 0.67: Two-thirds of claims are supported
- 0.5: Half of the claims are supported
- 0.33: One-third of claims are supported
- 0: No claims are supported or output is empty
Advanced Example
```typescript
const metric = new FaithfulnessMetric(model, {
  context: [
    "The company had 100 employees in 2020.",
    "Current employee count is approximately 500.",
  ],
});

// Example with mixed claim types
const result = await metric.measure(
  "What's the company's growth like?",
  "The company has grown from 100 employees in 2020 to 500 now, and might expand to 1000 by next year.",
);

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two claims are supported by the context
//              (initial employee count of 100 in 2020 and current count of 500),
//              while the future expansion claim is marked as unsure as it cannot
//              be verified against the context."
//   }
// }
```