# FaithfulnessMetric Reference

The `FaithfulnessMetric` in Mastra evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies them against the context, which makes it useful for measuring the reliability of RAG pipeline responses.
## Basic Usage

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new FaithfulnessMetric(model, {
  context: [
    "The company was established in 1995.",
    "Currently employs around 450-550 people.",
  ],
});

const result = await metric.measure(
  "Tell me about the company.",
  "The company was founded in 1995 and has 500 employees.",
);

console.log(result.score); // 1.0
console.log(result.info.reason); // "All claims are supported by the context."
```
## Constructor Parameters

- `model` (`LanguageModel`): Configuration for the model used to evaluate faithfulness.
- `options` (`FaithfulnessMetricOptions`): Additional options for configuring the metric.

### FaithfulnessMetricOptions

- `scale` (`number`, default: `1`): The maximum score value. The final score is normalized to this scale.
- `context` (`string[]`): Array of context chunks against which the output's claims are verified.
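For instance, here is a minimal sketch of passing the `scale` option so scores are reported on a 0-100 range instead of the default 0-1, using the constructor signature documented above:

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Same metric as in Basic Usage, but scores are normalized to 0-100
// via the `scale` option.
const metric = new FaithfulnessMetric(openai("gpt-4o-mini"), {
  scale: 100,
  context: ["The company was established in 1995."],
});

// An output whose claims are all supported would now score 100 instead of 1.0.
```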
## measure() Parameters

- `input` (`string`): The original query or prompt given to the LLM.
- `output` (`string`): The LLM's response to be evaluated for faithfulness.
## Returns

- `score` (`number`): A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.
- `info` (`object`): Object containing the reason for the score.
  - `reason` (`string`): A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.
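Taken together, the fields above imply a result shape along these lines. This is an illustrative type reconstructed from the documentation, not one exported by the library:

```typescript
// Assumed shape of the value resolved by metric.measure(),
// based on the fields documented above.
interface FaithfulnessResult {
  score: number; // 0 to `scale`; proportion of supported claims
  info: {
    reason: string; // which claims were supported, contradicted, or unsure
  };
}
```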
## Scoring Details

The metric evaluates faithfulness through claim verification against the provided context.

### Scoring Process

1. Analyzes claims and context:
   - Extracts all claims (factual and speculative)
   - Verifies each claim against the context
   - Assigns one of three verdicts:
     - "yes": the claim is supported by the context
     - "no": the claim contradicts the context
     - "unsure": the claim cannot be verified
2. Calculates the faithfulness score (see the sketch below):
   - Counts supported claims
   - Divides by the total number of claims
   - Scales the result to the configured range

Final score: `(supported_claims / total_claims) * scale`
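As a minimal sketch of that arithmetic (the `Verdict` type and function name are illustrative, not part of the library's API):

```typescript
// Illustrative only: recomputes the final score from a list of verdicts.
type Verdict = "yes" | "no" | "unsure";

function faithfulnessScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

// Two supported claims and one unsure claim: (2 / 3) * 1 ≈ 0.67
console.log(faithfulnessScore(["yes", "yes", "unsure"]).toFixed(2)); // "0.67"
```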
### Score interpretation

(0 to scale, default 0-1)

- 1.0: All claims supported by the context
- 0.7-0.9: Most claims supported, a few unverifiable
- 0.4-0.6: Mixed support with some contradictions
- 0.1-0.3: Limited support, many contradictions
- 0.0: No supported claims
## Advanced Example

```typescript
import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = openai("gpt-4o-mini");

const metric = new FaithfulnessMetric(model, {
  context: [
    "The company had 100 employees in 2020.",
    "Current employee count is approximately 500.",
  ],
});

// Example with mixed claim types
const result = await metric.measure(
  "What's the company's growth like?",
  "The company has grown from 100 employees in 2020 to 500 now, and might expand to 1000 by next year.",
);

// Example output:
// {
//   score: 0.67,
//   info: {
//     reason: "The score is 0.67 because two claims are supported by the context
//     (initial employee count of 100 in 2020 and current count of 500),
//     while the future expansion claim is marked as unsure as it cannot
//     be verified against the context."
//   }
// }
```