Answer Relevancy Scorer
The createAnswerRelevancyScorer() function accepts a single options object with the following properties:
Parameters
model: LanguageModel
Configuration for the model used to evaluate relevancy.

uncertaintyWeight: number = 0.3
Weight given to 'unsure' verdicts in scoring (0-1).

scale: number = 1
Maximum score value.
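For instance, a scorer that down-weights "unsure" verdicts can be created as follows (a minimal sketch; the import path and model string match the example at the end of this page, and the option values are illustrative):

import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/prebuilt";

// Illustrative configuration: lower weight for "unsure" verdicts, default 0-1 scale.
const scorer = createAnswerRelevancyScorer({
  model: "openai/gpt-4o",
  uncertaintyWeight: 0.2,
  scale: 1,
});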
This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.
.run() Returns
runId: string
The id of the run (optional).

score: number
Relevancy score (0 to scale, default 0-1).

preprocessPrompt: string
The prompt sent to the LLM for the preprocess step (optional).

preprocessStepResult: object
Object with extracted statements: { statements: string[] }

analyzePrompt: string
The prompt sent to the LLM for the analyze step (optional).

analyzeStepResult: object
Object with results: { results: Array<{ result: 'yes' | 'unsure' | 'no', reason: string }> }

generateReasonPrompt: string
The prompt sent to the LLM for the reason step (optional).

reason: string
Explanation of the score.
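Taken together, a .run() result can be pictured as roughly the following shape (an illustrative TypeScript sketch assembled from the fields above, not a type exported by the library):

// Illustrative only; see the MastraScorer reference for the authoritative types.
interface AnswerRelevancyRunResult {
  runId?: string;
  score: number; // 0 to `scale`, 0-1 by default
  preprocessPrompt?: string;
  preprocessStepResult: { statements: string[] };
  analyzePrompt?: string;
  analyzeStepResult: {
    results: Array<{ result: "yes" | "unsure" | "no"; reason: string }>;
  };
  generateReasonPrompt?: string;
  reason: string;
}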
Scoring Details
The scorer evaluates relevancy through query-answer alignment, considering completeness and detail level, but not factual correctness.
Scoring Process
- Statement Preprocessing: breaks the output into meaningful statements while preserving context.
- Relevance Analysis: each statement is evaluated against the query and given one of three verdicts:
  - "yes": full weight for direct matches
  - "unsure": partial weight (default: 0.3) for approximate matches
  - "no": zero weight for irrelevant content
- Score Calculation:
  ((direct + uncertainty * partial) / total_statements) * scale
  where direct is the count of "yes" verdicts, partial is the count of "unsure" verdicts, and uncertainty is the uncertaintyWeight option. A worked example follows this list.
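The helper below mirrors that formula for illustration (a hypothetical function, not part of @mastra/evals). For a response that produced two "yes", one "unsure", and one "no" verdict with the default settings, the score is (2 + 0.3 * 1) / 4 = 0.575:

// Hypothetical helper that mirrors the documented formula.
type Verdict = "yes" | "unsure" | "no";

function relevancyScore(verdicts: Verdict[], uncertaintyWeight = 0.3, scale = 1): number {
  const direct = verdicts.filter((v) => v === "yes").length;
  const partial = verdicts.filter((v) => v === "unsure").length;
  return ((direct + uncertaintyWeight * partial) / verdicts.length) * scale;
}

relevancyScore(["yes", "yes", "unsure", "no"]); // (2 + 0.3 * 1) / 4 = 0.575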
Score Interpretation
A relevancy score between 0 and 1 (assuming the default scale of 1):
- 1.0: The response fully answers the query with relevant and focused information.
- 0.7–0.9: The response mostly answers the query but may include minor unrelated content.
- 0.4–0.6: The response partially answers the query, mixing relevant and unrelated information.
- 0.1–0.3: The response includes minimal relevant content and largely misses the intent of the query.
- 0.0: The response is entirely unrelated and does not answer the query.
Example
Evaluate agent responses for relevancy across different scenarios:
src/example-answer-relevancy.ts
import { runEvals } from "@mastra/core/evals";
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createAnswerRelevancyScorer({ model: "openai/gpt-4o" });

const result = await runEvals({
  data: [
    {
      input: "What are the health benefits of regular exercise?",
    },
    {
      input: "What should a healthy breakfast include?",
    },
    {
      input: "What are the benefits of meditation?",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
For more details on runEvals, see the runEvals reference.
To add this scorer to an agent, see the Scorers overview guide.