
Answer Similarity Scorer

The createAnswerSimilarityScorer() function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is specifically designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.

Parameters

model:

LanguageModel
The language model used to evaluate semantic similarity between outputs and ground truth.

options:

AnswerSimilarityOptions
Configuration options for the scorer.

AnswerSimilarityOptions

requireGroundTruth:

boolean
= true
Whether ground truth is required for evaluation. If false, a run without ground truth returns a score of 0 instead of failing.

semanticThreshold:

number
= 0.8
Weight for semantic matches vs exact matches (0-1).

exactMatchBonus:

number
= 0.2
Additional score bonus for exact matches (0-1).

missingPenalty:

number
= 0.15
Penalty per missing key concept from ground truth.

contradictionPenalty:

number
= 1.0
Penalty for contradictory information. High value ensures wrong answers score near 0.

extraInfoPenalty:

number
= 0.05
Mild penalty for extra information not present in ground truth (capped at 0.2).

scale:

number
= 1
Score scaling factor.
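The defaults above can be pictured as a plain object merge: any option you pass overrides the documented default. The interface and resolveOptions helper below are an illustrative sketch only, not part of the @mastra/evals API:

```typescript
// Sketch of AnswerSimilarityOptions with the documented defaults.
// The real type lives in @mastra/evals; names here mirror the table above.
interface AnswerSimilarityOptions {
  requireGroundTruth?: boolean;
  semanticThreshold?: number;
  exactMatchBonus?: number;
  missingPenalty?: number;
  contradictionPenalty?: number;
  extraInfoPenalty?: number;
  scale?: number;
}

const DEFAULTS: Required<AnswerSimilarityOptions> = {
  requireGroundTruth: true,
  semanticThreshold: 0.8,
  exactMatchBonus: 0.2,
  missingPenalty: 0.15,
  contradictionPenalty: 1.0,
  extraInfoPenalty: 0.05,
  scale: 1,
};

// Hypothetical helper: user-supplied options win over defaults.
function resolveOptions(
  options: AnswerSimilarityOptions = {},
): Required<AnswerSimilarityOptions> {
  return { ...DEFAULTS, ...options };
}

const resolved = resolveOptions({ contradictionPenalty: 0.5 });
console.log(resolved.contradictionPenalty); // 0.5 (overridden)
console.log(resolved.semanticThreshold); // 0.8 (default)
```

Lowering contradictionPenalty as shown would make the scorer more forgiving of wrong answers, which is usually undesirable in CI/CD checks; the 1.0 default exists precisely so contradictions drive the score toward 0.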

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but requires ground truth to be provided in the run object.

.run() Returns

runId:

string
The id of the run (optional).

score:

number
Similarity score between 0 and 1 (or 0 and scale when a custom scale is used). Higher scores indicate closer similarity to the ground truth.

reason:

string
Human-readable explanation of the score with actionable feedback.

preprocessStepResult:

object
Extracted semantic units from output and ground truth.

analyzeStepResult:

object
Detailed analysis of matches, contradictions, and extra information.

preprocessPrompt:

string
The prompt used for semantic unit extraction.

analyzePrompt:

string
The prompt used for similarity analysis.

generateReasonPrompt:

string
The prompt used for generating the explanation.

Scoring Details

The scorer uses a multi-step process:

  1. Extract: Breaks down output and ground truth into semantic units
  2. Analyze: Compares units and identifies matches, contradictions, and gaps
  3. Score: Calculates weighted similarity with penalties for contradictions
  4. Reason: Generates human-readable explanation

Score calculation: max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale
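The calculation above can be sketched as a pure function. This is an illustration of the documented formula, not the actual implementation inside @mastra/evals; the parameter names are assumptions:

```typescript
// Sketch of the documented score formula:
// max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale
function computeScore(params: {
  baseScore: number; // weighted semantic/exact match score, 0-1
  contradictionPenalty: number;
  missingPenalty: number;
  extraInfoPenalty: number; // capped at 0.2, per the options above
  scale?: number;
}): number {
  const { baseScore, contradictionPenalty, missingPenalty, scale = 1 } = params;
  const extraInfoPenalty = Math.min(params.extraInfoPenalty, 0.2);
  return (
    Math.max(0, baseScore - contradictionPenalty - missingPenalty - extraInfoPenalty) * scale
  );
}

// A correct answer with no penalties keeps its base score:
console.log(
  computeScore({ baseScore: 0.9, contradictionPenalty: 0, missingPenalty: 0, extraInfoPenalty: 0 }),
); // 0.9

// A contradiction (default penalty 1.0) clamps the score to 0:
console.log(
  computeScore({ baseScore: 0.9, contradictionPenalty: 1.0, missingPenalty: 0, extraInfoPenalty: 0 }),
); // 0
```

Note how the max(0, …) clamp combined with the default contradictionPenalty of 1.0 guarantees that any contradicted answer scores 0, regardless of how much of the ground truth it otherwise matched.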

Example

Evaluate agent responses for similarity to ground truth across different scenarios:

src/example-answer-similarity.ts
import { runExperiment } from "@mastra/core/scores";
import { createAnswerSimilarityScorer } from "@mastra/evals/scorers/llm";
import { myAgent } from "./agent";

const scorer = createAnswerSimilarityScorer({ model: "openai/gpt-4o" });

const result = await runExperiment({
  data: [
    {
      input: "What is 2+2?",
      groundTruth: "4",
    },
    {
      input: "What is the capital of France?",
      groundTruth: "The capital of France is Paris",
    },
    {
      input: "What are the primary colors?",
      groundTruth: "The primary colors are red, blue, and yellow",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.name].score,
      reason: scorerResults[scorer.name].reason,
    });
  },
});

console.log(result.scores);

For more details on runExperiment, see the runExperiment reference.

To add this scorer to an agent, see the Scorers overview guide.
