# Answer Similarity Scorer

The `createAnswerSimilarityScorer()` function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.

## Parameters

**model:** (`LanguageModel`): The language model used to evaluate semantic similarity between outputs and ground truth.

**options:** (`AnswerSimilarityOptions`): Configuration options for the scorer.

### AnswerSimilarityOptions

**requireGroundTruth:** (`boolean`): Whether to require ground truth for evaluation. If `false`, missing ground truth returns a score of 0. (Default: `true`)

**semanticThreshold:** (`number`): Weight for semantic matches versus exact matches (0-1). (Default: `0.8`)

**exactMatchBonus:** (`number`): Additional score bonus for exact matches (0-1). (Default: `0.2`)

**missingPenalty:** (`number`): Penalty per missing key concept from the ground truth. (Default: `0.15`)

**contradictionPenalty:** (`number`): Penalty for contradictory information. A high value ensures wrong answers score near 0. (Default: `1.0`)

**extraInfoPenalty:** (`number`): Mild penalty for extra information not present in the ground truth (capped at 0.2). (Default: `0.05`)

**scale:** (`number`): Score scaling factor. (Default: `1`)

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but **requires ground truth** to be provided in the run object.

## .run() Returns

**runId:** (`string`): The id of the run (optional).

**score:** (`number`): Similarity score between 0 and 1 (or 0 to `scale` if a custom scale is used). Higher scores indicate closer similarity to the ground truth.

**reason:** (`string`): Human-readable explanation of the score with actionable feedback.

**preprocessStepResult:** (`object`): Extracted semantic units from the output and ground truth.

**analyzeStepResult:** (`object`): Detailed analysis of matches, contradictions, and extra information.

**preprocessPrompt:** (`string`): The prompt used for semantic unit extraction.

**analyzePrompt:** (`string`): The prompt used for similarity analysis.

**generateReasonPrompt:** (`string`): The prompt used for generating the explanation.

## Scoring Details

The scorer uses a multi-step process:

1. **Extract**: Breaks down the output and ground truth into semantic units
2. **Analyze**: Compares units and identifies matches, contradictions, and gaps
3. **Score**: Calculates weighted similarity with penalties for contradictions
4. **Reason**: Generates a human-readable explanation

Score calculation:

`max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale`
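To illustrate how the penalties combine, the sketch below mirrors the formula using the documented default penalties. It is not the scorer's internal implementation, and the sample numbers are hypothetical.

```typescript
// Illustrative only: mirrors the documented formula, not the scorer's internals.
function similarityScore(
  baseScore: number, // weighted semantic/exact match score (0-1)
  contradictionPenalty: number,
  missingPenalty: number,
  extraInfoPenalty: number,
  scale = 1,
): number {
  return Math.max(0, baseScore - contradictionPenalty - missingPenalty - extraInfoPenalty) * scale;
}

// Strong semantic match (0.9), one missing concept (0.15 by default),
// no contradictions, a little extra info (0.05): max(0, 0.9 - 0 - 0.15 - 0.05) = 0.7
similarityScore(0.9, 0, 0.15, 0.05); // 0.7

// A single contradiction (penalty 1.0 by default) drives the score to 0,
// no matter how much else matched.
similarityScore(0.8, 1.0, 0, 0); // 0
```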
## Example

Evaluate agent responses for similarity to ground truth across different scenarios:

```typescript
import { runEvals } from "@mastra/core/evals";
import { createAnswerSimilarityScorer } from "@mastra/evals/scorers/prebuilt";

import { myAgent } from "./agent";

const scorer = createAnswerSimilarityScorer({ model: "openai/gpt-4o" });

const result = await runEvals({
  data: [
    {
      input: "What is 2+2?",
      groundTruth: "4",
    },
    {
      input: "What is the capital of France?",
      groundTruth: "The capital of France is Paris",
    },
    {
      input: "What are the primary colors?",
      groundTruth: "The primary colors are red, blue, and yellow",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
```

For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals). To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
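If the defaults are too lenient or too strict for your test suite, the values from `AnswerSimilarityOptions` can be adjusted when creating the scorer. The sketch below assumes they are passed under an `options` key alongside `model`, following the parameter shape described above; the exact call shape may differ by Mastra version.

```typescript
import { createAnswerSimilarityScorer } from "@mastra/evals/scorers/prebuilt";

// A sketch of a stricter configuration: heavier penalty for missing concepts
// and less tolerance for extra information. Option names come from
// AnswerSimilarityOptions documented above.
const strictScorer = createAnswerSimilarityScorer({
  model: "openai/gpt-4o",
  options: {
    missingPenalty: 0.3, // default 0.15
    extraInfoPenalty: 0.2, // default 0.05 (penalty capped at 0.2)
    exactMatchBonus: 0.1, // default 0.2
  },
});
```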