Textual Difference Scorer

Use createTextualDifferenceScorer to evaluate the similarity between two text strings by analyzing sequence differences and edit operations.

Installation

npm install @mastra/evals

For complete API documentation and configuration options, see createTextualDifferenceScorer.

No differences example

In this example, the texts are exactly the same. The scorer identifies complete similarity with a perfect score and no detected changes.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "The quick brown fox jumps over the lazy dog";
const output = "The quick brown fox jumps over the lazy dog";

const result = await scorer.run({
  input: [{ role: "user", content: input }],
  output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

No differences output

The scorer returns a perfect score of 1, indicating the texts are identical. The analyzeStepResult confirms zero changes and no length difference.

{
  score: 1,
  analyzeStepResult: {
    confidence: 1,
    ratio: 1,
    changes: 0,
    lengthDiff: 0,
  },
}

Minor differences example

In this example, the texts have small variations. The scorer detects these minor differences and returns a moderate similarity score.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "Hello world! How are you?";
const output = "Hello there! How is it going?";

const result = await scorer.run({
  input: [{ role: "user", content: input }],
  output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

Minor differences output

The scorer returns a moderate score of about 0.59, reflecting the small variations between the texts. The analyzeStepResult reports 5 changes and a normalized length difference of about 0.14.

{
  score: 0.5925925925925926,
  analyzeStepResult: {
    confidence: 0.8620689655172413,
    ratio: 0.5925925925925926,
    changes: 5,
    lengthDiff: 0.13793103448275862
  }
}
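The ratio above can be reproduced by hand with a Ratcliff/Obershelp-style similarity ratio (the approach used by Python's difflib): find the longest common block, recurse on the text to its left and right, and return 2 × matched characters / total characters. The sketch below is an illustration only. Whether the scorer computes its ratio exactly this way is an assumption, and the function names are hypothetical.

```typescript
// Longest common substring within a[aLo, aHi) and b[bLo, bHi).
// Ties go to the block that starts earliest in `a`, then earliest in `b`.
function longestMatch(
  a: string, b: string,
  aLo: number, aHi: number, bLo: number, bHi: number,
): [number, number, number] {
  let best: [number, number, number] = [aLo, bLo, 0];
  // prev[k] holds the common-suffix length ending at b[bLo + k - 1] for the previous row of `a`.
  let prev: number[] = new Array<number>(bHi - bLo + 1).fill(0);
  for (let i = aLo; i < aHi; i++) {
    const curr: number[] = new Array<number>(bHi - bLo + 1).fill(0);
    for (let j = bLo; j < bHi; j++) {
      if (a[i] === b[j]) {
        const len = prev[j - bLo] + 1;
        curr[j - bLo + 1] = len;
        if (len > best[2]) best = [i - len + 1, j - len + 1, len];
      }
    }
    prev = curr;
  }
  return best;
}

// Total characters covered by matching blocks, found recursively.
function matchedCharacters(
  a: string, b: string,
  aLo = 0, aHi = a.length, bLo = 0, bHi = b.length,
): number {
  const [i, j, len] = longestMatch(a, b, aLo, aHi, bLo, bHi);
  if (len === 0) return 0;
  return (
    len +
    matchedCharacters(a, b, aLo, i, bLo, j) +
    matchedCharacters(a, b, i + len, aHi, j + len, bHi)
  );
}

// Hypothetical helper: 2 * M / T, where T is the combined length of both texts.
function similarityRatio(a: string, b: string): number {
  const total = a.length + b.length;
  return total === 0 ? 1 : (2 * matchedCharacters(a, b)) / total;
}

// 16 matched characters out of 25 + 29, so 32 / 54, about 0.5926.
console.log(similarityRatio("Hello world! How are you?", "Hello there! How is it going?"));
```

For these two strings the matched blocks are "Hello ", "r", "! How ", " ", "o", and "?", which add up to 16 characters.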

Major differences example

In this example, the texts differ significantly. The scorer detects extensive changes and returns a low similarity score.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "Python is a high-level programming language";
const output = "JavaScript is used for web development";

const result = await scorer.run({
  input: [{ role: "user", content: input }],
  output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

Major differences output

The scorer returns a low score of about 0.32 due to significant differences between the texts. The analyzeStepResult shows 8 changes and a normalized length difference of about 0.14.

{
  score: 0.3170731707317073,
  analyzeStepResult: {
    confidence: 0.8636363636363636,
    ratio: 0.3170731707317073,
    changes: 8,
    lengthDiff: 0.13636363636363635
  }
}

Scorer configuration

You can create a TextualDifferenceScorer instance with default settings. No additional configuration is required.

const scorer = createTextualDifferenceScorer();

See TextualDifferenceScorer for a full list of configuration options.

Understanding the results

.run() returns a result in the following shape:

{
  runId: string,
  analyzeStepResult: {
    confidence: number,
    ratio: number,
    changes: number,
    lengthDiff: number
  },
  score: number
}
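If you pass results between modules or persist them, it can help to describe this shape as a type and validate it at runtime. The sketch below assumes only the fields listed above. The names TextualDifferenceResult and isTextualDifferenceResult are hypothetical and are not exported by @mastra/evals, and the runId value is made up for illustration.

```typescript
// Hypothetical type mirroring the documented result shape.
interface TextualDifferenceResult {
  runId: string;
  analyzeStepResult: {
    confidence: number;
    ratio: number;
    changes: number;
    lengthDiff: number;
  };
  score: number;
}

// Runtime check that an unknown value has the documented fields.
function isTextualDifferenceResult(value: unknown): value is TextualDifferenceResult {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const analysis = candidate.analyzeStepResult;
  if (typeof analysis !== "object" || analysis === null) return false;
  const metrics = analysis as Record<string, unknown>;
  const metricKeys = ["confidence", "ratio", "changes", "lengthDiff"];
  return (
    typeof candidate.runId === "string" &&
    typeof candidate.score === "number" &&
    metricKeys.every((key) => typeof metrics[key] === "number")
  );
}

// Sample data copied from the "No differences" output; the runId is a placeholder.
const sampleResult: unknown = {
  runId: "example-run",
  analyzeStepResult: { confidence: 1, ratio: 1, changes: 0, lengthDiff: 0 },
  score: 1,
};

if (isTextualDifferenceResult(sampleResult)) {
  console.log(`score=${sampleResult.score}, changes=${sampleResult.analyzeStepResult.changes}`);
}
```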

score

A similarity score between 0 and 1, where higher values mean the texts are more alike:

1.0: Identical texts – no differences detected.
0.7–0.9: Minor differences – few changes needed.
0.4–0.6: Moderate differences – noticeable changes required.
0.1–0.3: Major differences – extensive changes needed.
0.0: Completely different texts.
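To turn a score into one of these labels in code, a small helper along the following lines may be enough. The bands above leave gaps (for example between 0.9 and 1.0), so the cut-offs used here to close those gaps are an assumption, and describeScore is a hypothetical name rather than part of the library.

```typescript
type DifferenceBand =
  | "identical"
  | "minor"
  | "moderate"
  | "major"
  | "completely different";

// Maps a score in [0, 1] to a band. Thresholds between the documented
// ranges are assumed: each band extends up to the start of the next one.
function describeScore(score: number): DifferenceBand {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return "identical";
  if (score >= 0.7) return "minor";
  if (score >= 0.4) return "moderate";
  if (score > 0) return "major";
  return "completely different";
}

console.log(describeScore(0.5925925925925926)); // "moderate"
console.log(describeScore(0.3170731707317073)); // "major"
```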

runId

The unique identifier for this scorer run.

analyzeStepResult

Object with difference metrics:

confidence: Confidence score based on the length difference between the texts, from 0 to 1 (higher means the lengths are closer).
ratio: Similarity ratio between the texts, from 0 to 1 (1 means identical).
changes: Number of edit operations required to match the texts.
lengthDiff: Normalized difference in text lengths (0 means equal length).
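The length-based metrics can be approximated by hand. In the minor differences example the texts are 25 and 29 characters long, and the reported values are consistent with lengthDiff = |25 − 29| / 29 and confidence = 1 − lengthDiff. The sketch below encodes that reading. The formulas are inferred from the example output rather than taken from the library source, and lengthMetrics is a hypothetical helper.

```typescript
// Assumed formulas, inferred from the example output above:
//   lengthDiff = |len(input) - len(output)| / max(len(input), len(output))
//   confidence = 1 - lengthDiff
function lengthMetrics(
  input: string,
  output: string,
): { lengthDiff: number; confidence: number } {
  const maxLength = Math.max(input.length, output.length);
  // Two empty strings have no length difference; avoid dividing by zero.
  const lengthDiff =
    maxLength > 0 ? Math.abs(input.length - output.length) / maxLength : 0;
  return { lengthDiff, confidence: 1 - lengthDiff };
}

const metrics = lengthMetrics(
  "Hello world! How are you?",
  "Hello there! How is it going?",
);
console.log(metrics.lengthDiff); // about 0.1379 (4 / 29)
console.log(metrics.confidence); // about 0.8621
```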
