Skip to main content

Textual Difference Scorer

Use createTextualDifferenceScorer to evaluate the similarity between two text strings by analyzing sequence differences and edit operations.

Installation

npm install @mastra/evals

For complete API documentation and configuration options, see createTextualDifferenceScorer.

No differences example

In this example, the texts are exactly the same. The scorer identifies complete similarity with a perfect score and no detected changes.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "The quick brown fox jumps over the lazy dog";
const output = "The quick brown fox jumps over the lazy dog";

const result = await scorer.run({
input: [{ role: "user", content: input }],
output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

No differences output

The scorer returns a high score, indicating the texts are identical. The detailed info confirms zero changes and no length difference.

{
score: 1,
analyzeStepResult: {
confidence: 1,
ratio: 1,
changes: 0,
lengthDiff: 0,
},
}

Minor differences example

In this example, the texts have small variations. The scorer detects these minor differences and returns a moderate similarity score.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "Hello world! How are you?";
const output = "Hello there! How is it going?";

const result = await scorer.run({
input: [{ role: "user", content: input }],
output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

Minor differences output

The scorer returns a moderate score reflecting the small variations between the texts. The detailed info includes the number of changes and length difference observed.

{
score: 0.5925925925925926,
analyzeStepResult: {
confidence: 0.8620689655172413,
ratio: 0.5925925925925926,
changes: 5,
lengthDiff: 0.13793103448275862
}
}

Major differences example

In this example, the texts differ significantly. The scorer detects extensive changes and returns a low similarity score.

import { createTextualDifferenceScorer } from "@mastra/evals/scorers/code";

const scorer = createTextualDifferenceScorer();

const input = "Python is a high-level programming language";
const output = "JavaScript is used for web development";

const result = await scorer.run({
input: [{ role: "user", content: input }],
output: { role: "assistant", text: output },
});

console.log("Score:", result.score);
console.log("AnalyzeStepResult:", result.analyzeStepResult);

Major differences output

The scorer returns a low score due to significant differences between the texts. The detailed analyzeStepResult shows numerous changes and a notable length difference.

{
score: 0.3170731707317073,
analyzeStepResult: {
confidence: 0.8636363636363636,
ratio: 0.3170731707317073,
changes: 8,
lengthDiff: 0.13636363636363635
}
}

Scorer configuration

You can create a TextualDifferenceScorer instance with default settings. No additional configuration is required.

const scorer = createTextualDifferenceScorer();

See TextualDifferenceScorer for a full list of configuration options.

Understanding the results

.run() returns a result in the following shape:

{
runId: string,
analyzeStepResult: {
confidence: number,
ratio: number,
changes: number,
lengthDiff: number
},
score: number
}

score

A textual difference score between 0 and 1:

  • 1.0: Identical texts – no differences detected.
  • 0.7–0.9: Minor differences – few changes needed.
  • 0.4–0.6: Moderate differences – noticeable changes required.
  • 0.1–0.3: Major differences – extensive changes needed.
  • 0.0: Completely different texts.

runId

The unique identifier for this scorer run.

analyzeStepResult

Object with difference metrics:

  • confidence: Confidence score based on length difference (higher is better).
  • ratio: Similarity ratio between the texts (0-1).
  • changes: Number of edit operations required to match the texts.
  • lengthDiff: Normalized difference in text lengths.
View source on GitHub