ContentSimilarityMetric
New Scorer API
We just released a new evals API called Scorers. It is more ergonomic, stores more metadata for error analysis, and offers more flexibility for evaluating data structures. Migration is fairly simple, and we will continue to support the existing Evals API.
The ContentSimilarityMetric class measures the textual similarity between two strings, providing a score that indicates how closely they match. It supports configurable options for case sensitivity and whitespace handling.
Basic Usage
import { ContentSimilarityMetric } from "@mastra/evals/nlp";

const metric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: true,
});

const result = await metric.measure("Hello, world!", "hello world");

console.log(result.score); // Similarity score from 0-1
console.log(result.info); // Detailed similarity metrics
Constructor Parameters
options?: ContentSimilarityOptions = { ignoreCase: true, ignoreWhitespace: true }
Configuration options for similarity comparison
ContentSimilarityOptions
ignoreCase?: boolean = true
Whether to ignore case differences when comparing strings
ignoreWhitespace?: boolean = true
Whether to normalize whitespace when comparing strings
measure() Parameters
input: string
The reference text to compare against
output: string
The text to evaluate for similarity
Returns
score: number
Similarity score (0-1) where 1 indicates perfect similarity
info: object
Detailed similarity metrics
  similarity: number
  Raw similarity score between the two texts
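As a rough illustration of the return shape described above, the result can be modeled with a small interface. The names ContentSimilarityResultSketch and meetsThreshold, and the 0.8 default threshold, are assumptions made for this sketch and are not part of the library.

```typescript
// Hypothetical type mirroring the documented return value of measure().
interface ContentSimilarityResultSketch {
  score: number; // 0-1, where 1 indicates perfect similarity
  info: {
    similarity: number; // raw similarity between the two texts
  };
}

// Hypothetical helper: treats a result as acceptable when its score
// reaches a caller-chosen threshold (0.8 is an arbitrary default).
function meetsThreshold(
  result: ContentSimilarityResultSketch,
  threshold = 0.8,
): boolean {
  return result.score >= threshold;
}

// Example object shaped like the documented output.
const exampleResult: ContentSimilarityResultSketch = {
  score: 0.75,
  info: { similarity: 0.75 },
};

console.log(meetsThreshold(exampleResult)); // compares 0.75 against 0.8
console.log(meetsThreshold(exampleResult, 0.7)); // compares 0.75 against 0.7
```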
Scoring Details
The metric evaluates textual similarity through character-level matching and configurable text normalization.
Scoring Process
- Normalizes text:
  - Case normalization (if ignoreCase: true)
  - Whitespace normalization (if ignoreWhitespace: true)
- Compares processed strings using the string-similarity algorithm:
  - Analyzes character sequences
  - Aligns word boundaries
  - Considers relative positions
  - Accounts for length differences
Final score: similarity_value * scale
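The process above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes a Sørensen-Dice coefficient over character bigrams as the comparison step, and the names SimilarityOptionsSketch, normalizeText, diceSimilarity, and sketchScore are hypothetical. The scale parameter is also an assumption, taken from the "similarity_value * scale" formula.

```typescript
// Hypothetical options type mirroring ignoreCase / ignoreWhitespace.
interface SimilarityOptionsSketch {
  ignoreCase?: boolean;
  ignoreWhitespace?: boolean;
}

// Step 1: normalize case and whitespace (both default to true).
function normalizeText(text: string, opts: SimilarityOptionsSketch): string {
  let out = text;
  if (opts.ignoreCase ?? true) out = out.toLowerCase();
  if (opts.ignoreWhitespace ?? true) out = out.replace(/\s+/g, " ").trim();
  return out;
}

// Count how often each two-character sequence occurs in a string.
function bigramCounts(text: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts[pair] = (counts[pair] ?? 0) + 1;
  }
  return counts;
}

// Step 2 (assumed algorithm): Dice coefficient = 2 * shared / total bigrams.
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const left = bigramCounts(a);
  const right = bigramCounts(b);
  let shared = 0;
  for (const pair of Object.keys(left)) {
    shared += Math.min(left[pair], right[pair] ?? 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Final score: similarity_value * scale (scale is assumed to default to 1).
function sketchScore(
  input: string,
  output: string,
  opts: SimilarityOptionsSketch = {},
  scale = 1,
): number {
  const similarity = diceSimilarity(
    normalizeText(input, opts),
    normalizeText(output, opts),
  );
  return similarity * scale;
}

console.log(sketchScore("Hello World", "hello   world"));
console.log(sketchScore("Hello World", "hello world", { ignoreCase: false }));
```

With the defaults, both inputs in the first call normalize to the same string, so the sketch returns a perfect match. Turning off ignoreCase leaves the capital letters in place and fewer bigrams are shared. Scores from this sketch will not necessarily equal those of the real metric.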
Score interpretation (0 to scale, default 0-1)
- 1.0: Perfect match - identical texts
- 0.7-0.9: High similarity - mostly matching content
- 0.4-0.6: Moderate similarity - partial matches
- 0.1-0.3: Low similarity - few matching patterns
- 0.0: No similarity - completely different texts
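The bands above can be turned into a small lookup helper. This is a sketch under stated assumptions: the listed ranges leave gaps (for example between 0.9 and 1.0), so the cut-offs below simply extend each band to the next one, and the names SimilarityBand and interpretScore are hypothetical.

```typescript
type SimilarityBand = "perfect" | "high" | "moderate" | "low" | "none";

// Map a score to a band. Dividing by scale brings the score back to 0-1.
// Assumed cut-offs: gaps between the documented ranges are assigned
// to the lower band (e.g. 0.95 counts as "high", 0.65 as "moderate").
function interpretScore(score: number, scale = 1): SimilarityBand {
  const unit = score / scale;
  if (unit >= 1) return "perfect";
  if (unit >= 0.7) return "high";
  if (unit >= 0.4) return "moderate";
  if (unit > 0) return "low";
  return "none";
}

console.log(interpretScore(0.75)); // "high"
console.log(interpretScore(0.5)); // "moderate"
```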
Example with Different Options
import { ContentSimilarityMetric } from "@mastra/evals/nlp";
// Case-sensitive comparison
const caseSensitiveMetric = new ContentSimilarityMetric({
  ignoreCase: false,
  ignoreWhitespace: true,
});

// Lower score due to case difference
const result1 = await caseSensitiveMetric.measure("Hello World", "hello world");

// Example output:
// {
//   score: 0.75,
//   info: { similarity: 0.75 }
// }

// Strict whitespace comparison
const strictWhitespaceMetric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: false,
});

// Lower score due to whitespace difference (extra spaces in the first string)
const result2 = await strictWhitespaceMetric.measure(
  "Hello   World",
  "Hello World",
);

// Example output:
// {
//   score: 0.85,
//   info: { similarity: 0.85 }
// }