ContentSimilarityMetric

New Scorer API

We just released a new evals API called Scorers. It is more ergonomic, stores more metadata for error analysis, and is more flexible about the data structures it can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.

The ContentSimilarityMetric class measures the textual similarity between two strings, providing a score that indicates how closely they match. It supports configurable options for case sensitivity and whitespace handling.

Basic Usage

```typescript
import { ContentSimilarityMetric } from "@mastra/evals/nlp";

const metric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: true,
});

const result = await metric.measure("Hello, world!", "hello world");

console.log(result.score); // Similarity score from 0-1
console.log(result.info); // Detailed similarity metrics
```

Constructor Parameters

options?: ContentSimilarityOptions = { ignoreCase: true, ignoreWhitespace: true }
Configuration options for similarity comparison

ContentSimilarityOptions

ignoreCase?: boolean = true
Whether to ignore case differences when comparing strings

ignoreWhitespace?: boolean = true
Whether to normalize whitespace when comparing strings

measure() Parameters

input: string
The reference text to compare against

output: string
The text to evaluate for similarity

Returns

score: number
Similarity score (0-1) where 1 indicates perfect similarity

info: object
Detailed similarity metrics

  similarity: number
  Raw similarity score between the two texts
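
As a rough illustration of consuming the return value, the sketch below models the documented fields as a TypeScript interface and gates on the score. The names `ContentSimilarityResult` and `meetsThreshold` are hypothetical and are not exported by the library; only the `score` and `info.similarity` fields come from the reference above.

```typescript
// Assumed result shape, mirroring the fields documented above.
// The interface name is hypothetical, not a library export.
interface ContentSimilarityResult {
  score: number;
  info: { similarity: number };
}

// Hypothetical helper: true when the score meets a caller-chosen threshold.
function meetsThreshold(
  result: ContentSimilarityResult,
  threshold: number,
): boolean {
  return result.score >= threshold;
}

// Hand-written sample value standing in for the result of metric.measure().
const sampleResult: ContentSimilarityResult = {
  score: 0.85,
  info: { similarity: 0.85 },
};

console.log(meetsThreshold(sampleResult, 0.8));
```

A check like this could sit in a test suite to fail a run when generated text drifts too far from a reference string; the right threshold depends on the use case.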

Scoring Details

The metric evaluates textual similarity through character-level matching and configurable text normalization.

Scoring Process

  1. Normalizes text:

    • Case normalization (if ignoreCase: true)
    • Whitespace normalization (if ignoreWhitespace: true)
  2. Compares processed strings using string-similarity algorithm:

    • Analyzes character sequences
    • Aligns word boundaries
    • Considers relative positions
    • Accounts for length differences

Final score: similarity_value * scale
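
The process above can be sketched in plain TypeScript. This is a minimal approximation under stated assumptions, not the library's implementation: the Sørensen-Dice coefficient over character bigrams stands in for the comparison step, and the `scale` option, `SketchOptions`, `normalizeText`, `diceSimilarity`, and `sketchScore` are all hypothetical names introduced for illustration.

```typescript
// Hypothetical options; `scale` is assumed here and is not listed in the
// constructor parameters documented above.
interface SketchOptions {
  ignoreCase?: boolean;
  ignoreWhitespace?: boolean;
  scale?: number;
}

// Step 1: optional case and whitespace normalization (both default to true).
function normalizeText(text: string, opts: SketchOptions): string {
  let out = text;
  if (opts.ignoreCase ?? true) out = out.toLowerCase();
  if (opts.ignoreWhitespace ?? true) out = out.replace(/\s+/g, " ").trim();
  return out;
}

// Counts each adjacent character pair in the text.
function countBigrams(text: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts[pair] = (counts[pair] ?? 0) + 1;
  }
  return counts;
}

// Step 2 (stand-in): Dice coefficient = 2 * shared bigrams / total bigrams.
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const left = countBigrams(a);
  const right = countBigrams(b);
  let shared = 0;
  for (const pair of Object.keys(left)) {
    shared += Math.min(left[pair], right[pair] ?? 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Final score: similarity_value * scale.
function sketchScore(
  input: string,
  output: string,
  opts: SketchOptions = {},
): number {
  const similarity = diceSimilarity(
    normalizeText(input, opts),
    normalizeText(output, opts),
  );
  return similarity * (opts.scale ?? 1);
}

console.log(sketchScore("Hello World", "hello   world"));
console.log(sketchScore("Hello World", "hello world", { ignoreCase: false }));
```

With both normalizations enabled, the first call compares identical strings after normalization; disabling `ignoreCase` in the second call leaves the capital letters as mismatched bigrams and lowers the score. Actual values from the library may differ from this sketch.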

Score interpretation

(0 to scale, default 0-1)

  • 1.0: Perfect match - identical texts
  • 0.7-0.9: High similarity - mostly matching content
  • 0.4-0.6: Moderate similarity - partial matches
  • 0.1-0.3: Low similarity - few matching patterns
  • 0.0: No similarity - completely different texts
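
To apply these bands in code, a small helper can map a score to a label. The sketch below is an assumption-laden illustration: `interpretScore` and `SimilarityBand` are hypothetical names, and because the listed ranges leave gaps (for example between 0.6 and 0.7), it treats each band's lower bound as the cutoff.

```typescript
type SimilarityBand = "perfect" | "high" | "moderate" | "low" | "none";

// Hypothetical helper: maps a score to one of the bands listed above.
// Cutoffs use each band's lower bound, which is an assumption made here
// to cover the gaps between the documented ranges.
function interpretScore(score: number, scale: number = 1): SimilarityBand {
  const normalized = score / scale; // bring the score back to the 0-1 range
  if (normalized >= 1) return "perfect";
  if (normalized >= 0.7) return "high";
  if (normalized >= 0.4) return "moderate";
  if (normalized > 0) return "low";
  return "none";
}

console.log(interpretScore(0.75));
console.log(interpretScore(8.5, 10));
```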

Example with Different Options

```typescript
import { ContentSimilarityMetric } from "@mastra/evals/nlp";

// Case-sensitive comparison
const caseSensitiveMetric = new ContentSimilarityMetric({
  ignoreCase: false,
  ignoreWhitespace: true,
});

const result1 = await caseSensitiveMetric.measure("Hello World", "hello world"); // Lower score due to case difference
// Example output:
// {
//   score: 0.75,
//   info: { similarity: 0.75 }
// }

// Strict whitespace comparison
const strictWhitespaceMetric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: false,
});

const result2 = await strictWhitespaceMetric.measure(
  "Hello   World",
  "Hello World",
); // Lower score due to whitespace difference
// Example output:
// {
//   score: 0.85,
//   info: { similarity: 0.85 }
// }
```