Tone Consistency Evaluation

New Scorer API

We just released a new evals API called Scorers, with a more ergonomic API and more metadata stored for error analysis, and more flexibility to evaluate data structures. It’s fairly simple to migrate, but we will continue to support the existing Evals API.

Use ToneConsistencyMetric to evaluate emotional tone patterns and sentiment consistency in text. The metric accepts a query and a response, and returns a score and an info object containing sentiment scores and their difference.

Installation


npm install @mastra/evals

Positive tone example

In this example, the texts exhibit a similar positive sentiment. The metric measures the consistency between the tones, resulting in a high score.

src/example-positive-tone.ts


import { ToneConsistencyMetric } from "@mastra/evals/nlp";
 
const metric = new ToneConsistencyMetric();
 
const query = "This product is fantastic and amazing!";
const response = "The product is excellent and wonderful!";
 
const result = await metric.measure(query, response);
 
console.log(result);

Positive tone output

The metric returns a high score reflecting strong sentiment alignment. The info field provides sentiment values and the difference between them.


{
  score: 0.8333333333333335,
  info: {
    responseSentiment: 1.3333333333333333,
    referenceSentiment: 1.1666666666666667,
    difference: 0.16666666666666652
  }
}

Stable tone example

In this example, the text’s internal tone consistency is analyzed by passing an empty response. This signals the metric to evaluate sentiment stability within the single input text, resulting in a score reflecting how uniform the tone is throughout.

src/example-stable-tone.ts


import { ToneConsistencyMetric } from "@mastra/evals/nlp";
 
const metric = new ToneConsistencyMetric();
 
const query = "Great service! Friendly staff. Perfect atmosphere.";
const response = "";
 
const result = await metric.measure(query, response);
 
console.log(result);

Stable tone output

The metric returns a high score indicating consistent sentiment throughout the input text. The info field includes the average sentiment and sentiment variance, reflecting tone stability.


{
  score: 0.9444444444444444,
  info: {
    avgSentiment: 1.3333333333333333,
    sentimentVariance: 0.05555555555555556
  }
}

Mixed tone example

In this example, the input and response have different emotional tones. The metric picks up on these variations and gives a lower consistency score.

src/example-mixed-tone.ts


import { ToneConsistencyMetric } from "@mastra/evals/nlp";
 
const metric = new ToneConsistencyMetric();
 
const query = "The interface is frustrating and confusing, though it has potential.";
const response = "The design shows promise but needs significant improvements to be usable.";
 
const result = await metric.measure(query, response);
 
console.log(result);

Mixed tone output

The metric returns a low score due to the noticeable differences in emotional tone. The info field highlights the sentiment values and the degree of variation between them.


{
  score: 0.4181818181818182,
  info: {
    responseSentiment: -0.4,
    referenceSentiment: 0.18181818181818182,
    difference: 0.5818181818181818
  }
}

Metric configuration

You can create a ToneConsistencyMetric instance with default settings. No additional configuration is required.


const metric = new ToneConsistencyMetric();

See ToneConsistencyMetric for a full list of configuration options.

Understanding the results

ToneConsistencyMetric returns a result in the following shape:


{
  score: number,
  info: {
    responseSentiment?: number,
    referenceSentiment?: number,
    difference?: number,
    avgSentiment?: number,
    sentimentVariance?: number
  }
}

Tone consistency score

A tone consistency score between 0 and 1:

0.8–1.0: Very consistent tone.
0.6–0.7: Generally consistent tone.
0.4–0.5: Mixed tone.
0.0–0.3: Conflicting tone.

Tone consistency info

An explanation for the score, with details including:

Sentiment alignment between input and response.
Tone stability within a single text.
Degree of sentiment difference or variance.

View Example on GitHub