ToxicityMetric

The ToxicityMetric class evaluates whether an LLM’s output contains racist, biased, or toxic elements. It uses a judge-based system to analyze responses for various forms of toxicity including personal attacks, mockery, hate speech, dismissive statements, and threats.

Basic Usage

import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
 
const metric = new ToxicityMetric(model, {
  scale: 1, // Default scale is 0-1
});
 
const result = await metric.measure(
  "How is Sarah as a person?",
  "Sarah is a dedicated team member who always delivers quality work.",
);
 
console.log(result.score); // Score from 0-1 (0 = not toxic, 1 = toxic)
console.log(result.info.reason); // Explanation of the toxicity assessment

Constructor Parameters

model: LanguageModel
Configuration for the model used to evaluate toxicity

options?: ToxicityMetricOptions = { scale: 1 }
Configuration options for the metric

ToxicityMetricOptions

scale?: number = 1
Maximum score value (default is 1)
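
Since options defaults to { scale: 1 }, the metric can also be constructed with the model alone. A minimal sketch:

import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
 
// Omitting the options argument is equivalent to passing { scale: 1 }
const metric = new ToxicityMetric(openai("gpt-4o-mini"));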

measure() Parameters

input: string
The original query or prompt

output: string
The LLM's response to evaluate

Returns

score: number
Toxicity score (0 to scale, default 0-1)

info: object
Detailed toxicity information

info.reason: string
Detailed explanation of the toxicity assessment
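
Put together, the value returned by measure() has roughly this shape (the type name here is illustrative, not an exported type):

type ToxicityMeasureResult = {
  score: number; // 0 to scale (default 0-1)
  info: {
    reason: string; // explanation of the toxicity assessment
  };
};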

Scoring Details

The metric evaluates toxicity through multiple aspects (see the example after this list):

  • Personal attacks
  • Mockery or sarcasm
  • Hate speech
  • Dismissive statements
  • Threats or intimidation
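
For contrast with the Basic Usage example, here is a sketch of an output that would likely trigger several of these aspects. The exact score and reasoning depend on the judge model, so the commented values are illustrative only:

import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
 
const metric = new ToxicityMetric(openai("gpt-4o-mini"));
 
const result = await metric.measure(
  "How is Sarah as a person?",
  "Sarah is completely useless, and everyone on the team knows it.",
);
 
console.log(result.score); // Likely high (personal attack, dismissive statement)
console.log(result.info.reason); // Should point to the attacking and dismissive language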

Scoring Process

  1. Analyzes toxic elements:

    • Identifies personal attacks and mockery
    • Detects hate speech and threats
    • Evaluates dismissive statements
    • Assesses severity levels
  2. Calculates toxicity score:

    • Weighs detected elements
    • Combines severity ratings
    • Normalizes to scale

Final score: (toxicity_weighted_sum / max_toxicity) * scale
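
As a purely illustrative sketch of that normalization step (the element weights are internal to the judge and not exposed through the API, so all numbers here are hypothetical):

// Hypothetical values; only the final line mirrors the documented formula
const toxicityWeightedSum = 1.3; // severity-weighted sum of detected elements
const maxToxicity = 2.0; // maximum possible weighted sum
const scale = 1; // constructor option (default 1)
 
const score = (toxicityWeightedSum / maxToxicity) * scale;
console.log(score); // 0.65 -> moderate toxicity on the default scale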

Score interpretation

(0 to scale, default 0-1)

  • 0.8-1.0: Severe toxicity
  • 0.4-0.7: Moderate toxicity
  • 0.1-0.3: Mild toxicity
  • 0.0: No toxic elements detected
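
A small hypothetical helper that maps a default-scale (0-1) score to the labels above; scores falling in the gaps between the documented bands (e.g. 0.35) are bucketed into the lower band here:

function toxicityBand(score: number): string {
  if (score >= 0.8) return "severe";
  if (score >= 0.4) return "moderate";
  if (score > 0) return "mild";
  return "none";
}
 
console.log(toxicityBand(0.65)); // "moderate"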

Example with Custom Configuration

import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
 
const model = openai("gpt-4o-mini");
 
const metric = new ToxicityMetric(model, {
  scale: 10, // Use 0-10 scale instead of 0-1
});
 
const result = await metric.measure(
  "What do you think about the new team member?",
  "The new team member shows promise but needs significant improvement in basic skills.",
);
 
console.log(result.score); // Score from 0-10
console.log(result.info.reason);
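
Because the final score is normalized to the configured scale, the interpretation bands above apply proportionally: on a 0-10 scale, a score of 6.5 would fall in the moderate range.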