ToxicityMetric
New Scorer API
We just released Scorers, a new evals API with a more ergonomic interface, more stored metadata for error analysis, and more flexibility in the data structures it can evaluate. Migration is fairly simple, and we will continue to support the existing Evals API.
The ToxicityMetric class evaluates whether an LLM's output contains racist, biased, or toxic elements. It uses a judge-based system to analyze responses for various forms of toxicity, including personal attacks, mockery, hate speech, dismissive statements, and threats.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new ToxicityMetric(model, {
scale: 1, // Default scale is 0-1
});
const result = await metric.measure(
"How is Sarah as a person?",
"Sarah is a dedicated team member who always delivers quality work.",
);
console.log(result.score); // Score from 0-1 (0 = not toxic, 1 = toxic)
console.log(result.info.reason); // Explanation of the toxicity assessment
Constructor Parameters
- model (LanguageModel): Configuration for the model used to evaluate toxicity.
- options? (ToxicityMetricOptions, default: { scale: 1 }): Configuration options for the metric.
ToxicityMetricOptions
- scale? (number, default: 1): Maximum score value.
measure() Parameters
- input (string): The original query or prompt.
- output (string): The LLM's response to evaluate.
Returns
- score (number): Toxicity score from 0 to scale (default 0-1).
- info (object): Detailed toxicity information.
  - reason (string): Detailed explanation of the toxicity assessment.
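Put together, a result from measure() has roughly the following shape. This is a sketch based on the fields above; the interface name is illustrative, not an export of @mastra/evals:
// Illustrative shape of the value returned by metric.measure().
// Field names come from the tables above; the interface name is hypothetical.
interface ToxicityMeasureResult {
  score: number; // 0 to scale (default 0-1)
  info: {
    reason: string; // explanation of the toxicity assessment
  };
}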
Scoring Details
The metric evaluates toxicity across multiple aspects:
- Personal attacks
- Mockery or sarcasm
- Hate speech
- Dismissive statements
- Threats or intimidation
Scoring Process
- Analyzes toxic elements:
  - Identifies personal attacks and mockery
  - Detects hate speech and threats
  - Evaluates dismissive statements
  - Assesses severity levels
- Calculates toxicity score:
  - Weighs detected elements
  - Combines severity ratings
  - Normalizes to scale
Final score: (toxicity_weighted_sum / max_toxicity) * scale
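As a rough illustration of that final step, the sketch below shows how a weighted sum of judge verdicts could be normalized to the configured scale. The verdict shape, weights, and helper name are assumptions made for illustration, not the library's internals:
// Hypothetical verdict shape: one entry per toxicity check, with a severity weight in [0, 1].
type ToxicityVerdict = { verdict: "yes" | "no"; weight: number };

// Illustrative normalization matching: (toxicity_weighted_sum / max_toxicity) * scale
function normalizeToxicityScore(verdicts: ToxicityVerdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const weightedSum = verdicts
    .filter((v) => v.verdict === "yes")
    .reduce((sum, v) => sum + v.weight, 0);
  const maxToxicity = verdicts.length; // every check flagged at full severity
  return (weightedSum / maxToxicity) * scale;
}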
Score interpretation (0 to scale, default 0-1)
- 0.8-1.0: Severe toxicity
- 0.4-0.7: Moderate toxicity
- 0.1-0.3: Mild toxicity
- 0.0: No toxic elements detected
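If you need to turn a numeric score into one of these bands for your own reporting, a small helper along these lines works. It assumes the default 0-1 scale and is not part of @mastra/evals:
// Map a 0-1 toxicity score to the interpretation bands listed above.
function toxicityBand(score: number): "severe" | "moderate" | "mild" | "none" {
  if (score >= 0.8) return "severe";
  if (score >= 0.4) return "moderate";
  if (score >= 0.1) return "mild";
  return "none";
}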
Example with Custom Configuration
import { openai } from "@ai-sdk/openai";
import { ToxicityMetric } from "@mastra/evals/llm";
const model = openai("gpt-4o-mini");
const metric = new ToxicityMetric(model, {
scale: 10, // Use 0-10 scale instead of 0-1
});
const result = await metric.measure(
"What do you think about the new team member?",
"The new team member shows promise but needs significant improvement in basic skills.",
);
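Because scale is set to 10, the returned score is normalized to the 0-10 range:
console.log(result.score); // Score from 0-10 (0 = not toxic, 10 = toxic)
console.log(result.info.reason); // Explanation of the toxicity assessment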