CompletenessMetric
We just released a new evals API called Scorers. It has a more ergonomic API, stores more metadata for error analysis, and offers more flexibility for evaluating data structures. Migrating is fairly simple, and we will continue to support the existing Evals API.
The CompletenessMetric class evaluates how thoroughly an LLM’s output covers the key elements present in the input. It analyzes nouns, verbs, topics, and terms to determine coverage and provides a detailed completeness score.
Basic Usage
import { CompletenessMetric } from "@mastra/evals/nlp";
const metric = new CompletenessMetric();
const result = await metric.measure(
  "Explain how photosynthesis works in plants using sunlight, water, and carbon dioxide.",
  "Plants use sunlight to convert water and carbon dioxide into glucose through photosynthesis.",
);
console.log(result.score); // Coverage score from 0-1
console.log(result.info); // Object containing detailed metrics about element coverage
measure() Parameters
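input: (string) The original text containing the key elements to be covered.
output: (string) The LLM’s response to evaluate for completeness.
Returns
score: (number) Completeness score from 0 to scale (0-1 by default), representing the proportion of input elements covered by the output.
info: (object) Detailed metrics about element coverage:
- inputElements: (string[]) Key elements extracted from the input.
- outputElements: (string[]) Key elements found in the output.
- missingElements: (string[]) Input elements not found in the output.
- elementCounts: (object) Number of elements in the input and in the output.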
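The result shape can be written down as TypeScript types, which helps when logging or post-processing scores. This is a sketch only: the names `CompletenessInfo`, `CompletenessResult`, and `summarize` are hypothetical and are not exported by `@mastra/evals`. The field names are taken from the documented result shape.

```typescript
// Hypothetical type names; only the field names come from the documented result.
interface CompletenessInfo {
  inputElements: string[];
  outputElements: string[];
  missingElements: string[];
  elementCounts: { input: number; output: number };
}

interface CompletenessResult {
  score: number;
  info: CompletenessInfo;
}

/** Summarize a result in one line, e.g. for logging during error analysis. */
function summarize(result: CompletenessResult): string {
  const { input, output } = result.info.elementCounts;
  // Fall back to "none" when nothing is missing (empty join yields "").
  const missing = result.info.missingElements.join(", ") || "none";
  return `score=${result.score.toFixed(2)} input=${input} output=${output} missing=${missing}`;
}
```

Passing the object returned by `metric.measure()` to a helper like this gives a compact view of which input elements the output left out.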
Element Extraction Details
The metric extracts and analyzes several types of elements:
- Nouns: Key objects, concepts, and entities
- Verbs: Actions and states (converted to infinitive form)
- Topics: Main subjects and themes
- Terms: Individual significant words
The extraction process includes:
- Normalization of text (removing diacritics, converting to lowercase)
- Splitting camelCase words
- Handling of word boundaries
- Special handling of short words (3 characters or fewer)
- Deduplication of elements
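The normalization steps above can be approximated in a few lines of TypeScript. This is a minimal sketch, not the library's implementation: `normalizeText` and `extractTerms` are hypothetical helper names, and the sketch covers only text normalization, camelCase splitting, word boundaries, and deduplication. It does not attempt the noun, verb, or topic extraction.

```typescript
// Illustrative only: these helper names are hypothetical, not part of @mastra/evals.

/** Split camelCase, strip diacritics, and lowercase a piece of text. */
function normalizeText(text: string): string {
  return text
    .replace(/([a-z])([A-Z])/g, "$1 $2") // split camelCase before case is lost
    .normalize("NFD") // decompose accented characters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .trim();
}

/** Break text into normalized, deduplicated word elements. */
function extractTerms(text: string): string[] {
  const words = normalizeText(text)
    .split(/[^a-z0-9]+/) // treat any non-alphanumeric run as a word boundary
    .filter((word) => word.length > 0);
  return Array.from(new Set(words)); // Set keeps first-seen order
}
```

Note the ordering: camelCase must be split before lowercasing, because the split relies on the lowercase-to-uppercase transition.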
Scoring Details
The metric evaluates completeness through linguistic element coverage analysis.
Scoring Process
- Extracts key elements:
  - Nouns and named entities
  - Action verbs
  - Topic-specific terms
  - Normalized word forms
- Calculates coverage of input elements:
  - Exact matches for short terms (≤3 chars)
  - Substantial overlap (>60%) for longer terms

Final score: (covered_elements / total_input_elements) * scale
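The coverage calculation can be sketched as follows. This is a hedged illustration, not the metric's actual code: `isCovered` and `completenessScore` are hypothetical names, and "substantial overlap" is interpreted here as one term containing the other, with the shorter term making up more than 60% of the longer term's length. The handling of an empty input list is also an assumption.

```typescript
// Illustrative only: function names and the overlap rule are assumptions.

/** Decide whether one input element is covered by one output element. */
function isCovered(inputElement: string, outputElement: string): boolean {
  // Short terms (3 characters or fewer) must match exactly.
  if (inputElement.length <= 3) {
    return inputElement === outputElement;
  }
  // Longer terms: assume "overlap" means one string contains the other
  // and the shorter string is more than 60% of the longer one's length.
  const inputIsShorter = inputElement.length <= outputElement.length;
  const shorter = inputIsShorter ? inputElement : outputElement;
  const longer = inputIsShorter ? outputElement : inputElement;
  return longer.includes(shorter) && shorter.length / longer.length > 0.6;
}

/** Fraction of input elements covered by the output, multiplied by scale. */
function completenessScore(
  inputElements: string[],
  outputElements: string[],
  scale = 1,
): number {
  if (inputElements.length === 0) {
    return scale; // assumption: nothing to cover counts as complete
  }
  const covered = inputElements.filter((element) =>
    outputElements.some((candidate) => isCovered(element, candidate)),
  );
  return (covered.length / inputElements.length) * scale;
}
```

Under these assumptions, `completenessScore(["plant", "sun", "water"], ["plants", "water"])` counts "plant" as covered by "plants" (5 of 6 characters), "water" as an exact match, and "sun" as missing, so two of three input elements are covered.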
Score interpretation (0 to scale, default 0-1)
- 1.0: Complete coverage - contains all input elements
- 0.7-0.9: High coverage - includes most key elements
- 0.4-0.6: Partial coverage - contains some key elements
- 0.1-0.3: Low coverage - missing most key elements
- 0.0: No coverage - output lacks all input elements
Example with Analysis
import { CompletenessMetric } from "@mastra/evals/nlp";
const metric = new CompletenessMetric();
const result = await metric.measure(
  "The quick brown fox jumps over the lazy dog",
  "A brown fox jumped over a dog",
);
// Example output:
// {
//   score: 0.67, // 4 of 6 input elements covered
//   info: {
//     inputElements: ["quick", "brown", "fox", "jump", "lazy", "dog"],
//     outputElements: ["brown", "fox", "jump", "dog"],
//     missingElements: ["quick", "lazy"],
//     elementCounts: { input: 6, output: 4 }
//   }
// }