Completeness Scorer
The createCompletenessScorer() function evaluates how thoroughly an LLM's output covers the key elements present in the input. It analyzes nouns, verbs, topics, and terms to determine coverage and provides a detailed completeness score.
Parameters
The createCompletenessScorer() function does not take any options.
This function returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.
.run() Returns
- runId: The id of the scorer run.
- extractStepResult: The elements extracted from the input and output (described below).
- score: A number between 0 and 1 indicating how completely the output covers the input's key elements.
The .run() method returns a result in the following shape:
```typescript
{
  runId: string,
  extractStepResult: {
    inputElements: string[],
    outputElements: string[],
    missingElements: string[],
    elementCounts: { input: number, output: number }
  },
  score: number
}
```
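As a quick illustration, here is a minimal sketch of calling .run() directly and reading the score. The exact payload shape is documented in the MastraScorer reference; the input/output fields shown here are assumptions for illustration only:

```typescript
import { createCompletenessScorer } from "@mastra/evals/scorers/prebuilt";

const scorer = createCompletenessScorer();

// Sketch only: see the MastraScorer reference for the exact .run() payload.
// The input/output shape below is an assumption for illustration.
const result = await scorer.run({
  input: [{ role: "user", content: "Explain the inputs and outputs of photosynthesis." }],
  output: { role: "assistant", content: "Photosynthesis turns light, water, and CO2 into glucose and oxygen." },
});

console.log(result.score); // a number between 0 and 1
```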
Element Extraction Details
The scorer extracts and analyzes several types of elements:
- Nouns: Key objects, concepts, and entities
- Verbs: Actions and states (converted to infinitive form)
- Topics: Main subjects and themes
- Terms: Individual significant words
The extraction process includes:
- Normalization of text (removing diacritics, converting to lowercase)
- Splitting camelCase words
- Handling of word boundaries
- Special handling of short words (3 characters or fewer)
- Deduplication of elements
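To make these steps concrete, here is a minimal sketch of what the normalization pipeline could look like. This is an illustrative approximation of the documented steps, not the library's actual implementation:

```typescript
// Illustrative approximation of the documented extraction steps.
// This is NOT the library's actual implementation.
function extractElements(text: string): string[] {
  const tokens = text
    // Split camelCase words: "renewableEnergy" -> "renewable Energy"
    .replace(/([a-z])([A-Z])/g, "$1 $2")
    // Remove diacritics: "é" -> "e"
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    // Split on word boundaries
    .split(/\W+/)
    .filter(Boolean);
  // Deduplicate while preserving order
  return [...new Set(tokens)];
}

console.log(extractElements("Compare renewableEnergy and résumé costs"));
// -> ["compare", "renewable", "energy", "and", "resume", "costs"]
```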
extractStepResult
From the .run() method, you can get the extractStepResult object with the following properties:
- inputElements: Key elements found in the input (e.g., nouns, verbs, topics, terms).
- outputElements: Key elements found in the output.
- missingElements: Input elements not found in the output.
- elementCounts: The number of elements in the input and output.
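Continuing the hedged .run() sketch above, missingElements makes it easy to see exactly which input elements the output failed to cover:

```typescript
const { missingElements, elementCounts } = result.extractStepResult;

const coveredCount = elementCounts.input - missingElements.length;
console.log(`Covered ${coveredCount} of ${elementCounts.input} input elements`);
if (missingElements.length > 0) {
  console.log("Missing from output:", missingElements);
}
```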
Scoring Details
The scorer evaluates completeness through linguistic element coverage analysis.
Scoring Process
- Extracts key elements:
  - Nouns and named entities
  - Action verbs
  - Topic-specific terms
  - Normalized word forms
- Calculates coverage of input elements:
  - Exact matches for short terms (3 characters or fewer)
  - Substantial overlap (>60%) for longer terms
- Computes the final score as (covered_elements / total_input_elements) * scale (sketched below)
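A minimal sketch of these matching rules, assuming a simple containment-based overlap measure (the library's actual overlap metric may differ):

```typescript
// Sketch of the documented matching rules. The containment-based overlap
// measure below is an assumption; the library may compute overlap differently.
function overlapRatio(a: string, b: string): number {
  const [shorter, longer] = a.length <= b.length ? [a, b] : [b, a];
  return longer.includes(shorter) ? shorter.length / longer.length : 0;
}

function isCovered(element: string, outputElements: string[]): boolean {
  // Exact matches required for short terms (3 characters or fewer)
  if (element.length <= 3) return outputElements.includes(element);
  // Longer terms count as covered with substantial (>60%) overlap
  return outputElements.some((out) => overlapRatio(element, out) > 0.6);
}

function completenessScore(inputElements: string[], outputElements: string[]): number {
  const covered = inputElements.filter((el) => isCovered(el, outputElements)).length;
  return covered / inputElements.length; // scale is effectively 1 here
}

console.log(completenessScore(["co2", "renewable"], ["co2", "renewables"]));
// -> 1: "co2" matches exactly; "renewable" is covered by "renewables" (0.9 overlap)
```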
Score Interpretation
A completeness score between 0 and 1:
- 1.0: Thoroughly addresses all aspects of the query with comprehensive detail.
- 0.7–0.9: Covers most important aspects with good detail, minor gaps.
- 0.4–0.6: Addresses some key points but missing important aspects or lacking detail.
- 0.1–0.3: Only partially addresses the query with significant gaps.
- 0.0: Fails to address the query or provides irrelevant information.
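If you want to act on these bands programmatically, a small hypothetical helper can bucket scores accordingly:

```typescript
// Hypothetical helper that buckets scores per the interpretation above.
function interpretCompleteness(score: number): string {
  if (score >= 1.0) return "thorough: all aspects addressed";
  if (score >= 0.7) return "good: most aspects covered, minor gaps";
  if (score >= 0.4) return "partial: key points covered, important gaps";
  if (score > 0.0) return "poor: significant gaps";
  return "fails to address the query";
}
```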
Example
Evaluate agent responses for completeness across different query complexities:
```typescript
import { runEvals } from "@mastra/core/evals";
import { createCompletenessScorer } from "@mastra/evals/scorers/prebuilt";

import { myAgent } from "./agent";

const scorer = createCompletenessScorer();

const result = await runEvals({
  data: [
    {
      input:
        "Explain the process of photosynthesis, including the inputs, outputs, and stages involved.",
    },
    {
      input:
        "What are the benefits and drawbacks of remote work for both employees and employers?",
    },
    {
      input:
        "Compare renewable and non-renewable energy sources in terms of cost, environmental impact, and sustainability.",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
    });
  },
});

console.log(result.scores);
```
For more details on runEvals, see the runEvals reference.
To add this scorer to an agent, see the Scorers overview guide.