Completeness Scorer

Use createCompletenessScorer to evaluate whether the response thoroughly addresses all aspects and requirements of the input query.

Installation


npm install @mastra/evals

For complete API documentation and configuration options, see createCompletenessScorer.

High completeness example

In this example, the response comprehensively addresses all aspects of the query with detailed information covering multiple dimensions.

src/example-high-completeness.ts


import { openai } from "@ai-sdk/openai";
import { createCompletenessScorer } from "@mastra/evals/scorers/llm";
 
const scorer = createCompletenessScorer({ model: openai("gpt-4o-mini") });
 
const query = "Explain the process of photosynthesis, including the inputs, outputs, and stages involved.";
const response =
  "Photosynthesis is the process by which plants convert sunlight into chemical energy. Inputs: Carbon dioxide (CO2) from the air enters through stomata, water (H2O) is absorbed by roots, and sunlight provides energy captured by chlorophyll. The process occurs in two main stages: 1) Light-dependent reactions in the thylakoids convert light energy to ATP and NADPH while splitting water and releasing oxygen. 2) Light-independent reactions (Calvin cycle) in the stroma use ATP, NADPH, and CO2 to produce glucose. Outputs: Glucose (C6H12O6) serves as food for the plant, and oxygen (O2) is released as a byproduct. The overall equation is: 6CO2 + 6H2O + light energy → C6H12O6 + 6O2.";
 
const result = await scorer.run({
  input: [{ role: 'user', content: query }],
  output: { text: response },
});
 
console.log(result);

High completeness output

The output receives a high score because it addresses every requested aspect (inputs, outputs, and stages) and provides additional context.


{
  score: 1,
  reason: "The score is 1 because the response comprehensively addresses all aspects of the query: it explains what photosynthesis is, lists all inputs (CO2, H2O, sunlight), describes both stages in detail (light-dependent and light-independent reactions), specifies all outputs (glucose and oxygen), and even provides the chemical equation. No significant aspects are missing."
}

Partial completeness example

In this example, the response addresses some key points but misses important aspects or lacks sufficient detail.

src/example-partial-completeness.ts


import { openai } from "@ai-sdk/openai";
import { createCompletenessScorer } from "@mastra/evals/scorers/llm";
 
const scorer = createCompletenessScorer({ model: openai("gpt-4o-mini") });
 
const query = "What are the benefits and drawbacks of remote work for both employees and employers?";
const response =
  "Remote work offers several benefits for employees including flexible schedules, no commuting time, and better work-life balance. It also reduces costs for office space and utilities for employers. However, remote work can lead to isolation and communication challenges for employees.";
 
const result = await scorer.run({
  input: [{ role: 'user', content: query }],
  output: { text: response },
});
 
console.log(result);

Partial completeness output

The output receives a moderate score because it covers employee benefits, one employer benefit, and some employee drawbacks, but does not address drawbacks for employers.


{
  score: 0.6,
  reason: "The score is 0.6 because the response covers employee benefits (flexibility, no commuting, work-life balance) and one employer benefit (reduced costs), as well as some employee drawbacks (isolation, communication challenges). However, it fails to address potential drawbacks for employers such as reduced oversight, team cohesion challenges, or productivity monitoring difficulties."
}

Low completeness example

In this example, the response only partially addresses the query and misses several important aspects.

src/example-low-completeness.ts


import { openai } from "@ai-sdk/openai";
import { createCompletenessScorer } from "@mastra/evals/scorers/llm";
 
const scorer = createCompletenessScorer({ model: openai("gpt-4o-mini") });
 
const query = "Compare renewable and non-renewable energy sources in terms of cost, environmental impact, and sustainability.";
const response =
  "Renewable energy sources like solar and wind are becoming cheaper. They're better for the environment than fossil fuels.";
 
const result = await scorer.run({
  input: [{ role: 'user', content: query }],
  output: { text: response },
});
 
console.log(result);

Low completeness output

The output receives a low score because it only briefly mentions cost and environmental impact while completely missing sustainability and lacking detailed comparison.


{
  score: 0.2,
  reason: "The score is 0.2 because the response only superficially touches on cost (renewable getting cheaper) and environmental impact (renewable better than fossil fuels) but provides no detailed comparison, fails to address sustainability aspects, doesn't discuss specific non-renewable sources, and lacks depth in all mentioned areas."
}

Scorer configuration

You can adjust how the CompletenessScorer scores responses by configuring optional parameters. For example, scale sets the maximum possible score returned by the scorer.


const scorer = createCompletenessScorer({
  model: openai("gpt-4o-mini"),
  options: { scale: 1 }, // maximum possible score
});

See CompletenessScorer for a full list of configuration options.
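Because scale sets the maximum possible score, scores from scorers configured with different scales are not directly comparable. The sketch below shows one way to bring them back to a common 0 to 1 range. normalizeScore is a hypothetical helper written for this example, not part of @mastra/evals, and it assumes the score is reported on a range from 0 to scale.

```typescript
// Hypothetical helper (not part of @mastra/evals): converts a score reported
// on a 0..scale range back to 0..1 so that results from scorers configured
// with different scales can be compared. Assumes scores start at 0.
function normalizeScore(score: number, scale: number = 1): number {
  if (scale <= 0) {
    throw new RangeError("scale must be positive");
  }
  // Clamp so out-of-range inputs cannot produce a value outside 0..1.
  return Math.min(Math.max(score / scale, 0), 1);
}

console.log(normalizeScore(3, 5)); // 0.6
console.log(normalizeScore(0.6)); // 0.6 (default scale of 1)
```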

Understanding the results

.run() returns a result in the following shape:


{
  runId: string,
  extractStepResult: {
    inputElements: string[],
    outputElements: string[],
    missingElements: string[],
    elementCounts: { input: number, output: number }
  },
  score: number
}

score

A completeness score between 0 and 1:

1.0: Thoroughly addresses all aspects of the query with comprehensive detail.
0.7–0.9: Covers most important aspects with good detail, minor gaps.
0.4–0.6: Addresses some key points but missing important aspects or lacking detail.
0.1–0.3: Only partially addresses the query with significant gaps.
0.0: Fails to address the query or provides irrelevant information.
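If you want to turn a numeric score into one of the bands above (for logging or for gating a test run), a small lookup is usually enough. The following is a minimal sketch: describeCompleteness and the band labels are hypothetical names, and the cut-offs between bands (0.95, 0.65, 0.35, 0.05) are assumptions, since the documented ranges leave gaps between them.

```typescript
// Hypothetical helper (not part of @mastra/evals): maps a 0..1 completeness
// score to the bands listed above. The exact cut-offs are assumptions chosen
// to fall in the gaps between the documented ranges.
type CompletenessBand = "complete" | "mostly complete" | "partial" | "low" | "none";

function describeCompleteness(score: number): CompletenessBand {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`Expected a score between 0 and 1, got ${score}`);
  }
  if (score >= 0.95) return "complete"; // documented as 1.0
  if (score >= 0.65) return "mostly complete"; // documented as 0.7 to 0.9
  if (score >= 0.35) return "partial"; // documented as 0.4 to 0.6
  if (score >= 0.05) return "low"; // documented as 0.1 to 0.3
  return "none"; // documented as 0.0
}

console.log(describeCompleteness(0.6)); // "partial"
```

Adjust the thresholds to match how strictly you want to treat borderline scores.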

runId

The unique identifier for this scorer run.

extractStepResult

Object with extracted elements and coverage details:

inputElements: Key elements found in the input (e.g., nouns, verbs, topics, terms).
outputElements: Key elements found in the output.
missingElements: Input elements not found in the output.
elementCounts: The number of elements in the input and output.
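The fields above can be combined to see which parts of a query went unanswered. The sketch below types the extractStepResult shape and derives a simple coverage ratio from it. coverageRatio is a hypothetical helper, the sample object is hand-written rather than real scorer output, and the ratio is illustrative only: it is not necessarily the formula the scorer uses to compute score.

```typescript
// Shape copied from the result documented above.
interface ExtractStepResult {
  inputElements: string[];
  outputElements: string[];
  missingElements: string[];
  elementCounts: { input: number; output: number };
}

// Hypothetical helper: fraction of input elements that were not reported as
// missing. Illustrative only, not necessarily the scorer's own formula.
function coverageRatio(extract: ExtractStepResult): number {
  const total = extract.inputElements.length;
  if (total === 0) return 1; // nothing was asked for, so nothing is missing
  const covered = total - extract.missingElements.length;
  return Math.max(0, covered) / total;
}

// Hand-written sample data, not real scorer output.
const sample: ExtractStepResult = {
  inputElements: ["cost", "environmental impact", "sustainability"],
  outputElements: ["cost", "environmental impact"],
  missingElements: ["sustainability"],
  elementCounts: { input: 3, output: 2 },
};

console.log(coverageRatio(sample).toFixed(2)); // "0.67"
console.log(sample.missingElements.join(", ")); // "sustainability"
```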
