Completeness Evaluation

New Scorer API

We just released a new evals API called Scorers. It is more ergonomic, stores more metadata for error analysis, and offers more flexibility for evaluating data structures. Migration is fairly simple, and we will continue to support the existing Evals API.

Use CompletenessMetric to evaluate whether the response includes all key elements from the input. The metric accepts a query and a response, and returns a score and an info object with detailed element-level comparisons.

Installation

npm install @mastra/evals

Complete coverage example

In this example, the response contains every element from the input. The content matches exactly, resulting in full coverage.

src/example-complete-coverage.ts
import { CompletenessMetric } from "@mastra/evals/nlp";

const metric = new CompletenessMetric();

const query = "The primary colors are red, blue, and yellow.";
const response = "The primary colors are red, blue, and yellow.";

const result = await metric.measure(query, response);

console.log(result);

Complete coverage output

The output receives a score of 1 because all input elements are present in the response with no missing content.

{
  score: 1,
  info: {
    inputElements: [ 'the', 'primary', 'colors', 'be', 'red', 'blue', 'and', 'yellow' ],
    outputElements: [ 'the', 'primary', 'colors', 'be', 'red', 'blue', 'and', 'yellow' ],
    missingElements: [],
    elementCounts: { input: 8, output: 8 }
  }
}

Partial coverage example

In this example, the response includes all of the input elements, but also adds extra content that wasn’t in the original query.

src/example-partial-coverage.ts
import { CompletenessMetric } from "@mastra/evals/nlp";

const metric = new CompletenessMetric();

const query = "The primary colors are red and blue.";
const response = "The primary colors are red, blue, and yellow.";

const result = await metric.measure(query, response);

console.log(result);

Partial coverage output

The output receives a score of 1 because no input elements are missing. The response also includes additional content that goes beyond the input ('yellow'), but extra elements do not lower the score.

{
  score: 1,
  info: {
    inputElements: [ 'the', 'primary', 'colors', 'be', 'red', 'and', 'blue' ],
    outputElements: [ 'the', 'primary', 'colors', 'be', 'red', 'blue', 'and', 'yellow' ],
    missingElements: [],
    elementCounts: { input: 7, output: 8 }
  }
}
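
To surface the additional content in a response like this one, one option is to compare the two element lists directly. The sketch below uses a hypothetical helper, `extraElements`, which is not part of `@mastra/evals`. It operates on the arrays copied from the output above.

```typescript
// Hypothetical helper (not part of @mastra/evals): returns the output
// elements that do not appear anywhere in the input elements.
function extraElements(inputElements: string[], outputElements: string[]): string[] {
  const inputSet = new Set(inputElements);
  return outputElements.filter((element) => !inputSet.has(element));
}

// Element lists copied from the partial coverage output.
const partialInput = ["the", "primary", "colors", "be", "red", "and", "blue"];
const partialOutput = ["the", "primary", "colors", "be", "red", "blue", "and", "yellow"];

const extras = extraElements(partialInput, partialOutput);
console.log(extras); // [ 'yellow' ]
```

A check like this can complement the completeness score when added content matters, since the score in this example stays at 1 regardless of the extra element.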

Minimal coverage example

In this example, the response contains only some of the elements from the input. Key terms are missing or altered, resulting in reduced coverage.

src/example-minimal-coverage.ts
import { CompletenessMetric } from "@mastra/evals/nlp";

const metric = new CompletenessMetric();

const query = "The seasons include summer.";
const response = "The four seasons are spring, summer, fall, and winter.";

const result = await metric.measure(query, response);

console.log(result);

Minimal coverage output

The output receives a lower score because one of the four input elements ('include') is missing from the response. The response overlaps in part, but does not fully reflect the original content.

{
  score: 0.75,
  info: {
    inputElements: [ 'the', 'seasons', 'summer', 'include' ],
    outputElements: [ 'the', 'four', 'seasons', 'spring', 'summer', 'winter', 'be', 'fall', 'and' ],
    missingElements: [ 'include' ],
    elementCounts: { input: 4, output: 9 }
  }
}
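
The three example scores are consistent with a simple ratio: the share of input elements that are not reported as missing (8/8, 7/7, and 3/4). The sketch below reproduces that arithmetic from the info object. It is an inference from the outputs shown here, not a description of the library's internals, and `coverageFromInfo` is a hypothetical helper name.

```typescript
// Shape of the `info` object shown in the example outputs.
interface CompletenessInfo {
  inputElements: string[];
  outputElements: string[];
  missingElements: string[];
  elementCounts: { input: number; output: number };
}

// Hypothetical helper (not part of @mastra/evals): fraction of input
// elements that are not listed in missingElements.
function coverageFromInfo(info: CompletenessInfo): number {
  const total = info.elementCounts.input;
  // Assumption: with no input elements there is nothing to miss.
  if (total === 0) return 1;
  return (total - info.missingElements.length) / total;
}

// Info object copied from the minimal coverage output.
const minimalInfo: CompletenessInfo = {
  inputElements: ["the", "seasons", "summer", "include"],
  outputElements: ["the", "four", "seasons", "spring", "summer", "winter", "be", "fall", "and"],
  missingElements: ["include"],
  elementCounts: { input: 4, output: 9 },
};

console.log(coverageFromInfo(minimalInfo)); // 0.75
```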

Metric configuration

You can create a CompletenessMetric instance with default settings. No additional configuration is required.

const metric = new CompletenessMetric();

See CompletenessMetric for a full list of configuration options.

Understanding the results

CompletenessMetric returns a result in the following shape:

{
  score: number,
  info: {
    inputElements: string[],
    outputElements: string[],
    missingElements: string[],
    elementCounts: {
      input: number,
      output: number
    }
  }
}

Completeness score

A completeness score between 0 and 1:

  • 1.0: All input elements are present in the response.
  • 0.7–0.9: Most key elements are included, with minimal omissions.
  • 0.4–0.6: Some input elements are covered, but important ones are missing.
  • 0.1–0.3: Few input elements are matched; most are missing.
  • 0.0: No input elements are present in the response.
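
If you want to turn a score into one of these labels in code, a small mapping function is enough. The sketch below is illustrative only: `describeCompleteness` and its band names are hypothetical, and because the ranges above leave gaps (for example, between 0.9 and 1.0), the cutoffs assume each gap belongs to the band below it.

```typescript
type CoverageBand = "full" | "most" | "some" | "few" | "none";

// Hypothetical helper (not part of @mastra/evals): maps a completeness
// score to one of the bands listed above.
function describeCompleteness(score: number): CoverageBand {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return "full"; // all input elements present
  if (score >= 0.7) return "most"; // assumption: 0.9 to 1.0 (exclusive) also lands here
  if (score >= 0.4) return "some"; // assumption: 0.6 to 0.7 also lands here
  if (score > 0) return "few"; // assumption: any nonzero score below 0.4
  return "none"; // no input elements present
}

console.log(describeCompleteness(1)); // full
console.log(describeCompleteness(0.75)); // most
console.log(describeCompleteness(0)); // none
```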

Completeness info

An explanation for the score, with details including:

  • Input elements extracted from the query.
  • Output elements extracted from the response.
  • Any input elements missing from the response.
  • Comparison of element counts between input and output.
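
For logging or error analysis, these details can be condensed into a single line. The sketch below assumes only the result shape documented above. `summarizeResult` is a hypothetical helper, not part of `@mastra/evals`, and the sample object is copied from the minimal coverage output.

```typescript
// Result shape as documented in "Understanding the results".
interface CompletenessResult {
  score: number;
  info: {
    inputElements: string[];
    outputElements: string[];
    missingElements: string[];
    elementCounts: { input: number; output: number };
  };
}

// Hypothetical helper (not part of @mastra/evals): one-line summary of a result.
function summarizeResult(result: CompletenessResult): string {
  const { input, output } = result.info.elementCounts;
  const missing = result.info.missingElements;
  const missingText = missing.length > 0 ? missing.join(", ") : "none";
  return `score ${result.score}: ${input} input elements, ${output} output elements, missing: ${missingText}`;
}

// Sample result copied from the minimal coverage output.
const sampleResult: CompletenessResult = {
  score: 0.75,
  info: {
    inputElements: ["the", "seasons", "summer", "include"],
    outputElements: ["the", "four", "seasons", "spring", "summer", "winter", "be", "fall", "and"],
    missingElements: ["include"],
    elementCounts: { input: 4, output: 9 },
  },
};

console.log(summarizeResult(sampleResult));
// score 0.75: 4 input elements, 9 output elements, missing: include
```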