
KeywordCoverageMetric

The KeywordCoverageMetric class evaluates how well an LLM’s output covers the important keywords from the input. It extracts keywords from both texts, ignoring common words and stop words, and reports the proportion of input keywords that also appear in the output.

Basic Usage

import { KeywordCoverageMetric } from "@mastra/evals/nlp";
 
const metric = new KeywordCoverageMetric();
 
const result = await metric.measure(
  "What are the key features of Python programming language?",
  "Python is a high-level programming language known for its simple syntax and extensive libraries."
);
 
console.log(result.score); // Coverage score from 0-1
console.log(result.info); // Object containing detailed metrics about keyword coverage

measure() Parameters

input:

string
The original text containing keywords to be matched

output:

string
The text to evaluate for keyword coverage

Returns

score:

number
Coverage score (0-1) representing the proportion of matched keywords

info:

object
Object containing detailed metrics about keyword coverage

matchedKeywords:

number
Number of keywords found in the output

totalKeywords:

number
Total number of keywords from the input

Scoring Details

The metric matches keywords between the input and the output using the following features:

  • Common word and stop word filtering (e.g., “the”, “a”, “and”)
  • Case-insensitive matching
  • Word form variation handling
  • Special handling of technical terms and compound words

Scoring Process

  1. Processes keywords from input and output:

    • Filters out common words and stop words
    • Normalizes case and word forms
    • Handles special terms and compounds
  2. Calculates keyword coverage:

    • Matches keywords between texts
    • Counts successful matches
    • Computes coverage ratio

Final score: (matched_keywords / total_keywords) * scale
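The process above can be sketched in a few lines of TypeScript. This is a simplified illustration, not the library's implementation: the names STOP_WORDS, extractKeywords, and coverageScore are hypothetical, the stop word list is a tiny stand-in, and word form normalization and compound-term handling are left out.

```typescript
// Illustrative sketch only; NOT the @mastra/evals implementation.
// STOP_WORDS, extractKeywords, and coverageScore are hypothetical names.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "is", "are", "in", "to"]);

// Lowercase the text, split it into word tokens, and drop stop words.
function extractKeywords(text: string): Set<string> {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(tokens.filter((token) => !STOP_WORDS.has(token)));
}

interface CoverageResult {
  score: number;
  info: { matchedKeywords: number; totalKeywords: number };
}

// Final score: (matched_keywords / total_keywords) * scale
function coverageScore(input: string, output: string, scale = 1): CoverageResult {
  const inputKeywords = Array.from(extractKeywords(input));
  const outputKeywords = extractKeywords(output);
  const matchedKeywords = inputKeywords.filter((k) => outputKeywords.has(k)).length;
  const totalKeywords = inputKeywords.length;
  // Guard against division by zero; the documented empty-text rules are not modeled here.
  const ratio = totalKeywords === 0 ? 0 : matchedKeywords / totalKeywords;
  return { score: ratio * scale, info: { matchedKeywords, totalKeywords } };
}

const example = coverageScore("The quick brown fox", "A quick red fox");
console.log(example.info); // matched 2 of 3 keywords ("quick" and "fox", but not "brown")
```

Because this sketch compares exact lowercase tokens, "jumps" and "jumped" would not match here, whereas the metric itself is described as handling word form variations.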

Score interpretation

Scores range from 0 to scale (default 0-1):

  • 1.0: Perfect keyword coverage
  • 0.7-0.9: Good coverage with most keywords present
  • 0.4-0.6: Moderate coverage with some keywords missing
  • 0.1-0.3: Poor coverage with many keywords missing
  • 0.0: No keyword matches
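When reporting results, it can help to turn a score into one of the bands above. The helper below is a hypothetical sketch (interpretCoverage and CoverageBand are not part of the library). Since the listed ranges leave gaps, for example between 0.6 and 0.7, the cut-offs at 0.7 and 0.4 are an assumption made here to cover every value.

```typescript
// Hypothetical helper; the band names and exact cut-offs are assumptions.
type CoverageBand = "perfect" | "good" | "moderate" | "poor" | "none";

function interpretCoverage(score: number, scale = 1): CoverageBand {
  // Normalize to the 0-1 range so the same thresholds work for any scale.
  const normalized = score / scale;
  if (normalized >= 1) return "perfect";
  if (normalized >= 0.7) return "good";
  if (normalized >= 0.4) return "moderate";
  if (normalized > 0) return "poor";
  return "none";
}

console.log(interpretCoverage(0.67)); // "moderate"
```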

Examples with Analysis

import { KeywordCoverageMetric } from "@mastra/evals/nlp";
 
const metric = new KeywordCoverageMetric();
 
// Perfect coverage example
const result1 = await metric.measure(
  "The quick brown fox jumps over the lazy dog",
  "A quick brown fox jumped over a lazy dog"
);
// {
//   score: 1.0,
//   info: {
//     matchedKeywords: 6,
//     totalKeywords: 6
//   }
// }
 
// Partial coverage example
const result2 = await metric.measure(
  "Python features include easy syntax, dynamic typing, and extensive libraries",
  "Python has simple syntax and many libraries"
);
// {
//   score: 0.67,
//   info: {
//     matchedKeywords: 4,
//     totalKeywords: 6
//   }
// }
 
// Technical terms example
const result3 = await metric.measure(
  "Discuss React.js component lifecycle and state management",
  "React components have lifecycle methods and manage state"
);
// {
//   score: 1.0,
//   info: {
//     matchedKeywords: 4,
//     totalKeywords: 4
//   }
// }

Special Cases

The metric handles several special cases:

  • Empty input/output: Returns a score of 1.0 if both are empty, 0.0 if only one is empty
  • Single word: Treated as a single keyword
  • Technical terms: Preserves compound technical terms (e.g., “React.js”, “machine learning”)
  • Case differences: “JavaScript” matches “javascript”
  • Common words: Ignored in scoring to focus on meaningful keywords
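The empty-text rule and the case-insensitive comparison are simple enough to sketch directly. The functions emptyTextScore and sameKeyword below are hypothetical names used for illustration, the scores assume the default 0-1 scale, and treating whitespace-only text as empty is an assumption rather than documented behavior.

```typescript
// Hypothetical helpers; not part of @mastra/evals.

// Returns the score for the empty-text cases, or null when both texts have content
// and the regular keyword comparison should run instead.
function emptyTextScore(input: string, output: string): number | null {
  // Assumption: whitespace-only text counts as empty.
  const inputEmpty = input.trim().length === 0;
  const outputEmpty = output.trim().length === 0;
  if (inputEmpty && outputEmpty) return 1; // both empty
  if (inputEmpty || outputEmpty) return 0; // only one is empty
  return null;
}

// Case differences are ignored: "JavaScript" matches "javascript".
function sameKeyword(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}

console.log(emptyTextScore("", "")); // 1
console.log(sameKeyword("JavaScript", "javascript")); // true
```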