KeywordCoverageMetric
The KeywordCoverageMetric class evaluates how well an LLM’s output covers the important keywords from the input. It checks which input keywords appear in the output, ignoring common words and stop words.
Basic Usage
import { KeywordCoverageMetric } from "@mastra/evals/nlp";

const metric = new KeywordCoverageMetric();

const result = await metric.measure(
  "What are the key features of Python programming language?",
  "Python is a high-level programming language known for its simple syntax and extensive libraries."
);

console.log(result.score); // Coverage score from 0-1
console.log(result.info); // Object containing detailed metrics about keyword coverage
measure() Parameters
input (string): The original text containing keywords to be matched
output (string): The text to evaluate for keyword coverage
Returns
score (number): Coverage score (0-1) representing the proportion of matched keywords
info (object): Object containing detailed metrics about keyword coverage
  - matchedKeywords (number): Number of keywords found in the output
  - totalKeywords (number): Total number of keywords from the input
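The shape of the returned object can be sketched as a TypeScript type. The interface name `KeywordCoverageResult` and the helper `expectedScore` are hypothetical, introduced here for illustration; only the field names come from the descriptions above. The sketch also assumes the score is simply `matchedKeywords / totalKeywords` for a non-empty keyword set, which is how "proportion of matched keywords" reads.

```typescript
// Hypothetical type name; the field names follow the Returns section.
interface KeywordCoverageResult {
  score: number;
  info: {
    matchedKeywords: number;
    totalKeywords: number;
  };
}

// Assumption: for totalKeywords > 0, score = matchedKeywords / totalKeywords.
function expectedScore(info: KeywordCoverageResult["info"]): number {
  if (info.totalKeywords <= 0) {
    throw new RangeError("expectedScore only covers totalKeywords > 0");
  }
  return info.matchedKeywords / info.totalKeywords;
}

// Sample values for illustration (4 of 6 keywords matched).
const sample: KeywordCoverageResult = {
  score: 0.67,
  info: { matchedKeywords: 4, totalKeywords: 6 },
};

// Compare with a tolerance, since a displayed score may be rounded.
const consistent = Math.abs(sample.score - expectedScore(sample.info)) < 0.01;
console.log(consistent);
```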
Keyword Processing Details
The metric processes keywords with the following features:
- Ignores common words and stop words (e.g., “the”, “a”, “and”)
- Case-insensitive matching
- Handles variations in word forms
- Ignores numbers by default
- Special handling of technical terms and compound words
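To make the processing steps concrete, here is a minimal, self-contained sketch of keyword coverage. It is not the library's implementation: the stop-word list, the tokenizer, and the function names `extractKeywords` and `keywordCoverage` are all assumptions made for illustration. It covers only stop-word removal, case-insensitive matching, and ignoring numbers; it does not handle word-form variations or preserve compound technical terms.

```typescript
// Hypothetical, abbreviated stop-word list for illustration only.
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "or", "of", "is", "are", "in", "on", "for", "to",
]);

// Extract a set of unique keywords from free text.
function extractKeywords(text: string): Set<string> {
  const tokens = text
    .toLowerCase() // case-insensitive matching
    .split(/[^a-z0-9]+/) // split on anything that is not an ASCII letter or digit
    .filter((t) => t.length > 0) // drop empty fragments
    .filter((t) => !/^\d+$/.test(t)) // ignore purely numeric tokens
    .filter((t) => !STOP_WORDS.has(t)); // ignore stop words
  return new Set(tokens);
}

// Compute the proportion of input keywords that also appear in the output.
function keywordCoverage(input: string, output: string) {
  const inputKeywords = extractKeywords(input);
  const outputKeywords = extractKeywords(output);
  const matchedKeywords = Array.from(inputKeywords).filter((k) =>
    outputKeywords.has(k)
  ).length;
  const totalKeywords = inputKeywords.size;
  // Assumption: with no input keywords, score 1 if the output also has none, else 0.
  const score =
    totalKeywords === 0
      ? outputKeywords.size === 0 ? 1 : 0
      : matchedKeywords / totalKeywords;
  return { score, info: { matchedKeywords, totalKeywords } };
}

const demo = keywordCoverage(
  "JavaScript and TypeScript in 2024",
  "javascript is popular"
);
console.log(demo); // { score: 0.5, info: { matchedKeywords: 1, totalKeywords: 2 } }
```

In the demo, "and" and "in" are dropped as stop words and "2024" as a number, leaving two input keywords, of which only "javascript" appears in the output.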
Score interpretation:
- 1.0: Perfect keyword coverage
- 0.7-0.9: Good coverage with most keywords present
- 0.4-0.6: Moderate coverage with some keywords missing
- 0.1-0.3: Poor coverage with many keywords missing
- 0.0: No keyword matches
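The bands above can be turned into a small helper for logging or thresholds. This is a sketch: the function name `interpretCoverage` and the label strings are hypothetical, and because the listed ranges leave gaps (for example between 0.6 and 0.7), the cut-offs below treat each band's lower bound as its threshold, which is an assumption rather than documented behavior.

```typescript
type CoverageBand = "perfect" | "good" | "moderate" | "poor" | "none";

// Map a coverage score in [0, 1] to a descriptive band.
// Assumption: each band starts at its listed lower bound and runs up to the next band.
function interpretCoverage(score: number): CoverageBand {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError("score must be between 0 and 1");
  }
  if (score === 1) return "perfect";
  if (score >= 0.7) return "good";
  if (score >= 0.4) return "moderate";
  if (score > 0) return "poor";
  return "none";
}

console.log(interpretCoverage(0.67)); // "moderate"
console.log(interpretCoverage(1)); // "perfect"
```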
Examples with Analysis
import { KeywordCoverageMetric } from "@mastra/evals/nlp";

const metric = new KeywordCoverageMetric();

// Perfect coverage example
const result1 = await metric.measure(
  "The quick brown fox jumps over the lazy dog",
  "A quick brown fox jumped over a lazy dog"
);
// {
//   score: 1.0,
//   info: {
//     matchedKeywords: 6,
//     totalKeywords: 6
//   }
// }

// Partial coverage example
const result2 = await metric.measure(
  "Python features include easy syntax, dynamic typing, and extensive libraries",
  "Python has simple syntax and many libraries"
);
// {
//   score: 0.67,
//   info: {
//     matchedKeywords: 4,
//     totalKeywords: 6
//   }
// }

// Technical terms example
const result3 = await metric.measure(
  "Discuss React.js component lifecycle and state management",
  "React components have lifecycle methods and manage state"
);
// {
//   score: 1.0,
//   info: {
//     matchedKeywords: 4,
//     totalKeywords: 4
//   }
// }
Special Cases
The metric handles several special cases:
- Empty input/output: Returns a score of 1.0 if both are empty and 0.0 if only one is empty
- Single word: Treated as a single keyword
- Technical terms: Preserves compound technical terms (e.g., “React.js”, “machine learning”)
- Case differences: “JavaScript” matches “javascript”
- Common words: Ignored in scoring to focus on meaningful keywords
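The empty input/output rule from the list above can be sketched as a guard that runs before any keyword matching. The function name `emptyCaseScore` is hypothetical, and treating whitespace-only strings as empty is an assumption; the document does not say how the metric classifies them.

```typescript
// Return the special-case score for empty texts, or null when neither text
// is empty and normal keyword matching should proceed.
// Assumption: whitespace-only strings count as empty.
function emptyCaseScore(input: string, output: string): number | null {
  const inputEmpty = input.trim().length === 0;
  const outputEmpty = output.trim().length === 0;
  if (inputEmpty && outputEmpty) return 1; // both empty
  if (inputEmpty || outputEmpty) return 0; // only one is empty
  return null; // no special case applies
}

console.log(emptyCaseScore("", "")); // 1
console.log(emptyCaseScore("Python features", "")); // 0
console.log(emptyCaseScore("Python features", "Python")); // null
```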