Keyword Coverage Scorer

The createKeywordCoverageScorer() function evaluates how well an LLM’s output covers the important keywords from the input. It extracts keywords from both texts, ignoring common words and stop words, and measures what proportion of the input keywords appear in the output.

Parameters

The createKeywordCoverageScorer() function does not take any options.

This function returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.

.run() Returns

runId (string): The id of the run (optional).

preprocessStepResult (object): Object with extracted keywords: { referenceKeywords: Set<string>, responseKeywords: Set<string> }

analyzeStepResult (object): Object with keyword coverage: { totalKeywords: number, matchedKeywords: number }

score (number): Coverage score (0-1) representing the proportion of matched keywords.

.run() returns a result in the following shape:


{
  runId: string,
  preprocessStepResult: {
    referenceKeywords: Set<string>,
    responseKeywords: Set<string>
  },
  analyzeStepResult: {
    totalKeywords: number,
    matchedKeywords: number
  },
  score: number
}
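To see which input keywords the response missed, you can compare the two sets in preprocessStepResult. The KeywordSets interface and missingKeywords helper below are hypothetical (not part of @mastra/evals) and assume the field names listed above; the sets in the usage line are hand-written, not real scorer output.

```typescript
// Hypothetical type mirroring the preprocessStepResult object described above.
interface KeywordSets {
  referenceKeywords: Set<string>;
  responseKeywords: Set<string>;
}

// Returns the input (reference) keywords that do not appear in the response,
// i.e. the keywords that keep the coverage score below 1.
function missingKeywords(sets: KeywordSets): string[] {
  return Array.from(sets.referenceKeywords).filter(
    (keyword) => !sets.responseKeywords.has(keyword),
  );
}

// Hand-written sets for illustration, not real scorer output.
const missed = missingKeywords({
  referenceKeywords: new Set(["typescript", "interfaces", "generics", "type", "inference"]),
  responseKeywords: new Set(["typescript", "type", "inference", "features"]),
});
console.log(missed); // [ 'interfaces', 'generics' ]
```

With a real run, you would pass result.preprocessStepResult instead, assuming it has the shape listed above.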

Scoring Details

The scorer evaluates keyword coverage by matching keywords with the following features:

Common word and stop word filtering (e.g., “the”, “a”, “and”)
Case-insensitive matching
Word form variation handling
Special handling of technical terms and compound words

Scoring Process

1. Processes keywords from input and output:
   - Filters out common words and stop words
   - Normalizes case and word forms
   - Handles special terms and compounds
2. Calculates keyword coverage:
   - Matches keywords between texts
   - Counts successful matches
   - Computes coverage ratio

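Final score: matched_keywords / total_keywords, where total_keywords counts the keywords extracted from the input and matched_keywords counts those that also appear in the output.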
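The scoring process can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the scorer's actual implementation: STOP_WORDS, extractKeywords, and keywordCoverage are hypothetical, matching is exact after lowercasing, and word-form and compound-term handling are omitted. With this small stop-word list, the sketch happens to reproduce the keyword counts shown in the examples below.

```typescript
// Hypothetical stop-word list; the real scorer's list is not documented here.
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "like", "is", "for", "of", "to", "in", "some"]);

// Lowercases, splits on non-alphanumeric characters, drops stop words,
// and deduplicates into a Set. No stemming or compound-term handling.
function extractKeywords(text: string): Set<string> {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/);
  return new Set(tokens.filter((t) => t.length > 0 && !STOP_WORDS.has(t)));
}

// Coverage = matched / total, where total counts keywords from the input
// and matched counts those input keywords that also appear in the output.
function keywordCoverage(input: string, output: string) {
  const referenceKeywords = extractKeywords(input);
  const responseKeywords = extractKeywords(output);
  const totalKeywords = referenceKeywords.size;
  const matchedKeywords = Array.from(referenceKeywords).filter((k) => responseKeywords.has(k)).length;
  // Only guards against division by zero; the empty-text rules under
  // Special Cases are not modeled in this sketch.
  const score = totalKeywords === 0 ? 0 : matchedKeywords / totalKeywords;
  return { totalKeywords, matchedKeywords, score };
}

console.log(keywordCoverage(
  "TypeScript offers interfaces, generics, and type inference",
  "TypeScript provides type inference and some advanced features",
)); // { totalKeywords: 6, matchedKeywords: 3, score: 0.5 }
```

The denominator counts only input keywords, so extra keywords in the output (such as "advanced" or "features" above) do not lower the score.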

Score interpretation

A coverage score between 0 and 1:

1.0: Complete coverage – all keywords present.
0.7–0.9: High coverage – most keywords included.
0.4–0.6: Partial coverage – some keywords present.
0.1–0.3: Low coverage – few keywords matched.
0.0: No coverage – no keywords found.
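If you want to bucket scores into the bands above, a small helper is enough. The coverageBand function is hypothetical. Because the documented ranges leave gaps (for example 0.65 or 0.95) and "low" starts at 0.1, this sketch assumes each band begins at its lower bound and treats any nonzero score below 0.4 as low.

```typescript
type CoverageBand = "complete" | "high" | "partial" | "low" | "none";

// Maps a 0-1 coverage score to the bands listed above. Gaps between the
// documented ranges are resolved by using each band's lower bound.
function coverageBand(score: number): CoverageBand {
  if (score >= 1) return "complete";
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "partial";
  if (score > 0) return "low";
  return "none";
}

console.log(coverageBand(0.5)); // partial
console.log(coverageBand(0.2)); // low
```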

Special Cases

The scorer handles several special cases:

Empty input/output: Returns a score of 1.0 if both are empty, or 0.0 if only one is empty
Single word: Treated as a single keyword
Technical terms: Preserves compound technical terms (e.g., “React.js”, “machine learning”)
Case differences: “JavaScript” matches “javascript”
Common words: Ignored in scoring to focus on meaningful keywords
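The sketch below shows one way the empty-text, single-word, case, and common-word rules above could fit together. It is a hypothetical illustration, not the scorer's source: coverageWithSpecialCases and its COMMON_WORDS list are assumptions, compound technical terms and word forms are not handled, and returning 0 when a non-empty input contains only common words is an assumption the rules above do not cover.

```typescript
// Hypothetical common-word list for illustration only.
const COMMON_WORDS = new Set(["a", "an", "the", "and", "or", "is", "for", "of", "to", "in"]);

// Lowercases (case-insensitive matching), splits on non-alphanumerics,
// and drops common words.
function keywords(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0 && !COMMON_WORDS.has(t)),
  );
}

// Applies the empty-text rules before computing coverage:
// both empty -> 1.0, exactly one empty -> 0.0.
function coverageWithSpecialCases(input: string, output: string): number {
  const inputEmpty = input.trim().length === 0;
  const outputEmpty = output.trim().length === 0;
  if (inputEmpty && outputEmpty) return 1;
  if (inputEmpty || outputEmpty) return 0;

  const reference = keywords(input);
  const response = keywords(output);
  // Assumption: an input made only of common words scores 0.
  if (reference.size === 0) return 0;
  const matched = Array.from(reference).filter((k) => response.has(k)).length;
  return matched / reference.size;
}

console.log(coverageWithSpecialCases("", ""));                     // 1
console.log(coverageWithSpecialCases("", "React"));                // 0
console.log(coverageWithSpecialCases("JavaScript", "javascript")); // 1
console.log(coverageWithSpecialCases("the React", "React"));       // 1
```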

Examples

Full coverage example

In this example, the response fully reflects the key terms from the input. All required keywords are present, resulting in complete coverage with no omissions.

src/example-full-keyword-coverage.ts


import { createKeywordCoverageScorer } from "@mastra/evals/scorers/code";
 
const scorer = createKeywordCoverageScorer();
 
const input = 'JavaScript frameworks like React and Vue';
const output = 'Popular JavaScript frameworks include React and Vue for web development';
 
const result = await scorer.run({
  input: [{ role: 'user', content: input }],
  output: { role: 'assistant', text: output },
});
 
console.log('Score:', result.score);
console.log('AnalyzeStepResult:', result.analyzeStepResult);

Full coverage output

A score of 1 indicates that all expected keywords were found in the response. The analyzeStepResult field confirms that the number of matched keywords equals the total number extracted from the input.


{
  score: 1,
  analyzeStepResult: {
    totalKeywords: 4,
    matchedKeywords: 4
  }
}

Partial coverage example

In this example, the response includes some, but not all, of the important keywords from the input. The score reflects partial coverage, with several key terms missing from the response.

src/example-partial-keyword-coverage.ts


import { createKeywordCoverageScorer } from "@mastra/evals/scorers/code";
 
const scorer = createKeywordCoverageScorer();
 
const input = 'TypeScript offers interfaces, generics, and type inference';
const output = 'TypeScript provides type inference and some advanced features';
 
const result = await scorer.run({
  input: [{ role: 'user', content: input }],
  output: { role: 'assistant', text: output },
});
 
console.log('Score:', result.score);
console.log('AnalyzeStepResult:', result.analyzeStepResult);

Partial coverage output

A score of 0.5 indicates that only half of the expected keywords were found in the response. The analyzeStepResult field shows how many terms were matched compared to the total identified in the input.


{
  score: 0.5,
  analyzeStepResult: {
    totalKeywords: 6,
    matchedKeywords: 3
  }
}

Minimal coverage example

In this example, the response includes very few of the important keywords from the input. The score reflects minimal coverage, with most key terms missing.

src/example-minimal-keyword-coverage.ts


import { createKeywordCoverageScorer } from "@mastra/evals/scorers/code";
 
const scorer = createKeywordCoverageScorer();
 
const input = 'Machine learning models require data preprocessing, feature engineering, and hyperparameter tuning';
const output = 'Data preparation is important for models';
 
const result = await scorer.run({
  input: [{ role: 'user', content: input }],
  output: { role: 'assistant', text: output },
});
 
console.log('Score:', result.score);
console.log('AnalyzeStepResult:', result.analyzeStepResult);

Minimal coverage output

A low score indicates that only a small number of the expected keywords were present in the response. The analyzeStepResult field highlights the gap between total and matched keywords, signaling insufficient coverage.


{
  score: 0.2,
  analyzeStepResult: {
    totalKeywords: 10,
    matchedKeywords: 2
  }
}

Metric configuration

You can create a KeywordCoverageMetric instance with default settings. No additional configuration is required.


import { KeywordCoverageMetric } from "@mastra/evals/nlp";

const metric = new KeywordCoverageMetric();
