Custom Native JavaScript Evaluation
This example shows how to create a custom evaluation metric using JavaScript logic. The metric accepts a query and a response, and returns a score and an info object containing the total and matched words.
Installation
npm install @mastra/evals
Create a custom eval
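A custom eval in Mastra extends the Metric class and implements measure(), which can use native JavaScript methods to compute a score. This one tokenizes the input, removes duplicate words, and counts how many of the remaining words occur in the output.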
import { Metric, type MetricResult } from "@mastra/core";

export class WordInclusionMetric extends Metric {
  constructor() {
    super();
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    // Lowercase the text and extract words; match() returns null when none are found.
    const tokenize = (text: string) => text.toLowerCase().match(/\b\w+\b/g) || [];

    // Unique words from the input serve as the reference set.
    const referenceWords = [...new Set(tokenize(input))];
    const outputText = output.toLowerCase();

    // includes() is a substring check, so "cat" also matches "category".
    const matchedWords = referenceWords.filter((word) => outputText.includes(word));

    const totalWords = referenceWords.length;
    // Guard against division by zero when the input contains no words.
    const score = totalWords > 0 ? matchedWords.length / totalWords : 0;

    return {
      score,
      info: {
        totalWords,
        matchedWords: matchedWords.length
      }
    };
  }
}
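Because the metric matches with String.prototype.includes, a reference word also counts as matched when it occurs inside a longer word. The following is a minimal sketch of the same scoring logic as a standalone function, with an optional whole-word mode for cases where that behavior is unwanted. The function name wordInclusionScore and the wholeWord parameter are hypothetical and are not part of Mastra.

```typescript
// Hypothetical standalone helper; not part of @mastra/core or @mastra/evals.
type InclusionResult = {
  score: number;
  info: { totalWords: number; matchedWords: number };
};

// Same tokenizer as the metric above: lowercase, then extract runs of word characters.
const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/\b\w+\b/g) ?? [];

function wordInclusionScore(
  input: string,
  output: string,
  wholeWord = false, // assumption: opt-in stricter matching
): InclusionResult {
  const referenceWords = Array.from(new Set(tokenize(input)));
  const outputText = output.toLowerCase();
  const outputWords = new Set(tokenize(output));

  // Default mode mirrors the metric (substring check); wholeWord compares tokens.
  const matched = referenceWords.filter((word) =>
    wholeWord ? outputWords.has(word) : outputText.includes(word),
  );

  const totalWords = referenceWords.length;
  const score = totalWords > 0 ? matched.length / totalWords : 0;
  return { score, info: { totalWords, matchedWords: matched.length } };
}

// Substring matching counts "cat" inside "category"; whole-word matching does not.
console.log(wordInclusionScore("cat", "A category page").score); // 1
console.log(wordInclusionScore("cat", "A category page", true).score); // 0
```

Keeping the logic in a plain function like this also makes it possible to unit test the scoring without instantiating the metric class.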
High custom example
In this example, the response contains all the words listed in the input query. The metric returns a high score indicating complete word inclusion.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "apple, banana, orange";
const response = "My favorite fruits are: apple, banana, and orange.";
const result = await metric.measure(query, response);
console.log(result);
High custom output
The metric returns a score of 1 because all the unique words from the input are present in the response, demonstrating full coverage.
{
  score: 1,
  info: {
    totalWords: 3,
    matchedWords: 3
  }
}
Partial custom example
In this example, the response includes some but not all of the words from the input query. The metric returns a partial score reflecting this incomplete word coverage.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "cats, dogs, rabbits";
const response = "I like dogs and rabbits";
const result = await metric.measure(query, response);
console.log(result);
Partial custom output
The score reflects partial success because the response contains only a subset of the unique words from the input, indicating incomplete word inclusion.
{
  score: 0.6666666666666666,
  info: {
    totalWords: 3,
    matchedWords: 2
  }
}
Low custom example
In this example, the response does not contain any of the words from the input query. The metric returns a low score indicating no word inclusion.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "Colombia, Brazil, Panama";
const response = "Let's go to Mexico";
const result = await metric.measure(query, response);
console.log(result);
Low custom output
The score is 0 because none of the unique words from the input appear in the response, indicating no overlap between the texts.
{
  score: 0,
  info: {
    totalWords: 3,
    matchedWords: 0
  }
}
Understanding the results
WordInclusionMetric returns a result in the following shape:
{
  score: number,
  info: {
    totalWords: number,
    matchedWords: number
  }
}
Custom score
A score between 0 and 1:
- 1.0: The response includes all words from the input.
- Between 0.0 and 1.0: The response includes some but not all words (for example, 2 of 3 words gives about 0.67).
- 0.0: None of the input words appear in the response, or the input contains no words.
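If downstream code needs a label rather than a raw number, the three cases can be mapped to categories. This is only a sketch: the describeCoverage function and the Coverage type are hypothetical names introduced for illustration, not part of Mastra.

```typescript
// Hypothetical helper for reading a WordInclusionMetric score; not part of Mastra.
type Coverage = "full" | "partial" | "none";

function describeCoverage(score: number): Coverage {
  if (score >= 1) return "full"; // every unique input word was found
  if (score > 0) return "partial"; // at least one word found, but not all
  return "none"; // no words found, or the input had no words
}

console.log(describeCoverage(1)); // full
console.log(describeCoverage(2 / 3)); // partial
console.log(describeCoverage(0)); // none
```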
Custom info
The counts behind the score:
- totalWords is the number of unique words found in the input.
- matchedWords is the count of those words that also appear in the response. Matching is a case-insensitive substring check, so a word also counts when it occurs inside a longer word.
- The score is calculated as matchedWords / totalWords.
- If no valid words are found in the input, the score defaults to 0.
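Two of these details are easy to overlook: repeated input words are counted once, and an input with no word characters produces a score of 0 rather than a division by zero. The sketch below reproduces only the tokenizing and ratio steps from the metric; the names tokenizeWords and inclusionRatio are hypothetical and chosen for illustration.

```typescript
// Same tokenizer as the metric: lowercase, then extract runs of word characters.
const tokenizeWords = (text: string): string[] =>
  text.toLowerCase().match(/\b\w+\b/g) ?? [];

// Hypothetical helper mirroring the metric's score expression.
const inclusionRatio = (matchedWords: number, totalWords: number): number =>
  totalWords > 0 ? matchedWords / totalWords : 0;

// Repeated words (in any casing) are counted once, so totalWords is 2 here, not 4.
const uniqueWords = Array.from(new Set(tokenizeWords("Apple apple, BANANA banana")));
console.log(uniqueWords.length); // 2

// Punctuation-only input yields no words, so the score falls back to 0.
const noWords = tokenizeWords("?!...");
console.log(noWords.length); // 0
console.log(inclusionRatio(0, noWords.length)); // 0
```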