Custom Native JavaScript Evaluation
We just released a new evals API called Scorers. It is more ergonomic, stores more metadata for error analysis, and is more flexible about the data structures it can evaluate. It's fairly simple to migrate, but we will continue to support the existing Evals API.
This example shows how to create a custom evaluation metric using JavaScript logic. The metric accepts a query and a response, and returns a score and an info object containing the total and matched words.
Installation
npm install @mastra/evals
Create a custom eval
A custom eval in Mastra can use native JavaScript methods to evaluate conditions.
import { Metric, type MetricResult } from "@mastra/core";

export class WordInclusionMetric extends Metric {
  constructor() {
    super();
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    // Split text into lowercase word tokens; match() returns null when nothing matches.
    const tokenize = (text: string) => text.toLowerCase().match(/\b\w+\b/g) || [];

    // Deduplicate the input words so each word is counted once.
    const referenceWords = [...new Set(tokenize(input))];
    const outputText = output.toLowerCase();

    // Note: includes() is a substring check, so "cat" also matches "concatenate".
    const matchedWords = referenceWords.filter((word) => outputText.includes(word));
    const totalWords = referenceWords.length;

    // Guard against division by zero when the input contains no words.
    const score = totalWords > 0 ? matchedWords.length / totalWords : 0;

    return {
      score,
      info: {
        totalWords,
        matchedWords: matchedWords.length
      }
    };
  }
}
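Because the metric above checks each input word with a substring test, a short word can match inside a longer one. If that is not what you want, one possible variation is to compare against the set of word tokens in the response instead. The following is a minimal sketch of that idea in plain TypeScript; the helper names `substringMatches` and `wholeWordMatches` are hypothetical and are not part of Mastra.

```typescript
// Hypothetical helpers for illustration only; not part of the Mastra API.
const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/\b\w+\b/g) ?? [];

// Substring matching, mirroring the check used in WordInclusionMetric.
function substringMatches(input: string, output: string): string[] {
  const outputText = output.toLowerCase();
  return Array.from(new Set(tokenize(input))).filter((word) =>
    outputText.includes(word)
  );
}

// Whole-word matching: an input word counts only if it is a token of the output.
function wholeWordMatches(input: string, output: string): string[] {
  const outputWords = new Set(tokenize(output));
  return Array.from(new Set(tokenize(input))).filter((word) =>
    outputWords.has(word)
  );
}

console.log(substringMatches("cat", "Please concatenate the files")); // [ 'cat' ]
console.log(wholeWordMatches("cat", "Please concatenate the files")); // []
```

Whether substring or whole-word matching is the better fit depends on the data being evaluated; the substring check is more forgiving of plurals and suffixes, while the whole-word check avoids accidental matches.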
High custom example
In this example, the response contains all the words listed in the input query. The metric returns a high score indicating complete word inclusion.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "apple, banana, orange";
const response = "My favorite fruits are: apple, banana, and orange.";
const result = await metric.measure(query, response);
console.log(result);
High custom output
The metric returns a score of 1 because every unique word from the input is present in the response, demonstrating full coverage.
{
  score: 1,
  info: {
    totalWords: 3,
    matchedWords: 3
  }
}
Partial custom example
In this example, the response includes some but not all of the words from the input query. The metric returns a partial score reflecting this incomplete word coverage.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "cats, dogs, rabbits";
const response = "I like dogs and rabbits";
const result = await metric.measure(query, response);
console.log(result);
Partial custom output
The score reflects partial success because the response contains only a subset of the unique words from the input, indicating incomplete word inclusion.
{
  score: 0.6666666666666666,
  info: {
    totalWords: 3,
    matchedWords: 2
  }
}
Low custom example
In this example, the response does not contain any of the words from the input query. The metric returns a low score indicating no word inclusion.
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";
const metric = new WordInclusionMetric();
const query = "Colombia, Brazil, Panama";
const response = "Let's go to Mexico";
const result = await metric.measure(query, response);
console.log(result);
Low custom output
The score is 0 because none of the unique words from the input appear in the response, indicating no overlap between the texts.
{
  score: 0,
  info: {
    totalWords: 3,
    matchedWords: 0
  }
}
Understanding the results
WordInclusionMetric returns a result in the following shape:
{
  score: number,
  info: {
    totalWords: number,
    matchedWords: number
  }
}
Custom score
A score between 0 and 1:
- 1.0: The response includes all words from the input.
- Between 0 and 1: The response includes some but not all words (for example, about 0.67 when two of three words match).
- 0.0: None of the input words appear in the response.
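If you want to turn the numeric score into one of the three cases listed above, a small helper can do the mapping. This is only a sketch; the `describeCoverage` function and the `Coverage` type are hypothetical names introduced here, and the sketch treats any score strictly between 0 and 1 as partial.

```typescript
// Hypothetical helper for illustration only; not part of the Mastra API.
type Coverage = "full" | "partial" | "none";

function describeCoverage(score: number): Coverage {
  if (score >= 1) return "full"; // every input word was found
  if (score > 0) return "partial"; // some, but not all, words were found
  return "none"; // no input word was found
}

console.log(describeCoverage(1)); // full
console.log(describeCoverage(2 / 3)); // partial
console.log(describeCoverage(0)); // none
```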
Custom info
An explanation for the score, with details including:
- totalWords is the number of unique words found in the input.
- matchedWords is the count of those words that also appear in the response.
- The score is calculated as matchedWords / totalWords.
- If no valid words are found in the input, the score defaults to 0.
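To make the calculation concrete, the same arithmetic can be reproduced in a standalone function that does not depend on Mastra. This is a sketch that mirrors the logic of the metric shown earlier; the `wordInclusion` function and the `WordInclusionResult` type are hypothetical names used only for this illustration. It shows that duplicate input words are counted once and that an input with no words yields a score of 0.

```typescript
// Hypothetical standalone version of the calculation; not part of the Mastra API.
interface WordInclusionResult {
  score: number;
  info: { totalWords: number; matchedWords: number };
}

function wordInclusion(input: string, output: string): WordInclusionResult {
  // match() returns null when the input has no word characters.
  const words: string[] = input.toLowerCase().match(/\b\w+\b/g) ?? [];
  const uniqueWords = Array.from(new Set(words));
  const outputText = output.toLowerCase();

  const matchedWords = uniqueWords.filter((word) => outputText.includes(word)).length;
  const totalWords = uniqueWords.length;

  // Avoid dividing by zero when there are no input words.
  const score = totalWords > 0 ? matchedWords / totalWords : 0;
  return { score, info: { totalWords, matchedWords } };
}

// "apple" appears twice in the input but is counted once.
console.log(wordInclusion("apple apple banana", "I ate an apple"));
// { score: 0.5, info: { totalWords: 2, matchedWords: 1 } }

// No valid words in the input, so the score defaults to 0.
console.log(wordInclusion("?!", "anything"));
// { score: 0, info: { totalWords: 0, matchedWords: 0 } }
```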