Custom Native JavaScript Evaluation

note

The Scorers API is currently in beta. We're actively working on improvements and appreciate your feedback. For questions or feature requests, please reach out on Discord.

This example shows how to create a custom evaluation metric using JavaScript logic. The metric accepts a query and a response, and returns a score and an info object containing the total and matched words.

Installation

npm install @mastra/evals

Create a custom eval

A custom eval in Mastra can use native JavaScript methods to evaluate conditions.

src/mastra/evals/example-word-inclusion.ts
import { Metric, type MetricResult } from "@mastra/core";

export class WordInclusionMetric extends Metric {
  constructor() {
    super();
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    const tokenize = (text: string) =>
      text.toLowerCase().match(/\b\w+\b/g) || [];

    const referenceWords = [...new Set(tokenize(input))];
    const outputText = output.toLowerCase();

    const matchedWords = referenceWords.filter((word) =>
      outputText.includes(word),
    );

    const totalWords = referenceWords.length;
    const score = totalWords > 0 ? matchedWords.length / totalWords : 0;

    return {
      score,
      info: {
        totalWords,
        matchedWords: matchedWords.length,
      },
    };
  }
}

High custom example

In this example, the response contains all the words listed in the input query. The metric returns a high score indicating complete word inclusion.

src/example-high-word-inclusion.ts
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";

const metric = new WordInclusionMetric();

const query = "apple, banana, orange";
const response = "My favorite fruits are: apple, banana, and orange.";

const result = await metric.measure(query, response);

console.log(result);

High custom output

The output receives a high score because all the unique words from the input are present in the response, demonstrating full coverage.

{
  score: 1,
  info: {
    totalWords: 3,
    matchedWords: 3
  }
}

Partial custom example

In this example, the response includes some but not all of the words from the input query. The metric returns a partial score reflecting this incomplete word coverage.

src/example-partial-word-inclusion.ts
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";

const metric = new WordInclusionMetric();

const query = "cats, dogs, rabbits";
const response = "I like dogs and rabbits";

const result = await metric.measure(query, response);

console.log(result);

Partial custom output

The score reflects partial success because the response contains only a subset of the unique words from the input, indicating incomplete word inclusion.

{
  score: 0.6666666666666666,
  info: {
    totalWords: 3,
    matchedWords: 2
  }
}

Low custom example

In this example, the response does not contain any of the words from the input query. The metric returns a low score indicating no word inclusion.

src/example-low-word-inclusion.ts
import { WordInclusionMetric } from "./mastra/evals/example-word-inclusion";

const metric = new WordInclusionMetric();

const query = "Colombia, Brazil, Panama";
const response = "Let's go to Mexico";

const result = await metric.measure(query, response);

console.log(result);

Low custom output

The score is 0 because none of the unique words from the input appear in the response, indicating no overlap between the texts.

{
  score: 0,
  info: {
    totalWords: 3,
    matchedWords: 0
  }
}

Understanding the results

WordInclusionMetric returns a result in the following shape:

{
  score: number,
  info: {
    totalWords: number,
    matchedWords: number
  }
}

Custom score

A score between 0 and 1:

1.0: The response includes all words from the input.
0.5–0.9: The response includes some but not all words.
0.0: None of the input words appear in the response.

Custom info

An explanation for the score, with details including:

totalWords is the number of unique words found in the input.
matchedWords is the count of those words that also appear in the response.
The score is calculated as matchedWords / totalWords.
If no valid words are found in the input, the score defaults to 0.

View source on GitHub

Installation​

Create a custom eval​

High custom example​

High custom output​

Partial custom example​

Partial custom output​

Low custom example​

Low custom output​

Understanding the results​

Custom score​

Custom info​