
Create a Custom Eval


This documentation refers to the legacy evals API. For the latest scorer features, see Scorers.

Create a custom eval by extending the Metric class and implementing the measure method. This gives you full control over how scores are calculated and what information is returned. For LLM-based evaluations, extend the MastraAgentJudge class to define how the model reasons and scores output.
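The contract can be sketched as follows. This is a minimal, self-contained illustration: the `MetricResult` shape and `Metric` base class are stood in for locally (in a real project you would import them from Mastra), and `NonEmptyMetric` is a hypothetical example, not part of the library.

```typescript
// Local stand-ins for the legacy evals API types, so the sketch runs on its own.
// In a real project these would come from your Mastra installation instead.
interface MetricResult {
  score: number;
  info?: Record<string, unknown>;
}

abstract class Metric {
  // measure() receives the input and the generated output, and returns a score
  // plus optional supporting information.
  abstract measure(input: string, output: string): Promise<MetricResult>;
}

// A trivial custom eval: score 1 if the output is non-empty, else 0.
class NonEmptyMetric extends Metric {
  async measure(_input: string, output: string): Promise<MetricResult> {
    const score = output.trim().length > 0 ? 1 : 0;
    return { score, info: { outputLength: output.length } };
  }
}
```

Because `measure` returns both a `score` and an `info` object, a custom eval can surface not just a number but the evidence behind it.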

Native JavaScript evaluation

You can write lightweight custom metrics using plain JavaScript/TypeScript. These are ideal for simple string comparisons, pattern checks, or other rule-based logic.

See our Word Inclusion example, which scores responses based on the number of reference words found in the output.
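The core of a word-inclusion style metric can be sketched as below. The class name and field names here are illustrative, not the identifiers used in the actual example: the idea is simply to score the fraction of reference words that appear in the output.

```typescript
// Hypothetical word-inclusion metric: score = fraction of reference words
// found (case-insensitively) in the output.
class WordInclusionMetric {
  constructor(private referenceWords: string[]) {}

  async measure(
    _input: string,
    output: string,
  ): Promise<{ score: number; info: { matched: string[] } }> {
    const haystack = output.toLowerCase();
    const matched = this.referenceWords.filter((w) =>
      haystack.includes(w.toLowerCase()),
    );
    const score =
      this.referenceWords.length === 0
        ? 0
        : matched.length / this.referenceWords.length;
    return { score, info: { matched } };
  }
}
```

Rule-based metrics like this run synchronously against strings, so they are cheap enough to apply to every response.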

LLM as a judge evaluation

For more complex evaluations, you can build a judge powered by an LLM. This lets you capture more nuanced criteria, like factual accuracy, tone, or reasoning.

See the Real World Countries example for a complete walkthrough of building a custom judge and metric that evaluates real-world factual accuracy.
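The overall shape of a judge-plus-metric pair can be sketched as below. This is a simplified stand-in for `MastraAgentJudge`, whose real API differs: here the judge is modeled as a class that sends a grading prompt to a model call (`callModel`, a hypothetical injection point standing in for the agent's generate call) and parses the verdict, and the metric converts that verdict into a score.

```typescript
// Stand-in for the model invocation an agent judge would perform.
type CallModel = (prompt: string) => Promise<string>;

// Hypothetical judge: prompts the model for a 0-1 verdict with a reason.
class FactualAccuracyJudge {
  constructor(private callModel: CallModel) {}

  async evaluate(
    input: string,
    output: string,
  ): Promise<{ verdict: number; reason: string }> {
    const prompt = [
      "You are grading factual accuracy on a 0-1 scale.",
      `Question: ${input}`,
      `Answer: ${output}`,
      'Reply as JSON: {"verdict": <number>, "reason": "<string>"}',
    ].join("\n");
    const raw = await this.callModel(prompt);
    const parsed = JSON.parse(raw) as { verdict: number; reason: string };
    // Clamp in case the model drifts outside the requested range.
    return {
      verdict: Math.min(1, Math.max(0, parsed.verdict)),
      reason: parsed.reason,
    };
  }
}

// The metric delegates to the judge and exposes its reasoning via info.
class FactualAccuracyMetric {
  constructor(private judge: FactualAccuracyJudge) {}

  async measure(input: string, output: string) {
    const { verdict, reason } = await this.judge.evaluate(input, output);
    return { score: verdict, info: { reason } };
  }
}
```

Separating the judge (how the model reasons) from the metric (how the verdict becomes a score) keeps the prompt and the scoring logic independently testable.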
