Create a Custom Eval
We just released a new evals API called Scorers. It offers a more ergonomic API, stores more metadata for error analysis, and adds flexibility for evaluating data structures. Migrating is fairly simple, and we will continue to support the existing Evals API.
Create a custom eval by extending the Metric class and implementing the measure method. This gives you full control over how scores are calculated and what information is returned. For LLM-based evaluations, extend the MastraAgentJudge class to define how the model reasons about and scores output.
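At its simplest, a custom metric is a class with a single async method. The sketch below assumes the Metric base class and MetricResult type are exported from @mastra/core/eval, and uses a trivial placeholder scoring rule for illustration:

```typescript
import { Metric, type MetricResult } from "@mastra/core/eval";

// Minimal custom metric: measure() receives the prompt and the generated
// output, and returns a score plus optional info for error analysis.
export class NonEmptyOutputMetric extends Metric {
  async measure(input: string, output: string): Promise<MetricResult> {
    // Placeholder rule: full score for any non-empty output.
    const score = output.trim().length > 0 ? 1 : 0;
    return {
      score, // conventionally normalized to the 0-1 range
      info: { outputLength: output.length },
    };
  }
}
```

The info object is optional, but populating it makes failures much easier to debug when you review eval runs later.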
Native JavaScript evaluation
You can write lightweight custom metrics using plain JavaScript/TypeScript. These are ideal for simple string comparisons, pattern checks, or other rule-based logic.
See our Word Inclusion example, which scores responses based on the number of reference words found in the output.
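A sketch of that approach, assuming the same @mastra/core/eval exports as above; the scoring details in the linked example may differ:

```typescript
import { Metric, type MetricResult } from "@mastra/core/eval";

// Rule-based metric: scores output by the fraction of reference words found.
export class WordInclusionMetric extends Metric {
  private referenceWords: Set<string>;

  constructor(words: string[]) {
    super();
    this.referenceWords = new Set(words);
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    const matchedWords = [...this.referenceWords].filter((word) =>
      output.includes(word),
    );
    const totalWords = this.referenceWords.size;
    const score = totalWords > 0 ? matchedWords.length / totalWords : 0;

    return {
      score,
      info: { totalWords, matchedWords: matchedWords.length },
    };
  }
}
```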
LLM as a judge evaluation
For more complex evaluations, you can build a judge powered by an LLM. This lets you capture more nuanced criteria, like factual accuracy, tone, or reasoning.
See the Real World Countries example for a complete walkthrough of building a custom judge and metric that evaluates real-world factual accuracy.
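A sketch of the judge half of that pattern, modeled on the linked example. The import paths, the super(name, instructions, model) signature, and the structured-output call on this.agent are assumptions; see the example for the exact API:

```typescript
import { type LanguageModel } from "@mastra/core/llm";
import { MastraAgentJudge } from "@mastra/evals/judge";
import { z } from "zod";

const INSTRUCTIONS = `You are a geography expert. For each country named in
the output, decide whether it is a real country.`;

// Hypothetical judge in the spirit of the Real World Countries example.
export class RealWorldCountryJudge extends MastraAgentJudge {
  constructor(model: LanguageModel) {
    super("Real World Country", INSTRUCTIONS, model);
  }

  async evaluate(
    output: string,
  ): Promise<{ verdicts: { country: string; isReal: boolean }[] }> {
    // Ask the underlying agent for a structured verdict per country.
    const result = await this.agent.generate(
      `List every country mentioned below and say whether it is real:\n${output}`,
      {
        output: z.object({
          verdicts: z.array(
            z.object({ country: z.string(), isReal: z.boolean() }),
          ),
        }),
      },
    );
    return result.object;
  }
}
```

A custom metric can then call the judge inside its measure method and convert the verdicts into a numeric score, keeping the LLM reasoning and the scoring math cleanly separated.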