Create your own Eval

Creating your own eval takes little more than writing one function: define a class that extends the Metric class and implement its measure method.

Basic example

For a simple example of creating a custom metric that checks if the output contains certain words, see our Word Inclusion example.
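As a rough illustration of the pattern described above, here is a minimal sketch of a word-inclusion metric. The `Metric` base class below is a stand-in written only so the snippet runs on its own. The library's real base class, and the exact signature of `measure` (assumed here to take one output string and return a float), may differ. The names `WordInclusion` and `required_words` are also hypothetical.

```python
import re
from abc import ABC, abstractmethod


class Metric(ABC):
    """Stand-in base class (assumption): the real Metric class may differ."""

    @abstractmethod
    def measure(self, output: str) -> float:
        """Return a score for a single model output."""


class WordInclusion(Metric):
    """Scores an output by the fraction of required words it contains."""

    def __init__(self, required_words: list[str]) -> None:
        if not required_words:
            raise ValueError("required_words must not be empty")
        # Compare case-insensitively.
        self.required_words = [word.lower() for word in required_words]

    def measure(self, output: str) -> float:
        # Split the output into lowercase word tokens, ignoring punctuation.
        tokens = set(re.findall(r"[a-z0-9']+", output.lower()))
        found = sum(1 for word in self.required_words if word in tokens)
        return found / len(self.required_words)


metric = WordInclusion(["refund", "apology"])
score = metric.measure("We are sorry; a refund is on its way.")
print(score)  # 0.5: "refund" is present, "apology" is not
```

Returning a fraction rather than a boolean is a design choice made for this sketch; a strict pass/fail check could return 1.0 only when every word is found.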

Creating a custom LLM-Judge

A custom LLM judge evaluates specific aspects of your AI’s responses. Think of it as an expert reviewer for your particular use case:

Medical Q&A → Judge checks for medical accuracy and safety
Customer Service → Judge evaluates tone and helpfulness
Code Generation → Judge verifies code correctness and style

For a practical example, see how we evaluate Chef Michel’s recipes for gluten content in our Gluten Checker example.
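The sketch below shows one way a judge of this kind could be structured. It is not the library's implementation. The `Metric` base class is a stand-in, and `LLMJudge`, `complete`, and `criterion` are hypothetical names. The model call is injected as a plain function so the example runs without any API key; here a fake judge takes the place of a real LLM.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Metric(ABC):
    """Stand-in base class (assumption): the real Metric class may differ."""

    @abstractmethod
    def measure(self, output: str) -> float:
        """Return a score for a single model output."""


class LLMJudge(Metric):
    """Asks a judge model whether an output satisfies one criterion."""

    PROMPT = (
        "You are an expert reviewer. Answer with PASS or FAIL only.\n"
        "Criterion: {criterion}\n"
        "Response:\n{output}"
    )

    def __init__(self, complete: Callable[[str], str], criterion: str) -> None:
        # `complete` is any function that sends a prompt to a model
        # and returns its text reply (hypothetical interface).
        self.complete = complete
        self.criterion = criterion

    def measure(self, output: str) -> float:
        prompt = self.PROMPT.format(criterion=self.criterion, output=output)
        verdict = self.complete(prompt).strip().upper()
        if verdict.startswith("PASS"):
            return 1.0
        if verdict.startswith("FAIL"):
            return 0.0
        # Fail loudly instead of guessing when the judge goes off-script.
        raise ValueError(f"Unexpected judge verdict: {verdict!r}")


def fake_judge(prompt: str) -> str:
    """Deterministic stand-in for a real model call, for demonstration only."""
    response = prompt.split("Response:", 1)[1].lower()
    return "FAIL" if "wheat flour" in response else "PASS"


judge = LLMJudge(fake_judge, "The recipe must not contain gluten.")
print(judge.measure("Mix wheat flour with water and knead."))  # 0.0
print(judge.measure("Grill the salmon with lemon and dill."))  # 1.0
```

In practice, `fake_judge` would be replaced by a function that calls your model provider, and the criterion would be tailored to the use case, such as medical accuracy, tone, or code style.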