Create your own Eval
Creating your own eval is as easy as creating a new function. You simply create a class that extends the Metric class and implement the measure method.
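Here is a minimal sketch of that shape in TypeScript. The Metric and MetricResult definitions are hypothetical stand-ins, included only so the example compiles on its own. The measure signature is also an assumption: it takes the input and output strings and resolves to a score between 0 and 1. Your eval library's real base class may differ, so check its reference before adapting this.

```typescript
// Hypothetical stand-ins for the library's types, defined here only so the
// sketch is self-contained. In a real project, import Metric from your eval
// library instead; its exact signature may differ.
interface MetricResult {
  score: number; // assumed convention: 0 (fail) to 1 (pass)
  info?: Record<string, unknown>;
}

abstract class Metric {
  abstract measure(input: string, output: string): Promise<MetricResult>;
}

// A custom eval: extend Metric and implement measure.
// This one only checks that the model produced a non-empty answer.
class NonEmptyOutputMetric extends Metric {
  async measure(_input: string, output: string): Promise<MetricResult> {
    const hasContent = output.trim().length > 0;
    return { score: hasContent ? 1 : 0, info: { length: output.length } };
  }
}
```

Calling new NonEmptyOutputMetric().measure("What is 2 + 2?", "4") resolves to a score of 1. An empty or whitespace-only output scores 0.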
Basic example
For a simple example of creating a custom metric that checks if the output contains certain words, see our Word Inclusion example.
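The following sketch shows one way to write a word-inclusion metric. It is not the Word Inclusion example itself. It uses the same hypothetical Metric stand-in as above, and the class name WordInclusionMetric and its scoring rule are assumptions: the score is the fraction of target words found in the output, matched case-insensitively as whole words.

```typescript
// Hypothetical stand-ins; replace with your eval library's own types.
interface MetricResult {
  score: number;
  info?: Record<string, unknown>;
}

abstract class Metric {
  abstract measure(input: string, output: string): Promise<MetricResult>;
}

// Hypothetical metric: scores how many of the given words appear in the output.
class WordInclusionMetric extends Metric {
  private readonly words: string[];

  constructor(words: string[]) {
    super();
    // Normalise once so matching is case-insensitive.
    this.words = words.map((w) => w.toLowerCase());
  }

  async measure(_input: string, output: string): Promise<MetricResult> {
    // Split on non-word characters so "cat" does not match inside "category".
    // Note: \W treats only ASCII letters, digits and "_" as word characters,
    // and each target is matched as a single word.
    const tokens = new Set(output.toLowerCase().split(/\W+/));
    const found = this.words.filter((w) => tokens.has(w));
    const missing = this.words.filter((w) => !tokens.has(w));
    // With no target words there is nothing to miss, so treat it as a pass.
    const score = this.words.length === 0 ? 1 : found.length / this.words.length;
    return { score, info: { found, missing } };
  }
}
```

For example, new WordInclusionMetric(["apple", "banana"]).measure("", "I ate an Apple.") resolves to a score of 0.5, with "banana" listed as missing. A strict all-or-nothing score would be an equally reasonable design. Choose whichever matches how you want partial matches to count.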
Creating a custom LLM judge
A custom LLM judge helps evaluate specific aspects of your AI’s responses. Think of it as having an expert reviewer for your particular use case:
- Medical Q&A → Judge checks for medical accuracy and safety
- Customer Service → Judge evaluates tone and helpfulness
- Code Generation → Judge verifies code correctness and style
For a practical example, see how we evaluate Chef Michel’s recipes for gluten content in our Gluten Checker example.
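The sketch below shows the general pattern behind an LLM judge. It is not the Gluten Checker implementation. The metric builds a prompt from a review criterion, the user input, and the AI response, then asks a judge model for a verdict and turns the reply into a score. Several parts are hypothetical: JudgeFn stands in for whatever function calls your LLM, and the prompt wording and JSON reply format are assumptions. The Metric stand-in is the same one used in the earlier sketches.

```typescript
// Hypothetical stand-ins; replace with your eval library's own types.
interface MetricResult {
  score: number;
  info?: Record<string, unknown>;
}

abstract class Metric {
  abstract measure(input: string, output: string): Promise<MetricResult>;
}

// Hypothetical: any function that sends a prompt to an LLM and returns its raw text reply.
type JudgeFn = (prompt: string) => Promise<string>;

class LLMJudgeMetric extends Metric {
  private readonly criterion: string;
  private readonly judge: JudgeFn;

  constructor(criterion: string, judge: JudgeFn) {
    super();
    this.criterion = criterion;
    this.judge = judge;
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    const prompt = [
      "You are an expert reviewer.",
      `Criterion: ${this.criterion}`,
      `User input: ${input}`,
      `AI response: ${output}`,
      'Reply with JSON only: {"score": <number from 0 to 1>, "reason": "<short explanation>"}',
    ].join("\n");

    const raw = await this.judge(prompt);
    try {
      const parsed = JSON.parse(raw) as { score?: unknown; reason?: unknown };
      const score = Number(parsed.score);
      if (!Number.isFinite(score)) throw new Error("score is not a number");
      // Clamp in case the judge strays outside the requested range.
      const clamped = Math.min(1, Math.max(0, score));
      return { score: clamped, info: { reason: String(parsed.reason ?? "") } };
    } catch {
      // Treat an unusable verdict as a failure instead of crashing the eval run.
      return { score: 0, info: { reason: "Judge reply was not JSON with a numeric score", raw } };
    }
  }
}
```

For the customer service case, you might construct it as new LLMJudgeMetric("Is the tone polite and the answer helpful?", callMyModel), where callMyModel is your own wrapper around an LLM client. Passing the judge in as a function also lets tests substitute a fake judge that returns a fixed reply.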