Custom LLM Judge Scorer
This example shows how to create a custom LLM-based scorer using createLLMScorer
. We’ll build a “Gluten Checker” that evaluates whether a recipe contains gluten, using a language model as the judge.
Installation
npm install @mastra/evals
For complete API documentation and configuration options, see
createLLMScorer
.
Create a custom LLM scorer
A custom LLM scorer in Mastra uses createLLMScorer
to define:
- Instructions for the LLM judge
- Prompt for the analyze step
- Zod schema for the expected output
- Scoring logic
- (Optionally) a reason step for explanations
See createLLMScorer for the full API and configuration options.
src/mastra/evals/gluten-checker.ts
import { openai } from "@ai-sdk/openai";
import { createLLMScorer } from "@mastra/core/scores";
import { z } from "zod";
const GLUTEN_INSTRUCTIONS = `You are a Chef that identifies if recipes contain gluten.`;
const generateGlutenPrompt = ({ output }: { output: string }) => `
You are a chef who checks if a recipe contains gluten. List all ingredients in the recipe that contain gluten and return them in a JSON object.
Gluten is commonly found in:
- Wheat (including wheat flour, whole wheat, semolina, durum, spelt, farro, etc.)
- Barley (including malt, malt extract, malt vinegar)
- Rye
- Triticale
- Products made from these grains (e.g., bread, pasta, cake, cookies, seitan, beer, soy sauce unless labeled gluten-free)
**Instructions:**
- Carefully read the recipe and list every ingredient that contains gluten.
- If an ingredient is ambiguous (e.g., "flour" without specifying type), assume it contains gluten unless otherwise stated.
- If you are unsure, include the ingredient and note it in a comment in the JSON (see example).
- If there are no gluten-containing ingredients, return an empty array.
**Return ONLY the following JSON object, with no extra text:**
{
"glutenSources": ["list of gluten-containing ingredients"]
}
=== Recipe to analyze ===
${output}
=== End of recipe to analyze ===
JSON:
`;
const generateReasonPrompt = ({ isGlutenFree, glutenSources }: { isGlutenFree: boolean; glutenSources: string[] }) => `Explain why this recipe is${isGlutenFree ? '' : ' not'} gluten-free.\n${glutenSources.length > 0 ? `Sources of gluten: ${glutenSources.join(', ')}` : 'No gluten-containing ingredients found'}\nReturn your response in this format:\n{\n "reason": "This recipe is [gluten-free/contains gluten] because [explanation]"\n}`;
export const glutenCheckerScorer = createLLMScorer({
name: 'Gluten Checker',
description: 'Check if the output contains any gluten',
judge: {
model: openai('gpt-4o'),
instructions: GLUTEN_INSTRUCTIONS,
},
analyze: {
description: 'Analyze the output for gluten',
outputSchema: z.object({ glutenSources: z.array(z.string()) }),
createPrompt: ({ run }) => generateGlutenPrompt({ output: run.output.text }),
},
calculateScore: ({ run }) => run.analyzeStepResult.glutenSources.length > 0 ? 0 : 1,
reason: {
createPrompt: ({ run }) => generateReasonPrompt({
glutenSources: run.analyzeStepResult.glutenSources,
isGlutenFree: run.analyzeStepResult.glutenSources.length === 0,
}),
},
});
High gluten-free example
src/example-high-gluten-free.ts
const input = 'Mix rice, beans, and vegetables';
const output = 'Mix rice, beans, and vegetables';
const result = await glutenCheckerScorer.run({
input: [{ role: 'user', content: input }],
output: { text: output },
});
console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);
High gluten-free output
{
score: 1,
analyzeStepResult: { glutenSources: [] },
reason: 'This recipe is gluten-free because no gluten-containing ingredients were found.'
}
Partial gluten example
src/example-partial-gluten.ts
const input = 'Mix flour and water to make dough';
const output = 'Mix flour and water to make dough';
const result = await glutenCheckerScorer.run({
input: [{ role: 'user', content: input }],
output: { text: output },
});
console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);
Partial gluten output
{
score: 0,
analyzeStepResult: { glutenSources: ['flour'] },
reason: 'This recipe is not gluten-free because it contains flour.'
}
Low gluten-free example
src/example-low-gluten-free.ts
const input = 'Add soy sauce and noodles';
const output = 'Add soy sauce and noodles';
const result = await glutenCheckerScorer.run({
input: [{ role: 'user', content: input }],
output: { text: output },
});
console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);
Low gluten-free output
{
score: 0,
analyzeStepResult: { glutenSources: ['soy sauce', 'noodles'] },
reason: 'This recipe is not gluten-free because it contains soy sauce, noodles.'
}
Understanding the results
.run()
returns a result in the following shape:
{
runId: string,
analyzeStepResult: {
glutenSources: string[]
},
score: number,
reason: string
}
score
A score of 1 means the recipe is gluten-free. A score of 0 means gluten was detected.
runId
The unique identifier for this scorer run.
analyzeStepResult
Object with gluten sources:
- glutenSources: Array of gluten-containing ingredients found in the recipe.
reason
Explanation of why the recipe is or is not gluten-free.
View Example on GitHub