Custom LLM Judge Scorer

This example shows how to create a custom LLM-based scorer using createLLMScorer. We’ll build a “Gluten Checker” that evaluates whether a recipe contains gluten, using a language model as the judge.

Installation

npm install @mastra/evals

For complete API documentation and configuration options, see createLLMScorer.

Create a custom LLM scorer

A custom LLM scorer in Mastra uses createLLMScorer to define:

  • Instructions for the LLM judge
  • Prompt for the analyze step
  • Zod schema for the expected output
  • Scoring logic
  • (Optionally) a reason step for explanations

src/mastra/evals/gluten-checker.ts
import { openai } from "@ai-sdk/openai";
import { createLLMScorer } from "@mastra/core/scores";
import { z } from "zod";

// System instructions for the judge model.
const GLUTEN_INSTRUCTIONS = `You are a Chef that identifies if recipes contain gluten.`;

// Prompt for the analyze step: asks the judge to list gluten sources as JSON.
const generateGlutenPrompt = ({ output }: { output: string }) => `
You are a chef who checks if a recipe contains gluten.
List all ingredients in the recipe that contain gluten and return them in a JSON object.

Gluten is commonly found in:
- Wheat (including wheat flour, whole wheat, semolina, durum, spelt, farro, etc.)
- Barley (including malt, malt extract, malt vinegar)
- Rye
- Triticale
- Products made from these grains (e.g., bread, pasta, cake, cookies, seitan, beer, soy sauce unless labeled gluten-free)

**Instructions:**
- Carefully read the recipe and list every ingredient that contains gluten.
- If an ingredient is ambiguous (e.g., "flour" without specifying type), assume it contains gluten unless otherwise stated.
- If you are unsure, include the ingredient and note it in a comment in the JSON (see example).
- If there are no gluten-containing ingredients, return an empty array.

**Return ONLY the following JSON object, with no extra text:**
{
  "glutenSources": ["list of gluten-containing ingredients"]
}

=== Recipe to analyze ===
${output}
=== End of recipe to analyze ===

JSON:
`;

// Prompt for the reason step: asks the judge to explain the verdict.
const generateReasonPrompt = ({
  isGlutenFree,
  glutenSources,
}: {
  isGlutenFree: boolean;
  glutenSources: string[];
}) => `Explain why this recipe is${isGlutenFree ? '' : ' not'} gluten-free.\n${
  glutenSources.length > 0
    ? `Sources of gluten: ${glutenSources.join(', ')}`
    : 'No gluten-containing ingredients found'
}\nReturn your response in this format:\n{\n "reason": "This recipe is [gluten-free/contains gluten] because [explanation]"\n}`;

export const glutenCheckerScorer = createLLMScorer({
  name: 'Gluten Checker',
  description: 'Check if the output contains any gluten',
  judge: {
    model: openai('gpt-4o'),
    instructions: GLUTEN_INSTRUCTIONS,
  },
  analyze: {
    description: 'Analyze the output for gluten',
    outputSchema: z.object({
      glutenSources: z.array(z.string()),
    }),
    createPrompt: ({ run }) => generateGlutenPrompt({ output: run.output.text }),
  },
  // Binary score: 0 if any gluten source was found, 1 otherwise.
  calculateScore: ({ run }) =>
    run.analyzeStepResult.glutenSources.length > 0 ? 0 : 1,
  reason: {
    createPrompt: ({ run }) =>
      generateReasonPrompt({
        glutenSources: run.analyzeStepResult.glutenSources,
        isGlutenFree: run.analyzeStepResult.glutenSources.length === 0,
      }),
  },
});
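
The scoring rule in calculateScore is binary, so it can be checked on its own without calling a model. The sketch below restates that rule, and the isGlutenFree condition passed to the reason prompt, as dependency-free functions. The names AnalyzeResult, scoreFromAnalysis, and isGlutenFree are hypothetical helpers for illustration and are not part of Mastra's API.

```typescript
// Hypothetical type mirroring the analyze step's outputSchema above.
type AnalyzeResult = { glutenSources: string[] };

// Same rule as calculateScore: any gluten source gives 0, none gives 1.
function scoreFromAnalysis(result: AnalyzeResult): number {
  return result.glutenSources.length > 0 ? 0 : 1;
}

// Same condition the scorer uses when building the reason prompt.
function isGlutenFree(result: AnalyzeResult): boolean {
  return result.glutenSources.length === 0;
}

// Sample analyze results matching the three examples on this page.
const samples: AnalyzeResult[] = [
  { glutenSources: [] },
  { glutenSources: ["flour"] },
  { glutenSources: ["soy sauce", "noodles"] },
];

for (const sample of samples) {
  console.log(scoreFromAnalysis(sample), isGlutenFree(sample));
}
// prints "1 true", then "0 false", then "0 false"
```

Because the score only distinguishes 0 from 1, a recipe with one gluten source scores the same as a recipe with several. The number of sources is available only through analyzeStepResult.glutenSources.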

High gluten-free example

src/example-high-gluten-free.ts
import { glutenCheckerScorer } from "./mastra/evals/gluten-checker";

// Recipe with no gluten-containing ingredients.
const input = 'Mix rice, beans, and vegetables';
const output = 'Mix rice, beans, and vegetables';

const result = await glutenCheckerScorer.run({
  input: [{ role: 'user', content: input }],
  output: { text: output },
});

console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);

High gluten-free output

{
  score: 1,
  analyzeStepResult: {
    glutenSources: []
  },
  reason: 'This recipe is gluten-free because no gluten-containing ingredients were found.'
}

Partial gluten example

src/example-partial-gluten.ts
import { glutenCheckerScorer } from "./mastra/evals/gluten-checker";

// Recipe with one ambiguous ingredient ("flour"), which the prompt treats as gluten.
const input = 'Mix flour and water to make dough';
const output = 'Mix flour and water to make dough';

const result = await glutenCheckerScorer.run({
  input: [{ role: 'user', content: input }],
  output: { text: output },
});

console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);

Partial gluten output

{
  score: 0,
  analyzeStepResult: {
    glutenSources: ['flour']
  },
  reason: 'This recipe is not gluten-free because it contains flour.'
}

Low gluten-free example

src/example-low-gluten-free.ts
import { glutenCheckerScorer } from "./mastra/evals/gluten-checker";

// Recipe with several gluten-containing ingredients.
const input = 'Add soy sauce and noodles';
const output = 'Add soy sauce and noodles';

const result = await glutenCheckerScorer.run({
  input: [{ role: 'user', content: input }],
  output: { text: output },
});

console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);

Low gluten-free output

{
  score: 0,
  analyzeStepResult: {
    glutenSources: ['soy sauce', 'noodles']
  },
  reason: 'This recipe is not gluten-free because it contains soy sauce, noodles.'
}

Understanding the results

.run() returns a result in the following shape:

{
  runId: string,
  analyzeStepResult: {
    glutenSources: string[]
  },
  score: number,
  reason: string
}

score

A score of 1 means the recipe is gluten-free. A score of 0 means gluten was detected.

runId

The unique identifier for this scorer run.

analyzeStepResult

Object with gluten sources:

  • glutenSources: Array of gluten-containing ingredients found in the recipe.

reason

Explanation of why the recipe is or is not gluten-free.
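
As a rough illustration of how these fields fit together, the sketch below types the result shape shown above and reduces it to a one-line summary. The GlutenCheckResult interface and the summarize helper are hypothetical names introduced here, and the runId value is a placeholder; the real scorer's exported types may differ.

```typescript
// Result shape as documented above. The interface name is hypothetical.
interface GlutenCheckResult {
  runId: string;
  analyzeStepResult: { glutenSources: string[] };
  score: number;
  reason: string;
}

// Hypothetical helper: turn a result into a short, human-readable summary.
function summarize(result: GlutenCheckResult): string {
  const sources = result.analyzeStepResult.glutenSources;
  return result.score === 1
    ? "gluten-free"
    : `contains gluten: ${sources.join(", ")}`;
}

// Sample result copied from the soy sauce and noodles example on this page.
const example: GlutenCheckResult = {
  runId: "example-run", // placeholder, not a real run identifier
  analyzeStepResult: { glutenSources: ["soy sauce", "noodles"] },
  score: 0,
  reason: "This recipe is not gluten-free because it contains soy sauce, noodles.",
};

console.log(summarize(example)); // contains gluten: soy sauce, noodles
```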
