
Creating scorers

Mastra provides two approaches for creating custom scorers:

Code scorers use programmatic logic and algorithms. They’re ideal for deterministic evaluations, performance-critical scenarios, and cases where you have clear algorithmic criteria.

LLM scorers use language models as judges. They’re perfect for subjective evaluations, complex criteria that are difficult to code algorithmically, and cases where human-like judgment is needed.

Code-based scorers

Code scorers use createScorer to build evaluation logic with programmatic algorithms. They’re ideal for deterministic evaluations, performance-critical scenarios, and cases where you have clear algorithmic criteria or need integration with existing libraries.

Code scorers follow Mastra’s three-step evaluation pipeline:

  • an optional extract step for preprocessing complex data
  • a required analyze step for core evaluation and scoring
  • and an optional reason step for generating explanations.

For the complete API reference, see createScorer, and for a detailed explanation of the pipeline, see evaluation process.
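Put together, a code scorer has the shape sketched below. This is a minimal sketch only (the scorer name and placeholder bodies are hypothetical); the keyword-coverage example in the following sections fills each step in with real logic.

import { createScorer } from "@mastra/core/scores";

// Minimal sketch of the three-step shape; each step is documented below.
export const mySketchScorer = createScorer({
  name: "My Scorer",
  description: "Sketch of the three-step pipeline",
  // Optional: preprocess input/output into whatever the analyze step needs
  extract: async ({ input, output }) => {
    return { results: { /* preprocessed data */ } };
  },
  // Required: compute the numerical score (plus any intermediate results)
  analyze: async ({ input, output, extractStepResult }) => {
    return { score: 1, results: {} };
  },
  // Optional: turn the score and results into a human-readable explanation
  reason: async ({ score, analyzeStepResult }) => {
    return { reason: `Scored ${score} because ...` };
  },
});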

Extract Step

This optional step preprocesses input/output data when you need to evaluate multiple distinct elements, filter content, or focus analysis on specific parts of complex data.

  • Receives:
    • input: User messages (when used with agents) or workflow step input (when used with workflow steps)
    • output: Agent’s response (when used with agents) or workflow step output (when used with workflow steps)
    • runtimeContext: Runtime context from the agent or workflow step being evaluated
  • Must return: { results: any }
  • Data flow: The results value is passed to the analyze step as extractStepResult
src/mastra/scorers/keyword-coverage-scorer.ts
import { createScorer } from "@mastra/core/scores";
import keywordExtractor from "keyword-extractor";

export const keywordCoverageScorer = createScorer({
  name: "Keyword Coverage",
  description: "Evaluates how well the output covers keywords from the input",
  // Step 1: Extract keywords from input and output
  extract: async ({ input, output }) => {
    const inputText = input?.map(i => i.content).join(", ") || "";
    const outputText = output.text;

    const extractKeywords = (text: string) => {
      return keywordExtractor.extract(text);
    };

    const inputKeywords = new Set(extractKeywords(inputText));
    const outputKeywords = new Set(extractKeywords(outputText));

    return {
      results: {
        inputKeywords,
        outputKeywords,
      },
    };
  },
  // ... analyze and reason steps
});

Analyze Step

This step is required for every scorer. It performs the core evaluation and generates the numerical score.

  • Receives: Everything from extract step, plus:
    • extractStepResult: Results from the extract step (if extract step was defined)
  • Must return: { score: number, results?: any }
  • Data flow: The score and optional results are passed to the reason step
src/mastra/scorers/keyword-coverage-scorer.ts
export const keywordCoverageScorer = createScorer({
  // ... name, description, extract step
  // Step 2: Analyze keyword coverage and calculate score
  analyze: async ({ input, output, extractStepResult }) => {
    const { inputKeywords, outputKeywords } = extractStepResult.results;

    if (inputKeywords.size === 0) {
      return { score: 1, results: { coverage: 1, matched: 0, total: 0 } };
    }

    const matchedKeywords = [...inputKeywords].filter(keyword =>
      outputKeywords.has(keyword)
    );

    const coverage = matchedKeywords.length / inputKeywords.size;

    return {
      score: coverage,
      results: {
        coverage,
        matched: matchedKeywords.length,
        total: inputKeywords.size,
        matchedKeywords,
      },
    };
  },
  // ... reason step
});

Reason Step

This optional step generates human-readable explanations for scores, useful for actionable feedback, debugging transparency, or compliance documentation.

  • Receives: Everything from analyze step, plus:
    • score: The numerical score (0-1) calculated by the analyze step
    • analyzeStepResult: Results from the analyze step (contains the score and any additional results)
  • Must return: { reason: string }
src/mastra/scorers/keyword-coverage-scorer.ts
export const keywordCoverageScorer = createScorer({
  // ... name, description, extract and analyze steps
  // Step 3: Generate explanation for the score
  reason: async ({ score, analyzeStepResult, extractStepResult }) => {
    const { matched, total, matchedKeywords } = analyzeStepResult.results;
    const { inputKeywords } = extractStepResult.results;

    const percentage = Math.round(score * 100);
    const missedKeywords = [...inputKeywords].filter(
      keyword => !matchedKeywords.includes(keyword)
    );

    let reason = `The output achieved ${percentage}% keyword coverage (${matched}/${total} keywords).`;

    if (matchedKeywords.length > 0) {
      reason += ` Covered keywords: ${matchedKeywords.join(", ")}.`;
    }

    if (missedKeywords.length > 0) {
      reason += ` Missing keywords: ${missedKeywords.join(", ")}.`;
    }

    return { reason };
  },
});
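With the scorer defined, it can be attached to an agent so responses are scored as they are produced. The wiring below is a rough sketch: the agent itself is hypothetical, and the scorers map with a sampling setting follows the shape used in Mastra's scorer overview, so confirm the exact option names against the API reference for your version.

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { keywordCoverageScorer } from "../scorers/keyword-coverage-scorer";

// Hypothetical agent; the scorers/sampling shape is an assumption to verify.
export const supportAgent = new Agent({
  name: "support-agent",
  instructions: "Answer user questions concisely.",
  model: openai("gpt-4o"),
  scorers: {
    keywordCoverage: {
      scorer: keywordCoverageScorer,
      sampling: { type: "ratio", rate: 1 }, // score every run
    },
  },
});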


LLM-based scorers

LLM scorers use createLLMScorer to build evaluations that leverage language models as judges. They’re perfect for subjective evaluations that require understanding context, complex criteria that are difficult to code algorithmically, natural language understanding tasks, and cases where human-like judgment is needed.

LLM scorers follow the same evaluation pipeline as code scorers with an additional calculateScore function:

  • an optional extract step where the LLM processes input/output and returns structured data
  • a required analyze step where the LLM performs evaluation and returns structured analysis
  • a required calculateScore function that converts the LLM's analysis into a numerical score
  • and an optional reason step where the LLM generates human-readable explanations

The calculateScore function leverages the best of both approaches: LLMs excel at qualitative analysis and understanding, while deterministic functions ensure precise and consistent numerical scoring.

For the complete API reference, see createLLMScorer, and for a detailed explanation of the pipeline, see evaluation process.
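Assembled, an LLM scorer looks like the sketch below. It is a sketch only (the scorer name and the one-dimension rating rubric are made up for illustration); the tone and quality scorers in the following sections show concrete judge configurations, prompts, and schemas.

import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import { createLLMScorer } from "@mastra/core/scores";

// Sketch of the createLLMScorer shape; each piece is documented below.
export const mySketchLLMScorer = createLLMScorer({
  name: "My LLM Scorer",
  description: "Sketch of the LLM scorer pipeline",
  // Shared judge configuration used by every LLM step
  judge: {
    model: openai("gpt-4o"),
    instructions: "You are an expert evaluator.",
  },
  // Required: the LLM returns structured analysis defined by outputSchema
  analyze: {
    description: "Rate the output",
    outputSchema: z.object({ rating: z.number().min(1).max(5) }),
    createPrompt: ({ run }) => `Rate this output from 1-5: ${run.output.text}`,
  },
  // Required: deterministic conversion of the analysis into a 0-1 score
  calculateScore: ({ run }) => (run.analyzeStepResult.rating - 1) / 4,
  // Optional: the LLM explains the score
  reason: {
    createPrompt: ({ run }) => `Explain why the output scored ${run.score}.`,
  },
});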

Judge Configuration

All LLM scorer steps share this required configuration that defines the model and system instructions.

  • Configuration: judge object containing:
    • model: The LLM model instance for evaluation
    • instructions: System prompt that guides the LLM’s behavior
src/mastra/scorers/tone-scorer.ts
import { openai } from "@ai-sdk/openai";
import { createLLMScorer } from "@mastra/core/scores";

export const toneScorer = createLLMScorer({
  name: 'Tone Scorer',
  description: 'Evaluates the tone of the output',
  // Shared judge configuration
  judge: {
    model: openai('gpt-4o'),
    instructions: 'You are an expert in analyzing text tone and communication style.',
  },
  // ... other steps
});

Extract Step

This optional step uses an LLM to preprocess input/output data when you need to evaluate multiple distinct elements, filter content, or focus analysis on specific parts of complex data.

  • Configuration: { description, outputSchema, createPrompt }
  • Data flow: The structured output (defined by outputSchema) is passed to the analyze step as extractStepResult
src/mastra/scorers/content-scorer.ts
export const contentScorer = createLLMScorer({
  // ... judge configuration
  extract: {
    description: 'Extract key themes and topics from the content',
    outputSchema: z.object({
      themes: z.array(z.string()),
      topics: z.array(z.string()),
      keyPhrases: z.array(z.string())
    }),
    createPrompt: ({ run }) => `
      Analyze this content and extract:
      1. Main themes (3-5 high-level concepts)
      2. Specific topics mentioned
      3. Key phrases that capture the essence

      Content: ${run.output.text}

      Return a JSON object with themes, topics, and keyPhrases arrays.
    `,
  },
  // ... other steps
});

Analyze Step

This required step uses an LLM to perform the core evaluation and return structured analysis that will be converted to a numerical score.

  • Configuration: { description, outputSchema, createPrompt }
  • Data flow: The structured output is passed to the calculateScore function and then to the reason step
src/mastra/scorers/quality-scorer.ts
export const qualityScorer = createLLMScorer({
  // ... judge configuration
  analyze: {
    description: 'Evaluate content quality across multiple dimensions',
    outputSchema: z.object({
      clarity: z.number().min(1).max(5),
      accuracy: z.number().min(1).max(5),
      completeness: z.number().min(1).max(5),
      relevance: z.number().min(1).max(5)
    }),
    createPrompt: ({ run }) => `
      Evaluate this content on a scale of 1-5 for:
      - Clarity: How clear and understandable is it?
      - Accuracy: How factually correct does it appear?
      - Completeness: How thorough is the response?
      - Relevance: How well does it address the input?

      Input: ${run.input.map(i => i.content).join(', ')}
      Output: ${run.output.text}

      Return a JSON object with numeric scores for each dimension.
    `,
  },
  // ... other steps
});

Calculate Score Step

This required function converts the LLM’s structured analysis into a numerical score, providing deterministic scoring logic since LLMs aren’t reliable for consistent numerical outputs.

  • Configuration: calculateScore function that receives { run } and returns a number
  • Data flow: Converts the analyze step’s structured output into a numerical score (0-1 range)
src/mastra/scorers/quality-scorer.ts
export const qualityScorer = createLLMScorer({
  // ... previous steps
  calculateScore: ({ run }) => {
    const { clarity, accuracy, completeness, relevance } = run.analyzeStepResult;

    // Calculate weighted average (scale of 1-5 to 0-1)
    const weights = { clarity: 0.3, accuracy: 0.3, completeness: 0.2, relevance: 0.2 };
    const weightedSum =
      (clarity * weights.clarity) +
      (accuracy * weights.accuracy) +
      (completeness * weights.completeness) +
      (relevance * weights.relevance);

    // Convert from 1-5 scale to 0-1 scale
    return (weightedSum - 1) / 4;
  },
  // ... other steps
});
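For example, with hypothetical ratings of clarity 4, accuracy 5, completeness 3, and relevance 4, the weighted sum is (4 × 0.3) + (5 × 0.3) + (3 × 0.2) + (4 × 0.2) = 4.1, which the final conversion maps to (4.1 - 1) / 4 = 0.775 on the 0-1 scale.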

Reason Step

This optional step uses an LLM to generate human-readable explanations for scores, useful for actionable feedback, debugging transparency, or compliance documentation.

  • Configuration: { description, createPrompt }
  • Data flow: Receives all previous step results and the score, and returns a string explanation
src/mastra/scorers/quality-scorer.ts
export const qualityScorer = createLLMScorer({
  // ... previous steps
  reason: {
    createPrompt: ({ run }) => {
      const { clarity, accuracy, completeness, relevance } = run.analyzeStepResult;
      const percentage = Math.round(run.score * 100);

      return `
        The content received a ${percentage}% quality score based on:
        - Clarity: ${clarity}/5
        - Accuracy: ${accuracy}/5
        - Completeness: ${completeness}/5
        - Relevance: ${relevance}/5

        Provide a brief explanation of what contributed to this score.
      `;
    },
  },
});
