
createScorer

Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.

How to Create a Custom Scorer

Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.

Prompt-object steps are step configurations expressed as objects with a description and a createPrompt function (plus an outputSchema for preprocess and analyze). These steps invoke the judge LLM. Function steps are plain functions and never call the judge.

import { createScorer } from "@mastra/core/evals";

const scorer = createScorer({
  id: "my-custom-scorer",
  name: "My Custom Scorer", // Optional, defaults to id
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({
    /* step config */
  })
  .analyze({
    /* step config */
  })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({
    /* step config */
  });
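
For example, a minimal complete scorer might mix the two styles: a prompt-object analyze step that asks the judge for structured output, and a function generateScore that turns that output into a number. This is a sketch; the criteria, schema, and myModel are illustrative:

import { z } from "zod";
import { createScorer } from "@mastra/core/evals";

const toneScorer = createScorer({
  id: "tone-check",
  description: "Checks whether the response is polite",
  judge: {
    model: myModel, // any LanguageModel instance
    instructions: "You are an expert evaluator of tone.",
  },
})
  .analyze({
    description: "Judge the politeness of the response",
    outputSchema: z.object({ polite: z.boolean() }),
    createPrompt: ({ run }) =>
      `Is the following response polite? Answer as JSON with a boolean "polite" field.\n\n${JSON.stringify(run.output)}`,
  })
  .generateScore(({ results }) => (results.analyzeStepResult?.polite ? 1 : 0));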

createScorer Options

id: string
Unique identifier for the scorer. Used as the name if `name` is not provided.

name?: string
Name of the scorer. Defaults to `id` if not provided.

description: string
Description of what the scorer does.

judge?: object
Optional judge configuration for LLM-based steps. See the Judge Object section below.

type?: string
Type specification for input/output. Use 'agent' for automatic agent types. For custom types, use the generic approach instead.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.
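
As a rough usage sketch (the exact .run() argument and result shapes are documented in the MastraScorer reference; the input/output pair below follows the shapes described under Step Method Signatures):

const result = await scorer.run({
  input: [{ role: "user", content: "hello world" }],
  output: { text: "Hi there! How can I help?" },
});

console.log(result.score);  // number produced by generateScore
console.log(result.reason); // string produced by generateReason, if defined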

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.

The judge only runs for steps defined as prompt objects (preprocess, analyze, generateScore, generateReason in prompt mode). If you use function steps only, the judge is never called and there is no LLM output to inspect. In that case, any score/reason must be produced by your functions.

When a prompt-object step runs, its structured LLM output is stored in the corresponding result field (preprocessStepResult, analyzeStepResult, or the value consumed by calculateScore in generateScore).
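
A step-level judge overrides the scorer-level judge for that step only. A minimal sketch, assuming zod is imported as z and cheaperModel is a LanguageModel instance:

.analyze({
  description: "Check the answer against the provided context",
  judge: {
    model: cheaperModel, // overrides the scorer-level judge for this step only
    instructions: "You verify answers strictly against the given context.",
  },
  outputSchema: z.object({ grounded: z.boolean() }),
  createPrompt: ({ run }) =>
    `Answer: ${JSON.stringify(run.output)}\nIs the answer grounded in the context? Reply as JSON.`,
})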

Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support:

Agent Type Shortcut

For evaluating agents, use type: 'agent' to automatically get the correct types for agent input/output:

import { createScorer } from "@mastra/core/evals";

// Agent scorer with automatic typing
const agentScorer = createScorer({
  id: "agent-response-quality",
  description: "Evaluates agent responses",
  type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });

Custom Types with Generics

For custom input/output types, use the generic approach:

import { createScorer } from "@mastra/core/evals";

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  id: "custom-scorer",
  description: "Evaluates custom data",
}).generateScore(({ run }) => {
  // run.input is typed as CustomInput
  // run.output is typed as CustomOutput
  return run.output.confidence;
});

Built-in Agent Types

  • ScorerRunInputForAgent - Contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
  • ScorerRunOutputForAgent - Array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
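
If you need to reference these types directly, for example in shared helper functions, you can import them. The import path below is an assumption based on createScorer's path; verify it against your @mastra/core version:

import type { ScorerRunInputForAgent } from "@mastra/core/evals"; // assumed path

// Hypothetical helper that concatenates user message content for scoring.
function extractUserText(input: ScorerRunInputForAgent): string {
  return input.inputMessages.map((message) => message.content).join("\n");
}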

Trace Scoring with Agent Types

When you use type: 'agent', your scorer can both be added directly to agents and be used to score traces from agent interactions. Trace data is automatically transformed into the proper agent input/output format:

import { Mastra } from "@mastra/core";
import { createScorer } from "@mastra/core/evals";

const agentTraceScorer = createScorer({
  id: "agent-trace-length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.input.inputMessages;
  const agentResponse = run.output[0]?.content;

  // Score based on response length
  return (agentResponse?.length ?? 0) > 50 ? 0 : 1;
});

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer,
  },
});

Step Method Signatures

preprocess

Optional preprocessing step that can extract or transform data before analysis.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results: object
Empty object (no previous steps).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as preprocessStepResult.

Prompt Object Mode:

description: string
Description of what this preprocessing step does.

outputSchema: ZodSchema
Zod schema for the expected output of the preprocess step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
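
A function-mode preprocess sketch that extracts the last user message for later steps (the input shape follows the agent example above and is illustrative):

.preprocess(({ run }) => {
  // Assumes the agent-style input: an array of user messages.
  const messages = Array.isArray(run.input) ? run.input : [];
  const lastUserMessage = messages[messages.length - 1]?.content ?? "";
  return { lastUserMessage }; // Exposed to later steps as results.preprocessStepResult
})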

analyze

Optional analysis step that processes the input/output and any preprocessed data.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as analyzeStepResult.

Prompt Object Mode:

description: string
Description of what this analysis step does.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
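
A prompt-object analyze sketch, building on the preprocess sketch above (assumes zod is imported as z; the schema and prompt are illustrative):

.analyze({
  description: "Rate how relevant the response is to the question",
  outputSchema: z.object({ relevance: z.number().min(0).max(1) }),
  createPrompt: ({ run, results }) =>
    `Question: ${results.preprocessStepResult?.lastUserMessage}\n` +
    `Response: ${JSON.stringify(run.output)}\n` +
    `Return JSON with a "relevance" value between 0 and 1.`,
})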

generateScore

Required step that computes the final numerical score.

Function Mode: ({ run, results }) => number

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from analyze step, if defined (optional).

Returns: number
The method must return a numerical score.

Prompt Object Mode:

description: string
Description of what this scoring step does.

outputSchema: ZodSchema
Zod schema for the expected output of the generateScore step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.

When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:

calculateScore: function
({ run, results, analyzeStepResult }) => number. Converts the LLM's structured output into a numerical score.
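
A hedged sketch of generateScore in prompt-object mode (assumes zod is imported as z; the rating scale is illustrative, and the judge's structured output is read via the calculateScore arguments documented above):

.generateScore({
  description: "Judge overall response quality on a 1-5 scale",
  outputSchema: z.object({ rating: z.number().min(1).max(5) }),
  createPrompt: ({ run }) =>
    `Rate this response from 1 (poor) to 5 (excellent). Return JSON with a "rating" field.\n${JSON.stringify(run.output)}`,
  calculateScore: ({ analyzeStepResult }) => {
    // Normalize the judge's 1-5 rating to a 0-1 score. Per the signature above,
    // analyzeStepResult is assumed to carry the structured output consumed here.
    const rating = analyzeStepResult?.rating ?? 1;
    return (rating - 1) / 4;
  },
})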

generateReason

Optional step that provides an explanation for the score.

Function Mode: ({ run, results, score }) => string

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from analyze step, if defined (optional).

score: number
Score computed by the generateScore step.

Returns: string
The method must return a string explaining the score.

Prompt Object Mode:

description: string
Description of what this reasoning step does.

createPrompt: function
({ run, results, score }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
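
A function-mode generateReason sketch that turns the score and analysis into a short explanation (it assumes an analyze result like the relevance sketch above):

.generateReason(({ results, score }) => {
  const relevance = results.analyzeStepResult?.relevance;
  return `Scored ${score} because the judged relevance was ${relevance ?? "unavailable"}.`;
})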

All step functions can be async.
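
For example, generateScore can await external work (fetchExpectedAnswer below is a hypothetical helper, not part of the API):

.generateScore(async ({ run }) => {
  // Hypothetical async lookup, e.g. fetching a reference answer to compare against.
  const expected = await fetchExpectedAnswer(run.runId);
  return JSON.stringify(run.output) === JSON.stringify(expected) ? 1 : 0;
});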