createScorer

Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.

How to Create a Custom Scorer

Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.

```typescript
const scorer = createScorer({
  name: "My Custom Scorer",
  description: "Evaluates responses based on custom criteria",
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({ /* step config */ })
  .analyze({ /* step config */ })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({ /* step config */ });
```

createScorer Options

  • name (string): Name of the scorer.
  • description (string): Description of what the scorer does.
  • judge (object, optional): Judge configuration for LLM-based steps. See the Judge Object section below.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.

Judge Object

  • model (LanguageModel): The LLM model instance to use for evaluation.
  • instructions (string): System prompt/instructions for the LLM.

Type Safety

For better type inference and IntelliSense support, you can specify input/output types when creating scorers:

```typescript
import {
  createScorer,
  ScorerRunInputForAgent,
  ScorerRunOutputForAgent,
} from "@mastra/core";

// For agent evaluation with full type safety
const agentScorer = createScorer<ScorerRunInputForAgent, ScorerRunOutputForAgent>({
  name: "Agent Response Quality",
  description: "Evaluates agent responses",
})
  .preprocess(({ run }) => {
    // run.input is typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return response.length > 10 ? 1.0 : 0.5;
  });

// For custom input/output types
type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  name: "Custom Scorer",
  description: "Evaluates custom data",
})
  .generateScore(({ run }) => run.output.confidence);
```

Built-in Agent Types

  • ScorerRunInputForAgent - Contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
  • ScorerRunOutputForAgent - Array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.

Step Method Signatures

preprocess

Optional preprocessing step that can extract or transform data before analysis.

Function Mode: `({ run, results }) => any`

  • run.input (any): Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.
  • run.output (any): Output record provided to the scorer. For agents, this is usually the agent's response; for workflows, it is the workflow's output.
  • run.runId (string): Unique identifier for this scoring run.
  • run.runtimeContext (object, optional): Runtime context from the agent or workflow step being evaluated.
  • results (object): Empty object, since no previous steps have run.

Returns: any. The returned value is available to subsequent steps as `preprocessStepResult`.
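As a hedged sketch, a function-mode preprocess step might pull out and normalize the latest user message. The `Run` shape below is simplified for illustration and is not the library's actual type:

```typescript
// Illustrative, simplified run shape; the real scorer passes richer records.
type Message = { role: string; content: string };
type Run = { input: Message[]; output: Message };

// Function-mode preprocess: extract and normalize the latest user message
// so later steps can work with a single clean string.
function preprocess({ run }: { run: Run }) {
  const lastUser = [...run.input].reverse().find((m) => m.role === "user");
  return { userMessage: lastUser?.content.trim().toLowerCase() ?? "" };
}
```

The returned object would reach later steps as `preprocessStepResult`.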

Prompt Object Mode:

  • description (string): Description of what this preprocessing step does.
  • outputSchema (ZodSchema): Zod schema for the expected output of the preprocess step.
  • createPrompt (function): `({ run, results }) => string`. Returns the prompt for the LLM.
  • judge (object, optional): LLM judge for this step; overrides the main judge. See the Judge Object section.
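A hedged sketch of what a `createPrompt` function could look like; the run shape is simplified for illustration, and the prompt wording is hypothetical:

```typescript
// Simplified run shape for illustration only.
type Run = { input: { content: string }[]; output: { content: string } };

// Builds the judge prompt from the run; the judge's reply is expected to
// match the step's outputSchema.
function createPrompt({ run }: { run: Run }): string {
  return [
    "Extract the key claims made in the response below.",
    `User input: ${run.input[0]?.content ?? ""}`,
    `Response: ${run.output.content}`,
    "Reply with JSON matching the requested schema.",
  ].join("\n");
}
```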

analyze

Optional analysis step that processes the input/output and any preprocessed data.

Function Mode: `({ run, results }) => any`

  • run.input (any): Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.
  • run.output (any): Output record provided to the scorer. For agents, this is usually the agent's response; for workflows, it is the workflow's output.
  • run.runId (string): Unique identifier for this scoring run.
  • run.runtimeContext (object, optional): Runtime context from the agent or workflow step being evaluated.
  • results.preprocessStepResult (any, optional): Result from the preprocess step, if defined.

Returns: any. The returned value is available to subsequent steps as `analyzeStepResult`.
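For instance, a function-mode analyze step could derive simple features of the response for generateScore to combine. The shapes below are simplified and illustrative:

```typescript
// Simplified shapes for illustration only.
type Run = { output: { content: string } };
type Results = { preprocessStepResult?: { userMessage: string } };

// Function-mode analyze: derive features of the response that a later
// generateScore step can turn into a number.
function analyze({ run, results }: { run: Run; results: Results }) {
  const text = run.output.content;
  return {
    wordCount: text.split(/\s+/).filter(Boolean).length,
    echoesQuestion: results.preprocessStepResult
      ? text.toLowerCase().includes(results.preprocessStepResult.userMessage)
      : false,
  };
}
```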

Prompt Object Mode:

  • description (string): Description of what this analysis step does.
  • outputSchema (ZodSchema): Zod schema for the expected output of the analyze step.
  • createPrompt (function): `({ run, results }) => string`. Returns the prompt for the LLM.
  • judge (object, optional): LLM judge for this step; overrides the main judge. See the Judge Object section.

generateScore

Required step that computes the final numerical score.

Function Mode: `({ run, results }) => number`

  • run.input (any): Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.
  • run.output (any): Output record provided to the scorer. For agents, this is usually the agent's response; for workflows, it is the workflow's output.
  • run.runId (string): Unique identifier for this scoring run.
  • run.runtimeContext (object, optional): Runtime context from the agent or workflow step being evaluated.
  • results.preprocessStepResult (any, optional): Result from the preprocess step, if defined.
  • results.analyzeStepResult (any, optional): Result from the analyze step, if defined.

Returns: number. The method must return a numerical score.
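A hedged sketch of a function-mode generateScore: the fraction of expected keywords found in the response. The shapes and the `keywords` field are illustrative, not part of the library's API:

```typescript
// Simplified shapes for illustration only.
type Run = { output: { content: string } };
type Results = { preprocessStepResult?: { keywords: string[] } };

// Function-mode generateScore: fraction of expected keywords that appear
// in the response, always in the range [0, 1].
function generateScore({ run, results }: { run: Run; results: Results }): number {
  const keywords = results.preprocessStepResult?.keywords ?? [];
  if (keywords.length === 0) return 0;
  const text = run.output.content.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return hits / keywords.length;
}
```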

Prompt Object Mode:

  • description (string): Description of what this scoring step does.
  • outputSchema (ZodSchema): Zod schema for the expected output of the generateScore step.
  • createPrompt (function): `({ run, results }) => string`. Returns the prompt for the LLM.
  • judge (object, optional): LLM judge for this step; overrides the main judge. See the Judge Object section.

When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:

  • calculateScore (function): `({ run, results, analyzeStepResult }) => number`. Converts the LLM's structured output into a numerical score.
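As an illustrative sketch, suppose the judge's outputSchema yields per-criterion pass/fail checks (the `checks` shape below is hypothetical); a calculateScore can then reduce that structured output to a 0..1 score:

```typescript
// Hypothetical structured judge output, as if produced against an
// outputSchema of per-criterion pass/fail checks.
type AnalyzeStepResult = { checks: { name: string; passed: boolean }[] };

// calculateScore: reduce the judge's structured output to a number in [0, 1].
function calculateScore({ analyzeStepResult }: { analyzeStepResult: AnalyzeStepResult }): number {
  const { checks } = analyzeStepResult;
  if (checks.length === 0) return 0;
  return checks.filter((c) => c.passed).length / checks.length;
}
```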

generateReason

Optional step that provides an explanation for the score.

Function Mode: `({ run, results, score }) => string`

  • run.input (any): Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.
  • run.output (any): Output record provided to the scorer. For agents, this is usually the agent's response; for workflows, it is the workflow's output.
  • run.runId (string): Unique identifier for this scoring run.
  • run.runtimeContext (object, optional): Runtime context from the agent or workflow step being evaluated.
  • results.preprocessStepResult (any, optional): Result from the preprocess step, if defined.
  • results.analyzeStepResult (any, optional): Result from the analyze step, if defined.
  • score (number): Score computed by the generateScore step.

Returns: string. The method must return a string explaining the score.
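A hedged sketch of a function-mode generateReason, turning the score and an illustrative analysis result into a human-readable explanation (the `missing` field is hypothetical):

```typescript
// Simplified shapes for illustration only.
type Results = { analyzeStepResult?: { missing: string[] } };

// Function-mode generateReason: explain the score using the analysis.
function generateReason({ score, results }: { score: number; results: Results }): string {
  const missing = results.analyzeStepResult?.missing ?? [];
  if (missing.length === 0) return `Score ${score}: all criteria were satisfied.`;
  return `Score ${score}: missing ${missing.join(", ")}.`;
}
```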

Prompt Object Mode:

  • description (string): Description of what this reasoning step does.
  • createPrompt (function): `({ run, results, score }) => string`. Returns the prompt for the LLM.
  • judge (object, optional): LLM judge for this step; overrides the main judge. See the Judge Object section.

All step functions can be async.
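For example, a generateScore step can await asynchronous work before returning its number. The run shape is simplified, and the awaited call below is a stand-in for any real async check (an API call, a database lookup, etc.):

```typescript
// Simplified run shape for illustration only.
type Run = { output: { content: string } };

// Async function-mode generateScore: await an external check, then score.
async function generateScore({ run }: { run: Run }): Promise<number> {
  // Promise.resolve stands in for a real async call such as fetch().
  const passed = await Promise.resolve(run.output.content.trim().length > 0);
  return passed ? 1 : 0;
}
```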