createScorer

Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.

How to Create a Custom Scorer

Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.

import { createScorer } from "@mastra/core/scorers";

const scorer = createScorer({
  name: "My Custom Scorer",
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({
    /* step config */
  })
  .analyze({
    /* step config */
  })
  .generateScore(({ run, results }) => {
    // Return a numerical score
    return 1;
  })
  .generateReason({
    /* step config */
  });

createScorer Options

name: string
Name of the scorer.

description: string
Description of what the scorer does.

judge: object
Optional judge configuration for LLM-based steps. See Judge Object section below.

type: string
Type specification for input/output. Use 'agent' for automatic agent types. For custom types, use the generic approach instead.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.
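
For example, a minimal invocation of a finished scorer might look like this sketch (the exact record shape and result fields are documented in the MastraScorer reference; the values here are illustrative):

const result = await scorer.run({
  input: [{ role: "user", content: "hello world" }],
  output: { text: "Hello! How can I help you today?" },
});

console.log(result.score);  // number from generateScore
console.log(result.reason); // string from generateReason, if that step is defined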

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.
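
For example, a judge can pair an AI SDK model with evaluator instructions. This sketch assumes the @ai-sdk/openai package and a configured API key:

import { openai } from "@ai-sdk/openai";

const judge = {
  model: openai("gpt-4o-mini"), // any LanguageModel instance works here
  instructions:
    "You are an expert evaluator. Judge each response strictly against the criteria given in the prompt.",
};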

Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support:

Agent Type Shortcut

For evaluating agents, use type: 'agent' to automatically get the correct types for agent input/output:

import { createScorer } from "@mastra/core/scorers";

// Agent scorer with automatic typing
const agentScorer = createScorer({
  name: "Agent Response Quality",
  description: "Evaluates agent responses",
  type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return response && response.length > 10 ? 1.0 : 0.5;
  });

Custom Types with Generics

For custom input/output types, use the generic approach:

import { createScorer } from "@mastra/core/scorers";

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  name: "Custom Scorer",
  description: "Evaluates custom data",
}).generateScore(({ run }) => {
  // run.input is typed as CustomInput
  // run.output is typed as CustomOutput
  return run.output.confidence;
});

Built-in Agent Types

  • ScorerRunInputForAgent - Contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
  • ScorerRunOutputForAgent - Array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.

Trace Scoring with Agent Types

When you use type: 'agent', your scorer can be added directly to agents and can also be used to score traces from agent interactions. The scorer automatically transforms trace data into the proper agent input/output format:

import { createScorer } from "@mastra/core/scorers";
import { Mastra } from "@mastra/core/mastra";

const agentTraceScorer = createScorer({
  name: "Agent Trace Length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.input.inputMessages;
  const agentResponse = run.output[0]?.content;

  // Score 1 when the response is longer than 50 characters
  return agentResponse && agentResponse.length > 50 ? 1 : 0;
});

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer,
  },
});

Step Method Signatures

preprocess

Optional preprocessing step that can extract or transform data before analysis.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.runtimeContext: object
Runtime context from the agent or workflow step being evaluated (optional).

results: object
Empty object (no previous steps).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as preprocessStepResult.
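
As a minimal function-mode sketch, a preprocess step can hand derived data to later steps through results.preprocessStepResult (the `text` field on run.output is assumed for illustration):

const lengthScorer = createScorer({
  name: "Non-empty Response",
  description: "Scores 1 when the response contains at least one word",
})
  .preprocess(({ run }) => {
    // Hypothetical: assumes run.output carries a `text` field.
    const text = String(run.output?.text ?? "");
    return { wordCount: text.split(/\s+/).filter(Boolean).length };
  })
  .generateScore(({ results }) => {
    // The preprocess return value arrives as results.preprocessStepResult.
    return results.preprocessStepResult.wordCount > 0 ? 1 : 0;
  });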

Prompt Object Mode:

description: string
Description of what this preprocessing step does.

outputSchema: ZodSchema
Zod schema for the expected output of the preprocess step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge: object
(Optional) LLM judge for this step (can override main judge). See Judge Object section.
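
For instance, a prompt-object preprocess step could use the judge LLM to extract structured data. This is a sketch: the schema and prompt are illustrative, and zod is assumed as a dependency:

import { z } from "zod";

const preprocessConfig = {
  description: "Extract the factual claims made in the response",
  outputSchema: z.object({
    claims: z.array(z.string()),
  }),
  createPrompt: ({ run }) => `
    Extract every factual claim from the following response.
    Response: ${JSON.stringify(run.output)}
    Return a JSON object with a "claims" array of strings.
  `,
};

// Used in the chain as .preprocess(preprocessConfig)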

analyze

Optional analysis step that processes the input/output and any preprocessed data.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.runtimeContext: object
Runtime context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult: any
Result from preprocess step, if defined (optional).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as analyzeStepResult.

Prompt Object Mode:

description: string
Description of what this analysis step does.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge: object
(Optional) LLM judge for this step (can override main judge). See Judge Object section.
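
An analyze step in prompt-object mode can fold earlier results into its prompt. In this sketch, the claims extracted by the hypothetical preprocess step above are checked against the input:

import { z } from "zod";

const analyzeConfig = {
  description: "Verify each extracted claim against the input",
  outputSchema: z.object({
    verdicts: z.array(
      z.object({ claim: z.string(), supported: z.boolean() }),
    ),
  }),
  createPrompt: ({ run, results }) => `
    Input: ${JSON.stringify(run.input)}
    Claims: ${JSON.stringify(results.preprocessStepResult?.claims ?? [])}
    For each claim, state whether the input supports it.
    Return a JSON object with a "verdicts" array.
  `,
};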

generateScore

Required step that computes the final numerical score.

Function Mode: ({ run, results }) => number

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.runtimeContext: object
Runtime context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult: any
Result from analyze step, if defined (optional).

Returns: number
The method must return a numerical score.

Prompt Object Mode:

description: string
Description of what this scoring step does.

outputSchema: ZodSchema
Zod schema for the expected output of the generateScore step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge: object
(Optional) LLM judge for this step (can override main judge). See Judge Object section.

When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:

calculateScore: function
Function: ({ run, results, analyzeStepResult }) => number. Converts the LLM's structured output into a numerical score.
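
Putting these pieces together, a prompt-object generateScore can ask the LLM for a structured rating and normalize it in calculateScore. This sketch assumes, per the signature above, that the step's structured output arrives as analyzeStepResult:

import { z } from "zod";

const generateScoreConfig = {
  description: "Rate overall response quality",
  outputSchema: z.object({ rating: z.number().min(0).max(10) }),
  createPrompt: ({ run, results }) => `
    Rate the following response from 0 to 10.
    Response: ${JSON.stringify(run.output)}
    Analysis: ${JSON.stringify(results.analyzeStepResult)}
    Return a JSON object with a "rating" number.
  `,
  // Assumption: analyzeStepResult carries this step's structured LLM output.
  calculateScore: ({ analyzeStepResult }) => analyzeStepResult.rating / 10,
};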

generateReason

Optional step that provides an explanation for the score.

Function Mode: ({ run, results, score }) => string

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.runtimeContext: object
Runtime context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult: any
Result from analyze step, if defined (optional).

score: number
Score computed by the generateScore step.

Returns: string
The method must return a string explaining the score.
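
A function-mode generateReason is often just string templating over the score and earlier results, as in this sketch:

const reasonedScorer = createScorer({
  name: "Reasoned Example",
  description: "Explains its own score",
})
  .generateScore(({ run }) => (run.output ? 1 : 0))
  .generateReason(({ score }) => {
    return `Returned ${score} because the output was ${score === 1 ? "present" : "missing"}.`;
  });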

Prompt Object Mode:

description: string
Description of what this reasoning step does.

createPrompt: function
Function: ({ run, results, score }) => string. Returns the prompt for the LLM.

judge: object
(Optional) LLM judge for this step (can override main judge). See Judge Object section.
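
As a sketch, a prompt-object generateReason can hand the score and analysis back to the LLM for a natural-language explanation:

const generateReasonConfig = {
  description: "Explain the score in plain language",
  createPrompt: ({ run, results, score }) => `
    The response ${JSON.stringify(run.output)} received a score of ${score}.
    Analysis: ${JSON.stringify(results.analyzeStepResult)}
    Explain in one or two sentences why this score was given.
  `,
};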

All step functions can be async.
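
For example, a generateScore step can await external work before returning; the awaited check below is a stand-in for any real async call:

const asyncScorer = createScorer({
  name: "Async Example",
  description: "Demonstrates an async step function",
}).generateScore(async ({ run }) => {
  // Stand-in for a real async operation (API call, database lookup, etc.).
  const ok = await Promise.resolve(Boolean(run.output));
  return ok ? 1 : 0;
});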