# createScorer

Mastra provides a unified `createScorer` factory that lets you define custom scorers for evaluating input/output pairs. Each evaluation step can use either a native JavaScript function or an LLM-based prompt object. Custom scorers can be added to agents and workflow steps.
## How to Create a Custom Scorer

Use the `createScorer` factory to define your scorer with a name, description, and optional judge configuration, then chain step methods to build your evaluation pipeline. At minimum, you must provide a `generateScore` step.
```typescript
import { createScorer } from "@mastra/core/scorers";

const scorer = createScorer({
  name: "My Custom Scorer",
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel, // your LLM model instance
    instructions: "You are an expert evaluator..."
  }
})
  .preprocess({ /* step config */ })
  .analyze({ /* step config */ })
  .generateScore(({ run, results }) => {
    // Return a numerical score
    return 1.0;
  })
  .generateReason({ /* step config */ });
```
## createScorer Options

- `name` (required): Name of the scorer.
- `description` (required): Description of what the scorer evaluates.
- `judge` (optional): Default LLM judge configuration, used by any step defined as a prompt object (see Judge Object below).
- `type` (optional): Set to `'agent'` for agent evaluation with automatic input/output typing.
This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the `.run()` method and its input/output.
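As a quick orientation, here is a hedged sketch of calling `.run()` directly; the `input`/`output` payload fields below are assumptions chosen to mirror the `run.input`/`run.output` values the step callbacks receive, and the exact shapes are defined in the MastraScorer reference:

```typescript
// Minimal sketch: invoke the scorer on one input/output pair.
const result = await scorer.run({
  input: { question: "What is Mastra?" },       // assumed payload field
  output: { answer: "A TypeScript framework." } // assumed payload field
});

console.log(result.score);  // number from generateScore
console.log(result.reason); // string from generateReason, when defined
```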
### Judge Object

- `model`: The LLM used to run prompt-object steps.
- `instructions`: System instructions for the judge.
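For example, a judge configured with the AI SDK's OpenAI provider might look like this (the model choice and instructions are illustrative, not required):

```typescript
import { openai } from "@ai-sdk/openai";

const judge = {
  // Any AI SDK-compatible model should work here; gpt-4o-mini is illustrative.
  model: openai("gpt-4o-mini"),
  instructions:
    "You are an expert evaluator. Judge each response strictly on factual accuracy."
};
```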
## Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support.

### Agent Type Shortcut

For evaluating agents, use `type: 'agent'` to automatically get the correct types for agent input/output:
```typescript
import { createScorer } from '@mastra/core/scorers';

// Agent scorer with automatic typing
const agentScorer = createScorer({
  name: 'Agent Response Quality',
  description: 'Evaluates agent responses',
  type: 'agent' // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });
```
### Custom Types with Generics

For custom input/output types, use the generic approach:
```typescript
import { createScorer } from '@mastra/core/scorers';

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  name: 'Custom Scorer',
  description: 'Evaluates custom data'
})
  .generateScore(({ run }) => {
    // run.input is typed as CustomInput
    // run.output is typed as CustomOutput
    return run.output.confidence;
  });
```
### Built-in Agent Types

- `ScorerRunInputForAgent`: Contains `inputMessages`, `rememberedMessages`, `systemMessages`, and `taggedSystemMessages` for agent evaluation.
- `ScorerRunOutputForAgent`: An array of agent response messages.
Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
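A short sketch of using these types directly; the import path is an assumption that mirrors the `createScorer` import above, and the helper is invented for illustration:

```typescript
import type {
  ScorerRunInputForAgent,
  ScorerRunOutputForAgent
} from '@mastra/core/scorers';

// Hypothetical helper: pull the first user message out of the typed input.
function firstUserMessage(input: ScorerRunInputForAgent): string | undefined {
  return input.inputMessages[0]?.content;
}
```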
### Trace Scoring with Agent Types

When you use `type: 'agent'`, your scorer can both be added directly to agents and be used to score traces from agent interactions. The scorer automatically transforms trace data into the proper agent input/output format:
```typescript
import { Mastra } from '@mastra/core';
import { createScorer } from '@mastra/core/scorers';

const agentTraceScorer = createScorer({
  name: 'Agent Trace Length',
  description: 'Evaluates agent response length',
  type: 'agent'
})
  .generateScore(({ run }) => {
    // Trace data is automatically transformed to agent format
    const userMessages = run.input.inputMessages;
    const agentResponse = run.output[0]?.content;

    // Score based on response length
    return (agentResponse?.length ?? 0) > 50 ? 1 : 0;
  });

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer
  }
});
```
## Step Method Signatures

### preprocess

Optional preprocessing step that can extract or transform data before analysis.

**Function Mode:**

Function: `({ run, results }) => any`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results`: Results from earlier steps (empty here, since preprocess runs first).

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `results.preprocessStepResult`.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
### analyze

Optional analysis step that processes the input/output and any preprocessed data.

**Function Mode:**

Function: `({ run, results }) => any`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `results.analyzeStepResult`.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
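A hedged sketch of an analyze step in prompt object mode, assuming `outputSchema` is a Zod schema and that `createPrompt` receives the same `{ run, results }` context as function-mode steps; the scorer name and prompt are invented for the example:

```typescript
import { z } from 'zod';

const factualityScorer = createScorer({
  name: 'Factuality',
  description: 'Checks responses for factual claims',
  judge: { model: myModel, instructions: 'You are a careful fact-checker.' } // myModel: your model instance
})
  .analyze({
    description: 'Extract and assess factual claims in the response',
    outputSchema: z.object({
      claims: z.array(z.string()),
      accurateCount: z.number()
    }),
    createPrompt: ({ run }) =>
      `List the factual claims in this response and count how many are accurate:\n${JSON.stringify(run.output)}`
  })
  .generateScore(({ results }) => {
    // analyzeStepResult is typed by outputSchema above
    const { claims, accurateCount } = results.analyzeStepResult;
    return claims.length === 0 ? 1 : accurateCount / claims.length;
  });
```

The same prompt object shape applies to `preprocess`.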
### generateScore

Required step that computes the final numerical score.

**Function Mode:**

Function: `({ run, results }) => number`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.
- `results.analyzeStepResult`: The value returned by the analyze step, if one was defined.

Returns: `number`. The method must return a numerical score.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.

When using prompt object mode, you must also provide a `calculateScore` function to convert the LLM output to a numerical score:

- `calculateScore`: Function that maps the judge's structured output to a number.
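A minimal sketch of generateScore in prompt object mode; the exact argument shape passed to `calculateScore` is defined by the MastraScorer reference, so the destructured `results` access below is an assumption:

```typescript
import { z } from 'zod';

const qualityScorer = createScorer({
  name: 'Response Quality',
  description: 'Rates overall response quality',
  judge: { model: myModel, instructions: 'You are an expert evaluator.' } // myModel: your model instance
})
  .generateScore({
    description: 'Ask the judge for a 0-10 quality rating',
    outputSchema: z.object({ rating: z.number().min(0).max(10) }),
    createPrompt: ({ run }) =>
      `Rate the quality of this response from 0 to 10:\n${JSON.stringify(run.output)}`,
    // Assumed: the judge's parsed output arrives on `results`; check the
    // MastraScorer reference for the exact argument shape.
    calculateScore: ({ results }) => results.rating / 10
  });
```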
### generateReason

Optional step that provides an explanation for the score.

**Function Mode:**

Function: `({ run, results, score }) => string`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.
- `results.analyzeStepResult`: The value returned by the analyze step, if one was defined.
- `score`: The numerical score produced by generateScore.

Returns: `string`. The method must return a string explaining the score.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
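For illustration, a hedged end-to-end sketch of a function-mode generateReason chained onto a simple agent scorer (the scorer name and threshold are invented for the example):

```typescript
import { createScorer } from '@mastra/core/scorers';

const lengthScorer = createScorer({
  name: 'Length Check',
  description: 'Scores responses by length and explains the result',
  type: 'agent'
})
  .generateScore(({ run }) => ((run.output[0]?.content?.length ?? 0) > 50 ? 1 : 0))
  .generateReason(async ({ run, score }) => {
    const length = run.output[0]?.content?.length ?? 0;
    return `Scored ${score}: the response was ${length} characters long.`;
  });
```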
All step functions can be async.