
createScorer

Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.

How to Create a Custom Scorer

Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.

Prompt-object steps are step configurations expressed as objects with a description and a createPrompt function (plus an outputSchema for preprocess and analyze). These steps invoke the judge LLM. Function steps are plain functions and never call the judge.

import { createScorer } from "@mastra/core/evals";

const scorer = createScorer({
  id: "my-custom-scorer",
  name: "My Custom Scorer", // Optional, defaults to id
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({
    /* step config */
  })
  .analyze({
    /* step config */
  })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({
    /* step config */
  });
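
For example, a minimal complete scorer might mix the two styles: a prompt-object analyze step that asks the judge for structured output, and a function generateScore that turns that output into a number. This is a sketch; the criteria, schema, and myModel are illustrative:

import { z } from "zod";
import { createScorer } from "@mastra/core/evals";

const toneScorer = createScorer({
  id: "tone-check",
  description: "Checks whether the response is polite",
  judge: {
    model: myModel, // any LanguageModel instance
    instructions: "You are an expert evaluator of tone.",
  },
})
  .analyze({
    description: "Judge the politeness of the response",
    outputSchema: z.object({ polite: z.boolean() }),
    createPrompt: ({ run }) =>
      `Is the following response polite? Answer as JSON with a boolean "polite" field.\n\n${JSON.stringify(run.output)}`,
  })
  .generateScore(({ results }) => (results.analyzeStepResult?.polite ? 1 : 0));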

createScorer Options

id: string
Unique identifier for the scorer. Used as the name if `name` is not provided.

name?: string
Name of the scorer. Defaults to `id` if not provided.

description: string
Description of what the scorer does.

judge?: object
Optional judge configuration for LLM-based steps. See the Judge Object section below.

type?: string
Type specification for input/output. Use 'agent' for automatic agent types. For custom types, use the generic approach instead.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.
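
As a rough usage sketch (the exact .run() argument and result shapes are documented in the MastraScorer reference; the input/output pair below follows the shapes described under Step Method Signatures):

const result = await scorer.run({
  input: [{ role: "user", content: "hello world" }],
  output: { text: "Hi there! How can I help?" },
});

console.log(result.score);  // number produced by generateScore
console.log(result.reason); // string produced by generateReason, if defined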

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.

The judge only runs for steps defined as prompt objects (preprocess, analyze, generateScore, generateReason in prompt mode). If you use function steps only, the judge is never called and there is no LLM output to inspect. In that case, any score/reason must be produced by your functions.

When a prompt-object step runs, its structured LLM output is stored in the corresponding result field (preprocessStepResult, analyzeStepResult, or the value consumed by calculateScore in generateScore).
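
A step-level judge overrides the scorer-level judge for that step only. A minimal sketch, assuming zod is imported as z and cheaperModel is a LanguageModel instance:

.analyze({
  description: "Check the answer against the provided context",
  judge: {
    model: cheaperModel, // overrides the scorer-level judge for this step only
    instructions: "You verify answers strictly against the given context.",
  },
  outputSchema: z.object({ grounded: z.boolean() }),
  createPrompt: ({ run }) =>
    `Answer: ${JSON.stringify(run.output)}\nIs the answer grounded in the context? Reply as JSON.`,
})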

Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support:

Agent Type Shortcut

For evaluating agents, use type: 'agent' to automatically get the correct types for agent input/output:

import { createScorer } from "@mastra/core/evals";

// Agent scorer with automatic typing
const agentScorer = createScorer({
  id: "agent-response-quality",
  description: "Evaluates agent responses",
  type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });

Custom Types with Generics

For custom input/output types, use the generic approach:

import { createScorer } from "@mastra/core/evals";

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  id: "custom-scorer",
  description: "Evaluates custom data",
}).generateScore(({ run }) => {
  // run.input is typed as CustomInput
  // run.output is typed as CustomOutput
  return run.output.confidence;
});

Built-in Agent Types

  • ScorerRunInputForAgent - Contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
  • ScorerRunOutputForAgent - Array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
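
If you need to reference these types directly, for example in shared helper functions, you can import them. The import path below is an assumption based on createScorer's path; verify it against your @mastra/core version:

import type { ScorerRunInputForAgent } from "@mastra/core/evals"; // assumed path

// Hypothetical helper that concatenates user message content for scoring.
function extractUserText(input: ScorerRunInputForAgent): string {
  return input.inputMessages.map((message) => message.content).join("\n");
}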

Trace Scoring with Agent Types

When you use type: 'agent', your scorer can both be added directly to agents and be used to score traces from agent interactions. Trace data is automatically transformed into the proper agent input/output format:

import { Mastra } from "@mastra/core";
import { createScorer } from "@mastra/core/evals";

const agentTraceScorer = createScorer({
  id: "agent-trace-length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.input.inputMessages;
  const agentResponse = run.output[0]?.content;

  // Score based on response length
  return (agentResponse?.length ?? 0) > 50 ? 0 : 1;
});

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer,
  },
});

Step Method Signatures

preprocess

Optional preprocessing step that can extract or transform data before analysis.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results: object
Empty object (no previous steps).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as preprocessStepResult.

Prompt Object Mode:

description: string
Description of what this preprocessing step does.

outputSchema: ZodSchema
Zod schema for the expected output of the preprocess step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
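
A function-mode preprocess sketch that extracts the last user message for later steps (the input shape follows the agent example above and is illustrative):

.preprocess(({ run }) => {
  // Assumes the agent-style input: an array of user messages.
  const messages = Array.isArray(run.input) ? run.input : [];
  const lastUserMessage = messages[messages.length - 1]?.content ?? "";
  return { lastUserMessage }; // Exposed to later steps as results.preprocessStepResult
})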

analyze

Optional analysis step that processes the input/output and any preprocessed data.

Function Mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as analyzeStepResult.

Prompt Object Mode:

description: string
Description of what this analysis step does.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
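
A prompt-object analyze sketch, building on the preprocess sketch above (assumes zod is imported as z; the schema and prompt are illustrative):

.analyze({
  description: "Rate how relevant the response is to the question",
  outputSchema: z.object({ relevance: z.number().min(0).max(1) }),
  createPrompt: ({ run, results }) =>
    `Question: ${results.preprocessStepResult?.lastUserMessage}\n` +
    `Response: ${JSON.stringify(run.output)}\n` +
    `Return JSON with a "relevance" value between 0 and 1.`,
})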

generateScore

Required step that computes the final numerical score.

Function Mode: ({ run, results }) => number

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from analyze step, if defined (optional).

Returns: number
The method must return a numerical score.

Prompt Object Mode:

description: string
Description of what this scoring step does.

outputSchema: ZodSchema
Zod schema for the expected output of the generateScore step.

createPrompt: function
({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.

When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:

calculateScore: function
({ run, results, analyzeStepResult }) => number. Converts the LLM's structured output into a numerical score.
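
A hedged sketch of generateScore in prompt-object mode (assumes zod is imported as z; the rating scale is illustrative, and the judge's structured output is read via the calculateScore arguments documented above):

.generateScore({
  description: "Judge overall response quality on a 1-5 scale",
  outputSchema: z.object({ rating: z.number().min(1).max(5) }),
  createPrompt: ({ run }) =>
    `Rate this response from 1 (poor) to 5 (excellent). Return JSON with a "rating" field.\n${JSON.stringify(run.output)}`,
  calculateScore: ({ analyzeStepResult }) => {
    // Normalize the judge's 1-5 rating to a 0-1 score. Per the signature above,
    // analyzeStepResult is assumed to carry the structured output consumed here.
    const rating = analyzeStepResult?.rating ?? 1;
    return (rating - 1) / 4;
  },
})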

generateReason

Optional step that provides an explanation for the score.

Function Mode: ({ run, results, score }) => string

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from analyze step, if defined (optional).

score: number
Score computed by the generateScore step.

Returns: string
The method must return a string explaining the score.

Prompt Object Mode:

description: string
Description of what this reasoning step does.

createPrompt: function
({ run, results, score }) => string. Returns the prompt for the LLM.

judge?: object
Optional LLM judge for this step (can override the main judge). See the Judge Object section.
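
A function-mode generateReason sketch that turns the score and analysis into a short explanation (it assumes an analyze result like the relevance sketch above):

.generateReason(({ results, score }) => {
  const relevance = results.analyzeStepResult?.relevance;
  return `Scored ${score} because the judged relevance was ${relevance ?? "unavailable"}.`;
})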

All step functions can be async.
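
For example, generateScore can await external work (fetchExpectedAnswer below is a hypothetical helper, not part of the API):

.generateScore(async ({ run }) => {
  // Hypothetical async lookup, e.g. fetching a reference answer to compare against.
  const expected = await fetchExpectedAnswer(run.runId);
  return JSON.stringify(run.output) === JSON.stringify(expected) ? 1 : 0;
});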