createScorer
Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.
How to Create a Custom Scorer
Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.
Prompt object steps are step configurations expressed as objects with a description and a createPrompt function (plus an outputSchema for preprocess and analyze). These steps invoke the judge LLM. Function steps are plain functions and never call the judge.
import { createScorer } from "@mastra/core/evals";
const scorer = createScorer({
id: "my-custom-scorer",
name: "My Custom Scorer", // Optional, defaults to id
description: "Evaluates responses based on custom criteria",
type: "agent", // Optional: for agent evaluation with automatic typing
judge: {
model: myModel,
instructions: "You are an expert evaluator...",
},
})
.preprocess({
/* step config */
})
.analyze({
/* step config */
})
  .generateScore(({ run, results }) => {
    // Compute and return a numerical score
    return 1;
  })
.generateReason({
/* step config */
});
createScorer Options
id: Unique identifier for the scorer.
name?: Human-readable name. Optional; defaults to id.
description: Description of what the scorer evaluates.
judge?: Optional judge configuration used by prompt-object steps (see Judge Object below).
type?: Optional scorer type. Set to "agent" for automatic agent input/output typing.
This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.
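For example, a built scorer might be invoked like this. This is a rough sketch: the exact .run() input shape is defined in the MastraScorer reference, and the input and output fields below are illustrative assumptions:

const result = await scorer.run({
  input: { question: "What is Mastra?" }, // assumed shape; see the MastraScorer reference
  output: { answer: "A TypeScript agent framework." }, // assumed shape
});
console.log(result.score); // number produced by generateScore
console.log(result.reason); // explanation, if a generateReason step was defined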
Judge Object
model: The language model used to run prompt-object steps.
instructions: System instructions that frame how the judge model evaluates.
The judge only runs for steps defined as prompt objects (preprocess, analyze, generateScore, generateReason in prompt mode). If you use function steps only, the judge is never called and there is no LLM output to inspect. In that case, any score/reason must be produced by your functions.
When a prompt-object step runs, its structured LLM output is stored in the corresponding result field (preprocessStepResult, analyzeStepResult, or the value consumed by calculateScore in generateScore).
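To make the two modes concrete, here is a minimal sketch contrasting a function step with a prompt-object step. The zod outputSchema and the myModel judge model are illustrative assumptions:

import { z } from "zod";
import { createScorer } from "@mastra/core/evals";

// Function steps only: the judge is never called
const functionScorer = createScorer({
  id: "length-check",
  description: "Scores responses by length",
})
  .analyze(({ run }) => ({ length: JSON.stringify(run.output).length }))
  .generateScore(({ results }) => (results.analyzeStepResult.length > 20 ? 1 : 0));

// Prompt-object analyze step: the judge LLM runs with the generated prompt,
// and its structured output is stored as results.analyzeStepResult
const judgeScorer = createScorer({
  id: "relevance-check",
  description: "Scores responses by relevance",
  judge: { model: myModel, instructions: "You are an expert evaluator." }, // myModel is a placeholder
})
  .analyze({
    description: "Rate how relevant the response is to the input",
    outputSchema: z.object({ relevance: z.number().min(0).max(1) }),
    createPrompt: ({ run }) => `Rate the relevance (0-1) of: ${JSON.stringify(run.output)}`,
  })
  .generateScore(({ results }) => results.analyzeStepResult.relevance);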
Type Safety
You can specify input/output types when creating scorers for better type inference and IntelliSense support:
Agent Type Shortcut
For evaluating agents, use type: 'agent' to automatically get the correct types for agent input/output:
import { createScorer } from "@mastra/core/evals";
// Agent scorer with automatic typing
const agentScorer = createScorer({
id: "agent-response-quality",
description: "Evaluates agent responses",
type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });
Custom Types with Generics
For custom input/output types, use the generic approach:
import { createScorer } from "@mastra/core/evals";
type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };
const customScorer = createScorer<CustomInput, CustomOutput>({
id: "custom-scorer",
description: "Evaluates custom data",
}).generateScore(({ run }) => {
// run.input is typed as CustomInput
// run.output is typed as CustomOutput
return run.output.confidence;
});
Built-in Agent Types
ScorerRunInputForAgent - Contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
ScorerRunOutputForAgent - Array of agent response messages
Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
Trace Scoring with Agent Types
When you use type: 'agent', your scorer can be added directly to agents and can also score traces from agent interactions. The scorer automatically transforms trace data into the agent input/output format:
import { Mastra } from "@mastra/core";
import { createScorer } from "@mastra/core/evals";

const agentTraceScorer = createScorer({
  id: "agent-trace-length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.input.inputMessages;
  const agentResponse = run.output[0]?.content;
  // Reward responses longer than 50 characters
  return (agentResponse?.length ?? 0) > 50 ? 1 : 0;
});
// Register with Mastra for trace scoring
const mastra = new Mastra({
scorers: {
agentTraceScorer,
},
});
Step Method Signatures
preprocess
Optional preprocessing step that can extract or transform data before analysis.
Function Mode:
Function: ({ run, results }) => any
run.input: The input provided to the scorer run.
run.output: The output being evaluated.
run.runId: Unique identifier for this scoring run.
run.requestContext?: Optional request context associated with the run.
results: Results from earlier steps (empty here, since preprocess runs first).
Returns: any
The method can return any value. The returned value will be available to subsequent steps as preprocessStepResult.
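For example, a function-mode preprocess step might normalize the output once so later steps don't have to (a sketch, chained onto a scorer built with createScorer):

.preprocess(({ run }) => {
  const text = JSON.stringify(run.output);
  // Available to later steps as results.preprocessStepResult
  return { text, wordCount: text.split(/\s+/).length };
})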
Prompt Object Mode:
description: Description of what this step does.
outputSchema: Schema that the judge's structured output must conform to.
createPrompt: Function that builds the prompt string sent to the judge.
judge?: Optional per-step judge that overrides the scorer-level judge configuration.
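For example, a prompt-object preprocess step might have the judge extract factual claims for later analysis (a sketch; the zod schema is an assumption about how outputSchema is typically expressed):

.preprocess({
  description: "Extract the factual claims made in the response",
  outputSchema: z.object({ claims: z.array(z.string()) }),
  createPrompt: ({ run }) => `List the factual claims in: ${JSON.stringify(run.output)}`,
})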
analyze
Optional analysis step that processes the input/output and any preprocessed data.
Function Mode:
Function: ({ run, results }) => any
run.input: The input provided to the scorer run.
run.output: The output being evaluated.
run.runId: Unique identifier for this scoring run.
run.requestContext?: Optional request context associated with the run.
results.preprocessStepResult?: Value returned by the preprocess step, if one was defined.
Returns: any
The method can return any value. The returned value will be available to subsequent steps as analyzeStepResult.
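For example, building on the function-mode preprocess sketch above:

.analyze(({ run, results }) => {
  const { wordCount } = results.preprocessStepResult;
  // Available to later steps as results.analyzeStepResult
  return { verbose: wordCount > 100 };
})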
Prompt Object Mode:
description: Description of what this step does.
outputSchema: Schema that the judge's structured output must conform to.
createPrompt: Function that builds the prompt string sent to the judge.
judge?: Optional per-step judge that overrides the scorer-level judge configuration.
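For example, continuing the claim-extraction sketch above, and assuming createPrompt receives the same { run, results } arguments as function steps:

.analyze({
  description: "Check how many extracted claims the input supports",
  outputSchema: z.object({ verifiedCount: z.number() }),
  createPrompt: ({ run, results }) =>
    `How many of these claims are supported by the input?\n` +
    `Claims: ${JSON.stringify(results.preprocessStepResult.claims)}\n` +
    `Input: ${JSON.stringify(run.input)}`,
})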
generateScore
Required step that computes the final numerical score.
Function Mode:
Function: ({ run, results }) => number
run.input: The input provided to the scorer run.
run.output: The output being evaluated.
run.runId: Unique identifier for this scoring run.
run.requestContext?: Optional request context associated with the run.
results.preprocessStepResult?: Value returned by the preprocess step, if one was defined.
results.analyzeStepResult?: Value returned by the analyze step, if one was defined.
Returns: number
The method must return a numerical score.
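For example, turning the analysis sketched above into a fraction of verified claims:

.generateScore(({ results }) => {
  const { claims } = results.preprocessStepResult;
  const { verifiedCount } = results.analyzeStepResult;
  // Fraction of extracted claims the judge could verify
  return claims.length > 0 ? verifiedCount / claims.length : 1;
})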
Prompt Object Mode:
description: Description of what this step does.
outputSchema: Schema that the judge's structured output must conform to.
createPrompt: Function that builds the prompt string sent to the judge.
judge?: Optional per-step judge that overrides the scorer-level judge configuration.
When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:
calculateScore: Function that converts the judge's structured output into a numerical score.
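A sketch of the prompt-object form. The exact calculateScore signature is an assumption here (that it receives the judge's structured output, validated against outputSchema):

.generateScore({
  description: "Classify the overall quality of the response",
  outputSchema: z.object({ quality: z.enum(["poor", "fair", "good"]) }),
  createPrompt: ({ run }) => `Classify the quality of: ${JSON.stringify(run.output)}`,
  // Assumption: calculateScore receives the judge's parsed output and returns a number
  calculateScore: ({ quality }) => ({ poor: 0, fair: 0.5, good: 1 })[quality],
})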
generateReason
Optional step that provides an explanation for the score.
Function Mode:
Function: ({ run, results, score }) => string
run.input: The input provided to the scorer run.
run.output: The output being evaluated.
run.runId: Unique identifier for this scoring run.
run.requestContext?: Optional request context associated with the run.
results.preprocessStepResult?: Value returned by the preprocess step, if one was defined.
results.analyzeStepResult?: Value returned by the analyze step, if one was defined.
score: The numerical score produced by the generateScore step.
Returns: string
The method must return a string explaining the score.
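For example, reusing the claim-verification sketch:

.generateReason(({ results, score }) => {
  const { claims } = results.preprocessStepResult;
  return `Scored ${score}: ${claims.length} claims were checked against the input.`;
})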
Prompt Object Mode:
description: Description of what this step explains.
createPrompt: Function that builds the prompt string sent to the judge.
judge?: Optional per-step judge that overrides the scorer-level judge configuration.
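In prompt-object form, the judge writes the explanation itself. This sketch assumes createPrompt also receives the computed score, mirroring the function-mode signature:

.generateReason({
  description: "Explain the score to the user",
  createPrompt: ({ run, score }) =>
    `The response ${JSON.stringify(run.output)} received a score of ${score}. ` +
    `Explain why in one sentence.`,
})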
All step functions can be async.
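For example, an async function-mode generateScore step that awaits external work (externalCheck is a hypothetical helper):

.generateScore(async ({ run }) => {
  const ok = await externalCheck(run.output); // hypothetical async validation
  return ok ? 1 : 0;
})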