# createScorer

Mastra provides a unified `createScorer` factory that lets you define custom scorers for evaluating input/output pairs. Each evaluation step can use either a native JavaScript function or an LLM-based prompt object. Custom scorers can be added to agents and workflow steps.
## How to Create a Custom Scorer

Use the `createScorer` factory to define your scorer with a name, description, and optional judge configuration, then chain step methods to build your evaluation pipeline. At minimum, you must provide a `generateScore` step.
```typescript
import { createScorer } from "@mastra/core/scorers";

const scorer = createScorer({
  name: "My Custom Scorer",
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel, // your LLM model instance
    instructions: "You are an expert evaluator..."
  }
})
  .preprocess({ /* step config */ })
  .analyze({ /* step config */ })
  .generateScore(({ run, results }) => {
    // Return a numerical score
    return 1.0;
  })
  .generateReason({ /* step config */ });
```
## createScorer Options

- `name` (required): Name of the scorer.
- `description` (required): Description of what the scorer evaluates.
- `judge` (optional): Default LLM judge configuration, used by any step defined as a prompt object (see Judge Object below).
- `type` (optional): Set to `'agent'` for agent evaluation with automatic input/output typing.
This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the `.run()` method and its input/output.
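As a quick orientation, here is a hedged sketch of calling `.run()` directly; the `input`/`output` payload fields below are assumptions chosen to mirror the `run.input`/`run.output` values the step callbacks receive, and the exact shapes are defined in the MastraScorer reference:

```typescript
// Minimal sketch: invoke the scorer on one input/output pair.
const result = await scorer.run({
  input: { question: "What is Mastra?" },       // assumed payload field
  output: { answer: "A TypeScript framework." } // assumed payload field
});

console.log(result.score);  // number from generateScore
console.log(result.reason); // string from generateReason, when defined
```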
### Judge Object

- `model`: The LLM used to run prompt-object steps.
- `instructions`: System instructions for the judge.
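For example, a judge configured with the AI SDK's OpenAI provider might look like this (the model choice and instructions are illustrative, not required):

```typescript
import { openai } from "@ai-sdk/openai";

const judge = {
  // Any AI SDK-compatible model should work here; gpt-4o-mini is illustrative.
  model: openai("gpt-4o-mini"),
  instructions:
    "You are an expert evaluator. Judge each response strictly on factual accuracy."
};
```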
## Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support.

### Agent Type Shortcut

For evaluating agents, use `type: 'agent'` to automatically get the correct types for agent input/output:
```typescript
import { createScorer } from '@mastra/core/scorers';

// Agent scorer with automatic typing
const agentScorer = createScorer({
  name: 'Agent Response Quality',
  description: 'Evaluates agent responses',
  type: 'agent' // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });
```
### Custom Types with Generics

For custom input/output types, use the generic approach:
```typescript
import { createScorer } from '@mastra/core/scorers';

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  name: 'Custom Scorer',
  description: 'Evaluates custom data'
})
  .generateScore(({ run }) => {
    // run.input is typed as CustomInput
    // run.output is typed as CustomOutput
    return run.output.confidence;
  });
```
### Built-in Agent Types

- `ScorerRunInputForAgent`: Contains `inputMessages`, `rememberedMessages`, `systemMessages`, and `taggedSystemMessages` for agent evaluation.
- `ScorerRunOutputForAgent`: An array of agent response messages.
Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
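A short sketch of using these types directly; the import path is an assumption that mirrors the `createScorer` import above, and the helper is invented for illustration:

```typescript
import type {
  ScorerRunInputForAgent,
  ScorerRunOutputForAgent
} from '@mastra/core/scorers';

// Hypothetical helper: pull the first user message out of the typed input.
function firstUserMessage(input: ScorerRunInputForAgent): string | undefined {
  return input.inputMessages[0]?.content;
}
```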
### Trace Scoring with Agent Types

When you use `type: 'agent'`, your scorer can both be added directly to agents and be used to score traces from agent interactions. The scorer automatically transforms trace data into the proper agent input/output format:
```typescript
import { Mastra } from '@mastra/core';
import { createScorer } from '@mastra/core/scorers';

const agentTraceScorer = createScorer({
  name: 'Agent Trace Length',
  description: 'Evaluates agent response length',
  type: 'agent'
})
  .generateScore(({ run }) => {
    // Trace data is automatically transformed to agent format
    const userMessages = run.input.inputMessages;
    const agentResponse = run.output[0]?.content;

    // Score based on response length
    return (agentResponse?.length ?? 0) > 50 ? 1 : 0;
  });

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer
  }
});
```
## Step Method Signatures

### preprocess

Optional preprocessing step that can extract or transform data before analysis.

**Function Mode:**

Function: `({ run, results }) => any`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results`: Results from earlier steps (empty here, since preprocess runs first).

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `results.preprocessStepResult`.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
### analyze

Optional analysis step that processes the input/output and any preprocessed data.

**Function Mode:**

Function: `({ run, results }) => any`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `results.analyzeStepResult`.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
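A hedged sketch of an analyze step in prompt object mode, assuming `outputSchema` is a Zod schema and that `createPrompt` receives the same `{ run, results }` context as function-mode steps; the scorer name and prompt are invented for the example:

```typescript
import { z } from 'zod';

const factualityScorer = createScorer({
  name: 'Factuality',
  description: 'Checks responses for factual claims',
  judge: { model: myModel, instructions: 'You are a careful fact-checker.' } // myModel: your model instance
})
  .analyze({
    description: 'Extract and assess factual claims in the response',
    outputSchema: z.object({
      claims: z.array(z.string()),
      accurateCount: z.number()
    }),
    createPrompt: ({ run }) =>
      `List the factual claims in this response and count how many are accurate:\n${JSON.stringify(run.output)}`
  })
  .generateScore(({ results }) => {
    // analyzeStepResult is typed by outputSchema above
    const { claims, accurateCount } = results.analyzeStepResult;
    return claims.length === 0 ? 1 : accurateCount / claims.length;
  });
```

The same prompt object shape applies to `preprocess`.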
### generateScore

Required step that computes the final numerical score.

**Function Mode:**

Function: `({ run, results }) => number`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.
- `results.analyzeStepResult`: The value returned by the analyze step, if one was defined.

Returns: `number`. The method must return a numerical score.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `outputSchema`: Schema used to validate and type the judge's structured output.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.

When using prompt object mode, you must also provide a `calculateScore` function to convert the LLM output to a numerical score:

- `calculateScore`: Function that maps the judge's structured output to a number.
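A minimal sketch of generateScore in prompt object mode; the exact argument shape passed to `calculateScore` is defined by the MastraScorer reference, so the destructured `results` access below is an assumption:

```typescript
import { z } from 'zod';

const qualityScorer = createScorer({
  name: 'Response Quality',
  description: 'Rates overall response quality',
  judge: { model: myModel, instructions: 'You are an expert evaluator.' } // myModel: your model instance
})
  .generateScore({
    description: 'Ask the judge for a 0-10 quality rating',
    outputSchema: z.object({ rating: z.number().min(0).max(10) }),
    createPrompt: ({ run }) =>
      `Rate the quality of this response from 0 to 10:\n${JSON.stringify(run.output)}`,
    // Assumed: the judge's parsed output arrives on `results`; check the
    // MastraScorer reference for the exact argument shape.
    calculateScore: ({ results }) => results.rating / 10
  });
```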
### generateReason

Optional step that provides an explanation for the score.

**Function Mode:**

Function: `({ run, results, score }) => string`

- `run.input`: The input given to the agent or workflow step being evaluated.
- `run.output`: The output produced by the agent or workflow step.
- `run.runId`: Unique identifier for this scoring run.
- `run.runtimeContext`: Runtime context from the agent or workflow step being evaluated.
- `results.preprocessStepResult`: The value returned by the preprocess step, if one was defined.
- `results.analyzeStepResult`: The value returned by the analyze step, if one was defined.
- `score`: The numerical score produced by generateScore.

Returns: `string`. The method must return a string explaining the score.

**Prompt Object Mode:**

- `description`: Description of what this step does.
- `createPrompt`: Function that builds the prompt sent to the judge.
- `judge`: Optional judge configuration that overrides the scorer-level judge for this step.
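For illustration, a hedged end-to-end sketch of a function-mode generateReason chained onto a simple agent scorer (the scorer name and threshold are invented for the example):

```typescript
import { createScorer } from '@mastra/core/scorers';

const lengthScorer = createScorer({
  name: 'Length Check',
  description: 'Scores responses by length and explains the result',
  type: 'agent'
})
  .generateScore(({ run }) => ((run.output[0]?.content?.length ?? 0) > 50 ? 1 : 0))
  .generateReason(async ({ run, score }) => {
    const length = run.output[0]?.content?.length ?? 0;
    return `Scored ${score}: the response was ${length} characters long.`;
  });
```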
All step functions can be async.