# createScorer

Mastra provides a unified `createScorer` factory for defining custom scorers that evaluate input/output pairs. Each evaluation step can be either a native JavaScript function or an LLM-based prompt object. Custom scorers can be attached to agents and workflow steps.
## How to Create a Custom Scorer

Use the `createScorer` factory to define your scorer with a name, description, and optional judge configuration, then chain step methods to build your evaluation pipeline. At minimum you must provide a `generateScore` step.
```typescript
const scorer = createScorer({
  name: "My Custom Scorer",
  description: "Evaluates responses based on custom criteria",
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({ /* step config */ })
  .analyze({ /* step config */ })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({ /* step config */ });
```
## createScorer Options

- `name:` Display name that identifies the scorer.
- `description:` Short description of what the scorer evaluates.
- `judge:` Optional default LLM judge configuration used by prompt-object steps.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the `.run()` method and its input/output.
### Judge Object

- `model:` Language model used to run prompt-object steps.
- `instructions:` System instructions given to the judge model.
## Type Safety

For better type inference and IntelliSense support, you can specify input/output types when creating scorers:
```typescript
import { createScorer, ScorerRunInputForAgent, ScorerRunOutputForAgent } from '@mastra/core';

// For agent evaluation with full type safety
const agentScorer = createScorer<ScorerRunInputForAgent, ScorerRunOutputForAgent>({
  name: 'Agent Response Quality',
  description: 'Evaluates agent responses',
})
  .preprocess(({ run }) => {
    // run.input is typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });

// For custom input/output types
type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  name: 'Custom Scorer',
  description: 'Evaluates custom data',
})
  .generateScore(({ run }) => run.output.confidence);
```
### Built-in Agent Types

- `ScorerRunInputForAgent` - Contains `inputMessages`, `rememberedMessages`, `systemMessages`, and `taggedSystemMessages` for agent evaluation
- `ScorerRunOutputForAgent` - Array of agent response messages
Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.
## Step Method Signatures

### preprocess
Optional preprocessing step that can extract or transform data before analysis.
**Function Mode:**

- `Function: ({ run, results }) => any`
- `run.input:` Input of the run being evaluated.
- `run.output:` Output of the run being evaluated.
- `run.runId:` Unique identifier of the run.
- `run.runtimeContext:` Runtime context available during the run.
- `results:` Results from previously executed steps.
- `Returns: any`

The step can return any value; the returned value is available to subsequent steps as `results.preprocessStepResult`.
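As a standalone sketch (outside the builder, with a hypothetical run shape for illustration), a preprocess function might extract keywords from the input before analysis:

```typescript
// Hypothetical run shape for illustration; real runs are supplied by Mastra.
type Run = { input: { query: string }; output: { text: string } };

// Extract lowercase keywords from the query; the returned object
// becomes results.preprocessStepResult for later steps.
const preprocess = ({ run }: { run: Run }) => ({
  keywords: run.input.query.toLowerCase().split(/\s+/).filter(Boolean),
});
```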
**Prompt Object Mode:**

- `description:` Description of what this step does.
- `outputSchema:` Schema for the structured LLM output.
- `createPrompt:` Function that builds the prompt sent to the judge.
- `judge:` Optional judge configuration that overrides the scorer-level judge for this step.
### analyze
Optional analysis step that processes the input/output and any preprocessed data.
**Function Mode:**

- `Function: ({ run, results }) => any`
- `run.input:` Input of the run being evaluated.
- `run.output:` Output of the run being evaluated.
- `run.runId:` Unique identifier of the run.
- `run.runtimeContext:` Runtime context available during the run.
- `results.preprocessStepResult:` Value returned by the preprocess step, if defined.
- `Returns: any`

The step can return any value; the returned value is available to subsequent steps as `results.analyzeStepResult`.
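A self-contained sketch (with hypothetical run and result shapes) of an analyze function that counts which preprocessed keywords appear in the response:

```typescript
// Hypothetical shapes for illustration only.
type Run = { output: { text: string } };
type Results = { preprocessStepResult: { keywords: string[] } };

// Count keyword hits in the response text; the returned object
// becomes results.analyzeStepResult for later steps.
const analyze = ({ run, results }: { run: Run; results: Results }) => {
  const text = run.output.text.toLowerCase();
  const { keywords } = results.preprocessStepResult;
  const matched = keywords.filter((k) => text.includes(k)).length;
  return { matched, total: keywords.length };
};
```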
**Prompt Object Mode:**

- `description:` Description of what this step does.
- `outputSchema:` Schema for the structured LLM output.
- `createPrompt:` Function that builds the prompt sent to the judge.
- `judge:` Optional judge configuration that overrides the scorer-level judge for this step.
### generateScore
Required step that computes the final numerical score.
**Function Mode:**

- `Function: ({ run, results }) => number`
- `run.input:` Input of the run being evaluated.
- `run.output:` Output of the run being evaluated.
- `run.runId:` Unique identifier of the run.
- `run.runtimeContext:` Runtime context available during the run.
- `results.preprocessStepResult:` Value returned by the preprocess step, if defined.
- `results.analyzeStepResult:` Value returned by the analyze step, if defined.
- `Returns: number`

The step must return a numerical score.
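A minimal standalone sketch (assuming a hypothetical analyze result with `matched`/`total` counts) of a generateScore step that converts keyword coverage into a score between 0 and 1:

```typescript
// Hypothetical analyze result shape for illustration.
type Results = { analyzeStepResult: { matched: number; total: number } };

// Return the fraction of matched keywords, clamped to [0, 1];
// an empty keyword set scores 0.
const generateScore = ({ results }: { results: Results }) => {
  const { matched, total } = results.analyzeStepResult;
  return total === 0 ? 0 : Math.min(1, matched / total);
};
```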
**Prompt Object Mode:**

- `description:` Description of what this step does.
- `outputSchema:` Schema for the structured LLM output.
- `createPrompt:` Function that builds the prompt sent to the judge.
- `judge:` Optional judge configuration that overrides the scorer-level judge for this step.

When using prompt object mode, you must also provide a `calculateScore` function to convert the structured LLM output into a numerical score:

- `calculateScore:` Function that maps the LLM's structured output to a number.
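For example, assuming the step's output schema produced an object like `{ verdict: "pass" | "partial" | "fail" }` (a hypothetical schema, not part of Mastra's API), the conversion could map each verdict to a number:

```typescript
// Illustrative only: the argument shape follows your own outputSchema;
// see the MastraScorer reference for the exact calculateScore signature.
type Verdict = "pass" | "partial" | "fail";

const calculateScore = (llmOutput: { verdict: Verdict }): number => {
  const scores: Record<Verdict, number> = { pass: 1, partial: 0.5, fail: 0 };
  return scores[llmOutput.verdict];
};
```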
### generateReason
Optional step that provides an explanation for the score.
**Function Mode:**

- `Function: ({ run, results, score }) => string`
- `run.input:` Input of the run being evaluated.
- `run.output:` Output of the run being evaluated.
- `run.runId:` Unique identifier of the run.
- `run.runtimeContext:` Runtime context available during the run.
- `results.preprocessStepResult:` Value returned by the preprocess step, if defined.
- `results.analyzeStepResult:` Value returned by the analyze step, if defined.
- `score:` The numerical score produced by generateScore.
- `Returns: string`

The step must return a string explaining the score.
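A standalone sketch (again assuming a hypothetical analyze result with `matched`/`total` counts) of a generateReason step that turns the score and intermediate results into a human-readable explanation:

```typescript
// Hypothetical shapes for illustration only.
type Args = {
  score: number;
  results: { analyzeStepResult: { matched: number; total: number } };
};

// Must return a string explaining the score.
const generateReason = ({ score, results }: Args) => {
  const { matched, total } = results.analyzeStepResult;
  return `Score ${score}: matched ${matched} of ${total} expected keywords.`;
};
```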
**Prompt Object Mode:**

- `description:` Description of what this step does.
- `createPrompt:` Function that builds the prompt sent to the judge.
- `judge:` Optional judge configuration that overrides the scorer-level judge for this step.
All step functions can be async.