# createScorer

Mastra provides a unified `createScorer` factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to agents and workflow steps.

## How to Create a Custom Scorer

Use the `createScorer` factory to define your scorer with a name, description, and optional judge configuration, then chain step methods to build your evaluation pipeline. You must provide at least a `generateScore` step.

**Prompt object steps** are step configurations expressed as objects with `description` + `createPrompt` (and `outputSchema` for `preprocess`/`analyze`). These steps invoke the judge LLM. **Function steps** are plain functions and never call the judge.

```typescript
import { createScorer } from "@mastra/core/evals";

const scorer = createScorer({
  id: "my-custom-scorer",
  name: "My Custom Scorer", // Optional, defaults to id
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({ /* step config */ })
  .analyze({ /* step config */ })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({ /* step config */ });
```

## createScorer Options

- **id:** (`string`): Unique identifier for the scorer. Used as the name if `name` is not provided.
- **name?:** (`string`): Name of the scorer. Defaults to `id` if not provided.
- **description:** (`string`): Description of what the scorer does.
- **judge?:** (`object`): Optional judge configuration for LLM-based steps. See the Judge Object section below.
- **type?:** (`string`): Type specification for input/output. Use `'agent'` for automatic agent types. For custom types, use the generic approach instead.

This function returns a scorer builder that you can chain step methods onto. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.

## Judge Object

- **model:** (`LanguageModel`): The LLM model instance to use for evaluation.
- **instructions:** (`string`): System prompt/instructions for the LLM.

The judge only runs for steps defined as **prompt objects** (`preprocess`, `analyze`, `generateScore`, and `generateReason` in prompt mode). If you use function steps only, the judge is never called and there is no LLM output to inspect; in that case, any score or reason must be produced by your functions. When a prompt-object step runs, its structured LLM output is stored in the corresponding result field (`preprocessStepResult`, `analyzeStepResult`, or the value consumed by `calculateScore` in `generateScore`).
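To make the judge behavior concrete, here is a minimal sketch of a scorer where `analyze` is a prompt-object step (so the judge LLM is invoked for it) while `generateScore` stays a plain function that reads `analyzeStepResult`. The scorer id, schema fields, and prompt wording are illustrative, and `myModel` stands in for any `LanguageModel` instance, as in the snippet above.

```typescript
import { createScorer } from "@mastra/core/evals";
import { z } from "zod";

// Illustrative sketch: myModel is any LanguageModel instance.
const toneScorer = createScorer({
  id: "tone-consistency",
  description: "Checks whether the response keeps a professional tone",
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator of writing tone.",
  },
})
  // Prompt-object step: the judge LLM produces output matching outputSchema.
  .analyze({
    description: "List tone problems found in the response",
    outputSchema: z.object({
      issues: z.array(z.string()),
    }),
    createPrompt: ({ run }) =>
      `Identify tone problems in this response:\n${JSON.stringify(run.output)}`,
  })
  // Function step: no LLM call; it reads the judge's structured output.
  .generateScore(({ results }) => {
    const issues = results.analyzeStepResult?.issues ?? [];
    return issues.length === 0 ? 1 : Math.max(0, 1 - issues.length * 0.25);
  });
```

Because `generateScore` here is a function step, the judge is only called once, for the `analyze` step.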
## Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support.

### Agent Type Shortcut

For evaluating agents, use `type: 'agent'` to automatically get the correct types for agent input/output:

```typescript
import { createScorer } from "@mastra/core/evals";

// Agent scorer with automatic typing
const agentScorer = createScorer({
  id: "agent-response-quality",
  description: "Evaluates agent responses",
  type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // run.input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.input.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });
```

### Custom Types with Generics

For custom input/output types, use the generic approach:

```typescript
import { createScorer } from "@mastra/core/evals";

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  id: "custom-scorer",
  description: "Evaluates custom data",
}).generateScore(({ run }) => {
  // run.input is typed as CustomInput
  // run.output is typed as CustomOutput
  return run.output.confidence;
});
```

### Built-in Agent Types

- **`ScorerRunInputForAgent`** - Contains `inputMessages`, `rememberedMessages`, `systemMessages`, and `taggedSystemMessages` for agent evaluation
- **`ScorerRunOutputForAgent`** - Array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.

## Trace Scoring with Agent Types

When you use `type: 'agent'`, your scorer can be added directly to agents and can also score traces from agent interactions. The scorer automatically transforms trace data into the proper agent input/output format:

```typescript
import { Mastra } from "@mastra/core";
import { createScorer } from "@mastra/core/evals";

const agentTraceScorer = createScorer({
  id: "agent-trace-length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.input.inputMessages;
  const agentResponse = run.output[0]?.content;

  // Score based on response length
  return (agentResponse?.length ?? 0) > 50 ? 0 : 1;
});

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer,
  },
});
```

## Step Method Signatures

### preprocess

Optional preprocessing step that can extract or transform data before analysis.

**Function Mode:**

Function: `({ run, results }) => any`

- **run.input:** (`any`): Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.
- **run.output:** (`any`): Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.
- **run.runId:** (`string`): Unique identifier for this scoring run.
- **run.requestContext?:** (`object`): Request context from the agent or workflow step being evaluated (optional).
- **results:** (`object`): Empty object (no previous steps).

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `preprocessStepResult`. A function-mode sketch follows below.
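For example, here is a hedged function-mode sketch (scorer id, keywords, and thresholds are illustrative only) showing how a `preprocess` return value flows into later steps as `results.preprocessStepResult`:

```typescript
import { createScorer } from "@mastra/core/evals";

// Illustrative keyword-coverage scorer using function steps only (no judge needed).
const keywordScorer = createScorer({
  id: "keyword-coverage",
  description: "Checks how many required keywords appear in the output",
})
  .preprocess(({ run }) => {
    // Function mode: no judge call. Whatever is returned here becomes
    // results.preprocessStepResult in later steps.
    const text = JSON.stringify(run.output).toLowerCase();
    const required = ["refund", "policy", "contact"];
    const found = required.filter((keyword) => text.includes(keyword));
    return { required, found };
  })
  .generateScore(({ results }) => {
    const { required, found } = results.preprocessStepResult;
    return required.length === 0 ? 1 : found.length / required.length;
  });
```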
**Prompt Object Mode:**

- **description:** (`string`): Description of what this preprocessing step does.
- **outputSchema:** (`ZodSchema`): Zod schema for the expected output of the preprocess step.
- **createPrompt:** (`function`): Function: `({ run, results }) => string`. Returns the prompt for the LLM.
- **judge?:** (`object`): Optional LLM judge for this step (can override the main judge). See the Judge Object section.

### analyze

Optional analysis step that processes the input/output and any preprocessed data.

**Function Mode:**

Function: `({ run, results }) => any`

- **run.input:** (`any`): Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.
- **run.output:** (`any`): Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.
- **run.runId:** (`string`): Unique identifier for this scoring run.
- **run.requestContext?:** (`object`): Request context from the agent or workflow step being evaluated (optional).
- **results.preprocessStepResult?:** (`any`): Result from the preprocess step, if defined (optional).

Returns: `any`. The method can return any value; the returned value will be available to subsequent steps as `analyzeStepResult`.

**Prompt Object Mode:**

- **description:** (`string`): Description of what this analysis step does.
- **outputSchema:** (`ZodSchema`): Zod schema for the expected output of the analyze step.
- **createPrompt:** (`function`): Function: `({ run, results }) => string`. Returns the prompt for the LLM.
- **judge?:** (`object`): Optional LLM judge for this step (can override the main judge). See the Judge Object section.

### generateScore

**Required** step that computes the final numerical score.

**Function Mode:**

Function: `({ run, results }) => number`

- **run.input:** (`any`): Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.
- **run.output:** (`any`): Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.
- **run.runId:** (`string`): Unique identifier for this scoring run.
- **run.requestContext?:** (`object`): Request context from the agent or workflow step being evaluated (optional).
- **results.preprocessStepResult?:** (`any`): Result from the preprocess step, if defined (optional).
- **results.analyzeStepResult?:** (`any`): Result from the analyze step, if defined (optional).

Returns: `number`. The method must return a numerical score.

**Prompt Object Mode:**

- **description:** (`string`): Description of what this scoring step does.
- **outputSchema:** (`ZodSchema`): Zod schema for the expected output of the generateScore step.
- **createPrompt:** (`function`): Function: `({ run, results }) => string`. Returns the prompt for the LLM.
- **judge?:** (`object`): Optional LLM judge for this step (can override the main judge). See the Judge Object section.

When using prompt object mode, you must also provide a `calculateScore` function to convert the LLM output to a numerical score:

- **calculateScore:** (`function`): Function: `({ run, results, analyzeStepResult }) => number`. Converts the LLM's structured output into a numerical score.
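Putting the step modes in this section together, here is a hedged sketch that chains a function-mode `preprocess`, a prompt-object `analyze` whose `createPrompt` reads `results.preprocessStepResult`, and a function-mode `generateScore` that reads `results.analyzeStepResult` (rather than prompt-object mode with `calculateScore`). The scorer id, schema, and prompt wording are illustrative, and `myModel` again stands in for any `LanguageModel` instance:

```typescript
import { createScorer } from "@mastra/core/evals";
import { z } from "zod";

// Illustrative faithfulness-style scorer combining function and prompt-object steps.
const faithfulnessScorer = createScorer({
  id: "faithfulness",
  description: "Checks whether response claims are supported by the input",
  judge: {
    model: myModel,
    instructions: "You are an expert fact checker.",
  },
})
  // Function mode: split the response into rough sentence-level claims.
  .preprocess(({ run }) => {
    const claims = JSON.stringify(run.output)
      .split(".")
      .map((claim) => claim.trim())
      .filter(Boolean);
    return { claims };
  })
  // Prompt-object mode: the judge labels each claim; createPrompt can read
  // results.preprocessStepResult from the previous step.
  .analyze({
    description: "Label each claim as supported or unsupported",
    outputSchema: z.object({
      verdicts: z.array(
        z.object({ claim: z.string(), supported: z.boolean() }),
      ),
    }),
    createPrompt: ({ run, results }) =>
      `Input: ${JSON.stringify(run.input)}\n` +
      `Claims: ${JSON.stringify(results.preprocessStepResult.claims)}\n` +
      `Label each claim as supported or unsupported by the input.`,
  })
  // Function mode: turn the judge's verdicts into a 0-1 score.
  .generateScore(({ results }) => {
    const verdicts = results.analyzeStepResult?.verdicts ?? [];
    if (verdicts.length === 0) return 1;
    return verdicts.filter((v) => v.supported).length / verdicts.length;
  });
```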
### generateReason

Optional step that provides an explanation for the score.

**Function Mode:**

Function: `({ run, results, score }) => string`

- **run.input:** (`any`): Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.
- **run.output:** (`any`): Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.
- **run.runId:** (`string`): Unique identifier for this scoring run.
- **run.requestContext?:** (`object`): Request context from the agent or workflow step being evaluated (optional).
- **results.preprocessStepResult?:** (`any`): Result from the preprocess step, if defined (optional).
- **results.analyzeStepResult?:** (`any`): Result from the analyze step, if defined (optional).
- **score:** (`number`): Score computed by the generateScore step.

Returns: `string`. The method must return a string explaining the score.

**Prompt Object Mode:**

- **description:** (`string`): Description of what this reasoning step does.
- **createPrompt:** (`function`): Function: `({ run, results, score }) => string`. Returns the prompt for the LLM.
- **judge?:** (`object`): Optional LLM judge for this step (can override the main judge). See the Judge Object section.

All step functions can be async.
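As a closing sketch (scorer id, threshold, and wording are illustrative only), here is a function-mode `generateReason` that uses the `score` argument from `generateScore`; it is written as `async` to show that any step function may be asynchronous:

```typescript
import { createScorer } from "@mastra/core/evals";

// Illustrative length-based scorer with a plain-function explanation step.
const lengthScorer = createScorer({
  id: "length-check",
  description: "Scores responses by length and explains the result",
})
  .generateScore(({ run }) => {
    const length = JSON.stringify(run.output).length;
    return length > 100 ? 1 : 0;
  })
  // Function-mode generateReason: receives the computed score plus prior results.
  // Step functions may be async, so awaiting other work here is fine.
  .generateReason(async ({ run, score }) => {
    const length = JSON.stringify(run.output).length;
    return `Score ${score}: the response was ${length} characters long.`;
  });
```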