createLLMScorer

The createLLMScorer() function lets you define custom scorers that use a language model (LLM) as a judge for evaluation. LLM scorers are ideal for tasks that call for prompt-based evaluation, such as answer relevancy, faithfulness, or other custom metrics. They integrate seamlessly with the Mastra scoring framework and can be used anywhere built-in scorers are used.

For a usage example, see the Custom LLM Judge Examples.

createLLMScorer Options

name: string
Name of the scorer.

description: string
Description of what the scorer does.

judge: object
Judge configuration object. Must include a model and instructions (system prompt). See Judge Object section below.

extract: object
(Optional) Extraction step configuration object. See Extract Object section below.

analyze: object
Analysis step configuration object. See Analyze Object section below.

reason: object
(Optional) Reason step configuration object. See Reason Object section below.

calculateScore: function
`({ run }) => number`. Computes the final score from the analyze step result.

createLLMScorer() returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.
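
As an illustration, here is a minimal sketch of a complete scorer. The import paths and the `openai()` model helper are assumptions based on a common Mastra setup; adjust them to your project and version.

```typescript
import { z } from 'zod';
import { openai } from '@ai-sdk/openai'; // assumed model provider
// Import path is an assumption; check the Mastra scorers docs for your version.
import { createLLMScorer } from '@mastra/core/scores';

// Hypothetical tone scorer: the judge answers a yes/no question and
// calculateScore maps the structured answer to a number.
const toneScorer = createLLMScorer({
  name: 'Tone Scorer',
  description: 'Checks whether the response keeps a friendly tone',
  judge: {
    model: openai('gpt-4o-mini'),
    instructions: 'You evaluate the tone of assistant responses.',
  },
  analyze: {
    description: 'Judge the friendliness of the response',
    outputSchema: z.object({ friendly: z.boolean() }),
    createPrompt: ({ run }) =>
      `Is this response friendly? Response: ${JSON.stringify(run.output)}`,
  },
  calculateScore: ({ run }) => (run.analyzeStepResult.friendly ? 1 : 0),
});

// Evaluate one agent interaction (see the MastraScorer reference for the
// exact .run() input shape).
const result = await toneScorer.run({
  input: [{ role: 'user', content: 'hello world' }],
  output: { role: 'assistant', content: 'Hi there! Happy to help.' },
});
console.log(result.score); // 1 if friendly, 0 otherwise
```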

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.
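
For example, a judge configuration might look like this (the model choice is illustrative; any LanguageModel instance works):

```typescript
const judge = {
  model: openai('gpt-4o-mini'), // any LanguageModel instance
  instructions:
    'You are a strict evaluator. Respond only in the requested JSON format.',
};
```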

Extract Object

description: string
Description of the extract step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

outputSchema: ZodSchema
Zod schema for the expected output of the extract step.

createPrompt: function
`({ run: ScoringInput }) => string`. Returns the prompt for the LLM.
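
For example, a faithfulness-style scorer might first extract claims from the output before analyzing them. This is a sketch; the field values are illustrative:

```typescript
const extract = {
  description: 'Extract the factual claims made in the response',
  outputSchema: z.object({
    claims: z.array(z.string()),
  }),
  createPrompt: ({ run }) =>
    `List every factual claim in the following response as JSON ` +
    `matching { "claims": string[] }.\nResponse: ${JSON.stringify(run.output)}`,
};
```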

Analyze Object

description: string
Description of the analyze step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
`({ run: ScoringInput & { extractStepResult } }) => string`. Returns the prompt for the LLM.
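
Continuing the hypothetical extract example above, an analyze step can read the extracted claims from `run.extractStepResult`:

```typescript
const analyze = {
  description: 'Check each extracted claim against the user input',
  outputSchema: z.object({
    verdicts: z.array(
      z.object({ claim: z.string(), supported: z.boolean() }),
    ),
  }),
  createPrompt: ({ run }) =>
    `For each claim, decide whether it is supported by the input.\n` +
    `Input: ${JSON.stringify(run.input)}\n` +
    `Claims: ${JSON.stringify(run.extractStepResult?.claims)}`,
};
```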

Calculate Score Function

The calculateScore function converts the LLM’s structured analysis into a numerical score. It receives the results of the previous steps via its `run` argument, but not the score itself, since that is what it computes. The `run` object includes the following fields:

input: Record<string, any>[]
Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.

output: Record<string, any>
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

runtimeContext: object
(Optional) Runtime context from the agent or workflow step being evaluated.

extractStepResult: object
(Optional) Result of the extract step, if defined.

analyzeStepResult: object
Structured result from the analyze step, conforming to the outputSchema defined in the analyze step.

Returns: number
The function must return a numerical score, typically in the 0-1 range where 1 represents the best possible score.
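
Continuing the same hypothetical faithfulness example, a calculateScore entry inside the createLLMScorer options could compute the fraction of supported claims:

```typescript
calculateScore: ({ run }) => {
  const verdicts = run.analyzeStepResult.verdicts;
  // Treating "no claims to check" as fully faithful is a design choice.
  if (verdicts.length === 0) return 1;
  const supported = verdicts.filter((v) => v.supported).length;
  return supported / verdicts.length; // score in the 0-1 range
},
```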

Reason Object

description: string
Description of the reason step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

createPrompt: function
`({ run }) => string`. `run` includes input, output, extractStepResult, analyzeStepResult, and score. Returns the prompt for the LLM.
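
A sketch of a reason entry that asks the judge to justify the computed score (the prompt wording is illustrative):

```typescript
reason: {
  description: 'Explain the score in plain language',
  createPrompt: ({ run }) =>
    `The response scored ${run.score}. ` +
    `Analysis: ${JSON.stringify(run.analyzeStepResult)}. ` +
    `Explain this score to a developer in one or two sentences.`,
},
```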

For LLM scorers, the `.run()` result may also include step-specific prompt fields, such as `extractPrompt`, `analyzePrompt`, and `reasonPrompt`.