createLLMScorer

The createLLMScorer() function lets you define custom scorers that use a language model (LLM) as a judge for evaluation. LLM scorers are ideal for tasks that call for prompt-based evaluation, such as answer relevancy, faithfulness, or other custom metrics. They integrate seamlessly with the Mastra scoring framework and can be used anywhere built-in scorers are used.

For a usage example, see the Custom LLM Judge Examples.

createLLMScorer Options

name: string
Name of the scorer.

description: string
Description of what the scorer does.

judge: object
Judge configuration object. Must include a model and instructions (system prompt). See Judge Object section below.

extract: object
(Optional) Extraction step configuration object. See Extract Object section below.

analyze: object
Analysis step configuration object. See Analyze Object section below.

reason: object
(Optional) Reason step configuration object. See Reason Object section below.

calculateScore: function
`({ run }) => number`. Computes the final score from the analyze step result.

createLLMScorer() returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.
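
As an illustration, here is a minimal sketch of a complete scorer. The import paths and the `openai()` model helper are assumptions based on a common Mastra setup; adjust them to your project and version.

```typescript
import { z } from 'zod';
import { openai } from '@ai-sdk/openai'; // assumed model provider
// Import path is an assumption; check the Mastra scorers docs for your version.
import { createLLMScorer } from '@mastra/core/scores';

// Hypothetical tone scorer: the judge answers a yes/no question and
// calculateScore maps the structured answer to a number.
const toneScorer = createLLMScorer({
  name: 'Tone Scorer',
  description: 'Checks whether the response keeps a friendly tone',
  judge: {
    model: openai('gpt-4o-mini'),
    instructions: 'You evaluate the tone of assistant responses.',
  },
  analyze: {
    description: 'Judge the friendliness of the response',
    outputSchema: z.object({ friendly: z.boolean() }),
    createPrompt: ({ run }) =>
      `Is this response friendly? Response: ${JSON.stringify(run.output)}`,
  },
  calculateScore: ({ run }) => (run.analyzeStepResult.friendly ? 1 : 0),
});

// Evaluate one agent interaction (see the MastraScorer reference for the
// exact .run() input shape).
const result = await toneScorer.run({
  input: [{ role: 'user', content: 'hello world' }],
  output: { role: 'assistant', content: 'Hi there! Happy to help.' },
});
console.log(result.score); // 1 if friendly, 0 otherwise
```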

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.
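
For example, a judge configuration might look like this (the model choice is illustrative; any LanguageModel instance works):

```typescript
const judge = {
  model: openai('gpt-4o-mini'), // any LanguageModel instance
  instructions:
    'You are a strict evaluator. Respond only in the requested JSON format.',
};
```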

Extract Object

description: string
Description of the extract step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

outputSchema: ZodSchema
Zod schema for the expected output of the extract step.

createPrompt: function
`({ run: ScoringInput }) => string`. Returns the prompt for the LLM.
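
For example, a faithfulness-style scorer might first extract claims from the output before analyzing them. This is a sketch; the field values are illustrative:

```typescript
const extract = {
  description: 'Extract the factual claims made in the response',
  outputSchema: z.object({
    claims: z.array(z.string()),
  }),
  createPrompt: ({ run }) =>
    `List every factual claim in the following response as JSON ` +
    `matching { "claims": string[] }.\nResponse: ${JSON.stringify(run.output)}`,
};
```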

Analyze Object

description: string
Description of the analyze step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
`({ run: ScoringInput & { extractStepResult } }) => string`. Returns the prompt for the LLM.
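
Continuing the hypothetical extract example above, an analyze step can read the extracted claims from `run.extractStepResult`:

```typescript
const analyze = {
  description: 'Check each extracted claim against the user input',
  outputSchema: z.object({
    verdicts: z.array(
      z.object({ claim: z.string(), supported: z.boolean() }),
    ),
  }),
  createPrompt: ({ run }) =>
    `For each claim, decide whether it is supported by the input.\n` +
    `Input: ${JSON.stringify(run.input)}\n` +
    `Claims: ${JSON.stringify(run.extractStepResult?.claims)}`,
};
```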

Calculate Score Function

The calculateScore function converts the LLM’s structured analysis into a numerical score. It receives the results of the previous steps via its `run` argument, but not the score itself, since that is what it computes. The `run` object includes the following fields:

input: Record<string, any>[]
Input records provided to the scorer. If the scorer is added to an agent, this is an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this is the workflow's input.

output: Record<string, any>
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

runtimeContext: object
(Optional) Runtime context from the agent or workflow step being evaluated.

extractStepResult: object
(Optional) Result of the extract step, if defined.

analyzeStepResult: object
Structured result from the analyze step, conforming to the outputSchema defined in the analyze step.

Returns: number
The function must return a numerical score, typically in the 0-1 range where 1 represents the best possible score.
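
Continuing the same hypothetical faithfulness example, a calculateScore entry inside the createLLMScorer options could compute the fraction of supported claims:

```typescript
calculateScore: ({ run }) => {
  const verdicts = run.analyzeStepResult.verdicts;
  // Treating "no claims to check" as fully faithful is a design choice.
  if (verdicts.length === 0) return 1;
  const supported = verdicts.filter((v) => v.supported).length;
  return supported / verdicts.length; // score in the 0-1 range
},
```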

Reason Object

description: string
Description of the reason step.

judge: object
(Optional) LLM judge for this step (can override main judge/model). See Judge Object section.

createPrompt: function
`({ run }) => string`. `run` includes input, output, extractStepResult, analyzeStepResult, and score. Returns the prompt for the LLM.
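
A sketch of a reason entry that asks the judge to justify the computed score (the prompt wording is illustrative):

```typescript
reason: {
  description: 'Explain the score in plain language',
  createPrompt: ({ run }) =>
    `The response scored ${run.score}. ` +
    `Analysis: ${JSON.stringify(run.analyzeStepResult)}. ` +
    `Explain this score to a developer in one or two sentences.`,
},
```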

For LLM scorers, the `.run()` result may also include step-specific prompt fields, such as `extractPrompt`, `analyzePrompt`, and `reasonPrompt`.