Rubric scorer

The createRubricScorer() function creates an LLM-as-judge scorer that grades an agent's output against a rubric — a checklist of criteria. It returns a binary score: 1 only when every required criterion is satisfied, otherwise 0. The reason lists each criterion's verdict so the agent knows exactly what to fix.

This scorer is designed to drop into isTaskComplete. Because isTaskComplete treats score === 1 as "task complete" and injects the reason back into the conversation as feedback, the agent keeps iterating until the rubric is satisfied (or maxSteps is reached).
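
The feedback loop described above can be sketched in a few lines. This is an illustration only, not Mastra's implementation: `runUntilComplete`, `AgentStep`, `Scorer`, and the toy agent and scorer are hypothetical names, and the sketch is synchronous for brevity where a real agent loop would be asynchronous.

```typescript
// Hypothetical shapes for illustration only; these are not Mastra's internal types.
type ScoreResult = { score: number; reason: string }
type Scorer = (output: string) => ScoreResult
type AgentStep = (feedback: readonly string[]) => string

// Repeat the agent step until the scorer returns exactly 1 or maxSteps is reached.
function runUntilComplete(step: AgentStep, scorer: Scorer, maxSteps: number) {
  const feedback: string[] = []
  let output = ''
  for (let stepCount = 1; stepCount <= maxSteps; stepCount++) {
    output = step(feedback)
    const { score, reason } = scorer(output)
    if (score === 1) return { output, steps: stepCount, complete: true }
    // A failing grade: feed the reason back so the next step can address it.
    feedback.push(reason)
  }
  return { output, steps: maxSteps, complete: false }
}

// Toy agent: adds an analysis section only after it has received feedback.
const toyStep: AgentStep = feedback =>
  feedback.length === 0 ? 'Summary only' : 'Summary plus analysis section'

// Toy scorer: a single criterion checked with a substring match instead of an LLM.
const toyScorer: Scorer = output =>
  output.includes('analysis')
    ? { score: 1, reason: 'All criteria met' }
    : { score: 0, reason: 'Unmet: includes an analysis section' }

const loopResult = runUntilComplete(toyStep, toyScorer, 10)
```

In this toy run the first step fails the rubric, the reason is appended to the feedback, and the second step passes, so the loop stops well before `maxSteps`.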

Parameters

model:

MastraModelConfig
The language model used to grade the output against the rubric. A smaller, cheaper model is usually sufficient for grading.

criteria:

RubricCriterion[] | string
The rubric to grade against. A string is treated as a newline-delimited checklist (each line becomes a required criterion). If omitted, the rubric is read at run time from a `rubric` value in the run's request context, additional context, or input; if none resolves, the scorer is a no-op and returns 1.

options:

RubricScorerOptions
Configuration options for the scorer.

.run() returns

score:

number
1 when every required criterion is satisfied, otherwise 0 (multiplied by scale).

reason:

string
A per-criterion explanation listing which criteria are met or unmet and why. This is the text that isTaskComplete injects back into the conversation as feedback.

Usage with isTaskComplete

Define the rubric once, attach the scorer to isTaskComplete, and the agent self-corrects until the rubric is satisfied:

import { createRubricScorer } from '@mastra/evals/scorers/prebuilt'

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'The response includes an analysis section' },
    { description: 'The response includes concrete recommendations' },
  ],
})

// `supervisor` is assumed to be an agent instance defined elsewhere.
const stream = await supervisor.stream('Research AI in education', {
  maxSteps: 10,
  isTaskComplete: {
    scorers: [rubricScorer],
    strategy: 'all',
  },
})

String rubric

A newline-delimited string is parsed into criteria, with common list markers (-, *, 1.) stripped. Every line becomes a required criterion:

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: `
- All tests pass in the test suite
- The function is named find_duplicates and accepts a single list argument
`,
})
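
To make the parsing rule concrete, here is a minimal sketch of how a newline-delimited rubric could be turned into criteria. `parseRubric` and its regular expression are assumptions for illustration, not the library's actual parser, and the local `RubricCriterion` type is a stand-in that only mirrors the `description` and `required` fields shown on this page.

```typescript
// Local stand-in for the criterion shape used in the examples on this page.
type RubricCriterion = { description: string; required?: boolean }

// Hypothetical parser: one criterion per non-empty line, list markers stripped.
function parseRubric(text: string): RubricCriterion[] {
  return text
    .split('\n')
    // Trim each line, then drop a leading "-", "*", or "1." marker followed by whitespace.
    .map(line => line.trim().replace(/^(?:[-*]|\d+\.)\s+/, ''))
    // Blank lines (including the ones around a template literal) produce no criterion.
    .filter(description => description.length > 0)
    // Every line of a string rubric is a required criterion.
    .map(description => ({ description, required: true }))
}

const parsedCriteria = parseRubric(`
- All tests pass in the test suite
* The function is named find_duplicates
1. The function accepts a single list argument
`)
```

Under these assumptions the three lines yield three required criteria whose descriptions no longer carry the `-`, `*`, or `1.` prefixes.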

Optional criteria

Mark a criterion as optional to have it graded and reported without gating completion:

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'Includes an analysis section', required: true },
    { description: 'Includes citations', required: false },
  ],
})

Dynamic rubric per run

When criteria is omitted from the factory call, the scorer resolves a rubric value from the run's request context, additional context, or input. This lets a single scorer instance grade a different rubric on each run without being rebuilt:

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
})

await supervisor.stream('Write find_duplicates', {
  isTaskComplete: { scorers: [rubricScorer] },
  requestContext: {
    rubric: '- All tests pass\n- The function is named find_duplicates',
  },
})

If no rubric resolves, the scorer returns 1 and does not gate the loop.
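
The resolution and no-op behavior can be pictured with the sketch below. It is an assumption-laden illustration: `RunContext`, `resolveRubric`, and `scoreOrPassThrough` are hypothetical names, the lookup order simply follows the order in which the sources are listed above, and only string rubrics are handled.

```typescript
// Hypothetical view of the places a run might carry a `rubric` value.
type RunContext = {
  requestContext?: Record<string, unknown>
  additionalContext?: Record<string, unknown>
  input?: unknown
}

// Return the first non-empty string rubric found, or undefined if none resolves.
// The precedence (request context, additional context, input) is an assumption.
function resolveRubric(run: RunContext): string | undefined {
  const candidates: unknown[] = [
    run.requestContext?.rubric,
    run.additionalContext?.rubric,
    (run.input as Record<string, unknown> | null | undefined)?.rubric,
  ]
  for (const candidate of candidates) {
    if (typeof candidate === 'string' && candidate.trim().length > 0) return candidate
  }
  return undefined
}

// With no rubric the scorer is a no-op: return 1 so the loop is not gated.
function scoreOrPassThrough(run: RunContext, grade: (rubric: string) => number): number {
  const rubric = resolveRubric(run)
  return rubric === undefined ? 1 : grade(rubric)
}

// Stand-in grader that always fails, to show the difference between the two paths.
const alwaysFail = (_rubric: string): number => 0

const gatedScore = scoreOrPassThrough({ requestContext: { rubric: '- All tests pass' } }, alwaysFail)
const ungatedScore = scoreOrPassThrough({ input: 'Write find_duplicates' }, alwaysFail)
```

Here `gatedScore` comes from the grader because a rubric resolved, while `ungatedScore` is 1 because the run carried no rubric at all.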

Scoring details

The scorer runs in two phases:

  1. Grade — the judge model evaluates each criterion independently and returns a per-criterion verdict (satisfied / not) with reasoning.
  2. Score — the result is 1 only when every required criterion is satisfied, otherwise 0. If no criteria are marked required, all criteria are treated as required.

The reason summarizes the overall result and lists each criterion with its verdict, so a failing grade gives the agent targeted, actionable feedback rather than a generic "try again".
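
The Score phase can be sketched as a small pure function over the judge's verdicts. This is not the library's code: `Verdict`, `scoreVerdicts`, and the reason format are hypothetical, and the sketch assumes that a criterion without an explicit `required` flag counts as required.

```typescript
// Hypothetical per-criterion verdict, as the Grade phase might return it.
type Verdict = {
  description: string
  required?: boolean
  satisfied: boolean
  reasoning: string
}

function scoreVerdicts(verdicts: Verdict[]): { score: 0 | 1; reason: string } {
  // Assumption: criteria are required unless explicitly marked `required: false`.
  const explicitlyGating = verdicts.filter(v => v.required !== false)
  // If nothing ends up required, treat every criterion as required.
  const gating = explicitlyGating.length > 0 ? explicitlyGating : verdicts
  const score = gating.every(v => v.satisfied) ? 1 : 0

  // One summary line, then one line per criterion with its verdict and reasoning.
  const lines = verdicts.map(v => {
    const status = v.satisfied ? 'met' : 'unmet'
    const kind = gating.includes(v) ? 'required' : 'optional'
    return `- [${status}] (${kind}) ${v.description}: ${v.reasoning}`
  })
  const summary = score === 1 ? 'All required criteria are satisfied.' : 'Some required criteria are unmet.'
  return { score, reason: [summary, ...lines].join('\n') }
}

// An unmet optional criterion is reported but does not block completion.
const graded = scoreVerdicts([
  { description: 'Includes an analysis section', required: true, satisfied: true, reasoning: 'Section 2 is an analysis.' },
  { description: 'Includes citations', required: false, satisfied: false, reasoning: 'No sources are cited.' },
])
```

In this example the score is 1 even though the citations criterion is unmet, and the reason still lists that criterion so the agent can see what was missed.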