Rubric scorer
The createRubricScorer() function creates an LLM-as-judge scorer that grades an agent's output against a rubric — a checklist of criteria. It returns a binary score: 1 only when every required criterion is satisfied, otherwise 0. The reason lists each criterion's verdict so the agent knows exactly what to fix.
This scorer is designed to drop into isTaskComplete. Because isTaskComplete treats score === 1 as "task complete" and injects the reason back into the conversation as feedback, the agent keeps iterating until the rubric is satisfied (or maxSteps is reached).
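The feedback loop described above can be pictured with a small self-contained simulation. This is only a sketch and not Mastra's implementation: `runUntilComplete`, `toyScorer`, and `toyStep` are hypothetical names, and the scorer here is a plain function rather than an LLM judge.

```typescript
// Hypothetical types for illustration; not Mastra's actual interfaces.
type ScoreResult = { score: number; reason: string }
type Scorer = (output: string) => ScoreResult

// Runs a step function until the scorer returns 1 or maxSteps is reached.
function runUntilComplete(
  step: (feedback: string | null) => string,
  scorer: Scorer,
  maxSteps: number,
): { output: string; steps: number; complete: boolean } {
  let feedback: string | null = null
  let output = ''
  for (let i = 1; i <= maxSteps; i++) {
    output = step(feedback)
    const result = scorer(output)
    if (result.score === 1) return { output, steps: i, complete: true }
    feedback = result.reason // the reason is fed into the next step
  }
  return { output, steps: maxSteps, complete: false }
}

// Toy scorer: requires the word "analysis" in the output.
const toyScorer: Scorer = (output) =>
  output.includes('analysis')
    ? { score: 1, reason: 'analysis section: satisfied' }
    : { score: 0, reason: 'analysis section: not satisfied' }

// Toy step: adds the missing section only after receiving feedback.
const toyStep = (feedback: string | null): string =>
  feedback === null ? 'draft' : 'draft with analysis'

const loopResult = runUntilComplete(toyStep, toyScorer, 10)
```

In this toy run the first draft fails, the reason is passed back, and the second attempt satisfies the rubric, so the loop stops well before `maxSteps`.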
Parameters
model:
criteria:
options:
.run() returns
score:
reason:
Usage with isTaskComplete
Define the rubric once, attach the scorer to isTaskComplete, and the agent self-corrects until the rubric is satisfied:
import { createRubricScorer } from '@mastra/evals/scorers/prebuilt'

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'The response includes an analysis section' },
    { description: 'The response includes concrete recommendations' },
  ],
})

const stream = await supervisor.stream('Research AI in education', {
  maxSteps: 10,
  isTaskComplete: {
    scorers: [rubricScorer],
    strategy: 'all',
  },
})
String rubric
A newline-delimited string is parsed into criteria, with common list markers (-, *, 1.) stripped. Every line becomes a required criterion:
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: `
- All tests pass in the test suite
- The function is named find_duplicates and accepts a single list argument
`,
})
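As a rough illustration of the parsing described above, the following sketch splits a rubric string into criteria. It is an assumption about the behavior, not the library's parser: `parseRubric` and the `Criterion` type are hypothetical names, and the exact set of markers the real scorer strips may differ.

```typescript
// Hypothetical shape mirroring the { description, required } objects on this page.
type Criterion = { description: string; required: boolean }

// Sketch: one criterion per non-empty line, with a leading list marker removed.
function parseRubric(rubric: string): Criterion[] {
  return rubric
    .split('\n')
    .map((line) => line.trim())
    // Strip a leading "-", "*", or "1."-style marker plus the whitespace after it.
    .map((line) => line.replace(/^(?:[-*]|\d+\.)\s+/, ''))
    .filter((line) => line.length > 0)
    // Every line from a string rubric is treated as required.
    .map((description) => ({ description, required: true }))
}

const parsed = parseRubric(`
- All tests pass in the test suite
* The function is named find_duplicates
2. It accepts a single list argument
`)
```

The marker pattern requires whitespace after the marker, so a criterion that begins with something like `1.5x faster` is left intact in this sketch.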
Optional criteria
Mark a criterion as optional to have it graded and reported without gating completion:
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'Includes an analysis section', required: true },
    { description: 'Includes citations', required: false },
  ],
})
Dynamic rubric per run
When criteria is omitted from the factory, the scorer resolves a rubric value from the run's request context, additional context, or input. This lets a single scorer instance grade a different rubric on each run without being rebuilt:
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
})

await supervisor.stream('Write find_duplicates', {
  isTaskComplete: { scorers: [rubricScorer] },
  requestContext: {
    rubric: '- All tests pass\n- The function is named find_duplicates',
  },
})
If no rubric resolves, the scorer returns 1 and does not gate the loop.
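The resolution and pass-through behavior can be sketched as follows. This is a hedged illustration only: `RunLike`, `resolveRubric`, and `scoreOrPassThrough` are hypothetical, the `rubric` key on `additionalContext` and `input` is an assumption, and the lookup order simply follows the order in which the sources are listed above.

```typescript
// Hypothetical run shape for illustration; the real run object may differ.
type RunLike = {
  requestContext?: { rubric?: string }
  additionalContext?: { rubric?: string }
  input?: { rubric?: string }
}

// Returns the first rubric found, checking the sources in the assumed order.
function resolveRubric(run: RunLike): string | undefined {
  return run.requestContext?.rubric ?? run.additionalContext?.rubric ?? run.input?.rubric
}

// With no rubric, return 1 so the completion loop is not gated.
function scoreOrPassThrough(run: RunLike, grade: (rubric: string) => number): number {
  const rubric = resolveRubric(run)
  return rubric === undefined ? 1 : grade(rubric)
}

// No rubric anywhere: the grader is never called and the score is 1.
const ungated = scoreOrPassThrough({}, () => 0)
```

Returning 1 when nothing resolves means a shared scorer instance stays harmless on runs that supply no rubric.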
Scoring details
The scorer runs in two phases:
- Grade: the judge model evaluates each criterion independently and returns a per-criterion verdict (satisfied / not) with reasoning.
- Score: the result is 1 only when every required criterion is satisfied, otherwise 0. If no criteria are marked required, all criteria are treated as required.
The reason summarizes the overall result and lists each criterion with its verdict, so a failing grade gives the agent targeted, actionable feedback rather than a generic "try again".
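The score phase can be sketched as a pure function over the judge's verdicts. This is a minimal sketch under stated assumptions, not the scorer's actual code: `Verdict` and `scoreVerdicts` are hypothetical names, the reason format is invented for illustration, and a criterion with no `required` field is assumed to be required.

```typescript
// Hypothetical verdict shape; the judge's real output format may differ.
type Verdict = {
  description: string
  required?: boolean
  satisfied: boolean
  reasoning: string
}

function scoreVerdicts(verdicts: Verdict[]): { score: 0 | 1; reason: string } {
  // Assumption: a criterion gates completion unless explicitly marked required: false.
  const marked = verdicts.filter((v) => v.required !== false)
  // If nothing is marked required, every criterion gates.
  const gating = marked.length > 0 ? marked : verdicts
  const score = gating.every((v) => v.satisfied) ? 1 : 0
  // List every criterion, including optional ones, so the feedback is specific.
  const lines = verdicts.map(
    (v) => `- [${v.satisfied ? 'satisfied' : 'not satisfied'}] ${v.description}: ${v.reasoning}`,
  )
  const summary =
    score === 1 ? 'All required criteria satisfied.' : 'Some required criteria are not satisfied.'
  return { score, reason: [summary, ...lines].join('\n') }
}

// An unsatisfied optional criterion is reported but does not block completion.
const graded = scoreVerdicts([
  { description: 'Includes an analysis section', required: true, satisfied: true, reasoning: 'Section present.' },
  { description: 'Includes citations', required: false, satisfied: false, reasoning: 'No sources cited.' },
])
```

Keeping optional criteria in the reason while excluding them from the gate matches the behavior described for optional criteria: they are graded and reported without blocking completion.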