# Rubric scorer

**Added in:** `@mastra/evals@1.3.0`

The `createRubricScorer()` function creates an LLM-as-judge scorer that grades an agent's output against a rubric (a checklist of criteria). It returns a **binary** score: `1` only when every required criterion is satisfied, otherwise `0`. The `reason` lists each criterion's verdict so the agent knows exactly what to fix.

This scorer is designed to drop into [`isTaskComplete`](https://mastra.ai/reference/streaming/agents/stream). Because `isTaskComplete` treats `score === 1` as "task complete" and injects the `reason` back into the conversation as feedback, the agent keeps iterating until the rubric is satisfied (or `maxSteps` is reached).

## Parameters

**model** (`MastraModelConfig`): The language model used to grade the output against the rubric. A smaller, cheaper model is usually sufficient for grading.

**criteria** (`RubricCriterion[] | string`): The rubric to grade against. A string is treated as a newline-delimited checklist (each line becomes a required criterion). If omitted, the rubric is read at run time from a \`rubric\` value on request/additional context; if none resolves, the scorer is a no-op and returns 1.

**options** (`RubricScorerOptions`): Configuration options for the scorer

## `.run()` returns

**score** (`number`): 1 when every required criterion is satisfied, otherwise 0 (multiplied by scale).

**reason** (`string`): A per-criterion explanation listing which criteria are met or unmet and why. This is the text that isTaskComplete injects back into the conversation as feedback.

## Usage with isTaskComplete

Define the rubric once, attach the scorer to `isTaskComplete`, and the agent self-corrects until the rubric is satisfied:

```typescript
import { Agent } from '@mastra/core/agent'
import { createRubricScorer } from '@mastra/evals/scorers/prebuilt'

const supervisor = new Agent({
  id: 'supervisor',
  instructions: `You coordinate research and writing using specialized agents. Delegate to research-agent for facts, then writing-agent for content.`,
  model: 'openai/gpt-5.5',
  agents: { researchAgent, writingAgent },
})

const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'The response includes an analysis section' },
    { description: 'The response includes concrete recommendations' },
  ],
})

const stream = await supervisor.stream('Research AI in education', {
  maxSteps: 10,
  isTaskComplete: {
    scorers: [rubricScorer],
    strategy: 'all',
  },
})
```

## String rubric

A newline-delimited string is parsed into criteria, with common list markers (`-`, `*`, `1.`) stripped. Every line becomes a required criterion:

```typescript
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: `- All tests pass in the test suite
- The function is named find_duplicates and accepts a single list argument`,
})
```

## Optional criteria

Mark a criterion as optional to have it graded and reported without gating completion:

```typescript
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
  criteria: [
    { description: 'Includes an analysis section', required: true },
    { description: 'Includes citations', required: false },
  ],
})
```

## Dynamic rubric per run

When no `criteria` is passed to the factory, the scorer resolves a `rubric` value from the run's request context, additional context, or input. This lets a single scorer instance grade different rubrics per run without rebuilding it:

```typescript
const rubricScorer = createRubricScorer({
  model: 'openai/gpt-5-mini',
})

await supervisor.stream('Write find_duplicates', {
  isTaskComplete: { scorers: [rubricScorer] },
  requestContext: {
    rubric: '- All tests pass\n- The function is named find_duplicates',
  },
})
```

If no rubric resolves, the scorer returns `1` and doesn't gate the loop.

## Scoring details

The scorer runs in two phases:

1. **Grade**: The judge model evaluates each criterion independently and returns a per-criterion verdict (`satisfied` / not) with reasoning.
2. **Score**: The result is `1` only when every required criterion is `satisfied`, otherwise `0`. If no criteria are marked required, all criteria are treated as required.

The `reason` summarizes the overall result and lists each criterion with its verdict, so a failing grade gives the agent targeted, actionable feedback rather than a generic "try again".

## Related

- [isTaskComplete on stream()](https://mastra.ai/reference/streaming/agents/stream)
- [Supervisor agents](https://mastra.ai/docs/agents/supervisor-agents)
- [createScorer](https://mastra.ai/reference/evals/create-scorer)