# Gates and verdicts

Gates and verdicts add severity semantics to `runEvals`. Gates are scorers that must score 1.0: hard requirements that block a run. Thresholds are minimum acceptable scores on tracked metrics. The verdict summarizes the outcome as `passed`, `scored`, or `failed`.

## When to use gates and verdicts

- Enforce hard requirements in CI (e.g., "agent must call the right tool")
- Track quality metrics with minimum thresholds (e.g., "faithfulness above 0.7")
- Get a single verdict signal (`passed`, `scored`, or `failed`) from an eval run without writing custom assertion logic
- Separate "must pass" gates from "nice to have" tracked metrics

## Quickstart

```typescript
import { runEvals } from '@mastra/core/evals'
import { checks } from '@mastra/evals/checks'
import { weatherAgent } from '../agents'
import { faithfulnessScorer } from '../scorers'

const result = await runEvals({
  data: [{ input: 'What is the weather in Brooklyn?' }],
  target: weatherAgent,

  // Gates: must all score 1.0 or the run fails
  gates: [checks.calledTool('get_weather'), checks.noToolErrors()],

  // Scorers: tracked with optional thresholds
  scorers: [
    { scorer: faithfulnessScorer, threshold: 0.7 },
    checks.includes('Brooklyn'), // no threshold = tracked only
  ],
})

console.log(result.verdict) // 'passed' | 'scored' | 'failed'
```

## How verdicts work

The verdict is computed from gates and thresholds after all data items are processed:

- `failed`: At least one gate averaged below 1.0 across data items
- `scored`: All gates passed, but at least one threshold scorer missed its threshold
- `passed`: All gates scored 1.0 and all thresholds were met

When no gates or threshold-bearing scorers are provided, the verdict field is omitted and `runEvals` behaves exactly as before.

## Gates

Gates are scorers passed via the `gates` field. They run before regular scorers on each data item. A gate must average a score of 1.0 across all data items to pass.
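To make the averaging rule and the verdict rules concrete, here is a minimal sketch of how they could be expressed. It is not Mastra's implementation: `averageScore`, `gatePassed`, and `deriveVerdict` are hypothetical helpers, and the sketch assumes every score falls in the 0 to 1 range and that an empty score list averages to 0.

```typescript
// Illustrative sketch only: these helpers are hypothetical and not part of @mastra/core.
type Verdict = 'passed' | 'scored' | 'failed'

// Mean of per-item scores; an empty list averages to 0 here (an assumption).
function averageScore(scores: number[]): number {
  if (scores.length === 0) return 0
  return scores.reduce((sum, s) => sum + s, 0) / scores.length
}

// A gate passes only when its average is 1.0. Assuming scores lie in [0, 1],
// that means every data item scored 1.
function gatePassed(scoresPerItem: number[]): boolean {
  return averageScore(scoresPerItem) === 1
}

// Applies the rules in order: gates first, then thresholds.
// Returns undefined when there is nothing to judge, mirroring the omitted verdict field.
function deriveVerdict(gates: boolean[], thresholds: boolean[]): Verdict | undefined {
  if (gates.length === 0 && thresholds.length === 0) return undefined
  if (gates.some(passed => !passed)) return 'failed'
  if (thresholds.some(passed => !passed)) return 'scored'
  return 'passed'
}

// One failing item out of three drops the gate average below 1.0.
const calledTool = gatePassed([1, 1, 0])
const noToolErrors = gatePassed([1, 1, 1])

console.log(deriveVerdict([calledTool, noToolErrors], [true])) // 'failed'
console.log(deriveVerdict([noToolErrors], [false])) // 'scored'
```

Because the gate check is applied to the average, a single data item scoring 0 is enough to fail the whole run, regardless of how the threshold scorers perform.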
```typescript
import { runEvals } from '@mastra/core/evals'
import { checks } from '@mastra/evals/checks'

const result = await runEvals({
  data: [{ input: 'What is the weather?' }],
  target: weatherAgent,
  gates: [checks.calledTool('get_weather')],
  scorers: [qualityScorer],
})

// result.gateResults: [{ id: 'check-called-tool', passed: true, score: 1 }]
```

Any scorer works as a gate. Quick Checks are a natural fit because they return binary 1/0 scores.

> **Note:** Visit [runEvals() reference](https://mastra.ai/reference/evals/run-evals) for the full parameter and return type documentation.

## Thresholds

Wrap a scorer in `{ scorer, threshold }` to set pass/fail bounds. The threshold is compared against the scorer's average score across all data items.

A `threshold` can be:

- **A number**: implies minimum (score at or above passes), e.g. `{ scorer, threshold: 0.7 }`
- **An object with `min` and/or `max`**: for range-based checks, e.g. `{ scorer, threshold: { max: 0.3 } }`

Use `max` for scorers where a high score is bad (e.g., hallucination, toxicity). Use `{ min, max }` when the score should fall within a specific band.
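The comparison described above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: `Threshold` and `meetsThreshold` are hypothetical names, and treating `max` as inclusive is an assumption (the text only states inclusivity for the numeric minimum form).

```typescript
// Illustrative sketch only: Threshold and meetsThreshold are hypothetical names.
type Threshold = number | { min?: number; max?: number }

function meetsThreshold(averageScore: number, threshold: Threshold): boolean {
  // A bare number is shorthand for a minimum bound.
  const bounds: { min?: number; max?: number } =
    typeof threshold === 'number' ? { min: threshold } : threshold

  // A missing bound places no constraint on that side.
  const aboveMin = bounds.min === undefined || averageScore >= bounds.min
  // Assumption: the upper bound is inclusive, like the lower one.
  const belowMax = bounds.max === undefined || averageScore <= bounds.max
  return aboveMin && belowMax
}

console.log(meetsThreshold(0.85, 0.7)) // true
console.log(meetsThreshold(0.1, { max: 0.3 })) // true
console.log(meetsThreshold(0.9, { min: 0.3, max: 0.8 })) // false
```

Normalizing the number shorthand into a `{ min }` object first keeps the comparison itself to a single code path for all three threshold shapes.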
```typescript
import { runEvals } from '@mastra/core/evals'

const result = await runEvals({
  data: [{ input: 'Explain quantum computing' }],
  target: myAgent,
  scorers: [
    { scorer: faithfulnessScorer, threshold: 0.7 }, // min threshold (number shorthand)
    { scorer: hallucinationScorer, threshold: { max: 0.3 } }, // max threshold: high score = bad
    { scorer: verbosityScorer, threshold: { min: 0.3, max: 0.8 } }, // range threshold
    toneScorer, // bare scorer, no threshold: tracked only
  ],
})

// result.thresholdResults:
// [
//   { id: 'faithfulness', passed: true, averageScore: 0.85, threshold: 0.7 },
//   { id: 'hallucination', passed: true, averageScore: 0.1, threshold: { max: 0.3 } },
//   { id: 'verbosity', passed: false, averageScore: 0.9, threshold: { min: 0.3, max: 0.8 } },
// ]
```

A bare scorer (no threshold) still appears in `result.scores` but does not affect the verdict.

## Using verdicts in CI

The verdict gives a single signal for CI pipelines:

```typescript
import { runEvals } from '@mastra/core/evals'
import { checks } from '@mastra/evals/checks'

const result = await runEvals({
  data: testDataset,
  target: myAgent,
  gates: [checks.calledTool('search'), checks.noToolErrors()],
  scorers: [{ scorer: faithfulnessScorer, threshold: 0.7 }],
})

if (result.verdict === 'failed') {
  console.error(
    'Gate failures:',
    result.gateResults?.filter(g => !g.passed),
  )
  process.exit(1)
}

if (result.verdict === 'scored') {
  console.warn(
    'Threshold misses:',
    result.thresholdResults?.filter(t => !t.passed),
  )
}
```

## Related

- [Quick Checks](https://mastra.ai/docs/evals/quick-checks): Zero-LLM micro-scorers that work well as gates
- [runEvals() reference](https://mastra.ai/reference/evals/run-evals): Full API documentation
- [Built-in scorers](https://mastra.ai/docs/evals/built-in-scorers): LLM-based and code-based scorers
- [Running evals in CI](https://mastra.ai/docs/evals/running-in-ci): CI integration patterns