# Running Scorers in CI

Running scorers in your CI pipeline provides quantifiable metrics for measuring agent quality over time. The `runEvals` function processes multiple test cases through your agent or workflow and returns aggregate scores.

## Basic Setup

You can use any testing framework that supports ESM modules, such as [Vitest](https://vitest.dev/), [Jest](https://jestjs.io/), or [Mocha](https://mochajs.org/). In CI, invoke the suite with your framework's non-interactive command (for example, `npx vitest run`) so the job exits non-zero when an assertion fails. A Vitest configuration sketch for long-running agent calls appears at the end of this page.

## Creating Test Cases

Use `runEvals` to evaluate your agent against multiple test cases. The function accepts an array of data items, each containing an `input` and an optional `groundTruth` for scorer validation.

```typescript
import { describe, it, expect } from 'vitest';
import { runEvals } from '@mastra/core/evals';
import { weatherAgent } from './weather-agent';
import { locationScorer } from '../scorers/location-scorer';

describe('Weather Agent Tests', () => {
  it('should correctly extract locations from queries', async () => {
    const result = await runEvals({
      data: [
        {
          input: 'weather in Berlin',
          groundTruth: { expectedLocation: 'Berlin', expectedCountry: 'DE' }
        },
        {
          input: 'weather in Berlin, Maryland',
          groundTruth: { expectedLocation: 'Berlin', expectedCountry: 'US' }
        },
        {
          input: 'weather in Berlin, Russia',
          groundTruth: { expectedLocation: 'Berlin', expectedCountry: 'RU' }
        },
      ],
      target: weatherAgent,
      scorers: [locationScorer]
    });

    // Assert the aggregate score and the number of processed items
    expect(result.scores['location-accuracy']).toBe(1);
    expect(result.summary.totalItems).toBe(3);
  });
});
```

## Understanding Results

The `runEvals` function returns an object with:

- `scores`: Average scores for each scorer across all test cases
- `summary.totalItems`: Total number of test cases processed

```typescript
{
  scores: {
    'location-accuracy': 1.0, // Average score across all items
    'another-scorer': 0.85
  },
  summary: {
    totalItems: 3
  }
}
```

Because `toBe(1)` requires a perfect average, assertions like the one above can be flaky for nondeterministic agents; a threshold-based alternative is sketched at the end of this page.

## Multiple Test Scenarios

Create separate test cases for different evaluation scenarios:

```typescript
import { describe, it, expect } from 'vitest';
import { createScorer, runEvals } from '@mastra/core/evals';
import { weatherAgent } from './weather-agent';

describe('Weather Agent Tests', () => {
  const locationScorer = createScorer({ /* ... */ });

  it('should handle location disambiguation', async () => {
    const result = await runEvals({
      data: [
        { input: 'weather in Berlin', groundTruth: { /* ... */ } },
        { input: 'weather in Berlin, Maryland', groundTruth: { /* ... */ } },
      ],
      target: weatherAgent,
      scorers: [locationScorer]
    });

    expect(result.scores['location-accuracy']).toBe(1);
  });

  it('should handle typos and misspellings', async () => {
    const result = await runEvals({
      data: [
        {
          input: 'weather in Berln',
          groundTruth: { expectedLocation: 'Berlin', expectedCountry: 'DE' }
        },
        {
          input: 'weather in Parris',
          groundTruth: { expectedLocation: 'Paris', expectedCountry: 'FR' }
        },
      ],
      target: weatherAgent,
      scorers: [locationScorer]
    });

    expect(result.scores['location-accuracy']).toBe(1);
  });
});
```

## Next Steps

- Learn about [creating custom scorers](https://mastra.ai/docs/evals/custom-scorers)
- Explore [built-in scorers](https://mastra.ai/docs/evals/built-in-scorers)
- Read the [runEvals API reference](https://mastra.ai/reference/evals/run-evals)
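## Configuring the Test Runner

As noted under Basic Setup, real agent calls hit live LLM APIs and routinely exceed default test timeouts. One minimal Vitest configuration sketch for that; the 60-second value is an illustrative choice, not a Mastra requirement:

```typescript
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Agent calls go over the network, so allow well beyond Vitest's 5s default.
    testTimeout: 60_000,
  },
});
```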
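## Setting Score Thresholds

Exact-equality assertions like `toBe(1)` demand a perfect average on every run. For LLM-backed agents, gating CI on a minimum average score is often more robust. A minimal sketch, reusing the `weatherAgent`, `locationScorer`, and `location-accuracy` names from the examples above; the `0.8` floor and the test inputs are illustrative:

```typescript
import { it, expect } from 'vitest';
import { runEvals } from '@mastra/core/evals';
import { weatherAgent } from './weather-agent';
import { locationScorer } from '../scorers/location-scorer';

it('keeps location accuracy above the CI threshold', async () => {
  const result = await runEvals({
    data: [
      {
        input: 'weather in Berlin',
        groundTruth: { expectedLocation: 'Berlin', expectedCountry: 'DE' }
      },
      {
        input: 'weather in Paris',
        groundTruth: { expectedLocation: 'Paris', expectedCountry: 'FR' }
      },
    ],
    target: weatherAgent,
    scorers: [locationScorer]
  });

  // Fail the CI job only when the average score drops below the agreed floor.
  expect(result.scores['location-accuracy']).toBeGreaterThanOrEqual(0.8);
});
```

Since `scores` holds averages across all items, a floor like `0.8` tolerates an occasional partial miss on a large dataset while still catching systematic regressions.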