# Testing your agents with evals

Evals are automated tests that score agent outputs using model-graded, rule-based, and statistical methods. Each eval returns a normalized score between 0 and 1 that can be logged and compared. Evals can be customized with your own prompts and scoring functions.
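
As a minimal sketch of a custom rule-based metric, the example below assumes the `Metric` base class and `MetricResult` type are exported from `@mastra/core/eval` (the same module that provides `evaluate`); the word-limit rule itself is purely illustrative:

```typescript
import { Metric, type MetricResult } from "@mastra/core/eval";

// Illustrative rule-based metric: full score when the output
// stays under a word limit, zero otherwise.
export class WordLimitMetric extends Metric {
  constructor(private maxWords: number = 100) {
    super();
  }

  async measure(_input: string, output: string): Promise<MetricResult> {
    const words = output.trim().split(/\s+/).filter(Boolean).length;
    return {
      score: words <= this.maxWords ? 1 : 0, // normalized to 0-1
      info: { words, maxWords: this.maxWords },
    };
  }
}
```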

Evals can run in the cloud, capturing results in real time. They can also run as part of your CI/CD pipeline, letting you test and monitor your agents over time.

## How to use evals

Evals need to be added to an agent. To use one of the default metrics, do the following:

```typescript copy showLineNumbers filename="src/mastra/agents/index.ts"
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";
import { ToneConsistencyMetric } from "@mastra/evals/nlp";

export const myAgent = new Agent({
  name: "My Agent",
  instructions: "You are a helpful assistant.",
  model: openai("gpt-4o-mini"),
  evals: {
    tone: new ToneConsistencyMetric(),
  },
});
```

You can now view the evals in the Mastra dashboard when running `mastra dev`.
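
You can also call a metric directly, which is handy for quick experiments. This is a rough sketch assuming each metric exposes a `measure(input, output)` method that returns the score and supporting details:

```typescript
import { ToneConsistencyMetric } from "@mastra/evals/nlp";

const metric = new ToneConsistencyMetric();

// Compare the tone of a response against the prompt that produced it.
const result = await metric.measure(
  "Write a friendly greeting.",
  "Hello there! Hope you're having a great day!",
);

console.log(result.score); // normalized score between 0 and 1
```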

## Executing evals in your CI/CD pipeline

We support any testing framework that supports ESM modules. For example, you can use Vitest, Jest, or Mocha to run evals in your CI/CD pipeline.

```typescript copy showLineNumbers filename="src/mastra/agents/index.test.ts"
import { describe, it, expect } from 'vitest';
import { evaluate } from '@mastra/core/eval';
import { ToneConsistencyMetric } from '@mastra/evals/nlp';
import { myAgent } from './index';

describe('My Agent', () => {
  it('should be able to validate tone consistency', async () => {
    const metric = new ToneConsistencyMetric();
    const result = await evaluate(myAgent, 'Hello, world!', metric);

    expect(result.score).toBe(1);
  });
});
```

You will need to configure a `testSetup` and `globalSetup` script for your testing framework to capture the eval results. This allows Mastra to show the results in your dashboard.

### Vitest

These are the files you need to add to your project to run evals in your CI/CD pipeline and let Mastra capture the results. Without them, the evals will still run and fail when necessary, but you won't be able to see the results in the Mastra dashboard.

```typescript copy showLineNumbers filename="globalSetup.ts"
import { globalSetup } from '@mastra/evals';

export default function setup() {
  globalSetup();
}
```

```typescript copy showLineNumbers filename="testSetup.ts"
import { beforeAll } from 'vitest';
import { attachListeners } from '@mastra/evals';

beforeAll(async () => {
  await attachListeners();
});
```

#### Store evals in Mastra Storage

Pass your Mastra instance to `attachListeners` to store evals in the configured storage:

```typescript copy showLineNumbers filename="testSetup.ts"
import { beforeAll } from 'vitest';
import { attachListeners } from '@mastra/evals';
import { mastra } from './your-mastra-setup';

beforeAll(async () => {
  // Store evals in Mastra Storage (requires storage to be enabled)
  await attachListeners(mastra);
});
```

This allows you to save evals in Mastra Storage. With file storage, evals persist and can be queried later. With in-memory storage, evals are isolated to the test process.
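
As a sketch of the file-backed option, a Mastra instance might be configured with LibSQL storage. This assumes the `@mastra/libsql` package and its `LibSQLStore` export; the file path and the `./agents` import are illustrative:

```typescript
import { Mastra } from "@mastra/core";
import { LibSQLStore } from "@mastra/libsql";
import { myAgent } from "./agents"; // path illustrative

// File-backed storage: eval results persist across test runs.
// Use url: ":memory:" instead to keep them scoped to the process.
export const mastra = new Mastra({
  agents: { myAgent },
  storage: new LibSQLStore({
    url: "file:./mastra.db",
  }),
});
```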


Finally, point Vitest at both setup files in your config:

```typescript copy showLineNumbers filename="vitest.config.ts"
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    globalSetup: './globalSetup.ts',
    setupFiles: ['./testSetup.ts'],
  },
})
```