Running Evals in CI

Running evals in your CI pipeline gives you quantifiable metrics for measuring agent quality over time.

Setting Up CI Integration

Any testing framework that supports ES modules (ESM) will work. For example, you can use Vitest, Jest, or Mocha to run evals in your CI/CD pipeline.

src/mastra/agents/index.test.ts


import { describe, it, expect } from "vitest";
import { evaluate } from "@mastra/evals";
import { ToneConsistencyMetric } from "@mastra/evals/nlp";
import { myAgent } from "./index";
 
describe("My Agent", () => {
  it("should validate tone consistency", async () => {
    const metric = new ToneConsistencyMetric();
    // Run the agent on the input and score its response with the metric.
    const result = await evaluate(myAgent, "Hello, world!", metric);

    expect(result.score).toBe(1);
  });
});
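Metric scores can be fractional, so asserting a minimum score is often less brittle than asserting an exact value. The sketch below is a hypothetical helper, not part of @mastra/evals: failingMetrics takes a map of metric names to scores and a map of minimum thresholds, and returns every metric that is missing or below its threshold, so a single assertion reports all failures at once. The metric names and threshold values are assumptions for illustration.

```typescript
// Minimum acceptable score per metric. The names and values are
// illustrative assumptions; choose thresholds that match your own baseline.
type Thresholds = Record<string, number>;

// Returns the names of metrics that are missing, not a number, or below
// their threshold, so one assertion can list every failing metric.
function failingMetrics(
  scores: Record<string, number>,
  thresholds: Thresholds,
): string[] {
  const failures: string[] = [];
  for (const [metric, minScore] of Object.entries(thresholds)) {
    const score = scores[metric];
    // A missing or NaN score counts as a failure rather than a silent pass.
    if (typeof score !== "number" || Number.isNaN(score) || score < minScore) {
      failures.push(metric);
    }
  }
  return failures;
}

// Example use inside the Vitest test above (not executed here):
//   expect(failingMetrics({ toneConsistency: result.score }, { toneConsistency: 0.8 })).toEqual([]);
```

Returning the list of failing names, rather than a boolean, means a failed CI run shows which metrics regressed without re-running the suite locally.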

You will need to configure testSetup and globalSetup scripts for your testing framework to capture the eval results. These scripts let Mastra show the results in your Mastra dashboard.

Framework Configuration

Vitest Setup

Add these files to your project to run evals in your CI/CD pipeline:

globalSetup.ts


import { globalSetup } from "@mastra/evals";
 
export default function setup() {
  globalSetup();
}

testSetup.ts


import { beforeAll } from "vitest";
import { attachListeners } from "@mastra/evals";
 
beforeAll(async () => {
  await attachListeners();
});

vitest.config.ts


import { defineConfig } from "vitest/config";
 
export default defineConfig({
  test: {
    globalSetup: "./globalSetup.ts",
    setupFiles: ["./testSetup.ts"],
  },
});
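Evals that call a model can take longer than Vitest's default test timeout of 5 seconds. A variant of the config above that raises the timeout and restricts the run to test files under src/ might look like the following sketch; the 60-second value and the include glob are assumptions to adjust for your project.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Same setup scripts as in the config above.
    globalSetup: "./globalSetup.ts",
    setupFiles: ["./testSetup.ts"],
    // Assumed glob: only collect test files under src/.
    include: ["src/**/*.test.ts"],
    // Model-backed evals can be slow; 60 s is an illustrative value.
    testTimeout: 60_000,
  },
});
```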

Storage Configuration

To store eval results in Mastra Storage and view them in the Mastra dashboard, pass your Mastra instance to attachListeners in testSetup.ts:

testSetup.ts


import { beforeAll } from "vitest";
import { attachListeners } from "@mastra/evals";
import { mastra } from "./your-mastra-setup"; // replace with the path to the file that exports your Mastra instance
 
beforeAll(async () => {
  // Store evals in Mastra Storage (requires storage to be enabled)
  await attachListeners(mastra);
});

With file storage, eval results persist across runs and can be queried later. With memory storage, they exist only for the lifetime of the test process.
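Because file storage keeps results across runs, you can compare the latest CI run against earlier ones. The sketch below assumes you have already loaded past scores for a single metric into plain objects; the EvalRun shape, the isRegression helper, and the default tolerance of 0.05 are hypothetical and do not reflect Mastra's storage schema or API. It flags a regression when the latest run's mean score falls more than the tolerance below the average of earlier runs' means.

```typescript
// Hypothetical record of one CI run's scores for a single metric. This is
// plain data you might load from storage, not Mastra's schema.
interface EvalRun {
  runId: string;
  scores: number[];
}

// Arithmetic mean; refuses empty input instead of returning NaN.
function mean(values: number[]): number {
  if (values.length === 0) throw new Error("cannot average an empty list");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Returns true when the latest run's mean is more than `tolerance` below
// the baseline, defined as the average of the per-run means of `history`.
function isRegression(
  history: EvalRun[],
  latest: EvalRun,
  tolerance = 0.05, // illustrative default, tune to your metric's noise
): boolean {
  if (history.length === 0) return false; // no baseline yet
  const baseline = mean(history.map((run) => mean(run.scores)));
  return mean(latest.scores) < baseline - tolerance;
}
```

Averaging per-run means weights every run equally, so a run with many test cases does not dominate the baseline.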