
runExperiment

The runExperiment function enables batch evaluation of agents and workflows: it runs multiple test cases against a target and evaluates each result with one or more scorers, with configurable concurrency. This makes it useful for systematic testing, performance analysis, and validation of AI systems.

Usage Example

import { runExperiment } from "@mastra/core/scores";
import { myAgent } from "./agents/my-agent";
import { myScorer1, myScorer2 } from "./scorers";

const result = await runExperiment({
  target: myAgent,
  data: [
    { input: "What is machine learning?" },
    { input: "Explain neural networks" },
    { input: "How does AI work?" },
  ],
  scorers: [myScorer1, myScorer2],
  concurrency: 2,
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Completed: ${item.input}`);
    console.log(`Scores:`, scorerResults);
  },
});

console.log(`Average scores:`, result.scores);
console.log(`Processed ${result.summary.totalItems} items`);

Parameters

target: Agent | Workflow
The agent or workflow to evaluate.

data: RunExperimentDataItem[]
Array of test cases with input data and optional ground truth.

scorers: MastraScorer[] | WorkflowScorerConfig
Array of scorers for agents, or a configuration object for workflows that specifies scorers for the workflow as a whole and for individual steps.

concurrency?: number = 1
Number of test cases to run concurrently.

onItemComplete?: function
Callback invoked after each test case completes. Receives the item, the target result, and the scorer results.

Data Item Structure

input: string | string[] | CoreMessage[] | any
Input data for the target. For agents: messages or strings. For workflows: workflow input data.

groundTruth?: any
Expected or reference output to compare against during scoring.

runtimeContext?: RuntimeContext
Runtime context to pass to the target during execution.

tracingContext?: TracingContext
Tracing context for observability and debugging.
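
As a sketch of how these fields fit together in a data array (the RuntimeContext import path and the "temperature" key are assumptions for illustration, not part of this reference):

import { RuntimeContext } from "@mastra/core/runtime-context"; // assumed import path

// groundTruth is only read by scorers; runtimeContext is forwarded to the target.
const runtimeContext = new RuntimeContext();
runtimeContext.set("temperature", 0.2); // hypothetical key, for illustration

const data = [
  {
    input: "What is machine learning?",
    groundTruth: "Machine learning trains models to find patterns in data.",
    runtimeContext,
  },
  { input: "Explain neural networks" }, // all fields besides input are optional
];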

Workflow Scorer Configuration

For workflows, you can specify scorers at different levels using WorkflowScorerConfig:

workflow?: MastraScorer[]
Array of scorers that evaluate the entire workflow output.

steps?: Record<string, MastraScorer[]>
Object mapping step IDs to arrays of scorers that evaluate individual step outputs.
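
The Workflow Evaluation example below shows this configuration inside a full runExperiment call; as a standalone sketch (scorer names and step IDs are placeholders):

const workflowScorers = {
  // Applied to the workflow's final output
  workflow: [outputQualityScorer],
  // Applied to the output of a specific step, keyed by step ID
  steps: {
    "validation-step": [validationScorer],
  },
};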

Returns

scores: Record<string, any>
Average scores across all test cases, keyed by scorer name.

summary: object
Summary information about the experiment run.

summary.totalItems: number
Total number of test cases processed.
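
A minimal sketch of consuming the return value, reusing the agent and scorer from the usage example above (the exact shape of each per-scorer entry is not specified here, so values are logged generically):

const result = await runExperiment({
  target: myAgent,
  data: [{ input: "What is machine learning?" }],
  scorers: [myScorer1],
});

console.log(`Processed ${result.summary.totalItems} items`);
// result.scores is keyed by scorer name; each entry holds that scorer's averaged result
for (const [scorerName, averaged] of Object.entries(result.scores)) {
  console.log(scorerName, averaged);
}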

Examples

Agent Evaluation

import { runExperiment, createScorer } from "@mastra/core/scores";
import { chatAgent } from "./agents/chat-agent"; // adjust to your agent's location

// Custom scorer that checks whether the agent's response contains the ground truth
const myScorer = createScorer({
  name: "My Scorer",
  description: "Check if Agent's response contains ground truth",
  type: "agent",
}).generateScore(({ run }) => {
  const response = run.output[0]?.content || "";
  const expectedResponse = run.groundTruth;
  return response.includes(expectedResponse) ? 1 : 0;
});

const result = await runExperiment({
  target: chatAgent,
  data: [
    {
      input: "What is AI?",
      groundTruth:
        "AI is a field of computer science that creates intelligent machines.",
    },
    {
      input: "How does machine learning work?",
      groundTruth:
        "Machine learning uses algorithms to learn patterns from data.",
    },
  ],
  scorers: [myScorer],
  concurrency: 3,
});
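
Because each data item's groundTruth is passed through to the scorer's run, a custom scorer like the one above can compare the agent's response against the expected answer without any extra wiring.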

Workflow Evaluation

const workflowResult = await runExperiment({
  target: myWorkflow,
  data: [
    { input: { query: "Process this data", priority: "high" } },
    { input: { query: "Another task", priority: "low" } },
  ],
  scorers: {
    workflow: [outputQualityScorer],
    steps: {
      "validation-step": [validationScorer],
      "processing-step": [processingScorer],
    },
  },
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Workflow completed for: ${item.input.query}`);
    if (scorerResults.workflow) {
      console.log("Workflow scores:", scorerResults.workflow);
    }
    if (scorerResults.steps) {
      console.log("Step scores:", scorerResults.steps);
    }
  },
});