# runEvals

The `runEvals` function enables batch evaluation of agents and workflows by running multiple test cases against scorers concurrently. This is essential for systematic testing, performance analysis, and validation of AI systems.

## Usage Example

```typescript
import { runEvals } from "@mastra/core/evals";
import { myAgent } from "./agents/my-agent";
import { myScorer1, myScorer2 } from "./scorers";

const result = await runEvals({
  target: myAgent,
  data: [
    { input: "What is machine learning?" },
    { input: "Explain neural networks" },
    { input: "How does AI work?" },
  ],
  scorers: [myScorer1, myScorer2],
  concurrency: 2,
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Completed: ${item.input}`);
    console.log(`Scores:`, scorerResults);
  },
});

console.log(`Average scores:`, result.scores);
console.log(`Processed ${result.summary.totalItems} items`);
```

## Parameters

**target:** (`Agent | Workflow`): The agent or workflow to evaluate.

**data:** (`RunEvalsDataItem[]`): Array of test cases with input data and optional ground truth.

**scorers:** (`MastraScorer[] | WorkflowScorerConfig`): Array of scorers for agents, or a configuration object for workflows specifying scorers for the workflow and for individual steps.

**concurrency?:** (`number`): Number of test cases to run concurrently. (Default: `1`)

**onItemComplete?:** (`function`): Callback invoked after each test case completes. Receives the item, the target result, and the scorer results.

## Data Item Structure

**input:** (`string | string[] | CoreMessage[] | any`): Input data for the target. For agents: messages or strings. For workflows: workflow input data.

**groundTruth?:** (`any`): Expected or reference output for comparison during scoring.

**requestContext?:** (`RequestContext`): Request context to pass to the target during execution.

**tracingContext?:** (`TracingContext`): Tracing context for observability and debugging.
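The fields above can be sketched as a plain object shape. The `EvalItem` type below is an illustrative stand-in for `RunEvalsDataItem` (not the actual exported type), showing that only `input` is required:

```typescript
// Illustrative stand-in for RunEvalsDataItem -- not the actual exported type.
// Field names follow the Data Item Structure above; only `input` is required.
type EvalItem = {
  input: string; // agent input; could also be messages or workflow input data
  groundTruth?: string; // optional reference output for scorers to compare against
};

const data: EvalItem[] = [
  { input: "What is machine learning?", groundTruth: "ML finds patterns in data." },
  { input: "Explain neural networks" }, // no groundTruth: reference-free scorers only
];

// Scorers that rely on groundTruth should guard against items that omit it.
const withTruth = data.filter((d) => d.groundTruth !== undefined);
console.log(withTruth.length); // 1
```

Mixing items with and without `groundTruth` is fine; scorers that do not use a reference output simply ignore the field.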
## Workflow Scorer Configuration

For workflows, you can specify scorers at different levels using `WorkflowScorerConfig`:

**workflow?:** (`MastraScorer[]`): Array of scorers to evaluate the entire workflow output.

**steps?:** (`Record<string, MastraScorer[]>`): Object mapping step IDs to arrays of scorers for evaluating individual step outputs.

## Returns

**scores:** (`Record<string, number>`): Average scores across all test cases, organized by scorer name.

**summary:** (`object`): Summary information about the experiment execution.

**summary.totalItems:** (`number`): Total number of test cases processed.

## Examples

### Agent Evaluation

```typescript
import { createScorer, runEvals } from "@mastra/core/evals";
import { chatAgent } from "./agents/chat-agent";

const myScorer = createScorer({
  id: "my-scorer",
  description: "Check if the agent's response contains the ground truth",
  type: "agent",
}).generateScore(({ run }) => {
  const response = run.output[0]?.content || "";
  const expectedResponse = run.groundTruth;
  return response.includes(expectedResponse) ? 1 : 0;
});

const result = await runEvals({
  target: chatAgent,
  data: [
    {
      input: "What is AI?",
      groundTruth: "AI is a field of computer science that creates intelligent machines.",
    },
    {
      input: "How does machine learning work?",
      groundTruth: "Machine learning uses algorithms to learn patterns from data.",
    },
  ],
  scorers: [myScorer],
  concurrency: 3,
});
```

### Workflow Evaluation

```typescript
const workflowResult = await runEvals({
  target: myWorkflow,
  data: [
    { input: { query: "Process this data", priority: "high" } },
    { input: { query: "Another task", priority: "low" } },
  ],
  scorers: {
    workflow: [outputQualityScorer],
    steps: {
      "validation-step": [validationScorer],
      "processing-step": [processingScorer],
    },
  },
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Workflow completed for: ${item.input.query}`);
    if (scorerResults.workflow) {
      console.log("Workflow scores:", scorerResults.workflow);
    }
    if (scorerResults.steps) {
      console.log("Step scores:", scorerResults.steps);
    }
  },
});
```

## Related

- [createScorer()](https://mastra.ai/reference/evals/create-scorer) - Create custom scorers for experiments
- [MastraScorer](https://mastra.ai/reference/evals/mastra-scorer) - Learn about scorer structure and methods
- [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) - Guide to building evaluation logic
- [Scorers Overview](https://mastra.ai/docs/evals/overview) - Understanding scorer concepts