# runEvals

The `runEvals` function enables batch evaluation of agents and workflows by running multiple test cases against scorers concurrently. This is essential for systematic testing, performance analysis, and validation of AI systems.

## Usage Example

```typescript
import { runEvals } from "@mastra/core/evals";
import { myAgent } from "./agents/my-agent";
import { myScorer1, myScorer2 } from "./scorers";

const result = await runEvals({
  target: myAgent,
  data: [
    { input: "What is machine learning?" },
    { input: "Explain neural networks" },
    { input: "How does AI work?" },
  ],
  scorers: [myScorer1, myScorer2],
  concurrency: 2,
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Completed: ${item.input}`);
    console.log(`Scores:`, scorerResults);
  },
});

console.log(`Average scores:`, result.scores);
console.log(`Processed ${result.summary.totalItems} items`);
```

## Parameters

**target:** (`Agent | Workflow`): The agent or workflow to evaluate.

**data:** (`RunEvalsDataItem[]`): Array of test cases with input data and optional ground truth.

**scorers:** (`MastraScorer[] | WorkflowScorerConfig`): Array of scorers for agents, or a configuration object for workflows specifying scorers for the workflow and for individual steps.

**concurrency?:** (`number`): Number of test cases to run concurrently. (Default: `1`)

**onItemComplete?:** (`function`): Callback invoked after each test case completes. Receives the item, the target result, and the scorer results.

## Data Item Structure

**input:** (`string | string[] | CoreMessage[] | any`): Input data for the target. For agents: messages or strings. For workflows: workflow input data.

**groundTruth?:** (`any`): Expected or reference output for comparison during scoring.

**requestContext?:** (`RequestContext`): Request context to pass to the target during execution.

**tracingContext?:** (`TracingContext`): Tracing context for observability and debugging.
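The fields above can be sketched as a plain object shape. The `EvalItem` type below is an illustrative stand-in for `RunEvalsDataItem` (not the actual exported type), showing that only `input` is required:

```typescript
// Illustrative stand-in for RunEvalsDataItem -- not the actual exported type.
// Field names follow the Data Item Structure above; only `input` is required.
type EvalItem = {
  input: string; // agent input; could also be messages or workflow input data
  groundTruth?: string; // optional reference output for scorers to compare against
};

const data: EvalItem[] = [
  { input: "What is machine learning?", groundTruth: "ML finds patterns in data." },
  { input: "Explain neural networks" }, // no groundTruth: reference-free scorers only
];

// Scorers that rely on groundTruth should guard against items that omit it.
const withTruth = data.filter((d) => d.groundTruth !== undefined);
console.log(withTruth.length); // 1
```

Mixing items with and without `groundTruth` is fine; scorers that do not use a reference output simply ignore the field.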
## Workflow Scorer Configuration

For workflows, you can specify scorers at different levels using `WorkflowScorerConfig`:

**workflow?:** (`MastraScorer[]`): Array of scorers to evaluate the entire workflow output.

**steps?:** (`Record<string, MastraScorer[]>`): Object mapping step IDs to arrays of scorers for evaluating individual step outputs.

## Returns

**scores:** (`Record<string, number>`): Average scores across all test cases, organized by scorer name.

**summary:** (`object`): Summary information about the experiment execution.

**summary.totalItems:** (`number`): Total number of test cases processed.

## Examples

### Agent Evaluation

```typescript
import { createScorer, runEvals } from "@mastra/core/evals";
import { chatAgent } from "./agents/chat-agent";

const myScorer = createScorer({
  id: "my-scorer",
  description: "Check if the agent's response contains the ground truth",
  type: "agent",
}).generateScore(({ run }) => {
  const response = run.output[0]?.content || "";
  const expectedResponse = run.groundTruth;
  return response.includes(expectedResponse) ? 1 : 0;
});

const result = await runEvals({
  target: chatAgent,
  data: [
    {
      input: "What is AI?",
      groundTruth: "AI is a field of computer science that creates intelligent machines.",
    },
    {
      input: "How does machine learning work?",
      groundTruth: "Machine learning uses algorithms to learn patterns from data.",
    },
  ],
  scorers: [myScorer],
  concurrency: 3,
});
```

### Workflow Evaluation

```typescript
const workflowResult = await runEvals({
  target: myWorkflow,
  data: [
    { input: { query: "Process this data", priority: "high" } },
    { input: { query: "Another task", priority: "low" } },
  ],
  scorers: {
    workflow: [outputQualityScorer],
    steps: {
      "validation-step": [validationScorer],
      "processing-step": [processingScorer],
    },
  },
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Workflow completed for: ${item.input.query}`);
    if (scorerResults.workflow) {
      console.log("Workflow scores:", scorerResults.workflow);
    }
    if (scorerResults.steps) {
      console.log("Step scores:", scorerResults.steps);
    }
  },
});
```

## Related

- [createScorer()](https://mastra.ai/reference/evals/create-scorer) - Create custom scorers for experiments
- [MastraScorer](https://mastra.ai/reference/evals/mastra-scorer) - Learn about scorer structure and methods
- [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) - Guide to building evaluation logic
- [Scorers Overview](https://mastra.ai/docs/evals/overview) - Understanding scorer concepts