dataset.startExperiment()

Added in: @mastra/core@1.4.0

Runs an experiment on the dataset and waits for completion. Executes all items against a target (agent, workflow, or scorer) with optional scoring.

Usage example

import { Mastra } from "@mastra/core";

const mastra = new Mastra({ /* storage config */ });

const dataset = await mastra.datasets.get({ id: "dataset-id" });

// Run against a registered agent with scorers
const summary = await dataset.startExperiment({
  targetType: "agent",
  targetId: "my-agent",
  scorers: ["accuracy", "relevancy"],
  maxConcurrency: 10,
});

console.log(`${summary.succeededCount}/${summary.totalItems} succeeded`);
console.log(`Status: ${summary.status}`);

Parameters

targetType?:

'agent' | 'workflow' | 'scorer'
Type of registered target to run items against. Use with `targetId`.

targetId?:

string
ID of the registered target. Use with `targetType`.

scorers?:

(MastraScorer | string)[]
Scorers to evaluate each result. Pass `MastraScorer` instances or registered scorer IDs.

name?:

string
Display name for the experiment.

description?:

string
Description of the experiment.

metadata?:

Record<string, unknown>
Arbitrary metadata for the experiment.

version?:

number
Pin to a specific dataset version. Defaults to the latest version.

maxConcurrency?:

number
Maximum concurrent item executions. Defaults to `5`.

signal?:

AbortSignal
AbortSignal for cancelling the experiment.

itemTimeout?:

number
Per-item execution timeout in milliseconds.

maxRetries?:

number
Maximum retries per item on failure. Defaults to `0` (no retries). Abort errors are never retried.
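
The `signal`, `itemTimeout`, and `maxRetries` parameters can be combined to bound a long-running experiment. The following is a minimal sketch, not part of the Mastra API: `ExperimentOptionsSketch` is a local type that mirrors only the documented fields, `createCancellableOptions` is a hypothetical helper, and the timeout and retry values are illustrative.

```typescript
// Local mirror of the documented option fields; NOT imported from @mastra/core.
interface ExperimentOptionsSketch {
  targetType?: "agent" | "workflow" | "scorer";
  targetId?: string;
  scorers?: string[];
  maxConcurrency?: number;
  signal?: AbortSignal;
  itemTimeout?: number;
  maxRetries?: number;
}

// Hypothetical helper: returns options wired to an AbortController,
// plus a cancel() function that aborts the experiment.
function createCancellableOptions(base: ExperimentOptionsSketch): {
  options: ExperimentOptionsSketch;
  cancel: () => void;
} {
  const controller = new AbortController();
  return {
    options: {
      ...base,
      signal: controller.signal,
      // Illustrative values; the documented default for maxRetries is 0.
      itemTimeout: base.itemTimeout ?? 30_000, // milliseconds per item
      maxRetries: base.maxRetries ?? 2,
    },
    cancel: () => controller.abort(),
  };
}

const { options: experimentOptions, cancel: cancelExperiment } =
  createCancellableOptions({ targetType: "agent", targetId: "my-agent" });
// experimentOptions would be passed to dataset.startExperiment(...);
// calling cancelExperiment() aborts the signal.
```

Because abort errors are never retried, cancelling through the signal should stop the run regardless of the `maxRetries` value.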

Returns

result:

Promise<ExperimentSummary>
Summary of the completed experiment.

ExperimentSummary

experimentId:

string
Unique ID of the experiment.

status:

'pending' | 'running' | 'completed' | 'failed'
Final status of the experiment.

totalItems:

number
Total number of items in the dataset.

succeededCount:

number
Number of items that succeeded.

failedCount:

number
Number of items that failed.

skippedCount:

number
Number of items skipped (e.g., due to abort).

completedWithErrors:

boolean
`true` if the run completed but some items failed.

startedAt:

Date
When the experiment started.

completedAt:

Date
When the experiment completed.

results:

ItemWithScores[]
All item results with their scores.
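
The counters and `completedWithErrors` flag above can be turned into a one-line report. This is a sketch under stated assumptions: `ExperimentSummarySketch` is a local type that copies a subset of the documented fields, and `describeSummary` is a hypothetical helper, not a Mastra function.

```typescript
// Local mirror of a subset of the documented ExperimentSummary fields.
interface ExperimentSummarySketch {
  status: "pending" | "running" | "completed" | "failed";
  totalItems: number;
  succeededCount: number;
  failedCount: number;
  skippedCount: number;
  completedWithErrors: boolean;
}

// Hypothetical helper: formats the summary as a single readable line.
function describeSummary(s: ExperimentSummarySketch): string {
  // Guard against division by zero for an empty dataset.
  const rate = s.totalItems === 0 ? 0 : (s.succeededCount / s.totalItems) * 100;
  // Distinguish a clean completion from one where some items failed.
  const outcome =
    s.status === "completed" && s.completedWithErrors
      ? "completed with errors"
      : s.status;
  return (
    `${outcome}: ${s.succeededCount}/${s.totalItems} succeeded ` +
    `(${rate.toFixed(1)}%), ${s.failedCount} failed, ${s.skippedCount} skipped`
  );
}
```

A caller might pass the value returned by `startExperiment()` to such a helper, provided its shape matches the fields listed here.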

ItemWithScores

itemId:

string
ID of the dataset item.

itemVersion:

number
Dataset version of the item when executed.

input:

unknown
Input data passed to the target.

output:

unknown | null
Output from the target, or `null` if failed.

groundTruth:

unknown | null
Expected output from the dataset item.

error:

{ message: string; stack?: string; code?: string } | null
Structured error if execution failed.

startedAt:

Date
When item execution started.

completedAt:

Date
When item execution completed.

retryCount:

number
Number of retry attempts.

scores:

ScorerResult[]
Results from all scorers for this item.
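
Since `error` is `null` for successful items, failed items can be found by filtering on that field. The sketch below assumes the item shape listed above; `ItemResultSketch` is a local type covering only the fields used, `listFailures` is a hypothetical helper, and no specific `code` values are implied by the source.

```typescript
// Local mirrors of the documented item fields used here.
interface ItemErrorSketch {
  message: string;
  stack?: string;
  code?: string;
}

interface ItemResultSketch {
  itemId: string;
  output: unknown | null;
  error: ItemErrorSketch | null;
  retryCount: number;
}

// Hypothetical helper: one readable line per failed item.
function listFailures(results: ItemResultSketch[]): string[] {
  const lines: string[] = [];
  for (const r of results) {
    if (r.error === null) continue; // item succeeded
    const code = r.error.code ? ` [${r.error.code}]` : "";
    lines.push(`${r.itemId}${code}: ${r.error.message} (retries: ${r.retryCount})`);
  }
  return lines;
}
```

Including `retryCount` in the report makes it easier to tell whether a failure persisted across retries or occurred on the only attempt.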

ScorerResult

scorerId:

string
ID of the scorer.

scorerName:

string
Display name of the scorer.

score:

number | null
Computed score, or `null` if the scorer failed.

reason:

string | null
Reason/explanation for the score.

error:

string | null
Error message if the scorer failed.
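
Because `score` is `null` when a scorer fails, aggregate statistics should skip those entries rather than count them as zero. The following sketch averages scores per scorer across all items; `ScorerResultSketch` is a local type mirroring the documented fields, and `averageScores` is a hypothetical helper, not part of Mastra.

```typescript
// Local mirror of the documented ScorerResult fields used here.
interface ScorerResultSketch {
  scorerId: string;
  score: number | null;
  error: string | null;
}

// Hypothetical helper: mean score per scorerId, ignoring failed scorers.
function averageScores(
  items: { scores: ScorerResultSketch[] }[],
): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const item of items) {
    for (const s of item.scores) {
      if (s.score === null) continue; // scorer failed for this item
      const t = totals[s.scorerId] ?? { sum: 0, count: 0 };
      t.sum += s.score;
      t.count += 1;
      totals[s.scorerId] = t;
    }
  }
  const averages: Record<string, number> = {};
  for (const id of Object.keys(totals)) {
    averages[id] = totals[id].sum / totals[id].count;
  }
  return averages;
}
```

A scorer that failed on every item has no entry in the result, which keeps a missing measurement distinct from a genuine average of `0`.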