dataset.startExperiment()

Added in: @mastra/core@1.4.0

Runs an experiment on the dataset and resolves once the run completes. Every dataset item is executed against a target (agent, workflow, or scorer), and each result is optionally evaluated by scorers.

Usage example

import { Mastra } from '@mastra/core'

const mastra = new Mastra({
  /* storage config */
})

const dataset = await mastra.datasets.get({ id: 'dataset-id' })

// Run against a registered agent with scorers
const summary = await dataset.startExperiment({
  targetType: 'agent',
  targetId: 'my-agent',
  scorers: ['accuracy', 'relevancy'],
  maxConcurrency: 10,
})

console.log(`${summary.succeededCount}/${summary.totalItems} succeeded`)
console.log(`Status: ${summary.status}`)

Parameters

targetType?:

'agent' | 'workflow' | 'scorer'

Type of registered target to run items against. Use with `targetId`.

targetId?:

string

ID of the registered target. Use with `targetType`.

scorers?:

(MastraScorer | string)[]

Scorers to evaluate each result. Pass `MastraScorer` instances or registered scorer IDs.

name?:

string

Display name for the experiment.

description?:

string

Description of the experiment.

metadata?:

Record<string, unknown>

Arbitrary metadata for the experiment.

version?:

number

Pin to a specific dataset version. Defaults to the latest version.

maxConcurrency?:

number

Maximum concurrent item executions. Defaults to `5`.

signal?:

AbortSignal

AbortSignal for cancelling the experiment.

itemTimeout?:

number

Per-item execution timeout in milliseconds.

maxRetries?:

number

Maximum retries per item on failure. Defaults to `0` (no retries). Abort errors are never retried.
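
The cancellation, timeout, and retry parameters above can be combined in one options object. The following is a minimal sketch, assuming only the option names documented on this page: `ExperimentOptionsSketch` is a local type mirroring a subset of them, and `withCancellation` is a hypothetical helper. Neither is exported by `@mastra/core`.

```typescript
// Local mirror of a subset of the documented options.
// Assumption: this type is illustrative and is not imported from @mastra/core.
interface ExperimentOptionsSketch {
  targetType?: 'agent' | 'workflow' | 'scorer'
  targetId?: string
  maxConcurrency?: number
  signal?: AbortSignal
  itemTimeout?: number
  maxRetries?: number
}

// Hypothetical helper: attaches an AbortSignal to the options and
// returns a function that cancels the run when called.
function withCancellation(options: ExperimentOptionsSketch): {
  options: ExperimentOptionsSketch
  cancel: () => void
} {
  const controller = new AbortController()
  return {
    options: { ...options, signal: controller.signal },
    cancel: () => controller.abort(),
  }
}

const { options: experimentOptions, cancel: cancelExperiment } = withCancellation({
  targetType: 'agent',
  targetId: 'my-agent',
  itemTimeout: 30000, // per-item timeout in milliseconds
  maxRetries: 2, // retry each failed item up to twice
})

// experimentOptions can then be passed to dataset.startExperiment(...);
// calling cancelExperiment() aborts the signal.
```

Keeping the `AbortController` inside a helper means the caller only holds a `cancel` function, which is convenient to wire to a UI button or a process signal handler.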

Returns

result:

Promise<ExperimentSummary>

Summary of the completed experiment.

ExperimentSummary

experimentId:

string

Unique ID of the experiment.

status:

'pending' | 'running' | 'completed' | 'failed'

Final status of the experiment.

totalItems:

number

Total number of items in the dataset.

succeededCount:

number

Number of items that succeeded.

failedCount:

number

Number of items that failed.

skippedCount:

number

Number of items skipped (e.g., due to abort).

completedWithErrors:

boolean

`true` if the run completed but some items failed.

startedAt:

Date

When the experiment started.

completedAt:

Date

When the experiment completed.

results:

ItemWithScores[]

All item results with their scores.
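
The count and timestamp fields above are enough to build a one-line report of a run. The following is a sketch under the assumption that the summary has exactly the field shapes documented here; `SummaryFieldsSketch`, `describeRun`, and `isCleanRun` are hypothetical names introduced for illustration.

```typescript
// Local mirror of the documented ExperimentSummary fields used below.
// Assumption: illustrative type, not imported from @mastra/core.
interface SummaryFieldsSketch {
  status: 'pending' | 'running' | 'completed' | 'failed'
  totalItems: number
  succeededCount: number
  failedCount: number
  skippedCount: number
  completedWithErrors: boolean
  startedAt: Date
  completedAt: Date
}

// Formats counts, success rate, and wall-clock duration as one line.
function describeRun(summary: SummaryFieldsSketch): string {
  const rate =
    summary.totalItems === 0 ? 0 : (summary.succeededCount / summary.totalItems) * 100
  const seconds = (summary.completedAt.getTime() - summary.startedAt.getTime()) / 1000
  return (
    `${summary.status}: ${summary.succeededCount}/${summary.totalItems} succeeded ` +
    `(${rate.toFixed(1)}%), ${summary.failedCount} failed, ` +
    `${summary.skippedCount} skipped in ${seconds.toFixed(1)}s`
  )
}

// A run is "clean" only if it completed and no item failed.
function isCleanRun(summary: SummaryFieldsSketch): boolean {
  return summary.status === 'completed' && !summary.completedWithErrors
}

// Sample data for illustration only.
const sampleSummary: SummaryFieldsSketch = {
  status: 'completed',
  totalItems: 10,
  succeededCount: 8,
  failedCount: 1,
  skippedCount: 1,
  completedWithErrors: true,
  startedAt: new Date(0),
  completedAt: new Date(2500),
}

console.log(describeRun(sampleSummary))
// prints "completed: 8/10 succeeded (80.0%), 1 failed, 1 skipped in 2.5s"
```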

ItemWithScores

itemId:

string

ID of the dataset item.

itemVersion:

number

Dataset version of the item when executed.

input:

unknown

Input data passed to the target.

output:

unknown | null

Output from the target, or `null` if execution failed.

groundTruth:

unknown | null

Expected output from the dataset item.

error:

{ message: string; stack?: string; code?: string } | null

Structured error if execution failed.

startedAt:

Date

When item execution started.

completedAt:

Date

When item execution completed.

retryCount:

number

Number of retry attempts.

scores:

ScorerResult[]

Results from all scorers for this item.
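
Because each item carries its own `error` and `retryCount`, failed items can be listed without re-running anything. The following is a sketch assuming the `ItemWithScores` field shapes documented above; `ItemResultSketch` and `failureReport` are hypothetical names, not part of the library.

```typescript
// Local mirrors of the documented error and item shapes (subset).
// Assumption: illustrative types, not imported from @mastra/core.
interface ItemErrorSketch {
  message: string
  stack?: string
  code?: string
}

interface ItemResultSketch {
  itemId: string
  output: unknown | null
  error: ItemErrorSketch | null
  retryCount: number
}

// Returns one line per failed item, including how many retries were attempted.
function failureReport(results: ItemResultSketch[]): string[] {
  const lines: string[] = []
  for (const item of results) {
    if (item.error === null) continue // item succeeded
    const code = item.error.code ? ` [${item.error.code}]` : ''
    lines.push(`${item.itemId}: ${item.error.message}${code} (retries: ${item.retryCount})`)
  }
  return lines
}

// Sample data for illustration only.
const sampleItems: ItemResultSketch[] = [
  { itemId: 'item-1', output: 'ok', error: null, retryCount: 0 },
  { itemId: 'item-2', output: null, error: { message: 'timed out', code: 'TIMEOUT' }, retryCount: 2 },
]

console.log(failureReport(sampleItems))
// prints [ 'item-2: timed out [TIMEOUT] (retries: 2)' ]
```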

ScorerResult

scorerId:

string

ID of the scorer.

scorerName:

string

Display name of the scorer.

score:

number | null

Computed score, or `null` if the scorer failed.

reason:

string | null

Explanation of the score.

error:

string | null

Error message if the scorer failed.
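
Since `score` is `null` when a scorer fails, any aggregation over `results` needs to skip those entries. The following is a sketch of a per-scorer average, assuming the `ScorerResult` shape documented above; `ScorerResultSketch`, `ScoredItemSketch`, and `averageScores` are hypothetical names introduced for illustration.

```typescript
// Local mirrors of the documented shapes (subset).
// Assumption: illustrative types, not imported from @mastra/core.
interface ScorerResultSketch {
  scorerId: string
  scorerName: string
  score: number | null
  reason: string | null
  error: string | null
}

interface ScoredItemSketch {
  itemId: string
  scores: ScorerResultSketch[]
}

// Averages scores per scorerId, ignoring entries where the scorer failed.
function averageScores(results: ScoredItemSketch[]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>()
  for (const item of results) {
    for (const result of item.scores) {
      if (result.score === null) continue // scorer failed for this item
      const entry = totals.get(result.scorerId) ?? { sum: 0, count: 0 }
      entry.sum += result.score
      entry.count += 1
      totals.set(result.scorerId, entry)
    }
  }
  const averages = new Map<string, number>()
  totals.forEach((entry, scorerId) => averages.set(scorerId, entry.sum / entry.count))
  return averages
}

// Sample data for illustration only.
const scoredItems: ScoredItemSketch[] = [
  {
    itemId: 'item-1',
    scores: [
      { scorerId: 'accuracy', scorerName: 'Accuracy', score: 1, reason: null, error: null },
      { scorerId: 'relevancy', scorerName: 'Relevancy', score: null, reason: null, error: 'scorer failed' },
    ],
  },
  {
    itemId: 'item-2',
    scores: [
      { scorerId: 'accuracy', scorerName: 'Accuracy', score: 0.5, reason: null, error: null },
      { scorerId: 'relevancy', scorerName: 'Relevancy', score: 0.5, reason: null, error: null },
    ],
  },
]

const averages = averageScores(scoredItems)
console.log(averages.get('accuracy'), averages.get('relevancy'))
// prints 0.75 0.5
```

A scorer that failed on every item has no entry in the returned map, which keeps "no data" distinct from an average of `0`.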
