Step Retries
Mastra provides built-in retry mechanisms to handle transient failures in workflow steps. This allows workflows to recover gracefully from temporary issues without requiring manual intervention.
Overview
When a step in a workflow fails (throws an exception), Mastra can automatically retry the step execution based on a configurable retry policy. This is useful for handling:
- Network connectivity issues
- Service unavailability
- Rate limiting
- Temporary resource constraints
- Other transient failures
Default Behavior
By default, steps do not retry when they fail. This means:
- A step will execute once
- If it fails, it will immediately mark the step as failed
- The workflow will continue to execute any subsequent steps that don’t depend on the failed step
Configuration Options
Retries can be configured at two levels:
1. Workflow-level Configuration
You can set a default retry configuration for all steps in a workflow:
const workflow = new Workflow({
name: 'my-workflow',
retryConfig: {
attempts: 3, // Number of retries (in addition to the initial attempt)
delay: 1000, // Delay between retries in milliseconds
},
});
2. Step-level Configuration
You can also configure retries on individual steps, which will override the workflow-level configuration for that specific step:
const fetchDataStep = new Step({
id: 'fetchData',
execute: async () => {
// Fetch data from external API
},
retryConfig: {
attempts: 5, // This step will retry up to 5 times
delay: 2000, // With a 2-second delay between retries
},
});
Retry Parameters
The retryConfig
object supports the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
attempts | number | 0 | The number of retry attempts (in addition to the initial attempt) |
delay | number | 1000 | Time in milliseconds to wait between retries |
How Retries Work
When a step fails, Mastra’s retry mechanism:
- Checks if the step has retry attempts remaining
- If attempts remain:
- Decrements the attempt counter
- Transitions the step to a “waiting” state
- Waits for the configured delay period
- Retries the step execution
- If no attempts remain or all attempts have been exhausted:
- Marks the step as “failed”
- Continues workflow execution (for steps that don’t depend on the failed step)
During retry attempts, the workflow execution remains active but paused for the specific step that is being retried.
Examples
Basic Retry Example
import { Workflow, Step } from '@mastra/core/workflows';
// Define a step that might fail
const unreliableApiStep = new Step({
id: 'callUnreliableApi',
execute: async () => {
// Simulate an API call that might fail
const random = Math.random();
if (random < 0.7) {
throw new Error('API call failed');
}
return { data: 'API response data' };
},
retryConfig: {
attempts: 3, // Retry up to 3 times
delay: 2000, // Wait 2 seconds between attempts
},
});
// Create a workflow with the unreliable step
const workflow = new Workflow({
name: 'retry-demo-workflow',
});
workflow
.step(unreliableApiStep)
.then(processResultStep)
.commit();
Workflow-level Retries with Step Override
import { Workflow, Step } from '@mastra/core/workflows';
// Create a workflow with default retry configuration
const workflow = new Workflow({
name: 'multi-retry-workflow',
retryConfig: {
attempts: 2, // All steps will retry twice by default
delay: 1000, // With a 1-second delay
},
});
// This step uses the workflow's default retry configuration
const standardStep = new Step({
id: 'standardStep',
execute: async () => {
// Some operation that might fail
},
});
// This step overrides the workflow's retry configuration
const criticalStep = new Step({
id: 'criticalStep',
execute: async () => {
// Critical operation that needs more retry attempts
},
retryConfig: {
attempts: 5, // Override with 5 retry attempts
delay: 5000, // And a longer 5-second delay
},
});
// This step disables retries
const noRetryStep = new Step({
id: 'noRetryStep',
execute: async () => {
// Operation that should not retry
},
retryConfig: {
attempts: 0, // Explicitly disable retries
},
});
workflow
.step(standardStep)
.then(criticalStep)
.then(noRetryStep)
.commit();
Monitoring Retries
You can monitor retry attempts in your logs. Mastra logs retry-related events at the debug
level:
[DEBUG] Step fetchData failed (runId: abc-123)
[DEBUG] Attempt count for step fetchData: 2 remaining attempts (runId: abc-123)
[DEBUG] Step fetchData waiting (runId: abc-123)
[DEBUG] Step fetchData finished waiting (runId: abc-123)
[DEBUG] Step fetchData pending (runId: abc-123)
Best Practices
-
Use Retries for Transient Failures: Only configure retries for operations that might experience transient failures. For deterministic errors (like validation failures), retries won’t help.
-
Set Appropriate Delays: Consider using longer delays for external API calls to allow time for services to recover.
-
Limit Retry Attempts: Don’t set extremely high retry counts as this could cause workflows to run for excessive periods during outages.
-
Implement Idempotent Operations: Ensure your step’s
execute
function is idempotent (can be called multiple times without side effects) since it may be retried. -
Consider Backoff Strategies: For more advanced scenarios, consider implementing exponential backoff in your step’s logic for operations that might be rate-limited.