Mastra Observability
Observe and evaluate your AI agent performance
Mastra Observability monitors LLM operations, traces agent decision paths, and helps you debug complex workflows. Seamlessly integrate with any OpenTelemetry-compatible platform.
AI agent and workflow monitoring
When errors occur, Mastra shows you exactly what happened. Every LLM call logs token usage, latency, prompts and completions. Every agent run captures decision paths, tool calls and memory operations.
Agent tracing and telemetry
Mastra traces every step of agent execution and sends telemetry to any observability provider. Decorator-based instrumentation captures the full call stack, both locally and in production. Mastra supports any OpenTelemetry-compatible platform, including MLflow, Langfuse, Braintrust, Datadog, New Relic and SigNoz.
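To make the idea of instrumentation concrete, here is a minimal sketch of a wrapper that records one span per function call. It is not Mastra's instrumentation: the `Span` shape, the `traced` helper and the in-memory `sink` array are all hypothetical, and the sketch handles synchronous functions only.

```typescript
// Hypothetical span shape; real telemetry spans carry far more detail.
interface Span {
  name: string;
  durationMs: number;
  error?: string;
}

// Wraps a synchronous function so each call appends one Span to `sink`.
// `now` is injectable so the timing can be controlled in tests.
function traced<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
  sink: Span[],
  now: () => number = Date.now,
): (...args: A) => R {
  return (...args: A): R => {
    const start = now();
    let error: string | undefined;
    try {
      return fn(...args);
    } catch (err) {
      error = String(err);
      throw err; // the caller still sees the original failure
    } finally {
      // Runs on both success and failure, so every call leaves a span.
      sink.push({
        name,
        durationMs: now() - start,
        ...(error !== undefined ? { error } : {}),
      });
    }
  };
}
```

Wrapping a tool function this way leaves its return value and thrown errors unchanged while recording its name, duration and any error message. A real exporter would forward the spans to a provider instead of keeping them in an array.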
Build custom agent evals
Mastra Scorers give you quantifiable metrics for measuring agent quality. They run automatically in the background, and their results are stored in your database. Use them to get insights into performance, compare different approaches, and identify areas for improvement in your AI systems.
Frequently asked questions
What does Mastra observability cover?
Mastra observability monitors LLM operations, traces agent decisions and helps you debug complex workflows with tools built around AI-specific patterns. Traces and logs appear in Mastra Studio and Mastra Platform. Mastra also provides scorer-based evaluation that runs asynchronously alongside agents and workflows and integrates into CI/CD pipelines.
What storage backend does Mastra use for traces?
Mastra's default exporter persists traces to your configured storage backend, though not all storage providers support observability. For high-traffic production environments, Mastra recommends ClickHouse for the observability storage domain via composite storage. Traces and logs are also available in Mastra Studio and Mastra Platform.
Does Mastra redact sensitive data from traces?
Mastra's SensitiveDataFilter span output processor redacts sensitive data, including passwords, tokens and keys, before any trace data is exported. Redaction is not enabled by default: you need to add the SensitiveDataFilter as a span output processor in your observability configuration.
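The sketch below shows the general idea of key-based redaction before export. It is not the SensitiveDataFilter implementation or its configuration API: the `redact` function, the `Json` type and the list of matched key names are assumptions made for illustration.

```typescript
// Hypothetical sketch of key-based redaction, not Mastra's SensitiveDataFilter.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Assumed field-name patterns; a real filter's list may differ.
const SENSITIVE_KEY = /password|token|secret|key/i;

// Returns a deep copy of `value` with the values of sensitive-looking keys masked.
function redact(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      // Mask the whole value when the key name looks sensitive; otherwise recurse.
      out[k] = SENSITIVE_KEY.test(k) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Matching on substrings is deliberately broad, so a harmless field such as `tokenCount` would also be masked in this sketch. A production filter would likely need a more precise rule set.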
What data does Mastra capture for each agent run?
Mastra captures token usage, latency, prompts and completions for every LLM call. Every agent run records decision paths, tool calls and memory operations. Workflow steps capture branching logic, parallel execution and individual step outputs. All trace and log data surfaces in Mastra Studio and Mastra Platform.
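As an illustration of what you can do with per-call data, the sketch below sums token usage and averages latency across the LLM calls in one agent run. The `LlmCallRecord` shape and its field names are hypothetical and do not reflect Mastra's trace schema.

```typescript
// Hypothetical record for one LLM call; field names are assumptions.
interface LlmCallRecord {
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

interface RunSummary {
  calls: number;
  totalTokens: number;
  avgLatencyMs: number;
}

// Aggregates the LLM calls recorded during a single agent run.
function summarizeRun(calls: LlmCallRecord[]): RunSummary {
  const totalTokens = calls.reduce(
    (sum, c) => sum + c.promptTokens + c.completionTokens,
    0,
  );
  const totalLatencyMs = calls.reduce((sum, c) => sum + c.latencyMs, 0);
  return {
    calls: calls.length,
    totalTokens,
    // Avoid dividing by zero when a run made no LLM calls.
    avgLatencyMs: calls.length > 0 ? totalLatencyMs / calls.length : 0,
  };
}
```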
What observability providers does Mastra support?
Mastra has first-class integrations with leading providers like Braintrust, Langfuse, Arize, and LangSmith. In addition, Mastra supports any OpenTelemetry-compatible platform. The CloudExporter sends traces directly to Mastra Cloud when a MASTRA_CLOUD_ACCESS_TOKEN is configured. The DefaultExporter persists traces to your configured storage backend for use in Mastra Studio.
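The token-gated behavior of the CloudExporter can be pictured with the sketch below. It only models the decision and does not use Mastra's configuration API: `pickExporters` is a hypothetical helper, and treating the DefaultExporter as always enabled is an assumption that the text above does not state.

```typescript
type ExporterName = "DefaultExporter" | "CloudExporter";

// Hypothetical helper: decides which exporters would be active for a given environment.
function pickExporters(env: Record<string, string | undefined>): ExporterName[] {
  // Assumption: the storage-backed exporter is always on.
  const exporters: ExporterName[] = ["DefaultExporter"];
  // The cloud exporter is only added when an access token is present and non-empty.
  if (env["MASTRA_CLOUD_ACCESS_TOKEN"]) {
    exporters.push("CloudExporter");
  }
  return exporters;
}
```

Passing the environment in as an argument, rather than reading it globally, keeps the decision easy to test.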
What are Mastra scorers?
Mastra scorers are automated tests that evaluate agent output, returning values typically between 0 and 1 using model-graded, rule-based and statistical methods. Mastra provides three scorer types: textual scorers evaluate accuracy, reliability and context understanding; classification scorers measure categorization accuracy based on predefined categories; and prompt engineering scorers explore the impact of different instructions and input formats. You can customize scorers with your own prompts and scoring functions.
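To show what a rule-based score between 0 and 1 can look like, here is a minimal keyword-coverage function. It is a plain function written for illustration, not a scorer built with Mastra's scorer API, and the name `keywordCoverage` is hypothetical.

```typescript
// Hypothetical rule-based metric: the fraction of required keywords
// that appear in the agent output (case-insensitive). Always in [0, 1].
function keywordCoverage(output: string, keywords: string[]): number {
  if (keywords.length === 0) {
    return 1; // nothing was required, so the output trivially passes
  }
  const text = output.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return hits / keywords.length;
}
```

For example, an output that mentions one of two required keywords scores 0.5. Model-graded and statistical scorers follow the same contract of returning a number, but derive it differently.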
Learn more about evals →
Learn more about running evals in CI/CD →
How do Mastra live evaluations work?
Mastra live evaluations run asynchronously in the background without blocking agent responses or workflow execution. A sampling rate parameter between 0 and 1 controls what fraction of outputs get scored: 0 scores nothing and 1 scores every output. Mastra automatically stores all scoring results in your database so you can analyze performance trends over time.
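The sampling decision can be sketched as a single comparison against a random draw. This illustrates the concept only and is not how Mastra implements it: `shouldScore` and its injectable `random` parameter are hypothetical.

```typescript
// Hypothetical helper: decides whether one output should be scored.
// `random` is injectable so the decision is deterministic in tests.
function shouldScore(rate: number, random: () => number = Math.random): boolean {
  // Written as a negated range check so NaN is rejected as well.
  if (!(rate >= 0 && rate <= 1)) {
    throw new RangeError("sampling rate must be between 0 and 1");
  }
  // random() is in [0, 1), so rate 0 never scores and rate 1 always scores.
  return random() < rate;
}
```

With a rate of 0.25, roughly one output in four is scored over a large number of runs, which keeps evaluation cost proportional to the rate.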