Running Evals
Evals are a way to measure the quality of your agents' outputs. They are essentially tests — although because LLMs are non-deterministic, you might not get a 100% pass rate every time.
Eval suites run in the cloud, but since they are tests, it makes sense to store them in your codebase alongside the rest of your test code.
Mastra recommends using Braintrust's eval framework, autoevals, to run evals. Braintrust has a free tier that should be enough for most use cases.
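As a rough sketch of what an eval suite looks like in code: the scorer below mirrors the shape autoevals scorers return (an object with a `score` between 0 and 1), but it is a simple deterministic substring check rather than an LLM-based judge, and `askAgent` is a hypothetical stand-in for your agent call — neither is a Mastra or autoevals API.

```typescript
type EvalCase = { input: string; expected: string };

// Deterministic scorer with an autoevals-like return shape ({ score: 0..1 }).
function containsExpected(args: { output: string; expected: string }): { score: number } {
  const hit = args.output.toLowerCase().includes(args.expected.toLowerCase());
  return { score: hit ? 1 : 0 };
}

// Hypothetical placeholder; in practice this would call your agent.
async function askAgent(input: string): Promise<string> {
  return "The capital of France is Paris.";
}

// Run every case and report a pass rate rather than a hard pass/fail,
// since non-deterministic LLM output means some cases may flake.
async function runEvalSuite(cases: EvalCase[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const output = await askAgent(c.input);
    if (containsExpected({ output, expected: c.expected }).score === 1) passed++;
  }
  return passed / cases.length;
}

runEvalSuite([
  { input: "What is the capital of France?", expected: "Paris" },
]).then((rate) => console.log(`Pass rate: ${(rate * 100).toFixed(0)}%`));
```

In a real suite you would swap the substring scorer for an LLM-based one and treat the pass rate as a quality signal to track over time, not a gate that must hit 100%.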
Other open-source eval frameworks: