Textual Evals

Textual evals use an LLM-as-judge methodology to evaluate agent outputs. This approach leverages language models to assess various aspects of text quality, similar to how a teaching assistant might grade assignments using a rubric.

Each eval focuses on specific quality aspects and returns a score between 0 and 1, providing quantifiable metrics for non-deterministic AI outputs.
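
To make this concrete, here is a minimal sketch of measuring a single output, assuming one of Mastra's built-in metrics (`AnswerRelevancyMetric` from `@mastra/evals/llm`, introduced below) and an OpenAI judge model via `@ai-sdk/openai`; the model choice is arbitrary.

```typescript
import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";

// LLM-as-judge metrics take a judge model; gpt-4o-mini is an arbitrary choice.
const metric = new AnswerRelevancyMetric(openai("gpt-4o-mini"));

const result = await metric.measure(
  "What is the capital of France?",  // the input the agent received
  "Paris is the capital of France.", // the agent output being judged
);

console.log(result.score); // a number between 0 and 1
```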

Mastra provides a set of built-in eval metrics for assessing agent outputs, and you are not limited to them: you can also define your own evals.
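
As a sketch of what a custom eval can look like, the following assumes the `Metric` base class and `MetricResult` type exported from `@mastra/core/eval`; the word-limit check itself is a hypothetical example, not a built-in metric.

```typescript
import { Metric, type MetricResult } from "@mastra/core/eval";

// Hypothetical custom eval: scores 1 if the output stays within a word
// limit, 0 otherwise.
class WordLimitMetric extends Metric {
  constructor(private maxWords: number) {
    super();
  }

  async measure(input: string, output: string): Promise<MetricResult> {
    const words = output.trim().split(/\s+/).length;
    return {
      score: words <= this.maxWords ? 1 : 0,
      info: { words, maxWords: this.maxWords },
    };
  }
}
```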

Why Use Textual Evals?

Textual evals help ensure your agent:

  • Produces accurate and reliable responses
  • Uses context effectively
  • Follows output requirements
  • Maintains consistent quality over time

Available Metrics

Accuracy and Reliability

These metrics evaluate how correct, truthful, and complete your agent’s answers are:

  • hallucination: Detects facts or claims not present in the provided context
  • faithfulness: Measures how accurately responses represent the provided context
  • content-similarity: Evaluates consistency of information across different phrasings
  • completeness: Checks whether responses include all necessary information
  • answer-relevancy: Assesses how well responses address the input question
  • textual-difference: Measures textual differences between strings

Understanding Context

These metrics evaluate how well your agent uses provided context:

  • context-position: Evaluates where context appears in responses
  • context-precision: Assesses whether context chunks are used logically
  • context-relevancy: Measures how relevant the used context pieces are
  • contextual-recall: Evaluates how completely the context is used

Output Quality

These metrics evaluate adherence to format and style requirements:

  • tone: Measures consistency in formality, complexity, and style
  • toxicity: Detects harmful or inappropriate content
  • bias: Detects potential biases in the output
  • prompt-alignment: Checks adherence to explicit instructions such as length restrictions, formatting requirements, or other constraints
  • summarization: Evaluates information retention and conciseness
  • keyword-coverage: Assesses technical terminology usage
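
In practice, metrics are usually attached to an agent so they run against its responses automatically. A minimal sketch, assuming the `Agent` class from `@mastra/core/agent` accepts an `evals` map and the metric exports shown (the agent name and instructions are placeholders):

```typescript
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { ToneConsistencyMetric } from "@mastra/evals/nlp";
import { BiasMetric, ToxicityMetric } from "@mastra/evals/llm";

const model = openai("gpt-4o-mini"); // arbitrary model choice

export const supportAgent = new Agent({
  name: "support-agent", // placeholder
  instructions: "Answer support questions politely and concisely.",
  model,
  // Keys are the labels under which each metric's scores are reported.
  evals: {
    tone: new ToneConsistencyMetric(),
    toxicity: new ToxicityMetric(model),
    bias: new BiasMetric(model),
  },
});
```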