Built-in Scorers
Mastra provides a comprehensive set of built-in scorers for evaluating AI outputs. These scorers are optimized for common evaluation scenarios and are ready to use in your agents and workflows.
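For example, a built-in scorer can be attached to an agent so that responses are scored automatically. The sketch below assumes a `createAnswerRelevancyScorer` factory exported from `@mastra/evals/scorers/llm` and an agent-level `scorers` option with sampling; treat these names as assumptions and confirm the exact signatures in the Mastra API reference for your version.

```typescript
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
// Factory name and import path are assumptions; verify against the Mastra API reference.
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/llm";

export const supportAgent = new Agent({
  name: "support-agent",
  instructions: "Answer customer questions accurately and concisely.",
  model: openai("gpt-4o-mini"),
  // Attach built-in scorers; each entry pairs a scorer with a sampling policy.
  scorers: {
    answerRelevancy: {
      scorer: createAnswerRelevancyScorer({ model: openai("gpt-4o-mini") }),
      sampling: { type: "ratio", rate: 0.5 }, // score roughly half of the responses
    },
  },
});
```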
Available Scorers
Accuracy and Reliability
These scorers evaluate how correct, truthful, and complete your agent's answers are (a usage sketch follows the list):
answer-relevancy: Evaluates how well responses address the input query (0-1, higher is better)
faithfulness: Measures how accurately responses represent provided context (0-1, higher is better)
hallucination: Detects factual contradictions and unsupported claims (0-1, lower is better)
completeness: Checks if responses include all necessary information (0-1, higher is better)
content-similarity: Measures textual similarity using character-level matching (0-1, higher is better)
textual-difference: Measures textual differences between strings (0-1, higher means more similar)
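A scorer from this group can also be run directly against a single input/output pair, which is useful in tests or offline evaluation. The sketch below assumes a `createFaithfulnessScorer` factory that accepts grounding context and a `run()` method returning a `score` in the 0-1 range; the import path and payload shapes are assumptions to check against the Mastra API reference.

```typescript
import { openai } from "@ai-sdk/openai";
// Factory name, import path, and run() payload shape are assumptions.
import { createFaithfulnessScorer } from "@mastra/evals/scorers/llm";

const scorer = createFaithfulnessScorer({
  model: openai("gpt-4o-mini"),
  options: {
    // Context the response is checked against for faithfulness.
    context: ["The Eiffel Tower was completed in 1889 and is 330 meters tall."],
  },
});

const result = await scorer.run({
  input: [{ role: "user", content: "When was the Eiffel Tower completed?" }],
  output: { text: "The Eiffel Tower was completed in 1889." },
});

console.log(result.score); // 0-1, higher is better for faithfulness
```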
Output Quality
These scorers evaluate adherence to format, style, and safety requirements (see the sketch after this list):
tone-consistency: Measures consistency in formality, complexity, and style (0-1, higher is better)
toxicity: Detects harmful or inappropriate content (0-1, lower is better)
bias: Detects potential biases in the output (0-1, lower is better)
keyword-coverage: Assesses technical terminology usage (0-1, higher is better)
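Not every scorer needs an LLM judge; heuristic scorers such as keyword-coverage run entirely in code and require no model. The sketch below assumes a `createKeywordCoverageScorer` factory exported from a code-scorers entry point and the same `run()` shape as above; confirm the exact path and payload in the Mastra API reference.

```typescript
// Factory name and import path are assumptions; heuristic scorers do not need a model.
import { createKeywordCoverageScorer } from "@mastra/evals/scorers/code";

const scorer = createKeywordCoverageScorer();

const result = await scorer.run({
  input: [{ role: "user", content: "Explain how JWT authentication works." }],
  output: {
    text: "JWT authentication issues a signed token that the client sends with each request.",
  },
});

console.log(result.score); // 0-1, higher means better keyword coverage
```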