# Built-in Scorers

Mastra provides a comprehensive set of built-in scorers for evaluating AI outputs. These scorers are optimized for common evaluation scenarios and are ready to use in your agents and workflows. To create your own scorers, see the [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) guide.

## Available scorers

### Accuracy and reliability

These scorers evaluate how correct, truthful, and complete your agent's answers are:

- [`answer-relevancy`](https://mastra.ai/reference/evals/answer-relevancy): Evaluates how well responses address the input query (`0-1`, higher is better)
- [`answer-similarity`](https://mastra.ai/reference/evals/answer-similarity): Compares agent outputs against ground-truth answers for CI/CD testing using semantic analysis (`0-1`, higher is better)
- [`faithfulness`](https://mastra.ai/reference/evals/faithfulness): Measures how accurately responses represent the provided context (`0-1`, higher is better)
- [`hallucination`](https://mastra.ai/reference/evals/hallucination): Detects factual contradictions and unsupported claims (`0-1`, lower is better)
- [`completeness`](https://mastra.ai/reference/evals/completeness): Checks if responses include all necessary information (`0-1`, higher is better)
- [`content-similarity`](https://mastra.ai/reference/evals/content-similarity): Measures textual similarity using character-level matching (`0-1`, higher is better)
- [`textual-difference`](https://mastra.ai/reference/evals/textual-difference): Measures textual differences between strings (`0-1`, higher means more similar)
- [`tool-call-accuracy`](https://mastra.ai/reference/evals/tool-call-accuracy): Evaluates whether the LLM selects the correct tool from the available options (`0-1`, higher is better)
- [`prompt-alignment`](https://mastra.ai/reference/evals/prompt-alignment): Measures how well agent responses align with the user prompt's intent, requirements, completeness, and format (`0-1`, higher is better)

### Context quality

These scorers evaluate the quality and relevance of the context used to generate responses:

- [`context-precision`](https://mastra.ai/reference/evals/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
- [`context-relevance`](https://mastra.ai/reference/evals/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)

> **Tip: Context Scorer Selection**
>
> - Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
> - Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
>
> Both context scorers support:
>
> - **Static context**: Pre-defined context arrays
> - **Dynamic context extraction**: Extract context from runs using custom functions (ideal for RAG systems, vector databases, etc.), as sketched below
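To make the two context modes concrete, here is a minimal sketch using the context-precision scorer. It assumes the `createContextPrecisionScorer` factory exported from `@mastra/evals/scorers/llm` and the `context` / `contextExtractor` options described on the [`context-precision`](https://mastra.ai/reference/evals/context-precision) reference page; confirm the exact names and shapes there before relying on them.

```typescript
import { openai } from "@ai-sdk/openai";
import { createContextPrecisionScorer } from "@mastra/evals/scorers/llm";

// Static context: evaluate against a fixed, pre-defined array.
const staticScorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"), // judge model for the evaluation
  options: {
    context: [
      "Photosynthesis converts sunlight into chemical energy.",
      "Plants release oxygen as a byproduct of photosynthesis.",
    ],
  },
});

// Dynamic context extraction: derive context from each run instead,
// e.g. the documents your RAG pipeline retrieved for this input.
// `contextExtractor` is an assumption here; check the reference page.
const dynamicScorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"),
  options: {
    contextExtractor: (input, output) => {
      // Return the context that was actually used for this run; a
      // literal array keeps the sketch self-contained.
      return ["Plants release oxygen as a byproduct of photosynthesis."];
    },
  },
});
```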
### Output quality

These scorers evaluate adherence to format, style, and safety requirements:

- [`tone-consistency`](https://mastra.ai/reference/evals/tone-consistency): Measures consistency in formality, complexity, and style (`0-1`, higher is better)
- [`toxicity`](https://mastra.ai/reference/evals/toxicity): Detects harmful or inappropriate content (`0-1`, lower is better)
- [`bias`](https://mastra.ai/reference/evals/bias): Detects potential biases in the output (`0-1`, lower is better)
- [`keyword-coverage`](https://mastra.ai/reference/evals/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)
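Built-in scorers can also be run directly, which is useful in tests and CI. The sketch below assumes the `createAnswerRelevancyScorer` factory from `@mastra/evals/scorers/llm` and the `run` input/output shape shown on the [`answer-relevancy`](https://mastra.ai/reference/evals/answer-relevancy) reference page; verify both against the linked docs.

```typescript
import { openai } from "@ai-sdk/openai";
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/llm";

const scorer = createAnswerRelevancyScorer({
  model: openai("gpt-4o-mini"), // judge model for the evaluation
});

const result = await scorer.run({
  input: [{ role: "user", content: "What is the capital of France?" }],
  output: { text: "Paris is the capital of France." },
});

// `score` is in the 0-1 range; for answer-relevancy, higher is better.
// LLM-judged scorers typically also return a generated explanation.
console.log(result.score, result.reason);
```

The same pattern should apply to the other scorers listed above; only the factory name and, for some scorers, the required options differ.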
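To score live traffic rather than individual calls, scorers can be registered on an agent. The sketch below assumes the `scorers` option on `Agent` (from `@mastra/core/agent`) with per-scorer sampling, as described in the evals docs; treat the exact config shape as an assumption and confirm it there.

```typescript
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { createToxicityScorer } from "@mastra/evals/scorers/llm";

export const supportAgent = new Agent({
  name: "support-agent",
  instructions: "You are a helpful, polite support agent.",
  model: openai("gpt-4o"),
  // Assumed shape: each entry pairs a scorer with a sampling config,
  // so only a fraction of production responses are scored.
  scorers: {
    toxicity: {
      scorer: createToxicityScorer({ model: openai("gpt-4o-mini") }),
      sampling: { type: "ratio", rate: 0.5 }, // score ~50% of responses
    },
  },
});
```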