# Built-in Scorers

Mastra provides a comprehensive set of built-in scorers for evaluating AI outputs. These scorers are optimized for common evaluation scenarios and are ready to use in your agents and workflows. To create your own scorers, see the [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) guide.

## Available scorers

### Accuracy and reliability

These scorers evaluate how correct, truthful, and complete your agent's answers are:

- [`answer-relevancy`](https://mastra.ai/reference/evals/answer-relevancy): Evaluates how well responses address the input query (`0-1`, higher is better)
- [`answer-similarity`](https://mastra.ai/reference/evals/answer-similarity): Compares agent outputs against ground-truth answers for CI/CD testing using semantic analysis (`0-1`, higher is better)
- [`faithfulness`](https://mastra.ai/reference/evals/faithfulness): Measures how accurately responses represent the provided context (`0-1`, higher is better)
- [`hallucination`](https://mastra.ai/reference/evals/hallucination): Detects factual contradictions and unsupported claims (`0-1`, lower is better)
- [`completeness`](https://mastra.ai/reference/evals/completeness): Checks if responses include all necessary information (`0-1`, higher is better)
- [`content-similarity`](https://mastra.ai/reference/evals/content-similarity): Measures textual similarity using character-level matching (`0-1`, higher is better)
- [`textual-difference`](https://mastra.ai/reference/evals/textual-difference): Measures textual differences between strings (`0-1`, higher means more similar)
- [`tool-call-accuracy`](https://mastra.ai/reference/evals/tool-call-accuracy): Evaluates whether the LLM selects the correct tool from the available options (`0-1`, higher is better)
- [`prompt-alignment`](https://mastra.ai/reference/evals/prompt-alignment): Measures how well agent responses align with the user prompt's intent, requirements, completeness, and format (`0-1`, higher is better)

### Context quality

These scorers evaluate the quality and relevance of the context used to generate responses:

- [`context-precision`](https://mastra.ai/reference/evals/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
- [`context-relevance`](https://mastra.ai/reference/evals/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)

> **Tip: Context Scorer Selection**
>
> - Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
> - Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
>
> Both context scorers support:
>
> - **Static context**: Pre-defined context arrays
> - **Dynamic context extraction**: Extract context from runs using custom functions (ideal for RAG systems, vector databases, etc.), as sketched below
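To make the two context modes concrete, here is a minimal sketch using the context-precision scorer. It assumes the `createContextPrecisionScorer` factory exported from `@mastra/evals/scorers/llm` and the `context` / `contextExtractor` options described on the [`context-precision`](https://mastra.ai/reference/evals/context-precision) reference page; confirm the exact names and shapes there before relying on them.

```typescript
import { openai } from "@ai-sdk/openai";
import { createContextPrecisionScorer } from "@mastra/evals/scorers/llm";

// Static context: evaluate against a fixed, pre-defined array.
const staticScorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"), // judge model for the evaluation
  options: {
    context: [
      "Photosynthesis converts sunlight into chemical energy.",
      "Plants release oxygen as a byproduct of photosynthesis.",
    ],
  },
});

// Dynamic context extraction: derive context from each run instead,
// e.g. the documents your RAG pipeline retrieved for this input.
// `contextExtractor` is an assumption here; check the reference page.
const dynamicScorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"),
  options: {
    contextExtractor: (input, output) => {
      // Return the context that was actually used for this run; a
      // literal array keeps the sketch self-contained.
      return ["Plants release oxygen as a byproduct of photosynthesis."];
    },
  },
});
```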
### Output quality

These scorers evaluate adherence to format, style, and safety requirements:

- [`tone-consistency`](https://mastra.ai/reference/evals/tone-consistency): Measures consistency in formality, complexity, and style (`0-1`, higher is better)
- [`toxicity`](https://mastra.ai/reference/evals/toxicity): Detects harmful or inappropriate content (`0-1`, lower is better)
- [`bias`](https://mastra.ai/reference/evals/bias): Detects potential biases in the output (`0-1`, lower is better)
- [`keyword-coverage`](https://mastra.ai/reference/evals/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)
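Built-in scorers can also be run directly, which is useful in tests and CI. The sketch below assumes the `createAnswerRelevancyScorer` factory from `@mastra/evals/scorers/llm` and the `run` input/output shape shown on the [`answer-relevancy`](https://mastra.ai/reference/evals/answer-relevancy) reference page; verify both against the linked docs.

```typescript
import { openai } from "@ai-sdk/openai";
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/llm";

const scorer = createAnswerRelevancyScorer({
  model: openai("gpt-4o-mini"), // judge model for the evaluation
});

const result = await scorer.run({
  input: [{ role: "user", content: "What is the capital of France?" }],
  output: { text: "Paris is the capital of France." },
});

// `score` is in the 0-1 range; for answer-relevancy, higher is better.
// LLM-judged scorers typically also return a generated explanation.
console.log(result.score, result.reason);
```

The same pattern should apply to the other scorers listed above; only the factory name and, for some scorers, the required options differ.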
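To score live traffic rather than individual calls, scorers can be registered on an agent. The sketch below assumes the `scorers` option on `Agent` (from `@mastra/core/agent`) with per-scorer sampling, as described in the evals docs; treat the exact config shape as an assumption and confirm it there.

```typescript
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { createToxicityScorer } from "@mastra/evals/scorers/llm";

export const supportAgent = new Agent({
  name: "support-agent",
  instructions: "You are a helpful, polite support agent.",
  model: openai("gpt-4o"),
  // Assumed shape: each entry pairs a scorer with a sampling config,
  // so only a fraction of production responses are scored.
  scorers: {
    toxicity: {
      scorer: createToxicityScorer({ model: openai("gpt-4o-mini") }),
      sampling: { type: "ratio", rate: 0.5 }, // score ~50% of responses
    },
  },
});
```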