# Answer Relevancy Scorer

The `createAnswerRelevancyScorer()` function accepts a single options object with the following properties:

## Parameters

**model:** (`LanguageModel`): Configuration for the model used to evaluate relevancy.

**uncertaintyWeight:** (`number`): Weight given to 'unsure' verdicts in scoring (0-1). (Default: `0.3`)

**scale:** (`number`): Maximum score value. (Default: `1`)

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but the return value includes LLM-specific fields as documented below.

## .run() Returns

**runId:** (`string`): The id of the run (optional).

**score:** (`number`): Relevancy score (0 to `scale`, default 0-1).

**preprocessPrompt:** (`string`): The prompt sent to the LLM for the preprocess step (optional).

**preprocessStepResult:** (`object`): Object with extracted statements: `{ statements: string[] }`

**analyzePrompt:** (`string`): The prompt sent to the LLM for the analyze step (optional).

**analyzeStepResult:** (`object`): Object with results: `{ results: Array<{ result: 'yes' | 'unsure' | 'no', reason: string }> }`

**generateReasonPrompt:** (`string`): The prompt sent to the LLM for the reason step (optional).

**reason:** (`string`): Explanation of the score.

## Scoring Details

The scorer evaluates relevancy through query-answer alignment, considering completeness and level of detail, but not factual correctness.

### Scoring Process

1. **Statement Preprocessing:**
   - Breaks the output into meaningful statements while preserving context.
2. **Relevance Analysis:**
   - Each statement is evaluated as:
     - "yes": Full weight for direct matches
     - "unsure": Partial weight (default: 0.3) for approximate matches
     - "no": Zero weight for irrelevant content
3. **Score Calculation:**
   - `((yes_count + uncertaintyWeight * unsure_count) / total_statements) * scale`

### Score Interpretation

A relevancy score between 0 and 1:

- **1.0**: The response fully answers the query with relevant and focused information.
- **0.7–0.9**: The response mostly answers the query but may include minor unrelated content.
- **0.4–0.6**: The response partially answers the query, mixing relevant and unrelated information.
- **0.1–0.3**: The response includes minimal relevant content and largely misses the intent of the query.
- **0.0**: The response is entirely unrelated and does not answer the query.

## Example

Evaluate agent responses for relevancy across different scenarios:

```typescript
import { runEvals } from "@mastra/core/evals";
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createAnswerRelevancyScorer({ model: "openai/gpt-4o" });

const result = await runEvals({
  data: [
    {
      input: "What are the health benefits of regular exercise?",
    },
    {
      input: "What should a healthy breakfast include?",
    },
    {
      input: "What are the benefits of meditation?",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
```

For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).

To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.

## Related

- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness)
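
To make the score-calculation formula above concrete, here is a minimal sketch of how "yes"/"unsure"/"no" verdicts fold into a score. The `computeRelevancyScore` helper and its `Verdict` shape are hypothetical illustrations, not part of the Mastra API; they mirror the documented formula, not the library's internal implementation.

```typescript
// Hypothetical helper mirroring the documented formula; not part of the Mastra API.
type Verdict = { result: "yes" | "unsure" | "no"; reason: string };

function computeRelevancyScore(
  verdicts: Verdict[],
  uncertaintyWeight = 0.3,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0;
  const yesCount = verdicts.filter((v) => v.result === "yes").length;
  const unsureCount = verdicts.filter((v) => v.result === "unsure").length;
  // ((yes_count + uncertaintyWeight * unsure_count) / total_statements) * scale
  return ((yesCount + uncertaintyWeight * unsureCount) / verdicts.length) * scale;
}

// Example: 2 "yes", 1 "unsure", 1 "no" => ((2 + 0.3 * 1) / 4) * 1 = 0.575
console.log(
  computeRelevancyScore([
    { result: "yes", reason: "Directly addresses the query" },
    { result: "yes", reason: "Adds relevant supporting detail" },
    { result: "unsure", reason: "Only tangentially related" },
    { result: "no", reason: "Unrelated content" },
  ]),
);
```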