
Answer Relevancy Scorer

The createAnswerRelevancyScorer() function accepts a single options object with the following properties:

For a usage example, see the Answer Relevancy Examples.

Parameters

model: LanguageModel
Configuration for the model used to evaluate relevancy.

uncertaintyWeight: number = 0.3
Weight given to 'unsure' verdicts in scoring (0-1).

scale: number = 1
Maximum score value.
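As a sketch, the options above can be modeled as a TypeScript type with a small default-filling helper. The `LanguageModel` stub and the `withDefaults` helper are illustrative assumptions, not part of the Mastra API:

```typescript
// Stub standing in for Mastra's LanguageModel type (illustrative only).
type LanguageModel = { modelId: string };

// Options object accepted by createAnswerRelevancyScorer(), per the reference above.
interface AnswerRelevancyOptions {
  model: LanguageModel;
  uncertaintyWeight?: number; // default 0.3
  scale?: number; // default 1
}

// Hypothetical helper applying the documented defaults.
function withDefaults(
  options: AnswerRelevancyOptions,
): Required<AnswerRelevancyOptions> {
  return {
    model: options.model,
    uncertaintyWeight: options.uncertaintyWeight ?? 0.3,
    scale: options.scale ?? 1,
  };
}
```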

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.

.run() Returns

runId: string
The id of the run (optional).

score: number
Relevancy score (0 to scale, default 0-1).

extractPrompt: string
The prompt sent to the LLM for the extract step (optional).

extractStepResult: object
Object with extracted statements: { statements: string[] }

analyzePrompt: string
The prompt sent to the LLM for the analyze step (optional).

analyzeStepResult: object
Object with results: { results: Array<{ result: 'yes' | 'unsure' | 'no', reason: string }> }

reasonPrompt: string
The prompt sent to the LLM for the reason step (optional).

reason: string
Explanation of the score.
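The return shape above can be sketched as a TypeScript interface. The interface itself is illustrative and not exported by Mastra; field names follow the reference:

```typescript
// Illustrative shape of the object returned by .run(), based on the fields above.
interface AnswerRelevancyResult {
  runId?: string;
  score: number; // 0 to scale
  extractPrompt?: string;
  extractStepResult: { statements: string[] };
  analyzePrompt?: string;
  analyzeStepResult: {
    results: Array<{ result: "yes" | "unsure" | "no"; reason: string }>;
  };
  reasonPrompt?: string;
  reason: string;
}

// Sample result matching the documented shape (values are invented for illustration).
const sample: AnswerRelevancyResult = {
  score: 0.575,
  extractStepResult: { statements: ["Paris is the capital of France."] },
  analyzeStepResult: {
    results: [{ result: "yes", reason: "Directly answers the question." }],
  },
  reason: "Most statements directly address the query.",
};
```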

Scoring Details

The scorer evaluates relevancy through query-answer alignment, considering completeness and detail level, but not factual correctness.

Scoring Process

  1. Statement Extraction:
    • Breaks the output into meaningful statements while preserving context.
  2. Relevance Analysis:
    • Each statement receives a verdict:
      • “yes”: full weight for direct matches
      • “unsure”: partial weight (uncertaintyWeight, default 0.3) for approximate matches
      • “no”: zero weight for irrelevant content
  3. Score Calculation:
    • score = ((yes_count + uncertaintyWeight * unsure_count) / total_statements) * scale
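The calculation above can be sketched as a small TypeScript function. Mastra performs this internally; the function name is illustrative:

```typescript
type Verdict = "yes" | "unsure" | "no";

// Illustrative implementation of the documented formula:
// ((yes_count + uncertaintyWeight * unsure_count) / total_statements) * scale
function calculateRelevancyScore(
  verdicts: Verdict[],
  uncertaintyWeight = 0.3,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0;
  const yesCount = verdicts.filter((v) => v === "yes").length;
  const unsureCount = verdicts.filter((v) => v === "unsure").length;
  return ((yesCount + uncertaintyWeight * unsureCount) / verdicts.length) * scale;
}
```

For example, verdicts `["yes", "yes", "unsure", "no"]` with the defaults give (2 + 0.3 × 1) / 4 = 0.575.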

Score Interpretation

  • 1.0: Perfect relevance - complete and accurate
  • 0.7-0.9: High relevance - minor gaps or imprecisions
  • 0.4-0.6: Moderate relevance - significant gaps
  • 0.1-0.3: Low relevance - major issues
  • 0.0: No relevance - incorrect or off-topic
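For programmatic use, the bands above can be sketched as a simple lookup. The thresholds mirror the list; the helper itself is illustrative, not part of the API:

```typescript
// Illustrative mapping of a normalized score (0-1, scale = 1) to the bands above.
function interpretScore(score: number): string {
  if (score >= 1.0) return "Perfect relevance";
  if (score >= 0.7) return "High relevance";
  if (score >= 0.4) return "Moderate relevance";
  if (score > 0.0) return "Low relevance";
  return "No relevance";
}
```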