
Context Relevance Scorer

The createContextRelevanceScorerLLM() function creates a scorer that evaluates how relevant and useful the provided context was for generating an agent's response. It uses weighted relevance levels and applies penalties for unused high-relevance context and for missing information.

Parameters

model:

MastraLanguageModel
The language model to use for evaluating context relevance

options:

ContextRelevanceOptions
Configuration options for the scorer

:::note
Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
:::

.run() Returns

score:

number
Weighted relevance score between 0 and scale (default 0-1)

reason:

string
Human-readable explanation of the context relevance evaluation

Scoring Details

Weighted Relevance Scoring

Context Relevance uses a weighted scoring algorithm that considers:

  1. Relevance Levels: Each context piece is classified with weighted values:

    • high = 1.0 (directly addresses the query)
    • medium = 0.7 (supporting information)
    • low = 0.3 (tangentially related)
    • none = 0.0 (completely irrelevant)
  2. Usage Detection: Tracks whether relevant context was actually used in the response

  3. Penalties Applied (configurable via penalties options):

    • Unused High-Relevance: unusedHighRelevanceContext penalty per unused high-relevance context (default: 0.1)
    • Missing Context: Up to maxMissingContextPenalty for identified missing information (default: 0.5)

Scoring Formula

```
Base Score      = Σ(relevance_weights) / (num_contexts × 1.0)
Usage Penalty   = count(unused_high_relevance) × unusedHighRelevanceContext
Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
Final Score     = max(0, Base Score − Usage Penalty − Missing Penalty) × scale
```

Default Values:

  • unusedHighRelevanceContext = 0.1 (10% penalty per unused high-relevance context)
  • missingContextPerItem = 0.15 (15% penalty per missing context item)
  • maxMissingContextPenalty = 0.5 (maximum 50% penalty for missing context)
  • scale = 1
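The formula and defaults above can be sketched as a standalone TypeScript function. This is an illustrative re-implementation of the published formula, not the library's internal code; the `ContextEvaluation` shape and all names below are assumptions:

```typescript
type RelevanceLevel = 'high' | 'medium' | 'low' | 'none';

// Weighted values for each relevance level
const WEIGHTS: Record<RelevanceLevel, number> = {
  high: 1.0,
  medium: 0.7,
  low: 0.3,
  none: 0.0,
};

interface ContextEvaluation {
  level: RelevanceLevel;
  wasUsed: boolean; // whether the response actually drew on this context piece
}

function computeRelevanceScore(
  evaluations: ContextEvaluation[],
  missingContextCount: number,
  penalties = {
    unusedHighRelevanceContext: 0.1,
    missingContextPerItem: 0.15,
    maxMissingContextPenalty: 0.5,
  },
  scale = 1,
): number {
  if (evaluations.length === 0) return 0;

  // Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
  const baseScore =
    evaluations.reduce((sum, e) => sum + WEIGHTS[e.level], 0) /
    evaluations.length;

  // Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
  const unusedHigh = evaluations.filter(
    (e) => e.level === 'high' && !e.wasUsed,
  ).length;
  const usagePenalty = unusedHigh * penalties.unusedHighRelevanceContext;

  // Missing Penalty = min(count × missingContextPerItem, maxMissingContextPenalty)
  const missingPenalty = Math.min(
    missingContextCount * penalties.missingContextPerItem,
    penalties.maxMissingContextPenalty,
  );

  // Final Score = max(0, Base − Usage − Missing) × scale
  return Math.max(0, baseScore - usagePenalty - missingPenalty) * scale;
}
```

Note that both penalties are subtracted before scaling, so a larger `scale` amplifies the already-penalized score rather than offsetting the penalties.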

Score Interpretation

  • 0.9-1.0 = Excellent relevance with minimal gaps
  • 0.7-0.8 = Good relevance with some unused or missing context
  • 0.4-0.6 = Mixed relevance with significant gaps
  • 0.0-0.3 = Poor relevance or mostly irrelevant context
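Assuming the default scale of 1, these bands could be encoded as a small helper for reporting. The thresholds follow the ranges above, but the exact boundary handling (e.g. a score of 0.85) is an assumption:

```typescript
// Map a 0–1 relevance score to the interpretation bands above
function interpretScore(score: number): string {
  if (score >= 0.9) return 'Excellent relevance with minimal gaps';
  if (score >= 0.7) return 'Good relevance with some unused or missing context';
  if (score >= 0.4) return 'Mixed relevance with significant gaps';
  return 'Poor relevance or mostly irrelevant context';
}
```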

Difference from Context Precision

| Aspect    | Context Relevance                        | Context Precision                  |
| --------- | ---------------------------------------- | ---------------------------------- |
| Algorithm | Weighted levels with penalties           | Mean Average Precision (MAP)       |
| Relevance | Multiple levels (high/medium/low/none)   | Binary (yes/no)                    |
| Position  | Not considered                           | Critical (rewards early placement) |
| Usage     | Tracks and penalizes unused context      | Not considered                     |
| Missing   | Identifies and penalizes gaps            | Not evaluated                      |

Usage Examples

Basic Configuration

```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai('gpt-4o'),
  options: {
    context: [
      'Einstein won the Nobel Prize for his work on the photoelectric effect',
    ],
    scale: 1,
  },
});
```

Custom Penalty Configuration

```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai('gpt-4o'),
  options: {
    context: ['Context information...'],
    penalties: {
      unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
      missingContextPerItem: 0.2, // Higher penalty per missing item
      maxMissingContextPenalty: 0.4, // Lower maximum penalty cap
    },
    scale: 2, // Double the final score
  },
});
```
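To trace the arithmetic under these custom penalties, consider a hypothetical evaluation (all counts and classifications below are assumed for illustration): two context pieces classified high and medium, the high piece unused, and one missing-context item identified:

```typescript
// Base Score = (1.0 + 0.7) / 2 = 0.85
const baseScore = (1.0 + 0.7) / 2;

// Usage Penalty = 1 unused high-relevance piece × 0.05 = 0.05
const usagePenalty = 1 * 0.05;

// Missing Penalty = min(1 × 0.2, 0.4) = 0.2
const missingPenalty = Math.min(1 * 0.2, 0.4);

// Final Score = max(0, 0.85 − 0.05 − 0.2) × 2 = 1.2
const finalScore =
  Math.max(0, baseScore - usagePenalty - missingPenalty) * 2;
```

Because `scale: 2` is applied last, the final score of 1.2 exceeds 1; downstream consumers must interpret scores relative to the configured scale.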

Dynamic Context Extraction

```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai('gpt-4o'),
  options: {
    contextExtractor: (input, output) => {
      // Extract context based on the query
      const userQuery = input?.inputMessages?.[0]?.content || '';
      if (userQuery.includes('Einstein')) {
        return [
          'Einstein won the Nobel Prize for the photoelectric effect',
          'He developed the theory of relativity',
        ];
      }
      return ['General physics information'];
    },
    penalties: {
      unusedHighRelevanceContext: 0.15,
    },
  },
});
```

Usage Patterns

Content Generation Evaluation

Best for evaluating context quality in:

  • Chat systems where context usage matters
  • RAG pipelines needing nuanced relevance assessment
  • Systems where missing context affects quality

Context Selection Optimization

Use when optimizing for:

  • Comprehensive context coverage
  • Effective context utilization
  • Identifying context gaps