Context Relevance Scorer
The createContextRelevanceScorerLLM() function creates a scorer that evaluates how relevant and useful the provided context was in generating an agent's response. It uses weighted relevance levels and applies penalties for unused high-relevance context and for missing information.
Parameters
model:
MastraLanguageModel
The language model to use for evaluating context relevance
options:
ContextRelevanceOptions
Configuration options for the scorer
note
Either context or contextExtractor must be provided. If both are provided, contextExtractor takes precedence.
.run() Returns
score:
number
Weighted relevance score between 0 and scale (default 0-1)
reason:
string
Human-readable explanation of the context relevance evaluation
Scoring Details
Weighted Relevance Scoring
Context Relevance uses a weighted scoring algorithm that considers:
- Relevance Levels: Each context piece is classified with a weighted value:
  - `high` = 1.0 (directly addresses the query)
  - `medium` = 0.7 (supporting information)
  - `low` = 0.3 (tangentially related)
  - `none` = 0.0 (completely irrelevant)
- Usage Detection: Tracks whether relevant context was actually used in the response
- Penalties Applied (configurable via `penalties` options):
  - Unused High-Relevance: `unusedHighRelevanceContext` penalty per unused high-relevance context piece (default: 0.1)
  - Missing Context: Up to `maxMissingContextPenalty` for identified missing information (default: 0.5)
Scoring Formula
```
Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
Final Score = max(0, Base Score − Usage Penalty − Missing Penalty) × scale
```
Default Values:
- `unusedHighRelevanceContext` = 0.1 (10% penalty per unused high-relevance context)
- `missingContextPerItem` = 0.15 (15% penalty per missing context item)
- `maxMissingContextPenalty` = 0.5 (maximum 50% penalty for missing context)
- `scale` = 1
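The formula above can be sketched as a standalone TypeScript function. This is a simplified re-implementation for illustration only: in the real scorer, the relevance level, usage flag, and missing-context count for each piece come from the LLM judge, not from caller-supplied data.

```typescript
type RelevanceLevel = "high" | "medium" | "low" | "none";

// Weights for each relevance level, as documented above.
const WEIGHTS: Record<RelevanceLevel, number> = {
  high: 1.0,
  medium: 0.7,
  low: 0.3,
  none: 0.0,
};

interface ScoringInput {
  contexts: { level: RelevanceLevel; wasUsed: boolean }[];
  missingContextCount: number;
}

function computeRelevanceScore(
  input: ScoringInput,
  penalties = {
    unusedHighRelevanceContext: 0.1,
    missingContextPerItem: 0.15,
    maxMissingContextPenalty: 0.5,
  },
  scale = 1,
): number {
  const { contexts, missingContextCount } = input;
  // Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
  const base =
    contexts.reduce((sum, c) => sum + WEIGHTS[c.level], 0) / contexts.length;
  // Usage Penalty = unused high-relevance pieces × per-piece penalty
  const unusedHigh = contexts.filter(
    (c) => c.level === "high" && !c.wasUsed,
  ).length;
  const usagePenalty = unusedHigh * penalties.unusedHighRelevanceContext;
  // Missing Penalty, capped at maxMissingContextPenalty
  const missingPenalty = Math.min(
    missingContextCount * penalties.missingContextPerItem,
    penalties.maxMissingContextPenalty,
  );
  return Math.max(0, base - usagePenalty - missingPenalty) * scale;
}

// Two high-relevance contexts (one unused) plus one missing item:
// base = (1.0 + 1.0) / 2 = 1.0; penalties = 0.1 + 0.15 = 0.25; score = 0.75
const score = computeRelevanceScore({
  contexts: [
    { level: "high", wasUsed: true },
    { level: "high", wasUsed: false },
  ],
  missingContextCount: 1,
});
```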
Score Interpretation
- 0.9-1.0 = Excellent relevance with minimal gaps
- 0.7-0.8 = Good relevance with some unused or missing context
- 0.4-0.6 = Mixed relevance with significant gaps
- 0.0-0.3 = Poor relevance or mostly irrelevant context
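These bands can be expressed as a small helper for reporting or alerting. Note that `interpretScore` is a hypothetical convenience function, not part of the Mastra API; its cutoffs mirror the list above.

```typescript
// Hypothetical helper: maps a 0–1 relevance score to the bands listed above.
function interpretScore(score: number): string {
  if (score >= 0.9) return "excellent relevance with minimal gaps";
  if (score >= 0.7) return "good relevance with some unused or missing context";
  if (score >= 0.4) return "mixed relevance with significant gaps";
  return "poor relevance or mostly irrelevant context";
}
```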
Difference from Context Precision
| Aspect | Context Relevance | Context Precision |
|---|---|---|
| Algorithm | Weighted levels with penalties | Mean Average Precision (MAP) |
| Relevance | Multiple levels (high/medium/low/none) | Binary (yes/no) |
| Position | Not considered | Critical (rewards early placement) |
| Usage | Tracks and penalizes unused context | Not considered |
| Missing | Identifies and penalizes gaps | Not evaluated |
Usage Examples
Basic Configuration
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    context: [
      "Einstein won the Nobel Prize for his work on the photoelectric effect",
    ],
    scale: 1,
  },
});
```
Custom Penalty Configuration
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    context: ["Context information..."],
    penalties: {
      unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
      missingContextPerItem: 0.2, // Higher penalty per missing item
      maxMissingContextPenalty: 0.4, // Lower maximum penalty cap
    },
    scale: 2, // Double the final score
  },
});
```
Dynamic Context Extraction
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    contextExtractor: (input, output) => {
      // Extract context based on the query
      const userQuery = input?.inputMessages?.[0]?.content || "";
      if (userQuery.includes("Einstein")) {
        return [
          "Einstein won the Nobel Prize for the photoelectric effect",
          "He developed the theory of relativity",
        ];
      }
      return ["General physics information"];
    },
    penalties: {
      unusedHighRelevanceContext: 0.15,
    },
  },
});
```
Usage Patterns
Content Generation Evaluation
Best for evaluating context quality in:
- Chat systems where context usage matters
- RAG pipelines needing nuanced relevance assessment
- Systems where missing context affects quality
Context Selection Optimization
Use when optimizing for:
- Comprehensive context coverage
- Effective context utilization
- Identifying context gaps
Related
- Context Precision Scorer - Evaluates context ranking using MAP
- Faithfulness Scorer - Measures answer groundedness in context
- Custom Scorers - Creating your own evaluation metrics