Context Relevance Scorer
The createContextRelevanceScorerLLM() function creates a scorer that evaluates how relevant and useful the provided context was in generating an agent's response. It uses weighted relevance levels and applies penalties for unused high-relevance context and for missing information.
Parameters
model:
MastraLanguageModel
The language model to use for evaluating context relevance
options:
ContextRelevanceOptions
Configuration options for the scorer
note
Either context or contextExtractor must be provided. If both are provided, contextExtractor takes precedence.
.run() Returns
score:
number
Weighted relevance score between 0 and scale (default 0-1)
reason:
string
Human-readable explanation of the context relevance evaluation
Scoring Details
Weighted Relevance Scoring
Context Relevance uses a weighted scoring algorithm that considers:
- Relevance Levels: Each context piece is classified with a weighted value:
  - `high` = 1.0 (directly addresses the query)
  - `medium` = 0.7 (supporting information)
  - `low` = 0.3 (tangentially related)
  - `none` = 0.0 (completely irrelevant)
- Usage Detection: Tracks whether relevant context was actually used in the response
- Penalties Applied (configurable via `penalties` options):
  - Unused High-Relevance: `unusedHighRelevanceContext` penalty per unused high-relevance context piece (default: 0.1)
  - Missing Context: Up to `maxMissingContextPenalty` for identified missing information (default: 0.5)
Scoring Formula
```
Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
Final Score = max(0, Base Score − Usage Penalty − Missing Penalty) × scale
```
Default Values:
- `unusedHighRelevanceContext` = 0.1 (10% penalty per unused high-relevance context)
- `missingContextPerItem` = 0.15 (15% penalty per missing context item)
- `maxMissingContextPenalty` = 0.5 (maximum 50% penalty for missing context)
- `scale` = 1
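The formula above can be sketched as a standalone TypeScript function. This is a simplified re-implementation for illustration only: in the real scorer, the relevance level, usage flag, and missing-context count for each piece come from the LLM judge, not from caller-supplied data.

```typescript
type RelevanceLevel = "high" | "medium" | "low" | "none";

// Weights for each relevance level, as documented above.
const WEIGHTS: Record<RelevanceLevel, number> = {
  high: 1.0,
  medium: 0.7,
  low: 0.3,
  none: 0.0,
};

interface ScoringInput {
  contexts: { level: RelevanceLevel; wasUsed: boolean }[];
  missingContextCount: number;
}

function computeRelevanceScore(
  input: ScoringInput,
  penalties = {
    unusedHighRelevanceContext: 0.1,
    missingContextPerItem: 0.15,
    maxMissingContextPenalty: 0.5,
  },
  scale = 1,
): number {
  const { contexts, missingContextCount } = input;
  // Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
  const base =
    contexts.reduce((sum, c) => sum + WEIGHTS[c.level], 0) / contexts.length;
  // Usage Penalty = unused high-relevance pieces × per-piece penalty
  const unusedHigh = contexts.filter(
    (c) => c.level === "high" && !c.wasUsed,
  ).length;
  const usagePenalty = unusedHigh * penalties.unusedHighRelevanceContext;
  // Missing Penalty, capped at maxMissingContextPenalty
  const missingPenalty = Math.min(
    missingContextCount * penalties.missingContextPerItem,
    penalties.maxMissingContextPenalty,
  );
  return Math.max(0, base - usagePenalty - missingPenalty) * scale;
}

// Two high-relevance contexts (one unused) plus one missing item:
// base = (1.0 + 1.0) / 2 = 1.0; penalties = 0.1 + 0.15 = 0.25; score = 0.75
const score = computeRelevanceScore({
  contexts: [
    { level: "high", wasUsed: true },
    { level: "high", wasUsed: false },
  ],
  missingContextCount: 1,
});
```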
Score Interpretation
- 0.9-1.0 = Excellent relevance with minimal gaps
- 0.7-0.8 = Good relevance with some unused or missing context
- 0.4-0.6 = Mixed relevance with significant gaps
- 0.0-0.3 = Poor relevance or mostly irrelevant context
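These bands can be expressed as a small helper for reporting or alerting. Note that `interpretScore` is a hypothetical convenience function, not part of the Mastra API; its cutoffs mirror the list above.

```typescript
// Hypothetical helper: maps a 0–1 relevance score to the bands listed above.
function interpretScore(score: number): string {
  if (score >= 0.9) return "excellent relevance with minimal gaps";
  if (score >= 0.7) return "good relevance with some unused or missing context";
  if (score >= 0.4) return "mixed relevance with significant gaps";
  return "poor relevance or mostly irrelevant context";
}
```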
Difference from Context Precision
| Aspect | Context Relevance | Context Precision |
|---|---|---|
| Algorithm | Weighted levels with penalties | Mean Average Precision (MAP) |
| Relevance | Multiple levels (high/medium/low/none) | Binary (yes/no) |
| Position | Not considered | Critical (rewards early placement) |
| Usage | Tracks and penalizes unused context | Not considered |
| Missing | Identifies and penalizes gaps | Not evaluated |
Usage Examples
Basic Configuration
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    context: [
      "Einstein won the Nobel Prize for his work on the photoelectric effect",
    ],
    scale: 1,
  },
});
```
Custom Penalty Configuration
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    context: ["Context information..."],
    penalties: {
      unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
      missingContextPerItem: 0.2, // Higher penalty per missing item
      maxMissingContextPenalty: 0.4, // Lower maximum penalty cap
    },
    scale: 2, // Double the final score
  },
});
```
Dynamic Context Extraction
```typescript
const scorer = createContextRelevanceScorerLLM({
  model: openai("gpt-4o"),
  options: {
    contextExtractor: (input, output) => {
      // Extract context based on the query
      const userQuery = input?.inputMessages?.[0]?.content || "";
      if (userQuery.includes("Einstein")) {
        return [
          "Einstein won the Nobel Prize for the photoelectric effect",
          "He developed the theory of relativity",
        ];
      }
      return ["General physics information"];
    },
    penalties: {
      unusedHighRelevanceContext: 0.15,
    },
  },
});
```
Usage Patterns
Content Generation Evaluation
Best for evaluating context quality in:
- Chat systems where context usage matters
- RAG pipelines needing nuanced relevance assessment
- Systems where missing context affects quality
Context Selection Optimization
Use when optimizing for:
- Comprehensive context coverage
- Effective context utilization
- Identifying context gaps
Related
- Context Precision Scorer - Evaluates context ranking using MAP
- Faithfulness Scorer - Measures answer groundedness in context
- Custom Scorers - Creating your own evaluation metrics