Context Precision Scorer
The `createContextPrecisionScorer()` function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating expected outputs. It uses Mean Average Precision (MAP) to reward systems that place relevant context earlier in the sequence.
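A minimal setup sketch is shown below, assuming the scorer is exported from `@mastra/evals/scorers/llm` and that an AI SDK model (e.g. `openai("gpt-4o-mini")`) satisfies `MastraLanguageModel`; verify both against your installed version.

```typescript
import { openai } from "@ai-sdk/openai";
// The import path below is an assumption; check where your Mastra version exports the scorer.
import { createContextPrecisionScorer } from "@mastra/evals/scorers/llm";

const scorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"), // any MastraLanguageModel
  options: {
    // Context pieces, in the order the retriever returned them
    context: [
      "Photosynthesis converts sunlight into chemical energy in plants.",
      "The Eiffel Tower is located in Paris.",
    ],
  },
});
```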
Parameters
- `model` (`MastraLanguageModel`): The language model to use for evaluating context relevance.
- `options` (`ContextPrecisionMetricOptions`): Configuration options for the scorer.
:::note
Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
:::
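When the context is only known at run time (for example, chunks returned by a retriever), a `contextExtractor` can supply it instead of a static `context` array. The extractor below is hypothetical: its argument shape and the `retrievedChunks` field are assumptions for illustration, not part of the documented API.

```typescript
import { openai } from "@ai-sdk/openai";
// Import path is an assumption; see the setup sketch above.
import { createContextPrecisionScorer } from "@mastra/evals/scorers/llm";

const scorer = createContextPrecisionScorer({
  model: openai("gpt-4o-mini"),
  options: {
    // Hypothetical extractor: derive the ordered context strings from the run being scored.
    contextExtractor: (run: any): string[] =>
      run?.output?.retrievedChunks ?? [],
  },
});
```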
.run() Returns
- `score` (`number`): Mean Average Precision score between 0 and `scale` (default 0-1).
- `reason` (`string`): Human-readable explanation of the context precision evaluation.
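A sketch of consuming the result: `score` and `reason` are the fields documented above, while the shape of the payload passed to `.run()` is an assumption modeled on Mastra's other LLM scorers and may differ in your version.

```typescript
// Payload shape is assumed; score and reason are the documented return fields.
const result = await scorer.run({
  input: [{ role: "user", content: "What powers plant growth?" }],
  output: { role: "assistant", text: "Plants grow by converting sunlight via photosynthesis." },
});

console.log(result.score);  // MAP in [0, scale], e.g. 0.83
console.log(result.reason); // human-readable explanation of the relevance verdicts
```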
Scoring Details
Mean Average Precision (MAP)
Context Precision uses Mean Average Precision to evaluate both relevance and positioning:
- Context Evaluation: Each context piece is classified as relevant or irrelevant for generating the expected output
- Precision Calculation: For each relevant context at position i, precision = relevant_items_so_far / (i + 1) (see the sketch after this list)
- Average Precision: Sum all precision values and divide by total relevant items
- Final Score: Multiply by scale factor and round to 2 decimals
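The four steps above amount to a small calculation. The sketch below is not the scorer's internal implementation; it simply reimplements the MAP steps on an ordered array of relevance verdicts (the verdicts themselves come from the LLM evaluation in step 1).

```typescript
/**
 * Mean Average Precision over ordered relevance verdicts.
 * verdicts[i] is true when the context piece at position i was judged relevant.
 */
function meanAveragePrecision(verdicts: boolean[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;

  verdicts.forEach((isRelevant, i) => {
    if (!isRelevant) return;                  // precision is only sampled at relevant positions
    relevantSoFar += 1;
    precisionSum += relevantSoFar / (i + 1);  // precision@(i + 1)
  });

  if (relevantSoFar === 0) return 0;          // no relevant context found
  return Math.round((precisionSum / relevantSoFar) * scale * 100) / 100;
}

meanAveragePrecision([true, false, true, false]); // 0.83
```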
Scoring Formula
MAP = (Σ Precision@k) / R
Where:
- Precision@k = (relevant items in positions 1...k) / k
- R = total number of relevant items
- The sum runs only over positions k where a relevant item appears
Score Interpretation
- 1.0 = Perfect precision (all relevant context appears first)
- 0.5-0.9 = Good precision with some relevant context well-positioned
- 0.1-0.4 = Poor precision with relevant context buried or scattered
- 0.0 = No relevant context found
Example Calculation
Given context: [relevant, irrelevant, relevant, irrelevant]
- Position 0: Relevant → Precision = 1/1 = 1.0
- Position 1: Skip (irrelevant)
- Position 2: Relevant → Precision = 2/3 ≈ 0.67
- Position 3: Skip (irrelevant)
MAP = (1/1 + 2/3) / 2 ≈ 0.83
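The same numbers can be checked directly:

```typescript
// Precision values at the two relevant positions (0 and 2), then divide by R = 2
const map = (1 / 1 + 2 / 3) / 2;
console.log(map.toFixed(2)); // "0.83"
```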
Usage Patterns
RAG System Evaluation
Ideal for evaluating retrieved context in RAG pipelines where:
- Context ordering matters for model performance
- You need to measure retrieval quality beyond simple relevance
- Early relevant context is more valuable than later relevant context (illustrated just below)
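As a quick illustration of the last point, the same two relevant chunks score very differently depending on where they sit in the retrieved list (using the MAP formula from the Scoring Details section):

```typescript
// Front-loaded retrieval: [relevant, relevant, irrelevant, irrelevant]
const frontLoaded = (1 / 1 + 2 / 2) / 2; // 1.00
// Buried retrieval:      [irrelevant, irrelevant, relevant, relevant]
const buried = (1 / 3 + 2 / 4) / 2;      // ≈ 0.42
console.log(frontLoaded.toFixed(2), buried.toFixed(2)); // "1.00" "0.42"
```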
Context Window Optimization
Use when optimizing context selection for:
- Limited context windows
- Token budget constraints
- Multi-step reasoning tasks
Related
- Answer Relevancy Scorer - Evaluates if answers address the question
- Faithfulness Scorer - Measures answer groundedness in context
- Custom Scorers - Creating your own evaluation metrics