Context Precision
This example shows how to use Mastra’s Context Precision metric to evaluate how much of the provided context is relevant to a query and actually used in the response.
Overview
The example shows how to:
- Configure the Context Precision metric
- Evaluate context precision
- Analyze precision scores
- Handle different precision levels
Setup
Environment Setup
Make sure to set up your environment variables:
.env
OPENAI_API_KEY=your_api_key_here
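A missing key usually only shows up as an error at the first model call, so it can help to check for it up front. The following is a minimal sketch, not part of Mastra or the AI SDK: `requireEnv` is a hypothetical helper name, and the environment is passed in as a plain object so the snippet runs on its own.

```typescript
// Hypothetical helper (an assumption for illustration, not a Mastra API):
// returns the variable's value, or throws a descriptive error if it is unset or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In the example project this would be called as:
//   requireEnv("OPENAI_API_KEY", process.env);
// A literal object is used here so the sketch does not depend on a real key.
const apiKey = requireEnv("OPENAI_API_KEY", {
  OPENAI_API_KEY: "your_api_key_here",
});
console.log("API key present:", apiKey.length > 0);
```

Calling this once at startup, before constructing any metric, makes a configuration problem fail fast with a clear message.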
Dependencies
Import the necessary dependencies:
src/index.ts
import { openai } from "@ai-sdk/openai";
import { ContextPrecisionMetric } from "@mastra/evals/llm";
Example Usage
High Precision Example
Evaluate a response where all context is relevant:
src/index.ts
const context1 = [
"Photosynthesis converts sunlight into energy.",
"Plants use chlorophyll for photosynthesis.",
"Photosynthesis produces oxygen as a byproduct.",
"The process requires sunlight and chlorophyll.",
];
const metric1 = new ContextPrecisionMetric(openai("gpt-4o-mini"), {
context: context1,
});
const query1 = "What is photosynthesis and how does it work?";
const response1 =
"Photosynthesis is a process where plants convert sunlight into energy using chlorophyll, producing oxygen as a byproduct.";
console.log("Example 1 - High Precision:");
console.log("Context:", context1);
console.log("Query:", query1);
console.log("Response:", response1);
const result1 = await metric1.measure(query1, response1);
console.log("Metric Result:", {
score: result1.score,
reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The context uses all relevant information and does not include any irrelevant information.' }
Mixed Precision Example
Evaluate a response where some context is irrelevant:
src/index.ts
const context2 = [
"Volcanoes are openings in the Earth's crust.",
"Volcanoes can be active, dormant, or extinct.",
"Hawaii has many active volcanoes.",
"The Pacific Ring of Fire has many volcanoes.",
];
const metric2 = new ContextPrecisionMetric(openai("gpt-4o-mini"), {
context: context2,
});
const query2 = "What are the different types of volcanoes?";
const response2 =
"Volcanoes can be classified as active, dormant, or extinct based on their activity status.";
console.log("Example 2 - Mixed Precision:");
console.log("Context:", context2);
console.log("Query:", query2);
console.log("Response:", response2);
const result2 = await metric2.measure(query2, response2);
console.log("Metric Result:", {
score: result2.score,
reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.5, reason: 'The context uses some relevant information and includes some irrelevant information.' }
Low Precision Example
Evaluate a response where most context is irrelevant:
src/index.ts
const context3 = [
"The Nile River is in Africa.",
"The Nile is the longest river.",
"Ancient Egyptians used the Nile.",
"The Nile flows north.",
];
const metric3 = new ContextPrecisionMetric(openai("gpt-4o-mini"), {
context: context3,
});
const query3 = "Which direction does the Nile River flow?";
const response3 = "The Nile River flows northward.";
console.log("Example 3 - Low Precision:");
console.log("Context:", context3);
console.log("Query:", query3);
console.log("Response:", response3);
const result3 = await metric3.measure(query3, response3);
console.log("Metric Result:", {
score: result3.score,
reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0.2, reason: 'The context only has one relevant piece, which is at the end.' }
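The three examples above repeat the same measure-and-log steps, so they could be folded into one helper. The sketch below is one possible way to do that: `MetricLike`, `ExampleCase`, `runExample`, and `stubMetric` are hypothetical names introduced here, and `MetricLike` covers only the `measure(query, response)` call and the `score` and `info.reason` fields used on this page. A stub stands in for the real metric so the snippet runs without an API key.

```typescript
// Minimal structural type covering only what the examples on this page use.
// This is an assumption for illustration, not a type exported by @mastra/evals.
interface MetricLike {
  measure(
    input: string,
    output: string,
  ): Promise<{ score: number; info: { reason: string } }>;
}

interface ExampleCase {
  label: string;
  query: string;
  response: string;
  metric: MetricLike;
}

// Hypothetical helper: runs one case and returns the fields the examples log.
async function runExample(example: ExampleCase) {
  const result = await example.metric.measure(example.query, example.response);
  const summary = { score: result.score, reason: result.info.reason };
  console.log(`${example.label}:`, summary);
  return summary;
}

// Stub metric so the sketch runs offline; it always returns the same result.
const stubMetric: MetricLike = {
  async measure() {
    return { score: 1, reason: "stubbed result" };
  },
};

void runExample({
  label: "Example 1 - High Precision",
  query: "What is photosynthesis and how does it work?",
  response:
    "Photosynthesis is a process where plants convert sunlight into energy using chlorophyll, producing oxygen as a byproduct.",
  metric: stubMetric,
});
```

In the real example, `metric1`, `metric2`, or `metric3` would be passed as `metric` in place of `stubMetric`, assuming they satisfy the same shape.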
Understanding the Results
The metric provides:
- A precision score between 0 and 1:
  - 1.0: Perfect precision - all context pieces are relevant and used
  - 0.7-0.9: High precision - most context pieces are relevant
  - 0.4-0.6: Mixed precision - some context pieces are relevant
  - 0.1-0.3: Low precision - few context pieces are relevant
  - 0.0: No precision - no context pieces are relevant
- Detailed reason for the score, including analysis of:
  - Relevance of each context piece
  - Usage in the response
  - Contribution to answering the query
  - Overall context usefulness
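To handle the different precision levels in code, the score bands listed above can be mapped to labels. The following is a rough sketch: `classifyPrecision` and `PrecisionBand` are hypothetical names, not part of `@mastra/evals`. The listed bands leave gaps (for example between 0.9 and 1.0), and this sketch assigns such values to the band below them, which is an assumption rather than documented behavior.

```typescript
// Hypothetical helper, not part of @mastra/evals.
type PrecisionBand = "perfect" | "high" | "mixed" | "low" | "none";

// Maps a score in [0, 1] to one of the bands described above.
// Assumption: values that fall between the listed bands (e.g. 0.95 or 0.65)
// are placed in the lower band.
function classifyPrecision(score: number): PrecisionBand {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`Expected a score between 0 and 1, got ${score}`);
  }
  if (score === 1) return "perfect";
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "mixed";
  if (score > 0) return "low";
  return "none";
}

// Scores taken from the three example outputs on this page.
console.log(classifyPrecision(1)); // prints "perfect"
console.log(classifyPrecision(0.5)); // prints "mixed"
console.log(classifyPrecision(0.2)); // prints "low"
```

A caller could use the returned label to decide, for instance, whether to log a warning or trim the context before retrying; the thresholds are easy to adjust if a project needs stricter bands.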