Answer Relevancy Evaluation
This example demonstrates how to use Mastra’s Answer Relevancy metric to evaluate how well responses address their input queries.
Overview
The example shows how to:
- Configure the Answer Relevancy metric
- Evaluate response relevancy to queries
- Analyze relevancy scores
- Handle different relevancy scenarios
Setup
Environment Setup
Make sure to set up your environment variables:
.env
OPENAI_API_KEY=your_api_key_here
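Because the metric calls an OpenAI model, a missing key may only surface once the first evaluation runs. A minimal sketch of an up-front check follows. `requireEnv` is a hypothetical helper, not part of Mastra or the AI SDK, and it takes the environment as a parameter so it can be exercised without touching `process.env`.

```typescript
// Hypothetical helper (not part of Mastra): fail fast when a required
// environment variable is missing or empty.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in a Node.js entry point, before constructing the metric:
// requireEnv('OPENAI_API_KEY', process.env);
```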
Dependencies
Import the necessary dependencies:
src/index.ts
import { openai } from '@ai-sdk/openai';
import { AnswerRelevancyMetric } from '@mastra/evals/llm';
Metric Configuration
Set up the Answer Relevancy metric with custom parameters:
src/index.ts
const metric = new AnswerRelevancyMetric(openai('gpt-4o-mini'), {
  uncertaintyWeight: 0.3, // Weight given to 'unsure' verdicts when scoring
  scale: 1, // Maximum score value; scores range from 0 to `scale`
});
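To build intuition for these two options, here is a minimal sketch of how per-statement verdicts could be combined into a score. It assumes the judge labels each statement `yes`, `no`, or `unsure`, and that the score is `(yes + uncertaintyWeight * unsure) / total * scale`. The formula, the `Verdict` type, and the `combineVerdicts` name are assumptions for illustration, not Mastra's actual implementation.

```typescript
// Assumed verdict labels; only 'unsure' is named in the configuration above.
type Verdict = 'yes' | 'no' | 'unsure';

// Illustrative only: combine verdicts into a score in the range 0..scale.
function combineVerdicts(
  verdicts: Verdict[],
  uncertaintyWeight = 0.3,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0; // assumption: nothing to judge scores 0
  const yes = verdicts.filter((v) => v === 'yes').length;
  const unsure = verdicts.filter((v) => v === 'unsure').length;
  // 'yes' counts fully, 'unsure' counts partially, 'no' counts as zero.
  return ((yes + uncertaintyWeight * unsure) / verdicts.length) * scale;
}

// Two relevant statements, one uncertain, one irrelevant:
// (2 + 0.3 * 1) / 4, which is roughly 0.575.
const exampleScore = combineVerdicts(['yes', 'yes', 'unsure', 'no']);
```

Under this assumption, raising `uncertaintyWeight` makes the score more forgiving of ambiguous statements, while `scale` only stretches the final range.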
Example Usage
High Relevancy Example
Evaluate a highly relevant response:
src/index.ts
const query1 = 'What are the health benefits of regular exercise?';
const response1 =
  'Regular exercise improves cardiovascular health, strengthens muscles, boosts metabolism, and enhances mental well-being through the release of endorphins.';
console.log('Example 1 - High Relevancy:');
console.log('Query:', query1);
console.log('Response:', response1);
const result1 = await metric.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The response is highly relevant to the query. It provides a comprehensive overview of the health benefits of regular exercise.' }
Partial Relevancy Example
Evaluate a partially relevant response:
src/index.ts
const query2 = 'What should a healthy breakfast include?';
const response2 =
  'A nutritious breakfast should include whole grains and protein. However, the timing of your breakfast is just as important - studies show eating within 2 hours of waking optimizes metabolism and energy levels throughout the day.';
console.log('Example 2 - Partial Relevancy:');
console.log('Query:', query2);
console.log('Response:', response2);
const result2 = await metric.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.7, reason: 'The response is partially relevant to the query. It names some components of a healthy breakfast but then shifts to breakfast timing, which the query did not ask about.' }
Low Relevancy Example
Evaluate an irrelevant response:
src/index.ts
const query3 = 'What are the benefits of meditation?';
const response3 =
  'The Great Wall of China is over 13,000 miles long and was built during the Ming Dynasty to protect against invasions.';
console.log('Example 3 - Low Relevancy:');
console.log('Query:', query3);
console.log('Response:', response3);
const result3 = await metric.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0.1, reason: 'The response is not relevant to the query. It provides information about the Great Wall of China but does not mention meditation.' }
Understanding the Results
The metric provides:
- A relevancy score between 0 and 1:
  - 1.0: Perfect relevancy - response directly addresses the query
  - 0.7-0.9: High relevancy - response mostly addresses the query
  - 0.4-0.6: Moderate relevancy - response partially addresses the query
  - 0.1-0.3: Low relevancy - response barely addresses the query
  - 0.0: No relevancy - response does not address the query at all
- A detailed reason for the score, including analysis of:
  - Query-response alignment
  - Topic focus
  - Information relevance
  - Improvement suggestions
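The score bands can be turned into a small lookup when scores need to be bucketed in code. The sketch below is illustrative: `relevancyBand` and the `RelevancyBand` type are hypothetical names, it assumes the default `scale` of 1, and because the listed bands leave gaps (for example between 0.6 and 0.7), it assumes each band extends down to the lower bound of its range.

```typescript
type RelevancyBand = 'perfect' | 'high' | 'moderate' | 'low' | 'none';

// Hypothetical helper: map a 0..1 relevancy score to one of the bands above.
function relevancyBand(score: number): RelevancyBand {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`Score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return 'perfect';
  if (score >= 0.7) return 'high';
  if (score >= 0.4) return 'moderate'; // assumption: 0.6-0.7 counts as moderate
  if (score > 0) return 'low'; // assumption: anything above 0 is at least low
  return 'none';
}
```

Applied to the example outputs shown earlier, scores of 1, 0.7, and 0.1 would fall into the `perfect`, `high`, and `low` bands respectively.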