ExamplesEvalsAnswer Relevancy

Answer Relevancy Evaluation

This example demonstrates how to use Mastra’s Answer Relevancy metric to evaluate how well responses address their input queries.

Overview

The example shows how to:

  1. Configure the Answer Relevancy metric
  2. Evaluate response relevancy to queries
  3. Analyze relevancy scores
  4. Handle different relevancy scenarios

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { AnswerRelevancyMetric } from '@mastra/evals/llm';

Metric Configuration

Set up the Answer Relevancy metric with custom parameters:

src/index.ts
const metric = new AnswerRelevancyMetric(openai('gpt-4o-mini'), {
  uncertaintyWeight: 0.3, // Weight for 'unsure' verdicts
  scale: 1, // Scale for the final score
});

Example Usage

High Relevancy Example

Evaluate a highly relevant response:

src/index.ts
const query1 = 'What are the health benefits of regular exercise?';
const response1 =
  'Regular exercise improves cardiovascular health, strengthens muscles, boosts metabolism, and enhances mental well-being through the release of endorphins.';
 
console.log('Example 1 - High Relevancy:');
console.log('Query:', query1);
console.log('Response:', response1);
 
const result1 = await metric.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The response is highly relevant to the query. It provides a comprehensive overview of the health benefits of regular exercise.' }

Partial Relevancy Example

Evaluate a partially relevant response:

src/index.ts
const query2 = 'What should a healthy breakfast include?';
const response2 =
  'A nutritious breakfast should include whole grains and protein. However, the timing of your breakfast is just as important - studies show eating within 2 hours of waking optimizes metabolism and energy levels throughout the day.';
 
console.log('Example 2 - Partial Relevancy:');
console.log('Query:', query2);
console.log('Response:', response2);
 
const result2 = await metric.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.7, reason: 'The response is partially relevant to the query. It provides some information about healthy breakfast choices but misses the timing aspect.' }

Low Relevancy Example

Evaluate an irrelevant response:

src/index.ts
const query3 = 'What are the benefits of meditation?';
const response3 =
  'The Great Wall of China is over 13,000 miles long and was built during the Ming Dynasty to protect against invasions.';
 
console.log('Example 3 - Low Relevancy:');
console.log('Query:', query3);
console.log('Response:', response3);
 
const result3 = await metric.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0.1, reason: 'The response is not relevant to the query. It provides information about the Great Wall of China but does not mention meditation.' }

Understanding the Results

The metric provides:

  1. A relevancy score between 0 and 1:

    • 1.0: Perfect relevancy - response directly addresses the query
    • 0.7-0.9: High relevancy - response mostly addresses the query
    • 0.4-0.6: Moderate relevancy - response partially addresses the query
    • 0.1-0.3: Low relevancy - response barely addresses the query
    • 0.0: No relevancy - response does not address the query at all
  2. Detailed reason for the score, including analysis of:

    • Query-response alignment
    • Topic focus
    • Information relevance
    • Improvement suggestions





View Example on GitHub