
Context Position

This example demonstrates how to use Mastra’s Context Position metric to evaluate how well a response draws on information that appears early in the provided context, rewarding answers whose supporting context comes first.

Overview

The example shows how to:

  1. Configure the Context Position metric
  2. Evaluate position adherence
  3. Analyze sequential ordering
  4. Handle different sequence types

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
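
How the file is loaded depends on your runtime. A minimal sketch, assuming the dotenv package is installed, is to import it once at the top of your entry file; the @ai-sdk/openai provider then reads OPENAI_API_KEY from process.env:

import 'dotenv/config'; // assumption: dotenv is installed; this loads .env into process.env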

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { ContextPositionMetric } from '@mastra/evals/llm';

Example Usage

High Position Adherence Example

Evaluate a response whose supporting information appears first in the context:

src/index.ts
const context1 = [
  'The capital of France is Paris.',
  'Paris has been the capital since 508 CE.',
  'Paris serves as France\'s political center.',
  'The capital city hosts the French government.',
];
 
const metric1 = new ContextPositionMetric(openai('gpt-4o-mini'), {
  context: context1,
});
 
const query1 = 'What is the capital of France?';
const response1 = 'The capital of France is Paris.';
 
console.log('Example 1 - High Position Adherence:');
console.log('Context:', context1);
console.log('Query:', query1);
console.log('Response:', response1);
 
const result1 = await metric1.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The context is in the correct sequential order.' }

Mixed Position Adherence Example

Evaluate a response where relevant information is scattered:

src/index.ts
const context2 = [
  'Elephants are herbivores.',
  'Adult elephants can weigh up to 13,000 pounds.',
  'Elephants are the largest land animals.',
  'Elephants eat plants and grass.',
];
 
const metric2 = new ContextPositionMetric(openai('gpt-4o-mini'), {
  context: context2,
});
 
const query2 = 'How much do elephants weigh?';
const response2 = 'Adult elephants can weigh up to 13,000 pounds, making them the largest land animals.';
 
console.log('Example 2 - Mixed Position Adherence:');
console.log('Context:', context2);
console.log('Query:', query2);
console.log('Response:', response2);
 
const result2 = await metric2.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.4, reason: 'The context includes relevant information and irrelevant information and is not in the correct sequential order.' }

Low Position Adherence Example

Evaluate a response where relevant information appears last:

src/index.ts
const context3 = [
  'Rainbows appear in the sky.',
  'Rainbows have different colors.',
  'Rainbows are curved in shape.',
  'Rainbows form when sunlight hits water droplets.',
];
 
const metric3 = new ContextPositionMetric(openai('gpt-4o-mini'), {
  context: context3,
});
 
const query3 = 'How do rainbows form?';
const response3 = 'Rainbows are created when sunlight interacts with water droplets in the air.';
 
console.log('Example 3 - Low Position Adherence:');
console.log('Context:', context3);
console.log('Query:', query3);
console.log('Response:', response3);
 
const result3 = await metric3.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0.12, reason: 'The context includes some relevant information, but most of the relevant information is at the end.' }

Understanding the Results

The metric provides:

  1. A position score between 0 and 1 (a sketch mapping scores to these bands follows this list):

    • 1.0: Perfect position adherence - most relevant information appears first
    • 0.7-0.9: Strong position adherence - relevant information mostly at the beginning
    • 0.4-0.6: Mixed position adherence - relevant information scattered throughout
    • 0.1-0.3: Weak position adherence - relevant information mostly at the end
    • 0.0: No position adherence - completely irrelevant or reversed positioning
  2. Detailed reason for the score, including analysis of:

    • Information relevance to query and response
    • Position of relevant information in context
    • Importance of early vs. late context
    • Overall context organization
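
These bands are descriptive guidance rather than values returned by the metric. A minimal sketch of mapping a raw score onto them, using a hypothetical labelPositionAdherence helper whose cutoffs are taken from the list above:

// Hypothetical helper: the band cutoffs mirror the documentation above;
// they are not part of the ContextPositionMetric API.
function labelPositionAdherence(score: number): string {
  if (score >= 1.0) return 'perfect';
  if (score >= 0.7) return 'strong';
  if (score >= 0.4) return 'mixed';
  if (score > 0.0) return 'weak';
  return 'none';
}
 
// Reusing result1 from the first example above.
console.log(labelPositionAdherence(result1.score)); // e.g. 'perfect' for Example 1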





View Example on GitHub