PromptAlignmentMetric

The PromptAlignmentMetric class evaluates how strictly an LLM’s output follows a given set of prompt instructions. It uses a judge-based system to verify that each instruction is followed exactly and provides detailed reasoning for any deviations.

Basic Usage

import { PromptAlignmentMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};
 
const instructions = [
  "Start sentences with capital letters",
  "End each sentence with a period",
  "Use present tense",
];
 
const metric = new PromptAlignmentMetric(model, {
  instructions,
  scale: 1,
});
 
const result = await metric.measure(
  "describe the weather",
  "The sun is shining. Clouds float in the sky. A gentle breeze blows.",
);
 
console.log(result.score); // Alignment score from 0-1
console.log(result.info.reason); // Explanation of the score

Constructor Parameters

  • model (ModelConfig): Configuration for the model used to evaluate instruction alignment.
  • options (PromptAlignmentOptions): Configuration options for the metric.

PromptAlignmentOptions

  • instructions (string[]): Array of instructions that the output should follow.
  • scale (number, optional, default 1): Maximum score value.
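
For reference, these options can be sketched as a TypeScript interface (illustrative; the exact exported type in @mastra/evals may differ):

interface PromptAlignmentOptions {
  instructions: string[]; // instructions the output should follow
  scale?: number;         // maximum score value, defaults to 1
}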

measure() Parameters

  • input (string): The original prompt or query.
  • output (string): The LLM's response to evaluate.

Returns

  • score (number): Alignment score (0 to scale, default 0-1).
  • info (object): Detailed metrics about instruction compliance.
      • reason (string): Detailed explanation of the score and instruction compliance.
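
The result object can be sketched with the following TypeScript shape (the interface name here is hypothetical; the fields match the list above, but the exact exported type in @mastra/evals may differ):

// Hypothetical type name; fields match the documented result.
interface PromptAlignmentResult {
  score: number; // 0 to `scale` (default 0-1)
  info: {
    reason: string; // explanation of the score
  };
}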

Scoring Details

The metric evaluates instruction alignment through:

  • Individual assessment of each instruction’s compliance
  • Strict binary verdicts (yes/no) for each instruction
  • Detailed reasoning for any non-compliance
  • Equal weighting of all instructions

The scoring process:

  1. Evaluates each instruction independently
  2. Assigns binary scores (1 for compliant, 0 for non-compliant)
  3. Calculates percentage of followed instructions
  4. Scales the result to the configured range (default 0-1), as sketched below
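
In code, steps 2-4 amount to the following arithmetic (a minimal sketch of the calculation, not the library's internals):

// Minimal sketch of the score arithmetic; not the library's internals.
const verdicts = [1, 1, 0]; // e.g. two of three instructions followed
const scale = 1;
 
const followed = verdicts.reduce((sum, v) => sum + v, 0);
const score = (followed / verdicts.length) * scale;
 
console.log(score); // 0.666..., i.e. two of three instructions followed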

Score interpretation (a usage example follows the list):

  • 1.0: All instructions followed perfectly
  • 0.7-0.9: Most instructions followed with minor deviations
  • 0.4-0.6: Mixed compliance with instructions
  • 0.1-0.3: Limited compliance with instructions
  • 0: No instructions followed correctly
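
For example, reusing the metric from Basic Usage, the score can gate a simple quality check (the 0.7 threshold below is an arbitrary example value, not a library default):

const checkResult = await metric.measure(
  "Describe the weather",
  "the sun is shining and clouds float in the sky",
);
 
// 0.7 is an arbitrary example threshold, not a library default.
if (checkResult.score < 0.7) {
  console.warn(`Low alignment (${checkResult.score}): ${checkResult.info.reason}`);
}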

Example with Analysis

const metric = new PromptAlignmentMetric(model, {
  instructions: [
    "Use bullet points for each item",
    "Include exactly three examples",
    "End each point with a semicolon"
  ],
  scale: 1
});
 
const result = await metric.measure(
  "List three fruits",
  "• Apple is red and sweet;
• Banana is yellow and curved;
• Orange is citrus and round."
);
 
// Example output:
// {
//   score: 1.0,
//   info: {
//     reason: "The score is 1.0 because all instructions were followed exactly:
//           bullet points were used, exactly three examples were provided, and
//           each point ends with a semicolon."
//   }
// }
 
const result2 = await metric.measure(
  "List three fruits",
  "1. Apple
2. Banana
3. Orange and Grape"
);
 
// Example output:
// {
//   score: 0,
//   info: {
//     reason: "The score is 0 because none of the instructions were followed: a numbered
//           list was used instead of bullet points, no point ends with a semicolon, and
//           four fruits were listed instead of exactly three."
//   }
// }