PromptAlignmentMetric
We recently released Scorers, a new evals API with a more ergonomic interface, richer metadata for error analysis, and more flexibility in the data structures it can evaluate. Migrating is straightforward, and we will continue to support the existing Evals API.
The PromptAlignmentMetric class evaluates how strictly an LLM’s output follows a given set of prompt instructions. It uses a judge-based system to verify that each instruction is followed exactly and provides detailed reasoning for any deviations.
Basic Usage
import { openai } from "@ai-sdk/openai";
import { PromptAlignmentMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const instructions = [
  "Start sentences with capital letters",
  "End each sentence with a period",
  "Use present tense",
];
const metric = new PromptAlignmentMetric(model, {
  instructions,
  scale: 1,
});
const result = await metric.measure(
  "describe the weather",
  "The sun is shining. Clouds float in the sky. A gentle breeze blows.",
);
console.log(result.score); // Alignment score from 0-1
console.log(result.info.reason); // Explanation of the score
Constructor Parameters
- model: The language model used to judge instruction alignment (for example, openai("gpt-4o-mini")).
- options: PromptAlignmentOptions
  - instructions: Array of instruction strings the output is evaluated against.
  - scale?: Maximum score value. Optional, defaults to 1.
measure() Parameters
- input: The original prompt or query sent to the LLM.
- output: The LLM response to evaluate.
Returns
- score: Alignment score from 0 to scale (default 0-1).
- info:
  - reason: A detailed explanation of the score.
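The shapes used on this page can be summarized in TypeScript as follows. This is a sketch inferred from the examples here, not the package’s authoritative type definitions; the PromptAlignmentResult name in particular is hypothetical.
// Sketch of the shapes as used on this page (inferred, not authoritative).
interface PromptAlignmentOptions {
  instructions: string[]; // instructions the output is checked against
  scale?: number; // maximum score, defaults to 1
}
// Hypothetical name for the shape returned by measure().
interface PromptAlignmentResult {
  score: number; // 0 to scale
  info: {
    reason: string; // explanation of the score
  };
}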
Scoring Details
The metric evaluates instruction alignment through:
- Applicability assessment for each instruction
- Strict compliance evaluation for applicable instructions
- Detailed reasoning for all verdicts
- Proportional scoring based on applicable instructions
Instruction Verdicts
Each instruction receives one of three verdicts:
- “yes”: Instruction is applicable and completely followed
- “no”: Instruction is applicable but not followed or only partially followed
- “n/a”: Instruction is not applicable to the given context
Scoring Process
1. Evaluates instruction applicability:
   - Determines whether each instruction applies to the context
   - Marks irrelevant instructions as “n/a”
   - Considers domain-specific requirements
2. Assesses compliance for applicable instructions:
   - Evaluates each applicable instruction independently
   - Requires complete compliance for a “yes” verdict
   - Documents specific reasons for all verdicts
3. Calculates the alignment score:
   - Counts followed instructions (“yes” verdicts)
   - Divides by the total number of applicable instructions (excluding “n/a”)
   - Scales the result to the configured range
Final score: (followed_instructions / applicable_instructions) * scale
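The calculation can be sketched in a few lines. The Verdict type and helper below are illustrative only, not Mastra’s internal implementation; the handling of zero applicable instructions is an assumption.
// Illustrative sketch of the formula above; not Mastra's internal code.
type Verdict = "yes" | "no" | "n/a";

function alignmentScore(verdicts: Verdict[], scale = 1): number {
  // "n/a" verdicts are excluded from the denominator.
  const applicable = verdicts.filter((v) => v !== "n/a");
  if (applicable.length === 0) return 0; // assumption: no applicable instructions yields 0
  const followed = applicable.filter((v) => v === "yes").length;
  return (followed / applicable.length) * scale;
}

alignmentScore(["yes", "no", "n/a"]); // 0.5 (one of two applicable instructions followed)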
Important Considerations
- Empty outputs:
  - All formatting instructions are considered applicable
  - Each is marked “no”, since an empty output cannot satisfy them (see the example after this list)
- Domain-specific instructions:
  - Always applicable when they concern the queried domain
  - Marked “no” if not followed, not “n/a”
- “n/a” verdicts:
  - Reserved for instructions about a completely different domain
  - Do not affect the final score calculation
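As a concrete illustration of the empty-output rule, reusing the metric configured in Basic Usage above (the expected score is inferred from the rules listed here, not captured output):
const emptyResult = await metric.measure("describe the weather", "");
// All three formatting instructions are applicable but cannot be satisfied
// by an empty output, so the expected score is 0.
console.log(emptyResult.score);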
Score interpretation (0 to scale, default 0-1)
- 1.0: All applicable instructions followed perfectly
- 0.7-0.9: Most applicable instructions followed
- 0.4-0.6: Mixed compliance with applicable instructions
- 0.1-0.3: Limited compliance with applicable instructions
- 0.0: No applicable instructions followed
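If you need to bucket scores programmatically, a small helper along these lines (hypothetical, assuming scale: 1) mirrors the bands above:
// Hypothetical helper mapping a score to the bands above (assumes scale: 1).
function interpretAlignment(score: number): string {
  if (score >= 1.0) return "All applicable instructions followed perfectly";
  if (score >= 0.7) return "Most applicable instructions followed";
  if (score >= 0.4) return "Mixed compliance with applicable instructions";
  if (score >= 0.1) return "Limited compliance with applicable instructions";
  return "No applicable instructions followed";
}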
Example with Analysis
import { openai } from "@ai-sdk/openai";
import { PromptAlignmentMetric } from "@mastra/evals/llm";
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
const metric = new PromptAlignmentMetric(model, {
  instructions: [
    "Use bullet points for each item",
    "Include exactly three examples",
    "End each point with a semicolon",
  ],
  scale: 1,
});
const result = await metric.measure(
  "List three fruits",
  `• Apple is red and sweet;
• Banana is yellow and curved;
• Orange is citrus and round.`,
);
// Example output:
// {
// score: 1.0,
// info: {
// reason: "The score is 1.0 because all instructions were followed exactly:
// bullet points were used, exactly three examples were provided, and
// each point ends with a semicolon."
// }
// }
const result2 = await metric.measure(
  "List three fruits",
  `1. Apple
2. Banana
3. Orange and Grape`,
);
// Example output:
// {
// score: 0.33,
// info: {
// reason: "The score is 0.33 because: numbered lists were used instead of bullet points,
// no semicolons were used, and four fruits were listed instead of exactly three."
// }
// }