PromptAlignmentMetric
The PromptAlignmentMetric class evaluates how strictly an LLM's output follows a set of given prompt instructions. It uses a judge-based system to verify each instruction is followed exactly and provides detailed reasoning for any deviations.
Basic Usage
import { PromptAlignmentMetric } from "@mastra/evals/llm";

// Configure the model for evaluation
const model = {
  provider: "OPEN_AI",
  name: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
};

const instructions = [
  "Start sentences with capital letters",
  "End each sentence with a period",
  "Use present tense",
];

const metric = new PromptAlignmentMetric(model, {
  instructions,
  scale: 1,
});

const result = await metric.measure(
  "describe the weather",
  "The sun is shining. Clouds float in the sky. A gentle breeze blows.",
);

console.log(result.score); // Alignment score from 0-1
console.log(result.info.reason); // Explanation of the score
Constructor Parameters
- model: ModelConfig
  Configuration for the model used to evaluate instruction alignment
- options: PromptAlignmentOptions
  Configuration options for the metric

PromptAlignmentOptions
- instructions: string[]
  Array of instructions that the output should follow
- scale?: number (default: 1)
  Maximum score value
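The scale option changes the range of the returned score. For example, a scale of 100 yields percentage-style scores; a minimal sketch, reusing the model and instructions from Basic Usage (the value 100 is a hypothetical choice, not a library default):

const percentMetric = new PromptAlignmentMetric(model, {
  instructions,
  scale: 100, // two of three instructions followed would score ~67 instead of ~0.67
});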
measure() Parameters
- input: string
  The original prompt or query
- output: string
  The LLM's response to evaluate
Returns
- score: number
  Alignment score (0 to scale, default 0-1)
- info: object
  Object containing detailed metrics about instruction compliance
  - reason: string
    Detailed explanation of the score and instruction compliance
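A minimal sketch of consuming these fields, assuming the shape above (the 0.7 threshold is an arbitrary cutoff for illustration, not part of the API):

const result = await metric.measure(input, output);
if (result.score < 0.7) {
  // Surface the judge's reasoning when alignment falls below the cutoff
  console.warn(`Low alignment (${result.score}): ${result.info.reason}`);
}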
Scoring Details
The metric evaluates instruction alignment through:
- Individual assessment of each instruction’s compliance
- Strict binary verdicts (yes/no) for each instruction
- Detailed reasoning for any non-compliance
- Equal weighting of all instructions
The scoring process (sketched in code after this list):
- Evaluates each instruction independently
- Assigns binary scores (1 for compliant, 0 for non-compliant)
- Calculates percentage of followed instructions
- Scales to configured range (default 0-1)
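A minimal sketch of that arithmetic, assuming the judge's binary verdicts have already been collected (alignmentScore is an illustrative helper, not part of the library):

function alignmentScore(verdicts: boolean[], scale = 1): number {
  // Count compliant instructions and take the percentage, scaled to the configured range
  const followed = verdicts.filter(Boolean).length;
  return (followed / verdicts.length) * scale;
}

console.log(alignmentScore([true, true, false])); // 2 of 3 followed => ~0.67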
Score interpretation:
- 1.0: All instructions followed perfectly
- 0.7-0.9: Most instructions followed with minor deviations
- 0.4-0.6: Mixed compliance with instructions
- 0.1-0.3: Limited compliance with instructions
- 0: No instructions followed correctly
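Application code could map these bands to labels; a hypothetical helper, not part of @mastra/evals:

function describeAlignment(score: number): string {
  // Assumes the default scale of 1; divide by your scale first for other ranges
  if (score >= 1) return "All instructions followed perfectly";
  if (score >= 0.7) return "Most instructions followed with minor deviations";
  if (score >= 0.4) return "Mixed compliance with instructions";
  if (score > 0) return "Limited compliance with instructions";
  return "No instructions followed correctly";
}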
Example with Analysis
const metric = new PromptAlignmentMetric(model, {
  instructions: [
    "Use bullet points for each item",
    "Include exactly three examples",
    "End each point with a semicolon",
  ],
  scale: 1,
});

const result = await metric.measure(
  "List three fruits",
  `• Apple is red and sweet;
• Banana is yellow and curved;
• Orange is citrus and round.`,
);
// Example output:
// {
//   score: 1.0,
//   info: {
//     reason: "The score is 1.0 because all instructions were followed exactly:
//       bullet points were used, exactly three examples were provided, and
//       each point ends with a semicolon."
//   }
// }
const result2 = await metric.measure(
  "List three fruits",
  `1. Apple
2. Banana
3. Orange and Grape`,
);
// Example output:
// {
//   score: 0,
//   info: {
//     reason: "The score is 0 because no instructions were followed: numbered lists
//       were used instead of bullet points, no semicolons were used, and four
//       fruits were listed instead of exactly three."
//   }
// }