# Scorer Utils

Mastra provides utility functions to help extract and process data from scorer run inputs and outputs. These utilities are particularly useful in the `preprocess` step of custom scorers.

## Import

```typescript
import {
  getAssistantMessageFromRunOutput,
  getReasoningFromRunOutput,
  getUserMessageFromRunInput,
  getSystemMessagesFromRunInput,
  getCombinedSystemPrompt,
  extractToolCalls,
  extractInputMessages,
  extractAgentResponseMessages,
} from "@mastra/evals/scorers/utils";
```
## Message Extraction

### getAssistantMessageFromRunOutput

Extracts the text content from the first assistant message in the run output.

```typescript
import { createScorer } from "@mastra/core/evals";
import { getAssistantMessageFromRunOutput } from "@mastra/evals/scorers/utils";

const scorer = createScorer({
  id: "my-scorer",
  description: "My scorer",
  type: "agent",
})
  .preprocess(({ run }) => {
    const response = getAssistantMessageFromRunOutput(run.output);
    return { response };
  })
  .generateScore(({ results }) => {
    return results.preprocessStepResult?.response ? 1 : 0;
  });
```

**Parameters:** `output?` – the run output messages.

**Returns:** `string | undefined` – the assistant message text, or `undefined` if no assistant message is found.
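The "first assistant message" semantics can be sketched against a simplified message shape (an illustrative assumption — the real `MastraDBMessage` type carries more fields):

```typescript
// Simplified stand-in for MastraDBMessage (assumed shape, for illustration only).
type SimpleMessage = { role: "user" | "assistant" | "system"; content: string };

// Sketch of the "first assistant message" behavior described above.
function firstAssistantText(output: SimpleMessage[]): string | undefined {
  return output.find((m) => m.role === "assistant")?.content;
}

const text = firstAssistantText([
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello!" },
  { role: "assistant", content: "Anything else?" },
]);
// text === "Hello!" — later assistant messages are ignored; an empty array yields undefined
```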
### getUserMessageFromRunInput

Extracts the text content from the first user message in the run input.

```typescript
.preprocess(({ run }) => {
  const userMessage = getUserMessageFromRunInput(run.input);
  return { userMessage };
})
```

**Parameters:** `input?` – the run input messages.

**Returns:** `string | undefined` – the user message text, or `undefined` if no user message is found.
### extractInputMessages

Extracts the text content from all input messages as an array.

```typescript
.preprocess(({ run }) => {
  const allUserMessages = extractInputMessages(run.input);
  return { conversationHistory: allUserMessages.join("\n") };
})
```

**Returns:** `string[]` – an array of text strings, one per input message.
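Under the same simplified message shape assumed earlier, the extraction and the `join("\n")` pattern from the snippet above might look like this (a sketch, not the library implementation):

```typescript
// Assumed simplified message shape; real MastraDBMessage objects carry more fields.
type SimpleMessage = { role: string; content: string };

// Sketch: collect the text of every input message, preserving order.
function inputTexts(input: SimpleMessage[]): string[] {
  return input.map((m) => m.content);
}

const conversationHistory = inputTexts([
  { role: "user", content: "What's the capital of France?" },
  { role: "user", content: "And its population?" },
]).join("\n");
// "What's the capital of France?\nAnd its population?"
```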
### extractAgentResponseMessages

Extracts the text content from all assistant response messages as an array.

```typescript
.preprocess(({ run }) => {
  const allResponses = extractAgentResponseMessages(run.output);
  return { allResponses };
})
```

**Returns:** `string[]` – an array of text strings, one per assistant message.
## Reasoning Extraction

### getReasoningFromRunOutput

Extracts reasoning text from the run output. This is particularly useful when evaluating responses from reasoning models, such as deepseek-reasoner, that produce chain-of-thought reasoning.

Reasoning can be stored in two places:

- `content.reasoning` – a string field on the message content
- `content.parts` – parts with `type: 'reasoning'` containing `details`
```typescript
import { createScorer } from "@mastra/core/evals";
import {
  getReasoningFromRunOutput,
  getAssistantMessageFromRunOutput,
} from "@mastra/evals/scorers/utils";

const reasoningQualityScorer = createScorer({
  id: "reasoning-quality",
  name: "Reasoning Quality",
  description: "Evaluates the quality of model reasoning",
  type: "agent",
})
  .preprocess(({ run }) => {
    const reasoning = getReasoningFromRunOutput(run.output);
    const response = getAssistantMessageFromRunOutput(run.output);
    return { reasoning, response };
  })
  .analyze(({ results }) => {
    const { reasoning } = results.preprocessStepResult || {};
    return {
      hasReasoning: !!reasoning,
      reasoningLength: reasoning?.length || 0,
      hasStepByStep: reasoning?.includes("step") || false,
    };
  })
  .generateScore(({ results }) => {
    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {};
    if (!hasReasoning) return 0;
    // Score based on reasoning length (normalized to 0-1)
    return Math.min(reasoningLength / 500, 1);
  })
  .generateReason(({ results, score }) => {
    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {};
    if (!hasReasoning) {
      return "No reasoning was provided by the model.";
    }
    return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`;
  });
```

**Parameters:** `output?` – the run output messages.

**Returns:** `string | undefined` – the reasoning text, or `undefined` if no reasoning is present.
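The two storage locations can be illustrated with a hedged sketch. The `ReasoningDetail` and `SimpleContent` shapes below are assumptions for illustration — the real content types are richer — but the fallback order (string field first, then reasoning parts) matches the description above:

```typescript
// Assumed simplified shapes for the two storage locations described above.
type ReasoningDetail = { type: "text"; text: string };
type ReasoningPart = { type: "reasoning"; details: ReasoningDetail[] };
type SimpleContent = { reasoning?: string; parts?: ReasoningPart[] };

// Sketch: prefer the string field, fall back to reasoning parts.
function reasoningText(content: SimpleContent): string | undefined {
  if (content.reasoning) return content.reasoning;
  const texts = (content.parts ?? [])
    .filter((p) => p.type === "reasoning")
    .flatMap((p) => p.details.map((d) => d.text));
  return texts.length > 0 ? texts.join("\n") : undefined;
}

// Shape 1: reasoning as a plain string field on the content
const fromField = reasoningText({ reasoning: "Step 1: check the premise." });

// Shape 2: reasoning as typed parts containing details
const fromParts = reasoningText({
  parts: [
    { type: "reasoning", details: [{ type: "text", text: "Step 1: check the premise." }] },
  ],
});
// Both yield "Step 1: check the premise."; content with neither yields undefined
```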
## System Message Extraction

### getSystemMessagesFromRunInput

Extracts all system messages from the run input, including both standard system messages and tagged system messages (specialized prompts such as memory instructions).

```typescript
.preprocess(({ run }) => {
  const systemMessages = getSystemMessagesFromRunInput(run.input);
  return {
    systemPromptCount: systemMessages.length,
    systemPrompts: systemMessages,
  };
})
```

**Returns:** `string[]` – an array of system message strings.
### getCombinedSystemPrompt

Combines all system messages into a single prompt string, joined with double newlines.

```typescript
.preprocess(({ run }) => {
  const fullSystemPrompt = getCombinedSystemPrompt(run.input);
  return { fullSystemPrompt };
})
```

**Returns:** `string` – the combined system prompt.
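The "joined with double newlines" behavior amounts to the following (the message strings are made up for illustration):

```typescript
// Two system messages, e.g. a base instruction plus a tagged memory instruction.
const systemMessages = [
  "You are a helpful assistant.",
  "Remember facts the user shares across the conversation.",
];

// Combined with a blank line between each message.
const combinedPrompt = systemMessages.join("\n\n");
// "You are a helpful assistant.\n\nRemember facts the user shares across the conversation."
```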
## Tool Call Extraction

### extractToolCalls

Extracts information about all tool calls from the run output, including tool names, call IDs, and their positions in the message array.

```typescript
import { createScorer } from "@mastra/core/evals";
import { extractToolCalls } from "@mastra/evals/scorers/utils";

const toolUsageScorer = createScorer({
  id: "tool-usage",
  description: "Evaluates tool usage patterns",
  type: "agent",
})
  .preprocess(({ run }) => {
    const { tools, toolCallInfos } = extractToolCalls(run.output);
    return {
      toolsUsed: tools,
      toolCount: tools.length,
      toolDetails: toolCallInfos,
    };
  })
  .generateScore(({ results }) => {
    const { toolCount = 0 } = results.preprocessStepResult || {};
    // Score based on whether any tools were used
    return toolCount > 0 ? 1 : 0;
  });
```
**Returns:**

```typescript
{
  tools: string[];               // Array of tool names
  toolCallInfos: ToolCallInfo[]; // Detailed tool call information
}
```

Where `ToolCallInfo` is:

```typescript
type ToolCallInfo = {
  toolName: string;        // Name of the tool
  toolCallId: string;      // Unique call identifier
  messageIndex: number;    // Index in the output array
  invocationIndex: number; // Index within the message's tool invocations
};
```
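The two index fields imply a nested traversal over messages and their tool invocations. The sketch below assumes a simplified message shape with a `toolInvocations` array (as used by `createTestMessage` later in this page); it is illustrative, not the library implementation:

```typescript
// ToolCallInfo as documented above.
type ToolCallInfo = {
  toolName: string;
  toolCallId: string;
  messageIndex: number;
  invocationIndex: number;
};

// Assumed simplified message shape carrying tool invocations.
type Invocation = { toolCallId: string; toolName: string };
type SimpleMessage = { role: string; toolInvocations?: Invocation[] };

// Sketch of the traversal implied by messageIndex and invocationIndex.
function collectToolCalls(output: SimpleMessage[]) {
  const toolCallInfos: ToolCallInfo[] = [];
  output.forEach((message, messageIndex) => {
    (message.toolInvocations ?? []).forEach((inv, invocationIndex) => {
      toolCallInfos.push({
        toolName: inv.toolName,
        toolCallId: inv.toolCallId,
        messageIndex,
        invocationIndex,
      });
    });
  });
  return { tools: toolCallInfos.map((i) => i.toolName), toolCallInfos };
}

const { tools, toolCallInfos } = collectToolCalls([
  { role: "assistant", toolInvocations: [{ toolCallId: "call-1", toolName: "weatherTool" }] },
  { role: "assistant" }, // a message with no tool calls contributes nothing
]);
// tools === ["weatherTool"]; the call sits at messageIndex 0, invocationIndex 0
```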
## Test Utilities

These utilities help you create test data during scorer development.

### createTestMessage

Creates a `MastraDBMessage` object for testing purposes.

```typescript
import { createTestMessage } from "@mastra/evals/scorers/utils";

const userMessage = createTestMessage({
  content: "What is the weather?",
  role: "user",
});

const assistantMessage = createTestMessage({
  content: "The weather is sunny.",
  role: "assistant",
  toolInvocations: [
    {
      toolCallId: "call-1",
      toolName: "weatherTool",
      args: { location: "London" },
      result: { temp: 20 },
      state: "result",
    },
  ],
});
```
### createAgentTestRun

Creates a complete test run object for testing scorers.

```typescript
import { createAgentTestRun, createTestMessage } from "@mastra/evals/scorers/utils";

const testRun = createAgentTestRun({
  inputMessages: [createTestMessage({ content: "Hello", role: "user" })],
  output: [createTestMessage({ content: "Hi there!", role: "assistant" })],
});

// Run your scorer with the test data
const result = await myScorer.run({
  input: testRun.input,
  output: testRun.output,
});
```
## Complete Example

Here's a complete example showing how to use multiple utilities together:

```typescript
import { createScorer } from "@mastra/core/evals";
import {
  getAssistantMessageFromRunOutput,
  getReasoningFromRunOutput,
  getUserMessageFromRunInput,
  getCombinedSystemPrompt,
  extractToolCalls,
} from "@mastra/evals/scorers/utils";

const comprehensiveScorer = createScorer({
  id: "comprehensive-analysis",
  name: "Comprehensive Analysis",
  description: "Analyzes all aspects of an agent response",
  type: "agent",
})
  .preprocess(({ run }) => {
    // Extract all relevant data
    const userMessage = getUserMessageFromRunInput(run.input);
    const response = getAssistantMessageFromRunOutput(run.output);
    const reasoning = getReasoningFromRunOutput(run.output);
    const systemPrompt = getCombinedSystemPrompt(run.input);
    const { tools, toolCallInfos } = extractToolCalls(run.output);
    return {
      userMessage,
      response,
      reasoning,
      systemPrompt,
      toolsUsed: tools,
      toolCount: tools.length,
    };
  })
  .generateScore(({ results }) => {
    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {};
    let score = 0;
    if (response && response.length > 0) score += 0.4; // responded at all
    if (reasoning) score += 0.3;                       // included reasoning
    if (toolCount > 0) score += 0.3;                   // used at least one tool
    return score;
  })
  .generateReason(({ results, score }) => {
    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {};
    const parts = [];
    if (response) parts.push("provided a response");
    if (reasoning) parts.push("included reasoning");
    if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`);
    return `Score: ${score}. The agent ${parts.join(", ")}.`;
  });
```