Scorer utils
Mastra provides utility functions to help extract and process data from scorer run inputs and outputs. These utilities are particularly useful in the preprocess step of custom scorers.
Import
import {
getAssistantMessageFromRunOutput,
getReasoningFromRunOutput,
getUserMessageFromRunInput,
getSystemMessagesFromRunInput,
getCombinedSystemPrompt,
extractToolCalls,
extractInputMessages,
extractAgentResponseMessages,
compareTrajectories,
createTrajectoryTestRun,
} from '@mastra/evals/scorers/utils'
Trajectory extraction functions are available from @mastra/core/evals:
import {
extractTrajectory,
extractWorkflowTrajectory,
extractTrajectoryFromTrace,
} from '@mastra/core/evals'
Message extraction
getAssistantMessageFromRunOutput
Extracts the text content from the first assistant message in the run output.
const scorer = createScorer({
id: 'my-scorer',
description: 'My scorer',
type: 'agent',
})
.preprocess(({ run }) => {
const response = getAssistantMessageFromRunOutput(run.output)
return { response }
})
.generateScore(({ results }) => {
return results.preprocessStepResult?.response ? 1 : 0
})
output? — The run output to extract the assistant message from.
Returns: string | undefined - The assistant message text, or undefined if no assistant message is found.
getUserMessageFromRunInput
Extracts the text content from the first user message in the run input.
.preprocess(({ run }) => {
const userMessage = getUserMessageFromRunInput(run.input);
return { userMessage };
})
input? — The run input to extract the user message from.
Returns: string | undefined - The user message text, or undefined if no user message is found.
extractInputMessages
Extracts text content from all input messages as an array.
.preprocess(({ run }) => {
const allUserMessages = extractInputMessages(run.input);
return { conversationHistory: allUserMessages.join("\n") };
})
Returns: string[] - Array of text strings from each input message.
extractAgentResponseMessages
Extracts text content from all assistant response messages as an array.
.preprocess(({ run }) => {
const allResponses = extractAgentResponseMessages(run.output);
return { allResponses };
})
Returns: string[] - Array of text strings from each assistant message.
Reasoning extraction
getReasoningFromRunOutput
Extracts reasoning text from the run output. This is particularly useful when evaluating responses from reasoning models like deepseek-reasoner that produce chain-of-thought reasoning.
Reasoning can be stored in two places:
content.reasoning — a string field on the message content
content.parts — parts with type: 'reasoning' containing details
import {
getReasoningFromRunOutput,
getAssistantMessageFromRunOutput,
} from '@mastra/evals/scorers/utils'
const reasoningQualityScorer = createScorer({
id: 'reasoning-quality',
name: 'Reasoning Quality',
description: 'Evaluates the quality of model reasoning',
type: 'agent',
})
.preprocess(({ run }) => {
const reasoning = getReasoningFromRunOutput(run.output)
const response = getAssistantMessageFromRunOutput(run.output)
return { reasoning, response }
})
.analyze(({ results }) => {
const { reasoning } = results.preprocessStepResult || {}
return {
hasReasoning: !!reasoning,
reasoningLength: reasoning?.length || 0,
hasStepByStep: reasoning?.includes('step') || false,
}
})
.generateScore(({ results }) => {
const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
if (!hasReasoning) return 0
// Score based on reasoning length (normalized to 0-1)
return Math.min(reasoningLength / 500, 1)
})
.generateReason(({ results, score }) => {
const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
if (!hasReasoning) {
return 'No reasoning was provided by the model.'
}
return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`
})
output? — The run output to extract reasoning from.
Returns: string | undefined - The reasoning text, or undefined if no reasoning is present.
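The two storage locations above can be illustrated with a small sketch. This is not the library implementation: the message content shape here is a simplified assumption, and the real helper operates on the full run output rather than a single content object.

```typescript
// Simplified, assumed shapes — not the full MastraDBMessage type.
type ReasoningPart = { type: 'reasoning'; details: { text: string }[] };
type Part = ReasoningPart | { type: 'text'; text: string };
type Content = { reasoning?: string; parts?: Part[] };

// Sketch of the documented lookup order: the dedicated string field first,
// then any parts tagged type: 'reasoning'.
function getReasoningSketch(content: Content): string | undefined {
  if (content.reasoning) return content.reasoning;
  const texts = (content.parts ?? [])
    .filter((p): p is ReasoningPart => p.type === 'reasoning')
    .flatMap((p) => p.details.map((d) => d.text));
  return texts.length > 0 ? texts.join('\n') : undefined;
}
```

Checking the string field before the parts array means a model that populates both only contributes its `content.reasoning` value.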
System message extraction
getSystemMessagesFromRunInput
Extracts all system messages from the run input, including both standard system messages and tagged system messages (specialized prompts like memory instructions).
.preprocess(({ run }) => {
const systemMessages = getSystemMessagesFromRunInput(run.input);
return {
systemPromptCount: systemMessages.length,
systemPrompts: systemMessages
};
})
Returns: string[] - Array of system message strings.
getCombinedSystemPrompt
Combines all system messages into a single prompt string, joined with double newlines.
.preprocess(({ run }) => {
const fullSystemPrompt = getCombinedSystemPrompt(run.input);
return { fullSystemPrompt };
})
Returns: string - Combined system prompt string.
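The joining behavior is simple enough to sketch. The function and message shape below are illustrative assumptions, not the library source; only the double-newline join mirrors the documented behavior.

```typescript
// Assumed minimal shape for an already-extracted system message.
type SystemMessage = { role: 'system'; content: string };

// Combine system messages into one prompt, joined with double newlines,
// mirroring the documented behavior of getCombinedSystemPrompt.
function combineSystemPrompts(messages: SystemMessage[]): string {
  return messages.map((m) => m.content).join('\n\n');
}
```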
Tool call extraction
extractToolCalls
Extracts information about all tool calls from the run output, including tool names, call IDs, and their positions in the message array.
const toolUsageScorer = createScorer({
id: 'tool-usage',
description: 'Evaluates tool usage patterns',
type: 'agent',
})
.preprocess(({ run }) => {
const { tools, toolCallInfos } = extractToolCalls(run.output)
return {
toolsUsed: tools,
toolCount: tools.length,
toolDetails: toolCallInfos,
}
})
.generateScore(({ results }) => {
const { toolCount } = results.preprocessStepResult || {}
// Score based on appropriate tool usage
return toolCount > 0 ? 1 : 0
})
Returns:
{
tools: string[]; // Array of tool names
toolCallInfos: ToolCallInfo[]; // Detailed tool call information
}
Where ToolCallInfo is:
type ToolCallInfo = {
toolName: string // Name of the tool
toolCallId: string // Unique call identifier
messageIndex: number // Index in the output array
invocationIndex: number // Index within message's tool invocations
}
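To make the two index fields concrete, here is a sketch of how an extraction producing `ToolCallInfo` records might walk the output array. The message shape is a simplified assumption; the real extractToolCalls lives in @mastra/evals/scorers/utils.

```typescript
// Assumed, simplified shapes for illustration only.
type ToolInvocation = { toolCallId: string; toolName: string };
type Message = { role: string; toolInvocations?: ToolInvocation[] };
type ToolCallInfo = {
  toolName: string;
  toolCallId: string;
  messageIndex: number;    // position of the message in the output array
  invocationIndex: number; // position within that message's tool invocations
};

function extractToolCallsSketch(output: Message[]) {
  const tools: string[] = [];
  const toolCallInfos: ToolCallInfo[] = [];
  output.forEach((message, messageIndex) => {
    (message.toolInvocations ?? []).forEach((inv, invocationIndex) => {
      tools.push(inv.toolName);
      toolCallInfos.push({
        toolName: inv.toolName,
        toolCallId: inv.toolCallId,
        messageIndex,
        invocationIndex,
      });
    });
  });
  return { tools, toolCallInfos };
}
```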
Test utilities
These utilities help create test data for scorer development.
createTestMessage
Creates a MastraDBMessage object for testing purposes.
import { createTestMessage } from '@mastra/evals/scorers/utils'
const userMessage = createTestMessage({
content: 'What is the weather?',
role: 'user',
})
const assistantMessage = createTestMessage({
content: 'The weather is sunny.',
role: 'assistant',
toolInvocations: [
{
toolCallId: 'call-1',
toolName: 'weatherTool',
args: { location: 'London' },
result: { temp: 20 },
state: 'result',
},
],
})
createAgentTestRun
Creates a complete test run object for testing scorers.
import { createAgentTestRun, createTestMessage } from '@mastra/evals/scorers/utils'
const testRun = createAgentTestRun({
inputMessages: [createTestMessage({ content: 'Hello', role: 'user' })],
output: [createTestMessage({ content: 'Hi there!', role: 'assistant' })],
})
// Run your scorer with the test data
const result = await myScorer.run({
input: testRun.input,
output: testRun.output,
})
Trajectory utilities
extractTrajectory
Extracts a Trajectory from agent output messages (MastraDBMessage[]). Converts tool invocations into ToolCallStep objects. The runEvals pipeline calls this automatically for trajectory scorers — you only need it for direct testing.
Available from @mastra/core/evals.
import { extractTrajectory } from '@mastra/core/evals'
const trajectory = extractTrajectory(agentOutputMessages)
// trajectory.steps — ToolCallStep[] extracted from toolInvocations
// trajectory.rawOutput — the original MastraDBMessage[] array
Returns: Trajectory — Contains steps: TrajectoryStep[], totalDurationMs, and rawOutput.
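The conversion from tool invocations to steps can be sketched as follows. This is a rough approximation under assumed shapes, not the library source: the real function also computes totalDurationMs and handles the full MastraDBMessage type.

```typescript
// Assumed, simplified shapes for illustration.
type Invocation = { toolName: string; args?: unknown; result?: unknown };
type Message = { role: string; toolInvocations?: Invocation[] };
type ToolCallStep = {
  stepType: 'tool_call';
  name: string;
  toolArgs?: unknown;
  toolResult?: unknown;
};

// Flatten every message's tool invocations into ToolCallStep objects,
// keeping the original messages as rawOutput.
function extractTrajectorySketch(messages: Message[]) {
  const steps: ToolCallStep[] = messages.flatMap((m) =>
    (m.toolInvocations ?? []).map((inv) => ({
      stepType: 'tool_call' as const,
      name: inv.toolName,
      toolArgs: inv.args,
      toolResult: inv.result,
    })),
  );
  return { steps, rawOutput: messages };
}
```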
extractWorkflowTrajectory
Extracts a Trajectory from workflow step results. Converts StepResult records into WorkflowStepStep objects, respecting the execution path ordering.
Available from @mastra/core/evals.
import { extractWorkflowTrajectory } from '@mastra/core/evals'
const trajectory = extractWorkflowTrajectory(
workflowResult.steps, // Record<string, StepResult>
workflowResult.stepExecutionPath, // string[] (optional)
)
// trajectory.steps — WorkflowStepStep[] in execution order
Returns: Trajectory — Contains steps: TrajectoryStep[], totalDurationMs, and rawWorkflowResult.
extractTrajectoryFromTrace
Builds a hierarchical Trajectory from observability trace spans (SpanRecord[]). Reconstructs the parent-child span tree and maps each span to the appropriate TrajectoryStep discriminated union type with nested children.
This is the preferred extraction method when storage is available. The runEvals pipeline calls this automatically when the target's Mastra instance has a configured storage backend. It produces richer trajectories than extractTrajectory or extractWorkflowTrajectory because it captures the full execution tree, including nested agent runs, tool calls, and model generations.
Available from @mastra/core/evals.
import { extractTrajectoryFromTrace } from '@mastra/core/evals'
// After fetching a trace from the observability store
const traceData = await observabilityStore.getTrace({ traceId })
const trajectory = extractTrajectoryFromTrace(traceData.spans, rootSpanId)
// trajectory.steps — hierarchical TrajectoryStep[] with children
Parameters:
spans (SpanRecord[]) — Array of span records from a trace query.
rootSpanId (string, optional) — Span ID to use as the starting point. When omitted, uses spans with no parent.
Returns: Trajectory — Contains steps: TrajectoryStep[] with recursive children and totalDurationMs.
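The parent-child reconstruction at the heart of this function can be sketched in a few lines. The SpanRecord fields below are minimal assumptions; the real function additionally maps each node to a TrajectoryStep per the table that follows.

```typescript
// Assumed minimal span shape for illustration.
type SpanRecord = { spanId: string; parentSpanId?: string; name: string };
type SpanNode = SpanRecord & { children: SpanNode[] };

// Rebuild the span tree from a flat list: attach each span to its parent,
// and treat parentless spans as roots unless a rootSpanId is given.
function buildSpanTree(spans: SpanRecord[], rootSpanId?: string): SpanNode[] {
  const nodes = new Map<string, SpanNode>(
    spans.map((s) => [s.spanId, { ...s, children: [] }]),
  );
  const roots: SpanNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    if (parent) parent.children.push(node);
    else if (!rootSpanId) roots.push(node);
  }
  if (rootSpanId) {
    const root = nodes.get(rootSpanId);
    return root ? [root] : [];
  }
  return roots;
}
```

A map keyed by span ID keeps the rebuild linear in the number of spans, regardless of tree depth.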
Span type mapping
| Span type | Trajectory step type | Key fields extracted |
|---|---|---|
| TOOL_CALL | tool_call | toolArgs, toolResult, success |
| MCP_TOOL_CALL | mcp_tool_call | toolArgs, toolResult, mcpServer, success |
| MODEL_GENERATION | model_generation | modelId, promptTokens, completionTokens, finishReason |
| AGENT_RUN | agent_run | agentId (from entity ID) |
| WORKFLOW_RUN | workflow_run | workflowId (from entity ID) |
| WORKFLOW_STEP | workflow_step | output |
| WORKFLOW_CONDITIONAL | workflow_conditional | conditionCount, selectedSteps |
| WORKFLOW_PARALLEL | workflow_parallel | branchCount, parallelSteps |
| WORKFLOW_LOOP | workflow_loop | loopType, totalIterations |
| WORKFLOW_SLEEP | workflow_sleep | sleepDurationMs, sleepType |
| WORKFLOW_WAIT_EVENT | workflow_wait_event | eventName, eventReceived |
| PROCESSOR_RUN | processor_run | processorId |
Spans with types GENERIC, MODEL_STEP, MODEL_CHUNK, and WORKFLOW_CONDITIONAL_EVAL are skipped as noise.
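The table and skip list above amount to a lookup plus a filter, which can be sketched like this. The names follow the table, not the library source, and the helper name is hypothetical.

```typescript
// Span-type to step-type lookup, transcribed from the table above.
const SPAN_TO_STEP: Record<string, string> = {
  TOOL_CALL: 'tool_call',
  MCP_TOOL_CALL: 'mcp_tool_call',
  MODEL_GENERATION: 'model_generation',
  AGENT_RUN: 'agent_run',
  WORKFLOW_RUN: 'workflow_run',
  WORKFLOW_STEP: 'workflow_step',
  WORKFLOW_CONDITIONAL: 'workflow_conditional',
  WORKFLOW_PARALLEL: 'workflow_parallel',
  WORKFLOW_LOOP: 'workflow_loop',
  WORKFLOW_SLEEP: 'workflow_sleep',
  WORKFLOW_WAIT_EVENT: 'workflow_wait_event',
  PROCESSOR_RUN: 'processor_run',
};

// Span types documented as noise, dropped from trajectories.
const SKIPPED = new Set(['GENERIC', 'MODEL_STEP', 'MODEL_CHUNK', 'WORKFLOW_CONDITIONAL_EVAL']);

// Return the trajectory step type for a span, or undefined for skipped/unknown types.
function stepTypeForSpan(spanType: string): string | undefined {
  if (SKIPPED.has(spanType)) return undefined;
  return SPAN_TO_STEP[spanType];
}
```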
compareTrajectories
Compares an actual trajectory against an expected trajectory and returns a detailed comparison result. Used internally by createTrajectoryAccuracyScorerCode.
The expected parameter accepts either a Trajectory (actual trajectory) or { steps: ExpectedStep[] }. When using ExpectedStep[], you can match by name only, name + stepType, or include data for comparison. See Expected steps for details.
import { compareTrajectories } from '@mastra/evals/scorers/utils'
// Using ExpectedStep[] (recommended for expectations)
// Data fields (e.g. toolArgs) are auto-compared when present on expected steps
const result = compareTrajectories(
actualTrajectory,
{ steps: [{ name: 'search' }, { name: 'summarize', stepType: 'tool_call' }] },
{ allowRepeatedSteps: true },
)
// result.score — 0.0 to 1.0
// result.missingSteps — step names not found
// result.extraSteps — unexpected step names
// result.outOfOrderSteps — steps found but in wrong order
Returns: TrajectoryComparisonResult
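To illustrate the result shape, here is a toy name-only comparison. The real compareTrajectories also handles stepType and data-field matching, ordering, and repeated steps; this sketch only shows how missing/extra steps feed a 0-1 score.

```typescript
// Toy sketch: compare actual vs. expected step names only.
function compareByName(actual: string[], expected: string[]) {
  const missingSteps = expected.filter((name) => !actual.includes(name));
  const extraSteps = actual.filter((name) => !expected.includes(name));
  const matched = expected.length - missingSteps.length;
  // Fraction of expected steps that were found (1 when nothing is expected).
  const score = expected.length === 0 ? 1 : matched / expected.length;
  return { score, missingSteps, extraSteps };
}
```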
createTrajectoryTestRun
Creates a test run object for trajectory scorers. Wraps a Trajectory into the expected ScorerRun format.
import { createTrajectoryTestRun } from '@mastra/evals/scorers/utils'
const run = createTrajectoryTestRun({
steps: [
{ stepType: 'tool_call', name: 'search', toolArgs: { q: 'test' } },
{ stepType: 'tool_call', name: 'summarize' },
],
})
const result = await trajectoryScorer.run(run)
checkTrajectoryEfficiency
Evaluates trajectory efficiency against step, token, and duration budgets. Also detects redundant calls (same tool with same arguments).
import { checkTrajectoryEfficiency } from '@mastra/evals/scorers/utils'
const result = checkTrajectoryEfficiency(trajectory, {
maxSteps: 5,
maxTotalTokens: 2000,
maxTotalDurationMs: 5000,
noRedundantCalls: true,
})
// result.score — 1.0 if within all budgets, lower with penalties
// result.redundantCalls — duplicate tool+args combos
// result.overStepBudget — true if maxSteps exceeded
// result.overTokenBudget — true if maxTotalTokens exceeded
// result.overDurationBudget — true if maxTotalDurationMs exceeded
Returns: TrajectoryEfficiencyResult
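The redundancy check ("same tool with same arguments") can be sketched by keying calls on tool name plus serialized arguments. The step shape is a simplified assumption, and real argument comparison may be more robust than JSON serialization.

```typescript
// Assumed minimal tool-call step shape.
type ToolStep = { name: string; toolArgs?: unknown };

// A call is redundant when an earlier call used the same tool name
// with identical (serialized) arguments.
function findRedundantCalls(steps: ToolStep[]): string[] {
  const seen = new Set<string>();
  const redundant: string[] = [];
  for (const step of steps) {
    const key = `${step.name}:${JSON.stringify(step.toolArgs ?? null)}`;
    if (seen.has(key)) redundant.push(step.name);
    else seen.add(key);
  }
  return redundant;
}
```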
checkTrajectoryBlacklist
Checks whether a trajectory contains forbidden tools or tool call sequences.
import { checkTrajectoryBlacklist } from '@mastra/evals/scorers/utils'
const result = checkTrajectoryBlacklist(trajectory, {
blacklistedTools: ['deleteAll', 'admin-override'],
blacklistedSequences: [['escalate', 'admin-override']],
})
// result.score — 1.0 if no violations, 0.0 if any found
// result.violatedTools — blacklisted tools that were called
// result.violatedSequences — blacklisted sequences that were detected
Returns: TrajectoryBlacklistResult
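Sequence detection can be sketched as a substring search over the ordered list of tool names. One assumption here: this sketch treats a sequence as violated only when its tools appear consecutively, which may be stricter than the library's matching.

```typescript
// Return every blacklisted sequence that appears consecutively in the
// trajectory's ordered tool-call names.
function findViolatedSequences(calls: string[], sequences: string[][]): string[][] {
  return sequences.filter((seq) =>
    calls.some((_, i) => seq.every((tool, j) => calls[i + j] === tool)),
  );
}
```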
analyzeToolFailures
Detects tool failure patterns including retries, fallbacks, and argument corrections.
import { analyzeToolFailures } from '@mastra/evals/scorers/utils'
const result = analyzeToolFailures(trajectory, {
maxRetriesPerTool: 2,
})
// result.score — 1.0 if no failure patterns, lower if patterns detected
// result.patterns — detected patterns (retry, fallback, arg_correction)
Returns: ToolFailureAnalysisResult
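As a rough illustration of retry detection, the sketch below counts the longest run of consecutive calls to the same tool; a run longer than the budget would suggest a retry pattern. This is an assumption about what counts as a retry, not the library's definition.

```typescript
// For each tool, find the longest streak of consecutive calls to it.
function longestConsecutiveRuns(calls: string[]): Record<string, number> {
  const best: Record<string, number> = {};
  let run = 0;
  for (let i = 0; i < calls.length; i++) {
    run = i > 0 && calls[i] === calls[i - 1] ? run + 1 : 1;
    best[calls[i]] = Math.max(best[calls[i]] ?? 0, run);
  }
  return best;
}
```

A streak of 3 means the tool was called once and retried twice, which would exceed a maxRetriesPerTool of 2 under this reading.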
Complete example
Here's a complete example showing how to use multiple utilities together:
import { createScorer } from '@mastra/core/evals'
import {
getAssistantMessageFromRunOutput,
getReasoningFromRunOutput,
getUserMessageFromRunInput,
getCombinedSystemPrompt,
extractToolCalls,
} from '@mastra/evals/scorers/utils'
const comprehensiveScorer = createScorer({
id: 'comprehensive-analysis',
name: 'Comprehensive Analysis',
description: 'Analyzes all aspects of an agent response',
type: 'agent',
})
.preprocess(({ run }) => {
// Extract all relevant data
const userMessage = getUserMessageFromRunInput(run.input)
const response = getAssistantMessageFromRunOutput(run.output)
const reasoning = getReasoningFromRunOutput(run.output)
const systemPrompt = getCombinedSystemPrompt(run.input)
const { tools, toolCallInfos } = extractToolCalls(run.output)
return {
userMessage,
response,
reasoning,
systemPrompt,
toolsUsed: tools,
toolCount: tools.length,
}
})
.generateScore(({ results }) => {
const { response, reasoning, toolCount } = results.preprocessStepResult || {}
let score = 0
if (response && response.length > 0) score += 0.4
if (reasoning) score += 0.3
if (toolCount > 0) score += 0.3
return score
})
.generateReason(({ results, score }) => {
const { response, reasoning, toolCount } = results.preprocessStepResult || {}
const parts = []
if (response) parts.push('provided a response')
if (reasoning) parts.push('included reasoning')
if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`)
return `Score: ${score}. The agent ${parts.join(', ')}.`
})