# xAI Realtime Voice
The `XAIRealtimeVoice` class provides realtime voice interaction using the xAI Grok Voice Agent API. It implements Mastra's `MastraVoice` realtime contract and supports bidirectional audio streaming, text turns, server VAD, xAI voices, function tools, and xAI server-side tools.
## Usage example
```typescript
import { Agent } from '@mastra/core/agent'
import { getMicrophoneStream, playAudio } from '@mastra/node-audio'
import { XAIRealtimeVoice } from '@mastra/voice-xai-realtime'

const voice = new XAIRealtimeVoice({
  apiKey: process.env.XAI_API_KEY,
  model: 'grok-voice-think-fast-1.0',
  speaker: 'eve',
  instructions: 'You are a concise voice assistant.',
  turnDetection: { type: 'server_vad' },
})

const agent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a helpful voice assistant.',
  model: 'xai/grok-4.3',
  voice,
})

await agent.voice.connect()

agent.voice.on('speaker', audioStream => {
  playAudio(audioStream)
})

agent.voice.on('writing', ({ text, role }) => {
  console.log(`${role}: ${text}`)
})

await agent.voice.speak('How can I help you today?')

const microphoneStream = getMicrophoneStream()
await agent.voice.send(microphoneStream)

agent.voice.close()
```
## Configuration
### Constructor options

- `apiKey?`: xAI API key. Falls back to the `XAI_API_KEY` environment variable.
- `ephemeralToken?`: Server-minted xAI ephemeral token, used instead of the API key when provided.
- `model?`: Realtime model to use (for example `grok-voice-think-fast-1.0`).
- `speaker?`: Default xAI voice (for example `eve`).
- `instructions?`: System instructions for the session.
- `turnDetection?`: Turn-detection settings, such as `{ type: 'server_vad' }`.
- `audio?`: Input and output audio format configuration.
- `serverTools?`: xAI server-side tools to include in the session.
- `session?`: Additional xAI session fields passed through on connect.
- `url?`: Override for the realtime WebSocket endpoint.
- `debug?`: Enables debug logging.
### VoiceConfig pattern
You can also use Mastra's shared voice configuration shape:
```typescript
const voice = new XAIRealtimeVoice({
  speaker: 'ara',
  realtimeConfig: {
    model: 'grok-voice-think-fast-1.0',
    apiKey: process.env.XAI_API_KEY,
    options: {
      instructions: 'Answer briefly.',
      turnDetection: { type: 'server_vad', threshold: 0.85 },
    },
  },
})
```
### Authentication
Use `apiKey` or the `XAI_API_KEY` environment variable for server-side applications; this provider is built for Node.js server-side runtimes. If you already mint xAI ephemeral tokens on your server, you can pass one as `ephemeralToken`; the provider then authenticates with the `xai-client-secret.<token>` WebSocket subprotocol instead of an authorization header. If both `apiKey` and `ephemeralToken` are configured, the ephemeral token takes precedence.
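A sketch of the ephemeral-token path (the `mintEphemeralToken` helper is hypothetical; it stands in for whatever endpoint your own server exposes for requesting a short-lived token from xAI):

```typescript
import { XAIRealtimeVoice } from '@mastra/voice-xai-realtime'

// Hypothetical helper: calls your server, which in turn requests a
// short-lived token from xAI and returns it to this process.
declare function mintEphemeralToken(): Promise<string>

const voice = new XAIRealtimeVoice({
  // Takes precedence over apiKey when both are configured.
  ephemeralToken: await mintEphemeralToken(),
})
```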
## Methods
### `connect()`

Establishes the WebSocket connection and sends the initial `session.update`.

Parameters:

- `requestContext?`

Returns: `Promise<void>`
### `close()`

Closes the WebSocket connection, ends active speaker streams, and clears queued events, pending function-call state, and request context. `disconnect()` is an alias for `close()`.

Returns: `void`
### `addInstructions()`

Sets session instructions. If the WebSocket is open, the provider sends a `session.update`; passing `undefined` stores an empty string and clears the active instructions on the current session or the next connection.

Parameters:

- `instructions?`

Returns: `void`
### `addTools()`

Registers Mastra function tools and, when connected, refreshes the session tools with `session.update`.

Parameters:

- `tools?`

Returns: `void`
### `updateConfig()`

Sends a `session.update` event with additional xAI session fields.

Parameters:

- `sessionConfig`

Returns: `void`
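As a sketch (which fields are accepted is defined by xAI's realtime session schema; `instructions` is used here only because it appears elsewhere in this document):

```typescript
// Push updated session fields over the open connection.
voice.updateConfig({
  instructions: 'Switch to a more formal tone.',
})
```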
### `speak()`

Sends a text turn using `conversation.item.create` and then requests a response.

Parameters:

- `input`
- `options.speaker?`
- `options.response?`

Returns: `Promise<void>`
### `send()`

Streams realtime audio chunks with `input_audio_buffer.append`.

`send()` requires an open connection. Use it for live microphone audio after `connect()` resolves. Readable stream chunks must be binary audio chunks (`Buffer`, `ArrayBuffer`, or a typed array).

Parameters:

- `audioData`
- `eventId?`

Returns: `Promise<void>`
### `listen()`

Sends a finite audio stream with `input_audio_buffer.append`. By default it commits the input buffer and requests a response.

Parameters:

- `audioData`
- `options.commit?`
- `options.createResponse?`

Returns: `Promise<void>`
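For example, to transcribe a pre-recorded clip from disk (a sketch; the file path and the file being raw PCM16 audio are assumptions):

```typescript
import { createReadStream } from 'node:fs'

// Stream a finite PCM16 recording; with the defaults shown, listen()
// commits the buffer and requests a response once the stream ends.
const recording = createReadStream('./question.pcm')
await voice.listen(recording, { commit: true, createResponse: true })
```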
### `answer()`

Sends `response.create` to ask xAI to continue the conversation.

Returns: `Promise<void>`
### `commitAudioBuffer()` and `clearAudioBuffer()`

Send the matching xAI realtime client events for manual turn control.

Returns: `Promise<void>`
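A sketch of manual turn control with these methods (assuming server VAD is not driving turns for the session):

```typescript
// Append audio without relying on server VAD, then end the turn manually.
await voice.send(microphoneStream)
await voice.commitAudioBuffer() // finalize the user's audio turn
await voice.answer() // ask xAI to respond to the committed audio

// Or discard audio that has not been committed yet:
await voice.clearAudioBuffer()
```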
### `cancelResponse()`

Sends `response.cancel` to interrupt an in-flight response.

Parameters:

- `responseId?`
- `eventId?`

Returns: `Promise<void>`
## Events

`XAIRealtimeVoice` maps xAI realtime server events onto Mastra voice events:

- `speaker`: emits a readable stream for assistant audio.
- `speaking`: emits assistant audio deltas.
- `speaking.done`: emits when an assistant audio response completes.
- `writing`: emits assistant text deltas and user input transcriptions.
- `error`: emits xAI errors, provider execution errors, tool execution errors, and malformed function-call arguments. Tool errors include `details.call_id` and `details.name`.
- `close`: emits when the WebSocket closes.
- `tool-call-start`: emits before a Mastra function tool is executed.
- `tool-call-result`: emits after a Mastra function tool returns.

Raw xAI event names are also emitted, so you can subscribe to events such as `response.output_audio.delta`, `response.text.delta`, `response.function_call_arguments.done`, and `response.done`.
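For instance, you can listen at both levels (a sketch; the raw event's payload shape comes from xAI's realtime API, not from this provider):

```typescript
// Mastra-level event:
voice.on('speaking.done', () => {
  console.log('assistant finished speaking')
})

// Raw xAI server event, forwarded as-is:
voice.on('response.done', event => {
  console.log('raw response.done payload:', event)
})
```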
## Tools

### Mastra function tools

Tools added with `addTools()` are converted into xAI function tools and included in `session.update`.
```typescript
import { createTool } from '@mastra/core/tools'
import { z } from 'zod'

const weatherTool = createTool({
  id: 'getWeather',
  description: 'Get current weather for a location.',
  inputSchema: z.object({
    location: z.string(),
  }),
  execute: async ({ location }) => {
    return { location, temperature: 22 }
  },
})

voice.addTools({ getWeather: weatherTool })
```
When xAI emits `response.function_call_arguments.done`, the provider executes the matching Mastra tool and sends a `function_call_output` item. If xAI emits multiple function calls for one response, the provider waits for every tool result and the response's `response.done` event before sending a single continuation `response.create`.
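To observe this tool-call lifecycle, subscribe to the tool events described above (a sketch; the payload shapes, other than the documented `details.call_id` and `details.name` on tool errors, are assumptions):

```typescript
voice.on('tool-call-start', payload => {
  console.log('tool starting:', payload)
})

voice.on('tool-call-result', payload => {
  console.log('tool finished:', payload)
})

voice.on('error', err => {
  // Tool execution errors carry details.call_id and details.name.
  console.error('voice error:', err)
})
```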
### xAI server-side tools

xAI server-side tools are passed through in the session configuration and executed by xAI. Tools passed in `session.tools` and `serverTools` are merged:
```typescript
const voice = new XAIRealtimeVoice({
  apiKey: process.env.XAI_API_KEY,
  serverTools: [
    { type: 'web_search' },
    { type: 'x_search', allowed_x_handles: ['xai'] },
    { type: 'file_search', vector_store_ids: ['collection_123'], max_num_results: 10 },
    {
      type: 'mcp',
      server_url: 'https://mcp.example.com/mcp',
      server_label: 'business-tools',
      allowed_tools: ['lookup_order'],
    },
  ],
})
```
## Audio formats
The default input and output format is 24 kHz PCM16. You can also configure supported PCM sample rates or telephony codecs:
```typescript
const voice = new XAIRealtimeVoice({
  audio: {
    input: { format: { type: 'audio/pcm', rate: 16000 } },
    output: { format: { type: 'audio/pcm', rate: 16000 } },
  },
})
```
Supported format types are `audio/pcm`, `audio/pcmu`, and `audio/pcma`. PCM supports the documented sample rates from 8 kHz through 48 kHz. `audio/pcmu` and `audio/pcma` are G.711 telephony codecs and use 8 kHz.
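Since `send()` expects binary chunks, mic libraries that produce `Float32Array` samples need a conversion step first. A minimal Float32-to-PCM16 sketch (not part of this package):

```typescript
// Convert Web Audio style Float32 samples in [-1, 1] into little-endian
// signed 16-bit PCM bytes, the layout expected for audio/pcm input.
function float32ToPcm16(samples: Float32Array): Buffer {
  const buf = Buffer.alloc(samples.length * 2)
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range samples, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]))
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2)
  }
  return buf
}
```

Each `Buffer` produced this way is a valid chunk for `send()`.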