OpenAI

The OpenAIVoice class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI's models.

Usage Example


import { OpenAIVoice } from "@mastra/voice-openai";
 
// Initialize with default configuration using environment variables
const voice = new OpenAIVoice();
 
// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: "your-openai-api-key",
  },
  listeningModel: {
    name: "whisper-1",
    apiKey: "your-openai-api-key",
  },
  speaker: "alloy", // Default voice
});
 
// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "nova", // Override default voice
  speed: 1.2, // Adjust speech speed
});
 
// Convert speech to text
const text = await voice.listen(audioStream, {
  filetype: "mp3",
});

Configuration

Constructor Options

speechModel?:

OpenAIConfig

= { name: 'tts-1' }

Configuration for text-to-speech synthesis.

listeningModel?:

OpenAIConfig

= { name: 'whisper-1' }

Configuration for speech-to-text recognition.

speaker?:

OpenAIVoiceId

= 'alloy'

Default voice ID for speech synthesis.

OpenAIConfig

name?:

'tts-1' | 'tts-1-hd' | 'whisper-1'

Model name. Use 'tts-1-hd' for higher-quality audio.

apiKey?:

string

OpenAI API key. Falls back to the OPENAI_API_KEY environment variable if not set.
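
The fallback order described for apiKey can be sketched as below. This is only an illustration of the documented behavior, not the library's actual code: `ModelConfig` and `resolveApiKey` are hypothetical names, and the environment is passed in as a plain object so the snippet stays self-contained.

```typescript
// Hypothetical shape mirroring the documented OpenAIConfig fields.
interface ModelConfig {
  name?: string;
  apiKey?: string;
}

// Hypothetical helper (not part of @mastra/voice-openai): prefer the
// explicit apiKey, then fall back to OPENAI_API_KEY from the given env.
function resolveApiKey(
  config: ModelConfig | undefined,
  env: Record<string, string | undefined>,
): string {
  // `||` treats an empty string the same as "not set".
  const key = config?.apiKey || env["OPENAI_API_KEY"];
  if (!key) {
    throw new Error("No API key: set apiKey or OPENAI_API_KEY");
  }
  return key;
}
```

In a Node.js program the second argument would typically be `process.env`. Failing early when neither source provides a key is a design choice of this sketch, made so that a missing key surfaces before any request is attempted.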

Methods

speak()

Converts text to speech using OpenAI's text-to-speech models.

input:

string | NodeJS.ReadableStream

Text or text stream to convert to speech.

options.speaker?:

OpenAIVoiceId

= Constructor's speaker value

Voice ID to use for speech synthesis.

options.speed?:

number

= 1.0

Speech speed multiplier.

Returns: Promise<NodeJS.ReadableStream>
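
The defaults listed for the speak() options can be pictured as a simple merge: a per-call value wins, otherwise the constructor's speaker and a speed of 1.0 apply. The sketch below assumes exactly that and nothing more; `SpeakOptions` and `resolveSpeakOptions` are hypothetical names used for illustration, not the library's internals.

```typescript
// Hypothetical option shape mirroring the documented speak() options.
interface SpeakOptions {
  speaker?: string;
  speed?: number;
}

// Hypothetical helper: fill in the documented defaults for any option
// the caller did not pass.
function resolveSpeakOptions(
  defaultSpeaker: string,
  options: SpeakOptions = {},
): Required<SpeakOptions> {
  return {
    // Falls back to the speaker given to the constructor.
    speaker: options.speaker ?? defaultSpeaker,
    // Falls back to the documented default multiplier of 1.0.
    speed: options.speed ?? 1.0,
  };
}
```

With a constructor speaker of "alloy", calling the helper with no options yields "alloy" at speed 1.0, while passing `{ speaker: "nova", speed: 1.2 }` overrides both, which matches the usage example at the top of this page.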

listen()

Transcribes audio using OpenAI's Whisper models.

audioStream:

NodeJS.ReadableStream

Audio stream to transcribe.

options.filetype?:

string

= 'mp3'

Audio format of the input stream.

Returns: Promise<string>

getSpeakers()

Returns an array of available voice options. Each entry contains:

voiceId:

string

Unique identifier for the voice
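
One possible use of the list is to validate a voice ID before passing it as a speaker. The sketch below assumes only the documented `voiceId` field; `SpeakerEntry` and `isKnownSpeaker` are hypothetical names, and the list is passed in as an argument rather than fetched, so the snippet runs on its own.

```typescript
// Hypothetical entry shape: only the documented voiceId field is assumed.
interface SpeakerEntry {
  voiceId: string;
}

// Hypothetical helper: true if the given voice ID appears in the list.
function isKnownSpeaker(speakers: SpeakerEntry[], voiceId: string): boolean {
  return speakers.some((entry) => entry.voiceId === voiceId);
}
```

In practice the `speakers` argument would be the result of getSpeakers(), and an unknown ID could be rejected or replaced with the default speaker before calling speak().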

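Notes

API keys can be provided via constructor options or the OPENAI_API_KEY environment variable.
The tts-1-hd model provides higher-quality audio but may take longer to process.
Speech recognition supports multiple audio formats, including mp3, wav, and webm.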
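
Since listen() takes a filetype option, a caller reading audio from disk might derive it from the file name. The sketch below is one way to do that, limited to the three formats named in the notes above; `KNOWN_FILETYPES` and `filetypeFromName` are hypothetical names, and the list is not meant to be exhaustive.

```typescript
// Only the formats explicitly named in the notes; others may be supported.
const KNOWN_FILETYPES = ["mp3", "wav", "webm"] as const;
type KnownFiletype = (typeof KNOWN_FILETYPES)[number];

// Hypothetical helper: map a file name to one of the known filetypes,
// or undefined when there is no extension or it is not in the list.
function filetypeFromName(fileName: string): KnownFiletype | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return undefined;
  }
  // Compare case-insensitively so "clip.MP3" is treated like "clip.mp3".
  const ext = fileName.slice(dot + 1).toLowerCase();
  return KNOWN_FILETYPES.find((type) => type === ext);
}
```

The returned value could be passed as `filetype` to listen(). Returning undefined for unrecognized extensions leaves the caller to decide whether to fall back to the documented default of 'mp3' or to reject the file.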