MDocument

The MDocument class processes documents for RAG applications. Its main methods are .chunk() and .extractMetadata().

Constructor

docs:

Array<{ text: string, metadata?: Record<string, any> }>

An array of document chunks, each with text content and optional metadata

type:

'text' | 'html' | 'markdown' | 'json' | 'latex'

The type of the document content
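
As a rough sketch of the constructor input described above, the two parameters can be written as local TypeScript types. The names DocType, DocInput, and MDocumentInit are illustrative stand-ins and are not exported by @mastra/rag. That the constructor takes both fields in a single object is also an assumption.

```typescript
// Local stand-ins that mirror the documented constructor parameters.
// These type names are hypothetical and are not part of @mastra/rag.
type DocType = "text" | "html" | "markdown" | "json" | "latex";

interface DocInput {
  text: string;
  metadata?: Record<string, any>; // optional, as documented
}

interface MDocumentInit {
  docs: DocInput[];
  type: DocType;
}

const init: MDocumentInit = {
  docs: [
    { text: "# Guide\n\nFirst section.", metadata: { source: "guide.md" } },
    { text: "## Details\n\nSecond section." }, // metadata omitted
  ],
  type: "markdown",
};

// Assumed usage (not verified here): const doc = new MDocument(init);
```

In most cases the static factory methods are the more convenient entry point, since they take the raw content string directly.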

Static Methods

fromText()

Creates a document from plain text content.


static fromText(text: string, metadata?: Record<string, any>): MDocument

fromHTML()

Creates a document from HTML content.


static fromHTML(html: string, metadata?: Record<string, any>): MDocument

fromMarkdown()

Creates a document from Markdown content.


static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument

fromJSON()

Creates a document from JSON content.


static fromJSON(json: string, metadata?: Record<string, any>): MDocument
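
All four factories share the same (content, metadata?) signature, so the choice between them can be made at runtime. The sketch below shows one way to do this, dispatching on a file extension. The fromFile helper and the DocFactories interface are hypothetical and not part of @mastra/rag. The extension mapping and the source metadata key are likewise illustrative choices.

```typescript
// Signature shared by the four documented static factories.
type Factory<D> = (content: string, metadata?: Record<string, any>) => D;

// Structural view of the factories (hypothetical interface for this sketch).
interface DocFactories<D> {
  fromText: Factory<D>;
  fromHTML: Factory<D>;
  fromMarkdown: Factory<D>;
  fromJSON: Factory<D>;
}

// Hypothetical helper: choose a factory from the file extension,
// falling back to plain text for unknown or missing extensions.
function fromFile<D>(
  factories: DocFactories<D>,
  filename: string,
  content: string,
): D {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const metadata = { source: filename };

  switch (ext) {
    case "html":
    case "htm":
      return factories.fromHTML(content, metadata);
    case "md":
    case "markdown":
      return factories.fromMarkdown(content, metadata);
    case "json":
      return factories.fromJSON(content, metadata);
    default:
      return factories.fromText(content, metadata);
  }
}
```

Assuming the static side of the class matches this interface, a call would look like fromFile(MDocument, "notes.md", content).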

Instance Methods

chunk()

Splits the document into chunks and optionally extracts metadata.


async chunk(params?: ChunkParams): Promise<Chunk[]>

For detailed options, see the chunk() reference.

getDocs()

Returns the array of processed document chunks.


getDocs(): Chunk[]

getText()

Returns an array of the text strings from the chunks.


getText(): string[]

getMetadata()

Returns an array of the metadata objects from the chunks.


getMetadata(): Record<string, any>[]
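
getText() and getMetadata() both return one entry per chunk. The sketch below pairs the two arrays into records, for example before embedding or storing them. It assumes the arrays are in the same chunk order and have equal length, which the signatures above suggest but do not state. The toRecords helper and the ChunkRecord shape are hypothetical and not part of @mastra/rag.

```typescript
// Hypothetical record shape for downstream storage.
interface ChunkRecord {
  id: string;
  text: string;
  metadata: Record<string, any>;
}

// Hypothetical helper: pair each text with the metadata at the same index.
// Throws if the arrays differ in length, since that would break the
// assumed one-to-one correspondence between chunks and metadata.
function toRecords(
  texts: string[],
  metadata: Record<string, any>[],
  idPrefix = "chunk",
): ChunkRecord[] {
  if (texts.length !== metadata.length) {
    throw new Error(
      `Length mismatch: ${texts.length} texts, ${metadata.length} metadata objects`,
    );
  }
  return texts.map((text, i) => ({
    id: `${idPrefix}-${i}`,
    text,
    metadata: metadata[i] ?? {},
  }));
}
```

With a chunked document, this would be called as toRecords(doc.getText(), doc.getMetadata()).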

extractMetadata()

Extracts metadata using the specified extractors. See the ExtractParams reference for details.


async extractMetadata(params: ExtractParams): Promise<MDocument>

Example


import { MDocument } from "@mastra/rag";
 
// Create document from text
const doc = MDocument.fromText("Your content here");
 
// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: "markdown",
  headers: [
    ["#", "title"],
    ["##", "section"],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
});
 
// Get processed chunks
const docs = doc.getDocs();
const texts = doc.getText();
const metadata = doc.getMetadata();