MDocument

The MDocument class processes documents for RAG applications. Its main methods are .chunk(), which splits a document into chunks, and .extractMetadata(), which runs the specified extractors on the document.

Constructor

docs:

Array<{ text: string, metadata?: Record<string, any> }>
Array of document chunks with their text content and optional metadata

type:

'text' | 'html' | 'markdown' | 'json' | 'latex'
Type of document content
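The two constructor parameters can be written out as TypeScript types. This is only a sketch of the shapes listed above: the names DocType, DocInput, and MDocumentParams are hypothetical labels, and passing both fields as a single options object is an assumption, not something this page states.

```typescript
// Shapes copied from the parameter descriptions above. The type names are
// hypothetical labels for this sketch; they are not exported by @mastra/rag.
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex';

interface DocInput {
  text: string;
  metadata?: Record<string, any>;
}

// Assumption: the constructor receives both fields in one options object.
interface MDocumentParams {
  docs: DocInput[];
  type: DocType;
}

const params: MDocumentParams = {
  docs: [
    { text: '# Guide\n\nIntro paragraph.', metadata: { source: 'guide.md' } },
    { text: 'A second entry without metadata.' },
  ],
  type: 'markdown',
};
```

In most cases the static methods below are the simpler way to build a document from a single string.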

Static methods

fromText()

Creates a document from plain text content.

static fromText(text: string, metadata?: Record<string, any>): MDocument

fromHTML()

Creates a document from HTML content.

static fromHTML(html: string, metadata?: Record<string, any>): MDocument

fromMarkdown()

Creates a document from Markdown content.

static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument

fromJSON()

Creates a document from JSON content.

static fromJSON(json: string, metadata?: Record<string, any>): MDocument
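One plausible reading of the four static methods is that each wraps its string in a one-element docs array and sets the matching type. The stand-in class below illustrates that reading. FactorySketch is hypothetical and is not the real MDocument, and the wrapping behavior is an assumption inferred from the signatures above.

```typescript
type SketchType = 'text' | 'html' | 'markdown' | 'json' | 'latex';

interface SketchInput {
  text: string;
  metadata: Record<string, any>;
}

// Stand-in class for illustration only; NOT the real MDocument.
class FactorySketch {
  readonly docs: SketchInput[];
  readonly type: SketchType;

  constructor(params: { docs: SketchInput[]; type: SketchType }) {
    this.docs = params.docs;
    this.type = params.type;
  }

  // Assumed behavior: one input string becomes one docs entry tagged 'text'.
  static fromText(text: string, metadata?: Record<string, any>): FactorySketch {
    return new FactorySketch({ docs: [{ text, metadata: metadata ?? {} }], type: 'text' });
  }

  // Same pattern with the 'markdown' type tag.
  static fromMarkdown(markdown: string, metadata?: Record<string, any>): FactorySketch {
    return new FactorySketch({ docs: [{ text: markdown, metadata: metadata ?? {} }], type: 'markdown' });
  }
}

const fromFactory = FactorySketch.fromMarkdown('# Title\n\nBody', { source: 'readme.md' });
const plain = FactorySketch.fromText('Just text');
```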

Instance methods

chunk()

Splits document into chunks and optionally extracts metadata.

async chunk(params?: ChunkParams): Promise<Chunk[]>

See chunk() reference for detailed options.

getDocs()

Returns array of processed document chunks.

getDocs(): Chunk[]

getText()

Returns array of text strings from chunks.

getText(): string[]

getMetadata()

Returns array of metadata objects from chunks.

getMetadata(): Record<string, any>[]
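The three accessors above can be read as views over the same list of chunks: getDocs() returns the chunks, while getText() and getMetadata() return one field from each. The sketch below mirrors those signatures with a stand-in class. ChunkLike and AccessorSketch are hypothetical names, and the real Chunk type may carry more fields than the two shown here.

```typescript
// Hypothetical stand-in for Chunk, limited to the two fields this page describes.
interface ChunkLike {
  text: string;
  metadata: Record<string, any>;
}

// Stand-in class for illustration only; NOT the real MDocument.
class AccessorSketch {
  private chunks: ChunkLike[];

  constructor(chunks: ChunkLike[]) {
    this.chunks = chunks;
  }

  // Returns the chunk objects themselves.
  getDocs(): ChunkLike[] {
    return this.chunks;
  }

  // Returns only the text of each chunk, in chunk order.
  getText(): string[] {
    return this.chunks.map((chunk) => chunk.text);
  }

  // Returns only the metadata of each chunk, in chunk order.
  getMetadata(): Record<string, any>[] {
    return this.chunks.map((chunk) => chunk.metadata);
  }
}

const sketch = new AccessorSketch([
  { text: 'First chunk', metadata: { section: 'intro' } },
  { text: 'Second chunk', metadata: { section: 'usage' } },
]);

const sketchTexts = sketch.getText();
const sketchMetadata = sketch.getMetadata();
```

Under this reading, the entries at a given index in the arrays returned by getText() and getMetadata() describe the same chunk.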

extractMetadata()

Extracts metadata using specified extractors. See ExtractParams reference for details.

async extractMetadata(params: ExtractParams): Promise<MDocument>

Examples

import { MDocument } from '@mastra/rag'

// Create document from text
const doc = MDocument.fromText('Your content here')

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [
    ['#', 'title'],
    ['##', 'section'],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
})

// Get processed chunks
const docs = doc.getDocs()
const texts = doc.getText()
const metadata = doc.getMetadata()