MDocument
The MDocument class processes documents for RAG applications. The main methods are .chunk() and .extractMetadata().
Constructor
docs: Array<{ text: string, metadata?: Record<string, any> }>
Array of document chunks with their text content and optional metadata.
type: 'text' | 'html' | 'markdown' | 'json' | 'latex'
Type of document content.
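The two constructor parameters can be written out as TypeScript types. This is a minimal sketch for illustration: the names `DocEntry`, `DocType`, and `makeEntries` are hypothetical and are not shown in this reference as library exports; only the shapes are taken from the parameter list above.

```typescript
// Hypothetical type names; the shapes mirror the `docs` and `type` parameters.
type DocEntry = { text: string; metadata?: Record<string, any> };
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex';

// Illustrative helper (not part of the library): wraps raw strings in the
// { text, metadata } shape that the `docs` parameter describes.
function makeEntries(texts: string[], metadata?: Record<string, any>): DocEntry[] {
  return texts.map((text) =>
    // Copy the metadata so entries do not share one mutable object.
    metadata ? { text, metadata: { ...metadata } } : { text }
  );
}

const docs: DocEntry[] = makeEntries(['First passage', 'Second passage'], { source: 'example' });
const docType: DocType = 'text';
```

Copying the metadata per entry is a design choice of this sketch, so that later changes to one entry's metadata do not leak into the others.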
Static Methods
fromText()
Creates a document from plain text content.
static fromText(text: string, metadata?: Record<string, any>): MDocument
fromHTML()
Creates a document from HTML content.
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
fromMarkdown()
Creates a document from Markdown content.
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
fromJSON()
Creates a document from JSON content.
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
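One way to read the factory signatures above is that each wraps a single string in a one-element `docs` array and records the matching content type. The sketch below models that reading with a hypothetical stand-in class, `SketchDocument`. It is an assumption made for illustration, not the library's implementation, and it covers only two of the four factories.

```typescript
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex';
type DocEntry = { text: string; metadata?: Record<string, any> };

// Hypothetical stand-in, NOT the library class. It only models the assumed
// relationship between the static factories and the constructor parameters.
class SketchDocument {
  readonly docs: DocEntry[];
  readonly type: DocType;

  constructor(docs: DocEntry[], type: DocType) {
    this.docs = docs;
    this.type = type;
  }

  // Assumed behavior: one input string becomes one entry of type 'text'.
  static fromText(text: string, metadata?: Record<string, any>): SketchDocument {
    return new SketchDocument([metadata ? { text, metadata } : { text }], 'text');
  }

  // Assumed behavior: same wrapping, but tagged as 'markdown'.
  static fromMarkdown(markdown: string, metadata?: Record<string, any>): SketchDocument {
    return new SketchDocument(
      [metadata ? { text: markdown, metadata } : { text: markdown }],
      'markdown'
    );
  }
}

const md = SketchDocument.fromMarkdown('# Title\n\nBody text', { source: 'notes.md' });
const plain = SketchDocument.fromText('Your content here');
```

Under this reading, choosing the factory that matches the content is what tells later processing how to interpret the text.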
Instance Methods
chunk()
Splits the document into chunks and optionally extracts metadata.
async chunk(params?: ChunkParams): Promise<Chunk[]>
See chunk() reference for detailed options.
getDocs()
Returns array of processed document chunks.
getDocs(): Chunk[]
getText()
Returns array of text strings from chunks.
getText(): string[]
getMetadata()
Returns array of metadata objects from chunks.
getMetadata(): Record<string, any>[]
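The three getters above can be thought of as different views of the same chunk list. The sketch below works on plain data rather than the library. It assumes each chunk carries a `text` string and a `metadata` object, and that the arrays from getText() and getMetadata() are index-aligned with getDocs(). The `ChunkLike` type and the sample values are hypothetical.

```typescript
// Hypothetical chunk shape, assumed from the getter return types above.
type ChunkLike = { text: string; metadata: Record<string, any> };

// Stand-in for what getDocs() might return after chunking.
const chunks: ChunkLike[] = [
  { text: 'Intro paragraph', metadata: { section: 'intro' } },
  { text: 'Details paragraph', metadata: { section: 'details' } },
];

// Equivalent projections to getText() and getMetadata() under this assumption.
const texts: string[] = chunks.map((c) => c.text);
const metadata: Record<string, any>[] = chunks.map((c) => c.metadata);

// Because the arrays are index-aligned, they can be paired back up by position.
const paired = texts.map((text, i) => ({ text, section: metadata[i]?.section }));
```

The text array is the usual input to an embedding step, and the aligned metadata array can be stored next to the resulting vectors.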
extractMetadata()
Extracts metadata using specified extractors. See ExtractParams reference for details.
async extractMetadata(params: ExtractParams): Promise<MDocument>
Examples
import { MDocument } from '@mastra/rag';

// Create document from text
const doc = MDocument.fromText('Your content here');

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [['#', 'title'], ['##', 'section']],
  extract: {
    fields: [
      { name: 'summary', description: 'A brief summary' },
      { name: 'keywords', description: 'Key terms' }
    ]
  }
});

// Get processed chunks
const docs = doc.getDocs();
const texts = doc.getText();
const metadata = doc.getMetadata();