Document Processing

The MDocument class handles document chunking and metadata extraction.

Constructor

text: string
Document text content

metadata?: Record<string, any>
Optional metadata about the document

Methods

chunk()

Splits the document into chunks and, optionally, extracts metadata from each chunk.

strategy: 'sentence' | 'paragraph' | 'fixed'
Chunking strategy to use

parseMarkdown?: boolean
Whether to parse markdown syntax

metadataExtraction?: object
Metadata extraction options (requires OpenAI). Each extractor takes either a boolean or an options object:
  Title extractor: boolean | TitleExtractorsArgs
  Summary extractor: boolean | SummaryExtractArgs
  Question-answer extractor: boolean | QuestionAnswerExtractArgs
  Keyword extractor: boolean | KeywordExtractArgs
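
To make the three strategy names concrete, here is a minimal sketch of what each kind of split could look like on plain text. This is not the library's implementation: `splitText`, its `size` parameter, and the naive regular expressions are all assumptions chosen for illustration, and the real chunker may treat abbreviations, decimals, and markdown differently.

```typescript
// Hypothetical helper (not part of MDocument): illustrates the strategy names only.
type Strategy = "sentence" | "paragraph" | "fixed";

function splitText(text: string, strategy: Strategy, size: number = 200): string[] {
  if (strategy === "sentence") {
    // Naive rule: a sentence runs up to and including its ., ! or ? terminator.
    const parts: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
    return parts.map((p) => p.trim()).filter((p) => p.length > 0);
  }
  if (strategy === "paragraph") {
    // Paragraphs are separated by one or more blank lines.
    return text
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
  }
  // "fixed": consecutive windows of `size` characters (assumed unit: characters).
  if (size <= 0) {
    throw new Error("size must be positive");
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
```

Sentence and paragraph splitting keep natural boundaries but give chunks of uneven length, while fixed-size splitting gives predictable sizes and may cut through the middle of a sentence.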

Response Types

The chunk method returns an array of document nodes:

interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}
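
As a rough sketch of how values of this shape might be built and inspected, the snippet below wraps plain strings as nodes and finds the ones that still lack an embedding. The helpers `toNodes` and `missingEmbeddings` and the `chunkIndex` metadata key are hypothetical and not part of the documented API; only the `DocumentNode` interface comes from the text above.

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

// Hypothetical helper: wraps chunk strings as nodes that share document-level metadata.
// The `chunkIndex` key is an assumption made for this example.
function toNodes(chunks: string[], metadata: Record<string, any> = {}): DocumentNode[] {
  return chunks.map((text, chunkIndex) => ({
    text,
    metadata: { ...metadata, chunkIndex },
  }));
}

// Hypothetical helper: `embedding` is optional, so check for it before vector search.
function missingEmbeddings(nodes: DocumentNode[]): DocumentNode[] {
  return nodes.filter((node) => node.embedding === undefined);
}
```

Because `embedding` is optional, code that consumes nodes should check for its presence and not assume it is set.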

Error Handling

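// `doc` is an existing MDocument instance.
try {
  const chunks = await doc.chunk({
    strategy: "sentence",
  });
} catch (error) {
  // Only DocumentProcessingError carries the `code` and `details` fields.
  if (error instanceof DocumentProcessingError) {
    console.error(error.code); // e.g. 'invalid_strategy' or 'extraction_failed'
    console.error(error.details); // Additional error context
  } else {
    throw error; // Rethrow anything this handler does not recognize
  }
}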
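
The example relies on an error type that exposes `code` and `details`. The sketch below shows one way such an error could be modeled and narrowed. It is an assumption for illustration only: the constructor signature and the `describeError` helper are hypothetical, and the class defined here is a local stand-in, not the library's own export.

```typescript
// Hypothetical stand-in: the documentation only confirms the `code` and `details` fields.
class DocumentProcessingError extends Error {
  code: string;
  details?: unknown;

  constructor(code: string, message: string, details?: unknown) {
    super(message);
    this.name = "DocumentProcessingError";
    this.code = code;
    this.details = details;
    // Keeps `instanceof` working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, DocumentProcessingError.prototype);
  }
}

// Hypothetical helper: turns any thrown value into a loggable string.
function describeError(error: unknown): string {
  if (error instanceof DocumentProcessingError) {
    return `${error.code}: ${error.message}`;
  }
  return error instanceof Error ? error.message : String(error);
}
```

Narrowing with `instanceof` before reading `code` keeps the handler safe when an unrelated error, such as a network failure, reaches the same `catch` block.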
