Document Processing
The MDocument class handles document chunking and metadata extraction.
Constructor
text (string): Document text content.
metadata? (Record<string, any>): Optional metadata about the document.
Methods
chunk()
Splits document into chunks and optionally extracts metadata.
strategy ('sentence' | 'paragraph' | 'fixed'): Chunking strategy to use.
parseMarkdown? (boolean): Whether to parse markdown syntax.
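metadataExtraction? (object): Metadata extraction options (requires OpenAI). The object holds four extractor options, each of which accepts either a boolean or an arguments object:
  boolean | TitleExtractorsArgs
  boolean | SummaryExtractArgs
  boolean | QuestionAnswerExtractArgs
  boolean | KeywordExtractArgs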
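To make the difference between the three strategy values concrete, here is a minimal sketch of how each one could split a string. It is not the library's implementation: `splitText`, `SplitOptionsSketch`, and the `size` option are hypothetical names introduced for illustration, and only the strategy values themselves come from the parameter list above.

```typescript
// Strategy values taken from the chunk() parameter list.
type ChunkStrategy = "sentence" | "paragraph" | "fixed";

// Hypothetical options shape for this sketch only.
interface SplitOptionsSketch {
  strategy: ChunkStrategy;
  // Assumption: character window used by the "fixed" strategy.
  size?: number;
}

// Illustrative splitter; the real chunk() may behave differently.
function rawSplit(text: string, options: SplitOptionsSketch): string[] {
  if (options.strategy === "sentence") {
    // A run of non-terminator characters followed by any terminators.
    return text.match(/[^.!?]+[.!?]*/g) ?? [];
  }
  if (options.strategy === "paragraph") {
    // Paragraphs are separated by at least one blank line.
    return text.split(/\n\s*\n/);
  }
  // "fixed": consecutive windows of `size` characters.
  const size = options.size ?? 200;
  if (size <= 0) {
    throw new RangeError("size must be positive");
  }
  const windows: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    windows.push(text.slice(i, i + size));
  }
  return windows;
}

function splitText(text: string, options: SplitOptionsSketch): string[] {
  // Trim each piece and drop the ones that end up empty.
  return rawSplit(text, options)
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0);
}
```

Sentence and paragraph splitting follow the structure of the text, while fixed-size splitting ignores it, so a fixed window can cut a sentence in the middle.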
Response Types
The chunk method returns an array of document nodes:
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}
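Because `embedding` is optional, code that consumes the returned nodes usually has to check for it before using it. The sketch below shows one way to do that with a type guard. The interface is repeated so the example stands alone, and `hasEmbedding` and `partitionByEmbedding` are hypothetical helpers, not part of the library.

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

type EmbeddedNode = DocumentNode & { embedding: number[] };

// Type guard: narrows a node to one with a non-empty embedding.
function hasEmbedding(node: DocumentNode): node is EmbeddedNode {
  return Array.isArray(node.embedding) && node.embedding.length > 0;
}

// Hypothetical helper: separates nodes that already carry an embedding
// from those that still need one.
function partitionByEmbedding(nodes: DocumentNode[]): {
  embedded: EmbeddedNode[];
  pending: DocumentNode[];
} {
  const embedded: EmbeddedNode[] = [];
  const pending: DocumentNode[] = [];
  for (const node of nodes) {
    if (hasEmbedding(node)) {
      embedded.push(node);
    } else {
      pending.push(node);
    }
  }
  return { embedded, pending };
}
```

Inside the `embedded` array the compiler treats `embedding` as always present, so downstream code needs no further undefined checks.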
Error Handling
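// Assumes `doc` is an existing MDocument instance and that
// DocumentProcessingError is imported from the library.
try {
  const chunks = await doc.chunk({
    strategy: "sentence",
  });
} catch (error) {
  if (error instanceof DocumentProcessingError) {
    console.log(error.code); // 'invalid_strategy' | 'extraction_failed' etc.
    console.log(error.details); // Additional error context
  } else {
    throw error; // Not a document-processing failure, so let it propagate
  }
}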
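The snippet above relies on an error object that exposes `code` and `details`. As a rough sketch of how such an error could be modeled and mapped to user-facing messages, the block below defines a stand-in class. The constructor signature and the `describeError` helper are assumptions made for illustration; only the class name, the two properties, and the two code strings appear in the reference.

```typescript
// Hypothetical stand-in for the library's error class.
class DocumentProcessingError extends Error {
  readonly code: string;
  readonly details?: unknown;

  constructor(message: string, code: string, details?: unknown) {
    super(message);
    this.name = "DocumentProcessingError";
    this.code = code;
    this.details = details;
    // Restores the prototype chain so `instanceof` also works on ES5 targets.
    Object.setPrototypeOf(this, DocumentProcessingError.prototype);
  }
}

// Hypothetical helper: turns any thrown value into a readable message.
function describeError(error: unknown): string {
  if (error instanceof DocumentProcessingError) {
    switch (error.code) {
      case "invalid_strategy":
        return "The requested chunking strategy is not supported.";
      case "extraction_failed":
        return "Metadata extraction failed.";
      default:
        return `Document processing failed (${error.code}).`;
    }
  }
  return error instanceof Error ? error.message : String(error);
}
```

Branching on `code` rather than on the message text keeps the handling stable if the wording of the messages changes.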