Chunk Function Reference
The chunk function splits documents into smaller segments using various strategies and options.
Parameters
strategy?:
'recursive' | 'character' | 'token' | 'markdown' | 'html' | 'json' | 'latex'
The chunking strategy to use. If omitted, a default is chosen based on the document type.
size?:
number
Maximum size of each chunk, in characters or tokens depending on the strategy
overlap?:
number
Number of characters/tokens that overlap between chunks
separator?:
string
Character(s) to split on
isSeparatorRegex?:
boolean
Whether the separator is a regex pattern
keepSeparator?:
'start' | 'end'
Whether to keep the separator at the start or end of chunks
extract?:
ExtractParams
Metadata extraction options (requires OpenAI API key)
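To make the relationship between size and overlap concrete, here is a minimal sketch of a fixed-size sliding window over characters. The helper chunkByCharacters is hypothetical and written only for illustration; it is not the library's implementation, and the real strategies may also honor separators and token counts.

```typescript
// Hypothetical helper for illustration only; not part of the library.
function chunkByCharacters(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");

  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances on each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

const parts = chunkByCharacters("abcdefghij", 4, 1);
console.log(parts); // ["abcd", "defg", "ghij"]
```

Each chunk repeats the last character of the previous one, which is the effect of overlap: 1. A larger overlap keeps more shared context between neighbors at the cost of producing more chunks.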
Strategy-Specific Options
HTML
headers:
Array<[string, string]>
Array of [selector, metadata key] pairs for header-based splitting
sections:
Array<[string, string]>
Array of [selector, metadata key] pairs for section-based splitting
returnEachLine?:
boolean
Whether to return each line as a separate chunk
Markdown
headers:
Array<[string, string]>
Array of [header level, metadata key] pairs
stripHeaders?:
boolean
Whether to remove headers from the output
returnEachLine?:
boolean
Whether to return each line as a separate chunk
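The headers and stripHeaders options can be pictured with the following sketch. It assumes header levels are written as Markdown markers such as "#" and "##", and that each matched header's text is stored under the paired metadata key. The function splitByMarkdownHeaders is a hypothetical stand-in, not the library's code, and it ignores returnEachLine and fenced code blocks.

```typescript
// Shape copied from the Return Value section of this reference.
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

// Hypothetical helper for illustration only; not the library's implementation.
function splitByMarkdownHeaders(
  markdown: string,
  headers: Array<[string, string]>,
  stripHeaders = true,
): DocumentNode[] {
  const nodes: DocumentNode[] = [];
  const active: Record<string, string> = {}; // header metadata currently in scope
  let buffer: string[] = [];

  // Emit the buffered lines as one chunk tagged with the headers in scope.
  const flush = () => {
    const text = buffer.join("\n").trim();
    if (text) nodes.push({ text, metadata: { ...active } });
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    const match = headers.find(([marker]) => line.startsWith(marker + " "));
    if (!match) {
      buffer.push(line);
      continue;
    }
    flush();
    const [marker, key] = match;
    // A new header ends the scope of any deeper header levels.
    for (const [otherMarker, otherKey] of headers) {
      if (otherMarker.length > marker.length) delete active[otherKey];
    }
    active[key] = line.slice(marker.length + 1).trim();
    if (!stripHeaders) buffer.push(line);
  }
  flush();
  return nodes;
}

const sections = splitByMarkdownHeaders(
  "# Guide\nIntro text\n## Setup\nInstall it",
  [["#", "title"], ["##", "section"]],
);
// sections[0]: { text: "Intro text", metadata: { title: "Guide" } }
// sections[1]: { text: "Install it", metadata: { title: "Guide", section: "Setup" } }
console.log(sections);
```

Passing false as the third argument keeps the header line at the top of each chunk's text instead of removing it.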
Token
encodingName?:
string
Name of the token encoding to use
modelName?:
string
Name of the model for tokenization
JSON
maxSize:
number
Maximum size of each chunk
minSize?:
number
Minimum size of each chunk
ensureAscii?:
boolean
Whether to ensure ASCII encoding
convertLists?:
boolean
Whether to convert lists in the JSON
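As a rough illustration of what maxSize constrains, the sketch below greedily packs top-level keys into chunks whose serialized length stays within the limit. It assumes size is measured as the length of the JSON.stringify output, which this reference does not confirm. The helper splitJsonObject is hypothetical, and it does not recurse into nested values or apply minSize, ensureAscii, or convertLists.

```typescript
// Hypothetical helper for illustration only; not the library's implementation.
function splitJsonObject(
  input: Record<string, unknown>,
  maxSize: number,
): Array<Record<string, unknown>> {
  const chunks: Array<Record<string, unknown>> = [];
  let current: Record<string, unknown> = {};

  for (const [key, value] of Object.entries(input)) {
    const candidate = { ...current, [key]: value };
    const fits = JSON.stringify(candidate).length <= maxSize;
    if (fits || Object.keys(current).length === 0) {
      // Either the entry fits, or it is the first entry of a chunk and is
      // kept whole even if it exceeds maxSize on its own.
      current = candidate;
    } else {
      chunks.push(current);
      current = { [key]: value };
    }
  }
  if (Object.keys(current).length > 0) chunks.push(current);
  return chunks;
}

const pieces = splitJsonObject({ a: "xxxx", b: "yyyy", c: "zzzz" }, 25);
console.log(pieces); // [{ a: "xxxx", b: "yyyy" }, { c: "zzzz" }]
```

Every chunk remains valid JSON on its own, which is the main reason to split on structure rather than on raw character offsets.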
Return Value
Returns an MDocument instance containing the chunked documents. Each chunk includes:
interface DocumentNode {
text: string;
metadata: Record<string, any>;
embedding?: number[];
}
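Because embedding is optional, code that consumes chunks usually has to handle nodes with and without it. The following sketch shows one way to do that against the interface above. The helper withoutEmbedding and the sample data are hypothetical and exist only for illustration.

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

// Hypothetical helper: returns the chunks that do not have an embedding yet.
function withoutEmbedding(nodes: DocumentNode[]): DocumentNode[] {
  return nodes.filter((node) => node.embedding === undefined);
}

// Made-up sample data, not output from the library.
const sample: DocumentNode[] = [
  { text: "Intro", metadata: { section: "Overview" } },
  { text: "Setup", metadata: { section: "Install" }, embedding: [0.1, 0.2] },
];

const pending = withoutEmbedding(sample);
console.log(pending.map((node) => node.text)); // ["Intro"]
```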