Chunk Function Reference

The chunk function splits a document into smaller segments. The strategy and options below control where the splits are made and how large the resulting chunks are.

Parameters

strategy?:

'recursive' | 'character' | 'token' | 'markdown' | 'html' | 'json' | 'latex'
The chunking strategy to use. If not specified, a default is chosen based on the document type.

size?:

number
Maximum size of each chunk

overlap?:

number
Number of characters/tokens that overlap between chunks

separator?:

string
Character(s) to split on

isSeparatorRegex?:

boolean
Whether the separator is a regex pattern

keepSeparator?:

'start' | 'end'
Where to keep the separator: at the start or at the end of each chunk

extract?:

ExtractParams
Metadata extraction options (requires an OpenAI API key)
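The following is a minimal sketch, not Mastra's implementation, of how size, overlap, separator, isSeparatorRegex and keepSeparator can interact in a character-based splitter. The text is split on the separator, the pieces are greedily packed into chunks of at most size characters, and each new chunk begins with trailing pieces of the previous chunk totalling at most overlap characters. The names SplitOptions, escapeRegex, splitOnSeparator, mergePieces and chunkText are hypothetical.

```typescript
// Hypothetical options object mirroring the documented parameter names.
interface SplitOptions {
  size: number; // maximum characters per chunk
  overlap: number; // maximum characters carried over from the previous chunk
  separator: string;
  isSeparatorRegex?: boolean; // assumed: the regex contains no capturing groups
  keepSeparator?: "start" | "end";
}

// Escape regex metacharacters so a literal separator can be used in a RegExp.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split on the separator, optionally re-attaching it to the start or end of each piece.
function splitOnSeparator(text: string, opts: SplitOptions): string[] {
  const source = opts.isSeparatorRegex ? opts.separator : escapeRegex(opts.separator);
  if (!opts.keepSeparator) {
    return text.split(new RegExp(source)).filter((p) => p.length > 0);
  }
  // A capturing group makes split() return [text, sep, text, sep, ..., text].
  const parts = text.split(new RegExp(`(${source})`));
  const pieces: string[] = [];
  if (opts.keepSeparator === "end") {
    for (let i = 0; i < parts.length; i += 2) pieces.push(parts[i] + (parts[i + 1] ?? ""));
  } else {
    pieces.push(parts[0]);
    for (let i = 1; i < parts.length; i += 2) pieces.push(parts[i] + parts[i + 1]);
  }
  return pieces.filter((p) => p.length > 0);
}

// Greedily pack pieces into chunks of at most `size` characters, carrying trailing
// pieces (at most `overlap` characters) into the next chunk. A single piece longer
// than `size` is emitted as one oversized chunk.
function mergePieces(pieces: string[], size: number, overlap: number, joiner: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let length = 0; // always equals current.join(joiner).length
  for (const piece of pieces) {
    const sep = current.length > 0 ? joiner.length : 0;
    if (current.length > 0 && length + sep + piece.length > size) {
      chunks.push(current.join(joiner));
      // Drop leading pieces until the remainder fits the overlap budget and leaves room.
      while (length > overlap || (length > 0 && length + joiner.length + piece.length > size)) {
        const dropped = current.shift()!;
        length -= dropped.length + (current.length > 0 ? joiner.length : 0);
      }
    }
    length += (current.length > 0 ? joiner.length : 0) + piece.length;
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join(joiner));
  return chunks;
}

function chunkText(text: string, opts: SplitOptions): string[] {
  // Kept separators already live inside the pieces; a regex has no single literal to re-insert.
  const joiner = opts.keepSeparator || opts.isSeparatorRegex ? "" : opts.separator;
  return mergePieces(splitOnSeparator(text, opts), opts.size, opts.overlap, joiner);
}
```

For example, chunkText("aaa bbb ccc ddd", { size: 7, overlap: 3, separator: " " }) returns ["aaa bbb", "bbb ccc", "ccc ddd"]: each chunk repeats the last word of the previous one because that 3-character word fits within the overlap budget.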

Strategy-Specific Options

HTML

headers:

Array<[string, string]>
Array of [selector, metadata key] pairs for header-based splitting

sections:

Array<[string, string]>
Array of [selector, metadata key] pairs for section-based splitting

returnEachLine?:

boolean
Whether to return each line as a separate chunk
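As a rough illustration of header-based splitting, here is a regex-based sketch that treats each selector as a plain tag name (h1, h2, ...) and tags every run of text with the most recent headers in scope. It is not Mastra's implementation: real HTML needs a proper parser, CSS selectors beyond bare tag names are not handled, and splitHtmlByHeaders, stripTags and HtmlChunk are hypothetical names. The headers array is assumed to run from the outermost to the innermost level.

```typescript
type HeaderPair = [selector: string, metadataKey: string];

interface HtmlChunk {
  text: string;
  metadata: Record<string, string>;
}

// Remove tags and collapse whitespace (a crude stand-in for text extraction with a DOM parser).
function stripTags(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// Split flat HTML on the given header tags; `headers` runs from outermost to innermost level.
function splitHtmlByHeaders(html: string, headers: HeaderPair[]): HtmlChunk[] {
  const levels = headers.map(([tag]) => tag.toLowerCase());
  const keys = headers.map(([, key]) => key);
  const pattern = new RegExp(`<(${levels.join("|")})\\b[^>]*>([\\s\\S]*?)</\\1>`, "gi");
  const active: Record<string, string> = {};
  const chunks: HtmlChunk[] = [];
  let cursor = 0;

  // Emit the text between two headers, tagged with the headers currently in scope.
  const flush = (fragment: string) => {
    const text = stripTags(fragment);
    if (text) chunks.push({ text, metadata: { ...active } });
  };

  for (const match of html.matchAll(pattern)) {
    const start = match.index ?? 0;
    flush(html.slice(cursor, start));
    const depth = levels.indexOf(match[1].toLowerCase());
    // A header closes any open header at the same or a deeper level.
    for (const key of keys.slice(depth)) delete active[key];
    active[keys[depth]] = stripTags(match[2]);
    cursor = start + match[0].length;
  }
  flush(html.slice(cursor));
  return chunks;
}
```

With headers [["h1", "Header 1"], ["h2", "Header 2"]], the paragraph under an h2 carries both keys, while a later h1 clears the stale "Header 2" value.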

Markdown

headers:

Array<[string, string]>
Array of [header level, metadata key] pairs

stripHeaders?:

boolean
Whether to remove headers from the output

returnEachLine?:

boolean
Whether to return each line as a separate chunk
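A minimal sketch of header-based Markdown splitting follows. It assumes each header entry pairs a prefix such as "#" or "##" with a metadata key, and that a longer prefix means a deeper level; splitMarkdownByHeaders and MdChunk are hypothetical names, and the sketch ignores edge cases such as "#" lines inside fenced code blocks. Its stripHeaders flag mirrors the documented option: when false, the header line stays in the chunk body.

```typescript
type MdHeader = [prefix: string, metadataKey: string];

interface MdChunk {
  text: string;
  metadata: Record<string, string>;
}

function splitMarkdownByHeaders(md: string, headers: MdHeader[], stripHeaders = true): MdChunk[] {
  const chunks: MdChunk[] = [];
  const active: Record<string, string> = {};
  let buffer: string[] = [];

  // Emit the lines collected since the last header, tagged with the headers in scope.
  const flush = () => {
    const text = buffer.join("\n").trim();
    if (text) chunks.push({ text, metadata: { ...active } });
    buffer = [];
  };

  for (const line of md.split("\n")) {
    // "## x" does not start with "# ", so prefixes cannot be confused with each other.
    const hit = headers.find(([prefix]) => line.startsWith(prefix + " "));
    if (!hit) {
      buffer.push(line);
      continue;
    }
    flush();
    const [prefix, key] = hit;
    // A header closes any header at the same or a deeper level (equal or longer prefix).
    for (const [p, k] of headers) if (p.length >= prefix.length) delete active[k];
    active[key] = line.slice(prefix.length).trim();
    if (!stripHeaders) buffer.push(line); // keep the header line in the chunk body
  }
  flush();
  return chunks;
}
```

Splitting "# Guide\nIntro text\n## Install\nRun npm i" with [["#", "title"], ["##", "section"]] yields one chunk tagged { title: "Guide" } and one tagged { title: "Guide", section: "Install" }.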

Token

encodingName?:

string
Name of the token encoding to use

modelName?:

string
Name of the model for tokenization
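With the token strategy, size and overlap are counted in tokens rather than characters. The sketch below shows that sliding-window idea with a pluggable encoder. The Encoder interface, makeWordEncoder and splitByTokens are hypothetical, and the toy encoder (one token per whitespace-separated word) only stands in for a real BPE tokenizer, for example one of OpenAI's tiktoken encodings such as cl100k_base, which is presumably what encodingName and modelName select.

```typescript
interface Encoder {
  encode(text: string): number[];
  decode(tokens: number[]): string;
}

// Toy encoder: one token per whitespace-separated word. Decoding rejoins words with
// single spaces, so original whitespace is not preserved (unlike a real tokenizer).
function makeWordEncoder(): Encoder {
  const vocab = new Map<string, number>();
  const words: string[] = [];
  return {
    encode(text) {
      return text.split(/\s+/).filter(Boolean).map((w) => {
        if (!vocab.has(w)) {
          vocab.set(w, words.length);
          words.push(w);
        }
        return vocab.get(w)!;
      });
    },
    decode(tokens) {
      return tokens.map((t) => words[t]).join(" ");
    },
  };
}

// Slide a window of `size` tokens over the text, advancing by (size - overlap) each step.
function splitByTokens(text: string, enc: Encoder, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const ids = enc.encode(text);
  const chunks: string[] = [];
  for (let start = 0; start < ids.length; start += size - overlap) {
    chunks.push(enc.decode(ids.slice(start, start + size)));
    if (start + size >= ids.length) break; // the last window reached the end
  }
  return chunks;
}
```

For example, splitByTokens("a b c d e f g", makeWordEncoder(), 3, 1) returns ["a b c", "c d e", "e f g"]: each window shares one token with the previous one.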

JSON

maxSize:

number
Maximum size of each chunk

minSize?:

number
Minimum size of each chunk

ensureAscii?:

boolean
Whether to escape non-ASCII characters when serializing the JSON

convertLists?:

boolean
Whether to convert JSON arrays into objects keyed by index before splitting
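The JSON options are easiest to understand as a recursive packing procedure. The sketch below is modeled on LangChain's RecursiveJsonSplitter, which exposes analogous max_chunk_size, min_chunk_size and convert_lists parameters; whether Mastra's JSON strategy behaves identically is an assumption, and all function names here are hypothetical. Size is measured as the length of the JSON.stringify output. An entry that fits in the current chunk is added whole; otherwise the splitter descends into it, starting a new chunk only once the current one has reached minSize.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

const isObject = (v: Json | undefined): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Serialized length, the size unit used by this sketch.
const sizeOf = (v: Json): number => JSON.stringify(v).length;

// convertLists-style preprocessing (assumed semantics): arrays become objects keyed "0", "1", ...
function listsToObjects(value: Json): Json {
  if (Array.isArray(value)) {
    const out: JsonObject = {};
    value.forEach((item, i) => {
      out[String(i)] = listsToObjects(item);
    });
    return out;
  }
  if (isObject(value)) {
    const out: JsonObject = {};
    for (const [k, v] of Object.entries(value)) out[k] = listsToObjects(v);
    return out;
  }
  return value;
}

// Write `value` at a nested path, creating intermediate objects as needed.
function setPath(target: JsonObject, path: string[], value: Json): void {
  let node = target;
  for (const key of path.slice(0, -1)) {
    if (!isObject(node[key])) node[key] = {};
    node = node[key] as JsonObject;
  }
  node[path[path.length - 1]] = value;
}

// Recursive packing step; a single leaf larger than maxSize still lands in one oversized chunk.
function pack(data: JsonObject, maxSize: number, minSize: number, path: string[], chunks: JsonObject[]): JsonObject[] {
  for (const [key, value] of Object.entries(data)) {
    const newPath = [...path, key];
    const current = chunks[chunks.length - 1];
    const used = sizeOf(current);
    if (sizeOf({ [key]: value }) < maxSize - used) {
      setPath(current, newPath, value); // the whole entry fits
    } else {
      if (used >= minSize) chunks.push({}); // current chunk is big enough to close
      if (isObject(value)) pack(value, maxSize, minSize, newPath, chunks);
      else setPath(chunks[chunks.length - 1], newPath, value);
    }
  }
  return chunks;
}

function splitJson(data: JsonObject, maxSize: number, minSize = 0, convertLists = false): JsonObject[] {
  const input = convertLists ? (listsToObjects(data) as JsonObject) : data;
  return pack(input, maxSize, minSize, [], [{}]).filter((c) => Object.keys(c).length > 0);
}
```

For example, splitting { a: "x".repeat(10), b: { c: "y".repeat(10), d: "z".repeat(10) } } with maxSize 30 produces three chunks, and the nested path b is repeated in the last two so each chunk remains valid, self-describing JSON.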

Return Value

Returns an MDocument instance containing the chunked documents. Each chunk includes:

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}
```
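Because embedding is optional on DocumentNode, a common next step is to fill it in for each chunk before storing the chunks in a vector store. The sketch below does that with toyEmbed, a hypothetical stand-in for a real embedding model; withEmbeddings is likewise hypothetical and leaves the input objects unmodified.

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

// Hypothetical stand-in for an embedding model: [characters, vowels, words].
function toyEmbed(text: string): number[] {
  const vowels = (text.match(/[aeiou]/gi) ?? []).length;
  const words = text.split(/\s+/).filter(Boolean).length;
  return [text.length, vowels, words];
}

// Return a new array in which chunks lacking an embedding are replaced by copies
// that have one; chunks that already have an embedding are passed through as-is.
function withEmbeddings(chunks: DocumentNode[], embed: (text: string) => number[]): DocumentNode[] {
  return chunks.map((chunk) => (chunk.embedding ? chunk : { ...chunk, embedding: embed(chunk.text) }));
}

const chunks: DocumentNode[] = [
  { text: "hello world", metadata: { section: "intro" } },
  { text: "already done", metadata: {}, embedding: [0, 0, 0] },
];
const embedded = withEmbeddings(chunks, toyEmbed);
console.log(JSON.stringify(embedded.map((c) => c.embedding))); // prints [[11,3,2],[0,0,0]]
```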
