Reference: .chunk()

The .chunk() function splits documents into smaller segments using various strategies and options.

Example

```typescript
import { MDocument } from '@mastra/rag';

const doc = MDocument.fromMarkdown(`
# Introduction
This is a sample document that we want to split into chunks.

## Section 1
Here is the first section with some content.

## Section 2
Here is another section with different content.
`);

// Basic chunking with defaults
const chunks = await doc.chunk();

// Markdown-specific chunking with header extraction
const chunksWithMetadata = await doc.chunk({
  strategy: 'markdown',
  headers: [['#', 'title'], ['##', 'section']],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
});
```

Parameters

strategy?:

'recursive' | 'character' | 'token' | 'markdown' | 'html' | 'json' | 'latex'
The chunking strategy to use. If not specified, the default is chosen from the document type: .md files → 'markdown', .html/.htm → 'html', .json → 'json', .tex → 'latex', all others → 'recursive'. Some strategies accept additional options, described under Strategy-Specific Options.

size?:

number
= 512
Maximum size of each chunk, in characters or tokens depending on the strategy.

overlap?:

number
= 50
Number of characters/tokens that overlap between chunks.

separator?:

string
= \n\n
Character(s) to split on. Defaults to double newline for text content.

isSeparatorRegex?:

boolean
= false
Whether the separator is a regex pattern

keepSeparator?:

'start' | 'end'
Whether to keep the separator at the start or end of chunks

extract?:

ExtractParams
Metadata extraction configuration. See [ExtractParams reference](./extract-params) for details.
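
To make the general parameters above more concrete, here is a minimal sketch of two ideas: how a default strategy could be picked from a file extension, and how size and overlap interact in a plain sliding window. The helpers `defaultStrategyFor` and `slidingWindowChunks` are hypothetical names invented for this illustration and are not part of @mastra/rag. The real strategies also split on separators or tokens, so treat this as a simplified model rather than the library's implementation.

```typescript
// Illustrative only: these helpers are NOT part of @mastra/rag.
type Strategy =
  | 'recursive' | 'character' | 'token' | 'markdown' | 'html' | 'json' | 'latex';

// Mirrors the documented defaults: .md → markdown, .html/.htm → html,
// .json → json, .tex → latex, anything else → recursive.
function defaultStrategyFor(filename: string): Strategy {
  const dot = filename.lastIndexOf('.');
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : '';
  switch (ext) {
    case 'md': return 'markdown';
    case 'html':
    case 'htm': return 'html';
    case 'json': return 'json';
    case 'tex': return 'latex';
    default: return 'recursive';
  }
}

// Character-based sliding window: each chunk holds at most `size` characters
// and repeats the last `overlap` characters of the previous chunk.
function slidingWindowChunks(text: string, size = 512, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('expected size > 0 and 0 <= overlap < size');
  }
  const step = size - overlap;
  const result: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    result.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return result;
}

const windows = slidingWindowChunks('abcdefghijklmnopqrstuvwxyz', 10, 3);
// windows: ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window starts `size - overlap` characters after the previous one, which is why a larger overlap produces more chunks for the same input.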

Strategy-Specific Options

Strategy-specific options are passed as top-level parameters alongside the strategy parameter. For example:

```typescript
// HTML strategy example
const htmlChunks = await doc.chunk({
  strategy: 'html',
  headers: [['h1', 'title'], ['h2', 'subtitle']], // HTML-specific option
  sections: [['div.content', 'main']], // HTML-specific option
  size: 500, // general option
});

// Markdown strategy example
const markdownChunks = await doc.chunk({
  strategy: 'markdown',
  headers: [['#', 'title'], ['##', 'section']], // Markdown-specific option
  stripHeaders: true, // Markdown-specific option
  overlap: 50, // general option
});

// Token strategy example
const tokenChunks = await doc.chunk({
  strategy: 'token',
  encodingName: 'gpt2', // Token-specific option
  modelName: 'gpt-3.5-turbo', // Token-specific option
  size: 1000, // general option
});
```

The options documented below are passed directly at the top level of the configuration object, not nested within a separate options object.

HTML

headers:

Array<[string, string]>
Array of [selector, metadata key] pairs for header-based splitting

sections:

Array<[string, string]>
Array of [selector, metadata key] pairs for section-based splitting

returnEachLine?:

boolean
Whether to return each line as a separate chunk

Markdown

headers:

Array<[string, string]>
Array of [header level, metadata key] pairs

stripHeaders?:

boolean
Whether to remove headers from the output

returnEachLine?:

boolean
Whether to return each line as a separate chunk
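
As a rough illustration of the Markdown options above, the sketch below splits text at the configured header markers and records each header's text under its metadata key. `splitByHeaders` and `SectionChunk` are hypothetical names made up for this example. It assumes the header text is what ends up under the metadata key, and it ignores details such as fenced code blocks and size limits, so the library's actual behavior may differ.

```typescript
// Illustrative only: NOT the @mastra/rag implementation.
interface SectionChunk {
  text: string;
  metadata: Record<string, string>;
}

function splitByHeaders(
  markdown: string,
  headers: Array<[string, string]>,
  stripHeaders = false,
): SectionChunk[] {
  const result: SectionChunk[] = [];
  const metadata: Record<string, string> = {};
  let lines: string[] = [];

  // Emit the lines collected so far as one chunk, with a snapshot of the metadata.
  const flush = () => {
    const text = lines.join('\n').trim();
    if (text) result.push({ text, metadata: { ...metadata } });
    lines = [];
  };

  for (const line of markdown.split('\n')) {
    // '# ' does not match '## Title', so each marker matches only its own level.
    const match = headers.find(([marker]) => line.startsWith(marker + ' '));
    if (!match) {
      lines.push(line);
      continue;
    }
    flush();
    const [marker, key] = match;
    // A new header resets the keys of headers at the same or a deeper level.
    for (const [otherMarker, otherKey] of headers) {
      if (otherMarker.length >= marker.length) delete metadata[otherKey];
    }
    metadata[key] = line.slice(marker.length).trim();
    if (!stripHeaders) lines.push(line);
  }
  flush();
  return result;
}

const sampleMarkdown = [
  '# Introduction',
  'This is a sample document that we want to split into chunks.',
  '',
  '## Section 1',
  'Here is the first section with some content.',
  '',
  '## Section 2',
  'Here is another section with different content.',
].join('\n');

const sections = splitByHeaders(
  sampleMarkdown,
  [['#', 'title'], ['##', 'section']],
  true, // behaves like stripHeaders: header lines are left out of the chunk text
);
```

With the sample input this yields three chunks. The second and third carry both a `title` and a `section` entry in their metadata, while the first carries only `title`.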

Token

encodingName?:

string
Name of the token encoding to use

modelName?:

string
Name of the model for tokenization

JSON

maxSize:

number
Maximum size of each chunk

minSize?:

number
Minimum size of each chunk

ensureAscii?:

boolean
Whether to ensure ASCII encoding

convertLists?:

boolean
Whether to convert lists in the JSON
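
One way to picture maxSize is greedy packing: keep adding top-level entries to a chunk until its serialized form would exceed the limit. The sketch below does only that. `packTopLevel` is a hypothetical helper, size is assumed to be measured in characters of the `JSON.stringify` output, and nested splitting, minSize, ensureAscii, and convertLists are deliberately left out, so this is not how the library itself works.

```typescript
// Illustrative only: NOT the @mastra/rag implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function packTopLevel(data: JsonObject, maxSize: number): JsonObject[] {
  const result: JsonObject[] = [];
  let current: JsonObject = {};

  for (const [key, value] of Object.entries(data)) {
    const candidate: JsonObject = { ...current, [key]: value };
    const tooBig = JSON.stringify(candidate).length > maxSize;
    if (tooBig && Object.keys(current).length > 0) {
      // Close the current chunk and start a new one with this entry.
      result.push(current);
      current = { [key]: value };
    } else {
      // Note: a single entry larger than maxSize still becomes its own chunk.
      current = candidate;
    }
  }
  if (Object.keys(current).length > 0) result.push(current);
  return result;
}

const jsonChunks = packTopLevel({ a: 'xxxx', b: 'yyyy', c: 'zzzz' }, 25);
// jsonChunks: [{ a: 'xxxx', b: 'yyyy' }, { c: 'zzzz' }]
```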

Return Value

Returns an MDocument instance containing the chunked documents. Each chunk has the following shape:

```typescript
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}
```
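
Once chunks are available in the DocumentNode shape, they can be processed like any plain array of objects. The sketch below groups chunk texts by a metadata key. `groupByMetadata` is a hypothetical helper, and the sample nodes are hand-written stand-ins rather than real output from .chunk().

```typescript
// Same shape as documented above, repeated so the example is self-contained.
interface DocumentNode {
  text: string;
  metadata: Record<string, any>;
  embedding?: number[];
}

// Group chunk texts by the value stored under one metadata key.
function groupByMetadata(nodes: DocumentNode[], key: string): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const node of nodes) {
    const value = String(node.metadata[key] ?? '(none)');
    const bucket = groups.get(value) ?? [];
    bucket.push(node.text);
    groups.set(value, bucket);
  }
  return groups;
}

// Hand-written sample data for illustration only.
const sampleNodes: DocumentNode[] = [
  { text: 'Intro text.', metadata: { title: 'Introduction' } },
  { text: 'First part.', metadata: { title: 'Introduction', section: 'Section 1' } },
  { text: 'Second part.', metadata: { title: 'Introduction', section: 'Section 1' } },
];

const bySection = groupByMetadata(sampleNodes, 'section');
// bySection has two keys: '(none)' with one text and 'Section 1' with two texts.
```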