DocsReferenceRAGDocument Chunker

createDocumentChunker()

The createDocumentChunker() function creates a tool for splitting documents into smaller chunks for efficient processing and retrieval. It supports different chunking strategies and configurable parameters.

Basic Usage

import { createDocumentChunker, MDocument } from "@mastra/rag";
 
const document = new MDocument({
  text: "Your document content here...",
  metadata: { source: "user-manual" }
});
 
const chunker = createDocumentChunker({
  doc: document,
  params: {
    strategy: "recursive",
    size: 512,
    overlap: 50,
    separator: "\n"
  }
});
 
const { chunks } = await chunker.execute();

Parameters

doc:

MDocument
The document to be chunked

params?:

ChunkParams
Configuration parameters for chunking

ChunkParams

strategy?:

'recursive'
The chunking strategy to use

size?:

number
Target size of each chunk in tokens/characters

overlap?:

number
Number of overlapping tokens/characters between chunks

separator?:

string
Character(s) to use as chunk separator

Returns

chunks:

DocumentChunk[]
Array of document chunks with their content and metadata

Example with Custom Parameters

const technicalDoc = new MDocument({
  text: longDocumentContent,
  metadata: {
    type: "technical",
    version: "1.0"
  }
});
 
const chunker = createDocumentChunker({
  doc: technicalDoc,
  params: {
    strategy: "recursive",
    size: 1024,      // Larger chunks
    overlap: 100,    // More overlap
    separator: "\n\n" // Split on double newlines
  }
});
 
const { chunks } = await chunker.execute();
 
// Process the chunks
chunks.forEach((chunk, index) => {
  console.log(`Chunk ${index + 1} length: ${chunk.content.length}`);
});

Tool Details

The chunker is created as a Mastra tool with the following properties:

  • Tool ID: Document Chunker {strategy} {size}
  • Description: Chunks document using {strategy} strategy with size {size} and {overlap} overlap
  • Input Schema: Empty object (no additional inputs required)
  • Output Schema: Object containing the chunks array

MIT 2025 © Nextra.