createDocumentChunker()
The createDocumentChunker() function creates a tool that splits documents into smaller chunks for efficient processing and retrieval. It supports different chunking strategies and configurable parameters.
Basic Usage
import { createDocumentChunker, MDocument } from "@mastra/rag";

const document = new MDocument({
  text: "Your document content here...",
  metadata: { source: "user-manual" },
});

const chunker = createDocumentChunker({
  doc: document,
  params: {
    strategy: "recursive",
    size: 512,
    overlap: 50,
    separator: "\n",
  },
});

const { chunks } = await chunker.execute();
Parameters
- doc: MDocument. The document to be chunked.
- params?: ChunkParams. Configuration parameters for chunking.
ChunkParams
- strategy?: 'recursive'. The chunking strategy to use.
- size?: number. Target size of each chunk in tokens/characters.
- overlap?: number. Number of overlapping tokens/characters between chunks.
- separator?: string. Character(s) to use as chunk separator.
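To make the relationship between size and overlap concrete, the sketch below implements a plain sliding-window splitter over characters. It is an illustration only, not the @mastra/rag implementation: the function name chunkWithOverlap is hypothetical, it ignores the separator parameter, and it assumes sizes are counted in characters.

```typescript
// Illustrative sliding-window chunker. NOT the @mastra/rag algorithm;
// chunkWithOverlap is a hypothetical helper that counts characters only.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (size <= 0) {
    throw new Error("size must be positive");
  }
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be at least 0 and smaller than size");
  }

  // Each new chunk starts (size - overlap) characters after the previous one,
  // so consecutive chunks share `overlap` characters.
  const step = size - overlap;
  const chunks: string[] = [];

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window has reached the end of the text.
    if (start + size >= text.length) {
      break;
    }
  }
  return chunks;
}

const sample = "abcdefghij";
const pieces = chunkWithOverlap(sample, 4, 1);
console.log(pieces);
```

With a size of 4 and an overlap of 1, each chunk repeats the last character of the previous chunk. A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks.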
Returns
- chunks: DocumentChunk[]. Array of document chunks with their content and metadata.
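As a rough sketch of working with the returned array, the helper below summarizes a list of chunks. The DocumentChunkLike interface is an assumption made for illustration: only the content field appears in the examples on this page, and the real DocumentChunk type may carry different or additional fields. The summarizeChunks name is likewise hypothetical.

```typescript
// Assumed shape for illustration; the actual DocumentChunk type may differ.
interface DocumentChunkLike {
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: reports how many chunks there are and how long they run.
function summarizeChunks(chunks: DocumentChunkLike[]): {
  count: number;
  totalLength: number;
  longest: number;
} {
  let totalLength = 0;
  let longest = 0;
  for (const chunk of chunks) {
    totalLength += chunk.content.length;
    longest = Math.max(longest, chunk.content.length);
  }
  return { count: chunks.length, totalLength, longest };
}

// Hand-written stand-ins for the chunks a chunker run might return.
const exampleChunks: DocumentChunkLike[] = [
  { content: "First chunk.", metadata: { source: "user-manual" } },
  { content: "Second, longer chunk.", metadata: { source: "user-manual" } },
];

const summary = summarizeChunks(exampleChunks);
console.log(summary);
```

A summary like this can help check that the chosen size and overlap values produce chunks in the expected range before passing them on to embedding or retrieval.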
Example with Custom Parameters
const technicalDoc = new MDocument({
  text: longDocumentContent,
  metadata: {
    type: "technical",
    version: "1.0",
  },
});

const chunker = createDocumentChunker({
  doc: technicalDoc,
  params: {
    strategy: "recursive",
    size: 1024, // Larger chunks
    overlap: 100, // More overlap
    separator: "\n\n", // Split on double newlines
  },
});

const { chunks } = await chunker.execute();

// Process the chunks
chunks.forEach((chunk, index) => {
  console.log(`Chunk ${index + 1} length: ${chunk.content.length}`);
});
Tool Details
The chunker is created as a Mastra tool with the following properties:
- Tool ID: Document Chunker {strategy} {size}
- Description: Chunks document using {strategy} strategy with size {size} and {overlap} overlap
- Input Schema: Empty object (no additional inputs required)
- Output Schema: Object containing the chunks array
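The ID and description templates listed above can be sketched as simple string interpolation. This is an illustration of how the placeholders are filled in, not the library's source: describeChunkerTool and ChunkerLabelParams are hypothetical names, and the sketch assumes all three values are already resolved.

```typescript
// Hypothetical types and helper, written only to illustrate the templates.
interface ChunkerLabelParams {
  strategy: string;
  size: number;
  overlap: number;
}

function describeChunkerTool(params: ChunkerLabelParams): {
  id: string;
  description: string;
} {
  const { strategy, size, overlap } = params;
  return {
    // Mirrors "Document Chunker {strategy} {size}"
    id: `Document Chunker ${strategy} ${size}`,
    // Mirrors "Chunks document using {strategy} strategy with size {size} and {overlap} overlap"
    description: `Chunks document using ${strategy} strategy with size ${size} and ${overlap} overlap`,
  };
}

const label = describeChunkerTool({ strategy: "recursive", size: 512, overlap: 50 });
console.log(label.id);
console.log(label.description);
```

Because the ID includes the strategy and size, two chunkers created with different settings get distinct IDs, which can make them easier to tell apart when several are registered together.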