ExtractParams
ExtractParams configures metadata extraction from document chunks using LLM analysis.
Example
import { MDocument } from "@mastra/rag";

// `text` is the raw document string to process, defined elsewhere.
const doc = MDocument.fromText(text);

const chunks = await doc.chunk({
  extract: {
    title: true, // Extract titles using default settings
    summary: true, // Generate summaries using default settings
    keywords: true, // Extract keywords using default settings
  },
});

// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI Systems Overview",
//   sectionSummary: "Overview of artificial intelligence concepts and applications",
//   excerptKeywords: "KEYWORDS: AI, machine learning, algorithms"
// }
Parameters
The extract parameter accepts the following fields:
title?:
boolean | TitleExtractorsArgs
Enable title extraction. Set to true for default settings, or provide custom configuration.
summary?:
boolean | SummaryExtractArgs
Enable summary extraction. Set to true for default settings, or provide custom configuration.
questions?:
boolean | QuestionAnswerExtractArgs
Enable question generation. Set to true for default settings, or provide custom configuration.
keywords?:
boolean | KeywordExtractArgs
Enable keyword extraction. Set to true for default settings, or provide custom configuration.
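The fields above, together with the argument types listed under Extractor Arguments, can be read as a single TypeScript shape. The following is a minimal sketch transcribed from this page, not the library's own declarations: the `LLM` alias is a stand-in for the library's real type, and `enabledExtractors` is a hypothetical helper added only to show how the boolean and object forms can be mixed.

```typescript
// Stand-in for the library's LLM type (assumption; the real type is provided by the library).
type LLM = unknown;

interface TitleExtractorsArgs {
  llm?: LLM;
  nodes?: number;
  nodeTemplate?: string;
  combineTemplate?: string;
}

interface SummaryExtractArgs {
  llm?: LLM;
  summaries?: ("self" | "prev" | "next")[];
  promptTemplate?: string;
}

interface QuestionAnswerExtractArgs {
  llm?: LLM;
  questions?: number;
  promptTemplate?: string;
  embeddingOnly?: boolean;
}

interface KeywordExtractArgs {
  llm?: LLM;
  keywords?: number;
  promptTemplate?: string;
}

interface ExtractParams {
  title?: boolean | TitleExtractorsArgs;
  summary?: boolean | SummaryExtractArgs;
  questions?: boolean | QuestionAnswerExtractArgs;
  keywords?: boolean | KeywordExtractArgs;
}

// Each field independently takes `true` (defaults) or a configuration object.
const extract: ExtractParams = {
  title: true,
  keywords: { keywords: 5 },
};

// Hypothetical helper: names of the extractors a configuration switches on.
function enabledExtractors(params: ExtractParams): string[] {
  return (Object.keys(params) as (keyof ExtractParams)[]).filter(
    (key) => params[key] !== undefined && params[key] !== false,
  );
}
```

Here `enabledExtractors(extract)` returns `["title", "keywords"]`; a field that is omitted or set to `false` is treated as disabled in this sketch.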
Extractor Arguments
TitleExtractorsArgs
llm?:
LLM
Custom LLM instance to use for title extraction.
nodes?:
number
Number of title nodes to extract.
nodeTemplate?:
string
Custom prompt template for title node extraction. Must include the {context} placeholder.
combineTemplate?:
string
Custom prompt template for combining titles. Must include the {context} placeholder.
SummaryExtractArgs
llm?:
LLM
Custom LLM instance to use for summary extraction.
summaries?:
('self' | 'prev' | 'next')[]
Summary types to generate. Each entry must be 'self' (current chunk), 'prev' (previous chunk), or 'next' (next chunk).
promptTemplate?:
string
Custom prompt template for summary generation. Must include the {context} placeholder.
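To make the three summary types concrete, the sketch below maps each type to the chunk it refers to, following the definitions given for `summaries`. It is an illustration only and does not reflect the library's internals; `summaryTargetIndex` is a hypothetical name, and returning `undefined` at the document edges is an assumption made for this example.

```typescript
type SummaryType = "self" | "prev" | "next";

// Hypothetical helper: index of the chunk a summary type refers to,
// or undefined when that neighbor does not exist (first or last chunk).
function summaryTargetIndex(
  type: SummaryType,
  index: number,
  totalChunks: number,
): number | undefined {
  const offset = type === "prev" ? -1 : type === "next" ? 1 : 0;
  const target = index + offset;
  return target >= 0 && target < totalChunks ? target : undefined;
}

// For the first of three chunks: "self" is chunk 0, "next" is chunk 1, "prev" has no target.
const targets = (["self", "prev", "next"] as SummaryType[]).map((type) =>
  summaryTargetIndex(type, 0, 3),
);
```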
QuestionAnswerExtractArgs
llm?:
LLM
Custom LLM instance to use for question generation.
questions?:
number
Number of questions to generate.
promptTemplate?:
string
Custom prompt template for question generation. Must include both the {context} and {numQuestions} placeholders.
embeddingOnly?:
boolean
If true, only generate embeddings without actual questions.
KeywordExtractArgs
llm?:
LLM
Custom LLM instance to use for keyword extraction.
keywords?:
number
Number of keywords to extract.
promptTemplate?:
string
Custom prompt template for keyword extraction. Must include both the {context} and {maxKeywords} placeholders.
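Because each custom template must contain specific placeholders, it can be useful to check templates before passing them to `extract`. The sketch below is one way to do that under the placeholder requirements listed in this section. `REQUIRED_PLACEHOLDERS`, `TemplateKind`, and `missingPlaceholders` are hypothetical names introduced for illustration and are not part of the library.

```typescript
type TemplateKind =
  | "titleNode"
  | "titleCombine"
  | "summary"
  | "questions"
  | "keywords";

// Placeholders each template must contain, as documented in this section.
const REQUIRED_PLACEHOLDERS: Record<TemplateKind, readonly string[]> = {
  titleNode: ["{context}"],
  titleCombine: ["{context}"],
  summary: ["{context}"],
  questions: ["{context}", "{numQuestions}"],
  keywords: ["{context}", "{maxKeywords}"],
};

// Hypothetical helper: returns the required placeholders a template lacks.
function missingPlaceholders(kind: TemplateKind, template: string): string[] {
  return REQUIRED_PLACEHOLDERS[kind].filter(
    (placeholder) => !template.includes(placeholder),
  );
}

// A question template that forgot {numQuestions}.
const missing = missingPlaceholders(
  "questions",
  "Generate questions about: {context}",
);
```

In this example `missing` is `["{numQuestions}"]`, while a complete template such as `"Extract {maxKeywords} key terms from: {context}"` checked as `"keywords"` yields an empty array.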
Advanced Example
import { MDocument } from "@mastra/rag";

// `text` is the raw document string to process, defined elsewhere.
const doc = MDocument.fromText(text);

const chunks = await doc.chunk({
  extract: {
    // Title extraction with custom settings
    title: {
      nodes: 2, // Extract 2 title nodes
      nodeTemplate: "Generate a title for this: {context}",
      combineTemplate: "Combine these titles: {context}",
    },

    // Summary extraction with custom settings
    summary: {
      summaries: ["self"], // Summarize only the current chunk
      promptTemplate: "Summarize this: {context}",
    },

    // Question generation with custom settings
    questions: {
      questions: 3, // Generate 3 questions
      promptTemplate: "Generate {numQuestions} questions about: {context}",
      embeddingOnly: false,
    },

    // Keyword extraction with custom settings
    keywords: {
      keywords: 5, // Extract 5 keywords
      promptTemplate: "Extract {maxKeywords} key terms from: {context}",
    },
  },
});

// Example output (illustrative and abbreviated; actual values depend on the LLM):
// chunks[0].metadata = {
//   documentTitle: "AI in Modern Computing",
//   sectionSummary: "Overview of AI concepts and their applications in computing",
//   questionsThisExcerptCanAnswer: "1. What is machine learning?\n2. How do neural networks work?",
//   excerptKeywords: "1. Machine learning\n2. Neural networks\n3. Training data"
// }
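The example outputs on this page show `excerptKeywords` as a single string, once as a comma-separated list with a `KEYWORDS:` prefix and once as a numbered list. If an array is more convenient downstream, a small normalizer along these lines may help. This is a sketch that assumes only those two formats; `parseKeywords` is a hypothetical helper, not a library function, and real LLM output may need additional handling.

```typescript
// Hypothetical helper: turns a keyword metadata string into an array of keywords.
function parseKeywords(raw: string): string[] {
  return raw
    .replace(/^\s*KEYWORDS:\s*/i, "") // drop an optional "KEYWORDS:" prefix
    .split(/[\n,]/) // split on newlines or commas
    .map((item) => item.replace(/^\s*\d+\.\s*/, "").trim()) // drop "1. " style numbering
    .filter((item) => item.length > 0); // discard empty entries
}

const fromPrefixed = parseKeywords("KEYWORDS: AI, machine learning, algorithms");
// → ["AI", "machine learning", "algorithms"]

const fromNumbered = parseKeywords(
  "1. Machine learning\n2. Neural networks\n3. Training data",
);
// → ["Machine learning", "Neural networks", "Training data"]
```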