
ExtractParams

ExtractParams configures metadata extraction from document chunks using LLM analysis.

Example

```typescript
import { MDocument } from "@mastra/rag";

const doc = MDocument.fromText(text);
const chunks = await doc.chunk({
  extract: {
    title: true,    // Extract titles using default settings
    summary: true,  // Generate summaries using default settings
    keywords: true  // Extract keywords using default settings
  }
});

// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI Systems Overview",
//   sectionSummary: "Overview of artificial intelligence concepts and applications",
//   excerptKeywords: "KEYWORDS: AI, machine learning, algorithms"
// }
```

Parameters

The extract parameter accepts the following fields:

title?:

boolean | TitleExtractorsArgs
Enable title extraction. Set to true for default settings, or provide custom configuration.

summary?:

boolean | SummaryExtractArgs
Enable summary extraction. Set to true for default settings, or provide custom configuration.

questions?:

boolean | QuestionAnswerExtractArgs
Enable question generation. Set to true for default settings, or provide custom configuration.

keywords?:

boolean | KeywordExtractArgs
Enable keyword extraction. Set to true for default settings, or provide custom configuration.
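The boolean and object forms can be mixed freely within one `extract` configuration. A minimal sketch (assuming `text` is defined as in the examples on this page; the field values shown are illustrative):

```typescript
import { MDocument } from "@mastra/rag";

// Mix defaults and custom settings: `true` uses the built-in
// prompts, while an object overrides individual fields.
const doc = MDocument.fromText(text);
const chunks = await doc.chunk({
  extract: {
    title: true,                      // default title extraction
    summary: { summaries: ["self"] }, // customize only the summary types
    keywords: true                    // default keyword extraction
  }
});
```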

Extractor Arguments

TitleExtractorsArgs

llm?:

LLM
Custom LLM instance to use for title extraction

nodes?:

number
Number of title nodes to extract

nodeTemplate?:

string
Custom prompt template for title node extraction. Must include the {context} placeholder

combineTemplate?:

string
Custom prompt template for combining titles. Must include the {context} placeholder
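The {context} placeholder marks where the chunk text (or, for combineTemplate, the candidate titles) is substituted into the prompt before it is sent to the LLM. A sketch of that substitution (the `fillTemplate` helper is hypothetical, not part of @mastra/rag):

```typescript
// Hypothetical helper illustrating how the {context} placeholder
// in a template is replaced with the chunk text.
function fillTemplate(template: string, context: string): string {
  return template.replace("{context}", context);
}

const nodeTemplate = "Generate a title for this: {context}";
console.log(fillTemplate(nodeTemplate, "Neural networks learn from data."));
// Generate a title for this: Neural networks learn from data.
```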

SummaryExtractArgs

llm?:

LLM
Custom LLM instance to use for summary extraction

summaries?:

('self' | 'prev' | 'next')[]
List of summary types to generate. Can only include 'self' (current chunk), 'prev' (previous chunk), or 'next' (next chunk)

promptTemplate?:

string
Custom prompt template for summary generation. Must include the {context} placeholder

QuestionAnswerExtractArgs

llm?:

LLM
Custom LLM instance to use for question generation

questions?:

number
Number of questions to generate

promptTemplate?:

string
Custom prompt template for question generation. Must include both the {context} and {numQuestions} placeholders

embeddingOnly?:

boolean
If true, only generate embeddings, without the actual question text

KeywordExtractArgs

llm?:

LLM
Custom LLM instance to use for keyword extraction

keywords?:

number
Number of keywords to extract

promptTemplate?:

string
Custom prompt template for keyword extraction. Must include both the {context} and {maxKeywords} placeholders
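With default settings, the extracted keywords arrive as a single metadata string (e.g. `"KEYWORDS: AI, machine learning, algorithms"` in the example output above). If you need them as an array downstream, a small hypothetical helper (not part of @mastra/rag) can split that string:

```typescript
// Hypothetical helper: split an excerptKeywords metadata string
// such as "KEYWORDS: AI, machine learning, algorithms" into an array.
function parseKeywords(excerptKeywords: string): string[] {
  return excerptKeywords
    .replace(/^KEYWORDS:\s*/, "") // drop the leading label, if present
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

console.log(parseKeywords("KEYWORDS: AI, machine learning, algorithms"));
// [ 'AI', 'machine learning', 'algorithms' ]
```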

Advanced Example

```typescript
import { MDocument } from "@mastra/rag";

const doc = MDocument.fromText(text);
const chunks = await doc.chunk({
  extract: {
    // Title extraction with custom settings
    title: {
      nodes: 2, // Extract 2 title nodes
      nodeTemplate: "Generate a title for this: {context}",
      combineTemplate: "Combine these titles: {context}"
    },
    // Summary extraction with custom settings
    summary: {
      summaries: ["self"], // Generate summaries for the current chunk
      promptTemplate: "Summarize this: {context}"
    },
    // Question generation with custom settings
    questions: {
      questions: 3, // Generate 3 questions
      promptTemplate: "Generate {numQuestions} questions about: {context}",
      embeddingOnly: false
    },
    // Keyword extraction with custom settings
    keywords: {
      keywords: 5, // Extract 5 keywords
      promptTemplate: "Extract {maxKeywords} key terms from: {context}"
    }
  }
});

// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI in Modern Computing",
//   sectionSummary: "Overview of AI concepts and their applications in computing",
//   questionsThisExcerptCanAnswer: "1. What is machine learning?\n2. How do neural networks work?",
//   excerptKeywords: "1. Machine learning\n2. Neural networks\n3. Training data"
// }
```