ExtractParams

ExtractParams configures metadata extraction from document chunks using LLM analysis.

Example


import { MDocument } from "@mastra/rag";
 
const doc = MDocument.fromText(text);
const chunks = await doc.chunk({
  extract: {
    title: true, // Extract titles using default settings
    summary: true, // Generate summaries using default settings
    keywords: true, // Extract keywords using default settings
  },
});
 
// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI Systems Overview",
//   sectionSummary: "Overview of artificial intelligence concepts and applications",
//   excerptKeywords: "KEYWORDS: AI, machine learning, algorithms"
// }

Parameters

The extract parameter accepts the following fields:

title?:

boolean | TitleExtractorsArgs

Enable title extraction. Set to true for default settings, or provide custom configuration.

summary?:

boolean | SummaryExtractArgs

Enable summary extraction. Set to true for default settings, or provide custom configuration.

questions?:

boolean | QuestionAnswerExtractArgs

Enable question generation. Set to true for default settings, or provide custom configuration.

keywords?:

boolean | KeywordExtractArgs

Enable keyword extraction. Set to true for default settings, or provide custom configuration.
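
As a rough illustration of the `boolean | Args` pattern shared by all four fields, the sketch below resolves an option into either "disabled" or a settings object. `resolveOption`, `KeywordSettings`, and the default value of 5 are hypothetical names and values chosen for this example; they are not part of @mastra/rag, and the library's actual defaults may differ.

```typescript
// Hypothetical helper, not part of @mastra/rag: one way to interpret
// an option typed as `boolean | Args`.
type KeywordSettings = { keywords: number; promptTemplate?: string };

function resolveOption<Args extends object>(
  option: boolean | Partial<Args> | undefined,
  defaults: Args,
): Args | null {
  // Omitted or false: the extractor is disabled.
  if (option === undefined || option === false) return null;
  // true: use the default settings unchanged.
  if (option === true) return { ...defaults };
  // Object: custom fields override the defaults one by one.
  return { ...defaults, ...option };
}

// Illustrative default only (assumption, not the library's value).
const defaults: KeywordSettings = { keywords: 5 };

console.log(resolveOption<KeywordSettings>(true, defaults)); // { keywords: 5 }
console.log(resolveOption<KeywordSettings>({ keywords: 3 }, defaults)); // { keywords: 3 }
console.log(resolveOption<KeywordSettings>(undefined, defaults)); // null
```

Under this reading, passing `true` and passing an empty object are equivalent, and a field left out of `extract` is simply not run.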

Extractor Arguments

TitleExtractorsArgs

llm?:

MastraLanguageModel

AI SDK language model to use for title extraction

nodes?:

number

Number of title nodes to extract

nodeTemplate?:

string

Custom prompt template for title node extraction. Must include the {context} placeholder.

combineTemplate?:

string

Custom prompt template for combining titles. Must include the {context} placeholder.

SummaryExtractArgs

llm?:

MastraLanguageModel

AI SDK language model to use for summary extraction

summaries?:

('self' | 'prev' | 'next')[]

Summary types to generate. Allowed values are 'self' (current chunk), 'prev' (previous chunk), and 'next' (next chunk).

promptTemplate?:

string

Custom prompt template for summary generation. Must include the {context} placeholder.
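
To make the three summary types concrete, here is a minimal sketch of which chunk each type refers to for a chunk at a given position. `sourceIndex` is a hypothetical function written for this example, and returning `null` at the start or end of the document is an assumption; the source does not say how the library treats the first and last chunks.

```typescript
// Hypothetical illustration, not part of @mastra/rag.
type SummaryType = "self" | "prev" | "next";

// Returns the index of the chunk a summary type points at,
// or null when that neighbor does not exist (assumed behavior).
function sourceIndex(type: SummaryType, index: number, total: number): number | null {
  const target = type === "self" ? index : type === "prev" ? index - 1 : index + 1;
  return target >= 0 && target < total ? target : null;
}

// For the second of three chunks (index 1):
console.log(sourceIndex("self", 1, 3)); // 1
console.log(sourceIndex("prev", 1, 3)); // 0
console.log(sourceIndex("next", 1, 3)); // 2
```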

QuestionAnswerExtractArgs

llm?:

MastraLanguageModel

AI SDK language model to use for question generation

questions?:

number

Number of questions to generate

promptTemplate?:

string

Custom prompt template for question generation. Must include both the {context} and {numQuestions} placeholders.

embeddingOnly?:

boolean

If true, only embeddings are generated, without the actual questions.

KeywordExtractArgs

llm?:

MastraLanguageModel

AI SDK language model to use for keyword extraction

keywords?:

number

Number of keywords to extract

promptTemplate?:

string

Custom prompt template for keyword extraction. Must include both the {context} and {maxKeywords} placeholders.
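
Because each custom template must contain specific placeholders, it can help to check templates before passing them in. The following is a minimal sketch of such a check. `missingPlaceholders` and the `requiredPlaceholders` map are hypothetical and not part of @mastra/rag; the placeholder names themselves are taken from the descriptions above.

```typescript
// Hypothetical helper, not part of @mastra/rag.
// Returns the required placeholder names that do not appear in the template.
function missingPlaceholders(template: string, required: string[]): string[] {
  return required.filter((name) => !template.includes(`{${name}}`));
}

// Placeholder requirements as documented for each template option.
const requiredPlaceholders = {
  titleNodeTemplate: ["context"],
  titleCombineTemplate: ["context"],
  summaryPromptTemplate: ["context"],
  questionsPromptTemplate: ["context", "numQuestions"],
  keywordsPromptTemplate: ["context", "maxKeywords"],
};

// A keyword template that omits {maxKeywords}:
const missing = missingPlaceholders(
  "Extract key terms from: {context}",
  requiredPlaceholders.keywordsPromptTemplate,
);
console.log(missing); // [ 'maxKeywords' ]
```

How the library itself reacts to a missing placeholder is not stated here, so a check like this is only a precaution on the caller's side.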

Advanced Example


import { MDocument } from "@mastra/rag";
 
const doc = MDocument.fromText(text);
const chunks = await doc.chunk({
  extract: {
    // Title extraction with custom settings
    title: {
      nodes: 2, // Extract 2 title nodes
      nodeTemplate: "Generate a title for this: {context}",
      combineTemplate: "Combine these titles: {context}",
    },
 
    // Summary extraction with custom settings
    summary: {
      summaries: ["self"], // Generate summaries for current chunk
      promptTemplate: "Summarize this: {context}",
    },
 
    // Question generation with custom settings
    questions: {
      questions: 3, // Generate 3 questions
      promptTemplate: "Generate {numQuestions} questions about: {context}",
      embeddingOnly: false,
    },
 
    // Keyword extraction with custom settings
    keywords: {
      keywords: 5, // Extract 5 keywords
      promptTemplate: "Extract {maxKeywords} key terms from: {context}",
    },
  },
});
 
// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI in Modern Computing",
//   sectionSummary: "Overview of AI concepts and their applications in computing",
//   questionsThisExcerptCanAnswer: "1. What is machine learning?\n2. How do neural networks work?",
//   excerptKeywords: "1. Machine learning\n2. Neural networks\n3. Training data"
// }

Document Grouping for Title Extraction

When using the TitleExtractor, you can group multiple chunks together for title extraction by specifying a shared docId in the metadata field of each chunk. All chunks with the same docId will receive the same extracted title. If no docId is set, each chunk is treated as its own document for title extraction.

Example:


import { MDocument } from "@mastra/rag";
 
const doc = new MDocument({
  docs: [
    { text: "chunk 1", metadata: { docId: "docA" } },
    { text: "chunk 2", metadata: { docId: "docA" } },
    { text: "chunk 3", metadata: { docId: "docB" } },
  ],
  type: "text",
});
 
await doc.extractMetadata({ title: true });
// The first two chunks will share a title, while the third chunk will be assigned a separate title.
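
The grouping rule described above can be sketched without the library. The code below is a hypothetical illustration, not the TitleExtractor's actual implementation: `Chunk`, `groupForTitles`, and the fallback key format are assumptions made for this example.

```typescript
// Hypothetical illustration of the docId grouping rule, not library code.
type Chunk = { text: string; metadata: { docId?: string } };

function groupForTitles(chunks: Chunk[]): Map<string, Chunk[]> {
  const groups = new Map<string, Chunk[]>();
  chunks.forEach((chunk, index) => {
    // Chunks without a docId get a unique key, so each forms its own group.
    const key = chunk.metadata.docId ?? `__chunk_${index}`;
    const group = groups.get(key) ?? [];
    group.push(chunk);
    groups.set(key, group);
  });
  return groups;
}

const groups = groupForTitles([
  { text: "chunk 1", metadata: { docId: "docA" } },
  { text: "chunk 2", metadata: { docId: "docA" } },
  { text: "chunk 3", metadata: { docId: "docB" } },
]);

console.log(groups.size); // 2
console.log(groups.get("docA")?.length); // 2
```

Each group would then receive a single extracted title, which is why the first two chunks share one and the third gets its own.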
