
Metadata Extraction

This example shows how to extract metadata from documents with Mastra’s document processing tools. The extracted metadata (here, keywords and a summary) can then be used to organize documents, filter them, and improve retrieval in RAG systems.

Overview

The example covers two approaches to metadata extraction:

  1. Direct metadata extraction from a document
  2. Chunking with metadata extraction

Setup

Dependencies

Import the necessary dependencies:

src/index.ts
import { MDocument } from '@mastra/rag';

Document Creation

Create a document from text content:

src/index.ts
const doc = MDocument.fromText(`Title: The Benefits of Regular Exercise
 
Regular exercise has numerous health benefits. It improves cardiovascular health, 
strengthens muscles, and boosts mental wellbeing.
 
Key Benefits:
• Reduces stress and anxiety
• Improves sleep quality
• Helps maintain healthy weight
• Increases energy levels
 
For optimal results, experts recommend at least 150 minutes of moderate exercise 
per week.`);

1. Direct Metadata Extraction

Extract metadata directly from the document:

src/index.ts
// Configure metadata extraction options
await doc.extractMetadata({
  keywords: true,  // Extract important keywords
  summary: true,   // Generate a concise summary
});
 
// Retrieve the extracted metadata
const meta = doc.getMetadata();
console.log('Extracted Metadata:', meta);
 
// Example Output:
// Extracted Metadata: {
//   keywords: [
//     'exercise',
//     'health benefits',
//     'cardiovascular health',
//     'mental wellbeing',
//     'stress reduction',
//     'sleep quality'
//   ],
//   summary: 'Regular exercise provides multiple health benefits including improved cardiovascular health, muscle strength, and mental wellbeing. Key benefits include stress reduction, better sleep, weight management, and increased energy. Recommended exercise duration is 150 minutes per week.'
// }
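
Once metadata has this shape, it can drive simple filtering. The following is a minimal sketch, assuming metadata objects shaped like the example output above; `ExtractedMetadata`, `StoredDocument`, and `filterByKeyword` are hypothetical names introduced for illustration and are not part of `@mastra/rag`.

```typescript
// Hypothetical shape modeled on the example output above; not a Mastra type.
interface ExtractedMetadata {
  keywords: string[];
  summary: string;
}

// Hypothetical record pairing a document ID with its extracted metadata.
interface StoredDocument {
  id: string;
  metadata: ExtractedMetadata;
}

// Return the documents whose keyword list contains the term (case-insensitive).
function filterByKeyword(docs: StoredDocument[], term: string): StoredDocument[] {
  const needle = term.trim().toLowerCase();
  return docs.filter((d) =>
    d.metadata.keywords.some((k) => k.toLowerCase() === needle),
  );
}

const library: StoredDocument[] = [
  {
    id: 'exercise',
    metadata: {
      keywords: ['exercise', 'sleep quality'],
      summary: 'Benefits of regular exercise.',
    },
  },
  {
    id: 'nutrition',
    metadata: {
      keywords: ['diet', 'healthy weight'],
      summary: 'Basics of balanced nutrition.',
    },
  },
];

const matches = filterByKeyword(library, 'Sleep Quality');
console.log(matches.map((d) => d.id)); // prints [ 'exercise' ]
```

Exact, case-insensitive matching keeps the sketch predictable; a real system might prefer fuzzy matching or a vector store's own metadata filters.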

2. Chunking with Metadata

Combine document chunking with metadata extraction:

src/index.ts
// Configure chunking with metadata extraction
await doc.chunk({
  strategy: 'recursive', // Use recursive chunking strategy
  size: 200,             // Maximum chunk size
  extract: {
    keywords: true,      // Extract keywords per chunk
    summary: true,       // Generate summary per chunk
  },
});
 
// Get metadata from chunks
const metaTwo = doc.getMetadata();
console.log('Chunk Metadata:', metaTwo);
 
// Example Output:
// Chunk Metadata: {
//   keywords: [
//     'exercise',
//     'health benefits',
//     'cardiovascular health',
//     'mental wellbeing',
//     'stress reduction',
//     'sleep quality'
//   ],
//   summary: 'Regular exercise provides multiple health benefits including improved cardiovascular health, muscle strength, and mental wellbeing. Key benefits include stress reduction, better sleep, weight management, and increased energy. Recommended exercise duration is 150 minutes per week.'
// }
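
Per-chunk keywords lend themselves to a keyword index for organizing and looking up chunks. The following is a rough sketch under the assumption that each chunk carries its own keyword list; `ChunkRecord` and `buildKeywordIndex` are hypothetical names, and the sample data is made up for illustration rather than produced by `doc.chunk()`.

```typescript
// Hypothetical per-chunk record; field names are assumptions for illustration.
interface ChunkRecord {
  chunkId: number;
  keywords: string[];
}

// Map each lowercased keyword to the IDs of the chunks that mention it.
function buildKeywordIndex(chunks: ChunkRecord[]): Map<string, number[]> {
  const index = new Map<string, number[]>();
  for (const chunk of chunks) {
    // Lowercase and de-duplicate so a chunk is listed once per keyword.
    const unique = chunk.keywords
      .map((k) => k.toLowerCase())
      .filter((k, i, all) => all.indexOf(k) === i);
    for (const keyword of unique) {
      const ids = index.get(keyword) ?? [];
      ids.push(chunk.chunkId);
      index.set(keyword, ids);
    }
  }
  return index;
}

// Made-up sample data, not actual extraction output.
const sampleChunks: ChunkRecord[] = [
  { chunkId: 0, keywords: ['exercise', 'cardiovascular health'] },
  { chunkId: 1, keywords: ['Exercise', 'sleep quality', 'exercise'] },
];

const keywordIndex = buildKeywordIndex(sampleChunks);
console.log(keywordIndex.get('exercise')); // prints [ 0, 1 ]
```

An index like this makes it cheap to narrow the candidate chunks by keyword before running a more expensive similarity search.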




