Metadata Extraction

This example shows how to extract and use metadata from documents with Mastra’s document processing capabilities. The extracted metadata can be used for document organization, filtering, and enhanced retrieval in RAG systems.

Overview

The example demonstrates metadata extraction in two ways:

  1. Direct metadata extraction from a document
  2. Chunking with metadata extraction

Setup

Dependencies

Import the necessary dependencies:

src/index.ts
import { MDocument } from '@mastra/rag';

Document Creation

Create a document from text content:

src/index.ts
const doc = MDocument.fromText(`Title: The Benefits of Regular Exercise

Regular exercise has numerous health benefits. It improves cardiovascular health, strengthens muscles, and boosts mental wellbeing.

Key Benefits:
• Reduces stress and anxiety
• Improves sleep quality
• Helps maintain healthy weight
• Increases energy levels

For optimal results, experts recommend at least 150 minutes of moderate exercise per week.`);

1. Direct Metadata Extraction

Extract metadata directly from the document:

src/index.ts
// Configure metadata extraction options
await doc.extractMetadata({
  keywords: true, // Extract important keywords
  summary: true, // Generate a concise summary
});

// Retrieve the extracted metadata
const meta = doc.getMetadata();
console.log('Extracted Metadata:', meta);

// Example Output:
// Extracted Metadata: {
//   keywords: [
//     'exercise',
//     'health benefits',
//     'cardiovascular health',
//     'mental wellbeing',
//     'stress reduction',
//     'sleep quality'
//   ],
//   summary: 'Regular exercise provides multiple health benefits including improved cardiovascular health, muscle strength, and mental wellbeing. Key benefits include stress reduction, better sleep, weight management, and increased energy. Recommended exercise duration is 150 minutes per week.'
// }

2. Chunking with Metadata

Combine document chunking with metadata extraction:

src/index.ts
// Configure chunking with metadata extraction
await doc.chunk({
  strategy: 'recursive', // Use recursive chunking strategy
  size: 200, // Maximum chunk size
  extract: {
    keywords: true, // Extract keywords per chunk
    summary: true, // Generate summary per chunk
  },
});

// Get metadata from chunks
const metaTwo = doc.getMetadata();
console.log('Chunk Metadata:', metaTwo);

// Example Output:
// Chunk Metadata: {
//   keywords: [
//     'exercise',
//     'health benefits',
//     'cardiovascular health',
//     'mental wellbeing',
//     'stress reduction',
//     'sleep quality'
//   ],
//   summary: 'Regular exercise provides multiple health benefits including improved cardiovascular health, muscle strength, and mental wellbeing. Key benefits include stress reduction, better sleep, weight management, and increased energy. Recommended exercise duration is 150 minutes per week.'
// }
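
Using Metadata for Filtering

The introduction mentions filtering as one use for extracted metadata. The following is a minimal sketch of that idea in plain TypeScript. It assumes each chunk carries a `keywords` array and a `summary` string shaped like the example output above. The `ChunkMetadata` and `Chunk` interfaces, the `filterByKeyword` helper, and the sample data are all hypothetical and written for illustration; they are not part of `@mastra/rag`.

```typescript
// Hypothetical shapes modelled on the example output above.
// These are NOT types exported by @mastra/rag.
interface ChunkMetadata {
  keywords: string[];
  summary: string;
}

interface Chunk {
  text: string;
  metadata: ChunkMetadata;
}

// Return the chunks that have at least one keyword containing `term`
// (case-insensitive substring match).
function filterByKeyword(chunks: Chunk[], term: string): Chunk[] {
  const needle = term.toLowerCase();
  return chunks.filter((chunk) =>
    chunk.metadata.keywords.some((kw) => kw.toLowerCase().includes(needle)),
  );
}

// Hand-written sample data standing in for real extraction results.
const sampleChunks: Chunk[] = [
  {
    text: 'Regular exercise improves cardiovascular health.',
    metadata: {
      keywords: ['exercise', 'cardiovascular health'],
      summary: 'Exercise benefits the heart.',
    },
  },
  {
    text: 'Exercise reduces stress and improves sleep quality.',
    metadata: {
      keywords: ['stress reduction', 'sleep quality'],
      summary: 'Exercise helps with stress and sleep.',
    },
  },
  {
    text: 'Experts recommend at least 150 minutes per week.',
    metadata: {
      keywords: ['exercise duration'],
      summary: 'Recommended weekly exercise time.',
    },
  },
];

// Keep only the chunks tagged with a sleep-related keyword.
const sleepChunks = filterByKeyword(sampleChunks, 'sleep');
console.log(sleepChunks.map((c) => c.text));
```

A substring match keeps the sketch short; a production retrieval pipeline would more likely pass the metadata to a vector store as a filter alongside the embedding query.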




