ExtractParams

ExtractParams configures metadata extraction from document chunks.

Example

The example below extracts three metadata fields from each chunk using LLM analysis. The optional model setting overrides the default model.

// `text` is the raw document content (a string) to split into chunks.
const doc = new Document(text);
const chunks = await doc.chunk({
  extract: {
    fields: [
      { 
        name: 'summary', 
        description: 'A 1-2 sentence summary of the main points' 
      },
      { 
        name: 'entities', 
        description: 'List of companies, people, and locations mentioned' 
      },
      {
        name: 'custom_field',
        description: 'Any other metadata you want to extract, guided by this description'
      }
    ],
    model: 'gpt-4o-mini' // Optional: overrides the default model (gpt-3.5-turbo)
  }
});

Parameters

fields: Array<{ name: string, description: string }>
Fields to extract from each chunk. Each entry pairs a field name with a description that guides the extraction.

model?: string (default: gpt-3.5-turbo)
OpenAI model to use for extraction.
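
The parameter shapes listed above can be written down as TypeScript types. The following is only a sketch: the names `ExtractField`, `ExtractParamsShape`, and `resolveExtractParams` are hypothetical and not part of the library, and the non-empty check is this sketch's own choice. Only the `fields`/`model` shape and the `gpt-3.5-turbo` default come from the parameter list.

```typescript
// Hypothetical types mirroring the documented parameter shapes.
interface ExtractField {
  name: string;
  description: string;
}

interface ExtractParamsShape {
  fields: ExtractField[];
  model?: string; // optional, as indicated by the `?` in the parameter list
}

// Default taken from the parameter list above.
const DEFAULT_MODEL = "gpt-3.5-turbo";

// Illustrative helper (an assumption, not library behavior): rejects blank
// names or descriptions and fills in the documented default model.
function resolveExtractParams(
  params: ExtractParamsShape
): Required<ExtractParamsShape> {
  for (const field of params.fields) {
    if (!field.name.trim() || !field.description.trim()) {
      throw new Error("each field needs a non-empty name and description");
    }
  }
  return { fields: params.fields, model: params.model ?? DEFAULT_MODEL };
}
```

Typing the options this way lets the compiler catch a misspelled key or a missing description before any extraction call is made.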

Field Types

Fields are flexible: you can define any metadata field you want to extract, and each field's description guides what is produced. Common fields include:

  • summary: Brief overview of chunk content
  • keywords: Key terms or concepts
  • topics: Main subjects discussed
  • entities: Named entities (people, places, organizations)
  • sentiment: Emotional tone
  • language: Detected language
  • timestamp: Temporal references
  • categories: Content classification
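
A few of the fields listed above could be declared in one place and converted into the `fields` array. This is a minimal sketch: the helper `toFields` and the description wording are assumptions made for illustration, not part of the library.

```typescript
interface ExtractField {
  name: string;
  description: string;
}

// Illustrative descriptions (assumed wording); tune them for your content.
const fieldDescriptions: Record<string, string> = {
  keywords: "Up to five key terms or concepts from the chunk",
  sentiment: "Overall emotional tone: positive, negative, or neutral",
  language: "Language the chunk is written in",
};

// Hypothetical helper: turns a name -> description map into a fields array.
function toFields(descriptions: Record<string, string>): ExtractField[] {
  return Object.entries(descriptions).map(([name, description]) => ({
    name,
    description,
  }));
}

const extract = { fields: toFields(fieldDescriptions) };
```

The resulting `extract` object has the same shape as the `extract` option passed to `doc.chunk` in the example earlier on this page.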

Example: