ExtractParams
ExtractParams configures metadata extraction from document chunks.
Example
ExtractParams configures automatic metadata extraction from chunks using LLM analysis.
const doc = new Document(text);
const chunks = await doc.chunk({
  extract: {
    fields: [
      {
        name: 'summary',
        description: 'A 1-2 sentence summary of the main points'
      },
      {
        name: 'entities',
        description: 'List of companies, people, and locations mentioned'
      },
      {
        name: 'custom_field',
        description: 'Any other metadata you want to extract, guided by this description'
      }
    ],
    model: 'gpt-4o-mini' // Optional: specify a different model
  }
});
Parameters
fields: Array<{ name: string, description: string }>
  Array of fields to extract from each chunk.
model?: string = gpt-3.5-turbo
  OpenAI model to use for extraction.
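The parameter shapes listed above can be written down as a TypeScript type. This is only a minimal sketch: the names `ExtractField`, `ExtractParamsSketch`, `DEFAULT_MODEL`, and `resolveModel` are hypothetical and introduced for illustration, not taken from the library. The default model value is the one documented above; how the library applies it internally is an assumption.

```typescript
// Hypothetical type names that mirror the documented parameter list.
type ExtractField = { name: string; description: string };

type ExtractParamsSketch = {
  fields: ExtractField[]; // required: fields to extract from each chunk
  model?: string;         // optional: documented default is gpt-3.5-turbo
};

// Default taken from the parameter list above.
const DEFAULT_MODEL = 'gpt-3.5-turbo';

// Hypothetical helper: returns the model name that applies to a config,
// falling back to the documented default when `model` is omitted.
function resolveModel(params: ExtractParamsSketch): string {
  return params.model ?? DEFAULT_MODEL;
}

const params: ExtractParamsSketch = {
  fields: [
    { name: 'summary', description: 'A 1-2 sentence summary of the main points' },
  ],
};

console.log(resolveModel(params)); // gpt-3.5-turbo
```

Typing the config this way lets the compiler reject a field that is missing its `description`, which matters because the description is what guides the extraction.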
Field Types
The fields are flexible - you can define any metadata fields you want to extract. Common field types include:
summary: Brief overview of chunk content
keywords: Key terms or concepts
topics: Main subjects discussed
entities: Named entities (people, places, organizations)
sentiment: Emotional tone
language: Detected language
timestamp: Temporal references
categories: Content classification
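One way to reuse the common field types listed above is to keep them in a lookup table and build the `fields` array from a list of names. The sketch below assumes nothing about the library beyond the `{ name, description }` shape; `FieldSpec`, `COMMON_FIELDS`, and `pickFields` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape matching the documented field entries.
type FieldSpec = { name: string; description: string };

// Descriptions copied from the list of common field types above.
const COMMON_FIELDS: Record<string, string> = {
  summary: 'Brief overview of chunk content',
  keywords: 'Key terms or concepts',
  topics: 'Main subjects discussed',
  entities: 'Named entities (people, places, organizations)',
  sentiment: 'Emotional tone',
  language: 'Detected language',
  timestamp: 'Temporal references',
  categories: 'Content classification',
};

// Hypothetical helper: turns field names into { name, description } entries,
// throwing on any name that is not in the lookup table.
function pickFields(names: string[]): FieldSpec[] {
  return names.map((name) => {
    const description = COMMON_FIELDS[name];
    if (description === undefined) {
      throw new Error(`Unknown field: ${name}`);
    }
    return { name, description };
  });
}

// The resulting object has the same shape as the `extract` option shown earlier.
const extract = { fields: pickFields(['keywords', 'sentiment']) };
console.log(extract.fields.map((f) => f.name).join(', ')); // keywords, sentiment
```

Because the fields are free-form, a custom entry can still be appended to the array by hand alongside the ones produced by the helper.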
Example: