
Storing Embeddings in a Vector Database

After generating embeddings, you need to store them in a database that supports vector similarity search. Mastra provides a consistent interface for storing and querying embeddings across different vector databases.

Supported databases

PostgreSQL with PgVector

Best for teams already using PostgreSQL who want to minimize infrastructure complexity:

vector-store.ts
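import { PgVector } from '@mastra/pg';

// Connect using the connection string from the environment
const store = new PgVector(process.env.POSTGRES_CONNECTION_STRING);

// Create a 1536-dimension index, then store each embedding with its source text.
// `embeddings` and `chunks` come from the earlier embedding step.
await store.createIndex("my-collection", 1536);
await store.upsert(
  "my-collection",
  embeddings,
  chunks.map(chunk => ({ text: chunk.text }))
);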

Using Vector Storage

Once initialized, all vector stores share the same interface for creating indexes, upserting embeddings, and querying.
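
Because the call shapes are shared, indexing code can be written once against a small structural type and reused with any store. The sketch below is an illustration under assumptions: `IndexableStore`, `EmbeddedChunk`, `indexChunks`, and `RecordingStore` are hypothetical names, and the type mirrors only the positional `createIndex` and `upsert` calls shown on this page, not the library's published interface.

```typescript
// Hypothetical structural type: it mirrors only the two call shapes used on
// this page and is not the library's published interface.
type Metadata = Record<string, unknown>;

interface IndexableStore {
  createIndex(indexName: string, dimension: number): Promise<unknown>;
  upsert(indexName: string, vectors: number[][], metadata?: Metadata[]): Promise<unknown>;
}

interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Works with any store whose methods accept the arguments above.
async function indexChunks(
  store: IndexableStore,
  indexName: string,
  dimension: number,
  chunks: EmbeddedChunk[],
): Promise<void> {
  await store.createIndex(indexName, dimension);
  await store.upsert(
    indexName,
    chunks.map((chunk) => chunk.embedding),
    chunks.map((chunk) => ({ text: chunk.text })),
  );
}

// In-memory stand-in used only to exercise indexChunks without a database.
class RecordingStore implements IndexableStore {
  calls: string[] = [];

  async createIndex(indexName: string, dimension: number): Promise<void> {
    this.calls.push(`createIndex:${indexName}:${dimension}`);
  }

  async upsert(indexName: string, vectors: number[][]): Promise<void> {
    this.calls.push(`upsert:${indexName}:${vectors.length}`);
  }
}
```

Passing a real store instead of `RecordingStore` should work as long as its methods accept the same arguments; check the signatures of the version you have installed before relying on this.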

Creating Indexes

Before storing embeddings, you need to create an index with the appropriate dimension size for your embedding model:

store-embeddings.ts
// Create an index with dimension 1536 (for text-embedding-3-small)
await store.createIndex('my-collection', 1536);
 
// For other models, use their corresponding dimensions:
// - text-embedding-3-large: 3072
// - text-embedding-ada-002: 1536
// - Cohere embed-multilingual-v3: 1024

The dimension size must match the output dimension of your chosen embedding model. Common dimension sizes are:

  • OpenAI text-embedding-3-small: 1536 dimensions
  • OpenAI text-embedding-3-large: 3072 dimensions
  • Cohere embed-multilingual-v3: 1024 dimensions
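
A dimension mismatch is easier to diagnose before the vectors reach the database. The following is a minimal sketch of a pre-flight check; `assertDimensions` is a hypothetical helper written for this example and is not part of any library.

```typescript
// Hypothetical pre-flight check; not part of any library API.
// Throws on the first vector whose length differs from the index dimension.
function assertDimensions(embeddings: number[][], expected: number): void {
  embeddings.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(
        `Embedding ${i} has ${vector.length} dimensions, but the index expects ${expected}`,
      );
    }
  });
}

// Example: a 3-dimensional index rejects a 2-dimensional vector.
const sampleEmbeddings = [[0.1, 0.2, 0.3], [0.4, 0.5]];
let dimensionError = "";
try {
  assertDimensions(sampleEmbeddings, 3);
} catch (error) {
  dimensionError = (error as Error).message;
}
// dimensionError is "Embedding 1 has 2 dimensions, but the index expects 3"
```

With a real model you would call it as `assertDimensions(embeddings, 1536)` just before the upsert.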

Upserting Embeddings

After creating an index, you can store embeddings along with their basic metadata:

store-embeddings.ts
// Store embeddings with their corresponding metadata
await store.upsert(
  'my-collection',  // index name
  embeddings,       // array of embedding vectors
  chunks.map(chunk => ({
    text: chunk.text,  // The original text content
    id: chunk.id       // Optional unique identifier
  }))
);

The upsert operation:

  • Takes an array of embedding vectors and their corresponding metadata
  • Updates existing vectors if they share the same ID
  • Creates new vectors if they don’t exist
  • Automatically handles batching for large datasets
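
To make these semantics concrete, here is a small in-memory model of an upsert. It is a sketch for illustration only: `InMemoryIndex`, `StoredVector`, and the `batchSize` parameter are assumptions made for this example and say nothing about how a real store implements persistence or batching.

```typescript
// Hypothetical in-memory model of upsert semantics; a real store persists
// to a database and chooses its own batch size.
interface StoredVector {
  id: string;
  vector: number[];
  metadata: Record<string, unknown>;
}

class InMemoryIndex {
  private rows = new Map<string, StoredVector>();

  // Inserts new IDs and overwrites existing ones, one batch at a time.
  // Returns the number of batches processed.
  upsert(items: StoredVector[], batchSize = 100): number {
    let batches = 0;
    for (let start = 0; start < items.length; start += batchSize) {
      for (const item of items.slice(start, start + batchSize)) {
        this.rows.set(item.id, item);
      }
      batches += 1;
    }
    return batches;
  }

  get(id: string): StoredVector | undefined {
    return this.rows.get(id);
  }

  get size(): number {
    return this.rows.size;
  }
}

const demoIndex = new InMemoryIndex();
demoIndex.upsert([
  { id: "a", vector: [1, 0], metadata: { text: "first" } },
  { id: "b", vector: [0, 1], metadata: { text: "second" } },
]);
// Same ID "a": the existing row is replaced rather than duplicated.
demoIndex.upsert([{ id: "a", vector: [1, 1], metadata: { text: "first, revised" } }]);
```

After the second call the index still holds two rows, and the row for `"a"` carries the revised vector and metadata.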

Adding Metadata

Vector stores support rich metadata for advanced filtering and organization. You can add any JSON-serializable fields that will help with retrieval.

Reminder: Metadata is stored as a JSON field with no fixed schema. Keep field names and value types consistent across every upsert, or your queries will return unexpected results.

// Store embeddings with rich metadata for better organization and filtering
await store.upsert(
  "my-collection",
  embeddings,
  chunks.map((chunk) => ({
    // Basic content
    text: chunk.text,
    id: chunk.id,
    
    // Document organization
    source: chunk.source,
    category: chunk.category,
    
    // Temporal metadata
    createdAt: new Date().toISOString(),
    version: "1.0",
    
    // Custom fields
    language: chunk.language,
    author: chunk.author,
    confidenceScore: chunk.score,
  })),
);

Key metadata considerations:

  • Be strict with field naming - inconsistencies like ‘category’ vs ‘Category’ will affect queries
  • Only include fields you plan to filter or sort by - extra fields add overhead
  • Add timestamps (e.g., ‘createdAt’, ‘lastUpdated’) to track content freshness
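
One way to enforce consistent naming is to build every metadata object through a single typed function. The sketch below assumes a reduced set of fields taken from the snippet above; `ChunkMetadata`, `requireString`, and `toChunkMetadata` are hypothetical names chosen for this example.

```typescript
// Hypothetical schema for this example; field names follow the snippet above.
interface ChunkMetadata {
  text: string;
  source: string;
  category: string;
  createdAt: string; // ISO 8601 timestamp
}

// Returns raw[field] if it is a string; otherwise throws, pointing out any
// key that differs from the expected name only by letter case.
function requireString(raw: Record<string, unknown>, field: string): string {
  const value = raw[field];
  if (typeof value === "string") return value;
  const nearMiss = Object.keys(raw).find(
    (key) => key !== field && key.toLowerCase() === field.toLowerCase(),
  );
  const hint = nearMiss ? ` (found "${nearMiss}"; field names are case-sensitive)` : "";
  throw new Error(`Expected string field "${field}"${hint}`);
}

// Builds metadata with exact field names and stamps it with a creation time.
function toChunkMetadata(raw: Record<string, unknown>, now: Date = new Date()): ChunkMetadata {
  return {
    text: requireString(raw, "text"),
    source: requireString(raw, "source"),
    category: requireString(raw, "category"),
    createdAt: now.toISOString(),
  };
}
```

Mapping chunks with `chunks.map((chunk) => toChunkMetadata(chunk))` before the upsert would then surface a stray `Category` key as an error instead of a silently inconsistent record.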

Best Practices

  • Create indexes before bulk insertions
  • Use batch operations for large insertions (the upsert method handles batching automatically)
  • Only store metadata you’ll query against
  • Match embedding dimensions to your model (e.g., 1536 for text-embedding-3-small)

Examples

For complete examples of different vector store implementations, see: