It started with a simple customer question:
"Why did you choose IVFFlat indexing for your PG vector library?"
Seems straightforward enough... except we realized we didn't have a data-backed answer.
We had implemented IVFFlat with fixed parameters (100 lists) as our default, but could we actually defend this choice with hard numbers? This sent us down a rabbit hole of benchmarking and testing that I want to share, because the results are interesting.
When customers ask technical "why" questions, they're usually really asking: "Is this the optimal solution for my use case?" Our customer had growing datasets and wanted confidence that our indexing strategy would scale with them. Our implementation looked like this:
- IVFFlat indexes created immediately upon table creation
- Fixed parameters (100 lists) regardless of dataset size
- No index rebuilding as data changes or grows
Let's explore this. We benchmarked four approaches (the sketch after this list shows roughly how each index would be created):
- Our current fixed IVFFlat implementation (100 lists)
- An adaptive IVFFlat approach that scales the number of lists with dataset size
- Flat search (no index) as an exact baseline
- HNSW with m = 8 (maximum connections per node) and ef_construction = 32 (build-time search depth)
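For context, here's roughly what setting these up looks like at the SQL level, via the node-postgres client. This is a sketch: the table and column names are hypothetical, the adaptive heuristic shown is illustrative rather than our exact formula, and in the real benchmark each approach got its own table.

```typescript
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// Fixed IVFFlat: 100 lists regardless of table size
await client.query(
  `CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)`,
);

// Adaptive IVFFlat: scale lists with row count (heuristic assumed for illustration)
const { rows } = await client.query(`SELECT count(*)::int AS n FROM embeddings`);
const lists = Math.max(100, Math.round(2 * Math.sqrt(rows[0].n)));
await client.query(
  `CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = ${lists})`,
);

// HNSW: m = 8, ef_construction = 32
await client.query(
  `CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 8, ef_construction = 32)`,
);

// Flat baseline: no index at all; queries fall back to an exact sequential scan
```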
We tested across:
- Different dataset sizes (10K, 100K, 500K, and 1M vectors)
- Various K values (10, 25, 50, 100 nearest neighbors)
- Different dimensions (64, 384, 1024)
- Different vector distributions (random, clustered, skewed, and mixed)
For each configuration, we ran 30 queries to get reliable data on both recall and latency.
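For the curious, one measurement pass looked roughly like the sketch below. `search` (the index-backed approximate query) and `exactSearch` (the flat-scan ground truth) are hypothetical helpers standing in for the real query code; each returns row IDs.

```typescript
// Minimal sketch of one benchmark pass: recall@k against an exact baseline,
// plus median and P95 latency over the query set.
async function benchmark(
  queries: number[][],
  k: number,
  search: (q: number[], k: number) => Promise<number[]>,
  exactSearch: (q: number[], k: number) => Promise<number[]>,
) {
  const recalls: number[] = [];
  const latencies: number[] = [];

  for (const q of queries) {
    const truth = new Set(await exactSearch(q, k));

    const start = performance.now();
    const approx = await search(q, k);
    latencies.push(performance.now() - start);

    // recall@k = fraction of the true top-k the index actually returned
    recalls.push(approx.filter((id) => truth.has(id)).length / k);
  }

  latencies.sort((a, b) => a - b);
  return {
    meanRecall: recalls.reduce((a, b) => a + b, 0) / recalls.length,
    medianLatencyMs: latencies[Math.floor(latencies.length / 2)],
    p95LatencyMs: latencies[Math.floor(latencies.length * 0.95)],
  };
}
```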
Recall Performance: Better Than Expected
One of our initial concerns was that recall might degrade significantly with our fixed approach as datasets grew. The data showed otherwise:
[Recall charts: Current, Adaptive, Flat, and HNSW. Combined results shown for all dimensions (64, 384, 1024).]
Both the fixed and adaptive approaches maintained excellent recall (typically 100%) for datasets larger than 1,000 vectors. Even with our fixed list count, recall stayed strong as data grew.
Latency Performance: Room for Improvement
Here's where things got interesting:
[Latency charts: Current, Adaptive, Flat, and HNSW. Results shown for dimension 64.]
- The fixed approach showed much more variable latency, especially for P95 measurements
- For large datasets, the adaptive approach delivered significant improvements:
  - With 1M vectors (64 dimensions), P95 latencies were 125-161ms for adaptive vs 141-219ms for fixed
  - With 500K vectors, median latencies were 60-65ms for adaptive vs 66-70ms for fixed
These might seem like small differences, but in production, they add up to a much better user experience.
The fixed approach also created some major cluster imbalances:
- With 100K vectors: ~1,000 vectors/list in fixed (100,000 vectors ÷ 100 lists) vs ~158 vectors/list in adaptive
- With 1M vectors: up to 10,000 vectors in some clusters while others remained sparse
This uneven distribution explained the latency variability we were seeing: a query that lands on an oversized cluster scans far more candidates than one that lands on a sparse cluster, and that shows up directly in the P95 numbers.
After seeing all these results, we decided to make some improvements.
Better Index Management
- Deferred Index Building: Table creation is now separate from index building
- User-Controlled Rebuilding: The index can be reconstructed whenever your data changes
- Intelligent List Sizing: The list count is calculated dynamically from your dataset size (see the sketch below)
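As a rough illustration of the sizing idea (the exact heuristic is an implementation detail; the formula below is an assumption, chosen here because it matches the per-list averages reported above):

```typescript
// Illustrative list-sizing heuristic, not necessarily the exact formula Mastra ships.
// Growing lists with the square root of the row count keeps the average number of
// vectors per list bounded as the table grows, instead of fixing lists at 100.
function adaptiveLists(rowCount: number): number {
  if (rowCount < 1_000) return 1; // assumption: tiny tables barely benefit from an index
  return Math.max(100, Math.round(2 * Math.sqrt(rowCount)));
}

adaptiveLists(100_000); // ~632 lists -> ~158 vectors per list
adaptiveLists(1_000_000); // 2000 lists -> ~500 vectors per list
```

Here's what the updated API looks like: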
```typescript
import { PgVector } from "@mastra/pg";

const vector = new PgVector({ connectionString: process.env.POSTGRES_CONNECTION_STRING! });

// Build the index at creation time by passing buildIndex: true
await vector.createIndex({
  indexName: "embeddings",
  dimension: 1536,
  metric: "cosine",
  indexConfig: {}, // empty config: the list count is calculated from dataset size
  buildIndex: true,
});

// Rebuild later (e.g., after significant inserts) via buildIndex
await vector.buildIndex({
  indexName: "embeddings",
  metric: "cosine",
  indexConfig: {},
});
```
The nice thing about Mastra is that you don't have to worry about most of this; we handle it for you.
When To Rebuild Your Index?
The one thing you still need to think about is: when should you rebuild your index?
Some thoughts. Rebuild when:
- You've inserted more than ~20% new data since the last build (see the sketch after this list)
- Query performance degrades noticeably
- Recall rates drop (test with a set of known queries)
And a few practices that help:
- Start with sufficient data before creating the initial index
- Schedule rebuilds during low-traffic periods
- Consider rebuilding after significant changes in data distribution
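To make that first trigger concrete, here's a minimal sketch of a rebuild check. Everything besides the buildIndex call itself is hypothetical glue code; tracking (and persisting) the row count at the last build is up to your application.

```typescript
import { PgVector } from "@mastra/pg";

const vector = new PgVector({ connectionString: process.env.POSTGRES_CONNECTION_STRING! });

// Row count at the time of the last index build (persist this somewhere real).
let rowsAtLastBuild = 0;

// Rebuild once the table has grown by more than 20% since the last build.
async function maybeRebuild(currentRows: number): Promise<void> {
  const growth =
    rowsAtLastBuild === 0 ? Infinity : (currentRows - rowsAtLastBuild) / rowsAtLastBuild;
  if (growth > 0.2) {
    await vector.buildIndex({
      indexName: "embeddings",
      metric: "cosine",
      indexConfig: {},
    });
    rowsAtLastBuild = currentRows;
  }
}
```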