It started with a simple customer question:
"Why did you choose IVFFlat indexing for your PG vector library?"
Seems straightforward enough... except we realized we didn't have a data-backed answer.
We had implemented IVFFlat with fixed parameters (100 lists) as our default, but could we actually defend this choice with hard numbers? This sent us down a rabbit hole of benchmarking and testing that I want to share, because the results are interesting.
When customers ask technical "why" questions, they're usually really asking: "Is this the optimal solution for my use case?" Our customer had growing datasets and wanted confidence that our indexing strategy would scale with them. Our implementation looked like this:
- IVFFlat indexes created immediately upon table creation
- Fixed parameters (100 lists) regardless of dataset size
- No index rebuilding as data changes or grows
Let's explore this. We benchmarked four approaches (the sketch after this list shows roughly how each index would be created):
- Our current fixed IVFFlat implementation (100 lists)
- An adaptive IVFFlat approach that scales the number of lists with dataset size
- Flat search (no index) as an exact baseline
- HNSW with m = 8 (maximum connections per node) and ef_construction = 32 (build-time search depth)
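For context, here's roughly what setting these up looks like at the SQL level, via the node-postgres client. This is a sketch: the table and column names are hypothetical, the adaptive heuristic shown is illustrative rather than our exact formula, and in the real benchmark each approach got its own table.

```typescript
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// Fixed IVFFlat: 100 lists regardless of table size
await client.query(
  `CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)`,
);

// Adaptive IVFFlat: scale lists with row count (heuristic assumed for illustration)
const { rows } = await client.query(`SELECT count(*)::int AS n FROM embeddings`);
const lists = Math.max(100, Math.round(2 * Math.sqrt(rows[0].n)));
await client.query(
  `CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = ${lists})`,
);

// HNSW: m = 8, ef_construction = 32
await client.query(
  `CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 8, ef_construction = 32)`,
);

// Flat baseline: no index at all; queries fall back to an exact sequential scan
```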
We tested across:
- Different dataset sizes (10K, 100K, 500K, and 1M vectors)
- Various K values (10, 25, 50, 100 nearest neighbors)
- Different dimensions (64, 384, 1024)
- Different vector distributions (random, clustered, skewed, and mixed)
For each configuration, we ran 30 queries to get reliable data on both recall and latency.
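For the curious, one measurement pass looked roughly like the sketch below. `search` (the index-backed approximate query) and `exactSearch` (the flat-scan ground truth) are hypothetical helpers standing in for the real query code; each returns row IDs.

```typescript
// Minimal sketch of one benchmark pass: recall@k against an exact baseline,
// plus median and P95 latency over the query set.
async function benchmark(
  queries: number[][],
  k: number,
  search: (q: number[], k: number) => Promise<number[]>,
  exactSearch: (q: number[], k: number) => Promise<number[]>,
) {
  const recalls: number[] = [];
  const latencies: number[] = [];

  for (const q of queries) {
    const truth = new Set(await exactSearch(q, k));

    const start = performance.now();
    const approx = await search(q, k);
    latencies.push(performance.now() - start);

    // recall@k = fraction of the true top-k the index actually returned
    recalls.push(approx.filter((id) => truth.has(id)).length / k);
  }

  latencies.sort((a, b) => a - b);
  return {
    meanRecall: recalls.reduce((a, b) => a + b, 0) / recalls.length,
    medianLatencyMs: latencies[Math.floor(latencies.length / 2)],
    p95LatencyMs: latencies[Math.floor(latencies.length * 0.95)],
  };
}
```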
Recall Performance: Better Than Expected
One of our initial concerns was that recall might degrade significantly with our fixed approach as datasets grew. The data showed otherwise:
[Recall charts: Current, Adaptive, Flat, and HNSW. Combined results shown for all dimensions (64, 384, 1024).]
Both the fixed and adaptive approaches maintained excellent recall (typically 100%) for datasets larger than 1,000 vectors. Even with our fixed list count, recall stayed strong as data grew.
Latency Performance: Room for Improvement
Here's where things got interesting:
[Latency charts: Current, Adaptive, Flat, and HNSW. Results shown for dimension 64.]
- The fixed approach showed much more variable latency, especially for P95 measurements
- For large datasets, the adaptive approach delivered significant improvements:
  - With 1M vectors (64 dimensions), P95 latencies were 125-161ms for adaptive vs 141-219ms for fixed
  - With 500K vectors, median latencies were 60-65ms for adaptive vs 66-70ms for fixed
These might seem like small differences, but in production, they add up to a much better user experience.
The fixed approach also created some major cluster imbalances:
- With 100K vectors: ~1,000 vectors/list in fixed (100,000 vectors ÷ 100 lists) vs ~158 vectors/list in adaptive
- With 1M vectors: up to 10,000 vectors in some clusters while others remained sparse
This uneven distribution explained the latency variability we were seeing: a query that lands on an oversized cluster scans far more candidates than one that lands on a sparse cluster, and that shows up directly in the P95 numbers.
After seeing all these results, we decided to make some improvements.
Better Index Management
- Deferred Index Building: Table creation is now separate from index building
- User-Controlled Rebuilding: The index can be reconstructed whenever your data changes
- Intelligent List Sizing: The list count is calculated dynamically from your dataset size (see the sketch below)
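As a rough illustration of the sizing idea (the exact heuristic is an implementation detail; the formula below is an assumption, chosen here because it matches the per-list averages reported above):

```typescript
// Illustrative list-sizing heuristic, not necessarily the exact formula Mastra ships.
// Growing lists with the square root of the row count keeps the average number of
// vectors per list bounded as the table grows, instead of fixing lists at 100.
function adaptiveLists(rowCount: number): number {
  if (rowCount < 1_000) return 1; // assumption: tiny tables barely benefit from an index
  return Math.max(100, Math.round(2 * Math.sqrt(rowCount)));
}

adaptiveLists(100_000); // ~632 lists -> ~158 vectors per list
adaptiveLists(1_000_000); // 2000 lists -> ~500 vectors per list
```

Here's what the updated API looks like: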
```typescript
import { PgVector } from "@mastra/pg";

const vector = new PgVector({ connectionString: process.env.POSTGRES_CONNECTION_STRING! });

// Build the index at creation time by passing buildIndex: true
await vector.createIndex({
  indexName: "embeddings",
  dimension: 1536,
  metric: "cosine",
  indexConfig: {}, // empty config: the list count is calculated from dataset size
  buildIndex: true,
});

// Rebuild later (e.g., after significant inserts) via buildIndex
await vector.buildIndex({
  indexName: "embeddings",
  metric: "cosine",
  indexConfig: {},
});
```
The nice thing about Mastra is that you don't have to worry about most of this; we handle it for you.
When To Rebuild Your Index?
The one thing you still need to think about is: when should you rebuild your index?
Some thoughts. Rebuild when:
- You've inserted more than ~20% new data since the last build (see the sketch after this list)
- Query performance degrades noticeably
- Recall rates drop (test with a set of known queries)
And a few practices that help:
- Start with sufficient data before creating the initial index
- Schedule rebuilds during low-traffic periods
- Consider rebuilding after significant changes in data distribution
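To make that first trigger concrete, here's a minimal sketch of a rebuild check. Everything besides the buildIndex call itself is hypothetical glue code; tracking (and persisting) the row count at the last build is up to your application.

```typescript
import { PgVector } from "@mastra/pg";

const vector = new PgVector({ connectionString: process.env.POSTGRES_CONNECTION_STRING! });

// Row count at the time of the last index build (persist this somewhere real).
let rowsAtLastBuild = 0;

// Rebuild once the table has grown by more than 20% since the last build.
async function maybeRebuild(currentRows: number): Promise<void> {
  const growth =
    rowsAtLastBuild === 0 ? Infinity : (currentRows - rowsAtLastBuild) / rowsAtLastBuild;
  if (growth > 0.2) {
    await vector.buildIndex({
      indexName: "embeddings",
      metric: "cosine",
      indexConfig: {},
    });
    rowsAtLastBuild = currentRows;
  }
}
```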