# DuckDBVector Store

The DuckDB storage implementation provides an embedded high-performance vector search solution using [DuckDB](https://duckdb.org/), an in-process analytical database. It uses the VSS extension for vector similarity search with HNSW indexing, offering a lightweight and efficient vector database that requires no external server.

It's part of the `@mastra/duckdb` package and offers efficient vector similarity search with metadata filtering.

## Installation

**npm**:

```bash
npm install @mastra/duckdb@latest
```

**pnpm**:

```bash
pnpm add @mastra/duckdb@latest
```

**Yarn**:

```bash
yarn add @mastra/duckdb@latest
```

**Bun**:

```bash
bun add @mastra/duckdb@latest
```

## Usage

```typescript
import { DuckDBVector } from "@mastra/duckdb";

// Create a new vector store instance
const store = new DuckDBVector({
  id: "duckdb-vector",
  path: ":memory:", // or './vectors.duckdb' for file persistence
});

// Create an index
await store.createIndex({
  indexName: "myCollection",
  dimension: 1536,
  metric: "cosine",
});

// Add vectors with metadata
const vectors = [[0.1, 0.2, ...], [0.3, 0.4, ...]];
const metadata = [
  { text: "first document", category: "A" },
  { text: "second document", category: "B" },
];
await store.upsert({
  indexName: "myCollection",
  vectors,
  metadata,
});

// Query similar vectors
const queryVector = [0.1, 0.2, ...];
const results = await store.query({
  indexName: "myCollection",
  queryVector,
  topK: 10,
  filter: { category: "A" },
});

// Clean up
await store.close();
```

## Constructor Options

**id:** (`string`): Unique identifier for the vector store instance

**path?:** (`string`): Database file path. Use ':memory:' for in-memory database, or a file path like './vectors.duckdb' for persistence. (Default: `':memory:'`)

**dimensions?:** (`number`): Default dimension for vector embeddings (Default: `1536`)

**metric?:** (`'cosine' | 'euclidean' | 'dotproduct'`): Default distance metric for similarity search (Default: `cosine`)

## Methods

### createIndex()

Creates a new vector collection with optional HNSW index for fast approximate nearest neighbor search.

**indexName:** (`string`): Name of the index to create

**dimension:** (`number`): Vector dimension size (must match your embedding model)

**metric?:** (`'cosine' | 'euclidean' | 'dotproduct'`): Distance metric for similarity search (Default: `cosine`)

### upsert()

Adds or updates vectors and their metadata in the index.

**indexName:** (`string`): Name of the index to insert into

**vectors:** (`number[][]`): Array of embedding vectors

**metadata?:** (`Record<string, any>[]`): Metadata for each vector

**ids?:** (`string[]`): Optional vector IDs (auto-generated UUIDs if not provided)

### query()

Searches for similar vectors with optional metadata filtering.

**indexName:** (`string`): Name of the index to search in

**queryVector:** (`number[]`): Query vector to find similar vectors for

**topK?:** (`number`): Number of results to return (Default: `10`)

**filter?:** (`Filter`): Metadata filters using MongoDB-like query syntax

**includeVector?:** (`boolean`): Whether to include vector data in results (Default: `false`)

### describeIndex()

Gets information about an index.

**indexName:** (`string`): Name of the index to describe

Returns:

```typescript
interface IndexStats {
  dimension: number;
  count: number;
  metric: "cosine" | "euclidean" | "dotproduct";
}
```

### deleteIndex()

Deletes an index and all its data.

**indexName:** (`string`): Name of the index to delete

### listIndexes()

Lists all vector indexes in the database.

Returns: `Promise<string[]>`

### updateVector()

Update a single vector by ID or by metadata filter. Either `id` or `filter` must be provided, but not both.

**indexName:** (`string`): Name of the index containing the vector

**id?:** (`string`): ID of the vector entry to update (mutually exclusive with filter)

**filter?:** (`Record<string, any>`): Metadata filter to identify vector(s) to update (mutually exclusive with id)

**update:** (`object`): Update data containing vector and/or metadata

**update.vector?:** (`number[]`): New vector data to update

**update.metadata?:** (`Record<string, any>`): New metadata to update

### deleteVector()

Deletes a specific vector entry from an index by its ID.

**indexName:** (`string`): Name of the index containing the vector

**id:** (`string`): ID of the vector entry to delete

### deleteVectors()

Delete multiple vectors by IDs or by metadata filter. Either `ids` or `filter` must be provided, but not both.

**indexName:** (`string`): Name of the index containing the vectors to delete

**ids?:** (`string[]`): Array of vector IDs to delete (mutually exclusive with filter)

**filter?:** (`Record<string, any>`): Metadata filter to identify vectors to delete (mutually exclusive with ids)

### close()

Closes the database connection and releases resources.

```typescript
await store.close();
```

## Response Types

Query results are returned in this format:

```typescript
interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  vector?: number[]; // Only included if includeVector is true
}
```

## Filter Operators

DuckDB vector store supports MongoDB-like filter operators:

| Category   | Operators                                  |
| ---------- | ------------------------------------------ |
| Comparison | `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` |
| Logical    | `$and`, `$or`, `$not`, `$nor`              |
| Array      | `$in`, `$nin`                              |
| Element    | `$exists`                                  |
| Text       | `$contains`                                |

### Filter Examples

```typescript
// Allegato operators
const results = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: {
    $and: [
      { category: "electronics" },
      { price: { $gte: 100, $lte: 500 } },
    ],
  },
});

// Nested field access
const results = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: { "user.profile.tier": "premium" },
});
```

## Distance Metrics

| Metric       | Description       | Score Interpretation   | Best For                            |
| ------------ | ----------------- | ---------------------- | ----------------------------------- |
| `cosine`     | Cosine similarity | 0-1 (1 = most similar) | Text embeddings, normalized vectors |
| `euclidean`  | L2 distance       | 0-∞ (0 = most similar) | Image embeddings, spatial data      |
| `dotproduct` | Inner product     | Higher = more similar  | When vector magnitude matters       |

## Error Handling

The store throws specific errors for different failure cases:

```typescript
try {
  await store.query({
    indexName: "my-collection",
    queryVector: queryVector,
  });
} catch (error) {
  if (error.message.includes("not found")) {
    console.error("The specified index does not exist");
  } else if (error.message.includes("Invalid identifier")) {
    console.error("Index name contains invalid characters");
  } else {
    console.error("Vector store error:", error.message);
  }
}
```

Common error cases include:

- Invalid index name format
- Index/table not found
- Dimension mismatch between query vector and index
- Empty filter or ids array in delete/update operations
- Mutual exclusivity violations (providing both `id` and `filter`)

## Use Cases

### Embedded Semantic Search

Build offline-capable AI applications with semantic search that runs entirely in-process:

```typescript
const store = new DuckDBVector({
  id: "offline-search",
  path: "./search.duckdb",
});
```

### Local RAG Pipelines

Process sensitive documents locally without sending data to cloud vector databases:

```typescript
const store = new DuckDBVector({
  id: "private-rag",
  path: "./confidential.duckdb",
  dimensions: 1536,
});
```

### Development and Testing

Rapidly prototype vector search features with zero infrastructure:

```typescript
const store = new DuckDBVector({
  id: "dev-store",
  path: ":memory:", // Fast in-memory for tests
});
```

## Related

- [Metadata Filters](https://mastra.ai/reference/rag/metadata-filters)
- [DuckDB Documentation](https://duckdb.org/docs/)