# Metadata Filters

Mastra provides a unified metadata filtering syntax across all vector stores, based on MongoDB/Sift query syntax. Each vector store translates these filters into its native format.

## Basic Example

```typescript
import { PgVector } from "@mastra/pg";

// `connectionString` and `queryVector` are assumed to be defined elsewhere.
const store = new PgVector({ id: 'pg-vector', connectionString });

const results = await store.query({
  indexName: "my_index",
  queryVector: queryVector,
  topK: 10,
  filter: {
    category: "electronics", // Simple equality
    price: { $gt: 100 }, // Numeric comparison
    tags: { $in: ["sale", "new"] }, // Array membership
  },
});
```

## Supported Operators

### Basic Comparison

| Operator | Description | Example | Supported by |
| --- | --- | --- | --- |
| `$eq` | Matches values equal to specified value | `{ age: { $eq: 25 } }` | All except Couchbase |
| `$ne` | Matches values not equal | `{ status: { $ne: 'inactive' } }` | All except Couchbase |
| `$gt` | Greater than | `{ price: { $gt: 100 } }` | All except Couchbase |
| `$gte` | Greater than or equal | `{ rating: { $gte: 4.5 } }` | All except Couchbase |
| `$lt` | Less than | `{ stock: { $lt: 20 } }` | All except Couchbase |
| `$lte` | Less than or equal | `{ priority: { $lte: 3 } }` | All except Couchbase |

### Array Operators

| Operator | Description | Example | Supported by |
| --- | --- | --- | --- |
| `$in` | Matches any value in array | `{ category: { $in: ["A", "B"] } }` | All except Couchbase |
| `$nin` | Matches none of the values | `{ status: { $nin: ["deleted", "archived"] } }` | All except Couchbase |
| `$all` | Matches arrays containing all elements | `{ tags: { $all: ["urgent", "high"] } }` | Astra, Pinecone, Upstash, MongoDB |
| `$elemMatch` | Matches array elements meeting criteria | `{ scores: { $elemMatch: { $gt: 80 } } }` | libSQL, PgVector, MongoDB |

### Logical Operators

| Operator | Description | Example | Supported by |
| --- | --- | --- | --- |
| `$and` | Logical AND | `{ $and: [{ price: { $gt: 100 } }, { stock: { $gt: 0 } }] }` | All except Vectorize, Couchbase |
| `$or` | Logical OR | `{ $or: [{ status: "active" }, { priority: "high" }] }` | All except Vectorize, Couchbase |
| `$not` | Logical NOT | `{ price: { $not: { $lt: 100 } } }` | Astra, Qdrant, Upstash, PgVector, libSQL, MongoDB |
| `$nor` | Logical NOR | `{ $nor: [{ status: "deleted" }, { archived: true }] }` | Qdrant, Upstash, PgVector, libSQL, MongoDB |

### Element Operators

| Operator | Description | Example | Supported by |
| --- | --- | --- | --- |
| `$exists` | Matches documents with field | `{ rating: { $exists: true } }` | All except Vectorize, Chroma, Couchbase |

### Custom Operators

| Operator | Description | Example | Supported by |
| --- | --- | --- | --- |
| `$contains` | Text contains substring | `{ description: { $contains: "sale" } }` | Upstash, libSQL, PgVector |
| `$regex` | Regular expression match | `{ name: { $regex: "^test" } }` | Qdrant, PgVector, Upstash, MongoDB |
| `$size` | Array length check | `{ tags: { $size: { $gt: 2 } } }` | Astra, libSQL, PgVector, MongoDB |
| `$geo` | Geospatial query | `{ location: { $geo: { type: "radius", ... } } }` | Qdrant |
| `$datetime` | Datetime range query | `{ created: { $datetime: { range: { gt: "2024-01-01" } } } }` | Qdrant |
| `$hasId` | Vector ID existence check | `{ $hasId: ["id1", "id2"] }` | Qdrant |
| `$hasVector` | Vector existence check | `{ $hasVector: true }` | Qdrant |

## Common Rules and Restrictions

1. Field names cannot:
   - Contain dots (.) unless referring to nested fields
   - Start with $ or contain null characters
   - Be empty strings
2. Values must be:
   - Valid JSON types (string, number, boolean, object, array)
   - Not undefined
   - Properly typed for the operator (e.g., numbers for numeric comparisons)
3. Logical operators:
   - Must contain valid conditions
   - Cannot be empty
   - Must be properly nested
   - Can only be used at top level or nested within other logical operators
   - Cannot be used at field level or nested inside a field
   - Cannot be used inside an operator
   - Valid: `{ "$and": [{ "field": { "$gt": 100 } }] }`
   - Valid: `{ "$or": [{ "$and": [{ "field": { "$gt": 100 } }] }] }`
   - Invalid: `{ "field": { "$and": [{ "$gt": 100 }] } }`
   - Invalid: `{ "field": { "$gt": { "$and": [{...}] } } }`
4. `$not` operator:
   - Must be an object
   - Cannot be empty
   - Can be used at field level or top level
   - Valid: `{ "$not": { "field": "value" } }`
   - Valid: `{ "field": { "$not": { "$eq": "value" } } }`
5. Operator nesting:
   - Logical operators must contain field conditions, not direct operators
   - Valid: `{ "$and": [{ "field": { "$gt": 100 } }] }`
   - Invalid: `{ "$and": [{ "$gt": 100 }] }`

## Store-Specific Notes

### Astra

- Nested field queries are supported using dot notation
- Array fields must be explicitly defined as arrays in the metadata
- Metadata values are case-sensitive

### ChromaDB

- Where filters only return results where the filtered field exists in metadata
- Empty metadata fields are not included in filter results
- Metadata fields must be present for negative matches (e.g., `$ne` won't match documents missing the field)

### Cloudflare Vectorize

- Requires explicit metadata indexing before filtering can be used
- Use `createMetadataIndex()` to index fields you want to filter on
- Up to 10 metadata indexes per Vectorize index
- String values are indexed up to first 64 bytes (truncated on UTF-8 boundaries)
- Number values use float64 precision
- Filter JSON must be under 2048 bytes
- Field names cannot contain dots (.) or start with $
- Field names are limited to 512 characters
- Vectors must be re-upserted after creating new metadata indexes to be included in filtered results
- Range queries may have reduced accuracy with very large datasets (~10M+ vectors)

### libSQL

- Supports nested object queries with dot notation
- Array fields are validated to ensure they contain valid JSON arrays
- Numeric comparisons maintain proper type handling
- Empty arrays in conditions are handled gracefully
- Metadata is stored in a JSONB column for efficient querying

### PgVector

- Full support for PostgreSQL's native JSON querying capabilities
- Efficient handling of array operations using native array functions
- Proper type handling for numbers, strings, and booleans
- Nested field queries use PostgreSQL's JSON path syntax internally
- Metadata is stored in a JSONB column for efficient indexing

### Pinecone

- Metadata field names are limited to 512 characters
- Numeric values must be within the range of ±1e38
- Arrays in metadata are limited to 64KB total size
- Nested objects are flattened with dot notation
- Metadata updates replace the entire metadata object

### Qdrant

- Supports advanced filtering with nested conditions
- Payload (metadata) fields must be explicitly indexed for filtering
- Use `createPayloadIndex()` to index fields you want to filter on:

  ```typescript
  // Index a field before filtering on it
  await store.createPayloadIndex({
    indexName: "my_index",
    fieldName: "source",
    fieldSchema: "keyword", // 'keyword' | 'integer' | 'float' | 'geo' | 'text' | 'bool' | 'datetime' | 'uuid'
  });

  // Now filtering works
  const results = await store.query({
    indexName: "my_index",
    queryVector: queryVector,
    filter: { source: "document-a" },
  });
  ```

- Efficient handling of geo-spatial queries
- Special handling for null and empty values
- Vector-specific filtering capabilities
- Datetime values must be in RFC 3339 format

### Upstash

- 512-character limit for metadata field keys
- Query size is limited (avoid large IN clauses)
- No support for null/undefined values in filters
- Translates to SQL-like syntax internally
- Case-sensitive string comparisons
- Metadata updates are atomic

### MongoDB

- Full support for MongoDB/Sift query syntax for metadata filters
- Supports all standard comparison, array, logical, and element operators
- Supports nested fields and arrays in metadata
- Filtering can be applied to both `metadata` and the original document content using the `filter` and `documentFilter` options, respectively
- `filter` applies to the metadata object; `documentFilter` applies to the original document fields
- No artificial limits on filter size or complexity (subject to MongoDB query limits)
- Indexing metadata fields is recommended for optimal performance

### Couchbase

- Metadata filters are not currently supported. Filtering must be done client-side after retrieving results, or by using the Couchbase SDK's Search capabilities directly for more complex queries.

### Amazon S3 Vectors

- Equality values must be primitives (string/number/boolean). `null`/`undefined`, arrays, objects, and Date are not allowed for equality. Range operators accept numbers or Date (Dates are normalized to epoch ms).
- `$in`/`$nin` require **non-empty arrays of primitives**; Date elements are allowed and normalized to epoch ms. **Array equality** is not supported.
- Implicit AND is canonicalized (`{a:1,b:2}` → `{$and:[{a:1},{b:2}]}`). Logical operators must contain field conditions, use non-empty arrays, and appear only at the root or within other logical operators (not inside field values).
- Keys listed in `nonFilterableMetadataKeys` at index creation are stored but not filterable; this setting is immutable.
- `$exists` requires a boolean value.
- `undefined`/`null`/empty filters are treated as no filter.
- Each metadata key name is limited to 63 characters.
- Total metadata per vector: Up to 40 KB (filterable + non-filterable)
- Total metadata keys per vector: Up to 10
- Filterable metadata per vector: Up to 2 KB
- Non-filterable metadata keys per vector index: Up to 10

## Related

- [Astra](https://mastra.ai/reference/vectors/astra)
- [Chroma](https://mastra.ai/reference/vectors/chroma)
- [Cloudflare Vectorize](https://mastra.ai/reference/vectors/vectorize)
- [libSQL](https://mastra.ai/reference/vectors/libsql)
- [MongoDB](https://mastra.ai/reference/vectors/mongodb)
- [PgVector](https://mastra.ai/reference/vectors/pg)
- [Pinecone](https://mastra.ai/reference/vectors/pinecone)
- [Qdrant](https://mastra.ai/reference/vectors/qdrant)
- [Upstash](https://mastra.ai/reference/vectors/upstash)
- [Amazon S3 Vectors](https://mastra.ai/reference/vectors/s3vectors)
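
As a closing illustration, the logical-operator placement rules listed under Common Rules and Restrictions can be sketched as a small pre-flight check that runs before a filter is sent to a store. This is a minimal sketch, not Mastra's own validation: `checkFilter` and `findLogicalUnderField` are hypothetical helper names, the operator lists are copied from the tables on this page, and individual stores may enforce additional or different rules.

```typescript
// Hypothetical helpers for illustration only; not part of Mastra's API.
type FilterObject = Record<string, unknown>;

// Array-valued logical operators from the "Logical Operators" table.
const LOGICAL = new Set(["$and", "$or", "$nor"]);

// Field-level operators that are only meaningful beneath a field name.
const FIELD_OPERATORS = new Set([
  "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin",
  "$all", "$elemMatch", "$exists", "$contains", "$regex", "$size",
]);

function isPlainObject(value: unknown): value is FilterObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Walks everything beneath a field and reports any $and/$or/$nor found there.
function findLogicalUnderField(value: unknown, path: string): string[] {
  const problems: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      problems.push(...findLogicalUnderField(item, `${path}[${i}]`));
    });
  } else if (isPlainObject(value)) {
    for (const [key, child] of Object.entries(value)) {
      if (LOGICAL.has(key)) {
        problems.push(`${path}: ${key} cannot be nested inside a field`);
      } else {
        problems.push(...findLogicalUnderField(child, `${path}.${key}`));
      }
    }
  }
  return problems;
}

// Checks a filter at the top level or inside a logical operator.
// Returns a list of problems; an empty list means these checks passed.
function checkFilter(filter: FilterObject, path = "filter"): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(filter)) {
    const here = `${path}.${key}`;
    if (LOGICAL.has(key)) {
      // Logical operators need a non-empty array of object conditions.
      if (!Array.isArray(value) || value.length === 0) {
        problems.push(`${here}: expected a non-empty array of conditions`);
        continue;
      }
      value.forEach((condition, i) => {
        const at = `${here}[${i}]`;
        if (isPlainObject(condition)) {
          problems.push(...checkFilter(condition, at));
        } else {
          problems.push(`${at}: each condition must be an object`);
        }
      });
    } else if (key === "$not") {
      // $not must be a non-empty object.
      if (!isPlainObject(value) || Object.keys(value).length === 0) {
        problems.push(`${here}: expected a non-empty object`);
      } else {
        problems.push(...checkFilter(value, here));
      }
    } else if (FIELD_OPERATORS.has(key)) {
      // A direct operator such as { $gt: 100 } with no field name.
      problems.push(`${here}: operator needs a field name`);
    } else {
      // Anything else is treated as a field name (or a store-specific operator).
      problems.push(...findLogicalUnderField(value, here));
    }
  }
  return problems;
}
```

Under these assumptions, the "Valid" examples from the rules produce an empty list, while each "Invalid" example produces one problem. Returning a list of messages rather than throwing on the first violation is a design choice for the sketch; it lets a caller report every issue in a filter at once.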