# Search and Indexing

Search lets agents find relevant content in indexed workspace files. When an agent needs to answer a question or find information, it can search the indexed content instead of reading every file.

## How it works

Workspace search has two phases: indexing and querying.

### Indexing

Content must be indexed before it can be searched. When you index a document:

1. The content is tokenized (split into searchable terms)
2. For BM25: term frequencies and document statistics are computed
3. For vector: the content is embedded using your embedder function and stored in the vector store

Each indexed document has:

- **id** - A unique identifier (typically the file path)
- **content** - The text content
- **metadata** - Optional key-value data stored with the document

### Querying

When you search:

1. The query is processed using the same tokenization/embedding as indexing
2. Documents are scored based on relevance to the query
3. Results are ranked by score and returned with the matching content

Workspaces support three search modes: BM25 keyword search, vector semantic search, and hybrid search that combines both.

## BM25 keyword search

BM25 scores documents based on term frequency and document length. It works well for exact matches and specific terminology.

```typescript
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
});
```

For custom BM25 parameters:

```typescript
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: {
    k1: 1.5, // Term frequency saturation (default: 1.5)
    b: 0.75, // Document length normalization (default: 0.75)
  },
});
```

## Vector search

Vector search uses embeddings to find semantically similar content. It requires a vector store and an embedder function.

```typescript
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';
import { PineconeVector } from '@mastra/pinecone';
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  vectorStore: new PineconeVector({
    apiKey: process.env.PINECONE_API_KEY,
    index: 'workspace-index',
  }),
  embedder: async (text: string) => {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: text,
    });
    return embedding;
  },
});
```

## Hybrid search

Configure both BM25 and vector search to enable hybrid mode, which combines keyword matching with semantic understanding.

```typescript
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  vectorStore: pineconeVector,
  embedder: embedderFn,
});
```

## Indexing content

### Manual indexing

Use `workspace.index()` to add content to the search index programmatically:

```typescript
// Basic indexing - path becomes the document ID
await workspace.index('/docs/guide.md', 'Content of the guide...');

// Index with metadata for filtering or context
await workspace.index('/docs/api.md', apiDocContent, {
  metadata: {
    category: 'api',
    version: '2.0',
  },
});
```

Manual indexing is useful when:

- You're indexing content that doesn't come from files (e.g., database records, API responses)
- You want to pre-process or chunk content before indexing (see the sketch below)
- You need to add custom metadata to documents
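For example, a long document pulled from a database or API can be split into chunks and indexed one chunk at a time, so search results point at a specific section. This is a minimal sketch: only `workspace.index()` comes from the API above, while `splitIntoChunks`, `articleText`, and the `/kb/...` document IDs are illustrative placeholders.

```typescript
// Illustrative helper: split text into roughly fixed-size chunks on paragraph boundaries.
function splitIntoChunks(text: string, maxChars = 1500): string[] {
  const chunks: string[] = [];
  let current = '';
  for (const paragraph of text.split('\n\n')) {
    if (current && current.length + paragraph.length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += paragraph + '\n\n';
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// `articleText` stands in for content fetched from a database or API.
const chunks = splitIntoChunks(articleText);

// Index each chunk as its own document so results point at a specific section.
for (const [i, chunk] of chunks.entries()) {
  await workspace.index(`/kb/password-reset/chunk-${i}`, chunk, {
    metadata: { source: 'support-kb', chunkIndex: i },
  });
}
```

Keeping each indexed document short tends to help both BM25 and vector relevance when the source material is long.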
### Auto-indexing

Configure `autoIndexPaths` to automatically index files when the workspace initializes. Each path specifies a directory to index recursively.

```typescript
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  autoIndexPaths: ['/docs', '/support/faq'],
});

await workspace.init(); // Indexes all files in /docs and /support/faq
```

When `init()` is called, all files in the specified directories are read and indexed for search. The file path becomes the document ID.

Paths must be directories, not glob patterns. Use `/docs` to index all files in the docs directory recursively. Glob patterns like `**/*.md` are not supported.

## Searching

Use `workspace.search()` to find relevant content:

```typescript
const results = await workspace.search('password reset');

// Results are ranked by relevance
for (const result of results) {
  console.log(`${result.id}: ${result.score}`);
  console.log(result.content);
}
```

### Search options

```typescript
const results = await workspace.search('authentication flow', {
  topK: 10, // Maximum results (default: 5)
  mode: 'hybrid', // 'bm25' | 'vector' | 'hybrid'
  minScore: 0.5, // Minimum score threshold (0-1)
  vectorWeight: 0.5, // Weight for vector scores in hybrid mode (0-1)
});
```

| Option | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------ |
| `topK` | Maximum number of results to return. Default: 5 |
| `mode` | Search mode: `'bm25'`, `'vector'`, or `'hybrid'`. Defaults to the best available mode based on configuration. |
| `minScore` | Filter out results below this score threshold (0-1). |
| `vectorWeight` | In hybrid mode, how much to weight vector scores vs BM25. 0 = all BM25, 1 = all vector, 0.5 = equal. |

### Search results

Each result contains:

```typescript
interface SearchResult {
  id: string; // Document ID (typically file path)
  content: string; // The matching content
  score: number; // Relevance score (0-1)
  lineRange?: { // Lines where the match was found
    start: number;
    end: number;
  };
  metadata?: Record<string, unknown>; // Metadata stored with the document
  scoreDetails?: { // Score breakdown (hybrid mode only)
    vector?: number;
    bm25?: number;
  };
}
```

**Understanding scores:**

- Scores range from 0 to 1, where 1 is a perfect match
- BM25 scores are normalized based on the best match in the result set
- Vector scores represent cosine similarity between query and document embeddings
- In hybrid mode, scores are combined using the `vectorWeight` parameter

### When to use each mode

| Mode | Best for | Example queries |
| -------- | ------------------------------------ | ------------------------------------------------------------------------ |
| `bm25` | Exact terms, technical queries, code | "useState hook", "404 error", "config.yaml" |
| `vector` | Conceptual queries, natural language | "how to handle user authentication", "best practices for error handling" |
| `hybrid` | General search, unknown query types | Most agent use cases |

## Agent tools

When you configure search on a workspace, agents receive tools for searching and indexing content. See [Workspace Class Reference](https://mastra.ai/reference/workspace/workspace-class/llms.txt) for details.

## Related

- [Workspace Overview](https://mastra.ai/docs/workspace/overview/llms.txt)
- [RAG Overview](https://mastra.ai/docs/rag/overview/llms.txt)
- [Workspace Class Reference](https://mastra.ai/reference/workspace/workspace-class/llms.txt)
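## Complete example

The sketch below pulls the pieces on this page together: a hybrid workspace with auto-indexing, one manually indexed non-file document, and a hybrid search over both. It assumes the same Pinecone and OpenAI setup as the vector search example; the index name, paths, and the `runbookText` variable are placeholders rather than a prescribed configuration.

```typescript
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';
import { PineconeVector } from '@mastra/pinecone';
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  // BM25 plus a vector store and embedder enables hybrid mode
  bm25: true,
  vectorStore: new PineconeVector({
    apiKey: process.env.PINECONE_API_KEY,
    index: 'workspace-index', // placeholder index name
  }),
  embedder: async (text: string) => {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: text,
    });
    return embedding;
  },
  autoIndexPaths: ['/docs'], // index the /docs directory on init
});

// Index everything under /docs
await workspace.init();

// Add a non-file document alongside the auto-indexed files
await workspace.index('/runbooks/password-reset', runbookText, {
  metadata: { category: 'runbook' },
});

// Hybrid search, weighted toward semantic similarity
const results = await workspace.search('how do users reset their password?', {
  mode: 'hybrid',
  topK: 5,
  // Per the option table above, 0.7 weights vector scores more heavily than BM25
  vectorWeight: 0.7,
});

for (const result of results) {
  console.log(`${result.id} (${result.score.toFixed(2)})`);
}
```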