Search and Indexing
Search lets agents find relevant content in indexed workspace files. When an agent needs to answer a question or find information, it can search the indexed content instead of reading every file.
How it works
Workspace search has two phases: indexing and querying.
Indexing
Content must be indexed before it can be searched. When you index a document:
- The content is tokenized (split into searchable terms)
- For BM25: term frequencies and document statistics are computed
- For vector: the content is embedded using your embedder function and stored in the vector store
Each indexed document has:
- id - A unique identifier (typically the file path)
- content - The text content
- metadata - Optional key-value data stored with the document
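To make the tokenization and term-frequency steps concrete, here is a minimal sketch. The `IndexedDocument` interface mirrors the three fields listed above, while `tokenize` and `termFrequencies` are hypothetical helpers written for illustration; the tokenizer Mastra actually uses may behave differently.

```typescript
// Hypothetical shape mirroring the id / content / metadata fields described above.
interface IndexedDocument {
  id: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Naive tokenizer (an assumption): lowercase, then split on anything
// that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

// Count how often each term occurs in a single document.
function termFrequencies(doc: IndexedDocument): Map<string, number> {
  const counts = new Map<string, number>();
  for (const term of tokenize(doc.content)) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  return counts;
}

const guide: IndexedDocument = {
  id: '/docs/guide.md',
  content: 'Reset your password. Password resets expire after one hour.',
  metadata: { category: 'guide' },
};
const guideTerms = termFrequencies(guide);
```

Note that this naive tokenizer treats "reset" and "resets" as different terms; a production tokenizer might apply stemming or stop-word removal.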
Querying
When you search:
- The query is processed using the same tokenization/embedding as indexing
- Documents are scored based on relevance to the query
- Results are ranked by score and returned with the matching content
Workspaces support three search modes: BM25 keyword search, vector semantic search, and hybrid search that combines both.
BM25 keyword search
BM25 scores documents based on how often the query terms appear in each document, how rare those terms are across the index, and document length. It works well for exact matches and specific terminology.
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
});
For custom BM25 parameters:
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: {
    k1: 1.5, // Term frequency saturation (default: 1.5)
    b: 0.75, // Document length normalization (default: 0.75)
  },
});
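To see what k1 and b control, the following sketch implements the textbook Okapi BM25 contribution of a single query term. It is an illustration of the general formula, not Mastra's internal code; the function name `bm25TermScore` and the specific IDF variant are assumptions.

```typescript
interface Bm25Params {
  k1: number; // Term frequency saturation
  b: number; // Document length normalization
}

// Score contribution of one query term for one document (standard BM25 form).
function bm25TermScore(
  tf: number, // occurrences of the term in the document
  docLength: number, // number of terms in the document
  avgDocLength: number, // average document length in the index
  docCount: number, // total number of indexed documents
  docsWithTerm: number, // number of documents containing the term
  params: Bm25Params = { k1: 1.5, b: 0.75 },
): number {
  if (tf === 0) return 0;
  const { k1, b } = params;
  // Rare terms get a higher inverse document frequency.
  const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // b scales how strongly longer-than-average documents are penalized.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // k1 controls how quickly repeated occurrences stop adding to the score.
  return (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
}

// Same term frequency, but the second document is ten times longer.
const shortDocScore = bm25TermScore(3, 100, 100, 1000, 10);
const longDocScore = bm25TermScore(3, 1000, 100, 1000, 10);
```

With b set to 0, document length has no effect on the score; with a lower k1, additional occurrences of a term saturate sooner.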
Vector search
Vector search uses embeddings to find semantically similar content. It requires a vector store and an embedder function.
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';
import { PineconeVector } from '@mastra/pinecone';
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  vectorStore: new PineconeVector({
    apiKey: process.env.PINECONE_API_KEY,
    index: 'workspace-index',
  }),
  embedder: async (text: string) => {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: text,
    });
    return embedding;
  },
});
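Vector scores are described later in this page as cosine similarity between the query and document embeddings. The vector store normally computes this for you, so the sketch below is only meant to show what the number represents; `cosineSimilarity` is a hypothetical helper, not part of the Mastra API.

```typescript
// Cosine similarity: the dot product of two vectors divided by the product of their lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Tiny made-up 3-dimensional "embeddings" for illustration only.
const queryEmbedding = [1, 2, 3];
const sameDirection = cosineSimilarity(queryEmbedding, [2, 4, 6]);
const orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]);
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, and orthogonal vectors score 0.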
Hybrid search
Configure both BM25 and vector search to enable hybrid mode, which combines keyword matching with semantic understanding.
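// pineconeVector and embedderFn stand for a vector store instance and an
// embedder function, such as those in the vector search example.
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  vectorStore: pineconeVector,
  embedder: embedderFn,
});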
Indexing content
Manual indexing
Use workspace.index() to add content to the search index programmatically:
// Basic indexing - path becomes the document ID
await workspace.index('/docs/guide.md', 'Content of the guide...');

// Index with metadata for filtering or context
await workspace.index('/docs/api.md', apiDocContent, {
  metadata: {
    category: 'api',
    version: '2.0',
  },
});
Manual indexing is useful when:
- You're indexing content that doesn't come from files (e.g., database records, API responses)
- You want to pre-process or chunk content before indexing
- You need to add custom metadata to documents
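As an illustration of pre-processing before manual indexing, the sketch below splits a document into paragraph-based chunks. The helper `chunkByParagraph`, the `#chunk-N` ID scheme, and the `source` / `chunkIndex` metadata keys are all hypothetical choices made for this example, not conventions defined by Mastra.

```typescript
interface Chunk {
  id: string;
  content: string;
  metadata: { source: string; chunkIndex: number };
}

// Group consecutive paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkByParagraph(path: string, text: string, maxChars = 500): Chunk[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current = '';

  const flush = (): void => {
    if (current.length === 0) return;
    const chunkIndex = chunks.length;
    chunks.push({
      id: `${path}#chunk-${chunkIndex}`, // hypothetical ID scheme
      content: current,
      metadata: { source: path, chunkIndex },
    });
    current = '';
  };

  for (const paragraph of paragraphs) {
    // Start a new chunk if adding this paragraph (plus separator) would overflow.
    if (current.length > 0 && current.length + paragraph.length + 2 > maxChars) {
      flush();
    }
    current = current.length > 0 ? `${current}\n\n${paragraph}` : paragraph;
  }
  flush();
  return chunks;
}

const exampleChunks = chunkByParagraph('/docs/a.md', 'aaa\n\nbbb\n\nccc', 8);
```

Each chunk could then be passed to workspace.index(chunk.id, chunk.content, { metadata: chunk.metadata }), following the call shape shown above.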
Auto-indexing
Configure autoIndexPaths to automatically index files when the workspace initializes. Each path specifies a directory to index recursively.
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  autoIndexPaths: ['/docs', '/support/faq'],
});

await workspace.init(); // Indexes all files in /docs and /support/faq
When init() is called, all files in the specified directories are read and indexed for search. The file path becomes the document ID.
Paths must be directories, not glob patterns. Use /docs to index all files in the docs directory recursively. Glob patterns like **/*.md are not supported.
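Because glob patterns are not supported, it can help to check a path list before passing it to autoIndexPaths. The following is a small sketch of such a check; `findGlobPatterns` is a hypothetical helper, and whether the workspace performs its own validation is not stated here.

```typescript
// Return every entry that contains glob metacharacters (*, ?, [], {}).
function findGlobPatterns(paths: string[]): string[] {
  return paths.filter((path) => /[*?[\]{}]/.test(path));
}

const candidatePaths = ['/docs', '**/*.md', '/support/faq'];
const rejectedPaths = findGlobPatterns(candidatePaths);
const directoryPaths = candidatePaths.filter((path) => !rejectedPaths.includes(path));
```

If you only want a subset of files, such as Markdown files, one option is to select them yourself and index them manually with workspace.index().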
Searching
Use workspace.search() to find relevant content:
const results = await workspace.search('password reset');

// Results are ranked by relevance
for (const result of results) {
  console.log(`${result.id}: ${result.score}`);
  console.log(result.content);
}
Search options
const results = await workspace.search('authentication flow', {
  topK: 10, // Maximum results (default: 5)
  mode: 'hybrid', // 'bm25' | 'vector' | 'hybrid'
  minScore: 0.5, // Minimum score threshold (0-1)
  vectorWeight: 0.5, // Weight for vector scores in hybrid mode (0-1)
});
| Option | Description |
|---|---|
| topK | Maximum number of results to return. Default: 5 |
| mode | Search mode: 'bm25', 'vector', or 'hybrid'. Defaults to the best available mode based on configuration. |
| minScore | Filter out results below this score threshold (0-1). |
| vectorWeight | In hybrid mode, how much to weight vector scores vs BM25. 0 = all BM25, 1 = all vector, 0.5 = equal. |
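The effect of topK and minScore can be pictured as a filter followed by a sort and a cut-off. The sketch below applies them to an already scored list; `applySearchOptions` is a hypothetical helper, and the order in which the real search applies these options is an assumption.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Drop results below minScore, rank by descending score, keep at most topK.
function applySearchOptions<T extends Scored>(
  results: T[],
  options: { topK?: number; minScore?: number } = {},
): T[] {
  const { topK = 5, minScore = 0 } = options;
  return results
    .filter((result) => result.score >= minScore) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const scored: Scored[] = [
  { id: '/docs/a.md', score: 0.2 },
  { id: '/docs/b.md', score: 0.9 },
  { id: '/docs/c.md', score: 0.6 },
];
const strict = applySearchOptions(scored, { topK: 1, minScore: 0.5 });
const relaxed = applySearchOptions(scored);
```

Here `strict` keeps only the single best result above the threshold, while `relaxed` returns all three results ranked by score.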
Search results
Each result contains:
interface SearchResult {
  id: string; // Document ID (typically file path)
  content: string; // The matching content
  score: number; // Relevance score (0-1)
  lineRange?: { // Lines where the match was found
    start: number;
    end: number;
  };
  metadata?: Record<string, unknown>; // Metadata stored with the document
  scoreDetails?: { // Score breakdown (hybrid mode only)
    vector?: number;
    bm25?: number;
  };
}
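One way to use these fields is to turn each result into a short citation string, for example when building context for an agent. The sketch below repeats the interface so that it is self-contained; `formatResult` and its output format are hypothetical.

```typescript
// Repeated from the interface above so this example stands alone.
interface SearchResult {
  id: string;
  content: string;
  score: number;
  lineRange?: { start: number; end: number };
  metadata?: Record<string, unknown>;
  scoreDetails?: { vector?: number; bm25?: number };
}

// Build "path:start-end (score 0.82)", omitting the line range when it is absent.
function formatResult(result: SearchResult): string {
  const location = result.lineRange
    ? `${result.id}:${result.lineRange.start}-${result.lineRange.end}`
    : result.id;
  return `${location} (score ${result.score.toFixed(2)})`;
}

const withLines = formatResult({
  id: '/docs/auth.md',
  content: 'To reset a password...',
  score: 0.8234,
  lineRange: { start: 10, end: 14 },
});
const withoutLines = formatResult({
  id: '/docs/faq.md',
  content: 'Frequently asked questions',
  score: 0.5,
});
```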
Understanding scores:
- Scores range from 0 to 1, where 1 is a perfect match
- BM25 scores are normalized based on the best match in the result set
- Vector scores represent cosine similarity between query and document embeddings
- In hybrid mode, scores are combined using the vectorWeight parameter
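The normalization and combination steps can be sketched as follows. Dividing by the best raw score and taking a linear weighted sum are assumptions that match the description here (0 = all BM25, 1 = all vector, 0.5 = equal); the exact formulas used internally may differ, and both function names are hypothetical.

```typescript
// Scale raw BM25 scores so the best match in the result set becomes 1.
function normalizeByBest(rawScores: number[]): number[] {
  const best = Math.max(0, ...rawScores);
  return rawScores.map((score) => (best > 0 ? score / best : 0));
}

// Linear blend of a normalized BM25 score and a vector score.
function hybridScore(bm25: number, vector: number, vectorWeight = 0.5): number {
  return vectorWeight * vector + (1 - vectorWeight) * bm25;
}

const normalizedBm25 = normalizeByBest([2, 4, 1]);
const keywordOnly = hybridScore(0.2, 0.8, 0);
const vectorOnly = hybridScore(0.2, 0.8, 1);
const balanced = hybridScore(0.2, 0.8, 0.5);
```

One consequence of normalizing by the best match is that a BM25 score of 1 means "best in this result set", not "perfect match in absolute terms".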
When to use each mode
| Mode | Best for | Example queries |
|---|---|---|
| bm25 | Exact terms, technical queries, code | "useState hook", "404 error", "config.yaml" |
| vector | Conceptual queries, natural language | "how to handle user authentication", "best practices for error handling" |
| hybrid | General search, unknown query types | Most agent use cases |
Agent tools
When you configure search on a workspace, agents receive tools for searching and indexing content. See Workspace Class Reference for details.