Search and Indexing

Added in: @mastra/core@1.1.0

Search lets agents find relevant content in indexed workspace files. When an agent needs to answer a question or find information, it can search the indexed content instead of reading every file.

How it works

Workspace search has two phases: indexing and querying.

Indexing

Content must be indexed before it can be searched. When you index a document:

The content is tokenized (split into searchable terms)
For BM25: term frequencies and document statistics are computed
For vector: the content is embedded using your embedder function and stored in the vector store
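
The tokenization and term-frequency steps above can be sketched as follows. The tokenizer here (lowercase, split on non-alphanumeric characters) and the names `tokenize` and `termFrequencies` are assumptions for illustration; the tokenizer the library actually uses may differ.

```typescript
// Hypothetical tokenizer: lowercase, then split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0)
}

// Count how often each term occurs in a single document.
function termFrequencies(text: string): Map<string, number> {
  const counts = new Map<string, number>()
  for (const term of tokenize(text)) {
    counts.set(term, (counts.get(term) ?? 0) + 1)
  }
  return counts
}

const tf = termFrequencies('Reset your password. Password reset links expire.')
console.log(tf.get('password')) // 2
console.log(tf.get('links')) // 1
```

Because the query goes through the same tokenizer at search time, "Password" and "password" end up as the same term.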

Each indexed document has:

id - A unique identifier (typically the file path)
content - The text content
metadata - Optional key-value data stored with the document

Querying

When you search:

The query is processed using the same tokenization/embedding as indexing
Documents are scored based on relevance to the query
Results are ranked by score and returned with the matching content

Workspaces support three search modes: BM25 keyword search, vector semantic search, and hybrid search that combines both.

BM25 keyword search

BM25 scores documents based on how often the query terms appear in each document, how rare those terms are across the index, and document length. It works well for exact matches and specific terminology.

src/mastra/workspaces.ts
import { Workspace, LocalFilesystem } from '@mastra/core/workspace'

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
})

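To tune BM25, pass an object instead of true. k1 controls term frequency saturation (higher values let repeated terms keep adding to the score), and b controls document length normalization (0 disables it, 1 applies it fully):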

src/mastra/workspaces.ts
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: {
    k1: 1.5,
    b: 0.75,
  },
})
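
To see what k1 and b control, here is a sketch of the textbook BM25 score for a single query term. It uses the common non-negative IDF variant and is not necessarily the exact formula the library implements; the function and parameter names are illustrative.

```typescript
interface Bm25Params {
  k1: number // term frequency saturation
  b: number // document length normalization
}

// Score contribution of one query term for one document (textbook BM25).
function bm25TermScore(
  termFreq: number, // occurrences of the term in the document
  docLength: number, // number of terms in the document
  avgDocLength: number, // average document length across the index
  docCount: number, // total number of documents in the index
  docsWithTerm: number, // number of documents that contain the term
  { k1, b }: Bm25Params,
): number {
  // Rare terms get a higher inverse document frequency.
  const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5))
  // Documents longer than average are penalized in proportion to b.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength)
  return idf * ((termFreq * (k1 + 1)) / (termFreq + k1 * lengthNorm))
}

const params: Bm25Params = { k1: 1.5, b: 0.75 }
// Same term frequency, different document lengths.
const shortDoc = bm25TermScore(3, 100, 200, 50, 5, params)
const longDoc = bm25TermScore(3, 400, 200, 50, 5, params)
console.log(shortDoc > longDoc) // true
```

With b set to 0, both calls would return the same score, since document length no longer enters the formula.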

Vector search

Vector search uses embeddings to find semantically similar content, even when the query and the document share no exact terms. It requires a vector store and an embedder function.

src/mastra/workspaces.ts
import { Workspace, LocalFilesystem } from '@mastra/core/workspace'
import { PineconeVector } from '@mastra/pinecone'
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  vectorStore: new PineconeVector({
    apiKey: process.env.PINECONE_API_KEY,
    index: 'workspace-index',
  }),
  embedder: async (text: string) => {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: text,
    })
    return embedding
  },
})

Hybrid search

Configure both BM25 and vector search to enable hybrid mode, which combines keyword matching with semantic understanding.

src/mastra/workspaces.ts
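// pineconeVector and embedderFn stand for the vector store and embedder
// function configured as in the vector search example above.
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  vectorStore: pineconeVector,
  embedder: embedderFn,
})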

Custom index name

By default, the search index name is derived from the workspace ID. To set a custom name, use searchIndexName:

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  searchIndexName: 'my_workspace_vectors',
})

The index name must be a valid SQL identifier: start with a letter or underscore, contain only letters, numbers, or underscores, and be at most 63 characters long.
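
If index names come from user input or configuration, it can help to check them before constructing the workspace. The following is a minimal sketch of the rule stated above; `isValidIndexName` is a hypothetical helper, not part of the library, and it assumes "letters" means ASCII letters.

```typescript
// Hypothetical helper: one leading letter or underscore, followed by up to
// 62 letters, digits, or underscores (63 characters in total).
function isValidIndexName(name: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9_]{0,62}$/.test(name)
}

console.log(isValidIndexName('my_workspace_vectors')) // true
console.log(isValidIndexName('my-workspace')) // false (hyphen is not allowed)
console.log(isValidIndexName('1st_index')) // false (starts with a digit)
```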

Indexing content

Manual indexing

Use workspace.index() to add content to the search index programmatically. The path you pass becomes the document ID. You can also attach metadata to each document.

// Basic indexing
await workspace.index('/docs/guide.md', 'Content of the guide...')

// Index with metadata for filtering or context
await workspace.index('/docs/api.md', apiDocContent, {
  metadata: {
    category: 'api',
    version: '2.0',
  },
})

Manual indexing is useful when:

You're indexing content that doesn't come from files (e.g., database records, API responses)
You want to pre-process or chunk content before indexing
You need to add custom metadata to documents
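
As an illustration of the chunking case, the sketch below splits a document into paragraphs and indexes each one separately. The `Indexer` interface only mirrors the workspace.index() call shape shown earlier, and the `#chunk-N` ID scheme and the `source` and `chunk` metadata keys are assumptions made for this example.

```typescript
// Structural type for the single method used here; a configured workspace
// is expected to fit this shape, based on the index() calls shown above.
interface Indexer {
  index(
    path: string,
    content: string,
    options?: { metadata?: Record<string, unknown> },
  ): Promise<unknown>
}

// Split text on blank lines and drop empty pieces.
function chunkByParagraph(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
}

// Index each paragraph as its own document and return the number of chunks.
async function indexInChunks(target: Indexer, path: string, text: string): Promise<number> {
  const chunks = chunkByParagraph(text)
  for (let i = 0; i < chunks.length; i++) {
    // Hypothetical ID scheme: suffix the path so every chunk has a unique ID.
    await target.index(`${path}#chunk-${i}`, chunks[i], {
      metadata: { source: path, chunk: i },
    })
  }
  return chunks.length
}
```

Smaller chunks tend to give more focused matches, at the cost of more documents in the index.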

Auto-indexing

Configure autoIndexPaths to automatically index files when the workspace initializes. Each entry can be a directory path (indexed recursively) or a glob pattern for selective indexing.

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  autoIndexPaths: ['/docs', '/support/faq'],
})

await workspace.init()

When init() is called, all matching files are read and indexed for search. The file path becomes the document ID.

Glob patterns let you index specific file types:

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  autoIndexPaths: ['/docs/**/*.md', '/support/**/*.txt'],
})
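
To make the pattern semantics concrete, here is a simplified glob matcher. It assumes the conventional meaning of `*` (any characters within one path segment) and `**/` (zero or more directories); the library's own matcher may support more syntax or behave differently in edge cases, and `globToRegExp` is a hypothetical helper.

```typescript
// Convert a simplified glob into a regular expression.
// Only * and **/ are treated as wildcards; everything else is literal.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*\*\//g, '\u0000') // protect **/ before handling single *
    .replace(/\*/g, '[^/]*') // * stays within one path segment
    .replace(/\u0000/g, '(?:.*/)?') // **/ spans zero or more directories
  return new RegExp(`^${pattern}$`)
}

const markdownDocs = globToRegExp('/docs/**/*.md')
console.log(markdownDocs.test('/docs/guide.md')) // true
console.log(markdownDocs.test('/docs/api/v2/auth.md')) // true
console.log(markdownDocs.test('/docs/notes.txt')) // false
```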

Searching

Use workspace.search() to find relevant content. Results are ranked by relevance score.

const results = await workspace.search('password reset')

for (const result of results) {
  console.log(`${result.id}: ${result.score}`)
  console.log(result.content)
}

Search options

You can customize the search behavior with options:

const results = await workspace.search('authentication flow', {
  topK: 10,
  mode: 'hybrid',
  minScore: 0.5,
  vectorWeight: 0.5,
})

Option	Description
`topK`	Maximum number of results to return. Default: 5
`mode`	Search mode: `'bm25'`, `'vector'`, or `'hybrid'`. Defaults to the best available mode based on configuration.
`minScore`	Filter out results below this score threshold (0-1).
`vectorWeight`	In hybrid mode, how much to weight vector scores vs BM25. 0 = all BM25, 1 = all vector, 0.5 = equal.
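
Conceptually, minScore and topK act as a filter and a cutoff on the ranked results. The sketch below shows that effect on plain objects; applying the filter before the cutoff is an assumption, and `applyOptions` is a hypothetical helper rather than a library function.

```typescript
interface Scored {
  id: string
  score: number
}

// Drop results below minScore, rank by score, then keep at most topK.
function applyOptions(results: Scored[], topK = 5, minScore = 0): Scored[] {
  return results
    .filter((r) => r.score >= minScore) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}

const ranked = applyOptions(
  [
    { id: '/docs/intro.md', score: 0.2 },
    { id: '/docs/auth.md', score: 0.9 },
    { id: '/docs/login.md', score: 0.6 },
  ],
  10,
  0.5,
)
console.log(ranked.map((r) => r.id)) // [ '/docs/auth.md', '/docs/login.md' ]
```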

Search results

Each result contains:

interface SearchResult {
  id: string // Document ID (typically file path)
  content: string // The matching content
  score: number // Relevance score (0-1)
  lineRange?: {
    // Lines where the match was found
    start: number
    end: number
  }
  metadata?: Record<string, unknown> // Metadata stored with the document
  scoreDetails?: {
    // Score breakdown (hybrid mode only)
    vector?: number
    bm25?: number
  }
}
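
As a usage sketch, a result can be turned into a compact citation string using the optional lineRange field. The interface is repeated so the example is self-contained, and `formatResult` is a hypothetical helper.

```typescript
interface SearchResult {
  id: string
  content: string
  score: number
  lineRange?: { start: number; end: number }
  metadata?: Record<string, unknown>
  scoreDetails?: { vector?: number; bm25?: number }
}

// Build a "path:start-end (score x.xx)" label, omitting the line range
// when the result does not include one.
function formatResult(result: SearchResult): string {
  const location = result.lineRange
    ? `${result.id}:${result.lineRange.start}-${result.lineRange.end}`
    : result.id
  return `${location} (score ${result.score.toFixed(2)})`
}

const label = formatResult({
  id: '/docs/auth.md',
  content: 'To reset a password...',
  score: 0.8123,
  lineRange: { start: 3, end: 7 },
})
console.log(label) // /docs/auth.md:3-7 (score 0.81)
```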

Understanding scores:

Scores range from 0 to 1, where 1 is a perfect match
BM25 scores are normalized based on the best match in the result set
Vector scores represent cosine similarity between query and document embeddings
In hybrid mode, scores are combined using the vectorWeight parameter
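
The scoring rules above can be sketched in a few lines. The linear blend in `hybridScore` is an assumption that matches the documented meaning of vectorWeight (0 = all BM25, 1 = all vector); the library's exact combination may differ, and all function names here are illustrative.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Divide raw BM25 scores by the best one so the top match scores 1.
function normalizeBm25(rawScores: number[]): number[] {
  const best = Math.max(...rawScores, 0)
  return rawScores.map((s) => (best > 0 ? s / best : 0))
}

// Assumed linear blend: vectorWeight = 0 is all BM25, 1 is all vector.
function hybridScore(vector: number, bm25: number, vectorWeight = 0.5): number {
  return vectorWeight * vector + (1 - vectorWeight) * bm25
}

const [bm25A, bm25B] = normalizeBm25([2, 4]) // [0.5, 1]
const vectorA = cosineSimilarity([1, 0], [1, 0]) // identical direction: 1
console.log(hybridScore(vectorA, bm25A, 0.5)) // 0.75
```

Because BM25 scores are normalized against the best match in the result set, they are only comparable within a single query, not across queries.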

When to use each mode

Mode	Best for	Example queries
`bm25`	Exact terms, technical queries, code	"useState hook", "404 error", "config.yaml"
`vector`	Conceptual queries, natural language	"how to handle user authentication", "best practices for error handling"
`hybrid`	General search, unknown query types	Most agent use cases

Agent tools

When you configure search on a workspace, agents receive tools for searching and indexing content. See the Workspace class reference for details.
