
Building a Code Assistant with Large Context Windows

Mar 5, 2025

The limits of traditional RAG

We recently used Mastra to build Repo Base, a demo that gives you a natural language interface for understanding code repositories.

[Repo Base demo]

When building it, we initially took the standard RAG approach: chunking code files, generating embeddings, and storing vectors. This quickly revealed several technical limitations:

  1. Processing large codebases consumed significant compute resources
  2. Chunking destroyed important semantic relationships between code sections
  3. Vector storage and retrieval added unnecessary architectural complexity
  4. Traditional RAG approaches weren't optimized for code's hierarchical structure

Leveraging Large Context Windows

The breakthrough came with Gemini 2.0 Flash's expanded context window. Instead of complex chunking and embedding pipelines, we could feed entire files directly to the model. Here's how we set it up:

import { google } from "@ai-sdk/google";
import { Agent } from "@mastra/core/agent";

import { memory } from "../memory";
import { instructions } from "./instructions";
import { getFilePaths } from "../tools/getFilePaths";
import { getFileContent } from "../tools/getFileContent";
import { getRepositoryIssues } from "../tools/getRepositoryIssues";
import { getRepositoryCommits } from "../tools/getRepositoryCommits";
import { getRepositoryPullRequests } from "../tools/getRepositoryPullRequests";

export const agent = new Agent({
  name: "agent",
  instructions,
  memory,
  model: google("gemini-2.0-flash-001"),
  tools: {
    getFilePaths,
    getFileContent,
    getRepositoryIssues,
    getRepositoryCommits,
    getRepositoryPullRequests,
  },
});
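
Each tool is a thin wrapper around the GitHub API. The real implementations live in the Repo Base repo, but as a rough sketch, a tool like getFileContent built with Mastra's createTool helper and GitHub's contents endpoint might look something like this (the input shape and error handling are illustrative, not the demo's exact code):

import { createTool } from "@mastra/core/tools";
import { z } from "zod";

export const getFileContent = createTool({
  id: "getFileContent",
  description: "Fetch the raw contents of a single file from a GitHub repository",
  inputSchema: z.object({
    owner: z.string().describe("Repository owner"),
    repo: z.string().describe("Repository name"),
    path: z.string().describe("File path within the repository"),
  }),
  execute: async ({ context }) => {
    const { owner, repo, path } = context;
    // Ask GitHub's contents endpoint for the raw file body.
    // A real tool would also pass an auth token and handle rate limits.
    const res = await fetch(
      `https://api.github.com/repos/${owner}/${repo}/contents/${path}`,
      { headers: { Accept: "application/vnd.github.raw+json" } },
    );
    if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
    return await res.text();
  },
});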

Simplifying with Mastra's Memory System

Rather than building custom database schemas for message history and context management, we leveraged Mastra's memory system:

import { Memory } from "@mastra/memory";
import { PostgresStore } from "@mastra/pg";

export const memory = new Memory({
  storage: new PostgresStore({ connectionString: process.env.DB_URL! }),
  options: { lastMessages: 10 },
});

This handled:

  • Conversation persistence (usage sketched below)
  • Context window management
  • Semantic search across previous interactions
  • File tree relationship tracking
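
Conversation persistence, for example, comes down to passing stable identifiers when calling the agent. A minimal usage sketch, assuming Mastra's resourceId and threadId options (exact option names can vary between versions) and illustrative IDs:

import { agent } from "./agents/agent"; // illustrative import path

// Scoping the conversation to a user (resourceId) and a conversation thread (threadId)
// is what lets Mastra persist messages and recall them on later requests.
const response = await agent.generate("What does the memory module in this repo do?", {
  resourceId: "user-123",         // illustrative user identifier
  threadId: "repo-base-thread-1", // illustrative thread identifier
});

console.log(response.text);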

Out-of-the-box UI

Rather than building a custom chat interface from scratch, we used assistant-ui, which provides pre-built components for chat interactions and thread management. This saved significant development time compared to implementing our own components for message history, typing indicators, and thread state.
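
A typical assistant-ui page just wires a chat runtime into a provider and renders the prebuilt Thread component. The sketch below shows that general pattern rather than our exact UI code; the /api/chat route, the Thread import path, and the Next.js "use client" setup are assumptions:

"use client";

import { AssistantRuntimeProvider } from "@assistant-ui/react";
import { useChatRuntime } from "@assistant-ui/react-ai-sdk";
import { Thread } from "@/components/assistant-ui/thread"; // component generated by the assistant-ui CLI

export default function ChatPage() {
  // The runtime streams messages to and from an API route that calls the Mastra agent.
  const runtime = useChatRuntime({ api: "/api/chat" });

  return (
    <AssistantRuntimeProvider runtime={runtime}>
      <Thread />
    </AssistantRuntimeProvider>
  );
}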

Implementation Details

The system works by:

  1. Maintaining the complete repository structure in memory (see the sketch after this list)
  2. Using file-tree context to understand import/export relationships
  3. Leveraging large context windows to analyze entire files
  4. Applying semantic search for finding relevant code examples
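
Fetching the full file tree takes a single call to GitHub's recursive git trees endpoint. A simplified sketch of a getFilePaths-style helper (the function name, default branch, and filtering are illustrative):

// Hypothetical helper: list every file path in a repository in one request.
export async function listFilePaths(owner: string, repo: string, branch = "main") {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`,
    { headers: { Accept: "application/vnd.github+json" } },
  );
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);

  const data = (await res.json()) as { tree: { path: string; type: string }[] };
  // Keep only blobs (files); "tree" entries are directories.
  return data.tree.filter((entry) => entry.type === "blob").map((entry) => entry.path);
}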

Key Technical Learnings

  1. Context > Chunking: For code understanding, preserving complete file context often beats sophisticated chunking strategies.

  2. Memory Management: Mastra's built-in memory system eliminated the need for custom persistence layers while providing better semantic search capabilities.

  3. Model Selection: Large context window models (like Gemini 2.0 Flash) significantly simplified the architecture by reducing the need for complex RAG pipelines.

  4. File Tree Context: Maintaining repository structure awareness was crucial for accurate code understanding.

The end result is a more maintainable system that better preserves the semantic relationships in code. You can find the code for Repo Base here.

We're excited to see how others use these patterns to build similar tools with Mastra.
