Library Documentation Generator
The Problem We Solve
AI coding assistants like Claude Code and Cursor are revolutionizing development, but they face a critical challenge: outdated library knowledge. When LLMs reference deprecated syntax or miss new APIs due to knowledge cutoffs, development slows down.
Current solutions such as the Context7 MCP server or direct documentation fetching are inefficient and token-heavy. Production projects with version-locked dependencies need precise, version-specific documentation, not the latest docs from the web.
Our Solution: An intelligent documentation generator that creates concise, LLM-optimized API references directly from any GitHub repository. The generated documentation can be committed to your repo, so your AI tools have accurate, version-specific references.
The tool analyzes a codebase and produces a condensed context index containing hundreds or thousands of API methods, giving AI coding assistants a complete, accurate reference.
Key Features
- Comprehensive API Extraction - Aims to capture every public method, not just summaries (300+ for lodash, 200+ for date-fns)
- Multi-Phase Intelligence - Progressively extracts from docs → TypeScript definitions → source code
- LLM-Optimized Output - Structured markdown designed for efficient token usage
- Version-Specific Documentation - Generate docs for your exact dependency versions
- Language Agnostic - Works best with TypeScript/JavaScript, supports Python, Go, and more
- Automatic Chunking - Handles large codebases by intelligently chunking content
- Built with Mastra AI Framework - Leverages AI agents for intelligent documentation analysis
Generated Documentation Format
Each generated document follows this structure:
```markdown
## [Library Name] - Condensed Context Index

### Overall Purpose
[2-3 sentence comprehensive description of the library]

### Core Concepts & Capabilities
[6-8 bullet points covering main features and concepts]

### Key APIs / Components / Configuration / Patterns
[EXHAUSTIVE list - 50, 100, 300+ entries depending on library size]
* `methodName(params)` - Brief description
* `anotherMethod(args)` - What it does
* ... [continues for ALL public APIs]

### Common Patterns & Best Practices / Pitfalls
[4-6 bullet points of usage patterns and gotchas]
```
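If you want to produce or post-process files in this format yourself, the structure maps onto a small data model. The sketch below is illustrative only: `ContextIndex`, `ApiEntry`, `bullets` and `renderContextIndex` are hypothetical names, not part of this tool's code.

```typescript
// Hypothetical data model for one generated context index.
interface ApiEntry {
  signature: string; // e.g. "addDays(date, amount)"
  description: string;
}

interface ContextIndex {
  libraryName: string;
  purpose: string;
  concepts: string[];
  apis: ApiEntry[];
  patterns: string[];
}

// Turn a list of strings into markdown "* " bullets.
function bullets(items: string[]): string {
  return items.map((item) => `* ${item}`).join("\n");
}

// Render the index with the same section headings as the format above.
function renderContextIndex(index: ContextIndex): string {
  const apiLines = index.apis.map((api) => `\`${api.signature}\` - ${api.description}`);
  return [
    `## ${index.libraryName} - Condensed Context Index`,
    "",
    "### Overall Purpose",
    index.purpose,
    "",
    "### Core Concepts & Capabilities",
    bullets(index.concepts),
    "",
    "### Key APIs / Components / Configuration / Patterns",
    bullets(apiLines),
    "",
    "### Common Patterns & Best Practices / Pitfalls",
    bullets(index.patterns),
  ].join("\n");
}

const example = renderContextIndex({
  libraryName: "date-fns",
  purpose: "Modern JavaScript date utility library.",
  concepts: ["Functions return new Date instances"],
  apis: [{ signature: "addDays(date, amount)", description: "Adds specified number of days" }],
  patterns: ["Import only the functions you use"],
});
console.log(example);
```

Keeping the headings as fixed strings means any consumer (or a diff in code review) can locate sections by exact match.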
How It Works: Intelligent Multi-Phase Extraction
Our AI-powered system uses a progressive extraction strategy:
Phase 1: Documentation Mining
- Fetches markdown documentation (README.md, API.md, docs/*.md)
- Intelligently skips non-API files (CHANGELOG, CONTRIBUTING, etc.)
- Extracts API signatures from code blocks and inline snippets
- Parses API reference tables and method listings
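Signature extraction from markdown can be approximated with a regex over inline code spans. This is a minimal sketch, assuming that a call-like span such as `` `_.chunk(array, size)` `` is an API signature; `extractInlineSignatures` is a hypothetical helper, and the tool's actual parsing rules (and its handling of fenced code blocks) may differ. Because table cells use the same inline-code syntax, signatures in API reference tables are picked up too.

```typescript
// Collect call-like inline code spans, e.g. `_.chunk(array, size)` or `app.get(path, callback)`.
function extractInlineSignatures(markdown: string): string[] {
  // identifier (dots allowed) + parenthesised argument list, wrapped in backticks
  const pattern = /`([A-Za-z_$][\w$.]*\([^`()]*\))`/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    found.add(match[1]);
  }
  return Array.from(found); // de-duplicated, in order of first appearance
}

const readme = [
  "## API",
  "| Method | Description |",
  "| --- | --- |",
  "| `_.chunk(array, size)` | Splits an array into groups |",
  "Call `_.compact(array)` to drop falsy values; `_.chunk(array, size)` again.",
  "Not a signature: `lodash`.",
].join("\n");

// returns ["_.chunk(array, size)", "_.compact(array)"]
console.log(extractInlineSignatures(readme));
```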
Phase 2: TypeScript Definitions (if < 30 APIs found)
- Fetches .d.ts files which contain complete type definitions
- Extracts all exported functions, interfaces, and types
- For TypeScript libraries, often captures 100% of the API surface
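A rough, dependency-free approximation of Phase 2 is a line-based regex over `.d.ts` text. This sketch is an assumption about how such extraction could work, not the tool's implementation; `extractDtsExports` is a hypothetical name. A more robust approach would walk the AST produced by the TypeScript compiler API (`ts.createSourceFile`).

```typescript
// Names of exported declarations in a .d.ts file, e.g. "function addDays".
// Line-based sketch: it ignores `export { a, b }` lists and `export * from` re-exports.
function extractDtsExports(source: string): string[] {
  const pattern =
    /^export\s+(?:declare\s+)?(?:default\s+)?(function|class|interface|type|const|let|enum|namespace)\s+([A-Za-z_$][\w$]*)/gm;
  const names = new Set<string>(); // overloads repeat the same name, so de-duplicate
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    names.add(`${match[1]} ${match[2]}`);
  }
  return Array.from(names);
}

const dts = [
  "export declare function addDays(date: Date, amount: number): Date;",
  "export declare function addDays(date: number, amount: number): Date;",
  "export interface FormatOptions { weekStartsOn?: number }",
  "export type Interval = { start: Date; end: Date };",
  "declare const internalHelper: number;", // not exported, so skipped
].join("\n");

// returns ["function addDays", "interface FormatOptions", "type Interval"]
console.log(extractDtsExports(dts));
```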
Phase 3: Source Code Analysis (fallback)
- Analyzes main source files (index.js, main.ts)
- Parses package.json for entry points
- Extracts exported functions and classes directly
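Locating the main source file from `package.json` might look like the sketch below. The precedence order (`exports`, then `module`, then `main`) is an assumption for illustration, and real resolution of `exports` conditions is considerably more involved. The final `index.js` fallback matches npm's documented default when `main` is absent. `findEntryPoint` and `resolveRootExport` are hypothetical helpers.

```typescript
// Subset of package.json fields relevant to locating the main source file.
interface PackageJson {
  main?: string;
  module?: string;
  exports?: string | Record<string, unknown>;
}

// Resolve the package root export: a plain string, or "." with import/require/default conditions.
function resolveRootExport(exportsField: PackageJson["exports"]): string | undefined {
  if (typeof exportsField === "string") return exportsField;
  if (!exportsField) return undefined;
  const root: unknown = exportsField["."] ?? exportsField;
  if (typeof root === "string") return root;
  if (typeof root === "object" && root !== null) {
    const conditions = root as Record<string, unknown>;
    for (const key of ["import", "require", "default"]) {
      const target = conditions[key];
      if (typeof target === "string") return target;
    }
  }
  return undefined; // nested conditions are not handled in this sketch
}

// Pick a candidate entry file, falling back to npm's default of index.js.
function findEntryPoint(pkg: PackageJson): string {
  return resolveRootExport(pkg.exports) ?? pkg.module ?? pkg.main ?? "index.js";
}

console.log(findEntryPoint({ main: "lib/index.js" })); // lib/index.js
console.log(findEntryPoint({ main: "lib/index.js", exports: { ".": { import: "./esm/index.mjs" } } })); // ./esm/index.mjs
```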
Intelligent Processing
- Token Management: Automatically chunks large content (>50K tokens)
- Retry Logic: Implements exponential backoff for API calls
- Error Recovery: Continues processing even if individual chunks fail
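Retry with exponential backoff and per-chunk error recovery can be combined as below. This is a sketch under assumed settings: the 1 s base delay, 30 s cap and 4 attempts are illustrative values, not the tool's configuration, and `withRetry` / `processChunks` are hypothetical names.

```typescript
// Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(() => resolve(), ms));
}

// Retry an async call with exponential backoff, rethrowing the last error if every attempt fails.
async function withRetry<T>(call: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}

// Error recovery: each chunk is retried on its own, and chunks that still fail
// are dropped instead of aborting the whole run.
async function processChunks<R>(chunks: string[], handle: (chunk: string) => Promise<R>): Promise<R[]> {
  const settled = await Promise.allSettled(chunks.map((chunk) => withRetry(() => handle(chunk))));
  const results: R[] = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") results.push(outcome.value);
  }
  return results;
}

console.log([0, 1, 2, 3].map((attempt) => backoffDelay(attempt))); // [1000, 2000, 4000, 8000]
```

Adding random jitter to `backoffDelay` is a common refinement when many requests may retry at the same moment.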
Real-World Examples
lodash (300+ methods extracted)
```markdown
### Key APIs / Components / Configuration / Patterns
* `_.chunk(array, size)` - Creates array of elements split into groups
* `_.compact(array)` - Creates array with falsy values removed
* `_.concat(array, values)` - Creates new array concatenating values
* `_.debounce(func, wait, options)` - Creates debounced function
* `_.difference(array, values)` - Creates array excluding values
... [300+ more methods]
```
date-fns (200+ functions extracted)
```markdown
### Key APIs / Components / Configuration / Patterns
* `format(date, formatString)` - Formats date according to string
* `addDays(date, amount)` - Adds specified number of days
* `differenceInDays(dateLeft, dateRight)` - Gets difference in days
* `parseISO(dateString)` - Parses ISO 8601 string to Date
... [200+ more functions]
```
express.js (50+ APIs extracted)
```markdown
### Key APIs / Components / Configuration / Patterns
* `app.get(path, callback)` - Routes HTTP GET requests
* `app.post(path, callback)` - Routes HTTP POST requests
* `app.use(middleware)` - Mounts middleware function
* `req.params` - Route parameters object
* `res.json(body)` - Sends JSON response
... [50+ more APIs]
```
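To check how many APIs a run actually captured (for comparison with the counts above), you can count the bullets in the generated file. The helper below is a hypothetical sketch that relies only on the section headings of the documented output format; it counts lines starting with "* `" between the Key APIs heading and the next "### " heading.

```typescript
const KEY_APIS_HEADING = "### Key APIs / Components / Configuration / Patterns";

// Count "* `...`" bullets between the Key APIs heading and the next "### " heading.
function countApiEntries(markdown: string): number {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === KEY_APIS_HEADING);
  if (start === -1) return 0;
  let count = 0;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("### ")) break; // reached the next section
    if (lines[i].startsWith("* `")) count++;
  }
  return count;
}

const sample = [
  "### Key APIs / Components / Configuration / Patterns",
  "* `_.chunk(array, size)` - Creates array of elements split into groups",
  "* `_.compact(array)` - Creates array with falsy values removed",
  "* `_.debounce(func, wait, options)` - Creates debounced function",
  "",
  "### Common Patterns & Best Practices / Pitfalls",
  "* `not counted` - belongs to a later section",
].join("\n");

console.log(countApiEntries(sample)); // 3
```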
Quick Start
Prerequisites
- Node.js 20.9.0 or higher
- OpenAI API key
Installation & Setup
```bash
# Clone the repository
git clone https://github.com/yourusername/lib-docs-generator
cd lib-docs-generator

# Install dependencies
npm install

# Configure OpenAI API key
echo "OPENAI_API_KEY=your_api_key_here" > .env

# Build the CLI tool
npm run build-cli
```
Generate Documentation
```bash
# Basic usage
npm run cli https://github.com/owner/repository

# Real examples with expected output
npm run cli https://github.com/lodash/lodash
# ✅ Generates 300+ utility methods documentation

npm run cli https://github.com/date-fns/date-fns
# ✅ Generates 200+ date manipulation functions

npm run cli https://github.com/expressjs/express
# ✅ Generates 50+ web framework APIs

npm run cli https://github.com/axios/axios
# ✅ Generates HTTP client methods and config options
```
Output
- Console: Displays progress and final documentation
- File: Saves to `{repository-name}-context-index.md`
- Logs: Detailed execution logs in `logs/workflow.log`
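As an illustration of the naming scheme, the sketch below derives the file name from a GitHub URL. It assumes that `{repository-name}` is the repository segment of the URL (for example `date-fns` in `https://github.com/date-fns/date-fns`); `outputFileName` is a hypothetical helper, and the tool's exact rules may differ.

```typescript
// Derive "{repository-name}-context-index.md" from a GitHub repository URL.
// Assumption: repository-name is the second path segment, with any ".git" suffix removed.
function outputFileName(repoUrl: string): string {
  const url = new URL(repoUrl);
  const segments = url.pathname.split("/").filter((segment) => segment.length > 0);
  if (url.hostname !== "github.com" || segments.length < 2) {
    throw new Error(`Not a GitHub repository URL: ${repoUrl}`);
  }
  const repo = segments[1].replace(/\.git$/, "");
  return `${repo}-context-index.md`;
}

console.log(outputFileName("https://github.com/date-fns/date-fns")); // date-fns-context-index.md
```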
Language Support
Excellent Support (80-100% API coverage)
- TypeScript/JavaScript - Complete extraction from .d.ts files
- Well-documented libraries - Any language with comprehensive markdown
- Python - Good support with documented APIs
- Go - Extracts from README and doc comments
Good Support (40-80% API coverage)
- Java - Markdown docs (Javadoc parsing limited)
- Ruby - Markdown extraction (RDoc not supported)
- Rust - README docs (docs.rs not scraped)
- C/C++ - Markdown only (Doxygen not parsed)
Limited Support
- External documentation sites
- Proprietary doc formats
- Binary-only libraries
Technical Architecture
Built with Mastra AI Framework
AI Agent
`comprehensive-doc-generator` - Intelligent agent that orchestrates the entire extraction process
- Uses GPT-4 for content analysis
- Implements adaptive extraction strategy
- Manages token limits and chunking
Core Tools
- `fetch-all-docs` - Multi-mode file fetcher
  - Supports docs/types/source search modes
  - Intelligent file filtering (skips CHANGELOG, etc.)
  - Handles large repositories efficiently
- `extract-all-apis` - Universal API extractor
  - Parses multiple languages and formats
  - Extracts from markdown, TypeScript, JavaScript
  - Identifies function signatures and descriptions
- `fetch-repo-content` - GitHub content fetcher
  - Direct file access via GitHub API
  - Handles rate limiting gracefully
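The docs/types/source search modes of `fetch-all-docs` can be pictured as path rules applied to a repository file listing. This sketch is an assumption for illustration: the regexes, the `RepoFile` shape and `selectFiles` are hypothetical, and the size limit reuses the 1MB figure from Known Limitations (interpreted here as 1,048,576 bytes).

```typescript
type FetchMode = "docs" | "types" | "source";

interface RepoFile {
  path: string;
  size: number; // bytes
}

const MAX_FILE_BYTES = 1024 * 1024; // files above ~1MB are skipped

// Illustrative path rules per mode; the real tool's patterns may differ.
const MODE_RULES: Record<FetchMode, RegExp> = {
  docs: /\.(md|mdx)$/i,
  types: /\.d\.ts$/i,
  source: /(^|\/)(index|main)\.(js|mjs|cjs|ts)$/i,
};

// Markdown files that rarely describe the public API.
const NON_API_DOCS = /(^|\/)(CHANGELOG|CONTRIBUTING|CODE_OF_CONDUCT)\.md$/i;

function selectFiles(files: RepoFile[], mode: FetchMode): string[] {
  return files
    .filter((file) => file.size <= MAX_FILE_BYTES)
    .filter((file) => MODE_RULES[mode].test(file.path))
    .filter((file) => mode !== "docs" || !NON_API_DOCS.test(file.path))
    .map((file) => file.path);
}

const files: RepoFile[] = [
  { path: "README.md", size: 5000 },
  { path: "CHANGELOG.md", size: 9000 },
  { path: "docs/api.md", size: 2000000 }, // over the size limit
  { path: "types/index.d.ts", size: 4000 },
  { path: "src/index.ts", size: 3000 },
];

console.log(selectFiles(files, "docs")); // ["README.md"]
```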
Workflow Pipeline
`generate-context-index` - Three-phase workflow
- `fetch-docs` - Extracts from documentation
- `fetch-types` - Augments with TypeScript definitions
- `generate-final-docs` - Produces final markdown
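The control flow between the three steps can be sketched as below. This is plain TypeScript rather than Mastra's workflow API, and `PipelineSteps` and `generateContextIndex` are hypothetical stand-ins for the real steps. The 30-API threshold comes from the Phase 2 description; merging with a `Set` is one simple way to combine results without duplicates.

```typescript
// Hypothetical step signatures standing in for fetch-docs, fetch-types and generate-final-docs.
interface PipelineSteps {
  fetchDocs: (repoUrl: string) => Promise<string[]>; // APIs found in markdown
  fetchTypes: (repoUrl: string) => Promise<string[]>; // APIs found in .d.ts files
  generateFinalDocs: (apis: string[]) => Promise<string>; // final markdown
}

const TYPES_THRESHOLD = 30; // Phase 2 runs only when fewer than 30 APIs were found

async function generateContextIndex(repoUrl: string, steps: PipelineSteps): Promise<string> {
  let apis = await steps.fetchDocs(repoUrl);
  if (apis.length < TYPES_THRESHOLD) {
    const fromTypes = await steps.fetchTypes(repoUrl);
    apis = Array.from(new Set([...apis, ...fromTypes])); // merge, dropping duplicates
  }
  return steps.generateFinalDocs(apis);
}

// Usage with stubbed steps:
const fake: PipelineSteps = {
  fetchDocs: async () => ["a()", "b()"],
  fetchTypes: async () => ["b()", "c()"],
  generateFinalDocs: async (apis) => apis.join(","),
};
generateContextIndex("https://github.com/owner/repository", fake).then((out) => console.log(out)); // a(),b(),c()
```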
Supporting Infrastructure
- Logger - Comprehensive execution logging
- Retry Logic - Exponential backoff for resilience
- Token Management - Automatic content chunking for large repos
Performance & Limitations
Performance Characteristics
- Processing Time: 30-120 seconds for most libraries
- Token Usage: ~10K-50K tokens per generation
- API Extraction Rate: 50-500+ APIs per minute
- Content Handling: Automatic chunking for >50K token documents
- Retry Logic: Exponential backoff recovers from transient API failures
Token Limits & Chunking
- Documents >200KB are automatically chunked
- Each chunk processes ~50K tokens independently
- Final output combines all extracted APIs
- Large libraries may have truncated API lists
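Chunking can be sketched with a character-based token estimate. The sketch assumes roughly 4 characters per token, a common rule of thumb for English text with OpenAI tokenizers rather than an exact count; under that ratio, 200KB (about 200,000 characters) is about 50K tokens, in line with the figures above. `estimateTokens` and `chunkByTokens` are hypothetical helpers, and an exact implementation would count tokens with a tokenizer library.

```typescript
// Rough heuristic (assumption): about 4 characters per token.
const CHARS_PER_TOKEN = 4;
const MAX_TOKENS_PER_CHUNK = 50000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split on blank lines so chunks break between paragraphs; a single paragraph
// longer than the limit is hard-split by characters. Every chunk stays within maxTokens.
function chunkByTokens(text: string, maxTokens = MAX_TOKENS_PER_CHUNK): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n\s*\n/)) {
    const pieces: string[] = [];
    for (let i = 0; i < paragraph.length; i += maxChars) pieces.push(paragraph.slice(i, i + maxChars));
    for (const piece of pieces) {
      const candidate = current ? `${current}\n\n${piece}` : piece;
      if (candidate.length > maxChars && current) {
        chunks.push(current); // current chunk is full; start a new one
        current = piece;
      } else {
        current = candidate;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const text = ["a".repeat(10), "b".repeat(10), "c".repeat(25)].join("\n\n");
console.log(chunkByTokens(text, 5).map((chunk) => chunk.length)); // [10, 10, 20, 5]
```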
Known Limitations
Language-Specific
- Java: Javadoc parsing not implemented
- C/C++: Doxygen comments not parsed
- Python: Docstring extraction limited
- Ruby: RDoc/YARD not supported
General Constraints
- External documentation sites not scraped
- Files >1MB automatically skipped
- GitHub API rate limit: 60 requests/hour (unauthenticated)
- Comment-only documentation may be missed
- Binary or compiled libraries not supported
Troubleshooting
Common Issues & Solutions
Few or No APIs Found
- Cause: Library uses external documentation
- Solution: Check if docs are on a separate website
- Alternative: Try running on a different version tag
Process Timeouts
- Cause: Very large repository or slow API
- Solution: Repository may be too large; try a specific subdirectory
- Note: Processing can take 2-3 minutes for large libraries
OpenAI API Errors
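```bash
# Verify the key is present in .env (written during setup) or exported in your shell
grep -c '^OPENAI_API_KEY=' .env
echo "${OPENAI_API_KEY:+OPENAI_API_KEY is exported}"

# Check key validity (export the key in this shell first if needed)
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```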
Rate Limiting
- GitHub: 60 requests/hour without authentication
- OpenAI: Check your plan's rate limits
- Solution: Add delays between runs (GitHub authentication, which raises the limit, is listed under Planned Features)
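When pacing runs, GitHub's documented `x-ratelimit-remaining` and `x-ratelimit-reset` (UTC epoch seconds) response headers tell you how long to wait. The sketch below computes that delay; `githubWaitMs` is a hypothetical helper, not part of this tool. With `fetch`, the values come from `response.headers.get("x-ratelimit-remaining")` and `response.headers.get("x-ratelimit-reset")`.

```typescript
// Milliseconds to wait before the next GitHub API request, based on rate-limit headers.
function githubWaitMs(headers: Record<string, string | undefined>, nowMs: number = Date.now()): number {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSec = Number(headers["x-ratelimit-reset"]); // UTC epoch seconds
  if (!Number.isFinite(remaining) || !Number.isFinite(resetSec)) return 0; // headers missing
  if (remaining > 0) return 0; // quota left, no need to wait
  return Math.max(0, resetSec * 1000 - nowMs);
}

// Quota exhausted, window resets 60 seconds from "now":
console.log(githubWaitMs({ "x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000060" }, 1700000000000)); // 60000
```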
Debug Mode
```bash
# Enable detailed logging
export DEBUG=true
npm run cli https://github.com/owner/repo

# Check logs for details
tail -f logs/workflow.log
```
Future Enhancements
Planned Features
- GitHub Authentication - Higher API rate limits
- Version Tags - Generate docs for specific releases
- Incremental Updates - Only regenerate changed sections
- Multiple Output Formats - JSON, YAML, custom templates
- Language-Specific Parsers - Javadoc, RDoc, Doxygen
- External Doc Sites - Scrape docs.rs, pkg.go.dev, etc.
- Caching Layer - Reuse processed documentation
- Web UI - Browser-based generation interface
Contributing
We welcome contributions! Key areas:
- Language Support - Add parsers for new languages
- Documentation Formats - Support more doc standards
- Performance - Optimize token usage and processing
- Testing - Add test coverage for various libraries
See CONTRIBUTING.md for guidelines.
License
ISC License - See LICENSE for details
🙏 Acknowledgments
- Mastra - AI workflow orchestration framework
- OpenAI GPT-4 - Intelligent content analysis
- GitHub API - Repository content access
- Open Source Community - Inspiration from hundreds of well-documented libraries
Support & Feedback
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@example.com
Made with ❤️ for developers who value great documentation