Library Documentation Generator

The Problem We Solve

AI coding assistants like Claude Code and Cursor are revolutionizing development, but they face a critical challenge: outdated library knowledge. When LLMs reference deprecated syntax or miss new APIs due to knowledge cutoffs, development slows down.

Current solutions like MCP Context7 or direct documentation fetching are inefficient and token-heavy. Production projects with version-locked dependencies need precise, version-specific documentation - not the latest docs from the web.

Our Solution: An intelligent documentation generator that creates concise, LLM-optimized API references directly from any GitHub repository. It analyzes the codebase and produces a condensed context index containing hundreds or thousands of API methods. Commit the generated file to your repo so your AI tools always have accurate, version-specific references.

Key Features

Comprehensive API Extraction - Aims to capture every public method, not just summaries (300+ for lodash, 200+ for date-fns)
Multi-Phase Intelligence - Progressively extracts from docs → TypeScript definitions → source code
LLM-Optimized Output - Structured markdown designed for efficient token usage
Version-Specific Documentation - Generate docs for your exact dependency versions
Language Agnostic - Works best with TypeScript/JavaScript, supports Python, Go, and more
Automatic Chunking - Handles large codebases by intelligently chunking content
Built with Mastra AI Framework - Leverages AI agents for intelligent documentation analysis

Generated Documentation Format

Each generated document follows this structure:

    ## [Library Name] - Condensed Context Index

    ### Overall Purpose
    [2-3 sentence comprehensive description of the library]

    ### Core Concepts & Capabilities
    [6-8 bullet points covering main features and concepts]

    ### Key APIs / Components / Configuration / Patterns
    [EXHAUSTIVE list - 50, 100, 300+ entries depending on library size]
    * `methodName(params)` - Brief description
    * `anotherMethod(args)` - What it does
    * ... [continues for ALL public APIs]

    ### Common Patterns & Best Practices / Pitfalls
    [4-6 bullet points of usage patterns and gotchas]
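
As a rough illustration of this layout, the sketch below models one generated document as a TypeScript object and renders it to markdown. The `ContextIndex` and `ApiEntry` types and the `renderContextIndex` function are hypothetical names chosen for this example; they are not taken from the tool's source.

```typescript
// Hypothetical shape of one generated document; field names are illustrative.
interface ApiEntry {
  signature: string; // e.g. "methodName(params)"
  description: string; // brief description of what it does
}

interface ContextIndex {
  libraryName: string;
  overallPurpose: string;
  coreConcepts: string[];
  apis: ApiEntry[];
  patterns: string[];
}

// Render the index using the four-section layout shown above.
function renderContextIndex(doc: ContextIndex): string {
  const bullets = (items: string[]): string =>
    items.map((item) => `* ${item}`).join("\n");

  return [
    `## ${doc.libraryName} - Condensed Context Index`,
    "",
    "### Overall Purpose",
    doc.overallPurpose,
    "",
    "### Core Concepts & Capabilities",
    bullets(doc.coreConcepts),
    "",
    "### Key APIs / Components / Configuration / Patterns",
    bullets(doc.apis.map((api) => `\`${api.signature}\` - ${api.description}`)),
    "",
    "### Common Patterns & Best Practices / Pitfalls",
    bullets(doc.patterns),
  ].join("\n");
}
```

Keeping the API list as structured entries until the final render step makes it easy to merge and de-duplicate results from several extraction passes.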

How It Works: Intelligent Multi-Phase Extraction

Our AI-powered system uses a progressive extraction strategy:

Phase 1: Documentation Mining

Fetches markdown documentation (README.md, API.md, docs/*.md)
Intelligently skips non-API files (CHANGELOG, CONTRIBUTING, etc.)
Extracts API signatures from code blocks and inline snippets
Parses API reference tables and method listings
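
A minimal sketch of two of these steps, file filtering and inline signature extraction, might look like the following. The function names, the regular expressions, and the skip list beyond CHANGELOG and CONTRIBUTING are assumptions made for illustration; the actual tool may work differently.

```typescript
// File names treated as non-API. Only CHANGELOG and CONTRIBUTING are named in
// this README; the remaining entries are assumptions.
const NON_API_FILES = /^(changelog|contributing|code_of_conduct|license|security)(\.|$)/i;

// Keep markdown files unless their base name is on the skip list.
function isApiDocFile(path: string): boolean {
  const parts = path.split("/");
  const name = parts[parts.length - 1];
  if (!/\.md$/i.test(name)) return false;
  return !NON_API_FILES.test(name);
}

// Collect call-like inline code spans such as `_.chunk(array, size)`.
// Spans without parentheses (for example `req.params`) are ignored here.
function extractInlineSignatures(markdown: string): string[] {
  const pattern = /`([A-Za-z_$][\w$.]*\([^`()]*\))`/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    if (found.indexOf(match[1]) === -1) found.push(match[1]);
  }
  return found;
}
```

A regex pass like this is cheap and can run before any model call, which helps decide whether later phases are needed.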

Phase 2: TypeScript Definitions (if < 30 APIs found)

Fetches .d.ts files which contain complete type definitions
Extracts all exported functions, interfaces, and types
For TypeScript libraries, often captures 100% of the API surface
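
To give a feel for this phase, here is a simplified, regex-based sketch that lists exported declarations from the text of a `.d.ts` file. The `extractExportedSymbols` name is hypothetical, and the approach is an assumption: a production implementation would more likely use a real parser, and this version does not follow re-exports such as `export { a, b }` or `export * from "./x"`.

```typescript
interface ExportedSymbol {
  kind: string; // "function", "class", "interface", "type", "const" or "enum"
  name: string;
}

// Scan line starts for `export [declare] [default] [abstract] <kind> <name>`.
function extractExportedSymbols(dts: string): ExportedSymbol[] {
  const pattern =
    /^[ \t]*export\s+(?:declare\s+)?(?:default\s+)?(?:abstract\s+)?(function|class|interface|type|const|enum)\s+([A-Za-z_$][\w$]*)/gm;
  const symbols: ExportedSymbol[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(dts)) !== null) {
    symbols.push({ kind: match[1], name: match[2] });
  }
  return symbols;
}
```

Declarations that are not exported, such as `declare const internal: number;`, are skipped because the pattern requires the `export` keyword.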

Phase 3: Source Code Analysis (fallback)

Analyzes main source files (index.js, main.ts)
Parses package.json for entry points
Extracts exported functions and classes directly
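
One plausible way to pick the source files to analyze is sketched below. It reads the conventional `main` and `module` fields of package.json and then falls back to the file names mentioned above. The `resolveEntryPoints` function and the ordering of candidates are assumptions for illustration only.

```typescript
// Subset of package.json fields used by this sketch.
interface PackageManifest {
  main?: string;
  module?: string;
}

// Return candidate entry files in priority order, without duplicates.
function resolveEntryPoints(packageJsonText: string): string[] {
  const pkg = JSON.parse(packageJsonText) as PackageManifest;
  const candidates = [pkg.module, pkg.main, "index.js", "main.ts"];
  const entries: string[] = [];
  for (const candidate of candidates) {
    if (typeof candidate !== "string" || candidate.length === 0) continue;
    const normalized = candidate.replace(/^\.\//, ""); // "./lib/a.js" -> "lib/a.js"
    if (entries.indexOf(normalized) === -1) entries.push(normalized);
  }
  return entries;
}
```

For a manifest containing `"main": "./lib/index.js"`, the function returns `lib/index.js` first, followed by the two fallback names.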

Intelligent Processing

Token Management: Automatically chunks large content (>50K tokens)
Retry Logic: Implements exponential backoff for API calls
Error Recovery: Continues processing even if individual chunks fail
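
The retry and error-recovery behavior described above could be sketched roughly as follows. All names, the base delay of 500 ms, the 8-second cap, and the limit of four attempts are assumptions chosen for the example; the README does not state the tool's actual values.

```typescript
// Delay before retrying after failed attempt number `attempt` (0-based):
// base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(() => resolve(), ms));
}

// Run `task`, retrying with exponential backoff; rethrow the last error
// once all attempts are used up.
async function withRetry<T>(task: () => Promise<T>, maxAttempts: number = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}

// Process every chunk. A chunk that still fails after its retries is
// recorded by index and skipped, so the remaining chunks are still handled.
async function processChunks<T>(
  chunks: string[],
  handle: (chunk: string) => Promise<T>
): Promise<{ results: T[]; failed: number[] }> {
  const results: T[] = [];
  const failed: number[] = [];
  for (let i = 0; i < chunks.length; i++) {
    try {
      results.push(await withRetry(() => handle(chunks[i])));
    } catch (error) {
      failed.push(i);
    }
  }
  return { results, failed };
}
```

Returning the indexes of failed chunks, rather than throwing, lets the caller report partial coverage while still producing output.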

Real-World Examples

lodash (300+ methods extracted)

    ### Key APIs / Components / Configuration / Patterns
    * `_.chunk(array, size)` - Creates array of elements split into groups
    * `_.compact(array)` - Creates array with falsy values removed
    * `_.concat(array, values)` - Creates new array concatenating values
    * `_.debounce(func, wait, options)` - Creates debounced function
    * `_.difference(array, values)` - Creates array excluding values
    ... [300+ more methods]

date-fns (200+ functions extracted)

    ### Key APIs / Components / Configuration / Patterns
    * `format(date, formatString)` - Formats date according to string
    * `addDays(date, amount)` - Adds specified number of days
    * `differenceInDays(dateLeft, dateRight)` - Gets difference in days
    * `parseISO(dateString)` - Parses ISO 8601 string to Date
    ... [200+ more functions]

express.js (50+ APIs extracted)

    ### Key APIs / Components / Configuration / Patterns
    * `app.get(path, callback)` - Routes HTTP GET requests
    * `app.post(path, callback)` - Routes HTTP POST requests
    * `app.use(middleware)` - Mounts middleware function
    * `req.params` - Route parameters object
    * `res.json(body)` - Sends JSON response
    ... [50+ more APIs]

Quick Start

Prerequisites

Node.js 20.9.0 or higher
OpenAI API key

Installation & Setup

    # Clone the repository
    git clone https://github.com/yourusername/lib-docs-generator
    cd lib-docs-generator

    # Install dependencies
    npm install

    # Configure OpenAI API key
    echo "OPENAI_API_KEY=your_api_key_here" > .env

    # Build the CLI tool
    npm run build-cli

Generate Documentation

    # Basic usage
    npm run cli https://github.com/owner/repository

    # Real examples with expected output
    npm run cli https://github.com/lodash/lodash
    # ✅ Generates 300+ utility methods documentation

    npm run cli https://github.com/date-fns/date-fns
    # ✅ Generates 200+ date manipulation functions

    npm run cli https://github.com/expressjs/express
    # ✅ Generates 50+ web framework APIs

    npm run cli https://github.com/axios/axios
    # ✅ Generates HTTP client methods and config options

Output

Console: Displays progress and final documentation
File: Saves to {repository-name}-context-index.md
Logs: Detailed execution logs in logs/workflow.log
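
As an illustration of the file naming rule, the sketch below derives the output file name from a repository URL. The `outputFileName` function and its URL pattern are hypothetical; how the tool itself extracts the repository name is not specified here.

```typescript
// Map "https://github.com/<owner>/<repo>" to "<repo>-context-index.md".
// A trailing ".git" or "/" on the URL is tolerated.
function outputFileName(repoUrl: string): string {
  const match = /^https:\/\/github\.com\/[^/]+\/([^/#?]+?)(?:\.git)?\/?$/.exec(repoUrl);
  if (match === null) {
    throw new Error(`Not a GitHub repository URL: ${repoUrl}`);
  }
  return `${match[1]}-context-index.md`;
}
```

Under these assumptions, `https://github.com/date-fns/date-fns` maps to `date-fns-context-index.md`.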

Language Support

Excellent Support (80-100% API coverage)

TypeScript/JavaScript - Complete extraction from .d.ts files
Well-documented libraries - Any language with comprehensive markdown
Python - Good support with documented APIs
Go - Extracts from README and doc comments

Good Support (40-80% API coverage)

Java - Markdown docs (Javadoc parsing limited)
Ruby - Markdown extraction (RDoc not supported)
Rust - README docs (docs.rs not scraped)
C/C++ - Markdown only (Doxygen not parsed)

Limited Support

External documentation sites
Proprietary doc formats
Binary-only libraries

Architecture

Core Agent

comprehensive-doc-generator - Intelligent agent that orchestrates the entire extraction process
- Uses GPT-4 for content analysis
- Implements adaptive extraction strategy
- Manages token limits and chunking

Core Tools

fetch-all-docs - Multi-mode file fetcher
- Supports docs/types/source search modes
- Intelligent file filtering (skips CHANGELOG, etc.)
- Handles large repositories efficiently
extract-all-apis - Universal API extractor
- Parses multiple languages and formats
- Extracts from markdown, TypeScript, JavaScript
- Identifies function signatures and descriptions
fetch-repo-content - GitHub content fetcher
- Direct file access via GitHub API
- Handles rate limiting gracefully
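
For direct file access, the GitHub REST API exposes repository files under `/repos/{owner}/{repo}/contents/{path}`. The sketch below shows how a repository URL could be turned into such a request URL. The helper names `parseRepoUrl` and `contentsApiUrl` are hypothetical and are not taken from the fetch-repo-content tool itself.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Split "https://github.com/<owner>/<repo>" into its two parts.
function parseRepoUrl(url: string): RepoRef {
  const match = /^https:\/\/github\.com\/([^/]+)\/([^/#?]+?)(?:\.git)?\/?$/.exec(url);
  if (match === null) {
    throw new Error(`Not a GitHub repository URL: ${url}`);
  }
  return { owner: match[1], repo: match[2] };
}

// Build the contents endpoint URL for one file, encoding each path segment.
function contentsApiUrl(ref: RepoRef, path: string): string {
  const encodedPath = path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://api.github.com/repos/${ref.owner}/${ref.repo}/contents/${encodedPath}`;
}
```

Encoding each segment separately keeps the `/` separators intact while still escaping spaces and other special characters in file names.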

Workflow Pipeline

generate-context-index - Three-phase workflow
1. fetch-docs - Extracts from documentation
2. fetch-types - Augments with TypeScript definitions
3. generate-final-docs - Produces final markdown
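
Read together with the phase descriptions under How It Works, the control flow might be sketched as below. The `Extractor` callback stands in for the real fetch and extraction steps and is purely hypothetical. The README states the 30-API threshold only for the TypeScript phase, so applying the same threshold before the source-code fallback is an assumption.

```typescript
type Phase = "docs" | "types" | "source";

// Hypothetical callback: returns the API signatures found in one phase.
type Extractor = (phase: Phase) => string[];

const MIN_APIS = 30; // later phases run only while fewer APIs than this are found

function runProgressiveExtraction(extract: Extractor): { apis: string[]; phasesRun: Phase[] } {
  const phases: Phase[] = ["docs", "types", "source"];
  const apis: string[] = [];
  const phasesRun: Phase[] = [];
  for (const phase of phases) {
    phasesRun.push(phase);
    // Merge this phase's results, dropping signatures already collected.
    for (const api of extract(phase)) {
      if (apis.indexOf(api) === -1) apis.push(api);
    }
    if (apis.length >= MIN_APIS) break; // enough coverage, skip later phases
  }
  return { apis, phasesRun };
}
```

A library whose markdown docs already yield 30 or more signatures would stop after the first phase, which keeps token usage down for well-documented projects.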

Supporting Infrastructure

Logger - Comprehensive execution logging
Retry Logic - Exponential backoff for resilience
Token Management - Automatic content chunking for large repos

Performance & Limitations

Performance Characteristics

Processing Time: 30-120 seconds for most libraries
Token Usage: ~10K-50K tokens per generation
API Extraction Rate: 50-500+ APIs per minute
Content Handling: Automatic chunking for >50K token documents
Retry Logic: Exponential backoff recovers from transient API failures

Token Limits & Chunking

Documents >200KB are automatically chunked
Each chunk processes ~50K tokens independently
Final output combines all extracted APIs
Large libraries may have truncated API lists
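
The two thresholds are consistent if one assumes roughly 4 characters per token, since 200KB of text then comes to about 50K tokens. The sketch below uses that heuristic to split text on line boundaries. The ratio, the constant names, and the `chunkByTokens` function are assumptions for illustration; the tool may count tokens with a real tokenizer.

```typescript
const CHARS_PER_TOKEN = 4; // rough heuristic, not an exact tokenizer
const MAX_CHUNK_TOKENS = 50000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split text into chunks of at most `maxTokens` estimated tokens, breaking
// only between lines. A single line longer than the limit becomes its own
// oversized chunk. Joining the chunks reproduces the original text.
function chunkByTokens(text: string, maxTokens: number = MAX_CHUNK_TOKENS): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const lines = text.match(/[^\n]*\n|[^\n]+/g) || []; // lines keep their "\n"
  const chunks: string[] = [];
  let current = "";
  for (const line of lines) {
    if (current.length > 0 && current.length + line.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += line;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Breaking on line boundaries avoids cutting a signature or code sample in half, at the cost of chunks that are slightly uneven in size.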

Known Limitations

Language-Specific

Java: Javadoc parsing not implemented
C/C++: Doxygen comments not parsed
Python: Docstring extraction limited
Ruby: RDoc/YARD not supported

General Constraints

External documentation sites not scraped
Files >1MB automatically skipped
GitHub API rate limit: 60 requests/hour (unauthenticated)
Comment-only documentation may be missed
Binary or compiled libraries not supported

Troubleshooting

Few or No APIs Extracted

Cause: Library uses external documentation
Solution: Check if docs are on a separate website
Alternative: Try running on a different version tag

Process Timeouts

Cause: Very large repository or slow API
Solution: Repository may be too large; try a specific subdirectory
Note: Processing can take 2-3 minutes for large libraries

OpenAI API Errors

    # Verify API key is set
    echo $OPENAI_API_KEY

    # Check key validity
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"

Rate Limiting

GitHub: 60 requests/hour without authentication
OpenAI: Check your plan's rate limits
Solution: Add delays between runs or authenticate GitHub
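
When adding delays, the wait time can be derived from the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that GitHub returns with API responses. The following sketch computes how long to pause before the next request. The `rateLimitWaitMs` function is a hypothetical helper, not part of the tool.

```typescript
// Return how many milliseconds to wait before the next GitHub request.
// `x-ratelimit-reset` is a Unix timestamp in seconds; header names are
// expected in lowercase. Missing or unparsable headers mean "do not wait".
function rateLimitWaitMs(
  headers: Record<string, string | undefined>,
  nowMs: number
): number {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSeconds = Number(headers["x-ratelimit-reset"]);
  if (isNaN(remaining) || isNaN(resetSeconds) || remaining > 0) return 0;
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the calculation easy to test.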

Debug Mode

    # Enable detailed logging
    export DEBUG=true
    npm run cli https://github.com/owner/repo

    # Check logs for details
    tail -f logs/workflow.log

Future Enhancements

Planned Features

GitHub Authentication - Higher API rate limits
Version Tags - Generate docs for specific releases
Incremental Updates - Only regenerate changed sections
Multiple Output Formats - JSON, YAML, custom templates
Language-Specific Parsers - Javadoc, RDoc, Doxygen
External Doc Sites - Scrape docs.rs, pkg.go.dev, etc.
Caching Layer - Reuse processed documentation
Web UI - Browser-based generation interface

Contributing

We welcome contributions! Key areas:

Language Support - Add parsers for new languages
Documentation Formats - Support more doc standards
Performance - Optimize token usage and processing
Testing - Add test coverage for various libraries

See CONTRIBUTING.md for guidelines.

License

ISC License - See LICENSE for details

Acknowledgments

Mastra - AI workflow orchestration framework
OpenAI GPT-4 - Intelligent content analysis
GitHub API - Repository content access
Open Source Community - Inspiration from hundreds of well-documented libraries

Support & Feedback

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@example.com

Made with ❤️ for developers who value great documentation