repomap
Standalone codebase mapping library for efficient LLM context.
Extracted from Aider-AI/aider as a general-purpose tool. Scans a codebase with tree-sitter, extracts symbols (definitions and references), builds a cross-file reference graph, ranks files via PageRank, and formats the output within a configurable token budget.
Conversation-aware: pass conversation history and the library automatically extracts mentioned symbols and filenames to rank the most relevant files for your current task.
Install
pip install repomap
Or from source:
pip install /path/to/repomap
Quick Start
from repomap import RepoMap
# Map a codebase
mapper = RepoMap(root="path/to/repo", max_map_tokens=1500)
result = mapper.get_repo_map()
print(result)
Smart API
The library is designed to integrate with LLM workflows. Pass conversation context so that ranking favors the files relevant to the current task.
With conversation history
from repomap import RepoMap
mapper = RepoMap(root="path/to/repo")
messages = [
    {"role": "user", "content": "I need to fix the authentication in server.js"},
    {"role": "assistant", "content": "Let me check the auth module in src/auth.py"},
    {"role": "user", "content": "Also look at the DatabaseConnection class"},
]
result = mapper.get_repo_map(conversation=messages)
The library automatically extracts server.js, auth.py, and DatabaseConnection from the conversation and boosts files containing those symbols.
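To make the extraction step concrete, here is a minimal sketch of how filenames and identifiers could be pulled out of a message list with regular expressions. It is an illustration only: `sketch_extract`, `FILENAME_RE`, and `IDENT_RE` are hypothetical names, and the patterns are assumptions rather than the rules repomap actually applies.

```python
import re
from pathlib import PurePosixPath

# Hypothetical patterns for illustration; not repomap's actual rules.
# A path-like token ending in a short alphabetic extension.
FILENAME_RE = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,5}\b")
# CamelCase words with at least two segments, or snake_case words.
IDENT_RE = re.compile(
    r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b|\b[a-z]+(?:_[a-z0-9]+)+\b"
)


def sketch_extract(messages):
    """Return (filenames, identifiers) mentioned in a list of chat messages."""
    text = " ".join(m["content"] for m in messages)
    # Keep only the final path component, so "src/auth.py" becomes "auth.py".
    fnames = {PurePosixPath(p).name for p in FILENAME_RE.findall(text)}
    idents = set(IDENT_RE.findall(text))
    return fnames, idents


messages = [
    {"role": "user", "content": "I need to fix the authentication in server.js"},
    {"role": "assistant", "content": "Let me check the auth module in src/auth.py"},
    {"role": "user", "content": "Also look at the DatabaseConnection class"},
]
fnames, idents = sketch_extract(messages)
```

A pattern this loose will also pick up tokens such as abbreviations that merely look like filenames, so a real implementation needs some way to validate candidates against the repository.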
With already-loaded files
When files are already in the LLM's context window, pass them as loaded_context so the library excludes them from the output and uses their identifiers for ranking:
from repomap import RepoMap
mapper = RepoMap(root="path/to/repo")
# Files already loaded in the LLM context
loaded = {
    "src/server.js": "const app = express();\napp.use(auth.middleware)...",
    "src/auth.py": "class AuthHandler:\n def verify(self, token):...",
}
result = mapper.get_repo_map(
    loaded_context=loaded,
    conversation="Fix the login flow in server.js",
)
Files in loaded_context are:
- Excluded from the output (no duplication)
- Scanned for identifiers to improve cross-file ranking
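The two effects listed above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `split_loaded` is a hypothetical helper, and the identifier scan uses a plain regular expression, whereas the library is described as parsing files with tree-sitter.

```python
import re

# Assumption: a crude identifier pattern stands in for real parsing.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_loaded(ranked_files, loaded_context):
    """Return (files still worth showing, identifiers seen in loaded files)."""
    # Effect 1: loaded files are dropped from the output list.
    visible = [f for f in ranked_files if f not in loaded_context]
    # Effect 2: their contents are scanned for identifiers that can
    # later be used to favor files defining those names.
    seen = set()
    for source in loaded_context.values():
        seen.update(IDENT.findall(source))
    return visible, seen


ranked = ["src/server.js", "src/db.py", "src/auth.py"]
loaded = {
    "src/server.js": "const app = express();",
    "src/auth.py": "class AuthHandler:\n    pass",
}
visible, seen = split_loaded(ranked, loaded)
```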
With raw conversation text
mapper = RepoMap(root="path/to/repo")
conv = """
User: How does the CacheManager handle eviction?
Assistant: Let me check the caching module. The LRU policy is implemented
in the CachePolicy class, which is referenced by CacheManager.
"""
result = mapper.get_repo_map(
    conversation=conv,
    max_tokens=1000,
)
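The examples in this section pass `conversation` either as a list of message dictionaries or as a raw string. One plausible way to handle both is to normalize them to a single text blob before extracting mentions. The helper below is a sketch of that idea; `conversation_text` is a hypothetical name and not part of the documented API.

```python
def conversation_text(conversation):
    """Flatten a conversation (None, str, or list of message dicts) to text."""
    if conversation is None:
        return ""
    if isinstance(conversation, str):
        return conversation
    # Assumption: each message is a dict with an optional "content" key.
    return "\n".join(str(m.get("content", "")) for m in conversation)


as_text = conversation_text("How does the CacheManager handle eviction?")
as_list = conversation_text(
    [
        {"role": "user", "content": "Check CacheManager"},
        {"role": "assistant", "content": "See CachePolicy"},
    ]
)
```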
Inspecting extracted mentions
from repomap import extract_mentions, Mentions
text = "The RepoMap class in repo.py uses PageRank from networkx"
mentions: Mentions = extract_mentions(text)
print(mentions.fnames) # {"repo.py"}
print(mentions.idents) # {"RepoMap", "PageRank", "networkx"}
With known files (reduces false positives)
from repomap import extract_mentions
repo_files = {"src/server.js", "src/auth.py", "README.md", "pyproject.toml"}
mentions = extract_mentions(
    "Check server.js for the middleware function",
    known_files=repo_files,
)
# Only matches filenames that actually exist in the repo
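As an illustration of why a known-file set helps, the sketch below keeps only candidates that correspond to a real repository path, matching either the full path or the basename. `filter_known` is a hypothetical helper; whether the library reports basenames or full paths is not stated here, so returning full paths is an assumption.

```python
from pathlib import PurePosixPath


def filter_known(candidates, known_files):
    """Map candidate filename mentions onto paths that exist in the repo."""
    # Index repository paths by basename, e.g. "server.js" -> {"src/server.js"}.
    by_name = {}
    for path in known_files:
        by_name.setdefault(PurePosixPath(path).name, set()).add(path)

    matched = set()
    for cand in candidates:
        if cand in known_files:
            matched.add(cand)  # exact path match
        else:
            matched |= by_name.get(PurePosixPath(cand).name, set())
    return matched


repo_files = {"src/server.js", "src/auth.py", "README.md", "pyproject.toml"}
# "e.g" looks like a filename to a loose pattern but is not in the repo.
matched = filter_known({"server.js", "e.g"}, repo_files)
```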
CLI
repomap /path/to/repo [--tokens 1500] [--exclude dir1 dir2 ...]
# Map the current directory
repomap .
# Map with custom token budget
repomap /path/to/repo --tokens 2000
# Exclude additional directories
repomap . --exclude dist test __pycache__
Supported Languages
The library supports the following 40 languages via tree-sitter queries:
Python, JavaScript, TypeScript, Java, C, C++, C#, Go, Rust, Ruby, PHP, HTML, CSS, JSON, YAML, Markdown, Bash, Lua, Perl, R, Scala, Kotlin, Swift, Elixir, Erlang, Haskell, OCaml, Julia, Dart, Zig, Racket, Clojure, Common Lisp, Elisp, MATLAB, Fortran, Elm, Gleam, HCL, TOML
How It Works
- Scan: Walk the directory tree, filter ignored dirs/extensions
- Extract: Parse each file with tree-sitter to find definitions and references
- Graph: Build a directed graph of cross-file symbol references
- Rank: Run PageRank on the reference graph to identify important files
- Format: Select top files within the token budget, format as a readable tree
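The Graph, Rank, and Format steps above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: `build_edges`, `pagerank`, and `select_within_budget` are hypothetical names, the symbol tables are hand-written instead of coming from tree-sitter, and the per-file token costs are made up.

```python
from collections import defaultdict


def build_edges(defines, references):
    """Create weighted edges from a referencing file to the defining file."""
    definers = defaultdict(set)
    for fname, symbols in defines.items():
        for sym in symbols:
            definers[sym].add(fname)
    edges = defaultdict(float)
    for fname, symbols in references.items():
        for sym in symbols:
            for target in definers.get(sym, ()):
                if target != fname:  # ignore references within one file
                    edges[(fname, target)] += 1.0
    return edges


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted directed graph."""
    nodes = list(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    for _ in range(iterations):
        # Rank held by files with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def select_within_budget(ranked_files, token_cost, budget):
    """Greedily keep the highest-ranked files that fit in the token budget."""
    chosen, used = [], 0
    for fname in ranked_files:
        cost = token_cost[fname]
        if used + cost > budget:
            continue  # too large; a smaller file further down may still fit
        chosen.append(fname)
        used += cost
    return chosen


# Hand-written symbol tables standing in for tree-sitter output.
defines = {
    "auth.py": {"AuthHandler"},
    "db.py": {"DatabaseConnection"},
    "server.py": set(),
}
references = {
    "server.py": ["AuthHandler", "DatabaseConnection"],
    "auth.py": ["DatabaseConnection"],
    "db.py": [],
}
edges = build_edges(defines, references)
ranks = pagerank(defines.keys(), edges)
ordered = sorted(ranks, key=ranks.get, reverse=True)
# Made-up token costs for the formatted entry of each file.
chosen = select_within_budget(
    ordered, {"db.py": 600, "auth.py": 700, "server.py": 300}, budget=1000
)
```

In this toy graph the file whose symbols are referenced most often ranks first, and the greedy selection skips a file that would overflow the budget while still admitting a smaller, lower-ranked one.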
License
MIT