trainraider bf80ad0234 Phase 7: Conversation awareness + tree-sitter refactor
- Migrate from flat layout to src layout
- Implement local grep_ast() using tree-sitter directly (grep_the_hash not in released grep-ast)
- Add Tag.from_raw(), frozen dataclass, text/source properties
- Add .repomap-cache to IGNORED_DIRS
- Fix failing tests so that all 21 pass

Conversation features:
- get_repo_map(conversation=...) for auto-extracting mentioned files/identifiers
- get_repo_map(loaded_context=...) for excluding already-loaded files
- extract_mentions() standalone utility
- Mentions dataclass for structured results
- 15 conversation awareness tests
2026-05-16 19:07:18 -04:00
src/repomap Phase 7: Conversation awareness + tree-sitter refactor 2026-05-16 19:07:18 -04:00
tests Phase 7: Conversation awareness + tree-sitter refactor 2026-05-16 19:07:18 -04:00
.gitignore feat: initial repomap library 2026-05-15 23:40:53 -04:00
LICENSE feat: initial repomap library 2026-05-15 23:40:53 -04:00
pyproject.toml Phase 7: Conversation awareness + tree-sitter refactor 2026-05-16 19:07:18 -04:00
README.md Phase 7: Conversation awareness + tree-sitter refactor 2026-05-16 19:07:18 -04:00

repomap

Standalone codebase mapping library for efficient LLM context.

Extracted from Aider-AI/aider as a general-purpose tool. Scans a codebase with tree-sitter, extracts symbols (definitions and references), builds a cross-file reference graph, ranks files via PageRank, and formats the output within a configurable token budget.

Conversation-aware: pass conversation history and the library automatically extracts mentioned symbols and filenames to rank the most relevant files for your current task.

Install

pip install repomap

Or from source:

pip install /path/to/repomap

Quick Start

from repomap import RepoMap

# Map a codebase
mapper = RepoMap(root="path/to/repo", max_map_tokens=1500)
result = mapper.get_repo_map()
print(result)

Smart API

The library is designed to integrate with LLM workflows. Pass conversation context so that files relevant to the current task rank higher in the map.

With conversation history

from repomap import RepoMap

mapper = RepoMap(root="path/to/repo")

messages = [
    {"role": "user", "content": "I need to fix the authentication in server.js"},
    {"role": "assistant", "content": "Let me check the auth module in src/auth.py"},
    {"role": "user", "content": "Also look at the DatabaseConnection class"},
]

result = mapper.get_repo_map(conversation=messages)

The library extracts the filenames server.js and auth.py and the identifier DatabaseConnection from the conversation, then boosts the files that match those names or contain that symbol.
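To picture the boost, here is a minimal sketch of turning extracted mentions into per-file weights. It is not the library's implementation: the helper `mention_weights`, the `file_symbols` mapping, and the boost values are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def mention_weights(file_symbols, mentioned_fnames, mentioned_idents,
                    fname_boost=10.0, ident_boost=5.0):
    """Return {path: weight}. Hypothetical helper, not part of repomap."""
    # Compare by basename so "server.js" matches "src/server.js"
    mentioned_basenames = {PurePosixPath(f).name for f in mentioned_fnames}
    weights = {}
    for path, symbols in file_symbols.items():
        weight = 1.0  # baseline weight for every file
        if PurePosixPath(path).name in mentioned_basenames:
            weight += fname_boost
        # One boost per mentioned identifier that the file defines
        weight += ident_boost * len(symbols & mentioned_idents)
        weights[path] = weight
    return weights


# Assumed input: symbols defined in each file (made-up data)
file_symbols = {
    "src/server.js": {"app", "listen"},
    "src/auth.py": {"AuthHandler", "verify"},
    "src/db.py": {"DatabaseConnection"},
    "src/utils.py": {"slugify"},
}
weights = mention_weights(
    file_symbols,
    mentioned_fnames={"server.js", "src/auth.py"},
    mentioned_idents={"DatabaseConnection"},
)
```

Weights like these could serve as a personalization vector for the ranking step, so that mentioned files start with more importance than unmentioned ones.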

With already-loaded files

If some files are already in the LLM's context window, pass them as loaded_context. The library then omits them from the output and uses their identifiers to improve ranking:

from repomap import RepoMap

mapper = RepoMap(root="path/to/repo")

# Files already loaded in the LLM context
loaded = {
    "src/server.js": "const app = express();\napp.use(auth.middleware)...",
    "src/auth.py": "class AuthHandler:\n    def verify(self, token):...",
}

result = mapper.get_repo_map(
    loaded_context=loaded,
    conversation="Fix the login flow in server.js",
)

Files in loaded_context are:

  • Excluded from the output (no duplication)
  • Scanned for identifiers to improve cross-file ranking
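The two effects listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `split_loaded` is a hypothetical helper, and a plain regex stands in for the tree-sitter parsing that the library uses.

```python
import re

# Naive identifier pattern; it also picks up language keywords such as
# "class" or "const", which a real parser would distinguish.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_loaded(all_files, loaded_context):
    """Hypothetical helper: return (files_to_map, identifiers_seen)."""
    # 1. Loaded files are left out of the map to avoid duplication
    files_to_map = [f for f in all_files if f not in loaded_context]
    # 2. Their contents are still scanned for identifiers
    identifiers = set()
    for source in loaded_context.values():
        identifiers.update(IDENT_RE.findall(source))
    return files_to_map, identifiers


all_files = ["src/server.js", "src/auth.py", "src/db.py"]
loaded = {
    "src/server.js": "const app = express();",
    "src/auth.py": "class AuthHandler:\n    def verify(self, token): ...",
}
files_to_map, identifiers = split_loaded(all_files, loaded)
```

The collected identifiers can then be treated like mentioned symbols, so files that define what the loaded code refers to move up the ranking.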

With raw conversation text

from repomap import RepoMap

mapper = RepoMap(root="path/to/repo")

conv = """
User: How does the CacheManager handle eviction?
Assistant: Let me check the caching module. The LRU policy is implemented
in the CachePolicy class, which is referenced by CacheManager.
"""

result = mapper.get_repo_map(
    conversation=conv,
    max_tokens=1000,
)

Inspecting extracted mentions

from repomap import extract_mentions, Mentions

text = "The RepoMap class in repo.py uses PageRank from networkx"
mentions: Mentions = extract_mentions(text)

print(mentions.fnames)   # {"repo.py"}
print(mentions.idents)   # {"RepoMap", "PageRank", "networkx"}

With known files (reduces false positives)

from repomap import extract_mentions

repo_files = {"src/server.js", "src/auth.py", "README.md", "pyproject.toml"}
mentions = extract_mentions(
    "Check server.js for the middleware function",
    known_files=repo_files,
)
# Only matches filenames that actually exist in the repo
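To show why a known-files list reduces false positives, here is a rough sketch of regex-based mention extraction. The names `MentionSketch` and `extract_mentions_sketch` and both patterns are assumptions made for this example; the real `extract_mentions()` may use different heuristics and return different results.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Tokens shaped like "name.ext" or "dir/name.ext"
FNAME_RE = re.compile(r"[\w/-]+(?:\.\w+)*\.[A-Za-z0-9]{1,5}\b")
# CamelCase words with two or more humps, or snake_case words
IDENT_RE = re.compile(
    r"\b(?:[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+|[a-z0-9]+(?:_[a-z0-9]+)+)\b"
)


@dataclass(frozen=True)
class MentionSketch:
    fnames: frozenset
    idents: frozenset


def extract_mentions_sketch(text, known_files=None):
    candidates = set(FNAME_RE.findall(text))
    if known_files is not None:
        # Index repo paths by basename so "server.js" finds "src/server.js"
        by_name = {}
        for path in known_files:
            by_name.setdefault(PurePosixPath(path).name, set()).add(path)
        resolved = set()
        for cand in candidates:
            if cand in known_files:
                resolved.add(cand)
            else:
                resolved.update(by_name.get(cand, ()))
        candidates = resolved  # anything not in the repo is dropped
    idents = set(IDENT_RE.findall(text))
    return MentionSketch(frozenset(candidates), frozenset(idents))


text = "Check server.js and notes.txt for the DatabaseConnection class"
loose = extract_mentions_sketch(text)
strict = extract_mentions_sketch(
    text, known_files={"src/server.js", "src/auth.py"}
)
```

In this sketch the unfiltered call keeps notes.txt as a candidate filename, while the call with known files keeps only the path that exists in the repo.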

CLI

repomap /path/to/repo [--tokens 1500] [--exclude dir1 dir2 ...]

# Map the current directory
repomap .

# Map with custom token budget
repomap /path/to/repo --tokens 2000

# Exclude additional directories
repomap . --exclude dist test __pycache__

Supported Languages

The library supports 40 languages via tree-sitter queries:

Python, JavaScript, TypeScript, Java, C, C++, C#, Go, Rust, Ruby, PHP, HTML, CSS, JSON, YAML, Markdown, Bash, Lua, Perl, R, Scala, Kotlin, Swift, Elixir, Erlang, Haskell, OCaml, Julia, Dart, Zig, Racket, Clojure, Common Lisp, Elisp, MATLAB, Fortran, Elm, Gleam, HCL, TOML

How It Works

  1. Scan: Walk the directory tree, skipping ignored directories and file extensions
  2. Extract: Parse each file with tree-sitter to find definitions and references
  3. Graph: Build a directed graph of cross-file symbol references
  4. Rank: Run PageRank on the reference graph to identify important files
  5. Format: Select top files within the token budget, format as a readable tree
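Steps 3 to 5 can be sketched in plain Python. This is a simplified illustration, not the library's code: the function names, the per-file symbol sets, and the token costs are hypothetical, and a hand-written power iteration stands in for whatever PageRank implementation the library calls.

```python
from collections import defaultdict


def build_edges(defines, references):
    """Edge (a, b) means file a references a symbol defined in file b."""
    definers = defaultdict(set)
    for path, symbols in defines.items():
        for sym in symbols:
            definers[sym].add(path)
    edges = defaultdict(float)
    for path, symbols in references.items():
        for sym in symbols:
            for target in definers.get(sym, ()):
                if target != path:  # keep cross-file references only
                    edges[(path, target)] += 1.0
    return dict(edges)


def pagerank(nodes, edges, damping=0.85, iterations=50):
    nodes = list(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    for _ in range(iterations):
        # Rank held by files with no outgoing edges is spread evenly
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def select_within_budget(rank, token_cost, budget):
    """Greedily take the highest-ranked files that still fit the budget."""
    chosen, used = [], 0
    for path in sorted(rank, key=rank.get, reverse=True):
        if used + token_cost[path] <= budget:
            chosen.append(path)
            used += token_cost[path]
    return chosen


# Made-up extraction results for three files
defines = {"auth.py": {"AuthHandler"}, "db.py": {"DatabaseConnection"},
           "server.py": {"main"}}
references = {"server.py": {"AuthHandler", "DatabaseConnection"},
              "auth.py": {"DatabaseConnection"}, "db.py": set()}

edges = build_edges(defines, references)
rank = pagerank(defines.keys(), edges)
token_cost = {"db.py": 400, "auth.py": 700, "server.py": 300}  # assumed
chosen = select_within_budget(rank, token_cost, budget=800)
```

Here db.py ranks highest because both other files reference it. The greedy selection skips auth.py, which would exceed the budget, and still fits server.py.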

License

MIT