CodeGraphX is a local-first code intelligence CLI that scans repositories, parses source files into structured facts, emits deterministic graph events, and supports impact analysis, snapshots, diffs, and optional Neo4j loading.
- Deterministic output: unchanged inputs produce stable JSONL artifacts and hashes.
- Incremental execution: parsing, extraction, and graph loading reuse cached state.
- Database optional: `scan`, `parse`, `extract`, `snapshots`, and `delta` work without Neo4j.
- Inspectable pipeline: every stage writes artifacts you can diff, audit, and replay.
- CI covered: the repo is tested on Ubuntu and Windows across Python 3.11 to 3.13.
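The determinism claim above can be spot-checked by hashing an artifact from two runs. The sketch below is not CodeGraphX's serializer. It only illustrates one common way to get stable JSONL, assuming sorted keys and sorted lines, and the event fields shown are hypothetical.

```python
import hashlib
import json


def to_jsonl(records):
    """Serialize records to JSONL with a stable key order and line order."""
    lines = [
        json.dumps(rec, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for rec in records
    ]
    # Sorting the lines removes any dependence on discovery order.
    lines.sort()
    return "".join(line + "\n" for line in lines)


def artifact_hash(records):
    """Return the SHA-256 hex digest of the serialized artifact."""
    return hashlib.sha256(to_jsonl(records).encode("utf-8")).hexdigest()


# Hypothetical events; these field names are illustrative only.
run_a = [
    {"kind": "File", "uid": "f1", "path": "app.py"},
    {"kind": "Function", "uid": "fn1", "name": "main"},
]
# Same facts, different key order and record order.
run_b = [
    {"name": "main", "uid": "fn1", "kind": "Function"},
    {"path": "app.py", "kind": "File", "uid": "f1"},
]

same = artifact_hash(run_a) == artifact_hash(run_b)
print(same)
```

In practice the same check can be run on two copies of `data/events.jsonl` produced from an unchanged repository.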
git clone https://github.com/MRJR0101/CodeGraphX.git
cd CodeGraphX
# pip
python -m venv .venv
.venv\Scripts\activate # Windows PowerShell
# source .venv/bin/activate # Linux/macOS
pip install -e ".[dev]"
# or uv
uv sync --all-groups

Verify the CLI:
python -m codegraphx --version
python -m codegraphx --help

`python cli/main.py` is still supported as a legacy compatibility shim, but the
canonical entrypoints are `python -m codegraphx` and `codegraphx`.
- Create a local projects file from the example:

cp config/projects.example.yaml config/projects.yaml

- Edit `config/projects.yaml` to point at one or more repositories.
- Run the pipeline:

codegraphx scan
codegraphx parse
codegraphx extract

- Inspect the first few emitted events:

Get-Content data/events.jsonl -TotalCount 5 # Windows PowerShell
# head -n 5 data/events.jsonl # POSIX shells

Copy the example env file and fill in your credentials:

cp .env.example .env

Option A -- Docker (recommended for local use):

# Start a Neo4j 5 container using credentials from .env
.\start-neo4j.ps1

The script creates a container named `neo4j-cgx`, waits until bolt is ready,
and prints the connection details. Run it any time you need Neo4j. To stop:

docker stop neo4j-cgx

Option B -- existing Neo4j instance:

Point `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` in your `.env` at the
running instance.

Then validate and load:
codegraphx doctor
codegraphx load
codegraphx query "MATCH (f:Function) RETURN f.name LIMIT 10"

After load, the search command uses a fast SQLite FTS index:

codegraphx search parse --index functions

CodeGraphX loads `.env` automatically when referenced by `config/default.yaml`.

Source repo(s)
    |
    v
[scan]    -> scan.jsonl            File inventory by project/root/path
    |
    v
[parse]   -> ast.jsonl             Parsed functions/imports/calls
             parse.cache.json
             parse.meta.json
    |
    v
[extract] -> events.jsonl          Project/File/Function/Symbol/Module events
             extract.cache.json
             extract.meta.json
    |
    v
[load]    -> Neo4j                 Incremental merge and stale-record cleanup
             load.state.json
             load.meta.json
    |
    v
[search/query/impact/analyze/delta/snapshots]

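The cache files in the diagram suggest that each stage skips inputs it has already processed. How CodeGraphX keys its caches is not stated here, so the following is only a sketch of one plausible approach: a content-hash cache that reports which files need reparsing. The function names and the `demo.cache.json` file are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reparse(paths, cache_file):
    """Return paths whose content hash differs from the cache, then update the cache."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = []
    for path in sorted(paths):
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    cache_file.write_text(json.dumps(cache, sort_keys=True, indent=2))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a, b = root / "a.py", root / "b.py"
    a.write_text("def f(): pass\n")
    b.write_text("def g(): pass\n")
    cache_file = root / "demo.cache.json"

    first = files_to_reparse([a, b], cache_file)   # both files are new
    second = files_to_reparse([a, b], cache_file)  # nothing changed
    b.write_text("def g(): return 1\n")
    third = files_to_reparse([a, b], cache_file)   # only b.py changed

    first_names = [p.name for p in first]
    second_names = [p.name for p in second]
    third_names = [p.name for p in third]

print(first_names, second_names, third_names)
```

Hashing contents rather than comparing timestamps keeps the result independent of checkout time, which fits the deterministic-output goal.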
| Node | Key Properties |
|---|---|
| `Project` | `name` |
| `File` | `uid`, `project`, `path`, `rel_path`, `language`, `line_count` |
| `Function` | `uid`, `name`, `line`, `project`, `file_uid`, `signature_hash` |
| `Symbol` | `uid`, `name` |
| `Module` | `uid`, `name` |

| Edge | Meaning |
|---|---|
| `CONTAINS` | Project -> File |
| `DEFINES` | File -> Function |
| `CALLS` | Function -> Symbol or File -> Symbol |
| `IMPORTS` | File -> Module |
| `CALLS_FUNCTION` | Function -> Function |

| Command | Purpose |
|---|---|
| `scan` | Discover project files from `config/projects.yaml` |
| `parse` | Parse supported files into AST-like records |
| `extract` | Convert parsed records into graph events |
| `load` | Incrementally apply events to Neo4j |

| Command | Purpose |
|---|---|
| `analyze metrics` | Function fan-in/fan-out style metrics |
| `analyze hotspots` | High-line hotspots from loaded graph data |
| `analyze security` | Name-based security pattern queries |
| `analyze debt` | Aggregate debt-style summaries |
| `analyze refactor` | Name-filtered refactor candidates |
| `analyze duplicates` | Signature-hash duplicate detection |
| `analyze patterns` | Pattern-oriented function search |
| `analyze full` | Multi-section summary report |
| `snapshots create/list/diff/report` | Snapshot lifecycle commands |
| `delta` | Detailed snapshot delta reporting |

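Signature-hash duplicate detection amounts to grouping functions that share a `signature_hash`. The sketch below works on plain dictionaries rather than the loaded graph, and the sample records are invented; it shows the idea, not how `analyze duplicates` is implemented.

```python
from collections import defaultdict


def find_duplicates(functions):
    """Group function uids by signature_hash and keep groups with more than one member."""
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["signature_hash"]].append(fn["uid"])
    return {h: sorted(uids) for h, uids in groups.items() if len(uids) > 1}


# Hypothetical Function records using the property names from the schema table.
functions = [
    {"uid": "a.py:load", "name": "load", "signature_hash": "h1"},
    {"uid": "b.py:load", "name": "load", "signature_hash": "h1"},
    {"uid": "a.py:save", "name": "save", "signature_hash": "h2"},
]

duplicates = find_duplicates(functions)
print(duplicates)  # {'h1': ['a.py:load', 'b.py:load']}
```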
| Command | Purpose |
|---|---|
| `query` | Run Cypher against Neo4j |
| `search` | Search extracted events by name/path |
| `ask` | Run template-based NL-style queries |
| `compare` | Compare two projects |
| `impact` | Trace direct and transitive callers |

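Tracing direct and transitive callers is a reverse traversal over `CALLS_FUNCTION` edges. The following is a minimal in-memory sketch using breadth-first search over `(caller, callee)` pairs. The edge list is made up, and the actual `impact` command may run this against the graph instead.

```python
from collections import deque


def callers_of(target, calls):
    """Return (direct, transitive) caller sets for target.

    calls is an iterable of (caller, callee) pairs.
    """
    # Build a reverse adjacency map: callee -> set of callers.
    reverse = {}
    for caller, callee in calls:
        reverse.setdefault(callee, set()).add(caller)

    direct = set(reverse.get(target, set()))
    seen = set(direct)
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            # The seen set also protects against call cycles.
            if caller not in seen and caller != target:
                seen.add(caller)
                queue.append(caller)
    return direct, seen - direct


# Hypothetical call edges: cli -> main -> run -> parse -> tokenize
calls = [
    ("main", "run"),
    ("run", "parse"),
    ("cli", "main"),
    ("parse", "tokenize"),
]

direct, transitive = callers_of("parse", calls)
print(sorted(direct), sorted(transitive))  # ['run'] ['cli', 'main']
```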
| Command | Purpose |
|---|---|
| `doctor` | Validate config, imports, and optional Neo4j connectivity |
| `completions` | Print shell-completion guidance |
| `enrich backlog` | Rank candidate repos from a SQLite catalog |
| `enrich chunk-scan` | Run chunked scans against a target root |
| `enrich campaign` | Plan or execute ranked enrichment campaigns |
| `enrich index-audit` | Audit recommended SQLite indexes |
| `enrich collectors` | Compute collector-style project signals |
| `enrich intelligence` | Compute similarity and intelligence signals |

Project roots are configured in a local `config/projects.yaml` file:

projects:
  - name: DemoPython
    root: C:/path/to/python_project
    exclude:
      - .venv
      - __pycache__
      - dist
      - build

Runtime behavior comes from `config/default.yaml`:

run:
  out_dir: data
  max_files: 0
  include_ext: [".py", ".js", ".ts"]
neo4j:
  uri: ${NEO4J_URI:-bolt://localhost:7687}
  user: ${NEO4J_USER:-neo4j}
  password: ${NEO4J_PASSWORD:-}
  database: ${NEO4J_DATABASE:-neo4j}

Environment variable expansion uses `${VAR:-default}` syntax.

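To make the `${VAR:-default}` behavior concrete, here is a small sketch of one way such expansion can be implemented with a regular expression. It assumes shell-like semantics, where an unset or empty variable falls back to the default; whether CodeGraphX treats empty values the same way is an assumption, and `expand` is a hypothetical helper.

```python
import re

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value, env):
    """Expand ${VAR:-default} references in value using the env mapping."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:
            return current
        # Assumption: unset or empty falls back to the default, as in POSIX shells.
        return default if default is not None else ""

    return _PATTERN.sub(replace, value)


uri_default = expand("${NEO4J_URI:-bolt://localhost:7687}", {})
uri_custom = expand("${NEO4J_URI:-bolt://localhost:7687}", {"NEO4J_URI": "bolt://db:7687"})
password = expand("${NEO4J_PASSWORD:-}", {})

print(uri_default)  # bolt://localhost:7687
print(uri_custom)   # bolt://db:7687
```

Passing `os.environ` as `env` would apply the same logic to the real environment.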
python -m pytest -q
python -m codegraphx --help
powershell -ExecutionPolicy Bypass -File .\scripts\smoke_no_db.ps1 -ReportPath smoke_no_db_report.json

For the full local gate, install uv and run:

powershell -ExecutionPolicy Bypass -File .\scripts\release_check.ps1

- User-supplied Cypher parameters are passed separately from query text.
- `query --safe` adds a lexical guard for ad hoc query execution.
- Path handling in the pipeline avoids traversing outside configured roots.
- Credentials should live in `.env` or environment variables, not committed YAML.

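Two of the safeguards above, the lexical guard and root-confined path handling, can be illustrated in a few lines. This is a sketch under stated assumptions: the keyword list is a guess at what a read-only guard might reject, not the rule set `query --safe` uses, and both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed set of write-capable Cypher keywords; a real guard may differ.
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def looks_read_only(cypher):
    """Lexical check only: it can over-reject (e.g. a property named 'set')."""
    return WRITE_KEYWORDS.search(cypher) is None


def is_within_root(root, candidate):
    """True if candidate, resolved relative to root, stays inside root."""
    root = Path(root).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


read_ok = looks_read_only("MATCH (f:Function) RETURN f.name LIMIT 10")
write_blocked = looks_read_only("MATCH (n) DETACH DELETE n")
inside = is_within_root("project", "src/app.py")
outside = is_within_root("project", "../secrets.txt")

print(read_ok, write_blocked, inside, outside)  # True False True False
```

A lexical guard is a convenience check rather than a security boundary, which is why query parameters are still passed separately from the query text.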
| Document | Purpose |
|---|---|
| docs/README.md | Docs entrypoint and navigation |
| docs/commands.md | Command examples and reference |
| docs/design.md | Architecture and stage behavior |
| docs/schema.md | Graph entities and identities |
| docs/queries.md | Query examples and patterns |
| docs/security-architecture.md | Threat model and safeguards |
| docs/roadmap.md | Planned work |
| CONTRIBUTING.md | Contributor workflow |
| VERIFY.md | Current validation checklist |
| CHANGELOG.md | Release history |
| Component | Supported |
|---|---|
| Python | 3.10+ |
| CI matrix | 3.11, 3.12, 3.13 |
| OS | Windows and Linux |
| Neo4j | 5.x |