CodeGraphX


CodeGraphX is a local-first code intelligence CLI that scans repositories, parses source files into structured facts, emits deterministic graph events, and supports impact analysis, snapshots, diffs, and optional Neo4j loading.

Why CodeGraphX

  • Deterministic output: unchanged inputs produce stable JSONL artifacts and hashes.
  • Incremental execution: parsing, extraction, and graph loading reuse cached state.
  • Database optional: scan, parse, extract, snapshots, and delta work without Neo4j.
  • Inspectable pipeline: every stage writes artifacts you can diff, audit, and replay.
  • CI covered: the repo is tested on Ubuntu and Windows across Python 3.11 to 3.13.
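The determinism claim can be illustrated with a small sketch. This is not CodeGraphX's actual serializer: the function names `to_jsonl` and `digest` and the event fields are hypothetical. It only shows the general idea that writing records with sorted keys in a stable order makes the output bytes, and therefore the hash, independent of input ordering.

```python
import hashlib
import json


def to_jsonl(events):
    """Serialize events as JSONL with sorted keys and a stable record order."""
    lines = [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in events]
    return "\n".join(sorted(lines)) + "\n"


def digest(events):
    """Return the SHA-256 hex digest of the canonical JSONL text."""
    return hashlib.sha256(to_jsonl(events).encode("utf-8")).hexdigest()


# Same facts, different key order and record order (illustrative data).
a = [{"kind": "Function", "name": "parse"}, {"kind": "File", "path": "a.py"}]
b = [{"path": "a.py", "kind": "File"}, {"name": "parse", "kind": "Function"}]
print(digest(a) == digest(b))  # True
```

Because the two inputs serialize to identical text, their digests match, so the artifacts can be compared with a plain diff.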

Installation

git clone https://github.com/MRJR0101/CodeGraphX.git
cd CodeGraphX

# pip
python -m venv .venv
.venv\Scripts\activate        # Windows PowerShell
# source .venv/bin/activate   # Linux/macOS
pip install -e ".[dev]"

# or uv
uv sync --all-groups

Verify the CLI:

python -m codegraphx --version
python -m codegraphx --help

python cli/main.py is still supported as a legacy compatibility shim, but the canonical entrypoints are python -m codegraphx and codegraphx.

Quick Start

  1. Create a local projects file from the example:

cp config/projects.example.yaml config/projects.yaml

  2. Edit config/projects.yaml to point at one or more repositories.

  3. Run the pipeline:

codegraphx scan
codegraphx parse
codegraphx extract

  4. Inspect the first few emitted events:

Get-Content data/events.jsonl -TotalCount 5   # Windows PowerShell
# head -n 5 data/events.jsonl                 # POSIX shells
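For a shell-independent way to inspect the events, the following Python sketch reads the first few records of a JSONL file. The helper name `head_events` is hypothetical, and the only assumption about the file is that each non-blank line is one JSON object.

```python
import json
from itertools import islice
from pathlib import Path


def head_events(path, count=5):
    """Return the first `count` JSON records from a JSONL file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        nonblank = (line for line in handle if line.strip())
        return [json.loads(line) for line in islice(nonblank, count)]


if __name__ == "__main__":
    target = Path("data/events.jsonl")  # default out_dir from the config shown later
    if target.exists():
        for event in head_events(target):
            print(event)
```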

Optional Neo4j Setup

Copy the example env file and fill in your credentials:

cp .env.example .env

Option A -- Docker (recommended for local use):

# Start a Neo4j 5 container using credentials from .env
.\start-neo4j.ps1

The script creates a container named neo4j-cgx, waits until the Bolt endpoint is ready, and prints the connection details. Run it any time you need Neo4j. To stop the container:

docker stop neo4j-cgx

Option B -- existing Neo4j instance:

Point NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD in your .env at the running instance.

Then validate and load:

codegraphx doctor
codegraphx load
codegraphx query "MATCH (f:Function) RETURN f.name LIMIT 10"

After load, the search command uses a fast SQLite FTS index:

codegraphx search parse --index functions

CodeGraphX loads .env automatically when referenced by config/default.yaml.

Pipeline

Source repo(s)
    |
    v
[scan]    -> scan.jsonl            File inventory by project/root/path
    |
    v
[parse]   -> ast.jsonl             Parsed functions/imports/calls
             parse.cache.json
             parse.meta.json
    |
    v
[extract] -> events.jsonl          Project/File/Function/Symbol/Module events
             extract.cache.json
             extract.meta.json
    |
    v
[load]    -> Neo4j                 Incremental merge and stale-record cleanup
             load.state.json
             load.meta.json
    |
    v
[search/query/impact/analyze/delta/snapshots]
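The cache files in the diagram suggest that each stage skips work whose inputs have not changed. A minimal sketch of that idea follows, assuming a content-hash cache. The cache layout, `parse_incremental`, and `file_digest` are hypothetical and do not describe the real format of parse.cache.json.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Hash file content so that unchanged files can be detected."""
    return hashlib.sha256(content).hexdigest()


def parse_incremental(files, cache, parse_fn):
    """Parse only files whose content hash differs from the cached hash.

    files: mapping of path -> bytes.
    cache: mapping of path -> {"hash": str, "result": object}.
    Returns (results, new_cache, reparsed_paths).
    """
    results, new_cache, reparsed = {}, {}, []
    for path in sorted(files):
        current = file_digest(files[path])
        entry = cache.get(path)
        if entry is not None and entry["hash"] == current:
            result = entry["result"]  # reuse the cached result
        else:
            result = parse_fn(files[path])
            reparsed.append(path)
        results[path] = result
        # Paths missing from `files` are dropped, which cleans up stale entries.
        new_cache[path] = {"hash": current, "result": result}
    return results, new_cache, reparsed


def count_lines(content: bytes) -> int:
    """Stand-in for a real parser."""
    return content.count(b"\n")


files = {"a.py": b"x = 1\n", "b.py": b"y = 2\nz = 3\n"}
_, cache, first_run = parse_incremental(files, {}, count_lines)
files["b.py"] = b"y = 2\n"
_, cache, second_run = parse_incremental(files, cache, count_lines)
print(first_run, second_run)  # ['a.py', 'b.py'] ['b.py']
```

Only the modified file is reparsed on the second run; the other result comes from the cache.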

Graph Model

| Node | Key Properties |
|---|---|
| Project | name |
| File | uid, project, path, rel_path, language, line_count |
| Function | uid, name, line, project, file_uid, signature_hash |
| Symbol | uid, name |
| Module | uid, name |

| Edge | Meaning |
|---|---|
| CONTAINS | Project -> File |
| DEFINES | File -> Function |
| CALLS | Function -> Symbol or File -> Symbol |
| IMPORTS | File -> Module |
| CALLS_FUNCTION | Function -> Function |
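The signature_hash property on Function nodes makes it possible to group functions that share a signature. The sketch below shows one way such grouping could work on plain dictionaries. The record shape and the `duplicate_groups` helper are hypothetical, and how CodeGraphX computes signature_hash is not specified here.

```python
from collections import defaultdict


def duplicate_groups(functions):
    """Group function records by signature_hash and keep groups with 2+ members."""
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["signature_hash"]].append(fn["uid"])
    return {h: sorted(uids) for h, uids in groups.items() if len(uids) > 1}


# Illustrative records using the Function properties listed above.
functions = [
    {"uid": "demo:a.py:load", "name": "load", "signature_hash": "h1"},
    {"uid": "demo:b.py:load", "name": "load", "signature_hash": "h1"},
    {"uid": "demo:a.py:save", "name": "save", "signature_hash": "h2"},
]
print(duplicate_groups(functions))
# {'h1': ['demo:a.py:load', 'demo:b.py:load']}
```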

Commands

Core Pipeline

| Command | Purpose |
|---|---|
| scan | Discover project files from config/projects.yaml |
| parse | Parse supported files into AST-like records |
| extract | Convert parsed records into graph events |
| load | Incrementally apply events to Neo4j |

Analysis and Diffs

| Command | Purpose |
|---|---|
| analyze metrics | Function fan-in/fan-out style metrics |
| analyze hotspots | High-line hotspots from loaded graph data |
| analyze security | Name-based security pattern queries |
| analyze debt | Aggregate debt-style summaries |
| analyze refactor | Name-filtered refactor candidates |
| analyze duplicates | Signature-hash duplicate detection |
| analyze patterns | Pattern-oriented function search |
| analyze full | Multi-section summary report |
| snapshots create/list/diff/report | Snapshot lifecycle commands |
| delta | Detailed snapshot delta reporting |
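Conceptually, a snapshot diff reduces to comparing two keyed sets of facts. The sketch below assumes each snapshot is a mapping from a function uid to its signature_hash. That representation and the `snapshot_delta` name are assumptions for illustration, not the tool's snapshot format.

```python
def snapshot_delta(old, new):
    """Compare two snapshots given as {uid: signature_hash} mappings."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }


before = {"f:load": "h1", "f:save": "h2", "f:old": "h3"}
after = {"f:load": "h1", "f:save": "h9", "f:new": "h4"}
print(snapshot_delta(before, after))
# {'added': ['f:new'], 'removed': ['f:old'], 'changed': ['f:save']}
```

Sorting each list keeps the report itself deterministic, in line with the rest of the pipeline.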

Search and Query

| Command | Purpose |
|---|---|
| query | Run Cypher against Neo4j |
| search | Search extracted events by name/path |
| ask | Run template-based NL-style queries |
| compare | Compare two projects |
| impact | Trace direct and transitive callers |
Diagnostics and Automation

| Command | Purpose |
|---|---|
| doctor | Validate config, imports, and optional Neo4j connectivity |
| completions | Print shell-completion guidance |
| enrich backlog | Rank candidate repos from a SQLite catalog |
| enrich chunk-scan | Run chunked scans against a target root |
| enrich campaign | Plan or execute ranked enrichment campaigns |
| enrich index-audit | Audit recommended SQLite indexes |
| enrich collectors | Compute collector-style project signals |
| enrich intelligence | Compute similarity and intelligence signals |

Configuration

Project roots are configured in a local config/projects.yaml file:

projects:
  - name: DemoPython
    root: C:/path/to/python_project
    exclude:
      - .venv
      - __pycache__
      - dist
      - build
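To make the effect of the exclude list concrete, here is a rough sketch of a file inventory pass. It assumes that each exclude entry matches a directory name at any depth and that files are filtered by extension. The real scan stage may match differently, and the `scan` function below is hypothetical.

```python
import os
from pathlib import Path


def scan(root, exclude=(), include_ext=(".py",)):
    """Return sorted relative POSIX paths under root, skipping excluded directory names."""
    root = Path(root)
    skipped = set(exclude)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skipped)
        for name in filenames:
            if Path(name).suffix in include_ext:
                found.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(found)
```

Under these assumptions, calling `scan("C:/path/to/python_project", exclude=[".venv", "__pycache__", "dist", "build"])` would list the project's .py files while ignoring the virtual environment and build output.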

Runtime behavior comes from config/default.yaml:

run:
  out_dir: data
  max_files: 0
  include_ext: [".py", ".js", ".ts"]

neo4j:
  uri: ${NEO4J_URI:-bolt://localhost:7687}
  user: ${NEO4J_USER:-neo4j}
  password: ${NEO4J_PASSWORD:-}
  database: ${NEO4J_DATABASE:-neo4j}

Environment variable expansion uses ${VAR:-default} syntax.
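The ${VAR:-default} form can be expanded with a short regular expression. This sketch assumes shell-like semantics, where the default applies when the variable is unset or empty. CodeGraphX's own expansion code may behave differently at the edges, and the `expand` helper is hypothetical.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(text, env=None):
    """Replace ${VAR} and ${VAR:-default} using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        # Assumed shell semantics for ':-': fall back when unset or empty.
        if value:
            return value
        return default if default is not None else ""

    return _PATTERN.sub(substitute, text)


print(expand("${NEO4J_URI:-bolt://localhost:7687}", env={}))
# bolt://localhost:7687
print(expand("${NEO4J_USER:-neo4j}", env={"NEO4J_USER": "admin"}))
# admin
```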

Validation

python -m pytest -q
python -m codegraphx --help
powershell -ExecutionPolicy Bypass -File .\scripts\smoke_no_db.ps1 -ReportPath smoke_no_db_report.json

For the full local gate, install uv and run:

powershell -ExecutionPolicy Bypass -File .\scripts\release_check.ps1

Security Notes

  • User-supplied Cypher parameters are passed separately from query text.
  • query --safe adds a lexical guard for ad hoc query execution.
  • Path handling in the pipeline avoids traversing outside configured roots.
  • Credentials should live in .env or environment variables, not committed YAML.
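As an illustration of the path-handling note, the check below rejects candidates that resolve outside a configured root. It is a generic containment test built on pathlib, not the project's actual guard, and the `is_within` name is hypothetical.

```python
from pathlib import Path


def is_within(root, candidate):
    """Return True if candidate, after resolving symlinks and '..', stays under root."""
    root_path = Path(root).resolve()
    # Joining an absolute candidate replaces the root, so it is rejected below.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents


if __name__ == "__main__":
    print(is_within(".", "src/module.py"))   # True
    print(is_within(".", "../outside.py"))   # False
```

Resolving both paths before comparing is the important step: a purely textual prefix check would accept inputs such as src/../../outside.py.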

Documentation

| Document | Purpose |
|---|---|
| docs/README.md | Docs entrypoint and navigation |
| docs/commands.md | Command examples and reference |
| docs/design.md | Architecture and stage behavior |
| docs/schema.md | Graph entities and identities |
| docs/queries.md | Query examples and patterns |
| docs/security-architecture.md | Threat model and safeguards |
| docs/roadmap.md | Planned work |
| CONTRIBUTING.md | Contributor workflow |
| VERIFY.md | Current validation checklist |
| CHANGELOG.md | Release history |

Compatibility

| Component | Supported |
|---|---|
| Python | 3.10+ |
| CI matrix | 3.11, 3.12, 3.13 |
| OS | Windows and Linux |
| Neo4j | 5.x |

License

MIT
