Security Ownership Mapping
Analyze git repositories to build a security ownership topology (people-to-file), compute bus factor and sensitive-code ownership, and export CSV/JSON for graph databases and visualization.
Overview
Build a bipartite graph of people and files from git history, then compute ownership risk and export graph artifacts for Neo4j/Gephi. Also build a file co-change graph (Jaccard similarity on shared commits) to cluster files by how they move together while ignoring large, noisy commits.
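The co-change weighting described above can be sketched in a few lines. This is a minimal illustration rather than the script's actual implementation: the function name `cochange_edges`, the input shape (commit id mapped to file paths), and the `max_files` cutoff value are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cochange_edges(commits, max_files=50):
    """Return {(file_a, file_b): jaccard} for file pairs that share commits.

    commits: mapping of commit id -> iterable of file paths (assumed shape).
    max_files: hypothetical cutoff; commits touching more files are skipped.
    """
    commits_by_file = defaultdict(set)
    shared = defaultdict(int)
    for commit_id, files in commits.items():
        files = sorted(set(files))
        if len(files) > max_files:
            continue  # ignore large, noisy commits entirely
        for path in files:
            commits_by_file[path].add(commit_id)
        for a, b in combinations(files, 2):
            shared[(a, b)] += 1  # one more commit containing both files

    edges = {}
    for (a, b), both in shared.items():
        # Jaccard = |commits with both| / |commits with either|
        union = len(commits_by_file[a]) + len(commits_by_file[b]) - both
        edges[(a, b)] = both / union
    return edges


commits = {
    "c1": ["auth/login.py", "auth/token.py"],
    "c2": ["auth/login.py", "auth/token.py", "README.md"],
    "c3": ["auth/login.py"],
    "c4": [f"vendor/f{i}.py" for i in range(100)],  # skipped: too many files
}
edges = cochange_edges(commits, max_files=50)
```

Pairs are keyed in sorted order, so each undirected edge appears once. Skipping oversized commits before counting keeps bulk renames or vendored drops from linking unrelated files.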
Requirements
- Python 3
- networkx (required; community detection is enabled by default)
Install with: pip install networkx
Workflow
1. Scope the repo and time window (optional --since/--until).
2. Decide sensitivity rules (use defaults or provide a CSV config).
3. Build the ownership map with scripts/run_ownership_map.py (co-change graph is on by default; use --cochange-max-files to ignore supernode commits).
4. Communities are computed by default; graphml output is optional (--graphml).
5. Query the outputs with scripts/query_ownership.py for bounded JSON slices.
6. Persist and visualize (see references/neo4j-import.md).
By default, the co-change graph ignores common “glue” files (lockfiles, .github/*, editor config) so clusters reflect actual code movement instead of shared infra edits. Override with --cochange-exclude or --no-default-cochange-excludes. Dependabot commits are excluded by default; override with --no-default-author-excludes or add patterns via --author-exclude-regex.
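To make the effect of these excludes concrete, here is a rough sketch of filtering commits by author and files by glob before any co-change counting. The pattern lists and helper names are hypothetical; the script's real default excludes and its matching rules may differ.

```python
import re
from fnmatch import fnmatch

# Hypothetical patterns for illustration only, not the script's real defaults.
EXCLUDE_GLOBS = ["*.lock", "package-lock.json", ".github/*", ".editorconfig"]
AUTHOR_EXCLUDE = re.compile(r"dependabot", re.IGNORECASE)


def keep_commit(author):
    """Drop commits whose author matches the exclude regex."""
    return AUTHOR_EXCLUDE.search(author) is None


def keep_file(path, globs=EXCLUDE_GLOBS):
    """Drop files whose full path or base name matches any exclude glob."""
    name = path.rsplit("/", 1)[-1]
    return not any(fnmatch(path, g) or fnmatch(name, g) for g in globs)


def filter_commits(commits):
    """commits: list of (author, [paths]); returns the filtered list."""
    kept = []
    for author, paths in commits:
        if not keep_commit(author):
            continue
        paths = [p for p in paths if keep_file(p)]
        if paths:  # a commit with only glue files contributes nothing
            kept.append((author, paths))
    return kept


sample = [
    ("dependabot[bot]", ["package.json"]),
    ("alice", ["src/app.py", "yarn.lock", ".github/workflows/ci.yml"]),
    ("bob", ["Cargo.lock"]),
]
filtered = filter_commits(sample)
```

Note that `fnmatch` lets `*` match across `/`, so `.github/*` also covers nested workflow files.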
If you want to exclude Linux build glue like Kbuild from co-change clustering, pass:
Quick start
Run from the repo root:
Defaults: commits are attributed by author identity and author date, and merge commits are excluded. Use --identity committer, --date-field committer, or --include-merges if needed.
Example (override co-change excludes):
Communities are computed by default. To disable them, pass --no-communities.
Sensitivity rules
By default, the script flags common auth/crypto/secret paths. Override by providing a CSV file:
Use it with --sensitive-config path/to/sensitive.csv.
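As an illustration of how pattern-based sensitivity rules can tag paths, the sketch below assumes a hypothetical two-column layout (glob pattern, tag). The column layout the script actually expects is not shown here, so treat the rule format, the tags, and the helper names as assumptions.

```python
import csv
import io
from fnmatch import fnmatch

# Hypothetical rule layout (pattern,tag); not the script's documented format.
RULES_CSV = """\
*auth*,auth
*crypto*,crypto
*.pem,secrets
"""


def load_rules(text):
    """Parse CSV text into a list of (pattern, tag) pairs."""
    rows = csv.reader(io.StringIO(text))
    return [(row[0].strip(), row[1].strip()) for row in rows if len(row) >= 2]


def sensitivity_tags(path, rules):
    """Return the sorted set of tags whose pattern matches the path."""
    lowered = path.lower()
    # fnmatch's * also matches "/", so each pattern applies to the whole path
    return sorted({tag for pattern, tag in rules if fnmatch(lowered, pattern.lower())})


rules = load_rules(RULES_CSV)
tags = sensitivity_tags("lib/crypto/keys/server.pem", rules)
```

A file can carry more than one tag, which is why the result is a set rather than the first match.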
Output artifacts
ownership-map-out/ contains:
- people.csv (nodes: people)
- files.csv (nodes: files)
- edges.csv (edges: touches)
- cochange_edges.csv (file-to-file co-change edges with Jaccard weight; omitted with --no-cochange)
- summary.json (security ownership findings)
- commits.jsonl (optional, if --emit-commits)
- communities.json (computed by default from co-change edges when available; includes maintainers per community; disable with --no-communities)
- cochange.graph.json (NetworkX node-link JSON with community_id + community_maintainers; falls back to ownership.graph.json if no co-change edges)
- ownership.graphml / cochange.graphml (optional, if --graphml)
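Node-link JSON can be inspected with the standard library alone. The sketch below groups file nodes by community, assuming that `community_id` is stored as an attribute on each entry of the top-level `nodes` list; the inline document is a stand-in for cochange.graph.json, not real output.

```python
import json
from collections import defaultdict


def files_by_community(node_link):
    """Group node ids by their community_id attribute (assumed per-node)."""
    groups = defaultdict(list)
    for node in node_link.get("nodes", []):
        community = node.get("community_id")
        if community is not None:
            groups[community].append(node["id"])
    return {cid: sorted(ids) for cid, ids in groups.items()}


# Tiny inline document standing in for cochange.graph.json
doc = json.loads("""
{
  "nodes": [
    {"id": "auth/token.py", "community_id": 0},
    {"id": "auth/login.py", "community_id": 0},
    {"id": "billing/invoice.py", "community_id": 1},
    {"id": "README.md"}
  ],
  "links": [
    {"source": "auth/login.py", "target": "auth/token.py", "weight": 0.67}
  ]
}
""")
communities = files_by_community(doc)
```

For real files, replace the inline string with `json.load(open(path))`; nodes without a community are skipped here rather than grouped.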
people.csv includes timezone detection based on author commit offsets: primary_tz_offset, primary_tz_minutes, and timezone_offsets.
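One plausible way to derive these fields is to count each author's UTC offsets and take the most frequent one. The sketch below assumes offsets arrive as signed minutes and that the primary offset is the mode; the exact value formats written to people.csv are not specified here, so the formatting is an assumption.

```python
from collections import Counter


def format_offset(minutes):
    """Render signed minutes as +HH:MM or -HH:MM."""
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"


def timezone_summary(offsets_minutes):
    """Summarize one author's commit offsets (assumed: signed minutes from UTC)."""
    counts = Counter(offsets_minutes)
    if not counts:
        return None
    primary, _ = counts.most_common(1)[0]  # most frequent offset
    return {
        "primary_tz_minutes": primary,
        "primary_tz_offset": format_offset(primary),
        "timezone_offsets": {format_offset(m): n for m, n in counts.items()},
    }


summary = timezone_summary([-480, -480, 60, -480, 330])
```

Keeping the full offset histogram alongside the primary value makes it easier to spot authors who commit from several regions.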
LLM query helper
Use scripts/query_ownership.py to return small, bounded JSON slices without loading the full graph into context.
Examples:
Use --community-top-owners 5 (default) to control how many maintainers are stored per community.
Basic security queries
Run these to answer common security ownership questions with bounded output:
Notes:
- Touches default to one per authored commit (not per-file). Use --touch-mode file to count per-file touches.
- Use --window-days 90 or --weight recency --half-life-days 180 to smooth churn.
- Filter bots with --ignore-author-regex '(bot|dependabot)'.
- Use --min-share 0.1 to show stable maintainers only.
- Use --bucket quarter for calendar quarter groupings.
- Use --identity committer or --date-field committer to switch from author attribution.
- Use --include-merges to include merge commits (excluded by default).
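The recency weighting and minimum-share options above can be pictured as exponential decay followed by a share cutoff. This is a sketch under stated assumptions: a touch that is one half-life old counts half as much as a fresh one, and the share filter is applied after normalization. The function names and input shape are hypothetical, and the script may compute these differently.

```python
def recency_weight(age_days, half_life_days=180.0):
    """Exponential decay: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def owner_shares(touches, half_life_days=180.0, min_share=0.0):
    """touches: list of (person, age_days). Returns {person: share of weight}."""
    totals = {}
    for person, age_days in touches:
        weight = recency_weight(age_days, half_life_days)
        totals[person] = totals.get(person, 0.0) + weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    shares = {person: w / grand_total for person, w in totals.items()}
    # Keep only people at or above the minimum share
    return {person: s for person, s in shares.items() if s >= min_share}


touches = [("alice", 0), ("alice", 180), ("bob", 360)]
all_shares = owner_shares(touches)
stable = owner_shares(touches, min_share=0.2)
```

With a 180-day half-life, a touch from a year ago carries a quarter of the weight of one made today, so long-inactive contributors fall below the share cutoff.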
Summary format (default)
Use this structure; add fields if needed:
Graph persistence
Use references/neo4j-import.md when you need to load the CSVs into Neo4j. It includes constraints, import Cypher, and visualization tips.
Notes
- bus_factor_hotspots in summary.json lists sensitive files with low bus factor; orphaned_sensitive_code is the stale subset.
- If git log is too large, narrow with --since or --until.
- Compare summary.json against CODEOWNERS to highlight ownership drift.
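For readers unfamiliar with the metric, one common definition of bus factor is the smallest number of contributors who together account for more than half of a file's touches. The sketch below uses that definition as an assumption; the threshold, function names, and hotspot cutoff are illustrative and may not match how the script populates bus_factor_hotspots.

```python
def bus_factor(touches_by_person, threshold=0.5):
    """Smallest number of people whose touches exceed `threshold` of the total."""
    total = sum(touches_by_person.values())
    if total <= 0:
        return 0
    covered = 0
    ranked = sorted(touches_by_person.values(), reverse=True)
    for rank, touches in enumerate(ranked, start=1):
        covered += touches
        if covered / total > threshold:
            return rank
    return len(ranked)


def hotspots(files, sensitive, max_bus_factor=1):
    """Sensitive files whose bus factor is at or below the cutoff."""
    return sorted(
        path
        for path, owners in files.items()
        if path in sensitive and bus_factor(owners) <= max_bus_factor
    )


files = {
    "auth/login.py": {"alice": 9, "bob": 1},
    "auth/token.py": {"alice": 3, "bob": 3, "carol": 3, "dave": 1},
    "docs/intro.md": {"erin": 5},
}
risky = hotspots(files, sensitive={"auth/login.py", "auth/token.py"})
```

Here auth/login.py is flagged because a single person covers most of its history, while docs/intro.md is ignored because it is not marked sensitive.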