security-ownership-map

Security Ownership Map

Overview

Build a bipartite graph of people and files from git history, then compute ownership risk and export graph artifacts for Neo4j/Gephi. Also build a file co-change graph (Jaccard similarity on shared commits) that clusters files by how often they change together, while ignoring large, noisy commits.
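The co-change weight for two files is the number of commits they share divided by the number of commits that touch either file. The minimal sketch below illustrates that idea under stated assumptions. Commits are given as a hypothetical `{commit_id: set_of_paths}` mapping, and `max_files` is an illustrative stand-in for `--cochange-max-files`. The script's real data structures and defaults may differ.

```python
from collections import defaultdict
from itertools import combinations


def cochange_jaccard(commits, max_files=50):
    """Map each co-changed file pair to the Jaccard similarity of its commit sets.

    commits: {commit_id: set_of_paths} (hypothetical input shape).
    max_files: skip commits touching more files than this (illustrative
    stand-in for --cochange-max-files; the real default may differ).
    """
    commits_per_file = defaultdict(int)  # |C(f)|: kept commits touching f
    shared = defaultdict(int)            # |C(a) ∩ C(b)| for each file pair
    for files in commits.values():
        if len(files) > max_files:
            continue  # large "supernode" commit: ignored as noise
        for path in files:
            commits_per_file[path] += 1
        for a, b in combinations(sorted(files), 2):
            shared[(a, b)] += 1
    # Jaccard = |A ∩ B| / |A ∪ B|, where |A ∪ B| = |A| + |B| - |A ∩ B|
    return {
        (a, b): n / (commits_per_file[a] + commits_per_file[b] - n)
        for (a, b), n in shared.items()
    }


commits = {
    "c1": {"auth/login.py", "auth/session.py"},
    "c2": {"auth/login.py", "auth/session.py", "README.md"},
    "c3": {"auth/login.py"},
    "c4": {"a.txt", "b.txt", "c.txt", "d.txt"},  # exceeds max_files=3
}
for pair, score in sorted(cochange_jaccard(commits, max_files=3).items()):
    print(pair, round(score, 3))
```

Here `auth/login.py` and `auth/session.py` share two of the three commits that touch either file, so their weight is 2/3. The four-file commit `c4` never contributes any pairs.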

Requirements

  • Python 3
  • networkx (required; community detection is enabled by default)

Install with:
```bash
pip install networkx
```

Workflow

  1. Scope the repo and time window (optional `--since`/`--until`).
  2. Decide sensitivity rules (use the defaults or provide a CSV config).
  3. Build the ownership map with `scripts/run_ownership_map.py` (the co-change graph is on by default; use `--cochange-max-files` to ignore supernode commits that touch very many files).
  4. Communities are computed by default; GraphML output is optional (`--graphml`).
  5. Query the outputs with `scripts/query_ownership.py` for bounded JSON slices.
  6. Persist and visualize (see `references/neo4j-import.md`).
By default, the co-change graph ignores common “glue” files (lockfiles, `.github/*`, editor config) so clusters reflect actual code movement instead of shared infra edits. Override with `--cochange-exclude` or `--no-default-cochange-excludes`. Dependabot commits are excluded by default; override with `--no-default-author-excludes` or add patterns via `--author-exclude-regex`.

To exclude Linux build glue such as `Kbuild` from co-change clustering, pass:
```bash
python skills/skills/security-ownership-map/scripts/run_ownership_map.py \
  --repo /path/to/linux \
  --out ownership-map-out \
  --cochange-exclude "**/Kbuild"
```
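The minimal sketch below makes the exclusion behavior concrete. It pairs glob-based file excludes with a regex-based author exclude. It uses Python's `fnmatch`, where `*` also matches `/`. The pattern lists and the helper names `is_glue_file` and `filter_commit` are hypothetical. The script's real default lists and glob semantics are not documented here and may differ.

```python
import re
from fnmatch import fnmatchcase

# Illustrative patterns only; these are NOT the script's actual defaults.
COCHANGE_EXCLUDES = ["**/Cargo.lock", ".github/*", "**/Kbuild"]
AUTHOR_EXCLUDE = re.compile(r"dependabot", re.IGNORECASE)


def is_glue_file(path, patterns=COCHANGE_EXCLUDES):
    """True if the path matches any glob (fnmatch's '*' also matches '/')."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def filter_commit(author, files):
    """Return None for excluded authors, else the files left for co-change."""
    if AUTHOR_EXCLUDE.search(author):
        return None  # whole commit dropped, like the default Dependabot exclude
    return {path for path in files if not is_glue_file(path)}


print(filter_commit("dependabot[bot]@users.noreply.github.com", {"Cargo.lock"}))
print(filter_commit("alice@corp", {"drivers/net/Kbuild", "drivers/net/e1000.c",
                                   ".github/workflows/ci.yml"}))
```

With `fnmatch`, the pattern `**/Kbuild` requires at least one `/`. It therefore matches `drivers/net/Kbuild` but not a top-level `Kbuild`.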

Quick start

Run from the repo root:
```bash
python skills/skills/security-ownership-map/scripts/run_ownership_map.py \
  --repo . \
  --out ownership-map-out \
  --since "12 months ago" \
  --emit-commits
```
Defaults: people are identified by commit author, dates come from the author date, and merge commits are excluded. Use `--identity committer`, `--date-field committer`, or `--include-merges` if needed.
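The author-versus-committer choice maps onto standard `git log` format placeholders: `%ae`/`%aI` give the author email and date, and `%ce`/`%cI` give the committer email and date. The sketch below shows one way such history could be collected. It is an assumption for illustration, not the script's actual implementation. The helpers `git_log_cmd` and `parse_git_log` are hypothetical. Only the parser is exercised on sample text, so the example runs without a repository.

```python
import subprocess

# (email placeholder, strict ISO 8601 date placeholder) per git identity
FIELDS = {"author": ("%ae", "%aI"), "committer": ("%ce", "%cI")}


def git_log_cmd(repo=".", since=None, identity="author", date_field="author",
                include_merges=False):
    """Build a git log command (hypothetical helper mirroring the flags above)."""
    who, when = FIELDS[identity][0], FIELDS[date_field][1]
    # %x1f emits a unit-separator byte between fields
    cmd = ["git", "-C", repo, "log", "--name-only",
           f"--format=@@%x1f%H%x1f{who}%x1f{when}"]
    if since:
        cmd.append(f"--since={since}")
    if not include_merges:
        cmd.append("--no-merges")  # merges excluded by default
    return cmd


def parse_git_log(text):
    """Parse the output into [{'sha', 'person', 'date', 'files'}] records."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@@\x1f"):
            _, sha, person, date = line.split("\x1f")
            commits.append({"sha": sha, "person": person, "date": date, "files": []})
        elif line.strip() and commits:
            commits[-1]["files"].append(line)  # --name-only file list
    return commits


def load_commits(repo=".", **kwargs):
    """Run git and parse its output (requires a real repository)."""
    out = subprocess.run(git_log_cmd(repo, **kwargs), capture_output=True,
                         text=True, check=True).stdout
    return parse_git_log(out)


sample = ("@@\x1fabc123\x1falice@corp\x1f2025-01-02T10:00:00+05:30\n\n"
          "auth/login.py\nauth/session.py\n"
          "@@\x1fdef456\x1fbob@corp\x1f2025-01-03T09:00:00-08:00\n\ncrypto/tls.c\n")
print(parse_git_log(sample))
```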
Example (override co-change excludes):
```bash
python skills/skills/security-ownership-map/scripts/run_ownership_map.py \
  --repo . \
  --out ownership-map-out \
  --cochange-exclude "**/Cargo.lock" \
  --cochange-exclude "**/.github/**" \
  --no-default-cochange-excludes
```
Communities are computed by default. To disable:
```bash
python skills/skills/security-ownership-map/scripts/run_ownership_map.py \
  --repo . \
  --out ownership-map-out \
  --no-communities
```

Sensitivity rules

By default, the script flags common auth/crypto/secret paths. Override the defaults by providing a CSV file:

```csv
pattern,tag,weight
/auth/,auth,1.0
/crypto/,crypto,1.0
**/*.pem,secrets,1.0
```

Use it with `--sensitive-config path/to/sensitive.csv`.
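The rough sketch below shows how a `pattern,tag,weight` file could be applied. It loads the rows with the standard `csv` module and tags a path with every matching pattern. Two assumptions are made, and neither is documented behavior. First, patterns containing glob characters are matched with `fnmatch`. Second, plain patterns such as `/auth/` are matched as substrings of the path with a leading `/` added. The helper names are hypothetical.

```python
import csv
import io
from fnmatch import fnmatchcase

SAMPLE = """pattern,tag,weight
/auth/,auth,1.0
/crypto/,crypto,1.0
**/*.pem,secrets,1.0
"""


def load_rules(fh):
    """Read pattern,tag,weight rows into (pattern, tag, weight) tuples."""
    return [(row["pattern"], row["tag"], float(row["weight"]))
            for row in csv.DictReader(fh)]


def tags_for(path, rules):
    """Return {tag: max weight} for every rule that matches the path.

    Assumption: glob patterns use fnmatch; plain patterns like '/auth/'
    are substring-matched against '/' + path so top-level dirs match too.
    """
    tags = {}
    for pattern, tag, weight in rules:
        if any(ch in pattern for ch in "*?["):
            hit = fnmatchcase(path, pattern)
        else:
            hit = pattern in "/" + path
        if hit:
            tags[tag] = max(weight, tags.get(tag, 0.0))
    return tags


rules = load_rules(io.StringIO(SAMPLE))
print(tags_for("services/auth/token.py", rules))
print(tags_for("deploy/certs/server.pem", rules))
```

For a real file, open it with `open("path/to/sensitive.csv", newline="")` and pass the handle to `load_rules`.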

Output artifacts

`ownership-map-out/` contains:
  • `people.csv` (nodes: people)
  • `files.csv` (nodes: files)
  • `edges.csv` (edges: touches)
  • `cochange_edges.csv` (file-to-file co-change edges with Jaccard weight; omitted with `--no-cochange`)
  • `summary.json` (security ownership findings)
  • `commits.jsonl` (optional, written with `--emit-commits`)
  • `communities.json` (computed by default from co-change edges when available; includes `maintainers` per community; disable with `--no-communities`)
  • `cochange.graph.json` (NetworkX node-link JSON with `community_id` + `community_maintainers`; falls back to `ownership.graph.json` if there are no co-change edges)
  • `ownership.graphml` / `cochange.graphml` (optional, written with `--graphml`)
`people.csv` also includes timezone columns detected from the UTC offsets of each person's author commits: `primary_tz_offset`, `primary_tz_minutes`, and `timezone_offsets`.
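The sketch below illustrates timezone detection. It counts the UTC offsets seen in a person's ISO 8601 author dates (the form git prints for `%aI`) and takes the most common offset as primary. The value formats are assumptions for illustration: a `+05:30`-style string, signed minutes, and an offset-to-count mapping. They are not a description of the actual `primary_tz_offset`, `primary_tz_minutes`, and `timezone_offsets` columns.

```python
from collections import Counter
from datetime import datetime


def timezone_profile(iso_dates):
    """Summarize UTC offsets (in minutes) from ISO 8601 author dates."""
    minutes = Counter()
    for stamp in iso_dates:
        offset = datetime.fromisoformat(stamp).utcoffset()
        minutes[int(offset.total_seconds() // 60)] += 1
    primary, _ = minutes.most_common(1)[0]  # most frequent offset wins
    sign = "-" if primary < 0 else "+"
    hours, mins = divmod(abs(primary), 60)
    return {
        "primary_tz_offset": f"{sign}{hours:02d}:{mins:02d}",
        "primary_tz_minutes": primary,
        "timezone_offsets": dict(minutes),
    }


dates = [
    "2025-01-02T10:00:00+05:30",
    "2025-01-03T11:00:00+05:30",
    "2025-02-01T09:00:00-08:00",
]
print(timezone_profile(dates))
```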

LLM query helper

Use `scripts/query_ownership.py` to return small, bounded JSON slices without loading the full graph into context.
Examples:
```bash
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out people --limit 10
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out files --tag auth --bus-factor-max 1
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out person --person alice@corp --limit 10
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out file --file crypto/tls
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out cochange --file crypto/tls --limit 10
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out summary --section orphaned_sensitive_code
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out community --id 3
```
Use `--community-top-owners 5` (default) to control how many maintainers are stored per community.
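A bounded slice means only the top few rows reach the model's context, not the whole graph. The minimal sketch below illustrates the idea. It assumes a `people.csv` with hypothetical `person` and `sensitive_touches` columns; the real header may differ. It uses `heapq.nlargest` so that only `limit` rows are retained while scanning.

```python
import csv
import heapq
import io
import json


def top_rows(fh, sort_key="sensitive_touches", limit=10):
    """Return a small JSON document holding at most `limit` rows.

    heapq.nlargest keeps only `limit` rows in memory while scanning the CSV.
    Column names are hypothetical; adjust them to the real people.csv header.
    """
    reader = csv.DictReader(fh)
    top = heapq.nlargest(limit, reader, key=lambda row: float(row[sort_key] or 0))
    return json.dumps({"sort": sort_key, "limit": limit, "items": top})


SAMPLE = "person,sensitive_touches\nalice@corp,42\nbob@corp,7\ncarol@corp,19\n"
print(top_rows(io.StringIO(SAMPLE), limit=2))
```

Against real output, pass `open("ownership-map-out/people.csv", newline="")` instead of the in-memory sample.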

Basic security queries

Run these to answer common security ownership questions with bounded output:
```bash
# Orphaned sensitive code (stale + low bus factor)
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out summary --section orphaned_sensitive_code

# Hidden owners for sensitive tags
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out summary --section hidden_owners

# Sensitive hotspots with low bus factor
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out summary --section bus_factor_hotspots

# Auth/crypto files with bus factor <= 1
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out files --tag auth --bus-factor-max 1
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out files --tag crypto --bus-factor-max 1

# Who is touching sensitive code the most
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out people --sort sensitive_touches --limit 10

# Co-change neighbors (cluster hints for ownership drift)
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out cochange --file path/to/file --min-jaccard 0.05 --limit 20

# Community maintainers (for a cluster)
python skills/skills/security-ownership-map/scripts/query_ownership.py --data-dir ownership-map-out community --id 3

# Monthly maintainers for the community containing a file
python skills/skills/security-ownership-map/scripts/community_maintainers.py \
  --data-dir ownership-map-out \
  --file network/card.c \
  --since 2025-01-01 \
  --top 5

# Quarterly buckets instead of monthly
python skills/skills/security-ownership-map/scripts/community_maintainers.py \
  --data-dir ownership-map-out \
  --file network/card.c \
  --since 2025-01-01 \
  --bucket quarter \
  --top 5
```

Notes:

- By default, each authored commit counts as one touch (not one touch per file). Use `--touch-mode file` to count per-file touches.
- Use `--window-days 90` or `--weight recency --half-life-days 180` to smooth out short-term churn.
- Filter bots with `--ignore-author-regex '(bot|dependabot)'`.
- Use `--min-share 0.1` to show stable maintainers only.
- Use `--bucket quarter` for calendar quarter groupings.
- Use `--identity committer` or `--date-field committer` to switch from author attribution.
- Use `--include-merges` to include merge commits (excluded by default).
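The smoothing and filtering options can be pictured with the small sketch below. It assumes exponential decay for `--weight recency`: a touch `half_life_days` old counts half as much as a touch made today. It also assumes calendar-quarter labels for `--bucket quarter` and a share cutoff in the spirit of `--min-share`. These formulas and the helper names `quarter` and `weighted_shares` are assumptions chosen for illustration; the script may weight touches differently.

```python
from collections import defaultdict
from datetime import date


def quarter(d):
    """Calendar-quarter label, e.g. date(2025, 5, 1) -> '2025-Q2'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def weighted_shares(touches, today, half_life_days=180, min_share=0.1):
    """Recency-weighted share of touches per person, dropping small shares.

    touches: iterable of (person, date) pairs, one per authored commit.
    A touch half_life_days old counts 0.5; twice that old counts 0.25.
    """
    weights = defaultdict(float)
    for person, day in touches:
        weights[person] += 0.5 ** ((today - day).days / half_life_days)
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only people whose share of the total weight reaches min_share.
    return {p: w / total for p, w in ranked if w / total >= min_share}


today = date(2025, 7, 1)
touches = [
    ("alice@corp", date(2025, 7, 1)),
    ("alice@corp", date(2025, 6, 1)),
    ("bob@corp", date(2025, 1, 2)),
    ("old-contributor@corp", date(2024, 1, 1)),
]
print(weighted_shares(touches, today), quarter(date(2025, 5, 1)))
```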

Summary format (default)

Use this structure, and add fields if needed:
```json
{
  "orphaned_sensitive_code": [
    {
      "path": "crypto/tls/handshake.rs",
      "last_security_touch": "2023-03-12T18:10:04+00:00",
      "bus_factor": 1
    }
  ],
  "hidden_owners": [
    {
      "person": "alice@corp",
      "controls": "63% of auth code"
    }
  ]
}
```
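`orphaned_sensitive_code` is the stale subset of the low-bus-factor hotspots. The sketch below shows how entries in that shape could be selected. Several details are assumptions for illustration, since this document does not state the script's real thresholds: the 365-day staleness cutoff, the `bus_factor <= 1` threshold, and the second sample record.

```python
import json
from datetime import datetime, timedelta, timezone


def orphaned_subset(hotspots, now, stale_after_days=365, max_bus_factor=1):
    """Keep hotspot records that are both stale and thinly owned."""
    cutoff = now - timedelta(days=stale_after_days)
    return [
        h for h in hotspots
        if h["bus_factor"] <= max_bus_factor
        and datetime.fromisoformat(h["last_security_touch"]) < cutoff
    ]


hotspots = [
    {"path": "crypto/tls/handshake.rs",
     "last_security_touch": "2023-03-12T18:10:04+00:00", "bus_factor": 1},
    {"path": "auth/session.rs",  # hypothetical, recently touched
     "last_security_touch": "2025-05-30T09:00:00+00:00", "bus_factor": 1},
]
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
summary = {"orphaned_sensitive_code": orphaned_subset(hotspots, now)}
print(json.dumps(summary, indent=2))
```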

Graph persistence

Use `references/neo4j-import.md` when you need to load the CSVs into Neo4j. It includes constraints, import Cypher, and visualization tips.

Notes

  • `bus_factor_hotspots` in `summary.json` lists sensitive files with a low bus factor; `orphaned_sensitive_code` is the stale subset of those files.
  • If the `git log` history is too large, narrow it with `--since` or `--until`.
  • Compare `summary.json` against CODEOWNERS to highlight ownership drift.
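This document does not define how `bus_factor` is computed. One common definition, used here purely as an assumption, is the smallest number of people whose combined touches cover at least half of a file's history. Under that definition, a file where one person made most of the changes has a bus factor of 1. The helper name and sample data are hypothetical.

```python
from collections import Counter


def bus_factor(touches, threshold=0.5):
    """Smallest number of people covering at least `threshold` of touches.

    touches: list of person identifiers, one entry per touch of the file.
    """
    counts = Counter(touches)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= threshold:
            return rank
    return 0  # no touches at all


file_touches = {
    "crypto/tls/handshake.rs": ["alice"] * 9 + ["bob"],
    "auth/login.py": ["a", "b", "c", "d"],
}
# Flag files whose bus factor is 1 or lower as potential hotspots.
print([path for path, people in file_touches.items() if bus_factor(people) <= 1])
```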