codebase-recon

Codebase Recon

Analyze git history to understand a codebase before reading any code. Reveals project health, risk areas, team structure, and development momentum.
Inspired by Ally Piechowski's (GitHub: grepsedawk) article "The Git Commands I Run Before Reading Any Code" (https://piechowski.io/post/git-commands-before-reading-code/).

Phase 1: Probe

Before running analysis, determine repo scale to calibrate time windows and result counts.
Run this single shell command to collect repo vitals:
```sh
echo "COMMITS=$(git rev-list --count HEAD)" && \
echo "FIRST_COMMIT=$(git log --reverse --format='%ad' --date=short | head -1)" && \
echo "LATEST_COMMIT=$(git log --format='%ad' --date=short | head -1)" && \
echo "BRANCHES=$(git branch -a | wc -l | tr -d ' ')"
```
Use the commit count to set parameters for Phase 2:

| Repo size | Commits | WINDOW (`--since`) | N (`head -N`) |
|---|---|---|---|
| Small | < 500 | (omit `--since`) | 10 |
| Medium | 500 to 10k | `1 year ago` | 20 |
| Large | > 10k | `6 months ago` | 30 |
Print the Repo Vitals line immediately:
```
Repo Vitals: Age: [FIRST_COMMIT to LATEST_COMMIT] | Commits: [COMMITS] | Branches: [BRANCHES] | Analysis window: [WINDOW or "all time"]
```
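The table can be applied mechanically. The following is a rough sketch in POSIX `sh` that uses a hypothetical helper named `recon_params`, which is not part of the original workflow. It maps the commit count to `WINDOW` and `N`. An empty `WINDOW` means "omit `--since`". Exactly 10k commits is treated as Medium, because the table lists Large as strictly greater than 10k.

```sh
# Hypothetical helper: map a commit count to the Phase 2 parameters.
# Sets WINDOW (empty means "omit --since") and N (result count for head).
recon_params() {
  commits=$1
  if [ "$commits" -lt 500 ]; then
    WINDOW=""
    N=10
  elif [ "$commits" -le 10000 ]; then
    WINDOW="1 year ago"
    N=20
  else
    WINDOW="6 months ago"
    N=30
  fi
}

# Example (inside a repository):
#   recon_params "$(git rev-list --count HEAD)"
#   echo "Analysis window: ${WINDOW:-all time}, N=$N"
```

The `${WINDOW:-all time}` expansion supplies the "all time" label that the Repo Vitals line expects when no window applies.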

Phase 2: Parallel Analysis

Run all 7 commands in parallel (they are independent). Substitute `WINDOW` and `N` from Phase 1. For small repos, omit the `--since` flags entirely.
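Because the commands are independent, you can run them concurrently in a POSIX shell by starting each as a background job and then calling `wait` for all of them. This sketch is illustrative only. The function names, the output directory, and the file naming are assumptions, and only two of the seven analyses are shown. It also assumes `WINDOW` and `N` are already set from Phase 1. The unquoted expansion `${WINDOW:+--since="$WINDOW"}` produces no argument at all when `WINDOW` is empty. That omits `--since` for small repos without needing a separate code path.

```sh
# Illustrative analysis functions (two of the seven shown).
# ${WINDOW:+--since="$WINDOW"} expands to no argument when WINDOW is empty.
hotspots() {
  # grep -v '^$' drops blank separator lines so they are not counted as a file
  git log --format=format: --name-only ${WINDOW:+--since="$WINDOW"} |
    grep -v '^$' | sort | uniq -c | sort -nr | head -n "$N"
}

bus_factor() {
  # HEAD is explicit so shortlog does not read from stdin in non-interactive runs
  git shortlog -sn --no-merges HEAD
}

# Hypothetical runner: start each named function as a background job,
# write its output to <out_dir>/<name>.txt, then block until all finish.
run_parallel() {
  out_dir=$1
  shift
  for fn in "$@"; do
    "$fn" > "$out_dir/$fn.txt" 2>&1 &
  done
  wait
}

# Example:
#   OUT_DIR=$(mktemp -d)
#   run_parallel "$OUT_DIR" hotspots bus_factor
#   cat "$OUT_DIR/hotspots.txt"
```

Writing each job to its own file keeps the outputs from interleaving on the terminal.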

2a. Code Hotspots

Most-changed files in the analysis window:
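```sh
# grep -v '^$' drops the blank lines git prints between commits,
# which would otherwise be counted as an empty filename
git log --format=format: --name-only --since="WINDOW" | grep -v '^$' | sort | uniq -c | sort -nr | head -N
```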

2b. Bus Factor

All-time contributor ranking by commit count:
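```sh
# HEAD is explicit: with no revision, git shortlog reads a log from stdin
# whenever stdin is not a terminal (e.g. when run by a script or agent)
git shortlog -sn --no-merges HEAD
```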

2c. Bug Magnets

Files most associated with bug-fix commits:
```sh
# grep -v '^$' drops blank separator lines so they are not counted as a file
git log -i -E --grep="fix|bug|broken" --name-only --format='' --since="WINDOW" | grep -v '^$' | sort | uniq -c | sort -nr | head -N
```

2d. Team Momentum

Commit frequency by month (all time):
```sh
# Months with no commits do not appear in the output
git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c
```

2e. Firefighting Frequency

Emergency/revert commits in the analysis window:
```sh
git log --oneline --since="WINDOW" | grep -iE 'revert|hotfix|emergency|rollback'
```
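To report the firefighting rate, you also need the total number of commits in the same window. This minimal sketch assumes `git rev-list --count` over `HEAD` gives that total, uses integer (rounded-down) percentages, and relies on a hypothetical helper named `firefight_rate`:

```sh
# Hypothetical helper: format "N emergency commits out of M total in window (X%)".
# Integer arithmetic only, so X is rounded down; M = 0 yields 0%.
firefight_rate() {
  n=$1
  m=$2
  if [ "$m" -eq 0 ]; then
    pct=0
  else
    pct=$(( n * 100 / m ))
  fi
  echo "Rate: $n emergency commits out of $m total in window ($pct%)"
}

# Example (inside a repository; ${WINDOW:+...} expands to nothing when WINDOW is empty):
#   n=$(git log --oneline ${WINDOW:+--since="$WINDOW"} | grep -ciE 'revert|hotfix|emergency|rollback')
#   m=$(git rev-list --count ${WINDOW:+--since="$WINDOW"} HEAD)
#   firefight_rate "$n" "$m"
```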

2f. Recently Added Files

New files added in the analysis window:
```sh
# grep -v '^$' drops blank separator lines so they are not counted as a file
git log --diff-filter=A --since="WINDOW" --name-only --format='' | grep -v '^$' | sort | uniq -c | sort -nr | head -N
```

2g. Active vs Total Contributors

Count of contributors active in the last 3 months (a fixed window that measures "who's here now"):
```sh
git shortlog -sn --no-merges --since="3 months ago" HEAD | wc -l
```
Compare this count against the total from 2b.
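You can check the 30% threshold used under Cross-Referencing with integer arithmetic by comparing `active * 100` against `total * 30`. That avoids floating point in the shell. The following sketch uses a hypothetical helper named `bus_factor_flag`. A ratio of exactly 30% does not trigger the warning, because the rule says "less than":

```sh
# Hypothetical helper: print a warning when active contributors are
# fewer than 30% of all-time contributors (active/total < 0.30).
bus_factor_flag() {
  active=$1
  total=$2
  if [ "$total" -gt 0 ] && [ $(( active * 100 )) -lt $(( total * 30 )) ]; then
    echo "Warning: low active contributor ratio (knowledge concentration risk)"
  fi
}

# Example (inside a repository):
#   total=$(git shortlog -sn --no-merges HEAD | wc -l | tr -d ' ')
#   active=$(git shortlog -sn --no-merges --since="3 months ago" HEAD | wc -l | tr -d ' ')
#   echo "Active (last 3 months): $active of $total total contributors"
#   bus_factor_flag "$active" "$total"
```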

Cross-Referencing

After collecting all Phase 2 results, perform these cross-references before presenting the report:
  1. High-Risk Files: Intersect code hotspots (2a) with bug magnets (2c). Files appearing in both lists are highest-risk.
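  2. Risk Ownership: For each high-risk file, run `git shortlog -sn HEAD -- <file>` to identify the primary owner (the top entry). Pass `HEAD` explicitly, because without a revision `git shortlog` reads from stdin whenever stdin is not a terminal.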
  3. Bus Factor Risk: If active contributors (2g) are less than 30% of total contributors (2b), flag this as a bus factor concern.
  4. Momentum Trend: Analyze the monthly commit counts (2d):
    • Compare the average of the last 3 months to the average of the 3 months before that.
    • Rising: last 3 months average > prior 3 months average by 20%+
    • Declining: last 3 months average < prior 3 months average by 20%+
    • Erratic: month-over-month variance exceeds 50%
    • Stable: otherwise
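Steps 1 and 4 can be sketched with `awk`. This is one interpretation, not the author's implementation, and the helper names `high_risk` and `momentum_trend` are hypothetical. The sketch relies on several assumptions:

- The 2a and 2c outputs were saved to files, with one ranked `count path` line each and no blank lines.
- The trend uses the last six lines of the 2d output, so months with no commits are skipped rather than counted as zero.
- "Erratic" means any month-over-month change larger than 50% of the earlier month.
- The trend rules are checked in the order listed.
- Fewer than six months of data yields `insufficient data`.

```sh
# Hypothetical helper for step 1: print files present in both ranked lists.
# $1: saved 2a output (hotspots), $2: saved 2c output (bug magnets).
# Each line is "count path"; a file's rank is its line number.
high_risk() {
  awk '
    { path = $0; sub(/^[ \t]*[0-9]+[ \t]/, "", path) }  # strip the uniq -c count
    NR == FNR { bug[path] = FNR; next }                 # first file read: bug magnets
    (path in bug) {
      printf "%s (hotspot rank #%d, bug magnet rank #%d)\n", path, FNR, bug[path]
    }
  ' "$2" "$1"
}

# Hypothetical helper for step 4: classify the trend from 2d output on stdin
# ("count YYYY-MM" per line, oldest first). Integer math avoids rounding issues.
momentum_trend() {
  tail -n 6 | awk '
    { c[NR] = $1 }
    END {
      if (NR < 6) { print "insufficient data"; exit }
      prior = c[1] + c[2] + c[3]     # sums stand in for averages (same divisor)
      recent = c[4] + c[5] + c[6]
      if (recent * 10 >= prior * 12) { print "rising"; exit }     # at least 20% higher
      if (recent * 10 <= prior * 8) { print "declining"; exit }   # at least 20% lower
      for (i = 2; i <= 6; i++) {
        change = c[i] - c[i - 1]
        if (change < 0) change = -change
        if (change * 2 > c[i - 1]) { print "erratic"; exit }      # swing above 50%
      }
      print "stable"
    }'
}

# Example (inside a repository, with 2a/2c saved as hotspots.txt/bugs.txt):
#   high_risk hotspots.txt bugs.txt
#   git shortlog -sn HEAD -- path/to/file | head -n 1   # primary owner
#   git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | momentum_trend
```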

Report Template

Present the report in the terminal using this structure:
```
═══ Codebase Recon Report ═══

Repo Vitals
  Age: [first commit] to [latest commit] | Commits: N | Branches: N | Analysis window: WINDOW

1. Code Hotspots (most-changed files)
   [ranked list: count  filepath]

2. Bug Magnets (files with fix/bug/broken commits)
   [ranked list: count  filepath]

3. High-Risk Files (appear in BOTH hotspots AND bug magnets)
   [list with: filepath (hotspot rank #X, bug magnet rank #Y, primary owner: NAME)]
   If none overlap, state: "No files appear in both lists, which is a good sign."

4. Bus Factor
   [top 10 contributors: count  name]
   Active (last 3 months): X of Y total contributors
   [If active < 30% of total: "Warning: low active contributor ratio (knowledge concentration risk)"]

5. Team Momentum
   [monthly commit counts, most recent 12 months or all if fewer]
   Trend: [rising / stable / declining / erratic]

6. Firefighting Frequency
   [list of revert/hotfix/emergency/rollback commits, or "None found"]
   Rate: N emergency commits out of M total in window (X%)

7. Recently Added Files
   [ranked list: count  filepath]

8. Recommendations
   - Start reading: [top 3 high-risk files, or top 3 hotspots if no high-risk files]
   - Talk to: [primary owner of the #1 high-risk or hotspot file]
   - Watch out: [any trend warnings, e.g. declining momentum, low bus factor, high firefighting rate]
```
After printing the report, ask: "Want me to save this report to a markdown file? (e.g., `docs/codebase-recon-report.md`)"
If yes, write the same content as a markdown file. Do not commit — let the user decide.