git-leak-recovery
Git Leak Recovery
Overview
This skill guides the process of recovering sensitive data (secrets, credentials, API keys) that has been removed from Git history through history-rewriting operations, extracting that data, and then securely cleaning the repository to ensure complete removal.
Key Concepts
Git Object Persistence
When commits are "removed" via operations like `git reset`, `git rebase`, or `git commit --amend`, the underlying Git objects are not immediately deleted. They become "unreachable" but persist in the repository until garbage collection occurs. This behavior enables recovery but also means secrets remain accessible until explicit cleanup.
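The persistence described here can be observed in a throwaway repository. The following is a minimal sketch, not part of the original skill: the temporary directory, the file names, and the placeholder key value are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; file names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "API_KEY=example-not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"
leaked="$(git rev-parse HEAD)"

# "Remove" the commit by rewinding the branch.
git reset -q --hard HEAD~1

# The commit no longer appears in the branch history...
in_log="$(git log --oneline | wc -l | tr -d ' ')"
# ...but the object is still stored and readable by hash.
obj_type="$(git cat-file -t "$leaked")"
recovered="$(git show "$leaked:config.env")"

echo "commits on branch: $in_log"
echo "object type: $obj_type"
echo "recovered: $recovered"
```

The branch shows a single commit, yet the rewound commit and its file are still readable by hash until garbage collection removes them.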
Common Hiding Places for Secrets
When searching for removed secrets, check these locations in order of likelihood:
- Reflog - Most common location for rewritten history (`git reflog`)
- Dangling commits - Commits with no branch reference (`git fsck --unreachable`)
- Stashes - Often overlooked (`git stash list`)
- Other branches - May contain the original commits
- Tags - May reference old commits
- Git notes - Annotations attached to commits
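Stashes are the easiest of these locations to overlook, so a small sketch may help. It assumes a hypothetical tracked file `config.env` into which a placeholder key was typed and then stashed; the repository, file name, and key value are all illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "API_KEY=" > config.env
git add config.env
git commit -q -m "add config template"

# A secret is typed into a tracked file, then stashed instead of committed.
echo "API_KEY=example-not-a-real-key" > config.env
git stash push -q -m "wip: local credentials"

# The working tree is back to the committed template...
worktree_value="$(cat config.env)"
# ...but the stash entry still carries the edited file.
stash_count="$(git stash list | wc -l | tr -d ' ')"
stashed_value="$(git show 'stash@{0}:config.env')"

echo "working tree: $worktree_value"
echo "stash entries: $stash_count"
echo "stashed copy: $stashed_value"
```

A stash entry is stored as a commit, so the same `git show <ref>:<path>` form used for ordinary commits works on `stash@{N}`.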
Workflow
Phase 1: Reconnaissance
Before attempting recovery, gather information about the repository state:
```bash
# View current commit history
git log --all --oneline

# Check reflog for all references
git reflog show --all

# Find unreachable objects
git fsck --unreachable

# List stashes
git stash list

# List all branches
git branch -a

# List tags
git tag -l
```
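One detail worth knowing during reconnaissance: by default `git fsck` treats commits referenced from a reflog as reachable, so a freshly rewritten commit may not appear in `git fsck --unreachable` until `--no-reflogs` is added. The sketch below demonstrates this in a throwaway repository and then searches each candidate commit with `git grep`. The repository, file name, and placeholder key are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; file names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "API_KEY=example-not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"
leaked="$(git rev-parse HEAD)"

# Rewrite the commit so the branch no longer contains the secret.
echo "API_KEY=" > config.env
git add config.env
git commit -q --amend -m "add config template"

# Default fsck: the reflog still references the old commit, so it is not listed.
default_hits="$(git fsck --unreachable 2>/dev/null | grep -c "$leaked" || true)"

# Ignoring reflogs: the old commit shows up as an unreachable commit.
candidates="$(git fsck --unreachable --no-reflogs 2>/dev/null \
  | awk '$1 == "unreachable" && $2 == "commit" { print $3 }')"

# Search each candidate commit's tree for the pattern.
hits=""
for h in $candidates; do
  if git grep -q "example-not-a-real-key" "$h"; then
    hits="$h"
  fi
done

echo "listed by default fsck: $default_hits"
echo "commit containing the pattern: $hits"
```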
Phase 2: Recovery
Once a target commit is identified:
```bash
# View commit contents without checking out
git show <commit-hash>

# View specific file from commit
git show <commit-hash>:<path/to/file>

# Extract file to working directory
git show <commit-hash>:<path/to/file> > recovered_file.txt
```
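When the target commit is not yet known, the reflog can be walked programmatically. The sketch below assumes a hypothetical file name `config.env` and writes the recovered copy to a temporary file outside the repository. It stops at the first reflog entry whose tree contains that path, which is a simplification: a real search might need to inspect several entries.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; file names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "API_KEY=example-not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"
leaked="$(git rev-parse HEAD)"
git reset -q --hard HEAD~1

target="config.env"   # assumed path of the lost file
out="$(mktemp)"       # recovered copy is kept outside the repository
found=""

# Walk the HEAD reflog, newest entry first, and stop at the first commit
# whose tree contains the target path.
for h in $(git reflog show --format=%H); do
  if git cat-file -e "$h:$target" 2>/dev/null; then
    git show "$h:$target" > "$out"
    found="$h"
    break
  fi
done

echo "recovered from: $found"
echo "saved to: $out"
```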
Phase 3: Cleanup
To completely remove secrets from the repository, perform cleanup in this specific order:
```bash
# Step 1: Expire all reflog entries
git reflog expire --expire=now --all

# Step 2: Run aggressive garbage collection
git gc --prune=now --aggressive
```
The order matters: reflog must be expired first, otherwise GC will not remove the objects since they are still referenced.
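The effect of the ordering can be checked directly. The sketch below uses a throwaway repository with illustrative file names and a placeholder key: it runs GC alone first, then expires the reflog and runs GC again, recording after each step whether the old commit object still exists. `object_state` is a helper defined only for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; file names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "API_KEY=example-not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"
leaked="$(git rev-parse HEAD)"

# Rewrite the commit so the branch no longer contains the secret.
echo "API_KEY=" > config.env
git add config.env
git commit -q --amend -m "add config template"

# Helper for this example: reports whether an object exists in the repository.
object_state() {
  if git cat-file -e "$1" 2>/dev/null; then echo present; else echo gone; fi
}

before="$(object_state "$leaked")"

# GC alone: the reflog still references the old commit, so it survives.
git gc -q --prune=now
after_gc_only="$(object_state "$leaked")"

# Expire the reflog first, then GC: the old commit is removed.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
after_cleanup="$(object_state "$leaked")"

echo "before: $before"
echo "after gc only: $after_gc_only"
echo "after expire + gc: $after_cleanup"
```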
Phase 4: Verification
Verify cleanup was successful using multiple approaches:
```bash
# Attempt to access the old commit (should fail)
git show <old-commit-hash>

# Search for secret patterns in the working tree and .git ("." already covers .git)
grep -r "secret_pattern" . 2>/dev/null

# Check for unreachable objects
git fsck --unreachable

# Count files in the object store (should decrease after GC)
find .git/objects -type f | wc -l

# Verify working tree is clean
git status
```
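Git stores object contents zlib-compressed, so a plain-text `grep` over `.git` may not see a secret that still lives inside an object. As a complementary check, the sketch below enumerates every object in the object database with `git cat-file --batch-all-objects` and searches the decompressed blobs. `scan_object_db` is a helper defined only for this example, and the repository, file names, and placeholder key are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical throwaway repository; file names and the key value are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "API_KEY=example-not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"

# Rewrite the commit so the branch no longer contains the secret.
echo "API_KEY=" > config.env
git add config.env
git commit -q --amend -m "add config template"

# Helper for this example: prints the hash of every blob in the object
# database (reachable or not) whose decompressed content matches $1.
scan_object_db() {
  local pattern="$1" hash type size
  git cat-file --batch-all-objects --batch-check | while read -r hash type size; do
    if [ "$type" = "blob" ] && git cat-file -p "$hash" | grep -c -- "$pattern" > /dev/null; then
      echo "$hash"
    fi
  done
}

before_hits="$(scan_object_db "example-not-a-real-key" | wc -l | tr -d ' ')"

git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

after_hits="$(scan_object_db "example-not-a-real-key" | wc -l | tr -d ' ')"

echo "matching blobs before cleanup: $before_hits"
echo "matching blobs after cleanup: $after_hits"
```

This scan reads every blob one at a time, so it is slow on large repositories. It is meant as a final check rather than a routine search.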
Verification Strategies
Confirming Recovery Success
- Verify the recovered data matches expected format/content
- Write recovered data to the designated output location
- Confirm existing commits and history remain intact after recovery
Confirming Cleanup Success
- Old commit hash should return an error when accessed via `git show`
- `grep -r` for secret patterns should return no matches
- `git fsck --unreachable` should show no objects containing the secret
- Compare object count before and after GC to confirm removal
Common Pitfalls
Investigation Pitfalls
- Only checking recent history - Use `git reflog show --all`, not just `git reflog`, to see all references
- Forgetting stashes - Stashes are a common place for accidentally stored secrets
- Missing other branches - Always check all branches with `git branch -a`
Cleanup Pitfalls
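- Wrong order of operations - Always expire reflog before running GC
- Missing the `--all` flag - `git reflog expire --expire=now` without `--all` (or an explicit list of refs) does not expire the reflogs of every ref
- Using `--prune` without `=now` - Default prune time is 2 weeks, use `--prune=now` for immediate effect
- Not using `--aggressive` - The flag makes GC repack more thoroughly; removal of unreachable objects itself is controlled by `--prune=now`, so treat `--aggressive` as extra thoroughness rather than the step that deletes the data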
Verification Pitfalls
- Only checking working directory - Secrets in the `.git` directory require explicit checks
- Not verifying object removal - Always confirm the commit hash is inaccessible
- Incomplete grep patterns - Search for multiple variations of the secret pattern
Decision Tree
Is the task about recovering lost data from Git?
├── Yes → Check reflog first (git reflog show --all)
│   ├── Found in reflog → Use git show <hash> to view/extract
│   └── Not in reflog → Check fsck, stashes, branches, tags
│
└── Is the task about cleaning up secrets from Git?
    ├── Yes → Follow cleanup sequence:
    │   1. git reflog expire --expire=now --all
    │   2. git gc --prune=now --aggressive
    │   3. Verify with multiple methods
    │
    └── Both recovery AND cleanup needed?
        → Complete recovery first, verify data saved,
          then proceed with cleanup
Important Considerations
- Backup first: Before any cleanup operations, ensure recovered data is saved outside the repository
- Remote repositories: This cleanup only affects the local repository; if secrets were pushed to a remote, additional steps are needed
- Cloned copies: Any cloned copies of the repository may still contain the secrets
- Credential rotation: After recovering exposed secrets, rotate them immediately regardless of cleanup success