git-leak-recovery

Git Leak Recovery

Overview

This skill covers recovering sensitive data (secrets, credentials, API keys) that has been removed from Git history by history-rewriting operations, extracting that data, and then securely cleaning the repository so the data can no longer be recovered from it.

Key Concepts

Git Object Persistence

When commits are "removed" via operations like git reset, git rebase, or git commit --amend, the underlying Git objects are not immediately deleted. They become "unreachable" but persist in the repository until garbage collection occurs. This behavior enables recovery but also means secrets remain accessible until explicit cleanup.
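
To make the persistence concrete, here is a minimal sketch that amends a secret out of a commit and then reads it back from the old object. The throwaway repository, the file name config.env, and the key value are all hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: a commit replaced by --amend is still in the object database.
# Repository, file name, and secret value are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Commit a file that contains a fake secret.
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)

# "Remove" the secret by amending the commit.
echo "API_KEY=" > config.env
git add config.env
git commit -q --amend -m "add config"
new_commit=$(git rev-parse HEAD)

# The branch now points at new_commit, but the old object still exists.
old_type=$(git cat-file -t "$old_commit")
recovered=$(git show "$old_commit:config.env")
echo "old commit is still a $old_type containing: $recovered"
```

The old hash no longer appears in git log for the branch, yet git show resolves it without complaint, which is exactly the window that both recovery and cleanup have to deal with.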

Common Hiding Places for Secrets

When searching for removed secrets, check these locations in order of likelihood:
  1. Reflog - Most common location for rewritten history (git reflog)
  2. Dangling commits - Commits with no branch reference (git fsck --unreachable)
  3. Stashes - Often overlooked (git stash list)
  4. Other branches - May contain the original commits
  5. Tags - May reference old commits
  6. Git notes - Annotations attached to commits
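
The locations above could be walked with one helper, sketched below. The function name list_candidates and the demo repository are hypothetical. One detail worth knowing: git fsck treats reflog entries as references by default, so the sketch passes --no-reflogs to also report commits that only a reflog keeps alive.

```shell
#!/bin/sh
# Sketch: list candidate commits from each hiding place, most likely first.
# list_candidates and the demo repository below are hypothetical.
set -e

list_candidates() {
    echo "== reflog (all refs) =="
    git reflog show --all --format='%H %gd %gs'
    echo "== unreachable commits (reflogs ignored) =="
    git fsck --unreachable --no-reflogs 2>/dev/null |
        awk '$1 == "unreachable" && $2 == "commit" { print $3 }'
    echo "== stashes =="
    git stash list
    echo "== branches, remote branches, tags =="
    git for-each-ref --format='%(objectname) %(refname)' \
        refs/heads refs/remotes refs/tags
    echo "== notes =="
    git notes list || true
}

# Demo: a repository with an amended commit and one stash entry.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"
echo "TOKEN=another-hypothetical-value" >> config.env
git stash push -q

report=$(list_candidates)
echo "$report"
```

In the demo, the amended-away commit shows up both in the reflog section and in the unreachable section, and the stash entry appears under stashes.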

Workflow

Phase 1: Reconnaissance

Before attempting recovery, gather information about the repository state:

    # View current commit history
    git log --all --oneline

    # Check reflog for all references
    git reflog show --all

    # Find unreachable objects
    git fsck --unreachable

    # List stashes
    git stash list

    # List all branches
    git branch -a

    # List tags
    git tag -l
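
When the reconnaissance output is long, it can help to search commit contents directly instead of reading hashes one by one. The sketch below greps the tree of every commit reachable from any ref or reflog, plus commits that fsck reports as unreachable. The function name search_all_commits, the pattern, and the demo repository are hypothetical; on very large repositories the argument list handed to git grep may need batching.

```shell
#!/bin/sh
# Sketch: find which commits still contain a pattern, including commits
# that are only reachable through a reflog or not reachable at all.
# search_all_commits and the demo data are hypothetical.
set -e

search_all_commits() {
    pattern=$1
    {
        # Commits reachable from refs and from reflog entries.
        git rev-list --all --reflog
        # Commits with no reference at all.
        git fsck --unreachable --no-reflogs 2>/dev/null |
            awk '$1 == "unreachable" && $2 == "commit" { print $3 }'
    } | sort -u |
        # Prints <commit>:<path> per hit; exits non-zero if nothing matches.
        xargs git grep -l -F -e "$pattern"
}

# Demo repository with a secret that was amended away.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

matches=$(search_all_commits "hypothetical-secret")
echo "$matches"
```

Each output line names the commit and the file path, which can be passed straight to git show in the recovery phase.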

Phase 2: Recovery

Once a target commit is identified:

    # View commit contents without checking out
    git show <commit-hash>

    # View specific file from commit
    git show <commit-hash>:<path/to/file>

    # Extract file to working directory
    git show <commit-hash>:<path/to/file> > recovered_file.txt
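
A slightly more defensive variant of the extraction step is sketched below. It checks that the object exists before writing, writes with restrictive permissions, and places the output outside the repository so the recovered secret cannot be committed by accident. The function name recover_file, the output directory, and the demo data are hypothetical.

```shell
#!/bin/sh
# Sketch: extract a file from an old commit into a location outside the repo.
# recover_file and all demo names are hypothetical.
set -e

recover_file() {
    commit=$1
    path=$2
    dest=$3
    # Fail early if the commit or path is not in the object database.
    if ! git cat-file -e "$commit:$path" 2>/dev/null; then
        echo "not found: $commit:$path" >&2
        return 1
    fi
    # The file holds a secret, so create it readable by the owner only.
    ( umask 077; git show "$commit:$path" > "$dest" )
}

# Demo repository with a secret that was amended away.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

outdir=$(mktemp -d)
recover_file "$old_commit" config.env "$outdir/recovered.txt"
recovered=$(cat "$outdir/recovered.txt")

# A path that never existed is rejected instead of producing an empty file.
if recover_file "$old_commit" no-such-file "$outdir/missing.txt"; then
    missing_result=ok
else
    missing_result=failed
fi
worktree_state=$(git status --porcelain)
```

Because the destination lives outside the working tree, git status stays clean after recovery.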

Phase 3: Cleanup

To completely remove secrets from the repository, perform cleanup in this specific order:

    # Step 1: Expire all reflog entries
    git reflog expire --expire=now --all

    # Step 2: Run aggressive garbage collection
    git gc --prune=now --aggressive

The order matters: the reflog must be expired first, otherwise GC will not remove the objects because the reflog still references them.
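
The effect of the ordering can be observed in a throwaway repository. The sketch below runs GC alone first, checks that the old commit survives, and then runs the two cleanup steps in the documented order. Repository contents and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: GC alone keeps reflog-referenced commits; expiring the reflog
# first lets GC prune them. All demo names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

# Wrong order: GC while the reflog still points at the old commit.
git gc -q --prune=now
if git cat-file -e "$old_commit" 2>/dev/null; then
    after_gc_only=present
else
    after_gc_only=gone
fi

# Documented order: expire reflogs, then GC.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
if git cat-file -e "$old_commit" 2>/dev/null; then
    after_cleanup=present
else
    after_cleanup=gone
fi

echo "after gc only: $after_gc_only, after full cleanup: $after_cleanup"
```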

Phase 4: Verification

Verify cleanup was successful using multiple approaches:

    # Attempt to access the old commit (should fail)
    git show <old-commit-hash>

    # Search for secret patterns in repository
    grep -r "secret_pattern" . .git 2>/dev/null

    # Check for unreachable objects
    git fsck --unreachable

    # Count loose objects (should decrease after GC)
    find .git/objects -type f | wc -l

    # Verify working tree is clean
    git status
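
Git stores loose objects and pack files zlib-compressed, so a plain grep -r over .git generally cannot see text inside them. As a complementary check, the sketch below decodes every object in the object database with git cat-file and searches the decoded content. The function name scan_objects and the demo data are hypothetical, and the loop is slow on large repositories because it starts one process per object.

```shell
#!/bin/sh
# Sketch: search the decoded content of every object, reachable or not.
# scan_objects and the demo repository are hypothetical.
set -e

scan_objects() {
    pattern=$1
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
    while read -r oid type; do
        # -p pretty-prints blobs, commits, tags, and tree listings.
        if git cat-file -p "$oid" | grep -q -F -e "$pattern"; then
            echo "$oid $type"
        fi
    done
}

# Demo repository with a secret that was amended away.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

hits_before=$(scan_objects "hypothetical-secret")

git reflog expire --expire=now --all
git gc -q --prune=now

hits_after=$(scan_objects "hypothetical-secret")
echo "before cleanup: ${hits_before:-none}"
echo "after cleanup: ${hits_after:-none}"
```

An empty result after cleanup is stronger evidence than an empty grep -r, since it covers packed and loose objects alike.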

Verification Strategies

Confirming Recovery Success

  • Verify the recovered data matches expected format/content
  • Write recovered data to the designated output location
  • Confirm existing commits and history remain intact after recovery

Confirming Cleanup Success

  • Old commit hash should return an error when accessed via git show
  • grep -r for secret patterns should return no matches
  • git fsck --unreachable should show no objects containing the secret
  • Compare object count before and after GC to confirm removal
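
For the before-and-after comparison, counting files under .git/objects can mislead once GC has moved objects into a pack file. One possible alternative, sketched below, counts objects rather than files. The helper name count_objects and the demo repository are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the number of objects (not files) before and after cleanup.
# count_objects and the demo repository are hypothetical.
set -e

count_objects() {
    # One line per object in the database, loose or packed.
    git cat-file --batch-all-objects --batch-check | wc -l | tr -d ' '
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

before=$(count_objects)
git reflog expire --expire=now --all
git gc -q --prune=now
after=$(count_objects)
echo "objects before: $before, after: $after"
```

A lower count shows that something was pruned, but it does not identify what, so it works best alongside the hash and content checks.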

Common Pitfalls

Investigation Pitfalls

  1. Only checking recent history - Use git reflog show --all, not just git reflog, to see all references
  2. Forgetting stashes - Stashes are a common place for accidentally stored secrets
  3. Missing other branches - Always check all branches with git branch -a

Cleanup Pitfalls

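  1. Wrong order of operations - Always expire reflog before running GC
  2. Missing the --all flag - git reflog expire --expire=now without --all only processes the refs named on the command line, so the reflogs of other refs keep the old commits alive
  3. Using --prune without =now - Default prune time is 2 weeks, use --prune=now for immediate effect
  4. Relying on --aggressive for removal - --aggressive only makes GC spend more time optimizing pack compression; which unreachable objects get removed is determined by reflog expiry and --prune=now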

Verification Pitfalls

  1. Only checking working directory - Secrets in the .git directory require explicit checks
  2. Not verifying object removal - Always confirm the commit hash is inaccessible
  3. Incomplete grep patterns - Search for multiple variations of the secret pattern

Decision Tree

Is the task about recovering lost data from Git?
├── Yes → Check reflog first (git reflog show --all)
│   ├── Found in reflog → Use git show <hash> to view/extract
│   └── Not in reflog → Check fsck, stashes, branches, tags
└── Is the task about cleaning up secrets from Git?
    ├── Yes → Follow cleanup sequence:
    │   1. git reflog expire --expire=now --all
    │   2. git gc --prune=now --aggressive
    │   3. Verify with multiple methods
    └── Both recovery AND cleanup needed?
        → Complete recovery first, verify data saved,
          then proceed with cleanup
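
The last branch of the tree, recovery followed by cleanup, could be guarded so that cleanup never runs unless the recovered copy actually exists. The sketch below is one way to express that; recover_then_clean and the demo data are hypothetical names.

```shell
#!/bin/sh
# Sketch: recover first, verify the copy, and only then clean up.
# recover_then_clean and the demo repository are hypothetical.
set -e

recover_then_clean() {
    commit=$1
    path=$2
    dest=$3
    git show "$commit:$path" > "$dest" || return 1
    # Refuse to destroy history unless the recovered copy is non-empty.
    if [ ! -s "$dest" ]; then
        echo "recovery produced no data; skipping cleanup" >&2
        return 1
    fi
    git reflog expire --expire=now --all
    git gc -q --prune=now --aggressive
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "API_KEY=hypothetical-secret-123" > config.env
git add config.env
git commit -q -m "add config"
old_commit=$(git rev-parse HEAD)
echo "API_KEY=" > config.env
git commit -q -a --amend -m "add config"

dest="$(mktemp -d)/recovered.txt"
recover_then_clean "$old_commit" config.env "$dest"

saved=$(cat "$dest")
if git cat-file -e "$old_commit" 2>/dev/null; then
    old_state=present
else
    old_state=gone
fi
echo "saved copy: $saved; old commit is $old_state"
```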

Important Considerations

  • Backup first: Before any cleanup operations, ensure recovered data is saved outside the repository
  • Remote repositories: This cleanup only affects the local repository; if secrets were pushed to a remote, additional steps are needed
  • Cloned copies: Any cloned copies of the repository may still contain the secrets
  • Credential rotation: After recovering exposed secrets, rotate them immediately regardless of cleanup success