# Forensic Audio Research: Audio Voice Recovery Best Practices
Comprehensive audio forensics and voice recovery guide providing CSI-level capabilities for recovering voice from low-quality, low-volume, or damaged audio recordings. Contains 45 rules across 8 categories, prioritized by impact to guide audio enhancement, forensic analysis, and transcription workflows.
## When to Apply
Reference these guidelines when:
- Recovering voice from noisy or low-quality recordings
- Enhancing audio for transcription or legal evidence
- Performing forensic audio authentication
- Analyzing recordings for tampering or splices
- Building automated audio processing pipelines
- Transcribing difficult or degraded speech
## Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rules |
|---|---|---|---|---|
| 1 | Signal Preservation & Analysis | CRITICAL | `signal-` | 5 |
| 2 | Noise Profiling & Estimation | CRITICAL | `noise-` | 5 |
| 3 | Spectral Processing | HIGH | `spectral-` | 6 |
| 4 | Voice Isolation & Enhancement | HIGH | `voice-` | 7 |
| 5 | Temporal Processing | MEDIUM-HIGH | `temporal-` | 5 |
| 6 | Transcription & Recognition | MEDIUM | `transcribe-` | 5 |
| 7 | Forensic Authentication | MEDIUM | `forensic-` | 5 |
| 8 | Tool Integration & Automation | LOW-MEDIUM | `tool-` | 7 |
## Quick Reference
### 1. Signal Preservation & Analysis (CRITICAL)

- `signal-preserve-original`: Never modify the original recording
- `signal-lossless-format`: Use lossless formats for processing
- `signal-sample-rate`: Preserve the native sample rate
- `signal-bit-depth`: Use maximum bit depth for processing
- `signal-analyze-first`: Analyze before processing
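The first two rules above can be sketched as a small helper that copies the evidence file into a working copy and records its SHA-256, so every later step can prove the original is untouched (function and field names here are illustrative, not part of the bundled scripts):

```python
import hashlib
import shutil
from pathlib import Path

def make_working_copy(original: str, workdir: str = ".") -> dict:
    """Hash the evidence file, then process only a copy of it.

    Returns a record suitable for the case file; field names are illustrative.
    """
    src = Path(original)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    work = Path(workdir) / f"working_{src.name}"
    shutil.copy2(src, work)  # copy2 also preserves file timestamps
    return {"original": str(src), "working": str(work), "sha256": digest}

def verify_original(original: str, recorded_sha256: str) -> bool:
    """Re-hash the original and compare against the recorded checksum."""
    return hashlib.sha256(Path(original).read_bytes()).hexdigest() == recorded_sha256
```

Run the verification after every processing milestone; any mismatch means the evidence file was touched.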
### 2. Noise Profiling & Estimation (CRITICAL)

- `noise-profile-silence`: Extract noise profile from silent segments
- `noise-identify-type`: Identify noise type before reduction
- `noise-adaptive-estimation`: Use adaptive estimation for non-stationary noise
- `noise-snr-assessment`: Measure SNR before and after
- `noise-avoid-overprocessing`: Avoid over-processing and musical artifacts
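A rough before/after SNR number of the kind `noise-snr-assessment` asks for can be estimated with NumPy, assuming a noise-only reference segment (such as leading silence) is available. This is a sketch, not a calibrated forensic measurement:

```python
import numpy as np

def estimate_snr_db(signal: np.ndarray, noise_segment: np.ndarray) -> float:
    """Estimate SNR in dB using a noise-only segment as the noise-power
    reference. Assumes both arrays hold float samples from the same recording."""
    noise_power = np.mean(noise_segment.astype(np.float64) ** 2)
    signal_power = np.mean(signal.astype(np.float64) ** 2)
    if noise_power == 0:
        return float("inf")
    return 10.0 * np.log10(signal_power / noise_power)
```

Measure before and after processing on the same segments so the two numbers are comparable.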
### 3. Spectral Processing (HIGH)

- `spectral-subtraction`: Apply spectral subtraction for stationary noise
- `spectral-wiener-filter`: Use a Wiener filter for optimal noise estimation
- `spectral-notch-filter`: Apply notch filters for tonal interference
- `spectral-band-limiting`: Apply frequency band limiting for speech
- `spectral-equalization`: Use forensic equalization to restore intelligibility
- `spectral-declip`: Repair clipped audio before other processing
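As an illustration of `spectral-subtraction` (not the exact method any particular tool implements), a minimal magnitude-domain version with a spectral floor to limit musical-noise artifacts might look like:

```python
import numpy as np

def spectral_subtract(x, noise, frame=512, hop=256, floor=0.05):
    """Basic magnitude spectral subtraction: average the noise magnitude
    spectrum from a noise-only clip, subtract it frame by frame, and keep a
    spectral floor so bins are never driven fully to zero. Sketch only."""
    win = np.hanning(frame)
    # Average noise magnitude spectrum over frames of the noise-only segment
    n_frames = [np.abs(np.fft.rfft(win * noise[i:i + frame]))
                for i in range(0, len(noise) - frame, hop)]
    noise_mag = np.mean(n_frames, axis=0)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for i in range(0, len(x) - frame, hop):
        spec = np.fft.rfft(win * x[i:i + frame])
        mag = np.abs(spec)
        # Subtract the noise estimate; clamp to a fraction of the original
        clean = np.maximum(mag - noise_mag, floor * mag)
        out[i:i + frame] += np.fft.irfft(clean * np.exp(1j * np.angle(spec)), frame)
        norm[i:i + frame] += win
    return out / np.maximum(norm, 1e-8)  # overlap-add normalization
```

Production tools (e.g. SoX `noisered`, FFmpeg `afftdn`) use more refined estimators, but the frame-subtract-floor structure is the same idea.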
### 4. Voice Isolation & Enhancement (HIGH)

- `voice-rnnoise`: Use RNNoise for real-time ML denoising
- `voice-dialogue-isolate`: Use source separation for complex backgrounds
- `voice-formant-preserve`: Preserve formants during pitch manipulation
- `voice-dereverb`: Apply dereverberation for room echo
- `voice-enhance-speech`: Use AI speech enhancement services for quick results
- `voice-vad-segment`: Use VAD for targeted processing
- `voice-frequency-boost`: Boost frequency regions for specific phonemes
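A crude energy-based VAD illustrates the idea behind `voice-vad-segment`. Production work would use a trained detector (for example WebRTC VAD or Silero VAD), but the per-frame decision structure is similar:

```python
import numpy as np

def simple_energy_vad(x, sr, frame_ms=30, threshold_db=-35):
    """Flag frames whose RMS level (relative to the clip's peak) exceeds a
    threshold. A stand-in sketch, not a real voice activity detector."""
    frame = max(1, int(sr * frame_ms / 1000))
    peak = np.max(np.abs(x)) or 1.0
    flags = []
    for i in range(0, len(x) - frame + 1, frame):
        rms = np.sqrt(np.mean((x[i:i + frame] / peak) ** 2))
        level_db = 20 * np.log10(max(rms, 1e-10))
        flags.append(level_db > threshold_db)
    return flags
```

The flags can then drive targeted processing: denoise only speech frames, gate only non-speech frames.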
### 5. Temporal Processing (MEDIUM-HIGH)

- `temporal-dynamic-range`: Use dynamic range compression for level consistency
- `temporal-noise-gate`: Apply a noise gate to silence non-speech segments
- `temporal-time-stretch`: Use time stretching for intelligibility
- `temporal-transient-repair`: Repair transient damage (clicks, pops, dropouts)
- `temporal-silence-trim`: Trim silence and normalize before export
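The `temporal-noise-gate` rule reduces to a per-frame threshold decision. A minimal sketch, assuming float audio in [-1, 1] and omitting the attack/release ramps a real gate (such as FFmpeg's `agate`) adds to avoid clicks:

```python
import numpy as np

def noise_gate(x, sr, threshold_db=-45.0, frame_ms=10):
    """Zero out frames whose RMS level falls below the threshold (in dBFS).
    Core decision only; real gates smooth transitions between states."""
    frame = max(1, int(sr * frame_ms / 1000))
    y = x.astype(np.float64).copy()
    for i in range(0, len(y), frame):
        seg = y[i:i + frame]
        rms = np.sqrt(np.mean(seg ** 2)) if len(seg) else 0.0
        if 20 * np.log10(max(rms, 1e-12)) < threshold_db:
            y[i:i + frame] = 0.0
    return y
```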
### 6. Transcription & Recognition (MEDIUM)

- `transcribe-whisper`: Use Whisper for noise-robust transcription
- `transcribe-multipass`: Use multi-pass transcription for difficult audio
- `transcribe-segment`: Segment audio for targeted transcription
- `transcribe-confidence`: Track confidence scores for uncertain words
- `transcribe-hallucination`: Detect and filter ASR hallucinations
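For `transcribe-confidence`, low-confidence words should be flagged for human review rather than silently accepted. A sketch, assuming a hypothetical list of per-word confidence records similar to what word-level ASR output provides:

```python
def flag_uncertain_words(words, threshold=0.6):
    """Bracket low-confidence words in the transcript so a reviewer can
    check them against the audio. `words` is a hypothetical list of
    {"word", "confidence"} dicts."""
    out = []
    for w in words:
        text = w["word"]
        if w["confidence"] < threshold:
            text = f"[{text}?]"  # mark for review
        out.append(text)
    return " ".join(out)
```

For legal evidence, flagged words should be resolved by listening, never by guessing from context.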
### 7. Forensic Authentication (MEDIUM)

- `forensic-enf-analysis`: Use ENF analysis for timestamp verification
- `forensic-metadata`: Extract and verify audio metadata
- `forensic-tampering`: Detect audio tampering and splices
- `forensic-chain-custody`: Document chain of custody for evidence
- `forensic-speaker-id`: Extract speaker characteristics for identification
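A chain-of-custody entry (`forensic-chain-custody`) can be as simple as a hashed, timestamped record per action, appended to a log that is never rewritten. The field names here are illustrative, not a standard schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def custody_entry(path, action, operator, tool=""):
    """One chain-of-custody record: who did what to which file, when, and
    the file's SHA-256 at that moment."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "file": path,
        "sha256": digest,
        "action": action,
        "operator": operator,
        "tool": tool,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def append_log(entry, logpath="custody_log.jsonl"):
    """Append-only JSONL log; earlier entries are never modified."""
    with open(logpath, "a") as f:
        f.write(json.dumps(entry) + "\n")
```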
### 8. Tool Integration & Automation (LOW-MEDIUM)

- `tool-ffmpeg-essentials`: Master essential FFmpeg audio commands
- `tool-sox-commands`: Use SoX for advanced audio manipulation
- `tool-python-pipeline`: Build Python audio processing pipelines
- `tool-audacity-workflow`: Use Audacity for visual analysis and manual editing
- `tool-install-guide`: Install the audio forensic toolchain
- `tool-batch-automation`: Automate batch processing workflows
- `tool-quality-assessment`: Measure audio quality metrics
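For `tool-batch-automation`, building the full command list before running anything lets a batch job be reviewed (and logged) first. The directory layout and filter chain below are placeholders:

```python
from pathlib import Path

def build_enhance_commands(in_dir, out_dir, filters="highpass=f=80,afftdn=nr=12"):
    """Build, but do not run, one ffmpeg command per WAV file in `in_dir`.
    Returning argument lists (not shell strings) avoids quoting bugs."""
    cmds = []
    for wav in sorted(Path(in_dir).glob("*.wav")):
        out = Path(out_dir) / f"{wav.stem}_enhanced.wav"
        cmds.append(["ffmpeg", "-i", str(wav), "-af", filters, str(out)])
    return cmds
```

Each list can then be passed to `subprocess.run`, with the exact command recorded in the chain-of-custody log.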
## Essential Tools
| Tool | Purpose | Install |
|---|---|---|
| FFmpeg | Format conversion, filtering | |
| SoX | Noise profiling, effects | |
| Whisper | Speech transcription | |
| librosa | Python audio analysis | |
| noisereduce | ML noise reduction | |
| Audacity | Visual editing | |
## Workflow Scripts (Recommended)
Use the bundled scripts to generate objective baselines, create a workflow plan, and verify results.

- `scripts/preflight_audio.py`: Generate a forensic preflight report (JSON or Markdown).
- `scripts/plan_from_preflight.py`: Create a workflow plan template from the preflight report.
- `scripts/compare_audio.py`: Compare objective metrics between baseline and processed audio.
Example usage:

```bash
# 1) Analyze and capture baseline metrics
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json

# 2) Generate a workflow plan template
python3 skills/.experimental/audio-voice-recovery/scripts/plan_from_preflight.py --preflight preflight.json --out plan.md

# 3) Compare baseline vs processed metrics
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
## Forensic Preflight Workflow (Do This Before Any Changes)
Align preflight with SWGDE Best Practices for the Enhancement of Digital Audio (20-a-001) and SWGDE Best Practices for Forensic Audio (08-a-001).
Establish an objective baseline state and plan the workflow so processing does not introduce clipping, artifacts, or false "done" confidence.
Use `scripts/preflight_audio.py` to capture baseline metrics and preserve the report with the case file.

Capture and record before processing:
- Record evidence identity and integrity: path, filename, file size, SHA-256 checksum, source, format/container, codec
- Record signal integrity: sample rate, bit depth, channels, duration
- Measure baseline loudness and levels: LUFS/LKFS, true peak, peak, RMS, dynamic range, DC offset
- Detect clipping and document clipped-sample percentage, peak headroom, exact time ranges
- Identify noise profile: stationary vs non-stationary, dominant noise bands, SNR estimate
- Locate the region of interest (ROI) and document time ranges and changes over time
- Inspect spectral content and estimate speech-band energy and intelligibility risk
- Scan for temporal defects: dropouts, discontinuities, splices, drift
- Evaluate channel correlation and phase anomalies (if stereo)
- Extract and preserve metadata: timestamps, device/model tags, embedded notes
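Several of the baseline numbers above (peak, RMS, DC offset, clipped-sample percentage) can be computed directly. A sketch, assuming float samples in [-1, 1]; the bundled preflight script is the authoritative tool:

```python
import numpy as np

def baseline_metrics(x: np.ndarray, clip_level=0.999):
    """Objective baseline numbers for the preflight report.

    `clip_level` treats samples at or above 99.9% of full scale as clipped,
    an illustrative heuristic rather than a fixed standard."""
    x = x.astype(np.float64)
    return {
        "peak": float(np.max(np.abs(x))),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "dc_offset": float(np.mean(x)),
        "clipped_pct": float(100.0 * np.mean(np.abs(x) >= clip_level)),
    }
```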
Procedure:
- Prepare a forensic working copy, verify hashes, and preserve the original untouched.
- Locate ROI and target signal; document exact time ranges and changes across the recording.
- Assess challenges to intelligibility and signal quality; map challenges to mitigation strategies.
- Identify required processing and plan a workflow order that avoids unwanted artifacts.
- Generate a plan draft with `scripts/plan_from_preflight.py` and complete it with case-specific decisions.
- Measure baseline loudness and true peak per ITU-R BS.1770 / EBU R 128 and record peak/RMS/DC offset.
- Detect clipping and dropouts; if clipping is present, declip first or pause and document limitations.
- Inspect spectral content and noise type; collect representative noise profile segments and estimate SNR.
- If stereo, evaluate channel correlation and phase; document anomalies.
- Create a baseline listening log (multiple devices) and define success criteria for intelligibility and listenability.
Failure-pattern guardrails:
- Do not process until every preflight field is captured.
- Document every process, setting, software version, and time segment to enable repeatability.
- Compare each processed output to the unprocessed input and assess progress toward intelligibility and listenability.
- Avoid over-processing; review removed signal (filter residue) to avoid removing target signal components.
- Keep intermediate files uncompressed and preserve sample rate/bit depth when moving between tools.
- Perform a final review against the original; if unsatisfactory, revise or stop and report limitations.
- If the request is not achievable, communicate limitations and do not declare completion.
- Require objective metrics and A/B listening before declaring completion.
- Do not rely solely on objective metrics; corroborate with critical listening.
- Take listening breaks to avoid ear fatigue during extended reviews.
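The comparison guardrails above can be backed by simple objective numbers. This sketch reports the RMS level change and the residual difference energy between input and output; it complements critical listening, it does not replace it:

```python
import numpy as np

def ab_report(before: np.ndarray, after: np.ndarray) -> dict:
    """Minimal objective A/B comparison of two aligned float signals.
    Field names are illustrative."""
    n = min(len(before), len(after))
    b = before[:n].astype(np.float64)
    a = after[:n].astype(np.float64)
    rms = lambda v: np.sqrt(np.mean(v ** 2))
    return {
        # Positive means the output is louder than the input
        "rms_change_db": float(20 * np.log10(max(rms(a), 1e-12) / max(rms(b), 1e-12))),
        # Energy of what processing changed (includes removed noise AND removed signal)
        "residual_db": float(20 * np.log10(max(rms(a - b), 1e-12))),
    }
```

A large residual with little intelligibility gain is a hint of over-processing; auditioning the difference signal itself shows what was removed.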
## Quick Enhancement Pipeline
```bash
# 1. Analyze original (run preflight and capture baseline metrics)
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json

# 2. Create working copy with checksum
cp evidence.wav working.wav
sha256sum evidence.wav > evidence.sha256

# 3. Apply enhancement
ffmpeg -i working.wav -af "
  highpass=f=80,
  adeclick=w=55:o=75,
  afftdn=nr=12:nf=-30:nt=w,
  equalizer=f=2500:t=q:w=1:g=3,
  loudnorm=I=-16:TP=-1.5:LRA=11
" enhanced.wav

# 4. Transcribe
whisper enhanced.wav --model large-v3 --language en

# 5. Verify original unchanged
sha256sum -c evidence.sha256

# 6. Verify improvement (objective comparison + A/B listening)
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
## How to Use
Read individual reference files for detailed explanations and code examples:
- Section definitions: category structure and impact levels
- Rule template: template for adding new rules
## Reference Files
| File | Description |
|---|---|
| AGENTS.md | Complete compiled guide with all rules |
| references/_sections.md | Category definitions and ordering |
| assets/templates/_template.md | Template for new rules |
| metadata.json | Version and reference information |