analyzing-malicious-pdf-with-peepdf
Analyzing Malicious PDF with peepdf
When to Use
- When triaging suspicious PDF attachments from phishing emails
- During malware analysis of PDF-based exploit documents
- When extracting embedded JavaScript, shellcode, or executables from PDFs
- For forensic examination of weaponized document artifacts
- When building detection signatures for PDF-based threats
Prerequisites
- Python 3.8+ with peepdf-3 installed (pip install peepdf-3)
- pdfid.py and pdf-parser.py from the Didier Stevens suite
- Isolated analysis environment (VM or sandbox)
- Optional: PyV8 (or STPyV8, its Python 3 successor) for JavaScript emulation within peepdf
- Optional: Pylibemu for shellcode analysis
Workflow
- Triage with pdfid: Scan the PDF for suspicious keywords (/JS, /JavaScript, /OpenAction, /Launch, /EmbeddedFile).
- Interactive Analysis: Open the PDF in peepdf's interactive mode to explore its object structure.
- Identify Suspicious Objects: Locate objects that contain JavaScript, streams, or encoded data.
- Extract Content: Dump suspicious streams and decode their filters (FlateDecode, ASCIIHexDecode).
- Deobfuscate JavaScript: Analyze the extracted JS for shellcode, heap sprays, or exploit code.
- Check VirusTotal: Use peepdf's vtcheck command to cross-reference the file hash with AV detections (requires a VirusTotal API key).
- Generate IOCs: Extract URLs, domains, hashes, and shellcode signatures.
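The triage step can be illustrated with a minimal Python sketch that counts the suspicious keywords in the raw bytes of a file. This is not pdfid.py itself, only an approximation of the idea: the function names `decode_name_escapes` and `count_keywords` are hypothetical, and the sketch does not look inside compressed streams or object streams, so a clean result here does not mean the file is benign.

```python
import re
from collections import Counter

# Keywords taken from the triage step above.
SUSPICIOUS = ("/JS", "/JavaScript", "/OpenAction", "/Launch", "/EmbeddedFile")


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which PDF names may use (e.g. /J#53 for /JS).

    Assumption: applying this to the whole file is acceptable for counting,
    even though it also rewrites '#xx' sequences that are not inside names.
    """
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_keywords(data: bytes) -> Counter:
    """Count occurrences of each suspicious name in raw PDF bytes."""
    normalized = decode_name_escapes(data)
    counts = Counter()
    for keyword in SUSPICIOUS:
        # Require a non-name character after the keyword so that a longer
        # name sharing the same prefix is not counted by mistake.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9_.\-])"
        counts[keyword] = len(re.findall(pattern, normalized))
    return counts
```

A typical use would be `count_keywords(open(path, "rb").read())`; any non-zero count for /OpenAction combined with /JS or /JavaScript is a reasonable cue to continue with the interactive analysis.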
Key Concepts
| Concept | Description |
|---|---|
| /OpenAction | Automatic action executed when PDF is opened |
| /JavaScript /JS | Embedded JavaScript code in PDF objects |
| /Launch | Action that launches external applications |
| /EmbeddedFile | File embedded within the PDF structure |
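| FlateDecode | zlib (deflate) compression filter; legitimate, but often used to hide content from simple string scans |
| Object Streams | PDF objects stored inside compressed streams (/ObjStm), where keyword scans of the raw file cannot see them |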
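To make the filter concepts concrete, the following sketch decodes a stream that was passed through FlateDecode and ASCIIHexDecode, using only the Python standard library. The helper name `decode_stream` is hypothetical, and the sketch assumes the stream uses no /DecodeParms predictors and no filters other than these two; peepdf and pdf-parser.py handle many more cases.

```python
import binascii
import zlib
from typing import Sequence


def decode_stream(raw: bytes, filters: Sequence[str]) -> bytes:
    """Apply PDF stream filters in the order they appear in /Filter."""
    data = raw
    for name in filters:
        if name == "FlateDecode":
            # FlateDecode data is a zlib stream.
            data = zlib.decompress(data)
        elif name == "ASCIIHexDecode":
            # Hex digits with optional whitespace, terminated by '>'.
            hex_part = data.split(b">", 1)[0]
            hex_digits = b"".join(hex_part.split())
            if len(hex_digits) % 2:
                # An odd final digit is treated as if followed by 0.
                hex_digits += b"0"
            data = binascii.unhexlify(hex_digits)
        else:
            raise ValueError(f"unsupported filter: {name}")
    return data
```

For a stream whose dictionary contains `/Filter [/ASCIIHexDecode /FlateDecode]`, the call would be `decode_stream(raw, ["ASCIIHexDecode", "FlateDecode"])`. Chained filters like this are a common way to keep JavaScript out of reach of plain string searches.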
Tools & Systems
| Tool | Purpose |
|---|---|
| peepdf / peepdf-3 | Interactive PDF analysis with JS emulation |
| pdfid.py | Quick triage scanning for suspicious keywords |
| pdf-parser.py | Deep object-level PDF parsing |
| VirusTotal | Hash lookup and AV detection cross-reference |
| CyberChef | Decode and transform extracted payloads |
Output Format
Analysis Report: PDF-MAL-[DATE]-[SEQ]
File: [filename.pdf]
SHA-256: [hash]
Suspicious Keywords: [/JS, /OpenAction, etc.]
Objects with JavaScript: [Object IDs]
Extracted URLs: [List]
Shellcode Detected: [Yes/No]
Embedded Files: [Count and types]
VirusTotal Detections: [X/Y engines]
Risk Level: [Critical/High/Medium/Low]
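One possible way to fill in this template is sketched below. It computes the SHA-256 and extracts URLs from the raw bytes, and takes the remaining fields from the analyst. Everything here is an assumption for illustration: the `build_report` function, the keys of the `findings` dictionary, the `YYYYMMDD` date format, and the three-digit sequence number are not prescribed by the template.

```python
import hashlib
import re
from datetime import date
from typing import Optional

# Simple URL pattern; stops at whitespace, quotes, and PDF string delimiters.
URL_RE = re.compile(rb"https?://[^\s<>()\"']+")


def build_report(filename: str, data: bytes, findings: dict,
                 seq: int = 1, day: Optional[date] = None) -> str:
    """Render the report template from raw bytes plus analyst findings."""
    day = day or date.today()
    urls = sorted({u.decode("ascii", "replace") for u in URL_RE.findall(data)})
    keywords = ", ".join(findings.get("keywords", [])) or "None"
    js_objects = ", ".join(map(str, findings.get("js_objects", []))) or "None"
    shellcode = "Yes" if findings.get("shellcode") else "No"
    lines = [
        f"Analysis Report: PDF-MAL-{day:%Y%m%d}-{seq:03d}",
        f"File: {filename}",
        f"SHA-256: {hashlib.sha256(data).hexdigest()}",
        f"Suspicious Keywords: {keywords}",
        f"Objects with JavaScript: {js_objects}",
        f"Extracted URLs: {', '.join(urls) or 'None'}",
        f"Shellcode Detected: {shellcode}",
        f"Embedded Files: {findings.get('embedded_files', 'None')}",
        f"VirusTotal Detections: {findings.get('vt_detections', 'Not checked')}",
        f"Risk Level: {findings.get('risk', 'Unrated')}",
    ]
    return "\n".join(lines)
```

Note that the URL extraction only sees the bytes it is given, so URLs hidden inside compressed streams appear only if the decoded stream content is passed in as well.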