yara-rule-authoring
YARA Rule Authoring
Write YARA-X rules that catch the intended family without drowning analysts in false positives.
Target runtime: YARA-X (Rust successor to legacy YARA). Install: `brew install yara-x` or `cargo install yara-x`. Essential CLI: `yr check`, `yr scan`, `yr fmt`, `yr dump`.
When to Use
- Write, review, or optimize YARA-X rules for malware, hacktools, webshells, or supply-chain artifacts
- Convert IOCs or threat intel into maintainable signatures
- Debug false positives or tune `any of` / `all of` logic
- Migrate legacy YARA rules to YARA-X stricter validation
- Author Chrome extension (`crx`) or Android DEX (`dex`) module rules
- Prepare rulesets for production, YARA-CI, or VirusTotal retrohunt
When NOT to Use
- Full malware reverse engineering, disassembly, or unpacker development → `reverse-engineer`
- Network intrusion detection (Suricata, Snort, Zeek) → network security / SOC tooling skills
- Memory forensics with Volatility or live RAM analysis → `digital-forensics-analyst`
- Hash-only blocklists with no pattern logic → use IOC lists or EDR hash feeds
- Enterprise security strategy, GRC, or audit evidence → `cybersecurity`, `compliance-engineer`
- Embedding YARA in CI/CD pipelines as the primary task → `devsecops`
- Adversarial LLM or application red team → `ai-redteam`
Related skills
| Need | Skill |
|---|---|
| Security program, IR strategy, detection philosophy | |
| SIEM/EDR rules, logging, control implementation | |
| Audit evidence, control mapping, CCM | |
| Pipeline gates, artifact scanning, SBOM | |
| Binary RE, unpacking, patch diff | |
| SOC alert triage and detection tuning (non-YARA) | |
| Proactive threat hunts and ATT&CK campaigns | |
| CTI briefs, IOC/TTP production | |
| Adversarial AI / prompt injection testing | |
| Disk imaging and forensic reports | |
Core principles
- Atoms matter: YARA extracts 4-byte subsequences (atoms) for its Aho-Corasick prefilter. Strings with repeated bytes, common sequences, or fewer than 4 bytes force expensive bytecode verification on too many files.
- Family-specific, not category-generic: "Detects ransomware" matches everything and nothing. Target identifiable mutexes, PDB paths, C2 paths, or structural markers for one family or campaign.
- Goodware before production: validate against an ecosystem-appropriate clean corpus (VT goodware for PE; top npm packages for JS; marketplace extensions for CRX).
- Short-circuit cheap checks first: `filesize` → magic bytes → strings → modules.
- Metadata is documentation: name, description, author, reference, and date survive personnel changes.
Essential toolkit
| Tool | Purpose |
|---|---|
| yarGen | Candidate strings from samples ( |
| FLOSS | Obfuscated/stack strings when yarGen fails |
| yr | |
| YARA-CI / VT retrohunt | Goodware corpus testing before deploy |
Core workflows
1. Scope samples and file type
- Collect 3+ variants when possible (single-sample rules are brittle)
- Check packing: entropy > 7.0 or few strings → unpack or target packer/structure, not encrypted layer
- Choose platform path: PE magic / JS / Office ZIP / `import "crx"` / `import "dex"`
See `references/yara_x_scope_and_tooling.md` for install, CLI workflow, and migration.
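The platform choice in this step usually becomes a cheap magic-byte gate at the top of the condition. The following is a minimal sketch, assuming the standard PE, ZIP, and CRX file signatures; the rule names are hypothetical, and in practice each gate would be combined with family-specific strings rather than deployed on its own.

```yara
// Illustrative file-type gates (rule names are hypothetical).
// JavaScript has no magic bytes, so JS rules rely on filesize plus strings.

rule GATE_PE_Sketch
{
    condition:
        // "MZ" at offset 0, then "PE\0\0" at the offset stored in e_lfanew (0x3C)
        uint16(0) == 0x5A4D and
        uint32(uint32(0x3C)) == 0x00004550
}

rule GATE_Office_ZIP_Sketch
{
    condition:
        // "PK\x03\x04" local file header, read as a little-endian integer
        uint32(0) == 0x04034B50
}

rule GATE_CRX_Sketch
{
    condition:
        // "Cr24" signature, read as a little-endian integer
        uint32(0) == 0x34327243
}
```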
2. Extract and filter strings
- Run yarGen or FLOSS on unpacked samples
- Reject ~80% of yarGen output: API names, `C:\Windows\`, format strings, `require`/`fetch` alone
- Prefer gold tier: mutex names, PDB paths, stack strings; silver: C2 paths, config markers
See `references/string_selection_and_atoms.md` for decision trees and modifiers.
3. Write rule with ordered conditions
```yara
rule MAL_Win_Example_Loader_Jan26
{
    meta:
        description = "Detects Example loader via unique mutex and config path"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    strings:
        $mutex = "Global\\ExampleMutex" ascii wide
        $cfg = "/api/beacon/check" ascii
    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of ($mutex, $cfg)
}
```
Condition order: `filesize` → magic bytes → string matches → module calls (`pe`, `crx`, `dex`).
See `references/conditions_and_performance.md` for atom theory, regex bounds, and loops.
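To illustrate the same ordering when a regex and a module call are involved, here is a sketch that extends the hypothetical Example loader. The beacon path pattern, the WinINet import pair, and the rule name are assumptions chosen for illustration, not indicators from a real family.

```yara
import "pe"

rule MAL_Win_Example_Loader_Ordered_Sketch
{
    meta:
        description = "Detects Example loader via beacon path pattern and WinINet import"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    strings:
        // Bounded quantifier {8,32} instead of .* keeps regex matching cheap,
        // and the literal prefix "/api/beacon/" provides usable atoms
        $beacon = /\/api\/beacon\/[a-z0-9]{8,32}\/check/ ascii
    condition:
        filesize < 10MB and                          // 1. cheapest check
        uint16(0) == 0x5A4D and                      // 2. magic bytes
        $beacon and                                  // 3. string match
        pe.imports("wininet.dll", "InternetOpenA")   // 4. module call last
}
```

Because `and` short-circuits, files that fail the size or magic check never reach the `pe` module call.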
4. Validate and test
```bash
yr check rule.yar && yr fmt -w rule.yar
yr scan -s rule.yar malware_samples/   # must match all targets
yr scan -c rule.yar goodware_corpus/   # must be zero
```
FP flow: `yr scan -s` on false positive → identify matching string → tighten, exclude vendor, or pivot to structure.
See `references/testing_goodware_and_fp_debugging.md` for corpus selection and investigation.
5. Platform modules (when applicable)
- Chrome extensions (`import "crx"`): permissions, `permhash()` (v1.11.0+). Always check `crx.is_crx` first.
- Android (`import "dex"`): `dex.contains_class()`, `contains_method()`, `contains_string()`. API differs from legacy YARA dex module.
See `references/platform_modules_pe_crx_dex.md`.
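A sketch of the "check `crx.is_crx` first" pattern might look like the following. It uses only the two `crx` names mentioned above and assumes that `crx.permhash()` returns a hex digest string; the digest shown is a placeholder, and the rule name is hypothetical. Confirm the exact module API against the YARA-X documentation for your installed version before relying on it.

```yara
import "crx"

rule SUSP_CRX_Example_Permhash_Sketch
{
    meta:
        description = "Detects Chrome extensions sharing a known permission set (placeholder hash)"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    condition:
        filesize < 20MB and
        // Gate on the format first so later module calls only run on CRX files
        crx.is_crx and
        // Placeholder digest; assumes permhash() returns a hex string
        crx.permhash() == "0000000000000000000000000000000000000000000000000000000000000000"
}
```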
6. Deploy
- Naming: `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}` (e.g. `MAL_Win_Emotet_Loader_Jan26`)
- Peer review + quality checklist
- Monitor production FPs; version rules in Git with full metadata
See `references/style_metadata_and_deployment.md`.
Decision trees (quick reference)
Is this string good enough?
< 4 bytes? → reject
Repeated bytes (0000, 9090)? → reject
API name or common path? → reject
Unique to family? → use
Common across malware? → combine with family-specific marker
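Applied to a concrete candidate list, the tree above might play out as in this sketch. Every string here is invented for illustration (the PDB path, mutex suffix, and user agent are hypothetical), and rejected candidates are kept as comments so the reasoning stays visible in review.

```yara
rule MAL_Win_Example_Loader_Strings_Sketch
{
    meta:
        description = "Detects Example loader via PDB path or mutex plus user agent"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    strings:
        // Rejected: "MZ"                     -> under 4 bytes, no usable atom
        // Rejected: { 90 90 90 90 90 90 }    -> repeated bytes, poor atoms
        // Rejected: "VirtualAlloc"           -> API name, present in goodware
        // Rejected: "C:\\Windows\\System32"  -> common path

        // Unique to the family: usable on its own
        $pdb = "D:\\build\\example_loader\\Release\\loader.pdb" ascii

        // Family-specific marker
        $mutex = "Global\\ExampleMutex_7f3a" ascii wide

        // Common across malware: only meaningful next to a family marker
        $ua = "Mozilla/5.0 (compatible; MSIE 9.0)" ascii
    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        ($pdb or ($mutex and $ua))
}
```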
`any of` vs `all of`
- Individually unique strings → `any of ($a*)`
- Common strings that are suspicious only together → `all of ($a*)`
- Mixed confidence → `all of ($core_*) and any of ($variant_*)`
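The mixed-confidence pattern can be sketched as follows, assuming two core markers shared by every build and one PDB name per variant. The `$variant_*` values and the rule name are hypothetical.

```yara
rule MAL_Win_Example_Loader_Variants_Sketch
{
    meta:
        description = "Detects Example loader variants via shared core markers and per-variant PDB name"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    strings:
        // Present in every known sample: all are required
        $core_mutex = "Global\\ExampleMutex" ascii wide
        $core_cfg   = "/api/beacon/check" ascii

        // Each build carries exactly one of these: any one is enough
        $variant_a = "example_loader_v1.pdb" ascii
        $variant_b = "example_loader_v2.pdb" ascii
    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of ($core_*) and
        any of ($variant_*)
}
```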
Production lesson: `any of ($network_*)` with `fetch`, `axios`, `http` matches most web apps; require credential path and exfil destination and network call.
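A sketch of that three-way requirement for a JavaScript stealer is shown below. The credential paths, the exfil host, and the rule name are assumptions for illustration; `collector.example.invalid` is a deliberately non-resolvable placeholder.

```yara
rule MAL_JS_Example_Stealer_Sketch
{
    meta:
        description = "Detects Example stealer via credential path, exfil host, and network call together"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    strings:
        // Credential paths: suspicious, but also seen in legitimate tooling
        $cred_npmrc = ".npmrc" ascii
        $cred_ssh   = ".ssh/id_rsa" ascii

        // Exfil destination (hypothetical placeholder host)
        $exfil_host = "collector.example.invalid" ascii

        // Network calls: present in most web applications on their own
        $network_fetch = "fetch(" ascii
        $network_axios = "axios.post(" ascii
        $network_http  = "http.request(" ascii
    condition:
        filesize < 5MB and
        any of ($cred_*) and
        any of ($exfil_*) and
        any of ($network_*)
}
```

The `$network_*` group alone would match ordinary web code; it only contributes signal once the other two groups have matched.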
When strings fail → pivot
Use `yr dump -m pe` for sections, imports, imphash, resources; `math.entropy()` on sections; packer signatures. If nothing unique remains, YARA alone may not be the right control.
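A structural pivot could be sketched like this, assuming the samples share an import hash and carry at least one high-entropy section. The imphash value is a placeholder to be replaced with one taken from `yr dump -m pe` output, the 7.0 threshold follows the packing check in step 1, and the rule name is hypothetical.

```yara
import "pe"
import "math"

rule MAL_Win_Example_Packed_Sketch
{
    meta:
        description = "Detects packed Example loader via import hash and high-entropy section"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"
    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        // Placeholder imphash; substitute the value observed in the samples
        pe.imphash() == "00000000000000000000000000000000" and
        // At least one non-empty section that looks compressed or encrypted
        for any section in pe.sections : (
            section.raw_data_size > 0 and
            math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
        )
}
```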
Legacy YARA migration
```bash
yr check --relaxed-re-syntax rules/   # diagnostic only
yr check rules/                       # fix until clean
```
Common fixes: escape `\{` in regex; base64 strings need 3+ chars; `@a[-1]` → `@a[#a - 1]`; remove duplicate modifiers.
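Two of these fixes can be sketched in a single rule. The strings and rule name are hypothetical, and the legacy forms appear only in comments; the exact diagnostics depend on the installed YARA-X version.

```yara
rule MIGRATION_Example_Fixes_Sketch
{
    meta:
        description = "Detects nothing in particular; illustrates YARA-X migration fixes"
    strings:
        // Legacy form with a bare literal brace: /cfg{[a-z]{4,16}}/
        // Fixed form: literal braces escaped, repetition braces left as-is
        $cfg = /cfg\{[a-z]{4,16}\}/ ascii

        // Legacy form: "ab" base64 (plaintext shorter than 3 characters)
        // Fixed form: plaintext of at least 3 characters
        $b64 = "beacon" base64
    condition:
        all of them
}
```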
Rationalizations to reject
| Thought | Reality |
|---|---|
| "yarGen gave me these strings" | yarGen suggests; you validate each string |
| "It works on 10 samples" | Test goodware corpus before deploy |
| "I'll tighten after FPs" | FPs burn trust — write tight rules upfront |
| "This API name is malicious" | Legitimate software uses the same APIs |
| "any of them is fine" | Common strings + |
Quality checklist
- Name follows `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}`
- Description starts with "Detects" and states distinguishing feature
- Required meta: author, reference, date
- Strings ≥4 bytes with good atoms; no unbounded regex (`.*`)
- Condition: `filesize` and magic bytes before modules
- Matches all target samples; zero goodware matches
- `yr check` and `yr fmt --check` pass
- Peer review completed
When to load references
| Topic | Reference |
|---|---|
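| YARA-X install, CLI, migration, toolkit | `references/yara_x_scope_and_tooling.md` |
| String quality, types, modifiers | `references/string_selection_and_atoms.md` |
| Atoms, condition order, regex, loops | `references/conditions_and_performance.md` |
| PE, macOS, JS, crx, dex patterns | `references/platform_modules_pe_crx_dex.md` |
| Goodware testing, FP debugging | `references/testing_goodware_and_fp_debugging.md` |
| Naming, metadata, deployment | `references/style_metadata_and_deployment.md` |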