YARA Rule Authoring

Write YARA-X rules that catch the intended family without drowning analysts in false positives.
Target runtime: YARA-X, the Rust successor to legacy YARA. Install with `brew install yara-x` or `cargo install yara-x-cli`. Essential CLI commands: `yr check`, `yr scan`, `yr fmt`, and `yr dump`.

When to Use

  • Write, review, or optimize YARA-X rules for malware, hacktools, webshells, or supply-chain artifacts
  • Convert IOCs or threat intel into maintainable signatures
  • Debug false positives or tune `any of` / `all of` logic
  • Migrate legacy YARA rules to YARA-X's stricter validation
  • Author Chrome extension (`crx`) or Android DEX (`dex`) module rules
  • Prepare rulesets for production, YARA-CI, or VirusTotal retrohunt

When NOT to Use

  • Full malware reverse engineering, disassembly, or unpacker development → `reverse-engineer`
  • Network intrusion detection (Suricata, Snort, Zeek) → network security / SOC tooling skills
  • Memory forensics with Volatility or live RAM analysis → `digital-forensics-analyst`
  • Hash-only blocklists with no pattern logic → use IOC lists or EDR hash feeds
  • Enterprise security strategy, GRC, or audit evidence → `cybersecurity`, `compliance-engineer`
  • Embedding YARA in CI/CD pipelines as the primary task → `devsecops`
  • Adversarial LLM or application red team → `ai-redteam`

Related skills

| Need | Skill |
|---|---|
| Security program, IR strategy, detection philosophy | `cybersecurity` |
| SIEM/EDR rules, logging, control implementation | `information-security-engineer` |
| Audit evidence, control mapping, CCM | `compliance-engineer` |
| Pipeline gates, artifact scanning, SBOM | `devsecops` |
| Binary RE, unpacking, patch diff | `reverse-engineer` |
| SOC alert triage and detection tuning (non-YARA) | `defensive-security-analyst`, `soc-analyst` |
| Proactive threat hunts and ATT&CK campaigns | `threat-hunter` |
| CTI briefs, IOC/TTP production | `cti-analyst` |
| Adversarial AI / prompt injection testing | `ai-redteam` |
| Disk imaging and forensic reports | `digital-forensics-analyst` |

Core principles

  1. Atoms matter: YARA extracts atoms of up to 4 bytes from each string for its Aho-Corasick prefilter. Strings shorter than 4 bytes, or built from repeated bytes or common sequences, yield weak atoms that force expensive bytecode verification on too many files.
  2. Family-specific, not category-generic: a rule that "detects ransomware" matches everything and nothing. Target identifiable mutexes, PDB paths, C2 paths, or structural markers for one family or campaign.
  3. Goodware before production: validate against an ecosystem-appropriate clean corpus (VT goodware for PE, top npm packages for JS, marketplace extensions for CRX).
  4. Short-circuit cheap checks first: `filesize` → magic bytes → strings → modules.
  5. Metadata is documentation: name, description, author, reference, and date survive personnel changes.
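
To make principle 1 concrete, here is a toy atom scorer. It is not YARA's real atom-quality algorithm (which is more detailed); it assumes a made-up score of "distinct bytes minus occurrences of a few assumed-common bytes" and picks the best 4-byte window. `COMMON_BYTES`, `atom_score`, and `best_atom` are hypothetical names for illustration only.

```python
# Toy atom-quality heuristic (hypothetical; YARA's real scoring differs).
# It only illustrates why short, repetitive, or common byte runs make poor atoms.
COMMON_BYTES = {0x00, 0x20, 0x90, 0xCC, 0xFF}  # assumed "too common" bytes


def atom_score(atom: bytes) -> int:
    """+1 for each distinct byte, -1 for each occurrence of a common byte."""
    distinct = len(set(atom))
    common = sum(1 for b in atom if b in COMMON_BYTES)
    return distinct - common


def best_atom(pattern: bytes, size: int = 4) -> tuple[bytes, int]:
    """Return the best-scoring window of at most `size` bytes and its score."""
    if len(pattern) <= size:
        return pattern, atom_score(pattern)
    windows = [pattern[i:i + size] for i in range(len(pattern) - size + 1)]
    best = max(windows, key=atom_score)  # the first window wins ties
    return best, atom_score(best)


for candidate in (b"\x00" * 6, b"\x90\x90\x90\x90", b"MZ", b"Global\\ExampleMutex"):
    print(candidate, "->", best_atom(candidate))
```

Runs of `00` or `90` score negative no matter which window is chosen, and a 2-byte string such as `MZ` can never yield a full 4-byte atom, while a mutex name offers plenty of distinctive windows.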

Essential toolkit

| Tool | Purpose |
|---|---|
| yarGen | Candidate strings from samples (`--excludegood`); always run `yr check` on its output |
| FLOSS | Obfuscated and stack strings when yarGen fails |
| `yr` | `yr check`, `yr scan -s`, `yr fmt`, `yr dump -m pe` / `crx` / `dex` |
| YARA-CI / VT retrohunt | Goodware corpus testing before deploy |

Core workflows

1. Scope samples and file type

  1. Collect 3+ variants when possible (single-sample rules are brittle)
  2. Check packing: entropy > 7.0 or few strings → unpack, or target the packer/structure rather than the encrypted layer
  3. Choose the platform path: PE magic / JS / Office ZIP / `import "crx"` / `import "dex"`
See `references/yara_x_scope_and_tooling.md` for install, CLI workflow, and migration.
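
The packing check in step 2 can be approximated offline before any rule is written. This sketch computes Shannon entropy in bits per byte (the same 0 to 8 scale as the 7.0 threshold above) and counts printable runs as a stand-in for "few strings". The `min_strings=10` cut-off and the helper names are assumptions, not part of any tool.

```python
import math
import re
from collections import Counter

# Runs of 4+ printable ASCII bytes, roughly what `strings` would report.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_packed(data: bytes, entropy_threshold: float = 7.0, min_strings: int = 10) -> bool:
    """Flag high entropy or too few readable strings (min_strings is an assumed cut-off)."""
    few_strings = len(PRINTABLE_RUN.findall(data)) < min_strings
    return shannon_entropy(data) > entropy_threshold or few_strings
```

Typical use is `looks_packed(open(path, "rb").read())` on each collected variant; a `True` result means "unpack first or target the packer", not proof of packing.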

2. Extract and filter strings

  1. Run yarGen or FLOSS on unpacked samples
  2. Reject ~80% of yarGen output: API names, `C:\Windows\`, format strings, and bare `require` / `fetch`
  3. Prefer gold tier (mutex names, PDB paths, stack strings), then silver (C2 paths, config markers)
See `references/string_selection_and_atoms.md` for decision trees and modifiers.
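
A first-pass triage of yarGen or FLOSS output can encode the reject list and tiers above. The sketch below is hypothetical: the API list, the `Global\`/`Local\` mutex prefix, the `.pdb` suffix test, and the URL-path pattern for C2 paths are assumptions to adapt per ecosystem. Stack strings and config markers cannot be recognized from text alone, so they fall through to `review`.

```python
import re

# Hypothetical reject lists and tier heuristics; extend per family and ecosystem.
API_NAMES = {"VirtualAlloc", "CreateProcessW", "LoadLibraryA", "GetProcAddress"}
BARE_JS_CALLS = {"require", "fetch"}
FORMAT_STRING = re.compile(r"%[-0-9.]*[sdxXuplf]")
PDB_PATH = re.compile(r"\.pdb$", re.IGNORECASE)
MUTEX_NAME = re.compile(r"^(Global|Local)\\")
C2_PATH = re.compile(r"^/[\w.\-/]+$")


def classify(candidate: str) -> str:
    """Return 'reject', 'gold', 'silver', or 'review' for one candidate string."""
    if candidate in API_NAMES or candidate in BARE_JS_CALLS:
        return "reject"
    if candidate.lower().startswith("c:\\windows\\"):
        return "reject"
    if FORMAT_STRING.search(candidate):
        return "reject"
    if PDB_PATH.search(candidate) or MUTEX_NAME.match(candidate):
        return "gold"
    if C2_PATH.match(candidate):
        return "silver"
    return "review"  # config markers, stack strings, everything else: analyst decides


candidates = ["VirtualAlloc", "%s\\%d.tmp", "Global\\ExampleMutex", "/api/beacon/check"]
print({c: classify(c) for c in candidates})
```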

3. Write rule with ordered conditions

```yara
rule MAL_Win_Example_Loader_Jan26
{
    meta:
        description = "Detects Example loader via unique mutex and config path"
        author = "Team <team@example.com>"
        reference = "https://example.com/analysis"
        date = "2026-01-15"

    strings:
        $mutex = "Global\\ExampleMutex" ascii wide  // gold tier: family-specific mutex
        $cfg = "/api/beacon/check" ascii            // silver tier: C2 config path

    condition:
        filesize < 10MB and          // cheapest check first
        uint16(0) == 0x5A4D and      // "MZ" magic: PE files only
        all of ($mutex, $cfg)        // both markers required
}
```
Condition order: `filesize` → magic bytes → string matches → module calls (`pe`, `crx`, `dex`).
See `references/conditions_and_performance.md` for atom theory, regex bounds, and loops.
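
When rules are generated from analysis notes, a small emitter can enforce the condition order mechanically. This is a hypothetical helper (`build_rule`, `yara_quote` are illustrative names) that assumes only plain text strings with modifiers; it does not handle hex or regex strings and does not validate the output, so `yr check` remains the authority.

```python
from typing import Optional


def yara_quote(value: str) -> str:
    """Quote a Python string as a YARA text string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_rule(name: str, meta: dict[str, str], strings: dict[str, tuple[str, str]],
               string_condition: str, max_size: str = "10MB",
               magic_u16: Optional[int] = 0x5A4D, module_checks: tuple[str, ...] = (),
               imports: tuple[str, ...] = ()) -> str:
    """Emit YARA source whose condition runs filesize -> magic -> strings -> modules."""
    lines = [f'import "{module}"' for module in imports]
    if lines:
        lines.append("")
    lines += [f"rule {name}", "{"]
    if meta:
        lines.append("    meta:")
        lines += [f"        {key} = {yara_quote(value)}" for key, value in meta.items()]
        lines.append("")
    if strings:
        lines.append("    strings:")
        lines += [f"        ${ident} = {yara_quote(text)} {modifiers}".rstrip()
                  for ident, (text, modifiers) in strings.items()]
        lines.append("")
    clauses = [f"filesize < {max_size}"]                   # 1. cheapest check
    if magic_u16 is not None:
        clauses.append(f"uint16(0) == 0x{magic_u16:04X}")  # 2. magic bytes
    clauses.append(string_condition)                       # 3. string matches
    clauses.extend(module_checks)                          # 4. module calls last
    lines += ["    condition:", "        " + " and\n        ".join(clauses), "}"]
    return "\n".join(lines)


rule_text = build_rule(
    "MAL_Win_Example_Loader_Jan26",
    {"description": "Detects Example loader via unique mutex and config path",
     "author": "Team <team@example.com>",
     "reference": "https://example.com/analysis",
     "date": "2026-01-15"},
    {"mutex": ("Global\\ExampleMutex", "ascii wide"), "cfg": ("/api/beacon/check", "ascii")},
    "all of ($mutex, $cfg)",
)
print(rule_text)
```

Module checks such as `pe.is_pe` go in `module_checks` together with `imports=("pe",)`, so they always land after the cheaper clauses.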

4. Validate and test

```bash
yr check rule.yar && yr fmt -w rule.yar
yr scan -s rule.yar malware_samples/    # must match all targets
yr scan -c rule.yar goodware_corpus/    # must be zero
```
FP flow: run `yr scan -s` on the false positive → identify the matching string → tighten it, exclude the vendor, or pivot to structure.
See `references/testing_goodware_and_fp_debugging.md` for corpus selection and investigation.
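
The two pass/fail gates and the first FP-flow step can be summarized in code once scan results are in hand. This sketch assumes the hits have already been parsed from `yr scan -s` output into "file path → matched string identifiers"; the parser is deliberately not shown because it would depend on the exact output format. `gate` and `ready_to_deploy` are hypothetical names.

```python
from collections import Counter


def gate(malware_hits: dict[str, set[str]], malware_files: set[str],
         goodware_hits: dict[str, set[str]]) -> tuple[set[str], Counter]:
    """Return (target samples the rule missed, string IDs behind goodware hits).

    *_hits map a file path to the string identifiers it matched (assumed input).
    """
    missed = {path for path in malware_files if not malware_hits.get(path)}
    fp_strings = Counter(ident for idents in goodware_hits.values() for ident in idents)
    return missed, fp_strings


def ready_to_deploy(missed: set[str], fp_strings: Counter) -> bool:
    """Deploy only if every target matched and the goodware corpus stayed silent."""
    return not missed and not fp_strings
```

Sorting `fp_strings.most_common()` points at the string to tighten first: the identifier behind the most goodware hits.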

5. Platform modules (when applicable)

  • Chrome extensions: `import "crx"` for permissions and `permhash()` (v1.11.0+). Always check `crx.is_crx` first.
  • Android: `import "dex"` for `dex.contains_class()`, `contains_method()`, and `contains_string()`. The API differs from the legacy YARA dex module.
See `references/platform_modules_pe_crx_dex.md`.
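
The "always `crx.is_crx` first" rule is easy to lint in review. This is a text-only sketch (hypothetical `crx_guard_first`) that assumes the condition follows the first `condition:` keyword; it knows nothing about the crx module's actual fields or return types, so `crx.permhash() == "<hash>"` in the test is only a placeholder.

```python
import re

CRX_REF = re.compile(r"\bcrx\.(\w+)")


def crx_guard_first(rule_text: str) -> bool:
    """True if the rule imports crx and its first crx.* reference is crx.is_crx."""
    if 'import "crx"' not in rule_text:
        return False
    condition = rule_text.partition("condition:")[2]
    refs = CRX_REF.findall(condition)
    return bool(refs) and refs[0] == "is_crx"
```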

6. Deploy

  1. Naming: `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}` (e.g. `MAL_Win_Emotet_Loader_Jan26`)
  2. Peer review + quality checklist
  3. Monitor production FPs; version rules in Git with full metadata
See `references/style_metadata_and_deployment.md`.
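
The naming convention can be checked in pre-commit hooks. The token shapes below are assumptions inferred from the single example: CATEGORY in capitals, the other parts alphanumeric starting with a capital, and DATE as a month abbreviation plus two-digit year as in `Jan26`. `parse_rule_name` is a hypothetical helper.

```python
import re
from typing import Optional

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Assumed token shapes, inferred from "MAL_Win_Emotet_Loader_Jan26".
RULE_NAME = re.compile(
    rf"^(?P<category>[A-Z]+)_(?P<platform>[A-Z][A-Za-z0-9]*)_"
    rf"(?P<family>[A-Z][A-Za-z0-9]*)_(?P<variant>[A-Z][A-Za-z0-9]*)_"
    rf"(?P<date>(?:{MONTHS})\d{{2}})$"
)


def parse_rule_name(name: str) -> Optional[dict[str, str]]:
    """Split a conforming rule name into its parts, or return None."""
    match = RULE_NAME.match(name)
    return match.groupdict() if match else None
```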

Decision trees (quick reference)

Is this string good enough?

< 4 bytes? → reject
Repeated bytes (0000, 9090)? → reject
API name or common path? → reject
Unique to family? → use
Common across malware? → combine with family-specific marker
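
The decision tree above translates almost line by line into a function. Assumptions: per-sample string sets have already been extracted for the target family and for other malware; "unique to family" is read here as "present in every collected variant and absent from other malware"; and `COMMON` is a hypothetical stand-in for a real API-name and common-path list.

```python
# Hypothetical blocklist of API names and common paths (prefix match).
COMMON = {b"VirtualAlloc", b"GetProcAddress", b"C:\\Windows\\", b"C:\\Program Files\\"}


def is_repetitive(s: bytes) -> bool:
    """True if s is a 1- or 2-byte unit repeated, e.g. 00 00 00 00 or b"9090"."""
    for unit_len in (1, 2):
        unit = s[:unit_len]
        if (unit * len(s))[:len(s)] == s:
            return True
    return False


def string_verdict(s: bytes, family: list[set[bytes]], other_malware: list[set[bytes]]) -> str:
    """Walk the decision tree for one candidate string.

    family / other_malware: per-sample sets of extracted strings (assumed input).
    """
    if len(s) < 4 or is_repetitive(s):
        return "reject"
    if any(s.startswith(c) for c in COMMON):
        return "reject"
    in_every_variant = all(s in sample for sample in family)
    seen_elsewhere = any(s in sample for sample in other_malware)
    if in_every_variant and not seen_elsewhere:
        return "use"
    if seen_elsewhere:
        return "combine"  # pair with a family-specific marker
    return "review"  # present in only some variants
```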

`any of` vs `all of`

  • Individually unique strings → `any of ($a*)`
  • Common strings that are suspicious only together → `all of ($a*)`
  • Mixed confidence → `all of ($core_*) and any of ($variant_*)`
Production lesson: `any of ($network_*)` over `fetch`, `axios`, and `http` matches most web apps. Require a credential path and an exfil destination and a network call.
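
The production lesson can be replayed with a small model of `any of` / `all of` over wildcard string sets. The identifiers (`$network_fetch`, `$cred_path`, `$exfil_host`, ...) and the `stealer_condition` logic are hypothetical; the point is that a benign web app trips the broad `any of ($network_*)` but not the tightened three-part condition.

```python
from fnmatch import fnmatchcase


def select(defined: set[str], pattern: str) -> list[str]:
    """Expand a YARA-style wildcard such as "$network_*" over defined identifiers."""
    chosen = [ident for ident in defined if fnmatchcase(ident, pattern)]
    if not chosen:
        raise ValueError(f"no strings match {pattern}")  # almost certainly a typo
    return chosen


def any_of(matched: set[str], defined: set[str], pattern: str) -> bool:
    return any(ident in matched for ident in select(defined, pattern))


def all_of(matched: set[str], defined: set[str], pattern: str) -> bool:
    return all(ident in matched for ident in select(defined, pattern))


DEFINED = {"$network_fetch", "$network_axios", "$network_http", "$cred_path", "$exfil_host"}


def stealer_condition(matched: set[str]) -> bool:
    """Hypothetical tightened logic: credential path AND exfil destination AND network call."""
    return (all_of(matched, DEFINED, "$cred_*")
            and all_of(matched, DEFINED, "$exfil_*")
            and any_of(matched, DEFINED, "$network_*"))
```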

When strings fail → pivot

Use `yr dump -m pe` for sections, imports, imphash, and resources; `math.entropy()` on sections; and packer signatures. If nothing unique remains, YARA alone may not be the right control.

Legacy YARA migration

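```bash
yr check --relaxed-re-syntax rules/   # diagnostic only
yr check rules/                       # fix until clean
```
Common fixes: escape a literal `{` as `\{` in regex; `base64` strings need at least 3 characters; replace `@a[-1]` (negative indexes are rejected) with `@a[#a]` when the intent is the last match, since match indexes start at 1; remove duplicate modifiers.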
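
Two of these fixes are mechanical enough to pre-screen with regexes before `yr check`. The helpers below are hypothetical and text-based: the index rewrite assumes every `@x[-1]` / `!x[-1]` meant "last match", and the base64 check counts source characters, so escape sequences are over-counted. Review each suggested change by hand.

```python
import re

# Hypothetical migration helpers; review every change they suggest.
NEG_INDEX = re.compile(r"([@!])(\w+)\[\s*-1\s*\]")
BASE64_STRING = re.compile(r'(\$\w*)\s*=\s*"((?:[^"\\]|\\.)*)"[^\n]*\bbase64(?:wide)?\b')


def fix_negative_last_index(rule_text: str) -> str:
    """Rewrite @a[-1] / !a[-1] as @a[#a] / !a[#a] (last match; indexes start at 1)."""
    return NEG_INDEX.sub(lambda m: f"{m.group(1)}{m.group(2)}[#{m.group(2)}]", rule_text)


def short_base64_strings(rule_text: str) -> list[str]:
    """List identifiers of base64/base64wide text strings shorter than 3 characters."""
    return [ident for ident, literal in BASE64_STRING.findall(rule_text) if len(literal) < 3]
```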

Rationalizations to reject

| Thought | Reality |
|---|---|
| "yarGen gave me these strings" | yarGen suggests; you validate each string |
| "It works on 10 samples" | Test against a goodware corpus before deploy |
| "I'll tighten after FPs" | FPs burn trust; write tight rules upfront |
| "This API name is malicious" | Legitimate software uses the same APIs |
| "any of them is fine" | Common strings + `any` = FP flood |

Quality checklist

  • Name follows `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}`
  • Description starts with "Detects" and states the distinguishing feature
  • Required meta: author, reference, date
  • Strings ≥4 bytes with good atoms; no unbounded regex (`.*`)
  • Condition: `filesize` and magic bytes before modules
  • Matches all target samples; zero goodware matches
  • `yr check` and `yr fmt --check` pass
  • Peer review completed
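
Parts of this checklist can be pre-screened automatically. The sketch uses regexes rather than a real YARA parser, so it will miss edge cases such as hex patterns or multi-line strings; `lint_rule`, `section`, and the message texts are hypothetical. Naming, scan results, `yr check` / `yr fmt --check`, and peer review still need their own steps.

```python
import re

REQUIRED_META = ("description", "author", "reference", "date")
META_PAIR = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"', re.M)
TEXT_STRING = re.compile(r'(\$\w*)\s*=\s*"((?:[^"\\]|\\.)*)"')
REGEX_STRING = re.compile(r'(\$\w*)\s*=\s*/(.*)/')


def section(text: str, start: str, ends: tuple[str, ...]) -> str:
    """Return the text after `start:` up to the first of `ends` (or '' if absent)."""
    alternatives = "|".join(ends)
    match = re.search(rf"{start}:(.*?)(?:{alternatives}):", text, re.S)
    return match.group(1) if match else ""


def lint_rule(text: str) -> list[str]:
    """Report checklist violations that are visible in the rule text itself."""
    problems = []
    meta = dict(META_PAIR.findall(section(text, "meta", ("strings", "condition"))))
    for key in REQUIRED_META:
        if key not in meta:
            problems.append(f"missing meta: {key}")
    if "description" in meta and not meta["description"].startswith("Detects"):
        problems.append('description should start with "Detects"')
    strings = section(text, "strings", ("condition",))
    for ident, literal in TEXT_STRING.findall(strings):
        if len(literal) < 4:  # counts source characters, not decoded bytes
            problems.append(f"{ident} shorter than 4 characters")
    for ident, body in REGEX_STRING.findall(strings):
        if ".*" in body or ".+" in body:
            problems.append(f"{ident} uses an unbounded regex")
    if "filesize" not in text.partition("condition:")[2]:
        problems.append("condition has no filesize bound")
    return problems
```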

When to load references

| Topic | Reference |
|---|---|
| YARA-X install, CLI, migration, toolkit | `references/yara_x_scope_and_tooling.md` |
| String quality, types, modifiers | `references/string_selection_and_atoms.md` |
| Atoms, condition order, regex, loops | `references/conditions_and_performance.md` |
| PE, macOS, JS, crx, dex patterns | `references/platform_modules_pe_crx_dex.md` |
| Goodware testing, FP debugging | `references/testing_goodware_and_fp_debugging.md` |
| Naming, metadata, deployment | `references/style_metadata_and_deployment.md` |