protocol-reverse-engineering

Protocol Reverse Engineering

Comprehensive techniques for capturing, analyzing, and documenting network protocols for security research, interoperability, and debugging.

Traffic Capture

Wireshark Capture

bash
# Capture on a specific interface and start immediately (-k)
wireshark -i eth0 -k

# Capture with a capture filter
wireshark -i eth0 -k -f "port 443"

# Capture to file
tshark -i eth0 -w capture.pcap

# Ring buffer capture (rotate across 10 files of 100000 kB each)
tshark -i eth0 -b filesize:100000 -b files:10 -w capture.pcap

tcpdump Capture

bash
# Basic capture
tcpdump -i eth0 -w capture.pcap

# With filter (the filter expression goes last, after all options)
tcpdump -i eth0 -w capture.pcap port 8080

# Capture full packets (-s 0 disables snaplen truncation)
tcpdump -i eth0 -s 0 -w capture.pcap

# Real-time display with hex and ASCII payload
tcpdump -i eth0 -X port 80
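
Once a capture is on disk, it can help to read it without any tooling at all. The following is a minimal sketch that walks the records of a classic pcap file using only the standard library. It assumes the microsecond-resolution classic pcap format that `tcpdump -w` writes; pcapng files (which tshark writes by default) use a different layout and are not handled. The function name `read_pcap_records` is illustrative, not part of any library.

```python
import struct
from typing import Iterator, Tuple

PCAP_GLOBAL_HEADER = 24  # bytes: magic, version, timezone, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER = 16  # bytes: ts_sec, ts_usec, incl_len, orig_len


def read_pcap_records(blob: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame_bytes) from the raw bytes of a classic pcap file."""
    if len(blob) < PCAP_GLOBAL_HEADER:
        raise ValueError("too short for a pcap global header")
    magic = blob[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written by a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written by a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file")

    offset = PCAP_GLOBAL_HEADER
    while offset + PCAP_RECORD_HEADER <= len(blob):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", blob[offset:offset + PCAP_RECORD_HEADER]
        )
        offset += PCAP_RECORD_HEADER
        frame = blob[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # truncated final record
        yield ts_sec + ts_usec / 1_000_000, frame
        offset += incl_len
```

A typical call would be `read_pcap_records(open("capture.pcap", "rb").read())`; each yielded frame starts at the link layer, so the application payload still has to be located behind the Ethernet, IP, and TCP/UDP headers.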

Man-in-the-Middle Capture

bash
# mitmproxy for HTTP/HTTPS
mitmproxy --mode transparent -p 8080

# TLS interception without verifying upstream server certificates
mitmproxy --mode transparent --ssl-insecure

# Dump to file
mitmdump -w traffic.mitm

# Burp Suite
# Configure browser proxy to 127.0.0.1:8080

Protocol Analysis

Wireshark Analysis

# Display filters
tcp.port == 8080
http.request.method == "POST"
ip.addr == 192.168.1.1
tcp.flags.syn == 1 && tcp.flags.ack == 0
frame contains "password"

# Following streams
Right-click > Follow > TCP Stream
Right-click > Follow > HTTP Stream

# Export objects
File > Export Objects > HTTP

# Decryption
Edit > Preferences > Protocols > TLS
  • (Pre)-Master-Secret log filename
  • RSA keys list

tshark Analysis

bash
# Extract specific fields
tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e tcp.port

# Statistics
tshark -r capture.pcap -q -z conv,tcp
tshark -r capture.pcap -q -z endpoints,ip

# Filter and extract
tshark -r capture.pcap -Y "http" -T json > http_traffic.json

# Protocol hierarchy
tshark -r capture.pcap -q -z io,phs

Scapy for Custom Analysis

python
from scapy.all import IP, TCP, Raw, rdpcap, send

# Read pcap
packets = rdpcap("capture.pcap")

# Analyze packets (check for IP as well, since TCP can also ride on IPv6)
for pkt in packets:
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        print(f"Src: {pkt[IP].src}:{pkt[TCP].sport}")
        print(f"Dst: {pkt[IP].dst}:{pkt[TCP].dport}")
        if pkt.haslayer(Raw):
            print(f"Data: {pkt[Raw].load[:50]}")

# Filter packets
http_packets = [
    p for p in packets
    if p.haslayer(TCP) and (p[TCP].sport == 80 or p[TCP].dport == 80)
]

# Create custom packets
# Note: this sends one segment without a TCP handshake, so a real server
# will not treat it as part of an established connection.
pkt = IP(dst="target") / TCP(dport=80) / Raw(load="GET / HTTP/1.1\r\n")
send(pkt)

Protocol Identification

Common Protocol Signatures

HTTP        - "HTTP/1." or "GET " or "POST " at start
TLS/SSL     - 0x16 0x03 (handshake record at the record layer)
DNS         - UDP port 53, 12-byte header (ID, flags, four counts)
SMB         - 0xFF 0x53 0x4D 0x42 ("\xFFSMB", SMB1; SMB2/3 starts with 0xFE)
SSH         - "SSH-2.0" banner
FTP         - "220 " response, "USER " command
SMTP        - "220 " banner, "EHLO" command
MySQL       - 3-byte little-endian length + sequence byte, then protocol version (0x0a)
PostgreSQL  - 4-byte big-endian startup length, then version 0x00 0x03 0x00 0x00
Redis       - "*" RESP array prefix
MongoDB     - 16-byte little-endian header (length, request ID, response-to, opcode), then BSON
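
The text-based signatures above translate directly into prefix checks. The sketch below is one hypothetical way to do a first-pass guess from the opening bytes of a stream; `guess_protocol` is an illustrative name, it covers only a subset of the table, and a match is a hint rather than proof (a "220 " banner, for instance, cannot distinguish FTP from SMTP on its own).

```python
from typing import Optional


def guess_protocol(first_bytes: bytes) -> Optional[str]:
    """Guess a protocol from the first bytes of a stream, or return None."""
    if first_bytes.startswith((b"GET ", b"POST ", b"HTTP/1.")):
        return "HTTP"
    if first_bytes[:2] == b"\x16\x03":
        return "TLS"
    # Over TCP the SMB signature follows a 4-byte session header,
    # so look for it within the first 8 bytes rather than at offset 0.
    if b"\xffSMB" in first_bytes[:8]:
        return "SMB"
    if first_bytes.startswith(b"SSH-2.0"):
        return "SSH"
    if first_bytes.startswith(b"220 "):
        return "FTP or SMTP"
    if first_bytes.startswith(b"*"):
        return "Redis"
    return None
```

Binary protocols such as DNS or MongoDB need structural checks (field ranges, lengths that add up) rather than fixed prefixes, which is why they are left out of this sketch.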

Protocol Header Patterns

+--------+--------+--------+--------+
|  Magic number / Signature         |
+--------+--------+--------+--------+
|  Version        |  Flags          |
+--------+--------+--------+--------+
|  Length         |  Message Type   |
+--------+--------+--------+--------+
|  Sequence Number / Session ID     |
+--------+--------+--------+--------+
|  Payload...                       |
+--------+--------+--------+--------+

Binary Protocol Analysis

Structure Identification

c
// Common patterns in binary protocols.
// On the wire these fields are packed back to back, with no compiler padding.
#include <stdint.h>

// Length-prefixed message
struct Message {
    uint32_t length;     // Total message length
    uint16_t msg_type;   // Message type identifier
    uint8_t  flags;      // Flags/options
    uint8_t  reserved;   // Padding/alignment
    uint8_t  payload[];  // Variable-length payload
};

// Type-Length-Value (TLV)
struct TLV {
    uint8_t  type;       // Field type
    uint16_t length;     // Field length
    uint8_t  value[];    // Field data
};

// Fixed header + variable payload
struct Packet {
    uint8_t  magic[4];   // "ABCD" signature
    uint32_t version;
    uint32_t payload_len;
    uint32_t checksum;   // CRC32 or similar
    uint8_t  payload[];
};
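
To make the "fixed header + variable payload" layout concrete, here is a minimal Python sketch that builds and validates such a packet with the `struct` module. Several details are assumptions made for illustration only: big-endian byte order, the placeholder magic `b"ABCD"`, and a CRC32 computed over the payload alone. A real protocol may differ on every one of these, and finding out which is part of the analysis.

```python
import struct
import zlib

# Assumed layout: magic (4 bytes), version, payload_len, checksum (big-endian uint32 each)
PACKET_HEADER = struct.Struct(">4sIII")


def build_packet(payload: bytes, version: int = 1) -> bytes:
    """Serialize a packet with a CRC32 of the payload in the checksum field."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    return PACKET_HEADER.pack(b"ABCD", version, len(payload), checksum) + payload


def parse_packet(data: bytes) -> bytes:
    """Return the payload, or raise ValueError if the packet is malformed."""
    if len(data) < PACKET_HEADER.size:
        raise ValueError("truncated header")
    magic, _version, payload_len, checksum = PACKET_HEADER.unpack_from(data)
    if magic != b"ABCD":
        raise ValueError("bad magic")
    payload = data[PACKET_HEADER.size:PACKET_HEADER.size + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) & 0xFFFFFFFF != checksum:
        raise ValueError("checksum mismatch")
    return payload
```

When the checksum algorithm is unknown, the same structure is useful in reverse: try CRC32, Adler-32, or a simple sum over different byte ranges of captured packets until the computed value matches the field.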

Python Protocol Parser

python
import struct
from dataclasses import dataclass

HEADER_FORMAT = ">4sHHI"  # magic, version, msg_type, length (big-endian)
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes

@dataclass
class MessageHeader:
    magic: bytes
    version: int
    msg_type: int
    length: int

    @classmethod
    def from_bytes(cls, data: bytes):
        magic, version, msg_type, length = struct.unpack(
            HEADER_FORMAT, data[:HEADER_SIZE]
        )
        return cls(magic, version, msg_type, length)

def parse_messages(data: bytes):
    offset = 0
    messages = []

    # Stop once fewer bytes remain than a full header,
    # instead of raising struct.error on trailing garbage
    while offset + HEADER_SIZE <= len(data):
        header = MessageHeader.from_bytes(data[offset:])
        start = offset + HEADER_SIZE
        payload = data[start:start + header.length]
        messages.append((header, payload))
        offset = start + header.length

    return messages

# Parse TLV structure
def parse_tlv(data: bytes):
    fields = []
    offset = 0

    # Each field: 1-byte type, 2-byte big-endian length, then the value
    while offset + 3 <= len(data):
        field_type = data[offset]
        length = struct.unpack(">H", data[offset+1:offset+3])[0]
        value = data[offset+3:offset+3+length]
        fields.append((field_type, value))
        offset += 3 + length

    return fields
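
A parser is easier to trust when it can be checked against a generator for the same format. The following self-contained sketch encodes messages with the 12-byte header and the TLV layout used above, then splits a byte stream back into messages. The names `build_tlv`, `build_message`, and `split_messages` and the default magic `b"PROT"` are illustrative assumptions, not taken from any real protocol.

```python
import struct

HEADER = struct.Struct(">4sHHI")  # magic, version, msg_type, payload length


def build_tlv(fields) -> bytes:
    """Encode (type, value) pairs as 1-byte type + 2-byte big-endian length + value."""
    out = bytearray()
    for field_type, value in fields:
        out += struct.pack(">BH", field_type, len(value)) + value
    return bytes(out)


def build_message(msg_type: int, payload: bytes,
                  magic: bytes = b"PROT", version: int = 1) -> bytes:
    """Prefix a payload with the fixed header."""
    return HEADER.pack(magic, version, msg_type, len(payload)) + payload


def split_messages(stream: bytes):
    """Yield (msg_type, payload) for each complete message in the stream."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        _magic, _version, msg_type, length = HEADER.unpack_from(stream, offset)
        end = offset + HEADER.size + length
        if end > len(stream):
            break  # incomplete message at the end of the buffer
        yield msg_type, stream[offset + HEADER.size:end]
        offset = end
```

Feeding generated messages to the target and generated streams to the parser closes the loop: if either side disagrees with captured traffic, the format hypothesis is wrong somewhere.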

Hex Dump Analysis

python
def hexdump(data: bytes, width: int = 16):
    """Format binary data as hex dump."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i+width]
        hex_part = ' '.join(f'{b:02x}' for b in chunk)
        ascii_part = ''.join(
            chr(b) if 32 <= b < 127 else '.'
            for b in chunk
        )
        lines.append(f'{i:08x}  {hex_part:<{width*3}}  {ascii_part}')
    return '\n'.join(lines)

# Example output:
# 00000000  48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d   HTTP/1.1 200 OK.
# 00000010  0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74   .Content-Type: t

Encryption Analysis

Identifying Encryption

python
# Entropy analysis - high entropy suggests encryption/compression
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counter = Counter(data)
    probs = [count / len(data) for count in counter.values()]
    return -sum(p * math.log2(p) for p in probs)

# Entropy thresholds (rough heuristics; short samples score lower
# because the result cannot exceed log2(len(data))):
# < 6.0: Likely plaintext or structured data
# 6.0-7.5: Possibly compressed
# > 7.5: Likely encrypted or random

# Common encryption indicators
# - High, uniform entropy
# - No obvious structure or patterns
# - Length often multiple of block size (16 for AES)
# - Possible IV at start (16 bytes for AES-CBC)
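
A single entropy figure for a whole message hides the common case of a plaintext header followed by an encrypted body. The sketch below computes entropy over fixed-size windows so the boundary shows up as a jump. The function names and the 256-byte default window are illustrative choices; smaller windows localize the boundary better but cap the achievable entropy at log2 of the window size.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def entropy_profile(data: bytes, window: int = 256, step: int = 256):
    """Return (offset, entropy) for each full window of the input."""
    return [
        (offset, shannon_entropy(data[offset:offset + window]))
        for offset in range(0, len(data) - window + 1, step)
    ]
```

Running `entropy_profile` over a reassembled stream and looking for the first offset where the value climbs sharply gives a candidate for where the cleartext framing ends.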

TLS Analysis

bash
# Wireshark 3.0 and later name these fields tls.*; older releases used ssl.*

# Extract TLS metadata
tshark -r capture.pcap -Y "tls.handshake" \
    -T fields -e ip.src -e tls.handshake.ciphersuite

# JA3 fingerprinting (client); needs a Wireshark release that computes JA3 fields
tshark -r capture.pcap -Y "tls.handshake.type == 1" \
    -T fields -e tls.handshake.ja3

# JA3S fingerprinting (server)
tshark -r capture.pcap -Y "tls.handshake.type == 2" \
    -T fields -e tls.handshake.ja3s

# Certificate extraction
tshark -r capture.pcap -Y "tls.handshake.certificate" \
    -T fields -e x509sat.printableString
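
The handshake type values used in the filters above (1 for ClientHello, 2 for ServerHello) sit right behind the 5-byte TLS record header, which is simple enough to decode by hand. This is a minimal sketch with an illustrative function name; it reads only the first record in the buffer, and the handshake type byte is meaningful only while the handshake is still unencrypted.

```python
import struct

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
HANDSHAKE_TYPES = {1: "client_hello", 2: "server_hello", 11: "certificate"}


def parse_tls_record(data: bytes) -> dict:
    """Decode the 5-byte TLS record header at the start of data."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes for a TLS record header")
    content_type, major, minor, length = struct.unpack(">BBBH", data[:5])
    info = {
        "content_type": CONTENT_TYPES.get(content_type, "unknown"),
        "version": (major, minor),
        "length": length,
    }
    # For handshake records, the first fragment byte is the handshake message type
    if content_type == 22 and len(data) > 5:
        info["handshake_type"] = HANDSHAKE_TYPES.get(data[5], "other")
    return info
```

The version in the record header is a legacy field, so it should not be read as the negotiated TLS version; that is carried inside the handshake messages.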

Decryption Approaches

bash
# Pre-master secret log (browser)
# Set this before launching the browser so it writes session secrets to the file
export SSLKEYLOGFILE=/tmp/keys.log

# Configure Wireshark
# Edit > Preferences > Protocols > TLS
# (Pre)-Master-Secret log filename: /tmp/keys.log

# Decrypt with private key (if available)
# Only works for RSA key exchange, not for (EC)DHE cipher suites or TLS 1.3
# Edit > Preferences > Protocols > TLS > RSA keys list
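
The key log file is plain text, one secret per line in the form `LABEL client_random_hex secret_hex`, with `#` starting a comment. A small parser makes it easy to check that secrets were actually logged for the session under analysis. The sketch below assumes that three-column layout; `parse_keylog` is an illustrative name, and lines that do not match are skipped rather than reported.

```python
from collections import defaultdict


def parse_keylog(text: str) -> dict:
    """Map each client random (hex) to a {label: secret_hex} dictionary."""
    secrets = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) != 3:
            continue  # not a "LABEL random secret" entry
        label, client_random, secret = parts
        secrets[client_random.lower()][label] = secret.lower()
    return dict(secrets)
```

Comparing the client random values in the file against the Random field of the ClientHello in the capture shows quickly whether the log and the capture belong to the same sessions.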

Custom Protocol Documentation

Protocol Specification Template

markdown
# Protocol Name Specification

## Overview

Brief description of protocol purpose and design.

## Transport

- Layer: TCP/UDP
- Port: XXXX
- Encryption: TLS 1.2+

## Message Format

### Header (12 bytes)

| Offset | Size | Field   | Description             |
|--------|------|---------|-------------------------|
| 0      | 4    | Magic   | 0x50524F54 ("PROT")     |
| 4      | 2    | Version | Protocol version (1)    |
| 6      | 2    | Type    | Message type identifier |
| 8      | 4    | Length  | Payload length in bytes |

### Message Types

| Type | Name      | Description            |
|------|-----------|------------------------|
| 0x01 | HELLO     | Connection initiation  |
| 0x02 | HELLO_ACK | Connection accepted    |
| 0x03 | DATA      | Application data       |
| 0x04 | CLOSE     | Connection termination |

### Type 0x01: HELLO

| Offset | Size | Field      | Description              |
|--------|------|------------|--------------------------|
| 0      | 4    | ClientID   | Unique client identifier |
| 4      | 2    | Flags      | Connection flags         |
| 6      | var  | Extensions | TLV-encoded extensions   |

## State Machine

    [INIT] --HELLO--> [WAIT_ACK] --HELLO_ACK--> [CONNECTED]
                                                     |
                                                 DATA/DATA
                                                     |
                                  [CLOSED] <--CLOSE--+

## Examples

### Connection Establishment

    Client -> Server: HELLO (ClientID=0x12345678)
    Server -> Client: HELLO_ACK (Status=OK)
    Client -> Server: DATA (payload)
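
A documented state machine can double as a checker for captured sessions. The sketch below encodes the transitions from the template as a lookup table and replays a sequence of message names through it; the `State` enum, `TRANSITIONS`, and `run_session` are illustrative names, and the table reflects only the four message types of the example protocol.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    WAIT_ACK = "WAIT_ACK"
    CONNECTED = "CONNECTED"
    CLOSED = "CLOSED"


# (current state, message name) -> next state
TRANSITIONS = {
    (State.INIT, "HELLO"): State.WAIT_ACK,
    (State.WAIT_ACK, "HELLO_ACK"): State.CONNECTED,
    (State.CONNECTED, "DATA"): State.CONNECTED,
    (State.CONNECTED, "CLOSE"): State.CLOSED,
}


def run_session(messages, state: State = State.INIT) -> State:
    """Replay message names through the state machine and return the final state."""
    for name in messages:
        key = (state, name)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {name} in state {state.name}")
        state = TRANSITIONS[key]
    return state
```

Any captured session that raises here either reveals a transition missing from the specification or a message that was misclassified, and both are worth investigating.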

Wireshark Dissector (Lua)

lua
-- custom_protocol.lua
local proto = Proto("custom", "Custom Protocol")

-- Define fields
local f_magic = ProtoField.string("custom.magic", "Magic")
local f_version = ProtoField.uint16("custom.version", "Version")
local f_type = ProtoField.uint16("custom.type", "Type")
local f_length = ProtoField.uint32("custom.length", "Length")
local f_payload = ProtoField.bytes("custom.payload", "Payload")

proto.fields = { f_magic, f_version, f_type, f_length, f_payload }

-- Message type names
local msg_types = {
    [0x01] = "HELLO",
    [0x02] = "HELLO_ACK",
    [0x03] = "DATA",
    [0x04] = "CLOSE"
}

local HEADER_LEN = 12

function proto.dissector(buffer, pinfo, tree)
    -- Too short to hold the 12-byte header: do not claim the packet
    if buffer:len() < HEADER_LEN then
        return 0
    end

    pinfo.cols.protocol = "CUSTOM"

    local subtree = tree:add(proto, buffer())

    -- Parse header
    subtree:add(f_magic, buffer(0, 4))
    subtree:add(f_version, buffer(4, 2))

    local msg_type = buffer(6, 2):uint()
    subtree:add(f_type, buffer(6, 2)):append_text(
        " (" .. (msg_types[msg_type] or "Unknown") .. ")"
    )

    local length = buffer(8, 4):uint()
    subtree:add(f_length, buffer(8, 4))

    -- Clamp to the bytes actually present so a bogus length
    -- cannot trigger a range error
    local available = buffer:len() - HEADER_LEN
    if length > 0 and available > 0 then
        subtree:add(f_payload, buffer(HEADER_LEN, math.min(length, available)))
    end
end

-- Register for TCP port
local tcp_table = DissectorTable.get("tcp.port")
tcp_table:add(8888, proto)

Active Testing

Fuzzing with Boofuzz

python
from boofuzz import *

def main():
    session = Session(
        target=Target(
            connection=TCPSocketConnection("target", 8888)
        )
    )

    # Define protocol structure.
    # Boofuzz integers default to little-endian; the header here is big-endian.
    s_initialize("HELLO")
    s_static(b"\x50\x52\x4f\x54")                      # Magic ("PROT")
    s_word(1, endian=BIG_ENDIAN, name="version")       # Version
    s_word(0x01, endian=BIG_ENDIAN, name="type")       # Type (HELLO)
    s_size("payload", length=4, endian=BIG_ENDIAN)     # Length field
    s_block_start("payload")
    s_dword(0x12345678, endian=BIG_ENDIAN, name="client_id")
    s_word(0, endian=BIG_ENDIAN, name="flags")
    s_block_end()

    session.connect(s_get("HELLO"))
    session.fuzz()

if __name__ == "__main__":
    main()
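
Before a full grammar exists, captured messages can already serve as fuzzing seeds. The following is a minimal sketch of a mutation-based generator using only the standard library; it is not part of Boofuzz, the names `mutate` and `generate_cases` are illustrative, and bit flips are just one of many possible mutation strategies. A seeded `random.Random` keeps the test cases reproducible so a crash can be traced back to its input.

```python
import random


def mutate(sample: bytes, rng: random.Random, max_flips: int = 4) -> bytes:
    """Return a copy of sample with between 1 and max_flips random bit flips."""
    if not sample:
        return sample
    data = bytearray(sample)
    for _ in range(rng.randint(1, max_flips)):
        index = rng.randrange(len(data))
        data[index] ^= 1 << rng.randrange(8)  # flip one bit in the chosen byte
    return bytes(data)


def generate_cases(sample: bytes, count: int, seed: int = 0):
    """Produce a reproducible list of mutated variants of a captured message."""
    rng = random.Random(seed)
    return [mutate(sample, rng) for _ in range(count)]
```

Mutations that land in a length or checksum field tend to be rejected early by the target, which is one reason to move to a structure-aware definition once the header layout is understood.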

Replay and Modification

python
from scapy.all import IP, TCP, Raw, rdpcap, send

# Replay captured traffic
# send() works at layer 3, so pass the IP layer rather than the captured frame.
# Replayed TCP segments carry stale sequence numbers and do not open a new connection.
packets = rdpcap("capture.pcap")
for pkt in packets:
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt[TCP].dport == 8888:
        send(pkt[IP])

# Modify and replay
for pkt in packets:
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw):
        # Modify payload
        original = pkt[Raw].load
        modified = original.replace(b"client", b"CLIENT")
        pkt[Raw].load = modified
        # Delete length and checksums so Scapy recalculates them on send
        del pkt[IP].len
        del pkt[IP].chksum
        del pkt[TCP].chksum
        send(pkt[IP])

Best Practices

Analysis Workflow

  1. Capture traffic: Multiple sessions, different scenarios
  2. Identify boundaries: Message start/end markers
  3. Map structure: Fixed header, variable payload
  4. Identify fields: Compare multiple samples
  5. Document format: Create specification
  6. Validate understanding: Implement parser/generator
  7. Test edge cases: Fuzzing, boundary conditions
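
Step 4 of the workflow, comparing multiple samples, is straightforward to automate. The sketch below takes several captured messages of the same type and reports which byte offsets never change and which do; `classify_offsets` is an illustrative name, and only the prefix common to all samples is compared.

```python
def classify_offsets(samples):
    """Split byte offsets into (constant, variable) lists across the samples."""
    if not samples:
        return [], []
    common = min(len(s) for s in samples)  # compare only the shared prefix
    constant, variable = [], []
    for offset in range(common):
        values = {s[offset] for s in samples}
        if len(values) == 1:
            constant.append(offset)
        else:
            variable.append(offset)
    return constant, variable
```

Constant runs at the start usually point to magic numbers and version fields, while offsets that change with message size are candidates for length fields. With few samples, payload bytes can match by coincidence, so more captures give a cleaner picture.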

Common Patterns to Look For

  • Magic numbers/signatures at message start
  • Version fields for compatibility
  • Length fields (often before variable data)
  • Type/opcode fields for message identification
  • Sequence numbers for ordering
  • Checksums/CRCs for integrity
  • Timestamps for timing
  • Session/connection identifiers
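
Of the patterns above, length fields are among the easiest to hunt for mechanically: an integer near the start of the message whose value equals the total size or the number of bytes that follow it. The sketch below tries 16-bit and 32-bit integers in both byte orders at each early offset. The function name, the 16-byte search window, and the two interpretations checked are illustrative assumptions; real protocols may count from other reference points.

```python
import struct


def find_length_fields(message: bytes, max_offset: int = 16):
    """Return (offset, struct_format, meaning) candidates for a length field."""
    candidates = []
    total = len(message)
    for fmt in (">H", "<H", ">I", "<I"):
        size = struct.calcsize(fmt)
        for offset in range(0, min(max_offset, total - size) + 1):
            (value,) = struct.unpack_from(fmt, message, offset)
            if value == total:
                candidates.append((offset, fmt, "total length"))
            elif value == total - (offset + size):
                candidates.append((offset, fmt, "bytes after field"))
    return candidates
```

A single message often yields overlapping hits (the low 16 bits of a 32-bit length also match), so a candidate is only convincing when it holds across many messages of different sizes.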