assembly-x86
x86-64 Assembly
Purpose
Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns.
Triggers
- "How do I read the assembly GCC generated?"
- "What are the x86-64 registers?"
- "What is the calling convention on Linux/macOS?"
- "How do I write inline assembly in C?"
- "How do I use SSE/AVX intrinsics?"
- "This assembly uses `%rsp` / `%rbp`: what does it mean?"
Workflow
1. Generate and read assembly
```bash
# AT&T syntax (GCC default)
gcc -S -O2 -fverbose-asm foo.c -o foo.s

# Intel syntax
gcc -S -masm=intel -O2 foo.c -o foo.s

# From GDB
(gdb) disassemble /s main    # with source
(gdb) x/20i $rip

# From objdump
objdump -d -M intel -S prog  # Intel + source (needs -g)
```
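To have something concrete to run through those commands, here is a minimal sketch of a `foo.c`. The function name `sum_array` is hypothetical and chosen only for illustration. The exact instructions depend on the compiler version and flags, so the output is best read by locating the `sum_array:` label and following the argument registers from there.

```c
/* foo.c: hypothetical input file for the gcc -S commands above. */

/* Sums n signed 64-bit values starting at v.
 * Under the System V AMD64 ABI, v arrives in %rdi, n in %rsi,
 * and the result is returned in %rax. */
long sum_array(const long *v, long n) {
    long total = 0;
    for (long i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

Comparing the `-O0` and `-O2` output for the same file is a quick way to see which stack traffic the optimizer removes.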
2. x86-64 registers
| 64-bit | 32-bit | 16-bit | 8-bit high | 8-bit low | Purpose |
|---|---|---|---|---|---|
| `%rax` | `%eax` | `%ax` | `%ah` | `%al` | Return value / accumulator |
| `%rbx` | `%ebx` | `%bx` | `%bh` | `%bl` | Callee-saved |
| `%rcx` | `%ecx` | `%cx` | `%ch` | `%cl` | 4th arg / count |
| `%rdx` | `%edx` | `%dx` | `%dh` | `%dl` | 3rd arg / 2nd return |
| `%rsi` | `%esi` | `%si` | — | `%sil` | 2nd arg |
| `%rdi` | `%edi` | `%di` | — | `%dil` | 1st arg |
| `%rbp` | `%ebp` | `%bp` | — | `%bpl` | Frame pointer (callee-saved) |
| `%rsp` | `%esp` | `%sp` | — | `%spl` | Stack pointer |
| `%r8`–`%r11` | `%r8d`–`%r11d` | `%r8w`–`%r11w` | — | `%r8b`–`%r11b` | `%r8`/`%r9`: 5th and 6th args; all four are caller-saved |
| `%r12`–`%r15` | `%r12d`–`%r15d` | `%r12w`–`%r15w` | — | `%r12b`–`%r15b` | Callee-saved |
| `%rip` | | | | | Instruction pointer |
| `%rflags` | `%eflags` | | | | Status flags |
| `%xmm0`–`%xmm7` | | | | | FP/SIMD args and return |
| `%xmm8`–`%xmm15` | | | | | Caller-saved SIMD |
| `%ymm0`–`%ymm15` | | | | | AVX 256-bit |
| `%zmm0`–`%zmm31` | | | | | AVX-512 512-bit |
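The narrower names in the table are views onto the same 64-bit register, and the width of a write matters. The sketch below assumes GCC or Clang on x86-64. The helper names `write_low32` and `write_low8` are hypothetical. It illustrates that a 32-bit write zero-extends into the full register, while an 8-bit write leaves the upper bits untouched.

```c
#include <stdint.h>

/* Writes `low` to the 32-bit view of the register holding `full`.
 * On x86-64 a 32-bit write clears bits 63..32 of the register.
 * The %k0 modifier prints the 32-bit name of operand 0. */
static inline uint64_t write_low32(uint64_t full, uint32_t low) {
    __asm__ ("movl %1, %k0" : "+r"(full) : "r"(low));
    return full;
}

/* Writes `low` to the 8-bit view of the register holding `full`.
 * An 8-bit write preserves bits 63..8.
 * The %b modifier prints the 8-bit name of an operand. */
static inline uint64_t write_low8(uint64_t full, uint8_t low) {
    __asm__ ("movb %b1, %b0" : "+r"(full) : "r"(low));
    return full;
}
```

This is why compilers often emit `movl` or `xorl` on the 32-bit name when they want a clean 64-bit value.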
3. System V AMD64 ABI (Linux, macOS, FreeBSD)
Integer/pointer argument registers (in order): `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`

Floating-point argument registers: `%xmm0`–`%xmm7`

Return values:
- Integer: `%rax` (low), `%rdx` (high if 128-bit)
- Float: `%xmm0` (low), `%xmm1` (high)

Caller-saved (scratch): `%rax`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`–`%r11`, `%xmm0`–`%xmm15`

Callee-saved (must preserve): `%rbx`, `%rbp`, `%r12`–`%r15`

Stack: `%rsp` is 16-byte aligned immediately before `call`. `call` pushes an 8-byte return address, so at function entry `%rsp` is 8 bytes past a 16-byte boundary; the usual `push %rbp` in the prologue restores 16-byte alignment.

Red zone: the 128 bytes below `%rsp` may be used by leaf functions without adjusting `%rsp`. It is not available in kernel code, which is built with `-mno-red-zone` because interrupts would overwrite it.
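The register assignments above can be sketched on an ordinary C signature. The function names `sum7` and `scale` are hypothetical. The comments state where each argument lives on entry under the System V AMD64 ABI; they do not describe what any particular compiler emits afterwards.

```c
/* Seven integer arguments: six go in registers, the seventh on the stack. */
long sum7(long a,   /* %rdi */
          long b,   /* %rsi */
          long c,   /* %rdx */
          long d,   /* %rcx */
          long e,   /* %r8  */
          long f,   /* %r9  */
          long g)   /* 7th integer argument: passed on the stack */
{
    return a + b + c + d + e + f + g;   /* result returned in %rax */
}

/* Integer and floating-point arguments are numbered separately. */
double scale(double x,  /* %xmm0: first FP argument */
             long k)    /* %rdi:  first integer argument */
{
    return x * (double)k;               /* result returned in %xmm0 */
}
```

Compiling this with `gcc -S -O2` and reading the body of `sum7` shows the stack load for `g` next to plain register adds for the other six.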
4. Common instruction patterns
| Pattern | Meaning |
|---|---|
| `mov %rdi, %rax` | Copy rdi to rax |
| `mov (%rdi), %rax` | Load 8 bytes from address in rdi |
| `mov %rax, 8(%rdi)` | Store rax to rdi+8 |
| `lea 8(%rdi), %rax` | Load effective address rdi+8 into rax (no memory access) |
| `push %rbx` | Push rbx; rsp -= 8 |
| `pop %rbx` | Pop into rbx; rsp += 8 |
| `call foo` | Push return addr; jmp foo |
| `ret` | Pop return addr; jmp to it |
| `xor %eax, %eax` | Zero rax (smaller encoding than `mov $0, %rax`) |
| `test %rax, %rax` | Set ZF if rax == 0 (cheaper than `cmp $0, %rax`) |
| `cmp $5, %rdi` | Set flags for rdi - 5 |
| `jl label` | Jump if signed less than |
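Several of these patterns can be seen working together in a small hand-written routine. This is a sketch, assuming x86-64 Linux (ELF) with GCC or Clang. The function `count_below_five` is hypothetical, and on macOS the symbol would need a leading underscore and no `.type`/`.size` directives.

```c
/* Hypothetical example: counts the elements of v[0..n) that are < 5. */
long count_below_five(const long *v, long n);

__asm__(
    ".pushsection .text\n"
    ".globl count_below_five\n"
    ".type count_below_five, @function\n"
    "count_below_five:\n"
    "    xor  %eax, %eax\n"            /* zero rax: the running count */
    "    test %rsi, %rsi\n"            /* set flags from n */
    "    jle  4f\n"                    /* n <= 0: return 0 */
    "    lea  (%rdi,%rsi,8), %rdx\n"   /* rdx = &v[n], no memory access */
    "1:\n"
    "    mov  (%rdi), %rcx\n"          /* load 8 bytes from address in rdi */
    "    cmp  $5, %rcx\n"              /* set flags for rcx - 5 */
    "    jl   3f\n"                    /* signed less than: count it */
    "2:\n"
    "    add  $8, %rdi\n"              /* advance to the next element */
    "    cmp  %rdx, %rdi\n"            /* reached &v[n]? */
    "    jne  1b\n"
    "4:\n"
    "    ret\n"                        /* pop return address and jump to it */
    "3:\n"
    "    add  $1, %rax\n"
    "    jmp  2b\n"
    ".size count_below_five, .-count_below_five\n"
    ".popsection\n"
);
```

The routine touches only caller-saved registers (`%rax`, `%rcx`, `%rdx`, `%rdi`), so it needs no `push`/`pop` pair.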
5. AT&T vs Intel syntax
| Feature | AT&T | Intel |
|---|---|---|
| Operand order | source, dest | dest, source |
| Register prefix | `%` | none |
| Immediate prefix | `$` | none |
| Memory operand | `8(%rdi)` | `[rdi+8]` |
| Size suffix | `b`, `w`, `l`, `q` (e.g. `movl`, `movq`) | — (inferred) |

GCC emits AT&T by default. Use `-masm=intel` for Intel syntax.
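The differences in the table can be put side by side using the `{att|intel}` dialect alternatives that GCC accepts in inline asm templates. This is a minimal sketch assuming GCC or Clang on x86-64. The helper names `add_eight` and `load_second` are hypothetical. The assembler sees only the alternative that matches the active `-masm=` setting.

```c
/* Same instruction written in both syntaxes: {AT&T | Intel}.
 * AT&T:  addl $8, %eax     (source first, $ and % prefixes, l suffix)
 * Intel: add  eax, 8       (destination first, no prefixes) */
static inline int add_eight(int x) {
    __asm__ ("{addl $8, %0|add %0, 8}" : "+r"(x) : : "cc");
    return x;
}

/* Memory operand in both syntaxes.
 * AT&T:  movl 4(%rdi), %eax
 * Intel: mov  eax, DWORD PTR [rdi+4]
 * The extra "m" input tells the compiler that p[1] is read. */
static inline int load_second(const int *p) {
    int v;
    __asm__ ("{movl 4(%1), %0|mov %0, DWORD PTR [%1+4]}"
             : "=r"(v)
             : "r"(p), "m"(p[1]));
    return v;
}
```

Writing both alternatives keeps a header usable regardless of which syntax the including project compiles with.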
6. Inline assembly (GCC extended asm)
```c
#include <stdint.h>

// Basic: increment a value held in a register
static inline int increment(int x) {
    __asm__ volatile (
        "incl %0"
        : "=r"(x)   // outputs: "=r" means write-only register
        : "0"(x)    // inputs: "0" means same location as output 0
        : "cc"      // clobbers: incl modifies the flags
    );
    return x;
}

// CPUID example: out receives eax, ebx, ecx, edx for leaf 1
static inline void cpuid_leaf1(uint32_t out[4]) {
    uint32_t eax, ebx, ecx, edx;
    __asm__ volatile (
        "cpuid"
        : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
        : "a"(1)    // input: leaf 1 in %eax
    );
    out[0] = eax; out[1] = ebx; out[2] = ecx; out[3] = edx;
}

// Atomic increment: returns the new value of *p
static inline int atomic_inc(volatile int *p) {
    int ret;
    __asm__ volatile (
        "lock; xaddl %0, %1"    // ret receives the old *p; *p += 1
        : "=r"(ret), "+m"(*p)
        : "0"(1)
        : "memory", "cc"
    );
    return ret + 1;
}
```

Constraint codes:
- `"r"`: any general register
- `"m"`: memory operand
- `"i"`: immediate integer
- `"a"`, `"b"`, `"c"`, `"d"`: specific registers (`%rax`, `%rbx`, `%rcx`, `%rdx`)
- `"="` prefix: output (write-only)
- `"+"` prefix: read-write
- `"memory"` clobber: tells the compiler memory may be modified (barrier)
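The `"m"`, `"i"` and `"+"` constraint codes can be sketched in two short helpers. This assumes GCC or Clang on x86-64 with the default AT&T syntax. The names `add_ten` and `rotl3` are hypothetical.

```c
#include <stdint.h>

/* "+m": *p is read and written in place, with no register round-trip.
 * "i":  the constant is encoded directly as an immediate ($10). */
static inline void add_ten(int *p) {
    __asm__ ("addl %1, %0"
             : "+m"(*p)
             : "i"(10)
             : "cc");
}

/* "+r": x is both input and output, held in any general register.
 * The rotate count 3 is an immediate operand. */
static inline uint32_t rotl3(uint32_t x) {
    __asm__ ("roll %1, %0"
             : "+r"(x)
             : "i"(3)
             : "cc");
    return x;
}
```

Neither helper needs `volatile` or a `"memory"` clobber, because every effect is described by its operands and the compiler can schedule or drop the asm accordingly.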
7. SSE/AVX intrinsics (preferred over inline asm)
```c
#include <immintrin.h>   // includes all x86 SIMD headers

// Add 8 floats at once with AVX (compile with -mavx or newer)
void add8_ps(const float *arr_a, const float *arr_b, float *result) {
    __m256 a = _mm256_loadu_ps(arr_a);   // load 8 floats (unaligned)
    __m256 b = _mm256_loadu_ps(arr_b);
    __m256 c = _mm256_add_ps(a, b);      // 8 additions in one instruction
    _mm256_storeu_ps(result, c);         // store 8 floats (unaligned)
}
```

Enable the instruction set at compile time with `-mavx2` or `-march=native`.
Check for support at runtime with `__builtin_cpu_supports("avx2")`.

For a full register and instruction reference, see references/reference.md.
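Compile-time flags and the runtime check can be combined so that one binary runs on CPUs with and without AVX. The following is a sketch under stated assumptions: GCC or Clang on x86-64, with the `target` function attribute used in place of a global `-mavx` flag. The names `add_f32`, `add_f32_avx` and `add_f32_scalar` are hypothetical.

```c
#include <immintrin.h>
#include <stddef.h>

/* AVX path: compiled with AVX enabled for this function only. */
__attribute__((target("avx")))
static void add_f32_avx(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {              /* 8 floats per iteration */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                        /* scalar tail */
        out[i] = a[i] + b[i];
}

/* Portable fallback for CPUs without AVX. */
static void add_f32_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Picks an implementation at runtime. */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_f32_avx(a, b, out, n);
    else
        add_f32_scalar(a, b, out, n);
}
```

The check here is for `"avx"` because `_mm256_add_ps` on floats needs only AVX. Integer 256-bit operations would call for `"avx2"` instead.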
Related skills
- Use `skills/low-level-programming/assembly-arm` for AArch64/ARM assembly
- Use `skills/compilers/gcc` for `-S -masm=intel` flag details
- Use `skills/debuggers/gdb` for stepping through assembly (`si`, `ni`, `x/i`)