concurrency-debugging
Concurrency Debugging
Purpose
Guide agents through diagnosing and fixing concurrency bugs: reading ThreadSanitizer race reports, using Helgrind for lock-order analysis, detecting deadlocks with GDB thread inspection, identifying common `std::atomic` misuse patterns, and applying happens-before reasoning in C++ and Rust.
Triggers
- "ThreadSanitizer reported a data race — how do I read the report?"
- "My program deadlocks — how do I debug it?"
- "How do I use Helgrind to find threading bugs?"
- "Am I using std::atomic correctly?"
- "How does happens-before work in C++ memory ordering?"
- "How do I find which threads are deadlocked in GDB?"
Workflow
1. ThreadSanitizer (TSan): race detection
```bash
# Build with TSan
clang -fsanitize=thread -g -O1 -o prog main.c
# or GCC
gcc -fsanitize=thread -g -O1 -o prog main.c

# Run (TSan intercepts memory accesses at runtime)
./prog

# TSan-specific options
TSAN_OPTIONS="halt_on_error=1:second_deadlock_stack=1" ./prog
```
Reading a TSan report:
```text
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 4 at 0x7f1234 by thread T2:
    #0 increment /src/counter.c:8:5          ← access site in T2
    #1 worker_thread /src/counter.c:22:3

  Previous read of size 4 at 0x7f1234 by thread T1:
    #0 read_counter /src/counter.c:3:14      ← conflicting access in T1
    #1 main /src/counter.c:30:5

  Thread T2 created at:
    #0 pthread_create .../tsan_interceptors.cpp
    #1 main /src/counter.c:28:3

SUMMARY: ThreadSanitizer: data race /src/counter.c:8:5 in increment
```

How to read:
- First line of each access block: type of access (write/read), size, address, and thread
- Stack under "Write of size": the thread that performed the write
- Stack under "Previous read/write": the conflicting thread
- "Thread T2 created at": where the thread was spawned
- Fix: the `increment` and `read_counter` functions access the same address without synchronization, so guard both accesses with the same mutex or make the variable atomic
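The fix for a report like the one above can be sketched as follows. This is a minimal illustration that assumes the racy variable is a plain shared integer. The function names mirror the report, while `counter` and `run_workers` are hypothetical helpers introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared counter; std::atomic makes each individual access race-free.
std::atomic<int> counter{0};

// Writer side: an atomic read-modify-write replaces an unsynchronized increment.
void increment() {
    counter.fetch_add(1, std::memory_order_relaxed);
}

// Reader side: an atomic load replaces the unsynchronized read.
int read_counter() {
    return counter.load(std::memory_order_relaxed);
}

// Spawn `threads` workers that each call increment() `iters` times,
// join them all, and return the final count.
int run_workers(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) increment();
        });
    }
    for (auto& th : pool) th.join();  // join() orders all increments before the final read
    return read_counter();
}
```

Relaxed ordering is enough here only because the counter publishes no other data. If readers rely on the counter to decide that other memory is ready, release/acquire ordering or a mutex is the safer choice.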
Common races and fixes:
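| Race pattern | Fix |
|---|---|
| Read/write on global without lock | Add mutex or use `std::atomic` |
| Double-checked locking without atomics | Use `std::call_once` |
| Non-atomic increment of a shared integer | Use `std::atomic<int>` |
| Container modified while iterated | Lock entire critical section |
| `std::shared_ptr` copies in multiple threads | Already safe (ref count is atomic); but pointed-to object may not be |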
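One way to replace hand-written double-checked locking, as mentioned in the table, is `std::call_once`. The sketch below is illustrative only: `Config`, `get_config`, `init_calls`, and `init_count_after_concurrent_use` are hypothetical names, and the initializer call count is tracked purely to make the behavior observable.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Config { int value; };        // hypothetical lazily built object

std::once_flag config_once;          // guards the one-time initialization
std::unique_ptr<Config> config;      // written exactly once, inside call_once
std::atomic<int> init_calls{0};      // counts how often the initializer ran

// Every caller sees a fully constructed Config; call_once runs the
// initializer once and makes its effects visible to all returning callers.
const Config& get_config() {
    std::call_once(config_once, [] {
        init_calls.fetch_add(1);
        config = std::make_unique<Config>(Config{42});
    });
    return *config;
}

// Call get_config() from several threads at once and report how many
// times the initializer actually executed.
int init_count_after_concurrent_use(int threads) {
    assert(threads > 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([] { (void)get_config().value; });
    }
    for (auto& th : pool) th.join();
    return init_calls.load();
}
```

A function-local `static` object gives the same once-only guarantee in C++11 and later and is often simpler when the object can be built in place.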
2. Helgrind: lock-order and race detection
Helgrind uses Valgrind infrastructure to detect lock ordering violations (potential deadlocks) and data races:
```bash
# Run with Helgrind
valgrind --tool=helgrind --log-file=helgrind.log ./prog

# Lock order violation report
==1234== Thread #3: lock order "0x... M2" after "0x... M1"
==1234==   observed (incorrect) order
==1234==    at pthread_mutex_lock (helgrind/...)
==1234==    by worker2 /src/worker.c:45      ← T3 takes M2 then M1
==1234==
==1234==   required order established by acquisition of lock at address 0x... M1
==1234==    at pthread_mutex_lock
==1234==    by worker1 /src/worker.c:31      ← T1 takes M1 then M2
```

Lock-order violation = potential deadlock:
- Thread T1 acquires M1, then tries M2
- Thread T3 acquires M2, then tries M1
- If they interleave, each holds the lock the other needs and neither can proceed

Fix: enforce a consistent global lock ordering. Always take M1 before M2 everywhere.
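A consistent global ordering can be sketched by always locking the mutex with the lower address first. This is a minimal example under assumed names: `Account`, `transfer`, and `run_opposing_transfers` are hypothetical and stand in for whatever code takes M1 and M2.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

struct Account {          // hypothetical resource guarded by its own mutex
    std::mutex m;
    long balance = 0;
};

// Lock both accounts in address order, so every thread acquires any given
// pair of mutexes in the same sequence regardless of argument order.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // locking one std::mutex twice is undefined behavior
    std::mutex* first = &from.m;
    std::mutex* second = &to.m;
    if (std::less<std::mutex*>{}(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> g1(*first);
    std::lock_guard<std::mutex> g2(*second);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; with ordered locking this
// finishes instead of deadlocking, and the total balance is preserved.
long run_opposing_transfers(int iters) {
    assert(iters >= 0);
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iters; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iters; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

In C++17, `std::scoped_lock lock(from.m, to.m);` is an alternative that avoids deadlock without a fixed order, although it does not by itself establish the single global order described above.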
3. Deadlock detection with GDB
```bash
# Attach GDB to a deadlocked process
gdb -p $(pgrep prog)
# Or run under GDB then trigger deadlock

(gdb) info threads           # list all threads and current state
# * 1  Thread 0x... (LWP 1234) "prog" ... in __lll_lock_wait ()
#   2  Thread 0x... (LWP 1235) "prog" ... in __lll_lock_wait ()
# Threads blocked in __lll_lock_wait = waiting for mutex

(gdb) thread 1
(gdb) bt                     # show which mutex thread 1 is waiting for
(gdb) thread 2
(gdb) bt                     # show which mutex thread 2 holds/waits

# Find the mutex owner
(gdb) p ((pthread_mutex_t*)0x601090)->__data.__owner  # Linux glibc mutex
# prints TID of owning thread

# Python script to dump the top three frames of every thread (GDB 7+)
python
import gdb
for t in gdb.selected_inferior().threads():
    t.switch()
    print(f"Thread {t.num}: {gdb.execute('bt 3', to_string=True)}")
end
```
4. std::atomic misuse patterns
```cpp
// Fragments for illustration; the statements belong inside functions.
#include <atomic>
#include <mutex>

// WRONG: atomic variable, but non-atomic compound operation
std::atomic<int> counter{0};
if (counter == 0) counter = 1;  // not atomic together! TOCTOU race

// CORRECT: use compare_exchange
int expected = 0;
counter.compare_exchange_strong(expected, 1);  // stores 1 only if counter is still 0

// WRONG: relaxed ordering for sync flag
std::atomic<bool> ready{false};
int data = 0;  // plain data published through `ready`
// Producer:
data = 42;
ready.store(true, std::memory_order_relaxed);  // WRONG: no happens-before

// CORRECT: release-acquire for publishing data
// Producer:
data = 42;
ready.store(true, std::memory_order_release);  // syncs with acquire
// Consumer:
if (ready.load(std::memory_order_acquire)) {   // syncs with release
    use(data);  // safe to read data here; use() stands for any consumer
}

// WRONG: using data across threads without atomic/mutex
int shared_data;  // non-atomic: UB on concurrent access

// CORRECT: protect with mutex or make atomic
std::mutex mtx;
{
    std::lock_guard<std::mutex> lock(mtx);  // every reader and writer must take mtx
    shared_data = 42;
}
```
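When a compound update cannot be expressed as a single `compare_exchange_strong`, the usual pattern is a retry loop. The sketch below shows an "atomic maximum" as one instance of that pattern; `atomic_store_max` and `max_of_concurrent_updates` are hypothetical helper names chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raise `target` to `value` if `value` is larger. The check and the write
// happen as one atomic step; on contention the loop retries.
void atomic_store_max(std::atomic<int>& target, int value) {
    int seen = target.load(std::memory_order_relaxed);
    while (seen < value &&
           !target.compare_exchange_weak(seen, value,
                                         std::memory_order_relaxed)) {
        // On failure, compare_exchange_weak reloads `seen` with the current
        // value, so the loop condition re-checks against fresh data.
    }
}

// Each thread proposes a different candidate (10, 20, ..., threads * 10);
// the largest candidate must win regardless of interleaving.
int max_of_concurrent_updates(int threads) {
    assert(threads > 0);
    std::atomic<int> best{0};
    std::vector<std::thread> pool;
    for (int t = 1; t <= threads; ++t) {
        pool.emplace_back([&best, t] { atomic_store_max(best, t * 10); });
    }
    for (auto& th : pool) th.join();
    return best.load();
}
```

`compare_exchange_weak` may fail spuriously, which is harmless inside a loop. `compare_exchange_strong` is the better fit for a single attempt with no retry.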
5. Happens-before reasoning
In C++, happens-before is established by:

```text
Sequenced-before (within a thread):
  Statement A comes before B in code → A happens-before B

Synchronizes-with (across threads):
  store(release) → load(acquire) on SAME atomic variable, when the load reads the stored value
  → store happens-before load
  → everything before store happens-before everything after load

Thread creation/join:
  spawn(T) → any action in T        (thread creation synchronizes-with the start of T)
  any action in T → join(T)         (completion of T synchronizes-with join returning)

Mutex:
  unlock(M) → lock(M)               (next acquirer)
```

```cpp
// Establishing happens-before across threads
std::atomic<int> flag{0};
int data = 0;

// Thread 1:
data = 42;                                             // A
flag.store(1, std::memory_order_release);              // B: A sequenced-before B

// Thread 2:
while (flag.load(std::memory_order_acquire) != 1) {}   // C: synchronizes-with B
int x = data;                                          // D: C sequenced-before D

// D reads 42: A sequenced-before B, B synchronizes-with C, C sequenced-before D
// → A happens-before D
```
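The thread creation/join and mutex edges listed above can be sketched without any atomics. This is a minimal illustration; `result_after_join` and `sum_under_mutex` are hypothetical names used only here.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Join edge: the worker's write to plain `data` happens-before the read
// that follows join(), so no atomic or mutex is needed for this handoff.
int result_after_join() {
    int data = 0;
    std::thread worker([&data] { data = 42; });  // creation synchronizes-with the lambda's start
    worker.join();                               // worker completion synchronizes-with join() returning
    return data;                                 // race-free read of the worker's write
}

// Mutex edge: each unlock(m) synchronizes-with the next lock(m), so every
// increment of the plain `total` happens-before the next one.
long sum_under_mutex(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::mutex m;
    long total = 0;  // plain long, protected only by m
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &total, iters] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Removing the `join()` call or the `lock_guard` removes the corresponding happens-before edge, and the same accesses become a data race that TSan would report.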
6. Rust concurrency: compile-time guarantees
Safe Rust prevents data races at compile time through ownership, borrowing, and the `Send`/`Sync` marker traits:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared mutable state: Arc<Mutex<T>>
    let counter = Arc::new(Mutex::new(0u32));
    let c = Arc::clone(&counter);

    let t = thread::spawn(move || {
        let mut val = c.lock().unwrap();
        *val += 1;
    });

    t.join().unwrap();
    println!("{}", *counter.lock().unwrap());
}

// Rust prevents:
// - Two threads mutating the same value through &mut T (borrow checker)
// - Sharing non-Sync types such as Cell<T> or RefCell<T> across threads
// - Moving non-Send types such as Rc<T> to threads (compiler error)

// If TSan checks are needed (for example around unsafe code or FFI), build
// the tests with the nightly sanitizer flag; TSAN_OPTIONS applies at run time.
// Some setups may also need -Zbuild-std and an explicit --target.
// RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test
```
Related skills
- Use `skills/runtimes/sanitizers` for TSan build flags and other sanitizers
- Use `skills/profilers/valgrind` for Helgrind and Memcheck integration
- Use `skills/debuggers/gdb` for advanced GDB thread inspection
- Use `skills/low-level-programming/memory-model` for C++/Rust memory ordering theory