concurrency-debugging

Concurrency Debugging

Purpose

Guide agents through diagnosing and fixing concurrency bugs: reading ThreadSanitizer race reports, using Helgrind for lock-order analysis, detecting deadlocks with GDB thread inspection, identifying common `std::atomic` misuse patterns, and applying happens-before reasoning in C++ and Rust.

Triggers

- "ThreadSanitizer reported a data race: how do I read the report?"
- "My program deadlocks: how do I debug it?"
- "How do I use Helgrind to find threading bugs?"
- "Am I using std::atomic correctly?"
- "How does happens-before work in C++ memory ordering?"
- "How do I find which threads are deadlocked in GDB?"

Workflow

1. ThreadSanitizer (TSan) — race detection

```bash
# Build with TSan
clang -fsanitize=thread -g -O1 -o prog main.c

# or GCC
gcc -fsanitize=thread -g -O1 -o prog main.c

# Run (TSan intercepts memory accesses at runtime)
./prog

# TSan-specific options
TSAN_OPTIONS="halt_on_error=1:second_deadlock_stack=1" ./prog
```


Reading a TSan report:

```text
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 4 at 0x7f1234 by thread T2:
    #0 increment /src/counter.c:8:5              ← access site in T2
    #1 worker_thread /src/counter.c:22:3

  Previous read of size 4 at 0x7f1234 by thread T1:
    #0 read_counter /src/counter.c:3:14          ← conflicting access in T1
    #1 main /src/counter.c:30:5

  Thread T2 created at:
    #0 pthread_create .../tsan_interceptors.cpp
    #1 main /src/counter.c:28:3

SUMMARY: ThreadSanitizer: data race /src/counter.c:8:5 in increment
```

How to read:

- First access line ("Write of size 4 at ..."): the access type (write/read), its size, the address, and the thread that made it
- Stack under "Write of size": where the writing thread performed the access
- Stack under "Previous read/write": where the conflicting thread performed its access
- "Thread T2 created at": where the thread was spawned
- Diagnosis: `increment` and `read_counter` access the same address without synchronization, so the fix is to guard both with the same mutex or make the counter atomic
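One way the race in the report above could be removed is sketched here in C++, assuming the shared counter is a single integer. The function names `increment` and `read_counter` mirror the report; the `counter` variable and the `run_workers` driver are hypothetical names introduced only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared counter; making it atomic removes the data race
// between increment() and read_counter().
std::atomic<int> counter{0};

void increment() {
    // Atomic read-modify-write; relaxed ordering suffices for a pure counter.
    counter.fetch_add(1, std::memory_order_relaxed);
}

int read_counter() {
    return counter.load(std::memory_order_relaxed);
}

// Hypothetical driver: starts `threads` workers that each call increment()
// `iterations` times, then returns the value seen after all workers joined.
int run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) increment();
        });
    }
    for (auto& th : pool) th.join();  // join makes all increments visible
    return read_counter();
}
```

Rebuilding this with `-fsanitize=thread` should no longer produce the report, since every access to the counter is now atomic.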

Common races and fixes:

| Race pattern | Fix |
| --- | --- |
| Read/write on global without lock | Add mutex or use `std::atomic` |
| Double-checked locking without `atomic` | Use `std::once_flag` + `std::call_once` |
| `+=` on shared integer | Use `std::atomic<int>::fetch_add()` |
| Container modified while iterated | Lock entire critical section |
| `shared_ptr` ref count race | Already safe (ref count is atomic); but pointed-to object may not be |
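The `std::call_once` fix for double-checked locking can be sketched as follows. This is a minimal illustration, not a prescribed design: `Config`, `LazyConfig`, and `init_count_after_concurrent_get` are hypothetical names, and the initialization counter exists only to make the once-only behavior observable.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical payload that is expensive to build and must be built once.
struct Config {
    int value;
};

class LazyConfig {
public:
    // Safe to call from many threads: std::call_once runs the initializer
    // exactly once and makes its effects visible to every caller.
    const Config& get() {
        std::call_once(flag_, [this] {
            config_ = Config{42};
            init_count_.fetch_add(1);
        });
        return config_;
    }
    int init_count() const { return init_count_.load(); }

private:
    std::once_flag flag_;
    Config config_{0};
    std::atomic<int> init_count_{0};  // instrumentation for this example only
};

// Hypothetical driver: many threads race to call get(); returns how many
// times the initializer actually ran.
int init_count_after_concurrent_get(int threads) {
    assert(threads > 0);
    LazyConfig lazy;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lazy] { (void)lazy.get(); });
    }
    for (auto& th : pool) th.join();
    return lazy.init_count();
}
```

In C++11 and later, a function-local `static` object gives the same once-only guarantee with less code when the lazily built value can live for the whole program.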

2. Helgrind — lock-order and race detection

Helgrind uses Valgrind infrastructure to detect lock ordering violations (potential deadlocks) and data races:

```bash
# Run with Helgrind
valgrind --tool=helgrind --log-file=helgrind.log ./prog
```

Lock order violation report:

```text
==1234== Thread #3: lock order "0x... M2" after "0x... M1"
==1234==  observed (incorrect) order
==1234==    at pthread_mutex_lock (helgrind/...)
==1234==    by worker2 /src/worker.c:45     ← T3 takes M2 then M1
==1234==
==1234==  required order established by acquisition of lock at address 0x... M1
==1234==    at pthread_mutex_lock
==1234==    by worker1 /src/worker.c:31     ← T1 takes M1 then M2
```


Lock-order violation = potential deadlock:
- Thread T1 acquires M1, then tries M2
- Thread T2 acquires M2, then tries M1
- Both can deadlock if they race

Fix: enforce a consistent global lock ordering. Always take M1 before M2 everywhere.
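A consistent global order can be enforced mechanically, for example by always locking the mutex with the lower address first. The sketch below assumes two lockable objects of the same type; `Account`, `transfer`, and `total_after_opposing_transfers` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical account type used only for this illustration.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Locks both mutexes in a fixed global order (by address), so two threads
// transferring in opposite directions acquire them in the same order.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice would be undefined behavior
    std::mutex* first = &from.m;
    std::mutex* second = &to.m;
    if (std::less<std::mutex*>{}(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> lock_first(*first);
    std::lock_guard<std::mutex> lock_second(*second);
    from.balance -= amount;
    to.balance += amount;
}

// Hypothetical driver: two threads transfer in opposite directions, which
// would risk deadlock if each locked "its own" account first.
long total_after_opposing_transfers(int rounds) {
    assert(rounds >= 0);
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(a, b, 1);
    });
    std::thread t2([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(b, a, 1);
    });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

In C++17, `std::scoped_lock lock(from.m, to.m);` is a shorter alternative: it acquires both mutexes with a deadlock-avoidance algorithm rather than a fixed order, which also keeps Helgrind-style deadlocks from occurring at these call sites.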

3. Deadlock detection with GDB

```bash
# Attach GDB to a deadlocked process (assumes a single process matches "prog")
gdb -p $(pgrep prog)

# Or run under GDB then trigger deadlock
(gdb) info threads        # list all threads and current state
# * 1  Thread 0x... (LWP 1234) "prog" ... in __lll_lock_wait ()
#   2  Thread 0x... (LWP 1235) "prog" ... in __lll_lock_wait ()
# Threads blocked in __lll_lock_wait = waiting for mutex

(gdb) thread 1
(gdb) bt                  # show which mutex thread 1 is waiting for
(gdb) thread 2
(gdb) bt                  # show which mutex thread 2 holds/waits

# Find the mutex owner
(gdb) p ((pthread_mutex_t*)0x601090)->__data.__owner   # Linux glibc mutex
# prints TID of owning thread

# Python script to dump the top three frames of every thread (GDB 7+);
# combine with the __owner lookup above to map each waiter to a lock holder
(gdb) python
import gdb
for t in gdb.selected_inferior().threads():
    t.switch()
    print(f"Thread {t.num}: {gdb.execute('bt 3', to_string=True)}")
end
```

4. std::atomic misuse patterns

```cpp
#include <atomic>
#include <mutex>

// Fragments for illustration; the statements belong inside functions.

// WRONG: atomic variable, but non-atomic compound operation
std::atomic<int> counter{0};
if (counter == 0) counter = 1;   // not atomic together! TOCTOU race

// CORRECT: use compare_exchange
int expected = 0;
counter.compare_exchange_strong(expected, 1);  // on failure, expected holds the current value

// WRONG: relaxed ordering for sync flag
int data = 0;                    // plain (non-atomic) payload
std::atomic<bool> ready{false};
// Producer:
data = 42;
ready.store(true, std::memory_order_relaxed);  // WRONG: no happens-before

// CORRECT: release-acquire for publishing data
// Producer:
data = 42;
ready.store(true, std::memory_order_release);   // syncs with acquire

// Consumer:
if (ready.load(std::memory_order_acquire)) {    // syncs with release
    use(data);  // safe to read data here
}

// WRONG: using data across threads without atomic/mutex
int shared_data;  // non-atomic: UB on unsynchronized concurrent access

// CORRECT: protect with mutex or make atomic
std::mutex mtx;
{
    std::lock_guard<std::mutex> lock(mtx);  // every reader and writer must hold mtx
    shared_data = 42;
}
```
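The difference between check-then-set and `compare_exchange_strong` shows up when several threads race to claim the same state. The following sketch assumes a simple 0 to 1 "claim" transition; `count_cas_winners` is a hypothetical helper written for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers all try to move `state` from 0 to 1.
// compare_exchange_strong makes the check and the write one atomic step,
// so exactly one worker can succeed.
int count_cas_winners(int threads) {
    assert(threads > 0);
    std::atomic<int> state{0};
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&state, &winners] {
            int expected = 0;
            if (state.compare_exchange_strong(expected, 1)) {
                winners.fetch_add(1);  // only the thread that saw 0 gets here
            }
            // On failure, `expected` now holds the value another thread wrote.
        });
    }
    for (auto& th : pool) th.join();
    return winners.load();
}
```

With the `if (counter == 0) counter = 1;` form, two threads could both observe 0 and both believe they performed the transition.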

5. Happens-before reasoning

In C++, happens-before is established by:

Sequenced-before (within a thread):
  Statement A comes before B in code → A happens-before B

Synchronizes-with (across threads):
  store(release) → load(acquire) on the SAME atomic variable,
  when the load reads the value written by that store
    → store happens-before load
    → everything before store happens-before everything after load

Thread creation/join:
  spawn(T) → any action in T         (thread creation synchronizes-with the start of T)
  any action in T → join(T) returns  (completion of T synchronizes-with the return of join)

Mutex:
  unlock(M) → lock(M) (next acquirer)
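The thread join and mutex rules above can be illustrated without any atomics. This is a minimal sketch; `value_after_join` and `sum_under_mutex` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Join rule: everything the worker did happens-before join() returns,
// so the plain int can be read safely afterwards.
int value_after_join() {
    int result = 0;
    std::thread worker([&result] { result = 7; });
    worker.join();
    return result;
}

// Mutex rule: each unlock happens-before the next lock of the same mutex,
// so the plain int `total` is never accessed concurrently.
int sum_under_mutex(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::mutex m;
    int total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &total, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```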

```cpp
// Establishing happens-before across threads
std::atomic<int> flag{0};
int data = 0;

// Thread 1:
data = 42;                                  // A
flag.store(1, std::memory_order_release);   // B: A sequenced-before B

// Thread 2:
while (flag.load(std::memory_order_acquire) != 1) {}  // C: synchronizes-with B
int x = data;                               // D: C sequenced-before D
// D reads 42: A sequenced-before B, B synchronizes-with C, C sequenced-before D
//             → A happens-before D
```
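The two-thread fragment above can be turned into a complete function along these lines. It is a sketch under the same release/acquire assumptions; `publish_and_consume` is a hypothetical name, and `std::this_thread::yield()` is added only to keep the spin loop polite.

```cpp
#include <atomic>
#include <thread>

// Producer writes a plain int, then publishes it with a release store.
// Consumer spins on an acquire load, then reads the plain int.
// Returns the value the consumer observed.
int publish_and_consume() {
    int data = 0;                    // non-atomic payload
    std::atomic<bool> ready{false};  // synchronization flag
    int seen = -1;

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = data;  // ordered after the producer's write to data
    });

    std::thread producer([&] {
        data = 42;                                     // A
        ready.store(true, std::memory_order_release);  // B
    });

    producer.join();
    consumer.join();  // makes `seen` visible to the caller
    return seen;
}
```

Changing both orderings to `std::memory_order_relaxed` would make the read of `data` a data race, which is the kind of bug TSan is designed to report.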

6. Rust concurrency — compile-time guarantees

Safe Rust prevents data races at compile time through ownership, borrowing, and the `Send`/`Sync` traits:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared mutable state: Arc<Mutex<T>>
    let counter = Arc::new(Mutex::new(0u32));

    let c = Arc::clone(&counter);
    let t = thread::spawn(move || {
        let mut val = c.lock().unwrap();
        *val += 1;
    });

    t.join().unwrap();
    println!("{}", *counter.lock().unwrap()); // prints 1
}

// Rust prevents:
// - Aliasing &mut T across threads (the borrow checker allows only one live mutable borrow)
// - Moving non-Send types to threads (compiler error)
// For unsafe code or FFI, TSan checks are available on nightly
// (some setups may also need -Zbuild-std and an explicit --target):
// RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test
```

Related skills
