Title: Who Goes First? Detecting Go Concurrency Bugs via Message Reordering
Name: GFuzz
Journal/Conference: ASPLOS'22
In Short
Problem
go语言的多线程编程通过提供goroutines(一种lightweight的线程,在用户空间被创建并可高效重复使用)& channels(inter-goroutine communication)实现,广泛使用
pass messages through channels to connect goroutines -> not shared memory
- unique concurrent messages的运行顺序是non-deterministic的,开发者难以考虑到大量的所有可能的运行顺序,无法保证安全性
- channel-related的Bugs在go程序中仍然常见
因此提出针对go语言中channel-related的concurrency bug进行检测的工具GFuzz
Related Work
- channels的通信方式与传统C/C++/Java等语言的shared memory不同
- 已知常见的concurrency bug检测工作主要围绕shared-memory primitives和accesses进行
- 并不能检测go语言特有的message-passing primitives(channels)
- 只有少量message orders会导致concurrency bugs
- model-checking技术检查所有可能的message orders,低效
- go现存的bug detectors不能很好地解决concurrency bug检测问题
- 静态的patterns有限&分析不精确FP高&不能应用到大型项目中
- 动态的没有对concurrency问题展开讨论
- 竞争对手:GCatch
- 采用约束求解constraint solver方式
- (3小时内发现的比它总共发现的更多)
Challenges
- identify concurrent messages <- under the same
select
statement - identify & prioritize suspicious messages orders <- Fuzz, generate new message orders from existing ones, prioritize interesting orders closer to trigger new bugs(rely on execution feedback)
- Closer: calculate SCORE, by a unified formula(new channel/distinct consecutive execution pairs of channel operations; whether a channel is fulfilled)
- execution feedback
- order
- check whether a channel-related blocking bug is triggered <- a novel runtime sanitizer
Contribution
提出GFuzz,动态Fuzz技术检测工具,检测channel-related Go concurrency bugs。
- 变换concurrent messages的顺序(mutation),探索程序execution states
- 监测程序executions,捕获触发的blocking bugs(LARGELY UNDETECTED)
- blocking bugs:some goroutines are stuck and cannot complete their executions
- non-blocking bugs: all goroutines can finish, but may produce undesired results
Evaluation
- 7 real-world go software (Docker/Kubernetes/gRPC)
- found 184 new bugs: 170 blocking bugs & 14 non-blocking bugs
- 12 false positives
An Example
Design & Implementation
Discussion
Limitations
- 只通过变换顺序,事实上改变输入/shared-memory access顺序/thread-interleaving等方式都可以改变程序执行情况
- 只认为同样的
select
下的messages是concurrent的,有遗漏 - 没有修改input,能够触发的code coverage取决于input的质量
- high overhead,FP&FN
其他
- Rust 和 Kotlin支持
select
方式,1个线程等待多个channel operations <- 进一步通用
TODO Next
- rust / kotlin的concurrency bug情况调研(同样也有channel方式)
- go的非
select
的concurrency bug探讨 - MUZZ/Conzzer的思想能否用在其改进上
- 通用detector