Detecting And Mitigating Concurrency Bugs
As computing hardware moves to multi-core systems, future software needs to be parallelized in order to benefit from increasing computing resources. However, writing a correct parallel program is notoriously difficult, partly because of non-determinism in concurrent program executions. Because thread executions can be interleaved in many ways, a parallel program may produce a non-deterministic outcome even for identical program inputs if threads are not properly synchronized. Such a non-deterministic behavior, if not intentional, is often referred to as a concurrency bug. In this research, a solution is presented for efficient concurrency bug detection and mitigation. As data races are widely used as a way to identify potential concurrency bugs, this research presents an efficient hardware architecture, named RaceSMM, that enables run-time data race detection with high coverage (99%) and minimal performance overhead (4.8% on average). The proposed hardware mechanism is based on the happens-before vector clock algorithm, which is known for its accuracy yet considered to be expensive due to a large amount of meta-data. As the main optimization, the proposed architecture in this research decouples meta-data storage from regular caches so that expensive meta-data is only selectively stored for memory locations that are accessed by multiple threads within a relatively short period where most data races happen. While data races can detect a broad range of concurrency bugs where conflicting memory accesses are not controlled at all, recent studies show that many concurrency bugs are not detectable by data races. Hence, this research introduces a new heuristic for non-race concurrency bug detection, named ordersensitive critical sections, which extends the intuition in data races to capture non-race bugs in practice. The order-sensitive critical sections are defined as a pair of critical sections that can lead to non-deterministic shared memory state depending on the order in which they execute. This research presents a runtime algorithm, named OSCS, that uses the notion of order-sensitive critical sections to detect real-world concurrency bugs. Experiments show OSCS provides a good bug coverage (90%) for both non-race atomicity and ordering violations, with a small number of false positives. Additionally, OSCS can be supported in hardware efficiently while maintaining a high detection coverage (84%) and causing a negligible performance overhead. Finally, to provide a solution to further mitigate concurrency bugs post detection, an efficient deterministic replay scheme for multithreaded programs is introduced based on a concept of commutative critical sections. The commutative critical sections are critical sections in a multithreaded program that can be executed in any order while producing a consistent program output. In this research, we propose a deterministic replay scheme, named CommuteReplay, which allows the commutative critical sections to execute without enforcing an explicit deterministic order between the commutative critical sections to allow a noticeably better replay performance (reducing more than 20% of the overall performance overhead on average).
Parallel Programming; Concurrency Bugs; Deterministic Replay
Suh, Gookwon Edward
Albonesi, David H.; Topaloglu, Huseyin
Ph.D. of Electrical Engineering
Doctor of Philosophy
dissertation or thesis