eCommons

 

Detecting And Mitigating Concurrency Bugs

Other Titles

Abstract

As computing hardware moves to multi-core systems, future software needs to be parallelized in order to benefit from increasing computing resources. However, writing a correct parallel program is notoriously difficult, partly because of non-determinism in concurrent program executions. Because thread executions can be interleaved in many ways, a parallel program may produce a non-deterministic outcome even for identical program inputs if threads are not properly synchronized. Such a non-deterministic behavior, if not intentional, is often referred to as a concurrency bug. In this research, a solution is presented for efficient concurrency bug detection and mitigation. As data races are widely used as a way to identify potential concurrency bugs, this research presents an efficient hardware architecture, named RaceSMM, that enables run-time data race detection with high coverage (99%) and minimal performance overhead (4.8% on average). The proposed hardware mechanism is based on the happens-before vector clock algorithm, which is known for its accuracy yet considered to be expensive due to a large amount of meta-data. As the main optimization, the proposed architecture in this research decouples meta-data storage from regular caches so that expensive meta-data is only selectively stored for memory locations that are accessed by multiple threads within a relatively short period where most data races happen. While data races can detect a broad range of concurrency bugs where conflicting memory accesses are not controlled at all, recent studies show that many concurrency bugs are not detectable by data races. Hence, this research introduces a new heuristic for non-race concurrency bug detection, named ordersensitive critical sections, which extends the intuition in data races to capture non-race bugs in practice. The order-sensitive critical sections are defined as a pair of critical sections that can lead to non-deterministic shared memory state depending on the order in which they execute. This research presents a runtime algorithm, named OSCS, that uses the notion of order-sensitive critical sections to detect real-world concurrency bugs. Experiments show OSCS provides a good bug coverage (90%) for both non-race atomicity and ordering violations, with a small number of false positives. Additionally, OSCS can be supported in hardware efficiently while maintaining a high detection coverage (84%) and causing a negligible performance overhead. Finally, to provide a solution to further mitigate concurrency bugs post detection, an efficient deterministic replay scheme for multithreaded programs is introduced based on a concept of commutative critical sections. The commutative critical sections are critical sections in a multithreaded program that can be executed in any order while producing a consistent program output. In this research, we propose a deterministic replay scheme, named CommuteReplay, which allows the commutative critical sections to execute without enforcing an explicit deterministic order between the commutative critical sections to allow a noticeably better replay performance (reducing more than 20% of the overall performance overhead on average).

Journal / Series

Volume & Issue

Description

Sponsorship

Date Issued

2013-05-26

Publisher

Keywords

Parallel Programming; Concurrency Bugs; Deterministic Replay

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Suh, Gookwon Edward

Committee Co-Chair

Committee Member

Albonesi, David H.
Topaloglu, Huseyin

Degree Discipline

Electrical Engineering

Degree Name

Ph. D., Electrical Engineering

Degree Level

Doctor of Philosophy

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

dissertation or thesis

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record