Trace Driven Approach for Testing and Debugging Concurrent Programs

Mahmoud Said, Western Michigan University


In the era of multi-core systems, concurrent programming will most likely become the dominant paradigm in software development; despite the effort and complexity involved in developing concurrent software, the need to exploit the processing power of increasingly prevalent multi-core hardware will mandate that transition. Nevertheless, concurrent software is susceptible to concurrency defects that can escape traditional testing and debugging techniques. This research introduces practical tools and methodologies that exploit runtime execution traces to simplify the testing and debugging of concurrent applications. First, we propose a coverage-guided systematic testing framework that uses runtime execution traces to learn ordering constraints over shared-object accesses and to select only high-risk interleavings in future tests. It can thus increase the coverage of important concurrency scenarios at a reasonable cost and detect most concurrency bugs in practice. Second, we use symbolic analysis based on SMT solvers to introduce two effective execution-replay techniques: 1) a technique to record and replay a shared-memory multi-threaded program for multi-processor execution; and 2) a predictive framework to generate concurrent program schedules that exhibit data races. More specifically, the record-and-replay solution constructs shared-memory dependencies between threads off-line, during replay, which significantly reduces the complexity of the hardware support required to enable replay. Our prototype shows that constructing shared-memory dependencies off-line reduces the recording overhead to 1%. In the predictive framework, the tool searches a non-erroneous execution trace of a program for a concrete thread schedule with a data race. If such a schedule exists, the tool generates a complete execution schedule, called a witness, which can later be used to trigger the data race by deterministically replaying the execution.
The proposed framework eliminates the need to re-run the program and track its execution until a data race manifests, which is usually required to capture a trace containing such a defect. Evaluation results show that our analysis is scalable enough for post-mortem analysis that helps programmers better understand data races.
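
To make the idea of a witness concrete, the following is a minimal sketch of race prediction over a recorded trace. It is not the SMT-based analysis described above: it replaces the symbolic search with a brute-force exploration of interleavings, and it assumes a hypothetical trace format of (thread, operation, target) tuples with operations "rd", "wr", "acq" and "rel". It preserves only program order and lock semantics and ignores the values read, so a schedule it reports may not be feasible in a real program. The function name find_race_witness is likewise an illustrative choice.

```python
ACCESS = {"rd", "wr"}  # memory-access operations in the assumed trace format


def find_race_witness(trace):
    """Search reorderings of an observed trace for a data race.

    trace: list of (thread, op, target) tuples in observed order, where op is
    "rd"/"wr" on a variable or "acq"/"rel" on a lock (hypothetical format).
    Returns a schedule whose last two events are the racing accesses, or None.
    """
    # Split the trace into per-thread event lists; their order is program order.
    per_thread = {}
    for tid, op, target in trace:
        per_thread.setdefault(tid, []).append((op, target))
    tids = sorted(per_thread)
    visited = set()  # position vectors already explored without finding a race

    def search(pos, held, schedule):
        if pos in visited:
            return None
        visited.add(pos)
        # The next pending event of every thread that has not finished.
        pending = [(t, per_thread[t][p]) for t, p in zip(tids, pos)
                   if p < len(per_thread[t])]
        # Race: two threads are both about to access the same variable,
        # and at least one of the two accesses is a write.
        for i, (t1, (op1, x1)) in enumerate(pending):
            for t2, (op2, x2) in pending[i + 1:]:
                if (x1 == x2 and op1 in ACCESS and op2 in ACCESS
                        and "wr" in (op1, op2)):
                    return schedule + [(t1, op1, x1), (t2, op2, x2)]
        # Otherwise advance one enabled thread and continue the search.
        for t, (op, x) in pending:
            if op == "acq" and x in held:
                continue  # lock is taken, so this thread is blocked
            new_held = set(held)
            if op == "acq":
                new_held.add(x)
            elif op == "rel":
                new_held.discard(x)
            new_pos = tuple(p + 1 if tid == t else p
                            for tid, p in zip(tids, pos))
            result = search(new_pos, frozenset(new_held),
                            schedule + [(t, op, x)])
            if result is not None:
                return result
        return None

    return search(tuple(0 for _ in tids), frozenset(), [])


if __name__ == "__main__":
    # Observed run with no visible error: T1 finishes before T2 starts.
    observed = [
        ("T1", "acq", "L"), ("T1", "wr", "x"), ("T1", "rel", "L"),
        ("T1", "wr", "y"),
        ("T2", "acq", "L"), ("T2", "rd", "x"), ("T2", "rel", "L"),
        ("T2", "rd", "y"),
    ]
    for event in find_race_witness(observed) or []:
        print(event)
```

In this example the accesses to x are protected by lock L, while the accesses to y are not, so the search reports a schedule that ends with the write and the read of y side by side. An exhaustive search of this kind grows exponentially with trace length, which is the scalability problem that motivates encoding the constraints for an SMT solver instead.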