Race Detector & Healer for Java

A tool that detects and heals data races and atomicity violations in Java programs.
1. Introduction
    1.1 What Is a Data Race?
    1.2 What Is an Atomicity Violation?

2. Race Detection & Healing Tool Capabilities
    2.1 Eraser Algorithm
    2.2 AtomRace Algorithm
    2.3 Healing of Detected Violations
    2.4 Obtaining of Correct Atomicity

3. How to Use the Race Detection & Healing Tool
    3.1 Running Eraser Algorithm
    3.2 Running AtomRace Algorithm
    3.3 Preparing a Set of Variables Which Should not Be Analyzed
    3.4 Preparing a Set of Variables Which Should Be Focused by AtomRace Noise Injection
    3.5 Obtaining Atomicity of Tested Application
    3.6 Healing of Detected Problems




1. Introduction

1.1 What Is a Data Race?

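A data race occurs when two concurrent threads access a shared variable, at least one of the accesses is a write, and the threads use no explicit mechanism to prevent the accesses from being simultaneous.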

Usually a data race is a serious error caused by failure to synchronize properly.
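As a minimal sketch of the definition above, the following class lets two threads increment one counter without any synchronization and a second counter under a lock. The class and its fields are hypothetical and are not part of the tool or of its test programs.

```java
// Hypothetical illustration; class and field names are not part of the tool.
final class DataRaceSketch {
    static int racyCounter;                 // shared and unprotected: accesses race
    static int safeCounter;                 // shared and always guarded by LOCK
    static final Object LOCK = new Object();

    /** Runs two threads that each perform the given number of increments. */
    static void runOnce(int incrementsPerThread) {
        racyCounter = 0;
        safeCounter = 0;
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                racyCounter++;              // data race: unsynchronized read and write
                synchronized (LOCK) {
                    safeCounter++;          // no race: every access holds LOCK
                }
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }
}
```

After DataRaceSketch.runOnce(n), safeCounter always equals 2 * n, while racyCounter may be smaller because concurrent increments can overwrite each other.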

1.2 What Is an Atomicity Violation?

An atomicity violation occurs when a block of code that operates on a variable, and is intended to execute without interference from other running threads, is interleaved by another thread that accesses the same variable and so causes the unwanted interference. For instance, while one thread is executing a block of code incrementing a variable x (x++;), other threads should not change the value of x.
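The x++ example can be made concrete by replaying one harmful interleaving step by step. The sketch below is a hypothetical illustration that runs in a single thread, so the outcome is deterministic; the comments say which thread would perform each step in a real run.

```java
// Hypothetical illustration: replays one harmful interleaving in a single
// thread, so the result does not depend on the scheduler.
final class AtomicityViolationSketch {
    static int x;

    /** Replays "x++" in thread A interleaved by a write from thread B. */
    static int replayInterleaving() {
        x = 5;
        int loaded = x;      // thread A: load x (first half of x++)
        x = 10;              // thread B: writes x inside A's intended atomic block
        x = loaded + 1;      // thread A: store (second half of x++), uses a stale value
        return x;            // 6: the update made by thread B is lost
    }
}
```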


2. Race Detection & Healing Tool Capabilities

2.1 Eraser Algorithm

The Race Detection & Healing tool can use a modified version of the Eraser algorithm to detect violations of a locking policy. The algorithm tries to identify which lock is used to protect each shared variable. If a thread accesses a shared variable without a proper lock, a race warning is logged. Eraser also tries to identify the lock that should be used with the shared variable and suggests it to the programmer. For each shared variable, Eraser maintains the set of locks used with it. The set is built from the moment the variable becomes shared. If the set becomes empty, a race warning is produced.

The Eraser algorithm supports only synchronization based on synchronized{} blocks and on Thread.join(); other synchronization primitives are not supported.
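The lock set maintenance described above can be sketched as follows. This is a simplified illustration, not the tool's modified Eraser: it starts tracking at the first recorded access instead of at the moment the variable becomes shared, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified lock set refinement; all names here are hypothetical.
final class LocksetSketch {
    // Candidate locks per shared variable.
    private final Map<String, Set<String>> candidates = new HashMap<>();

    /**
     * Records an access to a variable made while holding the given locks.
     * Returns true when the candidate set is empty, i.e. a race warning is due.
     */
    boolean onAccess(String variable, Set<String> locksHeld) {
        Set<String> current = candidates.get(variable);
        if (current == null) {
            // First tracked access: every lock held now is a candidate.
            current = new HashSet<>(locksHeld);
            candidates.put(variable, current);
        } else {
            // Later accesses: keep only locks held on every access so far.
            current.retainAll(locksHeld);
        }
        return current.isEmpty();
    }

    /** Returns a copy of the locks that may protect the variable. */
    Set<String> candidateLocks(String variable) {
        return new HashSet<>(candidates.getOrDefault(variable, new HashSet<>()));
    }
}
```

The locks that survive in the candidate set are the ones a report could suggest to the programmer.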

2.2 AtomRace Algorithm

The Race Detection & Healing tool can use a new algorithm, AtomRace, for detecting atomicity violations. The algorithm detects both data races (treated as a special kind of atomicity violation) and violations of predefined atomicity (blocks of code that should be executed without unwanted interference).

Data race detection follows directly from the definition of a data race. Each instruction accessing shared data is enclosed by the pseudo-instructions beforeAccess and afterAccess, which delimit a primitive atomic section spanning just that one instruction, the one that accesses the shared variable. AtomRace then checks that two primitive atomic sections defined around accesses to the same shared variable do not overlap. An overlap of such sections implies that the accesses are simultaneous and that a data race is possible. The chance of observing such a situation is sometimes very low. Therefore this approach can be combined with a noise injection technique, which injects noise into primitive atomic sections and so makes them longer, increasing the probability of detecting a data race.

Atomic sections can also be constructed to span more than one instruction. In that case, an atomic section starts at a beforeAccess to a shared variable and has several ending points. At least one of the ending points is an afterAccess to the same shared variable, so the atomic section spans two or more accesses to the shared variable. The other ending points cover situations in which execution takes a different path that does not contain the afterAccess operation. When such an atomic section overlaps with another atomic section in a way that cannot be serialized, an atomicity violation warning is produced. Again, noise injection can be used to increase the probability of hitting the problem.
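The overlap check on primitive atomic sections might look roughly like the sketch below. Only the names beforeAccess and afterAccess come from the text; their signatures, the explicit thread names and the bookkeeping are assumptions made for illustration, and the sketch remembers only one open section per variable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of overlap detection for primitive atomic sections.
final class PrimitiveSectionSketch {
    private static final class Access {
        final String thread;
        final boolean write;
        Access(String thread, boolean write) {
            this.thread = thread;
            this.write = write;
        }
    }

    // Variable name -> the access currently inside its primitive section.
    private final Map<String, Access> open = new HashMap<>();

    /** Returns true if this access overlaps a conflicting access by another thread. */
    synchronized boolean beforeAccess(String variable, String thread, boolean write) {
        Access other = open.get(variable);
        boolean conflict = other != null
                && !other.thread.equals(thread)
                && (other.write || write);      // two reads never conflict
        if (other == null) {
            open.put(variable, new Access(thread, write));
        }
        return conflict;
    }

    /** Closes the primitive section opened by the given thread, if any. */
    synchronized void afterAccess(String variable, String thread) {
        Access current = open.get(variable);
        if (current != null && current.thread.equals(thread)) {
            open.remove(variable);
        }
    }
}
```

Injecting a delay between beforeAccess and the matching afterAccess keeps a section open longer, which is how noise raises the chance that a second thread arrives while the section is still open.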

Thread executing atomicity   Other thread   Problem description
--------------------------   ------------   -------------------
read - write                 write          The write relies on a value from the preceding read that is overwritten by the other thread.
read - read                  write          The write by the other thread makes the two reads have different views of the variable.
write - write                read           An intermediate result that is assumed to be invisible to other threads is read by a remote access.
write - read                 write          The read does not receive the local result it expects.

All these scenarios are unserializable (more details can be found, for example, in the article on the AVIO tool). If any of these scenarios is detected, an atomicity violation warning is produced.
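The four scenarios listed above can be encoded as a small predicate. The sketch below is a hypothetical illustration of that classification, not the tool's implementation: first and second are two consecutive accesses by the thread executing the atomic section, and remote is the interleaved access by the other thread.

```java
// Hypothetical encoding of the four unserializable scenarios.
final class SerializabilitySketch {
    enum Mode { READ, WRITE }

    /**
     * first and second are consecutive accesses of the thread executing the
     * atomic section; remote is the interleaved access of another thread.
     */
    static boolean isUnserializable(Mode first, Mode remote, Mode second) {
        if (remote == Mode.READ) {
            // Only write - write exposes an intermediate result to a remote read.
            return first == Mode.WRITE && second == Mode.WRITE;
        }
        // A remote write is harmless only between two local writes.
        return !(first == Mode.WRITE && second == Mode.WRITE);
    }
}
```

Of the eight possible combinations, exactly the four from the list are reported; the remaining four are equivalent to some serial order of the two threads.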

2.3 Healing of Detected Violations

The basic idea of healing is that the Race Detection & Healing tool tries to enforce the predefined correct atomicity. If there is a race or an atomicity violation on a variable and a predefined atomicity is present, the tool uses the selected method to minimize the probability of a context switch in the middle of the problematic atomic section.

Several methods for enforcing the predefined correct atomicity have been designed and implemented. Most of them cannot heal a detected problem completely, but they can decrease the probability of its manifestation. The only exception is the NEWMUTEX method, which can truly enforce the predefined correct atomicity. However, because the tool does not yet check whether the added locking is legal, this method can cause a deadlock. Available methods for healing:

2.4 Obtaining of Correct Atomicity

As input for atomicity violation detection and for healing, the correct atomicity (a set of atomic sections) of the tested application has to be predefined. This is not an easy step (if it were, self-healing would not be necessary and the program could be corrected immediately). There are currently two ways of obtaining the correct atomicity of the tested application; both have some drawbacks.

Pattern-based static analysis identifies blocks of code that are likely intended to be executed atomically by looking for typical programming constructs for which this assumption is usually made. Currently only the load-and-store pattern detected on a single line of source code is supported. An example of this pattern is x++;. This single statement loads the value of x and, after incrementing it, stores the new value; the operation should not be interleaved by any write access to x.

Inference of the correct atomicity by repeatedly running the tested application with AtomRace activated in a learning mode. This approach has two steps. First, an initial set of atomic sections is generated by static analysis, which produces all intraprocedural atomic sections available for the tested application. The application is then run repeatedly with AtomRace, which in learning mode produces the set of atomic sections violated during each run. If the run produced correct results, these atomic sections are removed from the set of atomic sections AtomRace should watch. After this pruning, the remaining set is taken as the correct atomicity of the application.



3. How to Use the Race Detection & Healing Tool

The Race Detection & Healing tool works only in conjunction with instrumented Java programs and ConTest, which calls the tool at runtime. For this purpose, a proper listeners/listeners.xml file must be present in the directory that contains the KingProperties file (see the ConTest manual). Several properties can be set for the Race Detection & Healing tool to influence its behaviour. Please refer to the attached RDKingProperties file for further information.

Several scripts are attached that can be used, after slight modification, to run the race detector:

3.1 Running Eraser Algorithm

The Eraser algorithm does not need predefined atomicity for data race detection. The detection can be run using the following steps:

  1. Change the listeners/listeners.xml file so that ConTest invokes the Eraser class from the Race Detection & Healing tool.
  2. Change the settings in the RACE DETECTING ALGORITHM / Eraser setting section of the RDKingProperties file if necessary.
  3. Execute the run script.

The resulting log file is located in the ConTest output directory com_ibm_contest, in the subdirectory racedetectorReport, in a file called racedetect_$contestrunid.txt. The file contains either only the line Race detection done. or a set of detected races terminated by this line. Here is an example of a detected race warning:

    Race possible for variable 'test.Airlines$Flight@578ceb->test.Airlines$Flight.soldSeats'
    Race caused by thread : 'Thread-6(java.lang.Thread@f84386)' at line : 'Airlines.java 102'
    Variable accessed by threads (mode):
     * Thread 'Thread-4(java.lang.Thread@a470b8)' (WRITE) at 'Airlines.java 104'
       - Thread candidate locks: none.
     * Thread 'Thread-6(java.lang.Thread@f84386)' (WRITE) at 'Airlines.java 102'
       - Thread candidate locks: none.
    You probably should use a lock to protect this variable.
    You can make a new one or use some lock used by other threads.

    Race detection done.

This warning describes a data race identified in the file Airlines.java on line 102 when accessing the soldSeats variable of a Flight instance. Thread-6 did not use a proper lock (the thread candidate lock sets show that none of the threads used a lock) and so violated the locking policy. The warning does not imply that there is a race; it only warns that, because the variable is not guarded by a proper locking policy, there could be one. If the application uses a different synchronization policy for accessing the variable, the Eraser algorithm cannot detect it.

The synchronization policy used for the variable can be checked manually, or the variable name test.Airlines$Flight.soldSeats can be used to focus AtomRace on finding the conflict caused by this race. If the warning is spurious, you can instruct the Race Detection & Healing tool not to detect problems on this variable (see Section 3.3).

3.2 Running AtomRace Algorithm

The AtomRace algorithm also does not need predefined atomicity for detecting data races. However, without it the ability to detect races is limited and AtomRace cannot detect atomicity violations. To run AtomRace without predefined atomicity, follow these steps:

  1. Change the listeners/listeners.xml file so that ConTest invokes the AtomRace class from the Race Detection & Healing tool.
  2. Change the settings in the RACE DETECTING ALGORITHM / AtomRace setting section of the RDKingProperties file if necessary.
  3. Execute the run script.

The resulting log file is located in the ConTest output directory com_ibm_contest, in the subdirectory racedetectorReport, in a file called racedetect_$contestrunid.txt. The file contains either only the line Race detection done. or a set of detected races terminated by this line. Here is an example of a detected race conflict warning:

    Race possible for variable 'test.Airlines$Flight.soldSeats'
    The variable was accessed simultaneously by:
     - Thread: Thread-6(java.lang.Thread@17f1ba3) at 'Airlines.java 104' (READ)
     - Thread: Thread-8(java.lang.Thread@ecd7e) at 'Airlines.java 102' (WRITE)
    The variable should be declared as volatile or a proper synchronization should be added.

    Race detection done.

This warning describes a data race conflict. Two threads (Thread-6 and Thread-8) accessed the Flight.soldSeats variable at the same time. These two accesses are evidence that the variable is not guarded correctly and that several threads are able to access it simultaneously. Such warnings issued for volatile and final variables are false alarms and can easily be suppressed by instructing AtomRace not to detect problems on these variables (as described later in this text).

The AtomRace algorithm provides better results if a set of atomic sections to be checked is given. How to obtain such a set is described below; for now it is enough to know that the sections are stored in the race_detector/atomicity.xml file. AtomRace is executed in the same way as in the previous example, but the output log file can also contain warnings about violated atomicity:

    Race possible for variable '1feca64->test.Airlines$Flight.soldSeats'
    The atomic section:
      From: Airlines.java 102
      To: Airlines.java 102, Airlines.java -2
    Executed by thread 'Thread-8(java.lang.Thread@76cbf7)'
    was violated by the following accesses:
      Thread-12(java.lang.Thread@143c8b3)(WRITE) at Airlines.java 104

    Race detection done.

The warning describes an atomicity violation. The atomic section defined for the variable test.Airlines$Flight.soldSeats, starting at line 102 of Airlines.java and ending on the same line (line -2 is used by ConTest to mark the target of an unhandled exception), was violated by an access to the same variable from a different thread executing line 104 of Airlines.java. This is again concrete evidence of what happened. Both AtomRace warnings provide a precise description of the recorded conflict.

If an atomicity is given to AtomRace and the rdAtomRaceLearn setting is enabled in RDKingProperties, a file atomicity/violated_$contestid.xml is stored after the execution. This file contains the set of atomic sections violated during the AtomRace execution. The set can be used for pruning the initial set of atomic sections.

3.3 Preparing a Set of Variables Which Should not Be Analyzed

There is no reason to detect data races and atomicity violations on variables declared as final and/or volatile. Operations on such variables are guaranteed by Java to be race tolerant. These variables can easily be detected by static analysis. To do so, follow these steps:

  1. Run the findOmit script.
  2. Set the rdOmitVariables option in the RDKingProperties file to true.

The script produces the race_detector/omitvariables file, which contains one variable name per line; these variables should not be analysed because they are declared as final and/or volatile. The file can also be used to list other variables that should not be analysed, e.g. because the detection algorithm produces spurious warnings for them.
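To illustrate which variables qualify, the sketch below lists the final and volatile fields of a class by reflection. It is not the findOmit script, which is not shown in this text; the class names are hypothetical, and the output format (class name, dot, field name) is an assumption based on the variable names that appear in the warnings.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; the findOmit script itself is not shown in this text.
final class OmitVariablesSketch {
    /** Returns "ClassName.fieldName" for each final or volatile field of the class. */
    static SortedSet<String> finalOrVolatileFields(Class<?> type) {
        SortedSet<String> names = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isFinal(modifiers) || Modifier.isVolatile(modifiers)) {
                names.add(type.getName() + "." + field.getName());
            }
        }
        return names;
    }
}

// Sample class used only for the demonstration.
final class SampleFlight {
    final int capacity = 100;   // final: candidate for the omit list
    volatile boolean closed;    // volatile: candidate for the omit list
    int soldSeats;              // plain field: must still be analysed
}
```

Calling OmitVariablesSketch.finalOrVolatileFields(SampleFlight.class) yields the names of capacity and closed, but not soldSeats.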

3.4 Preparing a Set of Variables Which Should Be Focused by AtomRace Noise Injection

The AtomRace algorithm is sensitive to the scheduling of the threads of the tested application. ConTest offers noise injection at locations chosen by ConTest algorithms. AtomRace can make use of the ConTest noise injection mechanism to put noise into locations that help AtomRace detect more conflicts and violations. To set up noise injection, follow these steps:

  1. Create the noisevariables text file in the race_detector directory and fill it with variable names (one per line). Each variable name must be in the Race Detection & Healing tool format without instance identification (see the omitvariables file and/or the warning files generated by Eraser or AtomRace).
  2. Set the rdNoiseVariables and rdNoiseFrequency options in the RDKingProperties file.
  3. Set the NoiseFrequency option to 0 in the KingProperties file (this disables ConTest noise injection). The ConTest noise strength, noise type and halt-one-thread options also apply to noise injected by AtomRace.
  4. Execute the run script.

This setting slows down the application but detects problems with a higher probability. The best way to use this option is to take all warnings obtained by the Eraser algorithm and focus AtomRace on those variables.
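Collecting the variable names from earlier warnings can be automated. The sketch below is a hypothetical helper that is not shipped with the tool. It assumes that each warning starts with a line of the form shown in the example warnings in this text, and that the instance identification is the part of the name before "->".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not shipped with the tool.
final class NoiseVariablesSketch {
    // Matches the first line of a warning as shown in the examples in this text.
    private static final Pattern WARNING =
            Pattern.compile("Race possible for variable '([^']+)'");

    /** Extracts each reported variable name once, without instance identification. */
    static Set<String> variablesFrom(List<String> logLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher matcher = WARNING.matcher(line);
            if (matcher.find()) {
                String name = matcher.group(1);
                int arrow = name.indexOf("->");
                // Drop the "instance->" prefix if it is present.
                names.add(arrow >= 0 ? name.substring(arrow + 2) : name);
            }
        }
        return names;
    }
}
```

The resulting set can be written one name per line, for example with java.nio.file.Files.write(Paths.get("race_detector", "noisevariables"), names).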

3.5 Obtaining Atomicity of Tested Application

As described above, there are two ways of producing the set of atomic sections whose violation is detected by the AtomRace algorithm and which are necessary for correct healing of detected bugs. The pattern-based approach is described first, followed by the inference approach.

Pattern-based static analysis detects only a few atomic sections (those that follow predefined patterns). This approach can therefore be used for detecting atomicity violations (with nearly zero probability of a false alarm), but it is not suitable for healing because it can miss some atomic sections (which can lead to an atomicity violation even during the healing process). To obtain pattern-based atomicity, follow this step:

  1. Execute the findPatterns script, which produces the atomicity.xml file in the race_detector directory. During execution, ignore warnings about ConTest and Race Detection & Healing tool classes missing among the tested classes (the warnings appear because the analyzed instrumented bytecode refers to these classes).

The produced XML file contains detected atomic sections in the following format:

    <DOUBLEATOM>
      <BEGIN loc="UNPROVIDED Airlines.java bookTicket() 98 1" mode="READ"/>
      <END loc="UNPROVIDED Airlines.java bookTicket() 98 2" mode="WRITE"/>
      <END loc="UNPROVIDED Airlines.java bookTicket() -2 1" mode="EXIT"/>
    </DOUBLEATOM>

Each atomic section has one begin point and two or more possible end points. This atomic section starts at line 98 and ends on the same line. The other end is a special EXIT type of end, denoting the control flow path taken if an exception is thrown within the atomic section.
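A single section in this format can be read with the standard Java XML parser, as sketched below. The class is hypothetical and handles exactly one DOUBLEATOM element; how the real atomicity.xml wraps several sections is not shown in this text, so the sketch makes no assumption about it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for one DOUBLEATOM element.
final class AtomicSectionSketch {
    final String begin;                          // loc attribute of BEGIN
    final List<String> ends = new ArrayList<>(); // loc attributes of all END elements

    private AtomicSectionSketch(String begin) {
        this.begin = begin;
    }

    static AtomicSectionSketch parse(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element atom = document.getDocumentElement();
            Element beginElement = (Element) atom.getElementsByTagName("BEGIN").item(0);
            AtomicSectionSketch section =
                    new AtomicSectionSketch(beginElement.getAttribute("loc"));
            NodeList endElements = atom.getElementsByTagName("END");
            for (int i = 0; i < endElements.getLength(); i++) {
                section.ends.add(((Element) endElements.item(i)).getAttribute("loc"));
            }
            return section;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable atomic section", e);
        }
    }
}
```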

Inferring the correct atomicity still needs some help from the user. The inference process consists of the following steps:

  1. Execute the findAtomicity script, which produces the atomicity.xml file in the race_detector directory. During execution, ignore warnings about ConTest and Race Detection & Healing tool classes missing among the tested classes (the warnings appear because the analyzed instrumented bytecode refers to these classes). This script produces the initial set of atomic sections for the tested application.
  2. Execute the AtomRace algorithm with learning enabled by running the run script with the proper settings. This produces a violated_$contestid.xml file in the atomicity directory. The file contains the set of violated atomic sections. If your code calls the static function setRunAsFail() of the Race Detection & Healing tool when some check does not pass, the file also records this information; otherwise all executions are considered successful.
  3. If the execution ends without any problem (meaning that a problem, if there is one, did not manifest), you can safely remove the violated atomic sections from the set of atomic sections. This is done by executing the removeAtomicity script as follows: ./removeAtomicity race_detector/atomicity.xml atomicity/violated_$contestid.xml
  4. If there were no atomicity violations in any of the latest executions of the tested application, the atomicity of the application has been inferred and the process can be finished. Otherwise repeat steps 2-4.
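The pruning performed in steps 2 and 3 amounts to a set difference that is applied only after successful runs. The sketch below is a hypothetical model of that rule in which atomic sections are represented by plain strings; it does not read or write the XML files used by the scripts.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the pruning rule; sections are plain strings here.
final class AtomicityPruningSketch {
    /**
     * Returns the sections to keep watching after one learning run.
     * Sections violated in a run that still produced correct results are dropped;
     * after a failed run nothing is removed.
     */
    static Set<String> prune(Set<String> watched, Set<String> violated, boolean runSucceeded) {
        Set<String> remaining = new LinkedHashSet<>(watched);
        if (runSucceeded) {
            remaining.removeAll(violated);
        }
        return remaining;
    }
}
```

Repeating this over several runs until the violated set stays empty corresponds to the stopping condition in step 4.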

Please note that this process of inferring the correct atomicity of an application is going to change; it is only an idea of how the inference can be done. In the future, there is going to be a more complex tool cooperating with code coverage and other available tools.

3.6 Healing of Detected Problems

The list of implemented healing techniques was given earlier in this text. For successful healing, the correct atomicity of the program has to be given to the detection algorithm. First, simple healing is described. Then the ability of AtomRace to heal even the first occurrence of a problem is described. Finally, a way to instruct the race detector to heal some variables from the beginning (without first detecting a problem on them) is introduced.

Basic healing can be used with both algorithms, Eraser and AtomRace. To enable healing, follow these steps:

  1. Prepare a set of correct atomic sections for the problematic variable (or the entire application).
  2. Set the rdHealing option in the RDKingProperties file and choose the healing method by setting the rdHealingMethod option below it.
  3. Execute the run script.

This kind of healing starts to heal the tested application immediately after an algorithm detects a problem on some variable. This implies that the first occurrence of a problem is not healed, because both algorithms usually detect a problem while it is happening. The first occurrence of a problem can be healed only when the AtomRace algorithm is tracking the atomic sections. To enable AtomRace to heal even the first occurrence of a problem, follow these steps:

  1. Prepare a set of correct atomic sections for the problematic variable (or the entire application).
  2. Set the rdHealing option in the RDKingProperties file and choose the healing method by setting the rdHealingMethod option below it.
  3. Set the rdAtomImmediateHealing option in the RDKingProperties file.
  4. Execute the run script.

In this case, AtomRace not only tracks the correct atomicity of the application but also stops one or more threads before they execute a problematic instruction that could violate the predefined atomicity. In some special cases this approach can cause a significant slowdown, because AtomRace sometimes stops a thread until it is evident whether a problem occurs or not.

Imagine that you have detected a problem in a previous execution of the tested application and a patch is not yet available. The race detector can then be used to take care of the correct atomicity in all following executions: it simply starts healing from the beginning of the execution. To enable this option, follow these steps:

  1. Prepare a set of correct atomic sections for the problematic variable (or the entire application).
  2. Enable healing and choose the newmutex healing method in the RDKingProperties file.
  3. Create a file named race_detector/healvariables. Each line of the file must contain exactly one name of a problematic variable in the race detector format (the same as in the noisevariables and omitvariables files).
  4. Enable the rdHealVariables option in the RDKingProperties file.
  5. Execute the run script.
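Conceptually, healing a listed variable with a new mutex resembles running each of its atomic sections under a lock dedicated to that variable. The sketch below only illustrates this idea. How the tool implements its newmutex method is not described in this text, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Conceptual sketch only; not the tool's newmutex implementation.
final class NewMutexSketch {
    // One dedicated mutex per healed variable name.
    private final ConcurrentHashMap<String, ReentrantLock> mutexes = new ConcurrentHashMap<>();

    /** Runs the atomic section while holding the mutex dedicated to the variable. */
    void runAtomically(String variable, Runnable atomicSection) {
        ReentrantLock mutex = mutexes.computeIfAbsent(variable, name -> new ReentrantLock());
        mutex.lock();
        try {
            atomicSection.run();
        } finally {
            mutex.unlock();
        }
    }
}
```

Because such a mutex is added without knowledge of the locks the application already takes, nesting it with existing locks in different orders can lead to a deadlock, which is the risk noted for this method earlier in the text.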



Acknowledgement

This work is partially supported by the European Community under the Information Society Technologies (IST) programme of the 6th FP for RTD - project SHADOWS, contract IST-035157. The authors are solely responsible for the content of this work. It does not represent the opinion of the European Community, and the European Community is not responsible for any use that might be made of data appearing therein. This work is partially supported by the Czech Ministry of Education, Youth, and Sport under the project Security-Oriented Research in Information Technology, contract CEZ MSM 0021630528, and by the Czech Grant Agency within project Advanced Formal Approaches in the Design and Verification of Computer-Based Systems, contract 102/07/0322, and project Methods and Tools for Automated Bug Detection in Software, contract 102/04/0780.