Systems and methods for finding concurrency errors
Systems and methods for detecting concurrency bugs are provided. In some embodiments, context-aware communication graphs that represent inter-thread communication are collected during test runs, and may be labeled according to whether the test run was correct or failed. Graph edges that are likely to be associated with failed behavior are determined, and probable reconstructions of failed behavior are constructed to assist in debugging. In some embodiments, software instrumentation is used to collect the communication graphs. In some embodiments, hardware configured to collect the communication graphs is provided.
This application claims the benefit of U.S. Provisional Application No. 61/420,185, filed Dec. 6, 2010, which is incorporated herein by reference in its entirety for all purposes.
STATEMENT OF GOVERNMENT LICENSE RIGHTS
This invention was made with government support under CNS-0720593 and CCF-0930512, awarded by the National Science Foundation. The government has certain rights in the invention.
BACKGROUND
Concurrency errors are difficult problems for developers writing multi-threaded applications to solve. Even expert programmers have difficulty predicting complicated behaviors resulting from the unexpected interaction of operations in different threads. Three exemplary types of concurrency errors are data races, atomicity violations, and ordering violations. Data races occur when two or more memory operations in different threads, at least one of which is a write, access the same memory location and are not properly synchronized. Atomicity violations happen when memory operations assumed to be executed atomically are not enclosed inside a single critical section. Ordering violations happen when memory accesses in different threads happen in an unexpected order. Some particularly difficult concurrency errors to resolve involve multiple variables. Though some efforts have been made to individually detect data races, locking discipline violations, and atomicity violations, what is needed are automated systems and methods for finding general concurrency errors, including multivariable errors and ordering violations.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In some embodiments, a nontransitory computer-readable medium having computer-executable instructions stored thereon is provided. If executed by one or more processors of a computing device, the instructions cause the computing device to perform actions to analyze a set of context-aware communication graphs for debugging. The actions comprise creating a set of aggregate reconstructions based on edges of the set of communication graphs, ranking the aggregate reconstructions in order of likelihood of being associated with a failed execution, and presenting one or more highly ranked aggregate reconstructions.
In some embodiments, a computer-implemented method of building a context-aware communication graph is provided. The method comprises detecting an access of a memory location by a first instruction of a first thread; updating a context associated with the first thread; and, in response to determining that the memory location was last written by a second instruction of a second thread different from the first thread, adding an edge to the context-aware communication graph, the edge including the context associated with the first thread, a sink identifying the first instruction, a source identifying the second instruction, and a context associated with the second thread.
In some embodiments, a computing device for detecting concurrency bugs is provided. The device comprises at least two processing cores, at least two cache memories, a coherence interconnect, and a communication graph data store. Each cache memory is associated with at least one processing core, and is associated with coherence logic. The coherence interconnect is communicatively coupled to each of the cache memories. The coherence logic is configured to add edges to a communication graph stored in the communication graph data store based on coherence messages transmitted on the coherence interconnect.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
Though it may be difficult to find through a mere inspection of the code listing 104, the Spider class includes a concurrency error. Specifically, there is an implicit assumption that Instruction K and Instruction M are included in a single atomic operation. Since there is no protection mechanism in place, multiple threads concurrently executing this code may sometimes experience an attempt to access a null pointer in Instruction N.
Thread one 110 begins by executing Instruction A and Instruction B to initialize the “items” variable and to set the “qsize” variable to “0.” Next, thread one 110 executes Instruction C to add the value “i” to the “items” variable, and executes Instruction D to increment the value of the “qsize” variable from “0” to “1.” Thread two 112 enters the “while” loop at Instruction J, and executes the check at Instruction K to determine whether the size of the Queue object is “0.” At Instruction I, thread two 112 accesses the “qsize” variable, which was last incremented to “1” by thread one 110. Thread two 112 will then proceed to Instruction M, because the value retrieved from the “qsize” variable was not “0.”
Next, thread three 114 proceeds to begin to dequeue the single item from the Queue object. At Instruction I, thread three 114 reads the “qsize” variable, and determines that it may proceed to dequeue an object. Assuming the execution of thread three 114 next proceeds to Instruction G, thread three 114 writes to the “qsize” variable, decrementing it to “0.”
Next, execution returns to thread two 112. At Instruction M, thread two 112 calls the Dequeue( ) function, which proceeds to Instruction E. At Instruction E, thread two 112 accesses the “qsize” variable, and determines that it is now “0” (as updated by thread three 114). At Instruction F, the Dequeue( ) function returns “null” in response to the value of the “qsize” variable, and so the value of “item” in Instruction M is set to “null.” At Instruction N, thread two 112 attempts to call the function GetD( ) on a pointer set to “null,” which causes an exception, a system crash, or some other undefined failure depending on the operating environment.
Communication Graphs
A communication graph may be used to represent communication between threads in a multi-threaded environment. In some embodiments, a communication graph includes one or more edges that represent communication events. Each edge includes a source node and a sink (or destination) node. The source node of an edge represents a write instruction. The sink node of an edge represents a read instruction or a write instruction that accessed the memory location written by the write instruction of the source node. In some embodiments, the communication graph may also include a source node for uninitialized states, thus allowing edges to be created when an instruction first accesses an otherwise uninitialized memory location.
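For illustration only, the following Python sketch shows one way the node and edge structures described above might be represented. The names Node, Edge, CommunicationGraph, and UNINITIALIZED are hypothetical, not part of the disclosed embodiments, and are reused by the later sketches in this description.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Node:
    instruction: Any        # identifies the static instruction (e.g., an instruction address)
    context: tuple = ()     # relative ordering information; empty for a context-oblivious graph

@dataclass(frozen=True)
class Edge:
    source: Node            # the write that produced the value (or the uninitialized-state node)
    sink: Node              # the read or write that observed or overwrote that value

class CommunicationGraph:
    def __init__(self):
        self.edges = set()

    def add_edge(self, source: Node, sink: Node) -> None:
        self.edges.add(Edge(source, sink))

# Source node used for first accesses to otherwise uninitialized memory locations.
UNINITIALIZED = Node(instruction="<uninitialized>")
```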
Communication graphs may be context-oblivious or context-aware. In a context-oblivious communication graph, some concurrency errors lead to edges that are present only in graphs of buggy executions, and so such a graph may be useful for detecting those errors. However, if a given edge may be present in both failed executions and correct executions, such as in an interleaving error affecting multiple variables, a context-oblivious communication graph may not include enough information to detect the error.
In a context-aware communication graph, each edge may include information representing a relative order of communication events. One example of a context-aware communication graph is illustrated in
For ease of discussion, the description herein only analyzes the memory locations denoted by the variables “qsize” and “items,” so that each line of pseudocode may be considered to include a single instruction that affects a single memory location. Also, the description treats the variable “items” and the Add( ) function that affects it as affecting a single memory location. One of ordinary skill in the art will understand that, in some embodiments, context-aware communication graphs may describe every memory access separately, including multiple memory accesses for a single line of code.
The context stored in each node represents a relative order of communication events, and may be stored in any suitable form. In some embodiments, context information may include information uniquely identifying every dynamic memory operation. However, since the size of such a graph would continue to grow over time, it may be desirable to store a smaller set of context information that nonetheless represents sufficient detail to allow for the detection of concurrency bugs.
In some embodiments, the context information may include a sequence of communication events observed by a thread immediately prior to the execution of a memory instruction regardless of the memory location involved. The communication events may be stored in a FIFO queue of a predetermined length, such that once the queue is full, an oldest entry is discarded before adding a new entry. In some embodiments, the predetermined length of the FIFO queue may be any length, such as five elements, more than five elements, or less than five elements. In the embodiment illustrated in
In some embodiments, four types of communication events may be observed by a local thread. A local read (“LocRd”) is a read of a memory location last written by a remote thread. A local write (“LocWr”) is a write to a memory location last written by a remote thread. A remote read (“RemRd”) is a read of a memory location by a remote thread that was last written by the local thread. A remote write (“RemWr”) is a write to a memory location by a remote thread that was last written by the local thread. Only the type of each event is stored in the context FIFO; the memory location associated with the event is not stored.
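A minimal sketch of the bounded context FIFO and the four event types follows, assuming the five-element length used in the example above; the class and enumeration names are illustrative.

```python
from collections import deque
from enum import Enum

class Event(Enum):
    LOC_RD = "LocRd"    # local read of a location last written by a remote thread
    LOC_WR = "LocWr"    # local write to a location last written by a remote thread
    REM_RD = "RemRd"    # read by a remote thread of a location last written by the local thread
    REM_WR = "RemWr"    # write by a remote thread to a location last written by the local thread

class ThreadContext:
    """Bounded FIFO of recently observed communication event types (memory locations are not stored)."""

    def __init__(self, length: int = 5):
        self.fifo = deque(maxlen=length)    # a full deque silently discards its oldest entry on append

    def observe(self, event: Event) -> None:
        self.fifo.append(event)

    def snapshot(self) -> tuple:
        return tuple(reversed(self.fifo))   # newest event first, matching the order contexts are listed above
```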
A first node 204 is created when thread one 110 executes Instruction A to write to the “items” memory location. The first node 204 stores the instruction location (Instruction A) and an empty context, because thread one 110 has not yet observed any communication events. Because the “items” memory location was not previously written by any thread, an edge (“Edge 1”) is created between an uninitialized state node 202 and the first node 204.
A second node 206 refers to the second memory access in the execution trace, where thread one 110 executes Instruction B to initialize the “qsize” memory location. The second node 206 stores the instruction location (Instruction B) and a context, which currently contains a single element, “LocWr,” representing the local write to the “items” memory location at Instruction A. An edge (“Edge 2”) is created between the uninitialized state node 202 and the second node 206.
Two more nodes, a third node 208 and a fourth node 210, are added when thread one 110 executes Instruction C and Instruction D to update the “items” memory location and the “qsize” memory location, respectively. The context for the third node 208 is “LocWr, LocWr,” as the memory writes in Instruction A and Instruction B caused two LocWr states to be pushed onto the context FIFO queue for thread one 110, and the context for the fourth node 210 is “LocWr, LocWr, LocWr,” as the memory write in Instruction C caused another LocWr state to be pushed onto the context FIFO queue for thread one 110. No edges are created with the third node 208 or the fourth node 210 as a sink, because the last thread to write to the memory location in each case was the local thread, so there was no thread-to-thread communication.
A fifth node 212 is created when thread two 112 reads the “qsize” memory location at Instruction I. The context for thread two 112 contains “RemWr, RemWr, RemWr, RemWr,” representing the four remote write operations performed by thread one 110. An edge (“Edge 3”) is created having the fourth node 210 as the source node and the fifth node 212 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112, the thread currently accessing the “qsize” memory location.
A sixth node 214 is created when thread three 114 reads the “qsize” memory location at Instruction I. A remote read event was pushed onto the context FIFO for thread three 114 when thread two 112 read the “qsize” memory location, and so the context stored for the sixth node 214 is “RemRd, RemWr, RemWr, RemWr, RemWr.” An edge (“Edge 4”) is created having the fourth node 210 as the source node and the sixth node 214 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114, the thread currently accessing the “qsize” memory location. One should note that, in a context-oblivious communication graph, the interleaving between thread one 110 and thread two 112 and between thread one 110 and thread three 114 would be lost, because both memory reads would be represented by a single edge and would not be distinguishable by context.
A seventh node 216 is created when thread three 114 writes to the “qsize” memory location at Instruction G. A local read event was pushed onto the context FIFO for thread three 114 when it read the “qsize” memory location. The oldest element in the context FIFO, the remote write event added when thread one 110 executed Instruction A, was dropped from the context FIFO because the context FIFO was full before the local read event was pushed onto the context FIFO. Hence, the context stored for the seventh node 216 is “LocRd, RemRd, RemWr, RemWr, RemWr.” An edge (“Edge 5”) is created having the fourth node 210 as the source node and the seventh node 216 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114, the thread currently accessing the “qsize” memory location.
An eighth node 218 is created when thread two 112 reads from the “qsize” memory location at Instruction E. A remote read event was pushed onto the context FIFO for thread two 112 when thread three 114 read the “qsize” memory location, and a remote write event was pushed onto the context FIFO for thread two when thread three 114 wrote to the “qsize” memory location. The two oldest elements were removed from the full context FIFO, and so the context stored in the eighth node 218 is “RemWr, RemRd, LocRd, RemWr, RemWr.” An edge (“Edge 6”) is created having the seventh node 216 as the source node and the eighth node 218 as the sink node, because the seventh node 216 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112, the thread currently accessing the “qsize” memory location. Edge 6 is illustrated as a dashed line, because it is this inter-thread communication that occurs in failed executions. Systems and methods for identifying Edge 6 as being associated with a concurrency error are discussed in further detail below.
Collecting Communication Graphs
One of ordinary skill in the art will recognize that, in general, to access data from a memory location in main memory 302, a processor core checks if a valid copy of the data from the memory location is present in its associated cache. If so, the processor core uses the cached copy of the data. If not, the coherence interconnect 304 obtains data from the memory location either from another cache which has a valid copy of the data or from main memory 302. In some embodiments, the coherence interconnect 304 may be a coherence bus, a scalable coherence interface, or any other suitable coherence interconnect technology. In some embodiments, the main memory 302 may be any suitable computer-readable medium, such as SRAM, DRAM, flash memory, a magnetic storage medium, and/or the like. In some embodiments, each of the cache memories 312, 316, 320 includes coherence logic 314, 318, 322 that interacts with the coherence interconnect 304 to synchronize the contents of the cache memories.
One of ordinary skill in the art will recognize that each processor core 306, 308, 310 may be located in a separate physical processor, or may be separate processing cores in a single physical processor. Further, one of ordinary skill in the art will also recognize that three processor cores and three cache memories have been illustrated herein for ease of discussion, and that in some embodiments, more or fewer processor cores, and/or more or fewer cache memories, may be used. In addition, in some embodiments, additional levels of cache memory between the illustrated cache and the main memory, or between the illustrated cache and the associated processor core, may be used, multiple processor cores may be associated with a single cache memory, and/or multiple cache memories may be associated with a single processor core. In some embodiments, the computing device 300 may be a desktop computer, a laptop computer, a tablet computing device, a mobile computing device, a server computer, and/or any other suitable computing device having at least one processor that executes more than one thread.
Two ways of collecting context-aware communication graphs include adding software-based instrumentation that monitors memory accesses within the executable program to be studied, and adding hardware-based features that monitor memory accesses within an uninstrumented executable program.
In some embodiments, the components 454 include a graph analysis engine 456, a memory location metadata data store 458, a thread context data store 460, and a communication graph data store 462. The thread context data store 460 is configured to store a context FIFO queue for each thread executed by the computing device 400. The memory location metadata data store 458 is configured to store metadata for each memory location identifying at least an instruction and thread that last wrote to the memory location. The communication graph data store 462 is configured to store one or more communication graphs built using the information stored in the thread context data store 460 and the memory location metadata data store 458. The communication graph data store 462 may also store an indication of whether each communication graph is associated with correct behavior or failed behavior. The graph analysis engine 456 is configured to analyze a stored communication graph to find edges to be inspected for errors, as discussed further below.
In some embodiments, to analyze an executable program using the computing device 300, the executable program is instrumented to monitor memory accesses. For example, in some embodiments, a binary may be instrumented using the Pin dynamic instrumentation tool by Intel Corporation. As another example, in some embodiments, Java code may be instrumented using the RoadRunner dynamic analysis framework developed by Cormac Flanagan and Stephen N. Freund. The instrumentation tracks thread contexts and memory location metadata while the program is executing, and builds the communication graph for storage in the communication graph data store 462. After collection, the graph analysis engine 456 may be used to analyze the communication graphs.
As understood by one of ordinary skill in the art, a “data store” may include any suitable device configured to store data for access by a computing device. Each data store may include a relational database, a structured flat file, and/or any other suitable data storage format.
For example, in some embodiments, the memory location metadata data store 458 may include a fixed-size hash table. To find metadata associated with a particular memory location, the memory location address modulo the hash table size may be used as an index into the hash table. In such an embodiment, a lossy collision resolution policy in which an access may read or overwrite a colliding location's metadata may be tolerated without unduly sacrificing performance if the fixed size of the hash table is large enough, such as having at least 32 million entries. As another example, in some embodiments that use a language such as Java and/or the like, the memory location metadata data store 458 may use a shadow memory feature of an instrumentation utility such as RoadRunner and/or the like to implement a distributed metadata table. Unique identifiers of memory access instructions in the bytecode may be used instead of instruction addresses. Contexts may be stored as integers using bit fields.
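As a sketch of the fixed-size, lossy table described above (the class name and default size are illustrative assumptions):

```python
class LossyMetadataTable:
    """Fixed-size metadata table indexed by address modulo the table size.

    Collisions are resolved lossily: a colliding access simply reads or overwrites the
    other location's entry, which the text above indicates can be tolerated when the
    table is large enough.
    """

    def __init__(self, size: int = 1 << 20):    # illustrative size; the text suggests tens of millions of entries
        self.size = size
        self.entries = [None] * size            # each entry: (writer_thread, writer_instruction, writer_context)

    def lookup(self, address: int):
        return self.entries[address % self.size]

    def record_write(self, address: int, writer_thread, writer_instruction, writer_context) -> None:
        self.entries[address % self.size] = (writer_thread, writer_instruction, writer_context)
```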
As yet another example, in some embodiments, a communication graph data store 462 may include a chaining hash table. To access the chaining hash table, a hash function may separately sum the entries in the source node context and the sink node context. Each node's sum may then be XORed with the instruction address of the node. The hash key may then be generated by XORing the result of the computation for the source node with the result of the computation for the sink node. As still another example, in some embodiments, a communication graph data store 462 may include an adjacency list and may use hash sets. In such an embodiment, nodes may be indexed by instruction address/context pairs. In some embodiments, other methods or data structures may be used within the communication graph data store 462, the memory location metadata data store 458, or any other data store described herein.
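A sketch of the hash key computation for the chaining hash table follows; it assumes that context entries are encoded as small integers (for example, the four event types mapped to 0 through 3), which is an illustrative choice rather than a detail from the description above.

```python
def edge_hash_key(source_instruction: int, source_context: tuple,
                  sink_instruction: int, sink_context: tuple) -> int:
    """Hash key for an edge: per node, sum the context entries and XOR with the instruction address,
    then XOR the source result with the sink result."""
    def node_component(instruction: int, context: tuple) -> int:
        return sum(context) ^ instruction

    return node_component(source_instruction, source_context) ^ node_component(sink_instruction, sink_context)
```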
Each data store may include one or more non-volatile computer-readable storage media, such as a magnetic drive, optical drive, flash drive, and/or the like, and/or may include one or more volatile computer-readable storage media, such as DRAM, SRAM, and/or the like. Each data store may be accessible locally by the computing device, or may be accessible over some type of network. One of ordinary skill in the art will recognize that separate data stores described herein may be combined into a single data store, and/or a single data store described herein may be separated into multiple data stores, without departing from the scope of the present disclosure. For example, in some embodiments, partial communication graphs may be stored in separate communication graph data stores 462 that are local to each thread. In such an embodiment, performance may be improved by making addition of edges to the graph a thread-local operation. When such a thread ends, the partial communication graph may be merged into a global communication graph stored in a master communication graph data store 462.
As understood by one of ordinary skill in the art, the term “engine” as used herein refers to logic embodied in hardware or software instructions, which may be written in a programming language, such as C, C++, COBOL, JAVA™, PHP, Perl, C#, and/or the like. An engine may be compiled into executable programs or written in interpreted programming languages. Software engines may be callable from other engines, or from themselves. Generally, the engines described herein refer to logical modules that may be merged with other engines or applications, or may be divided into sub-engines. The engines may be stored on any type of computer-readable medium or computer storage device and executed by one or more general purpose computing devices, thus creating a special purpose computing device configured to provide the engine.
Upon detecting a memory access, the information in the memory location metadata data store 458 may be consulted to determine whether an edge should be added to a communication graph, and then may be updated if the memory access is a write. For example, upon detecting the read of the “qsize” location by Instruction I at time 5 in thread two 112, the entry for the “qsize” location is checked, and it is determined that the last writer thread was not thread two 112 (see
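The following sketch combines the pieces above (Node, CommunicationGraph, ThreadContext, Event, and LossyMetadataTable) into a single access handler of the kind the instrumentation might invoke. It is illustrative only; pushing the corresponding remote events onto other threads' context FIFOs is omitted for brevity.

```python
def on_memory_access(thread, instruction, address: int, is_write: bool,
                     metadata: "LossyMetadataTable", graph: "CommunicationGraph") -> None:
    ctx = thread.context.snapshot()                      # context before this access, as in the walkthrough above
    sink = Node(instruction, ctx)

    last = metadata.lookup(address)
    if last is None:
        graph.add_edge(UNINITIALIZED, sink)              # first access to an otherwise uninitialized location
    else:
        writer_thread, writer_instruction, writer_context = last
        if writer_thread is not thread:                  # last writer was a different thread: inter-thread communication
            graph.add_edge(Node(writer_instruction, writer_context), sink)

    if is_write:
        metadata.record_write(address, thread, instruction, ctx)

    # Record this access in the accessing thread's own context FIFO (after the node's context was captured).
    thread.context.observe(Event.LOC_WR if is_write else Event.LOC_RD)
```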
Each processor core 706, 708, 710 is augmented with a context register 707, 709, 711. The context register 707, 709, 711 is configured to store a context FIFO queue, as described above, for a thread currently being executed by the associated processor core 706, 708, 710. Further, each cache line in each cache memory 712, 716, 720 is augmented with metadata 713, 717, 721 that describes the last instruction to write to the cache line. Details of the cache lines, including the metadata 713, 717, 721, are discussed further below with respect to
Whereas the cache memories illustrated in
In some embodiments, the modified coherence logic 715, 719, 723 is based on a modified MESI coherence protocol. Standard MESI coherence protocols are generally known in the art, and so are not discussed herein at length. However,
The modified coherence logic 715, 719, 723 may adhere to a normal MESI coherence protocol, but may augment some coherence messages to share information about the instructions involved with the communication. For example, when a read reply is transmitted, the modified coherence logic 715, 719, 723 may include the metadata 713, 717, 721 of the corresponding cache line to provide information for read-after-write (RAW) communication. As another example, when an invalidate reply or acknowledgement is transmitted, the modified coherence logic 715, 719, 723 may include the metadata 713, 717, 721 of the cache line that was invalidated to provide information for write-after-write (WAW) communication.
The modified coherence logic 715, 719, 723 monitors traffic on the coherence interconnect 704, and pushes context events into the context register 707, 709, 711 of the associated processor core 706, 708, 710 when appropriate. For example, the modified coherence logic 715, 719, 723 may push a local read event into the context register 707, 709, 711 upon detecting a local read miss, a local write event upon detecting a local write miss or upgrade miss, a remote write event upon detecting an incoming invalidate request, and a remote read event upon detecting an incoming read request.
When appropriate, the modified coherence logic 715, 719, 723 also updates the communication graph. For example, the modified coherence logic 715, 719, 723 may add an edge to the communication graph upon detecting a read reply, an invalidate reply, or a read miss serviced from memory 702. Upon detecting a read reply, an edge is added having a source node including information from the metadata included in the read reply, and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened. Upon detecting an invalidate reply, an edge is added having a source node including information from the metadata for the cache line that was invalidated, and a sink node including information relating to the local instruction that caused the invalidate request and the context in which the request originated. Upon detecting a read miss serviced from memory 702, an edge is added with a source node set to a null value and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened, to indicate that an otherwise uninitialized memory location was accessed.
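For illustration, a software model of the decisions described above might look like the following; the message names and payload fields are assumptions, since the actual logic would be implemented in the cache coherence hardware rather than in software, and the context is sampled at reply time for simplicity.

```python
def on_coherence_event(core, message: str, payload: dict, graph: "CommunicationGraph") -> None:
    """Map coherence traffic to context events and communication graph edges (illustrative model)."""
    if message == "local_read_miss":
        core.context.observe(Event.LOC_RD)
    elif message in ("local_write_miss", "upgrade_miss"):
        core.context.observe(Event.LOC_WR)
    elif message == "incoming_invalidate_request":
        core.context.observe(Event.REM_WR)
    elif message == "incoming_read_request":
        core.context.observe(Event.REM_RD)
    elif message == "read_reply":
        # Read-after-write: the reply carries the writer metadata of the supplying cache line.
        source = Node(payload["writer_instruction"], payload["writer_context"])
        graph.add_edge(source, Node(payload["local_instruction"], core.context.snapshot()))
    elif message == "invalidate_reply":
        # Write-after-write: the reply carries the metadata of the invalidated cache line.
        source = Node(payload["writer_instruction"], payload["writer_context"])
        graph.add_edge(source, Node(payload["local_instruction"], core.context.snapshot()))
    elif message == "read_miss_from_memory":
        # Otherwise uninitialized location: edge with a null (uninitialized) source node.
        graph.add_edge(UNINITIALIZED, Node(payload["local_instruction"], core.context.snapshot()))
```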
Reconstructions
Context-aware communication graphs may be analyzed to determine instructions that are likely associated with failed program behavior. However, since concurrency bugs are difficult to diagnose, it would be helpful if a representation of the behavior of all threads around the instruction could be presented for debugging, and not just the single instruction or the single thread that failed. By adding timestamp data to the nodes of a context-aware communication graph, behavior likely to occur before, during, and after an instruction may be presented for debugging purposes. A reconstruction, according to various aspects of the present disclosure, presents communication nodes that occur before, during, and after an identified edge from a communication graph.
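A sketch of building one reconstruction from a single execution follows. It assumes each node of that execution's graph carries a timestamp, and the prefix and suffix sizes are arbitrary stand-ins for the predetermined numbers discussed further below.

```python
def build_reconstruction(timestamps: dict, edge: "Edge", num_before: int = 3, num_after: int = 3) -> dict:
    """Split one execution's nodes into prefix, body, and suffix around the given edge.

    timestamps maps each Node in the execution's communication graph to its timestamp.
    """
    lo = min(timestamps[edge.source], timestamps[edge.sink])
    hi = max(timestamps[edge.source], timestamps[edge.sink])
    ordered = sorted(timestamps, key=timestamps.get)

    prefix = [n for n in ordered if timestamps[n] < lo][-num_before:]   # last few nodes before the edge
    body = [n for n in ordered if lo <= timestamps[n] <= hi]            # nodes between source and sink, inclusive
    suffix = [n for n in ordered if timestamps[n] > hi][:num_after]     # first few nodes after the edge
    return {"prefix": prefix, "body": body, "suffix": suffix}
```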
Though a reconstruction based on a single execution may be useful for understanding what occurred around a given edge, combining results from multiple executions may give a more complete picture of the behavior that is causing problems. However, since the problems represented are multi-threaded and indeterminate in nature, it is likely that even if an edge is repeated in multiple executions the associated reconstructions will not be the same.
The code was executed a plurality of times, and communication graphs were created for each execution. In those executions, four executions were identified that had a particular edge having node Y as the source node and node Z as the sink node. For each execution, a reconstruction 1202, 1204, 1206, 1208 was calculated based on the timestamps of the nodes in the communication graph around node Y and node Z. The reconstructions 1202, 1204, 1206, 1208 are slightly different in each case, reflecting the indeterminate nature of the execution.
To form the aggregate reconstruction 1210, the prefixes, bodies, and suffixes of each reconstruction 1202, 1204, 1206, 1208 are unioned together to form an aggregate prefix, an aggregate body, and an aggregate suffix. Nodes may appear in more than one portion of the aggregate reconstruction, because in some executions, a given node may occur before the sink node or source node, and in other executions, the given node may occur after the sink node or source node. Each node in the aggregate reconstruction 1210 is then assigned a confidence value, which indicates a proportion of executions for which the given node appeared in the given portion of the reconstruction. For example, node U in the body of the aggregate reconstruction 1210 is assigned a confidence value 1212 of 100%, because node U was present in the body of every reconstruction. Meanwhile, node S is assigned a confidence value 1214 of 50% in the prefix, and a confidence value 1216 of 50% in the body, because node S appeared in the prefix in two of the four reconstructions and in the body in the other two. One of ordinary skill in the art will recognize that the other confidence values were similarly derived. In some embodiments, the nodes in the aggregate reconstruction 1210 are not ordered other than being segregated into prefix, body, and suffix portions, as the timestamps may not be comparable from one execution to another. The use of aggregate reconstructions and confidence values to find likely reconstructions that show failures will be discussed further below.
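The aggregation step can be sketched as follows, using the reconstruction dictionaries produced by the sketch above; the confidence value for a node in a portion is simply the fraction of executions whose reconstruction placed that node in that portion.

```python
from collections import Counter

def aggregate_reconstructions(reconstructions: list) -> dict:
    """Union per-execution reconstructions and attach a confidence value to each node."""
    total = len(reconstructions)
    aggregate = {}
    for portion in ("prefix", "body", "suffix"):
        counts = Counter()
        for rec in reconstructions:
            counts.update(set(rec[portion]))            # count each node at most once per execution
        aggregate[portion] = {node: count / total for node, count in counts.items()}
    return aggregate
```

In the example above, node U would receive a confidence of 4/4 in the body, while node S would receive 2/4 in the prefix and 2/4 in the body.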
Using Context-Aware Communication Graphs for Debugging
Once collected, the context-aware communication graphs and reconstructions described above may be used to find concurrency errors.
From a start block, the method 1300 proceeds to block 1302, where a computing device is configured to collect context-aware communication graph information. The computing device may be a software-instrumented computing device 300, a hardware-instrumented computing device 700, or any other suitable computing device configured for collecting context-aware communication graph information, and may be configured as described above. Next, at block 1304, a procedure is performed wherein the computing device collects context-aware communication graphs for a set of failed executions and a set of correct executions.
The procedure 1400 then proceeds to a for loop between a start block 1404 and an end block 1410, wherein the test case is executed and a test case result is determined. In some embodiments, the for loop between blocks 1404 and 1410 is executed a predetermined number of times. In other embodiments, the for loop between blocks 1404 and 1410 may be executed until a predetermined number of failed test case results are collected, and/or any other suitable number of times. From the for loop start block 1404, the procedure 1400 proceeds to block 1406, where the computing device collects and stores a communication graph during execution of the test case. The computing device may collect and store the communication graph via a suitable technique as described above. At block 1408, the computing device associates the communication graph with a test case result. For example, an automated testing framework may store a failed test case result with the communication graph upon detecting that an error occurred or an expected result was not obtained, and may store a correct test case result with the communication graph upon detecting that an expected result was obtained without any errors. As another example, a test user may analyze the results of the test case, and may indicate whether a correct test case result or a failed test case result should be stored with the communication graph.
The procedure 1400 proceeds to the for loop end block 1410 and determines whether the for loop should be executed again. If so, the procedure 1400 returns to the for loop start block 1404. If not, the procedure 1400 proceeds to block 1412, where the computing device creates a set of failed communication graphs based on the communication graphs having failed test case results. At block 1414, the computing device creates a set of correct communication graphs based on the communication graphs having correct test case results. In some embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in the communication graph data store 762 or 462, while in other embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in a separate data store for future processing. The procedure 1400 then proceeds to an end block and terminates.
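A sketch of the collection loop follows, assuming a run_test_case callable (a hypothetical name) that executes the instrumented test case once and returns the collected graph together with an automatically determined pass/fail result.

```python
def collect_labeled_graphs(run_test_case, num_runs: int):
    """Run the instrumented test case repeatedly and separate the resulting graphs by outcome."""
    failed_graphs, correct_graphs = [], []
    for _ in range(num_runs):
        graph, passed = run_test_case()
        (correct_graphs if passed else failed_graphs).append(graph)
    return failed_graphs, correct_graphs
```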
Returning now to
At block 1504, the graph analysis engine determines a correct execution fraction for the edges of the communication graphs based on a number of occurrences of the edges in the set of correct communication graphs and a total number of correct communication graphs. In some embodiments, the correct execution fraction for a given edge may be expressed by the following equation, wherein Fracc is the correct execution fraction for the edge, EdgeFreqc is the number of correct communication graphs in which the edge appears, and #Runsc is the total number of correct communication graphs:

Fracc = EdgeFreqc / #Runsc
Next, at block 1506, the graph analysis engine determines a failed frequency ratio for the edges of the communication graphs based on the failed execution fraction and the correct execution fraction. In some embodiments, the failed frequency ratio for a given edge may be expressed by the following equation, wherein F is the failed frequency ratio and Fracf is the corresponding failed execution fraction for the edge:

F = Fracf / Fracc
In some embodiments, edges having a Fracc of zero may be particularly likely to be associated with failures, but would cause the equation for F above to be undefined. In such cases, the Fracc value may be replaced by a value that yields a large value for F. For example, in some embodiments, a Fracc of zero may be replaced by the following value:
The procedure 1500 then proceeds to block 1508, where the graph analysis engine selects a set of edges for further analysis based on the failed frequency ratios. In some embodiments, the graph analysis engine may select a predetermined number of edges having the highest failed frequency ratios. In some embodiments, the graph analysis engine may select edges having a failed frequency ratio greater than a threshold value. The procedure 1500 then proceeds to an end block and terminates.
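The fraction and ratio computations above can be sketched as follows. Both sets of graphs are assumed to be non-empty, and the substitute used when Fracc is zero is an illustrative assumption, since the exact replacement value is not specified above; any small positive stand-in that yields a large F serves the same purpose.

```python
def failed_frequency_ratios(edges, failed_graphs, correct_graphs) -> dict:
    """Score each edge by how strongly it correlates with failed executions.

    Fracf = fraction of failed graphs containing the edge, Fracc = fraction of correct
    graphs containing the edge, and F = Fracf / Fracc, as described above.
    """
    ratios = {}
    for edge in edges:
        frac_f = sum(edge in g.edges for g in failed_graphs) / len(failed_graphs)
        frac_c = sum(edge in g.edges for g in correct_graphs) / len(correct_graphs)
        if frac_c == 0:
            frac_c = 1.0 / (len(correct_graphs) + 1)     # assumed stand-in that keeps F large but finite
        ratios[edge] = frac_f / frac_c
    return ratios

def select_suspicious_edges(ratios: dict, top_n: int = 10) -> list:
    """Keep the edges with the highest failed frequency ratios (a threshold could be used instead)."""
    return sorted(ratios, key=ratios.get, reverse=True)[:top_n]
```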
Returning now to
From the for loop start block 1602, the procedure 1600 proceeds to another for loop between a for loop start block 1604 and a for loop end block 1608, wherein the for loop executes once for each failed communication graph containing the selected edge to create reconstructions for the selected edge for each failed communication graph. From the for loop start block 1604, the procedure 1600 proceeds to block 1606, where the graph analysis engine creates a failed reconstruction based on timestamps of the source node and the sink node of the selected edge in the failed communication graph, as well as timestamps of neighboring nodes in the failed communication graph. As discussed above with respect to FIG. KK, the failed reconstruction may be built by selecting nodes having timestamps between the timestamp of the source node and sink node of the edge, a predetermined number of nodes having timestamps before the timestamp of the source node, and a predetermined number of nodes having timestamps after the timestamp of the sink node.
The procedure 1600 then proceeds to the for loop end block 1608 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1604 and calculates a failed reconstruction for another failed communication graph. If not, the procedure 1600 proceeds to block 1610, where the graph analysis engine creates an aggregate failed reconstruction for the selected edge based on frequencies of nodes in the prefix, body, and suffix of the created failed reconstructions. In some embodiments, the aggregate failed reconstruction for the selected edge may be built using a method similar to the construction of the aggregate reconstruction illustrated and described in
The procedure 1600 then proceeds to the for loop end block 1620 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1602 and calculates an aggregate reconstruction for the next selected edge. If not, the procedure 1600 proceeds to an end block and terminates.
Returning now to
At block 1312, a procedure is performed wherein the graph analysis engine determines a difference in interleaving around the edge in failed communication graphs versus correct communication graphs. In some embodiments, the difference in interleaving may be represented by a context variation ratio, which is based on a comparison of a number of contexts in which either the source instruction or the sink instruction communicate in failed communication graphs versus correct communication graphs. Large differences between the number of contexts in correct communication graphs compared to failed communication graphs may be correlated with failures.
From a start block, the procedure 1700 proceeds to block 1702, where the graph analysis engine determines a source instruction and a sink instruction associated with the edge used to create the aggregate reconstruction. Next, at block 1704, the graph analysis engine determines a number of failed source contexts based on a number of nodes in the failed communication graphs that include the source instruction. The failed source contexts may include contexts from any node wherein the source instruction appears, whether the node is a source node or a sink node. The procedure 1700 proceeds to block 1706, where the graph analysis engine determines a number of failed sink contexts based on a number of nodes in the failed communication graphs that include the sink instruction. Again, the failed sink contexts may include contexts from any node wherein the sink instruction appears. Next, at block 1708, the graph analysis engine adds the number of failed source contexts and the number of failed sink contexts to obtain a number of failed contexts. The number of failed contexts represents a count of the contexts in which either the source instruction or the sink instruction communicates as represented by the failed communication graphs.
The procedure 1700 proceeds to block 1710, where the graph analysis engine determines a number of correct source contexts based on a number of nodes in the correct communication graphs that include the source instruction. At block 1712, the graph analysis engine determines a number of correct sink contexts based on a number of nodes in the correct communication graphs that include the sink instruction. As discussed above, the source contexts and sink contexts include nodes wherein the source instruction or sink instruction, respectively, are present in either a source node or sink node. The procedure 1700 proceeds to block 1714, where the graph analysis engine adds the number of correct source contexts and the number of correct sink contexts to obtain a number of correct contexts.
At block 1716, the graph analysis engine determines a context variation ratio based on the number of failed contexts and the number of correct contexts. The procedure 1700 then proceeds to an end block and terminates. In some embodiments, the context variation ratio C may be represented by the following equation, wherein #Ctxf is the number of failed contexts and #Ctxc is the number of correct contexts.
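A sketch of the context counting in blocks 1702 through 1716 follows; the final combination of the two counts is assumed here to be a simple failed-to-correct ratio, since the exact equation is not reproduced above.

```python
def context_variation_ratio(source_instruction, sink_instruction, failed_graphs, correct_graphs) -> float:
    """Compare how many distinct contexts the edge's instructions communicate in across the two sets."""
    def count_contexts(graphs, instruction) -> int:
        contexts = set()
        for g in graphs:
            for edge in g.edges:
                for node in (edge.source, edge.sink):
                    if node.instruction == instruction:
                        contexts.add(node.context)       # each distinct node contributes one context
        return len(contexts)

    num_failed = count_contexts(failed_graphs, source_instruction) + count_contexts(failed_graphs, sink_instruction)
    num_correct = count_contexts(correct_graphs, source_instruction) + count_contexts(correct_graphs, sink_instruction)
    return num_failed / num_correct if num_correct else float("inf")
```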
Returning now to
The method 1300 illustrated and discussed above relates to cases in which failed executions are distinguished from correct executions. However, similar techniques for analyzing context-aware communication graphs to find possible causes of concurrency errors using executions which are not known to be failed or correct may also be useful.
The equation ranks instructions that were executed in rare contexts higher, to reflect their increased likelihood of being associated with failed behavior. At block 1808, the graph analysis engine ranks the instructions based on the associated instruction ranks to identify one or more instructions to present for debugging. In some embodiments, reconstructions and/or aggregate reconstructions may be built as described above, based on a highly ranked instruction and/or one or more edges associated with that instruction, to make debugging easier. The method 1800 then proceeds to an end block and terminates.
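Because the ranking equation itself is not reproduced above, the following sketch substitutes a simple rarity score (the reciprocal of the number of graphs in which an instruction's rarest context appears); it is an illustrative stand-in rather than the disclosed equation.

```python
from collections import Counter

def instruction_rarity_ranks(graphs) -> dict:
    """Rank instructions so that those executed in rare contexts score higher (illustrative stand-in)."""
    node_graph_counts = Counter()
    for g in graphs:
        nodes = {n for e in g.edges for n in (e.source, e.sink)}
        node_graph_counts.update(nodes)                 # count each (instruction, context) node once per graph

    ranks = {}
    for node, count in node_graph_counts.items():
        ranks[node.instruction] = max(ranks.get(node.instruction, 0.0), 1.0 / count)
    return dict(sorted(ranks.items(), key=lambda item: item[1], reverse=True))
```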
One of ordinary skill in the art will recognize that the pseudocode, execution listings, and communication graphs illustrated and discussed above are exemplary only, and that actual embodiments of the present disclosure may be used to find other concurrency errors, for any suitable code listings and/or communication graphs. In some embodiments, other types of errors, such as performance bottlenecks and/or the like, may also be detected using similar systems and/or methods.
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the claimed subject matter.
Claims
1. A nontransitory computer-readable medium having computer-executable instructions stored thereon that, if executed by one or more processors of a computing device, cause the computing device to perform actions to analyze a set of context-aware communication graphs for debugging, the actions comprising:
- creating, by the computing device, a set of aggregate reconstructions based on edges of the set of communication graphs;
- ranking, by the computing device, the aggregate reconstructions in order of likelihood of being associated with a failed execution; and
- presenting, by the computing device, one or more highly ranked aggregate reconstructions;
- wherein edges of the set of communication graphs represent communication events between threads;
- wherein nodes of the set of communication graphs each include an instruction address and a context; and
- wherein the context represents a sequence of communication events observed by a thread prior to the execution of an instruction at the instruction address regardless of the memory location involved in the sequence of communication events.
2. The computer-readable medium of claim 1, wherein the actions further comprise:
- selecting edges of the set of communication graphs for creating aggregate reconstructions based on a correlation of edges of the set of communication graphs with failed executions.
3. The computer-readable medium of claim 2, wherein selecting edges includes determining a correlation for one or more edges of the set of communication graphs with failed executions.
4. The computer-readable medium of claim 3, wherein determining the correlation for an edge of the set of communication graphs with failed executions comprises:
- determining a failed execution fraction for the edge;
- determining a correct execution fraction for the edge; and
- determining a failed frequency ratio based on the failed execution fraction and the correct execution fraction.
5. The computer-readable medium of claim 1, wherein each aggregate reconstruction includes an edge, wherein ranking the aggregate reconstructions includes calculating a score for each aggregate reconstruction, and wherein the score is based on at least one of:
- a correlation of the edge of the aggregate reconstruction with failed executions;
- a difference in interleaving around the edge between failed executions and correct executions; and
- a level of consistency for the aggregate reconstruction.
6. The computer-readable medium of claim 5, wherein the difference in interleaving around the edge between failed executions and correct executions is calculated by:
- calculating a number of failed contexts associated with a source node of the edge and a sink node of the edge from failed executions;
- calculating a number of correct contexts associated with the source node and the sink node from correct executions; and
- calculating a context variation ratio based on the number of failed contexts and the number of correct contexts.
7. The computer-readable medium of claim 5, wherein the level of consistency for the aggregate reconstruction is calculated by:
- calculating a first total of confidence values for each prefix node in the aggregate reconstruction;
- calculating a second total of confidence values for each body node in the aggregate reconstruction;
- calculating a third total of confidence values for each suffix node in the aggregate reconstruction;
- calculating a sum of the first, second and third total confidence values; and
- dividing the sum by a sum of a total number of prefix nodes, a total number of body nodes, and a total number of suffix nodes.
8. A computing device for detecting concurrency bugs, the device comprising:
- at least two processing cores;
- at least two cache memories, wherein each cache memory is associated with at least one processing core, and wherein each cache memory is associated with coherence logic;
- a coherence interconnect communicatively coupled to each of the cache memories; and
- a communication graph data store;
- wherein the coherence logic is configured to add edges to a communication graph stored in the communication graph data store based on coherence messages transmitted on the coherence interconnect;
- wherein edges of the communication graph represent communication events between threads;
- wherein nodes of the communication graph each include an instruction address and a context; and
- wherein the context represents a sequence of communication events observed by a thread prior to the execution of an instruction at the instruction address regardless of the memory location involved in the sequence of communication events.
9. The computing device of claim 8, wherein each cache memory includes a plurality of cache lines, each cache line including metadata associated with a last write to the cache line.
10. The computing device of claim 9, wherein the metadata includes a writer instruction address.
11. The computing device of claim 10, wherein the metadata includes a writer context.
12. The computing device of claim 11, wherein the metadata further includes a timestamp.
13. The computing device of claim 8, wherein each processing core includes a context register.
14. The computing device of claim 8, wherein the coherence logic is configured according to an MESI cache coherence protocol.
15. The computing device of claim 14, wherein the MESI cache coherence protocol includes:
- a read reply that includes a writer context and a writer instruction address of an associated cache line; and
- an invalidate reply that includes a writer context and a writer instruction address of an associated cache line.
16. The computing device of claim 15,
- wherein the coherence logic is configured to add an edge to a communication graph stored in the communication graph data store upon detecting a read reply;
- wherein the edge includes a source node and a sink node;
- wherein the source node includes the writer context and the writer instruction address of the read reply; and
- wherein the sink node includes a reader instruction and a context of a thread that caused a cache miss associated with the read reply.
17. The computing device of claim 15,
- wherein the coherence logic is configured to add an edge to a communication graph stored in the communication graph data store upon detecting an invalidate reply;
- wherein the edge includes a source node and a sink node;
- wherein the source node includes the writer context and the writer instruction address of the invalidate reply; and
- wherein the sink node includes a writer instruction and a context of a thread that caused the invalidate request to be generated.
- Burckhardt, S., et al., “A Randomized Scheduler With Probabilistic Guarantees of Finding Bugs,” Proceedings of the 15th Annual Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10), Pittsburgh, Pa., Mar. 13-17, 2010, 12 pages.
- Flanagan, C., and S.N. Freund, “The RoadRunner Dynamic Analysis Framework for Concurrent Programs,” Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE '10), Toronto, Jun. 5-6, 2010, 8 pages.
- Flanagan, C., et al., “Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), Tucson, Ariz., Jun. 7-13, 2008, 11 pages.
- Hammer, C., et al., “Dynamic Detection of Atomic-Set-Serializability Violations,” 30th International Conference on Software Engineering (ICSE '08), Leipzig, Germany, May 10-18, 2008, 10 pages.
- Kononenko, I., “Estimating Attributes: Analysis and Extensions of RELIEF,” Proceedings of the European Conference on Machine Learning (ECML-94), Catania, Italy, Apr. 6-8, 1994, 12 pages.
- Liblit, B.R., “Cooperative Bug Isolation,” doctoral dissertation, University of California, Berkeley, Fall 2004, 172 pages.
- Lu, S., et al., “AVIO: Detecting Atomicity Violations via Access Interleaving Invariants,” Proceedings of the 12th International Conference of Architectural Support for Programming Languages and Operating Systems (ASPLOS '06), San Jose, Calif., Oct. 21-25, 2006, 12 pages.
- Lucia, B., and L. Ceze, “Finding Concurrency Bugs with Context-Aware Communication Graphs,” Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '09), New York, Dec. 12-16, 2009, 11 pages.
- Luk, C-K., et al., “Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), Chicago, Jun. 11-15, 2005, 11 pages.
- Musuvathi, M., et al., “Finding and Reproducing Heisenbugs in Concurrent Programs,” Proceedings of the 8th USENIX Conference on Operating Systems Designs and Implementation (OSDI '08), San Diego, Dec. 8-10, 2008, pp. 267-280.
- Park, S., et al., “CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places,” Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09), Washington, D.C., Mar. 7-11, 2009, 12 pages.
- Park, S., et al., “Falcon: Fault Localization in Concurrent Programs,” Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), Cape Town, South Africa, May 2-8, 2010, vol. 1, pp. 245-254.
- Shi, Y., et al., “Do I Use the Wrong Definition? DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs,” Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA/SPLASH '10), Reno, Nev., Oct. 17-21, 2010, 15 pages.
- Xu, M., et al., “A Serializability Violation Detector for Shared-Memory Server Programs,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), Chicago, Jun. 11-15, 2005, 14 pages.
- Yu, J., and S. Narayanasamy, “A Case for an Interleaving Constrained Shared-Memory Multi-Processor,” Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09), Austin, Tex., Jun. 20-24, 2009, 11 pages.
Type: Grant
Filed: Dec 6, 2011
Date of Patent: Sep 9, 2014
Patent Publication Number: 20120144372
Assignee: University of Washington through its Center for Commercialization (Seattle, WA)
Inventors: Luis Ceze (Seattle, WA), Brandon Lucia (Seattle, WA)
Primary Examiner: Isaac Tecklu
Application Number: 13/312,844
International Classification: G06F 9/45 (20060101); G06F 9/44 (20060101);