Method and apparatus for non-deterministic incremental program replay using checkpoints and syndrome tracking
Methods and apparatus are provided for non-deterministic incremental program replay using checkpoints and syndrome tracking. Replay of a program proceeds by, for a given execution of the program, recording one or more checkpoints of the program, the one or more checkpoints containing program state information; and a recorded list of values for one or more identified variables executing in one or more threads of the program. Thereafter, during a replay execution of the program, the process continues by commencing execution from a particular one of the recorded checkpoints; restoring the program state information associated with the particular one of the recorded checkpoints; comparing an observed list of values to the recorded list of values for the one or more identified variables executing in each of the one or more threads; and identifying a difference between the observed list of values and the recorded list of values. A perturbation or suspend statement can optionally be introduced into the replay execution of the program.
The present invention relates generally to software application programming, and more particularly, to techniques for program replay under non-deterministic conditions.
BACKGROUND DESCRIPTION
The amount of program development time spent on debugging is a well-known problem that is further exacerbated by increasing software complexity. In part, this complexity derives from the use of new software technologies, including more sophisticated programming paradigms, the increasing use of available components or libraries, and the increasing use of distributed computing. Furthermore, multi-threaded computing is becoming more pervasive due to several factors, including: (i) application requirements for multi-tasking, especially to compensate for computing time lost during transaction waits; (ii) the increasing availability of multi-core computers, whose key feature is the leverage of threads to improve computing performance; and (iii) generally increased software complexity, in which the use of components may itself impose threading on new or existing applications.
From a debugging viewpoint, this complex combination of factors increases the difficulty of locating program defects. Among those factors, however, non-determinism poses the greatest challenge. Non-determinism constitutes a set of influencing factors, usually external to an application, that make it difficult to reproduce a run. Such factors include data non-determinism, such as clock readings or database updates that occur across different runs. Non-determinism due to timing is another major inhibitor of reproducibility; timing factors include thread scheduling and the interception of events, such as I/O events or human interaction events. Thread schedule timing is heavily influenced by the current system load, or computing resource availability. All of these factors are particularly difficult to manage when trying to reproduce an application execution that could reveal a critical programming flaw.
A need therefore exists for methods and apparatus for dealing with non-determinism for program replay, addressing the issues presented by the above-mentioned factors. Yet another need exists for methods and apparatus that facilitate application replay, accounting for non-determinism. A further need exists for methods and apparatus for non-deterministic incremental program replay using checkpoints and syndrome tracking.
SUMMARY OF THE INVENTION
Generally, methods and apparatus are provided for non-deterministic incremental program replay using checkpoints and syndrome tracking. According to one aspect of the invention, replay of a program proceeds by, for a given execution of the program, recording one or more checkpoints of the program, the one or more checkpoints containing program state information; and a recorded list of values for one or more identified variables executing in one or more threads of the program. Thereafter, during a replay execution of the program, the process continues by commencing execution from a particular one of the recorded checkpoints; restoring the program state information associated with the particular one of the recorded checkpoints; comparing an observed list of values to the recorded list of values for the one or more identified variables executing in each of the one or more threads; and identifying a difference between the observed list of values and the recorded list of values.
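The record-and-compare flow summarized above can be illustrated with a minimal Python sketch. The class `Recording`, the function `first_difference`, and the use of a deep-copied dictionary as "program state" are illustrative assumptions, not part of the described apparatus; a real checkpoint would also have to capture memory, thread stacks, and I/O state.

```python
import copy
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

class Recording:
    """Primary-run record: checkpoints plus per-thread value lists (hypothetical design)."""

    def __init__(self) -> None:
        self.checkpoints: List[Dict[str, Any]] = []
        # (checkpoint interval, thread, variable) -> ordered list of recorded values
        self.values: Dict[Tuple[int, str, str], List[Any]] = defaultdict(list)

    def checkpoint(self, state: Dict[str, Any]) -> int:
        # Deep copy so later mutation of the live state cannot alter the snapshot.
        self.checkpoints.append(copy.deepcopy(state))
        return len(self.checkpoints) - 1

    def record(self, thread: str, var: str, value: Any) -> None:
        # A value belongs to the interval that began at the most recent checkpoint.
        self.values[(len(self.checkpoints) - 1, thread, var)].append(value)

    def restore(self, index: int) -> Dict[str, Any]:
        # Return a fresh copy so the replay cannot corrupt the stored checkpoint.
        return copy.deepcopy(self.checkpoints[index])


def first_difference(recording: Recording, index: int,
                     observed: Dict[Tuple[str, str], List[Any]]
                     ) -> Optional[Tuple[str, str, int]]:
    """Compare values observed while replaying interval `index` with the recording.

    Returns (thread, variable, position) of the first mismatch, or None if all match.
    """
    keys = {key[1:] for key in recording.values if key[0] == index} | set(observed)
    for thread, var in sorted(keys):
        rec = recording.values.get((index, thread, var), [])
        obs = observed.get((thread, var), [])
        for pos in range(max(len(rec), len(obs))):
            # A missing entry on either side (extra or absent change) also counts.
            if pos >= len(rec) or pos >= len(obs) or rec[pos] != obs[pos]:
                return thread, var, pos
    return None
```

In this sketch, instrumentation in the recorded run would call `checkpoint` and `record`; a replay restores a snapshot with `restore` and passes the values it observes to `first_difference`.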
The observed list of values can comprise, for example, before and after values for each value change for each of the one or more identified variables. The observed list of values can be stored as an ordered list of value changes for each of the one or more identified variables executing in the one or more threads of the program. The recorded list of values for the one or more identified variables can be obtained for a determined set of recorded threads of the program, in which case the replay execution of the program comprises replaying the determined set of recorded threads. The comparing step can be performed for each of the threads, for each value change, to compare the before and after values of that change.
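The before/after representation and the per-thread comparison described above might look like the following sketch. `ValueChange`, `compare_threads`, and the nested-dictionary log layout are hypothetical; because equality of two `ValueChange` records compares both fields, a change that starts from a different value is flagged even if it ends at the same one.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class ValueChange:
    """One write to an identified variable: the value before and after it."""
    before: Any
    after: Any

# Hypothetical log layout: thread -> variable -> ordered list of changes.
Log = Dict[str, Dict[str, List[ValueChange]]]
Mismatch = Tuple[str, str, int, Optional[ValueChange], Optional[ValueChange]]

def compare_threads(recorded: Log, observed: Log, threads: List[str]) -> Optional[Mismatch]:
    """Walk each thread's changes in order and report the first divergence.

    Only threads in `threads` (the determined set of recorded threads) are checked.
    Returns (thread, variable, position, recorded_change, observed_change) or None.
    """
    for thread in threads:
        rec_vars = recorded.get(thread, {})
        obs_vars = observed.get(thread, {})
        for var in sorted(set(rec_vars) | set(obs_vars)):
            rec = rec_vars.get(var, [])
            obs = obs_vars.get(var, [])
            for pos in range(max(len(rec), len(obs))):
                r = rec[pos] if pos < len(rec) else None
                o = obs[pos] if pos < len(obs) else None
                if r != o:  # compares both the before and the after value
                    return thread, var, pos, r, o
    return None
```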
According to a further aspect of the invention, a perturbation or suspend statement can be introduced into the replay execution of the program. In another aspect of the invention, where a plurality of threads in the program are inter-dependent, the inter-dependent threads are grouped into a partition, and the program threads in each partition are replayed separately until a successful execution is achieved.
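A perturbation or suspend statement could, for instance, be realized as a hook that instrumented code calls at every value change during replay. The `InjectionHook` class below is an illustrative assumption: a suspend point blocks the calling thread on a `threading.Event` until it is released (for example by a debugger), and a perturbation sleeps for a random, seeded interval.

```python
import random
import threading
import time
from typing import Dict, Optional, Tuple

Point = Tuple[str, str, int]  # (thread, variable, index of the value change)

class InjectionHook:
    """Hypothetical hook that instrumented code calls at each value change during replay."""

    def __init__(self, max_delay_s: float = 0.0, seed: Optional[int] = None) -> None:
        self._suspend: Dict[Point, threading.Event] = {}
        self._max_delay_s = max_delay_s
        self._rng = random.Random(seed)
        self._lock = threading.Lock()

    def suspend_at(self, thread: str, variable: str, change_index: int) -> None:
        # Register a suspend point; it stays closed until resume() is called.
        self._suspend[(thread, variable, change_index)] = threading.Event()

    def resume(self, thread: str, variable: str, change_index: int) -> None:
        self._suspend[(thread, variable, change_index)].set()

    def on_change(self, thread: str, variable: str, change_index: int) -> None:
        if self._max_delay_s > 0:
            with self._lock:  # draw from the shared seeded generator one caller at a time
                delay = self._rng.uniform(0, self._max_delay_s)
            time.sleep(delay)  # perturbation: shifts this thread's timing
        event = self._suspend.get((thread, variable, change_index))
        if event is not None:
            event.wait()  # suspend statement: block until resume() sets the event
```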
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
The present invention provides methods and apparatus for application execution replay. In particular, the present invention provides methods and apparatus for non-deterministic incremental program replay using checkpoints and syndrome tracking. The present invention may be employed for program debugging or program replay. More particularly, the present invention may be employed to recreate a program execution for debugging, and even more particularly to recreate executions influenced by non-determinism, for example, due to thread scheduling and the influence of computer system load.
According to one aspect of the invention, data values of a selected set of data variables in an application are recorded for a specified set of threads, at various points in each thread. A recording of the values of these data states is made during a primary run. Secondary runs of the application are then made in which, for the corresponding threads, the data values of the data variables at specified program execution locations are compared to the values recorded in the primary run. If a variance in values is detected, an event is emitted requesting further action. In some embodiments, for example, the response to this event includes program re-execution. In other embodiments, the response to this event may be the halting of the execution by a debugger controlling the application, for further exploration.
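The primary-run/secondary-run check and the emitted event can be sketched as an observer. `VarianceMonitor`, `VarianceEvent`, `NOT_RECORDED`, and the string labels used for execution locations are hypothetical; subscribed listeners stand in for the responses mentioned above, such as scheduling a re-execution or asking a debugger to halt.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

NOT_RECORDED = object()  # marker: the primary run has no value for this visit

@dataclass(frozen=True)
class VarianceEvent:
    thread: str
    location: str  # hypothetical label for an execution location, e.g. "worker.py:42"
    variable: str
    recorded: Any
    observed: Any

Key = Tuple[str, str, str]  # (thread, location, variable)

class VarianceMonitor:
    """Compares secondary-run values with primary-run values at instrumented locations."""

    def __init__(self, primary: Dict[Key, List[Any]]) -> None:
        self._primary = primary  # values seen on successive visits in the primary run
        self._visits: Dict[Key, int] = {}
        self._listeners: List[Callable[[VarianceEvent], None]] = []

    def subscribe(self, listener: Callable[[VarianceEvent], None]) -> None:
        self._listeners.append(listener)

    def check(self, thread: str, location: str, variable: str, value: Any) -> bool:
        # Count visits per key, so the n-th secondary visit matches the n-th primary one.
        key = (thread, location, variable)
        visit = self._visits.get(key, 0)
        self._visits[key] = visit + 1
        history = self._primary.get(key, [])
        expected = history[visit] if visit < len(history) else NOT_RECORDED
        if expected is not NOT_RECORDED and expected == value:
            return True
        event = VarianceEvent(thread, location, variable, expected, value)
        for listener in self._listeners:
            listener(event)  # e.g. schedule re-execution, or ask the debugger to halt
        return False
```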
According to another aspect of the invention, the state recording mentioned above occurs between application checkpoints that provide sufficient information to restart an application at various points of execution. This facilitates the use of the invention for long-running and complex applications because, when the detection event is emitted, replay can recommence from a prior checkpoint.
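Choosing where to recommence reduces to finding the latest checkpoint that precedes the detected variance. The sketch assumes, hypothetically, that each checkpoint is tagged with the number of value changes completed when it was written, and that a variance is reported as the 0-based index of the diverging change; `restart_checkpoint` is an illustrative name.

```python
import bisect
from typing import List

def restart_checkpoint(checkpoint_steps: List[int], diverging_change: int) -> int:
    """Return the index of the latest checkpoint written before `diverging_change` ran.

    checkpoint_steps[i] is the number of changes completed when checkpoint i was
    written (sorted ascending). A checkpoint at step s precedes change k iff s <= k.
    """
    i = bisect.bisect_right(checkpoint_steps, diverging_change) - 1
    if i < 0:
        raise ValueError("no checkpoint precedes the diverging change")
    return i
```

With checkpoints at steps 0, 100, and 250, a divergence at change 180 restarts from checkpoint 1, so only the changes after step 100 are re-executed.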
According to another aspect of the invention, for any thread recording, the threads may be partitioned into execution groups whose executions are independent of each other, as determined by the application's design. In this aspect, the replay may proceed by running each group separately from the others. In this manner, the replay is more granular, potentially allowing faster replay.
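If the application's design identifies which threads interact, the execution groups can be computed as connected components, for example with a union-find structure. The function `partition_threads` and its `dependencies` input are assumptions for illustration; any grouping method that keeps interacting threads together would serve.

```python
from typing import Dict, Iterable, List, Set, Tuple

def partition_threads(threads: Iterable[str],
                      dependencies: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group threads so that dependent threads share a group (union-find).

    `dependencies` is a hypothetical design-time list of thread pairs that interact
    (shared variables, locks, messages). Threads in different groups are independent.
    """
    deps = list(dependencies)
    parent: Dict[str, str] = {t: t for t in threads}
    for a, b in deps:  # threads mentioned only in dependencies are included too
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for a, b in deps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for t in list(parent):
        groups.setdefault(find(t), set()).add(t)
    # Deterministic order so replays of the groups are scheduled reproducibly.
    return sorted(groups.values(), key=sorted)
```

Each returned group could then be replayed on its own, from the same checkpoint, without the others.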
Perturbation threads can be instantiated that could impose further non-determinism on the application, thereby increasing the likelihood of uncovering further application defects.
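A perturbation thread might simply alternate short CPU bursts with random pauses so that the scheduler interleaves the application's threads differently from run to run. This Python sketch is illustrative only; `start_perturbation_threads` and its parameters are hypothetical, and in CPython the global interpreter lock limits how strongly such threads perturb pure-Python code.

```python
import random
import threading
import time
from typing import List

def start_perturbation_threads(count: int, stop: threading.Event,
                               max_pause_s: float = 0.005,
                               seed: int = 0) -> List[threading.Thread]:
    """Start `count` daemon threads that add scheduling noise until `stop` is set."""

    def perturb(rng: random.Random) -> None:
        while not stop.is_set():
            deadline = time.perf_counter() + rng.uniform(0, max_pause_s)
            while time.perf_counter() < deadline:
                pass  # busy-wait burst competes with application threads for the CPU
            stop.wait(rng.uniform(0, max_pause_s))  # random pause; returns early on stop

    threads = []
    for i in range(count):
        # Per-thread seeded generator so a given perturbation pattern can be repeated.
        t = threading.Thread(target=perturb, args=(random.Random(seed + i),),
                             daemon=True, name=f"perturb-{i}")
        t.start()
        threads.append(t)
    return threads
```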
Referring now to the drawings, and more particularly to
The processing system 100 optionally presents information to the user on display 107, which is coupled to the data processor 101. A user data entry device 108 (e.g., a keyboard or another interactive device) and a pointing device 109, for example, a mouse or a trackball, are also optionally coupled to the data processor 101.
The display 107 can provide a presentation space for the IDE (Integrated Development Environment) in order to display information related to the program replay. In further embodiments, either the pointing device 109 or predefined keys of the data entry device 108 may be used to manipulate the data in conformity with aspects of the present invention.
It is also contemplated that a persistent storage mechanism 110 may exist and be utilized to store application programs 105 and data 106. Such storage media may include, but are not limited to, standard disk drive technology, tape, or flash memory. The program information 106 may be stored on the persistent media and/or retrieved from it by a similar processing system 100 for execution.
Again referring to
A mismatch 1011 is an indicator of non-determinism. Upon detecting a mismatch, a debugging tool can optionally be launched to fix or understand the change in value between the observed and recorded executions. Alternatively, the replay can be restarted at the previous successful checkpoint, repeatedly if necessary, until a successful execution is completed. Generally, the present invention allows a user to reach a critical point in program execution for further analysis, bypassing intermediate discrepancies. It is noted that all threads should execute successfully between consecutive checkpoints before execution proceeds beyond the next checkpoint.
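The retry policy described above (replay each checkpoint interval, restart from the last successful checkpoint on a mismatch, and advance only when every thread matched) can be sketched as a driver loop. `replay_all`, the `replay_interval` callback, and the `max_attempts` bound are assumptions; the text itself does not bound the number of restarts.

```python
from typing import Callable, Optional

def replay_all(num_intervals: int,
               replay_interval: Callable[[int], bool],
               max_attempts: int = 10,
               on_mismatch: Optional[Callable[[int, int], None]] = None) -> int:
    """Replay interval by interval; retry an interval from its starting checkpoint on mismatch.

    `replay_interval(i)` is a hypothetical callback that restores checkpoint i, runs all
    recorded threads up to checkpoint i + 1, and returns True only if every thread's
    observed values matched the recording. Returns the number of intervals completed.
    """
    for i in range(num_intervals):
        for attempt in range(1, max_attempts + 1):
            if replay_interval(i):
                break  # all threads matched; safe to move past checkpoint i + 1
            if on_mismatch is not None:
                on_mismatch(i, attempt)  # e.g. hand control to a debugging tool
        else:
            return i  # gave up: interval i never replayed without a mismatch
    return num_intervals
```

Bounding the attempts is a design choice for the sketch: a strongly timing-dependent interval might otherwise be retried indefinitely.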
System and Article of Manufacture Details
As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic media or height variations on the surface of a compact disk.
The computer systems and servers described herein each contain a memory that will configure associated processors to implement the methods, steps, and functions disclosed herein. The memories could be distributed or local and the processors could be distributed or singular. The memories could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term “memory” should be construed broadly enough to encompass any information able to be read from or written to an address in the addressable space accessed by an associated processor. With this definition, information on a network is still within a memory because the associated processor can retrieve the information from the network.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
Claims
1. A method for replaying a program, said method comprising the steps of:
- recording for a given execution of said program:
- one or more checkpoints of said program, said one or more checkpoints containing program state information; and
- a recorded list of values for one or more identified variables executing in one or more threads of said program; and
- during a replay execution of said program:
- commencing execution from a particular one of said recorded checkpoints;
- restoring said program state information associated with said particular one of said recorded checkpoints;
- comparing an observed list of values to said recorded list of values for said one or more identified variables executing in each of said one or more threads; and
- identifying a difference between said observed list of values and said recorded list of values.
2. The method of claim 1, wherein said observed list of values comprises an ordered list of value changes for each of said one or more identified variables executing in said one or more threads of said program.
3. The method of claim 1, wherein said observed list of values comprises before and after values for each value change for each of said one or more identified variables.
4. The method of claim 3, wherein said observed list of values comprises a time-stamp for each of said value changes.
5. The method of claim 1, wherein said recorded list of values for one or more identified variables is obtained for a determined set of recorded threads of said program and wherein said replay execution of said program comprises replaying said determined set of recorded threads.
6. The method of claim 1, wherein said comparing step is performed for each of said threads for each value change to compare before and after values for each value change.
7. The method of claim 1, wherein said identifying step further comprises the steps of launching a debugging tool or restarting said replay at a previous successful checkpoint.
8. The method of claim 1, further comprising the step of introducing a perturbation or suspend statement into said replay execution of said program.
9. The method of claim 1, wherein a plurality of threads in said program are inter-dependent and are partitioned into a partition and said method further comprising the step of replaying program threads in each of said partitions separately until a successful execution.
10. The method of claim 1, wherein said recorded list of values comprises a new value for said one or more identified variables following a value change.
11. A system for replaying a program, comprising:
- a memory; and
- at least one processor, coupled to the memory, operative to:
- record for a given execution of said program:
- one or more checkpoints of said program, said one or more checkpoints containing program state information; and
- a recorded list of value changes for one or more identified variables executing in one or more threads of said program; and
- during a replay execution of said program:
- commence execution from a particular one of said recorded checkpoints;
- restore said program state information associated with said particular one of said recorded checkpoints;
- compare an observed list of values to said recorded list of values for said one or more identified variables executing in each of said one or more threads; and
- identify a difference between said observed list of values and said recorded list of values.
12. The system of claim 11, wherein said observed list of values comprises an ordered list of value changes for each of said one or more identified variables executing in said one or more threads of said program.
13. The system of claim 11, wherein said observed list of values comprises before and after values for each value change for each of said one or more identified variables.
14. The system of claim 11, wherein said recorded list of values for one or more identified variables is obtained for a determined set of recorded threads of said program and wherein said replay execution of said program comprises replaying said determined set of recorded threads.
15. The system of claim 11, wherein said comparison is performed for each of said threads for each value change to compare before and after values for each value change.
16. The system of claim 11, wherein said processor is further configured to introduce a perturbation or suspend statement into said replay execution of said program.
17. The system of claim 11, wherein a plurality of threads in said program are inter-dependent and are partitioned into a partition and said processor is further configured to replay program threads in each of said partitions separately until a successful execution.
18. The system of claim 11, wherein said recorded list of values comprises a new value for said one or more identified variables following a value change.
19. An article of manufacture for replaying a program, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- recording for a given execution of said program:
- one or more checkpoints of said program, said one or more checkpoints containing program state information; and
- a recorded list of values for one or more identified variables executing in one or more threads of said program; and
- during a replay execution of said program:
- commencing execution from a particular one of said recorded checkpoints;
- restoring said program state information associated with said particular one of said recorded checkpoints;
- comparing an observed list of values to said recorded list of values for said one or more identified variables executing in each of said one or more threads; and
- identifying a difference between said observed list of values and said recorded list of values.
20. The article of manufacture of claim 19, wherein said observed list of values comprises before and after values for each value change for each of said one or more identified variables.
Type: Application
Filed: Aug 21, 2006
Publication Date: Feb 21, 2008
Applicant: International Business Machines Corporation (Armonk, NY)
Inventors: Wim De Pauw (Scarborough, NY), Donald P. Pazel (Montrose, NY)
Application Number: 11/507,166
International Classification: G06F 9/44 (20060101);