STORAGE OF SOFTWARE EXECUTION DATA BY BEHAVIORAL IDENTIFICATION
Methods and systems for analyzing software. For example, one method can include executing a software program including a function by a computer. The method also includes producing an execution sequence for the function when, during execution, the software program executes the function. The method further includes generating an identifier for the execution sequence, wherein the identifier uniquely identifies a path of execution through the function represented by the execution sequence. In addition, the method includes saving the identifier and making the identifier available to at least one user through a user interface.
This application is a continuation-in-part of U.S. application Ser. No. 13/428,572, filed on Mar. 23, 2012, which claims priority to U.S. Provisional Application No. 61/466,818, filed on Mar. 23, 2011; the entire contents of these applications are hereby incorporated by reference. This application is also a continuation-in-part of U.S. patent application Ser. No. 13/428,597, filed on Mar. 23, 2012, which claims priority to U.S. Provisional Application Ser. No. 61/466,828, filed on Mar. 23, 2011; the entire contents of these applications are hereby incorporated by reference.
FIELD
Embodiments of the present invention relate to developing and analyzing computer software. For example, embodiments of the invention provide methods and systems for identifying unique behaviors of a software execution sequence, storing the unique behaviors, and using and/or exporting the stored unique behaviors to assess the computer software.
BACKGROUND
Software is created from source code that is written by software developers. In the process of writing software, many defects are unintentionally introduced into the software code. These defects are generally referred to as “bugs,” and can be very difficult to isolate and understand using existing tools and methods. Accordingly, defect-free computer software has always been difficult to create. In all but a few instances, computer software is knowingly released with many residual defects that are too elusive or subtle to remove economically.
Consider, for example, the following small software function:
From initial inspection, this function might be expected to behave in only four possible ways (i.e., one path for each “case” statement reached by evaluating argument “z”). However, the function has additional behaviors that can be difficult to detect. For example, there is no “default” condition for the “switch” statement. Therefore, if the value of argument “z” is something other than 0, 1, 2, or 3, no case statement will be reached, the “switch” statement will fall through, and the function will return 0. The effects of this defect can range from benign to catastrophic. Similarly, if the sum of arguments “x” and “y” results in a value of 0 when argument “z” is set to 1, the result will be a divide-by-zero exception (see “case 1”), which is generally viewed as a catastrophic error condition. Also, if argument “y” is greater than 31 when argument “z” is 2, the overflow of the shift operation will cause the return value to be 0 or −1 regardless of the value of argument “x.” Any of these behaviors can be very difficult to detect using conditional-capture methods. The effects of these unwanted behaviors can also be so catastrophic (such as a system reset) that they eradicate the evidence of the cause of the error. Conversely, in some situations, the effects of these unwanted behaviors can be so benign that nobody notices that something is incorrect, or they can occur so infrequently that they cannot be reproduced within a reasonable time frame (and, therefore, cannot be properly debugged using traditional software debugging tools). Note that the above example function is simple and used solely for illustration purposes. In real software development, functions are typically more complex, which leads to more potential behaviors (both wanted and unwanted). This complexity further complicates the debugging process.
SUMMARY
When considering the task of discovering the root cause of a software defect, all of the answers are in the computer chip or system. In particular, the computer chip or system contains the cause of every bug, the value of every program variable, how every line of software actually behaves, and every software vulnerability and optimization opportunity. If it were possible to access and analyze this information in its entirety, then software development could be much easier and result in fewer residual bugs. This superabundance of information is always present in a computer that is running software, yet for much of the computer age this was too much information to export, collect, or process economically. In response, software debugging tools have been designed to limit the export of execution information to a tiny portion of the total available, to give software developers only the information they specifically request using tools and methods of conditional debugging.
For example, software developers traditionally have relied on tools and methods of conditional debugging. Conditional debugging requires software developers to pre-determine a condition or sequence of conditions that must be satisfied in the target computer before enabling the capture of execution data. Examples of conditional debuggers include breakpoint debuggers (where one or more predefined breakpoint conditions are set at fixed locations in the software code to enable data capture), single-step debuggers (wherein program code can be stepped instruction-by-instruction, resulting in manual data capture at instruction boundaries), print debugging (wherein the target software has additional instructions inserted to export data from predetermined locations), and real-time trace debuggers (wherein dedicated circuitry performs the real-time export of software execution data while the computer system is running at full speed, and includes triggering circuitry to enable data capture around a predefined condition or a predefined sequence of conditions).
A shortcoming of conditional debugging is that the developer must know in advance the exact condition around which to capture data for each and every behavior of interest that the software exhibits. For example, a software developer may become aware of a defect or undesirable behavior of the software and begin searching for its cause. Using conditional debugging, the developer can set a breakpoint condition or trigger condition based on the developer's best guess of the possible cause of the incorrect behavior. The software program is then executed until the breakpoint or trigger condition is satisfied. When the condition is satisfied, execution data is collected. However, the collected execution data may not reveal the underlying cause of the incorrect behavior. In many situations, the developer must modify the breakpoint or trigger condition to match the conditions of the incorrect behavior more closely, and repeats this process until the defect is located. This iterative process can take hours or days to complete and typically results in the correction of just one software defect.
These forms of conditional debugging are highly intrusive. In particular, these techniques can alter the flow of program execution enough to make the original problems non-reproducible during debugging. Furthermore, these methods are built on the premise that a software developer will search for the cause of one known, reproducible bug at a time. Searching for one bug at a time requires the developer to first make an educated guess about where a particular defect originates (i.e., to set a breakpoint, trigger, or other mechanism to capture the exact portion of execution data that contains evidence of the cause of the present problem). This search for defects is usually iterative, since the causes of software errors are often not easy to determine, and a series of iterations can add up to a long time to find and correct just one error, particularly if the error has a low recurrence rate or is otherwise difficult to reproduce. Furthermore, these debugging techniques may only help a developer isolate software defects that the developer becomes aware of through external symptoms. Defects with subtle symptoms or very low recurrence rates can often elude detection through the entire development process and end up shipping with the final product.
Breakpoint debuggers and other traditional conditional debugging tools are rooted in a past era wherein technical limitations prevented the economic export and capture of the vast amount of information available on the computer chip. A recent development is the real-time trace (“RTT”) port, such as ARM ETM, MIPS PDTrace, and IEEE/ISTO Nexus-5001, which consists of specialized logic added to a computer system to non-intrusively export the vast amount of execution data present in the computer as it runs at full speed. Because these RTT ports are capable of exporting very large quantities of data, and as an aid to conditional debugging methods, they generally include condition-detection logic to signal that a pre-defined triggering event or sequence has occurred, which is then used to indicate that the exported data should be captured for analysis, either in an in-system buffer or by an external system.
Accordingly, software debuggers using RTT have been developed with a similar mindset as breakpoint debuggers. For example, the debuggers are used to capture a relatively small quantity of data around a pre-defined event or sequence. These RTT debuggers offer a similar set of features as their conditional debugger predecessors: breakpoints, single-stepping, examining variables, etc. using the data that has been captured from the RTT port.
Recent improvements in RTT debuggers involving the collection of larger quantities of real-time trace data show some promise as a more effective means of software debugging. These systems use fixed-size buffers of up to 4 gigabytes for high-bandwidth collection of several seconds of execution data, or employ spool-to-disk methods for low-bandwidth execution data collection over extended periods. The captured data can then be analyzed to obtain profiling or code coverage information, or replayed as though debugging a live computer target with a conditional debugger. For example, Lauterbach GmbH's “Real-time Streaming (ETMv3)” technology performs extended-duration recording of real-time trace data and creates profiling and code coverage summaries on-the-fly. Execution profiling and code coverage are useful and have been available for many years, but neither will detect the unique individual behaviors of the called functions. Correct and incorrect behaviors are included ambiguously in the profiling and coverage summaries, just like any other function-behavioral iteration. In short, these enhancements continue to rely on the developer to manually locate any behavioral anomalies. This crucial shortcoming is inherent in all conditional debuggers: they do not detect variations in the behavior of the software, nor do they use such variations as a basis for data collection.
As newly written software typically contains many defects, the process of debugging and testing can take an unpredictably long period of time to complete and can account for 80% of the total cost of software development. This has made computer software the most expensive and unpredictable component in many of the intelligent, connected devices that utilize computer software for enhanced functionality. These difficulties have remained remarkably constant for decades, despite continuing advances and repeated “breakthroughs” in software debugging technology.
Remembering that the answer to every software defect is on the computer chip, conditional debugging methods are hindering developers from getting the answers they need and are the direct cause of the high costs, unpredictable schedules, and poor resulting quality in software development.
Accordingly, embodiments of the present invention provide means to uniquely identify software behaviors at the point where execution information is most abundant: inside the computer system. When implemented inside the computer system, this approach reserves the limited capacity of conventional debug collection or export facilities exclusively for unique software behaviors. Given sufficient capacity for behavior identifiers and execution data export or capture, continuous software behavioral analysis and behavioral anomaly capture can be accomplished for entire software programs and multi-program systems. The identification can also be implemented external to the target computer system, using execution data received from a high-capacity RTT port or other resources. Both implementations improve the software development process by eliminating the need for conditional debugging and by enabling a more rigorous approach to software quality, providing a means to individually review and approve-or-improve every unique behavior exhibited by every software function.
Therefore, embodiments of the present invention provide methods and systems for identifying behavioral uniqueness of software execution sequences as a basis for collection and/or export of software execution data and related information. One method can include executing a software program and continuously producing a sequence of execution information. The method can also include determining if the execution information is within a functional boundary of the software program, and determining if the execution sequence of the execution information is a new execution sequence or a repeat execution sequence.
One system can include a functional boundary detector for continuously analyzing execution information of a software program to determine if the execution information is within a functional boundary of said software program, an execution behavior identification number generator to create unique behavioral identifiers for unique execution sequences, and a comparator for determining if an execution sequence of the execution information is a new execution sequence or a repeat execution sequence and producing a unique detection signal if a new execution sequence is detected. Therefore, the system identifies behavioral uniqueness of software execution sequences.
In particular, embodiments of the present invention provide methods and systems for analyzing and debugging a software program. In one embodiment, a method for processing software includes executing a software program including a function, by a computer. The method also includes producing an execution sequence of the function when the software program executes the function. In addition, the method includes generating an identifier for the execution sequence, saving the identifier, and making the identifier available to at least one user through a user interface. The identifier uniquely identifies a path of execution through the function represented by the execution sequence.
The method can also include accessing at least one data storage medium storing previously-generated identifiers associated with functions of the software program, and comparing the identifier to the previously-generated identifiers to determine whether the identifier is already stored in the at least one data storage medium. The operation of saving the identifier can include saving the identifier when the identifier is not already stored in the at least one data storage medium. The method can further include incrementing a count value associated with the identifier when the identifier is already stored in the at least one data storage medium. The function includes a defined function or a specific code segment with sequential code instructions. The count value reflects how often the execution sequence is executed: a high count value represents a frequently executed sequence, while a low count value identifies an infrequently executed sequence, which may represent an execution sequence with an error. In an example, the identifier for the execution sequence includes a sum of operational code hash values or conditional execution instruction hash values for the execution sequence.
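As a minimal sketch of the identifier and count bookkeeping described above, the following Python fragment sums per-instruction hash values modulo 2^32 and keeps an identifier-to-count table. The hash function (the first four bytes of SHA-256), the 32-bit width, and the names instruction_hash, sequence_identifier, and IdentifierStore are assumptions for illustration only, not the claimed implementation.

```python
import hashlib

MASK32 = 0xFFFFFFFF  # assumed 32-bit identifier width


def instruction_hash(address: int, opcode: str) -> int:
    """Hypothetical canonical hash of one executed instruction.

    The address is mixed in so that identical opcodes at different
    locations contribute different values to the sum.
    """
    digest = hashlib.sha256(f"{address:#x}:{opcode}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def sequence_identifier(sequence) -> int:
    """Sum the per-instruction hashes of one execution sequence (mod 2**32)."""
    total = 0
    for address, opcode in sequence:
        total = (total + instruction_hash(address, opcode)) & MASK32
    return total


class IdentifierStore:
    """Stand-in for the data storage medium: identifier -> occurrence count."""

    def __init__(self):
        self.counts = {}

    def record(self, sequence) -> tuple[int, bool]:
        """Return (identifier, is_new) for one completed execution sequence."""
        ident = sequence_identifier(sequence)
        if ident in self.counts:
            self.counts[ident] += 1  # repeat sequence: only increment the count
            return ident, False
        self.counts[ident] = 1       # new sequence: save the identifier
        return ident, True


store = IdentifierStore()
path_a = [(0x100, "cmp r0, #0"), (0x104, "bgt 0x110"), (0x108, "mov r0, #10")]
path_b = [(0x100, "cmp r0, #0"), (0x104, "bgt 0x110"), (0x110, "mov r0, #25")]
for path in (path_a, path_a, path_b):
    ident, is_new = store.record(path)
    print(f"{ident:08x}", "new" if is_new else "repeat")
```

Because addition is commutative, a plain sum of opcode hashes gives the same contribution for an opcode wherever it executes; mixing the instruction address into each hash, as assumed here, keeps identical opcodes at different locations from contributing identical values. A sum can still collide by chance, so a practical system might choose a wider identifier.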
In another configuration, the method can further include executing a second function in the software program when encountering a function call, a call stack, a context switch, a switch statement, a branch point, or a conditional execution instruction. The next operations can include producing a second execution sequence of the second function, generating a second identifier for the second execution sequence, where the second identifier uniquely identifies a path of execution through the second function represented by the second execution sequence, and saving the second identifier when the second identifier is not already stored in the at least one data storage medium. The software program can include multiple distinct functions.
In another configuration, the method can further include generating a hash table of identifiers associated with functions of the software program, wherein each identifier includes a hash value; counting the number of times each execution sequence, as represented by its identifier, is encountered during execution of the software program and associating the count with the corresponding identifier; and displaying the hash table of identifiers and counts associated with functions of the software program. The hash table can improve the accessibility and visualization of the identifiers and execution sequences of the software program. The method can include selecting the identifier, identifying source code or function variables representing the execution sequence of the function, and displaying the identifier with a link to the source code or function variables representing the execution sequence of the function. The method can also include identifying source code or function variables representing the execution sequence of the function, and either saving the identifier with a link to the source code or function variables representing the execution sequence, or saving the identifier with the source code or the values of the function variables representing the execution sequence. Linking the identifier to the source code in the source file can allow a user to quickly replay, analyze, or visualize the source file for an identifier.
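One way to present such a table is sketched below, assuming that each entry stores its count and a list of hypothetical source locations (such as "sample.c:12", an illustrative file name). Entries are sorted rarest-first, since infrequently executed sequences are the ones noted above as possible errors. BehaviorEntry, record, and render are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorEntry:
    identifier: int
    count: int = 0
    # Hypothetical links back to the source lines that make up this path.
    source_lines: list = field(default_factory=list)


def record(table, identifier, source_lines):
    """Insert a new entry on first sight, then count every occurrence."""
    entry = table.get(identifier)
    if entry is None:
        entry = table[identifier] = BehaviorEntry(identifier, 0, list(source_lines))
    entry.count += 1
    return entry


def render(table):
    """Format the table rarest-first, since rare paths are review candidates."""
    rows = sorted(table.values(), key=lambda e: (e.count, e.identifier))
    return "\n".join(
        f"{e.identifier:08x}  {e.count:>6}  {', '.join(e.source_lines)}" for e in rows
    )


table = {}
record(table, 0x12345678, ["sample.c:4", "sample.c:5"])
record(table, 0x12345678, ["sample.c:4", "sample.c:5"])
record(table, 0x0000ABCD, ["sample.c:4", "sample.c:7"])
print(render(table))
```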
Another embodiment of the invention provides a system for processing software that can include a processor and at least one data storage medium. The processor is configured to execute a software program comprising a function, produce an execution sequence of the function during execution of the function, and generate an identifier for the execution sequence. The identifier uniquely identifies a path of execution through the function represented by the execution sequence. The at least one data storage medium is configured to save the identifier.
The processor can further be configured to access the at least one data storage medium storing previously-generated identifiers associated with functions of the software program, and compare the identifier to the previously-generated identifiers to determine whether the identifier is already stored in the at least one data storage medium. The processor is further configured to save the identifier when the identifier is not already stored in the at least one data storage medium. The system can include a counter configured to increment a count value associated with the identifier when the identifier is already stored in the at least one data storage medium. The function includes a defined function or a specific code segment with sequential code instructions. The identifier for the execution sequence is derived from an arithmetic and/or logic operation on the operational code hash values or conditional execution instruction hash values for the execution sequence. The system can include a data buffer configured to collect execution sequences of functions in real time during the execution of the software program.
Another embodiment of the invention can provide a user interface configured to make unique behavior identifiers available to at least one user. A processor can be configured to generate an index table of identifiers associated with functions of the software program, where each identifier includes an index value. The at least one data storage medium can be configured to save the index table of identifiers. The user interface can be configured to make the index table of identifiers accessible to at least one user.
Other aspects of the invention will become apparent by consideration of the detailed description and accompanying drawings.
The accompanying drawings are incorporated in and constitute a part of the specification. The drawings, together with the general description given above and the detailed description of the exemplary embodiments and methods given below, serve to explain the principles of the invention. The objects and advantages of the invention will become apparent from a study of the following specification when viewed in light of the accompanying drawings.
TABLE 1 illustrates computer instructions for the sample software function of
TABLES 2A, 2B, and 2C illustrate execution of instructions in TABLE 1.
TABLE 3 illustrates implementation of a software-only solution using the sample software function of
TABLE 4 illustrates the effect of interrupts or exceptions on the processing of instruction trace compression.
DETAILED DESCRIPTION
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways.
Reference will now be made in detail to exemplary embodiments and methods of the invention as illustrated in the accompanying drawings, in which like reference characters designate like or corresponding parts throughout the drawings. It should be noted, however, that the invention in its broader aspects is not limited to the specific details, representative devices and methods, and illustrative examples shown and described in connection with the exemplary embodiments and methods.
This description of exemplary embodiments is intended to be read in connection with the accompanying drawings, which are to be considered part of the entire written description. The word “a” as used in the claims means “at least one” and the word “two” as used in the claims means “at least two.”
As used herein, a behavioral identifier, a captured execution behavior, compressed behavioral data, or an execution path identifier can refer to an identifier. A functional boundary can refer to the beginning or end of an execution sequence for the function. Execution information or execution path can refer to the execution sequence for a single function or multiple functions. A function can include a defined function that has a function initialization and a return value at the completion of the function. Alternatively, the function can include a specific code segment that can be grouped or blocked together because of the sequential nature of the code instructions or the repeatability of the code instructions. For example, a function can include a specific code segment that is manually defined by a user or that is automatically identified (e.g., based on a predefined number of instructions or the location of particular types of instructions, such as breaks, returns, repeats, etc.).
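The automatic identification of code segments mentioned above could, for example, close a segment at a return-type instruction or after a predefined number of instructions. The sketch below assumes mnemonic strings as input; BOUNDARY_OPCODES and max_len are hypothetical choices, not values taken from this description.

```python
# Hypothetical mnemonics treated as the end of a code segment.
BOUNDARY_OPCODES = {"ret", "bx lr"}


def segment(instructions, max_len=64):
    """Split a flat instruction stream into function-like segments.

    A segment ends after a boundary opcode or once it holds max_len
    instructions, mirroring the two automatic rules mentioned above.
    """
    current = []
    for insn in instructions:
        current.append(insn)
        if insn in BOUNDARY_OPCODES or len(current) >= max_len:
            yield current
            current = []
    if current:  # trailing partial segment at the end of the stream
        yield current


stream = ["push", "cmp", "bgt", "mov", "ret", "push", "add", "ret", "nop"]
for seg in segment(stream):
    print(seg)
```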
As noted above, embodiments of the present invention provide methods and systems for analyzing and debugging a software program.
As such, embodiments of the present invention provide methods and systems for identifying behavioral uniqueness of software execution sequences. In particular, execution information is continuously analyzed to determine if a behavioral iteration of the computer program is unique or merely a repeat of previously-observed behavior. When a unique behavior is detected, the data of interest is captured, stored, and indexed by a behavioral identifier. The input data used to create a behavioral identifier can include, but is not limited to, execution trace data, program variables, execution timing, and related signals, conditions, and events. These data values are progressively combined into a behavioral identifier as the program executes and are exported at software functional boundaries to be evaluated for uniqueness. Using the example software function described above in the Background section, embodiments of the present invention uniquely identify the four case statements (i.e., cases 0-3) and the three additional behaviors (i.e., the missing-default fall-through, divide-by-zero, and shift-overflow conditions) discussed above, if those behaviors actually occur during execution. A software developer could then review the collected behaviors at their leisure to determine whether each behavior is correct or incorrect.
Using the behavioral capture method described above provides benefits over conditional capture methods. For example, software developers no longer have to set conditional breakpoints or triggers in an iterative attempt to capture evidence of one incorrect software behavior after another. Rather, every behavior is automatically captured the first time it occurs. This nearly eliminates the iterative find-and-fix approach to software bugs, which is commonly one of the most expensive components of software development. In addition, since every behavior is uniquely identified and captured, including incorrect behaviors with otherwise subtle symptoms or low recurrence rates, defects can be corrected as soon as they have occurred once. The result is improved software quality, with very low residual defect rates achievable without undue expense. Furthermore, the identification and capture can be performed on the entirety of the executing software, not just those functions of interest to an individual developer. This enables a software developer to quickly gain an intimate knowledge of unfamiliar code, a process that is very difficult using existing methods.
Additional details regarding the method 100 illustrated in
In contrast,
A CAMData bus provides the host system and debugging tools with read/write access to the contents of the CAM 202. These configuration and access interfaces give the user the option to pre-load the CAM 202 with known-good behavior identifiers, which the system then ignores, so that only unknown behaviors are captured. Similarly, a user can pre-load known-bad behavior identifiers, reserving capture for only these behaviors of interest. Furthermore, these behavior identifiers can be read from the CAM 202 and stored externally for future use. The CAM block 200 can also include an event counter for each behavioral identifier element to indicate the accumulated total number of times each behavior has occurred during a given interval. In some embodiments, the CAM block 200 can also be paired with a secondary cache system that pre-loads the related behavior identifiers for functional sections of a computer program as they are executed in a running system, thereby expanding the coverage of the system by effectively increasing the working size of the CAM 202.
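The CAM behavior described above (learning new identifiers, ignoring preloaded known-good identifiers, or reserving capture for preloaded known-bad identifiers, with a counter per entry) can be modeled in software as follows. This is a behavioral model only, not a hardware design; the class name, the mode strings, and the choice to keep signaling capture for a new identifier once the CAM is full are assumptions.

```python
class BehaviorCAM:
    """Software model of the CAM block: stored identifiers plus per-entry counters.

    mode="ignore_known": preloaded identifiers are known-good; only unseen
    identifiers trigger capture.
    mode="capture_known": preloaded identifiers are known-bad; only they
    trigger capture.
    """

    def __init__(self, capacity, mode="ignore_known", preload=()):
        if mode not in ("ignore_known", "capture_known"):
            raise ValueError(f"unknown mode: {mode}")
        self.capacity = capacity
        self.mode = mode
        self.counters = {ident: 0 for ident in preload}  # per-entry event counters
        self.preloaded = frozenset(self.counters)

    def present(self, ident):
        """Return True when execution data around this behavior should be captured."""
        if ident in self.counters:
            self.counters[ident] += 1
            return self.mode == "capture_known" and ident in self.preloaded
        if self.mode == "capture_known":
            return False  # not one of the behaviors of interest
        if len(self.counters) < self.capacity:
            self.counters[ident] = 1  # learn the new behavior
        return True  # unique behavior: capture it

    def export(self):
        """Read out identifiers and counts for external storage (CAMData-like access)."""
        return dict(self.counters)


cam = BehaviorCAM(capacity=8, preload=[0xA])
print([cam.present(i) for i in (0xA, 0xB, 0xB)], cam.export())
```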
As described above, after storing unique behaviors, the behaviors are made available to users for analysis. For example,
As noted above, embodiments of the invention can use different techniques for generating a unique behavior identifier. For example,
Referring to
An input stream or trace of the software execution information, generally depicted with the reference numeral 411 (e.g., the software instructions, the execution status, the address, and the like), is supplied (e.g., continuously) to an execution path identification creator 412 while the computer system 410 executes a software program. The trace represents an execution path through the software program or a portion thereof. For example, an execution path can be the path through which input data (i.e., the software execution instructions) passes during the period of being processed in operation modules of the computer system 410. In each operation module of the computer system 410, there are typically various branch points so that different input data can pass through different branches at these branch points. The branches through which the input data passes form an execution path of the input data.
The execution path identification creator 412 converts the input stream or trace of the software execution information 411 from the computer system 410 into a stream of encoded data values, each representing the specific path taken through a segment of the executing software. The data values are uniquely created for every specific execution path and serve as behavior identifiers for the executing software program. The stream of encoded data values represents at least one unique execution sequence of the software execution instructions. For example, in one embodiment, the execution path identification creator 412 continuously accesses the execution instructions of the computer software, identifies execution sequences of the software execution instructions, and creates a unique execution path identifier 414 for each of the execution sequences by summing the conditional execution instructions (e.g., their hash values) when the conditional execution instructions are within a functional boundary. Therefore, the execution path identification creator 412 creates a unique execution path identifier 414 representing a compressed unique execution sequence of the execution instructions. The resulting execution path identifier 414 is then available for writing to one or more storage devices 420.
As further illustrated in
As illustrated in
The above-described process continues until the functional boundary is reached in the program image (at block 510). At this point, a resulting sum 416 in the accumulator 418 is exported as a unique, repeatable representation of the behavior of that segment of the software program (at block 512). The accumulator 418 is then reset to a base value to begin accumulating the path identification of the software execution information of the next segment of the software program (e.g., the next function executed by the computer 410). The resulting sum represents an execution path identifier 414.
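The accumulate, export, and reset cycle of blocks 510 and 512 can be sketched as a generator over a simplified trace. The trace record format, the use of zlib.crc32 as the per-instruction hash, and mixing the taken/not-taken outcome into each conditional instruction's hash are assumptions made so the example is self-contained; the description above only requires that conditional execution instructions within a functional boundary be summed.

```python
import zlib

MASK32 = 0xFFFFFFFF
BASE_VALUE = 0  # hypothetical base value the accumulator is reset to


def path_identifiers(events):
    """Yield one execution path identifier per functional segment.

    events is an iterable of hypothetical trace records:
      ("cond", address, taken)  a conditional instruction and its outcome
      ("insn", address)         any other executed instruction (ignored)
      ("boundary",)             end of the current functional segment
    """
    acc = BASE_VALUE
    for event in events:
        kind = event[0]
        if kind == "cond":
            _, address, taken = event
            # Only conditional instructions contribute; the outcome is mixed in
            # so that the taken and not-taken paths through it differ.
            acc = (acc + zlib.crc32(f"{address:x}:{int(taken)}".encode())) & MASK32
        elif kind == "boundary":
            yield acc          # export the sum as the path identifier
            acc = BASE_VALUE   # reset for the next segment


trace = [
    ("insn", 0x100), ("cond", 0x104, True), ("insn", 0x110), ("boundary",),
    ("insn", 0x100), ("cond", 0x104, False), ("insn", 0x108), ("boundary",),
]
print([f"{ident:08x}" for ident in path_identifiers(trace)])
```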
For example,
As illustrated in
Alternatively, if the execution information 411 has a relative value (at block 520), a determination is made whether the current execution address is known to the system 408 (at block 528). If so, then the relative address information is summed with the current known address and the address is looked up in a reference table (at block 526). If not, the next software execution information 411 is obtained (at block 518).
After the address is looked up in the reference table, the corresponding opcode hash is added to the accumulator 418 (at block 530). Then, a determination is made whether the functional boundary has been reached (at block 532). If the boundary has not been reached, the next software execution information 411 is obtained (at block 518). If the boundary has been reached, however, the resulting sum in the accumulator 418 is exported as a unique, repeatable representation of the behavior of that segment of the software program (at block 534). The resulting sum represents the execution path identifier 414. The accumulator 418 is then reset to a base value to begin accumulating the path identification of the software execution information of the next segment of the software program.
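The relative-address path of blocks 518 through 534 can be sketched in the same style. Here a hypothetical reference table, standing in for the pre-computed data described in the next paragraph, maps each address to an opcode hash and a boundary flag, and trace packets are assumed to carry either an absolute address or an offset from the current address. Packet formats in real trace protocols differ; this only illustrates the control flow.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical pre-computed reference table built from the program image:
# address -> (canonical opcode hash, True if this instruction ends a segment)
REFERENCE = {
    0x100: (0x11111111, False),
    0x104: (0x22222222, False),
    0x108: (0x33333333, True),
    0x110: (0x44444444, True),
}


def decode(messages, reference=REFERENCE):
    """Reconstruct addresses from compressed trace packets and accumulate.

    messages are hypothetical packets: ("abs", address) carries a full
    address; ("rel", offset) carries an offset from the current address.
    """
    current = None  # current execution address, unknown until an "abs" packet
    acc = 0
    for kind, value in messages:
        if kind == "abs":
            current = value
        elif current is not None:  # a relative packet needs a known base address
            current += value
        else:
            continue  # address cannot be resolved yet; fetch the next packet
        opcode_hash, is_boundary = reference[current]
        acc = (acc + opcode_hash) & MASK32  # add the opcode hash to the accumulator
        if is_boundary:
            yield acc  # export the execution path identifier
            acc = 0    # reset to the base value


packets = [("rel", 4), ("abs", 0x100), ("rel", 4), ("rel", 4), ("abs", 0x100), ("rel", 0x10)]
print([f"{ident:08x}" for ident in decode(packets)])
```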
In some embodiments, the decoding and gap reconstruction are performed by the above-described flow steps, and their results are used with a reference table to look up the current instruction opcode and the current instruction opcode's pre-computed canonical hash, as well as the pre-computed functional boundaries and locations of conditional instructions. These are then presented to the accumulator 418 as described above.
In some embodiments, the system 408 continuously collects and categorizes execution information, thus imposing no limits on the software developer's visibility into the executing software program.
TABLE 2C illustrates how even a small change in the executing software program results in a change to the execution path identification value for the affected path(s) but may leave other execution paths in the same software program unaffected. In this modified example, the value returned for values of the variable “a” less than 1 has changed from 25 to 24, which represents a small change to the software program. However, the resulting execution path identification value changes from “d4b696cd” to “7146c1b4.” This change only affects the path taken when the value of the variable “a” is greater than “0.” The execution path identifier produced when the variable “a” is less than “0” remains the same as before.
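The locality property described for TABLE 2C can be demonstrated with a toy model. The sketch below does not reproduce the values in TABLE 2C. It assumes two hypothetical instruction paths through one function and a path identifier formed as a 32-bit sum of CRC-32 values, then edits a single constant on one path.

```python
import zlib
from typing import Dict, List

MASK32 = 0xFFFFFFFF


def path_id(instructions: List[str]) -> str:
    # Assumed identifier: 32-bit sum of CRC-32 values, shown as 8 hex digits.
    acc = 0
    for ins in instructions:
        acc = (acc + zlib.crc32(ins.encode())) & MASK32
    return f"{acc:08x}"


def ids_for(paths: Dict[str, List[str]]) -> Dict[str, str]:
    return {name: path_id(seq) for name, seq in paths.items()}


# Two hypothetical paths through one function (branch not taken / taken).
original = {
    "path_A": ["cmp a, #0", "ble else", "mov r0, #25", "ret"],
    "path_B": ["cmp a, #0", "ble else", "else: mov r0, #7", "ret"],
}

# The small edit: change one constant on path_A only (25 -> 24).
modified = {name: list(seq) for name, seq in original.items()}
modified["path_A"][2] = "mov r0, #24"

before, after = ids_for(original), ids_for(modified)
```

Only `path_A`'s identifier changes; `path_B` executes none of the edited instructions, so its identifier is unchanged.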
TABLE 3 illustrates insertion points for a software-only embodiment of the present invention. Using the same sample code from
TABLE 4 and
It should be understood that embodiments of the present invention are amenable to additional compression logic, which can increase the compression of the execution trace data. For example,
Therefore, the present invention provides a novel method and system for compressing software instruction execution trace sequences while simultaneously creating a unique identification for each sequence that is a direct representation of the software's behavior. The method and system of the present invention access information about the executed instructions in a computer system and convert that information into a uniquely representative identification of the specific conditions and execution path taken by a stream of execution.
In particular, embodiments of the present invention access execution trace data of a computer system. This trace data is analyzed to determine program functional boundaries. A behavioral identifier variable is initialized to a base value at the start of a program functional boundary. During execution within a program functional boundary, the execution trace data and other related data of interest are progressively combined with the behavioral identifier variable using arithmetic and/or logical operations until the end of the program functional boundary, at which point the behavioral identifier variable is exported to a behavior uniqueness detector. The behavior uniqueness detector maintains a store of behavioral identifiers against which newly presented behavioral identifiers are compared as a test of uniqueness. If a presented identifier does not exist in the store, the presented identifier is added to the store and a signal is asserted indicating that the behavior is unique and that the associated execution data around and including the unique behavior should be captured and stored in a storage system, such as a database, file system, or similar.
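A behavior uniqueness detector of the kind described above could be as simple as a set membership test. In this hypothetical sketch, an in-memory set stands in for the store of behavioral identifiers, and a caller-supplied `capture` callback stands in for writing the surrounding execution data to a database or file system.

```python
from typing import Callable, List, Set


class BehaviorUniquenessDetector:
    """Hypothetical sketch of a behavior uniqueness detector."""

    def __init__(self, capture: Callable[[int, List[str]], None]) -> None:
        self._store: Set[int] = set()  # stand-in for the identifier store
        self._capture = capture        # stand-in for database or file-system writes

    def present(self, behavior_id: int, context: List[str]) -> bool:
        """Return True (the 'unique' signal) the first time an ID is seen."""
        if behavior_id in self._store:
            return False               # known behavior: nothing to capture
        self._store.add(behavior_id)
        self._capture(behavior_id, context)  # capture data around the new behavior
        return True


# Usage: record captures in a dict keyed by behavioral identifier.
captured = {}
detector = BehaviorUniquenessDetector(lambda bid, ctx: captured.__setitem__(bid, ctx))
```

Presenting the same identifier twice triggers the capture only once, so stored data grows with the number of distinct behaviors rather than with execution time.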
Further according to the present invention, pre-collected execution data is analyzed to create unique behavioral identifiers corresponding to functional boundaries within the target software program. These identifiers can then be used to index the pre-collected data, to eliminate duplicate behavior sequences from the pre-collected execution data, or in the creation of a common index for multiple buffers of pre-collected execution data.
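Indexing and de-duplicating pre-collected execution data by behavioral identifier might look like the following. The sketch assumes each buffer has already been split into segments of (behavioral identifier, raw trace addresses); the segment layout and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[int, List[int]]  # (behavioral ID, raw trace addresses): assumed layout


def build_common_index(buffers: Dict[str, List[Segment]]) -> Dict[int, List[Tuple[str, int]]]:
    """Map each behavioral ID to every (buffer name, segment position) where it occurs."""
    index: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for name, segments in buffers.items():
        for position, (path_id, _raw) in enumerate(segments):
            index[path_id].append((name, position))
    return dict(index)


def deduplicate(buffers: Dict[str, List[Segment]]) -> Dict[int, List[int]]:
    """Keep a single copy of the raw data for each distinct behavior."""
    unique: Dict[int, List[int]] = {}
    for segments in buffers.values():
        for path_id, raw in segments:
            unique.setdefault(path_id, raw)  # first occurrence wins
    return unique
```

Because the same behavior always yields the same identifier, a single index can span buffers captured in separate runs.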
Moreover, the sequence of the behavioral identifiers may be stored in the storage system sequentially as they appear. This enables a continuous reconstruction of the entirety of observed software execution to be created from the data in the storage system.
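Sequential storage of identifiers, together with one stored copy of each distinct path, is enough to rebuild the observed execution in order. The `BehaviorLog` class below is a hypothetical in-memory stand-in for the storage system.

```python
from typing import Dict, List


class BehaviorLog:
    """Hypothetical store: identifiers in observed order plus one path per identifier."""

    def __init__(self) -> None:
        self.sequence: List[int] = []        # identifiers, in the order they appeared
        self.paths: Dict[int, List[int]] = {}  # identifier -> instruction addresses

    def record(self, path_id: int, path: List[int]) -> None:
        self.sequence.append(path_id)
        self.paths.setdefault(path_id, path)  # store each distinct path once

    def reconstruct(self) -> List[int]:
        # Expand every identifier back into its instruction addresses, in order.
        trace: List[int] = []
        for path_id in self.sequence:
            trace.extend(self.paths[path_id])
        return trace
```

Repeated behaviors cost one identifier each in `sequence`, while their instruction addresses are stored only once.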
Also according to some embodiments of the present invention, the relevant executable software image and associated source files can be saved in the storage system, thus facilitating the anytime retrieval, reconstruction, and replay of the entirety of captured execution behaviors. Storing this data enables the on-demand replay, analysis, and visualization of not only all behaviors of all executed software functions, but also of every revision of every executed software function, using the correct source files and program image for reconstruction and presentation in a replay debugger or analyzer. This stored data also results in the creation of a self-assembling knowledge base of the entirety of behaviors exhibited by the target software, spanning all changes incurred during development and maintenance. Existing tools and methods routinely discard this valuable execution data, and generally provide no facility for correlated storage of the associated source and executable files.
Despite the ever-growing size and complexity of software programs, a key insight for reducing, and simultaneously organizing, the abundant execution data of a software program is that the program executes strictly within rigidly defined segments of instructions that are interconnected by branching junctions with a finite number of connections. Furthermore, the execution path actually taken by a running software program is most often a very small subset of all possible paths.
With this insight, the present application has described a means of compressing execution information based on its behavior. Replacing extended sequences of execution with a uniquely representative and consistently repeatable execution path identifier for every uniquely executed path in the software program produces unexpected benefits. For example, the execution path identifiers themselves represent distinct behaviors of the executed software functions, automatically classifying the execution trace data by its behavior. This simplifies software debugging, because every behavior of the software, correct or incorrect, is individually identified during compression, regardless of how transient or common the behavior is. The complete range of behaviors of the target program, or any subset of interest, can be reviewed by decompressing the results at the appearance of each unique identifier type for the functions of interest. Also, the compression ratio can improve on that of existing systems, because a single representative value can replace the trace data of thousands of instructions. In addition, because of the rigid-track nature of computer software execution, a software program observed over extended periods of time will spend the vast majority of its time executing within a small subset of all possible paths and executing functions in frequently repeated sequences. This pattern of execution can be exploited to achieve extremely high compression ratios by replacing extended sequences of already-observed functional unit executions with a single representative value.
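The second level of compression, replacing repeated runs of already-observed function executions with a single value, can be illustrated with a dictionary coder. The sketch below substitutes the well-known LZ78 scheme, applied to a sequence of execution path identifiers; LZ78 is a swapped-in technique chosen for brevity, not a method stated in the specification.

```python
from typing import Dict, List, Tuple


def lz78_compress(ids: List[int]) -> List[Tuple[int, int]]:
    """Emit (prefix_code, next_id) pairs; code 0 is the empty prefix."""
    dictionary: Dict[Tuple[int, ...], int] = {(): 0}
    out: List[Tuple[int, int]] = []
    phrase: Tuple[int, ...] = ()
    for item in ids:
        candidate = phrase + (item,)
        if candidate in dictionary:
            phrase = candidate  # keep extending an already-observed run
        else:
            out.append((dictionary[phrase], item))
            dictionary[candidate] = len(dictionary)  # new run gets the next code
            phrase = ()
    if phrase:
        # Flush a trailing known run as (code of its prefix, its last id).
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out


def lz78_decompress(pairs: List[Tuple[int, int]]) -> List[int]:
    phrases: List[Tuple[int, ...]] = [()]
    out: List[int] = []
    for code, item in pairs:
        phrase = phrases[code] + (item,)
        phrases.append(phrase)  # rebuild the same dictionary as the compressor
        out.extend(phrase)
    return out
```

Each output pair names a previously seen run plus one new identifier, so a long, highly repetitive sequence of path identifiers compresses to fewer pairs, and `lz78_decompress` restores the original sequence exactly.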
Embodiments of the present invention therefore offer advantages by achieving higher compression ratios than existing systems, easing implementation in working computer systems, and providing compressor output that is a direct representation of the functional behavior of the target software. The identifiers produced by embodiments of the present invention can also be used for defect isolation and execution profiling, to assist software developers in rapidly learning intimate details about unfamiliar software code, and more.
Embodiments of the present invention are suitable for a variety of implementations, including implementation in computer logic (thereby reducing the required capacity for trace export and storage), implementation with existing real-time trace processors, and a software-only implementation for use with computer systems that may have no real-time trace export capabilities. Because classifying the trace data by the behavior of the software being traced while compressing the trace data can overcome many of the difficulties found in existing systems and methods, embodiments of the present invention can achieve higher compression ratios than the previous techniques discussed above, while producing a result that is simpler to use for software debugging, software testing and analysis, and gaining a deeper understanding of how the software actually behaves during full-speed execution.
Also according to some embodiments of the present invention, methods and systems are provided for inserting pre-computed software instructions into specific points of a software application to create unique execution path identifiers using a software-only approach. One method can include analyzing the target software to determine the appropriate canonical hash values and appropriate insertion points in the application, inserting these additional instructions into the application at the appropriate conditional instructions and branch points, accumulating and storing the unique execution path identifiers at runtime to a designated memory buffer or output port, and retrieving the resulting execution path identifiers at runtime for immediate use or storage.
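A software-only instrumented function might look like the following. The constants, the choice of initializing the identifier to an entry constant, and the module-level list used as the designated memory buffer are all hypothetical; in practice the insertion points and canonical hash values would come from the analysis step described above.

```python
from typing import List

MASK32 = 0xFFFFFFFF
TRACE_BUFFER: List[int] = []  # hypothetical designated memory buffer or output port

# Hypothetical pre-computed canonical hash constants, one per insertion point.
H_ENTRY = 0x1F3A0C11
H_THEN = 0x5B0E92A4
H_ELSE = 0x2C77D013


def classify(a: int) -> int:
    path = H_ENTRY                       # inserted: initialize at the functional boundary
    if a > 0:
        path = (path + H_THEN) & MASK32  # inserted at the taken branch
        result = 25
    else:
        path = (path + H_ELSE) & MASK32  # inserted at the fall-through branch
        result = 7
    TRACE_BUFFER.append(path)            # inserted: export at the function's exit
    return result
```

Each call appends one identifier to the buffer, and calls that take the same branch append the same value, so the buffer holds behavior identifiers rather than raw instruction traces.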
Through the methods and systems according to embodiments of the present invention, execution behavior identifiers can be created and collected from an operating computer system using minimal system resources. The identifiers can also be compared to a computed set of identifiers representing a full reconstruction of the execution path taken by the application. This yields abundant information that is pre-classified by behavioral type, making it easier to distinguish identifiers that represent software running in normal, expected ways from identifiers that represent software running in new, potentially anomalous, and unexpected ways. This is particularly useful for software debugging, where countless hours are spent using existing techniques attempting to capture transient events that are not yet fully understood. Embodiments of the present invention are also useful for quickly gaining a deep understanding of unfamiliar software, because every behavior the software exhibits can be identified as soon as it occurs. These benefits can be amplified when embodiments of the present invention are paired with additional system data capture, such as correlated capture of program variables, execution timing information, or external system signals at runtime.
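Comparing an observed identifier against a computed set of identifiers can be sketched by enumerating every path through a function's control-flow graph ahead of time. The sketch assumes a small acyclic graph with hypothetical block names and a 32-bit sum of per-block CRC-32 values as the identifier; functions with loops would need a bounded unrolling that this sketch does not attempt.

```python
import zlib
from typing import Dict, List, Optional

MASK32 = 0xFFFFFFFF

# Hypothetical acyclic control-flow graph: block name -> successor blocks.
CFG: Dict[str, List[str]] = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def enumerate_paths(cfg: Dict[str, List[str]], start: str = "entry") -> List[List[str]]:
    # Depth-first enumeration of every entry-to-exit path; assumes no cycles.
    if not cfg[start]:
        return [[start]]
    return [[start] + rest for succ in cfg[start] for rest in enumerate_paths(cfg, succ)]


def path_id(path: List[str]) -> int:
    acc = 0
    for block in path:
        acc = (acc + zlib.crc32(block.encode())) & MASK32
    return acc


def build_lookup(cfg: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """The computed set: identifier -> the full path it stands for."""
    return {path_id(p): p for p in enumerate_paths(cfg)}


def reconstruct(observed: int, lookup: Dict[int, List[str]]) -> Optional[List[str]]:
    # None means the identifier matches no computed path: new or unexpected behavior.
    return lookup.get(observed)
```

An identifier found in the lookup table expands to its full block sequence; one that is absent is reported as `None`, marking behavior outside the computed set.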
Some of the functional units described in this specification have been labeled as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom very-large-scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
Modules may also be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more blocks of computer instructions, which may be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which comprise the module and achieve the stated purpose for the module when joined logically together.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices. The modules may be passive or active, including agents operable to perform desired functions.
The technology described here can also be stored on a non-transitory computer readable storage medium (e.g., data storage medium) that includes volatile and non-volatile, removable and non-removable media implemented with any technology for the storage of information such as computer readable instructions, data structures, program modules, or other data. Computer readable storage media include, but are not limited to, random-access memory (“RAM”), read only memory (“ROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory or other memory technology, compact disc read-only memory (“CD-ROM”), digital versatile disks (“DVD”) or other optical storage, flash drive, solid state drive, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other computer storage medium which can be used to store the desired information and described technology.
The devices described herein may also contain communication connections or networking apparatus and networking connections that allow the devices to communicate with other devices. Communication connections are an example of communication media. Communication media typically embody computer readable instructions, data structures, program modules and other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. A “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media. The term computer readable media as used herein includes communication media.
The foregoing description of embodiments of the present invention has been presented for the purpose of illustration. The description is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obvious modifications or variations are possible in light of the above teachings. The embodiments disclosed herein were chosen in order to best illustrate the principles of the present invention and its practical application to thereby enable those of ordinary skill in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated, as long as the principles described herein are followed. Thus, changes can be made in the above-described invention without departing from the intent and scope thereof. For example, the various configurations of the system and methods described and illustrated in the present application can be combined and distributed in various ways. It is also intended that the scope of the present invention be defined by the claims appended thereto.
Claims
1. A method for processing software, the method comprising:
- executing a software program, by a computer, the software program comprising a function;
- when, during execution, the software program executes the function, producing an execution sequence of the function;
- generating an identifier for the execution sequence, wherein the identifier uniquely identifies a path of execution through the function represented by the execution sequence;
- saving the identifier; and
- making the identifier available to at least one user through a user interface.
2. The method of claim 1, further comprising:
- accessing at least one data storage medium storing previously-generated identifiers associated with functions of the software program; and
- comparing the identifier to the previously-generated identifiers to determine whether the identifier is already stored in the at least one data storage medium.
3. The method of claim 2, wherein saving the identifier includes saving the identifier when the identifier is not already stored in the at least one data storage medium.
4. The method of claim 2, further comprising incrementing a count value associated with the identifier when the identifier is already stored in the at least one data storage medium.
5. The method of claim 2, wherein the function includes a defined function or set of instructions.
6. The method of claim 1, wherein the identifier for the execution sequence includes a sum of operational code hash values or conditional execution instruction hash values for the execution sequence.
7. The method of claim 1, further comprising:
- executing a second function in the software program when encountering a function call, a call stack, a context switch, a switch statement, a branch point, or a conditional execution instruction;
- producing a second execution sequence of the second function;
- generating a second identifier for the second execution sequence, wherein the second identifier uniquely identifies a path of execution through the second function represented by the second execution sequence; and
- saving the second identifier when the second identifier is not already stored in the at least one data storage medium.
8. The method of claim 1, further comprising:
- generating a hash table of identifiers associated with functions of the software program, wherein each identifier includes a hash value;
- counting a number of times each execution sequence is encountered in the execution of the software program represented by the identifier for each execution sequence and associating a count with the corresponding identifier; and
- displaying the hash table of identifiers and the count associated with functions of the software program.
9. The method of claim 1, further comprising:
- selecting the identifier;
- identifying source code or function variables representing the execution sequence of the function; and
- displaying the identifier with a link to the source code or function variables representing execution sequence of the function.
10. The method of claim 1, further comprising:
- identifying source code or function variables representing the execution sequence of the function; and
- saving at least one selected from the group comprising the identifier with a link to the source code or function variables representing execution sequence of the function and the identifier with the source code or values of the function variables representing execution sequence of the function.
11. At least one non-transitory machine readable storage medium comprising a plurality of instructions adapted to be executed to implement the method of claim 1.
12. A system for processing software, the system comprising:
- a processor configured to execute a software program comprising a function; produce an execution sequence of the function during execution of the function; generate an identifier for the execution sequence, wherein the identifier uniquely identifies a path of execution through the function represented by the execution sequence; and
- at least one data storage medium configured to save the identifier.
13. The system of claim 12, further comprising a user interface configured to make the identifier available to at least one user.
14. The system of claim 12, wherein
- the processor is further configured to generate an index table of identifiers associated with functions of the software program, wherein each identifier includes an index value;
- the at least one data storage medium configured to save the index table of identifiers; and
- the user interface configured to make the index table of identifiers available to the at least one user.
15. The system of claim 12, wherein the processor is further configured to
- access the at least one data storage medium storing previously-generated identifiers associated with functions of the software program; and
- compare the identifier to the previously-generated identifiers to determine whether the identifier is already stored in the at least one data storage medium.
16. The system of claim 15, wherein the processor is further configured to save the identifier when the identifier is not already stored in the at least one data storage medium.
17. The system of claim 16, further comprising a counter configured to increment a count value associated with the identifier when the identifier is already stored in the at least one data storage medium.
18. The system of claim 12, wherein the function includes a defined function or a specific code segment with sequential code instructions.
19. The system of claim 12, wherein the identifier for the execution sequence is derived from an arithmetic or logic operation on the operational code hash values or conditional execution instruction hash values for the execution sequence.
20. The system of claim 12, further comprising a data buffer configured to collect execution sequences of functions in real time during execution of the software program.
Type: Application
Filed: Jun 13, 2014
Publication Date: Nov 13, 2014
Inventor: Neil Craig Puthuff (McLean, VA)
Application Number: 14/304,050
International Classification: G06F 11/36 (20060101);