METHOD FOR VALIDATION OF BINARY CODE TRANSFORMATIONS
A method of validating binary code transformations, in one aspect, includes analyzing an original program and a transformed program. Control flow graphs are generated for both programs. The two graphs are traversed to create respective linear invariant representations. The linear representations are compared to identify incorrect transformations.
This application is a continuation of U.S. patent application Ser. No. 11/940,750 filed on Nov. 15, 2007.
FIELD OF THE INVENTION
The present disclosure relates to optimizing computer executable codes, and particularly to a method for validating binary code transformation.
BACKGROUND OF THE INVENTION
Optimizing executable code is a known technique to improve the performance of code that has already been linked and is ready for execution. It is typically performed using a runtime profile of the code. Different optimization techniques are available, such as inlining and code restructuring, which transform the code into a functionally equivalent form. If the code optimization does not correctly transform the code into a functionally equivalent form, unpredictable consequences may result, such as a program crash.
While there are existing technologies that perform validations on program source code, on the semantics of a compiler's internal representation of code, or even on hardware-level code, those technologies are incapable of handling the kind of transformations performed on binary applications. Thus, what is desirable is a method that helps to validate the correctness of binary code transformations.
BRIEF SUMMARY OF THE INVENTION
A method for validating binary code transformations is provided. In one aspect, the method may comprise analyzing binary code of an executable program to produce a sequence of basic units; generating a control flow graph associated with the sequence of basic units; generating an invariant linear function representation based on the control flow graph; analyzing an optimized transformation of the executable program to produce a second sequence of basic units; generating a second control flow graph associated with the second sequence of basic units; generating a second invariant linear function representation based on the second control flow graph; comparing the invariant linear function representation and the second invariant linear function representation; and identifying one or more incorrect transformations in the optimized transformation.
A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform the above method may also be provided.
Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
The binary code of the original program and the transformed program are analyzed, using various available techniques such as static techniques using relocation information and/or dynamic techniques by intercepting execution and recognizing the accessed basic units, and a control flow graph for both programs is generated. For each validated function, the two graphs are both traversed in consistent fashion, creating their linear invariant textual representations. These linear representations can be compared as simple text strings in order to identify incorrect transformation.
Referring back to
An edge in the control flow graph (CFG) carries an execution count, that is, the number of times control passed along that edge when the program was executed. This information can be collected by various means, for example, the “pixie” tool, or the basic block profiling provided by standard compilers like GCC. An edge that carries a relatively high execution count is termed a hot edge. A basic block that executes many times relative to the average count is termed a hot basic block.
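The notions of hot edge and hot basic block can be illustrated with a minimal sketch. The text does not state what "relatively high" means numerically, so the threshold below (a multiple `factor` of the average count) is an assumption made only for illustration, as are the function name `classify_hot` and the sample counts.

```python
from statistics import mean

def classify_hot(edge_counts, block_counts, factor=1.2):
    """Return (hot_edges, hot_blocks).

    An edge or block is treated as hot when its execution count is at
    least `factor` times the average count. The threshold rule is an
    assumption; the source only says "relatively high".
    """
    edge_threshold = factor * mean(edge_counts.values())
    block_threshold = factor * mean(block_counts.values())
    hot_edges = {e for e, n in edge_counts.items() if n >= edge_threshold}
    hot_blocks = {b for b, n in block_counts.items() if n >= block_threshold}
    return hot_edges, hot_blocks

# Hypothetical profile of a four-block function: A branches to B or C,
# and both B and C fall into D.
edge_counts = {("A", "B"): 990, ("A", "C"): 10, ("B", "D"): 990, ("C", "D"): 10}
block_counts = {"A": 1000, "B": 990, "C": 10, "D": 1000}

hot_edges, hot_blocks = classify_hot(edge_counts, block_counts)
```

In this sample the path A, B, D dominates execution, so its edges and blocks are classified as hot while block C is cold.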
Referring back to
The basic set of transformations includes at least code restructuring, function inlining, and hot-cold code motion. Code restructuring is an optimization that places basic blocks close to each other if they are connected by relatively hot edges. For example, basic block A (shown in
Function inlining replaces the call instruction by a copy of the function in places where the call instruction is very hot. Hot-cold code motion optimization moves instructions from a hot basic block to a colder one, making sure these instructions are properly replicated to preserve the semantics.
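To make the effect of code restructuring concrete, the following sketch reorders basic blocks so that blocks joined by hot edges become adjacent. It uses a simple greedy chaining heuristic chosen for illustration; it is not taken from the disclosure, and the function name `layout_blocks` and the sample data are hypothetical.

```python
def layout_blocks(blocks, edge_counts):
    """Greedy layout: join chains of blocks along the hottest edges first."""
    chain_of = {b: [b] for b in blocks}  # block -> the chain that contains it
    hottest_first = sorted(edge_counts.items(), key=lambda kv: -kv[1])
    for (src, dst), _count in hottest_first:
        head, tail = chain_of[src], chain_of[dst]
        # Join only if src ends one chain and dst starts a different chain.
        if head is not tail and head[-1] == src and tail[0] == dst:
            head.extend(tail)
            for blk in tail:
                chain_of[blk] = head
    # Emit chains in order of first appearance in the original layout.
    seen, order = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order

# Hypothetical original layout and profile: the cold block C sits between
# the hot blocks A and B.
blocks = ["A", "C", "B", "D"]
edge_counts = {("A", "B"): 990, ("A", "C"): 10, ("B", "D"): 990, ("C", "D"): 10}
new_order = layout_blocks(blocks, edge_counts)
```

Here the hot path A, B, D is laid out contiguously and the cold block C is moved to the end. A validator has to recognize the reordered function as equivalent to the original, which is why the representation being compared must not depend on block placement.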
The following algorithm is used to create an invariant linear representation of a function in one embodiment. The representation is in the form of a sequence of strips. A strip is a possible path through the program CFG, that is, a trace of non-branch instructions that may execute sequentially when the program runs.
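The idea of a strip can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of the embodiment: the CFG is assumed to be a dictionary mapping a block name to its instructions and successors, branches are recognized by a hypothetical mnemonic set, each path visits a block at most once so that loops terminate, and the resulting strips are sorted as text so that their order does not depend on block placement.

```python
# Hypothetical branch mnemonics; a real tool would decode the instruction set.
BRANCH_MNEMONICS = {"b", "beq", "bne", "blr"}

def is_branch(instr):
    return instr.split()[0] in BRANCH_MNEMONICS

def strips(cfg, entry):
    """Enumerate paths from `entry`, keeping only non-branch instructions.

    `cfg` maps block name -> (list of instructions, list of successors).
    Returns the strips as a sorted list of newline-joined strings.
    """
    result = set()

    def walk(block, visited, trace):
        instrs, succs = cfg[block]
        trace = trace + [i for i in instrs if not is_branch(i)]
        nexts = [s for s in succs if s not in visited]
        if not nexts:  # exit block, or only back edges remain
            result.add("\n".join(trace))
            return
        for s in nexts:
            walk(s, visited | {s}, trace)

    walk(entry, {entry}, [])
    return sorted(result)

# Hypothetical function before restructuring.
original = {
    "A": (["cmpwi r3,0", "beq C"], ["B", "C"]),
    "B": (["addi r3,r3,1", "b D"], ["D"]),
    "C": (["li r3,0"], ["D"]),
    "D": (["blr"], []),
}
# Same function after restructuring: the branch sense is inverted and the
# unconditional branch has moved to the other arm.
restructured = {
    "A": (["cmpwi r3,0", "bne B"], ["C", "B"]),
    "C": (["li r3,0", "b D"], ["D"]),
    "B": (["addi r3,r3,1"], ["D"]),
    "D": (["blr"], []),
}
```

Because branch instructions are excluded from the strips, `strips(original, "A")` and `strips(restructured, "A")` yield the same two strings even though the branches and block order differ. This sketch does not attempt to handle function inlining or hot-cold code motion.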
An example strip follows. In the example, the branch instructions, which are not part of the strips, are commented out.
At 108, the generated strips of two implementations of a function are compared. The comparison can be a textual or character-by-character comparison. Incorrect transformations are identified from the comparison. For example, the strip or strips corresponding to the transformed or optimized code that do not match the strip or strips of the original code are identified as being incorrect.
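The comparison step can be sketched as a plain text comparison of strips. The function name `find_mismatches` and the sample strips are hypothetical. Reporting original strips that have no counterpart in the transformed code is an addition made here for illustration; the text above only describes flagging transformed strips that do not match.

```python
def find_mismatches(original_strips, transformed_strips):
    """Compare strips as exact text strings.

    Returns (unmatched, missing):
      unmatched - transformed strips with no identical original strip,
                  i.e. candidates for an incorrect transformation;
      missing   - original strips absent from the transformed code
                  (an extra check assumed for this sketch).
    """
    orig = set(original_strips)
    trans = set(transformed_strips)
    return sorted(trans - orig), sorted(orig - trans)

# Hypothetical strips: the transformation wrongly changed an immediate value.
original_strips = ["cmpwi r3,0\naddi r3,r3,1", "cmpwi r3,0\nli r3,0"]
transformed_strips = ["cmpwi r3,0\naddi r3,r3,2", "cmpwi r3,0\nli r3,0"]

unmatched, missing = find_mismatches(original_strips, transformed_strips)
```

In this sample, the strip containing `addi r3,r3,2` is reported as unmatched, which points at the altered instruction as the likely incorrect transformation.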
The system and method of the present disclosure may be implemented and run on a general-purpose computer or computer system. The computer system may be any type of system, whether now known or later developed, and may typically include a processor, a memory device, a storage device, input/output devices, internal buses, and/or a communications interface for communicating with other computer systems in conjunction with communication hardware and software, etc.
The terms “computer system” and “computer network” as may be used in the present application may include a variety of combinations of fixed and/or portable computer hardware, software, peripherals, and storage devices. The computer system may include a plurality of individual components that are networked or otherwise linked to perform collaboratively, or may include one or more stand-alone components. The hardware and software components of the computer system of the present application may include and may be included within fixed and portable devices such as desktops, laptops, and servers. A module may be a component of a device, software, program, or system that implements some “functionality”, which can be embodied as software, hardware, firmware, electronic circuitry, etc.
The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
Claims
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for validating binary code transformations, comprising:
- analyzing binary code of an executable program to produce a sequence of basic units comprising smallest elements of the executable program that stay intact under every reordering;
- generating control flow graph associated with the sequence of basic units;
- generating invariant linear function representation based on the control flow graph;
- analyzing optimized transformation of the executable program to produce a second sequence of basic units;
- generating second control flow graph associated with the second sequence of basic units;
- generating second invariant linear function representation based on the second control flow graph;
- comparing the invariant linear function representation and the second invariant linear function representation; and
- identifying one or more incorrect transformations in the optimized transformation,
- wherein the invariant linear function representation and the second invariant linear function representation are invariants under a set of predefined optimization transformation and include a sequence of strips comprising a path through a trace of non-branch instructions executing sequentially when the executable program runs.
Type: Application
Filed: Sep 8, 2008
Publication Date: May 21, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventor: Yaakov Yaari (Haifa)
Application Number: 12/206,578
International Classification: G06F 9/45 (20060101);