Control flow analysis utilizing function dominator trees
A method for control flow analysis according to an embodiment of the present invention includes: acquiring an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation; generating a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and simplifying the original function call tree according to the function dominator tree so as to obtain a simplified function call tree. According to an embodiment of the present invention, the function call tree for control flow analysis can be simplified.
This application claims priority from Chinese patent application number 201110461369.9, filed Dec. 30, 2011, which is hereby incorporated herein by reference in its entirety.
BACKGROUND
One or more aspects of the present invention relate to data analysis technology, and more specifically, to a method and apparatus for control flow analysis.
Control flow analysis is an important aspect of performance analysis of a computer program. The basis for control flow analysis is the calling relations between the functions of the computer program. Those skilled in the art will understand that the functions mentioned here are code units that can independently realize certain functionality, and that they may be called methods, etc., in some situations. A calling function passes a parameter to a called function; the called function performs a calculation on the parameter and returns the result to the calling function. Generally, the function calling relation is recorded and represented as a function call tree. In the function call tree, a parent node represents the calling function and a child node represents the called function. Representing the calling relation between functions as a function call tree makes it easier to determine which functions are called frequently and which have excessive CPU time overhead, and thereby to locate the performance bottleneck of the program and improve its performance. For example, a function that is called frequently can be optimized with a more sophisticated optimization algorithm, or the frequency of calls to that function can be reduced.
Nowadays, programs usually contain complicated business logic, and thus the corresponding function call tree is itself very large. For example, business-level applications usually contain more than one hundred thousand invocations and more than 200 invocation levels. Because such applications are very complicated, they contain many “noise calls” to auxiliary software modules in addition to the actual business logic. Analyzing such a huge function call tree takes a large amount of time and effort. In addition, modern applications are generally built on a complicated framework, and the business logic is usually packaged within the framework, so it is difficult to separate the packaged business logic from the framework in order to perform more accurate analysis.
BRIEF SUMMARY
Therefore, a need exists for simplifying the function call tree such that the function call tree can be better used for control flow analysis to find possible performance bottlenecks.
Embodiments of the present invention provide a method, apparatus and computer program product for control flow analysis.
A method for control flow analysis according to an embodiment of the present invention includes, for instance: acquiring an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation; generating a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and simplifying the original function call tree according to the function dominator tree so as to obtain a simplified function call tree.
An apparatus for control flow analysis according to an embodiment of the present invention includes, for instance: acquiring means configured to acquire an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation; generating means configured to generate a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and simplifying means configured to simplify the original function call tree according to the function dominator tree so as to obtain a simplified function call tree.
According to embodiments of the present invention, the call tree for control flow analysis can be simplified.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied therein.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer readable medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device or any suitable combination of above. In the context of this document, a computer readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a data signal with computer readable program code embodied therein, propagated in baseband or as part of a carrier wave. Such propagated signal may be in a plurality of forms, including but not limited to electromagnetic signal, optical signal or any suitable combinations of above. The computer readable signal medium may not be the computer readable storage medium but may be any computer readable medium capable of transmitting, propagating or transferring a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc. and any suitable combination of the above.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the blocks of the flowchart illustrations and/or block diagrams.
These computer program instructions may also be stored in a computer readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instruction means which implement the functions/acts specified in the blocks of the flowchart illustrations and/or block diagrams.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable data processing apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the blocks of the flowchart illustrations and/or block diagrams.
In the following, aspects of the present invention will be described in conjunction with particular embodiments. Thus, the description is only for illustration purpose and not intended to limit the scope of the present invention.
In order to simplify the call tree, a filtering method may be considered. Obviously, analyzing the nodes of the call tree one by one to decide whether each should be removed is not realistic. A more realistic method is for an analyst to set a filter standard and to filter the call tree according to it. For example, in order to remove the nodes of library functions of a Java software development kit (SDK) from the call tree, the filter standard can be set as “remove every node whose name begins with the characters java/”. As a further example, nodes whose call count is less than a certain threshold can be removed from the call tree, that is, the filter standard is set as “remove every node whose call count is less than a threshold”. To use the filter method, the analyst must have a very profound understanding of the program. If the filter standard is not set appropriately, the function call tree may not be simplified effectively, or key nodes may be removed from the call tree, resulting in the loss of key information. Furthermore, converting an actual filter requirement into an executable filter standard places very high demands on the analyst. For example, not all the names of the library nodes of the Java software development kit (SDK) begin with the characters “java/”.
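The two filter standards quoted above can be illustrated with a minimal Python sketch. The `CallNode` class, the function names, the call counts and the threshold value are all hypothetical and chosen only for illustration; the sketch keeps the root node unconditionally.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One call-tree node: a function name, its call count, and its callees."""
    name: str
    calls: int = 1
    children: list = field(default_factory=list)


def filter_tree(node, keep):
    """Return a copy of the tree in which every child subtree rejected by `keep` is removed."""
    kept = [filter_tree(child, keep) for child in node.children if keep(child)]
    return CallNode(node.name, node.calls, kept)


def names(node):
    """List the function names of the tree in pre-order."""
    return [node.name] + [n for child in node.children for n in names(child)]


# Hypothetical call tree used only for illustration.
tree = CallNode("app/Main.run", 1, [
    CallNode("java/util/ArrayList.add", 500),
    CallNode("app/Order.price", 40, [CallNode("app/Tax.rate", 2)]),
])

# Filter standard 1: remove nodes whose name begins with "java/".
no_sdk = filter_tree(tree, lambda n: not n.name.startswith("java/"))

# Filter standard 2: remove nodes whose call count is below a threshold.
threshold = 10  # assumed value
frequent = filter_tree(tree, lambda n: n.calls >= threshold)
```

With this data, the second standard silently drops `app/Tax.rate`, which shows how an unsuitable threshold can remove application nodes that an analyst might have wanted to keep.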
In the following, a method for performing control flow analysis according to an embodiment of the present invention will be described with reference to
At step 201, an original function call tree of a program is acquired, wherein a node of the original function call tree represents a function, and a parent/child relation between the nodes of the original function call tree represents a calling relation.
Those skilled in the art may acquire the original function call tree of the program by a variety of means. For example, the original function call tree may be acquired from source code or object code through static analysis, or it may be acquired through dynamic analysis during execution of the program. In one method of dynamic analysis, operations such as stack-in and skip are performed when the calling function makes a call, and operations such as stack-out and skip are performed when the call returns, so that the call can be identified by detecting these operations.
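As a rough illustration of the dynamic approach, the following Python sketch rebuilds a call tree from a recorded stream of enter and exit events. The event format, the `<root>` placeholder node and the function names are assumptions made for this example; a real profiler would capture the stack operations at a much lower level.

```python
def build_call_tree(events):
    """Build a call tree from ("enter", name) / ("exit", name) events."""
    root = {"name": "<root>", "children": []}   # hypothetical virtual root
    stack = [root]                              # mirrors the program's call stack
    for kind, name in events:
        if kind == "enter":                     # stack-in: a call begins
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "exit":                    # stack-out: the call returns
            if len(stack) == 1 or stack[-1]["name"] != name:
                raise ValueError(f"unbalanced exit for {name!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return root


def paths(node, prefix=()):
    """Yield every root-to-leaf path of function names, omitting the virtual root."""
    here = prefix if node["name"] == "<root>" else prefix + (node["name"],)
    if not node["children"]:
        if here:
            yield here
        return
    for child in node["children"]:
        yield from paths(child, here)


# Assumed trace: A calls C, which calls D and then E; B calls C, which calls E.
events = [
    ("enter", "A"), ("enter", "C"), ("enter", "D"), ("exit", "D"),
    ("enter", "E"), ("exit", "E"), ("exit", "C"), ("exit", "A"),
    ("enter", "B"), ("enter", "C"), ("enter", "E"), ("exit", "E"),
    ("exit", "C"), ("exit", "B"),
]
tree = build_call_tree(events)
```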
An illustrated original function call tree is shown in
Function A and function B are not called by any other functions.
At step 202, the original function call tree of the program is converted into a function call directed graph.
Converting a tree structure into a directed graph structure is a commonly used technique in the art, and thus will not be described in detail here. It should be noted that when the original function call tree is converted into the function call directed graph, a function that appears on different paths of the original function call tree corresponds to only one node of the function call directed graph, but the different calling relations involving that function in the original function call tree correspond to different directed edges of the function call directed graph.
Function A calls function C, and function C calls function D;
Function A calls function C, and function C calls function E;
Function B calls function C, and function C calls function E.
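The conversion of step 202 can be sketched in Python using the three paths listed above. This is only one possible way to perform the conversion, not the pseudo code of the embodiment; the `(name, children)` tuple representation and the `<entry>` node that joins the uncalled functions A and B are assumptions of this sketch.

```python
from collections import defaultdict


def tree_to_graph(tree):
    """Merge same-named call-tree nodes into one graph node and collect directed edges."""
    graph = defaultdict(set)            # caller name -> set of callee names
    stack = [tree]
    while stack:
        name, children = stack.pop()
        graph[name]                     # touching the key registers leaf functions as nodes
        for child in children:
            graph[name].add(child[0])   # one directed edge per distinct calling relation
            stack.append(child)
    return dict(graph)


ENTRY = "<entry>"  # hypothetical virtual root joining the uncalled functions A and B
call_tree = (ENTRY, [
    ("A", [("C", [("D", []), ("E", [])])]),
    ("B", [("C", [("E", [])])]),
])
graph = tree_to_graph(call_tree)
```

Function C occurs twice in the tree but yields a single graph node with two incoming edges (from A and from B) and two outgoing edges (to D and to E).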
In
The following are examples of the pseudo code for performing step 202 according to an embodiment of the present invention:
At step 203, the function call directed graph is converted into a function dominator tree, wherein the parent/child relation in the function dominator tree represents a dominator relation.
In the function dominator tree, a parent node dominates a child node. The dominator relation is defined as follows, for example:
If, on every path from the entry of a program, function X must be called before function Y is called, function X dominates function Y. In other words, if all the calls to function Y are originated by function X, function X dominates function Y. “Origination” here covers both the case in which function X calls function Y directly and the case in which function X calls a third function and the third function calls function Y. If function X dominates function Y, and function X does not dominate any other function that dominates function Y, function X dominates function Y directly.
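The definition can be checked directly, if inefficiently, by blocking function X and testing whether function Y is still reachable from the program entry. The Python sketch below does this on a small call graph; the graph and the `<entry>` node are assumptions made for illustration, and the reachability test is a plain restatement of the definition rather than the method of the embodiment.

```python
def reachable(graph, start, blocked=None):
    """Return the functions reachable from `start` without passing through `blocked`."""
    if start == blocked:
        return set()
    seen, todo = {start}, [start]
    while todo:
        for callee in graph.get(todo.pop(), ()):
            if callee != blocked and callee not in seen:
                seen.add(callee)
                todo.append(callee)
    return seen


def dominates(graph, entry, x, y):
    """X dominates Y if Y is reachable from the entry, but not once X is blocked."""
    if x == y:
        return True
    return y in reachable(graph, entry) and y not in reachable(graph, entry, blocked=x)


# Assumed example call graph; "<entry>" is a hypothetical program entry.
graph = {
    "<entry>": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D", "E"],
    "D": ["F"], "E": ["G"], "F": ["H"], "G": ["H"], "H": [],
}
```

For instance, `dominates(graph, "<entry>", "C", "H")` holds because every call chain to H passes through C, whereas `dominates(graph, "<entry>", "D", "H")` does not, because H can also be reached through E and G.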
It can be seen from
It also can be seen from
Regarding function H, it may be called through a path of function C→function D→function F→function H, and it may also be called through a path of function C→function E→function G→function H. Therefore, function H is dominated by function C, but is not dominated by any one of function D, function E, function F or function G.
Regarding function D and function F, every call to function F must pass through function D, but a call to function D does not necessarily pass through function F; therefore function D dominates function F, but function F does not dominate function D.
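The relations discussed above can be reproduced with the classic iterative dominator computation, in which the dominators of a function are the function itself plus the dominators shared by all of its callers. The Python sketch below is offered under stated assumptions: the call graph is reconstructed from the relations named in the text, `<entry>` is a hypothetical virtual entry, and every function is assumed to be reachable from it. It is not the pseudo code of the embodiment.

```python
def dominator_sets(graph, entry):
    """Iteratively solve dom(n) = {n} | intersection of dom(p) over all callers p of n."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    callers = {n: set() for n in nodes}
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    dom = {n: set(nodes) for n in nodes}    # start from "everything dominates everything"
    dom[entry] = {entry}
    changed = True
    while changed:                          # shrink the sets until a fixed point is reached
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in callers[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def direct_dominators(dom, entry):
    """The direct dominator of n is its strict dominator that has the most dominators itself."""
    return {n: max(d - {n}, key=lambda s: len(dom[s]))
            for n, d in dom.items() if n != entry}


graph = {
    "<entry>": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D", "E"],
    "D": ["F"], "E": ["G"], "F": ["H"], "G": ["H"], "H": [],
}
dom = dominator_sets(graph, "<entry>")
parent = direct_dominators(dom, "<entry>")   # child -> parent in the dominator tree
```

In the resulting `parent` mapping, H hangs directly under C rather than under F or G, and F hangs under D, matching the relations described above.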
Similarly, dominator relations between other functions can be derived thereby the function dominator tree as shown in
It can be seen from the above description that the function dominator tree is based on the calling relation between the functions. The information on the calling relation is recited in the original function call tree. Therefore, the function dominator tree as shown in
Examples of the pseudo code for performing step 203 according to an embodiment of the present invention are as follows:
At step 204, the original function call tree is processed according to the function dominator tree obtained at step 203 to obtain a simplified function call tree.
It can be seen from
Function H does not dominate other functions, so the above condition (1) is not satisfied and function H is not the strong dominator function. Even if function H dominated other functions, it would not be the strong dominator function, for the following reason: although function H is called by function F and function G, so that the above condition (2) is satisfied, the strong dominator function C dominates both function F and function G, so that the above condition (3) is not satisfied.
Function F does not dominate other functions, so the above condition (1) is not satisfied and function F is not the strong dominator function. If function F dominated other functions, the above condition (1) would be satisfied. If, furthermore, function F were called by a plurality of other functions, for example by function D and function E, the above condition (2) would be satisfied. If function C, which dominates both function D and function E, were not the strong dominator function, the above condition (3) would be satisfied. In this way, function F could become the strong dominator function. Similarly, function G could become the strong dominator function if the corresponding conditions were satisfied.
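The three conditions, as they can be read from the discussion above and from claim 3, are: (1) the function dominates other functions, (2) it is called by a plurality of other functions, and (3) no other strong dominator function dominates all of those callers. The Python sketch below applies this reading; it is an interpretation rather than the pseudo code of the embodiment. The `callers` and `dom` tables are written out by hand for the example functions, and function I, called only by H, is a hypothetical addition used to test the "even if function H dominated other functions" case.

```python
def strong_dominators(callers, dom):
    """Return the functions that satisfy conditions (1) to (3) as read from the text.

    callers: function -> set of functions that call it directly
    dom:     function -> set of functions dominating it (including itself)
    """
    strong = set()
    # Visit functions with fewer dominators first, so that any strong dominator
    # that could dominate the callers of f has already been decided.
    for f in sorted(dom, key=lambda n: (len(dom[n]), n)):
        dominates_others = any(f in dom[g] for g in dom if g != f)   # condition (1)
        many_callers = len(callers.get(f, ())) >= 2                  # condition (2)
        if not (dominates_others and many_callers):
            continue
        shared = set.intersection(*(dom[c] for c in callers[f]))     # dominators of all callers
        blocking = (shared & strong) - {f}
        if not blocking:                                             # condition (3)
            strong.add(f)
    return strong


callers = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"},
           "F": {"D"}, "G": {"E"}, "H": {"F", "G"}}
dom = {"A": {"A"}, "B": {"B"}, "C": {"C"}, "D": {"C", "D"}, "E": {"C", "E"},
       "F": {"C", "D", "F"}, "G": {"C", "E", "G"}, "H": {"C", "H"}}
result = strong_dominators(callers, dom)

# Hypothetical extension: H calls a new function I, so H now dominates another function.
callers_ext = {**callers, "I": {"H"}}
dom_ext = {**dom, "I": {"C", "H", "I"}}
result_ext = strong_dominators(callers_ext, dom_ext)
```

In both cases only function C qualifies: in the extended case H meets conditions (1) and (2), but its callers F and G are both dominated by the strong dominator function C.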
As described above, in a method of dynamic analysis, a call is detected by capturing actions such as stack-in, stack-out, and skip. Such actions may also occur within basic function packages. A basic function package encapsulates code that performs general functions and has generally been optimized. An Application Programming Interface (API) is a typical basic function package. Applications generally realize general functions by calling a particular API and do not need to understand the detailed implementation inside the API. An API has been optimized when it was packaged, so little room for optimization is left. In addition, some APIs are provided in the form of object code, which makes them hard to optimize. The features of the strong dominator function are identical to those of the entry function of a basic function package, so identifying the strong dominator function in the function dominator tree identifies the entry function of the basic function package. In the subsequent analysis, the analysis of the inside of the API can be omitted, so that the analysis can focus on the application itself.
In particular, the original function call tree may be traversed, and if a node corresponds to the strong dominator function, all the child nodes of that node are removed so as to obtain a simplified function call tree. The removing here may consist of merging the child nodes into the node corresponding to the strong dominator function, or of deleting the child nodes. In the former embodiment, the node corresponding to the strong dominator function may be expanded, if required, to obtain information about the inside of the basic function package.
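The traversal just described can be sketched in Python for the deleting variant. The `(name, children)` tuple representation, the `<entry>` root and the small example tree are assumptions of this sketch, and the set of strong dominator functions is taken as given.

```python
def simplify(node, strong):
    """Copy a (name, children) call tree, deleting all children of strong dominator nodes."""
    name, children = node
    if name in strong:
        return (name, [])               # keep the entry function, drop what lies beneath it
    return (name, [simplify(child, strong) for child in children])


def size(node):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in node[1])


def depth(node):
    """Number of levels in the tree."""
    return 1 + max((depth(child) for child in node[1]), default=0)


# Assumed example tree: A and B both call C, whose subtree reaches H along two paths.
h = ("H", [])
c_subtree = ("C", [("D", [("F", [h])]), ("E", [("G", [h])])])
original = ("<entry>", [("A", [c_subtree]), ("B", [c_subtree])])

simplified = simplify(original, strong={"C"})
```

For this toy tree, the node count drops from 17 to 5 and the depth from 6 to 3 levels. The merging variant could instead store the removed children on the strong dominator node so that it can be expanded later.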
Examples of the pseudo code for performing step 204 according to an embodiment of the present invention are as follows:
According to a method of the present invention, the call tree can be simplified, and the simplification is reflected not only in the node count of the call tree but also in its depth in levels. As a real example, an original function call tree with 62301 nodes and 19 levels was processed, and a simplified function call tree with 16303 nodes and 14 levels was obtained.
A method according to an embodiment of the present invention can be used in combination with the method based on the filter standard. For example, the functions inside a basic function package such as an API may first be removed by the method according to an embodiment of the present invention, and the simplified function call tree may then be filtered according to a filter standard such as the number of calls.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Although the various methods and apparatus according to aspects of the present invention have been described in detail above in combination with the specific embodiments, aspects of the present invention are not limited thereto. Those skilled in the art could make various variations, substitutions and modifications to aspects of the present invention according to the teaching of the specification, without departing from the spirit and scope of aspects of the invention. It should be appreciated that all these variations, substitutions and modifications still fall into the scope of protection of aspects of the present invention. The scope of protection of aspects of the present invention is defined by the accompanying claims.
Claims
1. A method for control flow analysis, the method comprising:
- acquiring an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation;
- generating a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and
- simplifying the original function call tree according to the function dominator tree so as to obtain a simplified function call tree, wherein the simplifying comprises: identifying an entry function of a basic function package according to the function dominator tree; and removing children nodes of a node corresponding to the entry function from the original function call tree so as to obtain the simplified function call tree, wherein the removing comprises deleting the children nodes.
2. The method according to claim 1, wherein generating the corresponding function dominator tree from the calling relation comprises:
- converting the original function call tree into a function call directed graph; and
- converting the function call directed graph into a function dominator tree.
3. The method according to claim 1, wherein identifying the entry function of the basic function package according to the function dominator tree comprises:
- identifying a strong dominator function in the function dominator tree, wherein the strong dominator function is called by a plurality of other functions and there are no other strong dominator functions dominating all the plurality of other functions; and
- using the strong dominator function as the entry function.
4. The method according to claim 1, further comprising:
- filtering the simplified function call tree according to a filter standard.
5. A computer system for control flow analysis, the computer system comprising:
- a memory; and
- a processor in communications with the memory, wherein the computer system is configured to perform a method, the method comprising: acquiring an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation; generating a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and simplifying the original function call tree according to the function dominator tree so as to obtain a simplified function call tree, wherein the simplifying comprises: identifying an entry function of a basic function package according to the function dominator tree; and removing children nodes of a node corresponding to the entry function from the original function call tree so as to obtain the simplified function call tree, wherein the removing comprises deleting the children nodes.
6. The computer system according to claim 5, wherein the generating comprises:
- converting the original function call tree into a function call directed graph; and
- converting the function call directed graph into a function dominator tree.
7. The computer system according to claim 5, wherein the identifying the entry function of the basic function package according to the function dominator tree comprises:
- identifying a strong dominator function in the function dominator tree, wherein the strong dominator function is called by a plurality of other functions and there are no other strong dominator functions dominating all the plurality of other functions; and
- using the strong dominator function as the entry function.
8. The computer system according to claim 5, further comprising:
- filtering the simplified function call tree according to a filter standard.
9. A computer program product for control flow analysis, the computer program product comprising:
- a non-transitory computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: acquiring an original function call tree of a program, wherein nodes of the original function call tree represent functions and a parent/child relation between the nodes represents a calling relation; generating a corresponding function dominator tree from the calling relation, wherein nodes of the function dominator tree represent the functions and a parent/child relation between the nodes represents a dominator relation, wherein a first function dominates a second function if all the invocations to the second function are originated by the first function; and simplifying the original function call tree according to the function dominator tree so as to obtain a simplified function call tree, wherein the simplifying comprises: identifying an entry function of a basic function package according to the function dominator tree; and removing children nodes of a node corresponding to the entry function from the original function call tree so as to obtain the simplified function call tree, wherein the removing comprises deleting the children nodes.
10. The computer program product according to claim 9, wherein generating the corresponding function dominator tree from the calling relation comprises:
- converting the original function call tree into a function call directed graph; and
- converting the function call directed graph into a function dominator tree.
11. The computer program product according to claim 9, wherein identifying the entry function of the basic function package according to the function dominator tree comprises:
- identifying a strong dominator function in the function dominator tree, wherein the strong dominator function is called by a plurality of other functions and there are no other strong dominator functions dominating all the plurality of other functions; and
- using the strong dominator function as the entry function.
12. The computer program product according to claim 9, wherein the method further comprises:
- filtering the simplified function call tree according to a filter standard.
Type: Grant
Filed: Dec 20, 2012
Date of Patent: Sep 15, 2015
Patent Publication Number: 20130174127
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventors: Qin Yue Chen (Shanghai), Qi Liang (Shanghai), Hong Chang Lin (Shanghai), Feng Liu (Shanghai)
Primary Examiner: Chameli Das
Assistant Examiner: Joanne Macasiano
Application Number: 13/721,185
International Classification: G06F 9/45 (20060101); G06F 11/34 (20060101); G06F 9/44 (20060101);