Method and apparatus for doing program analysis

The invention provides a method and apparatus for doing program analysis. According to embodiments of the invention, program analysis comprises assigning an alias to each equivalence class of possibly overlapping memory accesses as defined by an alias analysis of an intermediate language program; and defining a definition-use relationship between statements in each equivalence class wherein definition statements which belong to the equivalence class reference the alias associated with that class, and wherein use statements which belong to the equivalence class reference the alias associated with that class. The invention also provides a program analysis algorithm which utilizes a dependence flow graph having the property that the edge cardinality is independent of the definition-use structure of the program being analyzed.

Description
FIELD OF THE INVENTION

[0001] This invention relates to program analysis. In particular, it relates to program analysis in optimizing compilers.

BACKGROUND

[0002] Many program analysis problems involve propagating abstract values, which are compile-time approximations of the actual values computed by a program. A convenient structure for doing program analysis is a dependence flow graph, wherein nodes in the graph represent statements in the program and there is an edge from each statement that defines (writes) a storage location to each statement that uses (reads) the storage location. When there are many definitions and uses of a storage location, the number of edges in such a dependence flow graph becomes large relative to the number of nodes. This affects both the storage and the time required to perform a program analysis using the graph.

BRIEF DESCRIPTION OF THE DRAWINGS

[0003] FIG. 1 shows a schematic drawing of an optimizing compiler in which embodiments of the invention may be practiced;

[0004] FIG. 2 shows a schematic drawing of a dependence flow graph constructed in accordance with one embodiment of the invention;

[0005] FIG. 3 shows a flowchart of a method for constructing a dependence flow graph according to one embodiment of the invention;

[0006] FIG. 4 shows an algorithm for doing a value propagation program analysis according to one embodiment of the invention; and

[0007] FIG. 5 shows a schematic drawing of hardware for performing program analysis in accordance with the invention.

DETAILED DESCRIPTION

[0008] FIG. 1 of the drawings shows an optimizing compiler 10 in which embodiments of the invention may be practiced. The optimizing compiler 10 includes a lexical analyzer 12 which takes a source program and breaks it up into meaningful units called tokens. A syntax analyzer 14 determines the structure of the program and of the individual statements therein by grouping the tokens into grammatical phrases, which are then checked by a semantic analyzer 16 for semantic errors. The compiler 10 further includes an intermediate code generator 18 which generates an intermediate program representation of the source program in an intermediate language. A code optimizer 20 attempts to optimize the intermediate program representation. The final phase of the compiler 10 is carried out by a code generator 22 which generates a target program comprising machine or assembly code.

[0009] In determining what optimizations to make, the code optimizer 20 performs a value analysis of the intermediate language program. Examples of such analysis include constant propagation, range analysis of subscript values, and type inference in dynamically typed programs.

[0010] The present invention permits value analysis problems to be solved over large input programs without excessive time or space penalties. In particular, program analysis according to embodiments of the invention includes constructing dependence flow graphs in which the number of edges in the dependence flow graph is linear in the number of nodes in the graph, i.e. there are a constant number of edges per node. Embodiments of the invention make use of an equivalence class based alias analysis of the intermediate language program to create dependence flow graphs which have the property that the edge cardinality is independent of the definition-use structure of the program being analyzed. An equivalence class is a class of overlapping memory accesses.

[0011] For purposes of describing the present invention, it is assumed that assignment statements in the intermediate language have the following syntax:

[0012] E: (PUT V E)

[0013] | (INTEGER Z)

[0014] | (ADD E E)

[0015] | (SUB E E)

[0016] | (GET V)

[0017] V: variable

[0018] Z: integer

[0019] It is assumed further that INTEGER, ADD, SUB, and GET expressions all have the same type; the exact nature of the type (e.g., how many bits) is irrelevant. An assignment statement must be a PUT expression, and a PUT expression cannot be the subexpression of another expression.

[0020] (PUT V E): This statement writes a value to a variable. The expression E gives the value which is written to the location. V specifies a variable. It is assumed that variables are named by integers, and that other than to distinguish one variable from another, these integer names have no significance. It is also assumed that there is no aliasing or overlap among the variables used in PUT and GET expressions.

[0021] (INTEGER Z): This is the expression for an integer constant. Z is an integer literal that gives the value of the constant.

[0022] (ADD E E): This expression returns the sum of its arguments.

[0023] (SUB E E): This expression returns the difference of its arguments.

[0024] (GET V): This expression reads from the variable named by V and returns its value.
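The intermediate language described above can be sketched in Python as follows. This is only an illustrative model, not part of the specification: the class names (Integer, Get, Add, Sub, Put) and the helper functions evaluate and execute are assumptions chosen for the sketch, and the store is modeled as a dictionary keyed by the integer variable names.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Integer:
    value: int  # (INTEGER Z): an integer constant


@dataclass(frozen=True)
class Get:
    var: int  # (GET V): read the variable named by the integer V


@dataclass(frozen=True)
class Add:
    left: "Expr"  # (ADD E E): sum of the two arguments
    right: "Expr"


@dataclass(frozen=True)
class Sub:
    left: "Expr"  # (SUB E E): difference of the two arguments
    right: "Expr"


Expr = Union[Integer, Get, Add, Sub]


@dataclass(frozen=True)
class Put:
    var: int  # (PUT V E): write the value of E to variable V
    value: Expr  # a PUT is a statement, so it never appears inside an Expr


def evaluate(expr: Expr, store: Dict[int, int]) -> int:
    """Compute the concrete value of an expression against a variable store."""
    if isinstance(expr, Integer):
        return expr.value
    if isinstance(expr, Get):
        return store[expr.var]
    if isinstance(expr, Add):
        return evaluate(expr.left, store) + evaluate(expr.right, store)
    if isinstance(expr, Sub):
        return evaluate(expr.left, store) - evaluate(expr.right, store)
    raise TypeError(f"not an expression: {expr!r}")


def execute(stmt: Put, store: Dict[int, int]) -> None:
    """Run one assignment statement, updating the store in place."""
    store[stmt.var] = evaluate(stmt.value, store)


store: Dict[int, int] = {}
program = [Put(1, Integer(5)), Put(2, Add(Get(1), Sub(Integer(10), Integer(3))))]
for statement in program:
    execute(statement, store)
print(store)  # {1: 5, 2: 12}
```

Keeping Put outside the Expr union mirrors the rule that a PUT expression cannot be nested inside another expression.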

[0025] Only the syntax for the assignment statements of the intermediate language (the PUT expressions) has been shown in the above example. The reason for this is that only the PUT expressions are necessary to describe flow-insensitive program analysis in accordance with the present invention. However, it will be appreciated that a realistic intermediate language will include control flow constructs and other operators, which are not necessary for the present description.

[0026] FIG. 2 of the drawings shows a dependence flow graph constructed in accordance with one embodiment of the invention. In constructing the dependence flow graph shown in FIG. 2, the PUT and GET expressions in the program are labeled with an alias. An alias, as used herein, is an equivalence class of PUT and GET expressions. The equivalence relation over aliases has the property that if there is a program execution in which two PUT and/or GET expressions in the program access the same storage location during that execution, then the two PUT and/or GET expressions have the same alias number. In other words, the equivalence relation over aliases summarizes the dependence structure of the program. Any alias analysis technique that produces such a labeling of the PUT and GET expressions of the program may be used for purposes of the present invention. In FIG. 2, the statements in the program text which define a storage location X (in other words, the PUTs to X in the program text) each form a node 30 in the dependence flow graph. Each statement in the program which uses memory location X (in other words, each expression which GETs X in the program text) forms a node 34 in the graph. A node for the alias over the PUTs and GETs in the program text is represented by reference numeral 32. It will be seen that there is a single edge in the graph from each node 30 to node 32 and from each node 34 to node 32. In essence, the alias node 32 separates the definition-use structure of the program text.
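The effect of the alias node on edge count can be illustrated with a small Python sketch. It compares a graph that links every definition of X directly to every use of X (the worst case for a flow-insensitive definition-use graph) with a graph that routes every access through a single alias node. The function names, the node labels, and the edge direction from the alias node to each use are assumptions of this sketch, not taken from FIG. 2.

```python
from itertools import product


def direct_def_use_edges(defs, uses):
    """Worst case without alias nodes: an edge from every definition to every use."""
    return set(product(defs, uses))


def alias_node_edges(defs, uses, alias_node="alias(X)"):
    """With an alias node: one edge per definition plus one edge per use."""
    return {(d, alias_node) for d in defs} | {(alias_node, u) for u in uses}


defs = [f"PUT#{i}" for i in range(100)]  # 100 statements that write X
uses = [f"GET#{i}" for i in range(100)]  # 100 statements that read X

direct = direct_def_use_edges(defs, uses)
via_alias = alias_node_edges(defs, uses)
print(len(direct), len(via_alias))  # 10000 200
```

With D definitions and U uses of one location, the direct graph can hold D times U edges, while the alias-node graph holds D plus U, which is what keeps the edge count linear in the number of nodes.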

[0027] The process of constructing a dependence flow graph such as the one shown in FIG. 2 of the drawings is illustrated in a flow chart shown in FIG. 3 of the drawings. Referring to FIG. 3, at block 50 one node in the dependence flow graph (DFG) is associated with each PUT expression in the program. At block 52 one node in the DFG is associated with each alias in the program. At block 54 an edge is added to the DFG from the node representing each PUT expression to the node representing the alias for that PUT. Finally, at block 56, for each GET expression G in the right-hand side of a PUT expression P, an edge is added to the dependence flow graph from the node representing the alias of G to the node representing P. A dependence flow graph constructed in accordance with the above method will have at most one edge for each PUT and GET expression in the program.
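The four blocks of FIG. 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: statements are written as nested tuples such as ("PUT", 1, ("INTEGER", 2)), each PUT and GET is assumed to be already labeled with its alias (here simply the variable number, since the sample language has no aliasing among variables), and the names gets_in and build_dfg are hypothetical.

```python
def gets_in(expr):
    """Yield the alias of every GET nested anywhere inside an expression."""
    tag = expr[0]
    if tag == "GET":
        yield expr[1]
    elif tag in ("ADD", "SUB"):
        yield from gets_in(expr[1])
        yield from gets_in(expr[2])


def build_dfg(puts):
    """Return the graph as a mapping from each node to its set of successors."""
    successors = {}
    for index, (_, alias, rhs) in enumerate(puts):
        put_node = ("put", index)  # block 50: one node per PUT expression
        alias_node = ("alias", alias)  # block 52: one node per alias
        successors.setdefault(alias_node, set())
        # Block 54: edge from the PUT node to the node for its alias.
        successors.setdefault(put_node, set()).add(alias_node)
        # Block 56: edge from the alias of each GET on the right-hand side to the PUT.
        for used in gets_in(rhs):
            successors.setdefault(("alias", used), set()).add(put_node)
    return successors


program = [
    ("PUT", 1, ("INTEGER", 2)),
    ("PUT", 2, ("ADD", ("GET", 1), ("INTEGER", 3))),
    ("PUT", 1, ("SUB", ("GET", 2), ("GET", 1))),
]
dfg = build_dfg(program)
edge_count = sum(len(targets) for targets in dfg.values())
print(edge_count)  # 6
```

The example program contains three PUT expressions and three GET expressions, and the resulting graph has six edges, consistent with the bound of at most one edge per PUT and GET expression. Successor sets are used so that two GETs of the same alias inside one PUT contribute a single edge.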

[0028] FIG. 4 of the drawings shows an algorithm, in pseudocode, for performing a flow-insensitive program analysis in accordance with one embodiment of the invention. Referring to the algorithm, A is equal to the number of aliases associated with the GETs and PUTs of the program. G is a dependence flow graph defined in accordance with the above method and Q is a set of nodes of G, which is initially empty. The algorithm assigns an abstract value to each alias in the dependence flow graph. It is assumed that the abstract values form a join-complete partial order, and that for two abstract values V1 and V2, the expression LE (V1, V2) returns true if V1 is less than or equal to V2 in the partial order. The expression JOIN (V1, V2) returns the join of V1 and V2 in the partial order. E1 is an expression in the program and Eval (E1) returns the value of E1. For each memory alias M, the expression InitialValue (M) returns an abstract value that is a safe approximation of the initial contents of the storage location(s) represented by M.
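Since the pseudocode of FIG. 4 is not reproduced in the text, the following Python sketch shows one way such a worklist analysis could look, using constant propagation as the value analysis. Several points are assumptions of the sketch rather than details of FIG. 4: the flat lattice with BOTTOM and TOP, the tuple encoding of statements, the default InitialValue of BOTTOM (which presumes no variable holds a meaningful value before the program runs), and the seeding of the worklist with every PUT node so that a PUT whose right-hand side contains no GET is evaluated at least once.

```python
BOTTOM, TOP = "BOTTOM", "TOP"  # least and greatest elements of a flat lattice


def le(v1, v2):
    """LE(V1, V2): true if V1 is below or equal to V2 in the partial order."""
    return v1 == BOTTOM or v2 == TOP or v1 == v2


def join(v1, v2):
    """JOIN(V1, V2): least upper bound; two different constants join to TOP."""
    if le(v1, v2):
        return v2
    if le(v2, v1):
        return v1
    return TOP


def gets_in(expr):
    """Yield the alias of every GET nested anywhere inside an expression."""
    if expr[0] == "GET":
        yield expr[1]
    elif expr[0] in ("ADD", "SUB"):
        yield from gets_in(expr[1])
        yield from gets_in(expr[2])


def eval_abstract(expr, value):
    """Eval(E): abstract value of an expression given the current alias values."""
    tag = expr[0]
    if tag == "INTEGER":
        return expr[1]
    if tag == "GET":
        return value[expr[1]]
    left = eval_abstract(expr[1], value)
    right = eval_abstract(expr[2], value)
    if BOTTOM in (left, right):
        return BOTTOM
    if TOP in (left, right):
        return TOP
    return left + right if tag == "ADD" else left - right


def analyze(puts, initial_value=lambda alias: BOTTOM):
    readers = {}  # alias -> indexes of the PUTs whose right-hand side reads it
    for index, (_, alias, rhs) in enumerate(puts):
        readers.setdefault(alias, set())
        for used in gets_in(rhs):
            readers.setdefault(used, set()).add(index)
    value = {alias: initial_value(alias) for alias in readers}
    # Assumption: PUT nodes are seeded too, so GET-free PUTs are evaluated once.
    work = [("alias", a) for a in readers] + [("put", i) for i in range(len(puts))]
    while work:
        kind, key = work.pop()
        if kind == "alias":
            work.extend(("put", i) for i in readers[key])  # successors of the alias
        else:
            _, alias, rhs = puts[key]
            new = eval_abstract(rhs, value)
            if not le(new, value[alias]):  # only a strictly larger value changes it
                value[alias] = join(value[alias], new)
                work.append(("alias", alias))
    return value


program = [
    ("PUT", 1, ("INTEGER", 2)),
    ("PUT", 2, ("ADD", ("GET", 1), ("INTEGER", 3))),
    ("PUT", 3, ("SUB", ("GET", 2), ("GET", 1))),
    ("PUT", 4, ("INTEGER", 0)),
    ("PUT", 4, ("ADD", ("GET", 4), ("INTEGER", 1))),
]
result = analyze(program)
print(result)
```

In this example aliases 1, 2 and 3 resolve to the constants 2, 5 and 3, while alias 4 receives two different values and therefore rises to TOP. Because each alias value can only move upward in a lattice of finite height, the worklist eventually empties.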

[0029] Referring to FIG. 5 of the drawings, reference numeral 100 generally indicates hardware for performing program analysis in accordance with the invention. The hardware 100 includes a memory 104, which may represent one or more physical memory devices, which may include any type of random access memory (RAM), read only memory (ROM) (which may be programmable), flash memory, non-volatile mass storage device, or a combination of such memory devices. The memory 104 is connected via a system bus 112 to a processor 102. The memory 104 includes instructions 106 which, when executed by the processor 102, cause the processor to perform the methodology of the invention as discussed above. Additionally, the system 100 includes a disk drive 108 and a CD ROM drive 110, each of which is coupled to a peripheral-device and user-interface 114 via bus 112. Processor 102, memory 104, disk drive 108 and CD ROM drive 110 are generally known in the art. Peripheral-device and user-interface 114 provides an interface between system bus 112 and various components connected to a peripheral bus 116, as well as to user interface components such as a display, a mouse and other user interface devices. A network interface 118 is coupled to peripheral bus 116 and provides network connectivity to system 100.

[0030] For the purposes of this specification, a machine-readable medium includes any mechanism that provides (i.e. stores and/or transmits) information in a form readable by a machine (e.g. a computer). For example, a machine-readable medium includes read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g. carrier waves, infrared signals, digital signals, etc.); etc.

[0031] It will be apparent from this description that aspects of the present invention may be embodied, at least partly, in software. In other embodiments, hardware circuitry may be used in combination with software instructions to implement the present invention. Thus, the techniques are not limited to any specific combination of hardware circuitry and software.

[0032] Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader spirit of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.

Claims

1. A method comprising:

assigning an alias to each equivalence class of possibly overlapping memory accesses as defined by an alias analysis of an intermediate language program; and
defining a definition-use relationship between statements in each equivalence class wherein definition-statements which belong to the equivalence class reference the alias associated with that class and wherein use-statements which belong to the equivalence class reference the alias associated with that class.

2. The method of claim 1 further comprising constructing a dependence flow graph to represent said definition-use relationship comprising:

assigning a definition-node for each definition statement in the program;
assigning a use-node for each use statement in the program;
assigning an alias-node for each alias;
introducing a single edge into the graph connecting each definition-node to its associated alias-node; and
introducing a single edge in the graph connecting each use-node to its associated alias-node.

3. The method of claim 1 further comprising first performing a memory alias analysis of said intermediate language program to partition the memory accesses of said intermediate language program into equivalence classes such that any two memory accesses that reference the same storage location belong to the same equivalence class.

4. The method of claim 2 further comprising performing a program analysis using said dependence flow graph.

5. The method of claim 4 wherein said program analysis comprises:

for each alias-node in the dependence flow graph assigning an initial value to the alias corresponding to said alias-node and adding said alias-node to a set of nodes; and
while said set of nodes is not empty, iteratively performing the following:
removing a node from said set of nodes;
if said node is an alias-node then adding the successors of said node in the dependence flow graph to said set of nodes;
if said node is a definition-node for a statement of the form PUT (A, E) then determining a value for E, updating said initial value based on the value of E; and adding A to said set of nodes.

6. The method of claim 5 wherein said initial value comprises a set of abstract values which form a join-complete partial order.

7. A machine-readable medium that provides instructions, which when executed by a processor, cause the processor to perform operations comprising:

assigning an alias to each equivalence class of possibly overlapping memory accesses as defined by an alias analysis of an intermediate language program; and
defining a definition-use relationship between statements in each equivalence class wherein definition statements which belong to the equivalence class reference the alias associated with that class and wherein use statements which belong to said equivalence class reference the alias associated with that class.

8. The machine-readable medium of claim 7, wherein said operations further comprise constructing a dependence flow graph to represent said definition-use relationship comprising:

assigning a definition-node for each definition statement in the program;
assigning a use-node for each use statement in the program;
assigning an alias-node for each alias;
introducing a single edge into the graph connecting each definition-node to its associated alias-node; and
introducing a single edge in the graph connecting each use-node to its associated alias-node.

9. The machine-readable medium of claim 7 wherein said operations further comprise performing a memory alias analysis of said intermediate language program to partition the memory accesses of said intermediate language program into equivalence classes such that any two memory accesses that reference the same storage location belong to the same equivalence class.

10. The machine-readable medium of claim 8 wherein said operations further comprise performing a program analysis using said dependence flow graph.

11. The machine-readable medium of claim 10 wherein said program analysis comprises:

for each alias-node in the dependence flow graph assigning an initial value to the alias corresponding to said alias-node, and adding said alias-node to a set of nodes; and
while said set of nodes is not empty, iteratively performing the following:
removing a node from said set of nodes;
if said node is an alias-node then adding the successors of said node in the dependence flow graph to said set of nodes;
if said node is a definition-node for a statement of the form PUT (A, E) then determining a value for E, updating said initial value based on the value of E; and adding A to said set of nodes.

12. The machine-readable medium of claim 11 wherein said initial value comprises a set of abstract values which form a join-complete partial order.

13. An apparatus for doing program analysis comprising:

a processor and a memory coupled thereto having a set of instructions which when executed by the processor cause the processor to perform a method comprising:
assigning an alias to each equivalence class of possibly overlapping memory accesses as defined by an alias analysis of an intermediate language program; and
redefining a definition-use relationship between statements in each equivalence class wherein definition-statements which belong to the equivalence class reference the alias associated with that class and wherein use-statements which belong to said equivalence class reference the alias associated with that class.

14. The apparatus of claim 13 wherein said method further comprises constructing a dependence flow graph to represent said redefined definition-use relationship comprising:

assigning a definition-node for each definition statement in the program;
assigning a use-node for each use statement in the program;
assigning an alias-node for each alias;
introducing a single edge into the graph connecting each definition-node to its associated alias-node; and
introducing a single edge in the graph connecting each use-node to its associated alias-node.

15. The apparatus of claim 14 wherein said method further comprises first performing a memory alias analysis of said intermediate language program to partition the memory accesses of said intermediate language program into equivalence classes such that any two memory accesses that reference the same storage location belong to the same equivalence class.

16. The apparatus of claim 13 wherein said method further comprises performing a program analysis using said dependence flow graph.

17. The apparatus of claim 15 wherein said program analysis comprises:

for each alias-node in the dependence flow graph assigning an initial value to the alias corresponding to said alias-node, and adding said alias-node to a set of nodes; and
while said set of nodes is not empty, iteratively performing the following:
removing a node from said set of nodes;
if said node is an alias-node then adding the successors of said node in the dependence flow graph to said set of nodes;
if said node is a definition-node for a statement of the form PUT (A, E) then determining a value for E, updating said initial value based on the value of E; and adding A to said set of nodes.

18. The apparatus of claim 17 wherein said initial value comprises a set of abstract values which form a join-complete partial order.

19. Apparatus for doing program analysis comprising:

means for assigning an alias to each equivalence class of possibly overlapping memory accesses as defined by an alias analysis of an intermediate language program; and
means for defining a definition-use relationship between statements in each equivalence class wherein definition-statements which belong to the equivalence class reference the alias associated with that class and wherein use-statements which belong to said equivalence class reference the alias associated with that class.

20. The apparatus of claim 19 further comprising means for constructing a dependence flow graph to represent said definition-use relationship.

Patent History
Publication number: 20030074652
Type: Application
Filed: Oct 11, 2001
Publication Date: Apr 17, 2003
Patent Grant number: 7117490
Inventors: Williams L. Harrison (Brookline, MA), Cotton Seed (Brookline, MA)
Application Number: 09976313
Classifications
Current U.S. Class: Optimization (717/151); Data Flow Analysis (717/155)
International Classification: G06F009/45;