EFFICIENT GENERATION OF EXECUTABLE FILE FROM PROGRAM FILES WHEN SOME OF THE PROGRAM FILES EXPRESSLY INCORPORATE OTHER PROGRAM FILES
Efficient generation of executable file from program files when some of the program files expressly incorporate other program files. In an embodiment, dependency information representing which program files (conditionally or unconditionally) incorporate other program files is generated and stored in a secondary (non-volatile) storage. When some program files are modified, the dependency information is used to identify for recompilation all the program files that incorporate any of the modified program files. The modified program files and the identified program files are recompiled and the executable file is regenerated.
The present application is a divisional application of and claims priority from the co-pending U.S. patent application Ser. No. 11/308,800, Filed: May 9, 2006, entitled, “Efficient Generation of Executable File From Program Files When Some of the Program Files Expressly Incorporate Other Program Files”, naming the same inventors as in the subject patent application, which in turn claims priority from India Patent Application entitled, “Efficient Generation of Executable File From Program Files When Some of the Program Files Expressly Incorporate Other Program Files”, Serial Number: 548/CHE/2006, Filed: Mar. 27, 2006, both of which are incorporated in their entirety herewith.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to software, and more specifically to a method and apparatus to generate an executable file from program files when some of the program files expressly incorporate other program files.
2. Related Art
A program file is typically represented as text containing a list of instructions in a programming language. Large program files may be split into a number of smaller program files for separating functionality, providing modularity and/or for ease of usage, as is well understood in the relevant arts.
In general, a programming language provides instructions by which program files can be expressly incorporated into one another, so that the files effectively operate as one large program file. For convenience, the file incorporating another file is referred to as an “incorporating file” and the file being incorporated is referred to as an “incorporated file”.
An example of such a programming language is the C programming language, where large program files can be split into header and source files. The C programming language provides a construct “#include <filename>” by which one header or source file can expressly incorporate another header or source file. Typically, source files ending with extension “.c” incorporate header files ending with extension “.h”.
Program files need to be converted into an executable file before they can be executed by the underlying hardware. The executable file generally contains instructions (typically in binary form) suitable for execution by the processors contained in the hardware.
The process of generating an executable file from program files typically consists of converting (or compiling) each program file into a compiled file using a compiler of the programming language and then building the executable file from the compiled files. Building generally entails linking the compiled files into the executable file noted above.
Typically, an executable file is generated from a large number of program files. As such, a change made in one of the program files may necessitate the recompilation of all the program files, which is not desirable. Various approaches have been proposed for increasing the efficiency of generating an executable file from program files.
In one prior approach, a program file is recompiled only when the modification date of a program file (as maintained by the underlying operating system) is more recent than the modification date of its corresponding compiled file. Such an approach is used in utilities such as ‘make’ and ‘gmake’ well known in Unix type environments and ‘nmake’ well known in Windows type environments.
One disadvantage with such an approach is that consideration of the modification date alone may not lead to efficient generation of an accurate executable file, since incorporating files need to be recompiled if the incorporated files are modified. Various aspects of the present invention overcome such deficiencies as described in sections below.
What is therefore needed is an approach that enables the efficient generation of an executable file from program files while addressing one or more problems/requirements described above.
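The limitation noted above may be illustrated by the following minimal sketch, written in Python purely for illustration. It assumes the timestamp rule compares a source file only with its own compiled file, as described above. The function names `needs_recompile_by_mtime` and `demo` are hypothetical and do not correspond to any actual utility.

```python
import os
import tempfile
import time


def needs_recompile_by_mtime(source_path, object_path):
    """Timestamp-only rule: rebuild when the object file is missing or older than its source."""
    if not os.path.exists(object_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(object_path)


def demo():
    """Modify only the header and report whether the rule flags the incorporating .c file."""
    with tempfile.TemporaryDirectory() as tmp:
        header = os.path.join(tmp, "a.h")
        source = os.path.join(tmp, "d.c")
        obj = os.path.join(tmp, "d.o")
        for path, text in ((header, "#define N 1\n"), (source, "#include <a.h>\n"), (obj, "")):
            with open(path, "w") as handle:
                handle.write(text)
        now = time.time()
        os.utime(source, (now - 200, now - 200))  # source written first
        os.utime(obj, (now - 100, now - 100))     # object compiled afterwards
        os.utime(header, (now, now))              # header modified most recently
        return needs_recompile_by_mtime(source, obj)
```

In this sketch `demo()` reports that d.c does not need recompilation even though the header it incorporates is newer than d.o, which is the inaccuracy the dependency information is intended to remove.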
The present invention will be described with reference to the accompanying drawings briefly described below.
In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
1. Overview
An aspect of the present invention generates an executable file accurately from program files by using dependency information between incorporated files and incorporating files. The dependency information is generated by parsing the program files and is stored in secondary storage. On receiving an indication that a program file has been modified, the dependency information is retrieved from secondary storage and is used to identify for recompilation all the incorporating files that incorporate (directly or indirectly) the modified program file. By recompiling all the incorporating files along with the modified program file, the executable file can be generated accurately.
Another aspect of the present invention generates an executable file accurately when the program files are conditionally incorporated. Data representing each condition is stored associated with each file in which the condition is present. In an embodiment, the result of evaluation of a condition depends on whether a flag (part of the condition) is defined or not, and data representing the specific flags which are defined is also received. On receiving an indication that a program file has been modified, the conditions associated with the program files are evaluated (based on the flag information in the noted embodiment), and used to identify the incorporating files which have to be recompiled for accurately generating the executable file.
Several aspects of the invention are described below with reference to examples for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the invention. One skilled in the relevant art, however, will readily recognize that the invention can be practiced without one or more of the specific details, or with other methods, etc. In other instances, well-known structures or operations are not shown in detail to avoid obscuring the features of the invention.
2. Digital Processing System
CPU 110 may execute instructions stored in RAM 120 to provide several features of the present invention described in the present application. CPU 110 may contain multiple processing units, with each processing unit potentially being designed for a specific task. Alternatively, CPU 110 may contain only a single general-purpose processing unit. RAM 120 may receive instructions from secondary memory 130 using communication path 150.
Graphics controller 160 generates display signals (e.g., in RGB format) to display unit 170 based on data/instructions received from CPU 110. Display unit 170 contains a display screen to display the images defined by the display signals. Display unit 170 may be used to display the dependency information (described below) stored in secondary memory 130. Input interface 190 may correspond to a keyboard and/or mouse. Network interface 180 provides connectivity to a network and may be used to communicate with other external systems.
Secondary memory 130 may contain hard drive 135, flash memory 136 and removable storage drive 137. Some or all of the data and instructions may be provided on removable storage unit 140, and the data and instructions may be read and provided by removable storage drive 137 to CPU 110. Secondary memory 130 may be used to store the dependency information generated from the program files (also potentially stored in secondary memory 130). Floppy drive, magnetic tape drive, CD-ROM drive, DVD Drive, Flash memory, removable memory chip (PCMCIA Card, EPROM) are examples of such removable storage drive 137.
Removable storage unit 140 may be implemented using medium and storage format compatible with removable storage drive 137 such that removable storage drive 137 can read the data and instructions. Thus, removable storage unit 140 includes a computer readable storage medium having stored therein computer software and/or data.
In this document, the term “computer program product” is used to generally refer to removable storage unit 140 or hard disk installed in hard drive 135. These computer program products are means for providing software to digital processing system 100. CPU 110 may retrieve the software instructions, and execute the instructions to provide various features of the present invention described below.
3. Generating an Executable File
In step 210, CPU 110 parses each of the program files read from secondary memory 130 and determines dependency information representing which program files incorporate other program files. In an embodiment, CPU 110 parses each of the program files written in the C programming language to identify all the “#include <filename>” instructions and determines that the parsed program file incorporates all the “filename” program files.
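The parsing of step 210 may be sketched as follows, in Python for illustration only. The sketch assumes the program files are available as text and recognizes both the “#include <filename>” form and the quoted “#include "filename"” form of the C construct. It does not skip directives that appear inside comments. The names `find_includes` and `build_dependency_info` are hypothetical.

```python
import re

# Matches '#include <name>' and '#include "name"', allowing spaces around '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def find_includes(text):
    """Return the file names named in #include directives, in order of appearance."""
    return INCLUDE_RE.findall(text)


def build_dependency_info(files):
    """Map each file name to the list of files it directly incorporates.

    `files` maps a file name to the text content of that program file.
    """
    return {name: find_includes(text) for name, text in files.items()}
```

For instance, `build_dependency_info({"b.h": "#include <a.h>\n", "a.h": ""})` yields a mapping in which b.h incorporates a.h and a.h incorporates nothing.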
In step 220, CPU 110 stores the dependency information in secondary memory 130. The dependency information can be stored in any form. An example form/format is described in sections below.
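Since the dependency information can be stored in any form, one possible form is sketched below in Python. The use of JSON, the file name `deps.json` and the function names are assumptions made for illustration only and are not the format described in this document.

```python
import json
import os
import tempfile


def save_dependency_info(dependency_info, path):
    """Write the mapping 'file -> incorporated files' to non-volatile storage."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dependency_info, handle, indent=2, sort_keys=True)


def load_dependency_info(path):
    """Read back a mapping previously written by save_dependency_info."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def roundtrip(dependency_info):
    """Store and retrieve the information once, using a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "deps.json")  # hypothetical file name
        save_dependency_info(dependency_info, path)
        return load_dependency_info(path)
```

Because the information survives between runs, a later build may call `load_dependency_info` instead of parsing every program file again.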
In step 240, CPU 110 receives an indication that a program file has been modified. In an embodiment, this indication may be in the form of a list of file names and may be received from a source control program (which keeps track of the modification-related information).
In step 260, CPU 110 identifies, based on the dependency information retrieved from secondary memory 130, the program files that incorporate the modified program file either directly (incorporating files) or indirectly (i.e., files incorporating an identified incorporating file).
In step 270, CPU 110 marks each of the identified program files as candidates for recompilation. In an embodiment, the marking may be done by changing the modified date (in the underlying operating system) of the identified program files, which would automatically cause the recompilation of the identified program files.
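The marking of step 270 may be sketched as follows, in Python for illustration. The sketch assumes the identified files exist in the file system and relies on the standard `os.utime` call to set the modification time. The names `mark_for_recompilation` and `demo_mark` are hypothetical.

```python
import os
import tempfile
import time


def mark_for_recompilation(paths, timestamp=None):
    """Set the modification time of each file so a make-type utility treats it as changed."""
    when = time.time() if timestamp is None else timestamp
    for path in paths:
        os.utime(path, (when, when))


def demo_mark():
    """Create a file with an old modification time, mark it, and return the new time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "d.c")
        with open(path, "w") as handle:
            handle.write("#include <c.h>\n")
        os.utime(path, (1000, 1000))
        mark_for_recompilation([path], timestamp=1_500_000_000)
        return os.path.getmtime(path)
```

The explicit `timestamp` parameter exists only to make the sketch repeatable; in ordinary use the current time would be applied.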
In step 280, CPU 110 recompiles all the marked program files to generate the corresponding compiled files. In a Unix environment where the C programming language is used, object files with extension “.o” are generated from the header and source files.
In step 290, CPU 110 generates the executable file from the compiled files of all the program files. Due to the recompilation of the incorporating files when the corresponding incorporated file is modified, an accurate executable file can be generated. The flowchart ends in step 299.
It may be appreciated that the storage of dependency information in secondary storage facilitates the reuse of the dependency information for purposes other than the efficient generation of an executable file. In an example, the dependency information may be used to analyze the impact of changing a program file by identifying all the program files that are affected due to the change. The files that need to be recompiled (and/or a count thereof) may be displayed to a user in response to receiving an identifier (e.g., file name) of a program file of interest. As such, a user may interactively provide several identifiers to check the relative impact of changing each program file, and decide to change the program files which would cause the least impact if there is a choice of modifying one of several program files to achieve a given objective.
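The impact analysis described above may be sketched as follows, in Python for illustration. The sketch assumes the dependency information is held as a mapping from each file to the files it directly incorporates. The function names and the sample mapping `SAMPLE` are hypothetical, and the count includes incorporating header files as well as source files.

```python
from collections import deque


def affected_files(dependency_info, changed):
    """Return every file that directly or indirectly incorporates `changed`."""
    incorporated_by = {}
    for name, includes in dependency_info.items():
        for included in includes:
            incorporated_by.setdefault(included, set()).add(name)
    seen, pending = set(), deque([changed])
    while pending:
        current = pending.popleft()
        for parent in incorporated_by.get(current, ()):
            if parent not in seen and parent != changed:
                seen.add(parent)
                pending.append(parent)
    return seen


def impact_report(dependency_info, candidates):
    """Map each candidate file to the number of files affected by changing it."""
    return {name: len(affected_files(dependency_info, name)) for name in candidates}


# Hypothetical sample: d.c incorporates b.h and c.h, which both incorporate a.h.
SAMPLE = {"a.h": [], "b.h": ["a.h"], "c.h": ["a.h"], "d.c": ["b.h", "c.h"], "e.c": ["c.h"]}
```

With this sample, `impact_report(SAMPLE, ["a.h", "b.h", "c.h"])` gives 4, 1 and 2 respectively, so a user free to choose would see that changing b.h affects the fewest files.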
In another example, statistics concerning the program files like the number of dependent/independent files, cross-incorporated files, and cyclic incorporation among the program files may be generated using the dependency information. The features of
4. Example
File b.h 320 contains the instruction “#include <a.h>” implying that the contents of file a.h 310 must be expressly incorporated into file b.h 320 thereby specifying a dependency of file b.h 320 (incorporating file) on file a.h 310 (incorporated file). Any modification made to file a.h 310 necessitates the recompilation of “.c” files incorporating file b.h 320, since b.h 320 is a header file. Similarly, file c.h 330 depends on file a.h 310 and file e.c 350 depends on file c.h 330. File d.c 340 contains two “#include” instructions by which the contents of files b.h 320 and c.h 330 are expressly incorporated thereby specifying a dependency of file d.c 340 on both files b.h 320 and c.h 330.
In relation to
Each node of the hierarchy of
Once the details of a node have been stored, the node is referred to using its unique identification number. For example, in Line 394, unique identification number 3 is used to refer to the node whose details are stored in Line 392. In general, the formats need to enable the dependency information to be reconstructed as a hierarchy (of
It may be appreciated that the dependency information stored in secondary storage can be retrieved and used when an indication (that a program file has been modified) is received. It may be further appreciated that the dependency information needs to be updated for only the modified program files thereby reducing the computational requirements associated with maintaining the dependency information.
To implement steps 240 and 260, CPU 110 on receiving an indication about the modified program files, may retrieve the dependency information from secondary storage and generate the hierarchy (of
For example, when CPU 110 receives an indication that file c.h 330 (incorporated file) has been modified, CPU 110 generates the hierarchy and identifies node c.h 370 as the node corresponding to the modified file c.h 330. CPU 110 then identifies the child nodes d.c 375 and e.c 380 of node c.h 370, and their corresponding program files d.c 340 and e.c 350 (incorporating files) for recompilation. Both the modified files (file c.h 330) and the corresponding identified files (file d.c 340 and file e.c 350) are recompiled and the executable file is generated.
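The identification in the above example may be sketched as follows, in Python for illustration. The sketch assumes a hierarchy in which each node keeps, as its children, the nodes of the files that incorporate it. The class `Node`, the function names and the in-memory mapping `EXAMPLE_DEPENDENCIES` (a transcription of the dependencies among a.h, b.h, c.h, d.c and e.c stated above) are hypothetical.

```python
class Node:
    """One program file in the hierarchy."""

    def __init__(self, name):
        self.name = name
        self.children = []  # nodes of the files that incorporate this file


def build_hierarchy(dependency_info):
    """Create one node per file and link incorporating files as children."""
    nodes = {}

    def node_for(name):
        if name not in nodes:
            nodes[name] = Node(name)
        return nodes[name]

    for name, includes in dependency_info.items():
        child = node_for(name)
        for included in includes:
            node_for(included).children.append(child)
    return nodes


def files_to_recompile(nodes, modified):
    """Collect the child nodes (any number of levels down) of the modified file."""
    found, visited = [], {modified}
    stack = list(nodes[modified].children)
    while stack:
        node = stack.pop()
        if node.name in visited:
            continue
        visited.add(node.name)
        found.append(node.name)
        stack.extend(node.children)
    return sorted(found)


EXAMPLE_DEPENDENCIES = {
    "a.h": [], "b.h": ["a.h"], "c.h": ["a.h"],
    "d.c": ["b.h", "c.h"], "e.c": ["c.h"],
}
```

Calling `files_to_recompile(build_hierarchy(EXAMPLE_DEPENDENCIES), "c.h")` identifies d.c and e.c, matching the example, while passing "a.h" identifies b.h, c.h, d.c and e.c.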
The above example describes the generation of an executable file when some program files incorporate other program files unconditionally. On the other hand, several programming languages support conditional incorporation of program files and the approaches described above may need to be extended to support such constructions.
For example, in the C programming language, a user may define various flags (e.g., “#define FLAG”) and then include conditional instructions (“#ifdef FLAG . . . #endif”) for controlling the incorporation of files only upon the flag being defined. Assuming an include (“#include <filename>”) instruction is contained in the body of an ifdef construct (“#ifdef FLAG . . . #endif”), the “filename” program file is incorporated only if “FLAG” is defined in the environment. The features of the present invention in such a context are illustrated with an example below.
5. Example of Conditional Incorporation
The file b.h 420 contains the instruction “#ifdef X #include <a.h> #define Y #endif” implying that the contents of file a.h 410 must be incorporated into file b.h 420 only when the flag “X” has been defined thereby specifying a conditional dependency of b.h 420 (incorporating file) on file a.h 410 (incorporated file) (or that b.h 420 conditionally incorporates a.h 410). Similarly, the other dependencies (for files c.h 430, d.c 440 and e.c 450) are also generated.
Step 210 needs to be extended to handle such conditional incorporation. Each of the program files is parsed to determine not only all the “#include <filename>” instructions but also the “#define FLAG” instructions and the “#ifdef FLAG . . . #endif” instructions that enclose the “#include” instructions.
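The extended parsing may be sketched as follows, in Python for illustration. The sketch assumes one directive per line and records, for each “#include” and “#define”, the flags of the enclosing “#ifdef” constructs. Constructs such as “#else” and “#elif” are not modeled, and “#if”/“#ifndef” are tracked only to keep the nesting balanced. The name `parse_directives` is hypothetical.

```python
import re

DIRECTIVE_RE = re.compile(r'^\s*#\s*(ifdef|ifndef|if|endif|define|include)\b\s*(.*)$')
FILENAME_RE = re.compile(r'[<"]([^>"]+)[>"]')


def parse_directives(text):
    """Return (includes, defines); each entry pairs a name with its enclosing flags."""
    includes, defines, stack = [], [], []
    for line in text.splitlines():
        match = DIRECTIVE_RE.match(line)
        if not match:
            continue
        keyword, rest = match.group(1), match.group(2).strip()
        enclosing = tuple(flag for flag in stack if flag)
        if keyword == "ifdef":
            stack.append(rest.split()[0] if rest else None)
        elif keyword in ("ifndef", "if"):
            stack.append(None)  # condition not modeled in this sketch
        elif keyword == "endif":
            if stack:
                stack.pop()
        elif keyword == "define" and rest:
            defines.append((rest.split()[0], enclosing))
        elif keyword == "include":
            name = FILENAME_RE.match(rest)
            if name:
                includes.append((name.group(1), enclosing))
    return includes, defines
```

Applied to the text of file b.h 420 with each directive on its own line, the sketch reports that a.h is incorporated under flag “X” and that flag “Y” is defined under flag “X”.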
The description is continued with respect to the manner in which the information of
6. Identifying Program Files in Case of Conditional Incorporation
Step 240 needs to be extended to handle conditional incorporation. CPU 110, besides receiving an indication that a program file has been modified, also receives a list of flags that have been defined. In an embodiment, the list of flags may be received by parsing a make file (containing flag definitions as options) used to generate the executable file from the program files, or the list of flags may be specified in a configuration file. For example, when CPU 110 receives an indication that file c.h 430 (incorporated file) has been modified, CPU 110 may generate the hierarchy and identify node c.h 470 as the node corresponding to the modified file c.h 430. CPU 110 also receives a list of flags, which are maintained in RAM 120.
Step 260 needs to be extended to handle conditional incorporation. CPU 110 identifies all the child nodes of the node corresponding to the modified file. The text associated with the nodes corresponding to the modified program files and with the identified child nodes (any number of levels down) is retrieved and parsed to identify the “#define” and “#ifdef” instructions. When a “#define FLAG” instruction is identified in the associated text, the “FLAG” is added to the list of flags maintained in RAM 120. When the “#ifdef FLAG” instruction is identified, the list of flags is checked for “FLAG” and if “FLAG” is found, the child node is identified for recompilation.
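Receiving the list of flags from a make file may be sketched as follows, in Python for illustration. The sketch assumes the flag definitions appear as “-D” compiler options (the form accepted by common C compilers) and that “#” starts a comment in the make file. Variable expansion and line continuation are not handled. The name `flags_from_makefile` is hypothetical.

```python
import re

# '-DNAME', '-DNAME=value' or '-D NAME', appearing as a separate option.
DEFINE_OPTION_RE = re.compile(r'(?<!\S)-D\s*([A-Za-z_][A-Za-z0-9_]*)')


def flags_from_makefile(text):
    """Return the flag names defined through -D options, in order of first appearance."""
    flags = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop make comments
        for name in DEFINE_OPTION_RE.findall(line):
            if name not in flags:
                flags.append(name)
    return flags
```

For a make file line such as `CFLAGS = -O2 -DX -DDEBUG=1`, the sketch returns the flags X and DEBUG.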
For example, continuing with the above example in which file c.h 430 has been modified, node c.h 470 (corresponding to file c.h 430 in the hierarchy) associated with the text “#define Z” is parsed to identify flag “Z”, which is added to the list of flags maintained in RAM 120. CPU 110 then identifies the child nodes d.c 475 and e.c 480 of node c.h 470, and parses the text associated with each node to verify any conditional dependencies. In this example, node e.c 480 is associated with the text “#ifdef X #include <c.h> #endif” which specifies a conditional dependency between node e.c 480 and node c.h 470 based on the flag “X”.
If flag “X” is not defined (in the list maintained in RAM 120), node e.c 480 is not included in the list of nodes identified for recompilation. In general, if the evaluation of the condition does not require the incorporation of the conditionally incorporated program file, the corresponding incorporating program file is not identified for recompilation.
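The evaluation described above may be sketched as follows, in Python for illustration. The sketch assumes each incorporation is guarded by at most one flag. The function name `identify_for_recompilation` and the mappings `INCLUDES` and `DEFINES` are hypothetical. The entries for b.h, c.h and e.c follow the text above, while the unconditional incorporation of c.h by d.c is an assumption, since its condition is not stated in the text.

```python
def identify_for_recompilation(includes, defines, modified, defined_flags):
    """Return the files whose (possibly conditional) incorporation of `modified` is active.

    `includes` maps a file to (incorporated file, flag or None) pairs;
    `defines` maps a file to the flags its text defines.
    """
    flags = set(defined_flags)
    flags.update(defines.get(modified, ()))
    identified, visited, pending = [], {modified}, [modified]
    while pending:
        current = pending.pop()
        for name in sorted(includes):
            if name in visited:
                continue
            for included, flag in includes[name]:
                if included == current and (flag is None or flag in flags):
                    visited.add(name)
                    identified.append(name)
                    flags.update(defines.get(name, ()))
                    pending.append(name)
                    break
    return sorted(identified)


INCLUDES = {
    "b.h": [("a.h", "X")],
    "d.c": [("c.h", None)],  # assumed unconditional; not stated in the text
    "e.c": [("c.h", "X")],
}
DEFINES = {"b.h": ["Y"], "c.h": ["Z"]}
```

With no flags supplied, `identify_for_recompilation(INCLUDES, DEFINES, "c.h", [])` identifies only d.c. Supplying the flag “X” identifies both d.c and e.c.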
As explained in detail above, CPU 110 generates the executable file after recompiling the identified files. The features thus described above can be implemented in various types of embodiments. The description is continued with respect to an example implementation.
7. Example Implementation
File list 510 contains a list of file identifiers identifying the program files (as depicted in
File finder 520 receives file list 510 and uses it to identify the files in the underlying file system and passes the information to file reader 530. In an embodiment, file finder 520 may search for each file name in the underlying file system and retrieve the actual location or path of the file, which is then passed to file reader 530.
File reader 530 receives the information about the files from file finder 520 and reads the content of the files from secondary storage 130. The content of the files is then passed to file parser 540. In an embodiment, a circular queue may be implemented between file reader 530 and file parser 540. File reader 530 adds the content of each file to the queue, while file parser 540 removes each file from the queue for processing.
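The hand-off between the file reader and the file parser may be sketched as follows, in Python for illustration. The sketch substitutes the standard bounded `queue.Queue` for the circular queue named above, assumes the file contents are already available as text, and uses a sentinel value to signal the end of input. All function names are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream of files


def file_reader(files, out_queue):
    """Producer: add the content of each file to the queue."""
    for name, text in files.items():
        out_queue.put((name, text))  # blocks while the queue is full
    out_queue.put(SENTINEL)


def file_parser(in_queue, results):
    """Consumer: remove each file from the queue and keep its #include lines."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            break
        name, text = item
        results[name] = [line.strip() for line in text.splitlines()
                         if line.lstrip().startswith("#include")]


def run_pipeline(files, capacity=2):
    """Run reader and parser concurrently over a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)
    results = {}
    reader = threading.Thread(target=file_reader, args=(files, buffer))
    parser = threading.Thread(target=file_parser, args=(buffer, results))
    reader.start()
    parser.start()
    reader.join()
    parser.join()
    return results
```

The bounded capacity lets reading and parsing overlap while limiting how much file content is held in memory at once.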
File parser 540 receives the content of each file from file reader 530 and parses the content to identify instructions specifying the incorporation of other program files (identified by file identifiers in file list 510) in the parsed file. File parser 540 also identifies any instructions that are used to specify conditional incorporation of other program files. The identified instructions are passed to file association 550.
File association 550 receives instructions from file parser 540 for each file and generates the dependency information in the form of a hierarchy (as depicted in
File association 550 receives an indication from source control 570 that some program files have been modified. In an embodiment, the indication may be in the form of a list of file names (each name identifying a program file). On receiving the indication, file association 550 retrieves the dependency information from secondary storage 130, generates a hierarchy and passes the hierarchy and the modified program files to dependency finder 560.
Dependency finder 560 receives the hierarchy representing the dependency information and the modified program files and identifies all the program files that need to be recompiled (in response to change of an incorporated file) by using the dependency information (using the approaches described in sections above). Dependency finder 560 may then send the list of identified program files to source control 570.
Source control 570 may change the modification date of each file identified by dependency finder 560. The change causes automatic recompilation of the identified files (and the modified files) during the generation of the executable file (due to the manner in which make type utilities operate).
Thus, due to the use of the dependency information, only the modified files and the incorporating files are recompiled, and the executable file is generated accurately. Due to the storing of the dependency information in the secondary storage, the information generated once can be maintained and used multiple times extending over large durations (thereby reducing computational requirements).
8. Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. Also, the various aspects, features, components and/or embodiments of the present invention described above may be embodied singly or in any combination in a data storage system such as a database system.
Claims
1. A method of processing a set of program files implementing a user application, wherein each program file in said set of program files incorporates a corresponding subset of program files, each of said subset of program files being contained in said set of program files, said method comprising:
- parsing at least some of said set of program files to generate a dependency information associating each program to said corresponding subset of program files;
- storing said dependency information in a secondary storage;
- receiving from a user an identifier of a first program file contained in said set of program files;
- retrieving said dependency information from said secondary storage;
- identifying a second subset of program files based on said dependency information retrieved from said secondary storage, wherein said second subset of program files contains a subset of program files associated with said first program file according to said dependency information.
2. The method of claim 1, further comprising:
- determining a number of files in said second subset of program files, said number representing impact of change of said first program file in generating an executable file for said user application; and
- displaying said number to said user.
3. A non-transitory computer readable storage medium carrying one or more sequences of instructions for causing a system to process a set of program files implementing a user application, wherein each program file in said set of program files incorporates a corresponding subset of program files, each of said subset of program files being contained in said set of program files, wherein execution of said one or more sequences of instructions by one or more processors contained in said system causes said system to perform the actions of:
- parsing at least some of said set of program files to generate a dependency information associating each program to said corresponding subset of program files;
- storing said dependency information in a secondary storage;
- receiving from a user identifiers of a first program file and a second program file contained in said set of program files;
- retrieving said dependency information from said secondary storage;
- identifying a first subset of program files required to be recompiled if said first program file were to be modified, and a second subset of program files required to be recompiled if said second program file were to be modified based on said dependency information retrieved from said secondary storage;
- determining a first number of files in said first subset of program files and a second number of files in said second subset of program files, said first number and said second number respectively representing impact of modifying said first program file and said second program file in generating an executable file for said user application; and
- displaying said first number and said second number to said user, thereby enabling said user to decide whether to modify said first program file or said second program file assuming either program file can be modified in future to achieve a given objective.
4. The non-transitory computer readable storage medium of claim 3, wherein said dependency information is represented as a hierarchy in which incorporating program files are represented as children of the incorporated program files such that said first subset of program files and said second subset of program files are respectively represented as children of said first program file and said second program file in said hierarchy,
- wherein said first number is the sum of said number of files in said first subset of program files and the number of children of each of said first subset of program files in said hierarchy and said second number is the sum of said number files in said second subset of program files and the number of children of each of said second subset of program files in said hierarchy.
5. A digital processing system comprising:
- a processor;
- a random access memory (RAM); and
- a machine readable medium to store a set of instructions, which when retrieved into said RAM and executed by said processor is designed to cause said digital processing system to perform the actions of:
- receiving a set of program files implementing a user application, wherein each program file in said set of program files incorporates a corresponding subset of program files, each of said subset of program files being contained in said set of program files;
- parsing at least some of said set of program files to generate a dependency information associating each program to said corresponding subset of program files, wherein said dependency information is represented as a hierarchy in which incorporating program files are represented as children of the incorporated program files such that said first subset of program files and said second subset of program files are respectively represented as children of said first program file and said second program file in said hierarchy;
- storing said dependency information in a secondary storage;
- receiving from a user identifiers of a first program file and a second program file contained in said set of program files;
- retrieving said dependency information from said secondary storage;
- identifying a first subset of program files required to be recompiled if said first program file were to be modified, and a second subset of program files required to be recompiled if said second program file were to be modified based on said dependency information retrieved from said secondary storage;
- determining a first number of files in said first subset of program files and a second number of files in said second subset of program files, said first number and said second number respectively representing impact of modifying said first program file and said second program file in generating an executable file for said user application; and
- displaying said first number and said second number to said user, thereby enabling said user to decide whether to modify said first program file or said second program file assuming either program file can be modified in future to achieve a given objective, wherein said first number is the sum of said number of files in said first subset of program files and the number of children of each of said first subset of program files in said hierarchy and said second number is the sum of said number files in said second subset of program files and the number of children of each of said second subset of program files in said hierarchy.
Type: Application
Filed: Feb 17, 2012
Publication Date: Jun 14, 2012
Applicant: Oracle International Corporation (Redwood Shores, CA)
Inventors: Mrinal Sharma (Kurukshetra), Shelendra Singh (Aligarh), Vivek Sam Sunder Raj (Kanyakumari)
Application Number: 13/398,867
International Classification: G06F 9/44 (20060101);