Loop Compiling Patents (Class 717/150)
  • Patent number: 8146064
    Abstract: Dynamically controlling a prefetching range of a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain irregular memory references. For each irregular memory reference in the source code, the compiler determines whether the irregular memory reference is a candidate for optimization. Responsive to identifying an irregular memory reference that may be optimized, the complier determines whether the irregular memory reference is valid for prefetching. If the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library to dynamically prefetch the irregular memory references. Data associated with the irregular memory references are dynamically prefetched into the software controlled cache when the runtime library call is invoked.
    Type: Grant
    Filed: April 4, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Marc Gonzalez tallada, Zehra N. Sura, Tao Zhang
  • Patent number: 8146067
    Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
  • Patent number: 8146071
    Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan
  • Patent number: 8136107
    Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: March 13, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ayal Zaks
  • Patent number: 8104030
    Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified to limit parallelization of the loop. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
  • Patent number: 8087010
    Abstract: Mechanisms for selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Patent number: 8087011
    Abstract: Mechanisms for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Patent number: 8051272
    Abstract: A method for generating addresses for a processor is provided. The addresses are for use by an application that may be executed by the processor. The application comprises a plurality of instructions, and each instruction comprises at least one line. The method includes storing a plurality of predetermined addresses and, for each line of each instruction, generating at least one address for the processor based on the predetermined addresses.
    Type: Grant
    Filed: September 15, 2006
    Date of Patent: November 1, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Eran Pisek
  • Publication number: 20110265067
    Abstract: A tracing just-in-time (TJIT) compiler system is described for performing parallelization of code in a runtime phase in the execution of code. Upon detecting a hot loop during the execution of the code, the compiler system extracts trace information from sequentially recorded traces. In a first phase, the compiler system uses the trace information to identify at least one group of operation components that can be operated on in a parallel manner. In a second phase, the compiler system provides instructions which allocate the group of operation components to plural processing resources. A native code generator module carries out those instructions by recompiling native code that directs the operation of a native system to perform parallel processing. The compiler system terminates a group if it encounters program data in a loop iteration that is not consistent with previously encountered predicated information (upon which it records a new trace in a sequential manner).
    Type: Application
    Filed: April 21, 2010
    Publication date: October 27, 2011
    Applicant: Microsoft Corporation
    Inventors: Wolfram Schulte, Nikolai Tillmann, Michal J. Moskal, Manuel A. Fahndrich, Daniel JP Leijen, Barend H. Venter
  • Patent number: 8028281
    Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: September 27, 2011
    Assignee: International Business Machines Corporation
    Inventor: Rajendra K. Bera
  • Patent number: 8024717
    Abstract: An apparatus and a method for processing an array in a loop in a computer system, including: applying loop unrolling to a multi-dimensional array included in a loop based on a predetermined unrolling factor to generate a plurality of unrolled multi-dimensional arrays; and transforming each of the plurality of unrolled multi-dimensional arrays into a one-dimensional array having an array subscript expression in a form of an affine function with respect to a loop counter variable.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: September 20, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-Hoon Yoo, Hee Seok Kim, Jeong Wook Kim, Soo Jung Ryu
  • Patent number: 8006238
    Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.
    Type: Grant
    Filed: September 26, 2006
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
  • Patent number: 7979852
    Abstract: The inventive system for automatically generating optimizes codes (19) which are operational on a predefined hardware platform (90) comprises at least one processor which is (91) based on code sources (17) provided by users and comprises means (51, 52) for receiving symbolic code sequences or standard sequences (1) representative for the processor (91) behavior in terms of performance for a predetermined application area, means (53), for receiving static parameters (2), means (55) for receiving dynamic parameters (7), an analysing device (10) for defining optimization rules (9) on the basis of performance tests and measures determined on the basis of the standard sequences (1) and the static (2) and dynamic (7) parameters, a device (80) for optimizing and generating the code receiving the standard sequences (1) and the optimization rules (9) for examining the code sources (17) of the users, detecting optimizable loops, decomposing into cores and for assembling and injecting the codes in such a way that the op
    Type: Grant
    Filed: January 13, 2005
    Date of Patent: July 12, 2011
    Assignees: Commissariat a l'Energie Atomique et Aux Energies Alternatives, Caps Entreprise, University of Versailles Saint-Quentin-En-Yvelines
    Inventors: François Bodin, William Jalby, Xavier Le Pasteur, Christophe Lemuet, Eric Courtois, Jean Papadopoulo, Pierre Leca
  • Patent number: 7962906
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Grant
    Filed: March 15, 2007
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Patent number: 7926046
    Abstract: This invention describes a compilation method of extracting and implementing an accelerator control program from an application source code in a processor based system. The application source code comprises arrays and loops. The input application source code is sequential, with loop, branch and call control structures, while the generated output of this invention has parallel execution semantics. The compilation method comprises the step of performing loop nest analysis, transformations and backend processes. The step of loop nest analysis consists of dependence analysis and pointer analysis. Dependence analysis determines the conflicts between the various references to arrays in the loop, and pointer analysis determines if two pointer references in a loop are in conflict. Transformations convert the loops from their original sequential execution semantics to parallel execution semantics. The back-end process determines the parameters and memory map of the accelerator and the hardware dependent software.
    Type: Grant
    Filed: July 7, 2006
    Date of Patent: April 12, 2011
    Inventors: Soorgoli Ashok Halambi, Sarang Ramchandra Shelke, Bhramar Bhushan Vatsa, Dibyapran Sanyal, Nishant Manohar Nakate, Ramanujan K Valmiki, Sai Pramod Kumar Atmakuru, William C Salefski, Vidya Praveen
  • Patent number: 7895587
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: February 22, 2011
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Patent number: 7890942
    Abstract: A method and system for substituting array values (i.e., expressions) in a program at compile time. An initialization of an array is identified in a loop. The initialization is an assignment of an expression (i.e., a constant or a function of an induction variable to elements of the array). The expression is stored in a table that associates the expression with the array and indices of the array. An assignment statement is detected that is to assign at least one element of the initialized elements. The expression is retrieved from the table based on the expression being associated with the array and corresponding indices. The expression is substituted for the at least one element so that the expression is to be assigned by the assignment statement. The process of substituting array values is extended to interprocedural analysis.
    Type: Grant
    Filed: August 15, 2006
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventor: Rohini Nair
  • Patent number: 7882498
    Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies are processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: February 1, 2011
    Assignee: Intel Corporation
    Inventors: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen
  • Patent number: 7877742
    Abstract: A method, system, and computer program product for generating terminating, pseudo-random test instruction streams, including forward and backward branching instructions. A first instruction stream is generated, including at least one backward branching instruction and at least one forward branching instruction. Each backward branching instruction is preceded by at least one forward branching instruction, which is used to guarantee termination of the loop formed by the backward branching instruction. Backward branching targets are resolved when the backward branching instruction is inserted into the first instruction stream. Forward branching targets remain unresolved in the first instruction stream. A set of potential branch targets is determined for each forward branching instruction. For each forward branching instruction, a branch target is randomly selected from the set of potential branch targets for that forward branching instruction.
    Type: Grant
    Filed: June 8, 2006
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ali Y. Duale, Theodore J. Bohizic, Dennis W. Wittig
  • Patent number: 7877739
    Abstract: A computer-implemented method for determining whether an array within a loop can be privatized for that loop is presented. The method calculates the array sections that require first or last privatization and copies only those sections, reducing the privatization overhead of the known solutions.
    Type: Grant
    Filed: October 9, 2006
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Erik P. Charlebois, Guansong Zhang
  • Patent number: 7849453
    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: December 7, 2010
    Assignee: Oracle America, Inc.
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Patent number: 7836456
    Abstract: In a smart card system in which applications executing in different execution contexts are allowed to communicate with each other only through shareable interface objects (SIO's), a registry mechanism is provided to mediate in inter-application communication between legacy applets, extended applets and servlets. A request by a client application for a SIO of a server application in a different execution context is routed to the registry mechanism by the system. Dependent on what types the client and server applications are, the registry mechanism provides call interfaces as would be expected by the applications to enable passing the SIO from the server application to the client application. In one embodiment, servlets and extended applets may also register and unregister their SIOs dynamically with the registration mechanism.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: November 16, 2010
    Assignee: Oracle America, Inc.
    Inventors: Thierry P. Violleau, Tanjore Ravishankar, Matthew R. Hill
  • Patent number: 7836434
    Abstract: Methods, systems, and articles of manufacture consistent with the present invention provide an improved technique for analyzing statements that use pointer or array syntax to access dynamically-allocated arrays to determine whether the statement generates a reference that is outside the bounds of the array's allocated memory. Statements that use pointer or array syntax to access dynamically-allocated arrays can be either statically (at compile-time) or dynamically bounds (at run-time) checked. Methods and systems in accordance with the present invention determine at compile-time if an array reference can be determined to always be in bounds or definitely out of bounds at least once, and if not, insert code into the program to check the array bounds dynamically at run-time before the access of the array reference.
    Type: Grant
    Filed: May 4, 2004
    Date of Patent: November 16, 2010
    Assignee: Oracle America, Inc.
    Inventor: Michael L. Boucher
  • Publication number: 20100281471
    Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Application
    Filed: December 31, 2009
    Publication date: November 4, 2010
    Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
  • Patent number: 7827542
    Abstract: A compiler apparatus that improves the performance of loop processing. The compiler apparatus translates a C program that includes a loop into a machine language program, and includes: a movement judgment unit that judges whether or not an instruction which is positioned outside of the loop of the C program can be moved into the loop, based on a state of live ranges of variables used in the instruction; a movement execution unit that moves the instruction into the loop in the case where the movement judgment unit judges that the instruction can be moved into the loop, thereby generating an intermediate program; and a translation unit that translates the intermediate program into the machine language program.
    Type: Grant
    Filed: September 25, 2006
    Date of Patent: November 2, 2010
    Assignee: Panasonic Corporation
    Inventors: Hajime Ogawa, Ryoko Miyachi, Toshiyuki Sakata
  • Publication number: 20100275191
    Abstract: Fine-grained parallelism within isolated object graphs is used to provide safe concurrent operations within the isolated object graphs. One example provides an abstraction labeled IsolatedObjectGraph that encapsulates at least one object graph, but often two or more object graphs, rooted by an instance of a type member. By encapsulating the object graph, no references from outside of the object graph are allowed to objects inside of the object graph. Also, the encapsulated object graph does not contain references to objects outside of the graphs. The isolated object graphs provide for safe data parallel operations, including safe data parallel mutations such as for each loops. In an example, the ability to isolate the object graph is provided through type permissions.
    Type: Application
    Filed: April 24, 2009
    Publication date: October 28, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: John J. Duffy, Niklas Gustafsson, Vance Morrison
  • Patent number: 7823141
    Abstract: A method for executing a loop in an application that includes executing iterations in a first segment of the loop by a base thread, logging memory transactions that occur during execution of iterations in the first segment by a co-inspector thread to obtain a co-inspector log, executing iterations in a second segment of the loop by a co-thread to obtain temporary results, logging memory transactions that occur during execution of iterations in the second segment to obtain a co-thread log, and comparing the co-inspector log and the co-thread log to determine whether a thread interdependency exists.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: October 26, 2010
    Assignee: Oracle America, Inc.
    Inventors: Phyllis E. Gustafson, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman, Yuguang Wu
  • Patent number: 7818730
    Abstract: The present invention provides a method and system for building an executable using only the necessary source modules or a reduced set of source modules. The complete list of necessary source modules can be determined by checking for dependency of any already identified necessary source modules. Hence, if any of the source modules belongs to a library, the entire library will not need to be compiled in order to use any necessary source module to build the executable. The present invention has the advantage that the executable takes shorter time to build and the executable is smaller in memory size so that it is easier to be ported to a target system. The present invention may also be used to minimize or reduce the memory needed to load a model so that only the elements/blocks that are used in the model are loaded into memory when a model loads.
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: October 19, 2010
    Assignee: The Math Works, Inc.
    Inventors: Anthony Robert Ryan, James Carrick
  • Patent number: 7818729
    Abstract: Automated (e.g., compiler implemented) techniques provide safe secure software development including techniques for testing and verifying software for determining and/or certifying that the software had certain characteristics and/or complies with certain properties. In another illustrative implementation, methods are provided whereby the consumer can verify, to any desired level of certainty, that software as delivered truly has the specified properties, and that the compiler used to produce that software can be trusted to provide those assurances.
    Type: Grant
    Filed: June 23, 2006
    Date of Patent: October 19, 2010
    Inventors: Thomas S. Plum, David M. Keaton
  • Patent number: 7814468
    Abstract: A method for loop reformulation is provided such that a single exit ill-formed loop (SEIFL) can be reformulated into a reformulated code block that contains a transformed well-formed loop (TWFL). A SEIFL loop is a loop that can exit from the loop body of the loop. After the loop reformulation, the TWFL of the reformulated code block can only exit from the end of the loop. The reformulated code block will replace the SEIFL in the compiler's internal representation (IR) such that a more efficient executable machine code can be generated by optimizing the reformulated compiler's IR.
    Type: Grant
    Filed: April 20, 2005
    Date of Patent: October 12, 2010
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Xiangyun Kong
  • Patent number: 7805413
    Abstract: A program stored in a storage device is read. Partial compression, in the element in an array in a loop nest in the program, is performed by replacing an element local only in the loop nest in the entire program with a scalar variable. Access to an original array is inserted into a program for an non-local element.
    Type: Grant
    Filed: December 22, 2003
    Date of Patent: September 28, 2010
    Assignee: Fujitsu Limited
    Inventor: Akira Hosoi
  • Publication number: 20100235611
    Abstract: A compiler compiling a source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.
    Type: Application
    Filed: March 18, 2010
    Publication date: September 16, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Koichiro YAMASHITA
  • Patent number: 7797692
    Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.
    Type: Grant
    Filed: May 12, 2006
    Date of Patent: September 14, 2010
    Assignee: Google Inc.
    Inventor: Grzegorz J. Czajkowski
  • Patent number: 7793278
    Abstract: Systems and methods perform affine partitioning on a code stream to produce code segments that may be parallelized. The code segments include copies of the original code stream with conditional inserted that aid in parallelizing code. The conditional is formed by determining the constraints on a processor variable determined by the affine partitioning and applying the constraints to the original code stream.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: September 7, 2010
    Assignee: Intel Corporation
    Inventors: Zhao Hui Du, Shih-Wei Liao, Gansha Wu, Guei-Yuan Lueh
  • Patent number: 7784040
    Abstract: A method and system for profiling performance behaviour of executed loops. For each invocation of a loop, a count of a measured event is incremented. A display is provided for a loop (209) showing the number of measured events for each of the loop's invocations. The code of a loop is instrumented (204) to obtain the count of loop invocations (207) and the occurrences of the measured event (208).
    Type: Grant
    Filed: November 15, 2005
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: Gad Haber, Marcel Zalmanovici
  • Patent number: 7779382
    Abstract: Validity of one or more assertions for any concurrent execution of a plurality of software instructions with at most k?1 context switches can be determined. Validity checking can account for execution of the software instructions in an unbounded stack depth scenario. A finite data domain representation can be used. The software instructions can be represented by a pushdown system. Validity checking can account for thread creation during execution of the plurality of software instructions.
    Type: Grant
    Filed: December 10, 2004
    Date of Patent: August 17, 2010
    Assignee: Microsoft Corporation
    Inventors: Niels Jakob Rehof, Shaz Qadeer
  • Publication number: 20100205589
    Abstract: A method and a system for non-locally constraining a plurality of related but separated program entities (e.g., a loop operation and a related accumulation operation within the loop's scope) such that any broad program transformation affecting both will have the machinery to assure that the changes to both entities will preserve the invariant properties of and dependencies among them. For example, if a program transform alters one entity (e.g., re-expresses an accumulation operation as a vector operation incorporating some or all of the loop's iteration) the constraint will provide the machinery to assure a compensating alteration of the other entities (e.g., the loop operation is reduced to reflect the vectorization of the accumulation operation). One realization of this method comprises specialized instances of the related entities that while retaining their roles as program entities (i.e., operators), also contain data and machinery to define the non-local constraint relationship.
    Type: Application
    Filed: April 25, 2010
    Publication date: August 12, 2010
    Inventor: Ted James Biggerstaff
  • Patent number: 7774766
    Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: August 10, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Daniel M Lavery
  • Publication number: 20100192138
    Abstract: Methods, apparatus and computer software product for local memory compaction are provided. In an exemplary embodiment, a processor in connection with a memory compaction module identifies inefficiencies in array references contained within in received source code, allocates a local array and maps the data from the inefficient array reference to the local array in a manner which improves the memory size requirements for storing and accessing the data. In another embodiment, a computer software product implementing a local memory compaction module is provided. In a further embodiment a computing apparatus is provided. The computing apparatus is configured to improve the efficiency of data storage in array references. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Application
    Filed: February 4, 2009
    Publication date: July 29, 2010
    Inventors: Allen K. Leung, Benoit J. Meister, David E. Wohlford, Nicolas T. Vasilache, Richard A. Lethin
  • Patent number: 7765534
    Abstract: A compiling program with cache utilization optimizations employs an inter-procedural global analysis of the data access patterns of compile units to be processed. The global analysis determines sufficient information to allow intelligent application of optimization techniques to be employed to enhance the operation and utilization of the available cache systems on target hardware.
    Type: Grant
    Filed: April 30, 2004
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao
  • Patent number: 7757222
    Abstract: Code is affine partitioned to generate affine partitioning mappings. Parallel code is generated based on the affine partitioning mappings. Generating the parallel code includes coalescing loops in the parallel code generated from the affine partitioning mappings to generate coalesced parallel code and optimizing the coalesced parallel code.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: July 13, 2010
    Assignee: Intel Corporation
    Inventors: Shih-wei Liao, Zhao Hui Du, Bu Qi Cheng, Gansha Wu, Guei-Yuan Lueh
  • Patent number: 7739530
    Abstract: Provided is a method of reliably reducing power consumption of a computer, while promoting prompt compilation of a source code and execution of an output code. The method according to this invention includes the steps of: reading a code which is preset and analyzing an amount of operation of the CPU and an access amount with respect to the cache memory based on the code; obtaining an execution rate of the CPU and an access rate with respect to the cache memory based on the amount of operation and the access amount; determining an area in which the access rate with respect to the cache memory is higher than the execution rate of the CPU, based on the code; adding a code for enabling the power consumption reduction function to the area; and generating an execution code executable on the computer, based on the code.
    Type: Grant
    Filed: February 16, 2007
    Date of Patent: June 15, 2010
    Assignee: Hitachi, Ltd.
    Inventors: Koichi Takayama, Naonobu Sukegawa
  • Patent number: 7730463
    Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
  • Patent number: 7725696
    Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: May 25, 2010
    Inventors: Wen-mei W. Hwu, Matthew C. Merten
  • Patent number: 7721267
    Abstract: A software pipelined loop tracing method involves inhibiting an output of trace data at a start of a software pipelined loop (SPLOOP). A skip in an output trace packet is indicated if the SPLOOP is skipped, and the SPLOOP is indicated at a cycle of an epilog state in the output trace packet if the SPLOOP is not skipped. An iteration count indication SPLOOP information and a position within a SPLOOP, is maintained. A periodic SPLOOP marker (PerSP) coinciding with a sync point is output if the SPLOOP is active.
    Type: Grant
    Filed: May 16, 2006
    Date of Patent: May 18, 2010
    Assignee: Texas Instruments Incorporated
    Inventor: Manisha Agarwala
  • Patent number: 7712091
    Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: May 4, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
  • Patent number: 7707543
    Abstract: A method, a device and a system arrangement are disclosed for generating self-contained software components having in each case synchronous and/or asynchronous interfaces with an internal threading model. The concept disclosed enables all necessary synchronization mechanisms to be provided automatically. The concept is based on an asynchronous operation manager used to divert callbacks from a called component into a calling component.
    Type: Grant
    Filed: November 22, 2005
    Date of Patent: April 27, 2010
    Assignee: Siemens Aktiengesellschaft
    Inventors: Detlef Becker, Karlheinz Dorn, Vladyslav Ukis, Hans-Martin Von Stockhausen
  • Patent number: 7698696
    Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, a “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
    Type: Grant
    Filed: June 30, 2003
    Date of Patent: April 13, 2010
    Assignee: Panasonic Corporation
    Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
  • Patent number: 7689735
    Abstract: An interface requests instructions from a data store storing instructions of an application to be processed by a data processor, and receives and transmits the instructions to the data processor. The interface includes: an input that receives the instructions from the data store via at least one input bus; a buffer that stores received instructions; an output that outputs instructions to the data processing apparatus via the output bus; a control signal input that receives a control signal; and a buffer controller that controls the buffer to request an instruction subsequent to a previously received instruction within an instruction stream of the application from the data store in response to detection of no control signal on the control signal input and to detection of available buffer storage capacity.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: March 30, 2010
    Assignee: ARM Limited
    Inventors: Martinus Cornelis Wezelenburg, Dirk Duerinckx, Jan Guffens
  • Patent number: 7689980
    Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: March 30, 2010
    Assignee: Intel Corporation
    Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh