Including Loop Patents (Class 717/160)

Including scheduling instructions (Class 717/161)

DATA DEPENDENCE TESTING FOR LOOP FUSION WITH CODE REPLICATION, ARRAY CONTRACTION, AND LOOP INTERCHANGE

Publication number: 20090307675

Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Application

Filed: June 4, 2008

Publication date: December 10, 2009

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
System and Method for Domain Stretching for an Advanced Dual-Representation Polyhedral Loop Transformation Framework

Publication number: 20090307673

Abstract: A system and method for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Application

Filed: September 26, 2007

Publication date: December 10, 2009

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
IMPROVING DATA LOCALITY AND PARALLELISM BY CODE REPLICATION AND ARRAY CONTRACTION

Publication number: 20090307674

Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.

Type: Application

Filed: June 4, 2008

Publication date: December 10, 2009

Inventors: John L. NG, Alexander Y. OSTANEVICH, Alexander L. SUSHENTSOV
Methods and products for processing loop nests

Patent number: 7631305

Abstract: Methods and products for processing a software kernel of instructions are disclosed. The software kernel has stages representing a loop nest. The software kernel is processed by partitioning iterations of an outermost loop into groups with each group representing iterations of the outermost loop, running the software kernel and rotating a register file for each stage of the software kernel preceding an innermost loop to generate code to prepare for filling and executing instructions in software pipelines for a current group, running the software kernel for each stage of the software kernel in the innermost loop to generate code to fill the software pipelines for the current group with the register file being rotated after at least one run of the software kernel for the innermost loop, and repeatedly running the software kernel to unroll inner loops to generate code to further fill the software pipelines for the current group.

Type: Grant

Filed: September 20, 2004

Date of Patent: December 8, 2009

Assignee: University of Delaware

Inventors: Hongbo Rong, Guang R. Gao, Alban Douillet, Ramaswamy Govindarajan
PARALLELIZING NON-COUNTABLE LOOPS WITH HARDWARE TRANSACTIONAL MEMORY

Publication number: 20090288075

Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.

Type: Application

Filed: May 19, 2008

Publication date: November 19, 2009

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Parallelization scheme for generic reduction

Patent number: 7620945

Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel.

Type: Grant

Filed: August 16, 2005

Date of Patent: November 17, 2009

Assignee: Sun Microsystems, Inc.

Inventors: Yonghong Song, Yuan Lin, Prashanth Narayanaswamy
Method and system for virtual prototyping

Patent number: 7613599

Abstract: An integrated design environment (IDE) is disclosed for forming virtual embedded systems. The IDE includes a design language for forming finite state machine models of hardware components that are coupled to simulators of processor cores, preferably instruction set accurate simulators. A software debugger interface permits a software application to be loaded and executed on the virtual embedded system. A virtual test bench may be coupled to the simulation to serve as a human-machine interface. In one embodiment, the IDE is provided as a web-based service for the evaluation, development and procurement phases of an embedded system project. IP components, such as processor cores, may be evaluated using a virtual embedded system. In one embodiment, a virtual embedded system is used as an executable specification for the procurement of a good or service related to an embedded system.

Type: Grant

Filed: June 1, 2001

Date of Patent: November 3, 2009

Assignee: Synopsys, Inc.

Inventors: Stephen L Bade, Shay Ben-Chorin, Paul Caamano, Marcelo E Montoreano, Ani Taggu, Filip C Theon, Dean C Wills
EXECUTION OF RETARGETTED GRAPHICS PROCESSOR ACCELERATED CODE BY A GENERAL PURPOSE PROCESSOR

Publication number: 20090259828

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Application

Filed: March 20, 2009

Publication date: October 15, 2009

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
Straight-line post-increment optimization for memory access instructions

Patent number: 7594223

Abstract: A compiler configured for optimizing non-loop memory access instructions of a computer program to form architected memory instructions conforming to a base register auto-incrementing addressing mode. The compiler includes code for obtaining an intermediate stream of code containing pseudo-memory instructions from the non-loop memory access instructions. The intermediate stream of code includes at least one place holder instruction preserving a value associated with a base of a first non-loop memory access instruction even after the first non-loop memory instruction is converted to one of the pseudo-memory instructions. The compiler includes code for converting the intermediate stream of code using the intermediate stream of code and the at least one place holder instruction to obtain the architected memory instructions.

Type: Grant

Filed: June 27, 2005

Date of Patent: September 22, 2009

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Richard E. Hank, Le-Chun Wu
Dependency analysis system and method

Patent number: 7581215

Abstract: We present a technique to perform dependence analysis on more complex array subscripts than the linear form of the enclosing loop indices. For such complex array subscripts, we decouple the original iteration space and the dependence test iteration space and link them through index-association functions. The dependence analysis is performed in the dependence test iteration space to determine whether the dependence exists in the original iteration space. The dependence distance in the original iteration space is determined by the distance in the dependence test iteration space and the property of index-association functions. For certain non-linear expressions, we show how to transform it to a set of linear expressions equivalently. The latter can be used in dependence test with traditional techniques. We also show how our advanced dependence analysis technique can help parallelize some otherwise hard-to-parallelize loops.

Type: Grant

Filed: June 24, 2004

Date of Patent: August 25, 2009

Assignee: Sun Microsystems, Inc.

Inventors: Yonghong Song, Xiangyun Kong
Method and structure for producing high performance linear algebra routines using preloading of floating point registers

Patent number: 7571435

Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.

Type: Grant

Filed: September 29, 2003

Date of Patent: August 4, 2009

Assignee: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels
Compiler apparatus for optimizing high-level language programs using directives

Patent number: 7571432

Abstract: A compiler 58, which is a compiler that realizes program development in a fewer man hours, translates a source program 72 written in a high-level language into a machine language program. This compiler 58 is comprised of: a directive obtainment unit that obtains a directive that a machine language program to be generated should be optimized; a parser unit 76 that parses the source program 72; an intermediate code conversion unit 78 that converts the source program 72 into intermediate codes based on a result of the parsing performed by the parser unit 76; an optimization unit 68 that optimizes the intermediate codes according to the directive; and a code generation unit 90 that converts the intermediate codes into the machine language program. The above directive is a directive to optimize the machine language program targeted at a processor that uses a cache memory.

Type: Grant

Filed: September 21, 2004

Date of Patent: August 4, 2009

Assignee: Panasonic Corporation

Inventors: Taketo Heishi, Hajime Ogawa, Yasuhiro Yamamoto, Kyoko Hattori, Shohei Michimoto, Kenji Hattori, Hirotetsu Tomita, Teruo Kawabata, Kiyoshi Nakashima
Apparatus, systems, and methods for execution-driven loop splitting and load-safe code hosting

Patent number: 7549146

Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.

Type: Grant

Filed: June 21, 2005

Date of Patent: June 16, 2009

Assignee: Intel Corporation

Inventors: Xinmin Tian, Milind B. Girkar
System and method for optimized swing modulo scheduling based on identification of constrained resources

Patent number: 7546592

Abstract: A method, computer program product, and a data processing system for scheduling instructions in a data processing system are provided. Dependencies among a plurality of nodes are analyzed to determine if any of the plurality of nodes uses a constrained resource. Each of the plurality of nodes represents an instruction in a set of instructions. A subset of the plurality of nodes is designated as resource-constrained nodes. An attempt is made to generate a schedule with the subset of the plurality of nodes scheduled with priority with respect to any of the plurality of nodes not included in the subset.

Type: Grant

Filed: July 21, 2005

Date of Patent: June 9, 2009

Assignee: International Business Machines Corporation

Inventor: Allan Russell Martin
Method and system for code modification based on cache structure

Patent number: 7530063

Abstract: A method and system of modifying instructions forming a loop is provided. A method of modifying instructions forming a loop includes modifying instructions forming a loop including: determining static and dynamic characteristics for the instructions; selecting a modification factor for the instructions based on a number of separate equivalent sections forming a cache in a processor which is processing the instructions; and modifying the instructions to interleave the instructions in the loop according to the modification factor and the static and dynamic characteristics when the instructions satisfy a modification criteria based on the static and dynamic characteristics.

Type: Grant

Filed: May 27, 2004

Date of Patent: May 5, 2009

Assignee: International Business Machines Corporation

Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, John David McCalpin, Francis Patrick O'Connell, Pascal Vezolle, Steven Wayne White
Program development supporting apparatus, method, program and recording medium

Patent number: 7516481

Abstract: A program development supporting apparatus that groups a plurality of events each executed in an information processor to divide the events into a plurality of parallel execution units to be executed in parallel with each other has a directional graph acquisition section that acquires directional graph data expressing each of the plurality of events as a vertex and a restriction on the execution order between two of the plurality of events as a directional branch, an inverse chain partial set extraction section that traces the directional branch from each event in the forward direction to extract from the directional graph data an inverse partial set that is a combination of the events having such a relationship that any one of the events cannot be reached from the other events, and a parallel execution unit assignment section that assigns the plurality of events belonging to the inverse partial set to units different from each other in the parallel execution units.

Type: Grant

Filed: December 2, 2004

Date of Patent: April 7, 2009

Assignee: International Business Machines Corporation

Inventor: Toshiyuki Fujikura
System and Method for Advanced Polyhedral Loop Transformations of Source Code in a Compiler

Publication number: 20090083724

Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Application

Filed: September 26, 2007

Publication date: March 26, 2009

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
PIPELINED PARALLELIZATION OF MULTI-DIMENSIONAL LOOPS WITH MULTIPLE DATA DEPENDENCIES

Publication number: 20090077545

Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.

Type: Application

Filed: September 18, 2007

Publication date: March 19, 2009

Inventors: Raul Esteban Silvera, Priya Unnikrishnan
METHOD, SYSTEM AND PROGRAM PRODUCT FOR OPTIMIZING EMULATION OF A SUSPECTED MALWARE

Publication number: 20090077544

Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.

Type: Application

Filed: September 14, 2007

Publication date: March 19, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Ji Yan Wu
Method and apparatus for determining the profitability of expanding unpipelined instructions

Patent number: 7506331

Abstract: A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.

Type: Grant

Filed: August 30, 2004

Date of Patent: March 17, 2009

Assignee: International Business Machines Corporation

Inventors: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell
Method and apparatus to achieve maximum outer level parallelism of a loop

Publication number: 20090064120

Abstract: In one embodiment, the present invention includes a method for constructing a data dependency graph (DDG) for a loop to be transformed, performing statement shifting to transform the loop into a first transformed loop according to at least one of first and second algorithms, performing unimodular and echelon transformations of a selected one of the first or second transformed loops, partitioning the selected transformed loop to obtain maximum outer level parallelism (MOLP), and partitioning the selected transformed loop into multiple sub-loops. Other embodiments are described and claimed.

Type: Application

Filed: August 30, 2007

Publication date: March 5, 2009

Inventors: Li Liu, Buqi Cheng, Gansha Wu
Systems, Methods, And Computer Products For Compiler Support For Aggressive Safe Load Speculation

Publication number: 20090064119

Abstract: Systems, methods and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including building a control flow graph, identifying both countable and non-countable loops, gathering a set of candidate loops for load speculation, for each candidate loop in the set of candidate loops gathered for load speculation performing computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determine an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives, indicating which loads can safely be executed speculatively and performing low-level instruction on the generated unrolled main loop.

Type: Application

Filed: August 27, 2007

Publication date: March 5, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Roch G. Archambault, Geoffrey O. Blandy, Roland Froese, Yaoqing Gao, Liangxiao Hu, James L. McInnes, Raul E. Silvera
Eliminate Maximum Operation in Loop Bounds with Loop Versioning

Publication number: 20090055815

Abstract: A method and computer program product for eliminating maximum and minimum expressions within loop bounds are provided. A loop in a code is identified. The loop is determined to meet conditions, which require an upper loop bound and a lower loop bound to contain maximum and minimum expressions, loop-invariant operands, a predetermined size for a code size, and a total number of instructions to be greater than a predetermined constant. A profitability of loop versioning is determined based on a performance gain of a fast version of the loop, a probability of executing the fast version of the loop at runtime, and an overhead for performing loop versioning. A pair of lower loop bound and upper loop bound values resulting in a constant number is identified. A loop iteration value is checked to be a non-zero constant. Branches are identified, and loop versioning is performed to generate a versioned loop.

Type: Application

Filed: August 21, 2007

Publication date: February 26, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Edwin Chan
Conditional compilation of intermediate language code based on current environment

Patent number: 7493604

Abstract: Conditional compilation of intermediate language code based on current environment includes loading intermediate language code on a device. Portions of the intermediate language code are conditionally just-in-time compiled based on a current environment of the device. In accordance with certain aspects, intermediate language code is loaded on a device and a current environment of the device is identified. The intermediate language code is modified based on the current environment, and portions of the modified intermediate language code are just-in-time compiled as needed when running the intermediate language code.

Type: Grant

Filed: October 21, 2004

Date of Patent: February 17, 2009

Assignee: Microsoft Corporation

Inventor: Rico Mariani
Method and apparatus for automatic second-order predictive commoning

Patent number: 7493609

Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention for apply transformations to the program code, such that the number of memory access and the number of computations are reduced for loop iterations and performance of program code is improved.

Type: Grant

Filed: August 30, 2004

Date of Patent: February 17, 2009

Assignee: International Business Machines Corporation

Inventors: Arie Tal, Dina Tal
Method and system for auto parallelization of zero-trip loops through induction variable substitution

Patent number: 7487497

Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.

Type: Grant

Filed: August 26, 2004

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
SIMD code generation in the presence of optimized misaligned data reorganization

Patent number: 7478377

Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel techique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

Type: Grant

Filed: August 16, 2004

Date of Patent: January 13, 2009

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
SIMD code generation for loops with mixed data lengths

Patent number: 7475392

Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

Type: Grant

Filed: August 16, 2004

Date of Patent: January 6, 2009

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
SCALAR CODE REDUCTION USING SHORTEST PATH ROUTING

Publication number: 20080313621

Abstract: This document discusses, among other things, a system and method computing the shortest path expression in a loop having a plurality of expressions. Candidate expressions in the loop are identified and partitioned into sets. A cost matrix is computed as a function of the sets. Paths are found through the cost matrix and, if there are cycles in the paths, the cycles are broken. One or more shortest path expressions are generated as a function of the paths and one or more of the expressions in the loop are replaced with the shortest path expressions.

Type: Application

Filed: June 15, 2007

Publication date: December 18, 2008

Applicant: Cray Inc.

Inventor: James C. Beyer
SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR REDUCING NUMBER OF EXCEPTION CHECKS

Publication number: 20080271005

Abstract: Based on operations within an uncounted loop of source code, one or more calculations are generated for determining, at runtime, an expected number of iterations through which the uncounted loop can iterate before encountering an exception corresponding to at least one target exception check. A copy of the uncounted loop omitting each target exception check is generated. The uncounted loop, the copy of the uncounted loop, and the one or more calculations are arranged in compiled code so that at runtime program flow enters the copy of the uncounted loop. If a maximum number of iterations of the copy of the uncounted loop is reached, program flow proceeds from the copy of the uncounted loop to the uncounted loop. The maximum number of iterations is no more than the smallest member of a set consisting of the expected number of iterations for each target exception check.

Type: Application

Filed: April 27, 2007

Publication date: October 30, 2008

Inventor: Mark Graham Stoodley
Method and System for State Machine Translation

Publication number: 20080263524

Abstract: A state machine program is generated from a state machine. The state machine has states, transitions and events. A basic structure for the state machine program is generated. The basic structure has therein a structure that operates in non-final states. A statement is generated within the structure for detecting an event. A statement is generated within the structure for evaluating the detected event based on a current state to identify if the current state is valid for the detected event. A statement is generated within the structure for determining a next state if the current state is valid. A statement is generated within the structure for transitioning the current state to the next state.

Type: Application

Filed: September 9, 2005

Publication date: October 23, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gregory D. Adams, Jonathan David Bennett, Perry Randolph Giffen, Axel Martens, William Gerald O'Farrell
Tiling across loop nests with possible recomputation

Publication number: 20080250401

Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together, (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.

Type: Application

Filed: April 9, 2007

Publication date: October 9, 2008

Applicant: Microsoft Corporation

Inventors: Siddhartha Puri, Jaydeep P. Marathe
METHOD AND APPARATUS FOR EXPLOITING THREAD-LEVEL PARALLELISM

Publication number: 20080244549

Abstract: According to one example embodiment, there is disclosed herein uses partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS may be a loop that allows successive iterations executing by overlapping; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.

Type: Application

Filed: March 31, 2007

Publication date: October 2, 2008

Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar
Continuous trip count profiling for loop optimizations in two-phase dynamic binary translators

Patent number: 7428731

Abstract: A method, machine readable medium, and system are disclosed. In one embodiment the method comprises collecting a loop trip count continuously during runtime of a region of code being executed that contains a loop, categorizing the trip count to identify one or more code modification techniques applicable to the loop, and dynamically applying the one or more applicable code modification techniques to alter the code that relates to the loop.

Type: Grant

Filed: March 31, 2004

Date of Patent: September 23, 2008

Assignee: Intel Corporation

Inventors: Youfeng Wu, Mauricio Breternitz, Jr.
Compiler Method for Employing Multiple Autonomous Synergistic Processors to Simultaneously Operate on Longer Vectors of Data

Publication number: 20080229298

Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.

Type: Application

Filed: March 15, 2007

Publication date: September 18, 2008

Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
Efficient Code Generation Using Loop Peeling for SIMD Loop Code with Multiple Misaligned Statements

Publication number: 20080222623

Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.

Type: Application

Filed: May 16, 2008

Publication date: September 11, 2008

Applicant: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Optimizing branch condition expressions in a JIT compiler

Patent number: 7421687

Abstract: A Java virtual machine includes a just in time (JIT) Java compiler. The JIT compiler includes at least one optimizer. Each of the at least one optimizer includes logic for recognizing a pattern in a received Java byte code, logic for optimizing the recognized pattern to produce optimized native code and logic for outputting optimized native code. A method of producing optimized native code is also provided.

Type: Grant

Filed: September 9, 2004

Date of Patent: September 2, 2008

Assignee: Sun Microsystems, Inc.

Inventors: Frank N. Yellin, Yin Zin Mark Lam
Runtime quality verification of execution units

Patent number: 7415700

Abstract: One embodiment disclosed relates to a method of compiling a program to be executed on a target microprocessor with multiple execution units of a same type. The method includes selecting one of the execution units for testing and scheduling the parallel execution of program code and diagnostics code. The diagnostic code is scheduled to be executed on the selected execution unit. The program code is scheduled to be executed on remaining execution units of the same type.

Type: Grant

Filed: October 14, 2003

Date of Patent: August 19, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Ken Gary Pomaranski, Andrew Harvey Barr, Dale John Shidla
Transforming locks in software loops

Patent number: 7404183

Abstract: An improved method and system for acquisition and release of locks within a software program is disclosed. In an exemplary embodiment, a lock within a loop is transformed by relocating acquisition and release instructions from within the loop to positions outside the loop. This may significantly decrease unnecessarily lock acquisition and release during execution of the software program. In order to avoid contention problems which may arise from acquiring and keeping a lock on an object over a relatively long period of time, a contention test may be inserted into the loop. Such a contention test may temporarily release the lock if another thread in the software program requires access to the locked object.

Type: Grant

Filed: May 13, 2004

Date of Patent: July 22, 2008

Assignee: International Business Machines Corporation

Inventors: Nikola Grcevski, Kevin Alexander Stoodley, Mark Graham Stoodley, Vijay Sundaresan
Framework for efficient code generation using loop peeling for SIMD loop code with multiple misaligned statements

Patent number: 7395531

Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.

Type: Grant

Filed: August 16, 2004

Date of Patent: July 1, 2008

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Macroscalar processor architecture

Patent number: 7395419

Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.

Type: Grant

Filed: April 23, 2004

Date of Patent: July 1, 2008

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Efficient data reorganization to satisfy data alignment constraints

Patent number: 7386842

Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.

Type: Grant

Filed: June 7, 2004

Date of Patent: June 10, 2008

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
Defining instruction extensions in a standard programming language

Patent number: 7373642

Abstract: A method is provided for modifying a program written in a standard programming language so that when the program is compiled both an executable file is produced and an instruction is programmed into a programmable logic device of a processor system. The method includes identifying a critical code segment of a program, rewriting the critical code segment as a function, revising the program, and compiling the program. Revising the program includes designating the function as code to be compiled by an extension compiler and replacing the critical code segment of the program with a statement that calls the function. Compiling the program includes compiling the code with an extension compiler to produce a header file and the instruction for the programmable logic device. Compiling the program also includes using a standard compiler to compile the remainder of the program together with the header file to generate the executable file.

Type: Grant

Filed: July 29, 2003

Date of Patent: May 13, 2008

Assignee: Stretch, Inc.

Inventors: Kenneth M Williams, Albert Wang
Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization

Patent number: 7367026

Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

Type: Grant

Filed: August 16, 2004

Date of Patent: April 29, 2008

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Compiler-driven dynamic memory allocation methodology for scratch-pad based embedded systems

Patent number: 7367024

Abstract: A highly predictable, low overhead and yet dynamic, memory allocation methodology for embedded systems with scratch-pad memory is presented. The dynamic memory allocation methodology for global and stack data (i) accounts for changing program requirements at runtime; (ii) has no software-caching tags; (iii) requires no run-time checks; (iv) has extremely low overheads; and (v) yields 100% predictable memory access times. The methodology provides that for data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.

Type: Grant

Filed: September 21, 2004

Date of Patent: April 29, 2008

Assignee: University of Maryland

Inventors: Rajeev Kumar Barua, Sumesh Udayakumaran
Tracing the execution path of a computer program

Patent number: 7353505

Abstract: The invention relates to tracing the execution path of a computer program comprising at least one module including a plurality of instructions. At least one of these instructions is a branch instruction. Each branch instruction is identified and evaluated to be one of true and false. An evaluation of true results in a unique identifier being pushed into a predefined area of storage. This unique identifier is associated with the instructions executed as a result of an evaluation of true.

Type: Grant

Filed: September 13, 2001

Date of Patent: April 1, 2008

Assignee: International Business Machines Corporation

Inventor: Anthony John O'Dowd
Scheduling technique for software pipelining

Patent number: 7331045

Abstract: An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.

Type: Grant

Filed: April 29, 2004

Date of Patent: February 12, 2008

Assignee: International Business Machines Corporation

Inventors: Allan Russell Martin, James Lawrence McInnes
Method, Apparatus, and Program Product for Improving Branch Prediction in a Processor Without Hardware Branch Prediction but Supporting Branch Hint Instruction

Publication number: 20080010635

Abstract: A compiler includes a mechanism for improving branch prediction in a processor that supports a branch hint instruction. The compiler receives a sequence of instructions, wherein the sequence of instructions comprises a loop. This loop sequence employs an hbr instruction to avoid the misprediction penalty of the taken branch to the start of the loop on each loop iteration. However, this penalty will be incurred regardless, on exiting the loop. The compiler inserts a compare and select instruction sequence which dynamically changes the input to the hbr instruction thereby avoiding this penalty when leaving the loop.

Type: Application

Filed: July 7, 2006

Publication date: January 10, 2008

Inventors: John Kevin O'Brien, Kathryn M. O'Brien
Method and apparatus for a generic language interface to apply loop optimization transformations

Patent number: 7318223

Abstract: A generic language interface is provided to apply a number of loop optimization transformations. The language interface includes two new directives. The present invention detects the directives in a computer program, and generates code that has been applied at least one loop transformation based on the directives.

Type: Grant

Filed: August 26, 2004

Date of Patent: January 8, 2008

Assignee: International Business Machines Corporation

Inventors: Robert James Blainey, Arie Tal
System, method, and apparatus for spilling and filling rotating registers in software-pipelined loops

Patent number: 7316012

Abstract: An efficient method for software-pipelining (SWP) of loops to translate programs, from higher level languages into equivalent object or machine language code for execution on a computer. In one example embodiment, this is accomplished by spilling and filling multiple computed values, in a register, that are live across multiple stages in a software-pipelined loop, using multiple rotating stack memory locations to reduce compiler-time of SWP, and complexity of the implemented SWP.

Type: Grant

Filed: September 29, 2003

Date of Patent: January 1, 2008

Assignee: Intel Corporation

Inventor: Kalyan Muthukumar

prev … 3 4 5 6 7 8 9 next