Including Loop Patents (Class 717/160)
  • Publication number: 20090307675
    Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
    Type: Application
    Filed: June 4, 2008
    Publication date: December 10, 2009
    Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
  • Publication number: 20090307673
    Abstract: A system and method for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Application
    Filed: September 26, 2007
    Publication date: December 10, 2009
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Publication number: 20090307674
    Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.
    Type: Application
    Filed: June 4, 2008
    Publication date: December 10, 2009
    Inventors: John L. NG, Alexander Y. OSTANEVICH, Alexander L. SUSHENTSOV
  • Patent number: 7631305
    Abstract: Methods and products for processing a software kernel of instructions are disclosed. The software kernel has stages representing a loop nest. The software kernel is processed by partitioning iterations of an outermost loop into groups with each group representing iterations of the outermost loop, running the software kernel and rotating a register file for each stage of the software kernel preceding an innermost loop to generate code to prepare for filling and executing instructions in software pipelines for a current group, running the software kernel for each stage of the software kernel in the innermost loop to generate code to fill the software pipelines for the current group with the register file being rotated after at least one run of the software kernel for the innermost loop, and repeatedly running the software kernel to unroll inner loops to generate code to further fill the software pipelines for the current group.
    Type: Grant
    Filed: September 20, 2004
    Date of Patent: December 8, 2009
    Assignee: University of Delaware
    Inventors: Hongbo Rong, Guang R. Gao, Alban Douillet, Ramaswamy Govindarajan
  • Publication number: 20090288075
    Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
    Type: Application
    Filed: May 19, 2008
    Publication date: November 19, 2009
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 7620945
    Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel.
    Type: Grant
    Filed: August 16, 2005
    Date of Patent: November 17, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Yonghong Song, Yuan Lin, Prashanth Narayanaswamy
  • Patent number: 7613599
    Abstract: An integrated design environment (IDE) is disclosed for forming virtual embedded systems. The IDE includes a design language for forming finite state machine models of hardware components that are coupled to simulators of processor cores, preferably instruction set accurate simulators. A software debugger interface permits a software application to be loaded and executed on the virtual embedded system. A virtual test bench may be coupled to the simulation to serve as a human-machine interface. In one embodiment, the IDE is provided as a web-based service for the evaluation, development and procurement phases of an embedded system project. IP components, such as processor cores, may be evaluated using a virtual embedded system. In one embodiment, a virtual embedded system is used as an executable specification for the procurement of a good or service related to an embedded system.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: November 3, 2009
    Assignee: Synopsys, Inc.
    Inventors: Stephen L Bade, Shay Ben-Chorin, Paul Caamano, Marcelo E Montoreano, Ani Taggu, Filip C Theon, Dean C Wills
  • Publication number: 20090259828
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 20, 2009
    Publication date: October 15, 2009
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
  • Patent number: 7594223
    Abstract: A compiler configured for optimizing non-loop memory access instructions of a computer program to form architected memory instructions conforming to a base register auto-incrementing addressing mode. The compiler includes code for obtaining an intermediate stream of code containing pseudo-memory instructions from the non-loop memory access instructions. The intermediate stream of code includes at least one place holder instruction preserving a value associated with a base of a first non-loop memory access instruction even after the first non-loop memory instruction is converted to one of the pseudo-memory instructions. The compiler includes code for converting the intermediate stream of code using the intermediate stream of code and the at least one place holder instruction to obtain the architected memory instructions.
    Type: Grant
    Filed: June 27, 2005
    Date of Patent: September 22, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Richard E. Hank, Le-Chun Wu
  • Patent number: 7581215
    Abstract: We present a technique to perform dependence analysis on more complex array subscripts than the linear form of the enclosing loop indices. For such complex array subscripts, we decouple the original iteration space and the dependence test iteration space and link them through index-association functions. The dependence analysis is performed in the dependence test iteration space to determine whether the dependence exists in the original iteration space. The dependence distance in the original iteration space is determined by the distance in the dependence test iteration space and the property of index-association functions. For certain non-linear expressions, we show how to transform it to a set of linear expressions equivalently. The latter can be used in dependence test with traditional techniques. We also show how our advanced dependence analysis technique can help parallelize some otherwise hard-to-parallelize loops.
    Type: Grant
    Filed: June 24, 2004
    Date of Patent: August 25, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Yonghong Song, Xiangyun Kong
  • Patent number: 7571435
    Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: August 4, 2009
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Patent number: 7571432
    Abstract: A compiler 58, which is a compiler that realizes program development in a fewer man hours, translates a source program 72 written in a high-level language into a machine language program. This compiler 58 is comprised of: a directive obtainment unit that obtains a directive that a machine language program to be generated should be optimized; a parser unit 76 that parses the source program 72; an intermediate code conversion unit 78 that converts the source program 72 into intermediate codes based on a result of the parsing performed by the parser unit 76; an optimization unit 68 that optimizes the intermediate codes according to the directive; and a code generation unit 90 that converts the intermediate codes into the machine language program. The above directive is a directive to optimize the machine language program targeted at a processor that uses a cache memory.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: August 4, 2009
    Assignee: Panasonic Corporation
    Inventors: Taketo Heishi, Hajime Ogawa, Yasuhiro Yamamoto, Kyoko Hattori, Shohei Michimoto, Kenji Hattori, Hirotetsu Tomita, Teruo Kawabata, Kiyoshi Nakashima
  • Patent number: 7549146
    Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.
    Type: Grant
    Filed: June 21, 2005
    Date of Patent: June 16, 2009
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Milind B. Girkar
  • Patent number: 7546592
    Abstract: A method, computer program product, and a data processing system for scheduling instructions in a data processing system are provided. Dependencies among a plurality of nodes are analyzed to determine if any of the plurality of nodes uses a constrained resource. Each of the plurality of nodes represents an instruction in a set of instructions. A subset of the plurality of nodes is designated as resource-constrained nodes. An attempt is made to generate a schedule with the subset of the plurality of nodes scheduled with priority with respect to any of the plurality of nodes not included in the subset.
    Type: Grant
    Filed: July 21, 2005
    Date of Patent: June 9, 2009
    Assignee: International Business Machines Corporation
    Inventor: Allan Russell Martin
  • Patent number: 7530063
    Abstract: A method and system of modifying instructions forming a loop is provided. A method of modifying instructions forming a loop includes modifying instructions forming a loop including: determining static and dynamic characteristics for the instructions; selecting a modification factor for the instructions based on a number of separate equivalent sections forming a cache in a processor which is processing the instructions; and modifying the instructions to interleave the instructions in the loop according to the modification factor and the static and dynamic characteristics when the instructions satisfy a modification criteria based on the static and dynamic characteristics.
    Type: Grant
    Filed: May 27, 2004
    Date of Patent: May 5, 2009
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, John David McCalpin, Francis Patrick O'Connell, Pascal Vezolle, Steven Wayne White
  • Patent number: 7516481
    Abstract: A program development supporting apparatus that groups a plurality of events each executed in an information processor to divide the events into a plurality of parallel execution units to be executed in parallel with each other has a directional graph acquisition section that acquires directional graph data expressing each of the plurality of events as a vertex and a restriction on the execution order between two of the plurality of events as a directional branch, an inverse chain partial set extraction section that traces the directional branch from each event in the forward direction to extract from the directional graph data an inverse partial set that is a combination of the events having such a relationship that any one of the events cannot be reached from the other events, and a parallel execution unit assignment section that assigns the plurality of events belonging to the inverse partial set to units different from each other in the parallel execution units.
    Type: Grant
    Filed: December 2, 2004
    Date of Patent: April 7, 2009
    Assignee: International Business Machines Corporation
    Inventor: Toshiyuki Fujikura
  • Publication number: 20090083724
    Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Application
    Filed: September 26, 2007
    Publication date: March 26, 2009
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Publication number: 20090077545
    Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.
    Type: Application
    Filed: September 18, 2007
    Publication date: March 19, 2009
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan
  • Publication number: 20090077544
    Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
    Type: Application
    Filed: September 14, 2007
    Publication date: March 19, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Ji Yan Wu
  • Patent number: 7506331
    Abstract: A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
    Type: Grant
    Filed: August 30, 2004
    Date of Patent: March 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell
  • Publication number: 20090064120
    Abstract: In one embodiment, the present invention includes a method for constructing a data dependency graph (DDG) for a loop to be transformed, performing statement shifting to transform the loop into a first transformed loop according to at least one of first and second algorithms, performing unimodular and echelon transformations of a selected one of the first or second transformed loops, partitioning the selected transformed loop to obtain maximum outer level parallelism (MOLP), and partitioning the selected transformed loop into multiple sub-loops. Other embodiments are described and claimed.
    Type: Application
    Filed: August 30, 2007
    Publication date: March 5, 2009
    Inventors: Li Liu, Buqi Cheng, Gansha Wu
  • Publication number: 20090064119
    Abstract: Systems, methods and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including building a control flow graph, identifying both countable and non-countable loops, gathering a set of candidate loops for load speculation, for each candidate loop in the set of candidate loops gathered for load speculation performing computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determine an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives, indicating which loads can safely be executed speculatively and performing low-level instruction on the generated unrolled main loop.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roch G. Archambault, Geoffrey O. Blandy, Roland Froese, Yaoqing Gao, Liangxiao Hu, James L. McInnes, Raul E. Silvera
  • Publication number: 20090055815
    Abstract: A method and computer program product for eliminating maximum and minimum expressions within loop bounds are provided. A loop in a code is identified. The loop is determined to meet conditions, which require an upper loop bound and a lower loop bound to contain maximum and minimum expressions, loop-invariant operands, a predetermined size for a code size, and a total number of instructions to be greater than a predetermined constant. A profitability of loop versioning is determined based on a performance gain of a fast version of the loop, a probability of executing the fast version of the loop at runtime, and an overhead for performing loop versioning. A pair of lower loop bound and upper loop bound values resulting in a constant number is identified. A loop iteration value is checked to be a non-zero constant. Branches are identified, and loop versioning is performed to generate a versioned loop.
    Type: Application
    Filed: August 21, 2007
    Publication date: February 26, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Edwin Chan
  • Patent number: 7493604
    Abstract: Conditional compilation of intermediate language code based on current environment includes loading intermediate language code on a device. Portions of the intermediate language code are conditionally just-in-time compiled based on a current environment of the device. In accordance with certain aspects, intermediate language code is loaded on a device and a current environment of the device is identified. The intermediate language code is modified based on the current environment, and portions of the modified intermediate language code are just-in-time compiled as needed when running the intermediate language code.
    Type: Grant
    Filed: October 21, 2004
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventor: Rico Mariani
  • Patent number: 7493609
    Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention for apply transformations to the program code, such that the number of memory access and the number of computations are reduced for loop iterations and performance of program code is improved.
    Type: Grant
    Filed: August 30, 2004
    Date of Patent: February 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Arie Tal, Dina Tal
  • Patent number: 7487497
    Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
  • Patent number: 7478377
    Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel techique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: January 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 7475392
    Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Publication number: 20080313621
    Abstract: This document discusses, among other things, a system and method computing the shortest path expression in a loop having a plurality of expressions. Candidate expressions in the loop are identified and partitioned into sets. A cost matrix is computed as a function of the sets. Paths are found through the cost matrix and, if there are cycles in the paths, the cycles are broken. One or more shortest path expressions are generated as a function of the paths and one or more of the expressions in the loop are replaced with the shortest path expressions.
    Type: Application
    Filed: June 15, 2007
    Publication date: December 18, 2008
    Applicant: Cray Inc.
    Inventor: James C. Beyer
  • Publication number: 20080271005
    Abstract: Based on operations within an uncounted loop of source code, one or more calculations are generated for determining, at runtime, an expected number of iterations through which the uncounted loop can iterate before encountering an exception corresponding to at least one target exception check. A copy of the uncounted loop omitting each target exception check is generated. The uncounted loop, the copy of the uncounted loop, and the one or more calculations are arranged in compiled code so that at runtime program flow enters the copy of the uncounted loop. If a maximum number of iterations of the copy of the uncounted loop is reached, program flow proceeds from the copy of the uncounted loop to the uncounted loop. The maximum number of iterations is no more than the smallest member of a set consisting of the expected number of iterations for each target exception check.
    Type: Application
    Filed: April 27, 2007
    Publication date: October 30, 2008
    Inventor: Mark Graham Stoodley
  • Publication number: 20080263524
    Abstract: A state machine program is generated from a state machine. The state machine has states, transitions and events. A basic structure for the state machine program is generated. The basic structure has therein a structure that operates in non-final states. A statement is generated within the structure for detecting an event. A statement is generated within the structure for evaluating the detected event based on a current state to identify if the current state is valid for the detected event. A statement is generated within the structure for determining a next state if the current state is valid. A statement is generated within the structure for transitioning the current state to the next state.
    Type: Application
    Filed: September 9, 2005
    Publication date: October 23, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory D. Adams, Jonathan David Bennett, Perry Randolph Giffen, Axel Martens, William Gerald O'Farrell
  • Publication number: 20080250401
    Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together, (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.
    Type: Application
    Filed: April 9, 2007
    Publication date: October 9, 2008
    Applicant: Microsoft Corporation
    Inventors: Siddhartha Puri, Jaydeep P. Marathe
  • Publication number: 20080244549
    Abstract: According to one example embodiment, there is disclosed herein uses partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS may be a loop that allows successive iterations executing by overlapping; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.
    Type: Application
    Filed: March 31, 2007
    Publication date: October 2, 2008
    Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar
  • Patent number: 7428731
    Abstract: A method, machine readable medium, and system are disclosed. In one embodiment the method comprises collecting a loop trip count continuously during runtime of a region of code being executed that contains a loop, categorizing the trip count to identify one or more code modification techniques applicable to the loop, and dynamically applying the one or more applicable code modification techniques to alter the code that relates to the loop.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: September 23, 2008
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Mauricio Breternitz, Jr.
  • Publication number: 20080229298
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Application
    Filed: March 15, 2007
    Publication date: September 18, 2008
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Publication number: 20080222623
    Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
    Type: Application
    Filed: May 16, 2008
    Publication date: September 11, 2008
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 7421687
    Abstract: A Java virtual machine includes a just in time (JIT) Java compiler. The JIT compiler includes at least one optimizer. Each of the at least one optimizer includes logic for recognizing a pattern in a received Java byte code, logic for optimizing the recognized pattern to produce optimized native code and logic for outputting optimized native code. A method of producing optimized native code is also provided.
    Type: Grant
    Filed: September 9, 2004
    Date of Patent: September 2, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Frank N. Yellin, Yin Zin Mark Lam
  • Patent number: 7415700
    Abstract: One embodiment disclosed relates to a method of compiling a program to be executed on a target microprocessor with multiple execution units of a same type. The method includes selecting one of the execution units for testing and scheduling the parallel execution of program code and diagnostics code. The diagnostic code is scheduled to be executed on the selected execution unit. The program code is scheduled to be executed on remaining execution units of the same type.
    Type: Grant
    Filed: October 14, 2003
    Date of Patent: August 19, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Ken Gary Pomaranski, Andrew Harvey Barr, Dale John Shidla
  • Patent number: 7404183
    Abstract: An improved method and system for acquisition and release of locks within a software program is disclosed. In an exemplary embodiment, a lock within a loop is transformed by relocating acquisition and release instructions from within the loop to positions outside the loop. This may significantly decrease unnecessarily lock acquisition and release during execution of the software program. In order to avoid contention problems which may arise from acquiring and keeping a lock on an object over a relatively long period of time, a contention test may be inserted into the loop. Such a contention test may temporarily release the lock if another thread in the software program requires access to the locked object.
    Type: Grant
    Filed: May 13, 2004
    Date of Patent: July 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: Nikola Grcevski, Kevin Alexander Stoodley, Mark Graham Stoodley, Vijay Sundaresan
  • Patent number: 7395531
    Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 7395419
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.
    Type: Grant
    Filed: April 23, 2004
    Date of Patent: July 1, 2008
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 7386842
    Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.
    Type: Grant
    Filed: June 7, 2004
    Date of Patent: June 10, 2008
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
  • Patent number: 7373642
    Abstract: A method is provided for modifying a program written in a standard programming language so that when the program is compiled both an executable file is produced and an instruction is programmed into a programmable logic device of a processor system. The method includes identifying a critical code segment of a program, rewriting the critical code segment as a function, revising the program, and compiling the program. Revising the program includes designating the function as code to be compiled by an extension compiler and replacing the critical code segment of the program with a statement that calls the function. Compiling the program includes compiling the code with an extension compiler to produce a header file and the instruction for the programmable logic device. Compiling the program also includes using a standard compiler to compile the remainder of the program together with the header file to generate the executable file.
    Type: Grant
    Filed: July 29, 2003
    Date of Patent: May 13, 2008
    Assignee: Stretch, Inc.
    Inventors: Kenneth M Williams, Albert Wang
  • Patent number: 7367026
    Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: April 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 7367024
    Abstract: A highly predictable, low overhead and yet dynamic, memory allocation methodology for embedded systems with scratch-pad memory is presented. The dynamic memory allocation methodology for global and stack data (i) accounts for changing program requirements at runtime; (ii) has no software-caching tags; (iii) requires no run-time checks; (iv) has extremely low overheads; and (v) yields 100% predictable memory access times. The methodology provides that for data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: April 29, 2008
    Assignee: University of Maryland
    Inventors: Rajeev Kumar Barua, Sumesh Udayakumaran
  • Patent number: 7353505
    Abstract: The invention relates to tracing the execution path of a computer program comprising at least one module including a plurality of instructions. At least one of these instructions is a branch instruction. Each branch instruction is identified and evaluated to be one of true and false. An evaluation of true results in a unique identifier being pushed into a predefined area of storage. This unique identifier is associated with the instructions executed as a result of an evaluation of true.
    Type: Grant
    Filed: September 13, 2001
    Date of Patent: April 1, 2008
    Assignee: International Business Machines Corporation
    Inventor: Anthony John O'Dowd
  • Patent number: 7331045
    Abstract: An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.
    Type: Grant
    Filed: April 29, 2004
    Date of Patent: February 12, 2008
    Assignee: International Business Machines Corporation
    Inventors: Allan Russell Martin, James Lawrence McInnes
  • Publication number: 20080010635
    Abstract: A compiler includes a mechanism for improving branch prediction in a processor that supports a branch hint instruction. The compiler receives a sequence of instructions, wherein the sequence of instructions comprises a loop. This loop sequence employs an hbr instruction to avoid the misprediction penalty of the taken branch to the start of the loop on each loop iteration. However, this penalty will be incurred regardless, on exiting the loop. The compiler inserts a compare and select instruction sequence which dynamically changes the input to the hbr instruction thereby avoiding this penalty when leaving the loop.
    Type: Application
    Filed: July 7, 2006
    Publication date: January 10, 2008
    Inventors: John Kevin O'Brien, Kathryn M. O'Brien
  • Patent number: 7318223
    Abstract: A generic language interface is provided to apply a number of loop optimization transformations. The language interface includes two new directives. The present invention detects the directives in a computer program, and generates code that has been applied at least one loop transformation based on the directives.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: January 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Robert James Blainey, Arie Tal
  • Patent number: 7316012
    Abstract: An efficient method for software-pipelining (SWP) of loops to translate programs, from higher level languages into equivalent object or machine language code for execution on a computer. In one example embodiment, this is accomplished by spilling and filling multiple computed values, in a register, that are live across multiple stages in a software-pipelined loop, using multiple rotating stack memory locations to reduce compiler-time of SWP, and complexity of the implemented SWP.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: January 1, 2008
    Assignee: Intel Corporation
    Inventor: Kalyan Muthukumar