Including Loop Patents (Class 717/160)
-
Publication number: 20090307675
Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
Type: Application
Filed: June 4, 2008
Publication date: December 10, 2009
Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
-
Publication number: 20090307673
Abstract: A system and method for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
Type: Application
Filed: September 26, 2007
Publication date: December 10, 2009
Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
-
Publication number: 20090307674
Abstract: Provided are a method, system, and article of manufacture for improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.
Type: Application
Filed: June 4, 2008
Publication date: December 10, 2009
Inventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
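As a rough illustration of two of the ideas named in this abstract, loop fusion and array contraction, the sketch below fuses two loops and shrinks a temporary array to a scalar. It is a minimal sketch under stated assumptions: the function names and the arithmetic are hypothetical, and the code replication and loop interchange steps of the application are not modeled.

```python
def two_loops(a):
    """Unfused form: a full temporary array lives between the two loops."""
    tmp = [0] * len(a)
    for i in range(len(a)):
        tmp[i] = a[i] * 2          # first loop produces tmp
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = tmp[i] + 1        # second loop consumes tmp
    return out


def fused_contracted(a):
    """Fused form: one loop, and tmp is contracted to the scalar t."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2               # replaces tmp[i]; no array is needed
        out[i] = t + 1
    return out
```

Fusion is legal here because iteration i of the second loop only reads the value that iteration i of the first loop wrote; a compiler would have to establish that with a dependence test before transforming the code.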
-
Patent number: 7631305
Abstract: Methods and products for processing a software kernel of instructions are disclosed. The software kernel has stages representing a loop nest. The software kernel is processed by partitioning iterations of an outermost loop into groups with each group representing iterations of the outermost loop, running the software kernel and rotating a register file for each stage of the software kernel preceding an innermost loop to generate code to prepare for filling and executing instructions in software pipelines for a current group, running the software kernel for each stage of the software kernel in the innermost loop to generate code to fill the software pipelines for the current group with the register file being rotated after at least one run of the software kernel for the innermost loop, and repeatedly running the software kernel to unroll inner loops to generate code to further fill the software pipelines for the current group.
Type: Grant
Filed: September 20, 2004
Date of Patent: December 8, 2009
Assignee: University of Delaware
Inventors: Hongbo Rong, Guang R. Gao, Alban Douillet, Ramaswamy Govindarajan
-
Publication number: 20090288075
Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
Type: Application
Filed: May 19, 2008
Publication date: November 19, 2009
Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 7620945
Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel.
Type: Grant
Filed: August 16, 2005
Date of Patent: November 17, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Yonghong Song, Yuan Lin, Prashanth Narayanaswamy
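The three ingredients this abstract lists (a reduction variable, its initial value, and a merge operation for partial results) can be sketched at the library level as follows. This is only an illustration of the general idea, not the patented compiler mechanism; `parallel_reduce` and its parameters are hypothetical names, and the sketch assumes `identity` is a true identity element for `op`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_reduce(values, op, identity, merge, workers=4):
    """Reduce each chunk independently, then merge the partial results."""
    if not values:
        return identity
    # Split the input into at most `workers` contiguous chunks.
    chunk = (len(values) + workers - 1) // workers
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each sub-operation starts from the initial value and sees one chunk.
        partials = list(pool.map(lambda c: reduce(op, c, identity), chunks))
    # The merge operation combines the partial results into the final one.
    return reduce(merge, partials, identity)
```

For a sum, `op` and `merge` are both addition with identity 0; for a maximum, both are `max` with identity `float("-inf")`. Because the operation is associative, the chunked result equals the sequential one.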
-
Patent number: 7613599
Abstract: An integrated design environment (IDE) is disclosed for forming virtual embedded systems. The IDE includes a design language for forming finite state machine models of hardware components that are coupled to simulators of processor cores, preferably instruction set accurate simulators. A software debugger interface permits a software application to be loaded and executed on the virtual embedded system. A virtual test bench may be coupled to the simulation to serve as a human-machine interface. In one embodiment, the IDE is provided as a web-based service for the evaluation, development and procurement phases of an embedded system project. IP components, such as processor cores, may be evaluated using a virtual embedded system. In one embodiment, a virtual embedded system is used as an executable specification for the procurement of a good or service related to an embedded system.
Type: Grant
Filed: June 1, 2001
Date of Patent: November 3, 2009
Assignee: Synopsys, Inc.
Inventors: Stephen L Bade, Shay Ben-Chorin, Paul Caamano, Marcelo E Montoreano, Ani Taggu, Filip C Theon, Dean C Wills
-
Publication number: 20090259828
Abstract: One embodiment of the present invention sets forth a technique for translating application programs, written using a parallel programming model for execution on a multi-core graphics processing unit (GPU), so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
Type: Application
Filed: March 20, 2009
Publication date: October 15, 2009
Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
-
Patent number: 7594223
Abstract: A compiler configured for optimizing non-loop memory access instructions of a computer program to form architected memory instructions conforming to a base register auto-incrementing addressing mode. The compiler includes code for obtaining an intermediate stream of code containing pseudo-memory instructions from the non-loop memory access instructions. The intermediate stream of code includes at least one place holder instruction preserving a value associated with a base of a first non-loop memory access instruction even after the first non-loop memory instruction is converted to one of the pseudo-memory instructions. The compiler includes code for converting the intermediate stream of code using the intermediate stream of code and the at least one place holder instruction to obtain the architected memory instructions.
Type: Grant
Filed: June 27, 2005
Date of Patent: September 22, 2009
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Richard E. Hank, Le-Chun Wu
-
Patent number: 7581215
Abstract: We present a technique to perform dependence analysis on more complex array subscripts than the linear form of the enclosing loop indices. For such complex array subscripts, we decouple the original iteration space and the dependence test iteration space and link them through index-association functions. The dependence analysis is performed in the dependence test iteration space to determine whether the dependence exists in the original iteration space. The dependence distance in the original iteration space is determined by the distance in the dependence test iteration space and the property of index-association functions. For certain non-linear expressions, we show how to transform them into an equivalent set of linear expressions. The latter can be used in dependence tests with traditional techniques. We also show how our advanced dependence analysis technique can help parallelize some otherwise hard-to-parallelize loops.
Type: Grant
Filed: June 24, 2004
Date of Patent: August 25, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Yonghong Song, Xiangyun Kong
-
Patent number: 7571435
Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.
Type: Grant
Filed: September 29, 2003
Date of Patent: August 4, 2009
Assignee: International Business Machines Corporation
Inventors: Fred Gehrung Gustavson, John A. Gunnels
-
Patent number: 7571432
Abstract: A compiler 58, which is a compiler that realizes program development in fewer man-hours, translates a source program 72 written in a high-level language into a machine language program. This compiler 58 is comprised of: a directive obtainment unit that obtains a directive that a machine language program to be generated should be optimized; a parser unit 76 that parses the source program 72; an intermediate code conversion unit 78 that converts the source program 72 into intermediate codes based on a result of the parsing performed by the parser unit 76; an optimization unit 68 that optimizes the intermediate codes according to the directive; and a code generation unit 90 that converts the intermediate codes into the machine language program. The above directive is a directive to optimize the machine language program targeted at a processor that uses a cache memory.
Type: Grant
Filed: September 21, 2004
Date of Patent: August 4, 2009
Assignee: Panasonic Corporation
Inventors: Taketo Heishi, Hajime Ogawa, Yasuhiro Yamamoto, Kyoko Hattori, Shohei Michimoto, Kenji Hattori, Hirotetsu Tomita, Teruo Kawabata, Kiyoshi Nakashima
-
Patent number: 7549146
Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.
Type: Grant
Filed: June 21, 2005
Date of Patent: June 16, 2009
Assignee: Intel Corporation
Inventors: Xinmin Tian, Milind B. Girkar
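The switch from an original loop to an alternative loop described in this abstract might look roughly like the sketch below. The names and the dictionary lookup standing in for a conditional invariant load are hypothetical; the point is only that the load is never performed unless the original loop would have performed it, and that it is performed once rather than on every iteration.

```python
def scale_positive_split(xs, params):
    """Scale positive entries by params["scale"]; other entries become 0."""
    out = []
    n = len(xs)
    i = 0
    # Original loop: runs until the conditional invariant is first needed.
    while i < n:
        x = xs[i]
        i += 1
        if x > 0:
            scale = params["scale"]    # first (and only) conditional load
            out.append(x * scale)
            break                      # switch to the alternative loop
        out.append(0)
    else:
        return out                     # invariant never needed, never loaded
    # Alternative loop: the invariant is already held in `scale`.
    for x in xs[i:]:
        out.append(x * scale if x > 0 else 0)
    return out
```

Calling it with no positive entries and an empty `params` returns zeros without raising `KeyError`, which is the safety property a naive hoist of the load above the loop would lose.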
-
Patent number: 7546592
Abstract: A method, computer program product, and a data processing system for scheduling instructions in a data processing system are provided. Dependencies among a plurality of nodes are analyzed to determine if any of the plurality of nodes uses a constrained resource. Each of the plurality of nodes represents an instruction in a set of instructions. A subset of the plurality of nodes is designated as resource-constrained nodes. An attempt is made to generate a schedule with the subset of the plurality of nodes scheduled with priority with respect to any of the plurality of nodes not included in the subset.
Type: Grant
Filed: July 21, 2005
Date of Patent: June 9, 2009
Assignee: International Business Machines Corporation
Inventor: Allan Russell Martin
-
Patent number: 7530063
Abstract: A method and system of modifying instructions forming a loop is provided. The method includes: determining static and dynamic characteristics for the instructions; selecting a modification factor for the instructions based on a number of separate equivalent sections forming a cache in a processor which is processing the instructions; and modifying the instructions to interleave the instructions in the loop according to the modification factor and the static and dynamic characteristics when the instructions satisfy modification criteria based on the static and dynamic characteristics.
Type: Grant
Filed: May 27, 2004
Date of Patent: May 5, 2009
Assignee: International Business Machines Corporation
Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, John David McCalpin, Francis Patrick O'Connell, Pascal Vezolle, Steven Wayne White
-
Patent number: 7516481
Abstract: A program development supporting apparatus that groups a plurality of events each executed in an information processor to divide the events into a plurality of parallel execution units to be executed in parallel with each other has a directional graph acquisition section that acquires directional graph data expressing each of the plurality of events as a vertex and a restriction on the execution order between two of the plurality of events as a directional branch, an inverse chain partial set extraction section that traces the directional branch from each event in the forward direction to extract from the directional graph data an inverse partial set that is a combination of the events having such a relationship that any one of the events cannot be reached from the other events, and a parallel execution unit assignment section that assigns the plurality of events belonging to the inverse partial set to units different from each other in the parallel execution units.
Type: Grant
Filed: December 2, 2004
Date of Patent: April 7, 2009
Assignee: International Business Machines Corporation
Inventor: Toshiyuki Fujikura
-
Publication number: 20090083724
Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
Type: Application
Filed: September 26, 2007
Publication date: March 26, 2009
Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
-
Publication number: 20090077545
Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.
Type: Application
Filed: September 18, 2007
Publication date: March 19, 2009
Inventors: Raul Esteban Silvera, Priya Unnikrishnan
-
Publication number: 20090077544
Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so, calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
Type: Application
Filed: September 14, 2007
Publication date: March 19, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Ji Yan Wu
-
Patent number: 7506331
Abstract: A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
Type: Grant
Filed: August 30, 2004
Date of Patent: March 17, 2009
Assignee: International Business Machines Corporation
Inventors: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell
-
Publication number: 20090064120
Abstract: In one embodiment, the present invention includes a method for constructing a data dependency graph (DDG) for a loop to be transformed, performing statement shifting to transform the loop into a first transformed loop according to at least one of first and second algorithms, performing unimodular and echelon transformations of a selected one of the first or second transformed loops, partitioning the selected transformed loop to obtain maximum outer level parallelism (MOLP), and partitioning the selected transformed loop into multiple sub-loops. Other embodiments are described and claimed.
Type: Application
Filed: August 30, 2007
Publication date: March 5, 2009
Inventors: Li Liu, Buqi Cheng, Gansha Wu
-
Publication number: 20090064119
Abstract: Systems, methods and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including building a control flow graph, identifying both countable and non-countable loops, gathering a set of candidate loops for load speculation, for each candidate loop in the set of candidate loops gathered for load speculation computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determining an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives, indicating which loads can safely be executed speculatively and performing low-level instruction on the generated unrolled main loop.
Type: Application
Filed: August 27, 2007
Publication date: March 5, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Roch G. Archambault, Geoffrey O. Blandy, Roland Froese, Yaoqing Gao, Liangxiao Hu, James L. McInnes, Raul E. Silvera
-
Publication number: 20090055815
Abstract: A method and computer program product for eliminating maximum and minimum expressions within loop bounds are provided. A loop in a code is identified. The loop is determined to meet conditions, which require an upper loop bound and a lower loop bound to contain maximum and minimum expressions, loop-invariant operands, a predetermined size for a code size, and a total number of instructions to be greater than a predetermined constant. A profitability of loop versioning is determined based on a performance gain of a fast version of the loop, a probability of executing the fast version of the loop at runtime, and an overhead for performing loop versioning. A pair of lower loop bound and upper loop bound values resulting in a constant number is identified. A loop iteration value is checked to be a non-zero constant. Branches are identified, and loop versioning is performed to generate a versioned loop.
Type: Application
Filed: August 21, 2007
Publication date: February 26, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Edwin Chan
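The core of loop versioning for max/min bounds can be illustrated with a small sketch: a runtime guard selects a fast version whose bounds contain no `max` or `min`, and the general loop remains as the fallback. The function names and the windowed sum are hypothetical, and the profitability analysis and constant-trip-count checks described in the abstract are not modeled.

```python
def sum_window_general(a, lo, hi):
    """General loop: the bounds are clamped with max/min expressions."""
    total = 0
    for i in range(max(lo, 0), min(hi, len(a))):
        total += a[i]
    return total


def sum_window_versioned(a, lo, hi):
    """Versioned loop: one guard, then a fast path with plain bounds."""
    if 0 <= lo and hi <= len(a):
        # Fast version: under the guard, max(lo, 0) == lo and
        # min(hi, len(a)) == hi, so both expressions disappear.
        total = 0
        for i in range(lo, hi):
            total += a[i]
        return total
    # Slow version: keep the original clamped bounds.
    return sum_window_general(a, lo, hi)
```

Both versions return the same value for every input; the guard only decides which copy of the loop runs.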
-
Patent number: 7493604
Abstract: Conditional compilation of intermediate language code based on current environment includes loading intermediate language code on a device. Portions of the intermediate language code are conditionally just-in-time compiled based on a current environment of the device. In accordance with certain aspects, intermediate language code is loaded on a device and a current environment of the device is identified. The intermediate language code is modified based on the current environment, and portions of the modified intermediate language code are just-in-time compiled as needed when running the intermediate language code.
Type: Grant
Filed: October 21, 2004
Date of Patent: February 17, 2009
Assignee: Microsoft Corporation
Inventor: Rico Mariani
-
Patent number: 7493609
Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention to apply transformations to the program code, such that the number of memory accesses and the number of computations are reduced for loop iterations and performance of program code is improved.
Type: Grant
Filed: August 30, 2004
Date of Patent: February 17, 2009
Assignee: International Business Machines Corporation
Inventors: Arie Tal, Dina Tal
-
Patent number: 7487497
Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
Type: Grant
Filed: August 26, 2004
Date of Patent: February 3, 2009
Assignee: International Business Machines Corporation
Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
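The role of max{0,N} in induction variable substitution can be seen in a small sketch. The functions below are hypothetical illustrations: the first carries a counter `k` across iterations, the second replaces it with a closed form so that no iteration depends on the previous one. Using `max(0, n)` as the trip count keeps the closed form correct when the inner loop is zero-trip (n <= 0).

```python
def fill_sequential(m, n):
    """k is a nested basic induction variable carried across iterations."""
    out = {}
    k = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            k += 1                     # loop-carried dependence on k
            out[k] = (i, j)
    return out, k


def fill_substituted(m, n):
    """k is recomputed from i and j, so iterations are independent."""
    trips = max(0, n)                  # inner trip count, 0 for zero-trip loops
    out = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            k = (i - 1) * trips + j    # closed form replaces k += 1
            out[k] = (i, j)
    final_k = max(0, m) * trips        # value of k after the loop nest
    return out, final_k
```

Without the `max`, the final value for n = -2 and m = 3 would come out as -6 instead of 0, which is the zero-trip case the abstract singles out.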
-
Patent number: 7478377
Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
Type: Grant
Filed: August 16, 2004
Date of Patent: January 13, 2009
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 7475392
Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
Type: Grant
Filed: August 16, 2004
Date of Patent: January 6, 2009
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Publication number: 20080313621
Abstract: This document discusses, among other things, a system and method for computing the shortest path expression in a loop having a plurality of expressions. Candidate expressions in the loop are identified and partitioned into sets. A cost matrix is computed as a function of the sets. Paths are found through the cost matrix and, if there are cycles in the paths, the cycles are broken. One or more shortest path expressions are generated as a function of the paths and one or more of the expressions in the loop are replaced with the shortest path expressions.
Type: Application
Filed: June 15, 2007
Publication date: December 18, 2008
Applicant: Cray Inc.
Inventor: James C. Beyer
-
Publication number: 20080271005
Abstract: Based on operations within an uncounted loop of source code, one or more calculations are generated for determining, at runtime, an expected number of iterations through which the uncounted loop can iterate before encountering an exception corresponding to at least one target exception check. A copy of the uncounted loop omitting each target exception check is generated. The uncounted loop, the copy of the uncounted loop, and the one or more calculations are arranged in compiled code so that at runtime program flow enters the copy of the uncounted loop. If a maximum number of iterations of the copy of the uncounted loop is reached, program flow proceeds from the copy of the uncounted loop to the uncounted loop. The maximum number of iterations is no more than the smallest member of a set consisting of the expected number of iterations for each target exception check.
Type: Application
Filed: April 27, 2007
Publication date: October 30, 2008
Inventor: Mark Graham Stoodley
-
Publication number: 20080263524
Abstract: A state machine program is generated from a state machine. The state machine has states, transitions and events. A basic structure for the state machine program is generated. The basic structure has therein a structure that operates in non-final states. A statement is generated within the structure for detecting an event. A statement is generated within the structure for evaluating the detected event based on a current state to identify if the current state is valid for the detected event. A statement is generated within the structure for determining a next state if the current state is valid. A statement is generated within the structure for transitioning the current state to the next state.
Type: Application
Filed: September 9, 2005
Publication date: October 23, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gregory D. Adams, Jonathan David Bennett, Perry Randolph Giffen, Axel Martens, William Gerald O'Farrell
-
Publication number: 20080250401
Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.
Type: Application
Filed: April 9, 2007
Publication date: October 9, 2008
Applicant: Microsoft Corporation
Inventors: Siddhartha Puri, Jaydeep P. Marathe
-
Publication number: 20080244549
Abstract: According to one example embodiment, there is disclosed herein the use of partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS loop may be a loop that allows successive iterations to execute in an overlapping fashion; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.
Type: Application
Filed: March 31, 2007
Publication date: October 2, 2008
Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar
-
Patent number: 7428731Abstract: A method, machine readable medium, and system are disclosed. In one embodiment the method comprises collecting a loop trip count continuously during runtime of a region of code being executed that contains a loop, categorizing the trip count to identify one or more code modification techniques applicable to the loop, and dynamically applying the one or more applicable code modification techniques to alter the code that relates to the loop.Type: GrantFiled: March 31, 2004Date of Patent: September 23, 2008Assignee: Intel CorporationInventors: Youfeng Wu, Mauricio Breternitz, Jr.
-
Publication number: 20080229298Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.Type: ApplicationFiled: March 15, 2007Publication date: September 18, 2008Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
-
Publication number: 20080222623Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.Type: ApplicationFiled: May 16, 2008Publication date: September 11, 2008Applicant: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
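The prologue / steady-state / epilogue shape that loop peeling produces can be sketched as follows. This is a simplification under stated assumptions: the vector width `V` is hypothetical, Python list slices stand in for aligned vector loads and stores, and the peeled iterations here run as plain scalar code rather than using the vector-splicing instructions the abstract mentions.

```python
V = 4  # hypothetical vector width, in elements


def add_one_peeled(data, start, end):
    """Return a copy of data with 1 added to data[start:end], in three phases."""
    out = list(data)
    i = start
    # Prologue: scalar iterations until the index reaches an aligned boundary.
    while i < end and i % V != 0:
        out[i] = data[i] + 1
        i += 1
    # Steady state: whole aligned blocks of V elements (stand-in for SIMD ops).
    while i + V <= end:
        out[i:i + V] = [x + 1 for x in data[i:i + V]]
        i += V
    # Epilogue: leftover iterations that do not fill a whole block.
    while i < end:
        out[i] = data[i] + 1
        i += 1
    return out
```

For example, with `start=3` and `end=10` the prologue handles index 3, the steady-state loop handles indices 4 through 7 as one block, and the epilogue handles indices 8 and 9.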
-
Patent number: 7421687Abstract: A Java virtual machine includes a just in time (JIT) Java compiler. The JIT compiler includes at least one optimizer. Each of the at least one optimizer includes logic for recognizing a pattern in a received Java byte code, logic for optimizing the recognized pattern to produce optimized native code and logic for outputting optimized native code. A method of producing optimized native code is also provided.Type: GrantFiled: September 9, 2004Date of Patent: September 2, 2008Assignee: Sun Microsystems, Inc.Inventors: Frank N. Yellin, Yin Zin Mark Lam
-
Patent number: 7415700Abstract: One embodiment disclosed relates to a method of compiling a program to be executed on a target microprocessor with multiple execution units of a same type. The method includes selecting one of the execution units for testing and scheduling the parallel execution of program code and diagnostics code. The diagnostic code is scheduled to be executed on the selected execution unit. The program code is scheduled to be executed on remaining execution units of the same type.Type: GrantFiled: October 14, 2003Date of Patent: August 19, 2008Assignee: Hewlett-Packard Development Company, L.P.Inventors: Ken Gary Pomaranski, Andrew Harvey Barr, Dale John Shidla
-
Patent number: 7404183Abstract: An improved method and system for acquisition and release of locks within a software program is disclosed. In an exemplary embodiment, a lock within a loop is transformed by relocating acquisition and release instructions from within the loop to positions outside the loop. This may significantly decrease unnecessary lock acquisition and release during execution of the software program. In order to avoid contention problems which may arise from acquiring and keeping a lock on an object over a relatively long period of time, a contention test may be inserted into the loop. Such a contention test may temporarily release the lock if another thread in the software program requires access to the locked object.Type: GrantFiled: May 13, 2004Date of Patent: July 22, 2008Assignee: International Business Machines CorporationInventors: Nikola Grcevski, Kevin Alexander Stoodley, Mark Graham Stoodley, Vijay Sundaresan
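The before-and-after shape of this transformation can be sketched by hand. The code below is not the patented compiler transformation: `CountingLock` is a hypothetical wrapper used only to make the number of acquisitions visible, and `contended` is an assumed callback standing in for whatever contention test a real runtime would provide (Python's `threading.Lock` does not expose waiting threads).

```python
import threading


class CountingLock:
    """Hypothetical wrapper that counts acquisitions of a threading.Lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def acquire(self):
        self._lock.acquire()
        self.acquisitions += 1

    def release(self):
        self._lock.release()


def sum_lock_per_iteration(lock, items, total):
    """Original shape: acquire and release inside every iteration."""
    for x in items:
        lock.acquire()
        try:
            total[0] += x
        finally:
            lock.release()


def sum_lock_hoisted(lock, items, total, contended=lambda: False):
    """Transformed shape: lock hoisted out of the loop, with a contention test."""
    lock.acquire()
    try:
        for x in items:
            if contended():      # assumed test: does another thread want the lock?
                lock.release()   # temporarily give the lock up
                lock.acquire()   # then take it back before continuing
            total[0] += x
    finally:
        lock.release()
```

With five items, the per-iteration version acquires the lock five times and the hoisted version once, plus one extra acquisition each time the contention test fires.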
-
Patent number: 7395531Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.Type: GrantFiled: August 16, 2004Date of Patent: July 1, 2008Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 7395419Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.Type: GrantFiled: April 23, 2004Date of Patent: July 1, 2008Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 7386842Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization operations generated by the alignment handling.Type: GrantFiled: June 7, 2004Date of Patent: June 10, 2008Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
-
Patent number: 7373642Abstract: A method is provided for modifying a program written in a standard programming language so that when the program is compiled both an executable file is produced and an instruction is programmed into a programmable logic device of a processor system. The method includes identifying a critical code segment of a program, rewriting the critical code segment as a function, revising the program, and compiling the program. Revising the program includes designating the function as code to be compiled by an extension compiler and replacing the critical code segment of the program with a statement that calls the function. Compiling the program includes compiling the code with an extension compiler to produce a header file and the instruction for the programmable logic device. Compiling the program also includes using a standard compiler to compile the remainder of the program together with the header file to generate the executable file.Type: GrantFiled: July 29, 2003Date of Patent: May 13, 2008Assignee: Stretch, Inc.Inventors: Kenneth M Williams, Albert Wang
-
Patent number: 7367026Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.Type: GrantFiled: August 16, 2004Date of Patent: April 29, 2008Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
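A small hand-written example may help picture the idea of isomorphic non-stride-one statements that together cover a contiguous stream. In the sketch below, two statements with stride 2 are treated as one length-2 operation per iteration, and two such iterations are aggregated into one block of width `v`. The function names and the width `v` are assumptions, list slices stand in for SIMD operations, and this is not the code generation procedure claimed in the patent.

```python
def scalar(a):
    """Two isomorphic stride-2 statements per iteration of the loop."""
    b = [0] * len(a)
    for i in range(len(a) // 2):
        b[2 * i] = a[2 * i] + 1          # statement 1: even elements
        b[2 * i + 1] = a[2 * i + 1] + 1  # statement 2: odd elements
    return b


def vectorized(a, v=4):
    """Same result, processing the contiguous stream in blocks of v elements."""
    n = (len(a) // 2) * 2                # elements actually touched by the loop
    b = [0] * len(a)
    i = 0
    # Each block of v elements aggregates v // 2 original iterations.
    while i + v <= n:
        b[i:i + v] = [x + 1 for x in a[i:i + v]]
        i += v
    # Residual elements that do not fill a whole block.
    for j in range(i, n):
        b[j] = a[j] + 1
    return b
```

Both functions leave a trailing odd element untouched, because the original loop never writes it.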
-
Patent number: 7367024Abstract: A highly predictable, low overhead and yet dynamic, memory allocation methodology for embedded systems with scratch-pad memory is presented. The dynamic memory allocation methodology for global and stack data (i) accounts for changing program requirements at runtime; (ii) has no software-caching tags; (iii) requires no run-time checks; (iv) has extremely low overheads; and (v) yields 100% predictable memory access times. The methodology provides that data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.Type: GrantFiled: September 21, 2004Date of Patent: April 29, 2008Assignee: University of MarylandInventors: Rajeev Kumar Barua, Sumesh Udayakumaran
-
Patent number: 7353505Abstract: The invention relates to tracing the execution path of a computer program comprising at least one module including a plurality of instructions. At least one of these instructions is a branch instruction. Each branch instruction is identified and evaluated to be one of true and false. An evaluation of true results in a unique identifier being pushed into a predefined area of storage. This unique identifier is associated with the instructions executed as a result of an evaluation of true.Type: GrantFiled: September 13, 2001Date of Patent: April 1, 2008Assignee: International Business Machines CorporationInventor: Anthony John O'Dowd
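The tracing scheme in this abstract (each branch that evaluates to true pushes a unique identifier into a storage area) can be illustrated with hand-inserted instrumentation. The identifiers `"B1"` through `"B3"`, the `trace` list used as the storage area, and the `classify` function are hypothetical; a real implementation would presumably insert the pushes automatically.

```python
def classify(n, trace):
    """Classify an integer, recording which branches evaluated to true."""
    if n < 0:
        trace.append("B1")   # unique identifier for the 'negative' branch
        n = -n
    if n % 2 == 0:
        trace.append("B2")   # unique identifier for the 'even' branch
        kind = "even"
    else:
        kind = "odd"         # a false evaluation pushes nothing
    if n > 100:
        trace.append("B3")   # unique identifier for the 'large' branch
        kind += "-large"
    return kind
```

After a call, the contents of `trace` identify the path taken: for example, an input of -4 leaves `["B1", "B2"]` in the list.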
-
Patent number: 7331045Abstract: An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.Type: GrantFiled: April 29, 2004Date of Patent: February 12, 2008Assignee: International Business Machines CorporationInventors: Allan Russell Martin, James Lawrence McInnes
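To show why spreading identical, independent instruction groups around a loop schedule can reduce hot spots, the toy model below counts how many instructions land in each slot of a modulo schedule. It is a deliberately simplified assumption-laden sketch, not the patented scheduler: `ii` stands for an initiation interval, `group_cycles` lists the relative issue cycles of one group, and real schedulers must also honor dependence and resource constraints that are ignored here.

```python
def even_starts(n_groups, ii):
    """Start cycles that spread n_groups evenly around a schedule of length ii."""
    return [(k * ii) // n_groups for k in range(n_groups)]


def slot_usage(group_cycles, starts, ii):
    """Count the instructions that fall into each slot of the modulo schedule."""
    usage = [0] * ii
    for s in starts:
        for c in group_cycles:
            usage[(s + c) % ii] += 1   # slots wrap around modulo ii
    return usage
```

With three identical two-instruction groups and `ii = 6`, starting every group at cycle 0 puts three instructions in each of the first two slots, while `even_starts(3, 6)` yields starts of 0, 2, and 4 and one instruction per slot.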
-
Publication number: 20080010635Abstract: A compiler includes a mechanism for improving branch prediction in a processor that supports a branch hint instruction. The compiler receives a sequence of instructions, wherein the sequence of instructions comprises a loop. This loop sequence employs an hbr instruction to avoid the misprediction penalty of the taken branch to the start of the loop on each loop iteration. However, this penalty will be incurred regardless, on exiting the loop. The compiler inserts a compare and select instruction sequence which dynamically changes the input to the hbr instruction thereby avoiding this penalty when leaving the loop.Type: ApplicationFiled: July 7, 2006Publication date: January 10, 2008Inventors: John Kevin O'Brien, Kathryn M. O'Brien
-
Patent number: 7318223Abstract: A generic language interface is provided to apply a number of loop optimization transformations. The language interface includes two new directives. The present invention detects the directives in a computer program, and generates code to which at least one loop transformation has been applied based on the directives.Type: GrantFiled: August 26, 2004Date of Patent: January 8, 2008Assignee: International Business Machines CorporationInventors: Robert James Blainey, Arie Tal
-
Patent number: 7316012Abstract: An efficient method is provided for software pipelining (SWP) of loops when translating programs from higher-level languages into equivalent object or machine language code for execution on a computer. In one example embodiment, this is accomplished by spilling and filling multiple computed values, held in a register, that are live across multiple stages in a software-pipelined loop, using multiple rotating stack memory locations to reduce both the compile time of SWP and the complexity of the implemented SWP.Type: GrantFiled: September 29, 2003Date of Patent: January 1, 2008Assignee: Intel CorporationInventor: Kalyan Muthukumar