Including Loop Patents (Class 717/160)
  • Patent number: 8319774
    Abstract: Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slot in a physical storage space; and an execution unit control unit configurable to assign the thread to one of the execution units; a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: November 27, 2012
    Assignee: Via Technologies, Inc.
    Inventors: Yang (Jeff) Jiao, Yijung Su, John Brothers
  • Patent number: 8312442
    Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.
    Type: Grant
    Filed: December 10, 2008
    Date of Patent: November 13, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8296750
    Abstract: A method and apparatus for optimizing a target program including a pattern of instructions to be replaced. The method is performed by execution of program code by a processor of an information processing apparatus that includes an output device and a computer readable storage medium storing the program code. At least one transformation is performed on the target program to generate a transformed target subprogram in which dependencies among the instructions included in the target subprogram are matched with dependencies in the pattern to be replaced. The transformed target subprogram is replaced, with a post-replacement instruction stream determined to correspond to the pattern to be replaced, to generate a replaced target subprogram. An optimized target program that includes the replaced target subprogram is outputted to the output device. The at least one transformation includes a first transformation, a loop transformation, or both the first transformation and the loop transformation.
    Type: Grant
    Filed: September 12, 2007
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventor: Motohiro Kawahito
  • Patent number: 8291396
    Abstract: Various high-level languages are used to specify hardware designs on programmable chips. The high-level language programs include pointer operations that may have same iteration and future iteration dependencies. Single loop iteration pointer dependencies are considered when memory accesses are assigned to clock cycles. Multiple loop iteration pointer dependencies are considered when determining how often new data can be entered into the generated hardware pipeline without causing memory corruption. A buffer can be used to forward data from a memory write to a future read.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: October 16, 2012
    Assignee: Altera Corporation
    Inventors: David James Lau, Jeffrey Orion Pritchard, Philippe Molson
  • Patent number: 8276134
    Abstract: An improved system and computer programming product for acquisition and release of locks within a software program is disclosed. In an exemplary embodiment, a lock within a loop is transformed by relocating acquisition and release instructions from within the loop to positions outside the loop. This may significantly decrease unnecessarily lock acquisition and release during execution of the software program. In order to avoid contention problems which may arise from acquiring and keeping a lock on an object over a relatively long period of time, a contention test may be inserted into the loop. Such a contention test may temporarily release the lock if another thread in the software program requires access to the locked object.
    Type: Grant
    Filed: June 9, 2008
    Date of Patent: September 25, 2012
    Assignee: International Business Machines Corporation
    Inventors: Nikola Grcevski, Kevin Alexander Stoodley, Mark Graham Stoodley, Vijay Sundaresan
  • Patent number: 8271954
    Abstract: A save data discrimination method saves calculation results including an element which is periodically saved when a computer executes a program repeating the same arithmetic process. The method includes analyzing a loop structure of the program from a source code of the program to detect a main loop of the arithmetic process repeated in the program and a sub-loop included in the main loop, determining a point of entrance to the main loop as a checkpoint that is a point for saving data of the calculation results, and analyzing the contents of the arithmetic process described in the main loop to identify reference-first elements which are elements only referred to and elements defined after being referred to as data to be saved at the checkpoint determined at the point of entrance.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: September 18, 2012
    Assignee: Fujitsu Limited
    Inventor: Akira Hosoi
  • Patent number: 8261257
    Abstract: A host system includes an operating system having a user space and a kernel space with a memory. A device driver performs download cycles to download a firmware file from the user space to the memory. The download cycles are performed based on blocks of data remaining in the user space and not downloaded from the user space. The device driver: transfers a first block of data to a first segment of the memory; transfers a second block of data from the user space to a second segment of the memory; copies the first block into the second segment; and appends the first block to the second block to form a combined block. The first block is transferred from the user space to the first segment during a first download cycle. The first block is transferred from a second segment to the first segment during a second download cycle.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: September 4, 2012
    Assignee: Marvell International Ltd.
    Inventors: Frank Huang, Xiaohua Luo, Robert Lee, James Jan, Zheng Cao
  • Patent number: 8261249
    Abstract: Embodiments of the invention provide a method for deploying and running an application on a massively parallel computer system, while minimizing the costs associated with latency, bandwidth, and limited memory resources. The executable code of a program may be divided into multiple code fragments and distributed to different compute nodes of a parallel computing system. During program execution, one compute node may fetch code fragments from other compute nodes as necessary.
    Type: Grant
    Filed: January 8, 2008
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles Jens Archer, Thomas Michael Gooding, Ruth Janine Poole, Albert Sidelnik
  • Patent number: 8250578
    Abstract: A method of pipelining hardware accelerators of a computing system includes associating hardware addresses to at least one processing unit (PU) or at least one logical partition (LPAR) of the computing system, receiving a work request for an associated hardware accelerator address, and queuing the work request for a hardware accelerator using the associated hardware accelerator address.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
  • Patent number: 8245208
    Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: August 14, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 8239843
    Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value predictable variables, variables that are: defined only once in a loop of the program; not defined in any inner loop of the loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Name in a set all blocks along the path until a loop header block. For each block in the set, determine program blocks that logically succeed the block and are not in the first set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: August 7, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8224636
    Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.
    Type: Grant
    Filed: December 17, 2003
    Date of Patent: July 17, 2012
    Assignee: Cadence Design Systems, Inc.
    Inventor: Kenneth S. Kundert
  • Patent number: 8214818
    Abstract: In one embodiment, the present invention includes a method for constructing a data dependency graph (DDG) for a loop to be transformed, performing statement shifting to transform the loop into a first transformed loop according to at least one of first and second algorithms, performing unimodular and echelon transformations of a selected one of the first or second transformed loops, partitioning the selected transformed loop to obtain maximum outer level parallelism (MOLP), and partitioning the selected transformed loop into multiple sub-loops. Other embodiments are described and claimed.
    Type: Grant
    Filed: August 30, 2007
    Date of Patent: July 3, 2012
    Assignee: Intel Corporation
    Inventors: Li Liu, Buqi Cheng, Gansha Wu
  • Publication number: 20120167069
    Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
    Type: Application
    Filed: December 24, 2010
    Publication date: June 28, 2012
    Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
  • Publication number: 20120167068
    Abstract: A system and method are configured to apply region level optimizations to a selected region of source code rather than loop level optimizations to a loop or loop nest. The region may include an outer loop, a plurality of inner loops and at least one control code. If the region includes an exceptional control flow statement and/or a procedure call, speculative region-level multi-versioning may be applied.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 28, 2012
    Inventors: Jin Lin, John L. Ng, Robert J. Cox, Xinmin Tian
  • Publication number: 20120151463
    Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8196124
    Abstract: Loop code is generated to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: June 5, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Publication number: 20120117552
    Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
    Type: Application
    Filed: November 9, 2010
    Publication date: May 10, 2012
    Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
  • Patent number: 8176477
    Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: May 8, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ji Yan Wu
  • Patent number: 8171464
    Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
    Type: Grant
    Filed: May 16, 2008
    Date of Patent: May 1, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 8151255
    Abstract: A method for detecting a dependence violation in an application that involves executing a plurality of sections of the application in parallel, and logging memory transactions that occur while executing the plurality of sections to obtain a plurality of logs and a plurality of temporary results, where the plurality of logs is compared while executing the plurality of sections to determine whether the dependence violation exists.
    Type: Grant
    Filed: June 26, 2006
    Date of Patent: April 3, 2012
    Assignee: Oracle America, Inc.
    Inventors: Phyllis E. Gustafson, Miguel Angel Lujan Moreno, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman
  • Publication number: 20120079469
    Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, source code dependencies can be expressed in ways that can extend a compiler's ability to vectorize otherwise scalar functions. For example, when compiling a called function, a compiler may identify dependencies of the called function on variables other than parameters passed to the called function. The compiler may record these dependencies, e.g., in a dependency file. Later, when compiling a calling function that calls the called function, the same (or another) compiler may reference the previously-identified dependencies and use them to determine whether and how to vectorize the calling function. In particular, these techniques may facilitate the vectorization of non-leaf loops. Because non-leaf loops are relatively common, the techniques described herein can increase the amount of vectorization that can be applied to many applications.
    Type: Application
    Filed: September 23, 2010
    Publication date: March 29, 2012
    Inventor: Jeffry E. Gonion
  • Patent number: 8146067
    Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
  • Patent number: 8146071
    Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan
  • Patent number: 8136107
    Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: March 13, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ayal Zaks
  • Patent number: 8132163
    Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention for apply transformations to the program code, such that the number of memory access and the number of computations are reduced for loop iterations and performance of program code is improved.
    Type: Grant
    Filed: January 30, 2009
    Date of Patent: March 6, 2012
    Assignee: International Business Machines Corporation
    Inventors: Arie Tal, Dina Tal
  • Patent number: 8131979
    Abstract: The described embodiments provide a system that determines data dependencies between two vector memory operations or two memory operations that use vectors of memory addresses. During operation, the system receives a first input vector and a second input vector. The first input vector includes a number of elements containing memory addresses for a first memory operation, while the second input vector includes a number of elements containing memory addresses for a second memory operation, wherein the first memory operation occurs before the second memory operation in program order. The system then determines elements in the first and second input vectors where the memory addresses indicate that a dependency exists between the memory operations. The system next generates a result vector, wherein the result vector indicates the elements where dependencies exist between the memory operations.
    Type: Grant
    Filed: April 7, 2009
    Date of Patent: March 6, 2012
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
  • Patent number: 8120608
    Abstract: Embodiments of systems and methods for managing a constant buffer with rendering context specific data in multithreaded parallel computational GPU core are disclosed. Briefly described, one method embodiment, among others, comprises responsive to a first shader operation, receiving at a constant buffer a first group of constants corresponding to a first rendering context, and responsive to a second shader operation, receiving at the constant buffer a second group of constants corresponding to a second context without flushing the first group.
    Type: Grant
    Filed: April 4, 2008
    Date of Patent: February 21, 2012
    Assignee: Via Technologies, Inc.
    Inventors: Yang (Jeff) Jiao, Yijung Su, John Brothers
  • Patent number: 8117603
    Abstract: An operation synthesis system includes a pipeline structure creating section for automatically creating, based on a state number assigned to a skip statement described in a high-level language in a transition to a pipeline operation and the number of cycles required to supply a pipeline with one loop designated by a user or automatically set by the system, a state transition including a loop controller and a loop leaving controller which are capable of conducting pipeline operation. It is therefore possible to transform a loop description described in a high-level language into a description of a circuit in a practical size for pipeline operation.
    Type: Grant
    Filed: September 12, 2007
    Date of Patent: February 14, 2012
    Assignee: NEC Corporation
    Inventor: Toshihiko Nakamura
  • Patent number: 8104030
    Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified to limit parallelization of the loop. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
  • Patent number: 8099726
    Abstract: A software transactional memory system is described which utilizes decomposed software transactional memory instructions as well as runtime optimizations to achieve efficient performance. The decomposed instructions allow a compiler with knowledge of the instruction semantics to perform optimizations which would be unavailable on traditional software transactional memory systems. Additionally, high-level software transactional memory optimizations are performed such as code movement around procedure calls, addition of operations to provide strong atomicity, removal of unnecessary read-to-update upgrades, and removal of operations for newly-allocated objects. During execution, multi-use header words for objects are extended to provide for per-object housekeeping, as well as fast snapshots which illustrate changes to objects. Additionally, entries to software transactional memory logs are filtered using an associative table during execution, preventing needless writes to the logs.
    Type: Grant
    Filed: March 23, 2006
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corporation
    Inventor: Timothy Lawrence Harris
  • Patent number: 8087011
    Abstract: Mechanisms for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Patent number: 8087010
    Abstract: Mechanisms for selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Patent number: 8087012
    Abstract: A technique is provided for eliminating maximum and minimum expressions within loop bounds are provided. A loop in a code is identified. The loop is determined to meet conditions, which require an upper loop bound and a lower loop bound to contain maximum and minimum expressions, loop-invariant operands, a predetermined size for a code size, and a total number of instructions to be greater than a predetermined constant. A profitability of loop versioning is determined based on a performance gain of a fast version of the loop, a probability of executing the fast version of the loop at runtime, and an overhead for performing loop versioning. A pair of lower loop bound and upper loop bound values resulting in a constant number is identified. A loop iteration value is checked to be a non-zero constant. Branches are identified, and loop versioning is performed to generate a versioned loop.
    Type: Grant
    Filed: August 21, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventor: Edwin Chan
  • Publication number: 20110314461
    Abstract: The present invention extends to methods, systems, and computer program products for implementing parallel loops with serial semantics. Embodiments of the invention provide a semantic transforms and codegen patterns that provide more efficient parallel loop implementations with serial loop semantics. Embodiments of the invention support assignments within for-loop bodies, support break/return constructs within for-loop bodies, and run transformations to covert serial constructs to parallel constructs.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Jonathon Michael Stall, Curt Oliver Hagenlocher, John Benjamin Messerly, James J. Hugunin
  • Patent number: 8065670
    Abstract: A system that reduces overly optimistic program execution. During operation, the system encounters a bounded-execution block while executing a program, wherein the bounded execution block includes a primary path and a secondary path. Next, the system executes the bounded execution block. After executing the bounded execution block, the system determines whether executing instructions on the primary path is preferable to executing instructions on the secondary path based on information gathered while executing the bounded-execution block. If not, the system dynamically modifies the instructions of the bounded-execution block so that during subsequent passes through the bounded-execution block, the instructions on the secondary path are executed instead of the instructions on the primary path.
    Type: Grant
    Filed: October 3, 2006
    Date of Patent: November 22, 2011
    Assignee: Oracle America, Inc.
    Inventors: Tycho G. Nightingale, Wayne Mesard
  • Patent number: 8065502
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Grant
    Filed: November 6, 2009
    Date of Patent: November 22, 2011
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8060870
    Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: November 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
  • Patent number: 8056069
    Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.
    Type: Grant
    Filed: September 17, 2007
    Date of Patent: November 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Publication number: 20110271265
    Abstract: A system, method and computer program product for optimizing the process of compilation of computer program code. The compiler transforms the program code written in a variety of languages and creates additional code performing parallel processing of program tasks on target hardware architecture. The transformation of code is performed to achieve optimization of various critical parameters such as the execution speed on a multi-core or cluster target hardware architecture.
    Type: Application
    Filed: April 28, 2010
    Publication date: November 3, 2011
    Inventors: Alexander Y. Drozdov, Sergey V. Novikov
  • Patent number: 8028281
    Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: September 27, 2011
    Assignee: International Business Machines Corporation
    Inventor: Rajendra K. Bera
  • Publication number: 20110231830
    Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: Sumesh Udayakumaran, Chihong Zhang
  • Patent number: 8024718
    Abstract: One aspect of the invention includes a method of address expression optimization of source-level code. The source-level code describes the functionality of an application to be executed on a digital device. The method comprises first inputting first source-level code that describes the functionality of the application into optimization system. The optimization system then transforms the first source-level into a second source level that has fewer nonlinear operations than the first source-level code.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: September 20, 2011
    Assignee: IMEC
    Inventors: Miguel Miranda, Francky Catthoor, Martin Janssen, Hugo De Man
  • Patent number: 8024714
    Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.
    Type: Grant
    Filed: June 4, 2007
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: John Joseph Duffy, Jan Gray, Yosseff Levanoni
  • Publication number: 20110225573
    Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.
    Type: Application
    Filed: March 11, 2010
    Publication date: September 15, 2011
    Inventor: Abderrazek Zaafrani
  • Publication number: 20110225213
    Abstract: Loop control flow diversion supports thread synchronization, garbage collection, and other situations involving suspension of long-running loops. Divertible loops have a loop body, a loop top, an indirection cell containing a loop top address, and a loop jump instruction sequence which references the indirection cell. In normal execution, control flows through the indirection cell to the loop top. After the indirection cell is altered, however, execution flow is diverted to a point away from the loop top. Operations such as garbage collection are performed while the loop (and hence the thread(s) using the loop) is thus diverted. The kernel or another thread then restores the loop top address into the indirection cell, and execution flow again continues through the restored indirection cell to the loop top.
    Type: Application
    Filed: March 10, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Scott Mosier, Michael McKenzie Magruder, Frank V. Peschel-Gallee
  • Patent number: 8010957
    Abstract: A computer implemented method, apparatus, and computer usable program code for eliminating redundant read-modify-write code sequences in non-vectorizable code. Code is received comprising a sequence of operations. The sequence of operations includes a loop. Non-vectorizable operations are identified within the loop that modifies at least one sub-part of a storage location. The non-vectorizable operations are modified to include a single store operation for the number of sub-parts of the storage location.
    Type: Grant
    Filed: August 1, 2006
    Date of Patent: August 30, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien
  • Patent number: 8006238
    Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.
    Type: Grant
    Filed: September 26, 2006
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
  • Publication number: 20110185347
    Abstract: A method for executing a computer program involving obtaining a statement of the source code, where the statement comprises a method call, and where the source code is composed in a statically-typed programming language. The method also involves, upon entry into a loop included in the computer program: incrementing an entry counter by one; and, for each iteration of the loop, incrementing an iteration counter by one, incrementing a local counter by one to obtain an incremented value of the local counter, incrementing a summation variable by the incremented value of the local counter, and executing the iteration of the loop.
    Type: Application
    Filed: January 27, 2010
    Publication date: July 28, 2011
    Applicant: SUN MICROSYSTEMS, INC.
    Inventor: John Rose
  • Publication number: 20110161923
    Abstract: The system includes a command set defining a plurality of navigation commands for an audiovisual reproduction apparatus and a human-oriented scripting program for automatically authoring a navigation structure for use in a stand alone audiovisual product playable in the audiovisual reproduction apparatus. The scripting program includes an iterative loop with a variable adjusted according to the iterations of the loop. The scripting program is operable to automatically, for each iteration of the loop; select from the plurality of navigation commands a navigation command defined according to the variable as adjusted for each iteration of the loop; and add the navigation command to an intermediate representation of the navigation structure. An associated method is also provided.
    Type: Application
    Filed: April 19, 2005
    Publication date: June 30, 2011
    Applicant: ZOOtech Limited
    Inventor: Stuart Green