Including Loop Patents (Class 717/160)

Including scheduling instructions (Class 717/161)

Constant buffering for a computational core of a programmable graphics processing unit

Patent number: 8319774

Abstract: Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slots in a physical storage space; an execution unit control unit configurable to assign the thread to one of the execution units; and a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed.
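
The translation-table idea in this abstract can be illustrated with a small Python sketch. It is not taken from the patent: the class `ConstantBuffer`, its methods, and the first-free slot policy are hypothetical, and the priority levels mentioned in the abstract are left out.

```python
class ConstantBuffer:
    """Toy constant buffer: a fixed pool of slots plus one translation table per context."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots      # physical storage space
        self.free = list(range(num_slots))   # indices of unused slots
        self.tables = {}                     # context id -> {constant index -> slot}

    def load(self, context, constants):
        """Store a context's constants and record which slot each one landed in."""
        if len(constants) > len(self.free):
            raise MemoryError("not enough free slots")
        table = self.tables.setdefault(context, {})
        for index, value in enumerate(constants):
            slot = self.free.pop(0)          # assumed policy: take the first free slot
            self.slots[slot] = value
            table[index] = slot

    def read(self, context, index):
        """Translate (context, logical index) to a physical slot and read it."""
        return self.slots[self.tables[context][index]]


buf = ConstantBuffer(num_slots=8)
buf.load("ctx_a", [1.0, 2.0])
buf.load("ctx_b", [9.0])   # a second context is loaded while the first stays resident
```

Each context addresses its constants by logical index, so two contexts can share the physical storage without knowing where the other's data sits.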

Type: Grant

Filed: November 29, 2011

Date of Patent: November 27, 2012

Assignee: Via Technologies, Inc.

Inventors: Yang (Jeff) Jiao, Yijung Su, John Brothers
Method and system for interprocedural prefetching

Patent number: 8312442

Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.

Type: Grant

Filed: December 10, 2008

Date of Patent: November 13, 2012

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Optimization of a target program

Patent number: 8296750

Abstract: A method and apparatus for optimizing a target program including a pattern of instructions to be replaced. The method is performed by execution of program code by a processor of an information processing apparatus that includes an output device and a computer readable storage medium storing the program code. At least one transformation is performed on the target program to generate a transformed target subprogram in which dependencies among the instructions included in the target subprogram are matched with dependencies in the pattern to be replaced. The transformed target subprogram is replaced, with a post-replacement instruction stream determined to correspond to the pattern to be replaced, to generate a replaced target subprogram. An optimized target program that includes the replaced target subprogram is outputted to the output device. The at least one transformation includes a first transformation, a loop transformation, or both the first transformation and the loop transformation.

Type: Grant

Filed: September 12, 2007

Date of Patent: October 23, 2012

Assignee: International Business Machines Corporation

Inventor: Motohiro Kawahito
Scheduling optimization of aliased pointers for implementation on programmable chips

Patent number: 8291396

Abstract: Various high-level languages are used to specify hardware designs on programmable chips. The high-level language programs include pointer operations that may have same iteration and future iteration dependencies. Single loop iteration pointer dependencies are considered when memory accesses are assigned to clock cycles. Multiple loop iteration pointer dependencies are considered when determining how often new data can be entered into the generated hardware pipeline without causing memory corruption. A buffer can be used to forward data from a memory write to a future read.

Type: Grant

Filed: September 18, 2006

Date of Patent: October 16, 2012

Assignee: Altera Corporation

Inventors: David James Lau, Jeffrey Orion Pritchard, Philippe Molson
Transforming locks in software loops

Patent number: 8276134

Abstract: An improved system and computer programming product for acquisition and release of locks within a software program is disclosed. In an exemplary embodiment, a lock within a loop is transformed by relocating acquisition and release instructions from within the loop to positions outside the loop. This may significantly decrease unnecessary lock acquisition and release during execution of the software program. In order to avoid contention problems which may arise from acquiring and keeping a lock on an object over a relatively long period of time, a contention test may be inserted into the loop. Such a contention test may temporarily release the lock if another thread in the software program requires access to the locked object.
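
A minimal Python sketch of the transformation described in this abstract follows. It is an illustration under assumptions, not the patented implementation: `CountingLock` is a hypothetical helper that tracks waiting threads so that a contention test is possible, and the summing loop is an arbitrary example body.

```python
import threading


class CountingLock:
    """A lock that also counts waiting threads (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._meta = threading.Lock()   # protects the waiter count
        self._waiters = 0

    def acquire(self):
        with self._meta:
            self._waiters += 1
        self._lock.acquire()
        with self._meta:
            self._waiters -= 1

    def release(self):
        self._lock.release()

    def contended(self):
        with self._meta:
            return self._waiters > 0


def sum_locked_per_iteration(lock, items):
    """Original form: acquire and release inside every iteration."""
    total = 0
    for x in items:
        lock.acquire()
        total += x
        lock.release()
    return total


def sum_lock_hoisted(lock, items):
    """Transformed form: one acquire/release pair outside the loop, plus a contention test."""
    total = 0
    lock.acquire()
    try:
        for x in items:
            total += x
            if lock.contended():   # contention test inserted into the loop
                lock.release()     # temporarily give the lock to the waiting thread
                lock.acquire()
    finally:
        lock.release()
    return total
```

With no other thread waiting, the hoisted version performs a single acquire/release pair regardless of the number of iterations.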

Type: Grant

Filed: June 9, 2008

Date of Patent: September 25, 2012

Assignee: International Business Machines Corporation

Inventors: Nikola Grcevski, Kevin Alexander Stoodley, Mark Graham Stoodley, Vijay Sundaresan
Method and apparatus for saving checkpoint data while detecting and analyzing a loop structure

Patent number: 8271954

Abstract: A save data discrimination method saves calculation results including an element which is periodically saved when a computer executes a program repeating the same arithmetic process. The method includes analyzing a loop structure of the program from a source code of the program to detect a main loop of the arithmetic process repeated in the program and a sub-loop included in the main loop, determining a point of entrance to the main loop as a checkpoint that is a point for saving data of the calculation results, and analyzing the contents of the arithmetic process described in the main loop to identify reference-first elements which are elements only referred to and elements defined after being referred to as data to be saved at the checkpoint determined at the point of entrance.
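
The selection of "reference-first" elements can be sketched in Python as below. This is a simplified illustration: the access-list representation, the function name `reference_first`, and the sample loop body are assumptions, not details from the patent.

```python
def reference_first(accesses):
    """Return the variables whose first access in the loop body is a read.

    `accesses` is a list of (kind, name) pairs in program order,
    where kind is "read" or "write".
    """
    first_kind = {}
    for kind, name in accesses:
        first_kind.setdefault(name, kind)   # remember only the first access
    return {name for name, kind in first_kind.items() if kind == "read"}


# Body of a hypothetical main loop:  t = a * x;  x = x + t;  n = n + 1
# (the reads on a statement's right-hand side come before its write)
body = [
    ("read", "a"), ("read", "x"), ("write", "t"),
    ("read", "x"), ("read", "t"), ("write", "x"),
    ("read", "n"), ("write", "n"),
]
to_save = reference_first(body)
```

Here `t` is written before it is read in every iteration, so it can be recomputed after a restart and does not need to be saved at the checkpoint.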

Type: Grant

Filed: March 10, 2008

Date of Patent: September 18, 2012

Assignee: Fujitsu Limited

Inventor: Akira Hosoi
Method and apparatus for transferring firmware between an operating system and a device in a host

Patent number: 8261257

Abstract: A host system includes an operating system having a user space and a kernel space with a memory. A device driver performs download cycles to download a firmware file from the user space to the memory. The download cycles are performed based on blocks of data remaining in the user space and not downloaded from the user space. The device driver: transfers a first block of data to a first segment of the memory; transfers a second block of data from the user space to a second segment of the memory; copies the first block into the second segment; and appends the first block to the second block to form a combined block. The first block is transferred from the user space to the first segment during a first download cycle. The first block is transferred from a second segment to the first segment during a second download cycle.

Type: Grant

Filed: October 24, 2011

Date of Patent: September 4, 2012

Assignee: Marvell International Ltd.

Inventors: Frank Huang, Xiaohua Luo, Robert Lee, James Jan, Zheng Cao
Distributed schemes for deploying an application in a large parallel system

Patent number: 8261249

Abstract: Embodiments of the invention provide a method for deploying and running an application on a massively parallel computer system, while minimizing the costs associated with latency, bandwidth, and limited memory resources. The executable code of a program may be divided into multiple code fragments and distributed to different compute nodes of a parallel computing system. During program execution, one compute node may fetch code fragments from other compute nodes as necessary.

Type: Grant

Filed: January 8, 2008

Date of Patent: September 4, 2012

Assignee: International Business Machines Corporation

Inventors: Charles Jens Archer, Thomas Michael Gooding, Ruth Janine Poole, Albert Sidelnik
Pipelining hardware accelerators to computer systems

Patent number: 8250578

Abstract: A method of pipelining hardware accelerators of a computing system includes associating hardware addresses to at least one processing unit (PU) or at least one logical partition (LPAR) of the computing system, receiving a work request for an associated hardware accelerator address, and queuing the work request for a hardware accelerator using the associated hardware accelerator address.

Type: Grant

Filed: February 22, 2008

Date of Patent: August 21, 2012

Assignee: International Business Machines Corporation

Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
SIMD code generation for loops with mixed data lengths

Patent number: 8245208

Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

Type: Grant

Filed: December 4, 2008

Date of Patent: August 14, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Value predictable variable scoping for speculative automatic parallelization with transactional memory

Patent number: 8239843

Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value predictable variables, variables that are: defined only once in a loop of the program; not defined in any inner loop of the loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Name in a set all blocks along the path until a loop header block. For each block in the set, determine program blocks that logically succeed the block and are not in the first set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.
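
The three tests that identify value predictable variables can be written down directly. The Python sketch below assumes a loop summarized by three hypothetical inputs (a list of assignments, the set of names assigned in inner loops, and the set of names read); a real compiler would derive these from its intermediate representation.

```python
from collections import Counter


def value_predictable(defs, inner_defs, uses):
    """Apply the three tests from the abstract.

    defs       -- names assigned in the loop, one entry per assignment
                  (assignments inside inner loops included)
    inner_defs -- names assigned inside any inner loop
    uses       -- names read in the loop
    """
    counts = Counter(defs)
    return {
        name
        for name, n in counts.items()
        if n == 1 and name not in inner_defs and name in uses
    }


# Hypothetical loop: p and u are assigned once, t twice, s inside an inner loop.
candidates = value_predictable(
    defs=["p", "s", "t", "t", "u"],
    inner_defs={"s"},
    uses={"p", "s", "t"},
)
```

Only `p` passes all three tests: `s` is defined in an inner loop, `t` is defined twice, and `u` is never used.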

Type: Grant

Filed: March 11, 2008

Date of Patent: August 7, 2012

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
Method and system for implementing parallel execution in a computing system and in a circuit simulator

Patent number: 8224636

Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.

Type: Grant

Filed: December 17, 2003

Date of Patent: July 17, 2012

Assignee: Cadence Design Systems, Inc.

Inventor: Kenneth S. Kundert
Method and apparatus to achieve maximum outer level parallelism of a loop

Patent number: 8214818

Abstract: In one embodiment, the present invention includes a method for constructing a data dependency graph (DDG) for a loop to be transformed, performing statement shifting to transform the loop into a first transformed loop according to at least one of first and second algorithms, performing unimodular and echelon transformations of a selected one of the first or second transformed loops, partitioning the selected transformed loop to obtain maximum outer level parallelism (MOLP), and partitioning the selected transformed loop into multiple sub-loops. Other embodiments are described and claimed.

Type: Grant

Filed: August 30, 2007

Date of Patent: July 3, 2012

Assignee: Intel Corporation

Inventors: Li Liu, Buqi Cheng, Gansha Wu
LOOP PARALLELIZATION BASED ON LOOP SPLITTING OR INDEX ARRAY

Publication number: 20120167069

Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.

Type: Application

Filed: December 24, 2010

Publication date: June 28, 2012

Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
SPECULATIVE REGION-LEVEL LOOP OPTIMIZATIONS

Publication number: 20120167068

Abstract: A system and method are configured to apply region level optimizations to a selected region of source code rather than loop level optimizations to a loop or loop nest. The region may include an outer loop, a plurality of inner loops and at least one control code. If the region includes an exceptional control flow statement and/or a procedure call, speculative region-level multi-versioning may be applied.

Type: Application

Filed: December 22, 2010

Publication date: June 28, 2012

Inventors: Jin Lin, John L. Ng, Robert J. Cox, Xinmin Tian
METHOD AND SYSTEM FOR UTILIZING PARALLELISM ACROSS LOOPS

Publication number: 20120151463

Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.
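
The chunk-level dependency between two loops can be illustrated with Python threads. This is a sketch under assumptions: `threading.Event` stands in for the synchronization instruction a compiler would insert, and the two loop bodies and all function names are invented for the example.

```python
import threading


def chunks(n, size):
    """Split range(n) into consecutive (start, stop) chunks."""
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def run_two_loops(a, chunk_size):
    """Loop 1: b[i] = a[i] * 2.  Loop 2: c[i] = b[i] + 1.

    Chunk k of loop 2 depends only on chunk k of loop 1, so it may start as
    soon as that chunk is done rather than after the whole first loop.
    """
    n = len(a)
    b, c = [0] * n, [0] * n
    parts = chunks(n, chunk_size)
    done = [threading.Event() for _ in parts]   # one synchronization point per chunk

    def loop1():
        for k, (lo, hi) in enumerate(parts):
            for i in range(lo, hi):
                b[i] = a[i] * 2
            done[k].set()                       # signal: chunk k of loop 1 has completed

    def loop2():
        for k, (lo, hi) in enumerate(parts):
            done[k].wait()                      # wait for the chunk this one depends on
            for i in range(lo, hi):
                c[i] = b[i] + 1

    threads = [threading.Thread(target=loop1), threading.Thread(target=loop2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c
```

The second loop overlaps with the first instead of waiting behind a barrier, which is the cross-loop parallelism the abstract refers to.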

Type: Application

Filed: December 9, 2010

Publication date: June 14, 2012

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Spiros Kalogeropulos, Partha P. Tirumalai
SIMD code generation in the presence of optimized misaligned data reorganization

Patent number: 8196124

Abstract: Loop code is generated to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

Type: Grant

Filed: August 22, 2008

Date of Patent: June 5, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
SPECULATIVE COMPILATION TO GENERATE ADVICE MESSAGES

Publication number: 20120117552

Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.

Type: Application

Filed: November 9, 2010

Publication date: May 10, 2012

Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
Method, system and program product for optimizing emulation of a suspected malware

Patent number: 8176477

Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so, calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
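
The two-level hash lookup can be sketched in Python as follows. The abstract does not say what each hash covers, so this sketch assumes the first hash is taken over the loop's code bytes and the second over the code plus the starting register state; `LongLoopCache`, `emulate_loop`, and the `run` callback are all hypothetical names.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class LongLoopCache:
    """Stored long-loop entries: first hash -> {second hash -> saved end state}."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def _second(loop_code, state):
        # Assumption: the second hash also covers the state on loop entry.
        return digest(loop_code + repr(sorted(state.items())).encode())

    def remember(self, loop_code, start_state, end_state):
        by_second = self.entries.setdefault(digest(loop_code), {})
        by_second[self._second(loop_code, start_state)] = dict(end_state)

    def lookup(self, loop_code, start_state):
        candidates = self.entries.get(digest(loop_code))
        if candidates is None:
            return None                  # first hash matches no stored entry
        return candidates.get(self._second(loop_code, start_state))


def emulate_loop(cache, loop_code, state, run):
    """Skip the loop if a matching entry exists; otherwise run it and record the result."""
    cached = cache.lookup(loop_code, state)
    if cached is not None:
        return dict(cached), True        # emulator state updated, loop skipped
    end_state = run(dict(state))
    cache.remember(loop_code, state, end_state)
    return end_state, False
```

The cheaper first hash filters out loops that were never seen, so the second hash is only computed when a candidate entry exists.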

Type: Grant

Filed: September 14, 2007

Date of Patent: May 8, 2012

Assignee: International Business Machines Corporation

Inventor: Ji Yan Wu
Efficient code generation using loop peeling for SIMD loop code with multiple misaligned statements

Patent number: 8171464

Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
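
The split into prologue, steady-state body, and epilogue can be illustrated by partitioning an iteration range at vector boundaries. The Python sketch below only shows that partitioning; it assumes indices are element offsets from an aligned base address and does not model the vector-splicing instructions the abstract mentions. The function name `peel` is hypothetical.

```python
def peel(start, stop, vector_len):
    """Split [start, stop) into prologue, aligned steady-state blocks, and epilogue."""
    first_aligned = -(-start // vector_len) * vector_len   # round start up to a boundary
    last_aligned = (stop // vector_len) * vector_len       # round stop down to a boundary
    if first_aligned >= last_aligned:                      # no full vector fits
        return list(range(start, stop)), [], []
    prologue = list(range(start, first_aligned))
    blocks = [(b, b + vector_len)
              for b in range(first_aligned, last_aligned, vector_len)]
    epilogue = list(range(last_aligned, stop))
    return prologue, blocks, epilogue


prologue, blocks, epilogue = peel(3, 14, 4)
```

For the range 3 to 14 with four elements per vector, one iteration is peeled in front, two full vectors form the steady state, and two iterations remain at the end.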

Type: Grant

Filed: May 16, 2008

Date of Patent: May 1, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Using police threads to detect dependence violations to reduce speculative parallelization overhead

Patent number: 8151255

Abstract: A method for detecting a dependence violation in an application that involves executing a plurality of sections of the application in parallel, and logging memory transactions that occur while executing the plurality of sections to obtain a plurality of logs and a plurality of temporary results, where the plurality of logs is compared while executing the plurality of sections to determine whether the dependence violation exists.
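
Comparing memory logs to find a dependence violation can be sketched in a few lines of Python. The log format and the conservative rule used here (any read of an address written by an earlier section counts as a violation) are assumptions made for illustration, and the comparison runs after the fact rather than concurrently with the sections.

```python
def find_violations(logs):
    """Return (section, address) pairs that indicate a possible dependence violation.

    logs[k] is the ordered list of (op, address) pairs recorded by section k,
    with sections numbered in sequential program order. A section that reads
    an address written by an earlier section may have seen a stale value when
    the sections ran in parallel.
    """
    violations = []
    written_earlier = set()
    for k, log in enumerate(logs):
        for op, addr in log:
            if op == "read" and addr in written_earlier:
                violations.append((k, addr))
        written_earlier.update(addr for op, addr in log if op == "write")
    return violations


logs = [
    [("write", 0x10), ("read", 0x20)],   # section 0
    [("read", 0x10), ("write", 0x30)],   # section 1 reads what section 0 wrote
    [("read", 0x40)],                    # section 2 is independent
]
violations = find_violations(logs)
```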

Type: Grant

Filed: June 26, 2006

Date of Patent: April 3, 2012

Assignee: Oracle America, Inc.

Inventors: Phyllis E. Gustafson, Miguel Angel Lujan Moreno, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman
Systems And Methods For Compiler-Based Vectorization Of Non-Leaf Code

Publication number: 20120079469

Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, source code dependencies can be expressed in ways that can extend a compiler's ability to vectorize otherwise scalar functions. For example, when compiling a called function, a compiler may identify dependencies of the called function on variables other than parameters passed to the called function. The compiler may record these dependencies, e.g., in a dependency file. Later, when compiling a calling function that calls the called function, the same (or another) compiler may reference the previously-identified dependencies and use them to determine whether and how to vectorize the calling function. In particular, these techniques may facilitate the vectorization of non-leaf loops. Because non-leaf loops are relatively common, the techniques described herein can increase the amount of vectorization that can be applied to many applications.

Type: Application

Filed: September 23, 2010

Publication date: March 29, 2012

Inventor: Jeffry E. Gonion
Efficient data reorganization to satisfy data alignment constraints

Patent number: 8146067

Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization operations generated by the alignment handling.

Type: Grant

Filed: April 23, 2008

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
Pipelined parallelization of multi-dimensional loops with multiple data dependencies

Patent number: 8146071

Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.

Type: Grant

Filed: September 18, 2007

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Raul Esteban Silvera, Priya Unnikrishnan
Software pipelining using one or more vector registers

Patent number: 8136107

Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.

Type: Grant

Filed: October 24, 2007

Date of Patent: March 13, 2012

Assignee: International Business Machines Corporation

Inventor: Ayal Zaks
Method and apparatus for automatic second-order predictive commoning

Patent number: 8132163

Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention to apply transformations to the program code, such that the number of memory accesses and the number of computations are reduced for loop iterations and performance of program code is improved.
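
The general idea of predictive commoning, reusing a value computed in one iteration in the next, can be shown on a small stencil loop. The Python sketch below is a generic illustration written by hand; the example loop is invented, and treating the reuse of a computed sub-expression as the "second-order" case is an assumption about the patent's terminology.

```python
def smooth_naive(a):
    """b[i] = (a[i] + a[i+1]) + (a[i+1] + a[i+2]): four loads and three adds per iteration."""
    return [(a[i] + a[i + 1]) + (a[i + 1] + a[i + 2]) for i in range(len(a) - 2)]


def smooth_commoned(a):
    """Same result; the pair sum computed in one iteration is reused by the next."""
    n = len(a) - 2
    if n <= 0:
        return []
    b = []
    prev = a[0] + a[1]             # computed once before the loop
    for i in range(n):
        cur = a[i + 1] + a[i + 2]  # the only new sub-expression in this iteration
        b.append(prev + cur)
        prev = cur                 # carried into the next iteration
    return b
```

The transformed loop needs two loads and two additions per iteration instead of four loads and three additions.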

Type: Grant

Filed: January 30, 2009

Date of Patent: March 6, 2012

Assignee: International Business Machines Corporation

Inventors: Arie Tal, Dina Tal
Check-hazard instructions for processing vectors

Patent number: 8131979

Abstract: The described embodiments provide a system that determines data dependencies between two vector memory operations or two memory operations that use vectors of memory addresses. During operation, the system receives a first input vector and a second input vector. The first input vector includes a number of elements containing memory addresses for a first memory operation, while the second input vector includes a number of elements containing memory addresses for a second memory operation, wherein the first memory operation occurs before the second memory operation in program order. The system then determines elements in the first and second input vectors where the memory addresses indicate that a dependency exists between the memory operations. The system next generates a result vector, wherein the result vector indicates the elements where dependencies exist between the memory operations.
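
A scalar Python model of such a check is sketched below. The abstract does not spell out the exact rule, so this sketch assumes one plausible reading: element i is flagged when the first operation's address for element i equals the second operation's address for some lower-numbered element, since in sequential order that earlier element's second operation would run first. The function name `check_hazard` is hypothetical.

```python
def check_hazard(first_addrs, second_addrs):
    """Return a result vector of booleans, one per element.

    result[i] is True when first_addrs[i] matches second_addrs[j] for some
    j < i (assumed dependency rule, see the note above).
    """
    result = []
    for i, addr in enumerate(first_addrs):
        result.append(addr in second_addrs[:i])
    return result


first = [100, 104, 108, 112]
second = [104, 200, 112, 300]
hazards = check_hazard(first, second)
```

A vectorizing compiler could use such a result vector to process only the elements up to the first flagged position in one pass.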

Type: Grant

Filed: April 7, 2009

Date of Patent: March 6, 2012

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
Constant buffering for a computational core of a programmable graphics processing unit

Patent number: 8120608

Abstract: Embodiments of systems and methods for managing a constant buffer with rendering context specific data in a multithreaded parallel computational GPU core are disclosed. Briefly described, one method embodiment, among others, comprises responsive to a first shader operation, receiving at a constant buffer a first group of constants corresponding to a first rendering context, and responsive to a second shader operation, receiving at the constant buffer a second group of constants corresponding to a second context without flushing the first group.

Type: Grant

Filed: April 4, 2008

Date of Patent: February 21, 2012

Assignee: Via Technologies, Inc.

Inventors: Yang (Jeff) Jiao, Yijung Su, John Brothers
Operation synthesis system

Patent number: 8117603

Abstract: An operation synthesis system includes a pipeline structure creating section for automatically creating, based on a state number assigned to a skip statement described in a high-level language in a transition to a pipeline operation and the number of cycles required to supply a pipeline with one loop designated by a user or automatically set by the system, a state transition including a loop controller and a loop leaving controller which are capable of conducting pipeline operation. It is therefore possible to transform a loop description described in a high-level language into a description of a circuit in a practical size for pipeline operation.

Type: Grant

Filed: September 12, 2007

Date of Patent: February 14, 2012

Assignee: NEC Corporation

Inventor: Toshihiko Nakamura
Mechanism to restrict parallelization of loops

Patent number: 8104030

Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
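
The thread-count selection can be expressed as a short calculation. The Python sketch below is an assumption about how such a parameter might be applied; the function and parameter names are hypothetical, and the adjustment by a parallel performance factor is not modeled.

```python
def threads_for_loop(iterations, available_threads, min_iters_per_thread):
    """Pick a thread count so that every thread gets at least the minimum work."""
    if min_iters_per_thread <= 0:
        raise ValueError("min_iters_per_thread must be positive")
    by_work = iterations // min_iters_per_thread   # most threads the work can justify
    return max(1, min(available_threads, by_work))
```

For example, with 8 threads available and a minimum of 100 iterations per thread, a 250-iteration loop would use 2 threads and a 50-iteration loop would run on a single thread.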

Type: Grant

Filed: December 21, 2005

Date of Patent: January 24, 2012

Assignee: International Business Machines Corporation

Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
Implementing strong atomicity in software transactional memory

Patent number: 8099726

Abstract: A software transactional memory system is described which utilizes decomposed software transactional memory instructions as well as runtime optimizations to achieve efficient performance. The decomposed instructions allow a compiler with knowledge of the instruction semantics to perform optimizations which would be unavailable on traditional software transactional memory systems. Additionally, high-level software transactional memory optimizations are performed such as code movement around procedure calls, addition of operations to provide strong atomicity, removal of unnecessary read-to-update upgrades, and removal of operations for newly-allocated objects. During execution, multi-use header words for objects are extended to provide for per-object housekeeping, as well as fast snapshots which illustrate changes to objects. Additionally, entries to software transactional memory logs are filtered using an associative table during execution, preventing needless writes to the logs.
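
The log filtering mentioned at the end of this abstract can be sketched in Python. The table layout used here (one remembered address per bucket) is an assumption, as are the class and method names; the system described in the patent may organize its table differently.

```python
class FilteredLog:
    """Transaction log with a small filter table in front of it."""

    def __init__(self, table_size=64):
        self.entries = []
        self.table = [None] * table_size   # last address seen in each bucket

    def record(self, address, old_value):
        """Append a log entry unless this address was just logged; return True if appended."""
        bucket = hash(address) % len(self.table)
        if self.table[bucket] == address:
            return False                   # already logged: skip the needless write
        self.table[bucket] = address
        self.entries.append((address, old_value))
        return True


log = FilteredLog()
log.record(0x10, 1)
log.record(0x10, 2)   # filtered out
log.record(0x18, 3)
```

A bucket collision only evicts the remembered address, so the worst case is a redundant log entry rather than a missing one.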

Type: Grant

Filed: March 23, 2006

Date of Patent: January 17, 2012

Assignee: Microsoft Corporation

Inventor: Timothy Lawrence Harris
Domain stretching for an advanced dual-representation polyhedral loop transformation framework

Patent number: 8087011

Abstract: Mechanisms for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Grant

Filed: September 26, 2007

Date of Patent: December 27, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
Selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework

Patent number: 8087010

Abstract: Mechanisms for selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Grant

Filed: September 26, 2007

Date of Patent: December 27, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
Eliminating maximum/minimum operations in loop bounds

Patent number: 8087012

Abstract: A technique is provided for eliminating maximum and minimum expressions within loop bounds. A loop in a code is identified. The loop is determined to meet conditions, which require an upper loop bound and a lower loop bound to contain maximum and minimum expressions, loop-invariant operands, a predetermined size for a code size, and a total number of instructions to be greater than a predetermined constant. A profitability of loop versioning is determined based on a performance gain of a fast version of the loop, a probability of executing the fast version of the loop at runtime, and an overhead for performing loop versioning. A pair of lower loop bound and upper loop bound values resulting in a constant number is identified. A loop iteration value is checked to be a non-zero constant. Branches are identified, and loop versioning is performed to generate a versioned loop.
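
The loop versioning idea in this abstract can be illustrated with a minimal sketch. It is not the patented procedure: the function names and the `lo`, `hi`, `a`, `b` operands are hypothetical, and the profitability analysis is omitted. It only shows how one runtime check can select a fast loop whose bounds no longer contain `max`/`min`.

```python
def sum_general(data, lo, hi, a, b):
    """Original loop: both bounds contain max/min of loop-invariant operands."""
    total = 0
    for i in range(max(lo, a), min(hi, b)):
        total += data[i]
    return total


def sum_versioned(data, lo, hi, a, b):
    """Versioned loop: a single runtime test picks the fast or the general version."""
    if lo >= a and hi <= b:
        # Fast version: here max(lo, a) == lo and min(hi, b) == hi,
        # so the bounds are plain variables.
        total = 0
        for i in range(lo, hi):
            total += data[i]
        return total
    # Fallback: the original loop, unchanged.
    return sum_general(data, lo, hi, a, b)
```

In Python the gain is only conceptual; in a compiled language the fast version gives the optimizer simpler bounds to work with.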

Type: Grant

Filed: August 21, 2007

Date of Patent: December 27, 2011

Assignee: International Business Machines Corporation

Inventor: Edwin Chan
IMPLEMENTING PARALLEL LOOPS WITH SERIAL SEMANTICS

Publication number: 20110314461

Abstract: The present invention extends to methods, systems, and computer program products for implementing parallel loops with serial semantics. Embodiments of the invention provide semantic transforms and codegen patterns that provide more efficient parallel loop implementations with serial loop semantics. Embodiments of the invention support assignments within for-loop bodies, support break/return constructs within for-loop bodies, and run transformations to convert serial constructs to parallel constructs.
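
As a rough illustration of what "serial semantics" means for a `break` inside a parallel loop, the sketch below runs every iteration speculatively and then commits results in iteration order up to the first break. This is an assumed, generic pattern, not the transforms claimed in the application; `parallel_for_with_break` and the `(value, wants_break)` convention are hypothetical, and the loop body is assumed to be free of side effects.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for_with_break(items, body):
    """Run body(item) for all items in parallel, then keep only the results
    a serial loop would have produced before its first break.

    body returns (value, wants_break) and must not have side effects.
    """
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, whatever the finish order.
        outcomes = list(pool.map(body, items))
    committed = []
    for value, wants_break in outcomes:
        if wants_break:
            break  # discard this and every later speculative iteration
        committed.append(value)
    return committed
```

The result matches the serial loop `for x in items: if cond(x): break; out.append(f(x))`, at the cost of some wasted work past the break point.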

Type: Application

Filed: June 17, 2010

Publication date: December 22, 2011

Applicant: Microsoft Corporation

Inventors: Jonathon Michael Stall, Curt Oliver Hagenlocher, John Benjamin Messerly, James J. Hugunin
Method and apparatus for enabling optimistic program execution

Patent number: 8065670

Abstract: A system that reduces overly optimistic program execution. During operation, the system encounters a bounded-execution block while executing a program, wherein the bounded execution block includes a primary path and a secondary path. Next, the system executes the bounded execution block. After executing the bounded execution block, the system determines whether executing instructions on the primary path is preferable to executing instructions on the secondary path based on information gathered while executing the bounded-execution block. If not, the system dynamically modifies the instructions of the bounded-execution block so that during subsequent passes through the bounded-execution block, the instructions on the secondary path are executed instead of the instructions on the primary path.

Type: Grant

Filed: October 3, 2006

Date of Patent: November 22, 2011

Assignee: Oracle America, Inc.

Inventors: Tycho G. Nightingale, Wayne Mesard
Macroscalar processor architecture

Patent number: 8065502

Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Type: Grant

Filed: November 6, 2009

Date of Patent: November 22, 2011

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
System and method for advanced polyhedral loop transformations of source code in a compiler

Patent number: 8060870

Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Grant

Filed: September 26, 2007

Date of Patent: November 15, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization

Patent number: 8056069

Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

Type: Grant

Filed: September 17, 2007

Date of Patent: November 8, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
METHOD OF AUTOMATIC GENERATION OF EXECUTABLE CODE FOR MULTI-CORE PARALLEL PROCESSING

Publication number: 20110271265

Abstract: A system, method and computer program product for optimizing the process of compilation of computer program code. The compiler transforms the program code written in a variety of languages and creates additional code performing parallel processing of program tasks on target hardware architecture. The transformation of code is performed to achieve optimization of various critical parameters such as the execution speed on a multi-core or cluster target hardware architecture.

Type: Application

Filed: April 28, 2010

Publication date: November 3, 2011

Inventors: Alexander Y. Drozdov, Sergey V. Novikov
Run-Time parallelization of loops in computer programs using bit vectors

Patent number: 8028281

Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.
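
A minimal sketch of how bit vectors can support this kind of run-time analysis follows. It is a generic inspector/executor scheme under stated assumptions, not the patented algorithm: `schedule_waves`, `run_waves`, and the update `out[t] = out[t] * 2 + b[i]` are hypothetical. Each wave's bit vector (a Python integer) records which array elements the wave touches through the indirect index `idx[i]`.

```python
def schedule_waves(idx):
    """Group iteration numbers into waves. Iterations in one wave touch
    distinct elements, and conflicting iterations keep their serial order."""
    masks = []  # masks[w]: bit vector of elements touched by wave w
    waves = []  # waves[w]: iteration numbers assigned to wave w
    for i, target in enumerate(idx):
        bit = 1 << target
        w = len(masks)
        # Walk back to the earliest wave that comes after the last conflict.
        while w > 0 and not (masks[w - 1] & bit):
            w -= 1
        if w == len(masks):
            masks.append(0)
            waves.append([])
        masks[w] |= bit
        waves[w].append(i)
    return waves


def run_serial(a, idx, b):
    out = list(a)
    for i, t in enumerate(idx):
        if b[i] > 0:  # embedded conditional in the loop body
            out[t] = out[t] * 2 + b[i]
    return out


def run_waves(a, idx, b):
    out = list(a)
    for wave in schedule_waves(idx):
        # Iterations inside one wave are independent and could run concurrently.
        for i in wave:
            t = idx[i]
            if b[i] > 0:
                out[t] = out[t] * 2 + b[i]
    return out
```

The sketch relies on the same limitation the abstract states: iterations depend on each other only through the indirect index.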

Type: Grant

Filed: January 5, 2007

Date of Patent: September 27, 2011

Assignee: International Business Machines Corporation

Inventor: Rajendra K. Bera
Loop Transformation for Computer Compiler Optimization

Publication number: 20110231830

Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
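
The cost comparison described here can be sketched as follows. The cycle cost table, the opcode names, and the helper functions are all hypothetical, and simply dropping `convert` entries stands in for whatever rewriting a real compiler would need to keep the loop equivalent.

```python
# Hypothetical cycle cost table; real values depend on the target processor.
CYCLE_COST = {"load": 3, "store": 3, "add": 1, "mul": 2, "convert": 4, "branch": 1}


def total_cycles(loop):
    """Sum the table cost of every intermediate instruction in the loop."""
    return sum(CYCLE_COST[op] for op in loop)


def choose_loop(original):
    """Return the conversion-free loop if it is cheaper, else the original."""
    transformed = [op for op in original if op != "convert"]
    if total_cycles(original) > total_cycles(transformed):
        return transformed
    return original
```

For example, `["load", "convert", "add", "convert", "store", "branch"]` costs 16 cycles under this table and its conversion-free replacement costs 8, so the replacement is chosen.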

Type: Application

Filed: March 16, 2010

Publication date: September 22, 2011

Applicant: QUALCOMM INCORPORATED

Inventors: Sumesh Udayakumaran, Chihong Zhang
System and method for optimizing source code

Patent number: 8024718

Abstract: One aspect of the invention includes a method of address expression optimization of source-level code. The source-level code describes the functionality of an application to be executed on a digital device. The method comprises first inputting first source-level code that describes the functionality of the application into an optimization system. The optimization system then transforms the first source-level code into second source-level code that has fewer nonlinear operations than the first source-level code.

Type: Grant

Filed: November 21, 2005

Date of Patent: September 20, 2011

Assignee: IMEC

Inventors: Miguel Miranda, Francky Catthoor, Martin Janssen, Hugo De Man
Parallelizing sequential frameworks using transactions

Patent number: 8024714

Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.

Type: Grant

Filed: June 4, 2007

Date of Patent: September 20, 2011

Assignee: Microsoft Corporation

Inventors: John Joseph Duffy, Jan Gray, Yosseff Levanoni
Computation Reuse for Loops with Irregular Accesses

Publication number: 20110225573

Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.
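
One possible reading of this abstract is sketched below, under loud assumptions: `weight` is a hypothetical stand-in for a costly index computation, and the temporary is modelled as a small table rather than a scalar. The index expression depends on both induction variables only through `i + j`, so many `(i, j)` pairs repeat the same calculation.

```python
def weight(k):
    # Hypothetical stand-in for an expensive index computation.
    return (k * k + 7) % 5


def naive(data, n, m):
    total = 0
    for i in range(n):
        for j in range(m):
            total += data[weight(i + j)]  # weight() is evaluated n * m times
    return total


def reused(data, n, m):
    # Temporary holding each distinct value of weight(i + j) exactly once.
    temp = [weight(k) for k in range(n + m - 1)]
    total = 0
    for i in range(n):
        for j in range(m):
            total += data[temp[i + j]]  # irregular access now reads the temporary
    return total
```

With `n = 4` and `m = 3`, the naive loop evaluates `weight` 12 times while the rewritten loop evaluates it 6 times.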

Type: Application

Filed: March 11, 2010

Publication date: September 15, 2011

Inventor: Abderrazek Zaafrani
LOOP CONTROL FLOW DIVERSION

Publication number: 20110225213

Abstract: Loop control flow diversion supports thread synchronization, garbage collection, and other situations involving suspension of long-running loops. Divertible loops have a loop body, a loop top, an indirection cell containing a loop top address, and a loop jump instruction sequence which references the indirection cell. In normal execution, control flows through the indirection cell to the loop top. After the indirection cell is altered, however, execution flow is diverted to a point away from the loop top. Operations such as garbage collection are performed while the loop (and hence the thread(s) using the loop) is thus diverted. The kernel or another thread then restores the loop top address into the indirection cell, and execution flow again continues through the restored indirection cell to the loop top.
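
The indirection-cell mechanism can be mimicked in a single-threaded sketch. Everything here is illustrative: the class and method names are hypothetical, a Python attribute stands in for the cell holding the loop-top address, and the suspension request that would come from another thread or the kernel is simulated inside `run`.

```python
class DivertibleLoop:
    def __init__(self, limit):
        self.limit = limit
        self.log = []
        self.cell = self.loop_top  # indirection cell: normally holds the loop top

    def loop_top(self, i):
        self.log.append(f"iter {i}")

    def diverted(self, i):
        # Control lands here instead of the loop top while the cell is altered.
        self.log.append("suspended for collection")
        self.cell = self.loop_top  # restore the loop-top address
        self.loop_top(i)           # resume at the loop top

    def request_suspension(self):
        self.cell = self.diverted  # alter the cell to divert the next jump

    def run(self, suspend_at=None):
        for i in range(self.limit):
            if i == suspend_at:
                self.request_suspension()  # stands in for another thread
            self.cell(i)  # every loop jump goes through the indirection cell
        return self.log
```

The loop body itself never tests a flag; only the contents of the cell change, which is the point of the technique.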

Type: Application

Filed: March 10, 2010

Publication date: September 15, 2011

Applicant: Microsoft Corporation

Inventors: Scott Mosier, Michael McKenzie Magruder, Frank V. Peschel-Gallee
Compiler for eliminating redundant read-modify-write code sequences in non-vectorizable code

Patent number: 8010957

Abstract: A computer implemented method, apparatus, and computer usable program code for eliminating redundant read-modify-write code sequences in non-vectorizable code. Code is received comprising a sequence of operations. The sequence of operations includes a loop. Non-vectorizable operations are identified within the loop that modify at least one sub-part of a storage location. The non-vectorizable operations are modified to include a single store operation for the number of sub-parts of the storage location.

Type: Grant

Filed: August 1, 2006

Date of Patent: August 30, 2011

Assignee: International Business Machines Corporation

Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien
Workload partitioning in a parallel system with heterogeneous alignment constraints

Patent number: 8006238

Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.

Type: Grant

Filed: September 26, 2006

Date of Patent: August 23, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
METHOD AND SYSTEM FOR EXECUTION PROFILING USING LOOP COUNT VARIANCE

Publication number: 20110185347

Abstract: A method for executing a computer program involves obtaining a statement of the source code, where the statement comprises a method call, and where the source code is composed in a statically-typed programming language. The method also involves, upon entry into a loop included in the computer program: incrementing an entry counter by one; and, for each iteration of the loop, incrementing an iteration counter by one, incrementing a local counter by one to obtain an incremented value of the local counter, incrementing a summation variable by the incremented value of the local counter, and executing the iteration of the loop.
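
The abstract lists the counters but not how a variance would be derived from them, so the following is an assumption. If one loop entry runs n iterations, the summation variable grows by 1 + 2 + ... + n = n(n + 1)/2. Summed over all entries, that gives the sum of n² as 2 × summation − iterations, which is enough to compute the variance of the trip count. The class and method names are hypothetical.

```python
class LoopProfile:
    def __init__(self):
        self.entries = 0     # entry counter
        self.iterations = 0  # iteration counter
        self.summation = 0   # summation variable

    def run(self, trip_count, body=lambda: None):
        """Execute one entry into the loop with the given trip count."""
        self.entries += 1
        local = 0  # local counter, reset on every loop entry
        for _ in range(trip_count):
            self.iterations += 1
            local += 1
            self.summation += local
            body()

    def mean(self):
        return self.iterations / self.entries

    def variance(self):
        # Each entry adds n * (n + 1) / 2 to summation, so the sum of n**2
        # over all entries equals 2 * summation - iterations.
        sum_sq = 2 * self.summation - self.iterations
        return sum_sq / self.entries - self.mean() ** 2
```

For trip counts 2, 4, and 6 the counters end at 3 entries, 12 iterations, and a summation of 34, giving a mean of 4 and a population variance of 8/3.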

Type: Application

Filed: January 27, 2010

Publication date: July 28, 2011

Applicant: SUN MICROSYSTEMS, INC.

Inventor: John Rose
PREPARING NAVIGATION STRUCTURE FOR AN AUDIOVISUAL PRODUCT

Publication number: 20110161923

Abstract: The system includes a command set defining a plurality of navigation commands for an audiovisual reproduction apparatus and a human-oriented scripting program for automatically authoring a navigation structure for use in a stand-alone audiovisual product playable in the audiovisual reproduction apparatus. The scripting program includes an iterative loop with a variable adjusted according to the iterations of the loop. The scripting program is operable to automatically, for each iteration of the loop: select from the plurality of navigation commands a navigation command defined according to the variable as adjusted for each iteration of the loop; and add the navigation command to an intermediate representation of the navigation structure. An associated method is also provided.

Type: Application

Filed: April 19, 2005

Publication date: June 30, 2011

Applicant: ZOOtech Limited

Inventor: Stuart Green
