Patents by Inventor Alexandre E. Eichenberger

Alexandre E. Eichenberger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Framework for generating mixed-mode operations in loop-level simdization

Patent number: 8549501

Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.

Type: Grant

Filed: August 16, 2004

Date of Patent: October 1, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu

SIMD Compare Instruction Using Permute Logic for Distributed Register Files

Publication number: 20130246737

Abstract: Mechanisms, in a data processing system comprising a single instruction multiple data (SIMD) processor, for performing a data dependency check operation on vector element values of at least two input vector registers are provided. Two calls to a simd-check instruction are performed, one with input vector registers having a first order and one with the input vector registers having a different order. The simd-check instruction performs comparisons to determine if any data dependencies are present. Results of the two calls to the simd-check instruction are obtained and used to determine if any data dependencies are present in the at least two input vector registers. Based on the results, the SIMD processor may perform various operations.

Type: Application

Filed: March 15, 2012

Publication date: September 19, 2013

Applicant: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Bruce M. Fleischer

Data transfer optimized software cache for regular memory references

Patent number: 8527974

Abstract: Mechanisms are provided for optimizing regular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which regular memory references access a storage of a software cache of a data processing system through a high locality cache mechanism of the software cache.

Type: Grant

Filed: March 28, 2008

Date of Patent: September 3, 2013

Assignees: International Business Machines Corporation, Barcelona Supercomputing Center—Centro Nacional de Supercomputacion

Inventors: Eduard Ayguade, Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, Xavier Martorell, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang

Checkpointing in speculative versioning caches

Patent number: 8521961

Abstract: Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.

Type: Grant

Filed: August 20, 2009

Date of Patent: August 27, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind, Martin Ohmacht

Write-through cache optimized for dependence-free parallel regions

Patent number: 8516197

Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

Type: Grant

Filed: February 11, 2011

Date of Patent: August 20, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan

Shared prefetching to reduce execution skew in multi-threaded systems

Patent number: 8490071

Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.

Type: Grant

Filed: May 4, 2010

Date of Patent: July 16, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John A. Gunnels
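
The distribution step described in the abstract for patent 8490071 can be illustrated with a minimal Python sketch. The round-robin policy, the function name `distribute_prefetches`, and the example addresses are assumptions made for illustration; the abstract only states that each thread receives a separate sub-portion of the prefetch set.

```python
def distribute_prefetches(prefetch_addresses, num_threads):
    """Split one shared stream of prefetch targets across threads.

    Round-robin assignment is an assumed policy, not one taken from the patent.
    Each thread ends up with a disjoint sub-portion of the prefetches.
    """
    per_thread = [[] for _ in range(num_threads)]
    for i, address in enumerate(prefetch_addresses):
        per_thread[i % num_threads].append(address)
    return per_thread


# Hypothetical memory stream: six consecutive 128-byte cache lines.
stream = [0, 128, 256, 384, 512, 640]
per_thread = distribute_prefetches(stream, num_threads=2)
print(per_thread)  # [[0, 256, 512], [128, 384, 640]]
```

Because the threads share the memory that the prefetches fill, each line is fetched once but is available to every thread.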

Tracking and detecting thread dependencies using speculative versioning cache

Patent number: 8468539

Abstract: Mechanisms are provided for tracking dependencies of threads in a multi-threaded computer program execution. The mechanisms detect a dependency of a first thread's execution on results of a second thread's execution in an execution flow of the multi-threaded computer program. The mechanisms further store, in a hardware thread dependency vector storage associated with the first thread's execution, an identifier of the dependency by setting at least one bit in the hardware thread dependency vector storage corresponding to the second thread. Moreover, the mechanisms schedule tasks performed by the multi-threaded computer program based on the hardware thread dependency vector storage to minimize squashing of threads.

Type: Grant

Filed: September 3, 2009

Date of Patent: June 18, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Lakshminarayanan Renganarayana, Xiaotong Zhuang

Parallelization of irregular reductions via parallel building and exploitation of conflict-free units of work at runtime

Patent number: 8468508

Abstract: An optimizing compiler device, a method, and a computer program product capable of performing parallelization of irregular reductions are provided. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at run time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and the unit of work is assigned as a conflict free unit of work (CFUW) when no conflicts are found.

Type: Grant

Filed: October 9, 2009

Date of Patent: June 18, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
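
The run-time conflict check described in the abstract for patent 8468508 can be sketched in Python as follows. This is only an illustration: the function name, the representation of a unit of work as a `(name, addresses)` pair, and the `deferred` list for conflicting units are all assumptions, since the abstract does not say what happens to units that conflict.

```python
def build_conflict_free_units(units):
    """Classify units of work by whether they conflict with earlier ones.

    `units` is a list of (name, addresses) pairs; `addresses` are the
    reduction-variable addresses the unit updates, known only at run time.
    """
    recorded = set()      # addresses recorded for previously accepted units
    conflict_free = []    # units assigned as conflict-free (CFUW)
    deferred = []         # assumed handling: conflicting units are set aside
    for name, addresses in units:
        touched = set(addresses)
        if touched & recorded:
            deferred.append(name)
        else:
            conflict_free.append(name)
            recorded |= touched
    return conflict_free, deferred


# Hypothetical units of work; uw2 touches address 4, which uw0 already updates.
units = [("uw0", [0, 4]), ("uw1", [8]), ("uw2", [4, 12]), ("uw3", [16])]
cfuw, deferred = build_conflict_free_units(units)
print(cfuw, deferred)  # ['uw0', 'uw1', 'uw3'] ['uw2']
```

Units in `cfuw` update disjoint addresses, so their reductions could run in parallel without synchronization.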

Efficient Enqueuing of Values in SIMD Engines with Permute Unit

Publication number: 20130151822

Abstract: Mechanisms, in a data processing system having a processor, for generating enqueued data for performing computations of a conditional branch of code are provided. Mask generation logic of the processor operates to generate a mask representing a subset of iterations of a loop of the code that results in a condition of the conditional branch being satisfied. The mask is used to select data elements from an input data element vector register corresponding to the subset of iterations of the loop of the code that result in the condition of the conditional branch being satisfied. Furthermore, the selected data elements are used to perform computations of the conditional branch of code. Iterations of the loop of the code that do not result in the condition of the conditional branch being satisfied are not used as a basis for performing computations of the conditional branch of code.

Type: Application

Filed: December 9, 2011

Publication date: June 13, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, John K.P. O'Brien, Yuan Zhao
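
The mask-and-select idea in the abstract for publication 20130151822 can be modeled in scalar Python. The sketch below is an assumption-laden illustration: the names `enqueue_selected` and `condition` are hypothetical, and plain lists stand in for vector registers and the permute unit.

```python
def enqueue_selected(values, condition):
    """Build a per-iteration mask, then pack the selected elements together.

    On SIMD hardware the packing would be a permute; here it is a list filter.
    """
    mask = [condition(v) for v in values]                  # one bit per iteration
    queue = [v for v, keep in zip(values, mask) if keep]   # compacted elements
    return mask, queue


# Hypothetical loop body: only negative inputs take the conditional branch.
inputs = [3, -1, 7, -5, 2, 0, -8, 4]
mask, queue = enqueue_selected(inputs, lambda v: v < 0)
results = [v * v for v in queue]   # branch computation runs on dense data only
print(queue, results)  # [-1, -5, -8] [1, 25, 64]
```

Iterations whose mask bit is false never reach the branch computation, which matches the last sentence of the abstract.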

Runtime dependence-aware scheduling using assist thread

Patent number: 8464271

Abstract: A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.

Type: Grant

Filed: April 10, 2012

Date of Patent: June 11, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kathryn M. O'Brien, Xiaotong Zhuang

Method and structure of using SIMD vector architectures to implement matrix multiplication

Patent number: 8458442

Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.

Type: Grant

Filed: August 26, 2009

Date of Patent: June 4, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson

Insertion of operation-and-indicate instructions for optimized SIMD code

Patent number: 8458684

Abstract: Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction.

Type: Grant

Filed: August 19, 2009

Date of Patent: June 4, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind

Code generation for complex arithmetic reduction for architectures lacking cross data-path support

Patent number: 8423979

Abstract: A computer implemented method, apparatus, and computer usable program code for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. Executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result in response to a determination that a reduced single instruction multiple data method is appropriate.

Type: Grant

Filed: October 12, 2006

Date of Patent: April 16, 2013

Assignee: International Business Machines Corporation

Inventors: Roch Georges Archambault, Alexandre E. Eichenberger, Amy Kai-Ting Wang, Peng Wu, Peng P. Zhao

Version pressure feedback mechanisms for speculative versioning caches

Patent number: 8397052

Abstract: Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.

Type: Grant

Filed: August 19, 2009

Date of Patent: March 12, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Alan Gara, Kathryn M. O'Brien, Martin Ohmacht, Xiaotong Zhuang

Optimized software cache lookup for SIMD architectures

Patent number: 8370575

Abstract: Process, cache memory, computer product and system for loading data associated with a requested address in a software cache. The process includes loading address tags associated with a set in a cache directory using a Single Instruction Multiple Data (SIMD) operation, determining a position of the requested address in the set using a SIMD comparison, and determining an actual data value associated with the position of the requested address in the set.

Type: Grant

Filed: September 7, 2006

Date of Patent: February 5, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Tao Zhang
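
The three lookup steps in the abstract for patent 8370575 (load the tags of a set, compare them against the requested address, read the data at the matching position) can be sketched in Python. The cache geometry (`LINE_SIZE`, `NUM_SETS`, `WAYS`), the function name, and the list-based stand-ins for SIMD loads and compares are assumptions for illustration only.

```python
LINE_SIZE = 128   # bytes per cache line (assumed)
NUM_SETS = 4      # number of sets in the directory (assumed)
WAYS = 4          # tags per set, i.e. one SIMD register of tags (assumed)


def lookup(tags, data, address):
    """Return the byte cached for `address`, or None on a miss."""
    line_address = address // LINE_SIZE
    set_index = line_address % NUM_SETS
    offset = address % LINE_SIZE
    set_tags = tags[set_index]                       # models one SIMD load
    matches = [t == line_address for t in set_tags]  # models one SIMD compare
    if True not in matches:
        return None                                  # miss
    way = matches.index(True)                        # position within the set
    return data[set_index][way][offset]


tags = [[None] * WAYS for _ in range(NUM_SETS)]
data = [[None] * WAYS for _ in range(NUM_SETS)]

# Install the line that holds address 0x1234 in way 2 of its set.
line = 0x1234 // LINE_SIZE
tags[line % NUM_SETS][2] = line
data[line % NUM_SETS][2] = bytes(range(LINE_SIZE))

hit = lookup(tags, data, 0x1234)
miss = lookup(tags, data, 0x2000)
print(hit, miss)  # 52 None
```

All tags of a set are compared in a single step, so the lookup cost does not grow with the associativity as long as a set fits in one vector register.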

Optimizing scalar code executed on a SIMD engine by alignment of SIMD slots

Patent number: 8370817

Abstract: A mechanism is provided for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning the slots of SIMD registers. With the mechanism, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.

Type: Grant

Filed: May 27, 2008

Date of Patent: February 5, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
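
The alignment propagation described in the abstract for patent 8370817 can be illustrated with a small Python sketch. The tuple encoding of the expression tree, the `align` function, and the choice to always shift the right operand are assumptions; the abstract only says that one operand is shifted when the alignments differ.

```python
def align(node):
    """Propagate slot alignment from the leaves up an expression tree.

    A leaf is ("load", name, slot); an operation is (op, left, right).
    Returns the generated expression text and the slot its result occupies.
    """
    if node[0] == "load":
        _, name, slot = node
        return name, slot
    op, left, right = node
    left_expr, left_slot = align(left)
    right_expr, right_slot = align(right)
    if left_slot != right_slot:
        # Assumed policy: move the right operand into the left operand's slot.
        right_expr = f"shift({right_expr}, {right_slot}->{left_slot})"
    return f"{op}({left_expr}, {right_expr})", left_slot


# Hypothetical statement a * b + c, with c living in a different slot.
tree = ("add", ("mul", ("load", "a", 0), ("load", "b", 0)), ("load", "c", 2))
code, slot = align(tree)
print(code, slot)  # add(mul(a, b), shift(c, 2->0)) 0
```

No shift is emitted for `mul(a, b)` because both operands already share slot 0; only the mismatched operand `c` pays for a shift.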

WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS

Publication number: 20120331232

Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

Type: Application

Filed: September 5, 2012

Publication date: December 27, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Alan Gara, Martin Ohmacht, Vijayalakshmi Srinivasan

Single instruction multiple data (SIMD) code generation for parallel loops using versioning and scheduling

Patent number: 8341615

Abstract: Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization.

Type: Grant

Filed: July 11, 2008

Date of Patent: December 25, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Raul E. Silvera, Amy K. Wang, Guansong Zhang

Optimized Scalar Promotion with Load and Splat SIMD Instructions

Publication number: 20120290816

Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

Type: Application

Filed: July 23, 2012

Publication date: November 15, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels

Constant Time Worker Thread Allocation Via Configuration Caching

Publication number: 20120246654

Abstract: Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.

Type: Application

Filed: March 24, 2011

Publication date: September 27, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, John K.P. O'Brien
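
The caching idea in the abstract for publication 20120246654 can be sketched in Python. This is a simplified illustration under stated assumptions: the class `WorkerAllocator`, the `(master_id, count)` cache key, and the slow-path selection policy are hypothetical, and the sketch does not model workers being busy or released.

```python
class WorkerAllocator:
    """Reuse a master thread's previous worker allocation when one is cached."""

    def __init__(self, pool_size):
        self.pool = list(range(pool_size))
        self.cache = {}            # (master_id, count) -> tuple of worker ids
        self.slow_path_calls = 0   # how often the pool had to be scanned

    def allocate(self, master_id, count):
        key = (master_id, count)
        if key in self.cache:      # dictionary hit: no scan of the pool
            return self.cache[key]
        self.slow_path_calls += 1
        # Assumed slow path: take the first `count` threads other than the master.
        workers = tuple(w for w in self.pool if w != master_id)[:count]
        self.cache[key] = workers
        return workers


allocator = WorkerAllocator(pool_size=8)
first = allocator.allocate(master_id=0, count=3)
second = allocator.allocate(master_id=0, count=3)
print(first, second, allocator.slow_path_calls)  # (1, 2, 3) (1, 2, 3) 1
```

A loop that repeatedly enters the same parallel region pays for the pool scan once and afterwards reuses the cached configuration.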
