Patents by Inventor Alexandre E. Eichenberger

Alexandre E. Eichenberger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8549501
    Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: October 1, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Publication number: 20130246737
    Abstract: Mechanisms, in a data processing system comprising a single instruction multiple data (SIMD) processor, for performing a data dependency check operation on vector element values of at least two input vector registers are provided. Two calls to a simd-check instruction are performed, one with input vector registers having a first order and one with the input vector registers having a different order. The simd-check instruction performs comparisons to determine if any data dependencies are present. Results of the two calls to the simd-check instruction are obtained and used to determine if any data dependencies are present in the at least two input vector registers. Based on the results, the SIMD processor may perform various operations.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Bruce M. Fleischer
  • Patent number: 8527974
    Abstract: Mechanisms are provided for optimizing regular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which regular memory references access a storage of a software cache of a data processing system through a high locality cache mechanism of the software cache.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: September 3, 2013
    Assignees: International Business Machines Corporation, Barcelona Supercomputing Center—Centro Nacional de Supercomputacion
    Inventors: Eduard Ayguade, Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, Xavier Martorell, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
  • Patent number: 8521961
    Abstract: Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: August 27, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind, Martin Ohmacht
  • Patent number: 8516197
    Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: August 20, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
  • Patent number: 8490071
    Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: July 16, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John A. Gunnels
  • Patent number: 8468539
    Abstract: Mechanisms are provided for tracking dependencies of threads in a multi-threaded computer program execution. The mechanisms detect a dependency of a first thread's execution on results of a second thread's execution in an execution flow of the multi-threaded computer program. The mechanisms further store, in a hardware thread dependency vector storage associated with the first thread's execution, an identifier of the dependency by setting at least one bit in the hardware thread dependency vector storage corresponding to the second thread. Moreover, the mechanisms schedule tasks performed by the multi-threaded computer program based on the hardware thread dependency vector storage to minimize squashing of threads.
    Type: Grant
    Filed: September 3, 2009
    Date of Patent: June 18, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Lakshminarayanan Renganarayana, Xiaotong Zhuang
  • Patent number: 8468508
    Abstract: An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found.
    Type: Grant
    Filed: October 9, 2009
    Date of Patent: June 18, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
  • Publication number: 20130151822
    Abstract: Mechanisms, in a data processing system having a processor, for generating enqueued data for performing computations of a conditional branch of code are provided. Mask generation logic of the processor operates to generate a mask representing a subset of iterations of a loop of the code that results in a condition of the conditional branch being satisfied. The mask is used to select data elements from an input data element vector register corresponding to the subset of iterations of the loop of the code that result in the condition of the conditional branch being satisfied. Furthermore, the selected data elements are used to perform computations of the conditional branch of code. Iterations of the loop of the code that do not result in the condition of the conditional branch being satisfied are not used as a basis for performing computations of the conditional branch of code.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, John K.P. O'Brien, Yuan Zhao
  • Patent number: 8464271
    Abstract: A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.
    Type: Grant
    Filed: April 10, 2012
    Date of Patent: June 11, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kathryn M. O'Brien, Xiaotong Zhuang
  • Patent number: 8458442
    Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=?1.
    Type: Grant
    Filed: August 26, 2009
    Date of Patent: June 4, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
  • Patent number: 8458684
    Abstract: Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: June 4, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind
  • Patent number: 8423979
    Abstract: A computer implemented method, apparatus, and computer usable program code for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. Executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result in response to a determination that a reduced single instruction multiple data method is appropriate.
    Type: Grant
    Filed: October 12, 2006
    Date of Patent: April 16, 2013
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Alexandre E. Eichenberger, Amy Kai-Ting Wang, Peng Wu, Peng P. Zhao
  • Patent number: 8397052
    Abstract: Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: March 12, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan Gara, Kathryn M. O'Brien, Martin Ohmacht, Xiaotong Zhuang
  • Patent number: 8370575
    Abstract: Process, cache memory, computer product and system for loading data associated with a requested address in a software cache. The process includes loading address tags associated with a set in a cache directory using a Single Instruction Multiple Data (SIMD) operation, determining a position of the requested address in the set using a SIMD comparison, and determining an actual data value associated with the position of the requested address in the set.
    Type: Grant
    Filed: September 7, 2006
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Tao Zhang
  • Patent number: 8370817
    Abstract: A mechanism is provided for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning the slots of SIMD registers. With the mechanism, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
  • Publication number: 20120331232
    Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Alan Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
  • Patent number: 8341615
    Abstract: Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization.
    Type: Grant
    Filed: July 11, 2008
    Date of Patent: December 25, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Raul E. Silvera, Amy K. Wang, Guansong Zhang
  • Publication number: 20120290816
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Application
    Filed: July 23, 2012
    Publication date: November 15, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
  • Publication number: 20120246654
    Abstract: Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
    Type: Application
    Filed: March 24, 2011
    Publication date: September 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, John K.P. O'Brien