Patents by Inventor Alexandre E. Eichenberger
Alexandre E. Eichenberger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8266587Abstract: Disclosure for using SLP in processing a plurality of statements, wherein the statements are associated with an array having a number of array positions, and each statement includes one or more expressions. Expressions are gathered for each of the statements into a structure comprising a single merge stream furnished with a location for each expression. The location for a given expression is associated with one of the array positions. A plurality of expressions are selectively identified and SLP packing operations are applied to the identified expressions to merge into one or more isomorphic sub-streams. Expressions of the isomorphic sub-streams and other expressions of the single merge stream are combined into a number of input vectors that are substantially equal in length to one another. A location vector is generated that contains the respective locations for all of the expressions in the single merge stream.Type: GrantFiled: December 26, 2007Date of Patent: September 11, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 8255884Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.Type: GrantFiled: June 6, 2008Date of Patent: August 28, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
-
Publication number: 20120210073Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.Type: ApplicationFiled: February 11, 2011Publication date: August 16, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Alan Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
-
Patent number: 8245208Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.Type: GrantFiled: December 4, 2008Date of Patent: August 14, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Publication number: 20120204189Abstract: A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.Type: ApplicationFiled: April 10, 2012Publication date: August 9, 2012Applicant: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kathryn M. O'Brien, Xiaotong Zhuang
-
Publication number: 20120198425Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements.Type: ApplicationFiled: January 28, 2011Publication date: August 2, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
-
Publication number: 20120191953Abstract: Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor, Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.Type: ApplicationFiled: March 30, 2012Publication date: July 26, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Publication number: 20120192167Abstract: Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.Type: ApplicationFiled: March 30, 2012Publication date: July 26, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Publication number: 20120180031Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.Type: ApplicationFiled: March 26, 2012Publication date: July 12, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Patent number: 8214831Abstract: A runtime dependence-aware scheduling of dependent iterations mechanism is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by the one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.Type: GrantFiled: May 5, 2009Date of Patent: July 3, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kathryn M. O'Brien, Xiaotong Zhuang
-
Patent number: 8196124Abstract: Loop code is generated to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.Type: GrantFiled: August 22, 2008Date of Patent: June 5, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 8171464Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.Type: GrantFiled: May 16, 2008Date of Patent: May 1, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 8156310Abstract: One embodiment of the present method and apparatus for data stream alignment support includes retrieving a first input from a first register file, retrieving a second input from a second register file, the second register file being dedicated to a stream shift unit and performing the stream shift instruction in accordance with the first input, the second input and a third input.Type: GrantFiled: September 11, 2006Date of Patent: April 10, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John-David Wellman, Peng Wu
-
Patent number: 8146067Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.Type: GrantFiled: April 23, 2008Date of Patent: March 27, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
-
Patent number: 8136105Abstract: A computer program product is provided for extracting SIMD parallelism. The computer program product includes instructions for providing a stream of input code comprising basic blocks; identifying pairs of statements that are semi-isomorphic with respect to each other within a basic block; iteratively combining into packs, pairs of statements that are semi-isomorphic with respect to each other, and combining packs into combined packs; collecting packs whose statements can be scheduled together for processing; and generating SIMD instructions for each pack to provide for extracting the SIMD parallelism..Type: GrantFiled: September 29, 2006Date of Patent: March 13, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
-
Publication number: 20120060015Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.Type: ApplicationFiled: September 7, 2010Publication date: March 8, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Publication number: 20120060016Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is receive in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.Type: ApplicationFiled: September 7, 2010Publication date: March 8, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Publication number: 20120011348Abstract: Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.Type: ApplicationFiled: July 12, 2010Publication date: January 12, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, Valentina Salapura
-
Patent number: 8087011Abstract: Mechanisms for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.Type: GrantFiled: September 26, 2007Date of Patent: December 27, 2011Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
-
Patent number: 8087010Abstract: Mechanisms for selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.Type: GrantFiled: September 26, 2007Date of Patent: December 27, 2011Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache