Patents by Inventor John A. Gunnels

John A. Gunnels has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reducing Bandwidth Requirements for Matrix Multiplication

Publication number: 20090300091

Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats the operation for the next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces the maximum required throughput and increases performance. In addition, the mechanism reduces the number of multi-buffered local store buffers.

Type: Application

Filed: May 30, 2008

Publication date: December 3, 2009

Applicant: International Business Machines Corporation

Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
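The visitation scheme in this abstract can be illustrated with a small blocked multiply. The sketch below is not the application's implementation: it assumes square bs x bs blocks, plain Python lists, and (as a hypothetical reading of "sub-block matrix multiplication") accumulates each output block from sub-block products along the shared dimension k. The names `serpentine_blocks` and `block_matmul` are illustrative only.

```python
# Sketch: blocked C = A @ B with output blocks visited in serpentine order.
# Assumptions (not from the application): square bs x bs blocks, pure-Python
# lists, and each output block accumulated from sub-block products along k.

def serpentine_blocks(n_block_rows, n_block_cols):
    """Yield (bi, bj) block coordinates, reversing direction on every row."""
    for bi in range(n_block_rows):
        cols = range(n_block_cols)
        if bi % 2 == 1:
            cols = reversed(cols)  # reverse visitation order at the corner turn
        for bj in cols:
            yield bi, bj


def block_matmul(A, B, bs):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    n_br = (n + bs - 1) // bs
    n_bc = (p + bs - 1) // bs
    for bi, bj in serpentine_blocks(n_br, n_bc):
        rows = range(bi * bs, min((bi + 1) * bs, n))
        cols = range(bj * bs, min((bj + 1) * bs, p))
        for k0 in range(0, m, bs):  # one sub-block product per k-slice
            ks = range(k0, min(k0 + bs, m))
            for i in rows:
                for j in cols:
                    C[i][j] += sum(A[i][k] * B[k][j] for k in ks)
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(block_matmul(A, B, 2))  # [[58, 64], [139, 154]]
```

The visitation order changes only which block is computed next, not the result, so the output matches an ordinary matrix product.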

Optimized Corner Turns for Local Storage and Bandwidth Reduction

Publication number: 20090292758

Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, a corner return is referred to as a “bounce” corner turn and results in a serpentine processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum required throughput and increases performance. In addition, the mechanism reduces the number of multi-buffered local store buffers.

Type: Application

Filed: May 23, 2008

Publication date: November 26, 2009

Applicant: International Business Machines Corporation

Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
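Why a “bounce” corner turn saves a transfer can be checked with a simple counting model. This is a hedged simplification, not the application's three-transfers-per-step accounting: it assumes local store holds exactly one row panel of A and one column panel of B, and that computing output block (i, j) needs row panel i and column panel j. All function names are hypothetical.

```python
def raster_order(rows, cols):
    """Row-by-row order that restarts at column 0 on every row."""
    return [(i, j) for i in range(rows) for j in range(cols)]


def serpentine_order(rows, cols):
    """'Bounce' order: odd rows are traversed right to left."""
    order = []
    for i in range(rows):
        js = range(cols) if i % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((i, j) for j in js)
    return order


def count_panel_loads(order):
    """Count transfers, assuming local store holds one A row panel
    and one B column panel at a time."""
    loads = 0
    a_panel = b_panel = None
    for i, j in order:
        if i != a_panel:
            loads += 1
            a_panel = i
        if j != b_panel:
            loads += 1
            b_panel = j
    return loads


print(count_panel_loads(raster_order(3, 3)))      # 12
print(count_panel_loads(serpentine_order(3, 3)))  # 10
```

Under this model the serpentine order keeps the last B panel resident across each corner turn, so it saves one load per turn (rows - 1 loads in total).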

METHOD AND STRUCTURE FOR HIGH-PERFORMANCE MATRIX MULTIPLICATION IN THE PRESENCE OF SEVERAL ARCHITECTURAL OBSTACLES

Publication number: 20090198976

Abstract: A method (and apparatus) for processing data on a computer having a memory to store the data and a processing unit to execute the processing, the processing unit having a plurality of registers available for an internal working space for a data processing occurring in the processing unit, includes configuring the plurality of registers to include at least two sets of registers. A first set of the at least two sets interfaces with the processing unit for the data processing in a current processing cycle. A second set of the at least two sets is used for removing data from the processing unit of a previous processing cycle to be stored in the memory and preloading data into the processing unit from the memory, to be used for a next processing cycle.

Type: Application

Filed: February 6, 2008

Publication date: August 6, 2009

Inventors: Vernon R. Austel, John A. Gunnels
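The two-register-set arrangement described above resembles classic double buffering. The sketch below only models the alternation logically: Python runs sequentially, so nothing actually overlaps, and "register sets" are ordinary lists. The function name `double_buffered` and the chunked data layout are assumptions made for illustration.

```python
def double_buffered(chunks, compute):
    """Alternate between two register sets: compute on one while the other
    is drained (its result has been stored) and preloaded with the next chunk.
    Returns the per-cycle results and which set was used each cycle."""
    sets = [None, None]
    results, trace = [], []
    if not chunks:
        return results, trace
    sets[0] = list(chunks[0])  # prologue: preload the first cycle's data
    for t in range(len(chunks)):
        cur, other = t % 2, 1 - t % 2
        if t + 1 < len(chunks):
            sets[other] = list(chunks[t + 1])  # preload data for the next cycle
        results.append(compute(sets[cur]))  # this cycle's computation; store result
        trace.append(cur)
    return results, trace


res, used = double_buffered([[1, 2], [3, 4], [5]], lambda r: sum(x * x for x in r))
print(res, used)  # [5, 25, 25] [0, 1, 0]
```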

Method and structure for producing high performance linear algebra routines using preloading of floating point registers

Patent number: 7571435

Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.

Type: Grant

Filed: September 29, 2003

Date of Patent: August 4, 2009

Assignee: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels
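Preloading by unrolling can be illustrated with a toy instruction generator for a dot product: the loads for element i + 1 are emitted before the multiply-add for element i, using a second register pair so nothing live is overwritten. The instruction syntax, register names (`fx0`, `fy0`, `acc`), and the tiny interpreter are hypothetical and exist only to check that the reordered sequence still computes the right value.

```python
def unrolled_dot_schedule(n, regs=2):
    """Emit a fully unrolled instruction list for dot(x, y) of length n,
    hoisting the loads for element i + 1 ahead of the fma for element i."""
    prog = []

    def load(i):
        r = i % regs  # alternate register pairs so preloads never clobber live data
        prog.append(f"load fx{r}, x[{i}]")
        prog.append(f"load fy{r}, y[{i}]")

    if n == 0:
        return prog
    load(0)  # prologue
    for i in range(n):
        if i + 1 < n:
            load(i + 1)  # preload the next operands into the other register pair
        r = i % regs
        prog.append(f"fma acc, fx{r}, fy{r}")
    return prog


def execute(prog, x, y):
    """Interpret the toy instruction list and return the accumulator."""
    regs = {"acc": 0.0}
    arrays = {"x": x, "y": y}
    for ins in prog:
        op, args = ins.split(" ", 1)
        ops = [a.strip() for a in args.split(",")]
        if op == "load":
            name, idx = ops[1].rstrip("]").split("[")
            regs[ops[0]] = arrays[name][int(idx)]
        elif op == "fma":
            regs[ops[0]] += regs[ops[1]] * regs[ops[2]]
    return regs["acc"]


print(execute(unrolled_dot_schedule(5), [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))  # 15.0
```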

Method and structure for an improved data reformatting procedure

Patent number: 7555604

Abstract: A method (and structure) of managing memory in which a low-level mechanism is executed to signal, in a sequence of instructions generated at a higher level, that at least a portion of a contiguous area of memory is permitted to be overwritten.

Type: Grant

Filed: January 9, 2006

Date of Patent: June 30, 2009

Assignee: International Business Machines Corporation

Inventors: Siddhartha Chatterjee, John A. Gunnels, Fred Gehrung Gustavson

METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING STREAMING

Publication number: 20090150615

Abstract: A method (and structure) for executing a linear algebra subroutine on a computer having a cache, includes streaming data for matrices involved in processing the linear algebra subroutine such that data is processed using data for a first matrix stored in the cache as a matrix format and data from a second matrix and a third matrix is stored in a memory device at a higher level than the cache, the streaming providing data from the higher level as the streaming data is required for the processing.

Type: Application

Filed: January 5, 2009

Publication date: June 11, 2009

Inventors: Fred Gehrung Gustavson, John A. Gunnels
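One way to picture this streaming arrangement: the first matrix stays resident (standing in for cache), while columns of the second and third matrices are pulled one at a time (standing in for the higher memory level) only when needed. This is a minimal sketch under those assumptions; `stream_columns` and `streamed_matmul` are hypothetical names, and Python lists do not model real cache behavior.

```python
def stream_columns(M):
    """Yield the columns of M one at a time, standing in for data
    streamed from a higher memory level."""
    for j in range(len(M[0])):
        yield [row[j] for row in M]


def streamed_matmul(A, B, C):
    """C += A @ B with A held resident and columns of B and C streamed."""
    n, m = len(A), len(A[0])
    for j, (b_col, c_col) in enumerate(zip(stream_columns(B), stream_columns(C))):
        for i in range(n):
            c_col[i] += sum(A[i][k] * b_col[k] for k in range(m))
        for i in range(n):
            C[i][j] = c_col[i]  # write the updated column back
    return C


print(streamed_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))
# [[19, 22], [43, 50]]
```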

Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems

Publication number: 20090144745

Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

Type: Application

Filed: November 29, 2007

Publication date: June 4, 2009

Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton

Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems

Publication number: 20090144736

Abstract: A method for evaluating performance of DMA-based algorithmic tasks on a target multi-core processing system includes the steps of: inputting a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; evaluating performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and providing results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

Type: Application

Filed: May 29, 2008

Publication date: June 4, 2009

Applicant: International Business Machines Corporation

Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton

Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems

Publication number: 20090144738

Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

Type: Application

Filed: May 30, 2008

Publication date: June 4, 2009

Applicant: International Business Machines Corporation

Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton

Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems

Publication number: 20090144744

Abstract: A method for evaluating performance of DMA-based algorithmic tasks on a target multi-core processing system includes the steps of: inputting a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; evaluating performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and providing results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

Type: Application

Filed: November 29, 2007

Publication date: June 4, 2009

Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
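The template-driven benchmark shared by the four Performance Evaluation entries can be sketched as follows. Python has no access to a multi-core processor's DMA engines, so a slice copy into a local buffer stands in for each DMA transfer; the `TaskTemplate` fields and the `run_benchmark` function are hypothetical and only show the shape of "template in, performance measure out".

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskTemplate:
    """Hypothetical template: DMA-like transfer parameters plus a compute routine."""
    transfer_bytes: int
    num_transfers: int
    compute: Callable[[memoryview], int]


def run_benchmark(t: TaskTemplate) -> dict:
    src = bytes(i % 256 for i in range(t.transfer_bytes))  # stands in for main memory
    local = bytearray(t.transfer_bytes)                     # stands in for a local store
    checksum = 0
    start = time.perf_counter()
    for _ in range(t.num_transfers):
        local[:] = src                             # stand-in for one DMA transfer
        checksum += t.compute(memoryview(local))   # prescribed computation routine
    elapsed = time.perf_counter() - start
    moved = t.transfer_bytes * t.num_transfers
    return {
        "seconds": elapsed,
        "bytes_moved": moved,
        "bytes_per_second": moved / elapsed if elapsed > 0 else float("inf"),
        "checksum": checksum,
    }


if __name__ == "__main__":
    print(run_benchmark(TaskTemplate(transfer_bytes=16384, num_transfers=64, compute=sum)))
```

Timings from such a model say nothing about real hardware; the point is only that the template parameterizes both data movement and computation.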

METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING COMPOSITE BLOCKING BASED ON L1 CACHE SIZE

Publication number: 20090106343

Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.

Type: Application

Filed: December 22, 2008

Publication date: April 23, 2009

Applicant: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels
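A block size "based on a cache size" is often derived by requiring a few square blocks to fit in L1 at once. The rule below is a common heuristic, not the one claimed here: it assumes three double-precision blocks (one each for A, B, and C) must fit, and optionally rounds down to a register-blocking multiple. `l1_block_size` is a hypothetical helper.

```python
import math


def l1_block_size(l1_bytes, elem_bytes=8, num_blocks=3, multiple=1):
    """Largest square block order nb such that num_blocks blocks of
    nb x nb elements (elem_bytes each) fit in l1_bytes, rounded down
    to a multiple of `multiple`."""
    nb = math.isqrt(l1_bytes // (num_blocks * elem_bytes))
    return nb - nb % multiple


print(l1_block_size(32 * 1024))              # 36
print(l1_block_size(32 * 1024, multiple=8))  # 32
```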

METHOD AND APPARATUS FOR ALLOCATING ARCHITECTURAL REGISTER RESOURCES AMONG THREADS IN A MULTI-THREADED MICROPROCESSOR CORE

Publication number: 20090100249

Abstract: One embodiment of a microprocessor core capable of executing a plurality of threads substantially simultaneously includes a plurality of register resources available for use by the threads, where the register resources are fewer in number than the number of threads multiplied by the number of architectural register resources required per thread, and a supervisor for allocating the register resources among the plurality of threads.

Type: Application

Filed: October 10, 2007

Publication date: April 16, 2009

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels
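The supervisor role can be sketched at a coarse granularity: suppose register resources are handed out as whole architectural register sets, and there are fewer sets than threads. This granularity and the `RegisterSupervisor` class are assumptions for illustration; the application may allocate at a finer level.

```python
class RegisterSupervisor:
    """Hypothetical supervisor handing out whole architectural register sets.
    There are fewer physical sets than threads, so a thread may have to wait."""

    def __init__(self, num_sets):
        self.free = list(range(num_sets))
        self.owner = {}  # thread id -> register set index

    def acquire(self, tid):
        if tid in self.owner:
            return self.owner[tid]
        if not self.free:
            return None  # no set available; the thread must stall
        s = self.free.pop()
        self.owner[tid] = s
        return s

    def release(self, tid):
        self.free.append(self.owner.pop(tid))


sup = RegisterSupervisor(3)
print([sup.acquire(t) for t in range(4)])  # [2, 1, 0, None]: thread 3 waits
sup.release(0)
print(sup.acquire(3))                      # 2: the set freed by thread 0
```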

System and method for detecting a faulty object in a system

Patent number: 7506196

Abstract: A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.

Type: Grant

Filed: February 7, 2005

Date of Patent: March 17, 2009

Assignee: International Business Machines Corporation

Inventors: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle
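Plane probing can be illustrated on an n-dimensional grid: if a probe of a whole plane reports whether any object in it fails, then a single faulty object lies at the intersection of the failing planes along each dimension. This sketch assumes exactly one fault and a probe that tests every object in the plane; `probe_plane` and `locate_fault` are hypothetical names, and the patented method may probe fewer planes.

```python
from itertools import product


def probe_plane(is_faulty, shape, dim, index):
    """Probe every object whose coordinate along `dim` equals `index`;
    report True if any of them fails."""
    ranges = [range(s) for s in shape]
    ranges[dim] = [index]
    return any(is_faulty(coord) for coord in product(*ranges))


def locate_fault(is_faulty, shape):
    """Find a single faulty object by probing planes along each dimension."""
    coord = []
    for dim, size in enumerate(shape):
        hits = [i for i in range(size) if probe_plane(is_faulty, shape, dim, i)]
        if len(hits) != 1:
            return None  # zero or several faults: this simple scheme cannot decide
        coord.append(hits[0])
    return tuple(coord)


print(locate_fault(lambda c: c == (1, 2, 3), (4, 4, 4)))  # (1, 2, 3)
```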

SYSTEMS, METHODS AND COMPUTER PRODUCTS FOR CROSS-THREAD SCHEDULING

Publication number: 20090064152

Abstract: Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.

Type: Application

Filed: August 30, 2007

Publication date: March 5, 2009

Applicant: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, James L. McInnes, Mark P. Mendell
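The dispatch described in this abstract (one scheduler for non-multithreaded code, a cross-thread scheduler for multithreaded code) can be sketched as below. The unit representation and the round-robin interleaving used as the "cross-thread scheduler" are hypothetical stand-ins; the abstract does not say how instructions from different threads are combined.

```python
from itertools import zip_longest


def list_schedule(unit):
    """Ordinary scheduler: keep the single instruction stream in order."""
    return list(unit["threads"][0])


def cross_thread_schedule(unit):
    """Hypothetical cross-thread scheduler: interleave the threads' streams round-robin."""
    out = []
    for group in zip_longest(*unit["threads"]):
        out.extend(ins for ins in group if ins is not None)
    return out


def schedule(unit):
    """Pick the scheduler sub-operation based on whether the unit is multithreaded."""
    if unit["multithreaded"]:
        return cross_thread_schedule(unit)
    return list_schedule(unit)


print(schedule({"multithreaded": True, "threads": [["a1", "a2", "a3"], ["b1"]]}))
# ['a1', 'b1', 'a2', 'a3']
```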

METHOD AND STRUCTURE FOR FAST IN-PLACE TRANSFORMATION OF STANDARD FULL AND PACKED MATRIX DATA FORMATS

Publication number: 20090063607

Abstract: A method and structure for an in-place transformation of matrix data. For a matrix A stored in one of a standard full format or a packed format and a transformation T having a compact representation, blocking parameters MB and NB are chosen, based on a cache size. A sub-matrix A1 of A, A1 having size M1=m*MB by N1=n*NB, is worked on, and any residual remainder of A is saved in a buffer B. Sub-matrix A1 is worked on by contiguously moving and contiguously transforming A1 in-place into a New Data Structure (NDS), applying the transformation T in units of MB*NB contiguous double words to the NDS format of A1, thereby replacing A1 with the contents of T(A1), and moving and transforming NDS T(A1) to standard data format T(A1) with holes for the remainder of A in buffer B. The contents of buffer B are contiguously copied into the holes of A2, thereby providing the in-place transformed matrix T(A).

Type: Application

Filed: September 1, 2007

Publication date: March 5, 2009

Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
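As a simplified illustration of applying a transformation block by block without a second copy of the matrix, the sketch below transposes a square n x n matrix in place, visiting nb x nb block pairs. It covers only the square case with T taken to be transposition; the application's handling of rectangular M1 x N1 submatrices, the NDS layout, and the remainder buffer B are not modeled. `inplace_transpose_blocked` is a hypothetical name.

```python
def inplace_transpose_blocked(a, n, nb):
    """Transpose an n x n matrix stored row-major in the flat list `a`,
    in place, one nb x nb block pair at a time (no second matrix allocated)."""
    for bi in range(0, n, nb):
        for bj in range(bi, n, nb):  # only diagonal and upper block pairs
            for i in range(bi, min(bi + nb, n)):
                # inside a diagonal block, only swap entries above the diagonal
                j_start = i + 1 if bi == bj else bj
                for j in range(j_start, min(bj + nb, n)):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
    return a


print(inplace_transpose_blocked(list(range(9)), 3, 2))
# [0, 3, 6, 1, 4, 7, 2, 5, 8]
```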

METHOD AND STRUCTURE FOR FAST IN-PLACE TRANSFORMATION OF STANDARD FULL AND PACKED MATRIX DATA FORMATS

Publication number: 20090063529

Abstract: A computerized method provides for an in-place transformation of matrix A data including a New Data Structure (NDS) format and a transformation T having a compact representation. The NDS represents data of the matrix A in a format other than a row major format or a column major format, such that the data for the matrix A is stored as contiguous submatrices of size MB by NB in an order predetermined to provide the data for a matrix processing. The transformation T is applied to the MB by NB blocks, using an in-place transformation processing, thereby replacing data of the block A1 with the contents of T(A1).

Type: Application

Filed: February 19, 2008

Publication date: March 5, 2009

Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
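A block format of the kind described, with contiguous MB by NB submatrices, can be produced from column-major storage as follows. The block ordering (block columns left to right, blocks top to bottom within each) and column-major storage inside each block are assumptions; the abstract only says the order is predetermined. This out-of-place copy illustrates the layout, not the in-place conversion itself.

```python
def to_block_format(a, m, n, mb, nb):
    """Copy an m x n column-major matrix (flat list `a`) into a block format:
    contiguous mb x nb blocks, each stored column-major, with blocks ordered
    block column by block column. Assumes mb divides m and nb divides n."""
    assert m % mb == 0 and n % nb == 0
    out = []
    for bj in range(0, n, nb):
        for bi in range(0, m, mb):
            for j in range(bj, bj + nb):
                for i in range(bi, bi + mb):
                    out.append(a[i + j * m])
    return out


print(to_block_format(list(range(16)), 4, 4, 2, 2))
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```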

SYSTEM AND METHOD FOR DETECTING A FAULTY OBJECT IN A SYSTEM

Publication number: 20090052334

Abstract: A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.

Type: Application

Filed: October 22, 2008

Publication date: February 26, 2009

Applicant: International Business Machines Corporation

Inventors: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle

Method and structure for producing high performance linear algebra routines using a selectable one of six possible level 3 L1 kernel routines

Patent number: 7490120

Abstract: A method (and structure) for improving at least one of speed and efficiency when executing level 3 dense linear algebra subroutines on a computer. An optimal matrix subroutine is selected from among a plurality of matrix subroutines stored in a memory that could alternatively perform a level 3 matrix multiplication or factorization processing.

Type: Grant

Filed: September 29, 2003

Date of Patent: February 10, 2009

Assignee: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels
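One plausible reading of "six possible kernel routines" is the six ways of nesting the i, j, and k loops of a matrix multiply; this is an assumption, not something the abstract states. The sketch below parameterizes the loop order and confirms that all six orders produce the same product, which is what makes choosing among them purely a performance decision. `matmul_loop_order` and `ORDERS` are hypothetical names.

```python
from itertools import permutations, product


def matmul_loop_order(A, B, order):
    """C = A @ B with the i, j, k loops nested in `order` (outermost first)."""
    n, m, p = len(A), len(B), len(B[0])
    extent = {"i": n, "j": p, "k": m}
    C = [[0] * p for _ in range(n)]
    # product() varies its last argument fastest, exactly like nested loops
    for values in product(*(range(extent[c]) for c in order)):
        idx = dict(zip(order, values))
        i, j, k = idx["i"], idx["j"], idx["k"]
        C[i][j] += A[i][k] * B[k][j]
    return C


ORDERS = ["".join(p) for p in permutations("ijk")]  # ijk, ikj, jik, jki, kij, kji

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print({o: matmul_loop_order(A, B, o) for o in ORDERS}["kji"])  # [[58, 64], [139, 154]]
```

On real hardware the orders differ in which operand is reused from registers or L1 and which is streamed, which is why selecting among them can matter.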

Method and structure for producing high performance linear algebra routines using composite blocking based on L1 cache size

Patent number: 7487195

Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.

Type: Grant

Filed: September 29, 2003

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels

Method and structure for producing high performance linear algebra routines using streaming

Patent number: 7475101

Abstract: A method (and structure) of improving at least one of speed and efficiency when executing a linear algebra subroutine on a computer having a memory hierarchical structure including at least one cache, the computer having M levels of caches and a main memory. Based on sizes, it is determined, for a level 3 matrix multiplication processing, which matrix will have data for a submatrix block residing in a lower level cache of the computer and which two matrices will have data for submatrix blocks residing in at least one higher level cache or a memory. From a plurality of six kernels, two kernels are selected as optimal to use for executing the level 3 matrix multiplication processing as data streams from different levels of the M levels of cache, such that the processor will switch back and forth between the two selected kernels as streaming data traverses the different levels of cache.

Type: Grant

Filed: September 29, 2003

Date of Patent: January 6, 2009

Assignee: International Business Machines Corporation

Inventors: Fred Gehrung Gustavson, John A. Gunnels
