Patents by Inventor John A. Gunnels

John A. Gunnels has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090300091
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats the operation for the next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces the maximum required data-transfer throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Application
    Filed: May 30, 2008
    Publication date: December 3, 2009
    Applicant: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
  • Publication number: 20090292758
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, such a reversed corner turn is referred to as a “bounce” corner turn and results in a serpentine processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum required data-transfer throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Application
    Filed: May 23, 2008
    Publication date: November 26, 2009
    Applicant: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
  • Publication number: 20090198976
    Abstract: A method (and apparatus) for processing data on a computer having a memory to store the data and a processing unit to execute the processing, the processing unit having a plurality of registers available as an internal working space for the data processing occurring in the processing unit, includes configuring the plurality of registers to include at least two sets of registers. A first set of the at least two sets interfaces with the processing unit for the data processing in a current processing cycle. A second set of the at least two sets is used to remove data produced in a previous processing cycle from the processing unit for storage in the memory, and to preload data from the memory into the processing unit for use in the next processing cycle.
    Type: Application
    Filed: February 6, 2008
    Publication date: August 6, 2009
    Inventors: Vernon R. Austel, John A. Gunnels
  • Patent number: 7571435
    Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: August 4, 2009
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Patent number: 7555604
    Abstract: A method (and structure) of managing memory in which a low-level mechanism is executed to signal, in a sequence of instructions generated at a higher level, that at least a portion of a contiguous area of memory is permitted to be overwritten.
    Type: Grant
    Filed: January 9, 2006
    Date of Patent: June 30, 2009
    Assignee: International Business Machines Corporation
    Inventors: Siddhartha Chatterjee, John A. Gunnels, Fred Gehrung Gustavson
  • Publication number: 20090150615
    Abstract: A method (and structure) for executing a linear algebra subroutine on a computer having a cache includes streaming data for the matrices involved in processing the linear algebra subroutine. The processing uses data for a first matrix stored in the cache in a matrix format, while data for a second matrix and a third matrix is stored in a memory device at a higher level than the cache; the streaming provides data from the higher level as it is required for the processing.
    Type: Application
    Filed: January 5, 2009
    Publication date: June 11, 2009
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Publication number: 20090144745
    Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.
    Type: Application
    Filed: November 29, 2007
    Publication date: June 4, 2009
    Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
  • Publication number: 20090144736
    Abstract: A method for evaluating performance of DMA-based algorithmic tasks on a target multi-core processing system includes the steps of: inputting a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; evaluating performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and providing results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.
    Type: Application
    Filed: May 29, 2008
    Publication date: June 4, 2009
    Applicant: International Business Machines Corporation
    Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
  • Publication number: 20090144738
    Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.
    Type: Application
    Filed: May 30, 2008
    Publication date: June 4, 2009
    Applicant: International Business Machines Corporation
    Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
  • Publication number: 20090144744
    Abstract: A method for evaluating performance of DMA-based algorithmic tasks on a target multi-core processing system includes the steps of: inputting a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; evaluating performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and providing results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.
    Type: Application
    Filed: November 29, 2007
    Publication date: June 4, 2009
    Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
  • Publication number: 20090106343
    Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.
    Type: Application
    Filed: December 22, 2008
    Publication date: April 23, 2009
    Applicant: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Publication number: 20090100249
    Abstract: One embodiment of a microprocessor core capable of executing a plurality of threads substantially simultaneously includes a plurality of register resources available for use by the threads, where the register resources are fewer in number than the number of threads multiplied by the number of architectural register resources required per thread, and a supervisor for allocating the register resources among the plurality of threads.
    Type: Application
    Filed: October 10, 2007
    Publication date: April 16, 2009
    Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels
  • Patent number: 7506196
    Abstract: A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.
    Type: Grant
    Filed: February 7, 2005
    Date of Patent: March 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle
  • Publication number: 20090064152
    Abstract: Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.
    Type: Application
    Filed: August 30, 2007
    Publication date: March 5, 2009
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, James L. McInnes, Mark P. Mendell
  • Publication number: 20090063607
    Abstract: A method and structure for an in-place transformation of matrix data. For a matrix A stored in either a standard full format or a packed format, and a transformation T having a compact representation, blocking parameters MB and NB are chosen based on a cache size. A sub-matrix A1 of A, of size M1 = m*MB by N1 = n*NB, is worked on, and any residual remainder of A is saved in a buffer B. Sub-matrix A1 is worked on by contiguously moving and contiguously transforming A1 in-place into a New Data Structure (NDS), applying the transformation T in units of MB*NB contiguous double words to the NDS format of A1, thereby replacing A1 with the contents of T(A1), and then moving and transforming the NDS T(A1) back to standard-format T(A1) with holes for the remainder of A held in buffer B. The contents of buffer B are contiguously copied into the holes of A2, thereby providing the in-place transformed matrix T(A).
    Type: Application
    Filed: September 1, 2007
    Publication date: March 5, 2009
    Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
  • Publication number: 20090063529
    Abstract: A computerized method provides an in-place transformation of the data of a matrix A, using a New Data Structure (NDS) format and a transformation T having a compact representation. The NDS represents the data of matrix A in a format other than row-major or column-major format, such that the data for matrix A is stored as contiguous submatrices of size MB by NB, in an order predetermined to provide the data for matrix processing. The transformation T is applied to the MB by NB blocks using in-place transformation processing, thereby replacing the data of block A1 with the contents of T(A1).
    Type: Application
    Filed: February 19, 2008
    Publication date: March 5, 2009
    Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
  • Publication number: 20090052334
    Abstract: A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.
    Type: Application
    Filed: October 22, 2008
    Publication date: February 26, 2009
    Applicant: International Business Machines Corporation
    Inventors: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle
  • Patent number: 7490120
    Abstract: A method (and structure) for improving at least one of speed and efficiency when executing level 3 dense linear algebra subroutines on a computer. An optimal matrix subroutine is selected from among a plurality of matrix subroutines, stored in a memory, each of which could alternatively perform the level 3 matrix multiplication or factorization processing.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: February 10, 2009
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Patent number: 7487195
    Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Patent number: 7475101
    Abstract: A method (and structure) of improving at least one of speed and efficiency when executing a linear algebra subroutine on a computer having a hierarchical memory structure including at least one cache, the computer having M levels of cache and a main memory. Based on the matrix sizes, it is determined, for level 3 matrix multiplication processing, which matrix will have data for a submatrix block residing in a lower-level cache of the computer and which two matrices will have data for submatrix blocks residing in at least one higher-level cache or memory. From six candidate kernels, two are selected as optimal for executing the level 3 matrix multiplication processing as data streams from different levels of the M levels of cache, such that the processor switches back and forth between the two selected kernels as the streaming data traverses the different levels of cache.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
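The “bounce” corner turn described in publication 20090292758 can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not the claimed mechanism: it assumes that computing output block C[i][j] needs row panel i of A and column panel j of B, and that the local store keeps only the most recently loaded panel of each (no sub-blocks, no multi-buffering). The function names raster_order, serpentine_order, and count_panel_loads are hypothetical. In a serpentine order the column index stays fixed at each corner turn, so only the A panel has to be reloaded there.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (row index i, column index j) of an output block C[i][j]


def raster_order(rows: int, cols: int) -> List[Block]:
    """Visit output blocks row by row, always left to right."""
    return [(i, j) for i in range(rows) for j in range(cols)]


def serpentine_order(rows: int, cols: int) -> List[Block]:
    """Visit output blocks row by row, reversing direction on every row.

    At each corner turn the step goes from (i, j) to (i + 1, j), so the
    column index (and hence the B panel) does not change.
    """
    order: List[Block] = []
    for i in range(rows):
        cols_in_row = range(cols) if i % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((i, j) for j in cols_in_row)
    return order


def count_panel_loads(order: List[Block]) -> int:
    """Count panel loads under the simplified model (an assumption):
    block (i, j) needs A row panel i and B column panel j, and only the
    most recently loaded panel of each matrix is kept in local store."""
    loads = 0
    current_a: Optional[int] = None
    current_b: Optional[int] = None
    for i, j in order:
        if i != current_a:  # new A row panel must be transferred
            loads += 1
            current_a = i
        if j != current_b:  # new B column panel must be transferred
            loads += 1
            current_b = j
    return loads


if __name__ == "__main__":
    rows, cols = 4, 4
    print(serpentine_order(rows, cols))
    print("raster loads:", count_panel_loads(raster_order(rows, cols)))
    print("serpentine loads:", count_panel_loads(serpentine_order(rows, cols)))
```

For a 4 × 4 grid of output blocks, this model counts 20 panel loads in raster order and 17 in serpentine order: one load is saved at each of the three corner turns. In general the model gives rows × (cols + 1) loads for raster order and rows × cols + 1 for serpentine order.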