Patents by Inventor Alan G. Gara

Alan G. Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10558575
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: February 11, 2020
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Jr., Kent D. Glossop, Simon C. Steely, Jr., Jinjie Tang, Alan G. Gara
  • Publication number: 20180189231
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
    Type: Application
    Filed: December 30, 2016
    Publication date: July 5, 2018
    Inventors: KERMIN E. FLEMING, JR., KENT D. GLOSSOP, SIMON C. STEELY, JR., JINJIE TANG, ALAN G. GARA
  • Patent number: 9864423
    Abstract: Apparatus and methods may provide for characterizing a plurality of similar components of a distributed computing system based on a maximum safe operation level associated with each component and storing characterization data in a database and allocating non-uniform power to each similar component based at least in part on the characterization data in the database to substantially equalize performance of the components.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: January 9, 2018
    Assignee: Intel Corporation
    Inventors: Alan G. Gara, Steve S. Sylvester, Jonathan M. Eastep, Ramkumar Nagappan, Christopher M. Cantalupo
  • Patent number: 9857858
    Abstract: A method and system for managing power consumption and performance of computing systems are described herein. The method includes monitoring an overall power consumption of the computing systems to determine whether the overall power consumption is above or below an overall power consumption limit, and monitoring a performance of each computing system to determine whether the performance is within a performance tolerance. The method further includes adjusting the power consumption limits for the computing systems or the performances of the computing systems such that the overall power consumption is below the overall power consumption limit and the performance of each computing system is within the performance tolerance.
    Type: Grant
    Filed: May 17, 2012
    Date of Patent: January 2, 2018
    Assignee: Intel Corporation
    Inventors: Devadatta V. Bodas, John H. Crawford, Alan G. Gara
  • Publication number: 20170185129
    Abstract: Apparatus and methods may provide for characterizing a plurality of similar components of a distributed computing system based on a maximum safe operation level associated with each component and storing characterization data in a database and allocating non-uniform power to each similar component based at least in part on the characterization data in the database to substantially equalize performance of the components.
    Type: Application
    Filed: December 24, 2015
    Publication date: June 29, 2017
    Applicant: Intel Corporation
    Inventors: Alan G. Gara, Steve S. Sylvester, Jonathan M. Eastep, Ramkumar Nagappan, Christopher M. Cantalupo
  • Publication number: 20150169026
    Abstract: A method and system for managing power consumption and performance of computing systems are described herein. The method includes monitoring an overall power consumption of the computing systems to determine whether the overall power consumption is above or below an overall power consumption limit, and monitoring a performance of each computing system to determine whether the performance is within a performance tolerance. The method further includes adjusting the power consumption limits for the computing systems or the performances of the computing systems such that the overall power consumption is below the overall power consumption limit and the performance of each computing system is within the performance tolerance.
    Type: Application
    Filed: May 17, 2012
    Publication date: June 18, 2015
    Inventors: Devadatta V. Bodas, John H. Crawford, Alan G Gara
  • Patent number: 8904392
    Abstract: A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: George Chiu, Alan G. Gara, Valentina Salapura
  • Patent number: 8756350
    Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and, a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation and, the output signal indicates that all coherence events present when the timestamp signal was asserted have completed.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
  • Patent number: 8677073
    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
    Type: Grant
    Filed: August 16, 2012
    Date of Patent: March 18, 2014
    Assignee: Intel Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
  • Patent number: 8667049
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Grant
    Filed: August 3, 2012
    Date of Patent: March 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampap, Philip Heidlberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Patent number: 8627010
    Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
    Type: Grant
    Filed: September 5, 2012
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
  • Patent number: 8516197
    Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: August 20, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Alan G. Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
  • Patent number: 8468416
    Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: June 18, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
  • Publication number: 20120311299
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Application
    Filed: August 3, 2012
    Publication date: December 6, 2012
    Applicant: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidlberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Publication number: 20120311272
    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
    Type: Application
    Filed: August 16, 2012
    Publication date: December 6, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
  • Publication number: 20120304020
    Abstract: A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
    Type: Application
    Filed: May 31, 2012
    Publication date: November 29, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: George Chiu, Alan G. Gara, Valentina Salapura
  • Patent number: 8255638
    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
  • Patent number: 8250133
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System- On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Grant
    Filed: June 26, 2009
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Patent number: 8230433
    Abstract: A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: George Chiu, Alan G. Gara, Valentina Salapura
  • Patent number: 8161248
    Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht