Patents by Inventor Alan Gara

Alan Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230412524
    Abstract: A device and method that implements a multi-stage electrical interconnection network is provided. The electronic device includes a plurality of computing devices and a plurality of switches grouped into a plurality of groups. Switches, of the plurality of switches, in a same group are configured to be fully connected to computing devices in the same group, each of switches of the plurality of switches included in a first group among the plurality of groups is configured to have a one-to-one connection with any one of switches included in a second group, and a connection between the computing devices in the same group and the switches in the same group and a connection between switches in the plurality of groups are electrical connections.
    Type: Application
    Filed: September 5, 2023
    Publication date: December 21, 2023
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: WONSEOK LEE, Alan Gara, YOUNG JUN HONG, WONYONG LEE, WOOSEOK CHANG
  • Publication number: 20230369171
    Abstract: A computing device includes: a processor; a memory stack in which memories connected to the processor are stacked; and a substrate disposed under the processor, wherein a network bandwidth between the processor and the substrate is five or less times a memory bandwidth between the processor and the memory stack.
    Type: Application
    Filed: July 25, 2023
    Publication date: November 16, 2023
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Young Jun HONG, Wonyong LEE, Alan GARA, Se Hyun YANG, Wooseok CHANG
  • Publication number: 20230253294
    Abstract: A computing device includes: a processor; a memory stack in which memories connected to the processor are stacked; and a substrate disposed under the processor, wherein a network bandwidth between the processor and the substrate is five or less times a memory bandwidth between the processor and the memory stack.
    Type: Application
    Filed: January 25, 2023
    Publication date: August 10, 2023
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Wonyong LEE, Alan GARA, Se Hyun YANG, Young Jun HONG, Wooseok CHANG
  • Publication number: 20230254269
    Abstract: A device and method that implements a multi-stage electrical interconnection network is provided. The electronic device includes a plurality of computing devices and a plurality of switches grouped into a plurality of groups. Switches, of the plurality of switches, in a same group are configured to be fully connected to computing devices in the same group, each of switches of the plurality of switches included in a first group among the plurality of groups is configured to have a one-to-one connection with any one of switches included in a second group, and a connection between the computing devices in the same group and the switches in the same group and a connection between switches in the plurality of groups are electrical connections.
    Type: Application
    Filed: September 9, 2022
    Publication date: August 10, 2023
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: WONSEOK LEE, Alan Gara, YOUNG JUN HONG, WONYONG LEE, WOOSEOK CHANG
  • Publication number: 20230254253
    Abstract: Message splitting and aggregation in a multi-stage electrical interconnection network are disclosed. A method of operating an electronic device comprised of computing devices, includes splitting, into segments, a message to be transmitted from a first of the computing devices, transmitting the segments to a second of the computing devices through a multi-channel that is based on an electrical connection between the first computing device and a plurality of switches, wherein the multi-channel includes channels respectively including electrical connections, the electrical connections connecting the first computing device with the second computing device, and reconstructing the message by aggregating the segments in the second computing device, wherein a bandwidth of the multi-channel transmitting the segments is greater than a maximum bandwidth of a single electrical connection of the electrical connections.
    Type: Application
    Filed: January 26, 2023
    Publication date: August 10, 2023
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Young Jun HONG, Alan GARA, Wonseok LEE, Wonyong LEE, Wooseok CHANG
  • Patent number: 10740097
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
    Type: Grant
    Filed: May 20, 2016
    Date of Patent: August 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
  • Patent number: 10713043
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Grant
    Filed: March 12, 2018
    Date of Patent: July 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Patent number: 10140179
    Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.
    Type: Grant
    Filed: December 17, 2015
    Date of Patent: November 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
  • Patent number: 10069599
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Grant
    Filed: December 17, 2015
    Date of Patent: September 4, 2018
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Publication number: 20180203693
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Application
    Filed: March 12, 2018
    Publication date: July 19, 2018
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Patent number: 9971713
    Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design is a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and a multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
    Type: Grant
    Filed: April 30, 2015
    Date of Patent: May 15, 2018
    Assignee: GLOBALFOUNDRIES INC.
    Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
  • Patent number: 9921831
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Grant
    Filed: October 12, 2016
    Date of Patent: March 20, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Publication number: 20170068536
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Application
    Filed: October 12, 2016
    Publication date: March 9, 2017
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Patent number: 9507647
    Abstract: In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
    Type: Grant
    Filed: January 18, 2011
    Date of Patent: November 29, 2016
    Assignee: GLOBALFOUNDRIES INC.
    Inventors: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang
  • Patent number: 9501333
    Abstract: A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: November 22, 2016
    Assignee: International Business Machines Corporation
    Inventors: Daniel Ahn, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht
  • Publication number: 20160316001
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
    Type: Application
    Filed: May 20, 2016
    Publication date: October 27, 2016
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
  • Patent number: 9473569
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Grant
    Filed: July 15, 2015
    Date of Patent: October 18, 2016
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Patent number: 9374414
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: June 21, 2016
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
  • Patent number: 9373415
    Abstract: A method of testing a circuit includes halting a flow of normal data through the circuit, running test data through the circuit while subjecting the circuit to a stress condition, and determining whether a hard error exists in the circuit based on the running of the test data.
    Type: Grant
    Filed: August 16, 2013
    Date of Patent: June 21, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Alan Gara, Hans M. Jacobson
  • Publication number: 20160110256
    Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.
    Type: Application
    Filed: December 17, 2015
    Publication date: April 21, 2016
    Inventors: Alan Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
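
The matrix construction described in the abstract above (publication 20160110256 and patent 10140179) can be illustrated with a small sketch. This is not the patented implementation: the sizes n = 8 and m = 5, the function names `odd_weight_rows` and `check_bits`, and the order in which vectors are assigned to rows are all assumptions made for illustration. The sketch only shows the two ideas the abstract states, namely that the rows of P are m-bit vectors with an odd number (three or more) of ones, and that multiplying the n data bits by P over GF(2) yields m redundant bits. It does not attempt to choose rows so that two particular columns line up with chosen subgroups, because the abstract does not say how that selection is made.

```python
from itertools import combinations


def odd_weight_rows(m, n):
    """Return n distinct m-bit vectors, each with an odd number (>= 3) of ones.

    Hypothetical helper: the enumeration order is an arbitrary choice.
    """
    rows = []
    for weight in range(3, m + 1, 2):  # weights 3, 5, 7, ...
        for ones in combinations(range(m), weight):
            rows.append([1 if i in ones else 0 for i in range(m)])
            if len(rows) == n:
                return rows
    raise ValueError("m is too small to supply n odd-weight rows")


def check_bits(data, P):
    """Multiply the data row vector by P over GF(2), giving m check bits."""
    m = len(P[0])
    return [sum(d & row[j] for d, row in zip(data, P)) % 2 for j in range(m)]


# Illustrative sizes only: 8 data bits protected by 5 redundant bits.
P = odd_weight_rows(m=5, n=8)
data = [1, 0, 1, 1, 0, 0, 1, 0]
stored = check_bits(data, P)

# Flip one data bit and compare the recomputed check bits with the stored ones.
corrupted = list(data)
corrupted[3] ^= 1
syndrome = [a ^ b for a, b in zip(stored, check_bits(corrupted, P))]
error_position = P.index(syndrome)  # the syndrome equals the row of the flipped bit
```

Each column of P computes the parity of exactly those data bits whose rows contain a one in that column, which is how a column can double as a parity bit for a subgroup of the data. Because every row is distinct and has odd weight, a single flipped data bit produces a syndrome equal to that bit's row, so the position can be looked up directly.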