Patents by Inventor Alan Gara
Alan Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230412524Abstract: A device and method that implements a multi-stage electrical interconnection network is provided. The electronic device includes a plurality of computing devices and a plurality of switches grouped into a plurality of groups. Switches, of the plurality of switches, in a same group are configured to be fully connected to computing devices in the same group, each of switches of the plurality of switches included in a first group among the plurality of groups is configured to have a ono-to-one connection with any one of switches included in a second group, and a connection between the computing devices in the same group and the switches in the same group and a connection between switches in in the plurality of groups are electrical connections.Type: ApplicationFiled: September 5, 2023Publication date: December 21, 2023Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: WONSEOK LEE, Alan Gara, YOUNG JUN HONG, WONYONG LEE, WOOSEOK CHANG
-
Publication number: 20230369171Abstract: A computing device includes: a processor; a memory stack in which memories connected to the processor are stacked; and a substrate disposed under the processor, wherein a network bandwidth between the processor and the substrate is five or less times a memory bandwidth between the processor and the memory stack.Type: ApplicationFiled: July 25, 2023Publication date: November 16, 2023Applicant: Samsung Electronics Co., Ltd.Inventors: Young Jun HONG, Wonyong LEE, Alan GARA, Se Hyun YANG, Wooseok CHANG
-
Publication number: 20230253294Abstract: A computing device includes: a processor; a memory stack in which memories connected to the processor are stacked; and a substrate disposed under the processor, wherein a network bandwidth between the processor and the substrate is five or less times a memory bandwidth between the processor and the memory stack.Type: ApplicationFiled: January 25, 2023Publication date: August 10, 2023Applicant: Samsung Electronics Co., Ltd.Inventors: Wonyong LEE, Alan GARA, Se Hyun YANG, Young Jun HONG, Wooseok CHANG
-
Publication number: 20230254269Abstract: A device and method that implements a multi-stage electrical interconnection network is provided. The electronic device includes a plurality of computing devices and a plurality of switches grouped into a plurality of groups. Switches, of the plurality of switches, in a same group are configured to be fully connected to computing devices in the same group, each of switches of the plurality of switches included in a first group among the plurality of groups is configured to have a ono-to-one connection with any one of switches included in a second group, and a connection between the computing devices in the same group and the switches in the same group and a connection between switches in in the plurality of groups are electrical connections.Type: ApplicationFiled: September 9, 2022Publication date: August 10, 2023Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: WONSEOK LEE, Alan Gara, YOUNG JUN HONG, WONYONG LEE, WOOSEOK CHANG
-
Publication number: 20230254253Abstract: Message splitting and aggregation in a multi-stage electrical interconnection network are disclosed. A method of operating an electronic device comprised of computing devices, includes splitting, into segments, a message to be transmitted from a first of the computing devices, transmitting the segments to a second of the computing devices through a multi-channel that is based on an electrical connection between the first computing device and a plurality of switches, wherein the multi-channel includes channels respectively including electrical connections, the electrical connections connecting the first computing device with the second computing device, and reconstructing the message by aggregating the segments in the second computing device, wherein a bandwidth of the multi-channel transmitting the segments is greater than a maximum bandwidth of a single electrical connection of the electrical connections.Type: ApplicationFiled: January 26, 2023Publication date: August 10, 2023Applicant: Samsung Electronics Co., Ltd.Inventors: Young Jun HONG, Alan GARA, Wonseok LEE, Wonyong LEE, Wooseok CHANG
-
Patent number: 10740097Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.Type: GrantFiled: May 20, 2016Date of Patent: August 11, 2020Assignee: International Business Machines CorporationInventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
-
Patent number: 10713043Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.Type: GrantFiled: March 12, 2018Date of Patent: July 14, 2020Assignee: International Business Machines CorporationInventors: Alan Gara, David L. Satterfield, Robert E. Walkup
-
Patent number: 10140179Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.Type: GrantFiled: December 17, 2015Date of Patent: November 27, 2018Assignee: International Business Machines CorporationInventors: Alan Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
-
Patent number: 10069599Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: GrantFiled: December 17, 2015Date of Patent: September 4, 2018Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Publication number: 20180203693Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.Type: ApplicationFiled: March 12, 2018Publication date: July 19, 2018Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
-
Patent number: 9971713Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.Type: GrantFiled: April 30, 2015Date of Patent: May 15, 2018Assignee: GLOBALFOUNDRIES INC.Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
-
Patent number: 9921831Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.Type: GrantFiled: October 12, 2016Date of Patent: March 20, 2018Assignee: International Business Machines CorporationInventors: Alan Gara, David L. Satterfield, Robert E. Walkup
-
Publication number: 20170068536Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.Type: ApplicationFiled: October 12, 2016Publication date: March 9, 2017Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
-
Patent number: 9507647Abstract: In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.Type: GrantFiled: January 18, 2011Date of Patent: November 29, 2016Assignee: GLOBALFOUNDRIES INC.Inventors: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Phlip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang
-
Patent number: 9501333Abstract: A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.Type: GrantFiled: December 30, 2013Date of Patent: November 22, 2016Assignee: International Business Machines CorporationInventors: Daniel Ahn, Luis H. Ceze, Dong Chen Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht
-
Publication number: 20160316001Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.Type: ApplicationFiled: May 20, 2016Publication date: October 27, 2016Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
-
Patent number: 9473569Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.Type: GrantFiled: July 15, 2015Date of Patent: October 18, 2016Assignee: International Business Machines CorporationInventors: Alan Gara, David L. Satterfield, Robert E. Walkup
-
Patent number: 9374414Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.Type: GrantFiled: August 26, 2013Date of Patent: June 21, 2016Assignee: International Business Machines CorporationInventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
-
Patent number: 9373415Abstract: A method of testing a circuit includes halting a flow of normal data through the circuit, running test data through the circuit while subjecting the circuit to a stress condition, and determining whether a hard error exists in the circuit based on the running of the test data.Type: GrantFiled: August 16, 2013Date of Patent: June 21, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Pradip Bose, Alan Gara, Hans M. Jacobson
-
Publication number: 20160110256Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.Type: ApplicationFiled: December 17, 2015Publication date: April 21, 2016Inventors: Alan Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht