Patents by Inventor John Shalf
John Shalf has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240411834
Abstract: A system and method of performing sparse accumulation in column-wise sparse general matrix-matrix multiplication (SpGEMM) algorithms. The method includes receiving a request to perform SpGEMM based on a first matrix and a second matrix. The method includes accumulating, in a hardware buffer, a hash key and an intermediate multiplication result of the first matrix and the second matrix. The method includes performing a probe search of a hardware cache to identify a match between the hash key and a partial sum associated with the first matrix and the second matrix. The method includes generating, by a hardware adder, a multiplication result based on the partial sum and the intermediate multiplication result from the accumulation waiting buffer.
Type: Application
Filed: June 7, 2024
Publication date: December 12, 2024
Inventors: Chao Zhang, Xiaochen Guo, Maximilian Bremer, Cy Chan, John Shalf
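As a rough software analogy for the entry above, the following minimal sketch shows column-wise SpGEMM with a hash-based accumulator. It is not the claimed hardware design: a Python dict stands in for the hash-keyed buffer, cache probe, and adder, and the function name `spgemm_columnwise` and the `{column: {row: value}}` storage layout are assumptions chosen for illustration.

```python
def spgemm_columnwise(a_cols, b_cols):
    """Compute C = A @ B for sparse matrices stored as {col: {row: value}}.

    Illustrative only: the dict `acc` plays the role of a hash accumulator
    keyed by output row index.
    """
    c_cols = {}
    for j, b_col in b_cols.items():
        acc = {}  # hash accumulator: row index -> partial sum for column j
        for k, b_kj in b_col.items():
            # Column k of A is scaled by B[k, j] and merged into the accumulator.
            for i, a_ik in a_cols.get(k, {}).items():
                product = a_ik * b_kj  # intermediate multiplication result
                # Probe for an existing partial sum under key i, then add.
                acc[i] = acc.get(i, 0.0) + product
        if acc:
            c_cols[j] = acc
    return c_cols


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], stored column by column.
A = {0: {0: 1.0}, 1: {0: 2.0, 1: 3.0}}
B = {0: {0: 4.0, 1: 5.0}, 1: {1: 6.0}}
C = spgemm_columnwise(A, B)
```

Each output column is built independently, so the accumulator only ever holds the nonzero rows of one column at a time.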
-
Patent number: 12075201
Abstract: Disclosed herein are methods, systems, and devices for bandwidth steering. Systems may include a plurality of compute nodes configured to execute one or more applications, a plurality of first level resources communicatively coupled to the plurality of compute nodes, a plurality of second level resources communicatively coupled to the plurality of first level resources, and a plurality of third level resources communicatively coupled to the plurality of second level resources. Systems may also include a plurality of optical switch circuits communicatively coupled to the plurality of first level resources and the plurality of second level resources, wherein each of the plurality of optical switch circuits is coupled to more than one of the plurality of the first level resources and is also coupled to more than one of the plurality of the second level resources.
Type: Grant
Filed: November 13, 2020
Date of Patent: August 27, 2024
Assignee: The Regents of the University of California
Inventors: Georgios Michelogiannakis, Yiwen Shen, Min Yee Teh, John Shalf, Madeleine Glick, Keren Bergman
-
Publication number: 20240204783
Abstract: A primitive race-logic temporal operator is described, comprising superconducting logic single flux quantum (SFQ) cells.
Type: Application
Filed: March 3, 2021
Publication date: June 20, 2024
Inventors: Georgios Tzimpragos, Dilip Vasudevan, Nestan Tsiskaridze, Georgios Michelogiannakis, Advait Madhavan, Jennifer Volk, John Shalf, Timothy Sherwood
-
Patent number: 11599470
Abstract: A last-level collective hardware prefetcher (LLCHP) is described. The LLCHP is to detect a first off-chip memory access request by a first processor core of a plurality of processor cores. The LLCHP is further to determine, based on the first off-chip memory access request, that first data associated with the first off-chip memory access request is associated with second data of a second processor core of the plurality of processor cores. The LLCHP is further to prefetch the first data and the second data based on the determination.
Type: Grant
Filed: November 6, 2019
Date of Patent: March 7, 2023
Assignee: The Regents of the University of California
Inventors: Georgios Michelogiannakis, John Shalf
-
Publication number: 20220394362
Abstract: Disclosed herein are methods, systems, and devices for bandwidth steering. Systems may include a plurality of compute nodes configured to execute one or more applications, a plurality of first level resources communicatively coupled to the plurality of compute nodes, a plurality of second level resources communicatively coupled to the plurality of first level resources, and a plurality of third level resources communicatively coupled to the plurality of second level resources. Systems may also include a plurality of optical switch circuits communicatively coupled to the plurality of first level resources and the plurality of second level resources, wherein each of the plurality of optical switch circuits is coupled to more than one of the plurality of the first level resources and is also coupled to more than one of the plurality of the second level resources.
Type: Application
Filed: November 13, 2020
Publication date: December 8, 2022
Inventors: Georgios Michelogiannakis, Yiwen Shen, Min Yee Teh, John Shalf, Madeleine Glick, Keren Bergman
-
Publication number: 20220012178
Abstract: A last-level collective hardware prefetcher (LLCHP) is described. The LLCHP is to detect a first off-chip memory access request by a first processor core of a plurality of processor cores. The LLCHP is further to determine, based on the first off-chip memory access request, that first data associated with the first off-chip memory access request is associated with second data of a second processor core of the plurality of processor cores. The LLCHP is further to prefetch the first data and the second data based on the determination.
Type: Application
Filed: November 6, 2019
Publication date: January 13, 2022
Inventors: Georgios Michelogiannakis, John Shalf
-
Patent number: 10318444
Abstract: This disclosure provides systems, methods, and apparatus for collective memory transfers. A control unit may be configured to coordinate a transfer of data between a memory and processor cores. For a read data transfer operation, the control unit may receive a trigger packet identifying a read data transfer operation and identifying a first plurality of data lines based on data values included in the trigger packet. The control unit may read the first plurality of data lines from the memory sequentially and send a second plurality of data lines to the processor cores. For a write data transfer operation, the control unit may send a request for at least one data line to a plurality of processor cores, may receive and reorder the requested data lines, and may write the data lines to a memory. The control unit may determine a mapping between processor cores and the memory.
Type: Grant
Filed: April 10, 2014
Date of Patent: June 11, 2019
Assignee: The Regents of the University of California
Inventors: Georgios Michelogiannakis, John Shalf
-
Patent number: 10102179
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
Type: Grant
Filed: August 22, 2016
Date of Patent: October 16, 2018
Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
Inventors: John Shalf, David Donofrio, Leonid Oliker
-
Patent number: 10078593
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores, wherein at least one of a number of the processor cores, a size of each of the plurality of caches, or a size of each of the plurality of memories is configured for performing a reverse-time-migration (RTM) computation.
Type: Grant
Filed: October 26, 2012
Date of Patent: September 18, 2018
Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
Inventors: John Shalf, David Donofrio, Leonid Oliker, Jens Kruger, Samuel Williams
-
Publication number: 20160371226
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
Type: Application
Filed: August 22, 2016
Publication date: December 22, 2016
Inventors: John Shalf, David Donofrio, Leonid Oliker
-
Patent number: 9448940
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
Type: Grant
Filed: October 26, 2012
Date of Patent: September 20, 2016
Assignee: The Regents of the University of California
Inventors: John Shalf, David Donofrio, Leonid Oliker
-
Publication number: 20140310495
Abstract: This disclosure provides systems, methods, and apparatus for collective memory transfers. A control unit may be configured to coordinate a transfer of data between a memory and processor cores. For a read data transfer operation, the control unit may receive a trigger packet identifying a read data transfer operation and identifying a first plurality of data lines based on data values included in the trigger packet. The control unit may read the first plurality of data lines from the memory sequentially and send a second plurality of data lines to the processor cores. For a write data transfer operation, the control unit may send a request for at least one data line to a plurality of processor cores, may receive and reorder the requested data lines, and may write the data lines to a memory. The control unit may determine a mapping between processor cores and the memory.
Type: Application
Filed: April 10, 2014
Publication date: October 16, 2014
Applicant: The Regents of the University of California
Inventors: Georgios Michelogiannakis, John Shalf
-
Publication number: 20140310467
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores, wherein at least one of a number of the processor cores, a size of each of the plurality of caches, or a size of each of the plurality of memories is configured for performing a reverse-time-migration (RTM) computation.
Type: Application
Filed: October 26, 2012
Publication date: October 16, 2014
Inventors: John Shalf, David Donofrio, Leonid Oliker, Jens Kruger, Samuel Williams
-
Publication number: 20140281243
Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
Type: Application
Filed: October 26, 2012
Publication date: September 18, 2014
Inventors: John Shalf, David Donofrio, Leonid Oliker