Patents by Inventor Norman Paul Jouppi

Norman Paul Jouppi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HARDWARE-OPTIMIZED NEURAL ARCHITECTURE SEARCH

Publication number: 20250077833

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.

Type: Application

Filed: August 30, 2024

Publication date: March 6, 2025

Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Methods and heat distribution devices for thermal management of chip assemblies

Patent number: 12243802

Abstract: A method of manufacturing a chip assembly comprises joining an in-process unit to a printed circuit board; reflowing a bonding material disposed between and electrically connecting the in-process unit with the printed circuit board, the bonding material having a first reflow temperature; and then joining a heat distribution device to the plurality of semiconductor chips using a thermal interface material (“TIM”) having a second reflow temperature that is lower than the first reflow temperature. The in-process unit further comprises a substrate having an active surface, a passive surface, and contacts exposed at the active surface; an interposer electrically connected to the substrate; a plurality of semiconductor chips overlying the substrate and electrically connected to the substrate through the interposer, and a stiffener overlying the substrate and having an aperture extending therethrough, the plurality of semiconductor chips being positioned within the aperture.

Type: Grant

Filed: April 12, 2024

Date of Patent: March 4, 2025

Assignee: Google LLC

Inventors: Madhusudan K. Iyengar, Christopher Malone, Woon-Seong Kwon, Emad Samadiani, Melanie Beauchemin, Padam Jain, Teckgyu Kang, Yuan Li, Connor Burgess, Norman Paul Jouppi, Nicholas Stevens-Yu, Yingying Wang
Error checking for systolic array computation

Patent number: 12189472

Abstract: Aspects of the disclosure are directed to a computation unit implementing a systolic array and configured for detecting errors while processing data on the systolic array. Checksum circuit in communication with a systolic array is configured to compute checksums and perform error detection while the systolic array processes input data. Instead of pre-generating checksums in input matrices, input matrices can be directly fed into the systolic array through the checksum circuit. On the output side, the checksum circuit can generate and compare checksums with checksums in an output matrix generated by the systolic array. Error checking the operations to generate the output matrix can be performed without delaying the operations of the systolic array, and without preprocessing the input matrices.

Type: Grant

Filed: November 3, 2023

Date of Patent: January 7, 2025

Assignee: Google LLC

Inventors: Doe Hyun Yoon, Norman Paul Jouppi
PERFORMING MATRIX MULTIPLICATION IN HARDWARE

Publication number: 20240370526

Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.

Type: Application

Filed: April 17, 2024

Publication date: November 7, 2024

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
LOW LATENCY MATRIX MULTIPLY UNIT

Publication number: 20240362298

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Application

Filed: April 17, 2024

Publication date: October 31, 2024

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
Hardware-optimized neural architecture search

Patent number: 12131244

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.

Type: Grant

Filed: September 30, 2020

Date of Patent: October 29, 2024

Assignee: Google LLC

Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Methods And Heat Distribution Devices For Thermal Management Of Chip Assemblies

Publication number: 20240347414

Abstract: A method of manufacturing a chip assembly comprises joining an in-process unit to a printed circuit board; reflowing a bonding material disposed between and electrically connecting the in-process unit with the printed circuit board, the bonding material having a first reflow temperature; and then joining a heat distribution device to the plurality of semiconductor chips using a thermal interface material (“TIM”) having a second reflow temperature that is lower than the first reflow temperature. The in-process unit further comprises a substrate having an active surface, a passive surface, and contacts exposed at the active surface; an interposer electrically connected to the substrate; a plurality of semiconductor chips overlying the substrate and electrically connected to the substrate through the interposer, and a stiffener overlying the substrate and having an aperture extending therethrough, the plurality of semiconductor chips being positioned within the aperture.

Type: Application

Filed: April 12, 2024

Publication date: October 17, 2024

Inventors: Madhusudan K. Iyengar, Christopher Malone, Woon-Seong Kwon, Emad Samadiani, Melanie Beauchemin, Padam Jain, Teckgyu Kang, Yuan Li, Connor Burgess, Norman Paul Jouppi, Nicholas Stevens-Yu, Yingying Wang
LOW LATENCY MATRIX MULTIPLY UNIT

Publication number: 20240303297

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Application

Filed: February 16, 2024

Publication date: September 12, 2024

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
Generating recommendations based on performance metrics for scheduling jobs on distributed computing devices

Patent number: 12056534

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.

Type: Grant

Filed: December 30, 2022

Date of Patent: August 6, 2024

Assignee: Google LLC

Inventors: Sheng Li, Brian Zhang, Liqun Cheng, Norman Paul Jouppi, Yun Ni
Heterogeneous ML Accelerator Cluster with Flexible System Resource Balance

Publication number: 20240231667

Abstract: Aspects of the disclosure are directed to a heterogeneous machine learning accelerator system with compute and memory nodes connected by high speed chip-to-chip interconnects. While existing remote/disaggregated memory may require memory expansion via remote processing units, aspects of the disclosure add memory nodes into machine learning accelerator clusters via the chip-to-chip interconnects without needing assistance from remote processing units to achieve higher performance, simpler software stack, and/or lower cost. The memory nodes may support prefetch and intelligent compression to enable the use of low cost memory without performance degradation.

Type: Application

Filed: January 10, 2023

Publication date: July 11, 2024

Inventors: Sheng Li, Sridhar Lakshmanamurthy, Norman Paul Jouppi, Martin Guy Dixon, Daniel Stodolsky, Quoc V. Le, Liqun Cheng, Erik Karl Norden, Parthasarathy Ranganathan
Multi-Modal Systolic Array For Matrix Multiplication

Publication number: 20240220202

Abstract: A system and method for matrix multiplication using a systolic array configurable between multiple modes of operation. A systolic processor may receive a data type indicator for the matrix multiplication. For a first data type, the systolic processor may load the right-hand side data from the right-hand matrix register into the data processing cells of the systolic array between row 0 and row M?1, and pass the respective row of the left-hand side data through a corresponding row of the systolic array between rows 0 and M?1. For a second data type, the systolic processor may split each element of the left-hand side data and the right-hand side data into respective first and second element halves, and move each element half through a corresponding row of the systolic array between rows 0 and 2M?1.

Type: Application

Filed: February 14, 2023

Publication date: July 4, 2024

Inventors: Matthew Leever Hedlund, Christopher Aaron Clark, Andrew Everett Phelps, Thomas James Norrie, Norman Paul Jouppi, Sushma Honnavara-Prasad, Vinayak Anand Gokhale, Pareesa Ameneh Golnari
Methods and heat distribution devices for thermal management of chip assemblies

Patent number: 11990386

Abstract: A method of manufacturing a chip assembly comprises joining an in-process unit to a printed circuit board; reflowing a bonding material disposed between and electrically connecting the in-process unit with the printed circuit board, the bonding material having a first reflow temperature; and then joining a heat distribution device to the plurality of semiconductor chips using a thermal interface material (“TIM”) having a second reflow temperature that is lower than the first reflow temperature. The in-process unit further comprises a substrate having an active surface, a passive surface, and contacts exposed at the active surface; an interposer electrically connected to the substrate; a plurality of semiconductor chips overlying the substrate and electrically connected to the substrate through the interposer, and a stiffener overlying the substrate and having an aperture extending therethrough, the plurality of semiconductor chips being positioned within the aperture.

Type: Grant

Filed: May 28, 2021

Date of Patent: May 21, 2024

Assignee: Google LLC

Inventors: Madhusudan K. Iyengar, Christopher Malone, Woon-Seong Kwon, Emad Samadiani, Melanie Beauchemin, Padam Jain, Teckgyu Kang, Yuan Li, Connor Burgess, Norman Paul Jouppi, Nicholas Stevens-Yu, Yingying Wang
Performing matrix multiplication in hardware

Patent number: 11989258

Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.

Type: Grant

Filed: November 9, 2020

Date of Patent: May 21, 2024

Assignee: Google LLC

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
Low latency matrix multiply unit

Patent number: 11989259

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Grant

Filed: November 10, 2022

Date of Patent: May 21, 2024

Assignee: Google LLC

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
SHARED SCRATCHPAD MEMORY WITH PARALLEL LOAD-STORE

Publication number: 20240160909

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.

Type: Application

Filed: January 25, 2024

Publication date: May 16, 2024

Inventors: Thomas Norrie, Andrew Everett Phelps, Norman Paul Jouppi, Matthew Leever Hedlund
Vector reductions using shared scratchpad memory

Patent number: 11934826

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.

Type: Grant

Filed: November 19, 2021

Date of Patent: March 19, 2024

Assignee: Google LLC

Inventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
Shared scratchpad memory with parallel load-store

Patent number: 11922292

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.

Type: Grant

Filed: May 14, 2020

Date of Patent: March 5, 2024

Assignee: Google LLC

Inventors: Thomas Norrie, Andrew Everett Phelps, Norman Paul Jouppi, Matthew Leever Hedlund
Modifying machine learning models to improve locality

Patent number: 11915139

Abstract: Methods, systems, and apparatus for updating machine learning models to improve locality are described. In one aspect, a method includes receiving data of a machine learning model. The data represents operations of the machine learning model and data dependencies between the operations. Data specifying characteristics of a memory hierarchy for a machine learning processor on which the machine learning model is going to be deployed is received. The memory hierarchy includes multiple memories at multiple memory levels for storing machine learning data used by the machine learning processor when performing machine learning computations using the machine learning model. An updated machine learning model is generated by modifying the operations and control dependencies of the machine learning model to account for the characteristics of the memory hierarchy. Machine learning computations are performed using the updated machine learning model.

Type: Grant

Filed: February 15, 2022

Date of Patent: February 27, 2024

Assignee: Google LLC

Inventors: Doe Hyun Yoon, Nishant Patil, Norman Paul Jouppi
Error Checking For Systolic Array Computation

Publication number: 20240061742

Abstract: Aspects of the disclosure are directed to a computation unit implementing a systolic array and configured for detecting errors while processing data on the systolic array. Checksum circuit in communication with a systolic array is configured to compute checksums and perform error detection while the systolic array processes input data. Instead of pre-generating checksums in input matrices, input matrices can be directly fed into the systolic array through the checksum circuit. On the output side, the checksum circuit can generate and compare checksums with checksums in an output matrix generated by the systolic array. Error checking the operations to generate the output matrix can be performed without delaying the operations of the systolic array, and without preprocessing the input matrices.

Type: Application

Filed: November 3, 2023

Publication date: February 22, 2024

Inventors: Doe Hyun Yoon, Norman Paul Jouppi
Low latency matrix multiply unit

Patent number: 11907330

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Grant

Filed: February 17, 2023

Date of Patent: February 20, 2024

Assignee: Google LLC

Inventors: Andrew Everett Phelps, Norman Paul Jouppi

1 2 3 4 5 … next