Patents by Inventor Norman Paul Jouppi
Norman Paul Jouppi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12189472Abstract: Aspects of the disclosure are directed to a computation unit implementing a systolic array and configured for detecting errors while processing data on the systolic array. Checksum circuit in communication with a systolic array is configured to compute checksums and perform error detection while the systolic array processes input data. Instead of pre-generating checksums in input matrices, input matrices can be directly fed into the systolic array through the checksum circuit. On the output side, the checksum circuit can generate and compare checksums with checksums in an output matrix generated by the systolic array. Error checking the operations to generate the output matrix can be performed without delaying the operations of the systolic array, and without preprocessing the input matrices.Type: GrantFiled: November 3, 2023Date of Patent: January 7, 2025Assignee: Google LLCInventors: Doe Hyun Yoon, Norman Paul Jouppi
-
Publication number: 20240370526Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.Type: ApplicationFiled: April 17, 2024Publication date: November 7, 2024Inventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Publication number: 20240362298Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: ApplicationFiled: April 17, 2024Publication date: October 31, 2024Inventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 12131244Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.Type: GrantFiled: September 30, 2020Date of Patent: October 29, 2024Assignee: Google LLCInventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
-
Publication number: 20240347414Abstract: A method of manufacturing a chip assembly comprises joining an in-process unit to a printed circuit board; reflowing a bonding material disposed between and electrically connecting the in-process unit with the printed circuit board, the bonding material having a first reflow temperature; and then joining a heat distribution device to the plurality of semiconductor chips using a thermal interface material (“TIM”) having a second reflow temperature that is lower than the first reflow temperature. The in-process unit further comprises a substrate having an active surface, a passive surface, and contacts exposed at the active surface; an interposer electrically connected to the substrate; a plurality of semiconductor chips overlying the substrate and electrically connected to the substrate through the interposer, and a stiffener overlying the substrate and having an aperture extending therethrough, the plurality of semiconductor chips being positioned within the aperture.Type: ApplicationFiled: April 12, 2024Publication date: October 17, 2024Inventors: Madhusudan K. Iyengar, Christopher Malone, Woon-Seong Kwon, Emad Samadiani, Melanie Beauchemin, Padam Jain, Teckgyu Kang, Yuan Li, Connor Burgess, Norman Paul Jouppi, Nicholas Stevens-Yu, Yingying Wang
-
Publication number: 20240303297Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: ApplicationFiled: February 16, 2024Publication date: September 12, 2024Inventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 12056534Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.Type: GrantFiled: December 30, 2022Date of Patent: August 6, 2024Assignee: Google LLCInventors: Sheng Li, Brian Zhang, Liqun Cheng, Norman Paul Jouppi, Yun Ni
-
Publication number: 20240231667Abstract: Aspects of the disclosure are directed to a heterogeneous machine learning accelerator system with compute and memory nodes connected by high speed chip-to-chip interconnects. While existing remote/disaggregated memory may require memory expansion via remote processing units, aspects of the disclosure add memory nodes into machine learning accelerator clusters via the chip-to-chip interconnects without needing assistance from remote processing units to achieve higher performance, simpler software stack, and/or lower cost. The memory nodes may support prefetch and intelligent compression to enable the use of low cost memory without performance degradation.Type: ApplicationFiled: January 10, 2023Publication date: July 11, 2024Inventors: Sheng Li, Sridhar Lakshmanamurthy, Norman Paul Jouppi, Martin Guy Dixon, Daniel Stodolsky, Quoc V. Le, Liqun Cheng, Erik Karl Norden, Parthasarathy Ranganathan
-
Publication number: 20240220202Abstract: A system and method for matrix multiplication using a systolic array configurable between multiple modes of operation. A systolic processor may receive a data type indicator for the matrix multiplication. For a first data type, the systolic processor may load the right-hand side data from the right-hand matrix register into the data processing cells of the systolic array between row 0 and row M?1, and pass the respective row of the left-hand side data through a corresponding row of the systolic array between rows 0 and M?1. For a second data type, the systolic processor may split each element of the left-hand side data and the right-hand side data into respective first and second element halves, and move each element half through a corresponding row of the systolic array between rows 0 and 2M?1.Type: ApplicationFiled: February 14, 2023Publication date: July 4, 2024Inventors: Matthew Leever Hedlund, Christopher Aaron Clark, Andrew Everett Phelps, Thomas James Norrie, Norman Paul Jouppi, Sushma Honnavara-Prasad, Vinayak Anand Gokhale, Pareesa Ameneh Golnari
-
Patent number: 11989258Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.Type: GrantFiled: November 9, 2020Date of Patent: May 21, 2024Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 11989259Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: GrantFiled: November 10, 2022Date of Patent: May 21, 2024Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 11990386Abstract: A method of manufacturing a chip assembly comprises joining an in-process unit to a printed circuit board; reflowing a bonding material disposed between and electrically connecting the in-process unit with the printed circuit board, the bonding material having a first reflow temperature; and then joining a heat distribution device to the plurality of semiconductor chips using a thermal interface material (“TIM”) having a second reflow temperature that is lower than the first reflow temperature. The in-process unit further comprises a substrate having an active surface, a passive surface, and contacts exposed at the active surface; an interposer electrically connected to the substrate; a plurality of semiconductor chips overlying the substrate and electrically connected to the substrate through the interposer, and a stiffener overlying the substrate and having an aperture extending therethrough, the plurality of semiconductor chips being positioned within the aperture.Type: GrantFiled: May 28, 2021Date of Patent: May 21, 2024Assignee: Google LLCInventors: Madhusudan K. Iyengar, Christopher Malone, Woon-Seong Kwon, Emad Samadiani, Melanie Beauchemin, Padam Jain, Teckgyu Kang, Yuan Li, Connor Burgess, Norman Paul Jouppi, Nicholas Stevens-Yu, Yingying Wang
-
Publication number: 20240160909Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.Type: ApplicationFiled: January 25, 2024Publication date: May 16, 2024Inventors: Thomas Norrie, Andrew Everett Phelps, Norman Paul Jouppi, Matthew Leever Hedlund
-
Patent number: 11934826Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: GrantFiled: November 19, 2021Date of Patent: March 19, 2024Assignee: Google LLCInventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
-
Patent number: 11922292Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.Type: GrantFiled: May 14, 2020Date of Patent: March 5, 2024Assignee: Google LLCInventors: Thomas Norrie, Andrew Everett Phelps, Norman Paul Jouppi, Matthew Leever Hedlund
-
Patent number: 11915139Abstract: Methods, systems, and apparatus for updating machine learning models to improve locality are described. In one aspect, a method includes receiving data of a machine learning model. The data represents operations of the machine learning model and data dependencies between the operations. Data specifying characteristics of a memory hierarchy for a machine learning processor on which the machine learning model is going to be deployed is received. The memory hierarchy includes multiple memories at multiple memory levels for storing machine learning data used by the machine learning processor when performing machine learning computations using the machine learning model. An updated machine learning model is generated by modifying the operations and control dependencies of the machine learning model to account for the characteristics of the memory hierarchy. Machine learning computations are performed using the updated machine learning model.Type: GrantFiled: February 15, 2022Date of Patent: February 27, 2024Assignee: Google LLCInventors: Doe Hyun Yoon, Nishant Patil, Norman Paul Jouppi
-
Publication number: 20240061742Abstract: Aspects of the disclosure are directed to a computation unit implementing a systolic array and configured for detecting errors while processing data on the systolic array. Checksum circuit in communication with a systolic array is configured to compute checksums and perform error detection while the systolic array processes input data. Instead of pre-generating checksums in input matrices, input matrices can be directly fed into the systolic array through the checksum circuit. On the output side, the checksum circuit can generate and compare checksums with checksums in an output matrix generated by the systolic array. Error checking the operations to generate the output matrix can be performed without delaying the operations of the systolic array, and without preprocessing the input matrices.Type: ApplicationFiled: November 3, 2023Publication date: February 22, 2024Inventors: Doe Hyun Yoon, Norman Paul Jouppi
-
Patent number: 11907330Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: GrantFiled: February 17, 2023Date of Patent: February 20, 2024Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Publication number: 20240037373Abstract: Aspects of the disclosure are directed to jointly searching machine learning model architectures and hardware architectures in a combined space of models, hardware, and mapping strategies. A search strategy is utilized where all models, hardware, and mappings are evaluated together at once via weight sharing and a supernetwork. A multi-objective reward function is utilized with objectives for quality, performance, power, and area.Type: ApplicationFiled: July 28, 2022Publication date: February 1, 2024Inventors: Sheng Li, Norman Paul Jouppi, Garrett Axel Andersen, Quoc V. Le, Liqun Cheng, Parthasarathy Ranganathan
-
Patent number: 11853156Abstract: Aspects of the disclosure are directed to a computation unit implementing a systolic array and configured for detecting errors while processing data on the systolic array. Checksum circuit in communication with a systolic array is configured to compute checksums and perform error detection while the systolic array processes input data. Instead of pre-generating checksums in input matrices, input matrices can be directly fed into the systolic array through the checksum circuit. On the output side, the checksum circuit can generate and compare checksums with checksums in an output matrix generated by the systolic array. Error checking the operations to generate the output matrix can be performed without delaying the operations of the systolic array, and without preprocessing the input matrices.Type: GrantFiled: October 7, 2022Date of Patent: December 26, 2023Assignee: Google LLCInventors: Doe Hyun Yoon, Norman Paul Jouppi