Patents by Inventor Christopher Aaron Clark
Christopher Aaron Clark has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230297372Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: ApplicationFiled: December 5, 2022Publication date: September 21, 2023Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Patent number: 11748443Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.Type: GrantFiled: March 22, 2021Date of Patent: September 5, 2023Assignee: Google LLCInventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
-
Publication number: 20230206070Abstract: A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising: activation circuitry configured to receive a vector of accumulated values and configured to apply a function to each accumulated value to generate a vector of activation values; and normalization circuitry coupled to the activation circuitry and configured to generate a respective normalized value from each activation value.Type: ApplicationFiled: March 1, 2023Publication date: June 29, 2023Inventors: Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu
-
Patent number: 11620508Abstract: A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising: activation circuitry configured to receive a vector of accumulated values and configured to apply a function to each accumulated value to generate a vector of activation values; and normalization circuitry coupled to the activation circuitry and configured to generate a respective normalized value from each activation value.Type: GrantFiled: January 11, 2019Date of Patent: April 4, 2023Assignee: Google LLCInventors: Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu
-
Patent number: 11520581Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: GrantFiled: May 24, 2021Date of Patent: December 6, 2022Assignee: Google LLCInventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Patent number: 11301212Abstract: Embodiments of the present disclosure pertain to multimodal digital multiplier circuits and methods. In one embodiment, partial product outputs of digital multiplication circuits are selectively inverted based on a mode control signal. The mode control signal may be set based on a format of the operands input to the multiplier. Example embodiments of the disclosure may multiply combinations of signed and unsigned input operands using different modes.Type: GrantFiled: October 1, 2020Date of Patent: April 12, 2022Assignee: Groq, Inc.Inventor: Christopher Aaron Clark
-
Publication number: 20210357212Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: ApplicationFiled: May 24, 2021Publication date: November 18, 2021Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Publication number: 20210312011Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.Type: ApplicationFiled: March 22, 2021Publication date: October 7, 2021Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
-
Patent number: 11042360Abstract: In one embodiment, in a first mode, first and second input operands having a first data type are multiplied using one or more of a plurality of multipliers, and in second mode, a plurality of input operands having a second data type are multiplied using the plurality of multipliers. Accordingly, multiplier circuitry may process different input data types and share circuitry across the different modes. In some embodiments, in the first mode, products may be converted to a third data type, and in the second mode, multiple products may be concatenated. Values in the third data type, in the first mode, and concatenated values having the second data type, in the second mode, may be added across different multimodal multipliers to form a multiply-accumulator. In some embodiments, the plurality of multiply-accumulators may be configured in series.Type: GrantFiled: August 5, 2020Date of Patent: June 22, 2021Assignee: Groq, Inc.Inventors: Christopher Aaron Clark, Jonathan Ross
-
Publication number: 20210165635Abstract: A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.Type: ApplicationFiled: February 12, 2021Publication date: June 3, 2021Inventors: Jonathan Ross, Robert David Nuckolls, Christopher Aaron Clark, Chester Li, Gregory Michael Thorson
-
Patent number: 11016764Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: GrantFiled: April 8, 2020Date of Patent: May 25, 2021Assignee: Google LLCInventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Patent number: 10956537Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.Type: GrantFiled: April 6, 2020Date of Patent: March 23, 2021Assignee: Google LLCInventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
-
Patent number: 10922057Abstract: A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.Type: GrantFiled: September 23, 2019Date of Patent: February 16, 2021Assignee: Google LLCInventors: Jonathan Ross, Robert David Nuckolls, Christopher Aaron Clark, Chester Li, Gregory Michael Thorson
-
Patent number: 10915318Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: GrantFiled: March 4, 2019Date of Patent: February 9, 2021Assignee: Google LLCInventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Patent number: 10853037Abstract: Embodiments of the present disclosure pertain to digital circuits with compressed carries. In one embodiment, an adder circuit generates a sum and carry. The carry is compressed to reduce the number of bits required to represent the carry. In one embodiment, a multiplier circuit generates output product values. The output product values may be summed to produce a sum and carry. The carry may be compressed. In other embodiments, a multiplier circuit receives an input sum and compressed carry. The compressed input carry is decompressed and added to output product values and the input sum, and a resulting carry is compressed. The output of such a multiplier is another sum and compressed carry.Type: GrantFiled: July 14, 2020Date of Patent: December 1, 2020Assignee: Groq, Inc.Inventors: Christopher Aaron Clark, Jonathan Ross
-
Patent number: 10831445Abstract: Embodiments of the present disclosure pertain to multimodal digital multiplier circuits and methods. In one embodiment, partial product outputs of digital multiplication circuits are selectively inverted based on a mode control signal. The mode control signal may be set based on a format of the operands input to the multiplier. Example embodiments of the disclosure may multiply combinations of signed and unsigned input operands using different modes.Type: GrantFiled: September 20, 2018Date of Patent: November 10, 2020Assignee: Groq, Inc.Inventor: Christopher Aaron Clark
-
Publication number: 20200301996Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.Type: ApplicationFiled: April 6, 2020Publication date: September 24, 2020Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
-
Patent number: 10776078Abstract: In one embodiment, in a first mode, first and second input operands having a first data type are multiplied using one or more of a plurality of multipliers, and in second mode, a plurality of input operands having a second data type are multiplied using the plurality of multipliers. Accordingly, multiplier circuitry may process different input data types and share circuitry across the different modes. In some embodiments, in the first mode, products may be converted to a third data type, and in the second mode, multiple products may be concatenated. Values in the third data type, in the first mode, and concatenated values having the second data type, in the second mode, may be added across different multimodal multipliers to form a multiply-accumulator. In some embodiments, the plurality of multiply-accumulators may be configured in series.Type: GrantFiled: September 23, 2018Date of Patent: September 15, 2020Assignee: Groq, Inc.Inventors: Christopher Aaron Clark, Jonathan Ross
-
Patent number: 10725741Abstract: Embodiments of the present disclosure pertain to digital circuits with compressed carries. In one embodiment, an adder circuit generates a sum and carry. The carry is compressed to reduce the number of bits required to represent the carry. In one embodiment, a multiplier circuit generates output product values. The output product values may be summed to produce a sum and carry. The carry may be compressed. In other embodiments, a multiplier circuit receives an input sum and compressed carry. The compressed input carry is decompressed and added to output product values and the input sum, and a resulting carry is compressed. The output of such a multiplier is another sum and compressed carry.Type: GrantFiled: September 21, 2018Date of Patent: July 28, 2020Assignee: Groq, Inc.Inventors: Christopher Aaron Clark, Jonathan Ross
-
Publication number: 20200233663Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.Type: ApplicationFiled: April 8, 2020Publication date: July 23, 2020Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps