Patents Assigned to GROQ, INC.
  • Patent number: 11856226
    Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving an uncompressed data, such as an image, a video, an audio, and/or a structured data, a machine learning model identifies an object in the uncompressed data such as a house, a dog, a text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.
    Type: Grant
    Filed: March 8, 2021
    Date of Patent: December 26, 2023
    Assignee: Groq, Inc.
    Inventor: Jonathan Alexander Ross
  • Patent number: 11822510
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: March 1, 2022
    Date of Patent: November 21, 2023
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 11809514
    Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values for each single dimensional vector of the kernel that includes all rotations of that single dimensional vector is generated. For each additional dimension of the kernel, group blocks of an immediately preceding dimension into sets of blocks, each set of blocks including blocks of the immediately preceding dimension that are aligned along a vector that is parallel to the axis of the dimension; and generate, for the additional dimension, one or more blocks of values, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
    Type: Grant
    Filed: November 4, 2021
    Date of Patent: November 7, 2023
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
  • Patent number: 11645226
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: March 17, 2022
    Date of Patent: May 9, 2023
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 11625618
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: April 11, 2023
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11625619
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: April 11, 2023
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory Michael Thorson
  • Patent number: 11568275
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: January 31, 2023
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11537687
    Abstract: A method comprises accessing a flattened input stream that includes a set of parallel vectors representing a set of input values of a kernel-sized tile of an input tensor that is to be convolved with a kernel. An expanded kernel is received that is generated by permuting values from the kernel. A control pattern is received that includes a set of vectors each corresponding to the output value position for the kernel-sized tile of the output and indicating a vector of the flattened input stream to access input values. The method further comprises generating, for each output position of each kernel-sized tile of the output, a dot product between a first vector that includes values of the flattened input stream as selected by the control pattern, and a second vector corresponding to a vector in the expanded kernel corresponding to the output position.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: December 27, 2022
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Tom Hawkins, Gregory Michael Thorson, Matt Boyd
  • Patent number: 11474557
    Abstract: In one embodiment, the present disclosure includes multichip timing synchronization circuits and methods. In one embodiment, hardware counters in different systems are synchronized. Programs on the systems may include synchronization instructions. A second system executes synchronization instruction, and in response thereto, synchronizes a local software counter to a local hardware counter. The software counter on the second system may be delayed a fixed period of time corresponding to a program delay on the first system. The software counter on the second system may further be delayed by an offset to bring software counters on the two systems into sync.
    Type: Grant
    Filed: September 15, 2020
    Date of Patent: October 18, 2022
    Assignee: GROQ, INC.
    Inventors: Gregory Michael Thorson, Srivathsa Dhruvanarayan
  • Patent number: 11461433
    Abstract: Introduced here is a technique to detect and/or correct errors in computation. The ability to correct errors in computation can increase the speed of the processor, reduce the power consumption of the processor, and reduce the distance between the transistors within the processor because the errors thus generated can be detected and corrected. In one embodiment, an error correcting module, running either in software or in hardware, can detect an error in matrix multiplication, by calculating an expected sum of all elements in the resulting matrix, and an actual sum of all elements in the resulting matrix. When there is a difference between the expected sum and the resulting sum, the error correcting module detects an error. In another embodiment, in addition to detecting the error, the error correcting module can determine the location and the magnitude of the error, thus correcting the erroneous computation.
    Type: Grant
    Filed: January 10, 2018
    Date of Patent: October 4, 2022
    Assignee: GROQ, INC.
    Inventor: Jonathan Alexander Ross
  • Patent number: 11455370
    Abstract: A method comprises receiving an input tensor for convolution with a kernel, dividing the input tensor into one or more tiles with each tile having a size equal to the kernel, flattening the values in the one or more tiles into vectors to generate the flattened input stream.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: September 27, 2022
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
  • Patent number: 11392535
    Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: July 19, 2022
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Tom Hawkins, Dennis Charles Abts
  • Patent number: 11360934
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: November 27, 2020
    Date of Patent: June 14, 2022
    Assignee: GROQ, INC.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 11307827
    Abstract: Embodiments of the present disclosure pertain to switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span to adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: April 19, 2022
    Assignee: Groq, Inc.
    Inventor: Gregory Michael Thorson
  • Patent number: 11301546
    Abstract: A method comprises receiving one or more sizes for each of the dimensions of a kernel that is convolved with an input tensor to generate an output activation, generating a control pattern used to compute output values for the convolution of the input tensor, with the control pattern being a square matrix with each dimension being a size equal to the product of the width and the height of the kernel. The control pattern is generated by generating a value for each position of the control pattern that is based on a location of the position in the control pattern and the one or more sizes of each of the dimensions of the kernel, the value indicating a location from which to access values from a flattened input tensor for the convolution with the kernel.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: April 12, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
  • Patent number: 11301212
    Abstract: Embodiments of the present disclosure pertain to multimodal digital multiplier circuits and methods. In one embodiment, partial product outputs of digital multiplication circuits are selectively inverted based on a mode control signal. The mode control signal may be set based on a format of the operands input to the multiplier. Example embodiments of the disclosure may multiply combinations of signed and unsigned input operands using different modes.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: April 12, 2022
    Assignee: Groq, Inc.
    Inventor: Christopher Aaron Clark
  • Patent number: 11288595
    Abstract: The system presented here can create a new machine learning model by improving and combining existing machine learning models in a modular way. By combining existing machine learning models, the system can avoid the step of training a new machine model. Further, by combining existing machine models in a modular way, the system can selectively train only a module, i.e. a part, of the new machine learning model. Using the disclosed system, the expensive steps of gathering 8 TB of data and using the data to train the new machine learning model over 16,000 processors for three days can be entirely avoided, or can be reduced by a half, a third, etc. depending on the size of the module requiring training.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: March 29, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Douglas Wightman
  • Patent number: 11263129
    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: March 1, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
  • Patent number: 11243880
    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: February 8, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
  • Patent number: 11216734
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: January 4, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson