Patents Assigned to GROQ, INC.
-
Patent number: 11856226Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving an uncompressed data, such as an image, a video, an audio, and/or a structured data, a machine learning model identifies an object in the uncompressed data such as a house, a dog, a text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.Type: GrantFiled: March 8, 2021Date of Patent: December 26, 2023Assignee: Groq, Inc.Inventor: Jonathan Alexander Ross
-
Patent number: 11822510Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.Type: GrantFiled: March 1, 2022Date of Patent: November 21, 2023Assignee: Groq, Inc.Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11809514Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values for each single dimensional vector of the kernel that includes all rotations of that single dimensional vector is generated. For each additional dimension of the kernel, group blocks of an immediately preceding dimension into sets of blocks, each set of blocks including blocks of the immediately preceding dimension that are aligned along a vector that is parallel to the axis of the dimension; and generate, for the additional dimension, one or more blocks of values, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.Type: GrantFiled: November 4, 2021Date of Patent: November 7, 2023Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
-
Patent number: 11645226Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.Type: GrantFiled: March 17, 2022Date of Patent: May 9, 2023Assignee: Groq, Inc.Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11625618Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.Type: GrantFiled: November 17, 2021Date of Patent: April 11, 2023Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11625619Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.Type: GrantFiled: November 22, 2021Date of Patent: April 11, 2023Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Gregory Michael Thorson
-
Patent number: 11568275Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.Type: GrantFiled: July 30, 2019Date of Patent: January 31, 2023Assignee: GROQ, INC.Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11537687Abstract: A method comprises accessing a flattened input stream that includes a set of parallel vectors representing a set of input values of a kernel-sized tile of an input tensor that is to be convolved with a kernel. An expanded kernel is received that is generated by permuting values from the kernel. A control pattern is received that includes a set of vectors each corresponding to the output value position for the kernel-sized tile of the output and indicating a vector of the flattened input stream to access input values. The method further comprises generating, for each output position of each kernel-sized tile of the output, a dot product between a first vector that includes values of the flattened input stream as selected by the control pattern, and a second vector corresponding to a vector in the expanded kernel corresponding to the output position.Type: GrantFiled: November 18, 2019Date of Patent: December 27, 2022Assignee: GROQ, INC.Inventors: Jonathan Alexander Ross, Tom Hawkins, Gregory Michael Thorson, Matt Boyd
-
Patent number: 11474557Abstract: In one embodiment, the present disclosure includes multichip timing synchronization circuits and methods. In one embodiment, hardware counters in different systems are synchronized. Programs on the systems may include synchronization instructions. A second system executes synchronization instruction, and in response thereto, synchronizes a local software counter to a local hardware counter. The software counter on the second system may be delayed a fixed period of time corresponding to a program delay on the first system. The software counter on the second system may further be delayed by an offset to bring software counters on the two systems into sync.Type: GrantFiled: September 15, 2020Date of Patent: October 18, 2022Assignee: GROQ, INC.Inventors: Gregory Michael Thorson, Srivathsa Dhruvanarayan
-
Patent number: 11461433Abstract: Introduced here is a technique to detect and/or correct errors in computation. The ability to correct errors in computation can increase the speed of the processor, reduce the power consumption of the processor, and reduce the distance between the transistors within the processor because the errors thus generated can be detected and corrected. In one embodiment, an error correcting module, running either in software or in hardware, can detect an error in matrix multiplication, by calculating an expected sum of all elements in the resulting matrix, and an actual sum of all elements in the resulting matrix. When there is a difference between the expected sum and the resulting sum, the error correcting module detects an error. In another embodiment, in addition to detecting the error, the error correcting module can determine the location and the magnitude of the error, thus correcting the erroneous computation.Type: GrantFiled: January 10, 2018Date of Patent: October 4, 2022Assignee: GROQ, INC.Inventor: Jonathan Alexander Ross
-
Patent number: 11455370Abstract: A method comprises receiving an input tensor for convolution with a kernel, dividing the input tensor into one or more tiles with each tile having a size equal to the kernel, flattening the values in the one or more tiles into vectors to generate the flattened input stream.Type: GrantFiled: November 18, 2019Date of Patent: September 27, 2022Assignee: GROQ, INC.Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
-
Patent number: 11392535Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.Type: GrantFiled: November 25, 2020Date of Patent: July 19, 2022Assignee: GROQ, INC.Inventors: Jonathan Alexander Ross, Tom Hawkins, Dennis Charles Abts
-
Patent number: 11360934Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.Type: GrantFiled: November 27, 2020Date of Patent: June 14, 2022Assignee: GROQ, INC.Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11307827Abstract: Embodiments of the present disclosure pertain to switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span to adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.Type: GrantFiled: July 14, 2020Date of Patent: April 19, 2022Assignee: Groq, Inc.Inventor: Gregory Michael Thorson
-
Patent number: 11301546Abstract: A method comprises receiving one or more sizes for each of the dimensions of a kernel that is convolved with an input tensor to generate an output activation, generating a control pattern used to compute output values for the convolution of the input tensor, with the control pattern being a square matrix with each dimension being a size equal to the product of the width and the height of the kernel. The control pattern is generated by generating a value for each position of the control pattern that is based on a location of the position in the control pattern and the one or more sizes of each of the dimensions of the kernel, the value indicating a location from which to access values from a flattened input tensor for the convolution with the kernel.Type: GrantFiled: November 18, 2019Date of Patent: April 12, 2022Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
-
Patent number: 11301212Abstract: Embodiments of the present disclosure pertain to multimodal digital multiplier circuits and methods. In one embodiment, partial product outputs of digital multiplication circuits are selectively inverted based on a mode control signal. The mode control signal may be set based on a format of the operands input to the multiplier. Example embodiments of the disclosure may multiply combinations of signed and unsigned input operands using different modes.Type: GrantFiled: October 1, 2020Date of Patent: April 12, 2022Assignee: Groq, Inc.Inventor: Christopher Aaron Clark
-
Patent number: 11288595Abstract: The system presented here can create a new machine learning model by improving and combining existing machine learning models in a modular way. By combining existing machine learning models, the system can avoid the step of training a new machine model. Further, by combining existing machine models in a modular way, the system can selectively train only a module, i.e. a part, of the new machine learning model. Using the disclosed system, the expensive steps of gathering 8 TB of data and using the data to train the new machine learning model over 16,000 processors for three days can be entirely avoided, or can be reduced by a half, a third, etc. depending on the size of the module requiring training.Type: GrantFiled: February 13, 2018Date of Patent: March 29, 2022Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Douglas Wightman
-
Patent number: 11263129Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.Type: GrantFiled: July 30, 2019Date of Patent: March 1, 2022Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
-
Patent number: 11243880Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.Type: GrantFiled: September 14, 2018Date of Patent: February 8, 2022Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
-
Patent number: 11216734Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.Type: GrantFiled: July 30, 2019Date of Patent: January 4, 2022Assignee: Groq, Inc.Inventors: Jonathan Alexander Ross, Gregory M. Thorson