Patents Assigned to GROQ, INC.
-
Patent number: 12660683
Abstract: Embodiments are directed to an integrated circuit with one or more tile structures. The integrated circuit can include a first die and a second die connected to the first die forming a tile structure. The first die is shifted relative to the second die by a first shift amount along a first dimension and by a second shift amount along a second dimension orthogonal to the first dimension. The integrated circuit can further include an array of tile structures. Each tile structure in the array includes a first die, and a second die connected to the first die in a face-to-face configuration. The first die is shifted relative to the second die by a first shift amount along a first dimension and by a second shift amount along a second dimension orthogonal to the first dimension forming an offset alignment between the first die and the second die.
Type: Grant
Filed: February 23, 2022
Date of Patent: June 16, 2026
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Dinesh Maheshwari
-
Patent number: 12561279
Abstract: Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
Type: Grant
Filed: June 3, 2024
Date of Patent: February 24, 2026
Assignee: Groq, Inc.
Inventor: Dennis Charles Abts
-
Patent number: 12547459
Abstract: Embodiments are directed to data transformation algorithms performed at a deterministic streaming processor. Blocks of input data are streamed from a memory of the processor via a superlane of the processor to a first functional slice of the processor. The first functional slice permutes each block of input data, and each permuted block is streamed back to the memory. Permuted blocks of input data are then streamed from the memory via the superlane to a second functional slice of the processor. The second functional slice aligns portions of each permuted block to lanes within the superlane. Aligned portions of permuted blocks are then streamed to a third functional slice of the processor. The third functional slice merges the aligned portions of the permuted blocks to generate result data in a transformation domain suitable for at least one convolutional layer of the ResNet-50 model.
Type: Grant
Filed: January 27, 2022
Date of Patent: February 10, 2026
Assignee: Groq, Inc.
Inventors: Matthew Boyd, Sahil Parmar, Dennis Charles Abts
-
Patent number: 12475363
Abstract: A visualizer receives a compiled program to be run on a tensor streaming processor, which indicates a predetermined timing at which each functional unit of the processor receives instructions for processing data, and generates a visualization model used to display a schedule comprising elements corresponding to instructions received by each functional unit of a data path of the processor, arranged based upon a time at which each instruction is executed by its respective functional unit in accordance with the generated model. Due to the deterministic nature of the tensor streaming processor, the visualizer infers the flow of data across communication lanes of the processor, and predicts the location of data within the processor for a given cycle during execution of the compiled program, without the need to actually execute the compiled program or to implement breakpoints within the program at specific cycles.
Type: Grant
Filed: November 7, 2022
Date of Patent: November 18, 2025
Assignee: Groq, Inc.
Inventor: Mark Wong-VanHaren
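As a rough illustration of the idea in this abstract (not the patented method), the sketch below builds a per-unit timeline from a schedule whose issue cycles are known ahead of time, and predicts where a value sits on a lane without running anything. The unit names, the schedule contents, and the fixed one-hop-per-cycle movement are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical compiled schedule: (functional_unit, issue_cycle, instruction).
SCHEDULE = [
    ("MEM0", 0, "read"),
    ("VXM", 2, "add"),
    ("MXM", 3, "matmul"),
    ("MEM1", 5, "write"),
]


def build_timeline(schedule):
    """Group instructions by functional unit, keyed by issue cycle."""
    timeline = defaultdict(dict)
    for unit, cycle, instr in schedule:
        timeline[unit][cycle] = instr
    return timeline


def render(timeline, num_cycles):
    """Render one text row per functional unit and one column per cycle."""
    rows = []
    for unit in sorted(timeline):
        cells = [timeline[unit].get(c, ".").ljust(7) for c in range(num_cycles)]
        rows.append(f"{unit:<5}|" + "".join(cells))
    return "\n".join(rows)


def position_at(start_position, start_cycle, cycle, hops_per_cycle=1):
    """Predict a value's lane position at a given cycle.

    Assumes the value advances a fixed number of hops every cycle, which is
    what makes the prediction possible without executing the program.
    """
    if cycle < start_cycle:
        raise ValueError("value has not been issued yet at this cycle")
    return start_position + (cycle - start_cycle) * hops_per_cycle


if __name__ == "__main__":
    print(render(build_timeline(SCHEDULE), num_cycles=6))
```

Because every issue cycle is fixed at compile time in this model, the timeline and the predicted positions are pure functions of the schedule.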
-
Patent number: 12443216
Abstract: Clock period synthesis for fine-grain power management is provided. Methods are described for enabling clock waveform synthesis for, in some embodiments, tensor or graphical processors that enable shorter runtime latency, higher computational job throughput, more efficient power management, and a lower implementation cost than alternative clock waveform methods. This Abstract and the independent Claims are concise signifiers of embodiments of the claimed inventions. The Abstract does not limit the scope of the claimed inventions.
Type: Grant
Filed: May 24, 2023
Date of Patent: October 14, 2025
Assignee: Groq, Inc.
Inventor: James David Sproch
-
Patent number: 12411762
Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Type: Grant
Filed: December 22, 2023
Date of Patent: September 9, 2025
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
-
Patent number: 12373107
Abstract: When reading and writing DRAM (dynamic random-access memory), the latency and bandwidth are often unpredictable with large variations. One reason is that all the DRAM memory banks require periodic refreshes and maintenance cycles that interrupt these accesses. DRAM refresh and maintenance cycles are synchronized with the read/write accesses in a mutually exclusive manner, preventing the accesses from being interfered with by a refresh or maintenance cycle and resulting in predictable latency and bandwidth performance during read/write operations.
Type: Grant
Filed: December 13, 2023
Date of Patent: July 29, 2025
Assignee: Groq, Inc.
Inventors: Albert Cheng, Michael Bye, Rahul Shah
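The mutual exclusion described in this abstract can be pictured with a small scheduling model. The following is a minimal sketch under simplifying assumptions, not the patented mechanism: refresh occupies the first `duration` cycles of every `period`, all accesses have the same fixed length, and the function names and parameters are invented for the example.

```python
def refresh_windows(period, duration, horizon):
    """Return [start, end) cycle ranges reserved for refresh, one per period."""
    return [(t, t + duration) for t in range(0, horizon, period)]


def overlaps(a, b):
    """True if two half-open [start, end) ranges share at least one cycle."""
    return a[0] < b[1] and b[0] < a[1]


def schedule_accesses(num_accesses, access_len, period, duration, horizon):
    """Place fixed-length accesses back to back, skipping refresh windows.

    Every access lands entirely between two refresh windows, so no access is
    ever interrupted and each one completes in exactly access_len cycles.
    """
    if access_len > period - duration:
        raise ValueError("access does not fit between refresh windows")
    slots = []
    t = 0
    while len(slots) < num_accesses and t + access_len <= horizon:
        offset = t % period
        if offset < duration:
            # Inside a refresh window: move to the end of that window.
            t += duration - offset
            continue
        if offset + access_len > period:
            # Would run into the next refresh window: resume after it.
            t += (period - offset) + duration
            continue
        slots.append((t, t + access_len))
        t += access_len
    return slots


if __name__ == "__main__":
    print(schedule_accesses(5, access_len=3, period=10, duration=2, horizon=30))
```

Since the refresh windows are known in advance, the start cycle of every access is fixed when the schedule is built, which is the source of the predictable latency in this toy model.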
-
Patent number: 12373018
Abstract: In one embodiment, the present disclosure includes a method of reducing power in an artificial intelligence processor. For each cycle, over a plurality of cycles, an AI model is translated into operations executable on an artificial intelligence processor. The translating is based on power parameters that correspond to power consumption and performance of the artificial intelligence processor. The AI processor is configured with the executable operations, and input activation data sets are processed. Accordingly, result sets, power consumption data, and performance data are generated and stored over the plurality of cycles. The method further includes training an AI algorithm using the stored parameters, the power consumption data, and the performance data. A trained AI algorithm outputs a plurality of optimized parameters to reduce power consumption of the AI processor. The AI model is then translated into optimized executable operations based on the plurality of optimized parameters.
Type: Grant
Filed: January 18, 2024
Date of Patent: July 29, 2025
Assignee: Groq, Inc.
Inventor: Sushma Honnavara-Prasad
-
Patent number: 12340300
Abstract: Improved placement of memory and functional modules, ‘tiles’, within a tiled processor architecture is disclosed for linear algebra calculations involving vectors and matrices comprising large amounts of data. The improved placement places the data in close proximity to the functional modules performing calculations using the data. These modules enable these calculations to be performed more quickly while using less energy. These modules, in particular, improve the efficiency of the training and application of deep learning and artificial neural network systems. This Abstract and the independent Claims are concise signifiers of embodiments of the claimed inventions. The Abstract does not limit the scope of the claimed inventions.
Type: Grant
Filed: March 16, 2021
Date of Patent: June 24, 2025
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross
-
Patent number: 12287695
Abstract: One or more embodiments of a regulator circuit for providing power to a load device having a first power demand profile over time. The regulator circuit comprises a regulator and an energy storage device coupled to the regulator and the load device. The regulator circuit is configured to scavenge provided energy that is available beyond the first power demand profile. Further, the regulator circuit is configured to store that energy in the energy storage device, and the energy storage device is configured to augment deliverable peak power to the load device when the load device requires more power than is provided by the regulator circuit.
Type: Grant
Filed: April 15, 2024
Date of Patent: April 29, 2025
Assignee: Groq, Inc.
Inventors: James David Sproch, Dinesh Maheshwari
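The scavenge-then-augment behavior in this abstract can be sketched as a simple energy balance. The model below is an assumption-laden illustration, not the circuit in the patent: time advances in unit steps (so power and energy per step share a number), the regulator supplies a constant power, and storage is lossless with a hard capacity. The function name and parameters are hypothetical.

```python
def simulate(demand, regulator_power, capacity, stored=0.0):
    """Step through a demand profile; return (delivered, stored_history).

    demand          -- load power requested at each unit time step
    regulator_power -- constant power the regulator can supply (assumed)
    capacity        -- maximum energy the storage device can hold (assumed)
    """
    delivered, history = [], []
    for d in demand:
        if d <= regulator_power:
            # Energy available beyond the demand is scavenged into storage.
            stored = min(capacity, stored + (regulator_power - d))
            delivered.append(d)
        else:
            # Storage augments the regulator during a demand peak.
            boost = min(stored, d - regulator_power)
            stored -= boost
            delivered.append(regulator_power + boost)
        history.append(stored)
    return delivered, history


if __name__ == "__main__":
    delivered, history = simulate([2, 2, 8, 8, 3], regulator_power=5, capacity=4)
    print(delivered)
    print(history)
```

In this toy run the first peak is fully covered by stored energy, while the second peak is only partly covered because the storage has been drained.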
-
Patent number: 12277444
Abstract: A system contains a network of processors arranged in a plurality of nodes. Each node comprises a respective plurality of processors connected via local links, and different nodes are connected via global links. The processors of the network communicate with each other to establish a global counter for the network, enabling deterministic communication between the processors of the network. A compiler is configured to explicitly schedule communication traffic across the global and local links of the network of processors based upon the deterministic links between the processors, which enable software-scheduled networking with explicit send or receive instructions executed by functional units of the processors at specific times, to establish a specific ordering of operations performed by the network of processors. In some embodiments, the processors of the network of processors are tensor streaming processors (TSPs).
Type: Grant
Filed: November 23, 2022
Date of Patent: April 15, 2025
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Ross, Garrin Kimmell, Michael Bye, Matthew Boyd, Andrew Ling
-
Patent number: 12271339
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: October 9, 2023
Date of Patent: April 8, 2025
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 12260118
Abstract: A deterministic apparatus comprising a deterministic near-compute memory communicatively coupled with and proximate to a deterministic processor. The deterministic near-compute memory comprises a plurality of data banks having a global memory address space, a control bus, a data input bus and a data output bus for each data bank. The deterministic processor is configured to initiate, via the control bus, retrieval of a set of data from the plurality of data banks. The retrieved set of data comprises at least one row of a selected one of the data banks passed via the data output bus onto a plurality of stream registers of the deterministic processor.
Type: Grant
Filed: December 12, 2022
Date of Patent: March 25, 2025
Assignee: Groq, Inc.
Inventor: Dinesh Maheshwari
-
Patent number: 12248357
Abstract: Embodiments pertain to reducing power consumption in a computing system comprising one or more deterministic processors. A controller generates a plurality of control signals for a voltage regulator to regulate a supply voltage of a respective one of the one or more deterministic processors. A power management module determines an initial profile for power consumption and performance of an algorithm executed on the respective deterministic processor having an initial value for the supply voltage and an initial value for a clock frequency. The power management module further determines a target profile for power consumption and performance of the algorithm executed on the respective deterministic processor. The controller modifies the plurality of control signals based on the initial profile and the target profile. The respective deterministic processor executes the algorithm while the supply voltage is dynamically modified by the voltage regulator based on the modified plurality of control signals.
Type: Grant
Filed: September 20, 2021
Date of Patent: March 11, 2025
Assignee: GROQ, Inc.
Inventors: Omar Ahmad, Geert Rosseel
-
Patent number: 12222894
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: July 13, 2023
Date of Patent: February 11, 2025
Assignee: GROQ, INC.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 12223436
Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: January 5, 2024
Date of Patent: February 11, 2025
Assignee: GROQ, INC.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
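A toy version of the compile, package, and recompile flow in this abstract might look like the sketch below. It is an illustration only: the "binary" is a plain dictionary, "instructions" are strings in dependency order, the `max_ops` constraint and target names are invented, and recompilation here simply reuses the DAG and constraints carried in the first package, which is one reading of the abstract rather than a confirmed detail. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def build_dag(model):
    """Turn a model description into a DAG mapping each op to its dependencies."""
    return {op: set(deps) for op, deps in model.items()}


def compile_model(dag, constraints, target):
    """Emit one placeholder instruction per op in dependency order."""
    order = list(TopologicalSorter(dag).static_order())
    # Hypothetical runtime constraint: an upper bound on the number of ops.
    if len(order) > constraints.get("max_ops", len(order)):
        raise ValueError("model exceeds the max_ops runtime constraint")
    return [f"{target}:{op}" for op in order]


def package(instructions, constraints, dag):
    """Bundle instructions with the constraints and DAG needed to recompile."""
    return {"instructions": instructions, "constraints": constraints, "dag": dag}


def recompile(binary, target):
    """Recompile for another target using only what the package carries."""
    instructions = compile_model(binary["dag"], binary["constraints"], target)
    return package(instructions, binary["constraints"], binary["dag"])


if __name__ == "__main__":
    model = {"conv": [], "relu": ["conv"], "fc": ["relu"]}
    constraints = {"max_ops": 8}
    dag = build_dag(model)
    first = package(compile_model(dag, constraints, "chipA"), constraints, dag)
    second = recompile(first, "chipB")
    print(first["instructions"])
    print(second["instructions"])
```

Keeping the DAG and constraints inside the package is the design point worth noticing: the second compilation needs no access to the original model description.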
-
Patent number: 12210642
Abstract: Embodiments are directed to a computing system with permission control via data redundancy. The computing system includes a memory and a permission control circuit coupled to the memory. The permission control circuit encodes a first data vector by using a bit position register with a first permission control code for a first user, writes the encoded first data vector into the memory, and updates content of the bit position register from the first permission control code to a second permission control code for a second user. The encoded first data vector written into the memory is inaccessible for the second user based on the updated content of the bit position register.
Type: Grant
Filed: September 27, 2022
Date of Patent: January 28, 2025
Assignee: GROQ, INC.
Inventors: Zefu Dai, John Thompson
-
Patent number: 12175287
Abstract: A processor comprises a computational array of computational elements and an instruction dispatch circuit. The computational elements receive data operands via data lanes extending along a first dimension, and process the operands based upon instructions received from the instruction dispatch circuit via instruction lanes extending along a second dimension. The instruction dispatch circuit receives raw instructions, and comprises an instruction dispatch unit (IDU) processor that processes a set of raw instructions to generate processed instructions for dispatch to the computational elements, where the number of processed instructions is not equal to the number of instructions of the set of raw instructions.
Type: Grant
Filed: December 20, 2023
Date of Patent: December 24, 2024
Assignee: GROQ, INC.
Inventors: Brian Lee Kurtz, Dinesh Maheshwari, James David Sproch
-
Patent number: 12001383
Abstract: Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
Type: Grant
Filed: July 6, 2022
Date of Patent: June 4, 2024
Assignee: Groq, Inc.
Inventor: Dennis Charles Abts
-
Patent number: 11960346
Abstract: One or more embodiments of a regulator circuit for providing power to a load device having a first power demand profile over time. The regulator circuit comprises a regulator and an energy storage device coupled to the regulator and the load device. The regulator circuit is configured to scavenge provided energy that is available beyond the first power demand profile. Further, the regulator circuit is configured to store that energy in the energy storage device, and the energy storage device is configured to augment deliverable peak power to the load device when the load device requires more power than is provided by the regulator circuit.
Type: Grant
Filed: October 8, 2020
Date of Patent: April 16, 2024
Assignee: GROQ, INC.
Inventors: James David Sproch, Dinesh Maheshwari