Patents by Inventor Jonathan Alexander Ross
Jonathan Alexander Ross has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240370302
Abstract: Embodiments are directed to a deterministic streaming system with a scheduler, a compiler, and a plurality of deterministic streaming processors. The scheduler evaluates a latency for each task of a plurality of tasks to be run at the deterministic streaming system, and adjusts at least one of an accuracy metric and a quality metric for an output of each task based on the evaluated latency until the plurality of tasks can be completed before expiration of contractual deadlines. At least a subset of the plurality of deterministic streaming processors runs the plurality of tasks, each having the output with the adjusted accuracy metric and/or the adjusted quality metric. The compiler performs partial compilation of at least one model into an intermediate representation before requiring more information from the scheduler on how to finish the compilation. The scheduler generates the information for the compiler during a static capacity planning process.
Type: Application
Filed: August 29, 2022
Publication date: November 7, 2024
Inventors: Evan Daniel Patrick, Jonathan Alexander Ross
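Below is a minimal sketch, in Python, of the kind of latency-driven accuracy/quality adjustment loop the abstract describes; the task fields, deadline model, step sizes, and the assumption that lower accuracy proportionally shortens a task are illustrative, not the patented method.

```python
# Hypothetical sketch of latency-driven accuracy/quality adjustment during
# static capacity planning; field names and step sizes are illustrative only.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_estimate_ms: float   # evaluated latency for this task
    deadline_ms: float           # contractual deadline
    accuracy: float              # fraction of full-precision accuracy
    quality: float

def plan_capacity(tasks, accuracy_step=0.05, min_accuracy=0.5, speedup_per_step=0.9):
    """Lower accuracy/quality in small steps until every task fits its deadline."""
    for task in tasks:
        while task.latency_estimate_ms > task.deadline_ms and task.accuracy > min_accuracy:
            task.accuracy -= accuracy_step
            task.quality -= accuracy_step
            # Assume reduced accuracy shortens the task proportionally.
            task.latency_estimate_ms *= speedup_per_step
    return tasks

tasks = [Task("resnet-infer", latency_estimate_ms=12.0, deadline_ms=10.0,
              accuracy=1.0, quality=1.0)]
for t in plan_capacity(tasks):
    print(t.name, round(t.latency_estimate_ms, 2), round(t.accuracy, 2))
```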
-
Publication number: 20240176737
Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Type: Application
Filed: December 22, 2023
Publication date: May 30, 2024
Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
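The following toy model, assuming a one-cycle-per-hop latency and an illustrative schedule, shows why no metadata is needed: a tile can pair operands with the correct instruction purely from the cycle at which the data arrives.

```python
# Toy timing model of a functional-slice processor: operands stream across one
# dimension, instructions across the orthogonal one, and a tile pairs them
# purely by arrival cycle (no metadata travels with the data).
# The 1-cycle-per-hop latency and the schedule format are assumptions.

def schedule(num_tiles, instructions):
    """Return {(tile, cycle): op}, assuming data and its instruction are issued
    so that both reach tile t exactly t cycles after issue."""
    table = {}
    for issue_cycle, op in enumerate(instructions):
        for tile in range(num_tiles):
            arrival = issue_cycle + tile          # one cycle per hop in both dimensions
            table[(tile, arrival)] = op
    return table

table = schedule(num_tiles=4, instructions=["mul", "add"])
# Tile 2 sees the operands of the first instruction at cycle 2 and therefore
# applies "mul" without any tag on the data itself.
print(table[(2, 2)], table[(2, 3)])   # -> mul add
```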
-
Publication number: 20240144044
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Application
Filed: January 5, 2024
Publication date: May 2, 2024
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
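A hedged sketch of the compile/package/recompile flow the abstract outlines; the DAG encoding, the stubbed compiler, and the JSON-based binary layout are assumptions for illustration only.

```python
# Illustrative sketch of compiling a model into a DAG, packaging it with its
# runtime constraints, and recompiling for a second processor from that package.
import json

def build_dag(model_ops):
    """Model -> dependency DAG as {op: [deps]}."""
    return {op: deps for op, deps in model_ops}

def compile_for(processor, dag, constraints):
    """Stand-in for a real compiler: emit one pseudo-instruction per DAG node."""
    order = list(dag)  # assumes model_ops was supplied in topological order
    return [f"{processor}:{op}" for op in order]

def package(instructions, dag, constraints):
    """Bundle instructions, constraints, and the DAG so a later recompile
    can start from the DAG instead of the original model."""
    return json.dumps({"instructions": instructions,
                       "constraints": constraints,
                       "dag": dag}).encode()

model_ops = [("load", []), ("matmul", ["load"]), ("relu", ["matmul"])]
constraints = {"max_latency_ms": 5}

dag = build_dag(model_ops)
first_binary = package(compile_for("tsp_v1", dag, constraints), dag, constraints)

# Recompile for a second processor using only what the first binary carries.
unpacked = json.loads(first_binary)
second_binary = package(compile_for("tsp_v2", unpacked["dag"], unpacked["constraints"]),
                        unpacked["dag"], unpacked["constraints"])
```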
-
Publication number: 20240126832
Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values is generated for each single-dimensional vector of the kernel, the block including all rotations of that vector. For each additional dimension of the kernel, blocks of the immediately preceding dimension are grouped into sets of blocks, each set including blocks of the immediately preceding dimension that are aligned along a vector parallel to the axis of the dimension; and, for the additional dimension, one or more blocks of values are generated, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
Type: Application
Filed: October 6, 2023
Publication date: April 18, 2024
Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
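As a concrete illustration of the first-dimension step, the sketch below builds a square block containing every rotation of a one-dimensional kernel vector; higher dimensions repeat the same idea over whole blocks. The use of numpy and the helper name are assumptions.

```python
# Minimal sketch of the first-dimension step: a square block whose rows are
# all rotations of a 1-D kernel vector (a circulant-style block).
import numpy as np

def rotations_block(vec):
    """Return a len(vec) x len(vec) block whose i-th row is vec rotated by i."""
    vec = np.asarray(vec)
    return np.stack([np.roll(vec, i) for i in range(len(vec))])

print(rotations_block([1, 2, 3]))
# [[1 2 3]
#  [3 1 2]
#  [2 3 1]]
```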
-
Publication number: 20240107066
Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving uncompressed data, such as an image, a video, audio, and/or structured data, a machine learning model identifies an object in the uncompressed data, such as a house, a dog, text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.
Type: Application
Filed: November 27, 2023
Publication date: March 28, 2024
Inventor: Jonathan Alexander Ross
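A hypothetical outline of the pipeline the abstract describes, with a stubbed detector and zlib standing in for both the object-optimized and the standard compression treatments.

```python
# Object-aware compression sketch: detect an object, compress it with its own
# treatment, remove it from the data, then compress the remainder normally.
# The detector and codecs are placeholders, not the patented implementation.
import zlib

def detect_objects(data: bytes):
    """Stand-in for a machine-learning detector; returns (start, end, kind) spans."""
    return [(0, 4, "text")]   # pretend the first 4 bytes are a recognized object

def compress_object(obj: bytes, kind: str) -> bytes:
    # A real system would pick a treatment tuned to the object kind.
    return zlib.compress(obj, 9)

def compress(data: bytes) -> list:
    parts = []
    remaining = bytearray(data)
    for start, end, kind in detect_objects(data):
        parts.append(("object", kind, compress_object(bytes(remaining[start:end]), kind)))
        del remaining[start:end]          # remove the object before the standard pass
    parts.append(("residual", None, zlib.compress(bytes(remaining))))  # standard treatment
    return parts

print(compress(b"DOG!the rest of the image bytes..."))
```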
-
Publication number: 20240037064
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Application
Filed: October 9, 2023
Publication date: February 1, 2024
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
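One way to picture the per-slice instruction queues is the toy dispatcher below; the slice names, opcodes, and queue discipline are illustrative assumptions, not the processor's actual instruction set.

```python
# Toy model of per-slice instruction queues holding slice-specific opcodes.
from collections import deque

SLICE_OPCODES = {
    "MEM": {"read", "write"},     # memory slice
    "VXM": {"vadd", "vmul"},      # vector slice
    "MXM": {"matmul"},            # matrix slice
}

queues = {name: deque() for name in SLICE_OPCODES}

def dispatch(slice_name, opcode, operands):
    """Append an instruction to its slice's queue, rejecting foreign opcodes."""
    if opcode not in SLICE_OPCODES[slice_name]:
        raise ValueError(f"{opcode} is not a {slice_name} operation")
    queues[slice_name].append((opcode, operands))

dispatch("MEM", "read", ("addr", 0x40))
dispatch("VXM", "vadd", ("s0", "s1"))
print({k: list(v) for k, v in queues.items() if v})
```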
-
Publication number: 20240020536
Abstract: A processor architecture and model exploration system for deep learning is provided. A method of improving performance of a processor system and associated software includes selecting a set of performance parameter targets for a processor architecture having a set of functional units and an AI model. The method also includes evaluating performance of the processor architecture and the AI model and adjusting at least one of the functional units of the processor architecture to form a new processor architecture prior to iteratively evaluating the combination of the new processor architecture and the AI model. Further, the method includes repeating the evaluating step and the adjustment step until the performance evaluation of the processor architecture and AI model meets the set of performance parameter targets.
Type: Application
Filed: July 14, 2023
Publication date: January 18, 2024
Inventors: Andrew Chaang Ling, Jonathan Alexander Ross, Andrew Esper Bitar, Aidan Robert Byron Wood, Baorui Zhou
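A minimal sketch of the evaluate-and-adjust loop, assuming a stand-in cost model and a single "add one matrix unit" adjustment move; the real exploration system is not limited to this.

```python
# Hedged sketch of the evaluate/adjust iteration: measure, tweak one functional
# unit, and repeat until the performance target is met.
def evaluate(arch, model):
    """Stand-in performance model: more matrix units -> higher throughput."""
    return arch["matrix_units"] * model["utilization"] * 100  # ops/s, illustrative

def explore(arch, model, target_throughput, max_iters=10):
    for _ in range(max_iters):
        if evaluate(arch, model) >= target_throughput:
            return arch
        arch = {**arch, "matrix_units": arch["matrix_units"] + 1}  # adjust one unit
    return arch

arch = {"matrix_units": 1, "vector_units": 2}
model = {"utilization": 0.8}
print(explore(arch, model, target_throughput=400))
```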
-
Publication number: 20240020537
Abstract: A system and method of generating an efficient neural network model architecture and an efficient processor for deep learning in an artificial intelligence (AI) processor are provided. The system and method create the processor architecture as a companion to the neural network model by composing a plurality of processor architectures to enable architectural exploration. The compilation can be implemented for any arbitrary spatial processor architecture using either ASIC or FPGA devices. The processor architecture can be uniquely defined for a selected ML or AI model without having to update the software compiler.
Type: Application
Filed: July 14, 2023
Publication date: January 18, 2024
Inventors: Andrew Chaang Ling, Aidan Robert Byron Wood, Baorui Zhou, Andrew Esper Bitar, Jonathan Alexander Ross
-
Patent number: 11875874
Abstract: A memory structure having 2^m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2^(m-1) read ports. The three memory structures include two structures each providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2^m ports is connected to a respective port of each of the 2^(m-1)-port data structures, such that each port of the pair can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing a pair of ports to concurrently access any of the stored data entries in parallel.
Type: Grant
Filed: August 9, 2021
Date of Patent: January 16, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
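The sketch below illustrates the idea for two read ports, assuming the "difference" data is a bitwise XOR of corresponding entries (the patent may use another reversible combination): one port reads an entry directly while the other reconstructs it from the opposite half plus the difference bank.

```python
# Two single-port banks hold the two halves of the entries; a third bank holds
# their element-wise XOR. Two ports can then read any two entries in the same
# cycle: if both want the same half, one reads it directly and the other
# reconstructs it from the other half and the difference bank.
class DualReadMemory:
    def __init__(self, entries):
        assert len(entries) % 2 == 0
        h = len(entries) // 2
        self.lo, self.hi = entries[:h], entries[h:]
        self.diff = [a ^ b for a, b in zip(self.lo, self.hi)]  # difference bank
        self.h = h

    def read_direct(self, i):
        return self.lo[i] if i < self.h else self.hi[i - self.h]

    def read_reconstructed(self, i):
        """Recover entry i without touching the bank that holds it directly."""
        if i < self.h:
            return self.hi[i] ^ self.diff[i]
        return self.lo[i - self.h] ^ self.diff[i - self.h]

mem = DualReadMemory([3, 7, 11, 13])
# Port 0 and port 1 both want entries from the low half in the same cycle:
print(mem.read_direct(0), mem.read_reconstructed(1))   # -> 3 7
```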
-
Patent number: 11868908
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: December 16, 2022
Date of Patent: January 9, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11868250
Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Type: Grant
Filed: January 24, 2022
Date of Patent: January 9, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
-
Patent number: 11856226
Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving uncompressed data, such as an image, a video, audio, and/or structured data, a machine learning model identifies an object in the uncompressed data, such as a house, a dog, text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.
Type: Grant
Filed: March 8, 2021
Date of Patent: December 26, 2023
Assignee: Groq, Inc.
Inventor: Jonathan Alexander Ross
-
Patent number: 11822510
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: March 1, 2022
Date of Patent: November 21, 2023
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Publication number: 20230359584
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Application
Filed: July 13, 2023
Publication date: November 9, 2023
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11809514
Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values is generated for each single-dimensional vector of the kernel, the block including all rotations of that vector. For each additional dimension of the kernel, blocks of the immediately preceding dimension are grouped into sets of blocks, each set including blocks of the immediately preceding dimension that are aligned along a vector parallel to the axis of the dimension; and, for the additional dimension, one or more blocks of values are generated, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
Type: Grant
Filed: November 4, 2021
Date of Patent: November 7, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
-
Publication number: 20230333900
Abstract: Improved placement of workload requests in a hosted compute resource uses a ‘friendly’ cuckoo hash algorithm to assign each workload request to an appropriately configured compute resource. When a first workload request is received, the workload is assigned to the compute resource module that has been pre-configured to execute that workload. Subsequent requests for a similar workload are either assigned to a second pre-configured compute resource or queued behind the first workload request.
Type: Application
Filed: April 13, 2023
Publication date: October 19, 2023
Inventor: Jonathan Alexander Ross
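A speculative sketch of cuckoo-style placement in the spirit of the abstract: each workload hashes to two candidate modules, overflows to the alternate if the primary is busy, and queues behind the primary rather than evicting it; this reading of ‘friendly’ is an assumption, as are the hash functions and module count.

```python
# Cuckoo-style workload placement sketch with a "queue instead of evict" policy.
import hashlib
from collections import deque

NUM_MODULES = 8
modules = {i: None for i in range(NUM_MODULES)}        # module -> running workload
queues = {i: deque() for i in range(NUM_MODULES)}      # per-module wait queue

def candidates(workload: str):
    """Two distinct candidate modules derived from a hash of the workload."""
    h = hashlib.sha256(workload.encode()).digest()
    primary = h[0] % NUM_MODULES
    alternate = (primary + 1 + h[1] % (NUM_MODULES - 1)) % NUM_MODULES  # never equals primary
    return primary, alternate

def place(workload: str):
    primary, alternate = candidates(workload)
    for slot in (primary, alternate):
        if modules[slot] is None:
            modules[slot] = workload
            return f"run on module {slot}"
    queues[primary].append(workload)
    return f"queued behind module {primary}"

print(place("llm-infer"))   # first request gets its pre-configured module
print(place("llm-infer"))   # similar workload lands on the alternate module
print(place("llm-infer"))   # both busy -> queued, no eviction
```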
-
Patent number: 11645226
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: March 17, 2022
Date of Patent: May 9, 2023
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Publication number: 20230121986
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Application
Filed: December 16, 2022
Publication date: April 20, 2023
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11625619
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: November 22, 2021
Date of Patent: April 11, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory Michael Thorson
-
Patent number: 11625618
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: November 17, 2021
Date of Patent: April 11, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson