Patents by Inventor Jonathan Alexander Ross
Jonathan Alexander Ross has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240370302
Abstract: Embodiments are directed to a deterministic streaming system with a scheduler, a compiler, and a plurality of deterministic streaming processors. The scheduler evaluates a latency for each task of a plurality of tasks to be run at the deterministic streaming system, and adjusts at least one of an accuracy metric and a quality metric for an output of each task based on the evaluated latency until the plurality of tasks can be completed before expiration of contractual deadlines. At least a subset of the plurality of deterministic streaming processors runs the plurality of tasks, each having the output with the adjusted accuracy metric and/or the adjusted quality metric. The compiler performs partial compilation of at least one model into an intermediate representation before requiring more information from the scheduler on how to finish the compilation. The scheduler generates the information for the compiler during a static capacity planning process.
Type: Application
Filed: August 29, 2022
Publication date: November 7, 2024
Inventors: Evan Daniel Patrick, Jonathan Alexander Ross
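Below is a minimal sketch, in Python, of the kind of latency-driven accuracy/quality adjustment loop the abstract describes; the task fields, deadline model, step sizes, and the assumption that lower accuracy proportionally shortens a task are illustrative, not the patented method.

```python
# Hypothetical sketch of latency-driven accuracy/quality adjustment during
# static capacity planning; field names and step sizes are illustrative only.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_estimate_ms: float   # evaluated latency for this task
    deadline_ms: float           # contractual deadline
    accuracy: float              # fraction of full-precision accuracy
    quality: float

def plan_capacity(tasks, accuracy_step=0.05, min_accuracy=0.5, speedup_per_step=0.9):
    """Lower accuracy/quality in small steps until every task fits its deadline."""
    for task in tasks:
        while task.latency_estimate_ms > task.deadline_ms and task.accuracy > min_accuracy:
            task.accuracy -= accuracy_step
            task.quality -= accuracy_step
            # Assume reduced accuracy shortens the task proportionally.
            task.latency_estimate_ms *= speedup_per_step
    return tasks

tasks = [Task("resnet-infer", latency_estimate_ms=12.0, deadline_ms=10.0,
              accuracy=1.0, quality=1.0)]
for t in plan_capacity(tasks):
    print(t.name, round(t.latency_estimate_ms, 2), round(t.accuracy, 2))
```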
-
Publication number: 20240176737
Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Type: Application
Filed: December 22, 2023
Publication date: May 30, 2024
Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
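The following toy model, assuming a one-cycle-per-hop latency and an illustrative schedule, shows why no metadata is needed: a tile can pair operands with the correct instruction purely from the cycle at which the data arrives.

```python
# Toy timing model of a functional-slice processor: operands stream across one
# dimension, instructions across the orthogonal one, and a tile pairs them
# purely by arrival cycle (no metadata travels with the data).
# The 1-cycle-per-hop latency and the schedule format are assumptions.

def schedule(num_tiles, instructions):
    """Return {(tile, cycle): op}, assuming data and its instruction are issued
    so that both reach tile t exactly t cycles after issue."""
    table = {}
    for issue_cycle, op in enumerate(instructions):
        for tile in range(num_tiles):
            arrival = issue_cycle + tile          # one cycle per hop in both dimensions
            table[(tile, arrival)] = op
    return table

table = schedule(num_tiles=4, instructions=["mul", "add"])
# Tile 2 sees the operands of the first instruction at cycle 2 and therefore
# applies "mul" without any tag on the data itself.
print(table[(2, 2)], table[(2, 3)])   # -> mul add
```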
-
Publication number: 20240144044
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Application
Filed: January 5, 2024
Publication date: May 2, 2024
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
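A hedged sketch of the compile/package/recompile flow the abstract outlines; the DAG encoding, the stubbed compiler, and the JSON-based binary layout are assumptions for illustration only.

```python
# Illustrative sketch of compiling a model into a DAG, packaging it with its
# runtime constraints, and recompiling for a second processor from that package.
import json

def build_dag(model_ops):
    """Model -> dependency DAG as {op: [deps]}."""
    return {op: deps for op, deps in model_ops}

def compile_for(processor, dag, constraints):
    """Stand-in for a real compiler: emit one pseudo-instruction per DAG node."""
    order = list(dag)  # assumes model_ops was supplied in topological order
    return [f"{processor}:{op}" for op in order]

def package(instructions, dag, constraints):
    """Bundle instructions, constraints, and the DAG so a later recompile
    can start from the DAG instead of the original model."""
    return json.dumps({"instructions": instructions,
                       "constraints": constraints,
                       "dag": dag}).encode()

model_ops = [("load", []), ("matmul", ["load"]), ("relu", ["matmul"])]
constraints = {"max_latency_ms": 5}

dag = build_dag(model_ops)
first_binary = package(compile_for("tsp_v1", dag, constraints), dag, constraints)

# Recompile for a second processor using only what the first binary carries.
unpacked = json.loads(first_binary)
second_binary = package(compile_for("tsp_v2", unpacked["dag"], unpacked["constraints"]),
                        unpacked["dag"], unpacked["constraints"])
```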
-
Publication number: 20240126832
Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values is generated for each single-dimensional vector of the kernel, the block including all rotations of that vector. For each additional dimension of the kernel, blocks of the immediately preceding dimension are grouped into sets of blocks, each set including blocks of the immediately preceding dimension that are aligned along a vector parallel to the axis of the dimension; and, for the additional dimension, one or more blocks of values are generated, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
Type: Application
Filed: October 6, 2023
Publication date: April 18, 2024
Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
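As a concrete illustration of the first-dimension step, the sketch below builds a square block containing every rotation of a one-dimensional kernel vector; higher dimensions repeat the same idea over whole blocks. The use of numpy and the helper name are assumptions.

```python
# Minimal sketch of the first-dimension step: a square block whose rows are
# all rotations of a 1-D kernel vector (a circulant-style block).
import numpy as np

def rotations_block(vec):
    """Return a len(vec) x len(vec) block whose i-th row is vec rotated by i."""
    vec = np.asarray(vec)
    return np.stack([np.roll(vec, i) for i in range(len(vec))])

print(rotations_block([1, 2, 3]))
# [[1 2 3]
#  [3 1 2]
#  [2 3 1]]
```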
-
Publication number: 20240107066
Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving uncompressed data, such as an image, a video, audio, and/or structured data, a machine learning model identifies an object in the uncompressed data, such as a house, a dog, text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.
Type: Application
Filed: November 27, 2023
Publication date: March 28, 2024
Inventor: Jonathan Alexander Ross
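A hypothetical outline of the pipeline the abstract describes, with a stubbed detector and zlib standing in for both the object-optimized and the standard compression treatments.

```python
# Object-aware compression sketch: detect an object, compress it with its own
# treatment, remove it from the data, then compress the remainder normally.
# The detector and codecs are placeholders, not the patented implementation.
import zlib

def detect_objects(data: bytes):
    """Stand-in for a machine-learning detector; returns (start, end, kind) spans."""
    return [(0, 4, "text")]   # pretend the first 4 bytes are a recognized object

def compress_object(obj: bytes, kind: str) -> bytes:
    # A real system would pick a treatment tuned to the object kind.
    return zlib.compress(obj, 9)

def compress(data: bytes) -> list:
    parts = []
    remaining = bytearray(data)
    for start, end, kind in detect_objects(data):
        parts.append(("object", kind, compress_object(bytes(remaining[start:end]), kind)))
        del remaining[start:end]          # remove the object before the standard pass
    parts.append(("residual", None, zlib.compress(bytes(remaining))))  # standard treatment
    return parts

print(compress(b"DOG!the rest of the image bytes..."))
```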
-
Publication number: 20240037064
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Application
Filed: October 9, 2023
Publication date: February 1, 2024
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
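One way to picture the per-slice instruction queues is the toy dispatcher below; the slice names, opcodes, and queue discipline are illustrative assumptions, not the processor's actual instruction set.

```python
# Toy model of per-slice instruction queues holding slice-specific opcodes.
from collections import deque

SLICE_OPCODES = {
    "MEM": {"read", "write"},     # memory slice
    "VXM": {"vadd", "vmul"},      # vector slice
    "MXM": {"matmul"},            # matrix slice
}

queues = {name: deque() for name in SLICE_OPCODES}

def dispatch(slice_name, opcode, operands):
    """Append an instruction to its slice's queue, rejecting foreign opcodes."""
    if opcode not in SLICE_OPCODES[slice_name]:
        raise ValueError(f"{opcode} is not a {slice_name} operation")
    queues[slice_name].append((opcode, operands))

dispatch("MEM", "read", ("addr", 0x40))
dispatch("VXM", "vadd", ("s0", "s1"))
print({k: list(v) for k, v in queues.items() if v})
```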
-
Publication number: 20240020536
Abstract: A processor architecture and model exploration system for deep learning is provided. A method of improving performance of a processor system and associated software includes selecting a set of performance parameter targets for a processor architecture having a set of functional units and an AI model. The method also includes evaluating performance of the processor architecture and the AI model and adjusting at least one of the functional units of the processor architecture to form a new processor architecture prior to iteratively evaluating the combination of the new processor architecture and the AI model. Further, the method includes repeating the evaluating step and the adjustment step until the performance evaluation of the processor architecture and AI model meets the set of performance parameter targets.
Type: Application
Filed: July 14, 2023
Publication date: January 18, 2024
Inventors: Andrew Chaang Ling, Jonathan Alexander Ross, Andrew Esper Bitar, Aidan Robert Byron Wood, Baorui Zhou
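A minimal sketch of the evaluate-and-adjust loop, assuming a stand-in cost model and a single "add one matrix unit" adjustment move; the real exploration system is not limited to this.

```python
# Hedged sketch of the evaluate/adjust iteration: measure, tweak one functional
# unit, and repeat until the performance target is met.
def evaluate(arch, model):
    """Stand-in performance model: more matrix units -> higher throughput."""
    return arch["matrix_units"] * model["utilization"] * 100  # ops/s, illustrative

def explore(arch, model, target_throughput, max_iters=10):
    for _ in range(max_iters):
        if evaluate(arch, model) >= target_throughput:
            return arch
        arch = {**arch, "matrix_units": arch["matrix_units"] + 1}  # adjust one unit
    return arch

arch = {"matrix_units": 1, "vector_units": 2}
model = {"utilization": 0.8}
print(explore(arch, model, target_throughput=400))
```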
-
Publication number: 20240020537
Abstract: A system and method of generating an efficient neural network model architecture and an efficient processor for deep learning in an artificial intelligence (AI) processor are provided. The system and method create the processor architecture as a companion to the neural network model by composing a plurality of processor architectures to enable architectural exploration. The compilation can be implemented for any arbitrary spatial processor architecture using either ASIC or FPGA devices. The processor architecture can be uniquely defined for a selected ML or AI model without having to update the software compiler.
Type: Application
Filed: July 14, 2023
Publication date: January 18, 2024
Inventors: Andrew Chaang Ling, Aidan Robert Byron Wood, Baorui Zhou, Andrew Esper Bitar, Jonathan Alexander Ross
-
Patent number: 11875874
Abstract: A memory structure having 2^m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2^(m-1) read ports. The three memory structures include two structures each providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2^m ports is connected to a respective port of each of the 2^(m-1)-port data structures, such that each port of the pair can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing a pair of ports to concurrently access any of the stored data entries in parallel.
Type: Grant
Filed: August 9, 2021
Date of Patent: January 16, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
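The sketch below illustrates the idea for two read ports, assuming the "difference" data is a bitwise XOR of corresponding entries (the patent may use another reversible combination): one port reads an entry directly while the other reconstructs it from the opposite half plus the difference bank.

```python
# Two single-port banks hold the two halves of the entries; a third bank holds
# their element-wise XOR. Two ports can then read any two entries in the same
# cycle: if both want the same half, one reads it directly and the other
# reconstructs it from the other half and the difference bank.
class DualReadMemory:
    def __init__(self, entries):
        assert len(entries) % 2 == 0
        h = len(entries) // 2
        self.lo, self.hi = entries[:h], entries[h:]
        self.diff = [a ^ b for a, b in zip(self.lo, self.hi)]  # difference bank
        self.h = h

    def read_direct(self, i):
        return self.lo[i] if i < self.h else self.hi[i - self.h]

    def read_reconstructed(self, i):
        """Recover entry i without touching the bank that holds it directly."""
        if i < self.h:
            return self.hi[i] ^ self.diff[i]
        return self.lo[i - self.h] ^ self.diff[i - self.h]

mem = DualReadMemory([3, 7, 11, 13])
# Port 0 and port 1 both want entries from the low half in the same cycle:
print(mem.read_direct(0), mem.read_reconstructed(1))   # -> 3 7
```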
-
Patent number: 11868908
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: December 16, 2022
Date of Patent: January 9, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11868250
Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Type: Grant
Filed: January 24, 2022
Date of Patent: January 9, 2024
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
-
Patent number: 11856226
Abstract: Introduced here is a technique to create small compressed image files while preserving data quality upon decompression. Upon receiving uncompressed data, such as an image, a video, audio, and/or structured data, a machine learning model identifies an object in the uncompressed data, such as a house, a dog, text, a distinct audio signal, a unique data pattern, etc. The identified object is compressed using a compression treatment optimized for the identified object. The identified object, either before or after the compression, is removed from the uncompressed data. The uncompressed data with the identified object removed is compressed using a standard compression treatment.
Type: Grant
Filed: March 8, 2021
Date of Patent: December 26, 2023
Assignee: Groq, Inc.
Inventor: Jonathan Alexander Ross
-
Patent number: 11822510
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: March 1, 2022
Date of Patent: November 21, 2023
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Publication number: 20230359584
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Application
Filed: July 13, 2023
Publication date: November 9, 2023
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11809514
Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values is generated for each single-dimensional vector of the kernel, the block including all rotations of that vector. For each additional dimension of the kernel, blocks of the immediately preceding dimension are grouped into sets of blocks, each set including blocks of the immediately preceding dimension that are aligned along a vector parallel to the axis of the dimension; and, for the additional dimension, one or more blocks of values are generated, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
Type: Grant
Filed: November 4, 2021
Date of Patent: November 7, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
-
Publication number: 20230333900
Abstract: Improved placement of workload requests in a hosted compute resource uses a ‘friendly’ cuckoo hash algorithm to assign each workload request to an appropriately configured compute resource. When a first workload request is received, the workload is assigned to the compute resource module that has been pre-configured to execute that workload. Subsequent requests for a similar workload are either assigned to a second pre-configured compute resource or queued behind the first workload request.
Type: Application
Filed: April 13, 2023
Publication date: October 19, 2023
Inventor: Jonathan Alexander Ross
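A speculative sketch of cuckoo-style placement in the spirit of the abstract: each workload hashes to two candidate modules, overflows to the alternate if the primary is busy, and queues behind the primary rather than evicting it; this reading of ‘friendly’ is an assumption, as are the hash functions and module count.

```python
# Cuckoo-style workload placement sketch with a "queue instead of evict" policy.
import hashlib
from collections import deque

NUM_MODULES = 8
modules = {i: None for i in range(NUM_MODULES)}        # module -> running workload
queues = {i: deque() for i in range(NUM_MODULES)}      # per-module wait queue

def candidates(workload: str):
    """Two distinct candidate modules derived from a hash of the workload."""
    h = hashlib.sha256(workload.encode()).digest()
    primary = h[0] % NUM_MODULES
    alternate = (primary + 1 + h[1] % (NUM_MODULES - 1)) % NUM_MODULES  # never equals primary
    return primary, alternate

def place(workload: str):
    primary, alternate = candidates(workload)
    for slot in (primary, alternate):
        if modules[slot] is None:
            modules[slot] = workload
            return f"run on module {slot}"
    queues[primary].append(workload)
    return f"queued behind module {primary}"

print(place("llm-infer"))   # first request gets its pre-configured module
print(place("llm-infer"))   # similar workload lands on the alternate module
print(place("llm-infer"))   # both busy -> queued, no eviction
```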
-
Patent number: 11645226
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: March 17, 2022
Date of Patent: May 9, 2023
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Publication number: 20230121986
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Application
Filed: December 16, 2022
Publication date: April 20, 2023
Inventors: Jonathan Alexander Ross, Gregory M. Thorson
-
Patent number: 11625619
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: November 22, 2021
Date of Patent: April 11, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory Michael Thorson
-
Patent number: 11625618
Abstract: A system receives a predictive model and one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages the first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
Type: Grant
Filed: November 17, 2021
Date of Patent: April 11, 2023
Assignee: Groq, Inc.
Inventors: Jonathan Alexander Ross, Gregory M. Thorson