Patents by Inventor Gregory M. Thorson

Gregory M. Thorson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PROCESSOR COMPILER FOR SCHEDULING INSTRUCTIONS TO REDUCE EXECUTION DELAY DUE TO DEPENDENCIES

Publication number: 20240144044

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Application

Filed: January 5, 2024

Publication date: May 2, 2024

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Data structures with multiple read ports

Patent number: 11875874

Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2m-1 read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m-1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.

Type: Grant

Filed: August 9, 2021

Date of Patent: January 16, 2024

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Memory design for a processor

Patent number: 11868250

Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.

Type: Grant

Filed: January 24, 2022

Date of Patent: January 9, 2024

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
Processor compiler for scheduling instructions to reduce execution delay due to dependencies

Patent number: 11868908

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: December 16, 2022

Date of Patent: January 9, 2024

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
PROCESSOR COMPILER FOR SCHEDULING INSTRUCTIONS TO REDUCE EXECUTION DELAY DUE TO DEPENDENCIES

Publication number: 20230121986

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Application

Filed: December 16, 2022

Publication date: April 20, 2023

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Processor compiler

Patent number: 11625618

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: November 17, 2021

Date of Patent: April 11, 2023

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Processor compiler

Patent number: 11568275

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: July 30, 2019

Date of Patent: January 31, 2023

Assignee: GROQ, INC.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Scalable in-network computation for massively-parallel shared-memory processors

Patent number: 11463272

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Grant

Filed: October 6, 2021

Date of Patent: October 4, 2022

Assignee: NVIDIA Corporation

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
Scalable in-network computation for massively-parallel shared-memory processors

Patent number: 11336476

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Grant

Filed: July 24, 2020

Date of Patent: May 17, 2022

Assignee: NVIDIA Corporation

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
DATA STRUCTURES WITH MULTIPLE READ PORTS

Publication number: 20220101896

Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2? read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m-1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.

Type: Application

Filed: August 9, 2021

Publication date: March 31, 2022

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Processor architecture

Patent number: 11263129

Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.

Type: Grant

Filed: July 30, 2019

Date of Patent: March 1, 2022

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
Processor architecture

Patent number: 11243880

Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.

Type: Grant

Filed: September 14, 2018

Date of Patent: February 8, 2022

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

Publication number: 20220029845

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Application

Filed: October 6, 2021

Publication date: January 27, 2022

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
Processor compiler

Patent number: 11216734

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: July 30, 2019

Date of Patent: January 4, 2022

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Processor compiler

Patent number: 11210594

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: July 30, 2019

Date of Patent: December 28, 2021

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Predictive model compiler for generating a statically scheduled binary with known resource constraints

Patent number: 11170307

Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.

Type: Grant

Filed: September 14, 2018

Date of Patent: November 9, 2021

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
Scalable in-network computation for massively-parallel shared-memory processors

Patent number: 11171798

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Grant

Filed: July 24, 2020

Date of Patent: November 9, 2021

Assignee: NVIDIA Corporation

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
Data structures with multiple read ports

Patent number: 11114138

Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2m?1 read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m?1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.

Type: Grant

Filed: September 14, 2018

Date of Patent: September 7, 2021

Assignee: Groq, Inc.

Inventors: Jonathan Alexander Ross, Gregory M. Thorson
SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

Publication number: 20210036877

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Application

Filed: July 24, 2020

Publication date: February 4, 2021

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

Publication number: 20210037107

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Application

Filed: July 24, 2020

Publication date: February 4, 2021

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson

1 2 3 next