Patents by Inventor Gregory M. Thorson

Gregory M. Thorson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240144044
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Application
    Filed: January 5, 2024
    Publication date: May 2, 2024
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11875874
    Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2m-1 read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m-1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: January 16, 2024
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11868250
    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: January 9, 2024
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
  • Patent number: 11868908
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: December 16, 2022
    Date of Patent: January 9, 2024
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Publication number: 20230121986
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Application
    Filed: December 16, 2022
    Publication date: April 20, 2023
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11625618
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: April 11, 2023
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11568275
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: January 31, 2023
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11463272
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Grant
    Filed: October 6, 2021
    Date of Patent: October 4, 2022
    Assignee: NVIDIA Corporation
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
  • Patent number: 11336476
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Grant
    Filed: July 24, 2020
    Date of Patent: May 17, 2022
    Assignee: NVIDIA Corporation
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
  • Publication number: 20220101896
    Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2? read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m-1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.
    Type: Application
    Filed: August 9, 2021
    Publication date: March 31, 2022
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11263129
    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: March 1, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
  • Patent number: 11243880
    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: February 8, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Dennis Charles Abts, John Thompson, Gregory M. Thorson
  • Publication number: 20220029845
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Application
    Filed: October 6, 2021
    Publication date: January 27, 2022
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
  • Patent number: 11216734
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: January 4, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11210594
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: December 28, 2021
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11170307
    Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: November 9, 2021
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Patent number: 11171798
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Grant
    Filed: July 24, 2020
    Date of Patent: November 9, 2021
    Assignee: NVIDIA Corporation
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
  • Patent number: 11114138
    Abstract: A memory structure having 2m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2m?1 read ports. The three memory structures include two structures providing access to half of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2m ports is connected to a respective port of each of the 2m?1-port data structures, such that each port of the part can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing for a pair of ports to concurrently access any of the stored data entries in parallel.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: September 7, 2021
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Gregory M. Thorson
  • Publication number: 20210036877
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Application
    Filed: July 24, 2020
    Publication date: February 4, 2021
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
  • Publication number: 20210037107
    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
    Type: Application
    Filed: July 24, 2020
    Publication date: February 4, 2021
    Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson