Patents by Inventor Martin-Thomas Grymel

Martin-Thomas Grymel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods, apparatus, and articles of manufacture to increase data reuse for multiply and accumulate (MAC) operations

Patent number: 12169643

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.

Type: Grant

Filed: September 12, 2023

Date of Patent: December 17, 2024

Assignee: Intel Corporation

Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick

METHODS AND APPARATUS FOR SPARSE TENSOR STORAGE FOR NEURAL NETWORK ACCELERATORS

Publication number: 20240134786

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.

Type: Application

Filed: December 14, 2023

Publication date: April 25, 2024

Applicant: Intel Corporation

Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
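
The two compression steps named in the abstract above can be illustrated with a minimal Python sketch. It is not the patented circuitry: the function names, the bit polarity of the sparsity map (1 for a non-zero data point), the fixed `element_size`, and the `offsets` list are all assumptions made for illustration.

```python
from typing import List, Tuple


def compress_tensor(values: List[int], element_size: int) -> Tuple[List[int], List[int], List[int]]:
    """Return (sparsity_map, packed_data, offsets) for a flattened tensor.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    # Sparsity map: one bit per data point, 1 = non-zero (polarity is an assumption).
    sparsity_map = [1 if v != 0 else 0 for v in values]

    packed: List[int] = []   # compressed storage elements, stored back to back
    offsets: List[int] = []  # where each compressed storage element starts in `packed`

    # Divide the tensor into fixed-size storage elements.
    for start in range(0, len(values), element_size):
        offsets.append(len(packed))
        element = values[start:start + element_size]
        bits = sparsity_map[start:start + element_size]
        # First compression: drop the zero points flagged by the sparsity map.
        # Second compression: appending to one list keeps elements contiguous.
        packed.extend(v for v, b in zip(element, bits) if b)

    return sparsity_map, packed, offsets


def decompress_tensor(sparsity_map: List[int], packed: List[int]) -> List[int]:
    """Rebuild the dense tensor by reinserting zeros where the map has a 0 bit."""
    it = iter(packed)
    return [next(it) if b else 0 for b in sparsity_map]
```

For example, `compress_tensor([0, 5, 0, 0, 7, 8, 0, 0], 4)` yields the map `[0, 1, 0, 0, 1, 1, 0, 0]`, the packed data `[5, 7, 8]`, and the offsets `[0, 1]`; `decompress_tensor` restores the original eight values.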

SYSTEMS, APPARATUS, AND METHODS TO DEBUG ACCELERATOR HARDWARE

Publication number: 20240118992

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator such as a neural network accelerator for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model, compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data such as the data input, data output and the breakpoint for debugging the hardware accelerator.

Type: Application

Filed: October 16, 2023

Publication date: April 11, 2024

Applicant: Intel Corporation

Inventors: Martin-Thomas Grymel, David Bernard, Martin Power, Niall Hanrahan, Kevin Brady

Methods and apparatus for sparse tensor storage for neural network accelerators

Patent number: 11940907

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.

Type: Grant

Filed: June 25, 2021

Date of Patent: March 26, 2024

Assignee: Intel Corporation

Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick

METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO INCREASE DATA REUSE FOR MULTIPLY AND ACCUMULATE (MAC) OPERATIONS

Publication number: 20240036763

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.

Type: Application

Filed: September 12, 2023

Publication date: February 1, 2024

Applicant: Intel Corporation

Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick

Systems, apparatus, and methods to debug accelerator hardware

Patent number: 11829279

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator such as a neural network accelerator for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model, compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data such as the data input, data output and the breakpoint for debugging the hardware accelerator.

Type: Grant

Filed: September 23, 2021

Date of Patent: November 28, 2023

Assignee: Intel Corporation

Inventors: Martin-Thomas Grymel, David Bernard, Martin Power, Niall Hanrahan, Kevin Brady

Methods, apparatus, and articles of manufacture to increase data reuse for multiply and accumulate (MAC) operations

Patent number: 11789646

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.

Type: Grant

Filed: September 24, 2021

Date of Patent: October 17, 2023

Assignee: Intel Corporation

Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick

HALO TRANSFER FOR CONVOLUTION WORKLOAD PARTITION

Publication number: 20230116629

Abstract: A DNN accelerator includes multiple compute tiles for sharing a workload of running a convolution. A halo pipeline in a compute tile can facilitate replications of halo data from the compute tile where the halo data is generated into another compute tile. The halo pipeline may receive a memory transaction for writing a data block. The halo pipeline may determine that the data block falls into a halo region in an input tensor of the convolution. The halo pipeline may generate a remote address for storing the data block in a memory of the other compute tile, e.g., based on a local address of the data block in a memory of the compute tile. The halo pipeline may adjust the remote address, e.g., based on a difference in dimensions of a tensor to be used by the compute tile and a tensor to be used by the other compute tile.

Type: Application

Filed: October 13, 2022

Publication date: April 13, 2023

Applicant: Intel Corporation

Inventors: Martin-Thomas Grymel, David Thomas Bernard, Niall Hanrahan

DEEP NEURAL NETWORK (DNN) ACCELERATOR FACILITATING ACTIVATION COMPRESSION

Publication number: 20230072082

Abstract: A system includes a first memory, a compiler, and a DNN accelerator. The DNN accelerator includes a DMA engine, an acceleration module, and a compute block. The compute block includes a second memory. The compiler may generate a task for transferring activations from the second memory to the first memory. The DMA engine may receive the task and read the activations from the second memory. The acceleration module may compress the activations to generate compressed activation data and write the compressed activation data into the external memory. The acceleration module may also store a size of the compressed activation data in the local memory, which may be used by the DMA engine to read the activation from the first memory to the second memory later. The compressed activation data may include non-zero activations and sparsity bitmaps. The compressed activation data may also include a header or zeropoint marker.

Type: Application

Filed: October 28, 2022

Publication date: March 9, 2023

Inventors: Sudheendra Kadri, Andrea Deidda, Hassan Kamal, Martin-Thomas Grymel, Alfonso Tarazona Martinez, David Thomas Bernard

SPARSITY PROCESSING ON UNPACKED DATA

Publication number: 20230018857

Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on the position of the non-zero valued bit. The non-zero valued activation and the non-zero valued weight may be provided to a PE for performing MAC operations.

Type: Application

Filed: September 19, 2022

Publication date: January 19, 2023

Inventors: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
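
The combined sparsity vector described in the abstract above can be sketched in a few lines of Python. The abstract does not say how the two vectors are combined; this sketch assumes a bitwise AND, so that a bit is set only where both the activation and the weight are non-zero. It also assumes unpacked (dense) storage in which a bit position doubles as the element address. The function name `sparse_mac` is hypothetical.

```python
from typing import List, Tuple


def sparse_mac(activations: List[int], weights: List[int]) -> Tuple[List[int], int]:
    """Return (combined_sparsity_vector, accumulated_result).

    Illustrative sketch only: the AND combination and the
    position-equals-address mapping are assumptions.
    """
    if len(activations) != len(weights):
        raise ValueError("activation and weight contexts must have the same length")

    # One bit per element: 1 marks a non-zero value.
    activation_bits = [int(a != 0) for a in activations]
    weight_bits = [int(w != 0) for w in weights]

    # Combined vector: set only where both operands are non-zero (assumed AND).
    combined = [a & w for a, w in zip(activation_bits, weight_bits)]

    accumulator = 0
    for position, bit in enumerate(combined):
        if bit:
            # Data is unpacked, so the bit position is used directly as the address.
            accumulator += activations[position] * weights[position]

    return combined, accumulator
```

With activations `[3, 0, 2, 0, 4]` and weights `[0, 5, 6, 0, 1]`, the combined vector is `[0, 0, 1, 0, 1]` and only two multiplications are performed, giving 16.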

WRITE COMBINE BUFFER (WCB) FOR DEEP NEURAL NETWORK (DNN) ACCELERATOR

Publication number: 20230020929

Abstract: A compute tile includes a WCB that receives a workload of writing an output tensor of a convolution into a local memory of the compute tile. The local memory may be a SRAM. The WCB receives write transactions. A write transaction includes a data block, which is a part of the output tensor, and metadata describing one or more attributes of the data block. The WCB may store write transactions in its internal buffers. The WCB may determine whether to combine two write transactions, e.g., based on an operation mode or metadata in the write transactions. In embodiments where the WCB determines to combine the two write transactions, the WCB may combine the two write transactions into a new write transaction and write the new write transaction into the local memory or an internal memory of the WCB. The total number of write transactions for the workload can be reduced.

Type: Application

Filed: September 16, 2022

Publication date: January 19, 2023

Inventors: Martin-Thomas Grymel, David Thomas Bernard, Martin Power, Niall Hanrahan, Kevin Brady
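
The write-combining idea in the abstract above can be illustrated with a short Python sketch. The abstract says the decision to combine may depend on an operation mode or on metadata; this sketch substitutes a simpler, assumed rule: two write transactions are merged when the second starts exactly where the first ends and the merged block fits within `max_bytes`. The `WriteTransaction` class and `combine_writes` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteTransaction:
    address: int  # start address in local memory (assumed byte-addressed)
    data: bytes   # data block, a part of the output tensor


def combine_writes(transactions: List[WriteTransaction], max_bytes: int) -> List[WriteTransaction]:
    """Merge address-contiguous writes, in arrival order, up to max_bytes each.

    Illustrative rule only; the combining criteria are an assumption.
    """
    combined: List[WriteTransaction] = []
    for txn in transactions:
        last = combined[-1] if combined else None
        contiguous = last is not None and last.address + len(last.data) == txn.address
        if contiguous and len(last.data) + len(txn.data) <= max_bytes:
            # Replace the buffered write with one larger write.
            combined[-1] = WriteTransaction(last.address, last.data + txn.data)
        else:
            combined.append(WriteTransaction(txn.address, txn.data))
    return combined
```

Three 4-byte writes at addresses 0, 4, and 16 with `max_bytes=32` come out as two transactions: one 8-byte write at address 0 and one 4-byte write at address 16.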

METHODS AND APPARATUS FOR PERFORMING A MACHINE LEARNING OPERATION USING STORAGE ELEMENT POINTERS

Publication number: 20220108135

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for performing a machine learning operation using storage element pointers. An example computer readable medium comprises instructions that when executed, cause at least one processor to select, in response to a determination that a machine learning operation is to be performed, create first and second storage element pointers based on a type of machine learning operation to be performed, remap input tensor data of the input tensor based on the first storage element pointer without movement of the input tensor data in memory, cause execution of the machine learning operation with the remapped input tensor data to create intermediate tensor data, remap the intermediate tensor data based on the second storage element pointer without movement of the intermediate tensor data in memory, and provide the remapped intermediate tensor data as an output tensor.

Type: Application

Filed: December 17, 2021

Publication date: April 7, 2022

Inventors: Kevin Brady, Martin Power, Martin-Thomas Grymel, Alessandro Palla, David Bernard, Niall Hanrahan

SYSTEMS, APPARATUS, AND METHODS TO DEBUG ACCELERATOR HARDWARE

Publication number: 20220012164

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator such as a neural network accelerator for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model, compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data such as the data input, data output and the breakpoint for debugging the hardware accelerator.

Type: Application

Filed: September 23, 2021

Publication date: January 13, 2022

Inventors: Martin-Thomas Grymel, David Bernard, Martin Power, Niall Hanrahan, Kevin Brady

METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO INCREASE DATA REUSE FOR MULTIPLY AND ACCUMULATE (MAC) OPERATIONS

Publication number: 20220012058

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick

METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO INCREASE UTILIZATION OF NEURAL NETWORK (NN) ACCELERATOR CIRCUITRY FOR SHALLOW LAYERS OF AN NN BY REFORMATTING ONE OR MORE TENSORS

Publication number: 20220012578

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase utilization of neural network (NN) accelerator circuitry for shallow layers of an NN by reformatting one or more tensors. An example apparatus includes parameter determining circuitry to determine a width of a weight kernel and to determine a depth of a first tensor. The example apparatus also includes storage control circuitry to, starting at a first XY location of the first tensor, copy one or more Z values, up to the depth of the first tensor, of consecutive XY locations that overlap the width of the weight kernel and to load the one or more Z values consecutively in a first XY location of a second tensor.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Kevin Brady, Martin Power, Niall Hanrahan, Alessandro Palla, Martin-Thomas Grymel, David Bernard
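
The tensor reformatting described in the abstract above can be sketched for a single row of a tensor. This is a loose illustration, not the claimed apparatus: it assumes a stride of 1, no padding, and a row represented as a list of XY locations that each hold a list of Z values. The function name `reformat_row` is hypothetical.

```python
from typing import List


def reformat_row(row: List[List[int]], kernel_width: int) -> List[List[int]]:
    """Pack the Z values of kernel_width consecutive XY locations into one XY location.

    Assumes stride 1 and no padding (both are illustrative assumptions).
    """
    reformatted: List[List[int]] = []
    # Each output location covers the input locations the kernel overlaps along X.
    for x in range(len(row) - kernel_width + 1):
        packed: List[int] = []
        for dx in range(kernel_width):
            # Copy every Z value (up to the tensor depth) of the overlapped location.
            packed.extend(row[x + dx])
        reformatted.append(packed)
    return reformatted
```

A row of four locations with depth 3 and a kernel width of 3 becomes two locations with depth 9, so a shallow input presents a deeper Z dimension to the accelerator.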

METHODS AND APPARATUS FOR SPARSE TENSOR STORAGE FOR NEURAL NETWORK ACCELERATORS

Publication number: 20210406164

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.

Type: Application

Filed: June 25, 2021

Publication date: December 30, 2021

Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick

METHODS AND APPARATUS TO PERFORM MACHINE-LEARNING MODEL OPERATIONS ON SPARSE ACCELERATORS

Publication number: 20210319317

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to perform machine-learning model operations on sparse accelerators. An example apparatus includes first circuitry, second circuitry to generate sparsity data based on an acceleration operation, and third circuitry to instruct one or more data buffers to provide at least one of activation data or weight data based on the sparsity data to the first circuitry, the first circuitry to execute the acceleration operation based on the at least one of the activation data or the weight data.

Type: Application

Filed: June 24, 2021

Publication date: October 14, 2021

Inventors: Martin Power, Kevin Brady, Niall Hanrahan, Martin-Thomas Grymel, David Bernard, Gary Baugh