Patents by Inventor Niall Hanrahan

Niall Hanrahan has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134786
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero; static storage controlling circuitry to divide the tensor into one or more storage elements; and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: December 14, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
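
The two-stage compression in the abstract above can be illustrated with a short software sketch (the patent claims hardware circuitry; the function and variable names below are hypothetical):

```python
import numpy as np

def compress_tensor(tensor: np.ndarray, num_storage_elements: int):
    """Sketch of the two-stage sparse compression described above."""
    # Sparsity map: one bit per data point (1 = non-zero).
    sparsity_map = (tensor != 0).astype(np.uint8)

    # Divide the flattened tensor into storage elements.
    elements = np.array_split(tensor.ravel(), num_storage_elements)
    bitmaps = np.array_split(sparsity_map.ravel(), num_storage_elements)

    # First compression: drop zero points from each storage element.
    compressed = [elem[bm.astype(bool)] for elem, bm in zip(elements, bitmaps)]

    # Second compression: pack the compressed elements contiguously,
    # recording each element's start offset so it stays addressable.
    offsets = np.cumsum([0] + [len(c) for c in compressed[:-1]])
    packed = np.concatenate(compressed)
    return packed, sparsity_map, offsets

# Example: a mostly-zero 4x4 tensor packed into 2 storage elements.
t = np.array([[0, 3, 0, 0], [1, 0, 0, 0], [0, 0, 2, 0], [0, 5, 0, 0]])
packed, smap, offs = compress_tensor(t, 2)
print(packed, offs)  # [3 1 2 5] [0 2]
```

The offsets array is what keeps each compressed storage element individually addressable after the contiguous packing.
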
  • Publication number: 20240118992
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator, such as a neural network accelerator, for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model and to compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data such as the data input, the data output, and the breakpoint for debugging the hardware accelerator.
    Type: Application
    Filed: October 16, 2023
    Publication date: April 11, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Martin Power, Niall Hanrahan, Kevin Brady
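
As a loose software analogue of the debug flow described above (illustrative only; the real debug circuitry operates on compiled accelerator code, and every name here is made up):

```python
class DebugCore:
    """Software analogue of the debug circuitry described above: it runs
    per-layer executable code and halts at a breakpoint, emitting the
    data input, data output, and breakpoint identity for inspection."""

    def __init__(self, layers, breakpoint_layer):
        self.layers = layers              # callables standing in for executable code
        self.breakpoint_layer = breakpoint_layer

    def run(self, data_input):
        x = data_input
        for index, layer in enumerate(self.layers):
            x = layer(x)
            if index == self.breakpoint_layer:   # breakpoint triggers
                # Stop execution and output the debug state.
                return {"halted_at": index, "data_input": data_input, "data_output": x}
        return {"halted_at": None, "data_output": x}

core = DebugCore([lambda v: v * 2, lambda v: v + 1, lambda v: v ** 2], breakpoint_layer=1)
print(core.run(3))  # {'halted_at': 1, 'data_input': 3, 'data_output': 7}
```
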
  • Patent number: 11940907
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero; static storage controlling circuitry to divide the tensor into one or more storage elements; and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Publication number: 20240036763
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 12, 2023
    Publication date: February 1, 2024
    Applicant: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
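
A minimal sketch of the buffer-reuse policy described in this entry, with contexts modelled as plain lists (names hypothetical):

```python
def mac_with_reuse(activation_contexts, weight_contexts):
    """Sketch of the reuse policy described above: the current
    first-type context is held in its buffer while the pointer into the
    second-type buffer iterates, so each first-type context is fetched
    once rather than once per second-type context."""
    results = []
    for a in activation_contexts:          # first-type context stays resident
        for w in weight_contexts:          # second-type pointer iterates
            results.append(sum(x * y for x, y in zip(a, w)))  # MAC
    return results

acts = [[1, 2], [3, 4]]
wts = [[5, 6], [7, 8]]
print(mac_with_reuse(acts, wts))  # [17, 23, 39, 53]
```
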
  • Publication number: 20240028895
    Abstract: A load module in a deep neural network (DNN) accelerator may receive a configuration parameter indicating a selection between an activation sparsity mode and a weight sparsity mode. The load module may read a sparse activation tensor, an activation sparsity bitmap, a sparse weight tensor, and a weight sparsity bitmap from a memory. The load module may densify one of the compressed tensors based on the sparsity mode and leave the other compressed tensor as is. The load module may load the dense tensor and the sparse tensor to a sparse cell. The sparse cell includes a sparsity module that may select one or more elements of the dense tensor based on the sparsity bitmap of the sparse tensor. The sparse cell also includes multiply-accumulate (MAC) units that perform MAC operations on the selected elements and the sparse tensor. MAC operations on unselected elements of the dense tensor are skipped.
    Type: Application
    Filed: September 28, 2023
    Publication date: January 25, 2024
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
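
The mode-controlled densification can be sketched as follows, assuming bitmaps stored as 0/1 arrays (all names hypothetical):

```python
import numpy as np

def densify(values: np.ndarray, bitmap: np.ndarray) -> np.ndarray:
    """Expand a compressed (zero-stripped) vector back to dense form
    using its sparsity bitmap, as the load module does for whichever
    tensor the sparsity mode selects."""
    dense = np.zeros(bitmap.shape, dtype=values.dtype)
    dense[bitmap.astype(bool)] = values
    return dense

def load(sparse_acts, act_bitmap, sparse_wts, wt_bitmap, mode):
    # The mode selects which operand is densified; the other stays compressed.
    if mode == "activation_sparsity":
        return densify(sparse_wts, wt_bitmap), sparse_acts, act_bitmap
    else:  # weight sparsity mode
        return densify(sparse_acts, act_bitmap), sparse_wts, wt_bitmap

dense, sparse, bm = load(np.array([4, 7]), np.array([0, 1, 0, 1]),
                         np.array([2, 9]), np.array([1, 0, 0, 1]),
                         mode="activation_sparsity")
print(dense, sparse, bm)  # [2 0 0 9] [4 7] [0 1 0 1]
```

The bitmap kept with the still-sparse tensor is what lets the sparse cell select elements of the dense tensor and skip MAC operations on the rest.
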
  • Patent number: 11829279
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator, such as a neural network accelerator, for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model and to compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data such as the data input, the data output, and the breakpoint for debugging the hardware accelerator.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: November 28, 2023
    Assignee: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Martin Power, Niall Hanrahan, Kevin Brady
  • Patent number: 11789646
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: October 17, 2023
    Assignee: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Publication number: 20230229910
    Abstract: A compute block includes a DMA engine that reads data from an external memory and writes the data into a local memory of the compute block. A MAC array in the compute block may use the data to perform convolutions. The external memory may store weights of one or more filters in a memory layout that comprises a sequence of sections for each filter. Each section may correspond to a channel of the filter and may store all the weights in the channel. The DMA engine may convert the memory layout to a different memory layout, which includes a sequence of new sections for each filter. Each new section may include a weight vector that includes a sequence of weights, each of which is from a different channel. The DMA engine may also compress the weights, e.g., by removing zero valued weights, before the conversion of the memory layout.
    Type: Application
    Filed: October 3, 2022
    Publication date: July 20, 2023
    Applicant: Intel Corporation
    Inventors: Kevin Brady, Sudheendra Kadri, Niall Hanrahan
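
A minimal sketch of the layout conversion described above, treating the weights as a (filters, channels, weights-per-channel) array (hypothetical names; the zero-weight compression step is omitted):

```python
import numpy as np

def convert_layout(weights: np.ndarray) -> np.ndarray:
    """Sketch of the layout conversion described above. Input layout:
    one section per channel of each filter, holding that channel's
    weights contiguously: shape (filters, channels, weights_per_channel).
    Output layout: for each filter, a sequence of weight vectors, each
    vector taking one weight from every channel."""
    filters, channels, per_channel = weights.shape
    # Swapping the channel and position axes turns each output row
    # into a cross-channel weight vector.
    return weights.transpose(0, 2, 1).reshape(filters, per_channel, channels)

w = np.arange(12).reshape(1, 3, 4)   # 1 filter, 3 channels, 4 weights each
print(convert_layout(w)[0])
# [[ 0  4  8]
#  [ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]]
```
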
  • Patent number: 11675629
    Abstract: Methods, apparatus, systems and articles of manufacture to store and access multi-dimensional data are disclosed. An example apparatus includes a memory; a memory allocator to allocate part of the memory for storage of a multi-dimensional data object; and a storage element organizer to: separate the multi-dimensional data into storage elements; store the storage elements in the memory, the stored storage elements being selectively executable; store starting memory address locations for the storage elements in an array in the memory, the array to facilitate selectable access of data of the stored elements; store a pointer for the array into the memory.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: June 13, 2023
    Assignee: Movidius Limited
    Inventors: Fergal Connor, David Bernard, Niall Hanrahan, Derek Harnett
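
A rough sketch of the storage scheme in this abstract, modelling memory as a bytearray (all names hypothetical):

```python
import numpy as np

def store_storage_elements(tensor: np.ndarray, memory: bytearray, base: int):
    """Sketch of the storage scheme described above: the tensor is split
    into storage elements, each element is written into memory, and the
    starting address of each element is recorded in an array so that
    individual elements can be accessed selectively."""
    addresses = []
    offset = base
    for element in tensor:                      # one storage element per row
        raw = element.tobytes()
        memory[offset:offset + len(raw)] = raw
        addresses.append(offset)                # record element start address
        offset += len(raw)
    return addresses                            # a pointer to this array is also stored

mem = bytearray(1024)
t = np.arange(6, dtype=np.int32).reshape(2, 3)
addrs = store_storage_elements(t, mem, base=64)
print(addrs)  # [64, 76] -- each 3-element int32 row occupies 12 bytes
```
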
  • Patent number: 11656845
    Abstract: Methods, apparatus, systems and articles of manufacture to perform dot product calculations using sparse vectors are disclosed. An example apparatus includes means for generating a mask vector based on a first logic operation on a difference vector and an inverse of a control vector, the control vector based on a first bitmap of a first sparse vector and a second bitmap of a second sparse vector; means for generating a first product of a third value from the first sparse vector and a fourth value from the second sparse vector, the third value based on (i) the mask vector and (ii) a first sparsity map corresponding to the first sparse vector, the fourth value corresponding to (i) the mask vector and (ii) a second sparsity map corresponding to the second sparse vector; and means for adding the first product to a second product of a previous iteration.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: May 23, 2023
    Assignee: Movidius Limited
    Inventors: Fergal Connor, David Bernard, Niall Hanrahan
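
A simplified software rendering of the bitmap-driven dot product described above; the patent's difference/control/mask vector machinery is reduced here to an AND of the two bitmaps plus popcount indexing (names hypothetical):

```python
def sparse_dot(vals_a, bitmap_a, vals_b, bitmap_b):
    """Dot product of two sparse vectors stored zero-free with bitmaps:
    positions set in both bitmaps contribute a product; the index of a
    value inside its compressed vector is the popcount of lower bits."""
    common = bitmap_a & bitmap_b        # positions of shared non-zeros
    acc = 0
    pos = 0
    while common:
        if common & 1:
            # Popcount of lower bits locates the value in each compressed vector.
            ia = bin(bitmap_a & ((1 << pos) - 1)).count("1")
            ib = bin(bitmap_b & ((1 << pos) - 1)).count("1")
            acc += vals_a[ia] * vals_b[ib]   # accumulate one product per iteration
        common >>= 1
        pos += 1
    return acc

# a = [0, 3, 0, 2], b = [5, 4, 0, 6] -> dot = 3*4 + 2*6 = 24
print(sparse_dot([3, 2], 0b1010, [5, 4, 6], 0b1011))  # 24
```
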
  • Publication number: 20230116629
    Abstract: A DNN accelerator includes multiple compute tiles for sharing a workload of running a convolution. A halo pipeline in a compute tile can facilitate replication of halo data from the compute tile where the halo data is generated into another compute tile. The halo pipeline may receive a memory transaction for writing a data block. The halo pipeline may determine that the data block falls into a halo region in an input tensor of the convolution. The halo pipeline may generate a remote address for storing the data block in a memory of the other compute tile, e.g., based on a local address of the data block in a memory of the compute tile. The halo pipeline may adjust the remote address, e.g., based on a difference in dimensions of a tensor to be used by the compute tile and a tensor to be used by the other compute tile.
    Type: Application
    Filed: October 13, 2022
    Publication date: April 13, 2023
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Thomas Bernard, Niall Hanrahan
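
One plausible reading of the address translation described above, with the dimension difference modelled as differing row strides (speculative sketch; names hypothetical):

```python
def remote_halo_address(local_addr, local_base, remote_base,
                        local_row_stride, remote_row_stride):
    """Sketch of the halo address translation described above: a data
    block's local address is rebased into the neighbouring tile's memory,
    with an adjustment when the two tiles use tensors of different
    dimensions (modelled here as different row strides)."""
    offset = local_addr - local_base
    row, col = divmod(offset, local_row_stride)
    # Re-linearize with the remote tile's stride, then rebase.
    return remote_base + row * remote_row_stride + col

# Local tile rows are 16 bytes wide; the neighbour's are 20 bytes wide.
print(remote_halo_address(0x130, 0x100, 0x800, 16, 20))  # 2108 (0x83C)
```
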
  • Publication number: 20230093989
    Abstract: A method for obtaining chemical and/or material specific information of a sample based on scattered light. The method comprises receiving detection data comprising at least two images. Each image is indicative of the intensity of scattered light i) for incident light of a different wavelength, or ii) for incident light of a different polarization state, or iii) of a different polarization state. The scattered light comprises an elastic scattering component that is due to Rayleigh scattering of the incident light in at least a portion of the sample. Alternatively, each image is indicative of the intensity of scattered light i) of a different wavelength, or ii) for incident light of a different polarization state, or iii) of a different polarization state, wherein the scattered light comprises an inelastic scattering component that is due to Raman scattering of the incident light in at least a portion of the sample.
    Type: Application
    Filed: February 16, 2021
    Publication date: March 30, 2023
    Applicant: University of Southampton
    Inventors: Sumeet MAHAJAN, Niall HANRAHAN, Konstantinos BOURDAKOS, Simon LANE
  • Publication number: 20230020929
    Abstract: A compute tile includes a write combine buffer (WCB) that receives a workload of writing an output tensor of a convolution into a local memory of the compute tile. The local memory may be an SRAM. The WCB receives write transactions. A write transaction includes a data block, which is a part of the output tensor, and metadata describing one or more attributes of the data block. The WCB may store write transactions in its internal buffers. The WCB may determine whether to combine two write transactions, e.g., based on an operation mode or metadata in the write transactions. In embodiments where the WCB determines to combine the two write transactions, the WCB may combine the two write transactions into a new write transaction and write the new write transaction into the local memory or an internal memory of the WCB. The total number of write transactions for the workload can be reduced.
    Type: Application
    Filed: September 16, 2022
    Publication date: January 19, 2023
    Inventors: Martin-Thomas Grymel, David Thomas Bernard, Martin Power, Niall Hanrahan, Kevin Brady
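
A minimal sketch of the combining decision described above, with write transactions modelled as dicts of address and data, and adjacency standing in for the metadata check (hypothetical names and mode):

```python
def maybe_combine(tx1, tx2, mode="combine_adjacent"):
    """Sketch of the write-combining decision described above: two
    buffered write transactions are merged into one when the operation
    mode allows it and their metadata shows the data blocks are adjacent
    in the output tensor."""
    adjacent = tx1["addr"] + len(tx1["data"]) == tx2["addr"]
    if mode == "combine_adjacent" and adjacent:
        return {"addr": tx1["addr"], "data": tx1["data"] + tx2["data"]}
    return None  # keep the transactions separate

first = {"addr": 0x40, "data": b"\x01\x02"}
second = {"addr": 0x42, "data": b"\x03\x04"}
print(maybe_combine(first, second))  # one 4-byte write instead of two 2-byte writes
```
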
  • Publication number: 20230018857
    Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on the position of the non-zero valued bit. The non-zero valued activation and the non-zero valued weight may be provided to a PE for performing MAC operations.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Inventors: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
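
A software sketch of the decoding step described above, using integers as sparsity bitmaps (hypothetical names):

```python
def decode_pairs(act_bitmap, wt_bitmap):
    """Sketch of the sparsity decoding described above: AND the activation
    and weight sparsity vectors, then turn each set bit's position into
    addresses of the corresponding non-zero activation and weight in
    their compressed (zero-free) storage."""
    combined = act_bitmap & wt_bitmap
    pairs, pos = [], 0
    while combined >> pos:
        if (combined >> pos) & 1:
            lower = (1 << pos) - 1
            act_addr = bin(act_bitmap & lower).count("1")  # index in packed activations
            wt_addr = bin(wt_bitmap & lower).count("1")    # index in packed weights
            pairs.append((pos, act_addr, wt_addr))         # feed this pair to a PE
        pos += 1
    return pairs

# Activations non-zero at positions {0, 2, 3}, weights at {2, 3, 5}.
print(decode_pairs(0b001101, 0b101100))  # [(2, 1, 0), (3, 2, 1)]
```
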
  • Publication number: 20230016455
    Abstract: A deconvolution can be decomposed into multiple convolutions. Results of the convolutions constitute an output of the deconvolution. Zeros may be added to an input tensor of the deconvolution to generate an upsampled input tensor. Subtensors having the same size as the kernel of the deconvolution may be identified from the upsampled input tensor. A subtensor may include one or more input activations and one or more zeros. Subtensors having same distribution patterns of input activations may be used to generate a reduced kernel. The reduced kernel includes a subset of the kernel. The position of a weight in the reduced kernel may be the same as the position of the corresponding input activation in the subtensor. Multiple reduced kernels may be generated based on multiple subtensors having different distribution patterns of activations. Each of the convolutions may use the input tensor and a different one of the reduced kernels.
    Type: Application
    Filed: September 26, 2022
    Publication date: January 19, 2023
    Inventors: Alessandro Palla, David Thomas Bernard, Niall Hanrahan
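
A one-dimensional, stride-2 worked example of the decomposition described above (hypothetical names; the patent addresses the general multi-dimensional case):

```python
import numpy as np

def deconv_via_reduced_kernels(x, k):
    """1-D sketch of the decomposition described above: upsampling the
    input with zeros makes every sliding window match one of two
    activation patterns, so the deconvolution splits into convolutions
    with two reduced kernels built from subsets of the original kernel."""
    up = np.zeros(2 * len(x) - 1)
    up[::2] = x                       # zero-stuffed (upsampled) input
    full = np.convolve(up, k)         # reference: plain convolution

    # Windows with activations at both ends use only weights k[0], k[2];
    # windows with a single activation in the middle use only k[1].
    even = np.convolve(x, k[::2])     # reduced kernel [k0, k2]
    odd = x * k[1]                    # reduced kernel [k1]
    out = np.zeros_like(full)
    out[::2], out[1::2] = even, odd   # interleave the two convolutions
    return full, out

full, out = deconv_via_reduced_kernels(np.array([1., 2., 3.]), np.array([4., 5., 6.]))
print(np.allclose(full, out))  # True: the decomposition reproduces the deconvolution
```
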
  • Publication number: 20230008622
    Abstract: A DNN accelerator may perform 1×N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory to an internal memory of a first PE and read a second activation in the input operand from the memory to an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 12, 2023
    Inventors: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
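
A sketch of the broadcast pattern described above for N = 2, with PEs modelled as loop iterations (hypothetical names):

```python
def vector_macs_with_broadcast(activations, weight_vector):
    """Sketch of the 1xN scheme described above, for N = 2: each PE reads
    one activation from memory, receives its neighbour's activation by
    broadcast instead of a second memory read, and performs the MAC
    against the 1x2 weight operand."""
    n = len(weight_vector)            # here 2
    outputs = []
    for pe in range(len(activations) - n + 1):
        local = activations[pe]               # this PE's own memory read
        broadcast = activations[pe + 1]       # received from the neighbouring PE
        outputs.append(local * weight_vector[0] + broadcast * weight_vector[1])
    return outputs

print(vector_macs_with_broadcast([1, 2, 3, 4], [10, 20]))  # [50, 80, 110]
```
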
  • Publication number: 20220138016
    Abstract: Methods, apparatus, systems and articles of manufacture to store and access multi-dimensional data are disclosed. An example apparatus includes a memory; a memory allocator to allocate part of the memory for storage of a multi-dimensional data object; and a storage element organizer to: separate the multi-dimensional data into storage elements; store the storage elements in the memory, the stored storage elements being selectively executable; store starting memory address locations for the storage elements in an array in the memory, the array to facilitate selectable access of data of the stored elements; store a pointer for the array into the memory.
    Type: Application
    Filed: July 12, 2021
    Publication date: May 5, 2022
    Inventors: Fergal Connor, David Bernard, Niall Hanrahan, Derek Harnett
  • Publication number: 20220108135
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for performing a machine learning operation using storage element pointers. An example computer readable medium comprises instructions that, when executed, cause at least one processor to, in response to a determination that a machine learning operation is to be performed, create first and second storage element pointers based on a type of machine learning operation to be performed; remap input tensor data of the input tensor based on the first storage element pointer without movement of the input tensor data in memory; cause execution of the machine learning operation with the remapped input tensor data to create intermediate tensor data; remap the intermediate tensor data based on the second storage element pointer without movement of the intermediate tensor data in memory; and provide the remapped intermediate tensor data as an output tensor.
    Type: Application
    Filed: December 17, 2021
    Publication date: April 7, 2022
    Inventors: Kevin Brady, Martin Power, Martin-Thomas Grymel, Alessandro Palla, David Bernard, Niall Hanrahan
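
A software sketch of remapping without data movement, using a NumPy strided view to stand in for the storage element pointers (hypothetical names; illustrative only):

```python
import numpy as np

def remap_without_copy(tensor: np.ndarray, new_shape):
    """Sketch of pointer-based remapping as described above: the input
    tensor's data is reinterpreted through a new view (no movement of
    the underlying buffer), so the operation can run on the remapped
    layout and the intermediate result can be remapped the same way."""
    view = np.lib.stride_tricks.as_strided(
        tensor,
        shape=new_shape,
        strides=(tensor.strides[-1] * new_shape[-1], tensor.strides[-1]),
    )
    assert view.base is not None      # same memory, only the mapping changed
    return view

t = np.arange(12)
print(remap_without_copy(t, (3, 4)))   # a 3x4 view over the same 12 elements
```
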
  • Publication number: 20220012058
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Publication number: 20220012578
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase utilization of neural network (NN) accelerator circuitry for shallow layers of an NN by reformatting one or more tensors. An example apparatus includes parameter determining circuitry to determine a width of a weight kernel and to determine a depth of a first tensor. The example apparatus also includes storage control circuitry to, starting at a first XY location of the first tensor, copy one or more Z values, up to the depth of the first tensor, of consecutive XY locations that overlap the width of the weight kernel and to load the one or more Z values consecutively in a first XY location of a second tensor.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Kevin Brady, Martin Power, Niall Hanrahan, Alessandro Palla, Martin-Thomas Grymel, David Bernard
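
A minimal sketch of the tensor reformatting described above, assuming an (X, Y, Z) layout and reformatting along X (hypothetical names):

```python
import numpy as np

def reformat_shallow(tensor: np.ndarray, kernel_width: int) -> np.ndarray:
    """Sketch of the reformatting described above: for each X position,
    the Z values of the consecutive XY locations spanned by the kernel
    width are copied into a single XY location of a second tensor,
    deepening the channel dimension so the accelerator's MACs stay busy
    on shallow layers."""
    x_len, y_len, depth = tensor.shape
    out_x = x_len - kernel_width + 1
    second = np.zeros((out_x, y_len, depth * kernel_width), dtype=tensor.dtype)
    for x in range(out_x):
        for y in range(y_len):
            # Concatenate the Z values of the overlapped XY locations.
            second[x, y] = tensor[x:x + kernel_width, y].ravel()
    return second

t = np.arange(2 * 2 * 3).reshape(2, 2, 3)         # shallow: depth 3
print(reformat_shallow(t, kernel_width=2).shape)  # (1, 2, 6): depth doubled
```
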