Patents by Inventor Cormac Brick

Cormac Brick has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134786
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: December 14, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
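    Illustrative sketch: the abstract above describes a bitmap of nonzero points, a split into storage elements, per-element removal of zeros, and contiguous packing. A minimal Python/NumPy rendering of that flow follows; the function names, the element size, and the round-trip check are assumptions for illustration, not details from the filing.

      import numpy as np

      def compress_sparse_tensor(tensor, element_size=4):
          flat = tensor.ravel()
          sparsity_map = flat != 0                  # one flag per data point
          # First compression: divide into storage elements, drop zero points.
          elements = [flat[i:i + element_size]
                      for i in range(0, flat.size, element_size)]
          compressed = [e[e != 0] for e in elements]
          # Second compression: store the compressed elements contiguously.
          return np.concatenate(compressed), sparsity_map

      def decompress(packed, sparsity_map, shape):
          flat = np.zeros(sparsity_map.size, dtype=packed.dtype)
          flat[sparsity_map] = packed               # scatter back via the map
          return flat.reshape(shape)

      t = np.array([[0, 3, 0, 7], [1, 0, 0, 2]])
      packed, smap = compress_sparse_tensor(t)
      assert (decompress(packed, smap, t.shape) == t).all()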
  • Patent number: 11940907
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 11922178
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
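    Illustrative sketch: the mechanism, as the abstract reads, is to load more compressed parameter data than the first operation needs and then check whether the next section is already on-chip before issuing another load. A hedged Python sketch under those assumptions; ProcessorEngine and the byte-slicing are placeholders, not the patent's interfaces.

      class ProcessorEngine:
          def execute(self, section):
              print("op over", len(section), "compressed bytes")

      def run(memory, engine, section_size, extra):
          # Load the first section plus an additional amount of compressed data.
          buffer = memory[:section_size + extra]
          engine.execute(buffer[:section_size])                     # first operation
          # Re-use check: is a second section present in the extra data?
          if len(buffer) >= 2 * section_size:
              engine.execute(buffer[section_size:2 * section_size])  # no new load
          else:
              engine.execute(memory[section_size:2 * section_size])  # must re-load

      run(bytes(range(64)), ProcessorEngine(), section_size=16, extra=16)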
  • Patent number: 11907827
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
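    Illustrative sketch: a toy version of the load/extract/reorganize flow in the abstract, with a doubling operation standing in for the processing engines' arithmetic and a permutation standing in for the compiler's schedule (both are assumptions, not details from the filing).

      import numpy as np

      def load_compute_extract(tensor, num_engines, schedule):
          tiles = np.array_split(tensor, num_engines)  # load phase: one tile per engine
          outputs = [tile * 2 for tile in tiles]       # stand-in engine arithmetic
          extracted = [outputs[i] for i in schedule]   # extraction phase, schedule order
          return np.concatenate(extracted)             # reorganized, stored contiguously

      out = load_compute_extract(np.arange(8), num_engines=4, schedule=[2, 0, 3, 1])
      print(out)  # [ 8 10  0  2 12 14  4  6]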
  • Publication number: 20240036763
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 12, 2023
    Publication date: February 1, 2024
    Applicant: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
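    Illustrative sketch: stripped to loop form, the reuse policy in the abstract keeps one context resident in its buffer while the other buffer's pointer iterates, so the resident context is combined with every counterpart without being re-fetched. A minimal Python rendering; MacCircuit and the integer contexts are illustrative.

      class MacCircuit:
          def __init__(self):
              self.acc = 0
          def accumulate(self, a, w):
              self.acc += a * w

      def mac_with_reuse(first_buffer, second_buffer, mac):
          for a in first_buffer:        # first-type context stays resident...
              for w in second_buffer:   # ...while the second buffer's pointer iterates
                  mac.accumulate(a, w)

      mac = MacCircuit()
      mac_with_reuse([1, 2], [3, 4], mac)
      print(mac.acc)  # 1*3 + 1*4 + 2*3 + 2*4 = 21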
  • Publication number: 20240022259
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: September 12, 2023
    Publication date: January 18, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
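    Illustrative sketch: one common way to realize the decode the abstract describes is to derive, for a dense position named by the select signal, the sparse select as the count of set bitmap bits before that position (an index into the compressed stream) and the write enable as the bitmap bit at the position itself. The patent's exact signal derivation may differ; the data below is made up.

      def decode_zvc(select, bitmap):
          write_enable = bitmap[select]         # bit at the selected position
          sparse_select = sum(bitmap[:select])  # set bits before it = compressed index
          return sparse_select, write_enable

      bitmap = [1, 0, 1, 1, 0]   # 1 = a value survives in the compressed stream
      compressed = [5, 7, 9]
      for pos in range(len(bitmap)):
          idx, we = decode_zvc(pos, bitmap)
          print(pos, compressed[idx] if we else 0)   # 5, 0, 7, 9, 0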
  • Publication number: 20230359464
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Application
    Filed: January 30, 2023
    Publication date: November 9, 2023
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
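    Illustrative sketch: the structural pieces named in the abstract, modeled as Python classes: per-element memory slices built from RAM tiles with individual read and write ports. The address interleaving and the local/global routing comment are assumptions.

      class RamTile:
          def __init__(self, words):
              self.mem = [0] * words      # one tile, with its own read/write ports
          def read(self, addr):
              return self.mem[addr]
          def write(self, addr, value):
              self.mem[addr] = value

      class MemorySlice:
          def __init__(self, num_tiles, words_per_tile):
              self.tiles = [RamTile(words_per_tile) for _ in range(num_tiles)]
          def write(self, addr, value):   # interleave addresses across the tiles
              self.tiles[addr % len(self.tiles)].write(addr // len(self.tiles), value)
          def read(self, addr):
              return self.tiles[addr % len(self.tiles)].read(addr // len(self.tiles))

      # One slice per processing element; a local interconnect would route an
      # element to its own slice, the global interconnect to any other slice.
      slices = [MemorySlice(num_tiles=4, words_per_tile=64) for _ in range(8)]
      slices[0].write(5, 42)
      assert slices[0].read(5) == 42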
  • Patent number: 11804851
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 31, 2023
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 11789646
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: October 17, 2023
    Assignee: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Patent number: 11605212
    Abstract: The present application provides a method of corner detection and an image processing system for detecting corners in an image. The preferred implementation is in software using enabling and reusable hardware features in the underlying vector processor architecture. The advantage of this combined software and programmable processor datapath hardware is that the same hardware used for the FAST algorithm can also be readily applied to a variety of other computational tasks, not limited to image processing.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: March 14, 2023
    Assignee: Movidius Limited
    Inventors: Cormac Brick, Brendan Barry, Fergal Connor, David Moloney
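    Illustrative sketch: the FAST test the abstract refers to checks the 16-pixel Bresenham circle around a candidate for a contiguous arc of pixels all brighter or all darker than the center by a threshold. A scalar Python version follows; the vector-processor mapping the patent targets is not shown.

      import numpy as np

      # The 16 (dx, dy) offsets of FAST's radius-3 Bresenham circle.
      CIRCLE = [(0,-3),(1,-3),(2,-2),(3,-1),(3,0),(3,1),(2,2),(1,3),
                (0,3),(-1,3),(-2,2),(-3,1),(-3,0),(-3,-1),(-2,-2),(-1,-3)]

      def is_fast_corner(img, y, x, t=20, n=9):
          c = int(img[y, x])
          ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
          ring += ring                        # doubled so runs may wrap around
          run_b = run_d = 0
          for p in ring:
              run_b = run_b + 1 if p > c + t else 0
              run_d = run_d + 1 if p < c - t else 0
              if run_b >= n or run_d >= n:
                  return True
          return False

      img = np.zeros((7, 7), dtype=np.uint8)
      img[3, 3] = 200
      print(is_fast_corner(img, 3, 3))   # True: the whole ring is darker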
  • Patent number: 11567780
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: January 31, 2023
    Assignee: Movidius Limited
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
  • Publication number: 20220391710
    Abstract: Systems, apparatuses and methods may provide for technology that determines a complexity of a task associated with a neural network workload and generates a hardware efficiency estimate for the task, wherein the hardware efficiency estimate is generated via a neural network based cost model if the complexity exceeds a threshold, and wherein the hardware efficiency estimate is generated via a cost function if the complexity does not exceed the threshold. In one example, the technology trains the neural network based cost model based on one or more of hardware profile data or register-transfer level (RTL) data.
    Type: Application
    Filed: August 18, 2022
    Publication date: December 8, 2022
    Applicant: Intel Corporation
    Inventors: Alessandro Palla, Ian Frederick Hunter, Richard Richmond, Cormac Brick, Sebastian Eusebiu Nagy
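    Illustrative sketch: the selection logic the abstract fixes is small: estimate a task's complexity, then route it either to a learned cost model or to a closed-form cost function. Everything below other than that branch (the complexity metric, the threshold, and both estimators) is a placeholder.

      def estimate_cycles(task, threshold=100.0):
          if task_complexity(task) > threshold:
              return nn_cost_model(task)   # learned model for hard-to-predict tasks
          return analytical_cost(task)     # cheap cost function otherwise

      def task_complexity(task):
          return task["ops"] / 1e6
      def nn_cost_model(task):
          return 1.1 * task["ops"] / task["flops_per_cycle"]  # stand-in prediction
      def analytical_cost(task):
          return task["ops"] / task["flops_per_cycle"]

      task = {"ops": 2.5e8, "flops_per_cycle": 256}
      print(estimate_cycles(task))   # complexity 250 > 100, learned model used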
  • Publication number: 20220180618
    Abstract: The present application provides a method of corner detection and an image processing system for detecting corners in an image. The preferred implementation is in software using enabling and reusable hardware features in the underlying vector processor architecture. The advantage of this combined software and programmable processor datapath hardware is that the same hardware used for the FAST algorithm can also be readily applied to a variety of other computational tasks, not limited to image processing.
    Type: Application
    Filed: July 12, 2021
    Publication date: June 9, 2022
    Inventors: Cormac Brick, Brendan Barry, Fergal Connor, David Moloney
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
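    Illustrative sketch: with one operand broadcast along each row and a distinct operand unicast to each multiplier, the column-wise adders reduce the products, which amounts to a vector-matrix multiply. A NumPy check of that dataflow; the array dimensions here are arbitrary.

      import numpy as np

      def array_vecmat(a, B):
          rows, cols = B.shape
          products = np.empty((rows, cols))
          for r in range(rows):           # broadcast: one operand per row
              for c in range(cols):       # unicast: a distinct operand per multiplier
                  products[r, c] = a[r] * B[r, c]
          return products.sum(axis=0)     # adders reduce down each column

      a = np.array([1.0, 2.0, 3.0])
      B = np.arange(12, dtype=float).reshape(3, 4)
      assert np.allclose(array_vecmat(a, B), a @ B)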
  • Publication number: 20220147363
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Application
    Filed: June 21, 2021
    Publication date: May 12, 2022
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
  • Publication number: 20220129320
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: November 5, 2021
    Publication date: April 28, 2022
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
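    Illustrative sketch: the configurable quantity is the adder tree's depth, i.e. how many partial sums are reduced together for a given layer. A Python stand-in in which the depth argument plays the role of the software-programmed configuration register.

      def adder_tree(partials, depth):
          assert len(partials) == 1 << depth   # a depth-d tree takes 2**d inputs
          level = list(partials)
          while len(level) > 1:                # pairwise reduction, level by level
              level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
          return level[0]

      # A compiler-chosen schedule might use depth 2 for one layer, 3 for the next.
      print(adder_tree([1, 2, 3, 4], depth=2))    # 10
      print(adder_tree(list(range(8)), depth=3))  # 28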
  • Publication number: 20220067524
    Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
    Type: Application
    Filed: November 11, 2021
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
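    Illustrative sketch: "aligning" prefetched compressed data with its sparsity bitmap, as the abstract puts it, can be read as expanding the stream back to dense order: each set bit consumes the next compressed value, each clear bit emits a zero. A hedged Python reading with made-up buffer contents.

      def align_with_bitmap(compressed, bitmap):
          it = iter(compressed)
          return [next(it) if bit else 0 for bit in bitmap]

      decode_buffer = ([5, 7, 9], [1, 0, 1, 1, 0])   # prefetched (data, bitmap)
      dense = align_with_bitmap(*decode_buffer)
      print(dense)   # [5, 0, 7, 9, 0], ready for the processing elements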
  • Publication number: 20220012058
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Publication number: 20210406164
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 30, 2021
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Publication number: 20210397414
    Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and logic to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are of a size that is less than the precisions supported by the arithmetic blocks containing them.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 23, 2021
    Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy
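    Illustrative sketch: combining small multipliers into a higher-precision one is classically done with partial products and shifts; four 8x8 multiplies compose one 16x16 multiply. A Python check of that composition; the patent's block and wiring specifics are not modeled.

      def mul16_from_8bit(a, b):
          a_lo, a_hi = a & 0xFF, a >> 8     # split each 16-bit operand into bytes
          b_lo, b_hi = b & 0xFF, b >> 8
          return (((a_hi * b_hi) << 16)     # four 8x8 products, shifted and summed
                  + ((a_hi * b_lo + a_lo * b_hi) << 8)
                  + a_lo * b_lo)

      assert mul16_from_8bit(0xBEEF, 0x1234) == 0xBEEF * 0x1234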