Patents by Inventor Cormac Brick

Cormac Brick has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12242861
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: January 18, 2024
    Date of Patent: March 4, 2025
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Patent number: 12229673
    Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: February 18, 2025
    Assignee: Intel Corporation
    Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
  • Publication number: 20250036928
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 7, 2024
    Publication date: January 30, 2025
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Publication number: 20250028565
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 4, 2024
    Publication date: January 23, 2025
    Applicant: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12169643
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 12, 2023
    Date of Patent: December 17, 2024
    Assignee: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Patent number: 12147836
    Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12141683
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12124941
    Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 22, 2024
    Assignee: Intel Corporation
    Inventors: Eric Luk, Mohamed Elmalaki, Sara Almalih, Cormac Brick
  • Publication number: 20240231839
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Application
    Filed: January 18, 2024
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Publication number: 20240220785
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator comprising. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Application
    Filed: January 10, 2024
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
  • Publication number: 20240134786
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: December 14, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 11940907
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 26, 2024
    Assignee: INTEL CORPORATION
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 11922178
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Patent number: 11907827
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
  • Publication number: 20240036763
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 12, 2023
    Publication date: February 1, 2024
    Applicant: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Publication number: 20240022259
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: September 12, 2023
    Publication date: January 18, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Publication number: 20230359464
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Application
    Filed: January 30, 2023
    Publication date: November 9, 2023
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
  • Patent number: 11804851
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 31, 2023
    Assignee: INTEL CORPORATION
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 11789646
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: October 17, 2023
    Assignee: INTEL CORPORATION
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Patent number: 11605212
    Abstract: The present application provides a method of corner detection and an image processing system for detecting corners in an image. The preferred implementation is in software using enabling and reusable hardware features in the underlying vector processor architecture. The advantage of this combined software and programmable processor datapath hardware is that the same hardware used for the FAST algorithm can also be readily applied to a variety of other computational tasks, not limited to image processing.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: March 14, 2023
    Assignee: Movidius Limited
    Inventors: Cormac Brick, Brendan Barry, Fergal Connor, David Moloney