Patents by Inventor Cormac Brick

Cormac Brick has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260154525
    Abstract: Example apparatus disclosed herein include an array of processor elements, the array including rows each having a first number of processor elements and columns each having a second number of processor elements. Disclosed example apparatus also include configuration registers to store descriptors to configure the array to implement a layer of a convolutional neural network based on a dataflow schedule corresponding to one of multiple tensor processing templates, ones of the processor elements to be configured based on the descriptors to implement the one of the tensor processing templates to operate on input activation data and filter data associated with the layer of the convolutional neural network to produce output activation data associated with the layer of the convolutional neural network. Disclosed example apparatus further include memory to store the input activation data, the filter data and the output activation data associated with the layer of the convolutional neural network.
    Type: Application
    Filed: December 23, 2025
    Publication date: June 4, 2026
    Applicant: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Gautham Chinya, Huichu Liu, Cormac Brick, Lance Hacking
  • Patent number: 12554962
    Abstract: Example apparatus disclosed herein include an array of processor elements, the array including rows each having a first number of processor elements and columns each having a second number of processor elements. Disclosed example apparatus also include configuration registers to store descriptors to configure the array to implement a layer of a convolutional neural network based on a dataflow schedule corresponding to one of multiple tensor processing templates, ones of the processor elements to be configured based on the descriptors to implement the one of the tensor processing templates to operate on input activation data and filter data associated with the layer of the convolutional neural network to produce output activation data associated with the layer of the convolutional neural network. Disclosed example apparatus further include memory to store the input activation data, the filter data and the output activation data associated with the layer of the convolutional neural network.
    Type: Grant
    Filed: December 24, 2019
    Date of Patent: February 17, 2026
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Gautham Chinya, Huichu Liu, Cormac Brick, Lance Hacking
  • Publication number: 20250371349
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for hardware-aware machine learning model training. An example apparatus includes a configuration determiner to determine a hardware configuration of a target hardware platform on which the machine learning model is to be executed, a layer generator to assign sparsity configurations to layers of the machine learning model based on the hardware configuration, and a deployment controller to deploy the machine learning model to the target hardware platform in response to outputs of the machine learning model satisfying respective thresholds, the outputs including a quantity of clock cycles to execute the machine learning model with the layers having the assigned sparsity configurations.
    Type: Application
    Filed: August 21, 2025
    Publication date: December 4, 2025
    Applicant: Intel Corporation
    Inventors: Xiaofan Xu, Cormac Brick, Zsolt Biro
  • Publication number: 20250365009
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: August 12, 2025
    Publication date: November 27, 2025
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 12438553
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Grant
    Filed: September 12, 2023
    Date of Patent: October 7, 2025
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 12430239
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: December 14, 2023
    Date of Patent: September 30, 2025
    Assignee: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 12288153
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Grant
    Filed: January 10, 2024
    Date of Patent: April 29, 2025
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
  • Publication number: 20250131256
    Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
    Type: Application
    Filed: September 18, 2024
    Publication date: April 24, 2025
    Applicant: Intel Corporation
    Inventors: Eric Luk, Mohamed Elmalaki, Sara Almalih, Cormac Brick
  • Patent number: 12242861
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: January 18, 2024
    Date of Patent: March 4, 2025
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Patent number: 12229673
    Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: February 18, 2025
    Assignee: Intel Corporation
    Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
  • Publication number: 20250036928
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 7, 2024
    Publication date: January 30, 2025
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Publication number: 20250028565
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 4, 2024
    Publication date: January 23, 2025
    Applicant: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12169643
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 12, 2023
    Date of Patent: December 17, 2024
    Assignee: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Patent number: 12147836
    Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12141683
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 12124941
    Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 22, 2024
    Assignee: Intel Corporation
    Inventors: Eric Luk, Mohamed Elmalaki, Sara Almalih, Cormac Brick
  • Publication number: 20240231839
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Application
    Filed: January 18, 2024
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Publication number: 20240220785
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator comprising. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Application
    Filed: January 10, 2024
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
  • Publication number: 20240134786
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: December 14, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 11940907
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 26, 2024
    Assignee: INTEL CORPORATION
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick