Patents by Inventor Cormac Brick
Cormac Brick has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260154525Abstract: Example apparatus disclosed herein include an array of processor elements, the array including rows each having a first number of processor elements and columns each having a second number of processor elements. Disclosed example apparatus also include configuration registers to store descriptors to configure the array to implement a layer of a convolutional neural network based on a dataflow schedule corresponding to one of multiple tensor processing templates, ones of the processor elements to be configured based on the descriptors to implement the one of the tensor processing templates to operate on input activation data and filter data associated with the layer of the convolutional neural network to produce output activation data associated with the layer of the convolutional neural network. Disclosed example apparatus further include memory to store the input activation data, the filter data and the output activation data associated with the layer of the convolutional neural network.Type: ApplicationFiled: December 23, 2025Publication date: June 4, 2026Applicant: Intel CorporationInventors: Debabrata Mohapatra, Arnab Raha, Gautham Chinya, Huichu Liu, Cormac Brick, Lance Hacking
-
Patent number: 12554962Abstract: Example apparatus disclosed herein include an array of processor elements, the array including rows each having a first number of processor elements and columns each having a second number of processor elements. Disclosed example apparatus also include configuration registers to store descriptors to configure the array to implement a layer of a convolutional neural network based on a dataflow schedule corresponding to one of multiple tensor processing templates, ones of the processor elements to be configured based on the descriptors to implement the one of the tensor processing templates to operate on input activation data and filter data associated with the layer of the convolutional neural network to produce output activation data associated with the layer of the convolutional neural network. Disclosed example apparatus further include memory to store the input activation data, the filter data and the output activation data associated with the layer of the convolutional neural network.Type: GrantFiled: December 24, 2019Date of Patent: February 17, 2026Assignee: Intel CorporationInventors: Debabrata Mohapatra, Arnab Raha, Gautham Chinya, Huichu Liu, Cormac Brick, Lance Hacking
-
Publication number: 20250371349Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for hardware-aware machine learning model training. An example apparatus includes a configuration determiner to determine a hardware configuration of a target hardware platform on which the machine learning model is to be executed, a layer generator to assign sparsity configurations to layers of the machine learning model based on the hardware configuration, and a deployment controller to deploy the machine learning model to the target hardware platform in response to outputs of the machine learning model satisfying respective thresholds, the outputs including a quantity of clock cycles to execute the machine learning model with the layers having the assigned sparsity configurations.Type: ApplicationFiled: August 21, 2025Publication date: December 4, 2025Applicant: Intel CorporationInventors: Xiaofan Xu, Cormac Brick, Zsolt Biro
-
Publication number: 20250365009Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.Type: ApplicationFiled: August 12, 2025Publication date: November 27, 2025Applicant: Intel CorporationInventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
-
Patent number: 12438553Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.Type: GrantFiled: September 12, 2023Date of Patent: October 7, 2025Assignee: Intel CorporationInventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
-
Patent number: 12430239Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.Type: GrantFiled: December 14, 2023Date of Patent: September 30, 2025Assignee: Intel CorporationInventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
-
Patent number: 12288153Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.Type: GrantFiled: January 10, 2024Date of Patent: April 29, 2025Assignee: Intel CorporationInventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
-
Publication number: 20250131256Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.Type: ApplicationFiled: September 18, 2024Publication date: April 24, 2025Applicant: Intel CorporationInventors: Eric Luk, Mohamed Elmalaki, Sara Almalih, Cormac Brick
-
Patent number: 12242861Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.Type: GrantFiled: January 18, 2024Date of Patent: March 4, 2025Assignee: Intel CorporationInventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
-
Patent number: 12229673Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.Type: GrantFiled: November 11, 2021Date of Patent: February 18, 2025Assignee: Intel CorporationInventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
-
Publication number: 20250036928Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.Type: ApplicationFiled: October 7, 2024Publication date: January 30, 2025Applicant: Intel CorporationInventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
-
Publication number: 20250028565Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.Type: ApplicationFiled: October 4, 2024Publication date: January 23, 2025Applicant: Intel CorporationInventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
-
Patent number: 12169643Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.Type: GrantFiled: September 12, 2023Date of Patent: December 17, 2024Assignee: Intel CorporationInventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
-
Patent number: 12147836Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.Type: GrantFiled: November 5, 2021Date of Patent: November 19, 2024Assignee: Intel CorporationInventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
-
Patent number: 12141683Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.Type: GrantFiled: April 30, 2021Date of Patent: November 12, 2024Assignee: Intel CorporationInventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
-
Patent number: 12124941Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.Type: GrantFiled: March 27, 2020Date of Patent: October 22, 2024Assignee: Intel CorporationInventors: Eric Luk, Mohamed Elmalaki, Sara Almalih, Cormac Brick
-
Publication number: 20240231839Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.Type: ApplicationFiled: January 18, 2024Publication date: July 11, 2024Applicant: Intel CorporationInventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
-
Publication number: 20240220785Abstract: Methods and systems include a neural network system that includes a neural network accelerator comprising. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.Type: ApplicationFiled: January 10, 2024Publication date: July 4, 2024Applicant: Intel CorporationInventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
-
Publication number: 20240134786Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.Type: ApplicationFiled: December 14, 2023Publication date: April 25, 2024Applicant: Intel CorporationInventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
-
Patent number: 11940907Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.Type: GrantFiled: June 25, 2021Date of Patent: March 26, 2024Assignee: INTEL CORPORATIONInventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick