Patents by Inventor Cormac Brick

Cormac Brick has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134786
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: December 14, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
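    Illustrative sketch: the abstract above describes a bitmap of nonzero points, a split into storage elements, per-element removal of zeros, and contiguous packing. A minimal Python/NumPy rendering of that flow follows; the function names, the element size, and the round-trip check are assumptions for illustration, not details from the filing.

      import numpy as np

      def compress_sparse_tensor(tensor, element_size=4):
          flat = tensor.ravel()
          sparsity_map = flat != 0                  # one flag per data point
          # First compression: divide into storage elements, drop zero points.
          elements = [flat[i:i + element_size]
                      for i in range(0, flat.size, element_size)]
          compressed = [e[e != 0] for e in elements]
          # Second compression: store the compressed elements contiguously.
          return np.concatenate(compressed), sparsity_map

      def decompress(packed, sparsity_map, shape):
          flat = np.zeros(sparsity_map.size, dtype=packed.dtype)
          flat[sparsity_map] = packed               # scatter back via the map
          return flat.reshape(shape)

      t = np.array([[0, 3, 0, 7], [1, 0, 0, 2]])
      packed, smap = compress_sparse_tensor(t)
      assert (decompress(packed, smap, t.shape) == t).all()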
  • Patent number: 11940907
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Patent number: 11922178
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
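    Illustrative sketch: the mechanism, as the abstract reads, is to load more compressed parameter data than the first operation needs and then check whether the next section is already on-chip before issuing another load. A hedged Python sketch under those assumptions; ProcessorEngine and the byte-slicing are placeholders, not the patent's interfaces.

      class ProcessorEngine:
          def execute(self, section):
              print("op over", len(section), "compressed bytes")

      def run(memory, engine, section_size, extra):
          # Load the first section plus an additional amount of compressed data.
          buffer = memory[:section_size + extra]
          engine.execute(buffer[:section_size])                     # first operation
          # Re-use check: is a second section present in the extra data?
          if len(buffer) >= 2 * section_size:
              engine.execute(buffer[section_size:2 * section_size])  # no new load
          else:
              engine.execute(memory[section_size:2 * section_size])  # must re-load

      run(bytes(range(64)), ProcessorEngine(), section_size=16, extra=16)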
  • Patent number: 11907827
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
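    Illustrative sketch: a toy version of the load/extract/reorganize flow in the abstract, with a doubling operation standing in for the processing engines' arithmetic and a permutation standing in for the compiler's schedule (both are assumptions, not details from the filing).

      import numpy as np

      def load_compute_extract(tensor, num_engines, schedule):
          tiles = np.array_split(tensor, num_engines)  # load phase: one tile per engine
          outputs = [tile * 2 for tile in tiles]       # stand-in engine arithmetic
          extracted = [outputs[i] for i in schedule]   # extraction phase, schedule order
          return np.concatenate(extracted)             # reorganized, stored contiguously

      out = load_compute_extract(np.arange(8), num_engines=4, schedule=[2, 0, 3, 1])
      print(out)  # [ 8 10  0  2 12 14  4  6]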
  • Publication number: 20240036763
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 12, 2023
    Publication date: February 1, 2024
    Applicant: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
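    Illustrative sketch: stripped to loop form, the reuse policy in the abstract keeps one context resident in its buffer while the other buffer's pointer iterates, so the resident context is combined with every counterpart without being re-fetched. A minimal Python rendering; MacCircuit and the integer contexts are illustrative.

      class MacCircuit:
          def __init__(self):
              self.acc = 0
          def accumulate(self, a, w):
              self.acc += a * w

      def mac_with_reuse(first_buffer, second_buffer, mac):
          for a in first_buffer:        # first-type context stays resident...
              for w in second_buffer:   # ...while the second buffer's pointer iterates
                  mac.accumulate(a, w)

      mac = MacCircuit()
      mac_with_reuse([1, 2], [3, 4], mac)
      print(mac.acc)  # 1*3 + 1*4 + 2*3 + 2*4 = 21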
  • Publication number: 20240022259
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: September 12, 2023
    Publication date: January 18, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
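    Illustrative sketch: one common way to realize the decode the abstract describes is to derive, for a dense position named by the select signal, the sparse select as the count of set bitmap bits before that position (an index into the compressed stream) and the write enable as the bitmap bit at the position itself. The patent's exact signal derivation may differ; the data below is made up.

      def decode_zvc(select, bitmap):
          write_enable = bitmap[select]         # bit at the selected position
          sparse_select = sum(bitmap[:select])  # set bits before it = compressed index
          return sparse_select, write_enable

      bitmap = [1, 0, 1, 1, 0]   # 1 = a value survives in the compressed stream
      compressed = [5, 7, 9]
      for pos in range(len(bitmap)):
          idx, we = decode_zvc(pos, bitmap)
          print(pos, compressed[idx] if we else 0)   # 5, 0, 7, 9, 0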
  • Publication number: 20230359464
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Application
    Filed: January 30, 2023
    Publication date: November 9, 2023
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
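    Illustrative sketch: the structural pieces named in the abstract, modeled as Python classes: per-element memory slices built from RAM tiles with individual read and write ports. The address interleaving and the local/global routing comment are assumptions.

      class RamTile:
          def __init__(self, words):
              self.mem = [0] * words      # one tile, with its own read/write ports
          def read(self, addr):
              return self.mem[addr]
          def write(self, addr, value):
              self.mem[addr] = value

      class MemorySlice:
          def __init__(self, num_tiles, words_per_tile):
              self.tiles = [RamTile(words_per_tile) for _ in range(num_tiles)]
          def write(self, addr, value):   # interleave addresses across the tiles
              self.tiles[addr % len(self.tiles)].write(addr // len(self.tiles), value)
          def read(self, addr):
              return self.tiles[addr % len(self.tiles)].read(addr // len(self.tiles))

      # One slice per processing element; a local interconnect would route an
      # element to its own slice, the global interconnect to any other slice.
      slices = [MemorySlice(num_tiles=4, words_per_tile=64) for _ in range(8)]
      slices[0].write(5, 42)
      assert slices[0].read(5) == 42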
  • Patent number: 11804851
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 31, 2023
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 11789646
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: October 17, 2023
    Assignee: Intel Corporation
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Patent number: 11605212
    Abstract: The present application provides a method of corner detection and an image processing system for detecting corners in an image. The preferred implementation is in software using enabling and reusable hardware features in the underlying vector processor architecture. The advantage of this combined software and programmable processor datapath hardware is that the same hardware used for the FAST algorithm can also be readily applied to a variety of other computational tasks, not limited to image processing.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: March 14, 2023
    Assignee: Movidius Limited
    Inventors: Cormac Brick, Brendan Barry, Fergal Connor, David Moloney
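    Illustrative sketch: the FAST test the abstract refers to checks the 16-pixel Bresenham circle around a candidate for a contiguous arc of pixels all brighter or all darker than the center by a threshold. A scalar Python version follows; the vector-processor mapping the patent targets is not shown.

      import numpy as np

      # The 16 (dx, dy) offsets of FAST's radius-3 Bresenham circle.
      CIRCLE = [(0,-3),(1,-3),(2,-2),(3,-1),(3,0),(3,1),(2,2),(1,3),
                (0,3),(-1,3),(-2,2),(-3,1),(-3,0),(-3,-1),(-2,-2),(-1,-3)]

      def is_fast_corner(img, y, x, t=20, n=9):
          c = int(img[y, x])
          ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
          ring += ring                        # doubled so runs may wrap around
          run_b = run_d = 0
          for p in ring:
              run_b = run_b + 1 if p > c + t else 0
              run_d = run_d + 1 if p < c - t else 0
              if run_b >= n or run_d >= n:
                  return True
          return False

      img = np.zeros((7, 7), dtype=np.uint8)
      img[3, 3] = 200
      print(is_fast_corner(img, 3, 3))   # True: the whole ring is darker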
  • Patent number: 11567780
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: January 31, 2023
    Assignee: Movidius Limited
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
  • Publication number: 20220391710
    Abstract: Systems, apparatuses and methods may provide for technology that determines a complexity of a task associated with a neural network workload and generates a hardware efficiency estimate for the task, wherein the hardware efficiency estimate is generated via a neural network based cost model if the complexity exceeds a threshold, and wherein the hardware efficiency estimate is generated via a cost function if the complexity does not exceed the threshold. In one example, the technology trains the neural network based cost model based on one or more of hardware profile data or register-transfer level (RTL) data.
    Type: Application
    Filed: August 18, 2022
    Publication date: December 8, 2022
    Applicant: Intel Corporation
    Inventors: Alessandro Palla, Ian Frederick Hunter, Richard Richmond, Cormac Brick, Sebastian Eusebiu Nagy
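    Illustrative sketch: the selection logic the abstract fixes is small: estimate a task's complexity, then route it either to a learned cost model or to a closed-form cost function. Everything below other than that branch (the complexity metric, the threshold, and both estimators) is a placeholder.

      def estimate_cycles(task, threshold=100.0):
          if task_complexity(task) > threshold:
              return nn_cost_model(task)   # learned model for hard-to-predict tasks
          return analytical_cost(task)     # cheap cost function otherwise

      def task_complexity(task):
          return task["ops"] / 1e6
      def nn_cost_model(task):
          return 1.1 * task["ops"] / task["flops_per_cycle"]  # stand-in prediction
      def analytical_cost(task):
          return task["ops"] / task["flops_per_cycle"]

      task = {"ops": 2.5e8, "flops_per_cycle": 256}
      print(estimate_cycles(task))   # complexity 250 > 100, learned model used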
  • Publication number: 20220180618
    Abstract: The present application provides a method of corner detection and an image processing system for detecting corners in an image. The preferred implementation is in software using enabling and reusable hardware features in the underlying vector processor architecture. The advantage of this combined software and programmable processor datapath hardware is that the same hardware used for the FAST algorithm can also be readily applied to a variety of other computational tasks, not limited to image processing.
    Type: Application
    Filed: July 12, 2021
    Publication date: June 9, 2022
    Inventors: Cormac Brick, Brendan Barry, Fergal Connor, David Moloney
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
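    Illustrative sketch: with one operand broadcast along each row and a distinct operand unicast to each multiplier, the column-wise adders reduce the products, which amounts to a vector-matrix multiply. A NumPy check of that dataflow; the array dimensions here are arbitrary.

      import numpy as np

      def array_vecmat(a, B):
          rows, cols = B.shape
          products = np.empty((rows, cols))
          for r in range(rows):           # broadcast: one operand per row
              for c in range(cols):       # unicast: a distinct operand per multiplier
                  products[r, c] = a[r] * B[r, c]
          return products.sum(axis=0)     # adders reduce down each column

      a = np.array([1.0, 2.0, 3.0])
      B = np.arange(12, dtype=float).reshape(3, 4)
      assert np.allclose(array_vecmat(a, B), a @ B)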
  • Publication number: 20220147363
    Abstract: The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
    Type: Application
    Filed: June 21, 2021
    Publication date: May 12, 2022
    Inventors: David Moloney, Cormac Brick, Ovidiu Andrei Vesa, Brendan Barry
  • Publication number: 20220129320
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: November 5, 2021
    Publication date: April 28, 2022
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
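    Illustrative sketch: the configurable quantity is the adder tree's depth, i.e. how many partial sums are reduced together for a given layer. A Python stand-in in which the depth argument plays the role of the software-programmed configuration register.

      def adder_tree(partials, depth):
          assert len(partials) == 1 << depth   # a depth-d tree takes 2**d inputs
          level = list(partials)
          while len(level) > 1:                # pairwise reduction, level by level
              level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
          return level[0]

      # A compiler-chosen schedule might use depth 2 for one layer, 3 for the next.
      print(adder_tree([1, 2, 3, 4], depth=2))    # 10
      print(adder_tree(list(range(8)), depth=3))  # 28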
  • Publication number: 20220067524
    Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
    Type: Application
    Filed: November 11, 2021
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
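    Illustrative sketch: "aligning" prefetched compressed data with its sparsity bitmap, as the abstract puts it, can be read as expanding the stream back to dense order: each set bit consumes the next compressed value, each clear bit emits a zero. A hedged Python reading with made-up buffer contents.

      def align_with_bitmap(compressed, bitmap):
          it = iter(compressed)
          return [next(it) if bit else 0 for bit in bitmap]

      decode_buffer = ([5, 7, 9], [1, 0, 1, 1, 0])   # prefetched (data, bitmap)
      dense = align_with_bitmap(*decode_buffer)
      print(dense)   # [5, 0, 7, 9, 0], ready for the processing elements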
  • Publication number: 20220012058
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer, and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
  • Publication number: 20210406164
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 30, 2021
    Inventors: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
  • Publication number: 20210397414
    Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and logic to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are of a size that is less than the precisions supported by the arithmetic blocks containing them.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 23, 2021
    Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy
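    Illustrative sketch: combining small multipliers into a higher-precision one is classically done with partial products and shifts; four 8x8 multiplies compose one 16x16 multiply. A Python check of that composition; the patent's block and wiring specifics are not modeled.

      def mul16_from_8bit(a, b):
          a_lo, a_hi = a & 0xFF, a >> 8     # split each 16-bit operand into bytes
          b_lo, b_hi = b & 0xFF, b >> 8
          return (((a_hi * b_hi) << 16)     # four 8x8 products, shifted and summed
                  + ((a_hi * b_lo + a_lo * b_hi) << 8)
                  + a_lo * b_lo)

      assert mul16_from_8bit(0xBEEF, 0x1234) == 0xBEEF * 0x1234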