Patents by Inventor Deepak Mathaikutty

Deepak Mathaikutty has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for increasing activation sparsity in artificial neural networks

Patent number: 12682221

Abstract: A method for implementing an artificial neural network in a computing system that comprises performing a compute operation using an input activation and a weight to generate an output activation, and modifying the output activation using a noise value to increase activation sparsity.

Type: Grant

Filed: September 27, 2022

Date of Patent: July 14, 2026

Assignee: Altera Corporation

Inventors: Nihat Tunali, Arnab Raha, Bogdan Pasca, Martin Langhammer, Michael Wu, Deepak Mathaikutty
Methods and apparatus to load data within a machine learning accelerator

Patent number: 12242861

Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.

Type: Grant

Filed: January 18, 2024

Date of Patent: March 4, 2025

Assignee: Intel Corporation

Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
Sparsity-aware datastore for inference processing in deep neural network architectures

Patent number: 12229673

Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.

Type: Grant

Filed: November 11, 2021

Date of Patent: February 18, 2025

Assignee: Intel Corporation

Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
PERFORMANCE SCALING FOR DATAFLOW DEEP NEURAL NETWORK HARDWARE ACCELERATORS

Publication number: 20250036928

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.

Type: Application

Filed: October 7, 2024

Publication date: January 30, 2025

Applicant: Intel Corporation

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
SCHEDULE-AWARE DYNAMICALLY RECONFIGURABLE ADDER TREE ARCHITECTURE FOR PARTIAL SUM ACCUMULATION IN MACHINE LEARNING ACCELERATORS

Publication number: 20250028565

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.

Type: Application

Filed: October 4, 2024

Publication date: January 23, 2025

Applicant: Intel Corporation

Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
Schedule-aware dynamically reconfigurable adder tree architecture for partial sum accumulation in machine learning accelerators

Patent number: 12147836

Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.

Type: Grant

Filed: November 5, 2021

Date of Patent: November 19, 2024

Assignee: Intel Corporation

Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
Performance scaling for dataflow deep neural network hardware accelerators

Patent number: 12141683

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.

Type: Grant

Filed: April 30, 2021

Date of Patent: November 12, 2024

Assignee: Intel Corporation

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
METHODS AND APPARATUS TO LOAD DATA WITHIN A MACHINE LEARNING ACCELERATOR

Publication number: 20240231839

Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.

Type: Application

Filed: January 18, 2024

Publication date: July 11, 2024

Applicant: Intel Corporation

Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
Methods and apparatus to load data within a machine learning accelerator

Patent number: 11922178

Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.

Type: Grant

Filed: June 25, 2021

Date of Patent: March 5, 2024

Assignee: Intel Corporation

Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
Techniques For Increasing Activation Sparsity In Artificial Neural Networks

Publication number: 20230021396

Abstract: A method for implementing an artificial neural network in a computing system that comprises performing a compute operation using an input activation and a weight to generate an output activation, and modifying the output activation using a noise value to increase activation sparsity.

Type: Application

Filed: September 27, 2022

Publication date: January 26, 2023

Applicant: Intel Corporation

Inventors: Nihat Tunali, Arnab Raha, Bogdan Pasca, Martin Langhammer, Michael Wu, Deepak Mathaikutty
SCHEDULE-AWARE DYNAMICALLY RECONFIGURABLE ADDER TREE ARCHITECTURE FOR PARTIAL SUM ACCUMULATION IN MACHINE LEARNING ACCELERATORS

Publication number: 20220129320

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.

Type: Application

Filed: November 5, 2021

Publication date: April 28, 2022

Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
SPARSITY-AWARE DATASTORE FOR INFERENCE PROCESSING IN DEEP NEURAL NETWORK ARCHITECTURES

Publication number: 20220067524

Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.

Type: Application

Filed: November 11, 2021

Publication date: March 3, 2022

Applicant: Intel Corporation

Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
METHODS AND APPARATUS TO LOAD DATA WITHIN A MACHINE LEARNING ACCELERATOR

Publication number: 20210326144

Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.

Type: Application

Filed: June 25, 2021

Publication date: October 21, 2021

Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
PERFORMANCE SCALING FOR DATAFLOW DEEP NEURAL NETWORK HARDWARE ACCELERATORS

Publication number: 20210271960

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.

Type: Application

Filed: April 30, 2021

Publication date: September 2, 2021

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
ACCELERATED LOADING OF UNSTRUCTURED SPARSE DATA IN MACHINE LEARNING ARCHITECTURES

Publication number: 20210042617

Abstract: Systems, apparatuses and methods may provide for technology that identify an assignment of weights of a workload to a plurality of processing elements, where the workload is to be associated with a neural network. The technology generates a representation that is to represent whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where the partitions are each to be dedicated to a different one of the processing elements.

Type: Application

Filed: October 27, 2020

Publication date: February 11, 2021

Inventors: Gautham Chinya, Deepak Mathaikutty, Guruguhanathan Venkataramanan, Debabrata Mohapatra, Moongon Jung, Sang Kyun Kim, Arnab Raha, Cormac Brick