Patents by Inventor Deepak A. Mathaikutty

Deepak A. Mathaikutty has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11922178
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Publication number: 20230021396
    Abstract: A method for implementing an artificial neural network in a computing system that comprises performing a compute operation using an input activation and a weight to generate an output activation, and modifying the output activation using a noise value to increase activation sparsity.
    Type: Application
    Filed: September 27, 2022
    Publication date: January 26, 2023
    Applicant: Intel Corporation
    Inventors: Nihat Tunali, Arnab Raha, Bogdan Pasca, Martin Langhammer, Michael Wu, Deepak Mathaikutty
  • Publication number: 20220129320
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: November 5, 2021
    Publication date: April 28, 2022
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Publication number: 20220067524
    Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
    Type: Application
    Filed: November 11, 2021
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
  • Publication number: 20210326144
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Application
    Filed: June 25, 2021
    Publication date: October 21, 2021
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Publication number: 20210271960
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: April 30, 2021
    Publication date: September 2, 2021
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
  • Patent number: 11010166
    Abstract: A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction, and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: May 18, 2021
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Perry H. Wang, Xiang Zou, Sang Kyun Kim, Deepak A. Mathaikutty, Gautham N. Chinya
  • Publication number: 20210042617
    Abstract: Systems, apparatuses and methods may provide for technology that identify an assignment of weights of a workload to a plurality of processing elements, where the workload is to be associated with a neural network. The technology generates a representation that is to represent whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where the partitions are each to be dedicated to a different one of the processing elements.
    Type: Application
    Filed: October 27, 2020
    Publication date: February 11, 2021
    Inventors: Gautham Chinya, Deepak Mathaikutty, Guruguhanathan Venkataramanan, Debabrata Mohapatra, Moongon Jung, Sang Kyun Kim, Arnab Raha, Cormac Brick
  • Patent number: 9785576
    Abstract: Systems and methods for employing hardware-assisted virtualization for implementing a secure video output path. An example processing system comprises: a memory; a shared interconnect; and a processing core communicatively coupled to the memory via the shared interconnect, the processing core to: initialize a first virtual machine and a second virtual machine; responsive to receiving a memory access transaction initiated by the first virtual machine to access a memory buffer, tag the memory access transaction with an identifier of the first virtual machine; and responsive to receiving a digital content decoder access transaction initiated by the second virtual machine, tag the digital decoder access transaction with an identifier of the second virtual machine.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: October 10, 2017
    Assignee: Intel Corporation
    Inventors: Thiam Wah Loh, Per Hammarlund, Andreas Wasserbauer, Swee Chong Peter Kuan, Eckhard Delfs, Deepak A. Mathaikutty, Stephen J. Robinson, Gautham N. Chinya, Perry H. Wang, Chee Weng Tan, Hong Wang, Reza Fortas
  • Publication number: 20170286117
    Abstract: A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.
    Type: Application
    Filed: March 31, 2016
    Publication date: October 5, 2017
    Inventors: Debabrata Mohapatra, Perry H. Wang, Xiang Zou, Sang Kyun Kim, Deepak A. Mathaikutty, Gautham N. Chinya
  • Publication number: 20150278119
    Abstract: Systems and methods for employing hardware-assisted virtualization for implementing a secure video output path. An example processing system comprises: a memory; a shared interconnect; and a processing core communicatively coupled to the memory via the shared interconnect, the processing core to: initialize a first virtual machine and a second virtual machine; responsive to receiving a memory access transaction initiated by the first virtual machine to access a memory buffer, tag the memory access transaction with an identifier of the first virtual machine; and responsive to receiving a digital content decoder access transaction initiated by the second virtual machine, tag the digital decoder access transaction with an identifier of the second virtual machine.
    Type: Application
    Filed: March 27, 2014
    Publication date: October 1, 2015
    Inventors: THIAM WAH LOH, PER HAMMARLUND, ANDREAS WASSERBAUER, SWEE CHONG PETER KUAN, ECKHARD DELFS, DEEPAK A. MATHAIKUTTY, STEPHEN J. ROBINSON, GAUTHAM N. CHINYA, PERRY H. WANG, CHEE WENG TAN, HONG WANG, REZA FORTAS
  • Publication number: 20150277949
    Abstract: A processing system includes an interconnect and a processing core, coupled to the interconnect, to execute a plurality of virtual machines each being identified by a respective identifier, and tag, by an identifier of the first virtual machine, a first transaction initiated by a first virtual machine to access the interconnect.
    Type: Application
    Filed: March 27, 2014
    Publication date: October 1, 2015
    Inventors: THIAM WAH LOH, GAUTHAM N. CHINYA, STEPHEN J. ROBINSON, REZA FORTAS, HONG WANG, HELMUT REINIG, PER HAMMARLUND, DEEPAK A. MATHAIKUTTY, CHRISTIAN ERBEN
  • Patent number: 9003164
    Abstract: In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: April 7, 2015
    Assignee: Intel Corporation
    Inventors: Gautham N. Chinya, Hong Wang, Deepak A. Mathaikutty, Jamison D. Collins, Ethan Schuchman, James P. Held, Ajay V. Bhatt, Prashant Sethi, Stephen F. Whalley
  • Publication number: 20140208042
    Abstract: In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
    Type: Application
    Filed: March 21, 2014
    Publication date: July 24, 2014
    Inventors: Gautham N. Chinya, Hong Wang, Deepak A. Mathaikutty, Jamison D. Collins, Ethan Schuchman, James P. Held, Ajay V. Bhatt, Prashant Sethi, Stephen F. Whalley
  • Patent number: 8719547
    Abstract: In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 18, 2009
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Gautham N. Chinya, Hong Wang, Deepak A. Mathaikutty, Jamison D. Collins, Ethan Schuchman, James P. Held, Ajay V. Bhatt, Prashant Sethi, Stephen F. Whalley
  • Publication number: 20110072234
    Abstract: In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
    Type: Application
    Filed: September 18, 2009
    Publication date: March 24, 2011
    Inventors: Gautham N. Chinya, Hong Wang, Deepak A. Mathaikutty, Jamison D. Collins, Ethan Schuchman, James P. Held, Ajay V. Bhatt, Prashant Sethi, Stephen F. Whalley