Patents by Inventor Gautham Chinya

Gautham Chinya has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11922178
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-use circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
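    Illustrative sketch: a minimal Python model of the load-and-re-use flow described in the abstract above, using invented names (ProcessorEngine, SECTION_SIZE); it sketches the idea, not Intel's circuitry.

        SECTION_SIZE = 4  # assumed size of one section of compressed parameter data

        class ProcessorEngine:
            """Toy stand-in for the processor engine circuitry."""
            def __init__(self):
                self.buffer = b""

            def load(self, data: bytes):
                # Data provider circuitry loads data into the engine.
                self.buffer = data

            def execute(self, section: bytes) -> int:
                # Stand-in for a machine learning operation on compressed data.
                return sum(section)

        def run(compressed: bytes, engine: ProcessorEngine) -> list:
            # Load the first section plus an additional amount of compressed data.
            engine.load(compressed[: 2 * SECTION_SIZE])
            results = [engine.execute(engine.buffer[:SECTION_SIZE])]
            # Compressed local data re-use: if a second section is already present
            # in the additional data, execute on it without reloading from memory.
            extra = engine.buffer[SECTION_SIZE:]
            if len(extra) >= SECTION_SIZE:
                results.append(engine.execute(extra[:SECTION_SIZE]))
            return results

        print(run(bytes(range(12)), ProcessorEngine()))  # [6, 22]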
  • Patent number: 11907827
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
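    Illustrative sketch: a Python toy of the load, extraction, reorganize, and store phases from the abstract above; the interleaved split and the squaring "inference" are assumptions made only to keep the example concrete.

        def load_phase(tensor, num_engines):
            # Distribute tensor data across the processing engines.
            return [tensor[i::num_engines] for i in range(num_engines)]

        def extraction_phase(engine_inputs):
            # Each engine performs its arithmetic (here: squaring) and is drained.
            return [[x * x for x in chunk] for chunk in engine_inputs]

        def reorganize(outputs, total):
            # Restore the extracted outputs to the tensor's original order.
            flat = [0] * total
            n = len(outputs)
            for i, chunk in enumerate(outputs):
                for j, v in enumerate(chunk):
                    flat[i + j * n] = v
            return flat

        memory = reorganize(extraction_phase(load_phase(list(range(8)), 4)), 8)
        print(memory)  # [0, 1, 4, 9, 16, 25, 36, 49], stored back to memory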
  • Publication number: 20240022259
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: September 12, 2023
    Publication date: January 18, 2024
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
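    Illustrative sketch: a Python rendering of the decode path in the abstract above. The exact derivation of the sparse-select and write-enable signals is not given in the abstract, so the modulo routing below is an assumption.

        def decode_zvc(values, bitmap, select, num_pes=4):
            """values: non-zero payload; bitmap: 1 marks a non-zero position."""
            pes = [[] for _ in range(num_pes)]  # one lane per processing element
            payload = iter(values)
            for pos, bit in enumerate(bitmap):
                # Write-enable signal: assert only where the sparsity bitmap is 1.
                if bit:
                    # Sparse-select signal: derived from the select signal and the
                    # bitmap position; it picks the PE that processes this portion.
                    pes[(select + pos) % num_pes].append(next(payload))
            return pes

        print(decode_zvc([5, 7, 9], [1, 0, 1, 0, 1, 0], select=1))
        # [[], [5, 9], [], [7]]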
  • Patent number: 11804851
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: October 31, 2023
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Patent number: 11714977
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Grant
    Filed: December 17, 2021
    Date of Patent: August 1, 2023
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
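    Illustrative sketch: the abstract does not publish the approximation algorithm itself, so as one classic example of trading a floating point multiply for addition and subtraction, the Python below adds IEEE-754 bit patterns (Mitchell-style); it is not claimed to be the patented computation.

        import struct

        BIAS = 127 << 23  # float32 exponent bias, shifted into position

        def f2i(x: float) -> int:
            return struct.unpack("<I", struct.pack("<f", x))[0]

        def i2f(n: int) -> float:
            return struct.unpack("<f", struct.pack("<I", n & 0xFFFFFFFF))[0]

        def approx_mul(a: float, b: float) -> float:
            # One addition and one subtraction on integer bit patterns stand in
            # for a floating point multiply (positive inputs only in this sketch).
            return i2f(f2i(a) + f2i(b) - BIAS)

        print(approx_mul(3.0, 5.0))  # 14.0, an approximation of 15.0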
  • Publication number: 20220108093
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Application
    Filed: December 17, 2021
    Publication date: April 7, 2022
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
  • Patent number: 11232273
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Grant
    Filed: October 12, 2020
    Date of Patent: January 25, 2022
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
  • Publication number: 20210397414
    Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein each arithmetic block contains multiple multipliers, along with logic to combine multipliers within each arithmetic block and/or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are smaller than the precisions supported by the arithmetic blocks that contain them.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 23, 2021
    Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy
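    Illustrative sketch: one textbook way to combine multipliers across precisions, composing a 16x16 multiply from four 8x8 multipliers by shifting and accumulating partial products; the split/recombine scheme is an assumption, not Intel's arithmetic-block design.

        def mul8(x: int, y: int) -> int:
            assert 0 <= x < 256 and 0 <= y < 256
            return x * y  # stand-in for one small hardware multiplier

        def mul16_from_mul8(a: int, b: int) -> int:
            ah, al = a >> 8, a & 0xFF
            bh, bl = b >> 8, b & 0xFF
            # Four partial products, shifted into place and accumulated.
            return ((mul8(ah, bh) << 16)
                    + ((mul8(ah, bl) + mul8(al, bh)) << 8)
                    + mul8(al, bl))

        print(mul16_from_mul8(51234, 47823) == 51234 * 47823)  # True: exact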
  • Publication number: 20210326144
    Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-use circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
    Type: Application
    Filed: June 25, 2021
    Publication date: October 21, 2021
    Inventors: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
  • Publication number: 20210271960
    Abstract: Embodiments of the present disclosure are directed toward techniques and configurations that enhance the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and the performance per area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: April 30, 2021
    Publication date: September 2, 2021
    Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
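    Illustrative sketch: a behavioral Python model of dynamic MAC scaling, gating off any MAC whose activation or weight operand is zero so that the active-MAC count tracks sparsity; the gating policy shown is an assumption.

        def sparse_dot(activations, weights):
            acc, active_macs = 0, 0
            for a, w in zip(activations, weights):
                if a == 0 or w == 0:
                    continue  # this MAC is effectively gated off for the cycle
                acc += a * w
                active_macs += 1
            return acc, active_macs

        acc, used = sparse_dot([0, 3, 0, 2], [5, 0, 7, 1])
        print(acc, used)  # (2, 1): only one of four MACs did useful work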
  • Publication number: 20210117197
    Abstract: Systems, apparatuses and methods identify a plurality of registers that are associated with a system-on-chip. The plurality of registers includes a first portion dedicated to write operations and a second portion dedicated to read operations. The technology writes data to the first portion of the plurality of registers, and transfers the data from the first portion to the second portion.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Applicant: Intel Corporation
    Inventors: Steven Hsu, Amit Agarwal, Debabrata Mohapatra, Arnab Raha, Moongon Jung, Gautham Chinya, Ram Krishnamurthy
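    Illustrative sketch: a minimal Python model of the split register file described above, with one bank dedicated to writes, one to reads, and an explicit transfer between them; the bank size and whole-bank transfer are assumptions.

        class SplitRegisters:
            def __init__(self, n: int):
                self.write_bank = [0] * n  # first portion: write operations only
                self.read_bank = [0] * n   # second portion: read operations only

            def write(self, idx: int, value: int):
                self.write_bank[idx] = value

            def transfer(self):
                # Move data from the write-dedicated portion to the read-dedicated one.
                self.read_bank = list(self.write_bank)

            def read(self, idx: int) -> int:
                return self.read_bank[idx]

        regs = SplitRegisters(4)
        regs.write(2, 0xBEEF)
        regs.transfer()
        print(hex(regs.read(2)))  # 0xbeef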
  • Publication number: 20210042617
    Abstract: Systems, apparatuses and methods may provide for technology that identifies an assignment of weights of a workload to a plurality of processing elements, where the workload is associated with a neural network. The technology generates a representation that indicates whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where each partition is dedicated to a different one of the processing elements.
    Type: Application
    Filed: October 27, 2020
    Publication date: February 11, 2021
    Inventors: Gautham Chinya, Deepak Mathaikutty, Guruguhanathan Venkataramanan, Debabrata Mohapatra, Moongon Jung, Sang Kyun Kim, Arnab Raha, Cormac Brick
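    Illustrative sketch: Python for building the zero/non-zero representation and storing it in per-processing-element partitions; the round-robin weight assignment is an assumption made for the example.

        def partition_bitmaps(weights, num_pes):
            # Each partition of the storage structure is dedicated to one PE.
            partitions = [[] for _ in range(num_pes)]
            for i, w in enumerate(weights):
                pe = i % num_pes  # the assignment of weights to processing elements
                partitions[pe].append(1 if w != 0 else 0)  # 1 = non-zero weight
            return partitions

        print(partition_bitmaps([0.5, 0.0, 0.0, 1.2, 0.0, 3.3], num_pes=2))
        # [[1, 0, 0], [0, 1, 1]]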
  • Publication number: 20210027029
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Application
    Filed: October 12, 2020
    Publication date: January 28, 2021
    Applicant: Intel Corporation
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
  • Publication number: 20200410327
    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
    Type: Application
    Filed: June 28, 2019
    Publication date: December 31, 2020
    Inventors: Gautham Chinya, Huichu Liu, Arnab Raha, Debabrata Mohapatra, Cormac Brick, Lance Hacking
  • Patent number: 10867142
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Grant
    Filed: June 29, 2016
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
  • Publication number: 20200228137
    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 16, 2020
    Inventors: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
  • Publication number: 20200134417
    Abstract: Example apparatus disclosed herein include an array of processor elements, the array including rows each having a first number of processor elements and columns each having a second number of processor elements. Disclosed example apparatus also include configuration registers to store descriptors to configure the array to implement a layer of a convolutional neural network based on a dataflow schedule corresponding to one of multiple tensor processing templates, ones of the processor elements to be configured based on the descriptors to implement the one of the tensor processing templates to operate on input activation data and filter data associated with the layer of the convolutional neural network to produce output activation data associated with the layer of the convolutional neural network. Disclosed example apparatus further include memory to store the input activation data, the filter data and the output activation data associated with the layer of the convolutional neural network.
    Type: Application
    Filed: December 24, 2019
    Publication date: April 30, 2020
    Inventors: Debabrata Mohapatra, Arnab Raha, Gautham Chinya, Huichu Liu, Cormac Brick, Lance Hacking
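    Illustrative sketch: configuring a processor-element array from descriptor registers that select a tensor processing template; the descriptor fields and template name below are invented for illustration.

        from dataclasses import dataclass

        @dataclass
        class Descriptor:
            template: str  # which tensor processing template / dataflow schedule
            rows: int      # processor elements per column
            cols: int      # processor elements per row

        def configure(desc: Descriptor):
            # Configure each PE per the descriptors to run the chosen template.
            return [[desc.template] * desc.cols for _ in range(desc.rows)]

        array = configure(Descriptor(template="output_stationary", rows=2, cols=4))
        print(len(array), len(array[0]))  # a 2 x 4 array, all on one template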
  • Publication number: 20190130148
    Abstract: Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
    Type: Application
    Filed: June 29, 2016
    Publication date: May 2, 2019
    Inventors: Gautham Chinya, Shihao Ji, Arnab Paul
  • Patent number: 9990206
    Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application-level program. A first user-level thread is run on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction either 1) has a field that references one or more instruction sequencers or 2) implicitly references one or more instruction sequencers through a pointer to code that specifically addresses them when the code is executed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Edward Grochowski, Richard Hankins, Gautham Chinya, Bryant Bigbee, Shivnandan Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggarwal, Prashant Sethi, Baiju Patel, James Held
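    Illustrative sketch: a toy Python model of a user-level instruction carrying a field that references a target instruction sequencer, so code running on one sequencer can manage a user-level thread on another; all names are invented.

        class Sequencer:
            def __init__(self, name: str):
                self.name = name
                self.threads = []

            def run_user_thread(self, fn):
                self.threads.append(fn)
                fn()

        def make_spawn_instruction(target: Sequencer):
            # The instruction's "field" is its reference to the target sequencer.
            def spawn(fn):
                target.run_user_thread(fn)
            return spawn

        seq_a, seq_b = Sequencer("A"), Sequencer("B")
        spawn_on_b = make_spawn_instruction(seq_b)  # notionally executed on seq_a
        spawn_on_b(lambda: print("user-level thread running on", seq_b.name))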
  • Patent number: 9875102
    Abstract: Embodiments of the invention provide a method of creating, based on an operating-system-scheduled thread running on an operating-system-visible sequencer and using an instruction set extension, a persistent user-level thread to run on an operating-system-sequestered sequencer independently of context switch activities on the operating-system-scheduled thread. The operating-system-scheduled thread and the persistent user-level thread may share a common virtual address space. Embodiments of the invention may also provide a method of causing a service thread running on an additional operating-system-visible sequencer to provide operating system services to the persistent user-level thread. Embodiments of the invention may further provide a corresponding apparatus, system, and machine-readable medium.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: January 23, 2018
    Assignee: Intel Corporation
    Inventors: Gautham Chinya, Hong Wang, Prashant Sethi, Shivnandan Kaushik, Bryant Bigbee, John Shen, Richard Hankins, Xiang Zou, Baiju V. Patel, Jason W. Brandt, Anil Aggarwal, John L. Reid
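    Illustrative sketch: a loose Python analogy only, showing a persistent worker sharing the process's address space while a service thread proxies OS services (here, printing) on its behalf; it illustrates the shared-address-space idea, not the claimed instruction set extension.

        import queue
        import threading

        shared = {"counter": 0}    # stands in for the common virtual address space
        svc_queue = queue.Queue()  # channel from the worker to the service thread

        def persistent_user_thread():
            for _ in range(3):
                shared["counter"] += 1  # direct access to the shared memory
                svc_queue.put(f"counter={shared['counter']}")
            svc_queue.put(None)  # signal the service thread to stop

        def service_thread():
            # Proxies operating system services (I/O) for the worker thread.
            while (msg := svc_queue.get()) is not None:
                print("service:", msg)

        workers = [threading.Thread(target=persistent_user_thread),
                   threading.Thread(target=service_thread)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()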