Patents by Inventor Karamvir CHATHA

Karamvir CHATHA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11861467
    Abstract: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared, and when it is determined that results of the second inferences are within a threshold performance level of results of the first inferences, one or more subsequent inferences are performed using the machine learning model and the quantized weight information.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: January 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Serag Gadelrab, Karamvir Chatha, Ofer Rosenberg
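
The adaptive quantization flow described in the abstract can be sketched in Python. This is a minimal illustration only: the symmetric linear quantizer, the single-layer `infer` stand-in, and the relative-error comparison are assumptions for the sketch, not the patented implementation.

```python
import numpy as np

def quantize_weights(w, bits=8):
    """Reduce float weights to a lower-bit representation (symmetric linear quantization)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def infer(x, w):
    """Stand-in for a model forward pass: a single linear layer."""
    return x @ w

def choose_weights(x, w_full, threshold=0.01):
    """Run first inferences with the received weights and second inferences with
    the quantized weights; keep the quantized weights for subsequent inferences
    only if results stay within the threshold performance level."""
    q, scale = quantize_weights(w_full)
    w_quant = q.astype(np.float32) * scale   # dequantize for the comparison
    full = infer(x, w_full)
    reduced = infer(x, w_quant)
    rel_err = np.max(np.abs(full - reduced) / (np.abs(full) + 1e-9))
    return (w_quant, "quantized") if rel_err <= threshold else (w_full, "full")
```

In practice the comparison metric would be a task-level performance measure (e.g., accuracy) rather than raw output error.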
  • Patent number: 11614941
    Abstract: An apparatus for hardware acceleration for use in operating a computational network is configured for determining that a loop structure including one or more loops is to be executed by a first processor. Each of the one or more loops includes a set of operations. The loop structure may be configured as a nested loop, a cascaded loop, or a combination of the two. A second processor may be configured to decouple overhead operations of the loop structure from compute operations of the loop structure. The apparatus accelerates processing of the loop structure by simultaneously processing the overhead operations using the second processor separately from processing the compute operations based on the configuration to operate the computational network.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: March 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Amrit Panda, Francisco Perez, Karamvir Chatha
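
The overhead/compute decoupling can be illustrated with a small producer/consumer sketch: one worker handles only the loop bookkeeping (index generation for a nested loop) while another performs only the compute operations. The thread-and-queue structure and the element-wise addition workload are assumptions chosen to make the idea concrete, not the patented design.

```python
from queue import Queue
from threading import Thread

def overhead_processor(shape, work_queue):
    """Second processor: runs the loop overhead (index/address generation)
    decoupled from the compute operations."""
    rows, cols = shape
    for i in range(rows):            # nested loop structure
        for j in range(cols):
            work_queue.put((i, j))   # emit the index for the compute side
    work_queue.put(None)             # sentinel: loop structure exhausted

def compute_processor(a, b, out, work_queue):
    """First processor: performs only the compute operations, consuming
    pre-generated indices instead of running loop overhead itself."""
    while (idx := work_queue.get()) is not None:
        i, j = idx
        out[i][j] = a[i][j] + b[i][j]

def run(a, b):
    rows, cols = len(a), len(a[0])
    out = [[0] * cols for _ in range(rows)]
    q = Queue(maxsize=16)
    t = Thread(target=overhead_processor, args=((rows, cols), q))
    t.start()
    compute_processor(a, b, out, q)
    t.join()
    return out
```

The bounded queue lets index generation run ahead of the compute stream, which is the software analogue of overlapping loop overhead with compute in hardware.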
  • Publication number: 20210279635
    Abstract: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared, and when it is determined that results of the second inferences are within a threshold performance level of results of the first inferences, one or more subsequent inferences are performed using the machine learning model and the quantized weight information.
    Type: Application
    Filed: March 5, 2020
    Publication date: September 9, 2021
    Inventors: Serag GADELRAB, Karamvir CHATHA, Ofer ROSENBERG
  • Patent number: 10871964
    Abstract: A method, a computer-readable medium, and an apparatus for a sparse neural network are provided. The apparatus may include a hardware accelerator. The apparatus may determine, for each pair of operands to be processed by a MAR unit, whether both operands of the pair are non-zero. The apparatus may prevent a pair of operands to be processed by the MAR unit from being loaded to a multiplier of the MAR unit when an operand of the pair of operands is zero. The apparatus may place the pair of operands into one of a plurality of queues when both operands of the pair of operands are non-zero.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: December 22, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yatish Girish Turakhia, Javid Jaffari, Amrit Panda, Karamvir Chatha
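
The zero-skipping behavior for the sparse-neural-network accelerator can be sketched as follows. The round-robin queue assignment and the `sparse_mac` helper are illustrative assumptions; the patent describes a hardware MAR (multiply-accumulate) unit, not this software loop.

```python
from collections import deque

def sparse_mac(pairs, num_queues=4):
    """Skip multiply-accumulate work for operand pairs containing a zero;
    distribute surviving non-zero pairs across queues feeding the MAR unit."""
    queues = [deque() for _ in range(num_queues)]
    for k, (a, b) in enumerate(pairs):
        if a == 0 or b == 0:
            continue                       # pair is never loaded into the multiplier
        queues[k % num_queues].append((a, b))
    acc = 0
    multiplies = 0
    for q in queues:
        while q:
            a, b = q.popleft()
            acc += a * b                   # only non-zero pairs reach the multiplier
            multiplies += 1
    return acc, multiplies
```

Because a product with a zero operand contributes nothing to the accumulation, skipping those pairs preserves the result while reducing multiplier activity.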
  • Publication number: 20200089497
    Abstract: Systems and methods for minimizing control variance overhead in a dataflow processor include receiving a generating instruction specifying at least an acknowledge predicate based on a first number, a second number, and a first value, wherein a true branch comprises the first number of consumer instructions of the generating instruction based on the first value, used as a first predicate, being true; and a false branch comprises a second number of consumer instructions of the generating instruction based on the first value, used as the first predicate, being false. The acknowledge predicate is evaluated to be a selected number, which is the first number if the first value is true, or the second number if the first value is false. The generating instruction is fired upon the selected number of acknowledge arcs being received from the true branch or the false branch.
    Type: Application
    Filed: September 18, 2018
    Publication date: March 19, 2020
    Inventors: Rakesh KOMURAVELLI, Amin ANSARI, Ramesh Chandra CHAUHAN, Karamvir CHATHA
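
The acknowledge-predicate rule reduces to a simple selection: the number of acknowledge arcs the generating instruction must collect before re-firing depends on which branch the predicate took. The function names below are hypothetical; this is only a sketch of the counting rule, not the dataflow-processor mechanism itself.

```python
def acknowledge_count(pred_value, n_true, n_false):
    """The acknowledge predicate evaluates to the first number when the
    predicate value is true, and to the second number when it is false."""
    return n_true if pred_value else n_false

def can_fire(acks_received, pred_value, n_true, n_false):
    """The generating instruction fires once the selected number of
    acknowledge arcs has arrived from the taken branch."""
    return acks_received >= acknowledge_count(pred_value, n_true, n_false)
```

The point is that a branch with fewer consumers returns fewer acknowledgments, so the producer does not stall waiting for acks from the untaken branch.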
  • Publication number: 20190303156
    Abstract: An apparatus for hardware acceleration for use in operating a computational network is configured for determining that a loop structure including one or more loops is to be executed by a first processor. Each of the one or more loops includes a set of operations. The loop structure may be configured as a nested loop, a cascaded loop, or a combination of the two. A second processor may be configured to decouple overhead operations of the loop structure from compute operations of the loop structure. The apparatus accelerates processing of the loop structure by simultaneously processing the overhead operations using the second processor separately from processing the compute operations based on the configuration to operate the computational network.
    Type: Application
    Filed: March 30, 2018
    Publication date: October 3, 2019
    Inventors: Amrit PANDA, Francisco PEREZ, Karamvir CHATHA
  • Patent number: 10037306
    Abstract: Computing a non-linear function ƒ(x) in hardware or embedded systems can be complex and resource intensive. In one or more aspects of the disclosure, a method, a computer-readable medium, and an apparatus are provided for computing a non-linear function ƒ(x) accurately and efficiently in hardware using look-up tables (LUTs) and interpolation or extrapolation. The apparatus may be a processor. The processor computes a non-linear function ƒ(x) for an input variable x, where ƒ(x)=g(y(x),z(x)). The processor determines an integer n by determining a position of a most significant bit (MSB) of an input variable x. In addition, the processor determines a value for y(x) based on a first look-up table and the determined integer n. Also, the processor determines a value for z(x) based on n and the input variable x, and based on a second look-up table. Further, the processor computes ƒ(x) based on the determined values for y(x) and z(x).
    Type: Grant
    Filed: September 1, 2016
    Date of Patent: July 31, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Dexu Lin, Edward Liao, Somdeb Majumdar, Aaron Lamb, Karamvir Chatha
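
The LUT-plus-MSB scheme can be made concrete with log2 as the non-linear function: the MSB position n gives the integer part y(x) directly, and a table indexed by the leading mantissa bits gives the fractional part z(x), combined as g(y, z) = y + z. The table size, the truncation (no interpolation), and the choice of log2 are assumptions for this sketch, not the patented method.

```python
import math

FRAC_BITS = 6
# Hypothetical second LUT: log2(1 + m / 2^FRAC_BITS) for each mantissa index m
Z_LUT = [math.log2(1 + m / 2**FRAC_BITS) for m in range(2**FRAC_BITS)]

def msb_position(x):
    """Position of the most significant set bit of a positive integer."""
    return x.bit_length() - 1

def approx_log2(x):
    """LUT-based log2(x): n from the MSB gives the integer part y(x);
    the bits below the MSB index the table for the fractional part z(x)."""
    n = msb_position(x)
    if n >= FRAC_BITS:
        frac = (x >> (n - FRAC_BITS)) & (2**FRAC_BITS - 1)
    else:
        frac = (x << (FRAC_BITS - n)) & (2**FRAC_BITS - 1)
    return n + Z_LUT[frac]          # g(y(x), z(x)) = y(x) + z(x)
```

With 6 fractional bits the truncation error is bounded by roughly log2(1 + 1/64) ≈ 0.022; a larger table or interpolation between adjacent entries tightens this.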
  • Publication number: 20180189056
    Abstract: A method, a computer-readable medium, and an apparatus for a sparse neural network are provided. The apparatus may include a hardware accelerator. The apparatus may determine, for each pair of operands to be processed by a MAR unit, whether both operands of the pair are non-zero. The apparatus may prevent a pair of operands to be processed by the MAR unit from being loaded to a multiplier of the MAR unit when an operand of the pair of operands is zero. The apparatus may place the pair of operands into one of a plurality of queues when both operands of the pair of operands are non-zero.
    Type: Application
    Filed: December 29, 2016
    Publication date: July 5, 2018
    Inventors: Yatish Girish TURAKHIA, Javid JAFFARI, Amrit PANDA, Karamvir CHATHA
  • Publication number: 20180164866
    Abstract: A method, a computer-readable medium, and an apparatus for reducing power consumption of a neural network are provided. The apparatus may retrieve, from a tag storage, at least one tag value of a first tag value for a weight in the neural network or a second tag value for an activation in the neural network. The first tag value may indicate whether the weight is zero and the second tag value may indicate whether the activation is zero. The weight and the activation are to be loaded to a multiplier of a multiplier-accumulator unit as a pair of operands. The apparatus may determine whether the at least one tag value indicates a zero value. The apparatus may disable loading the weight and the activation to the multiplier when the at least one tag value indicates a zero value. The apparatus may disable updating of zero-value activations.
    Type: Application
    Filed: December 13, 2016
    Publication date: June 14, 2018
    Inventors: Yatish Girish TURAKHIA, Javid JAFFARI, Amrit PANDA, Karamvir CHATHA
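
The tag-gated multiplier load can be sketched as below. Unlike checking operand values directly, the decision here reads only the stored tags: a set tag marks a zero weight or zero activation, and the pair is never loaded into the multiplier. The list-based representation and the `mac_with_tags` helper are assumptions for illustration.

```python
def mac_with_tags(weights, activations, w_tags, a_tags):
    """Gate multiplier loads on tag values: w_tags[i] is True when the weight
    is zero, a_tags[i] is True when the activation is zero."""
    acc = 0
    loads = 0
    for w, a, wt, at in zip(weights, activations, w_tags, a_tags):
        if wt or at:        # a tag indicates a zero value: disable the load
            continue
        acc += w * a
        loads += 1          # only pairs with both tags clear reach the multiplier
    return acc, loads
```

Consulting small tag bits instead of full operand values is what saves power: the wide weight/activation reads and the multiplier toggling are avoided for every tagged-zero pair.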
  • Publication number: 20180060278
    Abstract: Computing a non-linear function ƒ(x) in hardware or embedded systems can be complex and resource intensive. In one or more aspects of the disclosure, a method, a computer-readable medium, and an apparatus are provided for computing a non-linear function ƒ(x) accurately and efficiently in hardware using look-up tables (LUTs) and interpolation or extrapolation. The apparatus may be a processor. The processor computes a non-linear function ƒ(x) for an input variable x, where ƒ(x)=g(y(x),z(x)). The processor determines an integer n by determining a position of a most significant bit (MSB) of an input variable x. In addition, the processor determines a value for y(x) based on a first look-up table and the determined integer n. Also, the processor determines a value for z(x) based on n and the input variable x, and based on a second look-up table. Further, the processor computes ƒ(x) based on the determined values for y(x) and z(x).
    Type: Application
    Filed: September 1, 2016
    Publication date: March 1, 2018
    Inventors: Dexu LIN, Edward LIAO, Somdeb MAJUMDAR, Aaron LAMB, Karamvir CHATHA