Patents by Inventor Prakash Sathyanath RAGHAVENDRA
Prakash Sathyanath RAGHAVENDRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240193413
Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a memory that stores multiple input data values for training a neural network, and a processor. Rather than determine a bit width P of an integer accumulator of a node of the neural network based on bit widths of the input data values and corresponding weight values, the processor selects the bit width P during training. The processor adjusts the magnitudes of the weight values during iterative stages of training the node such that an L1 norm value of the weight values of the node does not exceed a corresponding weight magnitude limit.
Type: Application
Filed: December 13, 2022
Publication date: June 13, 2024
Inventors: Ian Charles Colbert, Mehdi Saeedi, Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Gabor Sines, Prakash Sathyanath Raghavendra, Alessandro Pappalardo
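The arithmetic behind that weight magnitude limit can be seen concretely. The Python sketch below is a minimal illustration, not the claimed method: it assumes a signed P-bit accumulator, unsigned b-bit inputs, and the bound |w.x| <= ||w||_1 * max|x|, so keeping ||w||_1 under (2^(P-1) - 1) / (2^b - 1) rules out accumulator overflow. The helper names l1_limit and constrain_weights are hypothetical.

```python
import numpy as np

def l1_limit(acc_bits: int, input_bits: int) -> float:
    # Assumption: signed P-bit accumulator, unsigned b-bit inputs.
    # |w.x| <= ||w||_1 * max|x|, so keeping the L1 norm under
    # (2**(P-1) - 1) / (2**b - 1) guarantees no overflow.
    return (2 ** (acc_bits - 1) - 1) / (2 ** input_bits - 1)

def constrain_weights(w: np.ndarray, limit: float) -> np.ndarray:
    """Rescale a node's weights after a training step so their
    L1 norm does not exceed the accumulator-derived limit."""
    norm = np.abs(w).sum()
    return w * (limit / norm) if norm > limit else w

# Toy loop: pick the accumulator width P up front, then keep the
# weights inside the limit at every iterative stage of training.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
limit = l1_limit(acc_bits=8, input_bits=8)   # ~0.498
for _ in range(100):
    w -= 0.01 * rng.normal(size=16)          # stand-in for a gradient step
    w = constrain_weights(w, limit)
assert np.abs(w).sum() <= limit + 1e-9
```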
-
Publication number: 20230351187
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Application
Filed: June 30, 2023
Publication date: November 2, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Shagrithaya
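The abstract leaves the salience measure unspecified, so the sketch below assumes a common proxy, the L1 norm of each filter's weights, and prunes only the odd-numbered layers, matching one of the described implementations. The names filter_salience and prune_layer are hypothetical.

```python
import numpy as np

def filter_salience(layer_weights: np.ndarray) -> np.ndarray:
    # Assumed salience proxy: the L1 norm of each filter's weights.
    # layer_weights has shape (num_filters, in_channels, kh, kw).
    return np.abs(layer_weights).reshape(layer_weights.shape[0], -1).sum(axis=1)

def prune_layer(layer_weights: np.ndarray, prune_fraction: float) -> np.ndarray:
    """Drop the least salient fraction of filters from one layer."""
    salience = filter_salience(layer_weights)
    keep = salience.argsort()[int(len(salience) * prune_fraction):]
    return layer_weights[np.sort(keep)]      # preserve original filter order

# Choose a non-contiguous subset of layers -- here the odd-numbered
# ones, as in one described implementation -- and prune each of them.
layers = [np.random.default_rng(i).normal(size=(8, 4, 3, 3)) for i in range(6)]
pruned = [prune_layer(w, 0.25) if i % 2 == 1 else w
          for i, w in enumerate(layers)]
print([w.shape[0] for w in pruned])          # filters remaining per layer
```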
-
Patent number: 11694081
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Grant
Filed: June 28, 2019
Date of Patent: July 4, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
-
Publication number: 20210406690
Abstract: Systems, apparatuses, and methods for implementing one-sided per-kernel clipping and weight transformation for neural networks are disclosed. Various parameters of a neural network are quantized from higher-bit representations to lower-bit representations to reduce memory utilization and power consumption. To exploit the effective range of quantized representations, positively biased weights are clipped and negated before convolution. Then, the results are rescaled back after convolution. A one-sided clipping technique is used for transforming weights to exploit the quantization range effectively, with the clipped side chosen to be the biased side. This technique uses a global strategy for clipping without requiring skilled expertise. This approach allows the system to retain as much information as possible without unnecessary loss of accuracy when quantizing parameters from higher-bit representations to lower-bit representations.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Keerthan S. Shagrithaya, Prakash Sathyanath Raghavendra, Vasanthakumar Rajagopal
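A rough 1-D picture of the clip-and-negate transform, with np.convolve standing in for a convolution layer. Here the post-convolution "rescale back" step is simplified to undoing the negation, which is an assumption rather than the patented rescaling; the clip threshold and helper names are likewise illustrative.

```python
import numpy as np

def one_sided_clip_negate(w: np.ndarray, clip: float):
    """If the weights are positively biased, clip only the positive
    (biased) side and negate, so convolution runs on the transform."""
    positively_biased = w.mean() > 0
    if positively_biased:
        w = np.minimum(w, clip)   # one-sided clip on the biased side
        w = -w                    # negate before convolution
    return w, positively_biased

def convolve_and_restore(x, w, clip):
    w_t, negated = one_sided_clip_negate(w, clip)
    y = np.convolve(x, w_t, mode="valid")   # stand-in for a conv layer
    return -y if negated else y             # undo the negation afterward

x = np.linspace(-1, 1, 32)
w = np.random.default_rng(1).normal(loc=0.3, size=5)  # likely positively biased
print(convolve_and_restore(x, w, clip=1.0)[:4])
```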
-
Publication number: 20210012203
Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations, the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a smaller number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
Type: Application
Filed: July 10, 2019
Publication date: January 14, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
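The measure-then-swap idea can be sketched with wall-clock timing of two filter sizes. Everything below is illustrative rather than the claimed procedure: the sizes, the timing harness, and the choice to start by switching half the filters are all assumptions.

```python
import time
import numpy as np

def time_filters(x, filters, repeats=50):
    """Measure how long filters of one size take to convolve an input."""
    start = time.perf_counter()
    for _ in range(repeats):
        for f in filters:
            np.convolve(x, f, mode="same")
    return time.perf_counter() - start

x = np.random.default_rng(2).normal(size=1024)
filters_5 = [np.ones(5)] * 8   # first filters: first filter size
filters_3 = [np.ones(3)] * 8   # second filters: second filter size

t5 = time_filters(x, filters_5)
t3 = time_filters(x, filters_3)

# If the second size computes faster, change some first filters to it;
# after retraining, if the KPI loss exceeds a threshold, change fewer.
if t3 < t5:
    num_to_change = len(filters_5) // 2   # hypothetical starting point
    print(f"change {num_to_change} filters from size 5 to size 3")
```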
-
Publication number: 20200364573
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Application
Filed: June 28, 2019
Publication date: November 19, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
-
Patent number: 10180826
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on the numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program are determined based on the transfer functions.
Type: Grant
Filed: October 22, 2015
Date of Patent: January 15, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
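A toy flavor of such a transfer function, for a single variable in a straight-line block: walking the block backward, each instruction maps the number of result bits demanded by later instructions to the number of operand bits it needs. The tiny instruction set and demand rules below are invented for illustration; the patented analysis also tracks variable liveness across blocks.

```python
def demand_bits(instr, needed_out: int) -> int:
    """Toy transfer function: bits of the operand needed to execute
    this instruction, given how many bits of its result are used by
    subsequent instructions."""
    op, arg = instr
    if op == "and":          # masking caps the useful width
        return min(needed_out, arg.bit_length())
    if op == "shr":          # a right shift needs extra low-order bits
        return needed_out + arg
    return needed_out        # default: pass the demand through

# Propagate the demanded width of one variable backward, from its
# final use toward its definition.
block = [("shr", 4), ("and", 0xFF), ("add", 0)]
needed = 8                   # bits of the final value the program uses
for instr in reversed(block):
    needed = demand_bits(instr, needed)
print(needed)                # estimated bits the variable must carry (12)
```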
-
Publication number: 20170115970
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on the numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program are determined based on the transfer functions.
Type: Application
Filed: October 22, 2015
Publication date: April 27, 2017
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
-
Patent number: 8495662
Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with the libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with the libpthreads library includes creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors, loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program using light weight (LW) and heavy weight (HW) synchronization routines based on switchovers between ST and MT program modes, respectively, determined during run-time using the interceptor library.
Type: Grant
Filed: September 23, 2008
Date of Patent: July 23, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Sandya Srivilliputtur Mannarswamy, Sujoy Saraswati, Prakash Sathyanath Raghavendra
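The switchover idea translates loosely into a high-level analogue: a lock that takes a light-weight no-op path while the program is single threaded and a real heavy-weight lock once a second thread exists. This Python sketch only mirrors the concept; the patent works at the level of a preloaded interceptor library over the pthread API, and a real implementation must also handle a lock held across the ST-to-MT switchover, which this sketch ignores.

```python
import threading

class SwitchingLock:
    """Analogue of the interceptor idea: LW path in ST mode, HW lock
    in MT mode."""
    def __init__(self):
        self._lock = threading.Lock()
        self._mt = False
    def notice_thread_create(self):
        self._mt = True            # the program has entered MT mode
    def acquire(self):
        if self._mt:
            self._lock.acquire()   # HW synchronization in MT mode
        # ST mode: LW path, no atomic operation needed
    def release(self):
        if self._mt:
            self._lock.release()

lock = SwitchingLock()
lock.acquire(); lock.release()     # cheap while single threaded

def worker():
    lock.acquire(); lock.release() # real locking once MT

lock.notice_thread_create()        # interceptor sees the thread create
t = threading.Thread(target=worker); t.start(); t.join()
```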
-
Patent number: 8453132
Abstract: A technique for reducing non-local accesses in dynamically generated code that resides in a code buffer of a non-uniform memory access computer system including multiple nodes, for improving the overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. The methods in the generated code that are executed by a thread are statically determined, and those methods are then placed in the associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
Type: Grant
Filed: June 20, 2007
Date of Patent: May 28, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra
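A minimal sketch of the placement policy: partition one code buffer into per-node buffers and statically assign each generated method to the buffer of the node whose thread runs it, so the thread's instruction fetches stay local. The thread-to-node mapping and method names here are hypothetical.

```python
from collections import defaultdict

# Split one big code buffer into per-node buffers and place each
# generated method in the buffer of the node whose thread executes it.
NUM_NODES = 4
method_to_thread = {"jit_m1": 0, "jit_m2": 1, "jit_m3": 1, "jit_m4": 3}
thread_to_node = lambda tid: tid % NUM_NODES   # assumed thread affinity

node_buffers = defaultdict(list)               # the smaller code buffers
for method, tid in method_to_thread.items():
    node_buffers[thread_to_node(tid)].append(method)

for node in sorted(node_buffers):
    print(f"node {node}: {node_buffers[node]}")
```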
-
Publication number: 20100037242
Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with the libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with the libpthreads library includes creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors, loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program using light weight (LW) and heavy weight (HW) synchronization routines based on switchovers between ST and MT program modes, respectively, determined during run-time using the interceptor library.
Type: Application
Filed: September 23, 2008
Publication date: February 11, 2010
Inventors: Sandya Srivilliputtur Mannarswamy, Sujoy Saraswati, Prakash Sathyanath Raghavendra
-
Publication number: 20080028179
Abstract: A technique for reducing non-local accesses in dynamically generated code that resides in a code buffer of a NUMA computer system including multiple nodes, for improving the overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. The methods in the generated code that are executed by a thread are statically determined, and those methods are then placed in the associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
Type: Application
Filed: June 20, 2007
Publication date: January 31, 2008
Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra