Patents by Inventor Prakash Sathyanath RAGHAVENDRA
Prakash Sathyanath RAGHAVENDRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240193413
Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a memory that stores multiple input data values for training a neural network, and a processor. Rather than determine a bit width P of an integer accumulator of a node of the neural network based on bit widths of the input data values and corresponding weight values, the processor selects the bit width P during training. The processor adjusts the magnitudes of the weight values during iterative stages of training the node such that an L1 norm value of the weight values of the node does not exceed a corresponding weight magnitude limit.
Type: Application
Filed: December 13, 2022
Publication date: June 13, 2024
Inventors: Ian Charles Colbert, Mehdi Saeedi, Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Gabor Sines, Prakash Sathyanath Raghavendra, Alessandro Pappalardo
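The arithmetic behind that weight magnitude limit can be seen concretely. The Python sketch below is a minimal illustration, not the claimed method: it assumes a signed P-bit accumulator, unsigned b-bit inputs, and the bound |w.x| <= ||w||_1 * max|x|, so keeping ||w||_1 under (2^(P-1) - 1) / (2^b - 1) rules out accumulator overflow. The helper names l1_limit and constrain_weights are hypothetical.

```python
import numpy as np

def l1_limit(acc_bits: int, input_bits: int) -> float:
    # Assumption: signed P-bit accumulator, unsigned b-bit inputs.
    # |w.x| <= ||w||_1 * max|x|, so keeping the L1 norm under
    # (2**(P-1) - 1) / (2**b - 1) guarantees no overflow.
    return (2 ** (acc_bits - 1) - 1) / (2 ** input_bits - 1)

def constrain_weights(w: np.ndarray, limit: float) -> np.ndarray:
    """Rescale a node's weights after a training step so their
    L1 norm does not exceed the accumulator-derived limit."""
    norm = np.abs(w).sum()
    return w * (limit / norm) if norm > limit else w

# Toy loop: pick the accumulator width P up front, then keep the
# weights inside the limit at every iterative stage of training.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
limit = l1_limit(acc_bits=8, input_bits=8)   # ~0.498
for _ in range(100):
    w -= 0.01 * rng.normal(size=16)          # stand-in for a gradient step
    w = constrain_weights(w, limit)
assert np.abs(w).sum() <= limit + 1e-9
```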
-
Publication number: 20230351187
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Application
Filed: June 30, 2023
Publication date: November 2, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Shagrithaya
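The abstract leaves the salience measure unspecified, so the sketch below assumes a common proxy, the L1 norm of each filter's weights, and prunes only the odd-numbered layers, matching one of the described implementations. The names filter_salience and prune_layer are hypothetical.

```python
import numpy as np

def filter_salience(layer_weights: np.ndarray) -> np.ndarray:
    # Assumed salience proxy: the L1 norm of each filter's weights.
    # layer_weights has shape (num_filters, in_channels, kh, kw).
    return np.abs(layer_weights).reshape(layer_weights.shape[0], -1).sum(axis=1)

def prune_layer(layer_weights: np.ndarray, prune_fraction: float) -> np.ndarray:
    """Drop the least salient fraction of filters from one layer."""
    salience = filter_salience(layer_weights)
    keep = salience.argsort()[int(len(salience) * prune_fraction):]
    return layer_weights[np.sort(keep)]      # preserve original filter order

# Choose a non-contiguous subset of layers -- here the odd-numbered
# ones, as in one described implementation -- and prune each of them.
layers = [np.random.default_rng(i).normal(size=(8, 4, 3, 3)) for i in range(6)]
pruned = [prune_layer(w, 0.25) if i % 2 == 1 else w
          for i, w in enumerate(layers)]
print([w.shape[0] for w in pruned])          # filters remaining per layer
```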
-
Patent number: 11694081
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Grant
Filed: June 28, 2019
Date of Patent: July 4, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
-
Publication number: 20210406690
Abstract: Systems, apparatuses, and methods for implementing one-sided per-kernel clipping and weight transformation for neural networks are disclosed. Various parameters of a neural network are quantized from higher-bit representations to lower-bit representations to reduce memory utilization and power consumption. To exploit the effective range of quantized representations, positively biased weights are clipped and negated before convolution. Then, the results are rescaled back after convolution. A one-sided clipping technique is used for transforming weights to exploit the quantization range effectively, with the clipped side chosen to be the biased side. This technique uses a global strategy for clipping without requiring skilled expertise. This approach allows the system to retain as much information as possible without unnecessary loss of accuracy when quantizing parameters from higher-bit representations to lower-bit representations.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Keerthan S. Shagrithaya, Prakash Sathyanath Raghavendra, Vasanthakumar Rajagopal
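A rough 1-D picture of the clip-and-negate transform, with np.convolve standing in for a convolution layer. Here the post-convolution "rescale back" step is simplified to undoing the negation, which is an assumption rather than the patented rescaling; the clip threshold and helper names are likewise illustrative.

```python
import numpy as np

def one_sided_clip_negate(w: np.ndarray, clip: float):
    """If the weights are positively biased, clip only the positive
    (biased) side and negate, so convolution runs on the transform."""
    positively_biased = w.mean() > 0
    if positively_biased:
        w = np.minimum(w, clip)   # one-sided clip on the biased side
        w = -w                    # negate before convolution
    return w, positively_biased

def convolve_and_restore(x, w, clip):
    w_t, negated = one_sided_clip_negate(w, clip)
    y = np.convolve(x, w_t, mode="valid")   # stand-in for a conv layer
    return -y if negated else y             # undo the negation afterward

x = np.linspace(-1, 1, 32)
w = np.random.default_rng(1).normal(loc=0.3, size=5)  # likely positively biased
print(convolve_and_restore(x, w, clip=1.0)[:4])
```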
-
Publication number: 20210012203
Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations, the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a smaller number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
Type: Application
Filed: July 10, 2019
Publication date: January 14, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
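The measure-then-swap idea can be sketched with wall-clock timing of two filter sizes. Everything below is illustrative rather than the claimed procedure: the sizes, the timing harness, and the choice to start by switching half the filters are all assumptions.

```python
import time
import numpy as np

def time_filters(x, filters, repeats=50):
    """Measure how long filters of one size take to convolve an input."""
    start = time.perf_counter()
    for _ in range(repeats):
        for f in filters:
            np.convolve(x, f, mode="same")
    return time.perf_counter() - start

x = np.random.default_rng(2).normal(size=1024)
filters_5 = [np.ones(5)] * 8   # first filters: first filter size
filters_3 = [np.ones(3)] * 8   # second filters: second filter size

t5 = time_filters(x, filters_5)
t3 = time_filters(x, filters_3)

# If the second size computes faster, change some first filters to it;
# after retraining, if the KPI loss exceeds a threshold, change fewer.
if t3 < t5:
    num_to_change = len(filters_5) // 2   # hypothetical starting point
    print(f"change {num_to_change} filters from size 5 to size 3")
```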
-
Publication number: 20200364573
Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
Type: Application
Filed: June 28, 2019
Publication date: November 19, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
-
Patent number: 10180826
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on the numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program are determined based on the transfer functions.
Type: Grant
Filed: October 22, 2015
Date of Patent: January 15, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
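A toy flavor of such a transfer function, for a single variable in a straight-line block: walking the block backward, each instruction maps the number of result bits demanded by later instructions to the number of operand bits it needs. The tiny instruction set and demand rules below are invented for illustration; the patented analysis also tracks variable liveness across blocks.

```python
def demand_bits(instr, needed_out: int) -> int:
    """Toy transfer function: bits of the operand needed to execute
    this instruction, given how many bits of its result are used by
    subsequent instructions."""
    op, arg = instr
    if op == "and":          # masking caps the useful width
        return min(needed_out, arg.bit_length())
    if op == "shr":          # a right shift needs extra low-order bits
        return needed_out + arg
    return needed_out        # default: pass the demand through

# Propagate the demanded width of one variable backward, from its
# final use toward its definition.
block = [("shr", 4), ("and", 0xFF), ("add", 0)]
needed = 8                   # bits of the final value the program uses
for instr in reversed(block):
    needed = demand_bits(instr, needed)
print(needed)                # estimated bits the variable must carry (12)
```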
-
Publication number: 20170115970
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on the numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program are determined based on the transfer functions.
Type: Application
Filed: October 22, 2015
Publication date: April 27, 2017
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
-
Patent number: 8495662
Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with the libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with the libpthreads library includes creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors, loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program using light weight (LW) and heavy weight (HW) synchronization routines based on switchovers between ST and MT program modes, respectively, determined during run-time using the interceptor library.
Type: Grant
Filed: September 23, 2008
Date of Patent: July 23, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Sandya Srivilliputtur Mannarswamy, Sujoy Saraswati, Prakash Sathyanath Raghavendra
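The switchover idea translates loosely into a high-level analogue: a lock that takes a light-weight no-op path while the program is single threaded and a real heavy-weight lock once a second thread exists. This Python sketch only mirrors the concept; the patent works at the level of a preloaded interceptor library over the pthread API, and a real implementation must also handle a lock held across the ST-to-MT switchover, which this sketch ignores.

```python
import threading

class SwitchingLock:
    """Analogue of the interceptor idea: LW path in ST mode, HW lock
    in MT mode."""
    def __init__(self):
        self._lock = threading.Lock()
        self._mt = False
    def notice_thread_create(self):
        self._mt = True            # the program has entered MT mode
    def acquire(self):
        if self._mt:
            self._lock.acquire()   # HW synchronization in MT mode
        # ST mode: LW path, no atomic operation needed
    def release(self):
        if self._mt:
            self._lock.release()

lock = SwitchingLock()
lock.acquire(); lock.release()     # cheap while single threaded

def worker():
    lock.acquire(); lock.release() # real locking once MT

lock.notice_thread_create()        # interceptor sees the thread create
t = threading.Thread(target=worker); t.start(); t.join()
```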
-
Patent number: 8453132
Abstract: A technique for reducing non-local accesses in dynamically generated code that resides in a code buffer of a non-uniform memory access computer system including multiple nodes, for improving the overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. The methods in the generated code that are executed by a thread are statically determined, and those methods are then placed in the associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
Type: Grant
Filed: June 20, 2007
Date of Patent: May 28, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra
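A minimal sketch of the placement policy: partition one code buffer into per-node buffers and statically assign each generated method to the buffer of the node whose thread runs it, so the thread's instruction fetches stay local. The thread-to-node mapping and method names here are hypothetical.

```python
from collections import defaultdict

# Split one big code buffer into per-node buffers and place each
# generated method in the buffer of the node whose thread executes it.
NUM_NODES = 4
method_to_thread = {"jit_m1": 0, "jit_m2": 1, "jit_m3": 1, "jit_m4": 3}
thread_to_node = lambda tid: tid % NUM_NODES   # assumed thread affinity

node_buffers = defaultdict(list)               # the smaller code buffers
for method, tid in method_to_thread.items():
    node_buffers[thread_to_node(tid)].append(method)

for node in sorted(node_buffers):
    print(f"node {node}: {node_buffers[node]}")
```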
-
Publication number: 20100037242
Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with the libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with the libpthreads library includes creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors, loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program using light weight (LW) and heavy weight (HW) synchronization routines based on switchovers between ST and MT program modes, respectively, determined during run-time using the interceptor library.
Type: Application
Filed: September 23, 2008
Publication date: February 11, 2010
Inventors: Sandya Srivilliputtur Mannarswamy, Sujoy Saraswati, Prakash Sathyanath Raghavendra
-
Publication number: 20080028179
Abstract: A technique for reducing non-local accesses in dynamically generated code that resides in a code buffer of a NUMA computer system including multiple nodes, for improving the overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. The methods in the generated code that are executed by a thread are statically determined, and those methods are then placed in the associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
Type: Application
Filed: June 20, 2007
Publication date: January 31, 2008
Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra