Patents by Inventor Prakash Sathyanath RAGHAVENDRA

Prakash Sathyanath RAGHAVENDRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230351187
    Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
    Type: Application
    Filed: June 30, 2023
    Publication date: November 2, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Shagrithaya
  • Patent number: 11694081
    Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: July 4, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
  • Publication number: 20210406690
    Abstract: Systems, apparatuses, and methods for implementing one-sided per-kernel clipping and weight transformation for neural networks are disclosed. Various parameters of a neural network are quantized from higher-bit representations to lower-bit representations to reduce memory utilization and power consumption. To exploit the effective range of quantized representations, positively biased weights are clipped and negated before convolution. Then, the results are rescaled back after convolution. A one-sided clipping technique is used for transforming weights to exploit the quantization range effectively, with the side chosen to be clipped being the biased side. This technique uses a global strategy for clipping without requiring skilled expertise. This approach allows the system to retain as much information as possible without losing unnecessary accuracy when quantizing parameters from higher-bit representations to lower-bit representations.
    Type: Application
    Filed: September 25, 2020
    Publication date: December 30, 2021
    Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Keerthan S. Shagrithaya, Prakash Sathyanath Raghavendra, Vasanthakumar Rajagopal
  • Publication number: 20210012203
    Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a fewer number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
    Type: Application
    Filed: July 10, 2019
    Publication date: January 14, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
  • Publication number: 20200364573
    Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
    Type: Application
    Filed: June 28, 2019
    Publication date: November 19, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
  • Patent number: 10180826
    Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program based on the transfer functions.
    Type: Grant
    Filed: October 22, 2015
    Date of Patent: January 15, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
  • Publication number: 20170115970
    Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program based on the transfer functions.
    Type: Application
    Filed: October 22, 2015
    Publication date: April 27, 2017
    Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
  • Patent number: 8495662
    Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with libpthreads library including creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors and loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program by using light weight (LW) and heavy weight (HW) synchronization routines based on determining switchovers between ST and MT program modes, respectively, during run-time using the interceptor library.
    Type: Grant
    Filed: September 23, 2008
    Date of Patent: July 23, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sandya Srivilliputtur Mannarswamy, Sujoy Saraswati, Prakash Sathyanath Raghavendra
  • Patent number: 8453132
    Abstract: A technique for reducing non-local access, in dynamically generated code that resides in a code buffer of a non-uniform memory access computer system including multiple nodes, for improving overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. Statically determining which methods in the generated code are executed by a thread and then to place those methods in associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: May 28, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra
  • Publication number: 20100037242
    Abstract: A system and method for improving run-time performance of applications with multithreaded and single threaded routines that are linked with libpthreads library is disclosed. In one embodiment, a method for running a mixed ST/MT application program linked with libpthreads library including creating an interceptor library containing pthread application programming interface (pthread_API) call interceptors and loading the interceptor library into the mixed ST/MT application program, and running the mixed ST/MT application program by using light weight (LW) and heavy weight (HW) synchronization routines based on determining switchovers between ST and MT program modes, respectively, during run-time using the interceptor library.
    Type: Application
    Filed: September 23, 2008
    Publication date: February 11, 2010
    Inventors: Sandya Srivilliputtur MANNARSWAMY, Sujoy SARASWATI, Prakash Sathyanath RAGHAVENDRA
  • Publication number: 20080028179
    Abstract: A technique for reducing non-local access, in dynamically generated code that resides in a code buffer of a NUMA computer system including multiple nodes, for improving overall performance of dynamic optimization systems. In one example embodiment, this is accomplished by partitioning the code buffer into multiple smaller code buffers and assigning each of the multiple smaller code buffers to one of the multiple nodes. Statically determining which methods in the generated code are executed by a thread and then to place those methods in associated one of the multiple smaller code buffers to reduce memory latencies introduced by non-local accesses.
    Type: Application
    Filed: June 20, 2007
    Publication date: January 31, 2008
    Inventors: Sandya S. Mannarswamy, Virendra Kumar Mehta, Prakash Sathyanath Raghavendra