Patents by Inventor Abhinav Vishnu

Abhinav Vishnu has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230394311
    Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data—and may itself use the sub-networks for processing instances of input image data.
    Type: Application
    Filed: August 22, 2023
    Publication date: December 7, 2023
    Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
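    As a rough illustration of the abstract above, the following Python sketch derives per-channel sub-networks by copying a toy first-layer weight tensor and removing the slices for the dropped color channels. The shapes and the make_subnetwork helper are assumptions for illustration, not details from the filing.

        import numpy as np

        def make_subnetwork(first_layer_weights, keep_channels):
            """Copy a network's first-layer weights, keeping only the
            slices that consume the listed color channels (R=0, G=1, B=2)."""
            # Weights shaped (out_channels, in_channels, height, width):
            # drop the input slices for every channel not kept.
            return first_layer_weights[:, keep_channels, :, :].copy()

        # Full network's first conv layer: 8 filters over 3 (RGB) channels.
        full = np.random.randn(8, 3, 5, 5)

        # One sub-network per channel subset, each a pruned copy of the original.
        sub_r  = make_subnetwork(full, [0])     # red only
        sub_gb = make_subnetwork(full, [1, 2])  # green + blue

        print(sub_r.shape, sub_gb.shape)  # (8, 1, 5, 5) (8, 2, 5, 5)
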
  • Patent number: 11775799
    Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: October 3, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
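    The run-time manager's two-pass behavior can be sketched in Python as follows: record per-layer buffer usage during the forward pass, then plan placements for the backward pass. The greedy most-accessed-first policy and the capacity numbers are assumptions; the patent only specifies that placement is based on monitored forward-pass usage.

        # Toy run-time manager: observe buffer usage forward, plan placement backward.
        FAST_CAPACITY = 100  # units in the low-capacity, high-bandwidth memory

        forward_usage = {}   # layer -> (buffer_size, access_count)

        def record_forward(layer, size, accesses):
            forward_usage[layer] = (size, accesses)

        def plan_backward_placement():
            """Greedily keep the most frequently accessed buffers in fast
            memory; spill the rest to the high-capacity, low-bandwidth memory."""
            placement, used = {}, 0
            by_accesses = sorted(forward_usage, key=lambda l: -forward_usage[l][1])
            for layer in by_accesses:
                size, _ = forward_usage[layer]
                if used + size <= FAST_CAPACITY:
                    placement[layer] = "fast"
                    used += size
                else:
                    placement[layer] = "slow"
            return placement

        # Forward pass observations for a 3-layer network.
        record_forward("conv1", size=60, accesses=5)
        record_forward("conv2", size=50, accesses=9)
        record_forward("fc", size=30, accesses=2)

        print(plan_backward_placement())
        # {'conv2': 'fast', 'conv1': 'slow', 'fc': 'fast'}
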
  • Patent number: 11763155
    Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data—and may itself use the sub-networks for processing instances of input image data.
    Type: Grant
    Filed: August 12, 2019
    Date of Patent: September 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
  • Publication number: 20230196430
    Abstract: An electronic device includes multiple nodes. Each node generates compressed lookup data to be used for processing instances of input data through a model using input index vectors from a compressed set of input index vectors for each part among multiple parts of a respective set of input index vectors. Each node then communicates compressed lookup data for a respective part to each other node.
    Type: Application
    Filed: December 22, 2021
    Publication date: June 22, 2023
    Inventors: Sarunya Pumma, Abhinav Vishnu
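    One plausible reading of the compression step is deduplicating repeated lookup indices before the per-part exchange; the Python sketch below works under that assumption, with a numpy array standing in for the model's lookup table.

        import numpy as np

        def compress_indices(index_vector):
            """Replace a raw index vector with its unique indices plus the
            inverse mapping needed to reconstruct the full lookup later."""
            unique, inverse = np.unique(index_vector, return_inverse=True)
            return unique, inverse

        # This node's lookup table part and a raw index vector with repeats.
        table = np.random.randn(10, 4)
        raw = np.array([3, 3, 7, 1, 7, 3])

        unique, inverse = compress_indices(raw)
        compressed_lookup = table[unique]   # one row per distinct index only

        # The compressed lookup data is what gets communicated to other
        # nodes; a receiver expands it to per-sample rows via the inverse map.
        expanded = compressed_lookup[inverse]
        assert np.array_equal(expanded, table[raw])
        print(len(raw), "lookups served by", len(unique), "table reads")
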
  • Patent number: 11669473
    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: June 6, 2023
    Inventors: Abhinav Vishnu, Joseph Lee Greathouse
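    A Python sketch of the overlap this enables, with threads standing in for the compute units and the enhanced DMA engine; the stand-in workloads are invented for illustration.

        import threading
        import numpy as np

        grads = [np.ones(4) * rank for rank in range(4)]  # per-"GPU" gradients
        reduced = np.zeros(4)

        def compute_kernel():
            # First kernel: model computation proceeds on the compute units.
            np.linalg.qr(np.random.randn(200, 200))  # stand-in for real work

        def dma_allreduce():
            # Second kernel, lowered to engine commands: sum gradients across ranks.
            global reduced
            reduced = np.sum(grads, axis=0)

        t_compute = threading.Thread(target=compute_kernel)
        t_dma = threading.Thread(target=dma_allreduce)
        t_compute.start(); t_dma.start()   # both proceed concurrently
        t_compute.join(); t_dma.join()
        print(reduced)                     # [6. 6. 6. 6.]
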
  • Patent number: 11620525
    Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
    Type: Grant
    Filed: September 25, 2018
    Date of Patent: April 4, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Abhinav Vishnu
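    The drop-and-coalesce step can be sketched in Python as follows: a random neuron mask is applied, and the surviving activations are packed into a dense sub-matrix. Shapes and the drop rate are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)

        # Activations for one layer: rows are neurons, columns are batch samples.
        activations = rng.standard_normal((8, 4))

        # Randomly drop neurons with probability 0.5; keep the rest.
        keep_mask = rng.random(activations.shape[0]) >= 0.5
        kept_rows = np.flatnonzero(keep_mask)

        # Coalesce: pack only the surviving neurons into a dense sub-matrix,
        # as the CPU side would before the next layer's matrix multiply.
        coalesced = activations[kept_rows]

        print("kept neurons:", kept_rows)
        print("coalesced shape:", coalesced.shape)
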
  • Patent number: 11436060
    Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: September 6, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Karthik Rao, Abhinav Vishnu
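    A minimal Python sketch of per-layer link tuning under an assumed cost model: pick the cheapest clock/width setting whose transfer time still hides under the layer's compute time. The candidate settings and bandwidth formula are invented for illustration.

        # Candidate link configurations: (clock_GHz, width_lanes).
        SETTINGS = [(0.5, 4), (1.0, 8), (1.5, 16)]

        def link_setting_for_layer(transfer_bytes, compute_time_s):
            """Pick the slowest (cheapest) link setting whose transfer still
            finishes within the layer's computation time, so communication
            stays overlapped with compute."""
            for clock, width in SETTINGS:
                bandwidth = clock * 1e9 * width / 8   # bytes/s, toy model
                if transfer_bytes / bandwidth <= compute_time_s:
                    return clock, width
            return SETTINGS[-1]  # fastest setting if nothing else keeps up

        # Per-layer (gradient size in bytes, compute time in seconds).
        layers = [(2_000_000, 0.004), (40_000_000, 0.010), (500_000, 0.001)]
        for i, (size, t) in enumerate(layers):
            print(f"layer {i}: clock/width =", link_setting_for_layer(size, t))
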
  • Publication number: 20210406209
    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
    Type: Application
    Filed: September 25, 2020
    Publication date: December 30, 2021
    Inventors: Abhinav Vishnu, Joseph Lee Greathouse
  • Publication number: 20210319312
    Abstract: Values of physical variables that represent a first state of a first physical system are estimated using a deep learning (DL) algorithm that is trained based on values of physical variables that represent states of other physical systems that are determined by one or more physical equations and subject to one or more conservation laws. A physics-based model modifies the estimated values based on the one or more physical equations so that the resulting modified values satisfy the one or more conservation laws.
    Type: Application
    Filed: August 31, 2020
    Publication date: October 14, 2021
    Inventors: Nicholas Penha Malaya, Abhinav Vishnu, Octavi Obiols Sales
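    A minimal Python sketch of the correction step, assuming conservation of a single summed quantity and a uniform additive projection; the publication's physics-based model would instead use the governing physical equations.

        import numpy as np

        def enforce_conservation(estimate, conserved_total):
            """Shift the DL estimate so its sum matches the conserved quantity
            (a simple projection onto the conservation constraint)."""
            correction = (conserved_total - estimate.sum()) / estimate.size
            return estimate + correction

        # Learned density field whose total mass drifted from the true value.
        dl_estimate = np.array([0.9, 1.2, 1.05, 0.8])
        true_total = 4.0  # mass the physical system must conserve

        corrected = enforce_conservation(dl_estimate, true_total)
        print(corrected.sum())  # ~4.0: conservation restored (up to rounding)
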
  • Publication number: 20210064444
    Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Karthik Rao, Abhinav Vishnu
  • Publication number: 20210049446
    Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data—and may itself use the sub-networks for processing instances of input image data.
    Type: Application
    Filed: August 12, 2019
    Publication date: February 18, 2021
    Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
  • Publication number: 20210012203
    Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a fewer number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
    Type: Application
    Filed: July 10, 2019
    Publication date: January 14, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
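    The decision rule alone can be sketched in Python as below; the measured speeds, KPI budget, and halving back-off are assumptions layered on the abstract's description.

        def plan_filter_resize(speed_a, speed_b, n_filters_a, kpi_loss,
                               max_kpi_loss, prev_converted=None):
            """Decide how many size-A filters to convert to size B.
            Convert all of them when B computes faster; if retraining showed
            too large a KPI loss, halve the number converted and retry."""
            if speed_b <= speed_a:        # size B is not faster: change nothing
                return 0
            if prev_converted is None:    # first attempt: convert every A filter
                return n_filters_a
            if kpi_loss > max_kpi_loss:   # retrained net degraded: back off
                return prev_converted // 2
            return prev_converted         # KPI acceptable: keep the conversion

        # Size B measured faster than size A on this hardware (toy numbers).
        n = plan_filter_resize(speed_a=1.0, speed_b=1.8, n_filters_a=64,
                               kpi_loss=0.0, max_kpi_loss=0.02)
        print("convert", n, "filters")    # 64 on the first attempt

        # After retraining, KPI loss 0.05 exceeded the 0.02 budget: halve.
        n = plan_filter_resize(1.0, 1.8, 64, kpi_loss=0.05, max_kpi_loss=0.02,
                               prev_converted=n)
        print("convert", n, "filters")    # fall back to 32
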
  • Publication number: 20200379814
    Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization from the generative adversarial network to a resource scheduler for usage when the quality of the predicted resource utilization is above a threshold. The quality is measured as the ability of a generator component of the generative adversarial network to “fool” a discriminator component of the generative adversarial network into misclassifying the predicted resource utilization as being real (i.e., being of the type that is actually measured from the computer system).
    Type: Application
    Filed: May 29, 2019
    Publication date: December 3, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sergey Blagodurov, Abhinav Vishnu, Thaleia Dimitra Doudali, Jagadish B. Kotra
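    A Python sketch of the gating idea with the generator and discriminator stubbed out: predictions reach the scheduler only once the generator fools the discriminator often enough. The fool-rate threshold is an assumed form of the quality measure.

        import random

        random.seed(1)

        def generator():
            # Stub: a predicted per-hour CPU utilization trace.
            return [random.uniform(0.2, 0.9) for _ in range(24)]

        def discriminator(trace):
            # Stub: probability the trace is a real measurement.
            return random.uniform(0.0, 1.0)

        def maybe_schedule(quality_threshold=0.5, trials=100):
            """Hand predictions to the scheduler only if the generator fools
            the discriminator (score > 0.5 for a fake) often enough."""
            fooled = sum(discriminator(generator()) > 0.5 for _ in range(trials))
            if fooled / trials >= quality_threshold:
                return generator()    # good enough: scheduler may use it
            return None               # keep training the GAN instead

        trace = maybe_schedule()
        print("scheduler input:", "prediction" if trace else "measurements only")
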
  • Publication number: 20200151573
    Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
    Type: Application
    Filed: May 29, 2019
    Publication date: May 14, 2020
    Inventors: Shomit N. Das, Abhinav Vishnu
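    A Python sketch of the grouping step: samples are bucketed by first-epoch loss and each bucket is assigned a storage precision. The two-bucket split and dtype choices are assumptions.

        import numpy as np

        # Per-sample losses measured during the first epoch.
        losses = np.array([0.02, 1.4, 0.3, 2.1, 0.05, 0.7])
        samples = np.random.randn(6, 8)

        # Group samples into subsets by loss: low-loss samples tolerate a
        # low precision, high-loss samples keep full precision.
        order = np.argsort(losses)
        low, high = order[:3], order[3:]

        subsets = {
            "fp16": samples[low].astype(np.float16),   # easy samples
            "fp32": samples[high].astype(np.float32),  # hard samples
        }
        for name, block in subsets.items():
            print(name, block.dtype, block.shape)
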
  • Publication number: 20200151510
    Abstract: A method of adaptive batch reuse includes prefetching, from a CPU to a GPU, a first plurality of mini-batches comprising a subset of a training dataset. The GPU trains a neural network for the current epoch by reusing, without discard, the first plurality of mini-batches based on a reuse count value. The GPU also runs a validation set to identify a validation error for the current epoch. If the validation error for the current epoch is less than the validation error of the previous epoch, the reuse count value is incremented for the next epoch; if it is greater, the reuse count value is decremented for the next epoch.
    Type: Application
    Filed: May 28, 2019
    Publication date: May 14, 2020
    Inventor: Abhinav Vishnu
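    The reuse-count control loop sketched in Python, with training and validation stubbed by a toy error sequence; the starting reuse count of 1 is an assumption.

        def adjust_reuse_count(reuse_count, val_error, prev_val_error):
            """Increment the reuse count when validation improved, decrement
            (never below 1) when it got worse."""
            if val_error < prev_val_error:
                return reuse_count + 1
            if val_error > prev_val_error:
                return max(1, reuse_count - 1)
            return reuse_count

        reuse = 1
        prev_err = float("inf")
        for epoch, err in enumerate([0.30, 0.25, 0.27, 0.22]):  # toy errors
            # Each epoch, the prefetched mini-batches are run `reuse` times
            # without being discarded, then the validation set is scored.
            print(f"epoch {epoch}: reuse={reuse}, val_error={err}")
            reuse = adjust_reuse_count(reuse, err, prev_err)
            prev_err = err
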
  • Publication number: 20200097822
    Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 26, 2020
    Inventor: Abhinav Vishnu
  • Publication number: 20200042859
    Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
    Type: Application
    Filed: November 19, 2018
    Publication date: February 6, 2020
    Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
  • Publication number: 20190354833
    Abstract: Methods and systems for reducing communication frequency in neural networks (NN) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from a ground truth, where the ground truth is the given label for the sample. The errors are recorded for each sample and sorted in non-decreasing order. In the next epoch, mini-batches of samples are formed starting from the sample with the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. One or more mini-batches are communicated to the other processing elements if a previous update has made a significant impact on the NN, where significant impact is measured by determining whether the errors or accumulated errors since the last communication update meet or exceed a significance threshold.
    Type: Application
    Filed: July 5, 2018
    Publication date: November 21, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Abhinav Vishnu
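    A Python sketch of the significance filter: a local update is communicated only when the error accumulated since the last communication meets the threshold. The threshold and per-batch errors are illustrative.

        SIGNIFICANCE = 0.5   # accumulated-error threshold for communicating

        def should_communicate(errors_since_last, threshold=SIGNIFICANCE):
            """Communicate an update to the other processing elements only
            when the accumulated error since the last update is significant."""
            return sum(errors_since_last) >= threshold

        pending = []
        for step, err in enumerate([0.1, 0.15, 0.3, 0.05, 0.6]):  # toy errors
            pending.append(err)
            if should_communicate(pending):
                print(f"step {step}: communicate (accumulated {sum(pending):.2f})")
                pending.clear()
            else:
                print(f"step {step}: skip communication")
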
  • Patent number: 9503383
    Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
    Type: Grant
    Filed: April 17, 2015
    Date of Patent: November 22, 2016
    Assignee: International Business Machines Corporation
    Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue
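    The three-level admission check can be sketched in Python as below; the limit values are illustrative stand-ins for the first, second, and third level limits.

        # Three-level in-flight limits from the abstract (values illustrative).
        PROCESS_LIMIT = 64      # total in-flight messages for the process
        DEST_LIMIT = 16         # in-flight messages per destination node
        MSG_PACKET_LIMIT = 4    # in-flight packets per message

        def can_send_packet(inflight_process, inflight_dest, inflight_msg_packets):
            """A new packet may go out only if all three limits hold."""
            return (inflight_process < PROCESS_LIMIT
                    and inflight_dest < DEST_LIMIT
                    and inflight_msg_packets < MSG_PACKET_LIMIT)

        print(can_send_packet(10, 3, 2))    # True: under every limit
        print(can_send_packet(10, 16, 2))   # False: destination limit reached
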
  • Publication number: 20150222556
    Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
    Type: Application
    Filed: April 17, 2015
    Publication date: August 6, 2015
    Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue