Patents by Inventor Abhinav Vishnu
Abhinav Vishnu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230394311
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Application
Filed: August 22, 2023
Publication date: December 7, 2023
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
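The channel-splitting idea in this abstract can be sketched in a few lines. The weight-tensor layout and the `make_subnetwork` helper below are illustrative assumptions, not the patented implementation; the key point is that each sub-network keeps only the first-layer filter slices for its assigned color channels.

```python
import numpy as np

def make_subnetwork(first_layer_w, keep_channels):
    # Copy the full network's first-layer weights, keeping only the
    # filter slices for the requested color channels (others removed).
    return first_layer_w[:, keep_channels, :, :].copy()

rng = np.random.default_rng(0)
full_w = rng.standard_normal((8, 3, 3, 3))   # (filters, RGB channels, k, k)

# One sub-network per color-channel subset, e.g. R-only and RG.
sub_r  = make_subnetwork(full_w, [0])
sub_rg = make_subnetwork(full_w, [0, 1])
print(sub_r.shape, sub_rg.shape)  # (8, 1, 3, 3) (8, 2, 3, 3)
```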
-
Patent number: 11775799
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
Type: Grant
Filed: November 19, 2018
Date of Patent: October 3, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
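A minimal sketch of the run-time manager's placement decision described here, assuming a simple greedy policy: the buffers the forward pass touched most go to the low-capacity, high-bandwidth memory for the backward pass, and the rest spill to the high-capacity, low-bandwidth memory. The function name, the `buffer_stats` shape, and the greedy heuristic are all assumptions for illustration.

```python
def plan_backward_placement(buffer_stats, fast_capacity):
    """Greedy sketch: hottest buffers (by forward-pass accesses) are
    placed in the fast memory until its capacity is exhausted."""
    placement, used = {}, 0
    hot_first = sorted(buffer_stats, key=lambda b: -buffer_stats[b]["accesses"])
    for buf in hot_first:
        size = buffer_stats[buf]["bytes"]
        if used + size <= fast_capacity:
            placement[buf], used = "fast", used + size
        else:
            placement[buf] = "slow"
    return placement

stats = {"conv1": {"bytes": 4, "accesses": 90},
         "conv2": {"bytes": 4, "accesses": 10},
         "fc":    {"bytes": 4, "accesses": 50}}
plan = plan_backward_placement(stats, fast_capacity=8)
```

With 8 bytes of fast capacity, the two most-accessed buffers (`conv1`, `fc`) land in fast memory and `conv2` spills to slow memory.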
-
Patent number: 11763155
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Grant
Filed: August 12, 2019
Date of Patent: September 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
-
Publication number: 20230196430
Abstract: An electronic device includes multiple nodes. Each node generates compressed lookup data to be used for processing instances of input data through a model using input index vectors from a compressed set of input index vectors for each part among multiple parts of a respective set of input index vectors. Each node then communicates compressed lookup data for a respective part to each other node.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Inventors: Sarunya Pumma, Abhinav Vishnu
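One way to read "compressed lookup data" is deduplicating the indices of an embedding-style table lookup so that only unique rows are fetched and communicated. The sketch below is an assumed interpretation for illustration, not the claimed method; `compress_indices` and the toy table are hypothetical names.

```python
import numpy as np

def compress_indices(index_vector):
    # Deduplicate the lookup indices; the inverse map lets the original
    # positions be reconstructed after the smaller lookup.
    uniq, remap = np.unique(index_vector, return_inverse=True)
    return uniq, remap

table = np.arange(20.0).reshape(10, 2)   # toy embedding table
idx = np.array([3, 3, 7, 3, 7])          # repeated lookups
uniq, remap = compress_indices(idx)
compressed = table[uniq]                 # only 2 unique rows fetched/sent
full = compressed[remap]                 # expand back to all 5 lookups
```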
-
Patent number: 11669473
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
Type: Grant
Filed: September 25, 2020
Date of Patent: June 6, 2023
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
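The overlap of compute and collective communication described here can be mimicked with a background thread standing in for the DMA engine. This is only a conceptual sketch under that stand-in assumption; the sum-and-broadcast `allreduce` below is not the patented command conversion.

```python
import threading
import numpy as np

def allreduce(buffers):
    # Stand-in for the DMA-engine command stream: sum-reduce the
    # per-device gradient buffers and broadcast the result back.
    total = np.sum(buffers, axis=0)
    for buf in buffers:
        buf[:] = total

grads = [np.full(4, float(i)) for i in range(3)]  # 3 devices' gradients
t = threading.Thread(target=allreduce, args=(grads,))
t.start()                                 # communication runs in parallel...
compute = sum(x * x for x in range(1000)) # ...with the compute kernel's work
t.join()
```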
-
Patent number: 11620525
Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
Type: Grant
Filed: September 25, 2018
Date of Patent: April 4, 2023
Assignee: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
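The drop-and-coalesce step can be illustrated with NumPy: after a per-neuron dropout decision, only the surviving neurons' activation columns are packed into a dense sub-matrix so later layers do smaller dense math instead of masked math. The inverted-dropout scaling is a common convention added here for illustration, not a claim from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
acts = rng.standard_normal((4, 8))   # batch of activations, 8 neurons

keep_prob = 0.5
kept = rng.random(acts.shape[1]) < keep_prob   # per-neuron drop decision
# Coalesce: pack only the surviving columns into a dense sub-matrix.
coalesced = acts[:, kept] / keep_prob          # inverted-dropout scaling
```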
-
Patent number: 11436060
Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
Type: Grant
Filed: August 27, 2019
Date of Patent: September 6, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Karthik Rao, Abhinav Vishnu
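A toy version of the per-layer link decision: pick the lowest-bandwidth (clock, width) configuration whose transfer time still hides under the layer's computation time, falling back to the fastest configuration otherwise. The selection policy and the linear bandwidth model are assumptions for illustration only.

```python
def pick_link_config(transfer_bytes, compute_s, configs):
    """Pick the cheapest (clock_hz, width_bytes) whose transfer time
    fits within the layer's compute time; else use the fastest."""
    for clock_hz, width_bytes in sorted(configs, key=lambda c: c[0] * c[1]):
        if transfer_bytes / (clock_hz * width_bytes) <= compute_s:
            return clock_hz, width_bytes
    return max(configs, key=lambda c: c[0] * c[1])

configs = [(1e9, 1), (2e9, 2)]
small_layer = pick_link_config(1e9, 1.0, configs)   # slow link suffices
big_layer = pick_link_config(4e9, 1.0, configs)     # needs the fast link
```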
-
Publication number: 20210406209
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
-
Publication number: 20210319312
Abstract: Values of physical variables that represent a first state of a first physical system are estimated using a deep learning (DL) algorithm that is trained based on values of physical variables that represent states of other physical systems that are determined by one or more physical equations and subject to one or more conservation laws. A physics-based model modifies the estimated values based on the one or more physical equations so that the resulting modified values satisfy the one or more conservation laws.
Type: Application
Filed: August 31, 2020
Publication date: October 14, 2021
Inventors: Nicholas Penha Malaya, Abhinav Vishnu, Octavi Obiols Sales
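As a simplified illustration of "modify the estimate so it satisfies the conservation law", consider a toy constraint that the state variables must sum to a conserved total: the DL estimate is projected onto that constraint by spreading the residual evenly. The uniform-projection rule is an assumption; real physics-based models use the actual governing equations.

```python
import numpy as np

def enforce_conservation(estimate, conserved_total):
    # Project the DL estimate onto the constraint sum(x) == total
    # (a toy stand-in for a conservation law) by spreading the
    # residual evenly across the state variables.
    residual = conserved_total - estimate.sum()
    return estimate + residual / estimate.size

dl_estimate = np.array([1.0, 2.0, 3.5])  # network's raw prediction
corrected = enforce_conservation(dl_estimate, conserved_total=6.0)
```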
-
Publication number: 20210064444
Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
Type: Application
Filed: August 27, 2019
Publication date: March 4, 2021
Inventors: Karthik Rao, Abhinav Vishnu
-
Publication number: 20210049446
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Application
Filed: August 12, 2019
Publication date: February 18, 2021
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
-
Publication number: 20210012203
Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a fewer number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
Type: Application
Filed: July 10, 2019
Publication date: January 14, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
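The core decision, switching filters to whichever size measured faster, can be sketched as below. The `speed` mapping stands in for the measured per-size computation speeds; the all-or-nothing switch here ignores the abstract's retraining and KPI-loss rollback steps, which would follow this decision.

```python
def retune_filter_sizes(sizes, speed):
    """If an alternative filter size computed faster, switch a filter
    to it; `speed` maps filter size -> measured throughput."""
    best = max(set(sizes), key=lambda s: speed[s])
    return [best if speed[best] > speed[s] else s for s in sizes]

layer = [5, 5, 3, 3]   # a layer with mixed 5x5 and 3x3 filters
tuned = retune_filter_sizes(layer, {3: 120.0, 5: 80.0})
print(tuned)  # [3, 3, 3, 3]
```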
-
Publication number: 20200379814
Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization from the generative adversarial network to a resource scheduler for usage when the quality of the predicted resource utilization is above a threshold. The quality is measured as the ability of a generator component of the generative adversarial network to "fool" a discriminator component of the generative adversarial network into misclassifying the predicted resource utilization as being real (i.e., being of the type that is actually measured from the computer system).
Type: Application
Filed: May 29, 2019
Publication date: December 3, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Sergey Blagodurov, Abhinav Vishnu, Thaleia Dimitra Doudali, Jagadish B. Kotra
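The orchestrator's quality gate reduces to a simple check: publish the GAN's predicted utilization to the scheduler only when the generator's fool rate (the fraction of predictions the discriminator misclassifies as real) clears a threshold. The function name and the 0.4 default are hypothetical.

```python
def maybe_publish(predicted_util, fool_rate, quality_threshold=0.4):
    """Hand the predicted utilization to the resource scheduler only
    when the generator fools the discriminator often enough."""
    return predicted_util if fool_rate >= quality_threshold else None

published = maybe_publish([0.6, 0.7, 0.5], fool_rate=0.55)  # good quality
held_back = maybe_publish([0.6, 0.7, 0.5], fool_rate=0.10)  # keep training
```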
-
Publication number: 20200151573
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
Type: Application
Filed: May 29, 2019
Publication date: May 14, 2020
Inventors: Shomit N. Das, Abhinav Vishnu
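A sketch of the grouping step: rank samples by loss, split them into equal subsets, and assign a precision per subset. The specific policy (low-loss samples get fewer bits, high-loss samples keep full precision) and the bit widths are illustrative assumptions.

```python
import numpy as np

def assign_precisions(losses, precisions=(8, 16, 32)):
    """Bucket samples by loss: the lowest-loss third gets 8-bit
    operands, the highest-loss third keeps 32-bit (illustrative)."""
    order = np.argsort(losses)                       # low loss first
    buckets = np.array_split(order, len(precisions))
    bits = np.empty(len(losses), dtype=int)
    for bucket, p in zip(buckets, precisions):
        bits[bucket] = p
    return bits

losses = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.3])
bits = assign_precisions(losses)
```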
-
Publication number: 20200151510
Abstract: A method of adaptive batch reuse includes prefetching, from a CPU to a GPU, a first plurality of mini-batches comprising a subset of a training dataset. The GPU trains a neural network for the current epoch by reusing, without discard, the first plurality of mini-batches in training the neural network for the current epoch based on a reuse count value. The GPU also runs a validation set to identify a validation error for the current epoch. If the validation error for the current epoch is less than the validation error of the previous epoch, the reuse count value is incremented for the next epoch. However, if the validation error for the current epoch is greater than the validation error of the previous epoch, the reuse count value is decremented for the next epoch.
Type: Application
Filed: May 28, 2019
Publication date: May 14, 2020
Inventor: Abhinav Vishnu
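The reuse-count update rule described here is a small piece of control logic; the clamping bounds `lo` and `hi` below are an added assumption to keep the count sane.

```python
def update_reuse_count(reuse, prev_err, curr_err, lo=1, hi=8):
    """Grow the reuse count while validation error keeps improving;
    shrink it as soon as reuse starts hurting generalization."""
    if curr_err < prev_err:
        return min(reuse + 1, hi)   # improving: reuse batches more
    if curr_err > prev_err:
        return max(reuse - 1, lo)   # degrading: reuse batches less
    return reuse
```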
-
Publication number: 20200097822
Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
Type: Application
Filed: September 25, 2018
Publication date: March 26, 2020
Inventor: Abhinav Vishnu
-
Publication number: 20200042859
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
Type: Application
Filed: November 19, 2018
Publication date: February 6, 2020
Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
-
Publication number: 20190354833
Abstract: Methods and systems for reducing communication frequency in neural networks (NN) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from a ground truth, where the ground truth is the given label for the sample. The errors are recorded for each sample and are sorted in a non-decreasing order. In a next epoch, mini-batches of samples are formed starting from the sample which has the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. Mini-batches are communicated to the other processing elements if a previous update has resulted in making a significant impact on the NN, where significant impact is measured by determining if the errors or accumulated errors since the last communication update meet or exceed a significance threshold.
Type: Application
Filed: July 5, 2018
Publication date: November 21, 2019
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
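The two mechanisms in this abstract, re-forming mini-batches from the smallest-error samples and gating communication on accumulated error, can be sketched as follows. Function names and the integer error units in the example are illustrative.

```python
def order_minibatches(samples, errors, batch_size):
    """Re-form next-epoch mini-batches starting from the samples with
    the smallest recorded error (non-decreasing order)."""
    order = sorted(range(len(samples)), key=lambda i: errors[i])
    ranked = [samples[i] for i in order]
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

def should_communicate(errors_since_sync, threshold):
    # Broadcast an update only when the accumulated error since the
    # last communication meets or exceeds the significance threshold.
    return sum(errors_since_sync) >= threshold

batches = order_minibatches(["a", "b", "c", "d"], [4, 1, 3, 2], batch_size=2)
```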
-
Patent number: 9503383
Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
Type: Grant
Filed: April 17, 2015
Date of Patent: November 22, 2016
Assignee: International Business Machines Corporation
Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue
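The three-level admission check in this abstract maps naturally onto a small predicate: a per-process in-flight-message cap, a per-destination cap, and a per-message in-flight-packet cap, checked in that order. The class and method names are illustrative, not from the patent.

```python
class MessageFlowController:
    """Toy three-level admission check mirroring the abstract."""

    def __init__(self, proc_limit, dest_limit, pkt_limit):
        self.proc_limit = proc_limit   # first-level: per-process messages
        self.dest_limit = dest_limit   # second-level: per-destination messages
        self.pkt_limit = pkt_limit     # third-level: per-message packets

    def may_send_packet(self, proc_inflight, dest_inflight, msg_pkts_inflight):
        if proc_inflight >= self.proc_limit:
            return False               # process-wide limit met
        if dest_inflight >= self.dest_limit:
            return False               # destination-node limit met
        return msg_pkts_inflight < self.pkt_limit

fc = MessageFlowController(proc_limit=10, dest_limit=4, pkt_limit=2)
```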
-
Publication number: 20150222556
Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
Type: Application
Filed: April 17, 2015
Publication date: August 6, 2015
Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue