Patents by Inventor Abhinav Vishnu
Abhinav Vishnu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230394311
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Application
Filed: August 22, 2023
Publication date: December 7, 2023
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
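The channel-splitting idea in this abstract can be sketched in a few lines. The weight-tensor layout and the `make_subnetwork` helper below are illustrative assumptions, not the patented implementation; the key point is that each sub-network keeps only the first-layer filter slices for its assigned color channels.

```python
import numpy as np

def make_subnetwork(first_layer_w, keep_channels):
    # Copy the full network's first-layer weights, keeping only the
    # filter slices for the requested color channels (others removed).
    return first_layer_w[:, keep_channels, :, :].copy()

rng = np.random.default_rng(0)
full_w = rng.standard_normal((8, 3, 3, 3))   # (filters, RGB channels, k, k)

# One sub-network per color-channel subset, e.g. R-only and RG.
sub_r  = make_subnetwork(full_w, [0])
sub_rg = make_subnetwork(full_w, [0, 1])
print(sub_r.shape, sub_rg.shape)  # (8, 1, 3, 3) (8, 2, 3, 3)
```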
-
Patent number: 11775799
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
Type: Grant
Filed: November 19, 2018
Date of Patent: October 3, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
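A minimal sketch of the run-time manager's placement decision described here, assuming a simple greedy policy: the buffers the forward pass touched most go to the low-capacity, high-bandwidth memory for the backward pass, and the rest spill to the high-capacity, low-bandwidth memory. The function name, the `buffer_stats` shape, and the greedy heuristic are all assumptions for illustration.

```python
def plan_backward_placement(buffer_stats, fast_capacity):
    """Greedy sketch: hottest buffers (by forward-pass accesses) are
    placed in the fast memory until its capacity is exhausted."""
    placement, used = {}, 0
    hot_first = sorted(buffer_stats, key=lambda b: -buffer_stats[b]["accesses"])
    for buf in hot_first:
        size = buffer_stats[buf]["bytes"]
        if used + size <= fast_capacity:
            placement[buf], used = "fast", used + size
        else:
            placement[buf] = "slow"
    return placement

stats = {"conv1": {"bytes": 4, "accesses": 90},
         "conv2": {"bytes": 4, "accesses": 10},
         "fc":    {"bytes": 4, "accesses": 50}}
plan = plan_backward_placement(stats, fast_capacity=8)
```

With 8 bytes of fast capacity, the two most-accessed buffers (`conv1`, `fc`) land in fast memory and `conv2` spills to slow memory.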
-
Patent number: 11763155
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Grant
Filed: August 12, 2019
Date of Patent: September 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
-
Publication number: 20230196430
Abstract: An electronic device includes multiple nodes. Each node generates compressed lookup data to be used for processing instances of input data through a model using input index vectors from a compressed set of input index vectors for each part among multiple parts of a respective set of input index vectors. Each node then communicates compressed lookup data for a respective part to each other node.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Inventors: Sarunya Pumma, Abhinav Vishnu
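One way to read "compressed lookup data" is deduplicating the indices of an embedding-style table lookup so that only unique rows are fetched and communicated. The sketch below is an assumed interpretation for illustration, not the claimed method; `compress_indices` and the toy table are hypothetical names.

```python
import numpy as np

def compress_indices(index_vector):
    # Deduplicate the lookup indices; the inverse map lets the original
    # positions be reconstructed after the smaller lookup.
    uniq, remap = np.unique(index_vector, return_inverse=True)
    return uniq, remap

table = np.arange(20.0).reshape(10, 2)   # toy embedding table
idx = np.array([3, 3, 7, 3, 7])          # repeated lookups
uniq, remap = compress_indices(idx)
compressed = table[uniq]                 # only 2 unique rows fetched/sent
full = compressed[remap]                 # expand back to all 5 lookups
```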
-
Patent number: 11669473
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
Type: Grant
Filed: September 25, 2020
Date of Patent: June 6, 2023
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
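The overlap of compute and collective communication described here can be mimicked with a background thread standing in for the DMA engine. This is only a conceptual sketch under that stand-in assumption; the sum-and-broadcast `allreduce` below is not the patented command conversion.

```python
import threading
import numpy as np

def allreduce(buffers):
    # Stand-in for the DMA-engine command stream: sum-reduce the
    # per-device gradient buffers and broadcast the result back.
    total = np.sum(buffers, axis=0)
    for buf in buffers:
        buf[:] = total

grads = [np.full(4, float(i)) for i in range(3)]  # 3 devices' gradients
t = threading.Thread(target=allreduce, args=(grads,))
t.start()                                 # communication runs in parallel...
compute = sum(x * x for x in range(1000)) # ...with the compute kernel's work
t.join()
```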
-
Patent number: 11620525
Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
Type: Grant
Filed: September 25, 2018
Date of Patent: April 4, 2023
Assignee: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
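The drop-and-coalesce step can be illustrated with NumPy: after a per-neuron dropout decision, only the surviving neurons' activation columns are packed into a dense sub-matrix so later layers do smaller dense math instead of masked math. The inverted-dropout scaling is a common convention added here for illustration, not a claim from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
acts = rng.standard_normal((4, 8))   # batch of activations, 8 neurons

keep_prob = 0.5
kept = rng.random(acts.shape[1]) < keep_prob   # per-neuron drop decision
# Coalesce: pack only the surviving columns into a dense sub-matrix.
coalesced = acts[:, kept] / keep_prob          # inverted-dropout scaling
```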
-
Patent number: 11436060
Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
Type: Grant
Filed: August 27, 2019
Date of Patent: September 6, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Karthik Rao, Abhinav Vishnu
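A toy version of the per-layer link decision: pick the lowest-bandwidth (clock, width) configuration whose transfer time still hides under the layer's computation time, falling back to the fastest configuration otherwise. The selection policy and the linear bandwidth model are assumptions for illustration only.

```python
def pick_link_config(transfer_bytes, compute_s, configs):
    """Pick the cheapest (clock_hz, width_bytes) whose transfer time
    fits within the layer's compute time; else use the fastest."""
    for clock_hz, width_bytes in sorted(configs, key=lambda c: c[0] * c[1]):
        if transfer_bytes / (clock_hz * width_bytes) <= compute_s:
            return clock_hz, width_bytes
    return max(configs, key=lambda c: c[0] * c[1])

configs = [(1e9, 1), (2e9, 2)]
small_layer = pick_link_config(1e9, 1.0, configs)   # slow link suffices
big_layer = pick_link_config(4e9, 1.0, configs)     # needs the fast link
```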
-
Publication number: 20210406209
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
-
Publication number: 20210319312
Abstract: Values of physical variables that represent a first state of a first physical system are estimated using a deep learning (DL) algorithm that is trained based on values of physical variables that represent states of other physical systems that are determined by one or more physical equations and subject to one or more conservation laws. A physics-based model modifies the estimated values based on the one or more physical equations so that the resulting modified values satisfy the one or more conservation laws.
Type: Application
Filed: August 31, 2020
Publication date: October 14, 2021
Inventors: Nicholas Penha Malaya, Abhinav Vishnu, Octavi Obiols Sales
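As a simplified illustration of "modify the estimate so it satisfies the conservation law", consider a toy constraint that the state variables must sum to a conserved total: the DL estimate is projected onto that constraint by spreading the residual evenly. The uniform-projection rule is an assumption; real physics-based models use the actual governing equations.

```python
import numpy as np

def enforce_conservation(estimate, conserved_total):
    # Project the DL estimate onto the constraint sum(x) == total
    # (a toy stand-in for a conservation law) by spreading the
    # residual evenly across the state variables.
    residual = conserved_total - estimate.sum()
    return estimate + residual / estimate.size

dl_estimate = np.array([1.0, 2.0, 3.5])  # network's raw prediction
corrected = enforce_conservation(dl_estimate, conserved_total=6.0)
```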
-
Publication number: 20210064444
Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit of the plurality of processing units includes a compute module and a configurable link interface. The control unit dynamically adjusts a clock frequency and a link width of the configurable link interface of each processing unit based on a data transfer size and layer computation time of a plurality of layers of a neural network so as to reduce execution time of each layer. By adjusting the clock frequency and the link width of the link interface on a per-layer basis, the overlapping of communication and computation phases is closely matched, allowing layers to complete more quickly.
Type: Application
Filed: August 27, 2019
Publication date: March 4, 2021
Inventors: Karthik Rao, Abhinav Vishnu
-
Publication number: 20210049446
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for processing instances of input image data.
Type: Application
Filed: August 12, 2019
Publication date: February 18, 2021
Inventors: Sudhanva Gurumurthi, Abhinav Vishnu
-
Publication number: 20210012203
Abstract: Systems, methods, and devices for increasing inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer of the CNN is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first computation speed. In some implementations the CNN is retrained, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN. The size of a fewer number of the first filters is changed to the second filter size if a key performance indicator loss of the retrained CNN exceeds a threshold.
Type: Application
Filed: July 10, 2019
Publication date: January 14, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Abhinav Vishnu, Prakash Sathyanath Raghavendra, Tamer M. Elsharnouby, Rachida Kebichi, Walid Ali, Jonathan Charles Gallmeier
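The core decision, switching filters to whichever size measured faster, can be sketched as below. The `speed` mapping stands in for the measured per-size computation speeds; the all-or-nothing switch here ignores the abstract's retraining and KPI-loss rollback steps, which would follow this decision.

```python
def retune_filter_sizes(sizes, speed):
    """If an alternative filter size computed faster, switch a filter
    to it; `speed` maps filter size -> measured throughput."""
    best = max(set(sizes), key=lambda s: speed[s])
    return [best if speed[best] > speed[s] else s for s in sizes]

layer = [5, 5, 3, 3]   # a layer with mixed 5x5 and 3x3 filters
tuned = retune_filter_sizes(layer, {3: 120.0, 5: 80.0})
print(tuned)  # [3, 3, 3, 3]
```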
-
Publication number: 20200379814
Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization from the generative adversarial network to a resource scheduler for usage when the quality of the predicted resource utilization is above a threshold. The quality is measured as the ability of a generator component of the generative adversarial network to "fool" a discriminator component of the generative adversarial network into misclassifying the predicted resource utilization as being real (i.e., being of the type that is actually measured from the computer system).
Type: Application
Filed: May 29, 2019
Publication date: December 3, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Sergey Blagodurov, Abhinav Vishnu, Thaleia Dimitra Doudali, Jagadish B. Kotra
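The orchestrator's quality gate reduces to a simple check: publish the GAN's predicted utilization to the scheduler only when the generator's fool rate (the fraction of predictions the discriminator misclassifies as real) clears a threshold. The function name and the 0.4 default are hypothetical.

```python
def maybe_publish(predicted_util, fool_rate, quality_threshold=0.4):
    """Hand the predicted utilization to the resource scheduler only
    when the generator fools the discriminator often enough."""
    return predicted_util if fool_rate >= quality_threshold else None

published = maybe_publish([0.6, 0.7, 0.5], fool_rate=0.55)  # good quality
held_back = maybe_publish([0.6, 0.7, 0.5], fool_rate=0.10)  # keep training
```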
-
Publication number: 20200151573
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
Type: Application
Filed: May 29, 2019
Publication date: May 14, 2020
Inventors: Shomit N. Das, Abhinav Vishnu
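A sketch of the grouping step: rank samples by loss, split them into equal subsets, and assign a precision per subset. The specific policy (low-loss samples get fewer bits, high-loss samples keep full precision) and the bit widths are illustrative assumptions.

```python
import numpy as np

def assign_precisions(losses, precisions=(8, 16, 32)):
    """Bucket samples by loss: the lowest-loss third gets 8-bit
    operands, the highest-loss third keeps 32-bit (illustrative)."""
    order = np.argsort(losses)                       # low loss first
    buckets = np.array_split(order, len(precisions))
    bits = np.empty(len(losses), dtype=int)
    for bucket, p in zip(buckets, precisions):
        bits[bucket] = p
    return bits

losses = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.3])
bits = assign_precisions(losses)
```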
-
Publication number: 20200151510
Abstract: A method of adaptive batch reuse includes prefetching, from a CPU to a GPU, a first plurality of mini-batches comprising a subset of a training dataset. The GPU trains a neural network for the current epoch by reusing, without discard, the first plurality of mini-batches in training the neural network for the current epoch based on a reuse count value. The GPU also runs a validation set to identify a validation error for the current epoch. If the validation error for the current epoch is less than the validation error of the previous epoch, the reuse count value is incremented for the next epoch. However, if the validation error for the current epoch is greater than the validation error of the previous epoch, the reuse count value is decremented for the next epoch.
Type: Application
Filed: May 28, 2019
Publication date: May 14, 2020
Inventor: Abhinav Vishnu
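The reuse-count update rule described here is a small piece of control logic; the clamping bounds `lo` and `hi` below are an added assumption to keep the count sane.

```python
def update_reuse_count(reuse, prev_err, curr_err, lo=1, hi=8):
    """Grow the reuse count while validation error keeps improving;
    shrink it as soon as reuse starts hurting generalization."""
    if curr_err < prev_err:
        return min(reuse + 1, hi)   # improving: reuse batches more
    if curr_err > prev_err:
        return max(reuse - 1, lo)   # degrading: reuse batches less
    return reuse
```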
-
Publication number: 20200097822
Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
Type: Application
Filed: September 25, 2018
Publication date: March 26, 2020
Inventor: Abhinav Vishnu
-
Publication number: 20200042859
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
Type: Application
Filed: November 19, 2018
Publication date: February 6, 2020
Inventors: Georgios Mappouras, Amin Farmahini-Farahani, Sudhanva Gurumurthi, Abhinav Vishnu, Gabriel H. Loh
-
Publication number: 20190354833
Abstract: Methods and systems for reducing communication frequency in neural networks (NN) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from a ground truth, where the ground truth is the given label for the sample. The errors are recorded for each sample and are sorted in a non-decreasing order. In a next epoch, mini-batches of samples are formed starting from the sample which has the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. Mini-batches are communicated to the other processing elements if a previous update has resulted in making a significant impact on the NN, where significant impact is measured by determining if the errors or accumulated errors since the last communication update meet or exceed a significance threshold.
Type: Application
Filed: July 5, 2018
Publication date: November 21, 2019
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
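The two mechanisms in this abstract, re-forming mini-batches from the smallest-error samples and gating communication on accumulated error, can be sketched as follows. Function names and the integer error units in the example are illustrative.

```python
def order_minibatches(samples, errors, batch_size):
    """Re-form next-epoch mini-batches starting from the samples with
    the smallest recorded error (non-decreasing order)."""
    order = sorted(range(len(samples)), key=lambda i: errors[i])
    ranked = [samples[i] for i in order]
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

def should_communicate(errors_since_sync, threshold):
    # Broadcast an update only when the accumulated error since the
    # last communication meets or exceeds the significance threshold.
    return sum(errors_since_sync) >= threshold

batches = order_minibatches(["a", "b", "c", "d"], [4, 1, 3, 2], batch_size=2)
```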
-
Patent number: 9503383
Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
Type: Grant
Filed: April 17, 2015
Date of Patent: November 22, 2016
Assignee: International Business Machines Corporation
Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue
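The three-level admission check in this abstract maps naturally onto a small predicate: a per-process in-flight-message cap, a per-destination cap, and a per-message in-flight-packet cap, checked in that order. The class and method names are illustrative, not from the patent.

```python
class MessageFlowController:
    """Toy three-level admission check mirroring the abstract."""

    def __init__(self, proc_limit, dest_limit, pkt_limit):
        self.proc_limit = proc_limit   # first-level: per-process messages
        self.dest_limit = dest_limit   # second-level: per-destination messages
        self.pkt_limit = pkt_limit     # third-level: per-message packets

    def may_send_packet(self, proc_inflight, dest_inflight, msg_pkts_inflight):
        if proc_inflight >= self.proc_limit:
            return False               # process-wide limit met
        if dest_inflight >= self.dest_limit:
            return False               # destination-node limit met
        return msg_pkts_inflight < self.pkt_limit

fc = MessageFlowController(proc_limit=10, dest_limit=4, pkt_limit=2)
```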
-
Publication number: 20150222556
Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
Type: Application
Filed: April 17, 2015
Publication date: August 6, 2015
Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue