Patents by Inventor Amar Phanishayee
Amar Phanishayee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240152758
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: January 17, 2024
Publication date: May 9, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
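The following is a minimal NumPy sketch of the kind of compression the abstract describes: activations are grouped into blocks that share a single exponent while each value keeps only a few lossy mantissa bits, so the stash kept for back propagation shrinks. The block size, mantissa width, and function names are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def compress_bfp(activations: np.ndarray, block_size: int = 16, mantissa_bits: int = 4):
    """Quantize each block of values to a shared exponent plus low-bit mantissas."""
    flat = activations.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # Shared exponent per block, chosen so the largest magnitude maps into [-1, 1].
    max_mag = np.max(np.abs(blocks), axis=1, keepdims=True)
    exponents = np.ceil(np.log2(np.maximum(max_mag, 1e-38))).astype(np.int8)
    scale = 2.0 ** exponents.astype(np.float32)
    # Lossy mantissa: round every value to mantissa_bits of signed fraction.
    levels = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(blocks / scale * levels), -levels, levels).astype(np.int8)
    return mantissas, exponents, activations.shape, pad

def decompress_bfp(mantissas, exponents, shape, pad, mantissa_bits: int = 4):
    """Expand the compressed stash back to float32 for use in back propagation."""
    levels = 2 ** (mantissa_bits - 1) - 1
    blocks = mantissas.astype(np.float32) / levels * 2.0 ** exponents.astype(np.float32)
    flat = blocks.ravel()
    flat = flat[:flat.size - pad] if pad else flat
    return flat.reshape(shape)

# Compress after the forward pass; decompress when the backward pass needs the values.
acts = np.random.randn(8, 32).astype(np.float32)
restored = decompress_bfp(*compress_bfp(acts))
print("max abs error:", float(np.max(np.abs(acts - restored))))
```

In this toy configuration the mantissas are stored as int8 plus one exponent per block, so the stash is roughly a quarter of the float32 footprint.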
-
Publication number: 20240126611
Abstract: The description relates to accelerator architectures for deep learning models. One example can obtain a deep learning training script associated with a deep learning model and extract an operator graph from the training script. The example can split the operator graph into first and second portions of a heterogeneous pipeline and tune a first accelerator core for the first portion of the heterogeneous pipeline and a second accelerator core for the second portion of the heterogeneous pipeline. The example can also generate a hardware architecture that includes the first accelerator core and the second accelerator core arranged to collectively implement the deep learning model.
Type: Application
Filed: October 13, 2022
Publication date: April 18, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Amar PHANISHAYEE, Divya MAHAJAN, Janardhan KULKARNI, Miguel CASTRO, Muhammad ADNAN
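As a rough illustration of the flow in this abstract, the sketch below splits a hypothetical linear operator graph into two pipeline portions and picks a core configuration for each portion independently; the cost model, graph, and configuration fields are made-up placeholders rather than the described system.

```python
from itertools import accumulate

def split_operator_graph(ops):
    """Choose the split point that best balances the two pipeline portions."""
    costs = [c for _, c in ops]
    total = sum(costs)
    prefix = list(accumulate(costs))
    split = min(range(1, len(ops)),
                key=lambda i: abs(prefix[i - 1] - (total - prefix[i - 1])))
    return ops[:split], ops[split:]

def tune_core(portion, candidate_configs):
    """Pick the smallest config whose throughput covers the portion's cost (toy model)."""
    budget = sum(c for _, c in portion)
    feasible = [cfg for cfg in candidate_configs if cfg["throughput"] >= budget]
    if feasible:
        return min(feasible, key=lambda cfg: cfg["area"])
    return max(candidate_configs, key=lambda cfg: cfg["throughput"])

# Hypothetical operator graph extracted from a training script: (name, estimated cost).
graph = [("embed", 2.0), ("attention", 5.0), ("mlp", 4.0), ("loss", 1.0)]
configs = [{"name": "small", "throughput": 6.0, "area": 1.0},
           {"name": "large", "throughput": 12.0, "area": 3.0}]
first, second = split_operator_graph(graph)
print(tune_core(first, configs)["name"], tune_core(second, configs)["name"])
```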
-
Patent number: 11868880
Abstract: An interconnect topology for communication between GPUs in a computing system is determined. A quantity of directed spanning trees is generated and packed for transmitting data between the GPUs using the interconnect topology. The directed spanning trees define the connections between GPUs that are to be utilized for the transmission and the amount of data to be transmitted on each connection. Program code is generated for implementing the data transfer defined by the directed spanning trees. When the program code is executed, the directed spanning trees are used to pipeline the transmission of chunks of data, such as model parameters used during data-parallel DNN training, between the GPUs. The program code can also determine an optimal chunk size for data to be transferred between the GPUs.
Type: Grant
Filed: February 14, 2019
Date of Patent: January 9, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Nikhil Devanur Rangarajan, Jorgen Thelin, Amar Phanishayee, Guanhua Wang, Shivaram Venkataraman
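A toy sketch of the pipelining idea: given one directed spanning tree over the GPUs, data is sent in chunks so that a GPU can forward chunk k to its children while later chunks are still arriving. The tree, chunk count, and schedule format are illustrative assumptions; the patent's code generator and tree-packing step are not reproduced here.

```python
from collections import deque

def pipelined_broadcast_schedule(tree, root, num_chunks):
    """tree: dict gpu -> list of child gpus. Returns (step, src, dst, chunk) tuples."""
    depth = {root: 0}
    order = deque([root])
    while order:
        node = order.popleft()
        for child in tree.get(node, []):
            depth[child] = depth[node] + 1
            order.append(child)
    schedule = []
    for src, children in tree.items():
        for dst in children:
            for k in range(num_chunks):
                # Chunk k leaves src one step after src received it, so transfers overlap.
                schedule.append((depth[src] + k, src, dst, k))
    return sorted(schedule)

# Hypothetical 4-GPU spanning tree rooted at GPU 0.
tree = {0: [1, 2], 1: [3]}
for step, src, dst, chunk in pipelined_broadcast_schedule(tree, root=0, num_chunks=3):
    print(f"step {step}: GPU{src} -> GPU{dst} chunk {chunk}")
```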
-
Patent number: 11715002
Abstract: Functions are added to a deep neural network (“DNN”) computation graph for encoding data structures during a forward training pass of the DNN and decoding previously-encoded data structures during a backward training pass of the DNN. The functions added to the DNN computation graph can be selected based upon the specific layer pairs specified in the DNN computation graph. Once a modified DNN computation graph has been generated, the DNN can be trained using the modified DNN computation graph. The functions added to the modified DNN computation graph can reduce the utilization of memory during training of the DNN.
Type: Grant
Filed: June 29, 2018
Date of Patent: August 1, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Amar Phanishayee, Gennady Pekhimenko, Animesh Jain
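One concrete instance of this kind of graph rewrite, sketched under the assumption that the layer is a ReLU (NumPy toy, not the patented system): the forward pass stashes a packed 1-bit mask instead of the full activation tensor, and the backward pass decodes the mask to gate gradients.

```python
import numpy as np

class EncodedReLU:
    """ReLU that stores a packed 1-bit mask instead of its float32 output."""

    def forward(self, x):
        mask = x > 0
        # Encode: pack the boolean mask to 1 bit per element for the stash.
        self._packed = np.packbits(mask.ravel())
        self._shape = x.shape
        return np.where(mask, x, 0.0)

    def backward(self, grad_out):
        # Decode: unpack the mask and use it to gate the incoming gradient.
        mask = np.unpackbits(self._packed)[: grad_out.size].reshape(self._shape).astype(bool)
        return np.where(mask, grad_out, 0.0)

x = np.random.randn(4, 8).astype(np.float32)
layer = EncodedReLU()
y = layer.forward(x)
dx = layer.backward(np.ones_like(y))
print("stash bytes:", layer._packed.nbytes, "vs full activation:", y.nbytes)
```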
-
Publication number: 20230140185
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: January 3, 2023
Publication date: May 4, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
-
Patent number: 11562247
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Grant
Filed: January 24, 2019
Date of Patent: January 24, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
-
Publication number: 20220414457
Abstract: Methods, systems, apparatuses, and computer-readable storage mediums described herein are directed to techniques for efficient data encoding for neural network training. In particular, the embodiments described herein train a DNN based on a selective encoding (e.g., compressing) of data structures that are generated during training. For example, multiple training sessions may be performed where, in each training session, a different set of the data structures generated by various operators of the DNN is encoded. Memory allocation information generated based on each training session is analyzed to determine which combination of encoded data structures results in a reduction of the memory required to train the DNN.
Type: Application
Filed: June 29, 2021
Publication date: December 29, 2022
Inventors: Fanny NINA PARAVECINO, Amar PHANISHAYEE, Atefeh MEHRABI
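A toy version of the search loop the abstract outlines: each candidate combination of encoded data structures is "profiled" (here just estimated with made-up sizes, compression ratios, and a per-structure workspace cost) and the combination with the lowest peak memory wins. All names and numbers are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical stashed structures (sizes in MB), their compression ratios, and a
# fixed workspace cost for running each encoder; all numbers are toy values.
structures = {"relu1_out": 64, "pool1_idx": 4, "relu2_out": 128, "attn_probs": 256}
compressed_ratio = {"relu1_out": 0.03, "pool1_idx": 0.25, "relu2_out": 0.03, "attn_probs": 0.5}
encode_workspace_mb = 8

def estimate_peak_memory(encoded):
    """Stand-in for one profiling/training session with a given encoding choice."""
    stash = sum(size * (compressed_ratio[name] if name in encoded else 1.0)
                for name, size in structures.items())
    return stash + encode_workspace_mb * len(encoded)

# Try every combination of encoded structures and keep the one with the lowest peak.
candidates = (frozenset(c) for r in range(len(structures) + 1)
              for c in combinations(structures, r))
best = min(candidates, key=estimate_peak_memory)
print("encode:", sorted(best), "estimated peak MB:", round(estimate_peak_memory(best), 1))
```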
-
Publication number: 20220138524
Abstract: Embodiments of the present disclosure include systems and methods for training neural networks based on dual pipeline architectures. In some embodiments, a first set of compute elements is configured to implement a first set of layers of a first instance of a neural network. A second set of compute elements is configured to implement a second set of layers of the first instance of the neural network. The second set of compute elements is further configured to implement a first set of layers of a second instance of the neural network. The first set of compute elements is further configured to implement a second set of layers of the second instance of the neural network. The first set of layers of the first instance of the neural network and the first set of layers of the second instance of the neural network are each configured to receive training data.
Type: Application
Filed: January 15, 2021
Publication date: May 5, 2022
Inventors: Mattheus HEDDES, Torsten HOEFLER, Kenneth Andrew COLWELL, Amar PHANISHAYEE
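A small sketch of the layer placement this describes, assuming two compute groups and an even split of layers: instance A runs its first half of layers on group 0 and its second half on group 1, while instance B does the opposite, so both groups can ingest training data at the same time. The layer names and two-way split are assumptions for illustration.

```python
def dual_pipeline_assignment(layers, num_groups=2):
    """Map (instance, compute group) -> layers, with the two instances reversed."""
    half = len(layers) // 2
    stages = [layers[:half], layers[half:]]
    assignment = {}
    for g in range(num_groups):
        assignment[("instance_A", g)] = stages[g]                       # A flows 0 -> 1
        assignment[("instance_B", g)] = stages[num_groups - 1 - g]      # B flows 1 -> 0
    return assignment

layers = ["embed", "block1", "block2", "head"]
for (instance, group), stage in sorted(dual_pipeline_assignment(layers).items()):
    print(f"{instance} on compute group {group}: {stage}")
```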
-
Publication number: 20200264876
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
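A minimal sketch of the format-selection step: candidate mantissa widths are tried from narrowest to widest and the first one whose quantization error on sampled activation values stays under a target is kept. The error metric (normalized RMS error), candidate widths, and threshold are stand-ins for the performance metric named in the abstract.

```python
import numpy as np

def quantize(values, mantissa_bits):
    """Round values to a single shared power-of-two scale and a few mantissa bits."""
    scale = 2.0 ** np.ceil(np.log2(np.max(np.abs(values)) + 1e-30))
    levels = 2 ** (mantissa_bits - 1) - 1
    return np.round(values / scale * levels) / levels * scale

def select_format(values, candidate_bits=(2, 3, 4, 5, 6, 8), max_error=0.05):
    """Return the narrowest mantissa width whose error metric stays under max_error."""
    for bits in candidate_bits:                       # narrowest first
        err = np.sqrt(np.mean((quantize(values, bits) - values) ** 2)) / np.std(values)
        if err <= max_error:
            return bits, err
    return candidate_bits[-1], err

acts = np.random.randn(10_000).astype(np.float32)
bits, err = select_format(acts)
print(f"chosen mantissa bits: {bits}, normalized RMS error: {err:.4f}")
```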
-
Patent number: 10740195
Abstract: This document relates to data storage techniques. One example can buffer write commands and cause the write commands to be committed to storage in flush epoch order. Another example can maintain a persistent log of write commands that are arranged in the persistent log in flush epoch order. Both examples may provide a prefix consistent state in the event of a crash.
Type: Grant
Filed: September 25, 2018
Date of Patent: August 11, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: James W. Mickens, Amar Phanishayee, Vijaychidambaram Velayudhan Pillai
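A compact sketch of flush-epoch ordering, with an in-memory list standing in for durable storage: writes are tagged with the current epoch, flush() closes the epoch, and only fully closed epochs are committed, oldest first, which is what yields a prefix-consistent state after a crash. Class and method names are illustrative, not the patented interface.

```python
from collections import defaultdict

class EpochOrderedBuffer:
    def __init__(self):
        self.current_epoch = 0
        self.pending = defaultdict(list)   # epoch -> buffered write commands
        self.storage = []                  # stand-in for a durable device

    def write(self, block, data):
        self.pending[self.current_epoch].append((block, data))

    def flush(self):
        """Close the current epoch; later writes must not be reordered before it."""
        self.current_epoch += 1

    def commit_ready(self):
        """Commit every fully closed epoch to storage, oldest first."""
        for epoch in sorted(e for e in list(self.pending) if e < self.current_epoch):
            self.storage.extend(self.pending.pop(epoch))

buf = EpochOrderedBuffer()
buf.write("A", b"v1"); buf.write("B", b"v2")
buf.flush()
buf.write("A", b"v3")          # epoch 1, not yet closed by a flush
buf.commit_ready()
print(buf.storage)             # only the epoch-0 writes are durable so far
```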
-
Publication number: 20200242474
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: January 24, 2019
Publication date: July 30, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
-
Publication number: 20200210839
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: December 31, 2018
Publication date: July 2, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
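A toy sketch of the outlier mechanism, assuming a single shared scale and a fixed outlier fraction: most values are quantized to a narrow mantissa, while the few largest-magnitude values are kept at full precision in an ancillary table and spliced back in during decompression. It is not the patented format, just an illustration of the split between main and ancillary storage.

```python
import numpy as np

def compress_with_outliers(x, mantissa_bits=3, outlier_frac=0.02):
    flat = x.astype(np.float32).ravel()
    k = max(1, int(len(flat) * outlier_frac))
    outlier_idx = np.argsort(np.abs(flat))[-k:]       # indices kept in ancillary storage
    outlier_val = flat[outlier_idx].copy()
    body = flat.copy()
    body[outlier_idx] = 0.0                           # outliers removed from the main body
    scale = 2.0 ** np.ceil(np.log2(np.max(np.abs(body)) + 1e-30))
    levels = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(body / scale * levels), -levels, levels).astype(np.int8)
    return mantissas, scale, (outlier_idx, outlier_val), x.shape

def decompress_with_outliers(mantissas, scale, outliers, shape, mantissa_bits=3):
    levels = 2 ** (mantissa_bits - 1) - 1
    flat = mantissas.astype(np.float32) / levels * scale
    idx, val = outliers
    flat[idx] = val                                   # splice the full-precision outliers back in
    return flat.reshape(shape)

# A tensor where a few values are much larger than the rest, which narrow formats handle poorly.
x = np.random.randn(64, 64).astype(np.float32)
x *= np.random.choice([1.0, 10.0], size=x.shape, p=[0.98, 0.02]).astype(np.float32)
restored = decompress_with_outliers(*compress_with_outliers(x))
print("max abs error:", float(np.max(np.abs(x - restored))))
```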
-
Publication number: 20200210838
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: December 31, 2018
Publication date: July 2, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
-
Publication number: 20200160171
Abstract: Technologies are disclosed herein for dynamically generating communication primitives for use in model parameter synchronization during data-parallel DNN training by packing directed spanning trees. An interconnect topology for communication between GPUs in a computing system is determined. A quantity of directed spanning trees is generated and packed for transmitting data between the GPUs using the interconnect topology. The directed spanning trees define the connections between GPUs that are to be utilized for the transmission and the amount of data to be transmitted on each connection. Program code is generated for implementing the data transfer defined by the directed spanning trees. When the program code is executed, the directed spanning trees are used to pipeline the transmission of chunks of data, such as model parameters used during data-parallel DNN training, between the GPUs. The program code can also determine an optimal chunk size for data to be transferred between the GPUs.
Type: Application
Filed: February 14, 2019
Publication date: May 21, 2020
Inventors: Nikhil Devanur RANGARAJAN, Jorgen THELIN, Amar PHANISHAYEE, Guanhua WANG, Shivaram VENKATARAMAN
-
Publication number: 20190362227
Abstract: Layers of a deep neural network (DNN) are partitioned into stages using a profile of the DNN. Each of the stages includes one or more of the layers of the DNN. The partitioning of the layers into stages is optimized in various ways, including to minimize training time, to minimize data communication between the worker computing devices used to train the DNN, or to ensure that the worker computing devices perform an approximately equal amount of the processing for training the DNN. The stages are assigned to the worker computing devices. The worker computing devices process batches of training data using a scheduling policy that causes the workers to alternate between forward processing and backward processing of the batches of DNN training data. The stages can be configured for model parallel processing or data parallel processing.
Type: Application
Filed: June 29, 2018
Publication date: November 28, 2019
Inventors: Vivek SESHADRI, Amar PHANISHAYEE, Deepak NARAYANAN, Aaron HARLAP, Nikhil Devanur RANGARAJAN
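A small sketch of one partitioning goal from the abstract, assuming a profiled list of per-layer compute times and a naive greedy split (the described optimizer is not reproduced): contiguous layers are grouped into stages so each worker gets a roughly equal share of the work before the stages are handed to the alternating forward/backward schedule.

```python
def partition_layers(layer_times, num_workers):
    """Greedily split contiguous layers into num_workers stages of roughly equal time."""
    target = sum(layer_times) / num_workers
    stages, current, acc = [], [], 0.0
    for i, t in enumerate(layer_times):
        current.append(i)
        acc += t
        if acc >= target and len(stages) < num_workers - 1:
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages

# Hypothetical per-layer compute times (ms) taken from a profiling run of the DNN.
profile = [4.0, 9.0, 1.0, 3.0, 7.0, 8.0, 10.0]
for worker, stage in enumerate(partition_layers(profile, num_workers=3)):
    print(f"worker {worker}: layers {stage}")
```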
-
Publication number: 20190347549
Abstract: Functions are added to a deep neural network (“DNN”) computation graph for encoding data structures during a forward training pass of the DNN and decoding previously-encoded data structures during a backward training pass of the DNN. The functions added to the DNN computation graph can be selected based upon the specific layer pairs specified in the DNN computation graph. Once a modified DNN computation graph has been generated, the DNN can be trained using the modified DNN computation graph. The functions added to the modified DNN computation graph can reduce the utilization of memory during training of the DNN.
Type: Application
Filed: June 29, 2018
Publication date: November 14, 2019
Inventors: Amar PHANISHAYEE, Gennady PEKHIMENKO, Animesh JAIN
-
Patent number: 10356187
Abstract: A gateway that may be implemented in a local network and that communicates with a cloud network to provide efficient services in a weakly connected setting is disclosed. The gateway may be configured to enable services that efficiently utilize resources in both the gateway and the cloud network, and provide a desired quality of service while operating in a weakly connected setting. The gateway may provide data collection and processing, local network services, and enable cloud services that utilize data collected and processed by the gateway. The local network may include one or more sensors and/or video cameras that provide data to the gateway. In a further implementation, the gateway may determine an allocation of one or more tasks of a service between the gateway and a cloud network by determining the allocation of the one or more service tasks based on a desired service latency.
Type: Grant
Filed: August 14, 2018
Date of Patent: July 16, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ranveer Chandra, Ashish Kapoor, Sudipta Sinha, Amar Phanishayee, Deepak Vasisht, Xinxin Jin, Madhusudhan Gumbalapura Sudarshan
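A toy sketch of the latency-driven allocation decision, with made-up task costs, uplink bandwidth, and latency target: each task runs on the gateway or is shipped to the cloud, whichever keeps its end-to-end latency lower, and the result is checked against the desired bound.

```python
def place_tasks(tasks, uplink_mbps, desired_latency_s):
    """For each task, choose gateway or cloud execution based on estimated latency."""
    plan = {}
    for name, t in tasks.items():
        # Cloud cost = time to upload the input over the weak link + cloud compute time.
        cloud_latency = t["input_mb"] * 8 / uplink_mbps + t["cloud_s"]
        gateway_latency = t["gateway_s"]
        choice = "gateway" if gateway_latency <= cloud_latency else "cloud"
        latency = min(gateway_latency, cloud_latency)
        plan[name] = (choice, latency, latency <= desired_latency_s)
    return plan

# Hypothetical tasks from a camera/sensor service behind a weak uplink.
tasks = {
    "motion_detect": {"input_mb": 5.0, "gateway_s": 0.4, "cloud_s": 0.05},
    "object_classify": {"input_mb": 0.2, "gateway_s": 3.0, "cloud_s": 0.3},
}
for name, (where, latency, ok) in place_tasks(tasks, uplink_mbps=2.0, desired_latency_s=2.0).items():
    print(f"{name}: run on {where}, latency {latency:.2f}s, meets target: {ok}")
```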
-
Publication number: 20190087287
Abstract: This document relates to data storage techniques. One example can buffer write commands and cause the write commands to be committed to storage in flush epoch order. Another example can maintain a persistent log of write commands that are arranged in the persistent log in flush epoch order. Both examples may provide a prefix consistent state in the event of a crash.
Type: Application
Filed: September 25, 2018
Publication date: March 21, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: James W. MICKENS, Amar PHANISHAYEE, Vijaychidambaram VELAYUDHAN PILLAI
-
Patent number: 10187292
Abstract: Techniques and architectures may be used to generate data center network topologies that use less reliable and less expensive links mixed with links of higher reliability. Such topologies may be categorized into reliability classes, where each class corresponds to one or more bounds on the reliability of paths that include the links. A topology class may be selected for use by an application based, at least in part, on the degree of reliability demanded by the application.
Type: Grant
Filed: April 15, 2016
Date of Patent: January 22, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Danyang Zhuo, Xuan Kelvin Zou
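A small sketch of the grading idea, with illustrative class bounds and link reliabilities: a path's end-to-end reliability is taken as the product of its per-link reliabilities, each path is graded into the most stringent class whose bound it meets, and an application can then pick the cheapest class that satisfies its demanded reliability.

```python
from math import prod

def path_reliability(link_reliabilities):
    """End-to-end reliability of a path as the product of its per-link reliabilities."""
    return prod(link_reliabilities)

def classify(path_rel, class_bounds):
    """class_bounds: name -> minimum end-to-end reliability, most stringent first."""
    for name, bound in class_bounds.items():
        if path_rel >= bound:
            return name
    return "best-effort"

# Illustrative class bounds and paths built from a mix of cheap and premium links.
class_bounds = {"gold": 0.9999, "silver": 0.999, "bronze": 0.99}
paths = {
    "all-premium-links": [0.99999, 0.99999, 0.99999],
    "mixed-links": [0.99999, 0.9999, 0.9999],
    "cheap-links": [0.999, 0.995, 0.999],
}
for name, links in paths.items():
    print(name, "->", classify(path_reliability(links), class_bounds))
```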
-
Publication number: 20190007505
Abstract: A gateway that may be implemented in a local network and that communicates with a cloud network to provide efficient services in a weakly connected setting is disclosed. The gateway may be configured to enable services that efficiently utilize resources in both the gateway and the cloud network, and provide a desired quality of service while operating in a weakly connected setting. The gateway may provide data collection and processing, local network services, and enable cloud services that utilize data collected and processed by the gateway. The local network may include one or more sensors and/or video cameras that provide data to the gateway. In a further implementation, the gateway may determine an allocation of one or more tasks of a service between the gateway and a cloud network by determining the allocation of the one or more service tasks based on a desired service latency.
Type: Application
Filed: August 14, 2018
Publication date: January 3, 2019
Inventors: Ranveer CHANDRA, Ashish KAPOOR, Sudipta SINHA, Amar PHANISHAYEE, Deepak VASISHT, Xinxin JIN, Madhusudhan Gumbalapura SUDARSHAN