Patents by Inventor Bita Darvish Rouhani

Bita Darvish Rouhani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ADJUSTING ACTIVATION COMPRESSION FOR NEURAL NETWORK TRAINING

Publication number: 20250061320

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.

Type: Application

Filed: November 4, 2024

Publication date: February 20, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
System for training an artificial neural network

Patent number: 12190235

Abstract: Embodiments of the present disclosure include a system for optimizing an artificial neural network by configuring a model, based on a plurality of training parameters, to execute a training process, monitoring a plurality of statistics produced upon execution of the training process, and adjusting one or more of the training parameters, based on one or more of the statistics, to maintain at least one of the statistics within a predetermined range. In some embodiments, artificial intelligence (AI) processors may execute a training process on a model, the training process having an associated set of training parameters. Execution of the training process may produce a plurality of statistics. Control processor(s) coupled to the AI processor(s) may receive the statistics, and in accordance therewith, adjust one or more of the training parameters to maintain at least one of the statistics within a predetermined range during execution of the training process.

Type: Grant

Filed: January 29, 2021

Date of Patent: January 7, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Maximilian Golub, Ritchie Zhao, Eric Chung, Douglas Burger, Bita Darvish Rouhani, Ge Yang, Nicolo Fusi
Adjusting activation compression for neural network training

Patent number: 12165038

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.

Type: Grant

Filed: February 14, 2019

Date of Patent: December 10, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
OPTIMIZING DEEP NEURAL NETWORK MODELS BASED ON SPARSIFICATION AND QUANTIZATION

Publication number: 20240403644

Abstract: Embodiments of the present disclosure include systems and methods for optimizing deep neural network models based on sparsification and quantization. A device may identify a layer in a plurality of layers included in a neural network model, each layer in the plurality of layers comprising a plurality of weight values. The device may select a weight value from the plurality of weight values in the layer. The device may remove the weight value from the plurality of weight values in the layer to produce a modified version of the layer. The device may update remaining weight values in the plurality of weight values in the modified version of the layer, wherein removing the weight value and updating the remaining weight values provides greater compression of the neural network model and reduces loss of accuracy of the neural network model.

Type: Application

Filed: May 31, 2023

Publication date: December 5, 2024

Inventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Eric Sen CHUNG, Douglas C. BURGER
DETERMINING SHARED EXPONENT VALUES FOR SHARED EXPONENT FLOATING POINT DATA TYPES

Publication number: 20240402993

Abstract: Embodiments of the present disclosure include systems and methods for determining shared exponent values for shared exponent floating point data types. A device may determine a shared global exponent value for a plurality of floating point numbers. The device may determine a sub exponent value from a plurality of candidate sub exponent values that minimizes a quantization error value determined based on a subset of the plurality of floating point numbers and a version of the subset of the plurality of floating point numbers quantized based on the shared global exponent and the sub exponent value. The determined sub exponent value is shared among the subset of the plurality of floating point numbers. The device may, based on the sub exponent value, represent the subset of the plurality of floating point numbers using a shared exponent floating point data type.

Type: Application

Filed: May 30, 2023

Publication date: December 5, 2024

Inventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Marius Octavian STAN, Mathew Kent HALL, Preyas Janak SHAH, Ankit MORE, Eric Sen CHUNG, Douglas C. BURGER
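
The sub exponent selection described in the abstract above can be illustrated with a minimal sketch. It is not the claimed method: the function names, the rounding rule, the squared-error metric, and the convention that a sub exponent is subtracted from the shared global exponent are all assumptions made for illustration.

```python
import math

def quantize(values, exponent, mantissa_bits):
    """Round each value to a clamped integer mantissa at step 2**(exponent + 1 - mantissa_bits)."""
    step = 2.0 ** (exponent + 1 - mantissa_bits)
    limit = 2 ** mantissa_bits - 1  # largest mantissa magnitude (assumed layout)
    return [max(-limit, min(limit, round(v / step))) * step for v in values]

def global_exponent(values):
    """Shared global exponent: exponent of the largest magnitude (assumed rule)."""
    largest = max(abs(v) for v in values)
    return math.floor(math.log2(largest)) if largest > 0 else 0

def best_sub_exponent(subset, shared_exp, candidates, mantissa_bits):
    """Pick the candidate sub exponent with the smallest squared quantization error."""
    def error(sub):
        quantized = quantize(subset, shared_exp - sub, mantissa_bits)
        return sum((a - b) ** 2 for a, b in zip(subset, quantized))
    return min(candidates, key=error)  # ties go to the first candidate

values = [6.0, -5.0, 0.75, 0.5]
shared = global_exponent(values)
subsets = [values[:2], values[2:]]
subs = [best_sub_exponent(s, shared, (0, 1, 2, 3), mantissa_bits=3) for s in subsets]
print(shared, subs)
```

In this toy input the large-magnitude subset keeps the global exponent, while the small-magnitude subset is assigned a finer scale through its own sub exponent.
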
DIGITAL WATERMARKING OF MACHINE LEARNING MODELS

Publication number: 20240242191

Abstract: A method may include embedding, in a hidden layer and/or an output layer of a first machine learning model, a first digital watermark. The first digital watermark may correspond to input samples altering the low probabilistic regions of an activation map associated with the hidden layer of the first machine learning model. Alternatively, the first digital watermark may correspond to input samples rarely encountered by the first machine learning model. The first digital watermark may be embedded in the first machine learning model by at least training, based on training data including the input samples, the first machine learning model. A second machine learning model may be determined to be a duplicate of the first machine learning model based on a comparison of the first digital watermark embedded in the first machine learning model and a second digital watermark extracted from the second machine learning model.

Type: Application

Filed: February 27, 2024

Publication date: July 18, 2024

Inventors: Bita Darvish Rouhani, Huili Chen, Farinaz Koushanfar
Digital watermarking of machine learning models

Patent number: 11972408

Abstract: A method may include embedding, in a hidden layer and/or an output layer of a first machine learning model, a first digital watermark. The first digital watermark may correspond to input samples altering the low probabilistic regions of an activation map associated with the hidden layer of the first machine learning model. Alternatively, the first digital watermark may correspond to input samples rarely encountered by the first machine learning model. The first digital watermark may be embedded in the first machine learning model by at least training, based on training data including the input samples, the first machine learning model. A second machine learning model may be determined to be a duplicate of the first machine learning model based on a comparison of the first digital watermark embedded in the first machine learning model and a second digital watermark extracted from the second machine learning model.

Type: Grant

Filed: March 21, 2019

Date of Patent: April 30, 2024

Assignee: The Regents of the University of California

Inventors: Bita Darvish Rouhani, Huili Chen, Farinaz Koushanfar
Partitioned machine learning architecture

Patent number: 11922313

Abstract: A system may include a processor and a memory. The memory may include program code that provides operations when executed by the processor. The operations may include: partitioning, based at least on a resource constraint of a platform, a global machine learning model into a plurality of local machine learning models; transforming training data to at least conform to the resource constraint of the platform; and training the global machine learning model by at least processing, at the platform, the transformed training data with a first of the plurality of local machine learning models.

Type: Grant

Filed: February 6, 2017

Date of Patent: March 5, 2024

Assignee: WILLIAM MARSH RICE UNIVERSITY

Inventors: Bita Darvish Rouhani, Azalia Mirhoseini, Farinaz Koushanfar
Hierarchical and shared exponent floating point data types

Patent number: 11886833

Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.

Type: Grant

Filed: June 28, 2021

Date of Patent: January 30, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
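
A rough sketch of the hierarchical layout described in the abstract above follows. The abstract does not say how the third shared exponent or the difference values are computed, so taking the maximum of the two group exponents, storing non-negative differences, and the 4-bit mantissa are assumptions; the names `encode` and `decode` are hypothetical.

```python
import math

def shared_exponent(values):
    """Largest exponent in the group; math.frexp gives v == m * 2**e with 0.5 <= |m| < 1."""
    return max(math.frexp(v)[1] for v in values)

def encode(group_a, group_b, mantissa_bits=4):
    e1 = shared_exponent(group_a)   # first shared exponent
    e2 = shared_exponent(group_b)   # second shared exponent
    e3 = max(e1, e2)                # third shared exponent (assumed: the maximum)
    diffs = (e3 - e1, e3 - e2)      # first and second difference values

    def pack(values, exp):
        scale = 2.0 ** (exp - mantissa_bits)
        limit = 2 ** mantissa_bits - 1
        # Each element keeps only a sign bit and a clamped integer mantissa.
        return [(0 if v >= 0 else 1, min(limit, round(abs(v) / scale))) for v in values]

    return {"shared_exponent": e3, "diffs": diffs,
            "elements": (pack(group_a, e1), pack(group_b, e2))}

def decode(record, mantissa_bits=4):
    groups = []
    for diff, elements in zip(record["diffs"], record["elements"]):
        scale = 2.0 ** (record["shared_exponent"] - diff - mantissa_bits)
        groups.append([(-1.0 if sign else 1.0) * m * scale for sign, m in elements])
    return groups

record = encode([6.0, -5.0], [0.75, 0.5])
print(record["shared_exponent"], record["diffs"], decode(record))
```

Only one full exponent is stored for both groups; each group recovers its own scale from a small difference value.
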
SYSTEMS AND METHODS FOR SPARSE MATRIX MULTIPLICATION

Publication number: 20230385374

Abstract: A method for sparse matrix multiplication comprises receiving a first block having M elements in a first dimension, and parsing the first block of M elements into a first set of B sub-blocks including M/B elements in the first dimension. A first sparsity mask having S% sparsity is applied to the first block of elements, such that each of the first set of B sub-blocks has S% sparsity. A second block is received having M elements in a second dimension, and is parsed into a second set of B sub-blocks that include M/B elements in the second dimension. A second sparsity mask having S′% sparsity is applied to the second block of elements, such that S′% of the second set of B sub-blocks have 100% sparsity and (100−S′)% of the second set of B sub-blocks have 0% sparsity. The first and second blocks are then matrix multiplied.

Type: Application

Filed: April 4, 2022

Publication date: November 30, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Venmugil ELANGO, Bita DARVISH ROUHANI, Eric S CHUNG, Douglas Christopher BURGER
MODEL CUSTOMIZATION OF TRANSFORMERS FOR IMPROVED EFFICIENCY

Publication number: 20230376725

Abstract: Embodiments of the present disclosure include systems and methods for providing model customizations of transformers for improved efficiency. A first set of settings for a transformer model is received. Based on the first set of settings, a second set of settings for the transformer model is determined. The first set of settings and the second set of settings are used to configure and train the transformer model.

Type: Application

Filed: May 19, 2022

Publication date: November 23, 2023

Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Taylor Golub
SPARSIFYING VECTORS FOR NEURAL NETWORK MODELS BASED ON OVERLAPPING WINDOWS

Publication number: 20230334284

Abstract: Embodiments of the present disclosure include systems and methods for sparsifying vectors for neural network models based on overlapping windows. A window is used to select a first set of elements in a vector of elements. A first element is selected from the first set of elements having the highest absolute value. The window is slid along the vector by a defined number of elements. The window is used to select a second set of elements in the vector, wherein the first set of elements and the second set of elements share at least one common element. A second element is selected from the second set of elements having the highest absolute value.

Type: Application

Filed: May 27, 2022

Publication date: October 19, 2023

Inventors: Girish Vishnu VARATKAR, Ankit MORE, Bita DARVISH ROUHANI, Mattheus C. HEDDES, Gaurav AGRAWAL
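
The window procedure in the abstract above can be sketched in a few lines. This is an illustrative reading, not the claimed implementation: the function name, the window and stride sizes, and the choice to zero every unselected element are assumptions. A stride smaller than the window makes consecutive windows share elements.

```python
def sparsify_overlapping(vector, window=4, stride=2):
    """Keep the largest-magnitude element of each window position; zero the rest."""
    keep = set()
    start = 0
    while start + window <= len(vector):
        # Index of the element with the highest absolute value in this window.
        best = max(range(start, start + window), key=lambda i: abs(vector[i]))
        keep.add(best)
        start += stride  # slide the window by a defined number of elements
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

vec = [0.1, -0.9, 0.3, 0.2, -0.4, 0.05, 0.7, -0.6]
sparse = sparsify_overlapping(vec, window=4, stride=2)
print(sparse)
```

With these inputs the three window positions cover indices 0 to 3, 2 to 5, and 4 to 7, so each neighboring pair of windows shares two elements.
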
MACHINE LEARNING MODEL PROCESSING BASED ON PERPLEXITY

Publication number: 20230316043

Abstract: A method for operating a machine learning model is presented. The machine learning model includes a plurality of sequential transformer blocks. The method comprises receiving input data at a transformer block and processing the input data via a mixture of experts layer. At an auxiliary classifier, a measure of perplexity of the processed input data is determined. Based on the determined measure of perplexity, one or more experts in a downstream transformer block that will subsequently process the input data are indicated. Weight matrices are then fetched for the indicated one or more experts.

Type: Application

Filed: March 31, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita DARVISH ROUHANI, Douglas Christopher BURGER, Eric S. CHUNG
SPARSITY MASKING METHODS FOR NEURAL NETWORK TRAINING

Publication number: 20230316080

Abstract: A method is presented for training a neural network. For a weight matrix having an integer dimension M1 in a first dimension and an integer dimension M2 in a second dimension, a first balanced sparsity mask is generated that is an N1 of M1 mask in the first dimension. The first balanced sparsity mask is applied to the weight matrix during inference. A second balanced sparsity mask is generated for a transpose of the weight matrix. The second balanced sparsity mask is an N2 of M2 mask in the second dimension. The second balanced sparsity mask is applied to the transpose of the weight matrix during backpropagation.

Type: Application

Filed: March 29, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Maximilian Taylor GOLUB, Bita DARVISH ROUHANI, Eric S CHUNG, Douglas Christopher BURGER
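
To make the N of M masks in the abstract above concrete, here is a small sketch. The abstract does not say how the kept entries are chosen, so magnitude-based selection, the row-wise grouping, and the helper names `n_of_m_mask` and `transpose` are assumptions.

```python
def n_of_m_mask(matrix, n, m):
    """Per row, mark the n largest-magnitude entries in each consecutive group of m."""
    mask = []
    for row in matrix:
        row_mask = [0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, min(start + m, len(row)))
            top = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
            for i in top:
                row_mask[i] = 1
        mask.append(row_mask)
    return mask

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

W = [[0.9, -0.1, 0.4, 0.2],
     [0.3, 0.8, -0.7, 0.05]]

# One mask for the weight matrix, a separate mask for its transpose.
forward_mask = n_of_m_mask(W, n=2, m=4)
backward_mask = n_of_m_mask(transpose(W), n=1, m=2)
print(forward_mask)
print(backward_mask)
```

Every group ends up with the same number of kept entries, which is what makes the mask balanced. The two masks are generated independently, so they need not select the same weights.
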
SPARSITY AND QUANTIZATION FOR DEEP NEURAL NETWORKS

Publication number: 20230316039

Abstract: A computing system is configured to implement a deep neural network comprising an input layer for receiving inputs applied to the deep neural network, an output layer for outputting inferences based on the received inputs, and a plurality of hidden layers interposed between the input layer and the output layer. A plurality of nodes selectively operate on the inputs to generate and cause outputting of the inferences, wherein operation of the nodes is controlled based on parameters of the deep neural network. A sparsity controller is configured to selectively apply a plurality of different sparsity states to control parameter density of the deep neural network. A quantization controller is configured to selectively quantize the parameters of the deep neural network in a manner that is sparsity-dependent, such that quantization applied to each parameter is based on which of the plurality of different sparsity states applies to the parameter.

Type: Application

Filed: May 23, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Douglas Christopher BURGER, Ming Gang LIU, Eric S. CHUNG, Ritchie Zhao
MIXTURE OF EXPERTS MODELS WITH SPARSIFIED WEIGHTS

Publication number: 20230316042

Abstract: A method is presented for operating a machine learning model including one or more mixture of experts layers. The method comprises receiving one or more input data shards at a routing gate network for a mixture of experts layer comprising a plurality of neural network experts. One or more neural network experts in the mixture of experts layer are designated to evaluate each input data shard. For each designated neural network expert, a weight matrix is retrieved having a predetermined sparsity to generate a sparsified designated neural network expert. Each input data shard is evaluated with a respective sparsified designated neural network expert.

Type: Application

Filed: March 31, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita DARVISH ROUHANI, Douglas Christopher BURGER, Eric S CHUNG
Training neural networks using mixed precision computations

Patent number: 11741362

Abstract: A system for training a neural network receives training data and performs lower precision format training calculations using lower precision format data at one or more training phases. One or more results from the lower precision format training calculations are converted to higher precision format data, and higher precision format training calculations are performed using the higher precision format data at one or more additional training phases. The neural network is modified using the results from the one or more additional training phases. The mixed precision format training calculations train the neural network more efficiently, while maintaining an overall accuracy.

Type: Grant

Filed: May 8, 2018

Date of Patent: August 29, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Eric Sen Chung, Bita Darvish Rouhani
TRAINING NEURAL NETWORK ACCELERATORS USING MIXED PRECISION DATA FORMATS

Publication number: 20230267319

Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.

Type: Application

Filed: April 28, 2023

Publication date: August 24, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
Training neural network accelerators using mixed precision data formats

Patent number: 11676003

Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.

Type: Grant

Filed: December 18, 2018

Date of Patent: June 13, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
Hardware-based machine learning acceleration

Patent number: 11651260

Abstract: A method for hardware-based machine learning acceleration is provided. The method may include partitioning, into a first batch of data and a second batch of data, an input data received at a hardware accelerator implementing a machine learning model. The input data may be a continuous stream of data samples. The input data may be partitioned based at least on a resource constraint of the hardware accelerator. An update of a probability density function associated with the machine learning model may be performed in real time. The probability density function may be updated by at least processing, by the hardware accelerator, the first batch of data before the second batch of data. An output may be generated based at least on the updated probability density function. The output may include a probability of encountering a data value. Related systems and articles of manufacture, including computer program products, are also provided.

Type: Grant

Filed: January 31, 2018

Date of Patent: May 16, 2023

Assignee: The Regents of the University of California

Inventors: Bita Darvish Rouhani, Mohammad Ghasemzadeh, Farinaz Koushanfar
