Patents by Inventor Tiyasa Mitra

Tiyasa Mitra has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240126617
    Abstract: Embodiments of the present disclosure include techniques for machine learning processing. In one embodiment, the present disclosure includes configuring functional modules on a machine learning processor to execute a plurality of machine learning (ML) operations during a plurality of time segments. During the time segments, a first portion of the ML operations executes serially and at least one other ML operation executes during at least a majority of the time of each of the time segments. Serial ML operations may be processed simultaneously with the at least one other ML operation.
    Type: Application
    Filed: October 14, 2022
    Publication date: April 18, 2024
    Inventors: Haishan ZHU, Preyas Janak SHAH, Tiyasa MITRA, Eric S. CHUNG
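    Illustrative sketch (not taken from the patent): a minimal Python example of running one long ML operation alongside a chain of serial operations within each time segment. The thread-based overlap and the sleep-based stand-ins for the functional modules are assumptions made purely for illustration.
      import threading, time

      def long_running_op(segment):
          # Stand-in for an operation that occupies most of the time segment.
          time.sleep(0.09)

      def serial_ops(segment):
          # Stand-in for a chain of shorter operations that run one after another.
          for _ in range(3):
              time.sleep(0.03)

      for segment in range(2):                       # two time segments
          t = threading.Thread(target=long_running_op, args=(segment,))
          t.start()                                  # overlapping operation begins
          serial_ops(segment)                        # serial portion runs alongside it
          t.join()                                   # the segment ends when both finish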
  • Patent number: 11954448
    Abstract: Embodiments of the present disclosure include systems and methods for determining position values for training data that is used to train transformer models. In some embodiments, a set of input data for training a transformer model is received. The set of input data comprises a set of tokens. Based on an offset value, a set of successive position values for the set of tokens is determined. Each position value in the set of successive position values represents a position of a token in the set of tokens relative to other tokens in the set of tokens. A set of training data is generated to comprise the set of tokens and the set of successive position values. The transformer model is trained using the set of training data.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: April 9, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
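    Illustrative sketch (not taken from the patent): a minimal Python example of assigning successive position values from an offset, as described in the abstract above. The function name build_training_example, the offset range, and the sample tokens are assumptions for illustration only.
      import random

      def build_training_example(tokens, max_offset=512):
          # Pick an offset, then assign successive position values so each token's
          # position is expressed relative to the other tokens in the set.
          offset = random.randint(0, max_offset)
          positions = [offset + i for i in range(len(tokens))]
          return {"tokens": tokens, "positions": positions}

      print(build_training_example(["The", "cat", "sat", "."]))
      # e.g. {'tokens': [...], 'positions': [137, 138, 139, 140]}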
  • Patent number: 11928429
    Abstract: Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
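    Illustrative sketch (not taken from the patent): a minimal Python example of packing a training sequence from subsets of two different datasets' token sequences. The fixed length, the half-and-half split, and the sample data are assumptions for illustration only.
      def pack_sequences(dataset_a, dataset_b, max_len=16):
          # Take a subset of one correlated sequence and fill the remaining
          # slots with a subset of a second, different sequence.
          take_a = min(len(dataset_a), max_len // 2)
          packed = dataset_a[:take_a] + dataset_b[: max_len - take_a]
          return packed[:max_len]

      print(pack_sequences(list("abcdefghij"), list("0123456789")))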
  • Patent number: 11893469
    Abstract: Embodiments of the present disclosure include systems and methods for training transformer models using position masking. In some embodiments, a set of data for training a transformer model is received. The set of data includes a sequence of tokens and a set of position values. Each position value in the set of position values represents a position of a token in the sequence of tokens relative to other tokens in the sequence of tokens. A subset of the set of position values in the set of data is selected. Each position value in the subset of the set of position values is replaced with a defined value. The transformer model is trained using the set of data.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: February 6, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
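    Illustrative sketch (not taken from the patent): a minimal Python example of position masking, replacing a randomly selected subset of position values with a single defined value. The masking fraction and the defined value of 0 are assumptions for illustration only.
      import random

      def mask_positions(positions, defined_value=0, fraction=0.15):
          # Replace a selected subset of position values with the defined value;
          # the tokens themselves are left untouched.
          masked = list(positions)
          k = max(1, int(len(positions) * fraction))
          for i in random.sample(range(len(positions)), k):
              masked[i] = defined_value
          return masked

      print(mask_positions(list(range(10))))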
  • Patent number: 11886983
    Abstract: Embodiments of the present disclosure include systems and methods for reducing hardware resource utilization by residual neural networks. In some embodiments, a first matrix is received at a layer included in a neural network. The first matrix is compressed to produce a second matrix. The second matrix has a reduced dimensionality relative to a dimensionality of the first matrix. The second matrix is processed through a network block in the layer included in the neural network. The processed second matrix is expanded to produce a third matrix. The third matrix has a dimensionality that is equal to a dimensionality of the first matrix. The third matrix is added to the first matrix to produce a fourth matrix.
    Type: Grant
    Filed: August 25, 2020
    Date of Patent: January 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
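    Illustrative sketch (not taken from the patent): a minimal NumPy example of the compress-process-expand-add pattern described in the abstract above. The dimensions, the random projection matrices, and the tanh stand-in for the network block are assumptions for illustration only.
      import numpy as np

      rng = np.random.default_rng(0)
      d, d_small = 64, 16                          # full and reduced dimensionality
      W_down = rng.standard_normal((d, d_small)) / np.sqrt(d)
      W_up = rng.standard_normal((d_small, d)) / np.sqrt(d_small)

      def residual_block(x):
          compressed = x @ W_down                  # second matrix, reduced dimensionality
          processed = np.tanh(compressed)          # stand-in for the network block
          expanded = processed @ W_up              # third matrix, original dimensionality
          return x + expanded                      # fourth matrix: sum with the first

      x = rng.standard_normal((8, d))              # first matrix
      print(residual_block(x).shape)               # (8, 64)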
  • Patent number: 11663444
    Abstract: Systems and methods for pipelined neural network processing with continuous and asynchronous updates are described. A method for processing a neural network comprising L layers, where L is an integer greater than two, includes partitioning the L layers among a set of computing resources configured to process forward passes and backward passes associated with each of the L layers. The method further includes initiating processing of the forward passes and the backward passes using the set of computing resources. The method further includes upon completion of a first set of forward passes and a first set of backward passes associated with a first layer of the L layers, initiating update of parameters associated with the first layer when gradients are available for updating the parameters associated with the first layer without waiting to calculate gradients associated with any of remaining L layers.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: May 30, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Saurabh M. Kulkarni, Marc Tremblay, Sujeeth S. Bharadwaj
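    Illustrative sketch (not taken from the patent): a minimal Python example of updating a layer's parameters as soon as its gradient is available, without waiting on the remaining layers. The dictionary-based layers, the learning rate, and the hand-fed gradient are assumptions for illustration only.
      layers = {f"layer{i}": {"w": 1.0} for i in range(4)}

      def maybe_update(grads_ready, lr=0.01):
          # Update any layer whose backward pass has already produced a gradient;
          # do not wait for gradients of the other layers.
          for name, grad in list(grads_ready.items()):
              if grad is not None:
                  layers[name]["w"] -= lr * grad
                  grads_ready[name] = None          # mark the gradient as consumed

      # e.g. the last layer's backward pass finishes first in the pipeline
      grads = {"layer0": None, "layer1": None, "layer2": None, "layer3": 0.5}
      maybe_update(grads)
      print(layers["layer3"])                       # updated while earlier layers are in flight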
  • Patent number: 11610120
    Abstract: Embodiments of the present disclosure include systems and methods for training neural networks. In one embodiment, a neural network may receive input data and produce output results in response to the input data and weights of the neural network. An error is determined at an output of the neural network based on the output results. The error is propagated in a reverse direction through the neural network from the output and one or more intermediate outputs to adjust the weights.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: March 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
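    Illustrative sketch (not taken from the patent): a minimal NumPy example of computing an error at the output and at one intermediate output and propagating both in the reverse direction to adjust the weights. The two-layer linear network, the auxiliary read-out Waux, and the learning rate are assumptions for illustration only.
      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.standard_normal((16, 8))
      y = rng.standard_normal((16, 1))
      W1 = rng.standard_normal((8, 4)) * 0.1
      W2 = rng.standard_normal((4, 1)) * 0.1
      Waux = rng.standard_normal((4, 1)) * 0.1       # read-out on the intermediate output

      for _ in range(50):
          h = x @ W1                                 # intermediate output
          out = h @ W2                               # final output
          err_out = out - y                          # error at the output
          err_aux = h @ Waux - y                     # error at the intermediate output
          # Propagate both errors backwards to adjust all of the weights.
          grad_W1 = x.T @ (err_out @ W2.T + err_aux @ Waux.T) / len(x)
          W2 -= 0.05 * h.T @ err_out / len(x)
          Waux -= 0.05 * h.T @ err_aux / len(x)
          W1 -= 0.05 * grad_W1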
  • Patent number: 11544537
    Abstract: Embodiments of the present disclosure include a method for token-position handling comprising: processing a first sequence of tokens to produce a second sequence of tokens, wherein the second sequence of tokens has a smaller number of tokens than the first sequence of tokens; masking at least some tokens in the second sequence to produce masked tokens; moving the masked tokens to the beginning of the second sequence to produce a third sequence; encoding tokens in the third sequence into a set of numeric vectors in a first array; and processing the first array in a transformer neural network to determine correlations among the third sequence, the processing the first array producing a second array.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: January 3, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andrew Wagner, Tiyasa Mitra, Sujeeth Subramanya Bharadwaj, Marc Tremblay, Saurabh Mohan Kulkarni
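    Illustrative sketch (not taken from the patent): a minimal Python example of shortening a token sequence, masking some tokens, and moving the masked positions to the beginning before encoding. The mask token string, the shortened length, and the number of masked tokens are assumptions for illustration only.
      import random

      def prepare(tokens, keep=8, n_mask=2, mask_token="[MASK]"):
          second = tokens[:keep]                     # shorter second sequence
          idx = set(random.sample(range(len(second)), n_mask))
          masked = [mask_token] * n_mask             # masked tokens
          rest = [t for i, t in enumerate(second) if i not in idx]
          return masked + rest                       # third sequence, masks moved to the front

      print(prepare(list("abcdefghij")))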
  • Patent number: 11537890
    Abstract: Embodiments of the present disclosure include systems and methods for compressing weights for distributed neural networks. In some embodiments, a first network comprising a first set of weights is trained using a set of training data. A second network comprising a second set of weights is trained using the set of training data. A number of weights in the first set of weights is greater than a number of weights in the second set of weights. The first set of weights are adjusted based on a first loss determined by the first network and a second loss determined by the second network. The second set of weights are adjusted based on the first loss determined by the first network and the second loss determined by the second network. Values of the second set of weights are sent to a computing system.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: December 27, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
  • Patent number: 11520592
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: December 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Gautham Popuri, Layali Rashid, Tiyasa Mitra, Mohit Mittal, Maral Mesmakhosroshahi
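    Illustrative sketch (not taken from the patent): a minimal Python example of fetching one portion of a model at a time from a parameter server and running a minibatch of microbatches through it. Only the forward pass is shown; the fetch_portion callable and the toy data are assumptions for illustration only.
      def run_minibatch(portion_ids, microbatches, fetch_portion):
          # fetch_portion(portion_id) downloads one portion (a layer or sub-layer)
          # of the master model; only that portion is resident on the target device.
          activations = microbatches
          for portion_id in portion_ids:
              portion = fetch_portion(portion_id)    # download, then execute
              activations = [portion(mb) for mb in activations]
          return activations

      # Toy usage: the "parameter server" hands back simple callables.
      fetch = lambda i: (lambda mb: [v + i for v in mb])
      minibatch = [[1, 2], [3, 4], [5, 6]]           # three microbatches executed in sequence
      print(run_minibatch(range(3), minibatch, fetch))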
  • Publication number: 20220382978
    Abstract: Embodiments of the present disclosure include systems and methods for training masked language models based on partial sequences of tokens. A sequence of tokens for training a transformer model is received. A defined proportion of the sequence of tokens is selected. Each value of the defined proportion of the sequence of tokens is replaced with a defined value.
    Type: Application
    Filed: May 28, 2021
    Publication date: December 1, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA, Fanny NINA PARAVECINO
  • Patent number: 11475303
    Abstract: Techniques for training neural networks are provided. According to one set of embodiments, a first array is processed in a spreading component to produce a second array, where a first dimension of the first array corresponds to at least one sequence of approximately orthogonal numeric vectors representing tokens, and where the spreading component combines values along the first dimension. The second array is processed in a transformer neural network to determine correlations within the sequence, which produces a third array. One or more batches of the third array are processed in a de-spreading component to produce a fourth array.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: October 18, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andrew Wagner, Tiyasa Mitra, Sujeeth Subramanya Bharadwaj, Saurabh Mohan Kulkarni, Marc Tremblay
  • Patent number: 11449752
    Abstract: Methods for gradient accumulation with free momentum are performed by systems and devices during neural network model training. An accumulator that includes a processor circuit and a memory element generates free momentum between passes of a neural network model training process. The processor circuit receives a difference weight (gradient) and generates a first input by applying a weighting parameter thereto. The processor circuit obtains a prior weight from the memory element and generates a second input by applying another weighting parameter thereto. The processor circuit generates a filtered input with momentum by filtering the first and second input. The memory element generates a stored next pass weight by accumulating the filtered input with the prior weight. A computing resource then processes the next pass of the neural network model training using the stored next pass weight. The methods, systems, and devices are applicable to pipelined model parallelism training processes.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: September 20, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Andrew Wagner, Marc Tremblay, Saurabh M. Kulkarni, Tiyasa Mitra, Sujeeth S. Bharadwaj
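    Illustrative sketch (not taken from the patent): a minimal Python example of an accumulator that filters a weighted gradient input and a weighted prior-weight input and accumulates the result into the next-pass weight. The weighting parameters a and b and the simple additive filter are assumptions, not values from the patent.
      class MomentumAccumulator:
          def __init__(self, initial_weight, a=0.9, b=-0.1):
              self.weight = initial_weight           # memory element holding the prior weight
              self.a, self.b = a, b                  # weighting parameters

          def accumulate(self, difference_weight):
              first = self.a * difference_weight     # first input: weighted gradient
              second = self.b * self.weight          # second input: weighted prior weight
              filtered = first + second              # filtered input with momentum
              self.weight += filtered                # stored next-pass weight
              return self.weight

      acc = MomentumAccumulator(1.0)
      print(acc.accumulate(-0.2))                    # weight used for the next pass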
  • Publication number: 20220276871
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
    Type: Application
    Filed: May 18, 2022
    Publication date: September 1, 2022
    Inventors: Bharadwaj PUDIPEDDI, Marc TREMBLAY, Gautham POPURI, Layali RASHID, Tiyasa MITRA, Mohit MITTAL, Maral MESMAKHOSROSHAHI
  • Publication number: 20220222521
    Abstract: Weights may be updated during training of a neural network artificial intelligence model. Certain techniques split the training data into mini-batches, process each mini-batch in a pipeline, and then apply the weight updates after processing of the mini-batch completes. However, waiting for the mini-batch to complete before applying the weight updates causes significant delays during a ramp-down period as the data must be flushed out of the pipeline and then again during a ramp-up period as the pipeline is being filled with data from the next mini-batch. The present disclosure avoids such delays and improves performance by applying the weight updates at specific intervals, without splitting the data into mini-batches. The updated weights may be applied during a steady-state operation of the pipeline.
    Type: Application
    Filed: January 13, 2021
    Publication date: July 14, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA
  • Publication number: 20220108162
    Abstract: Embodiments of the present disclosure include systems and methods for decimating hidden layers for training transformer models. In some embodiments, input data for training a transformer model is received at a transformer layer included in the transformer model. The transformer layer comprises a hidden layer. The hidden layer comprises a set of neurons configured to process training data. A subset of the set of neurons of the hidden layer is selected. Only the subset of the set of neurons of the hidden layer is used to train the transformer model with the input data.
    Type: Application
    Filed: October 1, 2020
    Publication date: April 7, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
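    Illustrative sketch (not taken from the patent): a minimal NumPy example of using only a randomly selected subset of a hidden layer's neurons for a training step. The hidden size, the keep fraction, and the ReLU activation are assumptions for illustration only.
      import numpy as np

      rng = np.random.default_rng(0)
      hidden = 128
      W = rng.standard_normal((hidden, hidden)) * 0.05

      def decimated_forward(x, keep_fraction=0.5):
          # Select a subset of the hidden layer's neurons and use only that
          # subset when processing this batch of training data.
          keep = rng.choice(hidden, size=int(hidden * keep_fraction), replace=False)
          return np.maximum(x @ W[:, keep], 0.0)

      x = rng.standard_normal((4, hidden))
      print(decimated_forward(x).shape)              # (4, 64)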
  • Publication number: 20220076127
    Abstract: Embodiments of the present disclosure include systems and methods for forcing weights of transformer model layers when training a transformer model. In some embodiments, input data is received at a first layer included in a transformer model. The input data is processed through the first layer of the transformer model to produce a first output data. The first output data is processed through the first layer of the transformer model to produce a second output data. The first output data is processed through a second layer included in the transformer model to produce a third output data. A difference is calculated between the second output data and the third output data. Weights included in the first layer of the transformer model are adjusted based on the calculated difference.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
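    Illustrative sketch (not taken from the patent): a minimal NumPy example of applying the first layer twice and the second layer once, measuring the difference between the two resulting outputs, and nudging only the first layer's weights. Linear layers, the stop-gradient treatment of the first output, and the learning rate are assumptions for illustration only.
      import numpy as np

      rng = np.random.default_rng(0)
      d = 16
      W1 = rng.standard_normal((d, d)) * 0.1         # first layer's weights
      W2 = rng.standard_normal((d, d)) * 0.1         # second layer's weights

      x = rng.standard_normal((8, d))                # input data
      out1 = x @ W1                                  # first output data
      out2 = out1 @ W1                               # first layer applied again
      out3 = out1 @ W2                               # second layer applied to out1
      diff = out2 - out3                             # difference between the two outputs

      # Gradient of mean(diff**2) with respect to W1, treating out1 as fixed.
      grad_W1 = 2.0 / diff.size * out1.T @ diff
      W1 -= 0.1 * grad_W1                            # adjust only the first layer's weights
      print(float(np.mean(diff ** 2)))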
  • Publication number: 20220076112
    Abstract: Embodiments of the present disclosure include systems and methods for compressing weights for distributed neural networks. In some embodiments, a first network comprising a first set of weights is trained using a set of training data. A second network comprising a second set of weights is trained using the set of training data. A number of weights in the first set of weights is greater than a number of weights in the second set of weights. The first set of weights are adjusted based on a first loss determined by the first network and a second loss determined by the second network. The second set of weights are adjusted based on the first loss determined by the first network and the second loss determined by the second network. Values of the second set of weights are sent to a computing system.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
  • Publication number: 20220067529
    Abstract: Embodiments of the present disclosure include systems and methods for compressing and decompressing data generated by sub-blocks in a neural network. In some embodiments, an input matrix is received at a compression block in the neural network. The compression block compresses the input matrix into a compressed matrix and outputs the compressed matrix. The compressed matrix has a reduced dimensionality relative to a dimensionality of the input matrix. A decompression block retrieves the compressed matrix. The decompression block decompresses the compressed matrix into a decompressed matrix and outputs the decompressed matrix. The decompressed matrix has the same dimensionality as the dimensionality of the input matrix. The compression and decompression blocks are optimized based on feedback received from the neural network.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 3, 2022
    Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
  • Publication number: 20220067280
    Abstract: Embodiments of the present disclosure include systems and methods for training transformer models. In some embodiments, a set of input data are received. The input data comprises a plurality of tokens including masked tokens. The plurality of tokens in an embedding layer are processed. The embedding layer is coupled to a transformer layer. The plurality of tokens are processed in the transformer layer, which is coupled to a classifier layer. The plurality of tokens are processed in the classifier layer. The classifier layer is coupled to a loss layer. At least one of the embedding layer and the classifier layer combine masked tokens at a current position with tokens at one or more of a previous position and a subsequent position.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 3, 2022
    Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
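    Illustrative sketch (not taken from the patent): a minimal NumPy example of an embedding layer that combines a masked token's vector with the vectors at the previous and subsequent positions. The vocabulary size, the mask id of 0, and the simple averaging are assumptions for illustration only.
      import numpy as np

      rng = np.random.default_rng(0)
      vocab, d = 100, 32
      emb = rng.standard_normal((vocab, d)) * 0.1
      MASK = 0

      def embed_with_neighbors(token_ids):
          vecs = emb[token_ids]
          out = vecs.copy()
          for i, t in enumerate(token_ids):
              if t == MASK:                          # masked token at the current position
                  prev_v = vecs[i - 1] if i > 0 else 0.0
                  next_v = vecs[i + 1] if i + 1 < len(token_ids) else 0.0
                  out[i] = (vecs[i] + prev_v + next_v) / 3.0
          return out

      print(embed_with_neighbors([5, MASK, 7, 9]).shape)   # (4, 32)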