Patents by Inventor Andy Wagner
Andy Wagner has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250088858
Abstract: A first wireless communication device having an application processor configured to generate, for transmission to a second wireless communication device, a first identity resolving key (IRK) that is unique to the second wireless communication device, wherein the first IRK indicates the second wireless communication device is allowed to perform find location operations with the first wireless communication device; a Bluetooth controller configured to perform Bluetooth scanning operations to receive a Bluetooth advertisement having a payload comprising an IRK; and an always on processor (AOP) configured to compare the IRK to the first IRK.
Type: Application
Filed: September 5, 2024
Publication date: March 13, 2025
Inventors: Yann LY-GAGNON, Andy WAGNER, Anjali S. SANDESARA, Bryant LIU, Michelle J. NG, Yilok L. WONG
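A minimal sketch of the comparison step this abstract describes, assuming 16-byte Bluetooth LE identity resolving keys; the function name and its callers are hypothetical, not from the filing:

```python
import hmac

# Hypothetical sketch of the AOP-side check: compare the IRK carried in a
# received advertisement payload against the stored IRK that authorizes
# find location operations. A constant-time comparison avoids leaking key
# material through timing (BLE IRKs are 16 bytes).
def irk_matches(stored_irk: bytes, advertised_irk: bytes) -> bool:
    if len(stored_irk) != 16 or len(advertised_irk) != 16:
        return False
    return hmac.compare_digest(stored_irk, advertised_irk)
```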
-
Publication number: 20250088859
Abstract: An apparatus configured to process an indication of a wireless communication device with which the apparatus is allowed to perform find location operations, wherein the indication comprises a first identity resolving key (IRK) that is unique to the wireless communication device; process a payload of an advertisement comprising an IRK; and compare the IRK to the first IRK received by the always on processor.
Type: Application
Filed: September 5, 2024
Publication date: March 13, 2025
Inventors: Yann LY-GAGNON, Andy WAGNER, Anjali S. SANDESARA, Bryant LIU, Michelle J. NG, Yilok L. WONG
-
Publication number: 20250088834
Abstract: An apparatus configured to initiate an operation to locate a target device; generate, for transmission to the target device, Bluetooth discovery signals for detecting a proximity of the target device; generate, for transmission to the target device at substantially the same time as the Bluetooth discovery signals, a discovery message via a network connection; detect a Bluetooth discovery response or a response to the discovery message from the target device; and trigger an ultra-wideband (UWB) ranging operation based on detection of the Bluetooth discovery response or the response to the discovery message.
Type: Application
Filed: September 6, 2024
Publication date: March 13, 2025
Inventors: Vignesh Babu MOORTHY, Andy WAGNER, Michelle Julia NG, Qiang CHEN, Richard ONG, Robert W. BRUMLEY, Robert GOLSHAN, Yann LY-GAGNON
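A simplified sketch of the parallel-discovery flow the abstract describes. The transport coroutines and the UWB trigger are placeholder parameters, not APIs from the filing; real radio scheduling is omitted:

```python
import asyncio

# Hypothetical sketch: issue a Bluetooth discovery and a network discovery
# message at roughly the same time, then trigger UWB ranging as soon as
# either path reports a response from the target device.
async def discover_then_range(send_bt_discovery, send_network_discovery,
                              start_uwb_ranging):
    bt_task = asyncio.ensure_future(send_bt_discovery())
    net_task = asyncio.ensure_future(send_network_discovery())
    done, pending = await asyncio.wait(
        {bt_task, net_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # one response is enough to proceed
    if any(task.result() for task in done):
        await start_uwb_ranging()  # fine-grained ranging once detected
```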
-
Patent number: 12182716
Abstract: Embodiments of the present disclosure include systems and methods for compressing and decompressing data generated by sub-blocks in a neural network. In some embodiments, an input matrix is received at a compression block in the neural network. The compression block compresses the input matrix into a compressed matrix and outputs the compressed matrix. The compressed matrix has a reduced dimensionality relative to the dimensionality of the input matrix. A decompression block retrieves the compressed matrix, decompresses it into a decompressed matrix, and outputs the decompressed matrix. The decompressed matrix has the same dimensionality as the input matrix. The compression and decompression blocks are optimized based on feedback received from the neural network.
Type: Grant
Filed: August 25, 2020
Date of Patent: December 31, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
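A minimal sketch, not the patented implementation, of learned compression and decompression blocks around a sub-block's activations: a down-projection shrinks the hidden dimension and an up-projection restores it, with both trained end to end so the network's loss supplies the optimizing feedback:

```python
import torch
import torch.nn as nn

class CompressionBlock(nn.Module):
    def __init__(self, dim: int, compressed_dim: int):
        super().__init__()
        self.down = nn.Linear(dim, compressed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(x)  # reduced dimensionality

class DecompressionBlock(nn.Module):
    def __init__(self, compressed_dim: int, dim: int):
        super().__init__()
        self.up = nn.Linear(compressed_dim, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.up(z)  # back to the input's dimensionality
```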
-
Publication number: 20240176953
Abstract: Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
Type: Application
Filed: February 2, 2024
Publication date: May 30, 2024
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
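An illustrative sketch of the packing idea under stated assumptions: fill each fixed-length training example with token subsequences drawn across dataset boundaries, so little capacity is wasted on padding. Separator and attention-mask handling are omitted:

```python
def pack_sequences(datasets, max_len):
    packed, current = [], []
    for sequence in datasets:          # each dataset yields correlated tokens
        for token in sequence:
            current.append(token)
            if len(current) == max_len:
                packed.append(current)
                current = []
    if current:
        packed.append(current)         # final, possibly short, pack
    return packed

example = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=4)
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```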
-
Patent number: 11954448
Abstract: Embodiments of the present disclosure include systems and methods for determining position values for training data that is used to train transformer models. In some embodiments, a set of input data for training a transformer model is received. The set of input data comprises a set of tokens. Based on an offset value, a set of successive position values for the set of tokens is determined. Each position value in the set of successive position values represents a position of a token in the set of tokens relative to other tokens in the set of tokens. A set of training data is generated to comprise the set of tokens and the set of successive position values. The transformer model is trained using the set of training data.
Type: Grant
Filed: July 21, 2020
Date of Patent: April 9, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
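An illustrative sketch of one reading of the abstract: number token positions from an offset rather than always from zero, so relative order is preserved while absolute positions vary. A random offset is an assumption here, not a claim from the patent:

```python
import random

def offset_positions(tokens, max_offset):
    # Successive position values starting at an offset; each value still
    # encodes the token's position relative to the other tokens.
    offset = random.randint(0, max_offset)
    return [offset + i for i in range(len(tokens))]

positions = offset_positions(["the", "cat", "sat"], max_offset=100)
# e.g. [37, 38, 39] -- relative order is unchanged
```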
-
Patent number: 11928429
Abstract: Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
Type: Grant
Filed: May 22, 2020
Date of Patent: March 12, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
-
Patent number: 11893469
Abstract: Embodiments of the present disclosure include systems and methods for training transformer models using position masking. In some embodiments, a set of data for training a transformer model is received. The set of data includes a sequence of tokens and a set of position values. Each position value in the set of position values represents a position of a token in the sequence of tokens relative to other tokens in the sequence of tokens. A subset of the set of position values in the set of data is selected. Each position value in the subset of the set of position values is replaced with a second defined value to form a second set of defined values. The transformer model is trained using the set of data.
Type: Grant
Filed: May 22, 2020
Date of Patent: February 6, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
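An illustrative sketch of position masking as the abstract states it: replace a random subset of position values with a reserved mask value so the model must also infer where tokens sit. The proportion and mask value are assumptions for the example:

```python
import random

def mask_positions(positions, mask_value, proportion=0.15):
    # Replace a randomly selected subset of position values with the
    # defined mask value; the tokens themselves are left untouched.
    masked = list(positions)
    k = max(1, int(len(positions) * proportion))
    for i in random.sample(range(len(positions)), k):
        masked[i] = mask_value
    return masked

print(mask_positions([0, 1, 2, 3, 4, 5], mask_value=-1))
# e.g. [0, 1, -1, 3, 4, 5]
```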
-
Patent number: 11886983
Abstract: Embodiments of the present disclosure include systems and methods for reducing hardware resource utilization by residual neural networks. In some embodiments, a first matrix is received at a layer included in a neural network. The first matrix is compressed to produce a second matrix. The second matrix has a reduced dimensionality relative to the dimensionality of the first matrix. The second matrix is processed through a network block in the layer included in the neural network. The processed second matrix is expanded to produce a third matrix. The third matrix has a dimensionality equal to the dimensionality of the first matrix. The third matrix is added to the first matrix to produce a fourth matrix.
Type: Grant
Filed: August 25, 2020
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
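A minimal sketch, under stated assumptions rather than the claimed implementation, of the compress-process-expand-add pattern: the network block runs at a reduced width, cutting compute and memory, while the skip connection still operates at full width:

```python
import torch
import torch.nn as nn

class CompressedResidualBlock(nn.Module):
    def __init__(self, dim: int, compressed_dim: int):
        super().__init__()
        self.compress = nn.Linear(dim, compressed_dim)
        self.block = nn.Sequential(
            nn.Linear(compressed_dim, compressed_dim), nn.ReLU()
        )
        self.expand = nn.Linear(compressed_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        second = self.compress(x)        # reduced dimensionality
        processed = self.block(second)   # network block at the low width
        third = self.expand(processed)   # back to the input width
        return x + third                 # residual add produces the output
```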
-
Patent number: 11663444
Abstract: Systems and methods for pipelined neural network processing with continuous and asynchronous updates are described. A method for processing a neural network comprising L layers, where L is an integer greater than two, includes partitioning the L layers among a set of computing resources configured to process forward passes and backward passes associated with each of the L layers. The method further includes initiating processing of the forward passes and the backward passes using the set of computing resources. The method further includes, upon completion of a first set of forward passes and a first set of backward passes associated with a first layer of the L layers, initiating update of parameters associated with the first layer when gradients are available for updating the parameters associated with the first layer, without waiting to calculate gradients associated with any of the remaining L layers.
Type: Grant
Filed: September 27, 2019
Date of Patent: May 30, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Saurabh M. Kulkarni, Marc Tremblay, Sujeeth S. Bharadwaj
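A highly simplified, single-device sketch of the update rule: a layer's parameters are updated as soon as its own gradients are ready, without waiting for the gradients of the remaining layers. The `forward`, `backward`, `loss_grad`, and `apply_update` hooks are hypothetical interfaces; real pipeline scheduling across computing resources is omitted:

```python
def train_step(layers, batch, lr):
    activations = [batch]
    for layer in layers:                                   # forward passes
        activations.append(layer.forward(activations[-1]))
    grad = layers[-1].loss_grad(activations[-1])
    for layer, act in zip(reversed(layers), reversed(activations[:-1])):
        grad = layer.backward(act, grad)  # this layer's gradients are ready
        layer.apply_update(lr)            # update immediately: continuous,
                                          # asynchronous, no global wait
```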
-
Patent number: 11610120
Abstract: Embodiments of the present disclosure include systems and methods for training neural networks. In one embodiment, a neural network may receive input data and produce output results in response to the input data and weights of the neural network. An error is determined at an output of the neural network based on the output results. The error is propagated in a reverse direction through the neural network from the output and one or more intermediate outputs to adjust the weights.
Type: Grant
Filed: May 8, 2020
Date of Patent: March 21, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
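A minimal sketch, assuming the "intermediate outputs" are auxiliary heads (an interpretation, not the patent's wording): errors are computed at the final output and at an intermediate output, summed, and propagated in reverse to adjust the weights:

```python
import torch
import torch.nn as nn

trunk = nn.Linear(16, 16)
aux_head = nn.Linear(16, 4)      # intermediate output
final_head = nn.Linear(16, 4)    # network output
params = (list(trunk.parameters()) + list(aux_head.parameters())
          + list(final_head.parameters()))
opt = torch.optim.SGD(params, lr=0.1)

x, target = torch.randn(8, 16), torch.randn(8, 4)
hidden = torch.relu(trunk(x))
loss = (nn.functional.mse_loss(final_head(hidden), target)
        + nn.functional.mse_loss(aux_head(hidden), target))
opt.zero_grad()
loss.backward()   # error flows in reverse from both outputs
opt.step()        # weights adjusted
```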
-
Patent number: 11537890
Abstract: Embodiments of the present disclosure include systems and methods for compressing weights for distributed neural networks. In some embodiments, a first network comprising a first set of weights is trained using a set of training data. A second network comprising a second set of weights is trained using the set of training data. A number of weights in the first set of weights is greater than a number of weights in the second set of weights. The first set of weights is adjusted based on a first loss determined by the first network and a second loss determined by the second network. The second set of weights is adjusted based on the first loss determined by the first network and the second loss determined by the second network. Values of the second set of weights are sent to a computing system.
Type: Grant
Filed: September 9, 2020
Date of Patent: December 27, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
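A minimal sketch under one reading of the abstract: a large and a small network train on the same data, an agreement term couples them so each set of weights is adjusted based on both losses, and only the small network's (fewer) weights are sent on. The agreement term is an assumption for illustration:

```python
import torch
import torch.nn as nn

large = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
small = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
opt = torch.optim.SGD(list(large.parameters()) + list(small.parameters()),
                      lr=0.1)

x, y = torch.randn(32, 16), torch.randn(32, 4)
large_out, small_out = large(x), small(x)
loss_large = nn.functional.mse_loss(large_out, y)          # first loss
loss_small = nn.functional.mse_loss(small_out, y)          # second loss
agreement = nn.functional.mse_loss(small_out, large_out)   # couples the two
opt.zero_grad()
(loss_large + loss_small + agreement).backward()
opt.step()

payload = {k: v.detach() for k, v in small.state_dict().items()}
# 'payload' holds the compressed (smaller) weight set to transmit
```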
-
Publication number: 20220382978
Abstract: Embodiments of the present disclosure include systems and methods for training masked language models based on partial sequences of tokens. A sequence of tokens for training a transformer model is received. A defined proportion of the sequence of tokens is selected. Each value of the defined proportion of the sequence of tokens is replaced with a defined value.
Type: Application
Filed: May 28, 2021
Publication date: December 1, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Fanny NINA PARAVECINO
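An illustrative sketch of the masking step as stated: select a defined proportion of the token sequence, replace each selected token with a defined mask value, and keep the originals as prediction targets. `MASK_ID` and the proportion are placeholder assumptions:

```python
import random

MASK_ID = 103  # placeholder mask token id

def mask_for_mlm(tokens, proportion=0.15):
    # Replace a defined proportion of tokens with MASK_ID, remembering
    # the originals as targets for the masked-language-model loss.
    masked, targets = list(tokens), {}
    k = max(1, int(len(tokens) * proportion))
    for i in random.sample(range(len(tokens)), k):
        targets[i] = masked[i]
        masked[i] = MASK_ID
    return masked, targets

masked, targets = mask_for_mlm([7, 12, 99, 4, 18, 31, 2])
# e.g. masked = [7, 12, 103, 4, 18, 31, 2], targets = {2: 99}
```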
-
Publication number: 20220222521
Abstract: Weights may be updated during training of a neural network artificial intelligence model. Certain techniques split the training data into mini-batches, process each mini-batch in a pipeline, and then apply the weight updates after processing of the mini-batch completes. However, waiting for the mini-batch to complete before applying the weight updates causes significant delays: during a ramp-down period, the data must be flushed out of the pipeline, and then again during a ramp-up period, the pipeline must be refilled with data from the next mini-batch. The present disclosure avoids such delays and improves performance by applying the weight updates at specific intervals, without splitting the data into mini-batches. The updated weights may be applied during steady-state operation of the pipeline.
Type: Application
Filed: January 13, 2021
Publication date: July 14, 2022
Inventors: Andy WAGNER, Tiyasa MITRA
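A simplified sketch of the scheduling idea: keep one continuous stream of samples flowing and apply accumulated weight updates every `interval` steps, so the pipeline never drains between mini-batches. `accumulate_grads` and `apply_updates` are hypothetical stand-ins for the real pipeline operations:

```python
def train(stream, model, interval):
    for step_idx, sample in enumerate(stream, start=1):
        model.accumulate_grads(sample)   # pipeline stays full
        if step_idx % interval == 0:
            model.apply_updates()        # update during steady state; no
                                         # ramp-down/ramp-up flush needed
```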
-
Publication number: 20220108162
Abstract: Embodiments of the present disclosure include systems and methods for decimating hidden layers for training transformer models. In some embodiments, input data for training a transformer model is received at a transformer layer included in the transformer model. The transformer layer comprises a hidden layer. The hidden layer comprises a set of neurons configured to process training data. A subset of the set of neurons of the hidden layer is selected. Only the subset of the set of neurons of the hidden layer is used to train the transformer model with the input data.
Type: Application
Filed: October 1, 2020
Publication date: April 7, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
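A minimal sketch of decimating a hidden layer: select a subset of hidden neurons and train with only those active, here by zeroing the rest. Sampling the keep mask per step is an assumption; the filing's selection policy may differ:

```python
import torch
import torch.nn as nn

def decimated_forward(hidden_layer: nn.Linear, x: torch.Tensor,
                      keep_fraction: float) -> torch.Tensor:
    h = torch.relu(hidden_layer(x))
    # Select a subset of neurons; only these contribute to training.
    keep = (torch.rand(h.shape[-1]) < keep_fraction).float()
    return h * keep

layer = nn.Linear(16, 32)
out = decimated_forward(layer, torch.randn(4, 16), keep_fraction=0.5)
```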
-
Publication number: 20220076112
Abstract: Embodiments of the present disclosure include systems and methods for compressing weights for distributed neural networks. In some embodiments, a first network comprising a first set of weights is trained using a set of training data. A second network comprising a second set of weights is trained using the set of training data. A number of weights in the first set of weights is greater than a number of weights in the second set of weights. The first set of weights is adjusted based on a first loss determined by the first network and a second loss determined by the second network. The second set of weights is adjusted based on the first loss determined by the first network and the second loss determined by the second network. Values of the second set of weights are sent to a computing system.
Type: Application
Filed: September 9, 2020
Publication date: March 10, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
-
Publication number: 20220076127
Abstract: Embodiments of the present disclosure include systems and methods for forcing weights of transformer model layers when training a transformer model. In some embodiments, input data is received at a first layer included in a transformer model. The input data is processed through the first layer of the transformer model to produce a first output data. The first output data is processed through the first layer of the transformer model to produce a second output data. The first output data is processed through a second layer included in the transformer model to produce a third output data. A difference is calculated between the second output data and the third output data. Weights included in the first layer of the transformer model are adjusted based on the calculated difference.
Type: Application
Filed: September 9, 2020
Publication date: March 10, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
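A minimal sketch of the comparison the abstract walks through: feed the first layer's output both back through the first layer and through the second layer, then adjust the first layer's weights from the difference. Using an MSE penalty is an assumption for illustration:

```python
import torch
import torch.nn as nn

layer1, layer2 = nn.Linear(16, 16), nn.Linear(16, 16)
opt = torch.optim.SGD(layer1.parameters(), lr=0.01)

x = torch.randn(8, 16)
first = layer1(x)        # first output data
second = layer1(first)   # layer 1 applied to its own output
third = layer2(first)    # layer 2 applied to the same output
diff = nn.functional.mse_loss(second, third.detach())  # the difference
opt.zero_grad()
diff.backward()          # adjust layer 1's weights from the difference
opt.step()
```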
-
Publication number: 20220067490
Abstract: Embodiments of the present disclosure include systems and methods for reducing hardware resource utilization by residual neural networks. In some embodiments, a first matrix is received at a layer included in a neural network. The first matrix is compressed to produce a second matrix. The second matrix has a reduced dimensionality relative to the dimensionality of the first matrix. The second matrix is processed through a network block in the layer included in the neural network. The processed second matrix is expanded to produce a third matrix. The third matrix has a dimensionality equal to the dimensionality of the first matrix. The third matrix is added to the first matrix to produce a fourth matrix.
Type: Application
Filed: August 25, 2020
Publication date: March 3, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
-
Publication number: 20220067529
Abstract: Embodiments of the present disclosure include systems and methods for compressing and decompressing data generated by sub-blocks in a neural network. In some embodiments, an input matrix is received at a compression block in the neural network. The compression block compresses the input matrix into a compressed matrix and outputs the compressed matrix. The compressed matrix has a reduced dimensionality relative to the dimensionality of the input matrix. A decompression block retrieves the compressed matrix, decompresses it into a decompressed matrix, and outputs the decompressed matrix. The decompressed matrix has the same dimensionality as the input matrix. The compression and decompression blocks are optimized based on feedback received from the neural network.
Type: Application
Filed: August 25, 2020
Publication date: March 3, 2022
Inventors: Andy WAGNER, Tiyasa MITRA, Marc TREMBLAY
-
Publication number: 20220067280
Abstract: Embodiments of the present disclosure include systems and methods for training transformer models. In some embodiments, a set of input data is received. The input data comprises a plurality of tokens including masked tokens. The plurality of tokens is processed in an embedding layer. The embedding layer is coupled to a transformer layer. The plurality of tokens is processed in the transformer layer, which is coupled to a classifier layer. The plurality of tokens is processed in the classifier layer. The classifier layer is coupled to a loss layer. At least one of the embedding layer and the classifier layer combines masked tokens at a current position with tokens at one or more of a previous position and a subsequent position.
Type: Application
Filed: August 25, 2020
Publication date: March 3, 2022
Inventors: Andy Wagner, Tiyasa Mitra, Marc Tremblay
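A minimal sketch of one way the embedding-layer combining could look: where a token is masked, mix its embedding with the embeddings at the previous and subsequent positions so nearby context is folded in. The averaging scheme and `MASK_ID` are assumptions, not the claimed method (edge positions wrap around in this sketch):

```python
import torch
import torch.nn as nn

MASK_ID = 103  # placeholder mask token id

def embed_with_neighbors(embedding: nn.Embedding,
                         token_ids: torch.Tensor) -> torch.Tensor:
    e = embedding(token_ids)                   # [seq_len, dim]
    prev_e = torch.roll(e, shifts=1, dims=0)   # previous position
    next_e = torch.roll(e, shifts=-1, dims=0)  # subsequent position
    combined = (e + prev_e + next_e) / 3.0
    # Combine only at masked positions; other tokens keep their embedding.
    is_masked = (token_ids == MASK_ID).unsqueeze(-1).float()
    return is_masked * combined + (1 - is_masked) * e

emb = nn.Embedding(1000, 32)
out = embed_with_neighbors(emb, torch.tensor([7, MASK_ID, 99, 4]))
```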