Patents by Inventor Jinwen Xi
Jinwen Xi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240296133
Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.
Type: Application
Filed: February 12, 2024
Publication date: September 5, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jinwen XI, Ming Gang LIU, Eric S. CHUNG
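In software terms, the masking multiplexer is an elementwise select between the input value and a substitute. A minimal NumPy sketch of that select (the function name and the attention-masking example are illustrative, not from the patent):

```python
import numpy as np

def apply_mask(data: np.ndarray, mask: np.ndarray, alt_value: float = 0.0) -> np.ndarray:
    """Software analogue of the masking multiplexer: where mask == 1 the
    input value passes through; elsewhere the alternative value is output."""
    return np.where(mask.astype(bool), data, alt_value)

# Example: masking attention scores before a softmax, with -inf as the
# alternative value so masked positions contribute zero probability.
scores = np.array([0.5, 1.2, -0.3, 2.0])
mask = np.array([1, 1, 0, 1])
masked = apply_mask(scores, mask, alt_value=-np.inf)
```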
-
Publication number: 20240231894
Abstract: A hardware retire circuit includes: one or more input queues, each queue corresponding to an input stream of tasks and being configured to store input task identifiers corresponding to tasks of the input stream; and processing logic configured to: receive a completed task event; determine whether a completed task queue identifier and a completed task identifier of the completed task event match an input task identifier of an input task at a head of an input queue having an input queue identifier corresponding to the completed task queue identifier; and in response to determining a match, pop the task at the head of the input queue and output a task retirement event corresponding to the input task.
Type: Application
Filed: October 21, 2022
Publication date: July 11, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yi LUO, Jinwen XI, Xuan ZUO, Haishan ZHU, Eric Sen CHUNG
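The retire logic amounts to one FIFO per input stream with an in-order match-and-pop at the head. A minimal Python sketch under that reading (class and method names are illustrative):

```python
from collections import deque

class RetireCircuit:
    """One FIFO per input stream; a completed-task event retires the task
    at the head of the matching queue only when the identifiers match."""
    def __init__(self, num_queues: int):
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, queue_id: int, task_id: int) -> None:
        self.queues[queue_id].append(task_id)

    def on_completed(self, queue_id: int, task_id: int):
        q = self.queues[queue_id]
        if q and q[0] == task_id:          # match at the head: retire in order
            q.popleft()
            return ("retired", queue_id, task_id)
        return None                        # no match: nothing is retired
```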
-
Publication number: 20240134683
Abstract: A hardware retire circuit includes: one or more input queues, each queue corresponding to an input stream of tasks and being configured to store input task identifiers corresponding to tasks of the input stream; and processing logic configured to: receive a completed task event; determine whether a completed task queue identifier and a completed task identifier of the completed task event match an input task identifier of an input task at a head of an input queue having an input queue identifier corresponding to the completed task queue identifier; and in response to determining a match, pop the task at the head of the input queue and output a task retirement event corresponding to the input task.
Type: Application
Filed: October 20, 2022
Publication date: April 25, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yi LUO, Jinwen XI, Xuan ZUO, Haishan ZHU, Eric Sen CHUNG
-
Patent number: 11934327
Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.
Type: Grant
Filed: December 22, 2021
Date of Patent: March 19, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinwen Xi, Ming Gang Liu, Eric S. Chung
-
Patent number: 11886833
Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
Type: Grant
Filed: June 28, 2021
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
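The two-level encoding can be sketched directly from the abstract: each group of values gets a local shared exponent, a top-level exponent is shared across the groups, and only small per-group differences are stored. A rough NumPy illustration, with the quantization of mantissas to a fixed bit-width omitted (all names are illustrative):

```python
import numpy as np

def pack_hierarchical_shared_exponent(group_a: np.ndarray, group_b: np.ndarray):
    """Two-level shared-exponent packing: per-group shared exponents e1, e2,
    a top-level shared exponent e3, and cheap-to-store differences d1, d2."""
    _, exp_a = np.frexp(group_a)
    _, exp_b = np.frexp(group_b)
    e1, e2 = int(exp_a.max()), int(exp_b.max())   # first and second shared exponents
    e3 = max(e1, e2)                              # third, top-level shared exponent
    d1, d2 = e3 - e1, e3 - e2                     # difference values
    signs = (np.signbit(group_a), np.signbit(group_b))
    mant_a = np.abs(group_a) / np.ldexp(1.0, e1)  # mantissas aligned to e1
    mant_b = np.abs(group_b) / np.ldexp(1.0, e2)  # mantissas aligned to e2
    return {"shared_exp": e3, "diffs": (d1, d2),
            "signs": signs, "mantissas": (mant_a, mant_b)}
```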
-
Publication number: 20230376663
Abstract: A field programmable gate array including a configurable interconnect fabric connecting logic blocks implementing a circuit to: receive input data including data values organized into rows and columns, each row having N data values; select R[i] unmasked data values of a row of the input data in accordance with a mask and an index i of the row; select N-R[i] unmasked data values of another row of the input data in accordance with the mask and an index of the another row; merge the R[i] unmasked data values of the row and the N-R[i] data values of the another row into a combined data vector of N data values; and compute R[i] normalized values based on the R[i] unmasked data values of the combined data vector and N-R[i] normalized values based on the N-R[i] data values of the combined data vector to generate N normalized data values.
Type: Application
Filed: May 23, 2022
Publication date: November 23, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventor: Jinwen XI
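The point of the merge is lane utilization: a heavily masked row would leave most of the N lanes idle, so its R[i] surviving values are packed together with values from another row before the normalizer runs. A rough NumPy sketch, assuming mean/variance normalization (the abstract does not name the normalization function) and that the second row has enough unmasked values to fill the remaining slots:

```python
import numpy as np

def merged_row_normalize(row_i, row_j, mask_i, mask_j, eps=1e-6):
    """Pack R[i] unmasked values of one row with N - R[i] unmasked values of
    another row into one N-wide vector, then normalize each segment using
    its own statistics."""
    a = row_i[mask_i.astype(bool)]              # R[i] unmasked values of row i
    need = len(row_i) - len(a)                  # N - R[i] slots to fill
    b = row_j[mask_j.astype(bool)][:need]       # values borrowed from row j
    combined = np.concatenate([a, b])           # one N-wide pass through the circuit
    seg_a, seg_b = combined[:len(a)], combined[len(a):]
    norm_a = (seg_a - seg_a.mean()) / (seg_a.std() + eps)
    norm_b = (seg_b - seg_b.mean()) / (seg_b.std() + eps)
    return norm_a, norm_b
```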
-
Publication number: 20230274130
Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
Type: Application
Filed: May 3, 2023
Publication date: August 31, 2023
Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
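Stripped of the hardware detail, one burst cycle reduces the workers' streamed gradients, applies them to the staged weights, and both writes back and rebroadcasts the result. A minimal sketch using plain SGD as the optimizer rule (the optimizer is not specified in this abstract; names are illustrative):

```python
import numpy as np

def burst_cycle(worker_grads, weights, lr=0.01):
    """One burst cycle: reduce one gradient slice per worker, step the
    weights, and return the new weights for write-back and broadcast."""
    avg_grad = np.mean(np.stack(worker_grads), axis=0)   # reduce across workers
    return weights - lr * avg_grad                       # new weights

# Example: three workers streaming gradients for one weight slice
weights = np.zeros(4)
grads = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
weights = burst_cycle(grads, weights)    # each element becomes -0.02
```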
-
Publication number: 20230244945
Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
Type: Application
Filed: April 11, 2023
Publication date: August 3, 2023
Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
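The memory saving comes from keeping both momentum buffers in a compact storage format and widening them only for the update itself. A sketch with Adam as the optimizer and float16 as the storage format (both are illustrative choices; the patent covers selectable formats):

```python
import numpy as np

def adam_step_compact(w, g, m16, v16, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Convert both momentum buffers from the storage format (float16) to
    the training format (float32), update, and convert back for storage."""
    m = b1 * m16.astype(np.float32) + (1 - b1) * g        # first momentum
    v = b2 * v16.astype(np.float32) + (1 - b2) * g * g    # second momentum
    m_hat = m / (1 - b1 ** t)                             # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m.astype(np.float16), v.astype(np.float16)  # back to storage format
```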
-
Publication number: 20230195665
Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jinwen XI, Ming Gang LIU, Eric S. CHUNG
-
Publication number: 20230195833
Abstract: Embodiments of the present disclosure include systems and methods for fusing operators for neural network hardware accelerators. A plurality of vector multiplication operations in a data path of a mapping function included in a neural network are identified. The plurality of vector multiplication operations are combined into a single vector multiplication operation in the data path of the mapping function.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Inventors: Jinwen XI, Eric S. CHUNG
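When the multipliers in a mapping function's data path are constants, two back-to-back vector multiplications fold into one: the constant vectors are multiplied once offline, and the data path keeps a single multiply. A small illustrative NumPy example (not taken from the patent):

```python
import numpy as np

a = np.random.rand(8)        # constant vectors inside a mapping function
b = np.random.rand(8)

def mapping_unfused(x):
    return (x * a) * b       # two vector multiplications in the data path

fused_ab = a * b             # folded offline into one constant vector

def mapping_fused(x):
    return x * fused_ab      # a single vector multiplication at runtime

x = np.random.rand(8)
assert np.allclose(mapping_unfused(x), mapping_fused(x))
```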
-
Patent number: 11681905
Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
Type: Grant
Filed: March 23, 2020
Date of Patent: June 20, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
-
Patent number: 11675654
Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
Type: Grant
Filed: December 16, 2021
Date of Patent: June 13, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Saurabh M. Kulkarni, Marc Tremblay, Matthias Baenninger, Nuno Claudino Pereira Lopes
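A single-process analogue of the recovery flow: keep the model as it stood after the previous iteration, and on an error reload that copy so processing resumes from a consistent state. This sketch compresses the multi-processor restart into one loop (all names are illustrative):

```python
import copy

def run_with_recovery(step_fn, model, batches):
    """On a computing error, 'restart' by reloading the model from the
    previous processing iteration, then continue with the next batch."""
    prev_model = copy.deepcopy(model)
    for batch in batches:
        try:
            model = step_fn(model, batch)          # one processing iteration
            prev_model = copy.deepcopy(model)      # consistent snapshot
        except RuntimeError:
            model = copy.deepcopy(prev_model)      # reload previous-iteration model
    return model
```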
-
Patent number: 11651228
Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
Type: Grant
Filed: April 17, 2020
Date of Patent: May 16, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
-
Publication number: 20230106651
Abstract: Aspects of embodiments of the present disclosure relate to a field programmable gate array (FPGA) configured to implement an exponential function data path including: an input scaling stage including constant shifters and integer adders to scale a mantissa portion of an input floating-point value by approximately log2 e to compute a scaled mantissa value, where e is Euler's number; and an exponential stage including barrel shifters and an exponential lookup table to: extract an integer portion and a fractional portion from the scaled mantissa value based on the exponent portion of the input floating-point value; apply a bias shift to the integer portion to compute a result exponent portion of a result floating-point value; lookup a result mantissa portion of the result floating-point value in the exponential lookup table based on the fractional portion; and combine the result exponent portion and the result mantissa portion to generate the result floating-point value.
Type: Application
Filed: September 28, 2021
Publication date: April 6, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jinwen XI, Ritchie ZHAO, Ming Gang LIU, Eric S. CHUNG
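The identity behind the data path is e^x = 2^(x · log2 e): after scaling by log2 e, the integer part of the result becomes the output exponent and the fractional part indexes a 2^f table. A floating-point sketch of that decomposition (the FPGA operates on the bit fields directly; the 8-bit table size is an illustrative choice):

```python
import math

LUT_BITS = 8
EXP2_LUT = [2.0 ** (i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]  # 2**f, f in [0, 1)

def exp_via_lut(x: float) -> float:
    """exp(x) = 2**(x*log2(e)): the integer part becomes the result exponent,
    the fractional part selects the result mantissa from the lookup table."""
    t = x * math.log2(math.e)              # input scaling stage
    n = math.floor(t)                      # integer portion -> exponent (bias shift)
    frac = t - n                           # fractional portion -> LUT index
    mant = EXP2_LUT[int(frac * (1 << LUT_BITS))]
    return math.ldexp(mant, n)             # assemble 2**n * 2**frac

# exp_via_lut(1.0) ~= 2.716, accurate to roughly the 8-bit table resolution
```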
-
Patent number: 11615301
Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
Type: Grant
Filed: September 3, 2019
Date of Patent: March 28, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
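The scheme splits each parameter into mantissa and exponent and deduplicates each set into a lookup table, so only small indices travel per parameter. A rough NumPy sketch; the 8-bit mantissa quantization is an illustrative choice that makes deduplication effective (the patent's exact mantissa compression is not spelled out in this abstract):

```python
import numpy as np

def compress_params(params: np.ndarray):
    """Split floats into mantissa/exponent sets, deduplicate each into a
    LUT, and keep per-parameter indices into those LUTs."""
    mant, expo = np.frexp(params.astype(np.float32))
    mant_q = np.round(mant * 256) / 256                  # quantize mantissas
    mant_lut, mant_idx = np.unique(mant_q, return_inverse=True)
    exp_lut, exp_idx = np.unique(expo, return_inverse=True)
    return mant_lut, mant_idx, exp_lut, exp_idx

def decompress(mant_lut, mant_idx, exp_lut, exp_idx):
    """Rebuild (approximate) parameters from the two LUTs and index arrays."""
    return np.ldexp(mant_lut[mant_idx], exp_lut[exp_idx])
```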
-
Publication number: 20220283820
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
Type: Application
Filed: May 24, 2022
Publication date: September 8, 2022
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
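The execution order is the key idea: the device holds one portion of the model at a time and runs it across all microbatches of the minibatch before fetching the next portion. A toy sketch with a dict standing in for the parameter server and linear layers standing in for model portions (all illustrative):

```python
import numpy as np

def run_large_model(layer_store, num_layers, minibatch, microbatch_size=4):
    """Fetch one model portion at a time and run it over every microbatch
    before downloading the next portion."""
    microbatches = [minibatch[i:i + microbatch_size]
                    for i in range(0, len(minibatch), microbatch_size)]
    for layer_id in range(num_layers):
        weights = layer_store[layer_id]                       # "download" one portion
        microbatches = [mb @ weights for mb in microbatches]  # execute it per microbatch
    return np.concatenate(microbatches)

# Example: a 3-layer linear model streamed onto a memory-constrained device
store = {i: np.eye(8) for i in range(3)}
out = run_large_model(store, 3, np.random.rand(16, 8))
```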
-
Patent number: 11436019
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
Type: Grant
Filed: September 30, 2019
Date of Patent: September 6, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
-
Publication number: 20220253281
Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
Type: Application
Filed: June 28, 2021
Publication date: August 11, 2022
Inventors: Bita DARVISH ROUHANI, Venmugil ELANGO, Rasoul SHAFIPOUR, Jeremy FOWERS, Ming Gang LIU, Jinwen XI, Douglas C. BURGER, Eric S. CHUNG
-
Patent number: 11354579
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
Type: Grant
Filed: September 30, 2019
Date of Patent: June 7, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
-
Publication number: 20220108209
Abstract: Techniques for shared memory spaces in data and model parallelism are provided to improve memory efficiency and memory access speed. A shared memory space may be established at a host system or in a hardware memory agent. The shared memory may store training data or model parameters for an artificial intelligence model at a memory address in one or more memory circuits. Data for the artificial intelligence model may be processed across a plurality of artificial intelligence accelerators using the training data or the model parameters of the shared memory space. That is, multiple accelerators access one copy of the data from the shared memory space instead of accessing their own separate memory space.
Type: Application
Filed: October 5, 2020
Publication date: April 7, 2022
Inventors: Bharadwaj PUDIPEDDI, Jinwen XI, Maral MESMAKHOSROSHAHI, Gurupurna VASISHT
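The claim is that accelerators attach to one copy rather than each holding its own. Python's multiprocessing shared memory gives a host-side analogue of that single-copy arrangement (the mapping to hardware memory agents is only suggestive):

```python
import numpy as np
from multiprocessing import shared_memory

# Host establishes the single shared copy of the model parameters
params = np.arange(1024, dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=params.nbytes)
master = np.ndarray(params.shape, dtype=params.dtype, buffer=shm.buf)
master[:] = params

# Each "accelerator" attaches to the same buffer by name instead of
# keeping a private copy -- the memory saving the abstract describes.
handle = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(params.shape, dtype=params.dtype, buffer=handle.buf)
assert view[0] == master[0]              # both see the one copy

del view; handle.close()                 # drop buffer exports before closing
del master; shm.close(); shm.unlink()
```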