Patents by Inventor Jinwen Xi

Jinwen Xi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210326711
    Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
    Type: Application
    Filed: April 17, 2020
    Publication date: October 21, 2021
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
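The abstract above describes keeping both momentum buffers in a compact storage format and converting them to a training format only for the update step. A minimal sketch of that idea, assuming fp16 as the storage format, fp32 as the training format, and an Adam-style dual-momentum update (the patent does not fix these choices):

```python
import numpy as np

def adam_step_with_compressed_momentum(w, grad, m_fp16, v_fp16, lr=1e-3,
                                       beta1=0.9, beta2=0.999, eps=1e-8):
    """One dual-momentum (Adam-style) update where both momentum buffers
    are stored in a compact format and converted for the update."""
    # Convert storage-format momentum to the training format (fp32 here).
    m = m_fp16.astype(np.float32)
    v = v_fp16.astype(np.float32)
    # Standard dual-momentum update performed in the training format.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    w = w - lr * m / (np.sqrt(v) + eps)
    # Convert back to the storage format before writing to memory.
    return w, m.astype(np.float16), v.astype(np.float16)
```

Halving the momentum precision roughly halves optimizer-state memory for the layer while the arithmetic still runs at full training precision.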
  • Publication number: 20210295141
    Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 23, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
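The burst cycle above streams gradients from all workers into the optimizer, updates the weights, writes them back to memory, and broadcasts them in the same cycle. A software sketch of one such cycle, assuming plain SGD in the optimizer and Python lists standing in for the hardware buffers (all names here are illustrative, not from the patent):

```python
import numpy as np

def burst_cycle(weight_memory, gradient_buffers, workers, lr=0.01):
    """One burst cycle: reduce streamed gradients, compute new weights,
    write them back to memory, and transmit them to every worker."""
    # Move gradients from each worker's buffer into the optimizer and reduce.
    grad = sum(gradient_buffers) / len(gradient_buffers)
    # Move weights from memory into the optimizer and update (SGD here).
    new_weights = weight_memory - lr * grad
    # Write the new weights back to the weight memory.
    weight_memory[:] = new_weights
    # Transmit the new weights to each worker.
    for worker in workers:
        worker.append(new_weights.copy())
    return new_weights
```

In the patented system these steps overlap within a single burst cycle rather than running sequentially as they do in this sketch.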
  • Publication number: 20210232451
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model into the first artificial intelligence processor, wherein the model corresponds to the same model processed by the plurality of artificial intelligence processors during a previous processing iteration on data from the data set.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 29, 2021
    Inventors: Bharadwaj PUDIPEDDI, Maral MESMAKHOSROSHAHI, Jinwen XI, Saurabh M. KULKARNI, Marc TREMBLAY, Matthias BAENNINGER, Nuno CLAUDINO PEREIRA LOPES
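The recovery method above restarts only the failing processor and reloads the same model it ran in the previous iteration. A control-flow sketch under assumed interfaces (the `failed`/`restart` attributes and the callbacks are hypothetical stand-ins for the hardware and runtime):

```python
def run_with_recovery(processors, data_batches, load_model, run_iteration):
    """On a computing error, restart the failing processor, reload the
    same model, and retry the iteration (sketch)."""
    for batch in data_batches:
        try:
            run_iteration(processors, batch)
        except RuntimeError:
            for p in processors:
                if p.failed:
                    p.restart()
                    # Reload the same model used in the previous iteration.
                    load_model(p)
            run_iteration(processors, batch)  # retry the failed iteration
```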
  • Publication number: 20210064986
    Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
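The compression scheme above splits each fp32 parameter into its mantissa and exponent fields and builds a small lookup table (LUT) for each. A sketch of the extraction and LUT construction, assuming IEEE 754 binary32 layout, deduplication for the exponent LUT, and simple uniform quantization for the mantissa LUT (the abstract leaves the exact compression and encoding methods open, and this sketch ignores the sign bit for brevity):

```python
import numpy as np

def compress_floats(params, n_mantissa_levels=16):
    """Split fp32 values into mantissa/exponent fields and build LUTs
    plus per-value index arrays (sketch; sign bit not handled)."""
    bits = params.astype(np.float32).view(np.uint32)
    exponents = (bits >> 23) & 0xFF          # 8-bit exponent field
    mantissas = bits & 0x7FFFFF              # 23-bit mantissa field
    # Exponent LUT: distinct exponents plus an index per value.
    exp_lut, exp_idx = np.unique(exponents, return_inverse=True)
    # Mantissa LUT: map each mantissa to its nearest representative level.
    levels = np.linspace(0, 0x7FFFFF, n_mantissa_levels)
    man_idx = np.argmin(np.abs(mantissas[:, None] - levels[None, :]), axis=1)
    man_lut = levels.astype(np.uint32)
    return man_lut, man_idx, exp_lut, exp_idx
```

Since typical models use only a narrow band of exponents, the exponent LUT stays tiny and each parameter is replaced by two short indices.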
  • Publication number: 20210019634
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
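The execution paradigm above runs one portion of the model at a time on a memory-constrained device, fetching each portion from the parameter server as needed. A minimal sketch, where the `download` method and callable layers are illustrative assumptions rather than the patent's actual interfaces:

```python
def run_large_model(parameter_server, num_layers, activations):
    """Execute a model too large for device memory one portion (layer)
    at a time, fetching each portion from the parameter server (sketch)."""
    x = activations
    for layer_id in range(num_layers):
        layer = parameter_server.download(layer_id)  # fetch one portion
        x = layer(x)                                 # execute it on the device
        del layer                                    # free device memory
    return x
```

Peak device memory is bounded by the largest single portion plus the activations, independent of total model size.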
  • Publication number: 20210019152
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
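The microbatching described above can be sketched as gradient accumulation: a minibatch is split into microbatches executed in sequence, so only one microbatch is resident at a time and the communication step runs once per minibatch. The averaging and callback interface here are illustrative assumptions:

```python
import numpy as np

def minibatch_gradient(samples, grad_fn, microbatch_size):
    """Split a minibatch into microbatches executed sequentially,
    accumulating gradients across them (sketch)."""
    total, n = None, 0
    for start in range(0, len(samples), microbatch_size):
        micro = samples[start:start + microbatch_size]
        g = grad_fn(micro)              # gradient for one microbatch
        total = g if total is None else total + g
        n += 1
    return total / n                    # average over microbatches
```

Adjusting `microbatch_size` trades per-step memory against how often the device synchronizes with the parameter server.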
  • Publication number: 20200342288
    Abstract: A distributed training system including a parameter server is configured to compress weight matrices according to a clustering algorithm; the compressed representation of a weight matrix may thereafter be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value. In a further example aspect, a training worker may compute an activation result directly from the compressed representation of a weight matrix and a training data matrix by performing gather-reduce-add operations that accumulate all the elements of the training data matrix that correspond to the same centroid value to generate partial sums, multiplying each partial sum by its corresponding centroid value, and summing the resulting products.
    Type: Application
    Filed: September 26, 2019
    Publication date: October 29, 2020
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi
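The gather-reduce-add computation above can be sketched directly: for each output row, accumulate the inputs that share a centroid into partial sums, then scale each partial sum by its centroid value and reduce. This is a NumPy sketch for a single input vector, assuming `W[i, j] == centroid_table[centroid_idx[i, j]]`:

```python
import numpy as np

def activation_from_compressed(centroid_idx, centroid_table, x):
    """Compute y = W @ x from the compressed weight representation,
    without reconstructing W (sketch)."""
    n_rows = centroid_idx.shape[0]
    k = len(centroid_table)
    y = np.empty(n_rows)
    for i in range(n_rows):
        # Gather-reduce-add: one partial-sum accumulator per centroid.
        partial = np.bincount(centroid_idx[i], weights=x, minlength=k)
        # Multiply each partial sum by its centroid value and sum.
        y[i] = partial @ centroid_table
    return y
```

With k centroids per row, each row costs one add per input plus only k multiplies, instead of one multiply per input.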
  • Patent number: 8645446
    Abstract: Methods and systems for multi-input IIR filters with error feedback are disclosed. By using multiple inputs to generate multiple outputs during each iteration, a multi-input IIR filter in accordance with the present invention has greatly increased throughput. Furthermore, the addition of a multi-variable error feedback unit in accordance with the present invention to a multi-input IIR filter can greatly increase the accuracy of the multi-variable IIR filter.
    Type: Grant
    Filed: November 22, 2010
    Date of Patent: February 4, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Maged F. Barsoum, Jinwen Xi, Dariush Dabiri
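The throughput idea above is block processing: unroll the recurrence so each iteration consumes several inputs and emits several outputs. A sketch for a first-order IIR y[n] = a·y[n-1] + x[n], processed four samples per iteration (the error-feedback unit, which compensates fixed-point rounding error, is omitted here for brevity):

```python
import numpy as np

def block_iir(x, a, block=4):
    """Multi-input (block) first-order IIR: each iteration takes `block`
    inputs and produces `block` outputs from one carried state (sketch)."""
    # Powers of the feedback coefficient used by the unrolled update.
    powers = a ** np.arange(1, block + 1)          # a, a^2, ..., a^block
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for start in range(0, len(x), block):
        xb = x[start:start + block]
        m = len(xb)
        for i in range(m):
            # y[start+i] = a^(i+1)*state + sum_{j<=i} a^(i-j)*x[start+j]
            y[start + i] = powers[i] * state + sum(
                a ** (i - j) * xb[j] for j in range(i + 1))
        state = y[start + m - 1]                   # carry state across blocks
    return y
```

Because the `block` outputs within an iteration depend only on the carried state and the current inputs, hardware can compute them in parallel, which is the source of the throughput gain.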
  • Publication number: 20120131080
    Abstract: Methods and systems for multi-input IIR filters with error feedback are disclosed. By using multiple inputs to generate multiple outputs during each iteration, a multi-input IIR filter in accordance with the present invention has greatly increased throughput. Furthermore, the addition of a multi-variable error feedback unit in accordance with the present invention to a multi-input IIR filter can greatly increase the accuracy of the multi-variable IIR filter.
    Type: Application
    Filed: November 22, 2010
    Publication date: May 24, 2012
    Inventors: Maged F. Barsoum, Jinwen Xi, Dariush Dabiri
  • Patent number: 8065506
    Abstract: This invention is an application-specific integrated processor that implements the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. The architecture is based on a baseline 16-bit RISC architecture with additional functional blocks (ADUs) tightly coupled to the base processor's data path. Each ADU accelerates a computation-intensive task in the FDRX signal path, such as multi-tap FIRs, IIRs, and complex-domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented in software employing these custom instructions.
    Type: Grant
    Filed: August 18, 2008
    Date of Patent: November 22, 2011
    Assignee: Texas Instruments Incorporated
    Inventors: Jinwen Xi, Roman Staszewski, Thang Minh Tran
  • Publication number: 20090063820
    Abstract: This invention is an application-specific integrated processor that implements the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. The architecture is based on a baseline 16-bit RISC architecture with additional functional blocks (ADUs) tightly coupled to the base processor's data path. Each ADU accelerates a computation-intensive task in the FDRX signal path, such as multi-tap FIRs, IIRs, and complex-domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented in software employing these custom instructions.
    Type: Application
    Filed: August 18, 2008
    Publication date: March 5, 2009
    Inventors: Jinwen Xi, Roman Staszewski, Thang Minh Tran