Patents by Inventor Jinwen Xi

Jinwen Xi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210326711
    Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
    Type: Application
    Filed: April 17, 2020
    Publication date: October 21, 2021
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
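The abstract above describes keeping both momentum buffers in a compact storage format and converting them to a training format only for the update step. A minimal sketch of that idea, assuming fp16 as the storage format, fp32 as the training format, and an Adam-style dual-momentum update (the patent does not fix these choices):

```python
import numpy as np

def adam_step_with_compressed_momentum(w, grad, m_fp16, v_fp16, lr=1e-3,
                                       beta1=0.9, beta2=0.999, eps=1e-8):
    """One dual-momentum (Adam-style) update where both momentum buffers
    are stored in a compact format and converted for the update."""
    # Convert storage-format momentum to the training format (fp32 here).
    m = m_fp16.astype(np.float32)
    v = v_fp16.astype(np.float32)
    # Standard dual-momentum update performed in the training format.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    w = w - lr * m / (np.sqrt(v) + eps)
    # Convert back to the storage format before writing to memory.
    return w, m.astype(np.float16), v.astype(np.float16)
```

Halving the momentum precision roughly halves optimizer-state memory for the layer while the arithmetic still runs at full training precision.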
  • Publication number: 20210295141
    Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 23, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
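The burst cycle above streams gradients from all workers into the optimizer, updates the weights, writes them back to memory, and broadcasts them in the same cycle. A software sketch of one such cycle, assuming plain SGD in the optimizer and Python lists standing in for the hardware buffers (all names here are illustrative, not from the patent):

```python
import numpy as np

def burst_cycle(weight_memory, gradient_buffers, workers, lr=0.01):
    """One burst cycle: reduce streamed gradients, compute new weights,
    write them back to memory, and transmit them to every worker."""
    # Move gradients from each worker's buffer into the optimizer and reduce.
    grad = sum(gradient_buffers) / len(gradient_buffers)
    # Move weights from memory into the optimizer and update (SGD here).
    new_weights = weight_memory - lr * grad
    # Write the new weights back to the weight memory.
    weight_memory[:] = new_weights
    # Transmit the new weights to each worker.
    for worker in workers:
        worker.append(new_weights.copy())
    return new_weights
```

In the patented system these steps overlap within a single burst cycle rather than running sequentially as they do in this sketch.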
  • Publication number: 20210232451
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model into the first artificial intelligence processor, wherein the model corresponds to the same model processed by the plurality of artificial intelligence processors during a previous processing iteration on data from the data set.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 29, 2021
    Inventors: Bharadwaj PUDIPEDDI, Maral MESMAKHOSROSHAHI, Jinwen XI, Saurabh M. KULKARNI, Marc TREMBLAY, Matthias BAENNINGER, Nuno CLAUDINO PEREIRA LOPES
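The recovery method above restarts only the failing processor and reloads the same model it ran in the previous iteration. A control-flow sketch under assumed interfaces (the `failed`/`restart` attributes and the callbacks are hypothetical stand-ins for the hardware and runtime):

```python
def run_with_recovery(processors, data_batches, load_model, run_iteration):
    """On a computing error, restart the failing processor, reload the
    same model, and retry the iteration (sketch)."""
    for batch in data_batches:
        try:
            run_iteration(processors, batch)
        except RuntimeError:
            for p in processors:
                if p.failed:
                    p.restart()
                    # Reload the same model used in the previous iteration.
                    load_model(p)
            run_iteration(processors, batch)  # retry the failed iteration
```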
  • Publication number: 20210064986
    Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
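The compression scheme above splits each fp32 parameter into its mantissa and exponent fields and builds a small lookup table (LUT) for each. A sketch of the extraction and LUT construction, assuming IEEE 754 binary32 layout, deduplication for the exponent LUT, and simple uniform quantization for the mantissa LUT (the abstract leaves the exact compression and encoding methods open, and this sketch ignores the sign bit for brevity):

```python
import numpy as np

def compress_floats(params, n_mantissa_levels=16):
    """Split fp32 values into mantissa/exponent fields and build LUTs
    plus per-value index arrays (sketch; sign bit not handled)."""
    bits = params.astype(np.float32).view(np.uint32)
    exponents = (bits >> 23) & 0xFF          # 8-bit exponent field
    mantissas = bits & 0x7FFFFF              # 23-bit mantissa field
    # Exponent LUT: distinct exponents plus an index per value.
    exp_lut, exp_idx = np.unique(exponents, return_inverse=True)
    # Mantissa LUT: map each mantissa to its nearest representative level.
    levels = np.linspace(0, 0x7FFFFF, n_mantissa_levels)
    man_idx = np.argmin(np.abs(mantissas[:, None] - levels[None, :]), axis=1)
    man_lut = levels.astype(np.uint32)
    return man_lut, man_idx, exp_lut, exp_idx
```

Since typical models use only a narrow band of exponents, the exponent LUT stays tiny and each parameter is replaced by two short indices.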
  • Publication number: 20210019634
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
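The execution paradigm above runs one portion of the model at a time on a memory-constrained device, fetching each portion from the parameter server as needed. A minimal sketch, where the `download` method and callable layers are illustrative assumptions rather than the patent's actual interfaces:

```python
def run_large_model(parameter_server, num_layers, activations):
    """Execute a model too large for device memory one portion (layer)
    at a time, fetching each portion from the parameter server (sketch)."""
    x = activations
    for layer_id in range(num_layers):
        layer = parameter_server.download(layer_id)  # fetch one portion
        x = layer(x)                                 # execute it on the device
        del layer                                    # free device memory
    return x
```

Peak device memory is bounded by the largest single portion plus the activations, independent of total model size.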
  • Publication number: 20210019152
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
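The microbatching described above can be sketched as gradient accumulation: a minibatch is split into microbatches executed in sequence, so only one microbatch is resident at a time and the communication step runs once per minibatch. The averaging and callback interface here are illustrative assumptions:

```python
import numpy as np

def minibatch_gradient(samples, grad_fn, microbatch_size):
    """Split a minibatch into microbatches executed sequentially,
    accumulating gradients across them (sketch)."""
    total, n = None, 0
    for start in range(0, len(samples), microbatch_size):
        micro = samples[start:start + microbatch_size]
        g = grad_fn(micro)              # gradient for one microbatch
        total = g if total is None else total + g
        n += 1
    return total / n                    # average over microbatches
```

Adjusting `microbatch_size` trades per-step memory against how often the device synchronizes with the parameter server.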
  • Publication number: 20200342288
    Abstract: A distributed training system including a parameter server is configured to compress weight matrices according to a clustering algorithm; the compressed representation of a weight matrix may thereafter be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value. In a further example aspect, a training worker may compute an activation result directly from the compressed representation of a weight matrix and a training data matrix by performing gather-reduce-add operations that accumulate all the elements of the training data matrix that correspond to the same centroid value to generate partial sums, multiplying each partial sum by its corresponding centroid value, and summing the resulting products.
    Type: Application
    Filed: September 26, 2019
    Publication date: October 29, 2020
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi
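The gather-reduce-add computation above can be sketched directly: for each output row, accumulate the inputs that share a centroid into partial sums, then scale each partial sum by its centroid value and reduce. This is a NumPy sketch for a single input vector, assuming `W[i, j] == centroid_table[centroid_idx[i, j]]`:

```python
import numpy as np

def activation_from_compressed(centroid_idx, centroid_table, x):
    """Compute y = W @ x from the compressed weight representation,
    without reconstructing W (sketch)."""
    n_rows = centroid_idx.shape[0]
    k = len(centroid_table)
    y = np.empty(n_rows)
    for i in range(n_rows):
        # Gather-reduce-add: one partial-sum accumulator per centroid.
        partial = np.bincount(centroid_idx[i], weights=x, minlength=k)
        # Multiply each partial sum by its centroid value and sum.
        y[i] = partial @ centroid_table
    return y
```

With k centroids per row, each row costs one add per input plus only k multiplies, instead of one multiply per input.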
  • Patent number: 8645446
    Abstract: Methods and systems for multi-input IIR filters with error feedback are disclosed. By using multiple inputs to generate multiple outputs during each iteration, a multi-input IIR filter in accordance with the present invention has greatly increased throughput. Furthermore, the addition of a multi-variable error feedback unit in accordance with the present invention to a multi-input IIR filter can greatly increase the accuracy of the multi-variable IIR filter.
    Type: Grant
    Filed: November 22, 2010
    Date of Patent: February 4, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Maged F. Barsoum, Jinwen Xi, Dariush Dabiri
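The throughput idea above is block processing: unroll the recurrence so each iteration consumes several inputs and emits several outputs. A sketch for a first-order IIR y[n] = a·y[n-1] + x[n], processed four samples per iteration (the error-feedback unit, which compensates fixed-point rounding error, is omitted here for brevity):

```python
import numpy as np

def block_iir(x, a, block=4):
    """Multi-input (block) first-order IIR: each iteration takes `block`
    inputs and produces `block` outputs from one carried state (sketch)."""
    # Powers of the feedback coefficient used by the unrolled update.
    powers = a ** np.arange(1, block + 1)          # a, a^2, ..., a^block
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for start in range(0, len(x), block):
        xb = x[start:start + block]
        m = len(xb)
        for i in range(m):
            # y[start+i] = a^(i+1)*state + sum_{j<=i} a^(i-j)*x[start+j]
            y[start + i] = powers[i] * state + sum(
                a ** (i - j) * xb[j] for j in range(i + 1))
        state = y[start + m - 1]                   # carry state across blocks
    return y
```

Because the `block` outputs within an iteration depend only on the carried state and the current inputs, hardware can compute them in parallel, which is the source of the throughput gain.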
  • Publication number: 20120131080
    Abstract: Methods and systems for multi-input IIR filters with error feedback are disclosed. By using multiple inputs to generate multiple outputs during each iteration, a multi-input IIR filter in accordance with the present invention has greatly increased throughput. Furthermore, the addition of a multi-variable error feedback unit in accordance with the present invention to a multi-input IIR filter can greatly increase the accuracy of the multi-variable IIR filter.
    Type: Application
    Filed: November 22, 2010
    Publication date: May 24, 2012
    Inventors: Maged F. Barsoum, Jinwen Xi, Dariush Dabiri
  • Patent number: 8065506
    Abstract: This invention is an application-specific integrated processor that implements the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. The architecture is based on a baseline 16-bit RISC architecture with additional functional blocks (ADUs) tightly coupled to the base processor's data path. Each ADU accelerates a computation-intensive task in the FDRX signal path, such as multi-tap FIRs, IIRs, and complex-domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented in software employing these custom instructions.
    Type: Grant
    Filed: August 18, 2008
    Date of Patent: November 22, 2011
    Assignee: Texas Instruments Incorporated
    Inventors: Jinwen Xi, Roman Staszewski, Thang Minh Tran
  • Publication number: 20090063820
    Abstract: This invention is an application-specific integrated processor that implements the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. The architecture is based on a baseline 16-bit RISC architecture with additional functional blocks (ADUs) tightly coupled to the base processor's data path. Each ADU accelerates a computation-intensive task in the FDRX signal path, such as multi-tap FIRs, IIRs, and complex-domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented in software employing these custom instructions.
    Type: Application
    Filed: August 18, 2008
    Publication date: March 5, 2009
    Inventors: Jinwen Xi, Roman Staszewski, Thang Minh Tran