Patents by Inventor Jiusheng Chen

Jiusheng Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240046037
    Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. During decoding, a pre-trainer generates a continuum of data structures for the combined AR/NAR model, including a main stream and a series of predicting streams. Masked tokens in the predicting streams reference, or attend to, one or more preceding tokens in the main stream or in the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model, which is determined by balancing an accuracy constraint against an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating the trained data model. (A simplified code sketch follows this entry.)
    Type: Application
    Filed: December 25, 2020
    Publication date: February 8, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Weizhu CHEN, Kewen TANG, Qiang LOU, Ruofei ZHANG, Yu YAN, Jiusheng CHEN
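    Illustrative sketch: a minimal, hypothetical PyTorch rendering of the main-stream/predicting-stream visibility pattern described in the abstract above. The function name, tensor layout, and the choice to let each masked token also attend to itself are assumptions for illustration, not details taken from the patent.

        import torch

        def build_stream_mask(seq_len: int, num_pred_streams: int) -> torch.Tensor:
            """Visibility mask over [main stream | predicting streams].

            Row i may attend to column j iff mask[i, j] is True. Main-stream
            tokens attend causally (AR); a masked token at position t in a
            predicting stream attends to main-stream tokens 0..t-1 and to
            itself, approximating the cross-stream attention described above.
            """
            total = seq_len * (1 + num_pred_streams)
            mask = torch.zeros(total, total, dtype=torch.bool)
            # Main stream: standard causal, autoregressive visibility.
            mask[:seq_len, :seq_len] = torch.tril(
                torch.ones(seq_len, seq_len, dtype=torch.bool))
            for s in range(num_pred_streams):
                off = (s + 1) * seq_len
                for t in range(seq_len):
                    mask[off + t, :t] = True       # preceding main-stream tokens
                    mask[off + t, off + t] = True  # the masked token itself
            return mask

        print(build_stream_mask(seq_len=4, num_pred_streams=2).shape)  # (12, 12)
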
  • Publication number: 20220318601
    Abstract: Computing technology is described herein that provides an attention mechanism, implemented by a neural network, that generates attention information based on head-specific query information and shared key and value (KV) information, without computing head-specific key information and head-specific value information, and without caching the head-specific key information and the head-specific value information in memory. This manner of operation allows the computing technology to make efficient use of processing and memory resources. In some implementations, the attention mechanism is part of a decoder of an encoder-decoder system, or of a standalone decoder system. In some implementations, the computing technology leverages the attention information to generate synthesized text based on input text. (A simplified code sketch follows this entry.)
    Type: Application
    Filed: April 3, 2021
    Publication date: October 6, 2022
    Inventors: Yu YAN, Jiusheng CHEN, Nikhil BHENDAWADE, Yeyun GONG, Nan DUAN, Ruofei ZHANG
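    Illustrative sketch: a minimal, hypothetical PyTorch version of attention with head-specific queries and a single shared key/value projection (a pattern often called multi-query attention). The shapes, names, and the omission of masking and an output projection are simplifying assumptions, not details from the patent.

        import torch

        def shared_kv_attention(x, w_q, w_k, w_v, num_heads):
            """x: (seq, d_model); w_q: (d_model, num_heads * d_head);
            w_k, w_v: (d_model, d_head) -- one K and one V shared by all heads."""
            seq, _ = x.shape
            d_head = w_k.shape[1]
            q = (x @ w_q).view(seq, num_heads, d_head)  # head-specific queries
            k = x @ w_k   # shared keys,   (seq, d_head)
            v = x @ w_v   # shared values, (seq, d_head)
            # During decoding only k and v need caching, and they are
            # num_heads times smaller than per-head K/V caches would be.
            scores = torch.einsum("qhd,kd->hqk", q, k) / d_head ** 0.5
            out = torch.einsum("hqk,kd->qhd", scores.softmax(dim=-1), v)
            return out.reshape(seq, -1)

        x = torch.randn(5, 64)
        out = shared_kv_attention(x, torch.randn(64, 8 * 16),
                                  torch.randn(64, 16), torch.randn(64, 16),
                                  num_heads=8)
        print(out.shape)  # torch.Size([5, 128])
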
  • Publication number: 20220100676
    Abstract: Systems and methods for dynamically modifying a cache associated with a neural network model of a natural language generator are described. In examples, a neural network model employs a beam search algorithm at a decoder when decoding output and generating predicted output candidates. The decoder utilizes caching techniques to improve the speed at which the neural network model operates. When the amount of memory utilized by one or more caches of the neural network model is determined to exceed a threshold memory size, a layer-specific portion of a cache associated with a layer of the neural network model is identified. The identified layer-specific portion of the cache can be deleted when the amount of memory utilized by the cache exceeds the threshold memory size. In examples, data in the cache is deduplicated and/or deleted. (A simplified code sketch follows this entry.)
    Type: Application
    Filed: February 18, 2021
    Publication date: March 31, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yu YAN, Jiusheng CHEN, Ruofei ZHANG
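    Illustrative sketch: a hypothetical threshold-based, layer-specific cache-trimming routine in the spirit of the abstract above. The cache layout and the policy of deleting the largest layer's portion first are assumptions for illustration, not the patent's method.

        import torch

        def trim_decoder_cache(cache: dict, max_bytes: int) -> dict:
            """cache maps a layer name to a tensor of cached decoder state.
            While total memory exceeds max_bytes, delete the layer-specific
            portion that uses the most memory."""
            size = lambda t: t.numel() * t.element_size()
            while cache and sum(size(t) for t in cache.values()) > max_bytes:
                largest = max(cache, key=lambda name: size(cache[name]))
                del cache[largest]  # drop one layer-specific portion
            return cache

        cache = {f"layer_{i}": torch.randn(4, 128, 64) for i in range(6)}
        trim_decoder_cache(cache, max_bytes=400_000)
        print(sorted(cache))  # the layer portions that fit within the budget
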
  • Publication number: 20140147013
    Abstract: An Echo PIV analysis process, apparatus and algorithm are developed to reduce noise and analyze DICOM images representing a fluid flow of a plurality of particles. A plurality of DICOM images representing sequential image pairs of a plurality of particles is received, and the sequential image pairs are grouped. The sequential image pairs are correlated to create N cross-correlation maps. An average cross-correlation transformation is applied to each cross-correlation map to create an image pair vector map for each image pair. A maximizing operation is applied to one or more of the N adjacent image pair vector maps to create a modified image pair vector map for those image pairs. The maps are combined to create corresponding temporary vector maps, which are averaged to obtain a mean velocity vector field of the sequential image pairs. (A simplified code sketch follows this entry.)
    Type: Application
    Filed: October 11, 2011
    Publication date: May 29, 2014
    Applicant: The Regents of the University of Colorado, A Body Corporate
    Inventors: Robin Shandas, Fuxing Zhang, Jiusheng Chen, Luciano A. Mazzaro
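    Illustrative sketch: a simplified NumPy/SciPy version of the averaged cross-correlation step described in the abstract above, where correlation maps from several sequential image pairs are averaged before peak detection to suppress uncorrelated noise. The mean-subtraction preprocessing and the peak-to-displacement conversion are assumptions for illustration; DICOM handling and the vector-map post-processing the patent describes are omitted.

        import numpy as np
        from scipy.signal import correlate2d

        def mean_displacement(image_pairs):
            """image_pairs: list of (frame_a, frame_b) 2-D arrays, equal shape.
            Averages the per-pair cross-correlation maps, then returns the
            (dy, dx) offset of the averaged correlation peak."""
            acc = None
            for a, b in image_pairs:
                c = correlate2d(b - b.mean(), a - a.mean(), mode="same")
                acc = c if acc is None else acc + c
            acc /= len(image_pairs)  # averaging suppresses uncorrelated noise
            peak = np.unravel_index(np.argmax(acc), acc.shape)
            center = (acc.shape[0] // 2, acc.shape[1] // 2)
            return peak[0] - center[0], peak[1] - center[1]

        rng = np.random.default_rng(0)
        a = rng.random((32, 32))
        b = np.roll(a, shift=(2, 3), axis=(0, 1))  # "particles" moved by (2, 3)
        print(mean_displacement([(a, b)] * 4))     # (2, 3)
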