Patents by Inventor Zhong Meng

Zhong Meng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11951814
    Abstract: The embodiments of the disclosure provide an intelligent glass and an intelligent window system, and relate to the technical field of window display. The intelligent glass of the disclosure includes a touch display assembly and a glass assembly. The touch display assembly is communicatively coupled to the glass assembly and is configured to send a corresponding dimming instruction to the glass assembly based on a received touch instruction, such that the glass assembly adjusts its light transmittance based on the dimming instruction.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: April 9, 2024
    Assignee: BOE TECHNOLOGY GROUP CO., LTD.
    Inventors: Yongbo Wang, Chen Meng, Zhong Hu, Yutao Tang, Wenjie Zhong, Dahai Hu, Wei Shi
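    A minimal Python sketch of the control flow described in the abstract above: a touch display assembly translates a touch instruction into a dimming instruction, and the glass assembly sets its light transmittance accordingly. All class, method, and field names are illustrative assumptions, not details from the patent.

      from dataclasses import dataclass

      @dataclass
      class DimmingInstruction:
          transmittance: float  # target light transmittance, 0.0 (opaque) to 1.0 (clear)

      class GlassAssembly:
          def __init__(self) -> None:
              self.transmittance = 1.0

          def apply(self, instruction: DimmingInstruction) -> None:
              # Clamp the requested value to the physically meaningful range.
              self.transmittance = min(1.0, max(0.0, instruction.transmittance))

      class TouchDisplayAssembly:
          def __init__(self, glass: GlassAssembly) -> None:
              self.glass = glass  # stands in for the communicative coupling

          def on_touch(self, slider_position: float) -> None:
              # Map a touch gesture (e.g., a slider position) to a dimming instruction.
              self.glass.apply(DimmingInstruction(transmittance=slider_position))

      display = TouchDisplayAssembly(GlassAssembly())
      display.on_touch(0.3)  # user dims the window to 30% transmittance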
  • Patent number: 11935542
    Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
    Type: Grant
    Filed: January 19, 2023
    Date of Patent: March 19, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
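    A minimal sketch of the hypothesis-stitching pipeline enumerated in the abstract above: segment the long-form audio, run ASR per segment, merge the short-segment hypotheses with window-change (WC) symbols at the boundaries, and consolidate. The segmentation scheme, the WC token spelling, and the consolidate() stand-in for the network-based stitcher are all assumptions.

      from typing import Callable, List

      WC = "<wc>"  # window-change stitching symbol (token spelling assumed)

      def stitch_hypotheses(
          audio: List[float],
          segment_len: int,
          recognize: Callable[[List[float]], str],
          consolidate: Callable[[str], str],
      ) -> str:
          # 1. Segment the audio stream into fixed-length windows.
          segments = [audio[i : i + segment_len] for i in range(0, len(audio), segment_len)]
          # 2. Run ASR on each segment to produce short-segment hypotheses.
          hypotheses = [recognize(seg) for seg in segments]
          # 3. Merge the hypotheses, inserting a WC symbol at each window change.
          merged = f" {WC} ".join(hypotheses)
          # 4. Consolidate the merged hypothesis set; in the patent this is done
          #    by a network-based hypothesis stitcher, stubbed here as a callable.
          return consolidate(merged)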
  • Patent number: 11915686
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution. The second attention-based encoder-decoder model is trained to classify output tokens based on input speech frames of a target speaker and simultaneously trained to maintain a similarity between the first output distribution and the second output distribution.
    Type: Grant
    Filed: January 5, 2022
    Date of Patent: February 27, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
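    A minimal sketch of the training objective suggested by the abstract above: adapt a speaker-dependent (SD) attention-based encoder-decoder to a target speaker while keeping its output distribution close to the frozen speaker-independent (SI) model's. Using KL divergence as the similarity measure and the interpolation weight rho are assumptions, not details from the patent.

      import torch
      import torch.nn.functional as F

      def adaptation_loss(
          sd_logits: torch.Tensor,   # (batch, vocab) SD model token scores
          si_logits: torch.Tensor,   # (batch, vocab) frozen SI model token scores
          targets: torch.Tensor,     # (batch,) ground-truth token ids
          rho: float = 0.5,          # interpolation weight (assumed)
      ) -> torch.Tensor:
          # Classification loss on the target speaker's data.
          ce = F.cross_entropy(sd_logits, targets)
          # Keep the SD output distribution similar to the SI output distribution.
          kl = F.kl_div(
              F.log_softmax(sd_logits, dim=-1),
              F.softmax(si_logits.detach(), dim=-1),
              reduction="batchmean",
          )
          return (1.0 - rho) * ce + rho * kl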
  • Patent number: 11823702
    Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
    Type: Grant
    Filed: November 30, 2021
    Date of Patent: November 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
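    A minimal sketch of the adversarial objective in the abstract above: the feature extractor and speaker classifier minimize the speaker classification loss, the condition classifier minimizes the condition classification loss, and the feature extractor simultaneously maximizes it. Realizing the min-max with a gradient-reversal layer is a common implementation choice, assumed here rather than taken from the patent.

      import torch

      class GradientReversal(torch.autograd.Function):
          @staticmethod
          def forward(ctx, x, lam):
              ctx.lam = lam
              return x.view_as(x)

          @staticmethod
          def backward(ctx, grad_output):
              # Pass gradients through with flipped sign, scaled by lambda.
              return -ctx.lam * grad_output, None

      def total_loss(features, speaker_logits_fn, condition_logits_fn, spk, cond, lam=1.0):
          ce = torch.nn.functional.cross_entropy
          speaker_loss = ce(speaker_logits_fn(features), spk)
          # The reversal makes the extractor ascend the condition loss while the
          # condition classifier itself still descends it.
          reversed_feats = GradientReversal.apply(features, lam)
          condition_loss = ce(condition_logits_fn(reversed_feats), cond)
          return speaker_loss + condition_loss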
  • Patent number: 11735190
    Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments may operate to extract features from input data based on a first set of parameters, generate outputs based on the extracted features and on a second set of parameters, and identify words represented by the input data based on the outputs, wherein the first set of parameters and the second set of parameters have been trained to minimize a network loss associated with the second set of parameters, wherein the first set of parameters has been trained to maximize the domain classification loss of a network comprising 1) an attention network to determine, based on a third set of parameters, relative importances of features extracted based on the first parameters to domain classification and 2) a domain classifier to classify a domain based on the extracted features, the relative importances, and a fourth set of parameters, and wherein the third set of parameters and the fourth set of parameters have been trained to minimize the domain classification loss.
    Type: Grant
    Filed: October 5, 2021
    Date of Patent: August 22, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Jinyu Li, Yifan Gong
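    A minimal sketch of the attention-augmented domain branch in the abstract above: an attention network scores the relative importance of each extracted feature for domain classification, and the domain classifier sees the importance-weighted features. Tensor shapes and the softmax attention form are assumptions; the adversarial min-max would be realized as in the gradient-reversal sketch above.

      import torch
      import torch.nn as nn

      class AttentiveDomainClassifier(nn.Module):
          def __init__(self, feat_dim: int, num_domains: int):
              super().__init__()
              self.attention = nn.Linear(feat_dim, 1)             # third set of parameters
              self.classifier = nn.Linear(feat_dim, num_domains)  # fourth set of parameters

          def forward(self, feats: torch.Tensor) -> torch.Tensor:
              # feats: (batch, time, feat_dim) features from the extractor.
              scores = self.attention(feats)            # (batch, time, 1)
              weights = torch.softmax(scores, dim=1)    # relative importances
              pooled = (weights * feats).sum(dim=1)     # (batch, feat_dim)
              return self.classifier(pooled)            # domain logits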
  • Publication number: 20230215439
    Abstract: The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
    Type: Application
    Filed: December 31, 2021
    Publication date: July 6, 2023
    Inventors: Naoyuki KANDA, Takuya YOSHIOKA, Zhuo CHEN, Jinyu LI, Yashesh GAUR, Zhong MENG, Xiaofei WANG, Xiong XIAO
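    A minimal sketch of the post-processing step in the abstract above: the model emits words interleaved with channel-change (CC) symbols wherever adjacent words come from different overlapping speakers, and the CC symbols are used to sort words into separate transcript lines. The "<cc>" token spelling and the two-channel round-robin assignment are assumptions.

      from typing import List

      CC = "<cc>"

      def to_transcript_lines(tokens: List[str], num_channels: int = 2) -> List[str]:
          lines: List[List[str]] = [[] for _ in range(num_channels)]
          channel = 0
          for token in tokens:
              if token == CC:
                  # A CC symbol marks a switch to the other speaker's channel.
                  channel = (channel + 1) % num_channels
              else:
                  lines[channel].append(token)
          return [" ".join(words) for words in lines if words]

      print(to_transcript_lines("hello <cc> hi there <cc> how are you".split()))
      # ['hello how are you', 'hi there']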
  • Publication number: 20230154468
    Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
    Type: Application
    Filed: January 19, 2023
    Publication date: May 18, 2023
    Inventors: Naoyuki KANDA, Xuankai CHANG, Yashesh GAUR, Xiaofei WANG, Zhong MENG, Takuya YOSHIOKA
  • Patent number: 11586930
    Abstract: Embodiments are associated with conditional teacher-student model training. A trained teacher model configured to perform a task may be accessed and an untrained student model may be created. A model training platform may provide training data labeled with ground truths to the teacher model to produce teacher posteriors representing the training data. When it is determined that a teacher posterior matches the associated ground truth label, the platform may conditionally use the teacher posterior to train the student model. When it is determined that a teacher posterior does not match the associated ground truth label, the platform may conditionally use the ground truth label to train the student model. The models might be associated with, for example, automatic speech recognition (e.g., in connection with domain adaptation and/or speaker adaptation).
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: February 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
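    A minimal sketch of the conditional selection rule in the abstract above: use the teacher's posterior as the training target when the teacher is correct (its top class matches the ground truth), otherwise fall back to the one-hot ground-truth label. Tensor shapes are assumptions.

      import torch
      import torch.nn.functional as F

      def conditional_ts_targets(
          teacher_posteriors: torch.Tensor,  # (batch, num_classes)
          ground_truth: torch.Tensor,        # (batch,) class ids
      ) -> torch.Tensor:
          num_classes = teacher_posteriors.size(-1)
          one_hot = F.one_hot(ground_truth, num_classes).float()
          teacher_correct = teacher_posteriors.argmax(dim=-1) == ground_truth
          # Per example: teacher posterior if it matches the label, else the label.
          return torch.where(teacher_correct.unsqueeze(-1), teacher_posteriors, one_hot)

      def student_loss(student_logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
          # Cross entropy against soft (teacher) or one-hot (ground-truth) targets.
          return -(targets * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()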
  • Patent number: 11574639
    Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: February 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
  • Publication number: 20230022609
    Abstract: A method of preparing a Polygoni Milletii Rhizome tincture includes: preparing a first mixture; extracting the first mixture with 70-90% ethanol under reflux conditions to obtain a first extract solution; preparing a second mixture; extracting the second mixture with 50-70% ethanol to obtain a second extract solution; and mixing the first extract solution with the second extract solution to obtain the Polygoni Milletii Rhizome tincture. A method of preparing a Polygoni Milletii Rhizome poultice includes: preparing a Polygoni Milletii Rhizome mixture; mixing the Polygoni Milletii Rhizome mixture and a skin penetration enhancer in water; mixing a moisturizing agent and a binder in water; adding a thickener in water; mixing methylparaben and ethylparaben in 90% ethanol; mixing all solutions to form a mixture; and applying the mixture to a non-woven fabric cloth and drying it to form the Polygoni Milletii Rhizome poultice.
    Type: Application
    Filed: July 1, 2022
    Publication date: January 26, 2023
    Inventors: Xiaolin XIE, Dezhu ZHANG, Chengyuan LIANG, Shujun DING, Yuzhi LIU, Zhao MA, Xuhua ZHOU, Zhong MENG, Jianguo MENG
  • Patent number: 11527238
    Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source domain, and to receive an external language model that has been trained with training data from a target domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The one or more processors are further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: December 13, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Sarangarajan Parthasarathy, Xie Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
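    A minimal sketch of the integrated decoding score in the abstract above: combine the E2E model score with an external language model score and subtract an estimate of the E2E model's internal language model score. The log-linear interpolation form and the weight values are assumptions, not details from the patent.

      def integrated_score(
          e2e_log_prob: float,           # log P_E2E(tokens | speech)
          external_lm_log_prob: float,   # log P_ELM(tokens), target-domain LM
          internal_lm_log_prob: float,   # estimated log P_ILM(tokens) of the E2E model
          ext_weight: float = 0.5,       # assumed interpolation weights
          int_weight: float = 0.3,
      ) -> float:
          return (
              e2e_log_prob
              + ext_weight * external_lm_log_prob
              - int_weight * internal_lm_log_prob
          )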
  • Publication number: 20220199091
    Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
    Type: Application
    Filed: December 18, 2020
    Publication date: June 23, 2022
    Inventors: Naoyuki KANDA, Xuankai CHANG, Yashesh GAUR, Xiaofei WANG, Zhong MENG, Takuya YOSHIOKA
  • Publication number: 20220165290
    Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
    Type: Application
    Filed: November 30, 2021
    Publication date: May 26, 2022
    Inventors: Zhong MENG, Yong ZHAO, Jinyu LI, Yifan GONG
  • Publication number: 20220139380
    Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source domain, and to receive an external language model that has been trained with training data from a target domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The one or more processors are further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
    Type: Application
    Filed: January 21, 2021
    Publication date: May 5, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Zhong MENG, Sarangarajan PARTHASARATHY, Xie SUN, Yashesh GAUR, Naoyuki KANDA, Liang LU, Xie CHEN, Rui ZHAO, Jinyu LI, Yifan GONG
  • Publication number: 20220130376
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution. The second attention-based encoder-decoder model is trained to classify output tokens based on input speech frames of a target speaker and simultaneously trained to maintain a similarity between the first output distribution and the second output distribution.
    Type: Application
    Filed: January 5, 2022
    Publication date: April 28, 2022
    Inventors: Zhong MENG, Yashesh GAUR, Jinyu LI, Yifan GONG
  • Publication number: 20220028399
    Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments may operate to extract features from input data based on a first set of parameters, generate outputs based on the extracted features and on a second set of parameters, and identify words represented by the input data based on the outputs, wherein the first set of parameters and the second set of parameters have been trained to minimize a network loss associated with the second set of parameters, wherein the first set of parameters has been trained to maximize the domain classification loss of a network comprising 1) an attention network to determine, based on a third set of parameters, relative importances of features extracted based on the first parameters to domain classification and 2) a domain classifier to classify a domain based on the extracted features, the relative importances, and a fourth set of parameters, and wherein the third set of parameters and the fourth set of parameters have been trained to minimize the domain classification loss.
    Type: Application
    Filed: October 5, 2021
    Publication date: January 27, 2022
    Inventors: Zhong MENG, Jinyu LI, Yifan GONG
  • Patent number: 11232782
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the second attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker and simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model.
    Type: Grant
    Filed: November 6, 2019
    Date of Patent: January 25, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
  • Patent number: 11219652
    Abstract: A method for extracting herbal medicine includes: step one, spray extraction; step two, pressure filtration and concentration; step three, spray and countercurrent precipitation; and step four, concentrating under reduced pressure and drying.
    Type: Grant
    Filed: August 14, 2020
    Date of Patent: January 11, 2022
    Assignee: SHAANXI PANLONG PHARMACEUTICAL GROUP LIMITED BY SHARE LTD.
    Inventors: Xiaolin Xie, Dezhu Zhang, Jianguo Meng, Yu Wang, Xuhua Zhou, Zhong Meng, Nan Hui, Juan Li
  • Patent number: 11217265
    Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: January 4, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
  • Patent number: 11170789
    Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments are associated with a feature extractor to receive speech frames and extract features from the speech frames based on a first set of parameters of the feature extractor, a senone classifier to identify a senone based on the received features and on a second set of parameters of the senone classifier, an attention network capable of determining a relative importance of features extracted by the feature extractor to domain classification, based on a third set of parameters of the attention network, a domain classifier capable of classifying a domain based on the features and the relative importances, and on a fourth set of parameters of the domain classifier; and a training platform to train the first set of parameters of the feature extractor and the second set of parameters of the senone classifier to minimize the senone classification loss, train the first set of parameters of the feature extractor to maximize the domain classification loss, and train the third set of parameters of the attention network and the fourth set of parameters of the domain classifier to minimize the domain classification loss.
    Type: Grant
    Filed: July 26, 2019
    Date of Patent: November 9, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Jinyu Li, Yifan Gong