Patents by Inventor Zhong Meng
Zhong Meng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11951814
Abstract: The embodiments of the disclosure provide an intelligent glass and an intelligent window system, and relate to the technical field of window display. The intelligent glass of the disclosure includes a touch display assembly and a glass assembly. The touch display assembly is communicatively coupled to the glass assembly, and is configured to send a corresponding dimming instruction to the glass assembly based on a received touch instruction, such that the glass assembly adjusts its light transmittance based on the dimming instruction.
Type: Grant
Filed: September 26, 2019
Date of Patent: April 9, 2024
Assignee: BOE TECHNOLOGY GROUP CO., LTD.
Inventors: Yongbo Wang, Chen Meng, Zhong Hu, Yutao Tang, Wenjie Zhong, Dahai Hu, Wei Shi
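A minimal sketch of the control flow described in this abstract: the touch display assembly maps a received touch instruction to a dimming instruction, and the glass assembly adjusts its light transmittance accordingly. All class names and the 0-to-1 dimming scale are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch only; names and value ranges are assumptions.
class GlassAssembly:
    def __init__(self):
        self.transmittance = 1.0  # fully transparent

    def apply_dimming(self, level):
        """Adjust light transmittance based on a dimming instruction in [0, 1]."""
        self.transmittance = max(0.0, min(1.0, 1.0 - level))

class TouchDisplayAssembly:
    def __init__(self, glass):
        self.glass = glass  # communicatively coupled to the glass assembly

    def on_touch(self, touch_value):
        """Translate a received touch instruction into a dimming instruction."""
        self.glass.apply_dimming(level=touch_value)

glass = GlassAssembly()
TouchDisplayAssembly(glass).on_touch(0.7)
print(glass.transmittance)  # 0.3, i.e. roughly 30% light transmittance
```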
-
Patent number: 11935542
Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
Type: Grant
Filed: January 19, 2023
Date of Patent: March 19, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
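A minimal sketch of the data flow this abstract describes: per-segment hypotheses for one speaker are merged with window-change (WC) symbols at segment boundaries, then consolidated. The patent uses a network-based stitcher; the `consolidate` function below is only a trivial rule-based stand-in, and the `WC_SYMBOL` token is an assumption.

```python
# Illustrative sketch; the real stitcher is network-based, not rule-based.
WC_SYMBOL = "<wc>"  # window-change marker inserted at each segment boundary

def merge_hypotheses(segment_hypotheses):
    """Merge per-segment word hypotheses, inserting WC symbols between segments."""
    merged = []
    for i, words in enumerate(segment_hypotheses):
        if i > 0:
            merged.append(WC_SYMBOL)
        merged.extend(words)
    return merged

def consolidate(merged):
    """Stand-in for the network-based stitcher: drop WC symbols and remove a
    word duplicated across a window boundary by overlapping segments."""
    out, after_wc = [], False
    for tok in merged:
        if tok == WC_SYMBOL:
            after_wc = True
            continue
        if after_wc and out and out[-1] == tok:
            after_wc = False
            continue  # skip word repeated at the boundary
        out.append(tok)
        after_wc = False
    return " ".join(out)

# Example: two overlapping windows for the same speaker.
segments = [["hello", "world", "this"], ["this", "is", "a", "test"]]
print(consolidate(merge_hypotheses(segments)))  # hello world this is a test
```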
-
Patent number: 11915686
Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution. The second attention-based encoder-decoder model is trained to classify output tokens based on input speech frames of a target speaker and simultaneously trained to maintain a similarity between the first output distribution and the second output distribution.
Type: Grant
Filed: January 5, 2022
Date of Patent: February 27, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
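A hedged sketch of the adaptation objective this abstract describes: the speaker-dependent (SD) model is trained on the target speaker's frames while a divergence term keeps its output distribution close to that of the frozen speaker-independent (SI) model. The interpolation weight `kld_weight` and the function name are assumptions for illustration.

```python
# Illustrative sketch of KL-regularized speaker adaptation; weights are assumed.
import torch
import torch.nn.functional as F

def adaptation_loss(sd_logits, si_logits, token_targets, kld_weight=0.5):
    # Token classification loss on the target speaker's frames.
    ce = F.cross_entropy(sd_logits, token_targets)
    # Similarity term: KL divergence between the frozen SI output distribution
    # and the SD output distribution (detach the SI outputs).
    kld = F.kl_div(
        F.log_softmax(sd_logits, dim=-1),
        F.softmax(si_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    return (1.0 - kld_weight) * ce + kld_weight * kld

# Toy usage with random tensors standing in for decoder outputs.
vocab, frames = 100, 8
sd_logits = torch.randn(frames, vocab, requires_grad=True)
si_logits = torch.randn(frames, vocab)
targets = torch.randint(0, vocab, (frames,))
adaptation_loss(sd_logits, si_logits, targets).backward()
```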
-
Patent number: 11823702
Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
Type: Grant
Filed: November 30, 2021
Date of Patent: November 21, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
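A common way to realize the min-max objective in this abstract is a gradient reversal layer: the condition classifier minimizes its loss while the reversed gradient pushes the feature extractor to maximize it. The sketch below shows that pattern; layer sizes, speaker/condition counts, and module names are assumptions, not the patent's implementation.

```python
# Illustrative adversarial-training sketch; sizes and names are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip the gradient flowing back into the feature extractor

feature_extractor = nn.Sequential(nn.Linear(40, 256), nn.ReLU())  # first parameters
speaker_classifier = nn.Linear(256, 1000)    # second parameters, e.g. 1000 speakers
condition_classifier = nn.Linear(256, 4)     # third parameters, e.g. 4 noise conditions

def losses(frames, speaker_ids, condition_ids):
    feats = feature_extractor(frames)
    speaker_loss = nn.functional.cross_entropy(speaker_classifier(feats), speaker_ids)
    # The condition classifier minimizes this loss; through the reversed
    # gradient, the feature extractor maximizes it.
    condition_loss = nn.functional.cross_entropy(
        condition_classifier(GradReverse.apply(feats)), condition_ids)
    return speaker_loss + condition_loss

frames = torch.randn(16, 40)  # 16 speech frames with 40-dim features
loss = losses(frames, torch.randint(0, 1000, (16,)), torch.randint(0, 4, (16,)))
loss.backward()
```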
-
Patent number: 11735190
Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments may operate to extract features from input data based on a first set of parameters, generate outputs based on the extracted features and on a second set of parameters, and identify words represented by the input data based on the outputs, wherein the first set of parameters and the second set of parameters have been trained to minimize a network loss associated with the second set of parameters, wherein the first set of parameters has been trained to maximize the domain classification loss of a network comprising 1) an attention network to determine, based on a third set of parameters, relative importances of features extracted based on the first parameters to domain classification and 2) a domain classifier to classify a domain based on the extracted features, the relative importances, and a fourth set of parameters, and wherein the third set of parameters and the fourth set of parameters have been trained to minimize …
Type: Grant
Filed: October 5, 2021
Date of Patent: August 22, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhong Meng, Jinyu Li, Yifan Gong
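A hedged sketch of the four-module setup this (truncated) abstract describes: an attention network weights the extracted features before the domain classifier sees them, and a gradient reversal layer makes the feature extractor maximize the domain loss while the attention network and domain classifier minimize it. Vocabulary size, domain count, and pooling choice are assumptions made only for illustration.

```python
# Illustrative sketch of attention-based domain-adversarial training.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

feature_extractor = nn.Sequential(nn.Linear(40, 256), nn.ReLU())  # first set of parameters
word_classifier = nn.Linear(256, 5000)                            # second set (network loss)
attention_net = nn.Linear(256, 1)                                 # third set: relative importances
domain_classifier = nn.Linear(256, 3)                             # fourth set, e.g. 3 domains

def step(frames, word_ids, domain_id):
    feats = feature_extractor(frames)                      # (T, 256) features for one utterance
    word_loss = nn.functional.cross_entropy(word_classifier(feats), word_ids)
    # Relative importance of each frame's features to domain classification.
    weights = torch.softmax(attention_net(feats), dim=0)   # (T, 1)
    pooled = (weights * feats).sum(dim=0, keepdim=True)    # attention-weighted summary
    domain_loss = nn.functional.cross_entropy(
        domain_classifier(GradReverse.apply(pooled)), domain_id)
    return word_loss + domain_loss

frames = torch.randn(50, 40)  # 50 frames of 40-dim speech features
loss = step(frames, torch.randint(0, 5000, (50,)), torch.tensor([1]))
loss.backward()
```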
-
Publication number: 20230215439
Abstract: The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
Type: Application
Filed: December 31, 2021
Publication date: July 6, 2023
Inventors: Naoyuki KANDA, Takuya YOSHIOKA, Zhuo CHEN, Jinyu LI, Yashesh GAUR, Zhong MENG, Xiaofei WANG, Xiong XIAO
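A simplified sketch of the final step this abstract describes: routing the model's word/CC output into transcript lines. One plausible reading is that each CC symbol switches to another virtual channel so that concurrently speaking people end up on separate lines; the `CC` token value, two-channel assumption, and line format below are illustrative assumptions.

```python
# Illustrative sketch; token value and channel handling are assumptions.
CC = "<cc>"  # channel-change symbol between adjacent words of concurrent speakers

def to_transcript_lines(tokens, num_channels=2):
    """Route words into virtual channels; each CC symbol switches channels so
    overlapping speakers land on separate transcript lines."""
    channels = [[] for _ in range(num_channels)]
    current = 0
    for tok in tokens:
        if tok == CC:
            current = (current + 1) % num_channels
        else:
            channels[current].append(tok)
    return [" ".join(ch) for ch in channels if ch]

# Example: two speakers overlap, so the model emits CC between their words.
tokens = ["how", "are", "you", CC, "i", "am", "fine", CC, "great"]
for i, line in enumerate(to_transcript_lines(tokens)):
    print(f"channel {i}: {line}")
# channel 0: how are you great
# channel 1: i am fine
```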
-
Publication number: 20230154468
Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
Type: Application
Filed: January 19, 2023
Publication date: May 18, 2023
Inventors: Naoyuki KANDA, Xuankai CHANG, Yashesh GAUR, Xiaofei WANG, Zhong MENG, Takuya YOSHIOKA
-
Patent number: 11586930
Abstract: Embodiments are associated with conditional teacher-student model training. A trained teacher model configured to perform a task may be accessed and an untrained student model may be created. A model training platform may provide training data labeled with ground truths to the teacher model to produce teacher posteriors representing the training data. When it is determined that a teacher posterior matches the associated ground truth label, the platform may conditionally use the teacher posterior to train the student model. When it is determined that a teacher posterior does not match the associated ground truth label, the platform may conditionally use the ground truth label to train the student model. The models might be associated with, for example, automatic speech recognition (e.g., in connection with domain adaptation and/or speaker adaptation).
Type: Grant
Filed: May 13, 2019
Date of Patent: February 21, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
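A hedged sketch of the conditional criterion this abstract describes: per training frame, the student learns from the teacher posterior only when the teacher's top prediction matches the ground-truth label, and otherwise learns from the ground-truth label. Function and tensor names are illustrative assumptions.

```python
# Illustrative sketch of a conditional teacher-student loss; names are assumed.
import torch
import torch.nn.functional as F

def conditional_ts_loss(student_logits, teacher_logits, labels):
    teacher_post = F.softmax(teacher_logits, dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_correct = teacher_post.argmax(dim=-1).eq(labels)   # per-frame condition
    # Soft-target loss against the teacher posterior (per frame).
    kd = -(teacher_post * student_logp).sum(dim=-1)
    # Hard-target loss against the ground-truth label (per frame).
    ce = F.nll_loss(student_logp, labels, reduction="none")
    # Pick per frame depending on whether the teacher matched the ground truth.
    return torch.where(teacher_correct, kd, ce).mean()

# Toy usage.
frames, classes = 8, 20
student_logits = torch.randn(frames, classes, requires_grad=True)
teacher_logits = torch.randn(frames, classes)
labels = torch.randint(0, classes, (frames,))
conditional_ts_loss(student_logits, teacher_logits, labels).backward()
```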
-
Patent number: 11574639
Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
Type: Grant
Filed: December 18, 2020
Date of Patent: February 7, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
-
Publication number: 20230022609
Abstract: A method of preparing a Polygoni Milletii Rhizome tincture includes: preparing a first mixture; extracting the first mixture with 70-90% ethanol under reflux condition to obtain a first extract solution; preparing a second mixture; extracting the second mixture with 50-70% ethanol to obtain a second extract solution; and mixing the first extract solution with the second extract solution to obtain the Polygoni Milletii Rhizome tincture. A method of preparing a Polygoni Milletii Rhizome poultice includes: preparing a Polygoni Milletii Rhizome mixture; mixing the Polygoni Milletii Rhizome mixture and a skin penetration enhancer in water; mixing a moisturizing agent and a binder in water; adding a thickener in water; mixing methylparaben and ethylparaben in 90% ethanol; mixing all solutions to form a mixture; and applying the mixture on a non-woven fabric cloth and drying to form the Polygoni Milletii Rhizome poultice.
Type: Application
Filed: July 1, 2022
Publication date: January 26, 2023
Inventors: Xiaolin XIE, Dezhu ZHANG, Chengyuan LIANG, Shujun DING, Yuzhi LIU, Zhao MA, Xuhua ZHOU, Zhong MENG, Jianguo MENG
-
Patent number: 11527238
Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source-domain, and receive an external language model that has been trained with training data from a target-domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The processor is further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
Type: Grant
Filed: January 21, 2021
Date of Patent: December 13, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhong Meng, Sarangarajan Parthasarathy, Xie Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
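A minimal worked sketch of the score integration this abstract describes: the external language model score is added to, and the estimated internal language model score subtracted from, the E2E model score in log space. The interpolation weights and function name below are assumptions; how the internal LM score is obtained (e.g., by suppressing the acoustic contribution) is only stated schematically.

```python
# Illustrative log-domain score combination; weights are assumptions.
import math

def integrated_log_score(e2e_log_score, ext_lm_log_score, int_lm_log_score,
                         ext_weight=0.6, int_weight=0.4):
    """log score ~ log P_E2E(y|x) + w_ext * log P_extLM(y) - w_int * log P_intLM(y)."""
    return e2e_log_score + ext_weight * ext_lm_log_score - int_weight * int_lm_log_score

# Toy numbers for one candidate token sequence during rescoring.
e2e = math.log(0.02)        # E2E model score for the hypothesis
ext = math.log(0.05)        # external (target-domain) language model score
internal = math.log(0.10)   # estimated internal (source-domain) language model score
print(integrated_log_score(e2e, ext, internal))
```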
-
Publication number: 20220199091
Abstract: A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis.
Type: Application
Filed: December 18, 2020
Publication date: June 23, 2022
Inventors: Naoyuki KANDA, Xuankai CHANG, Yashesh GAUR, Xiaofei WANG, Zhong MENG, Takuya YOSHIOKA
-
Publication number: 20220165290
Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
Type: Application
Filed: November 30, 2021
Publication date: May 26, 2022
Inventors: Zhong MENG, Yong ZHAO, Jinyu LI, Yifan GONG
-
Publication number: 20220139380
Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source-domain, and receive an external language model that has been trained with training data from a target-domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The processor is further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
Type: Application
Filed: January 21, 2021
Publication date: May 5, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhong MENG, Sarangarajan PARTHASARATHY, Xie SUN, Yashesh GAUR, Naoyuki KANDA, Liang LU, Xie CHEN, Rui ZHAO, Jinyu LI, Yifan GONG
-
Publication number: 20220130376
Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution. The second attention-based encoder-decoder model is trained to classify output tokens based on input speech frames of a target speaker and simultaneously trained to maintain a similarity between the first output distribution and the second output distribution.
Type: Application
Filed: January 5, 2022
Publication date: April 28, 2022
Inventors: Zhong MENG, Yashesh GAUR, Jinyu LI, Yifan GONG
-
Publication number: 20220028399
Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments may operate to extract features from input data based on a first set of parameters, generate outputs based on the extracted features and on a second set of parameters, and identify words represented by the input data based on the outputs, wherein the first set of parameters and the second set of parameters have been trained to minimize a network loss associated with the second set of parameters, wherein the first set of parameters has been trained to maximize the domain classification loss of a network comprising 1) an attention network to determine, based on a third set of parameters, relative importances of features extracted based on the first parameters to domain classification and 2) a domain classifier to classify a domain based on the extracted features, the relative importances, and a fourth set of parameters, and wherein the third set of parameters and the fourth set of parameters have been trained to minimize …
Type: Application
Filed: October 5, 2021
Publication date: January 27, 2022
Inventors: Zhong MENG, Jinyu LI, Yifan GONG
-
Patent number: 11232782
Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the second attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker and simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model.
Type: Grant
Filed: November 6, 2019
Date of Patent: January 25, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
-
Patent number: 11219652
Abstract: A method for extracting herbal medicine includes: step one, spray extraction; step two, pressure filtration and concentration; step three, spray and countercurrent precipitation; and step four, concentrating under reduced pressure and drying.
Type: Grant
Filed: August 14, 2020
Date of Patent: January 11, 2022
Assignee: SHAANXI PANLONG PHARMACEUTICAL GROUP LIMITED BY SHARE LTD.
Inventors: Xiaolin Xie, Dezhu Zhang, Jianguo Meng, Yu Wang, Xuhua Zhou, Zhong Meng, Nan Hui, Juan Li
-
Patent number: 11217265
Abstract: To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
Type: Grant
Filed: June 7, 2019
Date of Patent: January 4, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
-
Patent number: 11170789
Abstract: To generate substantially domain-invariant and speaker-discriminative features, embodiments are associated with a feature extractor to receive speech frames and extract features from the speech frames based on a first set of parameters of the feature extractor, a senone classifier to identify a senone based on the received features and on a second set of parameters of the senone classifier, an attention network capable of determining a relative importance of features extracted by the feature extractor to domain classification, based on a third set of parameters of the attention network, a domain classifier capable of classifying a domain based on the features and the relative importances, and on a fourth set of parameters of the domain classifier; and a training platform to train the first set of parameters of the feature extractor and the second set of parameters of the senone classifier to minimize the senone classification loss, train the first set of parameters of the feature extractor to maximize the dom…
Type: Grant
Filed: July 26, 2019
Date of Patent: November 9, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Zhong Meng, Jinyu Li, Yifan Gong