Patents by Inventor Ming Tu

Ming Tu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SPEECH RECOGNITION METHOD AND APPARATUS, AND ELECTRONIC DEVICE

Publication number: 20250384878

Abstract: A speech recognition method and apparatus, and an electronic device. The method comprises: obtaining a first speech; obtaining a first text corresponding to a previous segment of speech of the first speech; obtaining a first set, the first set comprising a plurality of text identifications and a text feature corresponding to each of the plurality of text identifications, the text feature being a feature associated with a plurality of subsequent texts of a text corresponding to the text identification, the text feature being associated with frequencies of the plurality of subsequent texts of the text in a text set, and the first set being determined based on the text set; and determining, based on the first text and the first set, text content associated with the first speech.

Type: Application

Filed: October 20, 2023

Publication date: December 18, 2025

Inventors: Ming TU, Lu LIU, Rui XIA, Xin LI, Chuanzeng HUANG, Yuxuan WANG
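The "first set" in the abstract above pairs each text identification with a feature derived from how often other texts follow it in a text set. A minimal sketch of that idea, assuming word-level texts and plain relative frequencies (the names `build_first_set` and `pick_candidate` are hypothetical and do not come from the application):

```python
from collections import Counter, defaultdict


def build_first_set(text_set):
    """Map each word to the relative frequencies of the words that follow it."""
    follow_counts = defaultdict(Counter)
    for sentence in text_set:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            follow_counts[current][following] += 1
    first_set = {}
    for word, counts in follow_counts.items():
        total = sum(counts.values())
        first_set[word] = {nxt: n / total for nxt, n in counts.items()}
    return first_set


def pick_candidate(previous_text, candidates, first_set):
    """Pick the candidate whose first word most often follows the previous text.

    Assumption: every candidate is a non-empty string.
    """
    words = previous_text.split()
    following = first_set.get(words[-1], {}) if words else {}
    return max(candidates, key=lambda c: following.get(c.split()[0], 0.0))


# Illustrative text set; the recognizer is torn between "light" and "night".
text_set = ["turn on the light", "turn on the radio", "turn on the light now"]
first_set = build_first_set(text_set)
best = pick_candidate("turn on the", ["light", "night"], first_set)
```

Here the text of the previous speech segment ("turn on the") biases the choice toward "light", because "night" never follows "the" in the text set. The application may define the text feature differently; this only illustrates frequency-based rescoring.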
SPEECH RECOGNITION MODEL TRAINING

Publication number: 20250378821

Abstract: A method, an apparatus, a device, and a storage medium for training a speech recognition model are described. An example method includes: obtaining first training data including a first set of speech data corresponding to a plurality of languages; training the speech recognition model by using the first training data to adjust a parameter of an encoding module; obtaining second training data, the second training data including a second set of speech data corresponding to the plurality of languages and first text data corresponding to the second set of speech data; processing the second set of speech data by using the speech recognition model to obtain second text data; and training the speech recognition model based at least on a comparison between the first text data and the second text data to adjust at least a parameter of the encoding module and a conversion module.

Type: Application

Filed: June 11, 2025

Publication date: December 11, 2025

Inventors: Ming TU, Youjia HUANG, Chen SHEN, Lu LU, Yuxuan WANG

Sequenced data processing method and device, and text processing method and device

Patent number: 12380142

Abstract: A method includes constructing a graph including a plurality of nodes for a set of sequences, wherein each node corresponds to a sequence in the set of sequences; for each node, determining an initial feature matrix of the node, wherein the initial feature matrix of the node includes initial vectors of various elements in a sequence corresponding to the node; and, inputting the initial feature matrix of the node of the graph into a graph sequence network to enable the graph sequence network to update the feature matrix of the node using the feature matrix(es) of adjacent node(s) of the node; and obtaining a feature matrix output by the graph sequence network of each node to perform a sequence-based classification prediction using output feature matrixes, wherein the feature matrix output for each node includes updated vectors corresponding to the various elements in the sequence corresponding to the node.

Type: Grant

Filed: March 2, 2021

Date of Patent: August 5, 2025

Assignees: BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY CO., LTD., BEIJING JINGDONG CENTURY TRADING CO., LTD.

Inventors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou

Voice-controlled content creation

Patent number: 12243563

Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.

Type: Grant

Filed: June 10, 2022

Date of Patent: March 4, 2025

Assignee: Lemon Inc.

Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri
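The timestamp-based deletion described in the abstract above can be sketched as follows. This assumes the timestamp marks the start of the stop command and that the command runs to the end of the recording; `trim_stop_command` is a hypothetical name, not taken from the patent.

```python
def trim_stop_command(samples, sample_rate, command_start_seconds):
    """Drop everything from the start of the stop command onward."""
    cut_index = int(round(command_start_seconds * sample_rate))
    # Clamp so an out-of-range timestamp cannot produce a bad slice.
    cut_index = max(0, min(cut_index, len(samples)))
    return samples[:cut_index]


# Ten samples at 2 Hz represent a 5-second recording;
# the stop command is assumed to begin at the 4-second mark.
recording = list(range(10))
trimmed = trim_stop_command(recording, sample_rate=2, command_start_seconds=4.0)
```

The trimmed recording keeps only the first eight samples, so the spoken stop command does not appear in the finished content.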
Multimedia processing method and apparatus, electronic device, and storage medium

Patent number: 12159649

Abstract: According to the embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided by obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.

Type: Grant

Filed: December 11, 2023

Date of Patent: December 3, 2024

Assignee: LEMON INC.

Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng
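The cropping step in the abstract above needs the playing positions of the invalid text. A minimal sketch, assuming the speech recognizer returns word-level timestamps and that invalid text is detected with a fixed filler-word list (both are assumptions, and `keep_intervals` is a hypothetical name):

```python
def keep_intervals(words, fillers, duration):
    """Return the (start, end) spans that remain after cutting filler words.

    words: list of (text, start_seconds, end_seconds) in playback order.
    """
    spans = []
    cursor = 0.0  # start of the span currently being kept
    for text, start, end in words:
        if text.lower() in fillers:
            if start > cursor:
                spans.append((cursor, start))
            cursor = max(cursor, end)
    if cursor < duration:
        spans.append((cursor, duration))
    return spans


# Illustrative recognizer output for a 2.4-second clip.
words = [
    ("so", 0.0, 0.4),
    ("um", 0.4, 0.9),
    ("hello", 0.9, 1.5),
    ("uh", 1.5, 1.8),
    ("world", 1.8, 2.4),
]
spans = keep_intervals(words, {"um", "uh"}, duration=2.4)
```

Concatenating the returned spans yields a second resource whose audio contains "so hello world" but not the fillers. The patent describes the invalid text as semantically non-informative, which is likely broader than a word list.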
SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM

Publication number: 20240274120

Abstract: Provided are an audio synthesis method and apparatus, an electronic device, and a readable storage medium. In the present solution, conversion from a text to an audio having a target timbre is achieved by means of a pre-trained voice synthesis model, the voice synthesis model comprising a first feature extraction sub-model and a second feature extraction sub-model, wherein the first feature extraction sub-model outputs, according to an inputted text to be processed, an acoustic feature comprising a bottleneck feature; the second feature extraction sub-model outputs, according to the inputted first acoustic features, a Mel spectrum feature corresponding to the text to be processed; according to the Mel spectrum feature corresponding to the text to be processed, the target audio corresponding to the text to be processed is obtained, and the target audio has the target timbre.

Type: Application

Filed: September 16, 2022

Publication date: August 15, 2024

Inventors: Dongyang Dai, Yuanzhe Chen, Li Chen, Yuping Wang, Qiao Tian, Ming Tu, Rui Xia, Yuxuan Wang

MULTIMEDIA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Publication number: 20240105234

Abstract: According to the embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided by obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.

Type: Application

Filed: December 11, 2023

Publication date: March 28, 2024

Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng

METHOD AND APPARATUS FOR DETERMINING SPEECH SIMILARITY, AND PROGRAM PRODUCT

Publication number: 20240096347

Abstract: Embodiments provide a method and an apparatus for determining speech similarity, and a program product, which relate to speech technology. The method includes: playing exemplary audio, and acquiring evaluation audio of a user, where the exemplary audio is audio of specified content that is read by using a specified language; acquiring a standard pronunciation feature corresponding to the exemplary audio, and extracting, from the evaluation audio, an evaluation pronunciation feature corresponding to the standard pronunciation feature, where the standard pronunciation feature is used to reflect a specific pronunciation of the specified content in the specified language; and determining a feature difference between the standard pronunciation feature and the evaluation pronunciation feature, and determining similarity between the evaluation audio and the exemplary audio according to the feature difference.

Type: Application

Filed: January 31, 2022

Publication date: March 21, 2024

Inventors: Rui XIA, Ming TU, Chen DING, Weiming ZHENG
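The abstract above derives similarity from a feature difference without saying how. One possible reading, sketched here under the assumption that pronunciation features are equal-length numeric vectors compared by Euclidean distance (the distance measure, the mapping to a score, and both function names are illustrative choices, not the application's):

```python
import math


def feature_difference(standard, evaluation):
    """Euclidean distance between two equal-length feature vectors."""
    if len(standard) != len(evaluation):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(standard, evaluation)))


def similarity(standard, evaluation):
    """Map the difference into (0, 1]; identical features score 1.0."""
    return 1.0 / (1.0 + feature_difference(standard, evaluation))


# Hypothetical feature values for the exemplary audio and two user attempts.
standard = [0.2, 0.5, 0.1]
close = similarity(standard, [0.2, 0.5, 0.1])
far = similarity(standard, [0.9, 0.1, 0.7])
```

A smaller feature difference yields a score closer to 1.0, so the attempt that matches the exemplary audio ranks above the one that does not.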
VOICE-CONTROLLED CONTENT CREATION

Publication number: 20230402068

Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.

Type: Application

Filed: June 10, 2022

Publication date: December 14, 2023

Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri

SEQUENCED DATA PROCESSING METHOD AND DEVICE, AND TEXT PROCESSING METHOD AND DEVICE

Publication number: 20230244704

Abstract: The present disclosure relates to a sequenced data processing method and device, and a text processing method and device, and relates to the field of data processing.

Type: Application

Filed: March 2, 2021

Publication date: August 3, 2023

Inventors: Ming TU, Jing HUANG, Xiaodong HE, Bowen ZHOU

SPEECH ANALYSIS ALGORITHMIC SYSTEM AND METHOD FOR OBJECTIVE EVALUATION AND/OR DISEASE DETECTION

Publication number: 20210193173

Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.

Type: Application

Filed: August 31, 2020

Publication date: June 24, 2021

Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss

Speech analysis algorithmic system and method for objective evaluation and/or disease detection

Patent number: 10796715

Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.

Type: Grant

Filed: September 1, 2017

Date of Patent: October 6, 2020

Assignee: ARIZONA BOARD OF REGENTS ON BEHALF OF ARIZONA STATE UNIVERSITY

Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss

Enhancing audio signals using sub-band deep neural networks

Patent number: 10510360

Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.

Type: Grant

Filed: April 23, 2019

Date of Patent: December 17, 2019

Assignee: ALIBABA GROUP HOLDING LIMITED

Inventors: Tao Yu, Ming Tu, Gang Liu
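The partition, per-band processing, and concatenation steps in the abstract above can be sketched in a few lines. Simple gain functions stand in for the deep neural networks, and the frequency-domain transform and its inverse are omitted; `split_sub_bands` and `enhance_frame` are hypothetical names.

```python
def split_sub_bands(frame, num_bands):
    """Partition one frequency-domain frame into contiguous sub-band vectors."""
    size, remainder = divmod(len(frame), num_bands)
    bands, start = [], 0
    for i in range(num_bands):
        # The first `remainder` bands take one extra bin each.
        end = start + size + (1 if i < remainder else 0)
        bands.append(frame[start:end])
        start = end
    return bands


def enhance_frame(frame, band_models):
    """Run each sub-band through its own model and concatenate the outputs."""
    bands = split_sub_bands(frame, len(band_models))
    output = []
    for band, model in zip(bands, band_models):
        output.extend(model(band))
    return output


# Placeholder "networks": each one only scales its band by a fixed gain.
band_models = [lambda band, g=gain: [g * x for x in band] for gain in (1.0, 0.5)]
frame = [4.0, 2.0, 6.0, 8.0]
clean = enhance_frame(frame, band_models)
```

Giving each sub-band its own model lets every model specialize in one frequency range. In the patented method, the concatenated outputs form a clean audio feature matrix that is then converted back into a time-domain signal.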
ENHANCING AUDIO SIGNALS USING SUB-BAND DEEP NEURAL NETWORKS

Publication number: 20190251985

Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.

Type: Application

Filed: April 23, 2019

Publication date: August 15, 2019

Inventors: Tao YU, Ming TU, Gang LIU

Enhancing audio signals using sub-band deep neural networks

Patent number: 10283140

Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.

Type: Grant

Filed: January 12, 2018

Date of Patent: May 7, 2019

Assignee: ALIBABA GROUP HOLDING LIMITED

Inventors: Tao Yu, Ming Tu, Gang Liu

Liquid rationing device

Publication number: 20050127104

Abstract: A liquid rationing device comprises a base and a cover. The base is provided with an actuating element whose axle goes through the base. The axle is connected with a gear disk, which is provided with more than one roller on one side thereof. The cover is coupled to one side of the base and has a reservoir provided with a transmission hose with positioning extrusion. A positioning groove is provided on the reservoir such that the positioning extrusion of the transmission hose can be held therein and the transmission hose can surround the circle of the rollers. Accordingly, the actuating element can drive the gear disk and make the rollers on the gear disk to rotate and subsequently push and squeeze the transmission hose, thereby making displacement of the liquid in the transmission hose and obtaining the purposes of rationing.

Type: Application

Filed: December 16, 2003

Publication date: June 16, 2005

Inventor: Ming Tu