Patents by Inventor Ming Tu
Ming Tu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250384878
Abstract: A speech recognition method and apparatus, and an electronic device. The method comprises: obtaining a first speech; obtaining a first text corresponding to a previous segment of speech of the first speech; obtaining a first set, the first set comprising a plurality of text identifications and a text feature corresponding to each of the plurality of text identifications, the text feature being a feature associated with a plurality of subsequent texts of a text corresponding to the text identification, the text feature being associated with frequencies of the plurality of subsequent texts of the text in a text set, and the first set being determined based on the text set; and determining, based on the first text and the first set, text content associated with the first speech.
Type: Application
Filed: October 20, 2023
Publication date: December 18, 2025
Inventors: Ming Tu, Lu Liu, Rui Xia, Xin Li, Chuanzeng Huang, Yuxuan Wang
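The "first set" in this abstract pairs each text identification with a feature describing how often each subsequent text follows it in a text set. The sketch below shows one plausible reading of that idea. Several details are assumptions that the abstract does not state: the text units are whitespace-separated words, each word serves as its own identification, the feature is a relative successor frequency, and the previous segment's last word is used to rerank hypothetical candidate transcripts that carry an acoustic score. The functions `build_successor_features` and `rerank` are illustrative names only.

```python
from collections import Counter, defaultdict


def build_successor_features(text_set):
    """Map each word (used here as its own 'text identification') to the
    relative frequencies of the words that follow it in the text set."""
    counts = defaultdict(Counter)
    for sentence in text_set:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    features = {}
    for word, successors in counts.items():
        total = sum(successors.values())
        features[word] = {nxt: c / total for nxt, c in successors.items()}
    return features


def rerank(candidates, previous_text, features, weight=1.0):
    """Pick the candidate (text, acoustic_score) whose first word is most
    likely to follow the last word of the previous speech segment.
    The additive scoring rule is an assumption for illustration."""
    prev_words = previous_text.split()
    last = prev_words[-1] if prev_words else None
    successor_freqs = features.get(last, {})

    def score(item):
        text, acoustic_score = item
        words = text.split()
        first = words[0] if words else None
        return acoustic_score + weight * successor_freqs.get(first, 0.0)

    return max(candidates, key=score)[0]
```

For example, suppose the text set contains "turn on the light", "turn on the radio", and "turn off the light", and the previous segment ends in "turn". The context then favours "on the fan" over an acoustically slightly stronger "own the fan".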
-
Publication number: 20250378821
Abstract: A method, an apparatus, a device, and a storage medium for training a speech recognition model are described. An example method includes: obtaining first training data including a first set of speech data corresponding to a plurality of languages; training the speech recognition model by using the first training data to adjust a parameter of an encoding module; obtaining second training data, the second training data including a second set of speech data corresponding to the plurality of languages and first text data corresponding to the second set of speech data; processing the second set of speech data by using the speech recognition model to obtain second text data; and training the speech recognition model based at least on a comparison between the first text data and the second text data to adjust at least a parameter of the encoding module and a conversion module.
Type: Application
Filed: June 11, 2025
Publication date: December 11, 2025
Inventors: Ming Tu, Youjia Huang, Chen Shen, Lu Lu, Yuxuan Wang
-
Patent number: 12380142
Abstract: A method includes constructing a graph including a plurality of nodes for a set of sequences, wherein each node corresponds to a sequence in the set of sequences; for each node, determining an initial feature matrix of the node, wherein the initial feature matrix of the node includes initial vectors of various elements in a sequence corresponding to the node; and, inputting the initial feature matrix of the node of the graph into a graph sequence network to enable the graph sequence network to update the feature matrix of the node using the feature matrix(es) of adjacent node(s) of the node; and obtaining a feature matrix output by the graph sequence network of each node to perform a sequence-based classification prediction using output feature matrixes, wherein the feature matrix output for each node includes updated vectors corresponding to the various elements in the sequence corresponding to the node.
Type: Grant
Filed: March 2, 2021
Date of Patent: August 5, 2025
Assignees: BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY CO., LTD., BEIJING JINGDONG CENTURY TRADING CO., LTD.
Inventors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou
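A key point in this abstract is that each node keeps a whole feature matrix, with one vector per sequence element, rather than a single pooled vector. The graph update therefore has to produce updated per-element vectors. The sketch below is a minimal pure-Python illustration of one such update step. The specific rule is an assumption and is not taken from the patent: each element vector attends (scaled dot-product softmax) over the element vectors of every adjacent node, and the averaged messages are added residually. `graph_sequence_layer` and `mean_pool` are hypothetical names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graph_sequence_layer(features, adjacency):
    """One assumed update step.
    features:  {node: [[float] * d, ...]}  one row per sequence element
    adjacency: {node: [neighbor, ...]}
    Each element vector attends over every neighbor's element vectors; the
    per-neighbor messages are averaged and added to the original vector."""
    updated = {}
    for node, rows in features.items():
        neighbors = adjacency.get(node, [])
        d = len(rows[0])
        new_rows = []
        for row in rows:
            message = [0.0] * d
            for nb in neighbors:
                nb_rows = features[nb]
                weights = softmax([dot(row, r) / math.sqrt(d) for r in nb_rows])
                for w, r in zip(weights, nb_rows):
                    for k in range(d):
                        message[k] += w * r[k] / len(neighbors)
            new_rows.append([x + m for x, m in zip(row, message)])
        updated[node] = new_rows  # still one updated vector per element
    return updated


def mean_pool(rows):
    """Collapse a node's output matrix to one vector, e.g. as classifier input."""
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because the output keeps one vector per element, a downstream classifier can use the matrix directly (for token-level predictions) or pool it first (for sequence-level predictions).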
-
Patent number: 12243563
Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.
Type: Grant
Filed: June 10, 2022
Date of Patent: March 4, 2025
Assignee: Lemon Inc.
Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri
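In the flow this abstract describes, a recognized start command opens the recording and a recognized stop command closes it. The stop command's timestamp is then used to cut the spoken command out of the final content. The sketch below assumes the recognizer emits utterances with start and end times in seconds. It also assumes two hypothetical trigger phrases, "start recording" and "stop recording", and treats the stop command's start time as its associated timestamp. None of these specifics come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    start: float  # seconds
    end: float    # seconds


START_COMMAND = "start recording"  # hypothetical trigger phrase
STOP_COMMAND = "stop recording"    # hypothetical trigger phrase


def voice_controlled_clip(utterances):
    """Return (clip_start, clip_end) in seconds, or None if no complete
    start/stop command pair was recognized."""
    recording_from = None
    for u in utterances:
        phrase = u.text.strip().lower()
        if recording_from is None and phrase == START_COMMAND:
            recording_from = u.end  # content begins right after the start command
        elif recording_from is not None and phrase == STOP_COMMAND:
            stop_timestamp = u.start  # timestamp associated with the stop command
            # Recording actually ran until u.end; the segment
            # [stop_timestamp, u.end] holds the spoken command and is dropped.
            return (recording_from, stop_timestamp)
    return None
```

For utterances "Start recording" (0.0 to 1.2 s), "hello everyone" (1.5 to 3.0 s) and "stop recording" (10.0 to 11.1 s), the kept clip is (1.2, 10.0). The spoken stop command is therefore excluded.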
-
Patent number: 12159649
Abstract: According to the embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided by obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.
Type: Grant
Filed: December 11, 2023
Date of Patent: December 3, 2024
Assignee: LEMON INC.
Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng
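The cropping step in this abstract amounts to interval arithmetic. Once word-level recognition gives each token a playing position, the semantically non-informative tokens are cut out and the remaining intervals are kept. The sketch below assumes word-level timestamps in seconds and uses a small hypothetical list of filler words as a stand-in for whatever "invalid text" detection the patent actually uses. `FILLERS` and `keep_intervals` are illustrative names.

```python
FILLERS = {"um", "uh", "erm", "hmm"}  # hypothetical non-informative tokens


def keep_intervals(words, duration):
    """words: list of (token, start, end) from a word-level recognizer.
    Returns the (start, end) intervals of the media that remain after
    cutting the playing position of every filler token."""
    cuts = sorted(
        (s, e) for tok, s, e in words if tok.lower().strip(".,!?") in FILLERS
    )
    kept, cursor = [], 0.0
    for s, e in cuts:
        if s > cursor:
            kept.append((cursor, s))  # keep the span before this filler
        cursor = max(cursor, e)       # skip over the filler (handles overlaps)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept
```

The resulting intervals could then be passed to any media editing tool to assemble the second multimedia resource.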
-
Publication number: 20240274120
Abstract: Provided are an audio synthesis method and apparatus, an electronic device, and a readable storage medium. In the present solution, conversion from a text to an audio having a target timbre is achieved by means of a pre-trained voice synthesis model, the voice synthesis model comprising a first feature extraction sub-model and a second feature extraction sub-model, wherein the first feature extraction sub-model outputs, according to an inputted text to be processed, an acoustic feature comprising a bottleneck feature; the second feature extraction sub-model outputs, according to the inputted first acoustic features, a Mel spectrum feature corresponding to the text to be processed; according to the Mel spectrum feature corresponding to the text to be processed, the target audio corresponding to the text to be processed is obtained, and the target audio has the target timbre.
Type: Application
Filed: September 16, 2022
Publication date: August 15, 2024
Inventors: Dongyang Dai, Yuanzhe Chen, Li Chen, Yuping Wang, Qiao Tian, Ming Tu, Rui Xia, Yuxuan Wang
-
Publication number: 20240105234
Abstract: According to the embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided by obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.
Type: Application
Filed: December 11, 2023
Publication date: March 28, 2024
Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng
-
Publication number: 20240096347
Abstract: Embodiments provide a method and an apparatus for determining speech similarity, and a program product, which relate to speech technology. The method includes: playing exemplary audio, and acquiring evaluation audio of a user, where the exemplary audio is audio of specified content that is read by using a specified language; acquiring a standard pronunciation feature corresponding to the exemplary audio, and extracting, from the evaluation audio, an evaluation pronunciation feature corresponding to the standard pronunciation feature, where the standard pronunciation feature is used to reflect a specific pronunciation of the specified content in the specified language; and determining a feature difference between the standard pronunciation feature and the evaluation pronunciation feature, and determining similarity between the evaluation audio and the exemplary audio according to the feature difference.
Type: Application
Filed: January 31, 2022
Publication date: March 21, 2024
Inventors: Rui Xia, Ming Tu, Chen Ding, Weiming Zheng
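This abstract leaves two things open: how the "feature difference" is measured and how it becomes a similarity score. The sketch below is one common way to compare two pronunciations of the same content that differ in speed. It aligns per-frame feature vectors with dynamic time warping and maps the length-normalized alignment cost to a 0 to 100 score. Both DTW and the `exp(-d / scale)` mapping are assumptions chosen for illustration, not the patented method. The feature frames could be, for example, pitch or MFCC values.

```python
import math


def dtw_distance(ref, test):
    """Dynamic time warping over two sequences of feature vectors, using
    Euclidean frame distance; returns the accumulated cost divided by the
    reference length."""
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / n


def similarity(ref, test, scale=1.0):
    """Map the feature difference to a 0-100 score (assumed mapping)."""
    return 100.0 * math.exp(-dtw_distance(ref, test) / scale)
```

Because DTW tolerates tempo differences, a learner who reads more slowly but with the same feature trajectory still scores close to 100.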
-
Publication number: 20230402068
Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.
Type: Application
Filed: June 10, 2022
Publication date: December 14, 2023
Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri
-
Publication number: 20230244704
Abstract: The present disclosure relates to a sequenced data processing method and device, and a text processing method and device, and relates to the field of data processing.
Type: Application
Filed: March 2, 2021
Publication date: August 3, 2023
Inventors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou
-
Publication number: 20210193173
Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.
Type: Application
Filed: August 31, 2020
Publication date: June 24, 2021
Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss
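This abstract describes a "predictive software model" that learns a mapping from acoustic features to clinicians' ratings on several perceptual dimensions, but it does not name the model family. As a minimal sketch under that uncertainty, the code below fits one ordinary least-squares linear model per perceptual dimension. It solves the normal equations with a tiny ridge term and Gauss-Jordan elimination so that it stays dependency-free. The linear model, the ridge value, and the dimension names in the usage note are all assumptions for illustration.

```python
def fit_linear(features, ratings, ridge=1e-6):
    """Least-squares fit of rating ~ w0 + w . features for one perceptual
    dimension, via the normal equations (X^T X + ridge*I) w = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(X, ratings)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented [A | b]
    M = [A[i] + [b[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(weights, row):
    """Apply a fitted model to the acoustic features of a new speech sample."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def fit_all_dimensions(features, ratings_by_dimension):
    """One model per perceptual dimension, keyed by dimension name."""
    return {dim: fit_linear(features, ys) for dim, ys in ratings_by_dimension.items()}
```

Usage might look like `fit_all_dimensions(feature_rows, {"nasality": [...], "articulatory precision": [...]})`, where both dimension names are hypothetical placeholders. A real system would likely need feature selection and a nonlinear or regularized model, but the overall flow (learn from rated samples, then score new samples) is the same.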
-
Patent number: 10796715
Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.
Type: Grant
Filed: September 1, 2017
Date of Patent: October 6, 2020
Assignee: ARIZONA BOARD OF REGENTS ON BEHALF OF ARIZONA STATE UNIVERSITY
Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss
-
Patent number: 10510360
Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
Type: Grant
Filed: April 23, 2019
Date of Patent: December 17, 2019
Assignee: ALIBABA GROUP HOLDING LIMITED
Inventors: Tao Yu, Ming Tu, Gang Liu
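The data flow in this abstract is: split each frame of the frequency-domain representation into sub-band vectors, run each sub-band through its own network, and concatenate the outputs back into a full-band clean feature matrix. The sketch below shows only that flow. The per-band deep neural networks are replaced by arbitrary Python callables (an assumption, so the example stays runnable without a trained model). The band edges are hypothetical. The final inverse transform back to a time-domain signal is omitted.

```python
def split_subbands(frame, band_edges):
    """Partition one frame (a list of per-bin magnitudes) into sub-band
    vectors at the given bin edges, e.g. [0, 2, 4] -> bins 0-1 and 2-3."""
    return [frame[lo:hi] for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def enhance(spectrogram, band_edges, band_models):
    """Apply one model per sub-band to every frame and concatenate the
    outputs into a clean feature matrix of the same shape.
    band_models: one callable per sub-band, standing in for the per-band DNNs."""
    assert len(band_models) == len(band_edges) - 1, "need one model per sub-band"
    clean = []
    for frame in spectrogram:
        bands = split_subbands(frame, band_edges)
        outputs = [model(band) for model, band in zip(band_models, bands)]
        clean.append([v for out in outputs for v in out])  # concatenate sub-bands
    return clean
```

Giving each sub-band its own network lets each model specialise in the reverberation characteristics of its frequency range. It also keeps every network's input small.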
-
Publication number: 20190251985
Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
Type: Application
Filed: April 23, 2019
Publication date: August 15, 2019
Inventors: Tao Yu, Ming Tu, Gang Liu
-
Patent number: 10283140
Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
Type: Grant
Filed: January 12, 2018
Date of Patent: May 7, 2019
Assignee: ALIBABA GROUP HOLDING LIMITED
Inventors: Tao Yu, Ming Tu, Gang Liu
-
Publication number: 20050127104
Abstract: A liquid rationing device comprises a base and a cover. The base is provided with an actuating element whose axle goes through the base. The axle is connected with a gear disk, which is provided with more than one roller on one side thereof. The cover is coupled to one side of the base and has a reservoir provided with a transmission hose with positioning extrusion. A positioning groove is provided on the reservoir such that the positioning extrusion of the transmission hose can be held therein and the transmission hose can surround the circle of the rollers. Accordingly, the actuating element can drive the gear disk and make the rollers on the gear disk to rotate and subsequently push and squeeze the transmission hose, thereby making displacement of the liquid in the transmission hose and obtaining the purposes of rationing.
Type: Application
Filed: December 16, 2003
Publication date: June 16, 2005
Inventor: Ming Tu