Patents by Inventor Ming Tu

Ming Tu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250384878
    Abstract: A speech recognition method and apparatus, and an electronic device. The method comprises: obtaining a first speech; obtaining a first text corresponding to a previous segment of speech of the first speech; obtaining a first set, the first set comprising a plurality of text identifications and a text feature corresponding to each of the plurality of text identifications, the text feature being a feature associated with a plurality of subsequent texts of a text corresponding to the text identification, the text feature being associated with frequencies of the plurality of subsequent texts of the text in a text set, and the first set being determined based on the text set; and determining, based on the first text and the first set, text content associated with the first speech.
    Type: Application
    Filed: October 20, 2023
    Publication date: December 18, 2025
    Inventors: Ming Tu, Lu Liu, Rui Xia, Xin Li, Chuanzeng Huang, Yuxuan Wang
  • Publication number: 20250378821
    Abstract: A method, an apparatus, a device, and a storage medium for training a speech recognition model are described. An example method includes: obtaining first training data including a first set of speech data corresponding to a plurality of languages; training the speech recognition model by using the first training data to adjust a parameter of an encoding module; obtaining second training data, the second training data including a second set of speech data corresponding to the plurality of languages and first text data corresponding to the second set of speech data; processing the second set of speech data by using the speech recognition model to obtain second text data; and training the speech recognition model based at least on a comparison between the first text data and the second text data to adjust at least a parameter of the encoding module and a conversion module.
    Type: Application
    Filed: June 11, 2025
    Publication date: December 11, 2025
    Inventors: Ming Tu, Youjia Huang, Chen Shen, Lu Lu, Yuxuan Wang
  • Patent number: 12380142
    Abstract: A method includes constructing a graph including a plurality of nodes for a set of sequences, wherein each node corresponds to a sequence in the set of sequences; for each node, determining an initial feature matrix of the node, wherein the initial feature matrix of the node includes initial vectors of various elements in a sequence corresponding to the node; inputting the initial feature matrix of each node of the graph into a graph sequence network to enable the graph sequence network to update the feature matrix of the node using the feature matrix or matrices of one or more adjacent nodes of the node; and obtaining a feature matrix output by the graph sequence network for each node to perform a sequence-based classification prediction using the output feature matrices, wherein the feature matrix output for each node includes updated vectors corresponding to the various elements in the sequence corresponding to the node.
    Type: Grant
    Filed: March 2, 2021
    Date of Patent: August 5, 2025
    Assignees: BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY CO., LTD., BEIJING JINGDONG CENTURY TRADING CO., LTD.
    Inventors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou
  • Patent number: 12243563
    Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.
    Type: Grant
    Filed: June 10, 2022
    Date of Patent: March 4, 2025
    Assignee: Lemon Inc.
    Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri
  • Patent number: 12159649
    Abstract: According to embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided. The method comprises: obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, wherein the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, wherein the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.
    Type: Grant
    Filed: December 11, 2023
    Date of Patent: December 3, 2024
    Assignee: Lemon Inc.
    Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng
  • Publication number: 20240274120
    Abstract: Provided are an audio synthesis method and apparatus, an electronic device, and a readable storage medium. In the present solution, conversion from a text to audio having a target timbre is achieved by means of a pre-trained voice synthesis model, the voice synthesis model comprising a first feature extraction sub-model and a second feature extraction sub-model, wherein the first feature extraction sub-model outputs, according to an inputted text to be processed, first acoustic features comprising a bottleneck feature; the second feature extraction sub-model outputs, according to the inputted first acoustic features, a Mel spectrum feature corresponding to the text to be processed; and, according to the Mel spectrum feature corresponding to the text to be processed, the target audio corresponding to the text to be processed is obtained, the target audio having the target timbre.
    Type: Application
    Filed: September 16, 2022
    Publication date: August 15, 2024
    Inventors: Dongyang Dai, Yuanzhe Chen, Li Chen, Yuping Wang, Qiao Tian, Ming Tu, Rui Xia, Yuxuan Wang
  • Publication number: 20240105234
    Abstract: According to embodiments of the disclosure, a multimedia processing method, device, electronic device, and storage medium are provided. The method comprises: obtaining a first multimedia resource; determining an initial text content corresponding to the first multimedia resource by performing speech recognition on audio data of the first multimedia resource, wherein the audio data of the first multimedia resource comprises speech data of the initial text content; determining an invalid text content in the initial text content, wherein the invalid text content is semantically non-informative; determining a first playing position of speech data of the invalid text content in the first multimedia resource; and cropping the first multimedia resource based on the first playing position to obtain a second multimedia resource, wherein audio data of the second multimedia resource comprises speech data of a target text content but does not comprise the speech data of the invalid text content.
    Type: Application
    Filed: December 11, 2023
    Publication date: March 28, 2024
    Inventors: Xin Zheng, Conghui Zhu, Rui Xia, Chuxiang Shang, Dejian Zhong, Yongsen Jiang, Ming Tu, Lelai Deng
  • Publication number: 20240096347
    Abstract: Embodiments provide a method and an apparatus for determining speech similarity, and a program product, which relate to speech technology. The method includes: playing exemplary audio, and acquiring evaluation audio of a user, where the exemplary audio is audio of specified content that is read by using a specified language; acquiring a standard pronunciation feature corresponding to the exemplary audio, and extracting, from the evaluation audio, an evaluation pronunciation feature corresponding to the standard pronunciation feature, where the standard pronunciation feature is used to reflect a specific pronunciation of the specified content in the specified language; and determining a feature difference between the standard pronunciation feature and the evaluation pronunciation feature, and determining similarity between the evaluation audio and the exemplary audio according to the feature difference.
    Type: Application
    Filed: January 31, 2022
    Publication date: March 21, 2024
    Inventors: Rui Xia, Ming Tu, Chen Ding, Weiming Zheng
  • Publication number: 20230402068
    Abstract: The present disclosure describes techniques for voice-controlled content creation. The techniques comprise monitoring voice commands spoken by a creator. Recording of a content may be initiated in response to recognizing a first voice command spoken by the creator. Recording of the content may be stopped in response to recognizing a second voice command spoken by the creator. A timestamp associated with the second voice command may be created. A segment may be automatically deleted from the content based on the timestamp. The segment may comprise a recording of the second voice command.
    Type: Application
    Filed: June 10, 2022
    Publication date: December 14, 2023
    Inventors: Wenqing Jiang, Serhan Uslubas, Zheng Li, Ming Tu, Shiva Shanker Pandiri
  • Publication number: 20230244704
    Abstract: The present disclosure relates to a sequenced data processing method and device, and a text processing method and device, and relates to the field of data processing.
    Type: Application
    Filed: March 2, 2021
    Publication date: August 3, 2023
    Inventors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou
  • Publication number: 20210193173
    Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.
    Type: Application
    Filed: August 31, 2020
    Publication date: June 24, 2021
    Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss
  • Patent number: 10796715
    Abstract: Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.
    Type: Grant
    Filed: September 1, 2017
    Date of Patent: October 6, 2020
    Assignee: ARIZONA BOARD OF REGENTS ON BEHALF OF ARIZONA STATE UNIVERSITY
    Inventors: Visar Berisha, Ming Tu, Alan Wisler, Julie Liss
  • Patent number: 10510360
    Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: December 17, 2019
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Tao Yu, Ming Tu, Gang Liu
  • Publication number: 20190251985
    Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
    Type: Application
    Filed: April 23, 2019
    Publication date: August 15, 2019
    Inventors: Tao Yu, Ming Tu, Gang Liu
  • Patent number: 10283140
    Abstract: Systems and methods for enhancing reverberated audio signals are disclosed. In one embodiment, a method is disclosed comprising receiving an audio signal; partitioning a frequency domain representation of the audio signal into a plurality of sub-band vectors; inputting each sub-band vector into a corresponding deep neural network; calculating, using the corresponding deep neural networks, a plurality of output vectors for each sub-band; concatenating the plurality of output vectors to generate a clean audio feature matrix; and converting the clean audio feature matrix into a time-domain audio signal.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: May 7, 2019
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Tao Yu, Ming Tu, Gang Liu
  • Publication number: 20050127104
    Abstract: A liquid rationing device comprises a base and a cover. The base is provided with an actuating element whose axle passes through the base. The axle is connected to a gear disk, which is provided with more than one roller on one side thereof. The cover is coupled to one side of the base and has a reservoir provided with a transmission hose with a positioning extrusion. A positioning groove is provided on the reservoir such that the positioning extrusion of the transmission hose can be held therein and the transmission hose can surround the circle of the rollers. Accordingly, the actuating element can drive the gear disk and cause the rollers on the gear disk to rotate and subsequently push and squeeze the transmission hose, thereby displacing the liquid in the transmission hose and achieving the purpose of rationing.
    Type: Application
    Filed: December 16, 2003
    Publication date: June 16, 2005
    Inventor: Ming Tu
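
Illustrative sketch (not taken from any of the patents listed above): the abstract shared by patents 10283140 and 10510360 describes partitioning a frequency-domain representation into sub-band vectors, passing each sub-band through its own network, and concatenating the outputs. The Python below is a minimal sketch of only that partition, process, and concatenate flow for a single frame. The function names, the even contiguous split, and the plain callables standing in for the per-band deep neural networks are all assumptions made for illustration; the frequency-domain transform and the conversion back to a time-domain signal are omitted.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for a trained per-band network: maps a sub-band
# vector to an output vector of the same length.
BandModel = Callable[[Vector], Vector]


def partition_into_subbands(frame: Sequence[float], num_bands: int) -> List[Vector]:
    """Split one frequency-domain frame into contiguous sub-band vectors.

    Assumption: bands are contiguous and as equal in size as possible;
    the patents may partition differently.
    """
    if num_bands <= 0 or num_bands > len(frame):
        raise ValueError("num_bands must be between 1 and len(frame)")
    base, extra = divmod(len(frame), num_bands)
    bands: List[Vector] = []
    start = 0
    for i in range(num_bands):
        size = base + (1 if i < extra else 0)  # first `extra` bands get one more bin
        bands.append(list(frame[start:start + size]))
        start += size
    return bands


def enhance_frame(frame: Sequence[float], models: Sequence[BandModel]) -> Vector:
    """Run each sub-band through its own model and concatenate the outputs."""
    bands = partition_into_subbands(frame, len(models))
    outputs = [model(band) for model, band in zip(models, bands)]
    return [value for out in outputs for value in out]


def halve(band: Vector) -> Vector:
    """Placeholder model: attenuates every bin in the band by half."""
    return [0.5 * value for value in band]


def identity(band: Vector) -> Vector:
    """Placeholder model: leaves the band unchanged."""
    return list(band)
```

For example, `enhance_frame([1.0, 2.0, 3.0, 4.0, 5.0], [halve, identity])` splits the frame into a three-bin band and a two-bin band, applies one placeholder model to each, and returns a single five-element vector. Stacking such vectors across frames would give a feature matrix of the kind the abstract mentions.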