Patents by Inventor Xiaokong MA

Xiaokong MA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11503155
    Abstract: The present disclosure provides an interactive voice-control method and apparatus, a device, and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: November 15, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Jinfeng Bai, Chuanlei Zhai, Xu Chen, Tao Chen, Xiaokong Ma, Ce Zhang, Zhen Wu, Xingyuan Peng, Zhijian Wang, Sheng Qian, Guibin Wang, Lei Jia
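
The control flow claimed in the entry above lends itself to a short illustration. The minimal Python sketch below fuses an acoustic and a semantic confidence into an interaction confidence, computes a matching status, and lets both drive the device's response. The two scoring functions are hypothetical stand-ins (crude heuristics), not the patent's trained models, and the equal fusion weights and 0.5 threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ControlDecision:
    interaction_confidence: float  # how likely the speech is directed at the device
    matched: bool                  # whether the recognized text matches the audio
    should_respond: bool

def acoustic_confidence(signal: list) -> float:
    # Stand-in for a learned acoustic scorer: a crude energy heuristic.
    energy = sum(x * x for x in signal) / max(len(signal), 1)
    return energy / (energy + 1.0)

def semantic_confidence(text: str) -> float:
    # Stand-in for a learned semantic scorer: a crude length heuristic.
    return min(len(text.split()) / 10.0, 1.0)

def matching_status(text: str, signal: list) -> bool:
    # Stand-in check: the recognizer produced text for non-silent audio.
    return bool(text) and any(abs(x) > 1e-3 for x in signal)

def decide(signal: list, text: str, threshold: float = 0.5) -> ControlDecision:
    confidence = 0.5 * acoustic_confidence(signal) + 0.5 * semantic_confidence(text)
    matched = matching_status(text, signal)
    return ControlDecision(confidence, matched, matched and confidence >= threshold)

if __name__ == "__main__":
    print(decide([0.2, -0.4, 0.3, 0.5], "turn on the living room light"))
```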
  • Patent number: 11250854
    Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: February 15, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia
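
As a rough illustration of the pipeline in the entry above, the NumPy sketch below computes a short-term amplitude spectrum by framing and an FFT, then applies a stand-in linear scorer in place of the trained type recognition model. The frame and hop sizes and the linear scorer are assumptions for the sketch, not the patented model.

```python
import numpy as np

def short_term_amplitude_spectrum(wave: np.ndarray,
                                  frame: int = 256, hop: int = 128) -> np.ndarray:
    """Framed magnitude spectrum: the kind of acoustic feature the abstract names."""
    frames = [wave[i:i + frame] * np.hanning(frame)
              for i in range(0, len(wave) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def intention_type(features: np.ndarray, weights: np.ndarray, bias: float) -> str:
    # Stand-in for the trained type recognition model: a linear scorer on the
    # mean spectrum. A real system would train a network on labelled
    # interaction / non-interaction speech.
    score = float(features.mean(axis=0) @ weights + bias)
    return "interaction" if score > 0.0 else "non-interaction"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = rng.standard_normal(4000)                 # stand-in voice data
    feats = short_term_amplitude_spectrum(wave)      # shape (n_frames, 129)
    w = rng.standard_normal(feats.shape[1]) * 0.01   # untrained toy weights
    if intention_type(feats, w, bias=0.0) == "interaction":
        print("perform the operation indicated by the voice data")
    else:
        print("ignore: speech not directed at the device")
```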
  • Publication number: 20210210112
    Abstract: A model evaluation method includes obtaining M first audio signals synthesized by a first speech synthesis model under evaluation, and obtaining N second audio signals generated through recording; performing voiceprint extraction on each of the M first audio signals to obtain M first voiceprint features; performing voiceprint extraction on each of the N second audio signals to obtain N second voiceprint features; clustering the M first voiceprint features to obtain K first central features; clustering the N second voiceprint features to obtain J second central features; computing the cosine distances between the K first central features and the J second central features and aggregating them to obtain a first distance; and evaluating the speech synthesis model based on the first distance.
    Type: Application
    Filed: March 18, 2021
    Publication date: July 8, 2021
    Inventors: Lin ZHENG, Changbin CHEN, Xiaokong MA, Yujuan SUN
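
The evaluation procedure above reduces to a few array operations. The sketch below uses a plain k-means and synthetic stand-ins for the voiceprint extractor's outputs, and aggregates the pairwise cosine distances between cluster centers by averaging; the abstract does not pin down the aggregation, so the mean is an assumption, as are the values of M, N, K, and J.

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain k-means returning the k cluster centers (the 'central features')."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def mean_cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average cosine distance over all pairs of centers from the two sets."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-ins for voiceprints of M synthesized and N recorded signals.
    synth_vp = rng.standard_normal((40, 64))                        # M first features
    real_vp = synth_vp[:30] + 0.1 * rng.standard_normal((30, 64))   # N second features
    first_distance = mean_cosine_distance(kmeans(synth_vp, 4), kmeans(real_vp, 3))
    # Lower distance -> synthesized voices cluster closer to real recordings.
    print(f"first distance: {first_distance:.3f}")
```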
  • Publication number: 20210158816
    Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.
    Type: Application
    Filed: June 8, 2020
    Publication date: May 27, 2021
    Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia
  • Publication number: 20210127003
    Abstract: The present disclosure provides an interactive voice-control method and apparatus, a device, and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.
    Type: Application
    Filed: September 24, 2020
    Publication date: April 29, 2021
    Inventors: Jinfeng BAI, Chuanlei ZHAI, Xu CHEN, Tao CHEN, Xiaokong MA, Ce ZHANG, Zhen WU, Xingyuan PENG, Zhijian WANG, Sheng QIAN, Guibin WANG, Lei JIA
  • Patent number: 10943582
    Abstract: A method and apparatus of training an acoustic feature extracting model, a device, and a computer storage medium. The method comprises: taking first acoustic features, extracted respectively from speech data corresponding to user identifiers, as training data; training an initial model based on a deep neural network under a minimum classification error criterion, until a preset first stop condition is reached; replacing the Softmax layer in the initial model with a triplet loss layer to constitute an acoustic feature extracting model, and continuing to train the acoustic feature extracting model until a preset second stop condition is reached, the acoustic feature extracting model being used to output a second acoustic feature of the speech data; wherein the triplet loss layer is used to maximize similarity between the second acoustic features of the same user, and minimize similarity between the second acoustic features of different users.
    Type: Grant
    Filed: May 14, 2018
    Date of Patent: March 9, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Bing Jiang, Xiaokong Ma, Chao Li, Xiangang Li
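
The second training stage described above hinges on the triplet objective. A minimal NumPy sketch of that objective follows, using cosine similarity as the similarity measure; the abstract says only "similarity", so cosine and the 0.2 margin are assumptions, and the first, Softmax-based stage is taken as given.

```python
import numpy as np

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    # Push same-user similarity above different-user similarity by `margin`:
    # the loss reaches zero once sim(a, p) >= sim(a, n) + margin.
    return max(0.0, margin - cosine_sim(anchor, positive) + cosine_sim(anchor, negative))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.standard_normal(128)                    # second acoustic feature, user A
    positive = anchor + 0.05 * rng.standard_normal(128)  # same user, another utterance
    negative = rng.standard_normal(128)                  # a different user
    print(f"triplet loss: {triplet_loss(anchor, positive, negative):.3f}")
```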
  • Patent number: 10515627
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: taking first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein the target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn the optimal acoustic features that achieve the training target. Compared with conventional acoustic feature extraction, which uses a preset feature type and transformation, the approach of the present disclosure achieves better flexibility and higher accuracy.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: December 24, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li
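
One natural use of embeddings trained toward the target above is pairwise speaker verification: score two second acoustic features by cosine similarity and accept when a tuned threshold is cleared. The sketch below uses random vectors in place of real model outputs, and the 0.7 threshold is illustrative rather than from the patent.

```python
import numpy as np

def same_user(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.7) -> bool:
    """Accept the pair as the same user when the cosine similarity of their
    second acoustic features clears a tuned threshold."""
    sim = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return bool(sim >= threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    enrolled = rng.standard_normal(128)                     # stored voiceprint
    probe_same = enrolled + 0.1 * rng.standard_normal(128)  # same speaker, new utterance
    probe_other = rng.standard_normal(128)                  # different speaker
    print(same_user(enrolled, probe_same), same_user(enrolled, probe_other))
```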
  • Publication number: 20180336889
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: taking first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein the target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn the optimal acoustic features that achieve the training target. Compared with conventional acoustic feature extraction, which uses a preset feature type and transformation, the approach of the present disclosure achieves better flexibility and higher accuracy.
    Type: Application
    Filed: May 15, 2018
    Publication date: November 22, 2018
    Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao LI, Xiaokong MA, Bing JIANG, Xiangang LI
  • Publication number: 20180336888
    Abstract: A method and apparatus of training an acoustic feature extracting model, a device, and a computer storage medium. The method comprises: taking first acoustic features, extracted respectively from speech data corresponding to user identifiers, as training data; training an initial model based on a deep neural network under a minimum classification error criterion, until a preset first stop condition is reached; replacing the Softmax layer in the initial model with a triplet loss layer to constitute an acoustic feature extracting model, and continuing to train the acoustic feature extracting model until a preset second stop condition is reached, the acoustic feature extracting model being used to output a second acoustic feature of the speech data; wherein the triplet loss layer is used to maximize similarity between the second acoustic features of the same user, and minimize similarity between the second acoustic features of different users.
    Type: Application
    Filed: May 14, 2018
    Publication date: November 22, 2018
    Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Bing JIANG, Xiaokong MA, Chao LI, Xiangang LI