Patents by Inventor Xiaokong MA

Xiaokong MA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11503155
    Abstract: The present disclosure provides an interactive voice-control method and apparatus, a device, and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: November 15, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Jinfeng Bai, Chuanlei Zhai, Xu Chen, Tao Chen, Xiaokong Ma, Ce Zhang, Zhen Wu, Xingyuan Peng, Zhijian Wang, Sheng Qian, Guibin Wang, Lei Jia
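
The control flow claimed in the entry above lends itself to a short illustration. The minimal Python sketch below fuses an acoustic and a semantic confidence into an interaction confidence, computes a matching status, and lets both drive the device's response. The two scoring functions are hypothetical stand-ins (crude heuristics), not the patent's trained models, and the equal fusion weights and 0.5 threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ControlDecision:
    interaction_confidence: float  # how likely the speech is directed at the device
    matched: bool                  # whether the recognized text matches the audio
    should_respond: bool

def acoustic_confidence(signal: list) -> float:
    # Stand-in for a learned acoustic scorer: a crude energy heuristic.
    energy = sum(x * x for x in signal) / max(len(signal), 1)
    return energy / (energy + 1.0)

def semantic_confidence(text: str) -> float:
    # Stand-in for a learned semantic scorer: a crude length heuristic.
    return min(len(text.split()) / 10.0, 1.0)

def matching_status(text: str, signal: list) -> bool:
    # Stand-in check: the recognizer produced text for non-silent audio.
    return bool(text) and any(abs(x) > 1e-3 for x in signal)

def decide(signal: list, text: str, threshold: float = 0.5) -> ControlDecision:
    confidence = 0.5 * acoustic_confidence(signal) + 0.5 * semantic_confidence(text)
    matched = matching_status(text, signal)
    return ControlDecision(confidence, matched, matched and confidence >= threshold)

if __name__ == "__main__":
    print(decide([0.2, -0.4, 0.3, 0.5], "turn on the living room light"))
```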
  • Patent number: 11250854
    Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: February 15, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia
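
As a rough illustration of the pipeline in the entry above, the NumPy sketch below computes a short-term amplitude spectrum by framing and an FFT, then applies a stand-in linear scorer in place of the trained type recognition model. The frame and hop sizes and the linear scorer are assumptions for the sketch, not the patented model.

```python
import numpy as np

def short_term_amplitude_spectrum(wave: np.ndarray,
                                  frame: int = 256, hop: int = 128) -> np.ndarray:
    """Framed magnitude spectrum: the kind of acoustic feature the abstract names."""
    frames = [wave[i:i + frame] * np.hanning(frame)
              for i in range(0, len(wave) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def intention_type(features: np.ndarray, weights: np.ndarray, bias: float) -> str:
    # Stand-in for the trained type recognition model: a linear scorer on the
    # mean spectrum. A real system would train a network on labelled
    # interaction / non-interaction speech.
    score = float(features.mean(axis=0) @ weights + bias)
    return "interaction" if score > 0.0 else "non-interaction"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = rng.standard_normal(4000)                 # stand-in voice data
    feats = short_term_amplitude_spectrum(wave)      # shape (n_frames, 129)
    w = rng.standard_normal(feats.shape[1]) * 0.01   # untrained toy weights
    if intention_type(feats, w, bias=0.0) == "interaction":
        print("perform the operation indicated by the voice data")
    else:
        print("ignore: speech not directed at the device")
```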
  • Publication number: 20210210112
    Abstract: A model evaluation method includes obtaining M first audio signals synthesized by a first speech synthesis model under evaluation, and obtaining N second audio signals generated through recording; performing voiceprint extraction on each of the M first audio signals to obtain M first voiceprint features; performing voiceprint extraction on each of the N second audio signals to obtain N second voiceprint features; clustering the M first voiceprint features to obtain K first central features; clustering the N second voiceprint features to obtain J second central features; computing the cosine distances between the K first central features and the J second central features and aggregating them to obtain a first distance; and evaluating the speech synthesis model based on the first distance.
    Type: Application
    Filed: March 18, 2021
    Publication date: July 8, 2021
    Inventors: Lin ZHENG, Changbin CHEN, Xiaokong MA, Yujuan SUN
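
The evaluation procedure above reduces to a few array operations. The sketch below uses a plain k-means and synthetic stand-ins for the voiceprint extractor's outputs, and aggregates the pairwise cosine distances between cluster centers by averaging; the abstract does not pin down the aggregation, so the mean is an assumption, as are the values of M, N, K, and J.

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain k-means returning the k cluster centers (the 'central features')."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def mean_cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average cosine distance over all pairs of centers from the two sets."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-ins for voiceprints of M synthesized and N recorded signals.
    synth_vp = rng.standard_normal((40, 64))                        # M first features
    real_vp = synth_vp[:30] + 0.1 * rng.standard_normal((30, 64))   # N second features
    first_distance = mean_cosine_distance(kmeans(synth_vp, 4), kmeans(real_vp, 3))
    # Lower distance -> synthesized voices cluster closer to real recordings.
    print(f"first distance: {first_distance:.3f}")
```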
  • Publication number: 20210158816
    Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.
    Type: Application
    Filed: June 8, 2020
    Publication date: May 27, 2021
    Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia
  • Publication number: 20210127003
    Abstract: The present disclosure provides an interactive voice-control method and apparatus, a device, and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.
    Type: Application
    Filed: September 24, 2020
    Publication date: April 29, 2021
    Inventors: Jinfeng BAI, Chuanlei ZHAI, Xu CHEN, Tao CHEN, Xiaokong MA, Ce ZHANG, Zhen WU, Xingyuan PENG, Zhijian WANG, Sheng QIAN, Guibin WANG, Lei JIA
  • Patent number: 10943582
    Abstract: A method and apparatus of training an acoustic feature extracting model, a device, and a computer storage medium. The method comprises: taking first acoustic features, extracted respectively from speech data corresponding to user identifiers, as training data; training an initial model based on a deep neural network under a minimum classification error criterion, until a preset first stop condition is reached; replacing the Softmax layer in the initial model with a triplet loss layer to constitute an acoustic feature extracting model, and continuing to train the acoustic feature extracting model until a preset second stop condition is reached, the acoustic feature extracting model being used to output a second acoustic feature of the speech data; wherein the triplet loss layer is used to maximize similarity between the second acoustic features of the same user, and minimize similarity between the second acoustic features of different users.
    Type: Grant
    Filed: May 14, 2018
    Date of Patent: March 9, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Bing Jiang, Xiaokong Ma, Chao Li, Xiangang Li
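
The second training stage described above hinges on the triplet objective. A minimal NumPy sketch of that objective follows, using cosine similarity as the similarity measure; the abstract says only "similarity", so cosine and the 0.2 margin are assumptions, and the first, Softmax-based stage is taken as given.

```python
import numpy as np

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    # Push same-user similarity above different-user similarity by `margin`:
    # the loss reaches zero once sim(a, p) >= sim(a, n) + margin.
    return max(0.0, margin - cosine_sim(anchor, positive) + cosine_sim(anchor, negative))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.standard_normal(128)                    # second acoustic feature, user A
    positive = anchor + 0.05 * rng.standard_normal(128)  # same user, another utterance
    negative = rng.standard_normal(128)                  # a different user
    print(f"triplet loss: {triplet_loss(anchor, positive, negative):.3f}")
```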
  • Patent number: 10515627
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: taking first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein the target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn the optimal acoustic features that achieve the training target. Compared with conventional acoustic feature extraction, which uses a preset feature type and transformation, the approach of the present disclosure achieves better flexibility and higher accuracy.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: December 24, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li
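
One natural use of embeddings trained toward the target above is pairwise speaker verification: score two second acoustic features by cosine similarity and accept when a tuned threshold is cleared. The sketch below uses random vectors in place of real model outputs, and the 0.7 threshold is illustrative rather than from the patent.

```python
import numpy as np

def same_user(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.7) -> bool:
    """Accept the pair as the same user when the cosine similarity of their
    second acoustic features clears a tuned threshold."""
    sim = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return bool(sim >= threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    enrolled = rng.standard_normal(128)                     # stored voiceprint
    probe_same = enrolled + 0.1 * rng.standard_normal(128)  # same speaker, new utterance
    probe_other = rng.standard_normal(128)                  # different speaker
    print(same_user(enrolled, probe_same), same_user(enrolled, probe_other))
```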
  • Publication number: 20180336889
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: taking first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein the target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn the optimal acoustic features that achieve the training target. Compared with conventional acoustic feature extraction, which uses a preset feature type and transformation, the approach of the present disclosure achieves better flexibility and higher accuracy.
    Type: Application
    Filed: May 15, 2018
    Publication date: November 22, 2018
    Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao LI, Xiaokong MA, Bing JIANG, Xiangang LI
  • Publication number: 20180336888
    Abstract: A method and apparatus of training an acoustic feature extracting model, a device, and a computer storage medium. The method comprises: taking first acoustic features, extracted respectively from speech data corresponding to user identifiers, as training data; training an initial model based on a deep neural network under a minimum classification error criterion, until a preset first stop condition is reached; replacing the Softmax layer in the initial model with a triplet loss layer to constitute an acoustic feature extracting model, and continuing to train the acoustic feature extracting model until a preset second stop condition is reached, the acoustic feature extracting model being used to output a second acoustic feature of the speech data; wherein the triplet loss layer is used to maximize similarity between the second acoustic features of the same user, and minimize similarity between the second acoustic features of different users.
    Type: Application
    Filed: May 14, 2018
    Publication date: November 22, 2018
    Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Bing JIANG, Xiaokong MA, Chao LI, Xiangang LI