Patents by Inventor Jinfeng BAI

Jinfeng BAI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Speech processing method and method for generating speech processing model

Patent number: 12118989

Abstract: The present disclosure provides a speech processing method, and a method for generating a speech processing model, related to a field of signal processing technologies. The speech processing method includes: obtaining M speech signals to be processed and N reference signals; performing sub-band decomposition on each of the M speech signals and each of the N reference signals to obtain frequency-band components in each speech signal and each reference signal; processing the frequency-band components in each speech signal and each reference signal by using an echo cancellation model, to obtain an ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal; and performing echo cancellation on each frequency-band component of each speech signal based on the ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal, to obtain M echo-cancelled speech signals.

Type: Grant

Filed: October 21, 2021

Date of Patent: October 15, 2024

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Xu Chen, Jinfeng Bai, Runqiang Han, Lei Jia
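
The final step of the abstract above, applying a ratio mask to each frequency-band component, can be illustrated with a minimal sketch. It assumes that sub-band decomposition has already been done and that the mask values come from some echo cancellation model; the function name `apply_ratio_mask` and the clamping to [0, 1] are illustrative assumptions, not details taken from the patent.

```python
from typing import List


def apply_ratio_mask(subbands: List[List[float]],
                     masks: List[List[float]]) -> List[List[float]]:
    """Scale every frequency-band component by its mask value.

    subbands[b][t] is the component of band b at frame t (hypothetical layout).
    masks[b][t] is the ratio mask for the same band and frame, assumed to be
    produced by an echo cancellation model that is not shown here.
    """
    if len(subbands) != len(masks):
        raise ValueError("one mask row is required per sub-band")
    cleaned = []
    for band, mask in zip(subbands, masks):
        if len(band) != len(mask):
            raise ValueError("mask and band must have the same number of frames")
        # Clamp the mask to [0, 1] (an assumption) before scaling the component.
        cleaned.append([x * min(max(m, 0.0), 1.0) for x, m in zip(band, mask)])
    return cleaned
```

A mask value near 1 keeps the component almost unchanged, while a value near 0 suppresses it as echo.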

Method and device for processing voice interaction, electronic device and storage medium

Patent number: 12112746

Abstract: The present disclosure provides a method and a device for processing voice interaction, an electronic device and a storage medium. The method includes: determining a first integrity of a voice instruction from a user by using a pre-trained integrity detection model in response to detecting that the voice instruction from the user is not a high-frequency instruction; determining a waiting duration for the voice instruction based on the first integrity and a preset integrity threshold, wherein the waiting duration for the voice instruction indicates a length of period between a time when a voice interaction device determines that receiving the voice instruction is completed and a time when the voice interaction device performs an operation in response to the voice instruction of the user; and controlling the voice interaction device to respond to the voice instruction of the user based on the waiting duration.

Type: Grant

Filed: September 15, 2021

Date of Patent: October 8, 2024

Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Jinfeng Bai, Zhijian Wang, Cong Gao
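
The abstract above derives a waiting duration from an integrity score and a preset threshold but does not give the mapping. The sketch below shows one plausible rule under stated assumptions: a complete-looking instruction gets a short wait and an incomplete-looking one gets a longer wait. The function name and both default durations are hypothetical.

```python
def waiting_duration(integrity: float,
                     threshold: float,
                     short_wait_s: float = 0.3,
                     long_wait_s: float = 1.5) -> float:
    """Return how long to wait (in seconds) before responding.

    integrity is assumed to be a score in [0, 1] from an integrity detection
    model. The two durations are illustrative values, not from the patent.
    """
    if not 0.0 <= integrity <= 1.0:
        raise ValueError("integrity must lie in [0, 1]")
    # Assumed rule: respond quickly when the instruction looks complete,
    # otherwise give the user more time to finish speaking.
    return short_wait_s if integrity >= threshold else long_wait_s
```

For example, with a threshold of 0.7, a score of 0.9 yields the short wait and a score of 0.4 yields the long one.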

Method and apparatus for speech interaction, and computer storage medium

Patent number: 11830482

Abstract: Embodiments of the present disclosure relate to a method and an apparatus for speech interaction, and a computer readable storage medium. The method may include determining text information corresponding to a received speech signal. The method also includes obtaining label information of the text information by labeling elements in the text information. In addition, the method further includes determining first intention information of the text information based on the label information. The method further includes determining a semantic of the text information based on the first intention information and the label information.

Type: Grant

Filed: June 8, 2020

Date of Patent: November 28, 2023

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD

Inventors: Zhen Wu, Yufang Wu, Hua Liang, Jiaxiang Ge, Xingyuan Peng, Jinfeng Bai, Lei Jia

Control method and control apparatus for speech interaction, storage medium and system

Patent number: 11823662

Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.

Type: Grant

Filed: January 26, 2021

Date of Patent: November 21, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia

Method and apparatus for recognizing voice

Patent number: 11735168

Abstract: A method and an apparatus for recognizing a voice are provided. The method may include: inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, the recognition network including a plurality of preset types of processing layers, and at least one type of processing layer of the recognition network being obtained by training based on a voice sample in a preset direction interval; and determining a voice recognition result of the target voice, based on the initial text.

Type: Grant

Filed: March 23, 2021

Date of Patent: August 22, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia

Speech recognition method, device, and computer-readable storage medium

Patent number: 11620983

Abstract: The disclosure provides a speech recognition method, a device and a computer-readable storage medium. The method includes obtaining a first voice signal collected from a first microphone in a microphone array and a second voice signal collected from a second microphone in the microphone array, the microphone array including at least two microphones, such as two, three or six microphones. The method further includes extracting enhanced features associated with the first voice signal and the second voice signal through a neural network, and obtaining a speech recognition result based on the enhanced features extracted.

Type: Grant

Filed: August 10, 2020

Date of Patent: April 4, 2023

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD

Inventors: Ce Zhang, Bin Huang, Xin Li, Jinfeng Bai, Xu Chen, Lei Jia

Control method and control apparatus for speech interaction

Patent number: 11615784

Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.

Type: Grant

Filed: December 11, 2020

Date of Patent: March 28, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia

Interactive voice-control method and apparatus, device and medium

Patent number: 11503155

Abstract: The present disclosure discloses an interactive voice-control method and apparatus, a device and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based at least on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.

Type: Grant

Filed: September 24, 2020

Date of Patent: November 15, 2022

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventors: Jinfeng Bai, Chuanlei Zhai, Xu Chen, Tao Chen, Xiaokong Ma, Ce Zhang, Zhen Wu, Xingyuan Peng, Zhijian Wang, Sheng Qian, Guibin Wang, Lei Jia

Training method and apparatus for a speech synthesis model, and storage medium

Patent number: 11488577

Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, electronic device, and storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.

Type: Grant

Filed: June 19, 2020

Date of Patent: November 1, 2022

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventors: Zhipeng Chen, Jinfeng Bai, Lei Jia
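
The fusion step in the abstract above combines the encoded syllable, phoneme and Chinese character sequences into one weighted combination. A minimal sketch of such a fusion follows, assuming the three encodings share the same shape and that the fusion is a fixed elementwise weighted sum; the function name and the weights are hypothetical, and a trained model would typically learn its fusion rather than use constants.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_encodings(syllable: Matrix, phoneme: Matrix, character: Matrix,
                   weights: Sequence[float] = (0.5, 0.25, 0.25)) -> Matrix:
    """Elementwise weighted sum of three encoded sequences.

    Each matrix is indexed as [time step][feature]; all three are assumed to
    have identical shapes. The weights are illustrative constants.
    """
    w_s, w_p, w_c = weights
    if not (len(syllable) == len(phoneme) == len(character)):
        raise ValueError("all encodings must have the same number of time steps")
    fused = []
    for s_row, p_row, c_row in zip(syllable, phoneme, character):
        if not (len(s_row) == len(p_row) == len(c_row)):
            raise ValueError("all encodings must have the same feature size")
        fused.append([w_s * s + w_p * p + w_c * c
                      for s, p, c in zip(s_row, p_row, c_row)])
    return fused
```

The fused matrix would then be passed to the attention module described in the abstract, which is outside the scope of this sketch.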

Method, apparatus, device and computer-readable storage medium for voice interaction

Patent number: 11393490

Abstract: According to embodiments of the present disclosure, a method, apparatus, device, and computer readable storage medium for voice interaction are provided. The method includes: determining a text corresponding to the voice signal based on a voice feature of a received voice signal. The method further includes: determining, based on the voice feature and the text, a matching degree between a reference voice feature of an element in the text and a target voice feature of the element. The method further includes: determining a first possibility that the voice signal is an executable command based on the text. The method further includes: determining a second possibility that the voice signal is the executable command based on the voice feature.

Type: Grant

Filed: June 8, 2020

Date of Patent: July 19, 2022

Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors: Zhijian Wang, Jinfeng Bai, Sheng Qian, Lei Jia

Method, apparatus, and medium for processing speech signal

Patent number: 11322151

Abstract: According to embodiments of the disclosure, a method and an apparatus for processing a speech signal, and a computer-readable storage medium are provided. The method includes obtaining a set of speech feature representations of a speech signal received. The method also includes generating a set of source text feature representations based on a text recognized from the speech signal, each source text feature representation corresponding to an element in the text. The method also includes generating a set of target text feature representations based on the set of speech feature representations and the set of source text feature representations. The method also includes determining a match degree between the set of target text feature representations and a set of reference text feature representations predefined for the text, the match degree indicating an accuracy of recognizing of the text.

Type: Grant

Filed: June 22, 2020

Date of Patent: May 3, 2022

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD

Inventors: Chuanlei Zhai, Xu Chen, Jinfeng Bai, Lei Jia

Method and apparatus for voice interaction, device and computer-readable storage medium

Patent number: 11250854

Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.

Type: Grant

Filed: June 8, 2020

Date of Patent: February 15, 2022

Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia

SPEECH PROCESSING METHOD AND METHOD FOR GENERATING SPEECH PROCESSING MODEL

Publication number: 20220044678

Abstract: The present disclosure provides a speech processing method, and a method for generating a speech processing model, related to a field of signal processing technologies. The speech processing method includes: obtaining M speech signals to be processed and N reference signals; performing sub-band decomposition on each of the M speech signals and each of the N reference signals to obtain frequency-band components in each speech signal and each reference signal; processing the frequency-band components in each speech signal and each reference signal by using an echo cancellation model, to obtain an ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal; and performing echo cancellation on each frequency-band component of each speech signal based on the ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal, to obtain M echo-cancelled speech signals.

Type: Application

Filed: October 21, 2021

Publication date: February 10, 2022

Inventors: Xu CHEN, Jinfeng BAI, Runqiang HAN, Lei JIA

METHOD AND DEVICE FOR PROCESSING VOICE INTERACTION, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20220005474

Abstract: The present disclosure provides a method and a device for processing voice interaction, an electronic device and a storage medium. The method includes: determining a first integrity of a voice instruction from a user by using a pre-trained integrity detection model in response to detecting that the voice instruction from the user is not a high-frequency instruction; determining a waiting duration for the voice instruction based on the first integrity and a preset integrity threshold, wherein the waiting duration for the voice instruction indicates a length of period between a time when a voice interaction device determines that receiving the voice instruction is completed and a time when the voice interaction device performs an operation in response to the voice instruction of the user; and controlling the voice interaction device to respond to the voice instruction of the user based on the waiting duration.

Type: Application

Filed: September 15, 2021

Publication date: January 6, 2022

Inventors: Jinfeng BAI, Zhijian WANG, Cong GAO

CONTROL METHOD AND CONTROL APPARATUS FOR SPEECH INTERACTION

Publication number: 20210407494

Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.

Type: Application

Filed: December 11, 2020

Publication date: December 30, 2021

Inventors: Cong GAO, Saisai ZOU, Jinfeng BAI, Lei JIA

CONTROL METHOD AND CONTROL APPARATUS FOR SPEECH INTERACTION, STORAGE MEDIUM AND SYSTEM

Publication number: 20210407496

Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.

Type: Application

Filed: January 26, 2021

Publication date: December 30, 2021

Inventors: Cong GAO, Saisai ZOU, Jinfeng BAI, Lei JIA

METHOD FOR PROCESSING SPEECH SIGNAL, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20210319802

Abstract: The disclosure provides a method for processing a speech signal, an electronic device and a storage medium. The method includes: obtaining a speech signal to be processed and a reference speech signal; obtaining a frequency-domain speech signal to be processed and a reference frequency-domain speech signal by respectively preprocessing the speech signal to be processed and the reference speech signal; obtaining a frequency-domain speech signal ratio by inputting the frequency-domain speech signal to be processed and the reference frequency-domain speech signal into a complex neural network model; and obtaining a target frequency-domain speech signal based on the frequency-domain speech signal ratio and the frequency-domain speech signal to be processed, and obtaining a target speech signal by processing the target frequency-domain speech signal.

Type: Application

Filed: June 8, 2021

Publication date: October 14, 2021

Inventor: Jinfeng BAI
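
The abstract above obtains a target frequency-domain signal from a frequency-domain ratio and the signal to be processed, without stating the operation. One common reading is a per-bin complex multiplication, sketched below under that assumption. The ratio values stand in for the output of the complex neural network model, which is not shown, and the function name is hypothetical.

```python
from typing import List


def apply_complex_ratio(spectrum: List[complex],
                        ratio: List[complex]) -> List[complex]:
    """Multiply each frequency bin by its complex ratio.

    spectrum holds the frequency-domain signal to be processed, one complex
    value per bin. ratio is assumed to come from a complex neural network.
    A complex ratio can change both the magnitude and the phase of a bin.
    """
    if len(spectrum) != len(ratio):
        raise ValueError("spectrum and ratio must have the same number of bins")
    return [r * x for r, x in zip(ratio, spectrum)]
```

An inverse transform (not shown) would then convert the resulting spectrum back into a time-domain target speech signal.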

METHOD AND APPARATUS FOR RECOGNIZING VOICE

Publication number: 20210233518

Abstract: A method and an apparatus for recognizing a voice are provided. The method may include: inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, the recognition network including a plurality of preset types of processing layers, and at least one type of processing layer of the recognition network being obtained by training based on a voice sample in a preset direction interval; and determining a voice recognition result of the target voice, based on the initial text.

Type: Application

Filed: March 23, 2021

Publication date: July 29, 2021

Inventors: Xin LI, Bin HUANG, Ce ZHANG, Jinfeng BAI, Lei JIA

METHOD AND APPARATUS FOR DETECTING VOICE

Publication number: 20210210113

Abstract: The present disclosure provides a method and apparatus for detecting a voice, relates to the fields of voice processing and deep learning technology. The method may include: acquiring a target voice; and inputting the target voice into a pre-trained deep neural network to obtain whether the target voice has a sub-voice in each of a plurality of preset direction intervals, the deep neural network being used to predict whether the voice has a sub-voice in each of the plurality of direction intervals.

Type: Application

Filed: March 22, 2021

Publication date: July 8, 2021

Inventors: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia
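
The abstract above describes a network that predicts, for each preset direction interval, whether a sub-voice is present. The sketch below illustrates only the output side of that idea, assuming the network emits one independent probability per interval and that the intervals split 360 degrees evenly. Both assumptions, the 0.5 threshold and the function names are illustrative and not taken from the application.

```python
from typing import List, Tuple


def interval_bounds(n_intervals: int) -> List[Tuple[float, float]]:
    """Split 360 degrees into equal direction intervals (an assumed layout)."""
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    width = 360.0 / n_intervals
    return [(i * width, (i + 1) * width) for i in range(n_intervals)]


def detect_sub_voices(probabilities: List[float],
                      threshold: float = 0.5) -> List[bool]:
    """Turn per-interval probabilities into presence decisions.

    Each interval is decided independently, so several intervals can be
    active at once (for example, two speakers in different directions).
    """
    return [p >= threshold for p in probabilities]
```

Pairing the two results, for instance with `zip(interval_bounds(4), detect_sub_voices(probs))`, gives the angular ranges in which a voice was detected.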

SPEECH RECOGNITION METHOD, DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM

Publication number: 20210158799

Abstract: The disclosure provides a speech recognition method, a device and a computer-readable storage medium. The method includes obtaining a first voice signal collected from a first microphone in a microphone array and a second voice signal collected from a second microphone in the microphone array, the microphone array including at least two microphones, such as two, three or six microphones. The method further includes extracting enhanced features associated with the first voice signal and the second voice signal through a neural network, and obtaining a speech recognition result based on the enhanced features extracted.

Type: Application

Filed: August 10, 2020

Publication date: May 27, 2021

Inventors: Ce ZHANG, Bin Huang, Xin Li, Jinfeng Bai, Xu Chen, Lei Jia
