Patents by Inventor Jinfeng BAI

Jinfeng BAI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12118989
    Abstract: The present disclosure provides a speech processing method and a method for generating a speech processing model, relating to the field of signal processing technologies. The speech processing method includes: obtaining M speech signals to be processed and N reference signals; performing sub-band decomposition on each of the M speech signals and each of the N reference signals to obtain frequency-band components in each speech signal and each reference signal; processing the frequency-band components in each speech signal and each reference signal by using an echo cancellation model, to obtain an ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal; and performing echo cancellation on each frequency-band component of each speech signal based on the ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal, to obtain M echo-cancelled speech signals.
    Type: Grant
    Filed: October 21, 2021
    Date of Patent: October 15, 2024
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Xu Chen, Jinfeng Bai, Runqiang Han, Lei Jia
  • Patent number: 12112746
    Abstract: The present disclosure provides a method and a device for processing voice interaction, an electronic device and a storage medium. The method includes: determining a first integrity of a voice instruction from a user by using a pre-trained integrity detection model in response to detecting that the voice instruction from the user is not a high-frequency instruction; determining a waiting duration for the voice instruction based on the first integrity and a preset integrity threshold, wherein the waiting duration for the voice instruction indicates the length of the period between a time when a voice interaction device determines that receiving the voice instruction is completed and a time when the voice interaction device performs an operation in response to the voice instruction of the user; and controlling the voice interaction device to respond to the voice instruction of the user based on the waiting duration.
    Type: Grant
    Filed: September 15, 2021
    Date of Patent: October 8, 2024
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Jinfeng Bai, Zhijian Wang, Cong Gao
  • Patent number: 11830482
    Abstract: Embodiments of the present disclosure relate to a method and an apparatus for speech interaction, and a computer readable storage medium. The method may include determining text information corresponding to a received speech signal. The method also includes obtaining label information of the text information by labeling elements in the text information. In addition, the method further includes determining first intention information of the text information based on the label information. The method further includes determining semantics of the text information based on the first intention information and the label information.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: November 28, 2023
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Zhen Wu, Yufang Wu, Hua Liang, Jiaxiang Ge, Xingyuan Peng, Jinfeng Bai, Lei Jia
  • Patent number: 11823662
    Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
    Type: Grant
    Filed: January 26, 2021
    Date of Patent: November 21, 2023
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia
  • Patent number: 11735168
    Abstract: A method and an apparatus for recognizing a voice are provided. The method may include: inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, the recognition network including a plurality of preset types of processing layers, and at least one type of processing layer of the recognition network being obtained by training based on a voice sample in a preset direction interval; and determining a voice recognition result of the target voice, based on the initial text.
    Type: Grant
    Filed: March 23, 2021
    Date of Patent: August 22, 2023
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia
  • Patent number: 11620983
    Abstract: The disclosure provides a speech recognition method, a device and a computer-readable storage medium. The method includes obtaining a first voice signal collected from a first microphone in a microphone array and a second voice signal collected from a second microphone in the microphone array, the microphone array including at least two microphones, such as two, three or six microphones. The method further includes extracting enhanced features associated with the first voice signal and the second voice signal through a neural network, and obtaining a speech recognition result based on the enhanced features extracted.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: April 4, 2023
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Ce Zhang, Bin Huang, Xin Li, Jinfeng Bai, Xu Chen, Lei Jia
  • Patent number: 11615784
    Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
    Type: Grant
    Filed: December 11, 2020
    Date of Patent: March 28, 2023
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia
  • Patent number: 11503155
    Abstract: The present disclosure discloses an interactive voice-control method and apparatus, a device and a medium. The method includes: obtaining a sound signal at a voice interaction device and recognized information that is recognized from the sound signal; determining an interaction confidence of the sound signal based at least on at least one of an acoustic feature representation of the sound signal and a semantic feature representation associated with the recognized information; determining a matching status between the recognized information and the sound signal; and providing the interaction confidence and the matching status for controlling a response of the voice interaction device to the sound signal.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: November 15, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Jinfeng Bai, Chuanlei Zhai, Xu Chen, Tao Chen, Xiaokong Ma, Ce Zhang, Zhen Wu, Xingyuan Peng, Zhijian Wang, Sheng Qian, Guibin Wang, Lei Jia
  • Patent number: 11488577
    Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, an electronic device, and a storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; and taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: November 1, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Zhipeng Chen, Jinfeng Bai, Lei Jia
  • Patent number: 11393490
    Abstract: According to embodiments of the present disclosure, a method, apparatus, device, and computer readable storage medium for voice interaction are provided. The method includes: determining, based on a voice feature of a received voice signal, a text corresponding to the voice signal. The method further includes: determining, based on the voice feature and the text, a matching degree between a reference voice feature of an element in the text and a target voice feature of the element. The method further includes: determining a first possibility that the voice signal is an executable command based on the text. The method further includes: determining a second possibility that the voice signal is the executable command based on the voice feature.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: July 19, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Zhijian Wang, Jinfeng Bai, Sheng Qian, Lei Jia
  • Patent number: 11322151
    Abstract: According to embodiments of the disclosure, a method and an apparatus for processing a speech signal, and a computer-readable storage medium are provided. The method includes obtaining a set of speech feature representations of a received speech signal. The method also includes generating a set of source text feature representations based on a text recognized from the speech signal, each source text feature representation corresponding to an element in the text. The method also includes generating a set of target text feature representations based on the set of speech feature representations and the set of source text feature representations. The method also includes determining a match degree between the set of target text feature representations and a set of reference text feature representations predefined for the text, the match degree indicating the accuracy of the recognition of the text.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: May 3, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Chuanlei Zhai, Xu Chen, Jinfeng Bai, Lei Jia
  • Patent number: 11250854
    Abstract: A method, apparatus, device, and storage medium for voice interaction. A specific embodiment of the method includes: extracting an acoustic feature from received voice data, the acoustic feature indicating a short-term amplitude spectrum characteristic of the voice data; applying the acoustic feature to a type recognition model to determine an intention type of the voice data, the intention type being one of an interaction intention type and a non-interaction intention type, and the type recognition model being constructed based on the acoustic feature of training voice data; and performing an interaction operation indicated by the voice data, based on determining that the intention type is the interaction intention type.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: February 15, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Xiaokong Ma, Ce Zhang, Jinfeng Bai, Lei Jia
  • Publication number: 20220044678
    Abstract: The present disclosure provides a speech processing method and a method for generating a speech processing model, relating to the field of signal processing technologies. The speech processing method includes: obtaining M speech signals to be processed and N reference signals; performing sub-band decomposition on each of the M speech signals and each of the N reference signals to obtain frequency-band components in each speech signal and each reference signal; processing the frequency-band components in each speech signal and each reference signal by using an echo cancellation model, to obtain an ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal; and performing echo cancellation on each frequency-band component of each speech signal based on the ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal, to obtain M echo-cancelled speech signals.
    Type: Application
    Filed: October 21, 2021
    Publication date: February 10, 2022
    Inventors: Xu CHEN, Jinfeng BAI, Runqiang HAN, Lei JIA
  • Publication number: 20220005474
    Abstract: The present disclosure provides a method and a device for processing voice interaction, an electronic device and a storage medium. The method includes: determining a first integrity of a voice instruction from a user by using a pre-trained integrity detection model in response to detecting that the voice instruction from the user is not a high-frequency instruction; determining a waiting duration for the voice instruction based on the first integrity and a preset integrity threshold, wherein the waiting duration for the voice instruction indicates the length of the period between a time when a voice interaction device determines that receiving the voice instruction is completed and a time when the voice interaction device performs an operation in response to the voice instruction of the user; and controlling the voice interaction device to respond to the voice instruction of the user based on the waiting duration.
    Type: Application
    Filed: September 15, 2021
    Publication date: January 6, 2022
    Inventors: Jinfeng BAI, Zhijian WANG, Cong GAO
  • Publication number: 20210407494
    Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
    Type: Application
    Filed: December 11, 2020
    Publication date: December 30, 2021
    Inventors: Cong GAO, Saisai ZOU, Jinfeng BAI, Lei JIA
  • Publication number: 20210407496
    Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
    Type: Application
    Filed: January 26, 2021
    Publication date: December 30, 2021
    Inventors: Cong GAO, Saisai ZOU, Jinfeng BAI, Lei JIA
  • Publication number: 20210319802
    Abstract: The disclosure provides a method for processing a speech signal, an electronic device and a storage medium. The method includes: obtaining a speech signal to be processed and a reference speech signal; obtaining a frequency-domain speech signal to be processed and a reference frequency-domain speech signal by respectively preprocessing the speech signal to be processed and the reference speech signal; obtaining a frequency-domain speech signal ratio by inputting the frequency-domain speech signal to be processed and the reference frequency-domain speech signal into a complex neural network model; and obtaining a target frequency-domain speech signal based on the frequency-domain speech signal ratio and the frequency-domain speech signal to be processed, and obtaining a target speech signal by processing the target frequency-domain speech signal.
    Type: Application
    Filed: June 8, 2021
    Publication date: October 14, 2021
    Inventor: Jinfeng BAI
  • Publication number: 20210233518
    Abstract: A method and an apparatus for recognizing a voice are provided. The method may include: inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, the recognition network including a plurality of preset types of processing layers, and at least one type of processing layer of the recognition network being obtained by training based on a voice sample in a preset direction interval; and determining a voice recognition result of the target voice, based on the initial text.
    Type: Application
    Filed: March 23, 2021
    Publication date: July 29, 2021
    Inventors: Xin LI, Bin HUANG, Ce ZHANG, Jinfeng BAI, Lei JIA
  • Publication number: 20210210113
    Abstract: The present disclosure provides a method and apparatus for detecting a voice, and relates to the fields of voice processing and deep learning technology. The method may include: acquiring a target voice; and inputting the target voice into a pre-trained deep neural network to determine whether the target voice has a sub-voice in each of a plurality of preset direction intervals, the deep neural network being used to predict whether the voice has a sub-voice in each of the plurality of direction intervals.
    Type: Application
    Filed: March 22, 2021
    Publication date: July 8, 2021
    Inventors: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia
  • Publication number: 20210158799
    Abstract: The disclosure provides a speech recognition method, a device and a computer-readable storage medium. The method includes obtaining a first voice signal collected from a first microphone in a microphone array and a second voice signal collected from a second microphone in the microphone array, the microphone array including at least two microphones, such as two, three or six microphones. The method further includes extracting enhanced features associated with the first voice signal and the second voice signal through a neural network, and obtaining a speech recognition result based on the enhanced features extracted.
    Type: Application
    Filed: August 10, 2020
    Publication date: May 27, 2021
    Inventors: Ce ZHANG, Bin HUANG, Xin LI, Jinfeng BAI, Xu CHEN, Lei JIA