Patents by Inventor Xuefei Gong

Xuefei Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11875775
    Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to get rid of dependence on parallel text and resolve a technical problem that it is difficult to achieve speech conversion under conditions that resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the speaker is independent; and bottleneck features of audio are more abstract as compared with phonetic posteriorGram features, can reflect decoupling of spoken content and timbre of the speaker, and meanwhile are not closely bound with a phoneme class, and are not in a clear one-to-one correspondence relationship. In this way, a problem of inaccurate pronunciation caused by a recognition error in ASR is relieved to some extent.
    Type: Grant
    Filed: April 20, 2021
    Date of Patent: January 16, 2024
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
  • Patent number: 11763801
    Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram PPG classification network model to obtain a PPG feature vector, where the PPG feature vector is used for indicating a phoneme label corresponding to each frame of the source audio, and the PPG feature vector contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
    Type: Grant
    Filed: August 29, 2022
    Date of Patent: September 19, 2023
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
  • Publication number: 20230197061
    Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram PPG classification network model to obtain a PPG feature vector, where the PPG feature vector is used for indicating a phoneme label corresponding to each frame of the source audio, and the PPG feature vector contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
    Type: Application
    Filed: August 29, 2022
    Publication date: June 22, 2023
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
  • Publication number: 20220310063
    Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to get rid of dependence on parallel text and resolve a technical problem that it is difficult to achieve speech conversion under conditions that resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the speaker is independent; and bottleneck features of audio are more abstract as compared with phonetic posteriorGram features, can reflect decoupling of spoken content and timbre of the speaker, and meanwhile are not closely bound with a phoneme class, and are not in a clear one-to-one correspondence relationship. In this way, a problem of inaccurate pronunciation caused by a recognition error in ASR is relieved to some extent.
    Type: Application
    Filed: April 20, 2021
    Publication date: September 29, 2022
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong