Patents by Inventor Xuefei Gong

Xuefei Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Voice conversion system and training method therefor

Patent number: 11875775

Abstract: The present disclosure proposes a speech conversion scheme trained on a non-parallel corpus, which removes the dependence on parallel text and addresses the technical problem that speech conversion is difficult to achieve when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, the embodiments of the present disclosure provide the following: a trained speaker-independent automatic speech recognition (ASR) model can be used for any source speaker, that is, it is speaker-independent; and the bottleneck features of audio are more abstract than phonetic posteriorgram (PPG) features, reflect the decoupling of spoken content from the speaker's timbre, and are not closely bound to phoneme classes or in a strict one-to-one correspondence with them. In this way, the problem of inaccurate pronunciation caused by ASR recognition errors is mitigated to some extent.

Type: Grant

Filed: April 20, 2021

Date of Patent: January 16, 2024

Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.

Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
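
The non-parallel training idea in the abstract of patent 11875775 can be illustrated with a minimal Python sketch. It assumes, as in common recognition-synthesis voice conversion designs, that a decoder is trained to reconstruct a target speaker's features from speaker-independent bottleneck features; the abstract itself does not describe the decoder. The "ASR encoder" below is a hypothetical toy (per-utterance mean subtraction), and `ToyDecoder` with its constant-offset model is likewise invented for illustration, not the patented method.

```python
from typing import List

Frame = List[float]


def bottleneck_features(utterance: List[Frame]) -> List[Frame]:
    """Toy stand-in for a speaker-independent ASR encoder (hypothetical).

    Subtracts the per-utterance mean of each feature dimension, which removes
    a constant speaker-specific offset (our toy notion of timbre) and keeps
    only the content contour. A real system would take activations from a
    bottleneck layer of a trained ASR network instead.
    """
    dims = len(utterance[0])
    mean = [sum(frame[d] for frame in utterance) / len(utterance) for d in range(dims)]
    return [[frame[d] - mean[d] for d in range(dims)] for frame in utterance]


class ToyDecoder:
    """Hypothetical decoder mapping bottleneck features to one target speaker.

    Training sees only the target speaker's own recordings, so no parallel
    (same-text) source/target pairs are required.
    """

    def __init__(self) -> None:
        self.offset: List[float] = []

    def fit(self, target_utterances: List[List[Frame]]) -> "ToyDecoder":
        # Reconstruction objective: target_frame ~= bottleneck_frame + offset.
        # For a constant offset, the least-squares solution is the mean residual.
        residuals = []
        for utt in target_utterances:
            for frame, bn in zip(utt, bottleneck_features(utt)):
                residuals.append([x - y for x, y in zip(frame, bn)])
        dims = len(residuals[0])
        self.offset = [sum(r[d] for r in residuals) / len(residuals) for d in range(dims)]
        return self

    def convert(self, source_utterance: List[Frame]) -> List[Frame]:
        # Any source speaker: extract speaker-independent features, then decode.
        return [[x + o for x, o in zip(bn, self.offset)]
                for bn in bottleneck_features(source_utterance)]
```

For example, fitting on the target utterances `[[5.0], [7.0]]` and `[[4.0], [8.0]]` learns an offset of `[6.0]`, after which two different "source speakers", `[[0.0], [2.0], [1.0]]` and `[[100.0], [102.0], [101.0]]`, both convert to `[[5.0], [7.0], [6.0]]`. This mirrors, in toy form, the speaker-independence claim: the conversion depends only on the content features, not on who spoke them.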

Method and system for outputting target audio, readable storage medium, and electronic device

Patent number: 11763801

Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram (PPG) classification network model to obtain a PPG feature vector, where the PPG feature vector indicates the phoneme label corresponding to each frame of the source audio and contains the text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model and outputting an acoustic feature vector of the target audio based on the phoneme labels corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, which outputs the target audio.

Type: Grant

Filed: August 29, 2022

Date of Patent: September 19, 2023

Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.

Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
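
The three-stage pipeline in the abstract of patent 11763801 (PPG classification network, voice conversion network, voice coder) can be illustrated with the minimal Python sketch below. All three models are hypothetical stand-ins, not the patented networks: a linear layer with softmax plays the PPG classifier, a per-phoneme lookup table plays the conversion model, and the "vocoder" merely flattens acoustic frames instead of synthesizing a waveform. The phoneme inventory, weights, and timbre names are invented for illustration. The sketch shows only the data flow: one PPG computed from the source audio drives several target timbres.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]

# Hypothetical phoneme inventory, for illustration only.
PHONEMES = ["sil", "a", "b"]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ppg_classifier(frames: List[Frame], weights: List[Frame]) -> List[List[float]]:
    """Stage 1 (stand-in): one posterior over PHONEMES per source frame."""
    return [softmax([sum(w * x for w, x in zip(row, frame)) for row in weights])
            for frame in frames]


def conversion_model(ppg: List[List[float]],
                     timbre_table: Dict[str, Dict[str, Frame]],
                     timbre: str) -> List[Frame]:
    """Stage 2 (stand-in): take the most probable phoneme label per frame and
    emit the acoustic feature vector stored for that label in one timbre."""
    acoustic = []
    for posterior in ppg:
        label = PHONEMES[max(range(len(posterior)), key=posterior.__getitem__)]
        acoustic.append(timbre_table[timbre][label])
    return acoustic


def vocoder(acoustic: List[Frame]) -> List[float]:
    """Stage 3 (stand-in): a real vocoder synthesizes a waveform; here we
    simply concatenate the acoustic frames."""
    return [x for frame in acoustic for x in frame]


def output_target_audio(frames, weights, timbre_table, timbres):
    ppg = ppg_classifier(frames, weights)  # computed once from the source audio
    return {t: vocoder(conversion_model(ppg, timbre_table, t)) for t in timbres}
```

For example, with `frames = [[1.0, 0.0], [0.0, 1.0]]`, `weights = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]`, and a table mapping `"a"` and `"b"` to `[1.0]`/`[2.0]` for `"alto"` and `[10.0]`/`[20.0]` for `"tenor"`, the two frames are labeled `"a"` and `"b"`, and the call returns `{"alto": [1.0, 2.0], "tenor": [10.0, 20.0]}`, one output per timbre.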

Method and System for Outputting Target Audio, Readable Storage Medium, and Electronic Device

Publication number: 20230197061

Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram (PPG) classification network model to obtain a PPG feature vector, where the PPG feature vector indicates the phoneme label corresponding to each frame of the source audio and contains the text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model and outputting an acoustic feature vector of the target audio based on the phoneme labels corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, which outputs the target audio.

Type: Application

Filed: August 29, 2022

Publication date: June 22, 2023

Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong

Voice conversion system and training method therefor

Publication number: 20220310063

Abstract: The present disclosure proposes a speech conversion scheme trained on a non-parallel corpus, which removes the dependence on parallel text and addresses the technical problem that speech conversion is difficult to achieve when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, the embodiments of the present disclosure provide the following: a trained speaker-independent automatic speech recognition (ASR) model can be used for any source speaker, that is, it is speaker-independent; and the bottleneck features of audio are more abstract than phonetic posteriorgram (PPG) features, reflect the decoupling of spoken content from the speaker's timbre, and are not closely bound to phoneme classes or in a strict one-to-one correspondence with them. In this way, the problem of inaccurate pronunciation caused by ASR recognition errors is mitigated to some extent.

Type: Application

Filed: April 20, 2021

Publication date: September 29, 2022

Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
