Patents by Inventor Yongguo KANG

Yongguo KANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230081543
    Abstract: A method for synthesizing speech includes: obtaining a source speech; suppressing noise in the source speech based on an amplitude component and/or a phase component of the source speech, to obtain a noise-reduced speech; performing speech recognition on the noise-reduced speech to obtain corresponding text information; inputting the text information of the noise-reduced speech and a preset tag into a trained acoustic model to obtain a predicted acoustic feature matching the text information; and generating a target speech based on the predicted acoustic feature. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: November 21, 2022
    Publication date: March 16, 2023
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Bo Peng, Yongguo Kang, Cong Gao
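
Illustrative sketch (not taken from the patent): one plausible, amplitude-only rendering of the pipeline the abstract describes. The spectral gate below is a crude stand-in for the noise-suppression step, and `recognize`, `acoustic_model`, and `vocoder` are hypothetical placeholders for the trained components.

```python
import numpy as np

def spectral_gate(speech, frame=512, hop=256, floor=0.1):
    """Crudely suppress noise by attenuating low-energy magnitude bins per frame."""
    out = np.zeros(len(speech))
    win = np.hanning(frame)
    for start in range(0, len(speech) - frame, hop):
        spec = np.fft.rfft(speech[start:start + frame] * win)
        mag, phase = np.abs(spec), np.angle(spec)
        noise_floor = np.percentile(mag, 20)                  # rough noise estimate
        mag = np.maximum(mag - noise_floor, floor * mag)      # amplitude-domain suppression
        out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * phase)) * win
    return out

def synthesize(source_speech, preset_tag, recognize, acoustic_model, vocoder):
    denoised = spectral_gate(source_speech)        # noise-reduced speech
    text = recognize(denoised)                     # speech recognition -> text information
    features = acoustic_model(text, preset_tag)    # predicted acoustic feature
    return vocoder(features)                       # target speech
```
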
  • Publication number: 20220383876
    Abstract: A method of converting a speech, an electronic device, and a readable storage medium are provided, which relate to the field of artificial intelligence technology such as speech and deep learning, and in particular to speech conversion technology. The method of converting a speech includes: acquiring a first speech of a target speaker; acquiring a speech of an original speaker; extracting a first feature parameter of the first speech of the target speaker; extracting a second feature parameter of the speech of the original speaker; processing the first feature parameter and the second feature parameter to obtain Mel spectrum information; and converting the Mel spectrum information to output a second speech of the target speaker having a tone identical to that of the first speech of the target speaker and a content identical to that of the speech of the original speaker. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: August 9, 2022
    Publication date: December 1, 2022
    Inventors: Yixiang CHEN, Junchao WANG, Yongguo KANG
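
Illustrative sketch (an assumed PyTorch architecture, not the patent's model): the first feature parameter is treated as a tone/speaker vector extracted from the target speaker's reference speech, the second as frame-level content features from the original speaker's speech, and the two are combined to predict Mel spectrum information for a separate vocoder to convert into the output waveform.

```python
import torch
import torch.nn as nn

class MelConverter(nn.Module):
    def __init__(self, content_dim=256, speaker_dim=128, n_mels=80):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(content_dim + speaker_dim, 512),
            nn.ReLU(),
            nn.Linear(512, n_mels),
        )

    def forward(self, content, speaker):
        # content: (batch, frames, content_dim) -- from the original speaker's speech
        # speaker: (batch, speaker_dim)         -- from the target speaker's first speech
        speaker = speaker.unsqueeze(1).expand(-1, content.size(1), -1)
        return self.proj(torch.cat([content, speaker], dim=-1))  # (batch, frames, n_mels)

content = torch.randn(1, 200, 256)   # second feature parameter (content)
speaker = torch.randn(1, 128)        # first feature parameter (tone / speaker identity)
mel = MelConverter()(content, speaker)
print(mel.shape)                     # torch.Size([1, 200, 80])
```
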
  • Publication number: 20220301545
    Abstract: A method for speech generation includes: acquiring speech information of an original speaker; performing text feature extraction on the speech information to obtain a text feature corresponding to the speech information; converting the text feature to an acoustic feature corresponding to a target speaker; and generating a target speech signal based on the acoustic feature. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: June 1, 2022
    Publication date: September 22, 2022
    Inventors: Yongguo KANG, Junchao WANG
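
Illustrative sketch (assumptions only): the four steps of the abstract written as a small pipeline; all three callables are hypothetical placeholders for the trained modules.

```python
from typing import Callable, Sequence

def generate_speech(source_speech: Sequence[float],
                    extract_text_feature: Callable,  # speech -> text feature
                    convert_to_target: Callable,     # text feature -> target-speaker acoustic feature
                    vocoder: Callable):              # acoustic feature -> waveform
    text_feature = extract_text_feature(source_speech)   # text feature extraction
    acoustic_feature = convert_to_target(text_feature)   # conversion toward the target speaker
    return vocoder(acoustic_feature)                     # target speech signal
```
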
  • Patent number: 11017762
    Abstract: Embodiments of the present disclosure disclose a method and apparatus for generating a text-to-speech model. A specific implementation of the method includes: obtaining a training sample set, a training sample including sample text information, sample audio data corresponding to the sample text information, and a fundamental frequency of the sample audio data; obtaining an initial deep neural network; and using the sample text information of the training sample in the training sample set as an input, and using the sample audio data corresponding to the input sample text information and the fundamental frequency of the sample audio data as an output, to train the initial deep neural network with a machine learning method, and defining the trained initial deep neural network as the text-to-speech model. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: May 25, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Yongguo Kang, Yu Gu
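
Illustrative sketch (a toy stand-in, not the patented implementation): a small network trained with sample text features as input and the corresponding acoustic frames plus fundamental frequency as output, mirroring the input/output pairing in the abstract; the training data here are random placeholders.

```python
import torch
import torch.nn as nn

text_dim, acoustic_dim = 64, 80
model = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(),
                      nn.Linear(256, acoustic_dim + 1))        # last output = fundamental frequency
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder training set: (text feature, acoustic frame, F0) triples.
text = torch.randn(512, text_dim)
audio = torch.randn(512, acoustic_dim)
f0 = torch.rand(512, 1) * 300.0

for _ in range(100):
    pred = model(text)
    loss = loss_fn(pred, torch.cat([audio, f0], dim=-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
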
  • Patent number: 11011175
    Abstract: Embodiments of a speech broadcasting method, device, apparatus, and computer-readable storage medium are provided. The method can include: receiving recorded speech data from a plurality of speakers; extracting respective text features of the plurality of speakers from the recorded speech data, and allocating respective identifications to the plurality of speakers; inputting the text features and the identifications of the speakers to a text-acoustic mapping model to output speech features of the plurality of speakers; and establishing a mapping relationship between the text feature and the speech feature of each speaker. In the embodiments of the present application, a broadcaster can be selected to broadcast a text, greatly improving the user experience of text broadcasting. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: May 18, 2021
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventor: Yongguo Kang
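
Illustrative sketch (an assumed architecture, not the patent's): each speaker's identification indexes an embedding, and the text-acoustic mapping model conditions on it, so the speech features of whichever broadcaster is selected can be produced for a given text feature.

```python
import torch
import torch.nn as nn

class TextAcousticMapper(nn.Module):
    def __init__(self, n_speakers, text_dim=64, speaker_dim=32, speech_dim=80):
        super().__init__()
        self.speaker_table = nn.Embedding(n_speakers, speaker_dim)   # one id per speaker
        self.net = nn.Sequential(nn.Linear(text_dim + speaker_dim, 256),
                                 nn.ReLU(),
                                 nn.Linear(256, speech_dim))

    def forward(self, text_feature, speaker_id):
        emb = self.speaker_table(speaker_id)                         # speaker identification
        return self.net(torch.cat([text_feature, emb], dim=-1))      # speech features

mapper = TextAcousticMapper(n_speakers=4)
features = mapper(torch.randn(1, 64), torch.tensor([2]))             # broadcaster #2 selected
print(features.shape)                                                # torch.Size([1, 80])
```
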
  • Patent number: 10971131
    Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each type of training sample including a text of the type and a speech of that text, read by an announcer corresponding to the type, in the style of speech corresponding to the type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize speech, in a plurality of styles, of the announcer corresponding to each of the plurality of types. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: April 6, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Yongguo Kang
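
Illustrative sketch (assumed, not from the patent): one network conditioned on both the announcer and the annotated style so that, after joint training on all types, any announcer can be synthesized in any of the styles.

```python
import torch
import torch.nn as nn

class StyledSynthesizer(nn.Module):
    def __init__(self, n_announcers, n_styles, text_dim=64, out_dim=80):
        super().__init__()
        self.announcer = nn.Embedding(n_announcers, 16)
        self.style = nn.Embedding(n_styles, 16)
        self.net = nn.Sequential(nn.Linear(text_dim + 32, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, text_feat, announcer_id, style_id):
        cond = torch.cat([self.announcer(announcer_id), self.style(style_id)], dim=-1)
        return self.net(torch.cat([text_feat, cond], dim=-1))

model = StyledSynthesizer(n_announcers=3, n_styles=3)
# Announcer 0 rendered in style 2, even if that pairing never occurred in training.
acoustic = model(torch.randn(1, 64), torch.tensor([0]), torch.tensor([2]))
```
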
  • Patent number: 10789938
    Abstract: A speech synthesis method and device. The method comprises: determining language types of a statement to be synthesized; determining base models corresponding to the language types; determining a target timbre, performing adaptive transformation on the spectrum parameter models based on the target timbre, and generating spectrum parameters for the statement to be synthesized based on the adaptively transformed spectrum parameter models; generating fundamental frequency parameters for the statement to be synthesized based on the fundamental frequency parameter models, and adjusting the fundamental frequency parameters based on the target timbre; and synthesizing the statement to be synthesized into a target speech based on the spectrum parameters and the adjusted fundamental frequency parameters. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 5, 2016
    Date of Patent: September 29, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Hao Li, Yongguo Kang
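
Illustrative sketch (assumed, not the patent's algorithm): per-language base models produce spectrum and fundamental frequency parameters, the spectra are shifted toward a target-timbre reference, and the F0 track is rescaled to the target timbre's mean pitch before the final synthesis step; `base_models`, `target_timbre`, and `vocoder` are hypothetical.

```python
import numpy as np

def synthesize_multilingual(segments, base_models, target_timbre, vocoder):
    spectra, f0_tracks = [], []
    for text, language in segments:                  # one segment per detected language type
        model = base_models[language]                # base model for that language
        spec = model.spectrum(text)                  # (frames, dim) spectrum parameters
        spec = spec + (target_timbre["spectrum_bias"] - spec.mean(axis=0))  # crude timbre adaptation
        f0 = model.f0(text)                          # (frames,) fundamental frequency parameters
        f0 = f0 * (target_timbre["mean_f0"] / max(f0.mean(), 1e-6))         # pitch adjustment
        spectra.append(spec)
        f0_tracks.append(f0)
    return vocoder(np.concatenate(spectra), np.concatenate(f0_tracks))
```
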
  • Publication number: 20200184948
    Abstract: The present disclosure provides a speech playing method, an intelligent device, and a computer-readable storage medium. The method includes: obtaining an object to be played; recognizing a target object type of the object to be played; obtaining a playing label set matching the object to be played based on the target object type, wherein the playing label set is configured to represent the playing rules of the object to be played; and playing the object to be played based on the playing rules represented by the playing label set. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: July 2, 2018
    Publication date: June 11, 2020
    Inventors: Lingjin XU, Yongguo KANG, Yangkai XU, Ben XU, Haiguang YUAN, Ran XU
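
Illustrative sketch (hypothetical rule and label names): the recognized object type selects a playing label set, and the rules it represents drive how the object is played.

```python
PLAYING_LABEL_SETS = {
    "news":  {"voice": "formal", "speed": 1.0},
    "novel": {"voice": "storytelling", "speed": 0.9},
}

def play(obj, classify, tts_play):
    object_type = classify(obj)                                      # target object type
    rules = PLAYING_LABEL_SETS.get(object_type,
                                   {"voice": "default", "speed": 1.0})
    tts_play(obj, voice=rules["voice"], speed=rules["speed"])        # play under those rules
```
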
  • Publication number: 20200135210
    Abstract: Embodiments of a speech broadcasting method, device, apparatus, and computer-readable storage medium are provided. The method can include: receiving recorded speech data from a plurality of speakers; extracting respective text features of the plurality of speakers from the recorded speech data, and allocating respective identifications to the plurality of speakers; inputting the text features and the identifications of the speakers to a text-acoustic mapping model to output speech features of the plurality of speakers; and establishing a mapping relationship between the text feature and the speech feature of each speaker. In the embodiments of the present application, a broadcaster can be selected to broadcast a text, greatly improving the user experience of text broadcasting.
    Type: Application
    Filed: September 6, 2019
    Publication date: April 30, 2020
    Inventor: Yongguo KANG
  • Publication number: 20190355344
    Abstract: Embodiments of the present disclosure disclose a method and apparatus for generating a text-to-speech model. A specific implementation of the method includes: obtaining a training sample set, a training sample including sample text information, sample audio data corresponding to the sample text information, and a fundamental frequency of the sample audio data; obtaining an initial deep neural network; and using the sample text information of the training sample in the training sample set as an input, and using the sample audio data corresponding to the input sample text information and the fundamental frequency of the sample audio data as an output, to train the initial deep neural network with a machine learning method, and defining the trained initial deep neural network as the text-to-speech model.
    Type: Application
    Filed: December 28, 2018
    Publication date: November 21, 2019
    Inventors: Yongguo Kang, Yu Gu
  • Publication number: 20190213995
    Abstract: A speech synthesis method and device. The method comprises: determining language types of a statement to be synthesized; determining base models corresponding to the language types; determining a target timbre, performing adaptive transformation on the spectrum parameter models based on the target timbre, and generating spectrum parameters for the statement to be synthesized based on the adaptively transformed spectrum parameter models; generating fundamental frequency parameters for the statement to be synthesized based on the fundamental frequency parameter models, and adjusting the fundamental frequency parameters based on the target timbre; and synthesizing the statement to be synthesized into a target speech based on the spectrum parameters and the adjusted fundamental frequency parameters.
    Type: Application
    Filed: September 5, 2016
    Publication date: July 11, 2019
    Inventors: Hao LI, Yongguo KANG
  • Publication number: 20190096385
    Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each type of training sample including a text of the type and a speech of that text, read by an announcer corresponding to the type, in the style of speech corresponding to the type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize speech, in a plurality of styles, of the announcer corresponding to each of the plurality of types.
    Type: Application
    Filed: August 3, 2018
    Publication date: March 28, 2019
    Inventor: Yongguo KANG
  • Patent number: 10181320
    Abstract: A method and an apparatus for generating a g2p model based on AI are provided. The method includes: during grapheme-to-phoneme conversion training of a neural network on each word in training data, randomly screening nodes in a hidden layer of the neural network according to a preset node ratio to obtain retained nodes for training each word; training each word with a sub-neural network corresponding to the retained nodes and updating a weight of each retained node of the sub-neural network; and averaging the weights of the retained nodes of the respective sub-neural networks to generate the grapheme-to-phoneme model. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: January 15, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Zhijie Chen, Yongguo Kang
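
Illustrative sketch (a toy numpy analogue, not the patent's training code): for each word a random subset of hidden nodes is retained according to a preset node ratio, only the retained weights receive that word's (placeholder) update, and the per-word sub-network weights are averaged into the final grapheme-to-phoneme model.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, node_ratio = 128, 0.5
base_weights = rng.normal(size=hidden)          # one weight per hidden node (toy scale)
per_word_weights = []

for word_gradient in rng.normal(size=(1000, hidden)):     # one placeholder update per word
    retained = rng.random(hidden) < node_ratio             # screen nodes at the preset ratio
    w = base_weights.copy()
    w[retained] -= 0.01 * word_gradient[retained]           # update only the retained nodes
    per_word_weights.append(w)

g2p_weights = np.mean(per_word_weights, axis=0)             # mean over the sub-neural networks
```
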
  • Publication number: 20170243575
    Abstract: A method and an apparatus for generating a g2p model based on AI are provided. The method includes: during grapheme-to-phoneme conversion training of a neural network on each word in training data, randomly screening nodes in a hidden layer of the neural network according to a preset node ratio to obtain retained nodes for training each word; training each word with a sub-neural network corresponding to the retained nodes and updating a weight of each retained node of the sub-neural network; and averaging the weights of the retained nodes of the respective sub-neural networks to generate the grapheme-to-phoneme model.
    Type: Application
    Filed: December 28, 2016
    Publication date: August 24, 2017
    Inventors: Zhijie CHEN, Yongguo KANG