Patents by Inventor Shiyin Kang
Shiyin Kang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230123433
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Application
Filed: December 13, 2022
Publication date: April 20, 2023
Inventors: Linchao BAO, Shiyin KANG, Sheng WANG, Xiangkai LIN, Xing JI, Zhantu ZHU, Kuongchi LEI, Deyi TUO, Peng LIU
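The retargeting step described in this abstract — driving a second character, which has its own expression base, from parameters estimated for the first — can be pictured as a change of blendshape basis. The sketch below is a minimal illustration with hypothetical names, assuming each expression base is a matrix of vertex-offset blendshapes and retargeting is a least-squares fit; it is not the patented method itself.

```python
import numpy as np

def retarget_expression(params, base_a, base_b):
    """Map expression parameters from one blendshape basis onto another.

    params : (k_a,) coefficients for character A's expression base
    base_a : (n, k_a) blendshape basis of character A (n vertex offsets)
    base_b : (n, k_b) blendshape basis of character B
    Returns (k_b,) least-squares coefficients that reproduce A's
    expression (as vertex offsets) on character B's basis.
    """
    offsets = base_a @ params                        # expression as raw mesh offsets
    coeffs, *_ = np.linalg.lstsq(base_b, offsets, rcond=None)
    return coeffs
```

When both characters share the same basis the fit recovers the original parameters exactly; with differing bases it finds the closest reproducible expression.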
-
Patent number: 11605193
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Grant
Filed: August 18, 2021
Date of Patent: March 14, 2023
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
-
Patent number: 11301641
Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
Type: Grant
Filed: October 22, 2019
Date of Patent: April 12, 2022
Assignee: Tencent Technology (Shenzhen) Company Limited
Inventors: Nan Wang, Wei Liu, Lin Ma, Wenhao Jiang, Guangzhi Li, Shiyin Kang, Deyi Tuo, Xiaolong Zhu, Youyi Zhang, Shaobin Lin, Yongsen Zheng, Zixin Zou, Jing He, Zaizhen Chen, Pinyi Li
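The keyword-based rhyme matching step in this pipeline (description text → rhyming lyric line) can be illustrated with a deliberately naive sketch. All names are hypothetical and the suffix-match notion of rhyme is a placeholder for whatever phonetic rhyme dictionary the patented system actually uses:

```python
def rhyme_score(word_a, word_b, n=2):
    """Toy rhyme test: do the last n characters of the two words match?"""
    return word_a[-n:] == word_b[-n:]

def pick_rhyming_line(description, candidates):
    """Choose the candidate lyric line whose final word rhymes with the
    final word of the image's description text."""
    tail = description.split()[-1]
    for line in candidates:
        if rhyme_score(tail, line.split()[-1]):
            return line
    return candidates[0]  # fall back to the first candidate if nothing rhymes
```

A real implementation would match on phoneme codas (e.g. pinyin finals for Chinese lyrics) rather than spelling, but the selection logic is the same shape.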
-
Publication number: 20220044463
Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information identifying the probability distribution of the speech frame over phonemes; determining an expression parameter corresponding to the speech frame according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech.
Type: Application
Filed: October 8, 2021
Publication date: February 10, 2022
Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
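The per-frame pipeline here (speech frame → phoneme distribution → expression parameter) can be sketched as follows. The linear mapping and the moving-average smoothing are illustrative assumptions with hypothetical names — the patent's actual phoneme-to-expression model is not specified in the abstract:

```python
import numpy as np

def smooth(params, win=3):
    """Moving-average each expression channel over time so the driven
    animation does not jitter frame to frame."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, params)

def drive_expressions(posteriors, mapping, win=3):
    """posteriors : (t, p) per-frame distributions over p phoneme classes
                    (the 'linguistics information' of the abstract)
    mapping    : (p, e) assumed linear map from phoneme identity to
                    e expression parameters
    Returns (t, e) smoothed expression parameters for the character."""
    return smooth(posteriors @ mapping, win)
```

Using phoneme posteriors rather than raw audio makes the mapping speaker-independent: any speech that sounds like the same phoneme drives the same mouth shape.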
-
Publication number: 20210383586
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Application
Filed: August 18, 2021
Publication date: December 9, 2021
Inventors: Linchao BAO, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
-
Patent number: 11011154
Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
Type: Grant
Filed: February 8, 2019
Date of Patent: May 18, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
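The abstract's "relative-position-aware self attention function" is not spelled out, but a minimal single-head sketch in the style of Shaw et al. (2018) — attention logits and outputs augmented with learned embeddings of the offset between positions — might look like this. Names and the single-head simplification are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(x, rel_k, rel_v):
    """Single-head self-attention with relative-position embeddings.

    x     : (t, d) input sequence (e.g. character embeddings)
    rel_k : (2t-1, d) relative-position key embeddings; index t-1 is offset 0
    rel_v : (2t-1, d) relative-position value embeddings
    """
    t, d = x.shape
    # idx[i, j] selects the embedding for offset (j - i)
    idx = np.arange(t)[None, :] - np.arange(t)[:, None] + t - 1
    logits = (x @ x.T + np.einsum("id,ijd->ij", x, rel_k[idx])) / np.sqrt(d)
    attn = softmax(logits, axis=-1)
    return attn @ x + np.einsum("ij,ijd->id", attn, rel_v[idx])
```

With the relative embeddings set to zero this collapses to ordinary scaled dot-product self-attention, which is a useful sanity check.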
-
Publication number: 20200380949
Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent the quality of synthesized speech from being affected by semantic features in the mel-frequency cepstrum.
Type: Application
Filed: August 21, 2020
Publication date: December 3, 2020
Inventors: Xixin WU, Mu WANG, Shiyin KANG, Dan SU, Dong YU
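The key idea — deriving the style embedding from the residual between real reference speech and the model's own synthesis of the same text, so the embedding captures what the baseline model missed (prosody, speaker character) rather than linguistic content — can be sketched minimally. The time-averaging and linear projection are illustrative assumptions, not the patented architecture:

```python
import numpy as np

def residual_style_embedding(ref_speech, synth_ref, proj):
    """Derive a style embedding from the synthesis residual.

    ref_speech : (t, f) acoustic feature frames of real reference speech
    synth_ref  : (t, f) the model's synthesis of the same reference text
    proj       : (f, e) assumed projection to an e-dimensional embedding
    Returns (e,) embedding. If the baseline already matches the reference
    perfectly, the residual — and hence the embedding — is zero.
    """
    residual = ref_speech - synth_ref
    return residual.mean(axis=0) @ proj
```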
-
Publication number: 20200258496
Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
Type: Application
Filed: February 8, 2019
Publication date: August 13, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Shan YANG, Heng LU, Shiyin KANG, Dong YU
-
Publication number: 20200051536
Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
Type: Application
Filed: October 22, 2019
Publication date: February 13, 2020
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Nan WANG, Wei LIU, Lin MA, Wenhao JIANG, Guangzhi LI, Shiyin KANG, Deyi TUO, Xiaolong ZHU, Youyi ZHANG, Shaobin LIN, Yongsen ZHENG, Zixin ZOU, Jing HE, Zaizhen CHEN, Pinyi LI
-
Patent number: 10176819
Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
Type: Grant
Filed: June 9, 2017
Date of Patent: January 8, 2019
Assignee: The Chinese University of Hong Kong
Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng
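Because an SI-ASR produces the same posteriorgram for the same phonetic content regardless of who is speaking, PPGs can bridge source and target speakers. The sketch below illustrates the idea with a naive frame-selection lookup — for each source frame, find the target frame with the nearest PPG and emit its target-speaker features. The patented system learns a trained mapping from PPGs to target acoustics rather than doing lookup, so this is only a conceptual stand-in with hypothetical names:

```python
import numpy as np

def convert_with_ppg(src_ppg, tgt_ppg, tgt_feats):
    """Frame-selection sketch of PPG-based voice conversion.

    src_ppg   : (s, c) source-speaker posteriorgrams over c phonetic classes
    tgt_ppg   : (t, c) target-speaker posteriorgrams from the same SI-ASR
    tgt_feats : (t, f) target-speaker acoustic features aligned with tgt_ppg
    Returns (s, f) converted features carrying the source's phonetic content
    in the target speaker's voice.
    """
    # pairwise squared distances between source and target PPG frames
    d = ((src_ppg[:, None, :] - tgt_ppg[None, :, :]) ** 2).sum(-1)
    return tgt_feats[d.argmin(axis=1)]
```

No parallel source/target recordings are needed: the target data is indexed purely by its speaker-independent phonetic content.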
-
Publication number: 20180012613
Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
Type: Application
Filed: June 9, 2017
Publication date: January 11, 2018
Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng