Patents by Inventor Shiyin Kang
Shiyin Kang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230123433
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Application
Filed: December 13, 2022
Publication date: April 20, 2023
Inventors: Linchao BAO, Shiyin KANG, Sheng WANG, Xiangkai LIN, Xing JI, Zhantu ZHU, Kuongchi LEI, Deyi TUO, Peng LIU
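The retargeting step described in this abstract — driving a second character, which has its own expression base, from parameters estimated for the first — can be pictured as a change of blendshape basis. The sketch below is a minimal illustration with hypothetical names, assuming each expression base is a matrix of vertex-offset blendshapes and retargeting is a least-squares fit; it is not the patented method itself.

```python
import numpy as np

def retarget_expression(params, base_a, base_b):
    """Map expression parameters from one blendshape basis onto another.

    params : (k_a,) coefficients for character A's expression base
    base_a : (n, k_a) blendshape basis of character A (n vertex offsets)
    base_b : (n, k_b) blendshape basis of character B
    Returns (k_b,) least-squares coefficients that reproduce A's
    expression (as vertex offsets) on character B's basis.
    """
    offsets = base_a @ params                        # expression as raw mesh offsets
    coeffs, *_ = np.linalg.lstsq(base_b, offsets, rcond=None)
    return coeffs
```

When both characters share the same basis the fit recovers the original parameters exactly; with differing bases it finds the closest reproducible expression.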
-
Patent number: 11605193
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Grant
Filed: August 18, 2021
Date of Patent: March 14, 2023
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
-
Patent number: 11301641
Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
Type: Grant
Filed: October 22, 2019
Date of Patent: April 12, 2022
Assignee: Tencent Technology (Shenzhen) Company Limited
Inventors: Nan Wang, Wei Liu, Lin Ma, Wenhao Jiang, Guangzhi Li, Shiyin Kang, Deyi Tuo, Xiaolong Zhu, Youyi Zhang, Shaobin Lin, Yongsen Zheng, Zixin Zou, Jing He, Zaizhen Chen, Pinyi Li
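The keyword-based rhyme matching step in this pipeline (description text → rhyming lyric line) can be illustrated with a deliberately naive sketch. All names are hypothetical and the suffix-match notion of rhyme is a placeholder for whatever phonetic rhyme dictionary the patented system actually uses:

```python
def rhyme_score(word_a, word_b, n=2):
    """Toy rhyme test: do the last n characters of the two words match?"""
    return word_a[-n:] == word_b[-n:]

def pick_rhyming_line(description, candidates):
    """Choose the candidate lyric line whose final word rhymes with the
    final word of the image's description text."""
    tail = description.split()[-1]
    for line in candidates:
        if rhyme_score(tail, line.split()[-1]):
            return line
    return candidates[0]  # fall back to the first candidate if nothing rhymes
```

A real implementation would match on phoneme codas (e.g. pinyin finals for Chinese lyrics) rather than spelling, but the selection logic is the same shape.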
-
Publication number: 20220044463
Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information identifying the probability distribution of the speech frame over phonemes; determining an expression parameter corresponding to the speech frame according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech.
Type: Application
Filed: October 8, 2021
Publication date: February 10, 2022
Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
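The per-frame pipeline here (speech frame → phoneme distribution → expression parameter) can be sketched as follows. The linear mapping and the moving-average smoothing are illustrative assumptions with hypothetical names — the patent's actual phoneme-to-expression model is not specified in the abstract:

```python
import numpy as np

def smooth(params, win=3):
    """Moving-average each expression channel over time so the driven
    animation does not jitter frame to frame."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, params)

def drive_expressions(posteriors, mapping, win=3):
    """posteriors : (t, p) per-frame distributions over p phoneme classes
                    (the 'linguistics information' of the abstract)
    mapping    : (p, e) assumed linear map from phoneme identity to
                    e expression parameters
    Returns (t, e) smoothed expression parameters for the character."""
    return smooth(posteriors @ mapping, win)
```

Using phoneme posteriors rather than raw audio makes the mapping speaker-independent: any speech that sounds like the same phoneme drives the same mouth shape.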
-
Publication number: 20210383586
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that captures the speaker's facial expression changes while delivering a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expressions when saying the target text information, thereby improving the user's experience of interacting with the animation character.
Type: Application
Filed: August 18, 2021
Publication date: December 9, 2021
Inventors: Linchao BAO, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
-
Patent number: 11011154
Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
Type: Grant
Filed: February 8, 2019
Date of Patent: May 18, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
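The abstract's "relative-position-aware self attention function" is not spelled out, but a minimal single-head sketch in the style of Shaw et al. (2018) — attention logits and outputs augmented with learned embeddings of the offset between positions — might look like this. Names and the single-head simplification are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(x, rel_k, rel_v):
    """Single-head self-attention with relative-position embeddings.

    x     : (t, d) input sequence (e.g. character embeddings)
    rel_k : (2t-1, d) relative-position key embeddings; index t-1 is offset 0
    rel_v : (2t-1, d) relative-position value embeddings
    """
    t, d = x.shape
    # idx[i, j] selects the embedding for offset (j - i)
    idx = np.arange(t)[None, :] - np.arange(t)[:, None] + t - 1
    logits = (x @ x.T + np.einsum("id,ijd->ij", x, rel_k[idx])) / np.sqrt(d)
    attn = softmax(logits, axis=-1)
    return attn @ x + np.einsum("ij,ijd->id", attn, rel_v[idx])
```

With the relative embeddings set to zero this collapses to ordinary scaled dot-product self-attention, which is a useful sanity check.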
-
Publication number: 20200380949
Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent the quality of synthesized speech from being affected by semantic features in the mel-frequency cepstrum.
Type: Application
Filed: August 21, 2020
Publication date: December 3, 2020
Inventors: Xixin WU, Mu WANG, Shiyin KANG, Dan SU, Dong YU
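The key idea — deriving the style embedding from the residual between real reference speech and the model's own synthesis of the same text, so the embedding captures what the baseline model missed (prosody, speaker character) rather than linguistic content — can be sketched minimally. The time-averaging and linear projection are illustrative assumptions, not the patented architecture:

```python
import numpy as np

def residual_style_embedding(ref_speech, synth_ref, proj):
    """Derive a style embedding from the synthesis residual.

    ref_speech : (t, f) acoustic feature frames of real reference speech
    synth_ref  : (t, f) the model's synthesis of the same reference text
    proj       : (f, e) assumed projection to an e-dimensional embedding
    Returns (e,) embedding. If the baseline already matches the reference
    perfectly, the residual — and hence the embedding — is zero.
    """
    residual = ref_speech - synth_ref
    return residual.mean(axis=0) @ proj
```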
-
Publication number: 20200258496
Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
Type: Application
Filed: February 8, 2019
Publication date: August 13, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Shan YANG, Heng LU, Shiyin KANG, Dong YU
-
Publication number: 20200051536
Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
Type: Application
Filed: October 22, 2019
Publication date: February 13, 2020
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Nan WANG, Wei LIU, Lin MA, Wenhao JIANG, Guangzhi LI, Shiyin KANG, Deyi TUO, Xiaolong ZHU, Youyi ZHANG, Shaobin LIN, Yongsen ZHENG, Zixin ZOU, Jing HE, Zaizhen CHEN, Pinyi LI
-
Patent number: 10176819
Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
Type: Grant
Filed: June 9, 2017
Date of Patent: January 8, 2019
Assignee: The Chinese University of Hong Kong
Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng
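Because an SI-ASR produces the same posteriorgram for the same phonetic content regardless of who is speaking, PPGs can bridge source and target speakers. The sketch below illustrates the idea with a naive frame-selection lookup — for each source frame, find the target frame with the nearest PPG and emit its target-speaker features. The patented system learns a trained mapping from PPGs to target acoustics rather than doing lookup, so this is only a conceptual stand-in with hypothetical names:

```python
import numpy as np

def convert_with_ppg(src_ppg, tgt_ppg, tgt_feats):
    """Frame-selection sketch of PPG-based voice conversion.

    src_ppg   : (s, c) source-speaker posteriorgrams over c phonetic classes
    tgt_ppg   : (t, c) target-speaker posteriorgrams from the same SI-ASR
    tgt_feats : (t, f) target-speaker acoustic features aligned with tgt_ppg
    Returns (s, f) converted features carrying the source's phonetic content
    in the target speaker's voice.
    """
    # pairwise squared distances between source and target PPG frames
    d = ((src_ppg[:, None, :] - tgt_ppg[None, :, :]) ** 2).sum(-1)
    return tgt_feats[d.argmin(axis=1)]
```

No parallel source/target recordings are needed: the target data is indexed purely by its speaker-independent phonetic content.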
-
Publication number: 20180012613
Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
Type: Application
Filed: June 9, 2017
Publication date: January 11, 2018
Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng