Patents by Inventor Shiyin Kang

Shiyin Kang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230123433
    Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker delivers a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving the user's experience of interacting with the animation character. A minimal code sketch of this expression retargeting follows this entry.
    Type: Application
    Filed: December 13, 2022
    Publication date: April 20, 2023
    Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
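
The entry above (whose abstract is shared with patent 11605193 and publication 20210383586 below) drives a second animation character, with its own expression base, from parameters tied to a first character's base. The sketch below shows one plausible way such retargeting could work, via a least-squares fit between two blendshape bases; the mesh size, base dimensions, random bases, and the least-squares mapping itself are illustrative assumptions, not the patented implementation.

```python
# Hypothetical expression retargeting between two blendshape bases (NumPy only).
# Shapes, bases, and the least-squares fit are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_vertices = 300                                      # toy mesh size
first_base = rng.normal(size=(3 * n_vertices, 52))    # first character's expression base
second_base = rng.normal(size=(3 * n_vertices, 46))   # second character's expression base

def retarget(expr_params_first: np.ndarray) -> np.ndarray:
    """Map expression parameters defined on the first base onto the second base.

    Reconstruct the facial offset implied by the first character's parameters,
    then solve least squares for parameters of the second base that best
    reproduce that offset.
    """
    offset = first_base @ expr_params_first
    params_second, *_ = np.linalg.lstsq(second_base, offset, rcond=None)
    return params_second

# Example: a target expression parameter (as would be predicted from text/audio).
target_expr_first = rng.uniform(0.0, 1.0, size=52)
print(retarget(target_expr_first)[:5])
```
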
  • Patent number: 11605193
    Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker delivers a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving the user's experience of interacting with the animation character.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: March 14, 2023
    Assignee: Tencent Technology (Shenzhen) Company Limited
    Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
  • Patent number: 11301641
    Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music. A minimal code sketch of this pipeline follows this entry.
    Type: Grant
    Filed: October 22, 2019
    Date of Patent: April 12, 2022
    Assignee: Tencent Technology (Shenzhen) Company Limited
    Inventors: Nan Wang, Wei Liu, Lin Ma, Wenhao Jiang, Guangzhi Li, Shiyin Kang, Deyi Tuo, Xiaolong Zhu, Youyi Zhang, Shaobin Lin, Yongsen Zheng, Zixin Zou, Jing He, Zaizhen Chen, Pinyi Li
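
The abstract above (shared with publication 20200051536 below) describes a staged pipeline: scenario recognition on images, description-text generation, keyword-based rhyme matching, lyric generation, text-to-speech, and mixing with background music. The skeleton below sketches that flow under heavy simplification; every function body, the lyric bank, and the crude word-ending rhyme test are placeholders assumed for illustration, not the patented implementation.

```python
# Placeholder pipeline: images -> scenarios -> descriptions -> rhyming lyrics
# -> speech -> mix with background music. All bodies are illustrative stubs.
from typing import List

def recognize_scenario(image_path: str) -> str:
    """Stub scenario recognition (a real system would run an image classifier)."""
    return "beach"

def describe(scenario: str) -> str:
    """Stub description-text generation for a recognized scenario."""
    return f"a sunny day at the {scenario}"

LYRIC_BANK: List[str] = ["waves that reach", "a lesson she will teach", "out of reach"]

def rhyming_lyric(description: str) -> str:
    """Keyword-based rhyme matching: pick a line whose ending echoes the keyword."""
    keyword = description.split()[-1]
    for line in LYRIC_BANK:
        if line.split()[-1][-3:] == keyword[-3:]:      # crude word-ending rhyme test
            return f"{description}, {line}"
    return description

def synthesize_speech(text: str) -> bytes:
    """Stub text-to-speech; a real system would return audio samples."""
    return text.encode("utf-8")

def mix_with_background(speech: bytes, music: bytes) -> bytes:
    """Stub mixer that simply concatenates the two byte streams."""
    return speech + music

images = ["img_001.jpg", "img_002.jpg"]
lyrics = [rhyming_lyric(describe(recognize_scenario(p))) for p in images]
image_music = mix_with_background(synthesize_speech(" / ".join(lyrics)), b"<bgm>")
print(lyrics)
```
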
  • Publication number: 20220044463
    Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech, the first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information identifying the likelihood that the speech frame in the first speech pertains to each of a set of phonemes; determining an expression parameter corresponding to the speech frame in the first speech according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech. A minimal code sketch of this per-frame mapping follows this entry.
    Type: Application
    Filed: October 8, 2021
    Publication date: February 10, 2022
    Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
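
The abstract above turns each speech frame's phoneme distribution (its "linguistics information") into an expression parameter. The toy mapping below uses a hand-made phoneme-to-viseme matrix as a stand-in for whatever learned model the method actually uses; the phoneme set, viseme set, and matrix values are all assumptions.

```python
# Per-frame mapping from a phoneme posterior ("linguistics information") to an
# expression parameter. The phoneme/viseme sets and matrix are invented here.
import numpy as np

PHONEMES = ["AA", "IY", "UW", "M", "S"]           # toy phoneme set
VISEMES = ["jaw_open", "lip_round", "lip_close"]  # toy expression parameters

# Rows: phonemes, columns: viseme weights (assumed values, not from the patent).
PHONEME_TO_VISEME = np.array([
    [0.9, 0.1, 0.0],   # AA -> open jaw
    [0.4, 0.0, 0.0],   # IY
    [0.2, 0.9, 0.0],   # UW -> rounded lips
    [0.0, 0.0, 1.0],   # M  -> closed lips
    [0.1, 0.0, 0.3],   # S
])

def frame_expression(phoneme_posterior: np.ndarray) -> np.ndarray:
    """Expected viseme weights under the frame's phoneme distribution."""
    return phoneme_posterior @ PHONEME_TO_VISEME

posterior = np.array([0.05, 0.05, 0.05, 0.8, 0.05])  # frame dominated by /M/
print(dict(zip(VISEMES, frame_expression(posterior).round(2))))
```
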
  • Publication number: 20210383586
    Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker delivers a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving the user's experience of interacting with the animation character.
    Type: Application
    Filed: August 18, 2021
    Publication date: December 9, 2021
    Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
  • Patent number: 11011154
    Abstract: A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied. A minimal code sketch of the relative-position-aware self-attention step follows this entry.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: May 18, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
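
The abstract above (shared with publication 20200258496 below) centers on a relative-position-aware self-attention function applied to character embeddings and mel-spectrogram frames. Below is a single-head NumPy sketch of relative-position-aware self-attention in the general style of Shaw et al. (2018); the sequence length, width, clipping distance, and random weights are illustrative, and the surrounding encoder, decoder, and multi-head attention are omitted.

```python
# Single-head relative-position-aware self-attention (NumPy). Dimensions,
# clipping distance, and random weights are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 8                  # sequence length, model width
max_rel = 3                  # clip relative distances to [-max_rel, max_rel]

x = rng.normal(size=(T, d))                      # e.g. encoded character embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_k = rng.normal(size=(2 * max_rel + 1, d))    # relative-position key embeddings

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

q, k, v = x @ Wq, x @ Wk, x @ Wv

# Relative position of every (query, key) pair, clipped and shifted to [0, 2*max_rel].
idx = np.clip(np.arange(T)[None, :] - np.arange(T)[:, None], -max_rel, max_rel) + max_rel

# Content-content scores plus content-position scores, then scaled softmax attention.
scores = (q @ k.T + np.einsum("td,tsd->ts", q, rel_k[idx])) / np.sqrt(d)
out = softmax(scores) @ v
print(out.shape)             # (6, 8): attended representation passed onward
```
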
  • Publication number: 20200380949
    Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data, to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector, to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent the quality of synthesized speech from being affected by a semantic feature in the mel-frequency cepstrum. A minimal code sketch of the residual-based embedding follows this entry.
    Type: Application
    Filed: August 21, 2020
    Publication date: December 3, 2020
    Inventors: Xixin Wu, Mu Wang, Shiyin Kang, Dan Su, Dong Yu
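
The abstract above derives an embedding from the residual between reference speech and speech synthesized from the same linguistic content, then conditions decoding on that embedding. The sketch below assumes a mean-pool-plus-linear "reference encoder" and simple concatenation with the encoded linguistic features; both are simplifications for illustration, not the claimed design.

```python
# Residual-based embedding: reference speech minus speech synthesized from the
# same text, pooled into a vector that conditions the decoder. The mean-pool +
# linear "reference encoder" and the concatenation are assumed simplifications.
import numpy as np

rng = np.random.default_rng(2)
frames, n_mels, d_embed = 120, 80, 16

reference_mel = rng.normal(size=(frames, n_mels))     # reference speech features
synthesized_mel = rng.normal(size=(frames, n_mels))   # synthesized from the same linguistic data
W_ref = rng.normal(size=(n_mels, d_embed))

# The residual carries the speech-feature information the plain synthesis missed.
residual = reference_mel - synthesized_mel
embedding = np.tanh(residual.mean(axis=0) @ W_ref)    # fixed-size embedded vector

encoded_linguistic = rng.normal(size=(40, 32))        # encoder output for the target text
decoder_input = np.concatenate(
    [encoded_linguistic, np.tile(embedding, (encoded_linguistic.shape[0], 1))],
    axis=1,
)
print(decoder_input.shape)   # (40, 48): linguistic encoding conditioned on the embedding
```
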
  • Publication number: 20200258496
    Abstract: A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
    Type: Application
    Filed: February 8, 2019
    Publication date: August 13, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
  • Publication number: 20200051536
    Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
    Type: Application
    Filed: October 22, 2019
    Publication date: February 13, 2020
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Nan Wang, Wei Liu, Lin Ma, Wenhao Jiang, Guangzhi Li, Shiyin Kang, Deyi Tuo, Xiaolong Zhu, Youyi Zhang, Shaobin Lin, Yongsen Zheng, Zixin Zou, Jing He, Zaizhen Chen, Pinyi Li
  • Patent number: 10176819
    Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping. A minimal code sketch of this PPG-based mapping follows this entry.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: January 8, 2019
    Assignee: The Chinese University of Hong Kong
    Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng
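
The abstract above (shared with publication 20180012613 below) routes speech through phonetic posteriorgrams produced by a speaker-independent ASR and converts source speech via a mapping between PPGs and target-speech segments. The toy version below fakes the SI-ASR with a fixed random projection and replaces the trained mapping with nearest-neighbour frame selection; both are purely illustrative.

```python
# Toy PPG-based conversion: a fixed random projection stands in for the SI-ASR,
# and nearest-neighbour lookup stands in for the trained PPG-to-speech mapping.
import numpy as np

rng = np.random.default_rng(3)
n_senones, n_acoustic = 40, 25
asr_weights = rng.normal(size=(n_acoustic, n_senones))   # shared "SI-ASR" projection

def si_asr_ppg(speech_frames: np.ndarray) -> np.ndarray:
    """Per-frame phonetic posteriorgram: softmax over senone classes."""
    logits = speech_frames @ asr_weights
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

target_speech = rng.normal(size=(200, n_acoustic))   # target speaker's acoustic frames
source_speech = rng.normal(size=(50, n_acoustic))    # source speaker's acoustic frames

target_ppg = si_asr_ppg(target_speech)               # mapping between PPGs and target frames
source_ppg = si_asr_ppg(source_speech)

# For each source frame, take the target frame whose PPG is closest (squared distance).
dists = ((source_ppg[:, None, :] - target_ppg[None, :, :]) ** 2).sum(axis=2)
converted = target_speech[dists.argmin(axis=1)]
print(converted.shape)       # (50, 25): source content rendered with the target's frames
```
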
  • Publication number: 20180012613
    Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
    Type: Application
    Filed: June 9, 2017
    Publication date: January 11, 2018
    Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng