Patents by Inventor Shiyin Kang

Shiyin Kang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200380949
    Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent the quality of synthesized speech from being degraded by semantic features in the mel-frequency cepstrum.
    Type: Application
    Filed: August 21, 2020
    Publication date: December 3, 2020
    Inventors: Xixin WU, Mu WANG, Shiyin KANG, Dan SU, Dong YU
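    The core idea above (an embedding derived from the residual between real and synthesized reference speech) can be illustrated with a minimal NumPy sketch. All names, shapes, and the linear projection here are hypothetical stand-ins for the trained model components, not the patent's actual implementation:

    ```python
    import numpy as np

    def residual_embedding(ref_mel, synth_ref_mel, proj):
        """Derive a feature-conversion embedding from the residual between real
        and synthesized reference speech features (hypothetical projection)."""
        residual = ref_mel - synth_ref_mel      # frame-wise residual, (T, n_mels)
        pooled = residual.mean(axis=0)          # collapse time axis, (n_mels,)
        return pooled @ proj                    # project to embedding space, (d,)

    rng = np.random.default_rng(0)
    ref_mel = rng.normal(size=(120, 80))        # real reference speech features
    synth_ref_mel = rng.normal(size=(120, 80))  # synthesized from same linguistic data
    proj = rng.normal(size=(80, 16))            # untrained stand-in projection
    emb = residual_embedding(ref_mel, synth_ref_mel, proj)
    ```

    The resulting vector would then condition the decoder, steering the speech features of the synthesized output.
    
    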
  • Publication number: 20200258496
    Abstract: A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
    Type: Application
    Filed: February 8, 2019
    Publication date: August 13, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Shan YANG, Heng LU, Shiyin KANG, Dong YU
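    The relative-position-aware self-attention named in the abstract can be sketched as standard single-head self-attention plus an additive bias indexed by the offset between positions. This is a simplified, untrained illustration; the shapes and the offset-embedding table are assumptions, not the patented architecture:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def relative_self_attention(x, rel_emb):
        """Self-attention with a relative-position bias.
        x: (T, d) inputs; rel_emb: (2*T - 1, d), one embedding per offset."""
        T, d = x.shape
        scores = x @ x.T                                   # content-content term
        offsets = np.arange(T)[:, None] - np.arange(T)[None, :] + T - 1
        rel_bias = np.einsum('id,ijd->ij', x, rel_emb[offsets])  # content-position term
        attn = softmax((scores + rel_bias) / np.sqrt(d))
        return attn @ x

    rng = np.random.default_rng(1)
    x = rng.normal(size=(6, 8))           # e.g. 6 character embeddings of width 8
    rel_emb = rng.normal(size=(11, 8))    # 2*6 - 1 possible relative offsets
    y = relative_self_attention(x, rel_emb)
    ```

    Because the bias depends only on the distance between positions, the same table serves every query, which is what makes the attention "relative-position-aware".
    
    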
  • Publication number: 20200051536
    Abstract: A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
    Type: Application
    Filed: October 22, 2019
    Publication date: February 13, 2020
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Nan WANG, Wei LIU, Lin MA, Wenhao JIANG, Guangzhi LI, Shiyin KANG, Deyi TUO, Xiaolong ZHU, Youyi ZHANG, Shaobin LIN, Yongsen ZHENG, Zixin ZOU, Jing HE, Zaizhen CHEN, Pinyi LI
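    The keyword-based rhyme matching step in the pipeline above can be illustrated with a deliberately crude sketch: here a "rhyme key" is just the last two letters of a word, a stand-in for the phoneme-level rhyme analysis a real system would use. All function names and the candidate lines are invented for illustration:

    ```python
    def rhyme_key(word, n=2):
        """Crude rhyme key: the last n letters of the word (assumption;
        a real system would compare phoneme sequences)."""
        return word.lower()[-n:]

    def pick_rhyming_lines(keyword, candidate_lines):
        """Keep candidate lyric lines whose final word shares the keyword's rhyme key."""
        key = rhyme_key(keyword)
        return [line for line in candidate_lines
                if rhyme_key(line.split()[-1]) == key]

    candidates = [
        "walking by the shore tonight",
        "dancing in the pale moonlight",
        "shadows falling on the lawn",
    ]
    matches = pick_rhyming_lines("bright", candidates)  # keeps the two "-ht" lines
    ```

    The selected lines would then be passed to text-to-speech and mixed with the preset background music to produce the image music.
    
    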
  • Patent number: 10176819
    Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: January 8, 2019
    Assignee: The Chinese University of Hong Kong
    Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng
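    The PPG-based conversion described above can be sketched with a nearest-neighbour frame mapping: each source PPG frame is matched to the closest target-speaker PPG frame, whose acoustic features are then emitted. This is a simplified stand-in for the trained PPG-to-acoustics mapping in the patent, with all shapes and names assumed:

    ```python
    import numpy as np

    def convert_with_ppg(source_ppg, target_ppg, target_mel):
        """Map each source PPG frame to the nearest target PPG frame and take
        that frame's acoustics (nearest-neighbour stand-in for a trained model)."""
        # pairwise squared distances in PPG (phonetic-class) space
        d2 = ((source_ppg[:, None, :] - target_ppg[None, :, :]) ** 2).sum(axis=-1)
        nearest = d2.argmin(axis=1)      # closest target frame per source frame
        return target_mel[nearest]       # converted acoustic features

    rng = np.random.default_rng(2)
    target_ppg = rng.random(size=(50, 40))   # (frames, phonetic classes/senones)
    target_mel = rng.normal(size=(50, 80))   # target speaker's acoustic features
    source_ppg = rng.random(size=(30, 40))   # PPG from SI-ASR on source speech
    converted = convert_with_ppg(source_ppg, target_ppg, target_mel)
    ```

    Because the PPG comes from a speaker-independent ASR system, it captures phonetic content while discarding speaker identity, which is what lets the same representation bridge source and target speakers.
    
    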
  • Publication number: 20180012613
    Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
    Type: Application
    Filed: June 9, 2017
    Publication date: January 11, 2018
    Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng