Patents by Inventor Kexin ZHAO

Kexin ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11556775
    Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, trace norm regularization technique embodiments were introduced and studied for training low-rank factored versions of matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers. (An illustrative code sketch of the low-rank factorization follows this entry.)
    Type: Grant
    Filed: October 3, 2018
    Date of Patent: January 17, 2023
    Assignee: Baidu USA LLC
    Inventors: Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
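
The compression technique described above lends itself to a short illustration. The sketch below is a minimal, hypothetical PyTorch example, not the patented implementation: a dense layer is factored as W = UV with a small inner rank, and training adds the Frobenius penalty (||U||_F² + ||V||_F²)/2, a standard surrogate that upper-bounds the trace (nuclear) norm of the product and thus encourages a low effective rank. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    """Dense multiply W @ x factored as U @ (V @ x), with rank r << min(m, n)."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(0.01 * torch.randn(out_features, rank))
        self.V = nn.Parameter(0.01 * torch.randn(rank, in_features))

    def forward(self, x):
        # Two thin matmuls instead of one dense one: fewer parameters and FLOPs.
        return x @ self.V.t() @ self.U.t()

    def trace_norm_surrogate(self):
        # (||U||_F^2 + ||V||_F^2) / 2 upper-bounds ||U V||_* (the trace norm),
        # so penalizing it during training pushes the product toward low rank.
        return 0.5 * (self.U.pow(2).sum() + self.V.pow(2).sum())

layer = FactoredLinear(1024, 1024, rank=64)          # illustrative sizes
x = torch.randn(8, 1024)
loss = layer(x).pow(2).mean() + 1e-4 * layer.trace_norm_surrogate()
loss.backward()
```

In the patent's setting, a penalty of this kind would be applied to the factored weight matrices of fully connected and recurrent layers, trading a small accuracy cost for fewer parameters and faster inference.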
  • Patent number: 11521592
    Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time. (A brief code sketch of the 2D squeeze-and-convolve idea follows this entry.)
    Type: Grant
    Filed: August 5, 2020
    Date of Patent: December 6, 2022
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
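
As a rough illustration of the 2D treatment of raw audio described above (an assumed reconstruction, not the patented code), the sketch below squeezes a 1D waveform into an h × (T/h) grid and applies a 2D convolution that is causal over the squeezed height axis, so that autoregressive generation runs over only h rows instead of T samples. All helper names are hypothetical.

```python
import torch
import torch.nn.functional as F

def squeeze_waveform(x, h):
    """Reshape a waveform batch (B, T) into a 2D grid (B, 1, h, T // h),
    with adjacent samples stacked along the height axis."""
    b, t = x.shape
    x = x[:, : t - t % h]                      # drop the ragged tail
    return x.reshape(b, -1, h).transpose(1, 2).unsqueeze(1)

class HeightCausalConv2d(torch.nn.Conv2d):
    """2D convolution that is causal over the height (squeezed) axis and
    length-preserving (with odd kernels) over the width axis."""
    def forward(self, x):
        kh, kw = self.kernel_size
        dh, dw = self.dilation
        pad_h = (kh - 1) * dh                  # pad only the "past" rows
        pad_w = (kw - 1) * dw // 2
        return super().forward(F.pad(x, (pad_w, pad_w, pad_h, 0)))

wave = torch.randn(2, 16000)                   # batch of raw waveforms
grid = squeeze_waveform(wave, h=16)            # (2, 1, 16, 1000)
conv = HeightCausalConv2d(1, 8, kernel_size=3, dilation=(1, 2))
out = conv(grid)                               # (2, 8, 16, 1000)
```

In a full flow, the conditioning input would additionally be shifted by one row so that row i depends only on rows strictly before it; with a small h (for example 16), synthesis needs only a few sequential steps regardless of waveform length.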
  • Publication number: 20220356643
    Abstract: The disclosure provides a method for improving the strength and dyeing of wool fibers, and belongs to the technical field of textile material modification. Exploiting the fact that protein fiber macromolecules contain a large number of reactive groups, such as hydroxyl, amino and carboxyl groups, which readily form covalent bonds with the polyphenolic pigments produced when a phenolic compound is catalyzed by an enzyme, the disclosure achieves low-temperature dyeing of wool fibers while improving fiber strength. The operating conditions are mild and easy to control, and in view of the increasing emphasis on environmental protection, the use of a biological enzyme for dyeing wool fibers is safe, environmentally friendly and efficient, with a long-term development prospect.
    Type: Application
    Filed: July 26, 2022
    Publication date: November 10, 2022
    Inventors: Jing SU, Hongbo WANG, Jie LI, Yu LI, Jiangtao XIONG, Kexin ZHAO
  • Patent number: 11017761
    Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7 times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet. (A brief code sketch of the parallel IAF transform follows this entry.)
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: May 25, 2021
    Assignee: Baidu USA LLC
    Inventors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
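
To make the single-feed-forward-pass claim concrete, here is a minimal, hypothetical sketch of one inverse autoregressive flow (IAF) step; the layer choices and names are assumptions, not the patented architecture. The shift and scale at each time t are a strictly causal function of z_{<t}, so every output sample x_t = z_t · σ_t + μ_t can be computed in parallel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IAFStep(nn.Module):
    """One IAF step: x_t = z_t * sigma_t + mu_t, where (mu, sigma) is a
    strictly causal function of z_{<t}, so all t transform in one pass."""
    def __init__(self, kernel=3, dilation=1):
        super().__init__()
        # Left-pad by one extra sample so position t sees only z_{<t}.
        self.shift = (kernel - 1) * dilation + 1
        self.net = nn.Conv1d(1, 2, kernel, dilation=dilation)

    def forward(self, z):
        h = self.net(F.pad(z, (self.shift, 0)))[:, :, :-1]
        mu, log_sigma = h.chunk(2, dim=1)
        return z * torch.exp(log_sigma) + mu

z = torch.randn(1, 1, 16000)                       # white-noise input
flow = nn.Sequential(IAFStep(dilation=1), IAFStep(dilation=2))
x = flow(z)                                        # waveform in one forward pass
```

A real parallel vocoder would condition each step on mel-spectrogram features and stack many such steps; the point of the sketch is only that synthesis requires no sequential sampling loop.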
  • Publication number: 20210090547
    Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time.
    Type: Application
    Filed: August 5, 2020
    Publication date: March 25, 2021
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Kexin ZHAO, Zhao SONG
  • Publication number: 20200066253
    Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7 times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet.
    Type: Application
    Filed: October 16, 2019
    Publication date: February 27, 2020
    Applicant: Baidu USA LLC
    Inventors: Kainan PENG, Wei PING, Zhao SONG, Kexin ZHAO
  • Publication number: 20190122108
    Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, trace norm regularization technique embodiments were introduced and studied for training low-rank factored versions of matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.
    Type: Application
    Filed: October 3, 2018
    Publication date: April 25, 2019
    Applicant: Baidu USA LLC
    Inventors: Markus KLIEGL, Siddharth GOYAL, Kexin ZHAO, Kavya SRINET, Mohammad SHOEYBI