Patents by Inventor Kexin ZHAO
Kexin ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11556775
Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, embodiments of a trace norm regularization technique were introduced and studied for training low-rank factored versions of the matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.
Type: Grant
Filed: October 3, 2018
Date of Patent: January 17, 2023
Assignee: Baidu USA LLC
Inventors: Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
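To make the factored training in this abstract concrete: a well-known variational identity bounds the trace (nuclear) norm of a product, ||UV||_* ≤ (||U||_F² + ||V||_F²)/2, so penalizing the factors' Frobenius norms encourages a low effective rank. The following is a minimal sketch of that idea, assuming PyTorch; the layer sizes, rank, and regularization strength are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    """Dense (out_dim x in_dim) weight replaced by a rank-r product U @ V."""
    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, rank) * 0.02)  # (out, r)
        self.V = nn.Parameter(torch.randn(rank, in_dim) * 0.02)   # (r, in)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two skinny matmuls instead of one dense one: O(r*(in+out)) work.
        return (x @ self.V.T) @ self.U.T + self.bias

    def trace_norm_penalty(self) -> torch.Tensor:
        # Variational upper bound on ||U @ V||_* (the trace norm surrogate).
        return 0.5 * (self.U.pow(2).sum() + self.V.pow(2).sum())

# Toy training step: task loss plus the trace-norm surrogate.
layer = FactoredLinear(in_dim=512, out_dim=512, rank=64)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x, target = torch.randn(32, 512), torch.randn(32, 512)
lam = 1e-4  # regularization strength (assumed)
loss = ((layer(x) - target) ** 2).mean() + lam * layer.trace_norm_penalty()
loss.backward()
opt.step()
```

Replacing one dense multiply with two skinny ones cuts both the parameter count and the multiply-adds from O(in·out) to O(r·(in+out)), which is also where the small-batch inference speed-up comes from.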
-
Patent number: 11521592
Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time on a GPU.
Type: Grant
Filed: August 5, 2020
Date of Patent: December 6, 2022
Assignee: Baidu USA LLC
Inventors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
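For a concrete picture of the 2D treatment of audio described above, here is a minimal sketch, assuming PyTorch; it shows only the squeeze of a length-T waveform into an h × (T/h) grid and a convolution that is strictly causal over the height (autoregressive) axis while dilated over width. All names, kernel sizes, and the value of h are assumptions, and the flow layers, conditioning, and invertibility machinery are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squeeze_waveform(x: torch.Tensor, h: int) -> torch.Tensor:
    """(batch, T) -> (batch, 1, h, T//h): adjacent samples share a column,
    so only h rows need to be generated sequentially at synthesis."""
    b, t = x.shape
    assert t % h == 0, "trim the waveform so its length is divisible by h"
    return x.reshape(b, t // h, h).transpose(1, 2).unsqueeze(1)

class CausalHeightConv2d(nn.Module):
    """2D conv, strictly causal over height (the autoregressive axis) and
    dilated over width (the long-range axis)."""
    def __init__(self, ch_in: int, ch_out: int, kh: int = 2, kw: int = 3,
                 dil_w: int = 2):
        super().__init__()
        self.kh, self.kw, self.dil_w = kh, kw, dil_w
        self.conv = nn.Conv2d(ch_in, ch_out, (kh, kw), dilation=(1, dil_w))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        pad_w = (self.kw - 1) * self.dil_w // 2
        # Pad kh rows on top only, then drop the extra output row:
        # output row i then sees input rows < i only.
        z = F.pad(z, (pad_w, pad_w, self.kh, 0))
        return self.conv(z)[:, :, :-1, :]

x = torch.randn(4, 16000)          # a batch of 1-second, 16 kHz clips
z = squeeze_waveform(x, h=16)      # (4, 1, 16, 1000)
out = CausalHeightConv2d(1, 8)(z)  # (4, 8, 16, 1000), causal over rows
print(out.shape)
```

Because only the h rows are sequential, synthesis takes h autoregressive steps per flow rather than one step per sample, which is where the orders-of-magnitude speed-up over sample-level autoregression comes from.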
-
Publication number: 20220356643
Abstract: The disclosure provides a method for improving the strength and dyeing of wool fibers, and belongs to the technical field of textile material modification. Protein fiber macromolecules contain a large number of active groups, such as hydroxyl, amino and carboxyl groups, which readily react with the polyphenolic pigments formed when a phenolic compound is catalyzed by an enzyme, creating covalent bonds; by exploiting this feature, the disclosure achieves low-temperature dyeing of wool fibers while improving fiber strength. The operating conditions are mild and easy to control, and in view of the increasing emphasis on environmental protection, using a biological enzyme to dye wool fibers is safe, environmentally friendly and efficient, with long-term development prospects.
Type: Application
Filed: July 26, 2022
Publication date: November 10, 2022
Inventors: Jing SU, Hongbo WANG, Jie LI, Yu LI, Jiangtao Xiong, Kexin Zhao
-
Patent number: 11017761
Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7-times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveforms, which avoids the need for distillation from a separately trained WaveNet.
Type: Grant
Filed: October 16, 2019
Date of Patent: May 25, 2021
Assignee: Baidu USA LLC
Inventors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
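The single-feed-forward-pass claim above follows from the structure of an affine inverse autoregressive flow: each output sample is x_t = z_t · σ_t + μ_t, where (μ_t, σ_t) depend only on the noise prefix z_&lt;t, and the entire noise sequence z is available at once. Below is a minimal sketch of that synthesis direction, assuming PyTorch; the toy causal-convolution parameterization and all sizes are assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D conv whose output at step t depends only on inputs at steps < t."""
    def __init__(self, ch_in: int, ch_out: int, k: int = 3):
        super().__init__()
        self.k = k
        self.conv = nn.Conv1d(ch_in, ch_out, k)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z = F.pad(z, (self.k, 0))       # pad k steps on the left only
        return self.conv(z)[:, :, :-1]  # drop last step -> strictly causal

class IAFStep(nn.Module):
    """One affine IAF transform: x_t = z_t * sigma(z_<t) + mu(z_<t)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(CausalConv1d(1, hidden), nn.ReLU(),
                                 CausalConv1d(hidden, 2))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.net(z).chunk(2, dim=1)
        return z * log_sigma.exp() + mu  # fully parallel over time

z = torch.randn(2, 1, 16000)  # white noise for two one-second clips
x = IAFStep()(z)              # waveform in a single forward pass
print(x.shape)                # torch.Size([2, 1, 16000])
```

Note that scoring a given waveform under an IAF requires the sequential inverse z_t = (x_t − μ_t)/σ_t, which is why training one from scratch by maximum likelihood, rather than distilling from a teacher WaveNet, is the nontrivial part; the sketch shows only the parallel synthesis direction.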
-
Publication number: 20210090547
Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time on a GPU.
Type: Application
Filed: August 5, 2020
Publication date: March 25, 2021
Applicant: Baidu USA LLC
Inventors: Wei PING, Kainan PENG, Kexin ZHAO, Zhao SONG
-
Publication number: 20200066253
Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7-times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveforms, which avoids the need for distillation from a separately trained WaveNet.
Type: Application
Filed: October 16, 2019
Publication date: February 27, 2020
Applicant: Baidu USA LLC
Inventors: Kainan PENG, Wei PING, Zhao SONG, Kexin ZHAO
-
Publication number: 20190122108
Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, embodiments of a trace norm regularization technique were introduced and studied for training low-rank factored versions of the matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.
Type: Application
Filed: October 3, 2018
Publication date: April 25, 2019
Applicant: Baidu USA LLC
Inventors: Markus KLIEGL, Siddharth GOYAL, Kexin ZHAO, Kavya SRINET, Mohammad SHOEYBI
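As a complement to the training sketch after the granted counterpart above, the "standard low rank training" baseline this abstract compares against is commonly initialized by factoring a trained dense weight with a truncated SVD. Here is a minimal sketch of that step, assuming PyTorch; the rank and shapes are illustrative.

```python
import torch

def factor_dense_weight(W: torch.Tensor, rank: int):
    """Split a dense weight W (out x in) into U (out x r) and V (r x in)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_s = S[:rank].sqrt()  # split each singular value across both factors
    return U[:, :rank] * sqrt_s, sqrt_s[:, None] * Vh[:rank]

W = torch.randn(512, 512)
U_r, V_r = factor_dense_weight(W, rank=64)
# Relative reconstruction error shrinks as the rank grows.
print(((W - U_r @ V_r).norm() / W.norm()).item())
```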