Patents by Inventor Kexin ZHAO
Kexin ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11556775
Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, embodiments of a trace norm regularization technique were introduced and studied for training low-rank factored versions of the matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.
Type: Grant
Filed: October 3, 2018
Date of Patent: January 17, 2023
Assignee: Baidu USA LLC
Inventors: Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
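To make the factored training in this abstract concrete: a well-known variational identity bounds the trace (nuclear) norm of a product, ||UV||_* ≤ (||U||_F² + ||V||_F²)/2, so penalizing the factors' Frobenius norms encourages a low effective rank. The following is a minimal sketch of that idea, assuming PyTorch; the layer sizes, rank, and regularization strength are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    """Dense (out_dim x in_dim) weight replaced by a rank-r product U @ V."""
    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, rank) * 0.02)  # (out, r)
        self.V = nn.Parameter(torch.randn(rank, in_dim) * 0.02)   # (r, in)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two skinny matmuls instead of one dense one: O(r*(in+out)) work.
        return (x @ self.V.T) @ self.U.T + self.bias

    def trace_norm_penalty(self) -> torch.Tensor:
        # Variational upper bound on ||U @ V||_* (the trace norm surrogate).
        return 0.5 * (self.U.pow(2).sum() + self.V.pow(2).sum())

# Toy training step: task loss plus the trace-norm surrogate.
layer = FactoredLinear(in_dim=512, out_dim=512, rank=64)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x, target = torch.randn(32, 512), torch.randn(32, 512)
lam = 1e-4  # regularization strength (assumed)
loss = ((layer(x) - target) ** 2).mean() + lam * layer.trace_norm_penalty()
loss.backward()
opt.step()
```

Replacing one dense multiply with two skinny ones cuts both the parameter count and the multiply-adds from O(in·out) to O(r·(in+out)), which is also where the small-batch inference speed-up comes from.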
-
Patent number: 11521592
Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time on a GPU.
Type: Grant
Filed: August 5, 2020
Date of Patent: December 6, 2022
Assignee: Baidu USA LLC
Inventors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
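For a concrete picture of the 2D treatment of audio described above, here is a minimal sketch, assuming PyTorch; it shows only the squeeze of a length-T waveform into an h × (T/h) grid and a convolution that is strictly causal over the height (autoregressive) axis while dilated over width. All names, kernel sizes, and the value of h are assumptions, and the flow layers, conditioning, and invertibility machinery are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squeeze_waveform(x: torch.Tensor, h: int) -> torch.Tensor:
    """(batch, T) -> (batch, 1, h, T//h): adjacent samples share a column,
    so only h rows need to be generated sequentially at synthesis."""
    b, t = x.shape
    assert t % h == 0, "trim the waveform so its length is divisible by h"
    return x.reshape(b, t // h, h).transpose(1, 2).unsqueeze(1)

class CausalHeightConv2d(nn.Module):
    """2D conv, strictly causal over height (the autoregressive axis) and
    dilated over width (the long-range axis)."""
    def __init__(self, ch_in: int, ch_out: int, kh: int = 2, kw: int = 3,
                 dil_w: int = 2):
        super().__init__()
        self.kh, self.kw, self.dil_w = kh, kw, dil_w
        self.conv = nn.Conv2d(ch_in, ch_out, (kh, kw), dilation=(1, dil_w))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        pad_w = (self.kw - 1) * self.dil_w // 2
        # Pad kh rows on top only, then drop the extra output row:
        # output row i then sees input rows < i only.
        z = F.pad(z, (pad_w, pad_w, self.kh, 0))
        return self.conv(z)[:, :, :-1, :]

x = torch.randn(4, 16000)          # a batch of 1-second, 16 kHz clips
z = squeeze_waveform(x, h=16)      # (4, 1, 16, 1000)
out = CausalHeightConv2d(1, 8)(z)  # (4, 8, 16, 1000), causal over rows
print(out.shape)
```

Because only the h rows are sequential, synthesis takes h autoregressive steps per flow rather than one step per sample, which is where the orders-of-magnitude speed-up over sample-level autoregression comes from.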
-
Publication number: 20220356643
Abstract: The disclosure provides a method for improving the strength and dyeing of wool fibers, and belongs to the technical field of textile material modification. Protein fiber macromolecules contain a large number of active groups, such as hydroxyl, amino and carboxyl groups, which readily react with the polyphenolic pigments formed when a phenolic compound is catalyzed by an enzyme, creating covalent bonds; by exploiting this feature, the disclosure achieves low-temperature dyeing of wool fibers while improving fiber strength. The operating conditions are mild and easy to control, and in view of the increasing emphasis on environmental protection, using a biological enzyme to dye wool fibers is safe, environmentally friendly and efficient, with long-term development prospects.
Type: Application
Filed: July 26, 2022
Publication date: November 10, 2022
Inventors: Jing SU, Hongbo WANG, Jie LI, Yu LI, Jiangtao Xiong, Kexin Zhao
-
Patent number: 11017761
Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7-times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveforms, which avoids the need for distillation from a separately trained WaveNet.
Type: Grant
Filed: October 16, 2019
Date of Patent: May 25, 2021
Assignee: Baidu USA LLC
Inventors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
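The single-feed-forward-pass claim above follows from the structure of an affine inverse autoregressive flow: each output sample is x_t = z_t · σ_t + μ_t, where (μ_t, σ_t) depend only on the noise prefix z_&lt;t, and the entire noise sequence z is available at once. Below is a minimal sketch of that synthesis direction, assuming PyTorch; the toy causal-convolution parameterization and all sizes are assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D conv whose output at step t depends only on inputs at steps < t."""
    def __init__(self, ch_in: int, ch_out: int, k: int = 3):
        super().__init__()
        self.k = k
        self.conv = nn.Conv1d(ch_in, ch_out, k)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z = F.pad(z, (self.k, 0))       # pad k steps on the left only
        return self.conv(z)[:, :, :-1]  # drop last step -> strictly causal

class IAFStep(nn.Module):
    """One affine IAF transform: x_t = z_t * sigma(z_<t) + mu(z_<t)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(CausalConv1d(1, hidden), nn.ReLU(),
                                 CausalConv1d(hidden, 2))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.net(z).chunk(2, dim=1)
        return z * log_sigma.exp() + mu  # fully parallel over time

z = torch.randn(2, 1, 16000)  # white noise for two one-second clips
x = IAFStep()(z)              # waveform in a single forward pass
print(x.shape)                # torch.Size([2, 1, 16000])
```

Note that scoring a given waveform under an IAF requires the sequential inverse z_t = (x_t − μ_t)/σ_t, which is why training one from scratch by maximum likelihood, rather than distilling from a teacher WaveNet, is the nontrivial part; the sketch shows only the parallel synthesis direction.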
-
Publication number: 20210090547
Abstract: WaveFlow is a small-footprint generative flow for raw audio that may be trained directly with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6 times faster than real time on a GPU.
Type: Application
Filed: August 5, 2020
Publication date: March 25, 2021
Applicant: Baidu USA LLC
Inventors: Wei PING, Kainan PENG, Kexin ZHAO, Zhao SONG
-
Publication number: 20200066253
Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7-times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveforms, which avoids the need for distillation from a separately trained WaveNet.
Type: Application
Filed: October 16, 2019
Publication date: February 27, 2020
Applicant: Baidu USA LLC
Inventors: Kainan PENG, Wei PING, Zhao SONG, Kexin ZHAO
-
Publication number: 20190122108
Abstract: Described herein are systems and methods for compressing and speeding up the dense matrix multiplications found, for example, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, embodiments of a trace norm regularization technique were introduced and studied for training low-rank factored versions of the matrix multiplications. Compared to standard low-rank training, the methods more consistently lead to good trade-offs between accuracy and number of parameters, and can be used to speed up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed-ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.
Type: Application
Filed: October 3, 2018
Publication date: April 25, 2019
Applicant: Baidu USA LLC
Inventors: Markus KLIEGL, Siddharth GOYAL, Kexin ZHAO, Kavya SRINET, Mohammad SHOEYBI
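As a complement to the training sketch after the granted counterpart above, the "standard low rank training" baseline this abstract compares against is commonly initialized by factoring a trained dense weight with a truncated SVD. Here is a minimal sketch of that step, assuming PyTorch; the rank and shapes are illustrative.

```python
import torch

def factor_dense_weight(W: torch.Tensor, rank: int):
    """Split a dense weight W (out x in) into U (out x r) and V (r x in)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_s = S[:rank].sqrt()  # split each singular value across both factors
    return U[:, :rank] * sqrt_s, sqrt_s[:, None] * Vh[:rank]

W = torch.randn(512, 512)
U_r, V_r = factor_dense_weight(W, rank=64)
# Relative reconstruction error shrinks as the rank grows.
print(((W - U_r @ V_r).norm() / W.norm()).item())
```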