Patents by Inventor Sercan Arik

Sercan Arik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11741342
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS has targeted mainly improving accuracy, with little consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even under tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was constrained to be greater than 100 FLOPs/byte, and 3.87% test error when model size was constrained to be less than 3M parameters. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 29, 2023
    Assignee: Baidu USA LLC
    Inventors: Yanqi Zhou, Siavash Ebrahimi, Sercan Arik, Haonan Yu, Hairong Liu, Gregory Diamos
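
A minimal sketch of the idea described in this abstract, not the patented implementation: a recurrent policy network that consumes a network-embedding sequence and proposes the next layer configuration, plus a reward that penalizes exceeding a resource budget. All class and function names here are illustrative assumptions.

```python
# Illustrative sketch of policy-network NAS with a resource-constrained reward.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, num_layer_types: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_layer_types, embed_dim)   # embed each layer token
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_layer_types)      # logits over next layer choice

    def forward(self, layer_tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(layer_tokens)             # (batch, seq, embed_dim)
        out, _ = self.rnn(x)                     # encode the current architecture
        return self.head(out[:, -1])             # propose the next configuration

def constrained_reward(accuracy: float, model_size: int, size_budget: int) -> float:
    # Soft-penalize architectures that exceed the resource budget (e.g., 3M parameters).
    penalty = min(1.0, size_budget / max(model_size, 1))
    return accuracy * penalty

# Example: score the next-layer proposal for a 5-layer candidate architecture.
logits = PolicyNetwork(num_layer_types=8)(torch.randint(0, 8, (1, 5)))
```

In a REINFORCE-style loop, sampled architectures would be trained briefly, scored with a reward of this kind, and the log-probabilities of the sampled choices scaled by that reward to update the policy.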
  • Patent number: 11462209
    Abstract: For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions, that achieves high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, as well as a GAN framework whose critic discerns unrealistic waveforms. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly used iterative algorithms such as Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: October 4, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Hee Woo Jun, Eric Undersander, Gregory Diamos
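
A minimal sketch of the general approach, not the patented architecture: a multi-head stack of 1-D transposed convolutions that upsamples a spectrogram to a waveform, with the per-head outputs summed. The head count, kernel sizes, and upsampling factors are assumptions chosen only for the example.

```python
# Illustrative multi-head transposed-convolution vocoder sketch.
import torch
import torch.nn as nn

class TransposedConvHead(nn.Module):
    def __init__(self, n_mels: int, channels: int = 64, upsample_stride: int = 16):
        super().__init__()
        # Two upsampling stages whose strides multiply to the hop length (16 * 16 = 256 here).
        self.net = nn.Sequential(
            nn.ConvTranspose1d(n_mels, channels, kernel_size=32, stride=upsample_stride, padding=8),
            nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=32, stride=upsample_stride, padding=8),
            nn.Tanh(),
        )

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.net(spec)                    # (batch, 1, samples)

class MultiHeadVocoder(nn.Module):
    def __init__(self, n_mels: int = 80, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(TransposedConvHead(n_mels) for _ in range(num_heads))

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # Each head produces a candidate waveform; the head outputs are summed.
        return torch.stack([h(spec) for h in self.heads]).sum(dim=0)

# Example: 100 spectrogram frames with 80 mel bins -> (2, 1, 25600) waveform samples.
wave = MultiHeadVocoder()(torch.randn(2, 80, 100))
```

In training, such a model would typically combine spectral-reconstruction losses (a stand-in for the perceptual-quality losses named in the abstract) with an adversarial loss from a waveform critic.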
  • Patent number: 10540961
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by large-scale, state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers in exploiting the structure of the data in the time and frequency domains are combined with recurrent layers that utilize context across the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
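
A minimal sketch of the CRNN idea, not the patented model: a 2-D convolution over a time-frequency input, a recurrent layer over the resulting time steps, and a linear classifier over keyword labels. Layer sizes and kernel shapes are illustrative assumptions.

```python
# Illustrative small-footprint CRNN keyword-spotting sketch.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels: int = 40, num_keywords: int = 12,
                 conv_channels: int = 16, rnn_hidden: int = 32):
        super().__init__()
        # The convolution exploits local structure in both time and frequency.
        self.conv = nn.Conv2d(1, conv_channels, kernel_size=(20, 5), stride=(8, 2))
        conv_freq_bins = (n_mels - 20) // 8 + 1
        self.rnn = nn.GRU(conv_channels * conv_freq_bins, rnn_hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * rnn_hidden, num_keywords)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, frames)
        h = torch.relu(self.conv(x))                       # (batch, C, F', T')
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)     # sequence of per-frame features
        out, _ = self.rnn(h)                               # recurrence uses full-frame context
        return self.fc(out[:, -1])                         # keyword logits

# Example: a batch of 8 one-second log-mel inputs -> (8, 12) keyword scores.
logits = CRNN()(torch.randn(8, 1, 40, 101))
```

Keeping the convolutional and recurrent widths this small is what keeps the parameter count in the low hundreds of thousands, in the spirit of the ~230k-parameter embodiment described above.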
  • Publication number: 20190355347
    Abstract: For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions, that achieves high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, as well as a GAN framework whose critic discerns unrealistic waveforms. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly used iterative algorithms such as Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis.
    Type: Application
    Filed: March 27, 2019
    Publication date: November 21, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan ARIK, Hee Woo JUN, Eric UNDERSANDER, Gregory DIAMOS
  • Publication number: 20190354837
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS has targeted mainly improving accuracy, with little consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even under tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was constrained to be greater than 100 FLOPs/byte, and 3.87% test error when model size was constrained to be less than 3M parameters.
    Type: Application
    Filed: March 8, 2019
    Publication date: November 21, 2019
    Applicant: Baidu USA LLC
    Inventors: Yanqi ZHOU, Siavash EBRAHIMI, Sercan ARIK, Haonan YU, Hairong LIU, Gregory DIAMOS
  • Publication number: 20180261213
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by large-scale, state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers in exploiting the structure of the data in the time and frequency domains are combined with recurrent layers that utilize context across the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
    Type: Application
    Filed: August 28, 2017
    Publication date: September 13, 2018
    Applicant: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates