Patents by Inventor Sercan Arik

Sercan Arik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11741342
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS has targeted mainly improving accuracy, with little consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even under tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was constrained to be greater than 100 FLOPs/byte, and 3.87% test error when model size was constrained to be less than 3M parameters. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 29, 2023
    Assignee: Baidu USA LLC
    Inventors: Yanqi Zhou, Siavash Ebrahimi, Sercan Arik, Haonan Yu, Hairong Liu, Gregory Diamos
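
A minimal sketch of the idea described in this abstract, not the patented implementation: a recurrent policy network that consumes a network-embedding sequence and proposes the next layer configuration, plus a reward that penalizes exceeding a resource budget. All class and function names here are illustrative assumptions.

```python
# Illustrative sketch of policy-network NAS with a resource-constrained reward.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, num_layer_types: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_layer_types, embed_dim)   # embed each layer token
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_layer_types)      # logits over next layer choice

    def forward(self, layer_tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(layer_tokens)             # (batch, seq, embed_dim)
        out, _ = self.rnn(x)                     # encode the current architecture
        return self.head(out[:, -1])             # propose the next configuration

def constrained_reward(accuracy: float, model_size: int, size_budget: int) -> float:
    # Soft-penalize architectures that exceed the resource budget (e.g., 3M parameters).
    penalty = min(1.0, size_budget / max(model_size, 1))
    return accuracy * penalty

# Example: score the next-layer proposal for a 5-layer candidate architecture.
logits = PolicyNetwork(num_layer_types=8)(torch.randint(0, 8, (1, 5)))
```

In a REINFORCE-style loop, sampled architectures would be trained briefly, scored with a reward of this kind, and the log-probabilities of the sampled choices scaled by that reward to update the policy.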
  • Patent number: 11462209
    Abstract: For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions, that achieves high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, as well as a GAN framework whose critic discerns unrealistic waveforms. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly used iterative algorithms such as Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: October 4, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Hee Woo Jun, Eric Undersander, Gregory Diamos
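
A minimal sketch of the general approach, not the patented architecture: a multi-head stack of 1-D transposed convolutions that upsamples a spectrogram to a waveform, with the per-head outputs summed. The head count, kernel sizes, and upsampling factors are assumptions chosen only for the example.

```python
# Illustrative multi-head transposed-convolution vocoder sketch.
import torch
import torch.nn as nn

class TransposedConvHead(nn.Module):
    def __init__(self, n_mels: int, channels: int = 64, upsample_stride: int = 16):
        super().__init__()
        # Two upsampling stages whose strides multiply to the hop length (16 * 16 = 256 here).
        self.net = nn.Sequential(
            nn.ConvTranspose1d(n_mels, channels, kernel_size=32, stride=upsample_stride, padding=8),
            nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=32, stride=upsample_stride, padding=8),
            nn.Tanh(),
        )

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.net(spec)                    # (batch, 1, samples)

class MultiHeadVocoder(nn.Module):
    def __init__(self, n_mels: int = 80, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(TransposedConvHead(n_mels) for _ in range(num_heads))

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # Each head produces a candidate waveform; the head outputs are summed.
        return torch.stack([h(spec) for h in self.heads]).sum(dim=0)

# Example: 100 spectrogram frames with 80 mel bins -> (2, 1, 25600) waveform samples.
wave = MultiHeadVocoder()(torch.randn(2, 80, 100))
```

In training, such a model would typically combine spectral-reconstruction losses (a stand-in for the perceptual-quality losses named in the abstract) with an adversarial loss from a waveform critic.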
  • Patent number: 10540961
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by large-scale, state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers in exploiting the structure of the data in the time and frequency domains are combined with recurrent layers that utilize context across the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments. (A minimal illustrative sketch follows this entry.)
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
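
A minimal sketch of the CRNN idea, not the patented model: a 2-D convolution over a time-frequency input, a recurrent layer over the resulting time steps, and a linear classifier over keyword labels. Layer sizes and kernel shapes are illustrative assumptions.

```python
# Illustrative small-footprint CRNN keyword-spotting sketch.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels: int = 40, num_keywords: int = 12,
                 conv_channels: int = 16, rnn_hidden: int = 32):
        super().__init__()
        # The convolution exploits local structure in both time and frequency.
        self.conv = nn.Conv2d(1, conv_channels, kernel_size=(20, 5), stride=(8, 2))
        conv_freq_bins = (n_mels - 20) // 8 + 1
        self.rnn = nn.GRU(conv_channels * conv_freq_bins, rnn_hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * rnn_hidden, num_keywords)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, frames)
        h = torch.relu(self.conv(x))                       # (batch, C, F', T')
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)     # sequence of per-frame features
        out, _ = self.rnn(h)                               # recurrence uses full-frame context
        return self.fc(out[:, -1])                         # keyword logits

# Example: a batch of 8 one-second log-mel inputs -> (8, 12) keyword scores.
logits = CRNN()(torch.randn(8, 1, 40, 101))
```

Keeping the convolutional and recurrent widths this small is what keeps the parameter count in the low hundreds of thousands, in the spirit of the ~230k-parameter embodiment described above.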
  • Publication number: 20190355347
    Abstract: For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions, that achieves high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, as well as a GAN framework whose critic discerns unrealistic waveforms. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly used iterative algorithms such as Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis.
    Type: Application
    Filed: March 27, 2019
    Publication date: November 21, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan ARIK, Hee Woo JUN, Eric UNDERSANDER, Gregory DIAMOS
  • Publication number: 20190354837
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS has targeted mainly improving accuracy, with little consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even under tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was constrained to be greater than 100 FLOPs/byte, and 3.87% test error when model size was constrained to be less than 3M parameters.
    Type: Application
    Filed: March 8, 2019
    Publication date: November 21, 2019
    Applicant: Baidu USA LLC
    Inventors: Yanqi ZHOU, Siavash EBRAHIMI, Sercan ARIK, Haonan YU, Hairong LIU, Gregory DIAMOS
  • Publication number: 20180261213
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by large-scale, state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers in exploiting the structure of the data in the time and frequency domains are combined with recurrent layers that utilize context across the entire processed frame. The effects of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ~230k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
    Type: Application
    Filed: August 28, 2017
    Publication date: September 13, 2018
    Applicant: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates