Patents by Inventor Yanqi Zhou

Yanqi Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240112027
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing neural architecture search for machine learning models. In one aspect, a method comprises receiving training data for a machine learning task, generating a plurality of candidate neural networks for performing the machine learning task, wherein each candidate neural network comprises a plurality of instances of a layer block composed of a plurality of layers, for each candidate neural network, selecting a respective type for each of the plurality of layers from a set of layer types that comprises, training the candidate neural network and evaluating performance scores for the trained candidate neural networks as applied to the machine learning task, and determining a final neural network for performing the machine learning task based at least on the performance scores for the candidate neural networks.
    Type: Application
    Filed: September 28, 2023
    Publication date: April 4, 2024
    Inventors: Yanqi Zhou, Yanping Huang, Yifeng Lu, Andrew M. Dai, Siamak Shakeri, Zhifeng Chen, James Laudon, Quoc V. Le, Da Huang, Nan Du, David Richard So, Daiyi Peng, Yingwei Cui, Jeffrey Adgate Dean, Chang Lan
  • Publication number: 20240005129
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for jointly determining neural network architectures and hardware accelerator architectures.
    Type: Application
    Filed: October 1, 2021
    Publication date: January 4, 2024
    Inventors: Yanqi Zhou, Amir Yazdanbakhsh, Berkin Akin, Daiyi Peng, Yuxiong Zhu, Mingxing Tan, Xuanyi Dong
  • Publication number: 20230409867
    Abstract: Implementations are described herein for performing joint optimization of multi-task learning of dense predictions (MT-DP) and hardware-aware neural architecture search (NAS). In various implementations, a set of tasks to be performed using a resource-constrained edge computing system may be determined. Based on a base multi-task dense-prediction (MT-DP) architecture template, the set of tasks, and a plurality of hardware-based constraints of a target edge computing system, a network architecture search (NAS) may be used to sample candidate MT-DP architecture(s) from a search space of neural network architecture components. Each sampled candidate MT-DP architecture may include a distinct assembly of sampled neural network architecture components applied to the base MT-DP architecture template. Image data may be processed using the candidate MT-DP architecture(s) to determine performance metrics. These performance metrics may be used to jointly train the MT-DP architecture(s) and/or the NAS.
    Type: Application
    Filed: June 15, 2022
    Publication date: December 21, 2023
    Inventors: Chunfeng Wen, Yueqi Li, Zhiqiang Yuan, Minh Thanh Vu, Yanqi Zhou
  • Publication number: 20230376664
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining architectures of hardware accelerators.
    Type: Application
    Filed: October 11, 2021
    Publication date: November 23, 2023
    Inventors: Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, James Laudon, Ravi Narayanaswami
  • Publication number: 20230306266
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.
    Type: Application
    Filed: May 22, 2023
    Publication date: September 28, 2023
    Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
  • Patent number: 11741342
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS focuses mainly on improving accuracy but lacks consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even with tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was greater than 100 FLOPs/byte, and 3.87% test error when model size was less than 3M parameters.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 29, 2023
    Assignee: Baidu USA LLC
    Inventors: Yanqi Zhou, Siavash Ebrahimi, Sercan Arik, Haonan Yu, Hairong Liu, Gregory Diamos
  • Publication number: 20230176840
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compiler optimizations using a compiler optimization network. One of the methods includes receiving an input program, wherein the input program defines a graph of operation modules, wherein each node in the graph is a respective operation module, and each edge between nodes in the graph represents one operation module receiving the output generated by another operation module. The input program is processed by a compiler optimization network comprising a graph-embedding network that is configured to encode operation features and operation dependencies of the operation modules of the input program into a graph embedding representation and a policy network that is configured to generate an optimization action for each of one or more nodes encoded in the graph embedding representation.
    Type: Application
    Filed: June 7, 2021
    Publication date: June 8, 2023
    Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Darling Goldie, Azalia Mirhoseini, James Laudon
  • Patent number: 11657289
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: May 23, 2023
    Assignee: Google LLC
    Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
  • Patent number: 11651763
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: May 16, 2023
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
  • Patent number: 11593655
    Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein is a large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that a power law may be used to represent deep model relationships, such as error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
    Type: Grant
    Filed: November 30, 2018
    Date of Patent: February 28, 2023
    Assignee: Baidu USA LLC
    Inventors: Joel Hestness, Gregory Diamos, Hee Woo Jun, Sharan Narang, Newsha Ardalani, Md Mostofa Ali Patwary, Yanqi Zhou
  • Patent number: 11238843
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audio samples, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audio samples.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
  • Publication number: 20210248445
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.
    Type: Application
    Filed: April 3, 2020
    Publication date: August 12, 2021
    Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
  • Publication number: 20210049999
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Application
    Filed: November 2, 2020
    Publication date: February 18, 2021
    Applicant: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
  • Patent number: 10896669
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: January 19, 2021
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
  • Publication number: 20200175374
    Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein is a large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that a power law may be used to represent deep model relationships, such as error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
    Type: Application
    Filed: November 30, 2018
    Publication date: June 4, 2020
    Applicant: Baidu USA LLC
    Inventors: Joel Hestness, Gregory Diamos, Hee Woo Jun, Sharan Narang, Newsha Ardalani, Md Mostofa Ali Patwary, Yanqi Zhou
  • Publication number: 20190354837
    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS focuses mainly on improving accuracy but lacks consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrations of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even with tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity was greater than 100 FLOPs/byte, and 3.87% test error when model size was less than 3M parameters.
    Type: Application
    Filed: March 8, 2019
    Publication date: November 21, 2019
    Applicant: Baidu USA LLC
    Inventors: Yanqi Zhou, Siavash Ebrahimi, Sercan Arik, Haonan Yu, Hairong Liu, Gregory Diamos
  • Publication number: 20190251952
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audio samples, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audio samples.
    Type: Application
    Filed: September 26, 2018
    Publication date: August 15, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
  • Publication number: 20180336880
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Application
    Filed: May 8, 2018
    Publication date: November 22, 2018
    Applicant: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
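
Illustrative note on the scaling-relationship abstracts listed above (patent 11593655 and publication 20200175374): those abstracts state that error and training data size can be related by a power law. As a minimal sketch of what fitting such a relationship might look like, and not the method claimed in those filings, the Python below fits error ≈ a · n^b by ordinary least squares in log-log space. The function name fit_power_law and the data points are hypothetical and synthetic, chosen only for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~= a * sizes**b via least squares on log-transformed data.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data (an assumption, not measured results): error = 2.0 * n**-0.3
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [2.0 * n ** -0.3 for n in sizes]
a, b = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative fitted exponent b corresponds to error falling as the training set grows. On real measurements the fit would only be approximate, and the range over which a single power law holds would need to be checked empirically.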