Patents by Inventor Yanqi ZHOU

Yanqi ZHOU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

NEURAL NETWORK ARCHITECTURE SEARCH OVER COMPLEX BLOCK ARCHITECTURES

Publication number: 20240112027

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing neural architecture search for machine learning models. In one aspect, a method comprises receiving training data for a machine learning, generating a plurality of candidate neural networks for performing the machine learning task, wherein each candidate neural network comprises a plurality of instances of a layer block composed of a plurality of layers, for each candidate neural network, selecting a respective type for each of the plurality of layers from a set of layer types that comprises, training the candidate neural network and evaluating performance scores for the trained candidate neural networks as applied to the machine learning task, and determining a final neural network for performing the machine learning task based at least on the performance scores for the candidate neural networks.

Type: Application

Filed: September 28, 2023

Publication date: April 4, 2024

Inventors: Yanqi Zhou, Yanping Huang, Yifeng Lu, Andrew M. Dai, Siamak Shakeri, Zhifeng Chen, James Laudon, Quoc V. Le, Da Huang, Nan Du, David Richard So, Daiyi Peng, Yingwei Cui, Jeffrey Adgate Dean, Chang Lan
NEURAL ARCHITECTURE AND HARDWARE ACCELERATOR SEARCH

Publication number: 20240005129

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for jointly determining neural network architectures and hardware accelerator architectures.

Type: Application

Filed: October 1, 2021

Publication date: January 4, 2024

Inventors: Yanqi Zhou, Amir Yazdanbakhsh, Berkin Akin, Daiyi Peng, Yuxiong Zhu, Mingxing Tan, Xuanyi Dong
JOINT TRAINING OF NETWORK ARCHITECTURE SEARCH AND MULTI-TASK DENSE PREDICTION MODELS FOR EDGE DEPLOYMENT

Publication number: 20230409867

Abstract: Implementations are described herein for performing joint optimization of multi-task learning of dense predictions (MT-DP) and hardware-aware neural architecture search (NAS). In various implementations, a set of tasks to be performed using a resource-constrained edge computing system may be determined. Based on a base multi-task dense-prediction (MT-DP) architecture template, the set of tasks, and a plurality of hardware-based constraints of a target edge computing system, a network architecture search (NAS) may be used to sample candidate MT-DP architecture(s) from a search space of neural network architecture components. Each sampled candidate MT-DP architecture may include a distinct assembly of sampled neural network architecture components applied to the base MT-DP architecture template. Image data may be processed using the candidate MT-DP architecture(s) to determine performance metrics. These performance metrics may be used to jointly train the MT-DP architecture(s) and/or the NAS.

Type: Application

Filed: June 15, 2022

Publication date: December 21, 2023

Inventors: Chunfeng Wen, Yueqi Li, Zhiqiang Yuan, Minh Thanh Vu, Yanqi Zhou
EFFICIENT HARDWARE ACCELERATOR ARCHITECTURE EXPLORATION

Publication number: 20230376664

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining architectures of hardware accelerators.

Type: Application

Filed: October 11, 2021

Publication date: November 23, 2023

Inventors: Amir YAZDANBAKHSH, Christof ANGERMUELLER, Berkin AKIN, Yanqi ZHOU, James LAUDON, Ravi NARAYANASWAMI
COMPUTATIONAL GRAPH OPTIMIZATION

Publication number: 20230306266

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.

Type: Application

Filed: May 22, 2023

Publication date: September 28, 2023

Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
Resource-efficient neural architects

Patent number: 11741342

Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy but lacked consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrates of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even with tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size was less than 3M parameters.

Type: Grant

Filed: March 8, 2019

Date of Patent: August 29, 2023

Assignee: Baidu USA LLC

Inventors: Yanqi Zhou, Siavash Ebrahimi, Sercan Arik, Haonan Yu, Hairong Liu, Gregory Diamos
LEARNED GRAPH OPTIMIZATIONS FOR COMPILERS

Publication number: 20230176840

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compiler optimizations using a compiler optimization network. One of the methods includes receiving an input program, wherein the input program defines a graph of operation modules, wherein each node in the graph is a respective operation module, and each edge between nodes in the graph represents one operation module receiving the output generated by another operation module. The input program is processed by a compiler optimization network comprising a graph-embedding network that is configured to encode operation features and operation dependencies of the operation modules of the input program into a graph embedding representation and a policy network that is configured to generate an optimization action for each of one or more nodes encoded in the graph embedding representation.

Type: Application

Filed: June 7, 2021

Publication date: June 8, 2023

Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Darling Goldie, Azalia Mirhoseini, James Laudon
Computational graph optimization

Patent number: 11657289

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.

Type: Grant

Filed: April 3, 2020

Date of Patent: May 23, 2023

Assignee: Google LLC

Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
Multi-speaker neural text-to-speech

Patent number: 11651763

Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Type: Grant

Filed: November 2, 2020

Date of Patent: May 16, 2023

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
Predicting deep learning scaling

Patent number: 11593655

Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein are large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that power-law may be used to represent deep model relationships, such as error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.

Type: Grant

Filed: November 30, 2018

Date of Patent: February 28, 2023

Assignee: Baidu USA LLC

Inventors: Joel Hestness, Gregory Diamos, Hee Woo Jun, Sharan Narang, Newsha Ardalani, Md Mostofa Ali Patwary, Yanqi Zhou
Systems and methods for neural voice cloning with a few samples

Patent number: 11238843

Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.

Type: Grant

Filed: September 26, 2018

Date of Patent: February 1, 2022

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
COMPUTATIONAL GRAPH OPTIMIZATION

Publication number: 20210248445

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.

Type: Application

Filed: April 3, 2020

Publication date: August 12, 2021

Inventors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Lin-Kit Wong, Chao Ma, Qiumin Xu, Azalia Mirhoseini
MULTI-SPEAKER NEURAL TEXT-TO-SPEECH

Publication number: 20210049999

Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Type: Application

Filed: November 2, 2020

Publication date: February 18, 2021

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Kainan PENG, Wei PING, Jonathan RAIMAN, Yanqi ZHOU
Systems and methods for multi-speaker neural text-to-speech

Patent number: 10896669

Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Type: Grant

Filed: May 8, 2018

Date of Patent: January 19, 2021

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
PREDICTING DEEP LEARNING SCALING

Publication number: 20200175374

Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein are large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that power-law may be used to represent deep model relationships, such as error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.

Type: Application

Filed: November 30, 2018

Publication date: June 4, 2020

Applicant: Baidu USA LLC

Inventors: Joel HESTNESS, Gregory DIAMOS, Hee Woo JUN, Sharan NARANG, Newsha ARDALANI, Md Mostofa Ali PATWARY, Yanqi ZHOU
RESOURCE-EFFICIENT NEURAL ARCHITECTS

Publication number: 20190354837

Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy but lacked consideration of computational resource use. Presented herein are embodiments of a Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA embodiments use a policy network to process the network embeddings to generate new configurations. Example demonstrates of RENA embodiments on image recognition and keyword spotting (KWS) problems are also presented herein. RENA embodiments can find novel architectures that achieve high performance even with tight resource constraints. For the CIFAR10 dataset, the tested embodiment achieved 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size was less than 3M parameters.

Type: Application

Filed: March 8, 2019

Publication date: November 21, 2019

Applicant: Baidu USA LLC

Inventors: Yanqi ZHOU, Siavash EBRAHIMI, Sercan ARIK, Haonan YU, Hairong LIU, Gregory DIAMOS
SYSTEMS AND METHODS FOR NEURAL VOICE CLONING WITH A FEW SAMPLES

Publication number: 20190251952

Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.

Type: Application

Filed: September 26, 2018

Publication date: August 15, 2019

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Jitong CHEN, Kainan PENG, Wei PING, Yanqi ZHOU
SYSTEMS AND METHODS FOR MULTI-SPEAKER NEURAL TEXT-TO-SPEECH

Publication number: 20180336880

Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Type: Application

Filed: May 8, 2018

Publication date: November 22, 2018

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Kainan PENG, Wei PING, Jonathan RAIMAN, Yanqi ZHOU