Patents by Inventor Quoc V. Le
Quoc V. Le has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220245928
Abstract: Systems and methods of the present disclosure can include a computer-implemented method for efficient machine-learned model training. The method can include obtaining a plurality of training samples for a machine-learned model. The method can include, for one or more first training iterations, training, based at least in part on a first regularization magnitude configured to control a relative effect of one or more regularization techniques, the machine-learned model using one or more respective first training samples of the plurality of training samples. The method can include, for one or more second training iterations, training, based at least in part on a second regularization magnitude greater than the first regularization magnitude, the machine-learned model using one or more respective second training samples of the plurality of training samples.
Type: Application
Filed: December 29, 2021
Publication date: August 4, 2022
Inventors: Mingxing Tan, Quoc V. Le
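A minimal Python sketch of the two-phase schedule described in this abstract, assuming the regularization magnitude is a dropout rate and the phase boundary falls halfway through training; both choices are illustrative, not values from the filing.

```python
# A minimal sketch of the claimed schedule, assuming the regularization
# magnitude is a dropout rate and the phase boundary is halfway through
# training; neither value comes from the filing.

def regularization_magnitude(step, total_steps,
                             first_magnitude=0.1,
                             second_magnitude=0.5,
                             phase_boundary=0.5):
    """Return the regularization magnitude to apply at a training step."""
    if step < phase_boundary * total_steps:
        return first_magnitude   # first training iterations: weaker regularization
    return second_magnitude      # second training iterations: stronger regularization

total_steps = 1000
for step in range(total_steps):
    magnitude = regularization_magnitude(step, total_steps)
    # train_step(model, batch, dropout_rate=magnitude)  # hypothetical training call
```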
-
Publication number: 20220230048
Abstract: Methods, systems, and apparatus, including computer-readable media, for scaling neural network architectures on hardware accelerators. A method includes receiving training data and information specifying target computing resources, and performing, using the training data, a neural architecture search over a search space to identify an architecture for a base neural network. A plurality of scaling parameter values for scaling the base neural network can be identified, which can include repeatedly selecting a plurality of candidate scaling parameter values, and determining a measure of performance for the base neural network scaled according to the plurality of candidate scaling parameter values, in accordance with a plurality of second objectives including a latency objective. An architecture for a scaled neural network can be determined using the architecture of the base neural network scaled according to the plurality of scaling parameter values.
Type: Application
Filed: February 12, 2021
Publication date: July 21, 2022
Inventors: Andrew Li, Sheng Li, Mingxing Tan, Ruoming Pang, Liqun Cheng, Quoc V. Le, Norman Paul Jouppi
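A rough sketch of the scaling search, under common assumptions: depth/width/resolution multipliers as the scaling parameters, random sampling in place of the unspecified selection procedure, and a hypothetical `measure` routine that trains and profiles a scaled network.

```python
import random

# A rough sketch of the scaling search, assuming depth/width/resolution
# multipliers as the scaling parameters and random sampling as the selection
# procedure; `measure` is a hypothetical train-and-profile routine.

def search_scaling(measure, iterations=50, latency_budget_ms=10.0):
    best_candidate, best_accuracy = None, float("-inf")
    for _ in range(iterations):
        candidate = {
            "depth": random.uniform(1.0, 2.0),
            "width": random.uniform(1.0, 2.0),
            "resolution": random.uniform(1.0, 1.5),
        }
        accuracy, latency_ms = measure(candidate)  # second objectives: accuracy and latency
        if latency_ms <= latency_budget_ms and accuracy > best_accuracy:
            best_candidate, best_accuracy = candidate, accuracy
    return best_candidate
```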
-
Publication number: 20220215209
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model. One of the methods includes receiving training data comprising a plurality of unlabeled training inputs and a plurality of labeled training inputs; generating augmented training data, comprising generating, for each of the plurality of unlabeled training inputs, a respective augmented training input by applying a data augmentation technique to the unlabeled training input; and training the machine learning model on the augmented training data. In particular, but not exclusively, the model may be trained for perceptual tasks (e.g., tasks relating to vision or speech).
Type: Application
Filed: April 24, 2020
Publication date: July 7, 2022
Inventors: Thang Minh Luong, Quoc V. Le, Qizhe Xie, Zihang Dai
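One common way to train on augmented unlabeled inputs is a consistency objective: penalize disagreement between predictions on an unlabeled input and on its augmented counterpart. The KL-based form below is an assumption; the abstract itself only specifies augmenting unlabeled inputs and training on them.

```python
import numpy as np

# Hedged sketch of a consistency objective over augmented unlabeled inputs.
# The KL form is one common instantiation, not necessarily the one claimed.

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def consistency_loss(predict_fn, unlabeled_x, augment_fn):
    p_clean = predict_fn(unlabeled_x)            # prediction on the original input
    p_aug = predict_fn(augment_fn(unlabeled_x))  # prediction on the augmented input
    return kl_divergence(p_clean, p_aug)
```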
-
Publication number: 20220198145
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating author vectors. One of the methods includes obtaining a set of sequences of words, the set of sequences of words comprising a plurality of first sequences of words and, for each first sequence of words, a respective second sequence of words that follows the first sequence of words, wherein each first sequence of words and each second sequence of words has been classified as being authored by a first author; and training a neural network system on the first sequences and the second sequences to determine an author vector for the first author, wherein the author vector characterizes the first author.
Type: Application
Filed: March 14, 2022
Publication date: June 23, 2022
Applicant: GOOGLE LLC
Inventors: Quoc V. Le, Brian Patrick Strope
-
Publication number: 20220188636
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
Type: Application
Filed: December 14, 2021
Publication date: June 16, 2022
Inventors: Hieu Hy Pham, Zihang Dai, Qizhe Xie, Quoc V. Le
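A skeleton of the joint teacher/student loop, assuming the teacher is updated from the student's loss on labeled data (the feedback signal used in the Meta Pseudo Labels paper); the method names on the two model objects are placeholders.

```python
# Skeleton of the joint loop. `teacher` and `student` are assumed to expose
# predict/train_step/loss methods; these names are placeholders.

def train_meta_pseudo_labels(teacher, student,
                             labeled_batches, unlabeled_batches, steps):
    for _ in range(steps):
        x_unlabeled = next(unlabeled_batches)
        pseudo_labels = teacher.predict(x_unlabeled)    # teacher generates pseudo-labels
        student.train_step(x_unlabeled, pseudo_labels)  # student learns from them
        x_labeled, y_labeled = next(labeled_batches)
        feedback = student.loss(x_labeled, y_labeled)   # how well the student now performs
        teacher.train_step(x_unlabeled, feedback)       # teacher is trained jointly
    return teacher, student
```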
-
Publication number: 20220129740
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using neural networks that include one or more conditional convolutional layers. A conditional convolutional layer has a plurality of kernels and determines a respective input-dependent weight for each of the plurality of kernels and generates an input-dependent kernel by computing a weighted sum of the plurality of kernels in accordance with the respective input-dependent weights.
Type: Application
Filed: January 23, 2020
Publication date: April 28, 2022
Inventors: Brandon Chauloon Yang, Quoc V. Le, Jiquan Ngiam, Gabriel Mintzer Bender
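A NumPy sketch of the weighted-sum step the abstract describes: per-input routing weights mix a bank of kernels into a single input-dependent kernel. The global-average-pool routing and sigmoid weighting are assumptions drawn from the usual conditional-convolution setup; shapes are illustrative.

```python
import numpy as np

# Sketch of an input-dependent kernel: a weighted sum of a kernel bank,
# with weights computed from the input. Pooling + sigmoid routing is an
# assumption about how the input-dependent weights are derived.

def conditional_kernel(x, kernels, routing_w):
    """Combine a bank of kernels into one input-dependent kernel.

    x:         input feature map, shape (H, W, C)
    kernels:   bank of K kernels, shape (K, kh, kw, C, C_out)
    routing_w: routing matrix, shape (C, K)
    """
    pooled = x.mean(axis=(0, 1))            # global average pool -> (C,)
    logits = pooled @ routing_w             # (K,) input-dependent scores
    alphas = 1.0 / (1.0 + np.exp(-logits))  # per-kernel weights
    # weighted sum of the kernel bank, per the abstract
    return np.einsum("k,kxyio->xyio", alphas, kernels)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
kernels = rng.normal(size=(4, 3, 3, 16, 32))
routing_w = rng.normal(size=(16, 4))
print(conditional_kernel(x, kernels, routing_w).shape)  # (3, 3, 16, 32)
```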
-
Publication number: 20220114400
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model. One of the methods includes obtaining a training data set for training a machine learning model, the training data set comprising a plurality of training inputs; determining a plurality of data augmentation policies, wherein each data augmentation policy defines a procedure for processing a training input to generate a transformed training input; for each data augmentation policy, training the machine learning model using the data augmentation policy; determining, for each data augmentation policy, a quality measure of the machine learning model that has been trained using the data augmentation policy; and selecting a final data augmentation policy based on the quality measures of the machine learning models.
Type: Application
Filed: December 20, 2021
Publication date: April 14, 2022
Inventors: Jonathon Shlens, Quoc V. Le, Ekin Dogus Cubuk, Barret Zoph
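A minimal sketch of the selection loop: train one model per candidate policy, score each trained model, and keep the best-scoring policy. `train_model` and `evaluate` are placeholders for a real pipeline.

```python
# Sketch of policy selection by quality measure; the training and
# evaluation callables are placeholders.

def select_augmentation_policy(policies, training_inputs, train_model, evaluate):
    best_policy, best_quality = None, float("-inf")
    for policy in policies:
        transformed = [policy(x) for x in training_inputs]  # apply the policy's procedure
        model = train_model(transformed)                    # train with this policy
        quality = evaluate(model)                           # quality measure for the policy
        if quality > best_quality:
            best_policy, best_quality = policy, quality
    return best_policy                                      # final data augmentation policy
```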
-
Publication number: 20220108058
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a computer chip placement. One of the methods includes obtaining netlist data for a computer chip; and generating a computer chip placement, comprising placing a respective macro node at each time step in a sequence comprising a plurality of time steps, the placing comprising, for each time step: generating an input representation for the time step; processing the input representation using a node placement neural network having a plurality of network parameters, wherein the node placement neural network is configured to process the input representation in accordance with current values of the network parameters to generate a score distribution over a plurality of positions on the surface of the computer chip; and assigning the macro node to be placed at the time step to a position from the plurality of positions using the score distribution.
Type: Application
Filed: December 17, 2021
Publication date: April 7, 2022
Inventors: Anna Darling Goldie, Azalia Mirhoseini, Ebrahim Songhori, Wenjie Jiang, Shen Wang, Roger David Carpenter, Young-Joon Lee, Mustafa Nazim Yazgan, Chian-min Richard Ho, Quoc V. Le, James Laudon, Jeffrey Adgate Dean, Kavya Srinivasa Setty, Omkar Pathak
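A sketch of the sequential placement loop, assuming a grid of hashable candidate positions and a softmax over the network's scores; `policy_fn` stands in for the node placement neural network, and masking already-occupied cells is an assumption about how legality is enforced, not a claim from the filing.

```python
import numpy as np

# Sketch: one macro node is placed per time step by sampling a position
# from the score distribution. `policy_fn` is a placeholder for the
# node placement neural network; occupancy masking is an assumption.

def place_macros(macro_nodes, grid_cells, policy_fn, rng):
    placement, occupied = {}, set()
    for macro in macro_nodes:                             # one macro per time step
        scores = np.asarray(policy_fn(placement, macro))  # one score per position
        mask = np.array([cell in occupied for cell in grid_cells])
        scores = np.where(mask, -np.inf, scores)          # forbid occupied positions
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        index = rng.choice(len(grid_cells), p=probs)      # sample from the distribution
        placement[macro] = grid_cells[index]
        occupied.add(grid_cells[index])
    return placement
```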
-
Publication number: 20220108204
Abstract: A computer-implemented method of generating scale-permuted models can generate models having improved accuracy and reduced evaluation computational requirements. The method can include defining, by a computing system including one or more computing devices, a search space including a plurality of candidate permutations of a plurality of candidate feature blocks, each of the plurality of candidate feature blocks having a respective scale. The method can include performing, by the computing system, a plurality of search iterations by a search algorithm to select a scale-permuted model from the search space, the scale-permuted model based at least in part on a candidate permutation of the plurality of candidate permutations.
Type: Application
Filed: October 1, 2020
Publication date: April 7, 2022
Inventors: Xianzhi Du, Yin Cui, Tsung-Yi Lin, Quoc V. Le, Pengchong Jin, Mingxing Tan, Golnaz Ghiasi, Xiaodan Song
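A sketch of the permutation search, with random search standing in for the unspecified search algorithm; `evaluate_permutation` is a placeholder that would train or estimate the quality of a candidate scale-permuted model.

```python
import random

# Random search over permutations of scaled feature blocks; the evaluation
# callable is a placeholder for real model training or quality estimation.

def search_scale_permutation(candidate_blocks, evaluate_permutation, iterations=100):
    best_permutation, best_score = None, float("-inf")
    for _ in range(iterations):
        permutation = random.sample(candidate_blocks, len(candidate_blocks))
        score = evaluate_permutation(permutation)  # e.g. accuracy under a compute budget
        if score > best_score:
            best_permutation, best_score = permutation, score
    return best_permutation
```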
-
Publication number: 20220101082
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes obtaining an input sequence, the input sequence comprising a plurality of inputs arranged according to an input order; processing the input sequence using a first long short term memory (LSTM) neural network to convert the input sequence into an alternative representation for the input sequence; and processing the alternative representation for the input sequence using a second LSTM neural network to generate a target sequence for the input sequence, the target sequence comprising a plurality of outputs arranged according to an output order.
Type: Application
Filed: December 10, 2021
Publication date: March 31, 2022
Inventors: Oriol Vinyals, Quoc V. Le, Ilya Sutskever
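A skeleton of the two-network flow: the first LSTM folds the input sequence into a final state (the alternative representation), and the second LSTM unrolls that state into the target sequence. Both callables are placeholders for real recurrent networks, and the start/end tokens are assumptions.

```python
# Skeleton of encoder-decoder sequence generation; the two LSTM callables
# and the start/end tokens are placeholders, not the claimed implementation.

START_TOKEN, END_TOKEN = "<s>", "</s>"

def seq2seq(input_sequence, encoder_lstm, decoder_lstm, max_output_length):
    state = None
    for x in input_sequence:             # consume inputs in input order
        state = encoder_lstm(x, state)   # final state = alternative representation
    outputs, y = [], START_TOKEN
    for _ in range(max_output_length):   # emit outputs in output order
        y, state = decoder_lstm(y, state)
        if y == END_TOKEN:
            break
        outputs.append(y)
    return outputs
```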
-
Publication number: 20220101090
Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures can be run relatively faster and with relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.
Type: Application
Filed: October 6, 2021
Publication date: March 31, 2022
Inventors: Mingxing Tan, Quoc V. Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
-
Publication number: 20220092387
Abstract: A computing system for producing an architecture of a pyramid layer is disclosed. The computing system can include a controller model configured to generate new architectures for a pyramid layer that receives a plurality of input feature representations output by a backbone model and, in response, outputs a plurality of output feature representations. The plurality of input feature representations can have a plurality of different input resolutions, and the plurality of output feature representations can have a plurality of different output resolutions. The computing system can be configured to perform a plurality of iterations. For each iteration, the computing system can receive a new pyramid layer architecture as an output of the controller model and evaluate one or more performance characteristics of a machine-learned pyramidal feature model that includes the backbone model and one or more pyramid layers that have the new pyramid layer architecture.
Type: Application
Filed: February 25, 2020
Publication date: March 24, 2022
Inventors: Quoc V. Le, Golnaz Ghiasi, Tsung-Yi Lin
-
Publication number: 20220083840
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing a self-training technique for generating neural network (NN) models. A first model is generated in response to training a first NN using labeled data. A respective pseudo label is generated for each item of unlabeled data when items of unlabeled data are processed using the first model. A second NN is used to process each item of a combined dataset to train the second NN. The combined dataset includes items of labeled data and a corresponding item for each respective pseudo label. Attributes of items in the combined dataset are modified to inject noise into the combined dataset when the second NN is trained. A second model is generated after the second NN is trained by processing items in the combined dataset, including processing items that represent the noise injected into the combined dataset.
Type: Application
Filed: September 11, 2020
Publication date: March 17, 2022
Inventors: Thang Minh Luong, Quoc V. Le, Qizhe Xie
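A sketch of the self-training recipe in this abstract: train a first model on labeled data, pseudo-label the unlabeled data with it, then train a second model on the combined, noised dataset. `add_noise` stands in for the unspecified attribute modifications (data augmentation is one common choice), and `train_fn` is a placeholder returning a model with a `predict` method.

```python
# Sketch of self-training with noise injection; training and noising
# callables are placeholders for a real pipeline.

def self_train(train_fn, labeled, unlabeled, add_noise):
    first_model = train_fn(labeled)                            # train first NN on labeled data
    pseudo = [(x, first_model.predict(x)) for x in unlabeled]  # pseudo-label unlabeled items
    combined = labeled + pseudo                                # combined dataset
    noised = [(add_noise(x), y) for x, y in combined]          # inject noise via modified attributes
    second_model = train_fn(noised)                            # train second NN on combined data
    return second_model
```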
-
Patent number: 11275895
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating author vectors. One of the methods includes obtaining a set of sequences of words, the set of sequences of words comprising a plurality of first sequences of words and, for each first sequence of words, a respective second sequence of words that follows the first sequence of words, wherein each first sequence of words and each second sequence of words has been classified as being authored by a first author; and training a neural network system on the first sequences and the second sequences to determine an author vector for the first author, wherein the author vector characterizes the first author.
Type: Grant
Filed: March 19, 2020
Date of Patent: March 15, 2022
Assignee: GOOGLE LLC
Inventors: Brian Patrick Strope, Quoc V. Le
-
Publication number: 20220067304
Abstract: Systems and methods are provided for training and using energy-based language models such as cloze language models. In particular, one aspect of the present disclosure is directed to an energy-based cloze language model for representation learning over text. In some instances, the models provided herein can be referred to as the "Electric" model. Similar to the BERT model, example models proposed herein can be a conditional generative model of tokens given their contexts. However, example models proposed herein do not mask text or output a full distribution over tokens that could occur in a context. Instead, the example proposed models assign a scalar energy score to each input token. Another aspect of the present disclosure provides techniques to train the proposed models to assign low energies to data tokens and high energies to other ones using an algorithm based on noise-contrastive estimation.
Type: Application
Filed: August 27, 2021
Publication date: March 3, 2022
Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
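A simplified sketch of a noise-contrastive objective over scalar token energies: training pushes data-token energies down and noise-token energies up. The logistic form below is the standard binary NCE classifier with the noise-distribution correction term omitted for brevity; the energy function is a stub.

```python
import numpy as np

# Simplified NCE over scalar energies: real tokens should get low energy,
# noise tokens high energy. P(real) = sigmoid(-E); the noise-ratio term of
# full NCE is omitted here for brevity.

def nce_loss(energy_fn, data_tokens, noise_tokens):
    """Binary NCE: classify real vs. noise tokens from their energies."""
    losses = []
    for tok in data_tokens:   # want low energy -> high P(real)
        losses.append(np.log1p(np.exp(energy_fn(tok))))   # -log sigmoid(-E)
    for tok in noise_tokens:  # want high energy -> low P(real)
        losses.append(np.log1p(np.exp(-energy_fn(tok))))  # -log sigmoid(E)
    return float(np.mean(losses))
```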
-
Publication number: 20220043981
Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.
Type: Application
Filed: August 6, 2020
Publication date: February 10, 2022
Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
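A toy illustration of a span-selection operator of the kind the abstract mentions: a generated program picks a span out of the passage text as the answer. The one-operator "program" below is purely illustrative, not NeRd's actual operator set.

```python
# Toy span-selection operator; the program format is illustrative only.

def span(tokens, start, end):
    """Select tokens[start:end] from the passage."""
    return tokens[start:end]

passage = "The bridge was completed in 1937 after four years of work".split()
program = [("span", 5, 6)]  # a one-step generated program
for op, start, end in program:
    if op == "span":
        answer = " ".join(span(passage, start, end))
print(answer)  # -> 1937
```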
-
Publication number: 20220028375
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
Type: Application
Filed: October 7, 2021
Publication date: January 27, 2022
Applicant: Google LLC
Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Noam M. Shazeer
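A sketch of the decoding step: at each output position the decoder yields a score per substring in a fixed vocabulary, and the transcription is built from those scores. Greedy selection is used here for brevity (beam search is the usual refinement); `score_fn` stands in for the attention-based RNN and is assumed to return a dict mapping each substring to its score.

```python
# Greedy decoding over per-position substring scores; `score_fn` and the
# end-of-sequence marker are placeholders.

def greedy_transcribe(score_fn, encoded_audio, substrings, max_length=100):
    transcript = []
    for _ in range(max_length):                      # one output position at a time
        scores = score_fn(encoded_audio, transcript)
        best = max(substrings, key=lambda s: scores[s])
        if best == "</s>":                           # assumed end-of-sequence marker
            break
        transcript.append(best)
    return "".join(transcript)
```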
-
Publication number: 20220019869
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
Type: Application
Filed: September 30, 2020
Publication date: January 20, 2022
Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
-
Publication number: 20220012537
Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
Type: Application
Filed: September 28, 2021
Publication date: January 13, 2022
Inventors: Daniel Sung-Joon Park, Quoc V. Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu
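A NumPy sketch of augmentation applied directly to a spectrogram image rather than the raw audio, as the abstract describes. Time and frequency masking are two operations of this kind (they appear in the related SpecAugment work); the mask sizes here are arbitrary.

```python
import numpy as np

# Mask a random time span and a random frequency band directly in the
# spectrogram image; mask widths are arbitrary illustrative values.

def mask_spectrogram(spec, rng, max_time=10, max_freq=8):
    """Zero out a random time span and a random frequency band."""
    spec = spec.copy()
    n_time, n_freq = spec.shape
    t0 = rng.integers(0, n_time - max_time)
    f0 = rng.integers(0, n_freq - max_freq)
    spec[t0:t0 + rng.integers(1, max_time + 1), :] = 0.0   # time mask
    spec[:, f0:f0 + rng.integers(1, max_freq + 1)] = 0.0   # frequency mask
    return spec

rng = np.random.default_rng(0)
augmented = mask_spectrogram(rng.normal(size=(100, 80)), rng)  # (time, freq) bins
```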
-
Patent number: 11222252
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes obtaining an input sequence, the input sequence comprising a plurality of inputs arranged according to an input order; processing the input sequence using a first long short term memory (LSTM) neural network to convert the input sequence into an alternative representation for the input sequence; and processing the alternative representation for the input sequence using a second LSTM neural network to generate a target sequence for the input sequence, the target sequence comprising a plurality of outputs arranged according to an output order.
Type: Grant
Filed: December 6, 2018
Date of Patent: January 11, 2022
Assignee: Google LLC
Inventors: Oriol Vinyals, Quoc V. Le, Ilya Sutskever