Abstract: A computer-based system trains a neural network by solving a double layer optimization problem. The system includes an input interface to receive an input to the neural network and labels of the input to the neural network; a processor to solve a double layer optimization to produce parameters of the neural network; and an output interface to output the parameters of the neural network. The double layer optimization includes an optimization of a first layer subject to an optimization of a second layer. The optimization of the first layer minimizes a difference between an output of the neural network processing the input and the labels of the input, while the optimization of the second layer minimizes a distance between a non-negative output vector of each layer and a corresponding input vector to each layer. The input vector of a current layer is a linear transformation of the non-negative output vector of the previous layer.
Type:
Grant
Filed:
November 16, 2017
Date of Patent:
November 9, 2021
Assignee:
Mitsubishi Electric Research Laboratories, Inc.
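The structure the abstract describes can be sketched numerically: the inner ("second layer") problem, minimizing the distance between a non-negative activation and a linear transformation of the previous layer's output, has the non-negative projection (a ReLU) as its closed-form solution, while the outer ("first layer") problem fits the network output to the labels. The toy data, layer widths, and the finite-difference outer minimization below are all illustrative assumptions, not the patent's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 20 inputs with 4 features, one-hot labels over 2 classes.
X = rng.normal(size=(20, 4))
Y = np.eye(2)[rng.integers(0, 2, size=20)]

sizes = [4, 6, 2]  # assumed layer widths
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes, sizes[1:])]

def inner_layer(W, a_prev):
    # Inner ("second layer") problem: min_{a >= 0} ||a - W a_prev||^2.
    # Its closed-form solution is the non-negative projection, i.e. a ReLU.
    return np.maximum(0.0, a_prev @ W.T)

def forward(Ws, X):
    a = X
    for W in Ws:
        a = inner_layer(W, a)
    return a

def outer_loss(Ws, X, Y):
    # Outer ("first layer") problem: squared difference between the
    # network output and the labels.
    return np.mean((forward(Ws, X) - Y) ** 2)

loss0 = outer_loss(Ws, X, Y)

# Crude outer minimization by central-difference gradient descent (sketch only).
lr, eps = 0.05, 1e-5
for step in range(100):
    for W in Ws:
        g = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps
            up = outer_loss(Ws, X, Y)
            W[idx] = old - eps
            dn = outer_loss(Ws, X, Y)
            W[idx] = old
            g[idx] = (up - dn) / (2 * eps)
        W -= lr * g

loss1 = outer_loss(Ws, X, Y)
```

The point of the sketch is the nesting: each forward evaluation solves the per-layer inner problems exactly, and only the outer label-fitting objective is iterated.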
Abstract: Machine learning classification models which are robust against label noise are provided. Noise may be modelled explicitly as “label flips”, where incorrect binary labels are “flipped” relative to their ground truth value. Distributions of label flips may be modelled as prior and posterior distributions in a flexible architecture for machine learning systems. An arbitrary classification model may be provided within the system. The classification model is made more robust to label noise by operation of the prior and posterior distributions. Particular prior and approximating posterior distributions are disclosed.
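A minimal sketch of the label-flip idea: place a Bernoulli prior on whether each observed binary label was flipped, and fit the classifier by marginalizing over that latent flip, so the observed-label likelihood mixes the clean prediction with its complement. The logistic model, the flip rate `rho`, and the plain gradient-descent fit are all assumptions for illustration; they stand in for the "arbitrary classification model" and the particular distributions the patent discloses.

```python
import numpy as np

rng = np.random.default_rng(1)

rho = 0.2  # assumed prior probability that a label is flipped
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y_clean = (X @ w_true > 0).astype(float)
flips = rng.random(200) < rho
y_noisy = np.where(flips, 1 - y_clean, y_clean)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def noisy_nll(w, X, y, rho):
    # Marginalize over the latent flip: P(observed = 1) mixes the model's
    # clean prediction q with its complement 1 - q.
    q = sigmoid(X @ w)
    p1 = (1 - rho) * q + rho * (1 - q)
    p = np.where(y == 1, p1, 1 - p1)
    return -np.mean(np.log(p + 1e-12))

w = np.zeros(3)
nll0 = noisy_nll(w, X, y_noisy, rho)

# Fit by gradient descent on the flip-aware likelihood (sketch only).
lr = 0.3
for _ in range(800):
    q = sigmoid(X @ w)
    p1 = (1 - rho) * q + rho * (1 - q)
    sign = np.where(y_noisy == 1, 1.0, -1.0)
    p = np.where(y_noisy == 1, p1, 1 - p1)
    # Chain rule: d p1 / d q = 1 - 2*rho, d q / d z = q * (1 - q).
    grad = -np.mean(((sign * (1 - 2 * rho) * q * (1 - q)) / (p + 1e-12))[:, None] * X, axis=0)
    w -= lr * grad

nll1 = noisy_nll(w, X, y_noisy, rho)
acc = np.mean((sigmoid(X @ w) > 0.5) == (y_clean == 1))
```

Because the likelihood accounts for flips, the fitted model can track the clean decision boundary even though it only ever sees the noisy labels.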
Abstract: An online system uses multiple machine learning models to select content for providing to a user of the online system. Specifically, the online system trains a general model that takes a first set of features as input and outputs predictions at a general level. The online system further trains a residual model that takes a second set of features as input. The residual model predicts a residual (e.g., an error) of the predictions outputted by the general model. Therefore, the predicted residual from the residual model is combined with the prediction from the general model in order to correct for the over-generality of the general model. The online system may use the combined prediction to send content to users.
Type:
Grant
Filed:
September 29, 2017
Date of Patent:
August 31, 2021
Assignee:
Facebook, Inc.
Inventors:
Andrew Donald Yates, Gunjit Singh, Kurt Dodge Runke
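The general-plus-residual scheme can be sketched with two simple models: fit a general model on the first feature set, fit a residual model on the second feature set to predict the general model's error, and add the two predictions. The synthetic data and least-squares models below are illustrative assumptions standing in for the system's actual models and features.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed synthetic setup: a coarse feature x1 drives the general trend,
# while a finer feature x2 explains what the general model misses.
n = 500
x1 = rng.normal(size=(n, 1))   # first feature set (general model)
x2 = rng.normal(size=(n, 1))   # second feature set (residual model)
y = 3.0 * x1[:, 0] + 1.5 * x2[:, 0] + rng.normal(scale=0.1, size=n)

def fit_linear(F, t):
    # Least-squares fit with a bias column; an illustrative stand-in
    # for whatever model class the system actually trains.
    A = np.hstack([F, np.ones((len(F), 1))])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return lambda G: np.hstack([G, np.ones((len(G), 1))]) @ coef

general = fit_linear(x1, y)                 # general model on feature set 1
residual = fit_linear(x2, y - general(x1))  # residual model predicts the error
combined = general(x1) + residual(x2)       # corrected prediction

err_general = np.mean((y - general(x1)) ** 2)
err_combined = np.mean((y - combined) ** 2)
```

The residual model never sees the labels directly as its target, only the general model's error, so adding its output corrects the over-generality of the first prediction.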
Abstract: A method, computer readable medium, and system are disclosed for implementing a temporal ensembling model for training a deep neural network. The method for training the deep neural network includes the steps of receiving a set of training data comprising a plurality of input vectors for a deep neural network and training the deep neural network utilizing the set of training data by: analyzing the plurality of input vectors with the deep neural network to generate a plurality of prediction vectors and, for each input vector, computing a loss term associated with that input vector by combining a supervised component and an unsupervised component according to a weighting function and updating a target prediction vector associated with that input vector.
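The loss bookkeeping the abstract describes can be sketched as follows: keep an exponential moving average of each sample's prediction vector as its target, combine a supervised cross-entropy term (on the labeled samples) with an unsupervised mean-squared term against the targets using a ramp-up weighting function, and update the targets each epoch. The random stand-in predictions, momentum `alpha`, and ramp schedule are assumptions for illustration; a real run would use the network's actual softmax outputs.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy setup: 6 samples, 3 classes; only the first 2 are labeled.
n, k = 6, 3
labels = np.array([0, 2])   # labels for the first two samples
alpha = 0.6                 # EMA momentum for the target prediction vectors
Z = np.zeros((n, k))        # accumulated ensemble of predictions

def weight(epoch, ramp=10):
    # Unsupervised weighting function: ramps up over the first epochs.
    return min(1.0, epoch / ramp)

for epoch in range(1, 5):
    # Stand-in for the network's softmax outputs this epoch.
    preds = rng.dirichlet(np.ones(k), size=n)

    # Bias-corrected temporal-ensemble targets from previous epochs.
    targets = Z / (1 - alpha ** (epoch - 1)) if epoch > 1 else preds

    # Supervised component: cross-entropy on the labeled samples.
    sup = -np.mean(np.log(preds[np.arange(2), labels] + 1e-12))
    # Unsupervised component: MSE between predictions and targets.
    unsup = np.mean((preds - targets) ** 2)
    loss = sup + weight(epoch) * unsup

    # Update the accumulated target prediction vectors.
    Z = alpha * Z + (1 - alpha) * preds
```

The unsupervised term is what lets unlabeled samples contribute: they are penalized only for disagreeing with their own accumulated target predictions.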