Patents by Inventor Rafid Reza Mahmood

Rafid Reza Mahmood has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ESTIMATING OPTIMAL TRAINING DATA SET SIZE FOR MACHINE LEARNING MODEL SYSTEMS AND APPLICATIONS

Publication number: 20230385687

Abstract: Approaches for training data set size estimation for machine learning model systems and applications are described. Examples include a machine learning model training system that estimates target data requirements for training a machine learning model, given an approximate relationship between training data set size and model performance using one or more validation score estimation functions. To derive a validation score estimation function, a regression data set is generated from training data, and subsets of the regression data set are used to train the machine learning model. A validation score is computed for the subsets and used to compute regression function parameters to curve fit the selected regression function to the training data set. The validation score estimation function is then solved for and provides an output of an estimate of the number of additional training samples needed for the validation score estimation function to meet or exceed a target validation score.
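
The curve-fitting idea in this abstract can be illustrated with a minimal sketch. It is not the method claimed in the application. It assumes, purely for illustration, that validation error follows a power law, 1 - score = a * n^(-b), fits a and b by least squares in log-log space, and then solves for the data set size at which the fitted curve reaches the target score. The function names and the power-law form are hypothetical choices.

```python
import math


def fit_power_law(sizes, scores):
    """Fit 1 - score = a * n**(-b) by least squares on log-transformed data.

    Assumption: scores lie strictly below 1.0, so the error is positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - s) for s in scores]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(n), so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


def additional_samples_needed(sizes, scores, target_score):
    """Estimate how many samples beyond the largest subset reach the target."""
    a, b = fit_power_law(sizes, scores)
    # Solve a * n**(-b) = 1 - target_score for n.
    required = (a / (1.0 - target_score)) ** (1.0 / b)
    return max(0, math.ceil(required) - max(sizes))
```

Here `sizes` would hold the sizes of the training subsets and `scores` the validation score measured after training on each one. The quality of the estimate depends entirely on how well the assumed functional form matches the real learning curve.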

Type: Application

Filed: May 31, 2022

Publication date: November 30, 2023

Inventors: Rafid Reza Mahmood, James Robert Lucas, David Jesus Acuna Marrero, Daiqing Li, Jonah Philion, Jose Manuel Alvarez Lopez, Zhiding Yu, Sanja Fidler, Marc Law

ESTIMATING OPTIMAL TRAINING DATA SET SIZES FOR MACHINE LEARNING MODEL SYSTEMS AND APPLICATIONS

Publication number: 20230376849

Abstract: In various examples, optimal training data set sizes are estimated for machine learning model systems and applications. Systems and methods are disclosed that estimate an amount of data to include in a training data set, where the training data set is then used to train one or more machine learning models to reach a target validation performance. To estimate the amount of training data, subsets of an initial training data set may be used to train the machine learning model(s) in order to determine estimates for the minimum amount of training data needed to train the machine learning model(s) to reach the target validation performance. The estimates may then be used to generate one or more functions, such as a cumulative density function and/or a probability density function, wherein the function(s) is then used to estimate the amount of training data needed to train the machine learning model(s).
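
One way to picture the step from per-subset estimates to a single recommendation is the sketch below. It is an illustration under stated assumptions, not the procedure claimed in the application: it treats the per-subset estimates as samples, builds an empirical cumulative distribution from them, and returns the smallest estimate whose cumulative probability reaches a chosen confidence level. The function names and the confidence-threshold rule are hypothetical.

```python
def empirical_cdf(estimates):
    """Return a function giving the fraction of estimates at or below x."""
    ordered = sorted(estimates)
    count = len(ordered)

    def cdf(x):
        return sum(1 for e in ordered if e <= x) / count

    return cdf


def size_at_confidence(estimates, confidence):
    """Smallest estimate whose empirical CDF value meets the confidence level.

    Assumption: `estimates` holds one minimum-data estimate per training
    subset, and `confidence` is a fraction between 0 and 1.
    """
    cdf = empirical_cdf(estimates)
    for candidate in sorted(estimates):
        if cdf(candidate) >= confidence:
            return candidate
    # Only reached if confidence exceeds 1.0; fall back to the largest estimate.
    return max(estimates)
```

Choosing a higher confidence level pushes the recommendation toward the larger estimates, which trades extra data collection for a lower risk of falling short of the target validation performance.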

Type: Application

Filed: May 16, 2023

Publication date: November 23, 2023

Inventors: Rafid Reza Mahmood, Marc Law, James Robert Lucas, Zhiding Yu, Jose Manuel Alvarez Lopez, Sanja Fidler

OPTIMIZED ACTIVE LEARNING USING INTEGER PROGRAMMING

Publication number: 20230244985

Abstract: In various examples, a representative subset of data points is queried or selected using integer programming to minimize the Wasserstein distance between the selected data points and the data set from which they were selected. A Generalized Benders Decomposition (GBD) may be used to decompose and iteratively solve the minimization problem, providing a globally optimal solution (an identified subset of data points that match the distribution of their data set) within a threshold tolerance. Data selection may be accelerated by applying one or more constraints while iterating, such as optimality cuts that leverage properties of the Wasserstein distance and/or pruning constraints that reduce the search space of candidate data points. In an active learning implementation, a representative subset of unlabeled data points may be selected using GBD, labeled, and used to train machine learning model(s) over one or more cycles of active learning.
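
The selection objective in this abstract can be made concrete with a toy sketch. It does not implement integer programming or Generalized Benders Decomposition. Instead, it swaps in an exhaustive search over all subsets of a fixed size, which is only feasible for very small inputs, and it assumes one-dimensional data so that the Wasserstein-1 distance reduces to the area between two empirical CDFs. The function names are hypothetical.

```python
from itertools import combinations


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1-D samples with uniform weights.

    Computed as the integral of |F_a(x) - F_b(x)| over the real line,
    where F_a and F_b are the empirical CDFs of the two samples.
    """
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = sum(1 for v in a if v <= left) / len(a)
        cdf_b = sum(1 for v in b if v <= left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def best_subset(data, k):
    """Brute-force stand-in for the optimization: try every subset of size k."""
    return min(combinations(data, k), key=lambda s: wasserstein_1d(s, data))
```

For data with two well-separated clusters, such as `[0.0, 0.1, 10.0, 10.1]` with `k = 2`, the search picks one point from each cluster, because any subset drawn from a single cluster leaves a large gap between the two CDFs. The number of subsets grows combinatorially with the data set size, which is the cost that a decomposition approach with pruning constraints is meant to avoid.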

Type: Application

Filed: February 2, 2022

Publication date: August 3, 2023

Inventors: Rafid Reza Mahmood, Sanja Fidler, Marc Law
