Patents by Inventor Michael Langford
Michael Langford has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12380122
Abstract: Methods and systems are described herein for facilitating generation of synthetic datasets having a change point. The system may receive a command to generate a synthetic time series dataset. The system may generate data points for components of the synthetic dataset, the components including a seasonality function, a trend function, and a noise function. The system may modify the trend function to a different trend function by modifying a level or a slope of the trend function. The system may generate a change point by replacing a subset of consecutive data points generated using the trend function with consecutive data points generated using the different trend function. The system may then generate the synthetic time series dataset having a change point by combining the seasonality data points, the trend data points, and the noise data points into corresponding time slots of the synthetic time series dataset.
Type: Grant
Filed: November 22, 2023
Date of Patent: August 5, 2025
Assignee: Capital One Services, LLC
Inventors: Justin Essert, Zhengqing Liu, Vannia Gonzalez Macias, Pratik Gandhi, Michael Langford
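A minimal illustrative sketch of the kind of generation the abstract describes, not the claimed implementation; the function name, sine-based seasonality, linear trends, and parameter values below are assumptions:

```python
import numpy as np

def make_change_point_series(n=365, change_at=200, seed=0):
    """Combine seasonality, trend, and noise components; switch to a
    different trend (new level and slope) after the change point."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)

    seasonality = 5.0 * np.sin(2 * np.pi * t / 30)   # seasonality function
    noise = rng.normal(0.0, 1.0, size=n)             # noise function

    trend = 0.05 * t + 10.0                          # original trend (slope, level)
    new_trend = 0.20 * t + 25.0                      # modified slope and level

    # Replace the trend data points after the change point with points
    # generated by the different trend function.
    trend = np.where(t < change_at, trend, new_trend)

    # Combine the components into corresponding time slots.
    return seasonality + trend + noise

series = make_change_point_series()
print(series[:5], series[-5:])
```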
-
Publication number: 20250165485
Abstract: Methods and systems are described herein for facilitating generation of synthetic datasets having a change point. The system may receive a command to generate a synthetic time series dataset. The system may generate data points for components of the synthetic dataset, the components including a seasonality function, a trend function, and a noise function. The system may modify the trend function to a different trend function by modifying a level or a slope of the trend function. The system may generate a change point by replacing a subset of consecutive data points generated using the trend function with consecutive data points generated using the different trend function. The system may then generate the synthetic time series dataset having a change point by combining the seasonality data points, the trend data points, and the noise data points into corresponding time slots of the synthetic time series dataset.
Type: Application
Filed: November 22, 2023
Publication date: May 22, 2025
Applicant: Capital One Services, LLC
Inventors: Justin ESSERT, Zhengqing LIU, Vannia GONZALEZ MACIAS, Pratik GANDHI, Michael LANGFORD
-
Publication number: 20250156730
Abstract: Systems and methods for minimizing dimensionality of a high-dimensionality dataset during feature engineering. The system achieves this by using a tabular neural network to extract non-linear transformations of features without dramatically increasing the dimensionality of the original dataset. The system receives an original dataset for classification and a defined number of final features (e.g., dimensionality) that result from the synthetic feature creation and the neural network embedding process. Once an architecture of a model is determined, a model is fit on a synthetic feature set (e.g., a second dataset comprising synthetic features) with a given classification as a target.
Type: Application
Filed: November 15, 2023
Publication date: May 15, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
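A hedged sketch of one way a tabular network can expose a fixed number of final features, using PyTorch; the layer sizes, bottleneck width, and training loop below are illustrative assumptions rather than the published architecture:

```python
import torch
import torch.nn as nn

class TabularEmbedder(nn.Module):
    """Small tabular network whose bottleneck layer has the desired final
    dimensionality; its activations become the reduced, non-linear features."""
    def __init__(self, n_inputs, n_final_features, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, 64), nn.ReLU(),
            nn.Linear(64, n_final_features), nn.ReLU(),   # bottleneck = final features
        )
        self.head = nn.Linear(n_final_features, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.head(z), z

# Fit on a (synthetic) feature set with the classification label as target.
X = torch.randn(512, 40)                       # high-dimensional input features
y = torch.randint(0, 2, (512,))
model = TabularEmbedder(n_inputs=40, n_final_features=8, n_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):
    opt.zero_grad()
    logits, _ = model(X)
    loss_fn(logits, y).backward()
    opt.step()

_, reduced = model(X)                           # low-dimensional embedding of the dataset
print(reduced.shape)                            # (512, 8)
```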
-
Publication number: 20250156770
Abstract: Systems and methods for novel uses and/or improvements to artificial intelligence applications, particularly in the context of practical applications featuring less complex model architectures. As one example, systems and methods described herein may achieve the technical benefits of a more complex model architecture through an ensemble of less complex models while reducing the overall training burden (e.g., in terms of computing resources, training time, and/or technical feasibility).
Type: Application
Filed: November 15, 2023
Publication date: May 15, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
-
Publication number: 20250139456
Abstract: Methods and systems are described herein for minimizing development time in artificial intelligence models by automating model selection based on dataset fittings of time-series data prior to hyperparameter optimization. The system may select a statistical profile type to identify in a first dataset. The system may retrieve a statistical model corresponding to the statistical profile type. The system may select, based on a first statistical profile, a first untrained model from a first plurality of untrained models for training, wherein the first plurality of untrained models comprises respective algorithms for time-series forecasting and wherein each of the first plurality of untrained models comprises default hyperparameter tuning. The system may, based on selecting the first untrained model, tune a first hyperparameter of the first untrained model using the first dataset.
Type: Application
Filed: October 31, 2023
Publication date: May 1, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
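A minimal sketch of profiling a time series and picking an untrained model family before any tuning, as this abstract outlines; the specific statistics, thresholds, and model-family names are assumptions made for illustration:

```python
import numpy as np

def statistical_profile(series, season_lag=12):
    """Compute a small statistical profile of a time series."""
    x = np.asarray(series, dtype=float)
    trend_strength = abs(np.corrcoef(np.arange(len(x)), x)[0, 1])
    centered = x - x.mean()
    acf = np.correlate(centered, centered, mode="full")[len(x) - 1:]
    acf = acf / acf[0]                                   # normalized autocorrelation
    seasonal_strength = abs(acf[season_lag]) if len(acf) > season_lag else 0.0
    volatility = np.std(np.diff(x)) / (abs(x.mean()) + 1e-9)
    return {"trend": trend_strength, "seasonal": seasonal_strength, "volatility": volatility}

def select_untrained_model(profile):
    """Map the profile to an untrained model family (with default
    hyperparameters) before any hyperparameter optimization is run."""
    if profile["seasonal"] > 0.5:
        return "seasonal_forecaster_defaults"
    if profile["trend"] > 0.7:
        return "trend_forecaster_defaults"
    return "baseline_forecaster_defaults"

t = np.arange(120)
series = 0.3 * t + 4 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(1).normal(0, 1, 120)
profile = statistical_profile(series)
print(profile, "->", select_untrained_model(profile))
```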
-
Publication number: 20250139455
Abstract: Systems and methods for minimizing development time in artificial intelligence models by automating model selection based on dataset fittings of time-series data prior to hyperparameter optimization. The systems and methods use a scoring policy based on a plurality of labeled datasets that score one or more results contained within the aggregate statistical profile. The system may dynamically identify particular criteria in statistical data that indicate the effectiveness of a given model on a given dataset. These criteria (e.g., the scoring policy) may then be updated over time as new datasets, statistical analyses, and/or aggregated statistical profiles are developed, without affecting the underlying models and/or datasets.
Type: Application
Filed: October 31, 2023
Publication date: May 1, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
-
Publication number: 20250139440
Abstract: Methods and systems are described herein for minimizing development time in artificial intelligence models by automating model selection based on dataset fittings of time-series data prior to hyperparameter optimization. For example, the system may apply a profiling model using a time-series embedding of the dataset combined with the aggregate statistical profile. In either case, the profiling model may be trained on the scoring policy and/or a time-series embedding of the dataset combined with the aggregate statistical profile to determine a likelihood of the effectiveness of a given model on the given dataset and/or likely hyperparameters for the given model.
Type: Application
Filed: October 31, 2023
Publication date: May 1, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
-
Publication number: 20250139503
Abstract: Methods and systems are described herein for minimizing development time in artificial intelligence models by automating model selection based on dataset fittings of time-series data prior to hyperparameter optimization. The systems and methods described herein aim to reduce the redundancies and improve the efficiencies of model selection, model training, and/or hyperparameter selection. The systems and methods achieve this by using information about the attributes of the time-series dataset that may be used to determine a model that may be most effective at fitting a given dataset. If a model is selected prior to hyperparameter optimization, the time and resources spent training, fitting, and/or tuning models that are not selected can be avoided.
Type: Application
Filed: October 31, 2023
Publication date: May 1, 2025
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
-
Publication number: 20250139502
Abstract: Methods and systems are described herein for minimizing development time in artificial intelligence models by automating model selection based on dataset fittings of time-series data prior to hyperparameter optimization.
Type: Application
Filed: October 31, 2023
Publication date: May 1, 2025
Applicant: Capital One Services, LLC
Inventors: Michael LANGFORD, Abhisek JANA, Rajesh Kanna DURAIRAJ
-
Publication number: 20250077503
Abstract: Disclosed embodiments may include a system for providing a nearest neighbors classification pipeline with automated dimensionality reduction. The system may receive a dataset. The system may determine whether the dataset has a first dimensionality that exceeds a predetermined threshold. When the dataset has a dimensionality that exceeds the predetermined threshold, the system may prompt a user to input an explained variance threshold ratio. The system may receive the explained variance threshold ratio. The system may iteratively perform a binary search on the dataset to determine a reduced dimensionality having a total explained variance ratio closest to but not less than the explained variance threshold ratio. The system may reduce the dataset to the reduced dimensionality to generate a reduced dataset. The system may train a machine learning model using the reduced dataset.
Type: Application
Filed: March 7, 2024
Publication date: March 6, 2025
Inventor: Michael LANGFORD
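A small sketch of the binary-search step this abstract describes, using scikit-learn PCA as the dimensionality-reduction stage; the function name and the PCA choice are assumptions, not the claimed pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_by_explained_variance(X, variance_threshold=0.95):
    """Binary-search the number of PCA components whose total explained
    variance ratio is closest to, but not less than, the threshold."""
    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)

    lo, hi = 1, X.shape[1]
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if cumulative[mid - 1] >= variance_threshold:
            best = mid          # feasible; try fewer components
            hi = mid - 1
        else:
            lo = mid + 1        # not enough variance; need more components
    return PCA(n_components=best).fit_transform(X), best

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50)) @ rng.normal(size=(50, 50))   # correlated features
X_reduced, k = reduce_by_explained_variance(X, 0.95)
print(X_reduced.shape, k)
```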
-
Publication number: 20240185116
Abstract: Disclosed embodiments may include a method for bagging ensemble classifiers for imbalanced big data. The system may receive user input comprising a number of machine learning base models to generate. The system may generate the machine learning base models based on the user input. Iteratively, for each machine learning base model until all machine learning base models are trained, the system may: determine a chunk for the machine learning base model, wherein the chunk comprises all minority cases from training data and a plurality of majority cases from the training data, and train the machine learning base model with the chunk.
Type: Application
Filed: December 1, 2022
Publication date: June 6, 2024
Inventor: Michael Langford
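A hedged sketch of the chunking idea above with scikit-learn decision trees; the base-model type, the majority-sample size (equal to the minority count), and the majority-vote combiner are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_imbalanced_bagging(X, y, n_models=5, seed=0):
    """Train each base model on a chunk holding ALL minority cases plus a
    random (with-replacement) sample of majority cases."""
    rng = np.random.default_rng(seed)
    minority_idx = np.where(y == 1)[0]
    majority_idx = np.where(y == 0)[0]

    models = []
    for _ in range(n_models):
        sampled_majority = rng.choice(majority_idx, size=len(minority_idx), replace=True)
        chunk = np.concatenate([minority_idx, sampled_majority])
        models.append(DecisionTreeClassifier(max_depth=5).fit(X[chunk], y[chunk]))
    return models

def predict_majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 8))
y = (rng.random(10_000) < 0.02).astype(int)       # ~2% minority class
ensemble = train_imbalanced_bagging(X, y)
print(predict_majority_vote(ensemble, X[:10]))
```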
-
Publication number: 20240104421
Abstract: A method includes obtaining a first dataset comprising a first set of features and generating a second set of features based on the first set of features by providing the first dataset to feature primitive stacks that respectively correspond to features of the second set of features. The method further includes determining a reduced feature set based on the second set of features and a count of correlation values between features of the second set of features, wherein the correlation values satisfy a correlation threshold. The method further includes storing the reduced feature set in a database in association with the first set of features based on a determination that a second dataset comprising the reduced feature set satisfies a set of criteria.
Type: Application
Filed: September 26, 2022
Publication date: March 28, 2024
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
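A simplified pandas sketch of building a second feature set from primitives and pruning correlated columns; the primitive functions are hypothetical, and a greedy pairwise drop stands in for the count-based criterion the abstract describes:

```python
import numpy as np
import pandas as pd

# Hypothetical feature primitive stacks, one applied per original column.
PRIMITIVES = {
    "zscore": lambda s: (s - s.mean()) / (s.std() + 1e-9),
    "double": lambda s: 2.0 * s,                 # deliberately redundant with zscore
    "square": lambda s: s ** 2,
}

def generate_second_feature_set(df):
    """Apply each primitive to each original feature to build the second set."""
    out = {f"{col}__{name}": fn(df[col])
           for col in df.columns for name, fn in PRIMITIVES.items()}
    return pd.DataFrame(out)

def reduce_correlated_features(features, threshold=0.95):
    """Greedily drop the later member of any feature pair whose absolute
    correlation meets the threshold."""
    corr = features.corr().abs()
    dropped = set()
    cols = list(features.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in dropped and b not in dropped and corr.loc[a, b] >= threshold:
                dropped.add(b)
    return features.drop(columns=sorted(dropped))

df = pd.DataFrame(np.random.default_rng(0).normal(size=(200, 4)), columns=list("abcd"))
second = generate_second_feature_set(df)
reduced = reduce_correlated_features(second)
print(second.shape, "->", reduced.shape)      # redundant columns are removed
```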
-
Publication number: 20240104436
Abstract: A method includes obtaining a first dataset including a first feature set, generating a first set of feature values by providing the first dataset to a set of feature primitive stacks, and determining a reduced set of feature values based on the first set of feature values by dimensionally reducing features of the first set of feature values. The method further includes generating an intermediate set of feature values by providing a value of the first dataset and a value of the reduced set of feature values to at least one feature primitive of the set of feature primitive stacks. The method further includes updating the reduced set of feature values by dimensionally reducing features of the intermediate set of feature values and storing a second dataset including features of the intermediate set of feature values in association with the first feature set.
Type: Application
Filed: September 26, 2022
Publication date: March 28, 2024
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
-
Publication number: 20240095551
Abstract: Systems and methods for successively imputing missing feature values, using machine learning to sequentially fill in missing values in partially-filled datasets by drawing on the information in the dataset's populated records. The systems and methods disclosed herein may be useful in many machine learning contexts and applications where datasets are missing values.
Type: Application
Filed: September 15, 2022
Publication date: March 21, 2024
Inventor: Michael Langford
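A minimal sketch of sequential, model-based imputation in the spirit of this abstract; the column ordering, the mean pre-fill for predictors, and the random-forest regressor are assumptions made for illustration (and are similar in spirit to, but not the same as, scikit-learn's IterativeImputer):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def sequential_impute(df):
    """Fill missing values one column at a time, training a model on the
    populated records and predicting the missing ones."""
    filled = df.copy()
    # Impute the least-missing columns first so later models see more signal.
    order = filled.isna().sum().sort_values().index
    for col in order:
        missing = filled[col].isna()
        if not missing.any():
            continue
        predictors = filled.drop(columns=[col]).fillna(filled.mean(numeric_only=True))
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(predictors[~missing], filled.loc[~missing, col])
        filled.loc[missing, col] = model.predict(predictors[missing])
    return filled

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
data = data.mask(rng.random(data.shape) < 0.1)        # knock out ~10% of values
print(sequential_impute(data).isna().sum().sum())     # 0 missing values remain
```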
-
Patent number: 11934384
Abstract: Disclosed embodiments may include a system for providing a nearest neighbors classification pipeline with automated dimensionality reduction. The system may receive a dataset. The system may determine whether the dataset has a first dimensionality that exceeds a predetermined threshold. When the dataset has a dimensionality that exceeds the predetermined threshold, the system may prompt a user to input an explained variance threshold ratio. The system may receive the explained variance threshold ratio. The system may iteratively perform a binary search on the dataset to determine a reduced dimensionality having a total explained variance ratio closest to but not less than the explained variance threshold ratio. The system may reduce the dataset to the reduced dimensionality to generate a reduced dataset. The system may train a machine learning model using the reduced dataset.
Type: Grant
Filed: December 1, 2022
Date of Patent: March 19, 2024
Assignee: CAPITAL ONE SERVICES, LLC
Inventor: Michael Langford
-
Publication number: 20240078415
Abstract: A method may be provided for selecting embedding dimension, which can include receiving a trained machine learning (ML) model and a graph neural network (GNN) and extracting, from the received ML model, a count of the number of neurons in the penultimate layer and node embeddings for each input graph node from the GNN neurons in the penultimate layer. An importance threshold input for filtering the node embeddings can be received, and a tree-based model may be used to return feature importance values. The extracted node embeddings may be input into the tree-based model and an importance metric of each of the node embedding dimensions may be determined from the penultimate layer neurons. The penultimate layer neuron count of the ML model may be restricted to correspond to a number of the highest importance node embedding dimensions and the ML model may be trained using the restricted penultimate layer.
Type: Application
Filed: September 7, 2022
Publication date: March 7, 2024
Inventor: Michael Langford
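A hedged sketch of the importance-ranking and truncation step only; here a random matrix stands in for the GNN's penultimate-layer node embeddings, and the threshold value, helper name, and random-forest choice are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_embedding_dimensions(node_embeddings, labels, importance_threshold=0.02):
    """Rank embedding dimensions with a tree-based model's feature
    importances and keep only those above the importance threshold."""
    tree_model = RandomForestClassifier(n_estimators=200, random_state=0)
    tree_model.fit(node_embeddings, labels)
    importances = tree_model.feature_importances_

    keep = np.where(importances >= importance_threshold)[0]
    order = keep[np.argsort(importances[keep])[::-1]]     # highest importance first
    return order, node_embeddings[:, order]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))                   # stand-in node embeddings
labels = (embeddings[:, 0] + 0.5 * embeddings[:, 3] > 0).astype(int)
kept_dims, reduced = select_embedding_dimensions(embeddings, labels)
print(kept_dims[:5], reduced.shape)
```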
-
Publication number: 20240070528
Abstract: Systems and methods are provided for evaluating and selecting an ensemble of machine learning models using extremely randomized bootstrap aggregation (e.g., bagging) with replacement. The method may include the use of a plurality of base models to produce a combined (or aggregated) output. Original data may be randomly sampled with replacement to create N subsets of bootstrapped data for which each of the N selected base models may produce a prediction based on their subset of data. The individual predictions may be combined and evaluated, and an ensemble having the highest performance may be selected and trained for production. Certain implementations of the disclosed technology can eliminate the need for a priori knowledge about which model (or models) will provide accurate predictions.
Type: Application
Filed: August 31, 2022
Publication date: February 29, 2024
Inventor: Michael Langford
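A simplified scikit-learn sketch of bootstrapping a subset per base model, combining predictions, and keeping the best-scoring combination; the particular base-model list, majority-vote combiner, and accuracy metric are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

BASE_MODELS = [
    lambda: LogisticRegression(max_iter=500),
    lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    lambda: ExtraTreesClassifier(n_estimators=100, random_state=0),
]

def fit_bagged_ensemble(X, y, n_members=6, seed=0):
    """Fit each member on its own bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    members = []
    for i in range(n_members):
        idx = rng.choice(len(X), size=len(X), replace=True)     # bootstrap subset
        model = BASE_MODELS[i % len(BASE_MODELS)]()
        members.append(model.fit(X[idx], y[idx]))
    return members

def evaluate_combinations(members, X_val, y_val):
    """Score the combined (majority-vote) prediction of each candidate
    ensemble size and return the best-performing combination."""
    preds = np.stack([m.predict(X_val) for m in members])
    best = None
    for k in range(1, len(members) + 1):
        vote = (preds[:k].mean(axis=0) >= 0.5).astype(int)
        score = accuracy_score(y_val, vote)
        if best is None or score > best[0]:
            best = (score, k)
    return best

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
members = fit_bagged_ensemble(X_tr, y_tr)
print(evaluate_combinations(members, X_val, y_val))   # (score, ensemble size)
```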
-
Publication number: 20240013089
Abstract: Systems and methods, as described herein, relate to sequential synthesis and selection for feature engineering. A dataset may be associated with a label defining a machine-learning target attribute and a received operation that can be applied to at least one of the existing features of the dataset. One or more potential features may be generated by applying the operation to one or more existing features. For each of the one or more potential features, a feature importance algorithm may be applied to the respective feature along with the one or more existing features, generating a respective feature importance value. Respective feature importance values may be generated for each of the one or more existing features based on applying the feature importance algorithm and used to sort the potential features. A level of correlation to each of the one or more existing features may be determined and checked against a threshold level, to avoid adding new features that are heavily correlated with existing ones.
Type: Application
Filed: July 7, 2022
Publication date: January 11, 2024
Inventor: Michael Langford
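A small sketch of synthesizing one candidate feature, checking its correlation against existing features, and ranking it by importance; the operation, the random-forest importance scores, and the correlation threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def evaluate_candidate_feature(df, target, operation, columns, corr_threshold=0.9):
    """Synthesize one candidate feature, reject it if it is too correlated
    with an existing feature, otherwise rank it by feature importance."""
    candidate = operation(df[columns])
    max_corr = df.corrwith(candidate).abs().max()
    if max_corr >= corr_threshold:
        return None, max_corr                      # too similar to an existing feature

    augmented = df.assign(candidate=candidate)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(augmented, target)
    ranking = pd.Series(model.feature_importances_, index=augmented.columns)
    return ranking.sort_values(ascending=False), max_corr

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["a", "b", "c"])
target = (df["a"] * df["b"] > 0).astype(int)       # target depends on an interaction
ranking, corr = evaluate_candidate_feature(df, target, lambda d: d["a"] * d["b"], ["a", "b"])
print(corr)
print(ranking)
```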
-
Publication number: 20230419189
Abstract: The exemplary embodiments may provide a stacked machine learning model ensemble pipeline architecture selector that selects a well-suited stacked machine learning model ensemble pipeline architecture for a specified configuration input and a target data set. The stacked machine learning model ensemble pipeline architecture selector may generate and score possible stacked machine learning model ensemble pipeline architectures to locate one that is well-suited for the target data set and that conforms with the configuration input. The stacked machine learning model ensemble pipeline architecture selector may use genetic programming to generate successive generations of possible stacked ensemble pipeline architectures and to score those architectures to determine how well-suited they are. In this manner, the stacked machine learning model ensemble pipeline architecture selector may converge on an architecture that is well-suited, for example, one that meets one or more scores, evaluation metrics, and/or the like.
Type: Application
Filed: June 24, 2022
Publication date: December 28, 2023
Applicant: Capital One Services, LLC
Inventors: Michael LANGFORD, Jakub KRZEPTOWSKI-MUCHA, Krishna BALAM
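A toy generational-search sketch in the spirit of this abstract, using scikit-learn stacking; the genome encoding (a list of base-model names with a fixed final estimator), the mutation rule, and the cross-validation score are assumptions rather than the described selector:

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

BASE_CHOICES = {
    "logreg": lambda: LogisticRegression(max_iter=500),
    "tree": lambda: DecisionTreeClassifier(max_depth=5),
    "forest": lambda: RandomForestClassifier(n_estimators=50),
}

def build_pipeline(genome):
    """Genome = list of base-model names; the final estimator is fixed here."""
    estimators = [(f"{name}_{i}", BASE_CHOICES[name]()) for i, name in enumerate(genome)]
    return StackingClassifier(estimators=estimators,
                              final_estimator=LogisticRegression(max_iter=500))

def mutate(genome):
    child = list(genome)
    child[random.randrange(len(child))] = random.choice(list(BASE_CHOICES))
    return child

def evolve(X, y, population=4, generations=2, seed=0):
    """Score each generation's candidate architectures and breed the fittest."""
    random.seed(seed)
    genomes = [[random.choice(list(BASE_CHOICES)) for _ in range(2)] for _ in range(population)]
    best = None
    for _ in range(generations):
        scored = [(cross_val_score(build_pipeline(g), X, y, cv=3).mean(), g) for g in genomes]
        scored.sort(key=lambda s: s[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        survivors = [g for _, g in scored[: population // 2]]   # keep the fittest half
        genomes = survivors + [mutate(random.choice(survivors))
                               for _ in range(population - len(survivors))]
    return best

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
print(evolve(X, y))            # (best CV score, best architecture genome)
```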
-
Publication number: 20230196125
Abstract: Various embodiments are generally directed to techniques for optimizing hyperparameters, such as optimizing different combinations of hyperparameters, for instance. Some embodiments are particularly directed to using a genetic or Bayesian algorithm to identify and optimize different combinations of hyperparameters for a machine learning (ML) model. Many embodiments construct a search using a genetic algorithm that prioritizes the hyperparameters most important in influencing model performance.
Type: Application
Filed: December 16, 2021
Publication date: June 22, 2023
Applicant: Capital One Services, LLC
Inventor: Michael LANGFORD
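A toy genetic-search sketch over hyperparameter combinations; the search space, the importance weights that make influential hyperparameters mutate more often, and the random-forest target model are assumptions made for illustration:

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

SEARCH_SPACE = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5],
}
# Assumed prior importance: mutate influential hyperparameters more often.
MUTATION_WEIGHTS = {"n_estimators": 1.0, "max_depth": 3.0, "min_samples_leaf": 1.5}

def random_params():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(params):
    child = dict(params)
    names = list(SEARCH_SPACE)
    name = random.choices(names, weights=[MUTATION_WEIGHTS[n] for n in names], k=1)[0]
    child[name] = random.choice(SEARCH_SPACE[name])
    return child

def genetic_search(X, y, population=6, generations=3, seed=0):
    """Evolve hyperparameter combinations, keeping the fittest half each generation."""
    random.seed(seed)
    candidates = [random_params() for _ in range(population)]
    best = None
    for _ in range(generations):
        scored = [(cross_val_score(RandomForestClassifier(**p, random_state=0),
                                   X, y, cv=3).mean(), p) for p in candidates]
        scored.sort(key=lambda s: s[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        parents = [p for _, p in scored[: population // 2]]
        candidates = parents + [mutate(random.choice(parents))
                                for _ in range(population - len(parents))]
    return best

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
print(genetic_search(X, y))    # (best CV score, best hyperparameter combination)
```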