Patents Assigned to BigML, Inc.

Selection of machine learning algorithms

Patent number: 12169789

Abstract: Systems and methods of selecting machine learning models/algorithms for a candidate dataset are disclosed. A computer system may access historical data of a set of algorithms applied to a set of benchmark datasets; select a first algorithm of the set of algorithms; apply the first algorithm to an input dataset to create a model of the input dataset; evaluate and store results of the applying; and add the first algorithm to a set of tried algorithms. The computer system may select a next algorithm of the algorithm set via submodular optimization based on the historical data and the set of tried algorithms; apply the next algorithm to the input dataset; capture a next result based on the applying; add the next result to update the set of tried algorithms; and repeat the submodular optimization. The procedure may continue until a termination condition is reached.

Type: Grant

Filed: January 20, 2023

Date of Patent: December 17, 2024

Assignee: BigML, Inc.

Inventor: Charles Parker
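The selection loop described in the abstract above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the `history` table, the `evaluate` callback, the stopping rule, and the facility-location objective are all hypothetical choices. The abstract says only that the next algorithm is chosen by submodular optimization over the historical data and the set of tried algorithms.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: algorithm name -> historical scores, one per benchmark dataset.
History = Dict[str, List[float]]


def coverage(history: History, chosen: List[str]) -> float:
    """Facility-location objective (monotone submodular): for each benchmark
    dataset, take the best historical score among the chosen algorithms, then sum."""
    if not chosen:
        return 0.0
    n_datasets = len(next(iter(history.values())))
    return sum(max(history[a][d] for a in chosen) for d in range(n_datasets))


def next_algorithm(history: History, tried: List[str]) -> str:
    """Greedy step: return the untried algorithm with the largest marginal gain."""
    base = coverage(history, tried)
    untried = [a for a in history if a not in tried]
    return max(untried, key=lambda a: coverage(history, tried + [a]) - base)


def select_algorithms(
    history: History,
    evaluate: Callable[[str], float],  # assumed: trains on the input dataset, returns a score
    max_tries: int,
    target: float = 1.0,
) -> Tuple[List[str], Dict[str, float]]:
    """Repeat the greedy step until a termination condition is met
    (here, assumed to be a try budget or a target score)."""
    tried: List[str] = []
    results: Dict[str, float] = {}
    while len(tried) < min(max_tries, len(history)):
        algo = next_algorithm(history, tried)
        results[algo] = evaluate(algo)  # apply to the input dataset and store the result
        tried.append(algo)
        if results[algo] >= target:
            break
    return tried, results
```

In this sketch the first algorithm is simply the greedy choice from an empty tried set, and the scores measured on the input dataset are stored but not fed back into the objective. A fuller version might reweight the benchmark datasets by how closely they resemble the input.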
SELECTION OF MACHINE LEARNING ALGORITHMS

Publication number: 20170286839

Abstract: Systems and methods of selecting machine learning models/algorithms for a candidate dataset are disclosed. A computer system may access historical data of a set of algorithms applied to a set of benchmark datasets; select a first algorithm of the set of algorithms; apply the first algorithm to an input dataset to create a model of the input dataset; evaluate and store results of the applying; and add the first algorithm to a set of tried algorithms. The computer system may select a next algorithm of the algorithm set via submodular optimization based on the historical data and the set of tried algorithms; apply the next algorithm to the input dataset; capture a next result based on the applying; add the next result to update the set of tried algorithms; and repeat the submodular optimization. The procedure may continue until a termination condition is reached.

Type: Application

Filed: April 3, 2017

Publication date: October 5, 2017

Applicant: BigML, Inc.

Inventor: Charles Parker

PREDICTIVE MODELING AND DATA ANALYSIS IN A SECURE SHARED SYSTEM

Publication number: 20170140302

Abstract: A system and method enables users to selectively expose and optionally monetize their data resources, for example on a web site. Data assets such as datasets and models can be exposed by the proprietor on a public gallery for use by others. Fees may be charged, for example, per new model, or per prediction using a model. Users may selectively expose public datasets or public models while keeping their raw data private.

Type: Application

Filed: January 26, 2017

Publication date: May 18, 2017

Applicant: BigML, Inc.

Inventors: Francisco J. Martin, Oscar Rovira, Jos Verwoerd, Poul Petersen, Charles Parker, Jose Antonio Ortega, Beatriz Garcia, J. Justin Donaldson, Antonio Blasco, Adam Ashenfelter

EVOLVING PARALLEL SYSTEM TO AUTOMATICALLY IMPROVE THE PERFORMANCE OF MULTIPLE CONCURRENT TASKS ON LARGE DATASETS

Publication number: 20170090980

Abstract: We describe a high-level computational framework especially well suited to parallel operations on large datasets. In a system in accordance with this framework, there is at least one, and generally several, instances of an architecture deployment as further described. We use the term “architecture deployment” herein to mean a cooperating group of processes together with the hardware on which the processes are executed. This is not to imply a one-to-one association of any process to particular hardware. To the contrary, as detailed below, an architecture deployment may dynamically spawn another deployment as appropriate, including provisioning needed hardware. The active architecture deployments together form a system that dynamically processes jobs requested by a user-customer, in accordance with customer's monetary budget and other criteria, in a robust and automatically scalable environment.

Type: Application

Filed: December 7, 2016

Publication date: March 30, 2017

Applicant: BigML, Inc.

Inventors: Francisco J. Martin, Adam Ashenfelter, J. Justin Donaldson, Jos Verwoerd, Jose Antonio Ortega, Charles Parker

Predictive modeling and data analysis in a secure shared system

Patent number: 9576246

Abstract: A system and method enables users to selectively expose and optionally monetize their data resources, for example on a web site. Data assets such as datasets and models can be exposed by the proprietor on a public gallery for use by others. Fees may be charged, for example, per new model, or per prediction using a model. Users may selectively expose public datasets or public models while keeping their raw data private.

Type: Grant

Filed: September 12, 2013

Date of Patent: February 21, 2017

Assignee: BigML, Inc.

Inventors: Francisco J. Martin, Oscar Rovira, Jos Verwoerd, Poul Petersen, Charles Parker, Jose Antonio Ortega, Beatriz Garcia, J. Justin Donaldson, Antonio Blasco, Adam Ashenfelter

INTERACTIVE VISUALIZATION OF BIG DATA SETS AND MODELS INCLUDING TEXTUAL DATA

Publication number: 20170032026

Abstract: Systems and processes are disclosed for advanced text analysis in the field of big data analytics and visualization: Users can now factor text into their predictive models, alongside regression, time/date and categorical information. This is ideal for building models where text content may play a prominent role (e.g., social media or customer service logs). Multiple data types, including text fields, may be combined together in datasets and models, and may be presented in various interactive visualization displays.

Type: Application

Filed: October 12, 2016

Publication date: February 2, 2017

Applicant: BigML, Inc.

Inventors: Charles Parker, Adam Ashenfelter

Evolving parallel system to automatically improve the performance of multiple concurrent tasks on large datasets

Patent number: 9558036

Abstract: We describe a high-level computational framework especially well suited to parallel operations on large datasets. In a system in accordance with this framework, there is at least one, and generally several, instances of an architecture deployment as further described. We use the term “architecture deployment” herein to mean a cooperating group of processes together with the hardware on which the processes are executed. This is not to imply a one-to-one association of any process to particular hardware. To the contrary, as detailed below, an architecture deployment may dynamically spawn another deployment as appropriate, including provisioning needed hardware. The active architecture deployments together form a system that dynamically processes jobs requested by a user-customer, in accordance with customer's monetary budget and other criteria, in a robust and automatically scalable environment.

Type: Grant

Filed: May 29, 2015

Date of Patent: January 31, 2017

Assignee: BigML, Inc.

Inventors: Francisco J. Martin, Adam Ashenfelter, J. Justin Donaldson, Jos Verwoerd, Jose Antonio Ortega, Charles Parker

Interactive visualization of big data sets and models including textual data

Patent number: 9501540

Abstract: Systems and processes are disclosed for advanced text analysis in the field of big data analytics and visualization: Users can now factor text into their predictive models, alongside regression, time/date and categorical information. This is ideal for building models where text content may play a prominent role (e.g., social media or customer service logs). Multiple data types, including text fields, may be combined together in datasets and models, and may be presented in various interactive visualization displays.

Type: Grant

Filed: September 25, 2014

Date of Patent: November 22, 2016

Assignee: BigML, Inc.

Inventors: Charles Parker, Adam Ashenfelter

PREDICTIVE MODELING OF DATA CLUSTERS

Publication number: 20160292578

Abstract: The present disclosure pertains to a system and method for predictive modeling of data clusters. The system and method include creating a dataset from a data source comprising data points, identifying a number of clusters based at least in part on a similarity metric between the data points, generating a model for each of the number of clusters based at least in part on identifying the number of clusters, visually displaying the number of clusters, receiving an indication of selection of a particular cluster, and replacing the visual display of the identified number of clusters with a visual display of the model corresponding to the particular cluster in response to receiving an indication of selection of a model icon.

Type: Application

Filed: April 1, 2016

Publication date: October 6, 2016

Applicant: BigML, Inc.

Inventor: Adam Ashenfelter

Methods for building regression trees in a distributed computing environment

Patent number: 9269054

Abstract: Systems and methods are disclosed for building and using decision trees, preferably in a scalable and distributed manner. Our system can be used to create and use classification trees, regression trees, or a combination of regression trees called a gradient boosted regression tree (GBRT). Our system leverages approximate histograms in new ways to process large datasets, or data streams, while limiting inter-process communication bandwidth requirements. Further, in some embodiments, a scalable network of computers or processors is utilized for fast computation of decision trees. Preferably, the network comprises a tree structure of processors, comprising a master node and a plurality of worker nodes or “workers,” again arranged to limit necessary communications.

Type: Grant

Filed: November 9, 2012

Date of Patent: February 23, 2016

Assignee: BigML, Inc.

Inventors: Francisco J. Martin, Adam Ashenfelter, J. Justin Donaldson, Jos Verwoerd, Jose Antonio Ortega, Charles Parker
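To illustrate why approximate histograms limit communication between workers, as the abstract above describes, here is a minimal Python sketch of one well-known streaming histogram scheme, which merges the closest adjacent bins. The abstract does not say which histogram the patent uses, so the merging rule, the `Bin` layout, and the function names `build` and `merge` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Bin = Tuple[float, int]  # (centroid, count); hypothetical representation


def _compress(bins: List[Bin], max_bins: int) -> List[Bin]:
    """Merge the two closest adjacent bins until at most max_bins remain.
    Assumes max_bins >= 1."""
    bins = sorted(bins)
    while len(bins) > max_bins:
        # Index of the adjacent pair with the smallest gap between centroids.
        i = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
        (c1, n1), (c2, n2) = bins[i], bins[i + 1]
        merged = ((c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2)  # count-weighted centroid
        bins[i:i + 2] = [merged]
    return bins


def build(values: Iterable[float], max_bins: int) -> List[Bin]:
    """Summarize a stream of values with a fixed number of bins."""
    bins: List[Bin] = []
    for v in values:
        bins = _compress(bins + [(float(v), 1)], max_bins)
    return bins


def merge(a: List[Bin], b: List[Bin], max_bins: int) -> List[Bin]:
    """Combine two workers' histograms; only the bins travel, never the raw rows."""
    return _compress(a + b, max_bins)
```

Under these assumptions each worker sends at most `max_bins` pairs per field to its parent, regardless of how many rows it has seen. Candidate split points for a tree node can then be estimated from the merged histogram.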
Evolving parallel system to automatically improve the performance of multiple concurrent tasks on large datasets

Patent number: 9098326

Abstract: We describe a high-level computational framework especially well suited to parallel operations on large datasets. In a system in accordance with this framework, there is at least one, and generally several, instances of an architecture deployment as further described. We use the term “architecture deployment” herein to mean a cooperating group of processes together with the hardware on which the processes are executed. This is not to imply a one-to-one association of any process to particular hardware. To the contrary, as detailed below, an architecture deployment may dynamically spawn another deployment as appropriate, including provisioning needed hardware. The active architecture deployments together form a system that dynamically processes jobs requested by a user-customer, in accordance with customer's monetary budget and other criteria, in a robust and automatically scalable environment.

Type: Grant

Filed: November 9, 2012

Date of Patent: August 4, 2015

Assignee: BigML, Inc.

Inventors: Francisco J. Martin, Adam Ashenfelter, J. Justin Donaldson, Jos Verwoerd, Jose Antonio Ortega, Charles Parker

INTERACTIVE VISUALIZATION SYSTEM AND METHOD

Publication number: 20150081685

Abstract: A system and method generates and displays an interactive space-filling graphical representation of a model based at least in part on a dataset having data items. The space-filling graphical representation may have a plurality of segments arranged to realize a type of visualization and sized in proportion to a number of data items represented by the segment to convey particular information about the dataset.

Type: Application

Filed: September 24, 2014

Publication date: March 19, 2015

Applicant: BigML, Inc.

Inventors: Adam Ashenfelter, David Gerster, Oscar Rovira

METHOD AND APPARATUS FOR VISUALIZING AND INTERACTING WITH DECISION TREES

Publication number: 20130117280

Abstract: A decision tree model is generated from sample data. A visualization system may automatically prune the decision tree model based on characteristics of nodes or branches in the decision tree or based on artifacts associated with model generation. For example, only nodes or questions in the decision tree receiving a largest amount of the sample data may be displayed in the decision tree. The nodes also may be displayed in a manner to more readily identify associated fields or metrics. For example, the nodes may be displayed in different colors and the colors may be associated with different node questions or answers.

Type: Application

Filed: November 2, 2012

Publication date: May 9, 2013

Applicant: BigML, Inc.

Inventor: BigML, Inc.
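The display pruning mentioned in the abstract above, where only the nodes receiving the largest amount of sample data are shown, can be sketched in Python as a best-first walk from the root. The `Node` class, the `visible_nodes` function, and the `max_nodes` limit are hypothetical names for illustration. The abstract does not specify how the visible set is computed.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    question: str  # the split question shown for this node
    count: int     # number of sample rows that reach this node
    children: List["Node"] = field(default_factory=list)


def visible_nodes(root: Node, max_nodes: int) -> List[str]:
    """Return the questions of up to max_nodes nodes, always expanding the
    frontier node that received the most sample data. Starting from the root
    keeps the displayed nodes connected."""
    shown: List[str] = []
    # Heap entries are (-count, insertion order, node); the order value breaks
    # ties so that Node objects are never compared directly.
    frontier = [(-root.count, 0, root)]
    order = 1
    while frontier and len(shown) < max_nodes:
        _, _, node = heapq.heappop(frontier)
        shown.append(node.question)
        for child in node.children:
            heapq.heappush(frontier, (-child.count, order, child))
            order += 1
    return shown
```

Raising `max_nodes` in response to user interaction would progressively reveal the less populated branches.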