Patents Assigned to DataRobot, Inc.
  • Publication number: 20230065870
    Abstract: This disclosure relates generally to artificial intelligence structured to generate models based on multimodal input. At least one aspect is directed to a system. The system can include a data processing system comprising memory and one or more processors to generate, by a first model trained using machine learning with input including one or more first features each associated with data structures having a plurality of distinct data types, one or more second features compatible with one of the distinct data types, generate, by a second model trained with input including the second features, a plurality of cluster classifications each compatible with one or more of the distinct data types, and cause a user interface to present one or more of the data structures rendered according to a spatial structure based on the second features and the cluster classifications.
    Type: Application
    Filed: August 30, 2022
    Publication date: March 2, 2023
    Applicant: DataRobot, Inc.
    Inventors: Ivan Pyzow, David Michael McGarry, Mikhail Yakubovskiy, Ee Kin Chin, Mykyta Yarmak, Yuliia Bezuhla, Zachary Albert Mayer
  • Publication number: 20230067026
    Abstract: Automated data analytics techniques for non-tabular data sets may include methods and systems for (1) automatically developing models that perform tasks in the domains of computer vision, audio processing, speech processing, text processing, or natural language processing; (2) automatically developing models that analyze heterogeneous data sets containing image data and non-image data, and/or heterogeneous data sets containing tabular data and non-tabular data; (3) determining the importance of an image feature with respect to a modeling task; (4) explaining the value of a modeling target based at least in part on an image feature; and (5) detecting drift in image data. In some cases, multi-stage models may be developed, wherein a pre-trained feature extraction model extracts low-, mid-, high-, and/or highest-level features of non-tabular data, and a data analytics model uses those features (or features derived therefrom) to perform a data analytics task.
    Type: Application
    Filed: February 17, 2021
    Publication date: March 2, 2023
    Applicant: DataRobot, Inc.
    Inventors: Yurii Huts, Chin Ee Kin, Anton Kasyanov, Zachary Albert Mayer, Xavier Conort, Hon Nian Chua, Sabari Shanmugam, Atanas Mitkov Atanasov, Ivan Richard Pyzow
  • Publication number: 20230051833
    Abstract: Systems and methods of epidemiological modeling using machine learning are provided, and can include receiving values for an occurrence of an infectious disease during a first time period, generating, from a model trained by a machine learning system, predictions for the occurrence of the infectious disease over a second time period, performing, by a simulator using the predictions, one or more simulations of the occurrence of the infectious disease in one or more geographic regions during one or more time periods subsequent to the second time period, and providing, to a user interface, a first simulation of the one or more simulations performed by the simulator for a first geographic region of the one or more geographic regions during a time period of the one or more time periods.
    Type: Application
    Filed: July 28, 2022
    Publication date: February 16, 2023
    Applicant: DataRobot, Inc.
    Inventors: Jeremy Achin, Michael Schmidt, Mackenzie Heiser, Jona Sassenhagen, Oleg Baranovskiy, Jared Shamwell, Hon Nian Chua, Joao Paulo Gomes, Maxence Jeunesse, Yung Siang Liau, Julian Wergieluk, Jay Cameron Schuren, Mark Steadman, Mohak Saxena, Samuel Clark, Noa Flaherty, Jarred Bultema, Nathan Robert Cameron, Amanda Schierz, Vinay Venkata Wunnava, Xavier Conort, Gregory Michaelson, Anton Suslov, Madeleine Mott, Sergey Yurgenson, Christopher James Monsour, Matthew Joseph Nitzken, Patrick Allen Farrell, Jared Bowns, Dustin Burke, Ievgenii Baliuk, Rishabh Raman
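The abstract chains a trained forecaster into a simulator. A minimal Python sketch of that wiring follows, under loudly stated assumptions: a log-linear growth fit stands in for the trained ML model, a discrete-time SIR model stands in for the simulator, and the helper names (`fit_growth_rate`, `predict`, `simulate_sir`, `simulate_regions`) are hypothetical rather than taken from the application.

```python
import math


def fit_growth_rate(counts):
    """Least-squares slope of log(count) versus day index.

    Hypothetical stand-in for the trained ML model; requires positive counts.
    """
    n = len(counts)
    xs = list(range(n))
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def predict(counts, horizon):
    """Extrapolate daily counts over the second time period."""
    rate = fit_growth_rate(counts)
    return [counts[-1] * math.exp(rate * (h + 1)) for h in range(horizon)]


def simulate_sir(population, infected, beta, gamma, days):
    """Discrete-time SIR simulation; returns one (S, I, R) tuple per day."""
    s, i, r = population - infected, float(infected), 0.0
    trajectory = []
    for _ in range(days):
        new_inf = beta * s * i / population
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((s, i, r))
    return trajectory


def simulate_regions(observed, populations, gamma=0.1, horizon=7, days=60):
    """Feed each region's predictions into the simulator (hypothetical wiring)."""
    results = {}
    for region, counts in observed.items():
        preds = predict(counts, horizon)
        # Early in an epidemic, SIR growth is roughly beta - gamma per day.
        beta = max(fit_growth_rate(counts) + gamma, 0.0)
        results[region] = simulate_sir(populations[region], preds[-1], beta, gamma, days)
    return results
```

Seeding the simulator with the last predicted daily count as the current number of infected people is a deliberate simplification; a real pipeline would estimate prevalence separately.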
  • Publication number: 20230004796
    Abstract: Systems and methods are described for developing and using neural network models. An example method of training a neural network includes: oscillating a learning rate while performing a preliminary training of a neural network; determining, based on the preliminary training, a number of training epochs to perform for a subsequent training session; and training the neural network using the determined number of training epochs. The systems and methods can be used to build neural network models that efficiently and accurately handle heterogeneous data.
    Type: Application
    Filed: May 13, 2022
    Publication date: January 5, 2023
    Applicant: DataRobot, Inc.
    Inventors: Zachary Albert Mayer, Jason McGhee, Jesse Bannon, Joshua Matthew Weiner
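A toy sketch of the preliminary-run idea, assuming a triangular cyclical schedule as the "oscillating" learning rate and "epoch with the lowest validation loss" as the rule for choosing the epoch count; the application may use a different schedule or rule. A one-parameter linear model trained by full-batch gradient descent stands in for the neural network, and all function names are hypothetical.

```python
def triangular_lr(epoch, base_lr, max_lr, half_cycle):
    """Triangular cyclical LR: base_lr -> max_lr -> base_lr every 2 * half_cycle epochs."""
    pos = epoch % (2 * half_cycle)
    frac = pos / half_cycle if pos < half_cycle else (2 * half_cycle - pos) / half_cycle
    return base_lr + (max_lr - base_lr) * frac


def train(xs, ys, epochs, lr_fn, w=0.0):
    """Full-batch gradient descent on y ~ w * x (toy stand-in for a neural network)."""
    history = []
    for epoch in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr_fn(epoch) * grad
        history.append(w)
    return w, history


def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def choose_epochs(train_data, val_data, max_epochs, lr_fn):
    """Preliminary run with the oscillating LR; return the epoch with the lowest validation loss."""
    _, history = train(*train_data, max_epochs, lr_fn)
    losses = [mse(w, *val_data) for w in history]
    return losses.index(min(losses)) + 1
```

Usage: `n = choose_epochs(([1, 2, 3], [2, 4, 6]), ([4, 5], [8, 10]), 40, lr)` picks the epoch count, and `train(..., n, lr)` then runs the subsequent training session for exactly that many epochs.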
  • Publication number: 20230004486
    Abstract: The system can identify data stored in repositories that indicate changes in the version of the application relative to a prior version of the application tested or deployed before receipt of the request to test the performance of the version of the application. The system can determine, based on the data and using machine learning with historical data associated with applications tested or deployed to test performance of the version, and without execution of the tests, a score for each of a plurality of tests configured to test performance of the version of the application. The system can select, based on the scores, a subset of the tests to execute, and provide an indication of the selected subset of the tests to cause execution of the subset of the tests to evaluate performance of the version of the application prior to deployment of the version of the application.
    Type: Application
    Filed: July 1, 2022
    Publication date: January 5, 2023
    Applicant: DataRobot, Inc.
    Inventors: Borys Drozhak, Ievgenii Baliuk, Dustin Burke
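One way to score tests without running them, sketched under assumptions: the score is a smoothed historical failure rate of each test when a given file changed, maximized over the changed files, and the top-k tests are selected. The heuristic, the `prior` smoothing constant, and the function names are hypothetical, not the method claimed in the application.

```python
from collections import defaultdict


def build_failure_stats(history):
    """history: iterable of (changed_files, executed_tests, failed_tests) from past runs.

    Counts, per (changed file, test) pair, how often the test ran and how often it failed.
    """
    runs, fails = defaultdict(int), defaultdict(int)
    for changed_files, executed, failed in history:
        for f in changed_files:
            for t in executed:
                runs[(f, t)] += 1
                if t in failed:
                    fails[(f, t)] += 1
    return runs, fails


def score_tests(changed_files, tests, runs, fails, prior=0.05):
    """Smoothed failure rate, maximized over the changed files; no test is executed."""
    scores = {}
    for t in tests:
        rates = [
            (fails.get((f, t), 0) + prior) / (runs.get((f, t), 0) + 1)
            for f in changed_files
        ]
        scores[t] = max(rates, default=prior)
    return scores


def select_tests(scores, k):
    """Return the k highest-scoring tests, breaking ties by name."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```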
  • Patent number: 11514369
    Abstract: Systems and methods are described for interpreting machine learning model predictions. An example method includes: providing a machine learning model configured to receive a plurality of features as input and provide a prediction as output, wherein the plurality of features includes an engineered feature including a combination of two or more parent features; calculating a Shapley value for each feature in the plurality of features; and allocating a respective portion of the Shapley value for the engineered feature to each of the two or more parent features.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: November 29, 2022
    Assignee: DataRobot, Inc.
    Inventors: Mark Benjamin Romanowsky, Jared Bowns, Thomas Whitehead, Thomas Stearns, Xavier Conort, Anastasiia Tamazlykar, Mohak Saxena
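A small sketch of the allocation step, assuming exact Shapley values computed by subset enumeration against a baseline input and an equal split of the engineered feature's value among its parents. The patent only states that a "respective portion" goes to each parent, so the equal split is an illustrative choice; the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by subset enumeration; absent features take baseline values."""
    names = list(x)
    n = len(names)

    def value(present):
        return predict({k: x[k] if k in present else baseline[k] for k in names})

    phi = {}
    for feat in names:
        others = [k for k in names if k != feat]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {feat}) - value(s))
        phi[feat] = total
    return phi


def allocate_to_parents(phi, parents):
    """Split each engineered feature's Shapley value equally among its parent features."""
    out = {k: v for k, v in phi.items() if k not in parents}
    for engineered, parent_list in parents.items():
        share = phi[engineered] / len(parent_list)
        for p in parent_list:
            out[p] = out.get(p, 0.0) + share
    return out
```

Because the split only moves the engineered feature's value onto its parents, the allocated values still sum to f(x) minus f(baseline).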
  • Publication number: 20220358528
    Abstract: An apparatus has a memory with processor-executable instructions and a processor operatively coupled to the memory. The apparatus receives datasets including time series data points that are descriptive of a feature of a given entity. The processor determines a time series characteristic based on the data content, and selects, based on the determined characteristic, a set of entrant forecasting models from a pool of forecasting models stored in the memory. Next, the processor trains each entrant forecasting model with the time series data points to produce a set of trained entrant forecasting models. The processor executes each trained entrant forecasting model to generate a set of forecasted values indicating estimations of the feature of the given entity. Thereafter the processor selects at least one forecasting model from the set of trained entrant forecasting models based on computed accuracy evaluations performed over the set of forecasted values.
    Type: Application
    Filed: February 14, 2022
    Publication date: November 10, 2022
    Applicant: DataRobot, Inc.
    Inventors: John Bledsoe, Jeff Gabriel, Jason Montgomery, Ryan Sevey, Matt Steinpreis, Craig Vermeer, Ryan West
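A compact sketch of the characteristic-driven selection loop, assuming lag autocorrelation as the time series characteristic, a pool of simple baseline forecasters as the entrants, and mean absolute error on a holdout window as the accuracy evaluation. These choices and all names are hypothetical.

```python
from statistics import mean


def naive(train, h):
    return [train[-1]] * h


def mean_model(train, h):
    return [mean(train)] * h


def drift(train, h):
    slope = (train[-1] - train[0]) / (len(train) - 1)  # needs len(train) >= 2
    return [train[-1] + slope * (i + 1) for i in range(h)]


def seasonal_naive(period):
    def forecast(train, h):
        return [train[len(train) - period + (i % period)] for i in range(h)]
    return forecast


def is_seasonal(series, period, threshold=0.5):
    """Time series characteristic: lag-`period` autocorrelation above a threshold."""
    if len(series) < 2 * period:
        return False
    m = mean(series)
    num = sum((series[t] - m) * (series[t - period] - m) for t in range(period, len(series)))
    den = sum((v - m) ** 2 for v in series)
    return den > 0 and num / den > threshold


def select_entrants(train, period):
    """Pick entrant models from the pool based on the detected characteristic."""
    pool = {"naive": naive, "mean": mean_model, "drift": drift}
    if is_seasonal(train, period):
        pool["seasonal_naive"] = seasonal_naive(period)
    return pool


def pick_best(series, period, holdout):
    """Forecast the holdout with every entrant and keep the lowest mean absolute error."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {}
    for name, model in select_entrants(train, period).items():
        forecast = model(train, holdout)
        scores[name] = mean([abs(p - a) for p, a in zip(forecast, test)])
    return min(scores, key=scores.get), scores
```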
  • Publication number: 20220335030
    Abstract: Cache optimization for data preparation includes: generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data, wherein the data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result; in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result; assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result; and outputting the result.
    Type: Application
    Filed: July 1, 2022
    Publication date: October 20, 2022
    Applicant: DataRobot, Inc.
    Inventors: Dave Brewster, Victor Tso
  • Patent number: 11461304
    Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
    Type: Grant
    Filed: March 10, 2020
    Date of Patent: October 4, 2022
    Assignee: DataRobot, Inc.
    Inventors: Dave Brewster, Victor Tze-Yeuan Tso
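A sketch of signature-based reuse, assuming a signature is a SHA-256 hash of the dataset identifier plus the ordered operation list, and that the longest cached prefix of a new operation sequence is reused. `SignatureCache` and the toy `apply_op` operations are hypothetical names for illustration.

```python
import hashlib
import json


def operation_signature(dataset_id, ops):
    """Hash of the dataset id plus the ordered operations that produced a result."""
    payload = json.dumps([dataset_id, ops])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SignatureCache:
    def __init__(self, apply_op):
        self.apply_op = apply_op
        self.results = {}
        self.executed = 0  # number of operations actually executed

    def run(self, dataset_id, data, ops):
        start, result = 0, data
        # Reuse the longest prefix of `ops` whose signature is already cached.
        for k in range(len(ops), 0, -1):
            sig = operation_signature(dataset_id, ops[:k])
            if sig in self.results:
                start, result = k, self.results[sig]
                break
        # Execute only the remaining operations, caching each intermediate result.
        for k in range(start, len(ops)):
            result = self.apply_op(result, ops[k])
            self.executed += 1
            self.results[operation_signature(dataset_id, ops[:k + 1])] = result
        return result


def apply_op(rows, op):
    """Hypothetical operations for illustration: filter and scale a list of numbers."""
    name, arg = op
    if name == "filter_gt":
        return [r for r in rows if r > arg]
    if name == "scale":
        return [r * arg for r in rows]
    raise ValueError(f"unknown operation {name!r}")
```

Keying on a dataset identifier assumes the underlying data does not change under that identifier; a production cache would also fold a data version into the signature.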
  • Publication number: 20220284183
    Abstract: A step editor for data preparation can instruct a user interface to present a first plurality of operations to be applied in a sequential order to one or more sets of data, receive user inputs including at least one indication to mute at least one operation of the first plurality of operations to prevent the processors from performing the at least one operation, generate a second plurality of operations, the second plurality of operations to be applied in a sequential order to the sets of data and comprising the first plurality of operations excluding the operation muted by the user inputs, obtain a cached data traversal program associated with the second plurality of operations and comprising a representation of a result of transforming the sets of data, and instruct the user interface to present output based at least in part on execution of the cached data traversal program.
    Type: Application
    Filed: March 25, 2022
    Publication date: September 8, 2022
    Applicant: DataRobot, Inc.
    Inventors: Nenshad Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
  • Publication number: 20220237516
    Abstract: Data modeling systems and methods are described. A data modeling method may include receiving user input specifying a structure of at least a portion of a data model and a complexity value associated with the structure; (a) generating one or more data models; (b) determining complexity scores for the respective data models; (c) for each of the data models: determining whether to select the respective data model for evaluation based, at least in part, on the complexity score of the respective data model, and if the respective data model is selected for evaluation, evaluating an accuracy of the respective data model for one or more data sets; and repeating steps (a)-(c) until one or more specified termination criteria are satisfied, wherein a first of the generated data models includes the specified structure, and wherein the complexity score for the first data model is determined based, at least in part, on the complexity value associated with the structure.
    Type: Application
    Filed: April 6, 2022
    Publication date: July 28, 2022
    Applicant: DataRobot, Inc.
    Inventors: Michael Schmidt, Dylan Sherry, Hongmin Fan
  • Patent number: 11386075
    Abstract: Methods for detection of anomalous data samples from a plurality of data samples are provided. In some embodiments, an anomaly detection procedure that includes a plurality of tasks is executed to identify the anomalous data samples from the plurality of data samples.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: July 12, 2022
    Assignee: DataRobot, Inc.
    Inventors: Amanda Claire Schierz, Jeremy Achin, Zachary Albert Mayer
  • Publication number: 20220199266
    Abstract: Systems and methods of epidemiological modeling using machine learning are provided, and can include receiving values for an occurrence of an infectious disease during a first time period, generating, from a model trained by a machine learning system, predictions for the occurrence of the infectious disease over a second time period, performing, by a simulator using the predictions, one or more simulations of the occurrence of the infectious disease in one or more geographic regions during one or more time periods subsequent to the second time period, and providing, to a user interface, a first simulation of the one or more simulations performed by the simulator for a first geographic region of the one or more geographic regions during a time period of the one or more time periods.
    Type: Application
    Filed: December 9, 2021
    Publication date: June 23, 2022
    Applicant: DataRobot, Inc.
    Inventors: Jeremy Achin, Earl Jared Shamwell, Michael Schmidt, Mackenzie Heiser, Patrick Farrell, Matt Nitzken, Jared Bowns, Nathan Cameron, Adam Beairsto, Jay Schuren, Mohak Saxena
  • Patent number: 11361246
    Abstract: Various systems and methods provide an intuitive user interface that enables automatic specification of queries and constraints for analysis by a machine learning (“ML”) component. Various implementations provide methodologies for automatically formulating ML and optimization queries. The automatic generation of ML and/or optimization queries can be configured to use examples to facilitate formulation of ML and optimization queries. One example method includes accepting input data specifying variables and data values associated with the variables. Any unspecified data records within the input data are identified, and a relationship between the variables specified in the input data and a variable associated with at least one unspecified data record is automatically determined. The relationship can be automatically determined based on training data contained within the input data. Once a relationship is established, an ML problem can be automatically generated.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: June 14, 2022
    Assignee: DataRobot, Inc.
    Inventor: Michael Schmidt
  • Patent number: 11334795
    Abstract: Systems and methods are described for developing and using neural network models. An example method of training a neural network includes: oscillating a learning rate while performing a preliminary training of a neural network; determining, based on the preliminary training, a number of training epochs to perform for a subsequent training session; and training the neural network using the determined number of training epochs. The systems and methods can be used to build neural network models that efficiently and accurately handle heterogeneous data.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: May 17, 2022
    Assignee: DataRobot, Inc.
    Inventors: Zachary Albert Mayer, Jason McGhee, Jesse Bannon, Joshua Matthew Weiner
  • Publication number: 20220076166
    Abstract: Described herein are systems and methods for providing data sets from a constantly changing database to a streaming machine learning component. In one embodiment, a data streaming sub-system receives multiple incoming streams of data sets, in which each stream is generated in real time by one of multiple data sources. The streaming sub-system sends data sets, on the fly as they are received, to storage in the memory of a database, where each stored data set is linked to its time of arrival or time of storage. The database receives, from a machine learning component, a request to receive data sets according to a particular time or time period. In response to such a request, the database identifies the data sets matching that time or time period and sends them to the machine learning component.
    Type: Application
    Filed: November 15, 2021
    Publication date: March 10, 2022
    Applicant: DataRobot, Inc.
    Inventors: Swaminathan Sundararaman, Nisha Darshi Talagala, Gal Zuckerman
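A minimal in-memory sketch of storage linked to arrival time, assuming integer timestamps and half-open `[start, end)` queries; the `TimeIndexedStore` class is hypothetical and ignores persistence and concurrency.

```python
import bisect


class TimeIndexedStore:
    """Stores records from many streams, ordered by arrival time."""

    def __init__(self):
        self._times = []
        self._records = []

    def ingest(self, arrival_time, source, record):
        # Insert after any records with the same timestamp to keep arrival order stable.
        i = bisect.bisect_right(self._times, arrival_time)
        self._times.insert(i, arrival_time)
        self._records.insert(i, (source, record))

    def query(self, start, end):
        """Return (time, source, record) for every record with start <= time < end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return [(self._times[i], *self._records[i]) for i in range(lo, hi)]
```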
  • Publication number: 20220076164
    Abstract: Training computer models by generating time-aware training datasets is provided. A system receives a secondary dataset to be combined with a primary dataset for generation of a training dataset. The primary dataset includes a plurality of data records, at least one of which corresponds to a time-of-prediction value: the timestamp at which that data record was used to generate a prediction. The secondary dataset includes a plurality of features, at least one of which corresponds to a timestamp value. The system selects a feature within the secondary dataset with a timestamp that precedes or matches a time-of-prediction value for a corresponding data record within the primary dataset. The system generates the training dataset that includes the primary dataset and the selected feature. The system trains a model using the generated training dataset.
    Type: Application
    Filed: September 8, 2021
    Publication date: March 10, 2022
    Applicant: DataRobot, Inc.
    Inventors: Xavier Conort, Hon Nian Chua, Yung Siang Liau, Harry Dinh
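The selection rule ("a timestamp that precedes or matches a time-of-prediction value") behaves like an as-of join. A pure-Python sketch follows, assuming records are dicts joined on a hypothetical `key` field; the function name is also hypothetical, and model training on the joined rows is omitted.

```python
import bisect
from collections import defaultdict


def time_aware_join(primary, secondary, feature):
    """Attach to each primary record the latest secondary `feature` value whose
    timestamp precedes or matches the record's time of prediction (None if none)."""
    by_key = defaultdict(list)
    for row in secondary:
        by_key[row["key"]].append((row["timestamp"], row[feature]))
    for rows in by_key.values():
        rows.sort(key=lambda r: r[0])
    joined = []
    for rec in primary:
        rows = by_key.get(rec["key"], [])
        times = [t for t, _ in rows]
        # bisect_right keeps timestamps equal to prediction_time ("matches").
        i = bisect.bisect_right(times, rec["prediction_time"])
        joined.append({**rec, feature: rows[i - 1][1] if i > 0 else None})
    return joined
```

Excluding secondary values stamped after the prediction time is what prevents future information from leaking into the training rows.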
  • Patent number: 11250449
    Abstract: An apparatus has a memory with processor-executable instructions and a processor operatively coupled to the memory. The apparatus receives datasets including time series data points that are descriptive of a feature of a given entity. The processor determines a time series characteristic based on the data content, and selects, based on the determined characteristic, a set of entrant forecasting models from a pool of forecasting models stored in the memory. Next, the processor trains each entrant forecasting model with the time series data points to produce a set of trained entrant forecasting models. The processor executes each trained entrant forecasting model to generate a set of forecasted values indicating estimations of the feature of the given entity. Thereafter the processor selects at least one forecasting model from the set of trained entrant forecasting models based on computed accuracy evaluations performed over the set of forecasted values.
    Type: Grant
    Filed: July 9, 2019
    Date of Patent: February 15, 2022
    Assignee: DataRobot, Inc.
    Inventors: John Bledsoe, Jeff Gabriel, Jason Montgomery, Ryan Sevey, Matt Steinpreis, Craig Vermeer, Ryan West
  • Patent number: 11176483
    Abstract: Described herein are systems and methods for providing data sets from a constantly changing database to a streaming machine learning component. In one embodiment, a data streaming sub-system receives multiple incoming streams of data sets, in which each stream is generated in real time by one of multiple data sources. The streaming sub-system sends data sets, on the fly as they are received, to storage in the memory of a database, where each stored data set is linked to its time of arrival or time of storage. The database receives, from a machine learning component, a request to receive data sets according to a particular time or time period. In response to such a request, the database identifies the data sets matching that time or time period and sends them to the machine learning component.
    Type: Grant
    Filed: May 3, 2017
    Date of Patent: November 16, 2021
    Assignee: DataRobot, Inc.
    Inventors: Swaminathan Sundararaman, Nisha Darshi Talagala, Gal Zuckerman
  • Patent number: 10984367
    Abstract: Systems and techniques for predictive data analytics are described. In a method for selecting a predictive model for a prediction problem, the suitabilities of predictive modeling procedures for the prediction problem may be determined based on characteristics of the prediction problem and/or on attributes of the respective modeling procedures. A subset of the predictive modeling procedures may be selected based on the determined suitabilities of the selected modeling procedures for the prediction problem. A resource allocation schedule allocating computational resources for execution of the selected modeling procedures may be generated, based on the determined suitabilities of the selected modeling procedures for the prediction problem. Results of the execution of the selected modeling procedures in accordance with the resource allocation schedule may be obtained. A predictive model for the prediction problem may be selected based on those results.
    Type: Grant
    Filed: May 5, 2017
    Date of Patent: April 20, 2021
    Assignee: DataRobot, Inc.
    Inventors: Jeremy Achin, Thomas DeGodoy, Timothy Owen, Xavier Conort
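A sketch of a suitability-proportional resource allocation schedule, assuming suitabilities are positive numbers, resources are whole worker slots, and the final model is the one with the lowest validation error. The allocation rule (largest-remainder rounding) and the function names are hypothetical.

```python
def allocate_workers(suitabilities, total_workers, k):
    """Pick the k most suitable procedures and split workers in proportion to suitability,
    using largest-remainder rounding so the allocation sums to total_workers."""
    chosen = sorted(suitabilities, key=lambda m: (-suitabilities[m], m))[:k]
    total = sum(suitabilities[m] for m in chosen)
    raw = {m: total_workers * suitabilities[m] / total for m in chosen}
    alloc = {m: int(raw[m]) for m in chosen}
    leftover = total_workers - sum(alloc.values())
    # Hand the remaining slots to the procedures with the largest fractional parts.
    for m in sorted(chosen, key=lambda m: (-(raw[m] - alloc[m]), m))[:leftover]:
        alloc[m] += 1
    return alloc


def select_model(results):
    """Choose the procedure whose executed model had the lowest validation error."""
    return min(results, key=results.get)
```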