Patents Assigned to DataRobot, Inc.
  • Publication number: 20230206610
    Abstract: Disclosed herein are methods and systems for visualizing machine learning model performance. One method comprises receiving a request to provide a visual representation of a machine learning technique executed on a set of images to generate a first attribute and a second attribute for each image; executing the machine learning model to receive the first and the second attribute for each image; mapping the first attribute to a visual distinctiveness protocol; identifying a distance for each image, the distance representing a difference between the second attribute predicted by the model for each pair of respective images within the set of images; and providing for display at least a subset of the set of images arranged in accordance with their respective distance and having a visual attribute corresponding to the mapped first attribute for each image.
    Type: Application
    Filed: December 29, 2022
    Publication date: June 29, 2023
    Applicant: DataRobot, Inc.
    Inventors: Ivan Pyzow, Pavlo Kochubei, Yehor Kolchyba, Sylvain Ferrandiz, Anton Kasyanov
  • Publication number: 20230196101
    Abstract: An automated machine learning (“ML”) method may include training a first machine learning model using a first machine learning algorithm and a training data set; validating the first machine learning model using a validation data set, wherein validating the first machine learning model comprises generating an error data set; training a second machine learning model to predict a suitability of the first machine learning model for analyzing an inference data set, wherein the second machine learning model is trained using a second machine learning algorithm and the error data set; and triggering a remedial action associated with the first or second machine learning model in response to a predicted suitability of the first machine learning model for analyzing the inference data set not satisfying a suitability threshold.
    Type: Application
    Filed: November 16, 2022
    Publication date: June 22, 2023
    Applicant: DataRobot, Inc.
    Inventors: Sindhu Ghanta, Drew Roselli, Nisha Talagala, Vinay Sridhar, Swaminathan Sundararaman, Lior Amar, Lior Khermosh, Bharath Ramsundar, Sriram Subramanian
  • Publication number: 20230186174
    Abstract: Segmenting data and forecasting by a combination of models trained on segmented data is provided. A system compares, with a first model, values of timestamps corresponding to data points to determine a time series dependency between the data points. The system generates, with the first model and based on the time series dependency, a first cluster with first data points and a second cluster with second data points. The system allocates, by a controller, a second model to the first cluster, and a third model to the second cluster. The system trains the second model based on the time series dependency and the first data points. The system trains the third model based on the time series dependency and the second data points. The system generates a fourth model based on a combination of the second trained model and the third trained model.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 15, 2023
    Applicant: DataRobot, Inc.
    Inventors: Matt Nitzken, David McGarry, Roman Midianyi, Anatolli Stehni
  • Publication number: 20230186116
    Abstract: Aspects of this technical solution can identify, by a second machine learning model receiving as input first features, second features having respective impact metrics that satisfy an impact threshold, the impact threshold indicating that the second features modify various forecast data points, cause a graphical user interface to present the forecast including one or more of the first features having respective first visual properties corresponding to identifiers of respective ones of the first features, cause the graphical user interface to present the forecast including the second features having a second visual property corresponding to an indication that the second features satisfy the impact threshold, and cause the graphical user interface to modify the forecast including the second features to include an explanation portion including metrics of the second features, the metrics corresponding to respective time points of a time dependency relationship.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 15, 2023
    Applicant: DataRobot, Inc.
    Inventors: Ina Ko, Borys Kupar, Yulia Bezhula, Kyrylo Kniazev
  • Publication number: 20230186175
    Abstract: Comparing a challenger model with a primary model is provided herein. In an embodiment, a system comprises one or more processors, coupled to memory, configured to determine, based on a comparison of a first model that is deployed as a primary model with a second model that is acting as a challenger model, that the second model performs better than the first model based on at least one performance metric; determine, based on a comparison of a characteristic of the first model with a characteristic of the second model, to skip a validation process for the second model; and establish the second model as the primary model in the deployment to replace the first model in the deployment.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 15, 2023
    Applicant: DataRobot, Inc.
    Inventors: Bohdan Usatov, Chris Li, Evan Chang, Tristan Spauding, Christopher Cozzi
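The decision flow in the abstract above can be sketched as follows. This is a minimal illustration, not the claimed system: `ModelInfo`, `Decision`, `compare_challenger`, and `replace_primary` are hypothetical names, accuracy stands in for "at least one performance metric", and an input schema stands in for the compared "characteristic".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelInfo:
    name: str
    accuracy: float                # assumed performance metric (higher is better)
    input_schema: Tuple[str, ...]  # assumed characteristic compared between models


@dataclass(frozen=True)
class Decision:
    promote: bool          # challenger outperforms the primary on the metric
    skip_validation: bool  # characteristics match, so validation can be skipped


def compare_challenger(primary: ModelInfo, challenger: ModelInfo) -> Decision:
    """Compare the deployed primary model with a challenger model."""
    if challenger.accuracy <= primary.accuracy:
        return Decision(promote=False, skip_validation=False)
    same_characteristic = challenger.input_schema == primary.input_schema
    return Decision(promote=True, skip_validation=same_characteristic)


def replace_primary(deployment: dict, primary: ModelInfo, challenger: ModelInfo) -> dict:
    """Return an updated deployment record when the challenger can be promoted.

    If the challenger is better but its characteristic differs, this sketch
    leaves the deployment unchanged, on the assumption that a validation
    process would have to run first.
    """
    decision = compare_challenger(primary, challenger)
    if decision.promote and decision.skip_validation:
        return {**deployment, "primary": challenger.name}
    return deployment
```

For example, a challenger with higher accuracy and the same input schema as the primary replaces it directly, while one with a different schema does not.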
  • Publication number: 20230091610
    Abstract: This disclosure relates generally to using machine learning models to generate current time-series features using machine learning and validate time-series machine learning model output. At least one aspect is directed to a system with one or more processors, coupled to memory, to segment a time series range into a first segment for an instance of time, the segment associated with a value for a target feature and a timestamp for the value, segment the time series range into an input segment associated with a plurality of input features and a segment timestamp less than or equal to the timestamp, generate a model trained with input comprising values for the target feature and timestamps for the values less than or equal to the segment timestamp, and transform at least one of the input features based at least on the model.
    Type: Application
    Filed: September 12, 2022
    Publication date: March 23, 2023
    Applicant: DataRobot, Inc.
    Inventors: Anastasiia Tamazlykar, Igor Iaroshenkno, Mark Steadman, Jilian Schwiep, Peter Michael Simon, Zachary Deane-Mayer, Brett Rowley, Jing Qiang Goh
  • Publication number: 20230083891
    Abstract: Disclosed herein are methods and systems to generate and revise a workflow that utilizes machine learning model nodes and other analytical nodes to analyze data and generate a decision via allowing a user to interact with input elements of a graphical user interface. The methods and systems use a processor to provide, for rendering by a user device, a graphical user interface comprising at least a first graphical indicator corresponding to a computer model node within workflow code and a second graphical indicator corresponding to a decision node within the workflow code, the computer model node visually connected with the decision node; and in response to receiving, via a user interacting with the graphical user interface, an additional node corresponding to at least one analytical protocol, revise the workflow code, by adding the analytical protocol before an execution of the decision node.
    Type: Application
    Filed: September 12, 2022
    Publication date: March 16, 2023
    Applicant: DataRobot, Inc.
    Inventors: Jeremy Achin, Ina Ko, Stephen James Millet, Daniel Thomas Trost, Igor Veksler
  • Publication number: 20230065870
    Abstract: This disclosure relates generally to artificial intelligence structured to generate models based on multimodal input. At least one aspect is directed to a system. The system can include a data processing system comprising memory and one or more processors to generate, by a first model trained using machine learning with input including one or more first features each associated with data structures having a plurality of distinct data types, one or more second features compatible with one of the distinct data types, generate, by a second model trained with input including the second features, a plurality of cluster classifications each compatible with one or more of the distinct data types, and cause a user interface to present one or more of the data structures rendered according to a spatial structure based on the second features and the cluster classifications.
    Type: Application
    Filed: August 30, 2022
    Publication date: March 2, 2023
    Applicant: DataRobot, Inc.
    Inventors: Ivan Pyzow, David Michael McGarry, Mikhail Yakubovskiy, Ee Kin Chin, Mykyta Yarmak, Yuliia Bezuhla, Zachary Albert Mayer
  • Publication number: 20230067026
    Abstract: Automated data analytics techniques for non-tabular data sets may include methods and systems for (1) automatically developing models that perform tasks in the domains of computer vision, audio processing, speech processing, text processing, or natural language processing; (2) automatically developing models that analyze heterogeneous data sets containing image data and non-image data, and/or heterogeneous data sets containing tabular data and non-tabular data; (3) determining the importance of an image feature with respect to a modeling task; (4) explaining the value of a modeling target based at least in part on an image feature; and (5) detecting drift in image data. In some cases, multi-stage models may be developed, wherein a pre-trained feature extraction model extracts low-, mid-, high-, and/or highest-level features of non-tabular data, and a data analytics model uses those features (or features derived therefrom) to perform a data analytics task.
    Type: Application
    Filed: February 17, 2021
    Publication date: March 2, 2023
    Applicant: DataRobot, Inc.
    Inventors: Yurii Huts, Chin Ee Kin, Anton Kasyanov, Zachary Albert Mayer, Xavier Conort, Hon Nian Chua, Sabari Shanmugam, Atanas Mitkov Atanasov, Ivan Richard Pyzow
  • Publication number: 20230051833
    Abstract: Systems and methods of epidemiological modeling using machine learning are provided, and can include receiving values for an occurrence of an infectious disease during a first time period, generating, from a model trained by a machine learning system, predictions for the occurrence of the infectious disease over a second time period, performing, by a simulator using the predictions, one or more simulations of the occurrence of the infectious disease in one or more geographic regions during one or more time periods subsequent to the second time period, and providing, to a user interface, a first simulation of the one or more simulations performed by the simulator for a first geographic region of the one or more geographic regions during a time period of the one or more time periods.
    Type: Application
    Filed: July 28, 2022
    Publication date: February 16, 2023
    Applicant: DataRobot, Inc.
    Inventors: Jeremy Achin, Michael Schmidt, Mackenzie Heiser, Jona Sassenhagen, Oleg Baranovskiy, Jared Shamwell, Hon Nian Chua, Joao Paulo Gomes, Maxence Jeunesse, Yung Siang Liau, Julian Wergieluk, Jay Cameron Schuren, Mark Steadman, Mohak Saxena, Samuel Clark, Noa Flaherty, Jarred Bultema, Nathan Robert Cameron, Amanda Schierz, Vinay Venkata Wunnava, Xavier Conort, Gregory Michaelson, Anton Suslov, Madeleine Mott, Sergey Yurgenson, Christopher James Monsour, Matthew Joseph Nitzken, Patrick Allen Farrell, Jared Bowns, Dustin Burke, Ievgenii Baliuk, Rishabh Raman
  • Publication number: 20230004486
    Abstract: The system can identify data stored in repositories that indicate changes in the version of the application relative to a prior version of the application tested or deployed before receipt of the request to test the performance of the version of the application. The system can determine, based on the data and using machine learning with historical data associated with applications tested or deployed to test performance of the version, and without execution of the tests, a score for each of a plurality of tests configured to test performance of the version of the application. The system can select, based on the scores, a subset of the tests to execute, and provide an indication of the selected subset of the tests to cause execution of the subset of the tests to evaluate performance of the version of the application prior to deployment of the version of the application.
    Type: Application
    Filed: July 1, 2022
    Publication date: January 5, 2023
    Applicant: DataRobot, Inc.
    Inventors: Borys Drozhak, Ievgenii Baliuk, Dustin Burke
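The score-then-select flow in the abstract above can be illustrated with a small sketch. It is an assumption-laden stand-in: instead of a trained machine learning model, it scores each test by the historical failure rate observed when the same files changed, and the names `fit_failure_rates`, `score_tests`, and `select_tests` are hypothetical. No test is executed to compute the scores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One historical record: the files changed and which tests failed for that change.
History = Iterable[Tuple[List[str], Dict[str, bool]]]


def fit_failure_rates(history: History) -> Dict[Tuple[str, str], float]:
    """Estimate P(test fails | file changed) from past runs (stand-in for a learned model)."""
    runs: Dict[Tuple[str, str], int] = defaultdict(int)
    fails: Dict[Tuple[str, str], int] = defaultdict(int)
    for changed_files, outcomes in history:
        for path in changed_files:
            for test, failed in outcomes.items():
                runs[(path, test)] += 1
                fails[(path, test)] += int(failed)
    return {key: fails[key] / runs[key] for key in runs}


def score_tests(
    rates: Dict[Tuple[str, str], float], changed_files: List[str], tests: List[str]
) -> Dict[str, float]:
    """Score each test from the change data alone, without running any test."""
    return {
        test: max((rates.get((path, test), 0.0) for path in changed_files), default=0.0)
        for test in tests
    }


def select_tests(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring tests; ties are broken by test name."""
    return sorted(scores, key=lambda test: (-scores[test], test))[:k]
```

The selected subset would then be handed to whatever runner executes tests before deployment; that step is outside this sketch.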
  • Publication number: 20230004796
    Abstract: Systems and methods are described for developing and using neural network models. An example method of training a neural network includes: oscillating a learning rate while performing a preliminary training of a neural network; determining, based on the preliminary training, a number of training epochs to perform for a subsequent training session; and training the neural network using the determined number of training epochs. The systems and methods can be used to build neural network models that efficiently and accurately handle heterogeneous data.
    Type: Application
    Filed: May 13, 2022
    Publication date: January 5, 2023
    Applicant: DataRobot, Inc.
    Inventors: Zachary Albert Mayer, Jason McGhee, Jesse Bannon, Joshua Matthew Weiner
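As a rough illustration of the two steps named in the abstract above, the sketch below oscillates a learning rate with a triangular schedule and then derives an epoch count from the preliminary run. Both choices are assumptions: the abstract does not specify the oscillation shape or the rule for choosing the number of epochs, and `triangular_lr` and `epochs_from_preliminary` are hypothetical names.

```python
from typing import Sequence


def triangular_lr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Learning rate that rises linearly from base_lr to max_lr and back again.

    One full oscillation takes 2 * half_cycle steps.
    """
    position = step % (2 * half_cycle)
    if position <= half_cycle:
        fraction = position / half_cycle
    else:
        fraction = 2.0 - position / half_cycle
    return base_lr + (max_lr - base_lr) * fraction


def epochs_from_preliminary(val_losses: Sequence[float]) -> int:
    """Assumed rule: train for as many epochs as it took to reach the lowest
    validation loss during the preliminary (oscillating) training run."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1
```

In use, the preliminary run would call `triangular_lr` at each step to set the optimizer's learning rate and record one validation loss per epoch; the subsequent training session would then run for `epochs_from_preliminary(val_losses)` epochs.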
  • Patent number: 11514369
    Abstract: Systems and methods are described for interpreting machine learning model predictions. An example method includes: providing a machine learning model configured to receive a plurality of features as input and provide a prediction as output, wherein the plurality of features includes an engineered feature including a combination of two or more parent features; calculating a Shapley value for each feature in the plurality of features; and allocating a respective portion of the Shapley value for the engineered feature to each of the two or more parent features.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: November 29, 2022
    Assignee: DataRobot, Inc.
    Inventors: Mark Benjamin Romanowsky, Jared Bowns, Thomas Whitehead, Thomas Stearns, Xavier Conort, Anastasiia Tamazlykar, Mohak Saxena
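The allocation step in the abstract above can be sketched as below. The abstract does not say how the portions are determined, so this example assumes an equal split by default with optional caller-supplied weights; `allocate_to_parents` is a hypothetical name, and the Shapley values are taken as already computed.

```python
from typing import Dict, List, Optional


def allocate_to_parents(
    shap_values: Dict[str, float],
    engineered: str,
    parents: List[str],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Redistribute an engineered feature's Shapley value to its parent features.

    Each parent receives a portion proportional to its weight (equal weights
    by default, which is an assumption of this sketch). The total attribution
    across all features is preserved.
    """
    if weights is None:
        weights = [1.0] * len(parents)
    if len(weights) != len(parents):
        raise ValueError("weights and parents must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Start from every attribution except the engineered feature's own entry.
    result = {name: value for name, value in shap_values.items() if name != engineered}
    for parent, weight in zip(parents, weights):
        portion = shap_values[engineered] * weight / total
        result[parent] = result.get(parent, 0.0) + portion
    return result
```

For instance, if an engineered feature `age_x_income` carries a Shapley value of 0.4, the default split adds 0.2 to each of `age` and `income`, so the reported attributions mention only features the user originally supplied.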
  • Publication number: 20220358528
    Abstract: An apparatus has a memory with processor-executable instructions and a processor operatively coupled to the memory. The apparatus receives datasets including time series data points that are descriptive of a feature of a given entity. The processor determines a time series characteristic based on the data content, and selects, based on the determined characteristic, a set of entrant forecasting models from a pool of forecasting models stored in the memory. Next, the processor trains each entrant forecasting model with the time series data points to produce a set of trained entrant forecasting models. The processor executes each trained entrant forecasting model to generate a set of forecasted values indicating estimations of the feature of the given entity. Thereafter the processor selects at least one forecasting model from the set of trained entrant forecasting models based on computed accuracy evaluations performed over the set of forecasted values.
    Type: Application
    Filed: February 14, 2022
    Publication date: November 10, 2022
    Applicant: DataRobot, Inc.
    Inventors: John Bledsoe, Jeff Gabriel, Jason Montgomery, Ryan Sevey, Matt Steinpreis, Craig Vermeer, Ryan West
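The select, train, forecast, and pick-the-best loop in the abstract above can be sketched with toy forecasters. Everything here is an illustrative assumption: the three baseline models stand in for the pool of forecasting models, series length relative to a seasonal period stands in for the "time series characteristic", and mean absolute error on a holdout stands in for the accuracy evaluation.

```python
from typing import Callable, Dict, List

Forecaster = Callable[[List[float], int], List[float]]


def naive(train: List[float], horizon: int) -> List[float]:
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train: List[float], horizon: int) -> List[float]:
    """Repeat the mean of the training values."""
    return [sum(train) / len(train)] * horizon


def make_seasonal_naive(period: int) -> Forecaster:
    """Repeat the last observed seasonal cycle of the given period."""
    def forecast(train: List[float], horizon: int) -> List[float]:
        return [train[-period + (i % period)] for i in range(horizon)]
    return forecast


def select_entrants(train: List[float], period: int) -> Dict[str, Forecaster]:
    """Choose entrant models from the pool based on a series characteristic."""
    entrants: Dict[str, Forecaster] = {"naive": naive, "mean": mean_forecast}
    if len(train) >= 2 * period:  # enough history to consider seasonality
        entrants["seasonal_naive"] = make_seasonal_naive(period)
    return entrants


def mae(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_best(series: List[float], holdout: int, period: int) -> str:
    """Forecast the holdout with every entrant and return the most accurate one."""
    train, test = series[:-holdout], series[-holdout:]
    errors = {
        name: mae(test, model(train, holdout))
        for name, model in select_entrants(train, period).items()
    }
    return min(errors, key=lambda name: errors[name])
```

On a strictly repeating series with period 3, the seasonal baseline reproduces the holdout exactly and is selected; on a series too short to show two full cycles, it is never entered.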
  • Publication number: 20220335030
    Abstract: Cache optimization for data preparation includes: generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data, wherein the data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result; in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result; assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result; and outputting the result.
    Type: Application
    Filed: July 1, 2022
    Publication date: October 20, 2022
    Applicant: DataRobot, Inc.
    Inventors: Dave Brewster, Victor Tso
  • Patent number: 11461304
    Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
    Type: Grant
    Filed: March 10, 2020
    Date of Patent: October 4, 2022
    Assignee: DataRobot, Inc.
    Inventors: Dave Brewster, Victor Tze-Yeuan Tso
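A signature-keyed cache of the kind described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the signature is a SHA-256 hash of a dataset identifier plus the JSON-serialized operation sequence, the cache reuses the longest previously seen prefix of operations, and `PrepCache`, `operation_signature`, and the caller-supplied `apply_op` callback are hypothetical names.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

Op = Dict[str, Any]


def operation_signature(dataset_id: str, ops: List[Op]) -> str:
    """Derive a signature from the data set and the ordered operations applied to it."""
    payload = json.dumps({"dataset": dataset_id, "ops": ops}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PrepCache:
    """Cache transformation results keyed by operation signature."""

    def __init__(self, apply_op: Callable[[Any, Op], Any]) -> None:
        self._apply_op = apply_op          # must return a new value, not mutate its input
        self._results: Dict[str, Any] = {}
        self.ops_executed = 0              # counter kept only to show cache hits

    def run(self, dataset_id: str, data: Any, ops: List[Op]) -> Any:
        # Look for the longest prefix of the requested sequence that is already cached.
        start, result = 0, data
        for n in range(len(ops), 0, -1):
            sig = operation_signature(dataset_id, ops[:n])
            if sig in self._results:
                start, result = n, self._results[sig]
                break
        # Execute only the remaining operations, caching each intermediate result.
        for i in range(start, len(ops)):
            result = self._apply_op(result, ops[i])
            self.ops_executed += 1
            self._results[operation_signature(dataset_id, ops[: i + 1])] = result
        return result
```

Because the signature depends on operation order, a second specification that repeats an earlier sequence returns the cached result without re-executing anything, and one that only appends steps executes just the new ones.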
  • Publication number: 20220284183
    Abstract: A step editor for data preparation can instruct a user interface to present a first plurality of operations to be applied in a sequential order to one or more sets of data, receive user inputs including at least one indication to mute at least one operation of the first plurality of operations to prevent the processors from performing the at least one operation, generate a second plurality of operations, the second plurality of operations to be applied in a sequential order to the sets of data and comprising the first plurality of operations excluding the operation muted by the user inputs, obtain a cached data traversal program associated with the second plurality of operations and comprising a representation of a result of transforming the sets of data, and instruct the user interface to present output based at least in part on execution of the cached data traversal program.
    Type: Application
    Filed: March 25, 2022
    Publication date: September 8, 2022
    Applicant: DataRobot, Inc.
    Inventors: Nenshad Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
  • Publication number: 20220237516
    Abstract: Data modeling systems and methods are described. A data modeling method may include receiving user input specifying a structure of at least a portion of a data model and a complexity value associated with the structure; (a) generating one or more data models; (b) determining complexity scores for the respective data models; (c) for each of the data models: determining whether to select the respective data model for evaluation based, at least in part, on the complexity score of the respective data model, and if the respective data model is selected for evaluation, evaluating an accuracy of the respective data model for one or more data sets; and repeating steps (a)-(c) until one or more specified termination criteria are satisfied, wherein a first of the generated data models includes the specified structure, and wherein the complexity score for the first data model is determined based, at least in part, on the complexity value associated with the structure.
    Type: Application
    Filed: April 6, 2022
    Publication date: July 28, 2022
    Applicant: DataRobot, Inc.
    Inventors: Michael Schmidt, Dylan Sherry, Hongmin Fan
  • Patent number: 11386075
    Abstract: Methods for detection of anomalous data samples from a plurality of data samples are provided. In some embodiments, an anomaly detection procedure that includes a plurality of tasks is executed to identify the anomalous data samples from the plurality of data samples.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: July 12, 2022
    Assignee: DataRobot, Inc.
    Inventors: Amanda Claire Schierz, Jeremy Achin, Zachary Albert Mayer
  • Publication number: 20220199266
    Abstract: Systems and methods of epidemiological modeling using machine learning are provided, and can include receiving values for an occurrence of an infectious disease during a first time period, generating, from a model trained by a machine learning system, predictions for the occurrence of the infectious disease over a second time period, performing, by a simulator using the predictions, one or more simulations of the occurrence of the infectious disease in one or more geographic regions during one or more time periods subsequent to the second time period, and providing, to a user interface, a first simulation of the one or more simulations performed by the simulator for a first geographic region of the one or more geographic regions during a time period of the one or more time periods.
    Type: Application
    Filed: December 9, 2021
    Publication date: June 23, 2022
    Applicant: DataRobot, Inc.
    Inventors: Jeremy Achin, Earl Jared Shamwell, Michael Schmidt, Mackenzie Heiser, Patrick Farrell, Matt Nitzken, Jared Bowns, Nathan Cameron, Adam Beairsto, Jay Schuren, Mohak Saxena