Patents by Inventor Horst Cornelius Samulowitz

Horst Cornelius Samulowitz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240144084
    Abstract: A method of data augmentation includes receiving, by a processor, a set of data including a plurality of variables, mapping each variable to one or more target concepts associated with a name of each variable, and acquiring a set of semantic transforms, each semantic transform including a function applied to one or more concepts mapped to a respective variable. The method also includes comparing the one or more target concepts to the one or more concepts of each semantic transform, selecting at least one semantic transform based on the comparing, generating an expression for each selected semantic transform, each expression configured to apply a function of a selected semantic transform to at least one of the plurality of variables, and augmenting the set of data for use in an application by adding each expression to the set of data.
    Type: Application
    Filed: November 2, 2022
    Publication date: May 2, 2024
    Inventors: Horst Cornelius Samulowitz, Udayan Khurana, Kavitha Srinivas, Takaaki Tateishi, Ibrahim Abdelaziz, Julian Timothy Dolby
  • Publication number: 20240144026
    Abstract: A computer-implemented method, according to one approach, includes issuing a hyperparameter optimization (HPO) query to a plurality of computing devices. HPO results are received from the plurality of computing devices, and the HPO results include a set of hyperparameter (HP)/rank value pairs. The method further includes computing, based on the set of HP/rank value pairs, a global set of HPs from the HPO results for federated learning (FL) training. An indication of the global set of HPs is output to the plurality of computing devices. A computer program product, according to another approach, includes a computer readable storage medium having program instructions embodied therewith. The program instructions are readable and/or executable by a computer to cause the computer to perform the foregoing method.
    Type: Application
    Filed: February 28, 2023
    Publication date: May 2, 2024
    Inventors: Yi Zhou, Parikshit Ram, Theodoros Salonidis, Nathalie Baracaldo Angel, Horst Cornelius Samulowitz, Heiko H. Ludwig
  • Patent number: 11966340
    Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: April 23, 2024
    Assignee: International Business Machines Corporation
    Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos
  • Patent number: 11954424
    Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.
    Type: Grant
    Filed: May 2, 2022
    Date of Patent: April 9, 2024
    Assignee: International Business Machines Corporation
    Inventors: Horst Cornelius Samulowitz, Kavitha Srinivas
  • Patent number: 11941541
    Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
  • Publication number: 20240069873
    Abstract: Techniques for computer software code analysis are disclosed. One or more data flows are generated, based on analyzing software code using static analysis. A data object is identified in the software code using the one or more data flows, the data object relating to a structured dataset. A correspondence between a code expression in the software code and a characteristic of the structured dataset is identified, based on analyzing one or more reads from and one or more writes to the data object using the one or more data flows. The code expression for the structured dataset is analyzed, based on the correspondence, including at least one of: (i) generating a software code recommendation engine based on the code expression and the structured dataset, or (ii) generating one or more lambda expressions for application to the structured dataset, based on the code expression.
    Type: Application
    Filed: August 25, 2022
    Publication date: February 29, 2024
    Inventors: Julian Timothy Dolby, Horst Cornelius Samulowitz, Kavitha Srinivas
  • Patent number: 11868230
    Abstract: Computer hardware and/or software that performs the following operations: (i) assessing a performance of a plurality of unsupervised machine learning pipelines against a plurality of data sets; (ii) associating the performance with meta-features corresponding to respective pipeline/data set combinations; (iii) training a supervised meta-learning model using the associated performance and meta-features as training data; and (iv) utilizing the trained model to identify one or more pipelines for processing an input data set.
    Type: Grant
    Filed: March 11, 2022
    Date of Patent: January 9, 2024
    Assignee: International Business Machines Corporation
    Inventors: Saket K. Sathe, Long Vu, Peter Daniel Kirchner, Horst Cornelius Samulowitz
  • Patent number: 11861469
    Abstract: An embodiment of the invention may include a method, computer program product, and system for creating a data analysis tool. The method may include a computing device that generates an AI pipeline based on an input dataset, wherein the AI pipeline is generated using an Automated Machine Learning program. The method may include converting the AI pipeline to a non-native format of the Automated Machine Learning program. This may enable the AI pipeline to be used outside of the Automated Machine Learning program, thereby increasing the usefulness of the created program by not tying it to the Automated Machine Learning program. Additionally, this may increase the efficiency of running the AI pipeline by eliminating unnecessary computations performed by the Automated Machine Learning program.
    Type: Grant
    Filed: July 2, 2020
    Date of Patent: January 2, 2024
    Assignee: International Business Machines Corporation
    Inventors: Peter Daniel Kirchner, Gregory Bramble, Horst Cornelius Samulowitz, Dakuo Wang, Arunima Chaudhary, Gregory Filla
  • Patent number: 11816127
    Abstract: A quality determination method, system, and computer program product that includes performing a dimensionality reduction on a high-dimensional dataset to form a dimensional-reduced dataset and determining, using a machine learning tool executed on a computing device, a quality of the dimensional-reduced dataset via a review of an extracted feature extracted from the dimensional-reduced dataset.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: November 14, 2023
    Assignee: International Business Machines Corporation
    Inventors: Petr Novotny, Aindrila Basak, Shaikh Shahriar Quader, Horst Cornelius Samulowitz, Chad Marston
  • Publication number: 20230351101
    Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.
    Type: Application
    Filed: May 2, 2022
    Publication date: November 2, 2023
    Inventors: Horst Cornelius Samulowitz, Kavitha Srinivas
  • Publication number: 20230289277
    Abstract: Computer hardware and/or software that performs the following operations: (i) assessing a performance of a plurality of unsupervised machine learning pipelines against a plurality of data sets; (ii) associating the performance with meta-features corresponding to respective pipeline/data set combinations; (iii) training a supervised meta-learning model using the associated performance and meta-features as training data; and (iv) utilizing the trained model to identify one or more pipelines for processing an input data set.
    Type: Application
    Filed: March 11, 2022
    Publication date: September 14, 2023
    Inventors: Saket K. Sathe, Long Vu, Peter Daniel Kirchner, Horst Cornelius Samulowitz
  • Patent number: 11681931
    Abstract: A system that provides a mathematical formulation for a new problem of model validation and model selection in the presence of test data feedback. The system comprises a memory that stores computer-executable components. A processor, operably coupled to the memory, executes the computer-executable components stored in the memory. A selection component selects a metric of performance evaluation accuracy; and a configuration component configures performance evaluation schemes for machine learning algorithms. A characterization component employs a supervised learning-based approach to characterize the relationship between the configuration of the performance evaluation scheme and the fidelity of performance estimates; and an optimization component optimizes the accuracy of the machine learning algorithms as a function of the size of the training data set relative to the size of the validation data set through selection of values associated with the configuration parameters.
    Type: Grant
    Filed: September 24, 2019
    Date of Patent: June 20, 2023
    Assignee: International Business Machines Corporation
    Inventors: Bo Zhang, Gregory Bramble, Parikshit Ram, Horst Cornelius Samulowitz
  • Publication number: 20230186168
    Abstract: A computer-implemented method according to one embodiment includes issuing a hyperparameter optimization (HPO) query to a plurality of computing devices; receiving HPO results from each of the plurality of computing devices; generating a unified performance metric surface utilizing the HPO results from each of the plurality of computing devices; and determining optimal global hyperparameters, utilizing the unified performance metric surface.
    Type: Application
    Filed: December 9, 2021
    Publication date: June 15, 2023
    Inventors: Yi Zhou, Parikshit Ram, Nathalie Baracaldo Angel, Theodoros Salonidis, Horst Cornelius Samulowitz, Martin Wistuba, Heiko H. Ludwig
  • Publication number: 20230177032
    Abstract: A computer-implemented method according to one embodiment includes identifying a data set and meta information; and augmenting the data set with additional features in response to an automatic analysis of the data set in view of the meta information.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Daniel Karl I. Weidele, Lisa Amini, Udayan Khurana, Kavitha Srinivas, Horst Cornelius Samulowitz, Takaaki Tateishi, Carolina Maria Spina, Dakuo Wang, Abel Valente, Arunima Chaudhary, Toshihiro Takahashi
  • Patent number: 11663251
    Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: May 30, 2023
    Assignee: International Business Machines Corporation
    Inventors: William Karol Lynch, Kavitha Srinivas, Horst Cornelius Samulowitz, Fabio Lorenzi
  • Patent number: 11620582
    Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: April 4, 2023
    Assignee: International Business Machines Corporation
    Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
  • Publication number: 20230076089
    Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.
    Type: Application
    Filed: September 8, 2021
    Publication date: March 9, 2023
    Inventors: William Karol Lynch, Kavitha Srinivas, Horst Cornelius Samulowitz, Fabio Lorenzi
  • Patent number: 11599826
    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
    Type: Grant
    Filed: January 13, 2020
    Date of Patent: March 7, 2023
    Assignee: International Business Machines Corporation
    Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
  • Publication number: 20220366269
    Abstract: A dataset including features and values associated with the features can be received. Each of the features in the dataset can be mapped to a corresponding node in a knowledge graph based on the concept represented by the corresponding node. The knowledge graph can be traversed to find a candidate node connected to at least one mapped node, the candidate node not being mapped to a feature in the dataset. A concept associated with the candidate node can be identified as a new feature. A machine learning model pipeline can use the features in the dataset and the new feature to select a subset of features for training a machine learning model.
    Type: Application
    Filed: May 11, 2021
    Publication date: November 17, 2022
    Inventors: Dakuo Wang, Udayan Khurana, Daniel Karl I. Weidele, Arunima Chaudhary, Carolina Maria Spina, Abel Valente, Chuang Gan, Horst Cornelius Samulowitz, Lisa Amini
  • Publication number: 20220327058
    Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.
    Type: Application
    Filed: March 15, 2022
    Publication date: October 13, 2022
    Applicant: International Business Machines Corporation
    Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos