Patents by Inventor Shaikh Shahriar Quader

Shaikh Shahriar Quader has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

NATURAL LANGUAGE QUERY PROCESSING BASED ON MACHINE LEARNING TO PERFORM A TASK

Publication number: 20240112074

Abstract: An embodiment of the present invention extracts information from a natural language query requesting performance of a task. A machine learning model determines a task that corresponds to the task requested by the natural language query based on the extracted information. A query is generated for retrieving data from a plurality of different data sources based on the extracted information. The data for the determined task is retrieved from the plurality of different data sources based on the generated query. The determined task is performed using the retrieved data. Present invention embodiments include a method, system, and computer program product for processing a natural language query in substantially the same manner described above.

Type: Application

Filed: September 30, 2022

Publication date: April 4, 2024

Inventors: Bryson Chisholm, Shikhar Kwatra, Shaikh Shahriar Quader, Ayesha Bhangu, Jack Zhang, Shabana Dhayananth, Tarandeep Kaur Randhawa

SELECTING A HIGH COVERAGE DATASET

Publication number: 20240070522

Abstract: Providing a representative dataset from an initial dataset by accessing a dataset associated with a machine learning model, receiving input parameters associated with the representative dataset selection, the input parameters including an evaluation metric, determining a density of a plurality of datapoints associated with the dataset, training a first iteration of a machine learning model using a first data point selected according to the density, determining a first value of the evaluation metric for the first iteration of the machine learning model, generating a representative subset based on the first value of the evaluation metric, and providing the representative dataset and a final machine learning model trained using the representative dataset.

Type: Application

Filed: August 23, 2022

Publication date: February 29, 2024

Inventors: Shaikh Shahriar Quader, Aindrila Basak, Adrian Mahjour, Petr Novotny, Carlo Appugliese, Berthold Reinwald, Dheeraj Arremsetty

Data-analysis-based, noisy labeled and unlabeled datapoint detection and rectification for machine-learning

Patent number: 11853908

Abstract: Noisy labeled and unlabeled datapoint detection and rectification in a training dataset for machine-learning is facilitated by a processor(s) obtaining a training dataset for use in training a machine-learning model. The processor(s) applies ensemble machine-learning and a generative model to the training dataset to detect noisy labeled datapoints in the training dataset, and create a clean dataset with preliminary labels added for any unlabeled datapoints in the training dataset. Data-driven active learning and the clean dataset are used by the processor(s) to facilitate generating an active-learned dataset with true labels added for one or more selected datapoints of a datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset. The machine-learning model is trained by the processor(s) using, at least in part, the clean dataset and the active-learned dataset.

Type: Grant

Filed: May 13, 2020

Date of Patent: December 26, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shaikh Shahriar Quader, Mona Nashaat Ali Elmowafy, Darrell Christopher Reimer

Quality assessment of extracted features from high-dimensional machine learning datasets

Patent number: 11816127

Abstract: A quality determination method, system, and computer program product that includes performing a dimensionality reduction on a high-dimensional dataset to form a dimensional-reduced dataset and determining, using a machine learning tool executed on a computing device, a quality of the dimensional-reduced dataset via a review of an extracted feature extracted from the dimensional-reduced dataset.

Type: Grant

Filed: February 26, 2021

Date of Patent: November 14, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Petr Novotny, Aindrila Basak, Shaikh Shahriar Quader, Horst Cornelius Samulowitz, Chad Marston

ERRONEOUS CELL DETECTION USING AN ARTIFICIAL INTELLIGENCE MODEL

Publication number: 20230153566

Abstract: Classification of cell data includes obtaining a target dataset and an artificial intelligence (AI) model trained to identify relationship(s) between cells of a row and classify whether a focus cell of the row is erroneous based on the identified relationship(s), and applying the AI model to the target dataset to identify erroneous cell(s) thereof. The applying includes selecting a row of cells of the target dataset, inputting the selected row of cells to the AI model with an identification of a focus cell, the focus cell to be classified by the AI model, classifying the focus cell to obtain a classification of the focus cell, the classifying identifying whether the focus cell is erroneous, and outputting an indication of the classification of the focus cell.

Type: Application

Filed: November 18, 2021

Publication date: May 18, 2023

Inventors: Shaikh Shahriar Quader, Omar Al-Shamali, James Miller, Yannick Saillet, Albert Maier, Remus Lazar

CLASSIFICATION OF ERRONEOUS CELL DATA

Publication number: 20230078134

Abstract: Classification of erroneous cell data includes using at least one transformation function, the at least one transformation function determined based on correlations of observed cell data to correct cell data, to automatically generate training examples that correlate erroneous data values to correct data values as informed by the at least one transformation function; augmenting an initial training set of labeled training examples with the generated training examples to produce an augmented training set; and training a machine learning model using the augmented training set to classify observed cell data based on a comparison between the observed cell data and data that the machine learning model predicts.

Type: Application

Filed: November 7, 2022

Publication date: March 16, 2023

Inventors: Shaikh Shahriar Quader, Piotr Mierzejewski, Mona Nashaat Ali Elmowafy

Classification of erroneous cell data

Patent number: 11574250

Abstract: Classification of erroneous cell data includes performing unsupervised pre-training of a machine learning model to learn a bidirectional encoder representation of data cells, obtaining an initial training set, with labeled training examples that correlate observed cell data to correct cell data, for training the machine learning model to classify cell data, automatically augmenting the initial training set to produce an augmented training set, where the augmenting includes identifying patterns in the labeled training examples, generating transformation functions, and using the transformation functions, learning an augmentation strategy and automatically generating additional training examples correlating erroneous data values to correct data values, and training the machine learning model using the augmented training set.

Type: Grant

Filed: August 12, 2020

Date of Patent: February 7, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shaikh Shahriar Quader, Piotr Mierzejewski, Mona Nashaat Ali Elmowafy

Learning-based workload resource optimization for database management systems

Patent number: 11500830

Abstract: A DBMS training subsystem trains a DBMS workload-manager model with training data identifying resources used to execute previous DBMS data-access requests. The subsystem integrates each request's high-level features and compile-time operations into a vector and clusters similar vectors into templates. The requests are divided into workloads each represented by a training histogram that describes the distribution of templates associated with the workload and identifies the total amounts and types of resources consumed when executing the entire workload.
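The histogram step in this abstract can be illustrated with a minimal Python sketch. It assumes the request vectors and the template centroids already exist, and it uses nearest-centroid assignment as a stand-in for whatever clustering the patent actually uses. All names and numbers (`assign_template`, `workload_histogram`, the centroids, the resource totals) are hypothetical and are not taken from the patent.

```python
from collections import Counter

def assign_template(vector, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, centroids[i])),
    )

def workload_histogram(vectors, centroids):
    """Count how many requests in a workload fall into each template."""
    counts = Counter(assign_template(v, centroids) for v in vectors)
    return [counts.get(i, 0) for i in range(len(centroids))]

# Hypothetical templates (cluster centroids) and one workload of three requests.
centroids = [(0.0, 0.0), (10.0, 10.0)]
workload = [(0.5, 1.0), (9.0, 11.0), (1.0, 0.0)]

histogram = workload_histogram(workload, centroids)

# Hypothetical resource totals observed for the whole workload.
resources = {"cpu_ms": 120, "memory_mb": 64}

# One training example pairs the template distribution with the resources used.
training_example = (histogram, resources)
print(training_example)
```

Each workload yields one (histogram, resources) pair, which is the kind of record a workload-manager model could be trained on.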

Type: Grant

Filed: October 15, 2020

Date of Patent: November 15, 2022

Assignee: International Business Machines Corporation

Inventors: Shaikh Shahriar Quader, Nicolas Andres Jaramillo Duran, Sumona Mukhopadhyay, Emmanouil Papangelis, Marin Litoiu, David Kalmuk, Piotr Mierzejewski

QUALITY ASSESSMENT OF EXTRACTED FEATURES FROM HIGH-DIMENSIONAL MACHINE LEARNING DATASETS

Publication number: 20220292107

Abstract: A quality determination method, system, and computer program product that includes performing a dimensionality reduction on a high-dimensional dataset to form a dimensional-reduced dataset and determining, using a machine learning tool executed on a computing device, a quality of the dimensional-reduced dataset via a review of an extracted feature extracted from the dimensional-reduced dataset.

Type: Application

Filed: February 26, 2021

Publication date: September 15, 2022

Inventors: Petr Novotny, Aindrila Basak, Shaikh Shahriar Quader, Horst Cornelius Samulowitz, Chad Marston

MACHINE LEARNING MODEL DEPLOYMENT WITHIN A DATABASE MANAGEMENT SYSTEM

Publication number: 20220237503

Abstract: Model data comprising a model object and model metadata is extracted from a trained model. The model data is integrated within a function executable from within a database system environment. The integrated function is deployed within the database system environment, the deploying activating the trained model for execution within the database system environment.
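A rough Python sketch of the general idea follows. It uses the standard-library `sqlite3` module as a stand-in database and a toy linear scorer as the "trained model". The JSON layout, the function name `toy_predict`, and the table `points` are all assumptions for illustration; the patent does not specify any of them.

```python
import json
import sqlite3

# Hypothetical "trained model": a linear scorer plus descriptive metadata.
model_object = {"weights": [0.5, -0.25], "bias": 1.0}
model_metadata = {"name": "toy_linear", "n_features": 2}
model_data = json.dumps({"object": model_object, "metadata": model_metadata})

def build_scoring_function(serialized):
    """Wrap extracted model data in a plain function the database can call."""
    data = json.loads(serialized)
    weights = data["object"]["weights"]
    bias = data["object"]["bias"]
    n_features = data["metadata"]["n_features"]

    def predict(*features):
        if len(features) != n_features:
            raise ValueError("unexpected number of features")
        return bias + sum(w * x for w, x in zip(weights, features))

    return predict, n_features

conn = sqlite3.connect(":memory:")
predict, n_args = build_scoring_function(model_data)

# "Deploy" the model: register it as a SQL-callable function.
conn.create_function("toy_predict", n_args, predict)

conn.execute("CREATE TABLE points (x1 REAL, x2 REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)", [(2.0, 4.0), (4.0, 0.0)])

# Scoring now runs inside the SQL query rather than in client code.
scores = [
    row[0]
    for row in conn.execute("SELECT toy_predict(x1, x2) FROM points ORDER BY rowid")
]
print(scores)
```

Once registered, the model is invoked from SQL like any other scalar function, so rows do not need to be exported for scoring.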

Type: Application

Filed: January 26, 2021

Publication date: July 28, 2022

Applicant: International Business Machines Corporation

Inventors: Carlo Appugliese, Dheeraj Arremsetty, Ravikumar Govindan, Rakshith Dasenahalli Lingaraju, Timothy Thomas Bohn, Shaikh Shahriar Quader, Carmen-Gabriela Stefanita, Ingo Schuster

LEARNING-BASED WORKLOAD RESOURCE OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEMS

Publication number: 20220121633

Abstract: A DBMS training subsystem trains a DBMS workload-manager model with training data identifying resources used to execute previous DBMS data-access requests. The subsystem integrates each request's high-level features and compile-time operations into a vector and clusters similar vectors into templates. The requests are divided into workloads each represented by a training histogram that describes the distribution of templates associated with the workload and identifies the total amounts and types of resources consumed when executing the entire workload.

Type: Application

Filed: October 15, 2020

Publication date: April 21, 2022

Inventors: Shaikh Shahriar Quader, Nicolas Andres Jaramillo Duran, Sumona Mukhopadhyay, Emmanouil Papangelis, Marin Litoiu, David Kalmuk, Piotr Mierzejewski

PROCESSING LARGE MACHINE LEARNING DATASETS

Publication number: 20220075761

Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can receive, by a computing device, a request to access a datapoint of a machine learning dataset contained in a database. Embodiments of the present invention can access, by the computing device, a virtual data frame that includes a schema which represents a structure of the machine learning dataset in the database. Embodiments of the present invention can retrieve, by the computing device, the datapoint of the machine learning dataset utilizing the virtual data frame and return, by the computing device, the retrieved datapoint in response to the request.

Type: Application

Filed: September 8, 2020

Publication date: March 10, 2022

Inventors: Petr Novotny, Hong Min, Shaikh Shahriar Quader

CLASSIFICATION OF ERRONEOUS CELL DATA

Publication number: 20220051126

Abstract: Classification of erroneous cell data includes performing unsupervised pre-training of a machine learning model to learn a bidirectional encoder representation of data cells, obtaining an initial training set, with labeled training examples that correlate observed cell data to correct cell data, for training the machine learning model to classify cell data, automatically augmenting the initial training set to produce an augmented training set, where the augmenting includes identifying patterns in the labeled training examples, generating transformation functions, and using the transformation functions, learning an augmentation strategy and automatically generating additional training examples correlating erroneous data values to correct data values, and training the machine learning model using the augmented training set.

Type: Application

Filed: August 12, 2020

Publication date: February 17, 2022

Inventors: Shaikh Shahriar Quader, Piotr Mierzejewski, Mona Nashaat Ali Elmowafy

DATA-ANALYSIS-BASED, NOISY LABELED AND UNLABELED DATAPOINT DETECTION AND RECTIFICATION FOR MACHINE-LEARNING

Publication number: 20210357776

Abstract: Noisy labeled and unlabeled datapoint detection and rectification in a training dataset for machine-learning is facilitated by a processor(s) obtaining a training dataset for use in training a machine-learning model. The processor(s) applies ensemble machine-learning and a generative model to the training dataset to detect noisy labeled datapoints in the training dataset, and create a clean dataset with preliminary labels added for any unlabeled datapoints in the training dataset. Data-driven active learning and the clean dataset are used by the processor(s) to facilitate generating an active-learned dataset with true labels added for one or more selected datapoints of a datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset. The machine-learning model is trained by the processor(s) using, at least in part, the clean dataset and the active-learned dataset.

Type: Application

Filed: May 13, 2020

Publication date: November 18, 2021

Inventors: Shaikh Shahriar Quader, Mona Nashaat Ali Elmowafy, Darrell Christopher Reimer

LABELING DATA USING AUTOMATED WEAK SUPERVISION

Publication number: 20210209412

Abstract: A computer-implemented method includes: receiving, by a computing device, data comprising a labeled dataset and an unlabeled dataset; generating, by the computing device, a set of heuristics using the labeled dataset; generating, by the computing device, a vector of initial labels by labeling each point in the unlabeled dataset using the set of heuristics; generating, by the computing device, a refined set of heuristics using data-driven active learning; generating, by the computing device, a vector of training labels by automatically labeling each point in the unlabeled dataset using the refined set of heuristics; and outputting, by the computing device, the vector of training labels to a client device or a data repository.

Type: Application

Filed: January 2, 2020

Publication date: July 8, 2021

Inventors: Shaikh Shahriar Quader, Jean-François Puget, Mona Nashaat Ali Elmowafy
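
The first two steps of the weak-supervision abstract above (publication 20210209412), generating heuristics from labeled data and using them to produce a vector of initial labels, can be sketched in Python as follows. This is a toy illustration under stated assumptions: the heuristics are simple keyword rules, votes are combined by majority, and the active-learning refinement step is omitted. The function names and sample data are hypothetical and do not come from the patent.

```python
from collections import Counter

ABSTAIN = None  # a heuristic set that does not fire yields no label

def make_keyword_heuristics(labeled, min_count=2):
    """Keep (token, label) rules for tokens seen only with one label, at least min_count times."""
    token_labels = {}
    for text, label in labeled:
        for token in set(text.lower().split()):
            token_labels.setdefault(token, Counter())[label] += 1
    heuristics = []
    for token, counts in sorted(token_labels.items()):
        if len(counts) == 1:
            label, n = next(iter(counts.items()))
            if n >= min_count:
                heuristics.append((token, label))
    return heuristics

def apply_heuristics(heuristics, text):
    """Label one point by majority vote over the heuristics that fire."""
    tokens = set(text.lower().split())
    votes = Counter(label for token, label in heuristics if token in tokens)
    if not votes:
        return ABSTAIN
    return votes.most_common(1)[0][0]

# Hypothetical labeled and unlabeled datasets.
labeled = [
    ("great product works well", "pos"),
    ("great value", "pos"),
    ("broken on arrival", "neg"),
    ("arrived broken and late", "neg"),
]
unlabeled = ["great phone", "screen broken", "no opinion"]

heuristics = make_keyword_heuristics(labeled)
initial_labels = [apply_heuristics(heuristics, t) for t in unlabeled]
print(heuristics)
print(initial_labels)
```

Points that no heuristic covers stay unlabeled (`None`); in the abstract's flow, a later refinement step would be responsible for improving the heuristics before the final training labels are produced.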