Patents by Inventor Samiulla Zakir Hussain Shaikh

Samiulla Zakir Hussain Shaikh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Generation of causal explanations for text models

Patent number: 12099805

Abstract: One embodiment provides a method, comprising: receiving an input sentence for a classification by a machine-learning model, where the classification is based upon a sentiment of the input sentence; splitting the input sentence into a plurality of tokens, each of the plurality of tokens corresponding to a term within the input sentence; creating a causal subgraph from the plurality of tokens, wherein the creating is based upon a causal relationship identified between tokens of the plurality of tokens; identifying, using the causal subgraph, tokens of the plurality of tokens influencing the classification; and generating, based upon the tokens of the plurality of tokens, a causal explanation for the classification, wherein the causal explanation identifies at least one portion of the input sentence resulting in the classification.

Type: Grant

Filed: November 19, 2021

Date of Patent: September 24, 2024

Assignee: International Business Machines Corporation

Inventors: Naveen Panwar, Deepak Vijaykeerthy, Nishtha Madaan, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha

Scalable identification of duplicate datasets in heterogeneous datasets

Patent number: 11886385

Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include receiving a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include, based on the calculated overall distance metric, automatically generating distance graphs that identify clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.

Type: Grant

Filed: June 2, 2022

Date of Patent: January 30, 2024

Assignee: International Business Machines Corporation

Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
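
Illustration (not part of the patent record): the abstract above combines name-based similarity and data distribution similarity into an overall distance and then groups similar datasets. The following is a minimal sketch of that general idea, not the patented method. Everything in it is an assumption made for illustration: Jaccard overlap of column names, a mean/standard-deviation comparison as the distribution score, equal weighting, a fixed threshold, and all function names. Lineage inference is omitted.

```python
from itertools import combinations
from statistics import mean, pstdev


def name_similarity(cols_a, cols_b):
    """Jaccard overlap of lower-cased column names (assumed name-based score)."""
    a = {c.lower() for c in cols_a}
    b = {c.lower() for c in cols_b}
    return len(a & b) / len(a | b) if a | b else 1.0


def distribution_similarity(data_a, data_b):
    """Closeness of (mean, std) over columns present in both datasets."""
    shared = set(data_a) & set(data_b)
    if not shared:
        return 0.0
    scores = []
    for col in shared:
        ma, mb = mean(data_a[col]), mean(data_b[col])
        sa, sb = pstdev(data_a[col]), pstdev(data_b[col])
        gap = abs(ma - mb) + abs(sa - sb)
        scale = abs(ma) + abs(mb) + sa + sb
        scores.append(1.0 - gap / scale if scale else 1.0)
    return mean(scores)


def overall_distance(data_a, data_b, name_weight=0.5):
    """Weighted combination of both similarity scores, turned into a distance."""
    name_sim = name_similarity(data_a.keys(), data_b.keys())
    dist_sim = distribution_similarity(data_a, data_b)
    return 1.0 - (name_weight * name_sim + (1 - name_weight) * dist_sim)


def cluster_duplicates(datasets, threshold=0.2):
    """Group datasets linked by a pairwise distance below the threshold."""
    parent = {name: name for name in datasets}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(datasets, 2):
        if overall_distance(datasets[a], datasets[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in datasets:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy datasets: column name -> numeric values.
datasets = {
    "sales_2021": {"id": [1, 2, 3, 4], "amount": [10.0, 20.0, 30.0, 40.0]},
    "sales_copy": {"id": [1, 2, 3, 4], "amount": [10.0, 20.0, 30.0, 41.0]},
    "weather": {"station": [7, 8, 9], "temp_c": [12.5, 14.0, 9.5]},
}
print(cluster_duplicates(datasets))
```

In this toy input the two sales tables share every column name and have nearly identical statistics, so they fall into one cluster, while the weather table stays on its own.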

SCALABLE IDENTIFICATION OF DUPLICATE DATASETS IN HETEROGENEOUS DATASETS

Publication number: 20230394011

Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include receiving a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include, based on the calculated overall distance metric, automatically generating distance graphs that identify clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.

Type: Application

Filed: June 2, 2022

Publication date: December 7, 2023

Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha

TESTING MODELS IN DATA PIPELINE

Publication number: 20230068513

Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, generate a data profile for a model that includes metadata for data requirements, model specific requirements, and data quality metrics. Embodiments of the present invention can generate one or more perturbations for training data associated with the received information and validate at least one perturbation of the one or more perturbations of training data as relevant test data based, at least in part, on context associated with the model. Embodiments of the present invention can then generate one or more test scenarios based on the at least one validated perturbation and varying hyperparameters of the model and generate a test report based on an execution of at least one generated test scenario of the generated one or more test scenarios.

Type: Application

Filed: September 1, 2021

Publication date: March 2, 2023

Inventors: Sattwati Kundu, Samiulla Zakir Hussain Shaikh

Bias detection for unstructured text

Patent number: 11551102

Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.

Type: Grant

Filed: April 15, 2019

Date of Patent: January 10, 2023

Assignee: International Business Machines Corporation

Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar

Generating explanations for context aware sequence-to-sequence models

Patent number: 11521065

Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contribute to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.

Type: Grant

Filed: February 6, 2020

Date of Patent: December 6, 2022

Assignee: International Business Machines Corporation

Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta

Trustworthiness of artificial intelligence models in presence of anomalous data

Patent number: 11455554

Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.

Type: Grant

Filed: November 25, 2019

Date of Patent: September 27, 2022

Assignee: International Business Machines Corporation

Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans

Domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository

Patent number: 11321304

Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.

Type: Grant

Filed: September 27, 2019

Date of Patent: May 3, 2022

Assignee: International Business Machines Corporation

Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta
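
Illustration (not part of the patent record): the abstract above describes looking up constraints by domain, flagging data that violates them, and explaining each anomaly through the violated constraints. The sketch below shows that general pattern only. It replaces the knowledge graph named in the abstract with a plain nested dictionary, and the domain, attributes, rules, and function names are all hypothetical.

```python
# Assumed stand-in for a domain-indexed constraint repository:
# domain -> attribute -> list of (human-readable description, check function).
CONSTRAINT_REPOSITORY = {
    "healthcare": {
        "age": [("age must be between 0 and 120", lambda v: 0 <= v <= 120)],
        "heart_rate": [("heart_rate must be positive", lambda v: v > 0)],
    },
}


def detect_anomalies(records, domain, repository=CONSTRAINT_REPOSITORY):
    """Return one entry per record that violates any constraint for the domain."""
    constraints = repository.get(domain, {})
    anomalies = []
    for index, record in enumerate(records):
        # Collect the description of every rule this record breaks.
        violated = [
            description
            for attribute, rules in constraints.items()
            if attribute in record
            for description, check in rules
            if not check(record[attribute])
        ]
        if violated:
            anomalies.append({"row": index, "explanation": "; ".join(violated)})
    return anomalies


# Hypothetical input rows.
records = [
    {"age": 34, "heart_rate": 72},
    {"age": 150, "heart_rate": 65},
    {"age": 51, "heart_rate": -3},
]
for anomaly in detect_anomalies(records, "healthcare"):
    print(anomaly)
```

Keeping the description next to each check is what makes the output explainable here: the explanation is simply the text of the rules that failed.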

Model quality and related models using provenance data

Patent number: 11205138

Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.

Type: Grant

Filed: May 22, 2019

Date of Patent: December 21, 2021

Assignee: International Business Machines Corporation

Inventors: Samiulla Zakir Hussain Shaikh, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide

Generating a framework for prioritizing machine learning model offerings via a platform

Patent number: 11157983

Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.

Type: Grant

Filed: July 8, 2019

Date of Patent: October 26, 2021

Assignee: International Business Machines Corporation

Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta

GENERATING EXPLANATIONS FOR CONTEXT AWARE SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20210248455

Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contribute to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.

Type: Application

Filed: February 6, 2020

Publication date: August 12, 2021

Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta

TRUSTWORTHINESS OF ARTIFICIAL INTELLIGENCE MODELS IN PRESENCE OF ANOMALOUS DATA

Publication number: 20210158183

Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.

Type: Application

Filed: November 25, 2019

Publication date: May 27, 2021

Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans

DOMAIN AWARE EXPLAINABLE ANOMALY AND DRIFT DETECTION FOR MULTI-VARIATE RAW DATA USING A CONSTRAINT REPOSITORY

Publication number: 20210097052

Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.

Type: Application

Filed: September 27, 2019

Publication date: April 1, 2021

Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta

Generating a Framework for Prioritizing Machine Learning Model Offerings Via a Platform

Publication number: 20210012404

Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.

Type: Application

Filed: July 8, 2019

Publication date: January 14, 2021

Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta

Transformations of a user-interface modality of an application

Patent number: 10884713

Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.

Type: Grant

Filed: February 25, 2019

Date of Patent: January 5, 2021

Assignee: International Business Machines Corporation

Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias

MODEL QUALITY AND RELATED MODELS USING PROVENANCE DATA

Publication number: 20200372398

Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.

Type: Application

Filed: May 22, 2019

Publication date: November 26, 2020

Inventors: Samiulla Zakir Hussain Shaikh, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide

BIAS DETECTION FOR UNSTRUCTURED TEXT

Publication number: 20200327424

Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.

Type: Application

Filed: April 15, 2019

Publication date: October 15, 2020

Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar

MODALITY TRANSFORMATIONS

Publication number: 20200272432

Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.

Type: Application

Filed: February 25, 2019

Publication date: August 27, 2020

Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias

Smart examination evaluation based on run time challenge response backed by guess detection

Patent number: 10665123

Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving, from a test taker, an answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.

Type: Grant

Filed: June 9, 2017

Date of Patent: May 26, 2020

Assignee: International Business Machines Corporation

Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh
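
Illustration (not part of the patent record): the abstract above flags a likely guess when the answer time falls outside a threshold, then uses related follow-up questions in the evaluation. The sketch below is one simple reading of that flow, not the patented method. It assumes the threshold is a time window with a lower and an upper bound, and the scoring rule, field names, and default values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    question_id: str
    correct: bool
    answer_time_s: float  # seconds between showing the question and the answer


def is_guess(response, min_time_s=5.0, max_time_s=120.0):
    """Flag a response whose answer time lies outside the assumed window."""
    return not (min_time_s <= response.answer_time_s <= max_time_s)


def evaluate(first, follow_ups, min_time_s=5.0, max_time_s=120.0):
    """Score in [0, 1]; follow-up answers count only when a guess is suspected."""
    if not is_guess(first, min_time_s, max_time_s):
        return 1.0 if first.correct else 0.0
    # Assumed rule: average correctness over the first answer and the
    # related follow-up questions.
    answers = [first.correct] + [r.correct for r in follow_ups]
    return sum(answers) / len(answers)


# Hypothetical session: a very fast correct answer triggers two follow-ups.
first = Response("q1", correct=True, answer_time_s=1.8)
follow_ups = [
    Response("q1a", correct=False, answer_time_s=22.0),
    Response("q1b", correct=True, answer_time_s=31.0),
]
print(is_guess(first), evaluate(first, follow_ups))
```

Under these assumptions a fast correct answer no longer earns full credit on its own; the related questions decide how much of it stands.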

SMART EXAMINATION EVALUATION BASED ON RUN TIME CHALLENGE RESPONSE BACKED BY GUESS DETECTION

Publication number: 20180357917

Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving, from a test taker, an answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.

Type: Application

Filed: June 9, 2017

Publication date: December 13, 2018

Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh