Patents by Inventor Samiulla Zakir Hussain Shaikh
Samiulla Zakir Hussain Shaikh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12099805
Abstract: One embodiment provides a method, comprising: receiving an input sentence for a classification by a machine-learning model, where the classification is based upon a sentiment of the input sentence; splitting the input sentence into a plurality of tokens, each of the plurality of tokens corresponding to a term within the input sentence; creating a causal subgraph from the plurality of tokens, wherein the creating is based upon a causal relationship identified between tokens of the plurality of tokens; identifying, using the causal subgraph, tokens of the plurality of tokens influencing the classification; and generating, based upon the tokens of the plurality of tokens, a causal explanation for the classification, wherein the causal explanation identifies at least one portion of the input sentence resulting in the classification.
Type: Grant
Filed: November 19, 2021
Date of Patent: September 24, 2024
Assignee: International Business Machines Corporation
Inventors: Naveen Panwar, Deepak Vijaykeerthy, Nishtha Madaan, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
-
Patent number: 11886385
Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include received a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include based on the calculate overall distance metric, automatically generating distance graphs that identifying clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.
Type: Grant
Filed: June 2, 2022
Date of Patent: January 30, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
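Illustrative sketch (not taken from the patent): one simple way to combine a name-based similarity score and a data-distribution similarity score into a single distance between two datasets. The choice of Jaccard overlap on column names, a Kolmogorov-Smirnov style gap between empirical distributions, the equal weighting, and every function name below are assumptions made for illustration only. The clustering and lineage graphs mentioned in the abstract are not covered.

```python
from typing import Dict, List

# Hypothetical representation: a dataset is a mapping of column name -> numeric values.
Dataset = Dict[str, List[float]]


def name_similarity(a: Dataset, b: Dataset) -> float:
    """Jaccard overlap of column names (assumed stand-in for schema comparison)."""
    cols_a, cols_b = set(a), set(b)
    union = cols_a | cols_b
    return len(cols_a & cols_b) / len(union) if union else 1.0


def ks_gap(x: List[float], y: List[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    def cdf(sample: List[float], point: float) -> float:
        return sum(1 for v in sample if v <= point) / len(sample)

    return max(abs(cdf(x, p) - cdf(y, p)) for p in set(x) | set(y))


def distribution_similarity(a: Dataset, b: Dataset) -> float:
    """Average (1 - KS gap) over shared, non-empty columns; 0.0 if there are none."""
    shared = [c for c in set(a) & set(b) if a[c] and b[c]]
    if not shared:
        return 0.0
    return sum(1.0 - ks_gap(a[c], b[c]) for c in shared) / len(shared)


def overall_distance(a: Dataset, b: Dataset, name_weight: float = 0.5) -> float:
    """Weighted combination of both scores, turned into a distance in [0, 1]."""
    similarity = (name_weight * name_similarity(a, b)
                  + (1.0 - name_weight) * distribution_similarity(a, b))
    return 1.0 - similarity


if __name__ == "__main__":
    sales = {"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]}
    sales_copy = {"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]}
    print(overall_distance(sales, sales_copy))
```

Pairs whose distance falls below some chosen cutoff could then be treated as duplicate candidates. That cutoff is a further assumption and would need tuning on real data.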
-
Publication number: 20230394011
Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include received a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include based on the calculate overall distance metric, automatically generating distance graphs that identifying clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.
Type: Application
Filed: June 2, 2022
Publication date: December 7, 2023
Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
-
Publication number: 20230068513
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, generate a data profile for a model that includes metadata for data requirements, model specific requirements, and data quality metrics. Embodiments of the present invention can generate one or more perturbations for training data associated with the received information and validate at least one perturbation of the one or more perturbations of training data as relevant test data based, at least in part on context associated with the model. Embodiments of the present invention can then generate one or more test scenarios based on the at least one validated perturbation and varying hyperparameters of the model and generate a test report based on an execution of at least one generated test scenario of the generated one or more test scenarios.
Type: Application
Filed: September 1, 2021
Publication date: March 2, 2023
Inventors: Sattwati Kundu, Samiulla Zakir Hussain Shaikh
-
Patent number: 11551102
Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.
Type: Grant
Filed: April 15, 2019
Date of Patent: January 10, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar
-
Patent number: 11521065
Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contributes to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.
Type: Grant
Filed: February 6, 2020
Date of Patent: December 6, 2022
Assignee: International Business Machines Corporation
Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta
-
Patent number: 11455554
Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.
Type: Grant
Filed: November 25, 2019
Date of Patent: September 27, 2022
Assignee: International Business Machines Corporation
Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans
-
Patent number: 11321304
Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.
Type: Grant
Filed: September 27, 2019
Date of Patent: May 3, 2022
Assignee: International Business Machines Corporation
Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta
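Illustrative sketch (not taken from the patent): a minimal version of constraint-based anomaly detection with per-anomaly explanations. The abstract describes a knowledge graph indexed by domain. Here a nested dictionary stands in for that repository, which is a deliberate simplification. The domain name, attributes, constraints, and function names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# A constraint pairs a human-readable description with a predicate on one value.
Constraint = Tuple[str, Callable[[Any], bool]]

# Hypothetical repository: domain -> attribute -> constraints on that attribute.
# A nested dict is an assumed stand-in for the knowledge graph in the abstract.
REPOSITORY: Dict[str, Dict[str, List[Constraint]]] = {
    "healthcare": {
        "age": [("age must be between 0 and 120", lambda v: 0 <= v <= 120)],
        "heart_rate": [("heart_rate must be positive", lambda v: v > 0)],
    },
}


def detect_anomalies(rows: List[Dict[str, Any]], domain: str) -> List[Tuple[int, str]]:
    """Return (row index, explanation) for every constraint a row violates."""
    constraints = REPOSITORY.get(domain, {})
    findings: List[Tuple[int, str]] = []
    for index, row in enumerate(rows):
        for attribute, rules in constraints.items():
            if attribute not in row:
                continue  # no value to check for this attribute
            for description, is_valid in rules:
                if not is_valid(row[attribute]):
                    findings.append(
                        (index, f"{attribute}={row[attribute]!r} violates: {description}")
                    )
    return findings


if __name__ == "__main__":
    records = [
        {"age": 34, "heart_rate": 72},
        {"age": 150, "heart_rate": 80},
    ]
    for row_index, explanation in detect_anomalies(records, "healthcare"):
        print(row_index, explanation)
```

Because each finding carries the description of the violated constraint, the explanation comes directly from the rule that fired and does not have to be reconstructed afterwards.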
-
Patent number: 11205138
Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.
Type: Grant
Filed: May 22, 2019
Date of Patent: December 21, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Samiulla Zakir Hussain Shaikh, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide
-
Patent number: 11157983
Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.
Type: Grant
Filed: July 8, 2019
Date of Patent: October 26, 2021
Assignee: International Business Machines Corporation
Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta
-
Publication number: 20210248455
Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contributes to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta
-
Publication number: 20210158183
Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.
Type: Application
Filed: November 25, 2019
Publication date: May 27, 2021
Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans
-
Publication number: 20210097052
Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.
Type: Application
Filed: September 27, 2019
Publication date: April 1, 2021
Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta
-
Publication number: 20210012404
Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.
Type: Application
Filed: July 8, 2019
Publication date: January 14, 2021
Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta
-
Patent number: 10884713
Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.
Type: Grant
Filed: February 25, 2019
Date of Patent: January 5, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias
-
Publication number: 20200372398
Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.
Type: Application
Filed: May 22, 2019
Publication date: November 26, 2020
Inventors: Samiulla Zakir Hussain Shaikh, HIMANSHU GUPTA, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide
-
Publication number: 20200327424
Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.
Type: Application
Filed: April 15, 2019
Publication date: October 15, 2020
Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar
-
Publication number: 20200272432
Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.
Type: Application
Filed: February 25, 2019
Publication date: August 27, 2020
Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias
-
Patent number: 10665123
Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving his or her answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.
Type: Grant
Filed: June 9, 2017
Date of Patent: May 26, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh
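Illustrative sketch (not taken from the patent): a minimal answer-time check that flags a probable guess and picks related follow-up questions. The abstract only says the answer time is compared to a predetermined threshold answer time and that a time "outside" it indicates guessing. Modelling that threshold as a lower and upper bound in seconds, the specific bound values, and the `RELATED_QUESTIONS` mapping are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Response:
    question_id: str
    selected_option: str
    answer_time_s: float  # seconds between showing the question and the selection


# Hypothetical mapping from a question to related follow-up questions.
RELATED_QUESTIONS: Dict[str, List[str]] = {
    "q1": ["q1a", "q1b"],
    "q2": ["q2a"],
}


def is_probable_guess(answer_time_s: float,
                      min_time_s: float = 3.0,
                      max_time_s: float = 120.0) -> bool:
    """Flag an answer whose time falls outside the assumed [min, max] band."""
    return answer_time_s < min_time_s or answer_time_s > max_time_s


def follow_up_questions(response: Response) -> List[str]:
    """Return related questions only when the response looks like a guess."""
    if is_probable_guess(response.answer_time_s):
        return RELATED_QUESTIONS.get(response.question_id, [])
    return []


if __name__ == "__main__":
    quick = Response("q1", "B", answer_time_s=1.2)
    considered = Response("q2", "C", answer_time_s=41.0)
    print(follow_up_questions(quick))
    print(follow_up_questions(considered))
```

The evaluation step could then weigh the original selection against the answers to the follow-up questions. How those are combined is left open here, as the abstract does not specify it.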
-
Publication number: 20180357917
Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving his or her answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.
Type: Application
Filed: June 9, 2017
Publication date: December 13, 2018
Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh