Patents by Inventor Samiulla Zakir Hussain Shaikh

Samiulla Zakir Hussain Shaikh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12099805
    Abstract: One embodiment provides a method, comprising: receiving an input sentence for a classification by a machine-learning model, where the classification is based upon a sentiment of the input sentence; splitting the input sentence into a plurality of tokens, each of the plurality of tokens corresponding to a term within the input sentence; creating a causal subgraph from the plurality of tokens, wherein the creating is based upon a causal relationship identified between tokens of the plurality of tokens; identifying, using the causal subgraph, tokens of the plurality of tokens influencing the classification; and generating, based upon the tokens of the plurality of tokens, a causal explanation for the classification, wherein the causal explanation identifies at least one portion of the input sentence resulting in the classification.
    Type: Grant
    Filed: November 19, 2021
    Date of Patent: September 24, 2024
    Assignee: International Business Machines Corporation
    Inventors: Naveen Panwar, Deepak Vijaykeerthy, Nishtha Madaan, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
  • Patent number: 11886385
    Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include received a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include based on the calculate overall distance metric, automatically generating distance graphs that identifying clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.
    Type: Grant
    Filed: June 2, 2022
    Date of Patent: January 30, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
  • Publication number: 20230394011
    Abstract: An embodiment for identifying and sorting duplicate datasets within a large pool of heterogeneous datasets may include received a plurality of heterogeneous datasets. The embodiment may automatically compare schema information and metadata within each of the received plurality of heterogeneous datasets to generate name-based similarity scores for each dataset. The embodiment may also automatically compare data distribution information within each of the received plurality of heterogeneous datasets to generate a plurality of data distribution similarity scores for each heterogeneous dataset. The embodiment may further include automatically calculating an overall distance metric using the name-based similarity scores and plurality of data distribution similarity scores. The embodiment may also include based on the calculate overall distance metric, automatically generating distance graphs that identifying clusters of similar datasets and illustrate inferred lineage for the clusters of similar datasets.
    Type: Application
    Filed: June 2, 2022
    Publication date: December 7, 2023
    Inventors: Praduemn K. Goyal, Sandeep Hans, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha
  • Publication number: 20230068513
    Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, generate a data profile for a model that includes metadata for data requirements, model specific requirements, and data quality metrics. Embodiments of the present invention can generate one or more perturbations for training data associated with the received information and validate at least one perturbation of the one or more perturbations of training data as relevant test data based, at least in part on context associated with the model. Embodiments of the present invention can then generate one or more test scenarios based on the at least one validated perturbation and varying hyperparameters of the model and generate a test report based on an execution of at least one generated test scenario of the generated one or more test scenarios.
    Type: Application
    Filed: September 1, 2021
    Publication date: March 2, 2023
    Inventors: Sattwati Kundu, Samiulla Zakir Hussain Shaikh
  • Patent number: 11551102
    Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: January 10, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar
  • Patent number: 11521065
    Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contributes to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: December 6, 2022
    Assignee: International Business Machines Corporation
    Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta
  • Patent number: 11455554
    Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: September 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans
  • Patent number: 11321304
    Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: May 3, 2022
    Assignee: International Business Machines Corporation
    Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta
  • Patent number: 11205138
    Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: December 21, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Samiulla Zakir Hussain Shaikh, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide
  • Patent number: 11157983
    Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: October 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta
  • Publication number: 20210248455
    Abstract: Methods, systems, and computer program products for generating explanations for a semantic parser are provided herein. A computer-implemented method includes providing to a generative model (i) at least one query and (ii) a context of at least one dataset applicable to the at least one query, wherein the generative model generates a plurality of perturbations for the at least one input query based on the context; providing the plurality of perturbations as inputs to a context aware sequence-to-sequence model, thereby obtaining a plurality of outputs; and generating, for (i) an additional query provided as input to the context aware sequence-to-sequence model and (ii) a context applicable to the additional query, an explanation indicative of one or more parts of the additional query that contributes to an output corresponding to the additional query, based at least in part on the plurality of outputs corresponding to the perturbations.
    Type: Application
    Filed: February 6, 2020
    Publication date: August 12, 2021
    Inventors: Rachamalla Anirudh Reddy, Pranay Kumar Lohia, Samiulla Zakir Hussain Shaikh, Diptikalyan Saha, Sameep Mehta
  • Publication number: 20210158183
    Abstract: Methods, systems, and computer program products for improving trustworthiness of artificial intelligence models in presence of anomalous data are provided herein. A method includes obtaining a machine learning model and a set of training data; determining one or more anomalous data points in said set of training data; for a given one of said anomalous data points, identifying attributes that decrease confidence with respect to at least one output of said machine learning model; determining that a root cause of said decreased confidence corresponds to one of: a class imbalance issue related to said at least one attribute, a confused class issue related to said at least one attribute, a low density issue related to said at least one attribute, and an adversarial issue related to said at least one attribute; and performing step(s) to improve said confidence based at least in part on said determined root cause.
    Type: Application
    Filed: November 25, 2019
    Publication date: May 27, 2021
    Inventors: Pranay Kumar Lohia, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Rema Ananthanarayanan, Samiulla Zakir Hussain Shaikh, Sandeep Hans
  • Publication number: 20210097052
    Abstract: Methods, systems, and computer program products for domain aware explainable anomaly and drift detection for multi-variate raw data using a constraint repository are provided herein. A computer-implemented method includes obtaining a set of data and information indicative of a domain of said set of data; obtaining constraints from a domain-indexed constraint repository based on said set of data and said information, wherein the domain-indexed constraint repository comprises a knowledge graph having a plurality of nodes, wherein each node comprises an attribute associated with at least one of a plurality of domains and constraints corresponding to the attribute; detecting anomalies in said set of data based on whether portions of said set of data violate said retrieved constraints; generating an explanation corresponding to each of the anomalies that describe the attributes corresponding to the violated constraints; and outputting an indication of the anomalies and the corresponding explanation.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Sandeep Hans, Samiulla Zakir Hussain Shaikh, Rema Ananthanarayanan, Diptikalyan Saha, Aniya Aggarwal, Gagandeep Singh, Pranay Kumar Lohia, Manish Anand Bhide, Sameep Mehta
  • Publication number: 20210012404
    Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.
    Type: Application
    Filed: July 8, 2019
    Publication date: January 14, 2021
    Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta
  • Patent number: 10884713
    Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.
    Type: Grant
    Filed: February 25, 2019
    Date of Patent: January 5, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias
  • Publication number: 20200372398
    Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.
    Type: Application
    Filed: May 22, 2019
    Publication date: November 26, 2020
    Inventors: Samiulla Zakir Hussain Shaikh, HIMANSHU GUPTA, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide
  • Publication number: 20200327424
    Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.
    Type: Application
    Filed: April 15, 2019
    Publication date: October 15, 2020
    Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar
  • Publication number: 20200272432
    Abstract: Transforming a user-interface modality of a software application can include identifying a first workflow segment corresponding to a UI modality of an application developed to run on a predetermined data processing platform and selecting one or more other workflow segments to transform the UI modality of the application. Each other workflow segment performs on a different data processing platform a function comparable to a function performable by the first workflow segment. The one or more other workflow segments can be selected from a multi-member set of alternative workflow segments that are semantically similar to the first workflow segment. The selecting can be based on classifying the first workflow segment with a classification model trained using machine learning to map workflow segments and corresponding UI modalities to different processing platforms.
    Type: Application
    Filed: February 25, 2019
    Publication date: August 27, 2020
    Inventors: Samiulla Zakir Hussain Shaikh, Vijay Ekambaram, Padmanabha Venkatagiri Seshadri, Shinoj Zacharias
  • Patent number: 10665123
    Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving his or her answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: May 26, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh
  • Publication number: 20180357917
    Abstract: One embodiment provides a method, including: obtaining a first question on a multiple choice exam comprising a plurality of possible answers; receiving his or her answer selection of one of the plurality of possible answers; identifying an answer time corresponding to the amount of time between presentation of the question and receiving the answer selection; determining if the test taker is guessing the answer to the question, wherein the determining comprises (i) comparing the answer time to a predetermined threshold answer time and (ii) ascertaining that the test taker is guessing when the answer time is outside the threshold answer time; providing to the test taker one or more additional questions, wherein the one or more additional questions are related to the first question; and evaluating the test taker using at least one of: the answer selection and any test taker response to the one or more additional questions.
    Type: Application
    Filed: June 9, 2017
    Publication date: December 13, 2018
    Inventors: Vijay Ekambaram, Vikas Joshi, Samiulla Zakir Hussain Shaikh