Patents by Inventor Sarthak Dash

Sarthak Dash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907842
    Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
    Type: Grant
    Filed: January 13, 2023
    Date of Patent: February 20, 2024
    Assignee: NTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
  • Publication number: 20230409806
    Abstract: An embodiment for encoding permutation-invariant representations of linearized tabular data. The embodiment may receive input including tabular data and linearize a column or row within the received tabular data. The embodiment may automatically assign an increasing sequence of position identifiers to each non-delimiting tokenized cell in the linearized column or row until a header delimiter is reached. The embodiment may, in response to reaching the header delimiter, automatically assign a monotonically increasing sequence of position identifiers for each non-delimiting tokenized cell positioned after the header delimiter, restarting from an integer corresponding to 1 greater than the position identifier assigned to the header delimiter for each non-delimiting tokenized cell positioned after cell delimiters.
    Type: Application
    Filed: June 17, 2022
    Publication date: December 21, 2023
    Inventors: Sarthak Dash, Sugato Bagchi, NANDANA MIHINDUKULASOORIYA, Alfio Massimiliano Gliozzo
  • Patent number: 11694035
    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: July 4, 2023
    Assignee: International Business Machines Corporation
    Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
  • Patent number: 11693896
    Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.
    Type: Grant
    Filed: September 25, 2018
    Date of Patent: July 4, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash
  • Publication number: 20230177335
    Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
    Type: Application
    Filed: January 13, 2023
    Publication date: June 8, 2023
    Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
  • Publication number: 20230087667
    Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, learn entity representations and cluster assignments of respective entity representations in a joint manner for both entities and relations of respective entities.
    Type: Application
    Filed: September 21, 2021
    Publication date: March 23, 2023
    Inventors: Sarthak Dash, Gaetano Rossiello, NANDANA MIHINDUKULASOORIYA, Sugato Bagchi, Alfio Massimiliano Gliozzo
  • Patent number: 11574179
    Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
    Type: Grant
    Filed: January 7, 2019
    Date of Patent: February 7, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
  • Patent number: 11526688
    Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
    Type: Grant
    Filed: April 16, 2020
    Date of Patent: December 13, 2022
    Assignee: International Business Machines Corporation
    Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello
  • Patent number: 11520762
    Abstract: A computer-implemented method according to one embodiment includes converting an input question into a vector form using trained word embeddings; constructing a type similarity matrix using a predetermined ontology; and determining a score for all possible types for the input question, based on the input question in vector form and the type similarity matrix.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: December 6, 2022
    Assignee: International Business Machines Corporation
    Inventors: Sarthak Dash, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Robert G. Farrell, Bassem Makni, Avirup Sil, Vittorio Castelli, Radu Florian
  • Patent number: 11500910
    Abstract: Techniques regarding similarity based negative sample analysis are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a similarity component that can determine similarity metrics for respective entities based on a vector space model. The respective entities can be represented by a dataset. Also, the computer executable components can comprise a sampling component that can perform a negative sampling analysis on the dataset based on the similarity metrics.
    Type: Grant
    Filed: March 21, 2018
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Michael Robert Glass
  • Patent number: 11423307
    Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain terms pairs from a set of noisy cross domain pairs extracted from a corpora, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively subjects the candidate term pairs to selective filtering to produce a system predicted taxonomy from the preliminary taxonomy.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: August 23, 2022
    Assignee: International Business Machines Corporation
    Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo
  • Publication number: 20210383205
    Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain terms pairs from a set of noisy cross domain pairs extracted from a corpora, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively subjects the candidate term pairs to selective filtering to produce a system predicted taxonomy from the preliminary taxonomy.
    Type: Application
    Filed: June 3, 2020
    Publication date: December 9, 2021
    Applicant: International Business Machines Corporation
    Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo
  • Publication number: 20210326636
    Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
    Type: Application
    Filed: April 16, 2020
    Publication date: October 21, 2021
    Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello
  • Patent number: 11138523
    Abstract: A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
    Type: Grant
    Filed: July 27, 2016
    Date of Patent: October 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
  • Publication number: 20210303800
    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
    Type: Application
    Filed: June 9, 2021
    Publication date: September 30, 2021
    Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
  • Patent number: 11068665
    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
    Type: Grant
    Filed: September 18, 2019
    Date of Patent: July 20, 2021
    Assignee: International Business Machines Corporation
    Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
  • Publication number: 20210182258
    Abstract: A computer-implemented method according to one embodiment includes converting an input question into a vector form using trained word embeddings; constructing a type similarity matrix using a predetermined ontology; and determining a score for all possible types for the input question, based on the input question in vector form and the type similarity matrix.
    Type: Application
    Filed: December 13, 2019
    Publication date: June 17, 2021
    Inventors: Sarthak Dash, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Robert G. Farrell, Bassem Makni, Avirup Sil, Vittorio Castelli, Radu Florian
  • Publication number: 20210081500
    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
    Type: Application
    Filed: September 18, 2019
    Publication date: March 18, 2021
    Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
  • Publication number: 20200218968
    Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
    Type: Application
    Filed: January 7, 2019
    Publication date: July 9, 2020
    Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
  • Publication number: 20200097861
    Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 26, 2020
    Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash