Patents by Inventor Sarthak Dash
Sarthak Dash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11907842
Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
Type: Grant
Filed: January 13, 2023
Date of Patent: February 20, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
-
Publication number: 20230409806
Abstract: An embodiment for encoding permutation-invariant representations of linearized tabular data. The embodiment may receive input including tabular data and linearize a column or row within the received tabular data. The embodiment may automatically assign an increasing sequence of position identifiers to each non-delimiting tokenized cell in the linearized column or row until a header delimiter is reached. The embodiment may, in response to reaching the header delimiter, automatically assign a monotonically increasing sequence of position identifiers for each non-delimiting tokenized cell positioned after the header delimiter, restarting from an integer corresponding to 1 greater than the position identifier assigned to the header delimiter for each non-delimiting tokenized cell positioned after cell delimiters.
Type: Application
Filed: June 17, 2022
Publication date: December 21, 2023
Inventors: Sarthak Dash, Sugato Bagchi, Nandana Mihindukulasooriya, Alfio Massimiliano Gliozzo
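The position-identifier scheme in this abstract can be illustrated with a small sketch. It is only one reading of the abstract, not the claimed implementation: the delimiter strings `[HSEP]` and `[CSEP]`, the function name `assign_position_ids`, and the choice to give every cell delimiter the same identifier as the header delimiter are all assumptions made for illustration.

```python
# Hypothetical delimiter tokens; the abstract does not name them.
HEADER_SEP = "[HSEP]"
CELL_SEP = "[CSEP]"


def assign_position_ids(tokens):
    """Assign position ids to a linearized column (header tokens, then cells).

    Header tokens and the header delimiter get increasing ids. After the
    header delimiter, the tokens of each cell count up from
    (header delimiter id + 1), restarting after every cell delimiter.
    Assumption: cell delimiters reuse the header delimiter's id.
    """
    ids = []
    next_id = 0
    header_sep_id = None
    for tok in tokens:
        if header_sep_id is None:
            # Still inside the header (or at the header delimiter itself).
            ids.append(next_id)
            if tok == HEADER_SEP:
                header_sep_id = next_id
            next_id += 1
        elif tok == CELL_SEP:
            # Restart the counter for the next cell.
            ids.append(header_sep_id)
            next_id = header_sep_id + 1
        else:
            ids.append(next_id)
            next_id += 1
    return ids


tokens = ["city", "name", "[HSEP]", "new", "york", "[CSEP]",
          "paris", "[CSEP]", "rio", "de", "janeiro"]
print(assign_position_ids(tokens))
# [0, 1, 2, 3, 4, 2, 3, 2, 3, 4, 5]
```

Because every cell restarts from the same identifier, reordering the cells leaves each token paired with the same position identifier, which is the permutation-invariance property the abstract names.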
-
Patent number: 11694035
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
Type: Grant
Filed: June 9, 2021
Date of Patent: July 4, 2023
Assignee: International Business Machines Corporation
Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
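The (hyponym, hypernym, frequency) triples described in this abstract can be pictured with a short sketch. The abstract does not say how hypernymy relations are detected, so the single "X is a Y" pattern below is an assumption chosen for brevity, and the names `PATTERN` and `hypernym_triples` are hypothetical. Training the SPON model itself is not shown.

```python
import re
from collections import Counter

# Assumed extraction rule: a single "X is a/an Y" pattern.
# Real systems typically use a richer set of lexical patterns.
PATTERN = re.compile(r"\b(\w+) is an? (\w+)\b")


def hypernym_triples(sentences):
    """Return (candidate hyponym, candidate hypernym, frequency) triples."""
    counts = Counter()
    for sentence in sentences:
        for hypo, hyper in PATTERN.findall(sentence.lower()):
            counts[(hypo, hyper)] += 1
    # Most frequently observed pairs first.
    return [(hypo, hyper, n) for (hypo, hyper), n in counts.most_common()]


corpus = [
    "A dog is an animal.",
    "The dog is an animal too.",
    "A cat is an animal.",
]
print(hypernym_triples(corpus))
# [('dog', 'animal', 2), ('cat', 'animal', 1)]
```

A list in this shape is the kind of input the abstract says the neural network is trained on.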
-
Patent number: 11693896
Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.
Type: Grant
Filed: September 25, 2018
Date of Patent: July 4, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash
-
Publication number: 20230177335
Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
Type: Application
Filed: January 13, 2023
Publication date: June 8, 2023
Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
-
Publication number: 20230087667
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, learn entity representations and cluster assignments of respective entity representations in a joint manner for both entities and relations of respective entities.
Type: Application
Filed: September 21, 2021
Publication date: March 23, 2023
Inventors: Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato Bagchi, Alfio Massimiliano Gliozzo
-
Patent number: 11574179
Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
Type: Grant
Filed: January 7, 2019
Date of Patent: February 7, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
-
Patent number: 11526688
Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
Type: Grant
Filed: April 16, 2020
Date of Patent: December 13, 2022
Assignee: International Business Machines Corporation
Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello
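The three stages in this abstract (frequency ranking, ontology-based re-ranking, cluster-based boosting) can be sketched as follows. The scoring rule is an assumption: the abstract does not say how the ontology or the clusters change a term's rank, so the multiplicative weights, the function name `rerank_terms`, and the rule "boost terms that share a cluster with an ontology term" are all hypothetical. The clusters are passed in directly rather than produced by a trained model.

```python
def rerank_terms(term_freqs, ontology_terms, clusters,
                 ontology_weight=3.0, cluster_weight=2.0):
    """Re-rank frequency-ranked terms using an ontology and term clusters.

    term_freqs: dict mapping term -> corpus frequency (the initial ranking).
    ontology_terms: set of terms found in the domain ontology.
    clusters: iterable of sets of terms (assumed output of a trained model).
    The weights are illustrative values, not taken from the patent.
    """
    scores = {}
    for term, freq in term_freqs.items():
        score = float(freq)
        if term in ontology_terms:
            score *= ontology_weight  # re-ranking step: ontology knowledge
        scores[term] = score
    for cluster in clusters:
        if any(t in ontology_terms for t in cluster):
            for t in cluster:
                if t in scores and t not in ontology_terms:
                    scores[t] *= cluster_weight  # boosting step: cluster knowledge
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


freqs = {"server": 10, "restart": 8, "please": 7, "reboot": 4, "kernel panic": 3}
ontology = {"restart", "kernel panic"}
clusters = [{"restart", "reboot"}, {"please"}]
print(rerank_terms(freqs, ontology, clusters))
# ['restart', 'server', 'kernel panic', 'reboot', 'please']
```

In this toy run the frequent but uninformative term "please" drops to the bottom, while "reboot" rises because it clusters with an ontology term.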
-
Patent number: 11520762
Abstract: A computer-implemented method according to one embodiment includes converting an input question into a vector form using trained word embeddings; constructing a type similarity matrix using a predetermined ontology; and determining a score for all possible types for the input question, based on the input question in vector form and the type similarity matrix.
Type: Grant
Filed: December 13, 2019
Date of Patent: December 6, 2022
Assignee: International Business Machines Corporation
Inventors: Sarthak Dash, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Robert G. Farrell, Bassem Makni, Avirup Sil, Vittorio Castelli, Radu Florian
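A rough sketch of how a question vector and a type similarity matrix might combine into per-type scores is given below. Every concrete choice is an assumption rather than the patented method: the question vector is an average of word embeddings, raw type scores are dot products with hypothetical per-type vectors, and the similarity matrix (hand-filled here instead of derived from an ontology) spreads score between related types.

```python
def question_vector(question, embeddings):
    """Average the embeddings of the known words in the question."""
    vecs = [embeddings[w] for w in question.lower().split() if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def score_types(q_vec, type_vectors, type_similarity):
    """Score every type: raw dot-product scores smoothed by type similarity.

    type_vectors: dict type -> vector (order defines matrix rows/columns).
    type_similarity: square matrix; entry [i][j] is the similarity of
    type i to type j (assumed to come from the ontology).
    """
    types = list(type_vectors)
    raw = [sum(a * b for a, b in zip(q_vec, type_vectors[t])) for t in types]
    return {
        t: sum(type_similarity[i][j] * raw[j] for j in range(len(types)))
        for i, t in enumerate(types)
    }


# Toy data, invented for illustration only.
embeddings = {"who": [1.0, 0.0], "wrote": [0.5, 0.5], "where": [0.0, 1.0]}
type_vectors = {"Person": [1.0, 0.0], "Author": [0.8, 0.2], "Place": [0.0, 1.0]}
type_similarity = [
    [1.0, 0.5, 0.0],  # Person is close to Author
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],  # Place is unrelated to both
]

scores = score_types(question_vector("who wrote hamlet", embeddings),
                     type_vectors, type_similarity)
print(max(scores, key=scores.get))  # Person
```

The point of the similarity matrix in this sketch is that evidence for one type also raises the score of nearby types in the ontology.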
-
Patent number: 11500910
Abstract: Techniques regarding similarity based negative sample analysis are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a similarity component that can determine similarity metrics for respective entities based on a vector space model. The respective entities can be represented by a dataset. Also, the computer executable components can comprise a sampling component that can perform a negative sampling analysis on the dataset based on the similarity metrics.
Type: Grant
Filed: March 21, 2018
Date of Patent: November 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Michael Robert Glass
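One common way to use similarity metrics in negative sampling is to corrupt a known-true triple with entities that are close to the original in vector space, so the negatives are hard rather than random. The sketch below illustrates that idea under stated assumptions: cosine similarity as the metric, tail corruption only, and the hypothetical function name `similar_negatives`. The abstract does not confirm any of these choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_negatives(triple, vectors, known, k=2):
    """Build k negative triples by swapping the tail for its nearest neighbours.

    triple: (head, relation, tail) known to be true.
    vectors: dict entity -> embedding (the vector space model).
    known: set of true triples, so real facts are never used as negatives.
    """
    head, rel, tail = triple
    candidates = [e for e in vectors
                  if e != tail and (head, rel, e) not in known]
    candidates.sort(key=lambda e: cosine(vectors[tail], vectors[e]),
                    reverse=True)
    return [(head, rel, e) for e in candidates[:k]]


# Toy embeddings, invented for illustration.
vectors = {
    "Paris": [1.0, 0.1],
    "Lyon": [0.9, 0.2],
    "Berlin": [0.8, 0.3],
    "Banana": [0.0, 1.0],
}
known = {("France", "capital", "Paris")}
print(similar_negatives(("France", "capital", "Paris"), vectors, known))
# [('France', 'capital', 'Lyon'), ('France', 'capital', 'Berlin')]
```

Here the implausible candidate "Banana" is never chosen, because it is far from "Paris" in the toy vector space.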
-
Patent number: 11423307
Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain terms pairs from a set of noisy cross domain pairs extracted from a corpora, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively subjects the candidate term pairs to selective filtering to produce a system predicted taxonomy from the preliminary taxonomy.
Type: Grant
Filed: June 3, 2020
Date of Patent: August 23, 2022
Assignee: International Business Machines Corporation
Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo
-
Publication number: 20210383205
Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain terms pairs from a set of noisy cross domain pairs extracted from a corpora, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively subjects the candidate term pairs to selective filtering to produce a system predicted taxonomy from the preliminary taxonomy.
Type: Application
Filed: June 3, 2020
Publication date: December 9, 2021
Applicant: International Business Machines Corporation
Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo
-
Publication number: 20210326636
Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
Type: Application
Filed: April 16, 2020
Publication date: October 21, 2021
Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello
-
Patent number: 11138523
Abstract: A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
Type: Grant
Filed: July 27, 2016
Date of Patent: October 5, 2021
Assignee: International Business Machines Corporation
Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
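The selection logic in this abstract can be pictured as a small decision function. The sketch below is an interpretation, not the claimed method: the threshold values, the use of a single model score `P(positive)` per unlabeled instance, and the function name `select_for_annotation` are assumptions made for illustration.

```python
def select_for_annotation(labels, unlabeled_scores,
                          max_neg_ratio=0.6, max_pos_ratio=0.6, min_conf=0.7):
    """Pick the next instance to annotate so that labels become more balanced.

    labels: list of 0/1 labels already collected for the category.
    unlabeled_scores: dict instance id -> model's estimated P(positive).
    Returns (instance id, likely label) or None if no rule fires.
    All thresholds are illustrative values.
    """
    if not labels or not unlabeled_scores:
        return None
    pos_ratio = sum(labels) / len(labels)
    neg_ratio = 1.0 - pos_ratio

    # Too many negatives: look for a confidently positive candidate.
    if neg_ratio >= max_neg_ratio:
        cand = max(unlabeled_scores, key=unlabeled_scores.get)
        if unlabeled_scores[cand] >= min_conf:
            return cand, 1

    # Too many positives: look for a confidently negative candidate.
    if pos_ratio >= max_pos_ratio:
        cand = min(unlabeled_scores, key=unlabeled_scores.get)
        if 1.0 - unlabeled_scores[cand] >= min_conf:
            return cand, 0

    return None


scores = {"a": 0.9, "b": 0.2, "c": 0.55}
print(select_for_annotation([0, 0, 0, 1], scores))  # ('a', 1)
print(select_for_annotation([1, 1, 1, 0], scores))  # ('b', 0)
print(select_for_annotation([0, 1], scores))        # None
```

The selected instance would then go to an annotator, and the confirmed label would be added to the labeled collection before retraining.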
-
Publication number: 20210303800
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
Type: Application
Filed: June 9, 2021
Publication date: September 30, 2021
Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
-
Patent number: 11068665
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
Type: Grant
Filed: September 18, 2019
Date of Patent: July 20, 2021
Assignee: International Business Machines Corporation
Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
-
Publication number: 20210182258
Abstract: A computer-implemented method according to one embodiment includes converting an input question into a vector form using trained word embeddings; constructing a type similarity matrix using a predetermined ontology; and determining a score for all possible types for the input question, based on the input question in vector form and the type similarity matrix.
Type: Application
Filed: December 13, 2019
Publication date: June 17, 2021
Inventors: Sarthak Dash, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Robert G. Farrell, Bassem Makni, Avirup Sil, Vittorio Castelli, Radu Florian
-
Publication number: 20210081500
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
Type: Application
Filed: September 18, 2019
Publication date: March 18, 2021
Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
-
Publication number: 20200218968
Abstract: A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.
Type: Application
Filed: January 7, 2019
Publication date: July 9, 2020
Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim
-
Publication number: 20200097861
Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.
Type: Application
Filed: September 25, 2018
Publication date: March 26, 2020
Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash