Patents by Inventor Sarthak Dash

Sarthak Dash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Canonicalization of data within open knowledge graphs

Patent number: 12632724

Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, learn entity representations and cluster assignments of respective entity representations in a joint manner for both entities and relations of respective entities.

Type: Grant

Filed: September 21, 2021

Date of Patent: May 19, 2026

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato Bagchi, Alfio Massimiliano Gliozzo

Artificial intelligence based metadata semantic enrichment

Patent number: 12547837

Abstract: Mechanisms are provided for automatically generating semantically enhanced metadata for a structured data structure. Multi-task machine learning training of a base artificial intelligence (AI) computer model is performed, based on data comprising separate sets of training data samples for each of a plurality of semantic metadata enhancement tasks, to generate a fine-tuned AI computer model trained specifically to generate semantically enhanced metadata for structured data structures. A prompt is received that specifies a structure of an input structured data structure and requests one of the plurality of semantic metadata enhancement tasks. The fine-tuned AI computer model processes the prompt to generate semantically enhanced metadata for the structure of the input structured data structure and provides it to a downstream computing system, which performs a downstream computing operation based on the semantically enhanced metadata.

Type: Grant

Filed: January 10, 2024

Date of Patent: February 10, 2026

Assignee: International Business Machines Corporation

Inventors: Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo, Nandana Sampath Mihindukulasooriya, Michael Robert Glass, Sarthak Dash, Sugato Bagchi, Gaetano Rossiello

Artificial intelligence based metadata semantic enrichment

Publication number: 20250225328

Abstract: Mechanisms are provided for automatically generating semantically enhanced metadata for a structured data structure. Multi-task machine learning training of a base artificial intelligence (AI) computer model is performed, based on data comprising separate sets of training data samples for each of a plurality of semantic metadata enhancement tasks, to generate a fine-tuned AI computer model trained specifically to generate semantically enhanced metadata for structured data structures. A prompt is received that specifies a structure of an input structured data structure and requests one of the plurality of semantic metadata enhancement tasks. The fine-tuned AI computer model processes the prompt to generate semantically enhanced metadata for the structure of the input structured data structure and provides it to a downstream computing system, which performs a downstream computing operation based on the semantically enhanced metadata.

Type: Application

Filed: January 10, 2024

Publication date: July 10, 2025

Inventors: Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo, Nandana Sampath Mihindukulasooriya, Michael Robert Glass, Sarthak Dash, Sugato Bagchi, Gaetano Rossiello

Permutation invariance for representing linearized tabular data

Patent number: 12242796

Abstract: An embodiment is provided for encoding permutation-invariant representations of linearized tabular data. The embodiment may receive input including tabular data and linearize a column or row within the received tabular data. The embodiment may automatically assign an increasing sequence of position identifiers to each non-delimiting tokenized cell in the linearized column or row until a header delimiter is reached. In response to reaching the header delimiter, the embodiment may automatically assign a monotonically increasing sequence of position identifiers to the non-delimiting tokenized cells positioned after the header delimiter, restarting after each cell delimiter from the integer 1 greater than the position identifier assigned to the header delimiter.

Type: Grant

Filed: June 17, 2022

Date of Patent: March 4, 2025

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Sugato Bagchi, Nandana Mihindukulasooriya, Alfio Massimiliano Gliozzo
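
The abstract above describes the position-identifier scheme only in prose. As a minimal sketch of one reading of it (not the patented implementation), the Python function below assigns position ids to an already tokenized, linearized column. The delimiter tokens "[HDR]" and "[SEP]" and the id given to the cell delimiters themselves are assumptions, since the abstract does not specify them.

```python
from typing import List, Optional

HEADER_DELIM = "[HDR]"  # hypothetical header delimiter token
CELL_DELIM = "[SEP]"    # hypothetical cell delimiter token


def position_ids(tokens: List[str]) -> List[int]:
    """Return one position id per token of a linearized column or row."""
    ids: List[int] = []
    pos = 0
    restart: Optional[int] = None  # header delimiter id + 1, once the delimiter is seen
    for tok in tokens:
        if restart is None:
            # Header tokens (and the header delimiter itself) get 0, 1, 2, ...
            ids.append(pos)
            pos += 1
            if tok == HEADER_DELIM:
                restart = pos
        elif tok == CELL_DELIM:
            # Assumption: a cell delimiter reuses the header delimiter's id,
            # and the next cell restarts numbering at `restart`.
            ids.append(restart - 1)
            pos = restart
        else:
            # Tokens inside a cell keep increasing from `restart`.
            ids.append(pos)
            pos += 1
    return ids


column = ["city", "name", "[HDR]", "paris", "[SEP]", "new", "york"]
print(position_ids(column))  # [0, 1, 2, 3, 2, 3, 4]
```

Because every cell restarts at the id just after the header delimiter, "paris" receives the same position id whichever slot it occupies, which is what makes the encoding insensitive to cell order.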
Linking tabular columns to unseen ontologies

Patent number: 12216635

Abstract: An embodiment is provided for improved linking of tabular columns to column types in an ontology unseen during training. For a target table, the embodiment may encode a target tabular query column, table headers, and target types independently to generate permutation-invariant representations of tabular data associated with the target table. For each of the target types, the embodiment may extract and further encode auxiliary information. The embodiment may process the encoded tabular data to obtain a first vector and a second vector, and concatenate the first vector and the second vector to generate a final query vector. The embodiment may process the encoded target types through a third transformer to obtain a third vector. The embodiment may then calculate a score that models interactions between the target tabular query column of the target table and the target types.

Type: Grant

Filed: June 6, 2023

Date of Patent: February 4, 2025

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Sugato Bagchi, Nandana Sampath Mihindukulasooriya, Alfio Massimiliano Gliozzo
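
The abstract names the vectors but not the scoring function. The Python sketch below assumes the simplest possible choice, cosine similarity between the concatenated query vector and each target-type vector, and uses toy hand-written vectors in place of transformer outputs; the function names and the scoring rule are hypothetical, not the patented method.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    if len(u) != len(v):
        raise ValueError("query and type vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_column_types(first: Vector, second: Vector,
                      type_vecs: Dict[str, Vector]) -> List[str]:
    """Concatenate the two tabular vectors into a final query vector and
    rank target types by their (assumed) cosine score against it."""
    query = first + second  # list concatenation
    scores = {name: cosine(query, vec) for name, vec in type_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


types = {"City": [1.0, 0.0, 0.0, 1.0], "Person": [0.0, 1.0, 1.0, 0.0]}
print(rank_column_types([1.0, 0.0], [0.0, 1.0], types))  # ['City', 'Person']
```

A learned bilinear or feed-forward scorer would be a more realistic model of "interactions"; cosine is used here only to keep the sketch self-contained.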
Linking tabular columns to unseen ontologies

Publication number: 20240411741

Abstract: An embodiment is provided for improved linking of tabular columns to column types in an ontology unseen during training. For a target table, the embodiment may encode a target tabular query column, table headers, and target types independently to generate permutation-invariant representations of tabular data associated with the target table. For each of the target types, the embodiment may extract and further encode auxiliary information. The embodiment may process the encoded tabular data to obtain a first vector and a second vector, and concatenate the first vector and the second vector to generate a final query vector. The embodiment may process the encoded target types through a third transformer to obtain a third vector. The embodiment may then calculate a score that models interactions between the target tabular query column of the target table and the target types.

Type: Application

Filed: June 6, 2023

Publication date: December 12, 2024

Inventors: Sarthak Dash, Sugato Bagchi, Nandana Sampath Mihindukulasooriya, Alfio Massimiliano Gliozzo

Deep symbolic validation of information extraction systems

Patent number: 11907842

Abstract: A system comprises a memory that stores computer-executable components, and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relations by training on the output of the relation extraction component.

Type: Grant

Filed: January 13, 2023

Date of Patent: February 20, 2024

Assignee: International Business Machines Corporation

Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim

Permutation invariance for representing linearized tabular data

Publication number: 20230409806

Abstract: An embodiment is provided for encoding permutation-invariant representations of linearized tabular data. The embodiment may receive input including tabular data and linearize a column or row within the received tabular data. The embodiment may automatically assign an increasing sequence of position identifiers to each non-delimiting tokenized cell in the linearized column or row until a header delimiter is reached. In response to reaching the header delimiter, the embodiment may automatically assign a monotonically increasing sequence of position identifiers to the non-delimiting tokenized cells positioned after the header delimiter, restarting after each cell delimiter from the integer 1 greater than the position identifier assigned to the header delimiter.

Type: Application

Filed: June 17, 2022

Publication date: December 21, 2023

Inventors: Sarthak Dash, Sugato Bagchi, Nandana Mihindukulasooriya, Alfio Massimiliano Gliozzo

Noise detection in knowledge graphs

Patent number: 11693896

Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.

Type: Grant

Filed: September 25, 2018

Date of Patent: July 4, 2023

Assignee: International Business Machines Corporation

Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash

Hypernym detection using strict partial order networks

Patent number: 11694035

Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.

Type: Grant

Filed: June 9, 2021

Date of Patent: July 4, 2023

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Md Faisal Mahbub Chowdhury
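
The abstract does not describe the SPON network itself, so the Python sketch below covers only what it does state: collapsing observed hyponym/hypernym extractions into (hyponym, hypernym, frequency) triples, and checking the property a strict partial order must satisfy (irreflexive and transitive, which together imply asymmetry). The function names and toy data are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (candidate hyponym, candidate hypernym)


def build_triples(observed: Iterable[Pair]) -> List[Tuple[str, str, int]]:
    """Collapse repeated extractions into (hyponym, hypernym, frequency) triples."""
    counts = Counter(observed)
    return sorted((hypo, hyper, n) for (hypo, hyper), n in counts.items())


def is_strict_partial_order(pairs: Set[Pair]) -> bool:
    """True if the relation is irreflexive and transitive."""
    if any(a == b for a, b in pairs):
        return False  # no term may be its own hypernym
    for a, b in pairs:
        for c, d in pairs:
            if b == c and (a, d) not in pairs:
                return False  # a < b and b < d, but a < d is missing
    return True


print(build_triples([("dog", "animal"), ("dog", "animal"), ("cat", "animal")]))
# [('cat', 'animal', 1), ('dog', 'animal', 2)]
```

A cycle such as {("a", "b"), ("b", "a")} fails the check because transitivity would demand ("a", "a"), which irreflexivity forbids.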
Deep symbolic validation of information extraction systems

Publication number: 20230177335

Abstract: A system comprises a memory that stores computer-executable components, and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relations by training on the output of the relation extraction component.

Type: Application

Filed: January 13, 2023

Publication date: June 8, 2023

Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim

Canonicalization of data within open knowledge graphs

Publication number: 20230087667

Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products and computer systems. Embodiments of the present invention can, in response to receiving information, learn entity representations and cluster assignments of respective entity representations in a joint manner for both entities and relations of respective entities.

Type: Application

Filed: September 21, 2021

Publication date: March 23, 2023

Inventors: Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato Bagchi, Alfio Massimiliano Gliozzo

Deep symbolic validation of information extraction systems

Patent number: 11574179

Abstract: A system comprises a memory that stores computer-executable components, and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relations by training on the output of the relation extraction component.

Type: Grant

Filed: January 7, 2019

Date of Patent: February 7, 2023

Assignee: International Business Machines Corporation

Inventors: Alfio Massimiliano Gliozzo, Sarthak Dash, Michael Robert Glass, Mustafa Canim

Discovering ranked domain relevant terms using knowledge

Patent number: 11526688

Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.

Type: Grant

Filed: April 16, 2020

Date of Patent: December 13, 2022

Assignee: International Business Machines Corporation

Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello
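
The abstract lists three stages (frequency ranking, ontology re-ranking, cluster-based boosting) without saying how each is computed. The Python sketch below assumes a simple tiered rule: ontology terms first, then non-ontology terms that share a cluster with an ontology term, then everything else, with frequency order preserved inside each tier. The function name, inputs, and tiering are hypothetical.

```python
from typing import Dict, List, Set


def rank_terms(freqs: Dict[str, int], ontology: Set[str],
               clusters: Dict[str, int]) -> List[str]:
    # 1. Frequency ranking: most frequent first, ties broken alphabetically.
    ranked = sorted(freqs, key=lambda t: (-freqs[t], t))
    # 2. Ontology re-ranking: terms found in the domain ontology move ahead.
    #    Python's sort is stable, so frequency order survives within each group.
    ranked = sorted(ranked, key=lambda t: t not in ontology)
    # 3. Cluster boost: non-ontology terms sharing a cluster with an
    #    ontology term move ahead of the remaining non-ontology terms.
    onto_clusters = {clusters[t] for t in ontology if t in clusters}

    def tier(term: str) -> int:
        if term in ontology:
            return 0
        if term in clusters and clusters[term] in onto_clusters:
            return 1
        return 2

    return sorted(ranked, key=tier)


freqs = {"error": 50, "lunch": 40, "reboot": 30, "kernel panic": 10}
clusters = {"kernel panic": 1, "reboot": 1, "error": 2, "lunch": 3}
print(rank_terms(freqs, {"kernel panic"}, clusters))
# ['kernel panic', 'reboot', 'error', 'lunch']
```

Here "reboot" is boosted above the more frequent "error" and "lunch" only because the (toy) clustering places it next to the ontology term "kernel panic".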
Performing fine-grained question type classification

Patent number: 11520762

Abstract: A computer-implemented method according to one embodiment includes converting an input question into a vector form using trained word embeddings; constructing a type similarity matrix using a predetermined ontology; and determining a score for all possible types for the input question, based on the input question in vector form and the type similarity matrix.

Type: Grant

Filed: December 13, 2019

Date of Patent: December 6, 2022

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Robert G. Farrell, Bassem Makni, Avirup Sil, Vittorio Castelli, Radu Florian
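
The abstract gives the ingredients (a question vector built from word embeddings, a type similarity matrix built from an ontology, and a score for every type) but not the formula that combines them. The Python sketch below assumes one plausible combination: cosine similarity between the averaged question vector and each type vector, then mixed through the similarity matrix so related types share evidence. The embeddings, matrix values, and function names are toy assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def embed_question(question: str, word_vecs: Dict[str, Vector]) -> Vector:
    """Average the embeddings of known words; unknown words are skipped."""
    dim = len(next(iter(word_vecs.values())))
    known = [word_vecs[w] for w in question.lower().split() if w in word_vecs]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def score_types(question: str, word_vecs: Dict[str, Vector],
                type_vecs: Dict[str, Vector], types: List[str],
                type_sim: List[List[float]]) -> Dict[str, float]:
    """Score every type: base cosine scores mixed by row i of the type similarity matrix."""
    q = embed_question(question, word_vecs)
    base = [cosine(q, type_vecs[t]) for t in types]
    return {t: sum(type_sim[i][j] * base[j] for j in range(len(types)))
            for i, t in enumerate(types)}


word_vecs = {"who": [1.0, 0.0], "wrote": [1.0, 0.0], "city": [0.0, 1.0]}
type_vecs = {"Person": [1.0, 0.0], "Writer": [0.6, 0.8], "City": [0.0, 1.0]}
types = ["Person", "Writer", "City"]
# Toy matrix: Writer is a subtype of Person, so they share weight 0.5.
type_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = score_types("Who wrote Hamlet", word_vecs, type_vecs, types, type_sim)
```

With these toy values "Person" scores about 1.3, "Writer" about 1.1, and "City" 0.0.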
Similarity based negative sampling analysis

Patent number: 11500910

Abstract: Techniques regarding similarity based negative sample analysis are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a similarity component that can determine similarity metrics for respective entities based on a vector space model. The respective entities can be represented by a dataset. Also, the computer executable components can comprise a sampling component that can perform a negative sampling analysis on the dataset based on the similarity metrics.

Type: Grant

Filed: March 21, 2018

Date of Patent: November 15, 2022

Assignee: International Business Machines Corporation

Inventors: Sarthak Dash, Alfio Massimiliano Gliozzo, Michael Robert Glass
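
The abstract says negatives are chosen using similarity metrics from a vector space model, without naming the metric or the selection rule. The Python sketch below assumes cosine similarity and a deterministic "hardest negatives first" rule: rank entities by similarity to the true entity, drop known positives, and keep the top k. The function name, parameter k, and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similar_negatives(target: str, vectors: Dict[str, Vector],
                      positives: Set[str], k: int = 2) -> List[str]:
    """Return the k entities most similar to `target` that are not known positives."""
    candidates = [
        (cosine(vectors[target], vec), entity)
        for entity, vec in vectors.items()
        if entity != target and entity not in positives
    ]
    # Highest similarity first; ties broken by entity name for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity for _, entity in candidates[:k]]


vectors = {
    "Paris": [1.0, 0.1],
    "Lyon": [0.9, 0.2],
    "London": [0.8, 0.3],
    "Banana": [0.0, 1.0],
}
print(similar_negatives("Paris", vectors, positives=set()))  # ['Lyon', 'London']
```

Similar-but-wrong entities such as "Lyon" make harder training negatives than a random pick like "Banana"; a sampling variant could instead draw candidates with probability proportional to similarity.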
Taxonomy construction via graph-based cross-domain knowledge transfer

Patent number: 11423307

Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain term pairs from a set of noisy cross-domain pairs extracted from a corpus, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively filters them to produce a system-predicted taxonomy from the preliminary taxonomy.

Type: Grant

Filed: June 3, 2020

Date of Patent: August 23, 2022

Assignee: International Business Machines Corporation

Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo

Taxonomy construction via graph-based cross-domain knowledge transfer

Publication number: 20210383205

Abstract: A system, computer program product, and method are provided for employing a graph neural network (GNN) to construct a taxonomy. The GNN is subject to a training cycle and an inference cycle. The training cycle encodes cross-domain term pairs from a set of noisy cross-domain pairs extracted from a corpus, and outputs a preliminary taxonomy. The inference cycle identifies candidate term pairs and selectively filters them to produce a system-predicted taxonomy from the preliminary taxonomy.

Type: Application

Filed: June 3, 2020

Publication date: December 9, 2021

Applicant: International Business Machines Corporation

Inventors: Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo

Discovering ranked domain relevant terms using knowledge

Publication number: 20210326636

Abstract: One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.

Type: Application

Filed: April 16, 2020

Publication date: October 21, 2021

Inventors: Nandana Mihindukulasooriya, Ruchi Mahindru, Md Faisal Mahbub Chowdhury, Yu Deng, Alfio Massimiliano Gliozzo, Sarthak Dash, Nicolas Rodolfo Fauceglia, Gaetano Rossiello

Greedy active learning for reducing labeled data imbalances

Patent number: 11138523

Abstract: A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.

Type: Grant

Filed: July 27, 2016

Date of Patent: October 5, 2021

Assignee: International Business Machines Corporation

Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
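
The selection rule in the abstract above can be read as: when negatives dominate the labeled set, pick the unlabeled instance most confidently predicted positive; when positives dominate, pick the one most confidently predicted negative. The Python sketch below implements that reading; the threshold names and default values, the use of label ratios rather than counts, and the function name are all assumptions not stated in the abstract.

```python
from typing import Dict, List, Optional, Tuple


def select_for_annotation(
    labels: List[bool],                 # True = positive, False = negative
    confidences: Dict[str, float],      # unlabeled id -> P(positive) from the model
    neg_ratio_threshold: float = 0.6,   # hypothetical "first threshold for negatives"
    pos_ratio_threshold: float = 0.6,   # hypothetical "second threshold for positives"
    pos_conf_threshold: float = 0.8,    # hypothetical positive-confidence threshold
    neg_conf_threshold: float = 0.8,    # hypothetical negative-confidence threshold
) -> Tuple[Optional[str], Optional[str]]:
    """Return (instance to label positive, instance to label negative); either may be None."""
    total = len(labels)
    pos_ratio = sum(labels) / total if total else 0.0
    neg_ratio = 1 - pos_ratio if total else 0.0
    first: Optional[str] = None
    second: Optional[str] = None
    if neg_ratio >= neg_ratio_threshold:
        # Too many negatives: look for a confidently positive unlabeled instance.
        best = max(confidences, key=confidences.get, default=None)
        if best is not None and confidences[best] >= pos_conf_threshold:
            first = best
    if pos_ratio >= pos_ratio_threshold:
        # Too many positives: look for a confidently negative unlabeled instance.
        remaining = {k: v for k, v in confidences.items() if k != first}
        best = min(remaining, key=remaining.get, default=None)
        if best is not None and 1 - remaining[best] >= neg_conf_threshold:
            second = best
    return first, second


confidences = {"a": 0.95, "b": 0.1, "c": 0.5}
print(select_for_annotation([False] * 8 + [True] * 2, confidences))  # ('a', None)
```

After annotation, the selected instances would be appended to the labeled collection with a positive and a negative label respectively, and the model retrained on the more balanced set.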
