Patents by Inventor Prithviraj Sen

Prithviraj Sen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11900070
    Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: February 13, 2024
    Assignee: International Business Machines Corporation
    Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
  • Patent number: 11875131
    Abstract: Providing a predictive model for a target language by determining an instance weight for a labeled source language textual unit according to a set of unlabeled target language textual units, scaling, by the one or more computer processors, an error between a predicted label for the source language textual unit and a ground-truth label for the source language textual unit according to the instance weight, updating, by the one or more computer processors, network parameters of a predictive neural network model for the target language according to the error, and providing, by the one or more computer processors, the predictive neural network model for the target language to a user.
    Type: Grant
    Filed: September 16, 2020
    Date of Patent: January 16, 2024
    Assignee: International Business Machines Corporation
    Inventors: Zihui Li, Yunyao Li, Prithviraj Sen, Huaiyu Zhu
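
The error-scaling step in the abstract above can be illustrated with a minimal sketch. This is not the patented method: the function name `weighted_log_loss`, the choice of binary cross-entropy as the error, and the assumption that instance weights are already computed are all hypothetical and for illustration only.

```python
import math

def weighted_log_loss(probs, labels, weights):
    """Mean binary cross-entropy with each example's error scaled by its weight.

    probs   -- predicted probability of the positive label per source example
    labels  -- ground-truth labels (0 or 1)
    weights -- instance weights (assumed to be computed elsewhere, e.g. from
               similarity to unlabeled target-language text)
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        # Per-example error between the predicted and ground-truth label.
        error = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        # Scale the error by the instance weight before accumulating.
        total += w * error
    return total / len(probs)

if __name__ == "__main__":
    # The second example has weight 0, so it contributes nothing to the loss.
    print(weighted_log_loss([0.5, 0.5], [1, 0], [1.0, 0.0]))
```

In a real training loop the scaled error would drive the parameter update; here only the scaling itself is shown.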
  • Patent number: 11829496
    Abstract: One embodiment provides for a method for evaluation of an artificial intelligence (AI) service, the method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from both of the in-domain data and the out-of-domain data for evaluation by each of domain and sub-domain based on building a taxonomy of both domains and sub-domains for the AI service. The held-out data is excluded from training data used for training the AI service. The processor further determines distribution underlying performance metrics for the held-out data using bootstrap validation processing. The processor also determines performance guarantees for multiple settings conditioned on multiple characteristics of an application scenario for the held-out data of the taxonomy based on the underlying performance metrics.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: November 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: Prithviraj Sen, Rajasekar Krishnamurthy, Yunyao Li, Shivakumar Vaithyanathan, Hao Wang, Sang Don Han
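
The bootstrap validation step mentioned in the abstract above can be sketched roughly as follows. This is a generic percentile bootstrap, not the procedure claimed in the patent; the function name `bootstrap_accuracy_interval`, the use of accuracy as the metric, and the default parameters are assumptions made for illustration.

```python
import random

def bootstrap_accuracy_interval(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Estimate a percentile interval for accuracy on held-out data.

    correct -- list of 0/1 flags, one per held-out example (1 = prediction correct)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(correct)
    scores = []
    for _ in range(n_resamples):
        # Resample the held-out set with replacement and recompute the metric.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

if __name__ == "__main__":
    held_out = [1] * 80 + [0] * 20  # hypothetical results: 80 of 100 correct
    print(bootstrap_accuracy_interval(held_out))
```

Running the same routine separately per domain or sub-domain would give one interval per node of the taxonomy.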
  • Publication number: 20230222290
    Abstract: A system, computer program product, and method are provided for active learning (AL) for matching heterogeneous entity representations. The task in entity resolution (ER) is to find pairs from datasets that correspond to the same entity. A labeled training dataset is leveraged to train a first artificial intelligence (AI) model, with the first AI model training employing a pre-trained language model. A second AI model is trained with the language model updated by the first AI model, with the second AI model creating a candidate set of likely duplicate pairs. A subset is selectively identified from the candidate set. The labeled training set is augmented with the subset.
    Type: Application
    Filed: January 11, 2022
    Publication date: July 13, 2023
    Inventors: Prithviraj Sen, Sunita Sarawagi, Arjit Jain
  • Patent number: 11650970
    Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: May 16, 2023
    Assignee: International Business Machines Corporation
    Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
  • Patent number: 11645525
    Abstract: In an approach, a processor trains a statistical classifier and a set of micro classifiers. A processor receives an input to be classified by the statistical classifier. A processor receives a label assigned to the input by the statistical classifier and respective labels assigned by each micro classifier of the set of micro classifiers. A processor determines that the label assigned by the statistical classifier is the same as at least one label assigned by at least one micro classifier of the set of micro classifiers. A processor generates a natural language explanation for assigning the label using the at least one micro classifier and the label. A processor outputs the label and the natural language explanation to a user of a computing device. A processor receives user feedback from the user in the form of an acceptance or a rejection of the natural language explanation.
    Type: Grant
    Filed: May 27, 2020
    Date of Patent: May 9, 2023
    Assignee: International Business Machines Corporation
    Inventors: Poornima Chozhiyath Raman, Prithviraj Sen, Yunyao Li, Dakshi Agrawal
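
The agreement check and explanation step in the abstract above can be sketched as below. This is only an illustration of the idea; the function name `explain_label`, the dictionary of micro classifier outputs, and the wording of the explanation are hypothetical and do not come from the patent.

```python
def explain_label(stat_label, micro_labels):
    """Return a short explanation if any micro classifier agrees with the label.

    stat_label   -- label assigned by the statistical classifier
    micro_labels -- dict mapping a micro classifier's name to its assigned label
    """
    # Keep only the micro classifiers whose label matches the statistical one.
    agreeing = sorted(name for name, label in micro_labels.items()
                      if label == stat_label)
    if not agreeing:
        return None  # no agreement, so no explanation is generated
    return f"Labeled '{stat_label}' because of: {', '.join(agreeing)}"

if __name__ == "__main__":
    print(explain_label("spam", {"has_link": "spam", "short_text": "ham"}))
```

The user's acceptance or rejection of the returned text would then be collected as feedback.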
  • Publication number: 20230100883
    Abstract: To improve the technological process of computerized answering of logical queries over incomplete knowledge bases, obtain a first order logic query; with a trained, computerized neural network, convert the first order logic query into a logic embedding; and answer the first order logic query using the logic embedding.
    Type: Application
    Filed: September 28, 2021
    Publication date: March 30, 2023
    Inventors: Francois Pierre Luus, Prithviraj Sen, Ryan Nelson Riegel, Ndivhuwo Makondo, Thabang Doreen Lebese, Naweed Aghmad Khan, Pavan Kapanipathi
  • Patent number: 11501111
    Abstract: Methods, systems, and computer program products for learning models for entity resolution using active learning are provided herein. A computer-implemented method includes determining a set of data items related to a task associated with structured knowledge base creation, and outputting the set of data items to a user for labeling. Such a method also includes generating, based on a user-labeled version of the set of data items, a candidate model for executing the task, and one or more generalized versions of the candidate model. Additionally, such a method can also include generating a final model based on one or more iterations of analysis of the candidate model and analysis of the one or more generalized versions of the candidate model, and performing the task by executing the final model on one or more datasets.
    Type: Grant
    Filed: April 6, 2018
    Date of Patent: November 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kun Qian, Lucian Popa, Prithviraj Sen, Min Li
  • Publication number: 20220327331
    Abstract: One embodiment provides for a method for evaluation of an artificial intelligence (AI) service, the method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from both of the in-domain data and the out-of-domain data for evaluation by each of domain and sub-domain based on building a taxonomy of both domains and sub-domains for the AI service. The held-out data is excluded from training data used for training the AI service. The processor further determines distribution underlying performance metrics for the held-out data using bootstrap validation processing. The processor also determines performance guarantees for multiple settings conditioned on multiple characteristics of an application scenario for the held-out data of the taxonomy based on the underlying performance metrics.
    Type: Application
    Filed: June 28, 2022
    Publication date: October 13, 2022
    Inventors: Prithviraj Sen, Rajasekar Krishnamurthy, Yunyao Li, Shivakumar Vaithyanathan, Hao Wang, Sang Don Han
  • Publication number: 20220300799
    Abstract: A system, computer program product, and method are provided for entity linking in a logical neural network (LNN). A set of features are generated for one or more entity-mention pairs in an annotated dataset. The generated set of features is evaluated against an entity linking LNN rule template having one or more logically connected rules and corresponding connective weights organized in a tree structure. An artificial neural network is leveraged along with a corresponding machine learning algorithm to learn the connective weights. The connective weights associated with the logically connected rules are selectively updated and a learned model is generated with learned thresholds and the learned weights for the logically connected rules.
    Type: Application
    Filed: March 16, 2021
    Publication date: September 22, 2022
    Applicant: International Business Machines Corporation
    Inventors: Hang Jiang, Sairam Gurajada, Lucian Popa, Prithviraj Sen, Alexander Gray, Yunyao Li
  • Patent number: 11429816
    Abstract: One embodiment provides for a method for evaluation of an artificial intelligence (AI) service, the method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from the in-domain data and the out-of-domain data for evaluation by domain and sub-domain based on building a taxonomy of domains and sub-domains for the AI service. The processor further determines distribution underlying performance metrics for the held-out data using statistical processing. The processor also determines performance guarantees for multiple settings conditioned on multiple characteristics of an application scenario for the held-out data of the taxonomy based on the underlying performance metrics. The processor further provides confidence intervals based on the performance guarantees.
    Type: Grant
    Filed: September 6, 2018
    Date of Patent: August 30, 2022
    Assignee: International Business Machines Corporation
    Inventors: Prithviraj Sen, Rajasekar Krishnamurthy, Yunyao Li, Shivakumar Vaithyanathan, Hao Wang, Sang Don Han
  • Publication number: 20220269858
    Abstract: A system, computer program product, and method are provided for jointly learning dictionary based rules and dictionary candidates. Natural language text is received and parsed into subsets, with the subset being subjected to natural language processing to identify one or more verbs within the subset. The identified verbs are evaluated with respect to a dictionary and one or more rules. The evaluation is directed at each predicate in the rules with respect to the identified verbs. A neural network is leveraged to jointly induce modification of the rules and one or more dictionaries responsive to the evaluation.
    Type: Application
    Filed: February 19, 2021
    Publication date: August 25, 2022
    Applicant: International Business Machines Corporation
    Inventors: Prithviraj Sen, Marina Danilevsky Hailpern, Yunyao Li
  • Publication number: 20220197977
    Abstract: A computer-implemented method is provided for predicting future data values or target labels of multivariate time series data. The method includes receiving the multivariate time series data having present values, systematic missing values, and random missing values. The method further includes masking the present values, the systematic missing values, and the random missing values using triplet encodings. The method also includes determining time intervals between current missing values, from among the systematic missing values and the random missing values, and immediately preceding ones of the present values. The method additionally includes training, by a computing device, at least one recurrent neural network with the triplet encodings, the time intervals, and multivariate time series data to perform a feedforward pass on the recurrent neural network predicting the future data values or the target labels.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Inventors: Mu Qiao, Yuya Jeremy Ong, Prithviraj Sen, Berthold Reinwald
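
The time-interval computation described in the abstract above (the gap between a missing value and the immediately preceding present value) can be sketched for a single variable as follows. The function name `mask_and_intervals`, the use of `None` for missing values, and the simple 0/1 mask are assumptions for illustration; the triplet encodings in the abstract are not reproduced here.

```python
def mask_and_intervals(timestamps, values):
    """Compute a presence mask and the time since the last present value.

    timestamps -- increasing observation times
    values     -- observations, with None marking a missing value
    """
    masks, intervals = [], []
    last_present = None
    for t, v in zip(timestamps, values):
        masks.append(0 if v is None else 1)
        # Time elapsed since the most recent present value (0 before any is seen).
        intervals.append(0 if last_present is None else t - last_present)
        if v is not None:
            last_present = t
    return masks, intervals

if __name__ == "__main__":
    print(mask_and_intervals([0, 1, 3, 6], [5.0, None, None, 2.0]))
```

The masks and intervals would be fed to the recurrent network alongside the raw series.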
  • Publication number: 20220188974
    Abstract: A method, system, and computer program product for learning entity resolution rules for determining whether entities are matching. The method may include receiving historical pairs of entities. The method may also include determining a set of rules for determining whether a pair of entities are matching, where the set of rules comprises a plurality of conditions. The method may also include developing, using a deep neural network, an entity resolution model based on the historical pairs of entities. The method may also include receiving a new pair of entities. The method may also include applying the entity resolution model to the new pair of entities. The method may also include determining whether one or more rules from the set of rules are satisfied for the new pair of entities. The method may also include categorizing the new pair of entities as matching or not matching.
    Type: Application
    Filed: December 14, 2020
    Publication date: June 16, 2022
    Inventors: Sheshera Mysore, Sairam Gurajada, Lucian Popa, Kun Qian, Prithviraj Sen
  • Publication number: 20220083744
    Abstract: Providing a predictive model for a target language by determining an instance weight for a labeled source language textual unit according to a set of unlabeled target language textual units, scaling, by the one or more computer processors, an error between a predicted label for the source language textual unit and a ground-truth label for the source language textual unit according to the instance weight, updating, by the one or more computer processors, network parameters of a predictive neural network model for the target language according to the error, and providing, by the one or more computer processors, the predictive neural network model for the target language to a user.
    Type: Application
    Filed: September 16, 2020
    Publication date: March 17, 2022
    Inventors: Zihui Li, Yunyao Li, Prithviraj Sen, Huaiyu Zhu
  • Publication number: 20220058465
    Abstract: In an approach for forecasting in multivariate irregularly sampled time series, a processor receives time series data having one or more missing values. A processor determines, from the time series data, non-missing values present in the time series data. A processor determines, from the time series data, zero or more mask values for the time series data. A processor determines time interval values. A processor inputs the one or more missing values, the non-missing values, the zero or more mask values, and the time interval values into a recurrent neural network. A processor determines a predicted value for the one or more missing values.
    Type: Application
    Filed: August 24, 2020
    Publication date: February 24, 2022
    Inventors: Prithviraj Sen, Berthold Reinwald, Shivam Srivastava
  • Patent number: 11200413
    Abstract: Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: December 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Douglas Ronald Burdick, Wei Cheng, Alexandre Evfimievski, Marina Danilevsky Hailpern, Rajasekar Krishnamurthy, Shajith Ikbal Mohamed, Prithviraj Sen, Shivakumar Vaithyanathan
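
The idea of finding white-space separator lines in a discretized page area can be sketched as below. This is a simplified illustration, not the patented algorithm: the grid of booleans, the function name `whitespace_columns`, and the restriction to vertical separators are all assumptions.

```python
def whitespace_columns(grid):
    """Return indices of columns that contain no ink in any row.

    grid -- list of rows, each a list of booleans (True = cell contains ink)
    """
    if not grid:
        return []
    width = len(grid[0])
    # A column is a separator candidate if every row is blank at that position.
    return [c for c in range(width) if all(not row[c] for row in grid)]

if __name__ == "__main__":
    page_area = [
        [True, False, True, True],
        [True, False, False, True],
    ]
    print(whitespace_columns(page_area))
```

Applying the same check to rows and clustering the resulting lines would yield the grid-like candidate regions the abstract refers to.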
  • Publication number: 20210374516
    Abstract: In an approach, a processor trains a statistical classifier and a set of micro classifiers. A processor receives an input to be classified by the statistical classifier. A processor receives a label assigned to the input by the statistical classifier and respective labels assigned by each micro classifier of the set of micro classifiers. A processor determines that the label assigned by the statistical classifier is the same as at least one label assigned by at least one micro classifier of the set of micro classifiers. A processor generates a natural language explanation for assigning the label using the at least one micro classifier and the label. A processor outputs the label and the natural language explanation to a user of a computing device. A processor receives user feedback from the user in the form of an acceptance or a rejection of the natural language explanation.
    Type: Application
    Filed: May 27, 2020
    Publication date: December 2, 2021
    Inventors: Poornima Chozhiyath Raman, Prithviraj Sen, Yunyao Li, Dakshi Agrawal
  • Publication number: 20210271817
    Abstract: A computer-implemented method according to one embodiment includes receiving a plurality of linguistic expressions (LEs); changing one or more conditions of the plurality of linguistic expressions to create an updated plurality of linguistic expressions, utilizing a visual exploration framework (VEF) that visually presents to a user each of the plurality of linguistic expressions; and including the updated plurality of linguistic expressions in a model used to classify input sentences. According to another embodiment, a computer-implemented method includes receiving (i) a set of linguistic expressions (LEs) and (ii) a set of labeled data as input, where the LEs are logical combinations of predicates learned from the labeled data, and each data point in the labeled data comprises a piece of text and ground-truth labels; presenting the LEs in a visual exploration framework; and allowing a user to sort, filter, subset, and select LEs based on different criteria, utilizing the framework.
    Type: Application
    Filed: February 28, 2020
    Publication date: September 2, 2021
    Inventors: Prithviraj Sen, Yiwei Yang, Yunyao Li, Eser Kandogan
  • Publication number: 20210240917
    Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
    Type: Application
    Filed: February 3, 2020
    Publication date: August 5, 2021
    Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern