Patents by Inventor Prithviraj Sen

Prithviraj Sen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210240917
    Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
    Type: Application
    Filed: February 3, 2020
    Publication date: August 5, 2021
    Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
  • Patent number: 10776269
    Abstract: One embodiment provides for a method that includes performing, by a processor, active learning of large scale entity resolution using a distributed compute memoing cache to eliminate redundant computation. Link feature vector tables are determined for intermediate results of the active learning of large scale entity resolution. The link feature vector tables are managed by a two-level cache hierarchy.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: September 15, 2020
    Assignee: International Business Machines Corporation
    Inventors: Min Li, Lucian Popa, Prithviraj Sen
  • Publication number: 20200082228
    Abstract: One embodiment provides for a method for evaluation of an artificial intelligence (AI) service, the method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from the in-domain data and the out-of-domain data for evaluation by domain and sub-domain based on building a taxonomy of domains and sub-domains for the AI service. The processor further determines distribution underlying performance metrics for the held-out data using statistical processing. The processor also determines performance guarantees for multiple settings conditioned on multiple characteristics of an application scenario for the held-out data of the taxonomy based on the underlying performance metrics. The processor further provides confidence intervals based on the performance guarantees.
    Type: Application
    Filed: September 6, 2018
    Publication date: March 12, 2020
    Inventors: Prithviraj Sen, Rajasekar Krishnamurthy, Yunyao Li, Shivakumar Vaithyanathan, Hao Wang, Sang Don Han
  • Publication number: 20200042785
    Abstract: Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
    Type: Application
    Filed: July 31, 2018
    Publication date: February 6, 2020
    Inventors: Douglas Ronald Burdick, Wei Cheng, Alexandre Evfimievski, Marina Danilevsky Hailpern, Rajasekar Krishnamurthy, Shajith Ikbal Mohamed, Prithviraj Sen, Shivakumar Vaithyanathan
  • Publication number: 20200034293
    Abstract: One embodiment provides for a method that includes performing, by a processor, active learning of large scale entity resolution using a distributed compute memoing cache to eliminate redundant computation. Link feature vector tables are determined for intermediate results of the active learning of large scale entity resolution. The link feature vector tables are managed by a two-level cache hierarchy.
    Type: Application
    Filed: July 24, 2018
    Publication date: January 30, 2020
    Inventors: Min Li, Lucian Popa, Prithviraj Sen
  • Publication number: 20190311229
    Abstract: Methods, systems, and computer program products for learning models for entity resolution using active learning are provided herein. A computer-implemented method includes determining a set of data items related to a task associated with structured knowledge base creation, and outputting the set of data items to a user for labeling. Such a method also includes generating, based on a user-labeled version of the set of data items, a candidate model for executing the task, and one or more generalized versions of the candidate model. Additionally, such a method can also include generating a final model based on one or more iterations of analysis of the candidate model and analysis of the one or more generalized versions of the candidate model, and performing the task by executing the final model on one or more datasets.
    Type: Application
    Filed: April 6, 2018
    Publication date: October 10, 2019
    Inventors: Kun Qian, Lucian Popa, Prithviraj Sen, Min Li
  • Publication number: 20190278853
    Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
    Type: Application
    Filed: March 9, 2018
    Publication date: September 12, 2019
    Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
  • Patent number: 10289963
    Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: May 14, 2019
    Assignee: International Business Machines Corporation
    Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
  • Patent number: 10228922
    Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.
    Type: Grant
    Filed: January 12, 2016
    Date of Patent: March 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
  • Publication number: 20180246867
    Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
    Type: Application
    Filed: February 27, 2017
    Publication date: August 30, 2018
    Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
  • Patent number: 9665625
    Abstract: In a method for maximizing information content of logs, a log message from an executing software program is received. The log message includes a timestamp, a source code location ID, a session ID, and a log message text. The timestamp, the source code location ID, and the session ID of the log message are stored in a lossless buffer. A hash function value of the session ID is determined. It is determined that the hash function value of the session ID is less than a hash value threshold. The log message text is stored in a session buffer in response to determining that the hash function value of the session ID is less than the hash value threshold, wherein the session buffer contains log message texts of log messages with corresponding hash function values less than the hash value threshold.
    Type: Grant
    Filed: June 25, 2014
    Date of Patent: May 30, 2017
    Assignee: International Business Machines Corporation
    Inventors: Frederick R. Reiss, Saeed Ghanbari, Prithviraj Sen
  • Publication number: 20160124730
    Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.
    Type: Application
    Filed: January 12, 2016
    Publication date: May 5, 2016
    Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
  • Patent number: 9286044
    Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.
    Type: Grant
    Filed: June 27, 2014
    Date of Patent: March 15, 2016
    Assignee: International Business Machines Corporation
    Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
  • Publication number: 20150378696
    Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.
    Type: Application
    Filed: June 27, 2014
    Publication date: December 31, 2015
    Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
  • Publication number: 20150379008
    Abstract: In a method for maximizing information content of logs, a log message from an executing software program is received. The log message includes a timestamp, a source code location ID, a session ID, and a log message text. The timestamp, the source code location ID, and the session ID of the log message are stored in a lossless buffer. A hash function value of the session ID is determined. It is determined that the hash function value of the session ID is less than a hash value threshold. The log message text is stored in a session buffer in response to determining that the hash function value of the session ID is less than the hash value threshold, wherein the session buffer contains log message texts of log messages with corresponding hash function values less than the hash value threshold.
    Type: Application
    Filed: June 25, 2014
    Publication date: December 31, 2015
    Inventors: Frederick R. Reiss, Saeed Ghanbari, Prithviraj Sen
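
Illustrative sketch (not taken from the patent text): the log-buffering abstract above (Patent number 9665625, Publication number 20150379008) describes keeping the timestamp, source code location ID, and session ID of every log message in a lossless buffer, while keeping the message text only for sessions whose hashed session ID falls below a threshold. The Python below is one possible reading of those steps. The names LogMessage, LogBuffers, session_hash, and HASH_SPACE, the use of SHA-256, and the 32-bit hash range are all assumptions made for illustration; the abstract does not specify a hash function or data layout.

```python
import hashlib
from dataclasses import dataclass, field

# Assumed hash range: the first 4 bytes of a SHA-256 digest, read as an integer.
HASH_SPACE = 2 ** 32


@dataclass
class LogMessage:
    timestamp: float
    location_id: str   # source code location ID
    session_id: str
    text: str


def session_hash(session_id: str) -> int:
    """Map a session ID to a stable integer in [0, HASH_SPACE)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


@dataclass
class LogBuffers:
    threshold: int                                # hash value threshold
    lossless: list = field(default_factory=list)  # metadata of every message
    session: list = field(default_factory=list)   # text of sampled sessions only

    def receive(self, msg: LogMessage) -> bool:
        """Store metadata always; store text only for sampled sessions."""
        self.lossless.append((msg.timestamp, msg.location_id, msg.session_id))
        if session_hash(msg.session_id) < self.threshold:
            self.session.append(msg.text)
            return True
        return False


# Hypothetical usage: keep full text for roughly a quarter of all sessions.
buffers = LogBuffers(threshold=HASH_SPACE // 4)
buffers.receive(LogMessage(1.0, "server.py:42", "session-a", "request started"))
buffers.receive(LogMessage(2.0, "server.py:57", "session-a", "request finished"))
```

Because the decision depends only on the session ID, every message of a given session is treated the same way, so a sampled session keeps all of its message texts rather than a random subset. In this sketch, lowering the threshold shrinks the share of sessions whose text is retained.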