Patents by Inventor Prithviraj Sen

Prithviraj Sen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PRODUCING EXPLAINABLE RULES VIA DEEP LEARNING

Publication number: 20210240917

Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.

Type: Application

Filed: February 3, 2020

Publication date: August 5, 2021

Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern

Two level compute memoing for large scale entity resolution

Patent number: 10776269

Abstract: One embodiment provides for a method that includes performing, by a processor, active learning of large scale entity resolution using a distributed compute memoing cache to eliminate redundant computation. Link feature vector tables are determined for intermediate results of the active learning of large scale entity resolution. The link feature vector tables are managed by a two-level cache hierarchy.
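To make the memoing idea concrete, here is a minimal sketch of a two-level cache that avoids recomputing feature vectors for record pairs it has already seen. It is not the patented method: the abstract does not say what the two levels are, so this sketch assumes a small in-memory LRU level in front of a larger store (a plain dictionary here, standing in for a shared or on-disk table). The names `TwoLevelMemo`, `get_or_compute`, and `link_features` are hypothetical.

```python
from collections import OrderedDict


class TwoLevelMemo:
    """Memoize expensive computations in a small fast cache backed by a larger one."""

    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        self.l1 = OrderedDict()   # assumed level 1: small, fast, LRU eviction
        self.l2 = {}              # assumed level 2: larger store (stand-in for shared storage)
        self.computations = 0     # how many times the expensive function actually ran

    def get_or_compute(self, key, compute):
        if key in self.l1:
            self.l1.move_to_end(key)          # mark as most recently used
            return self.l1[key]
        if key in self.l2:
            value = self.l2[key]              # level-2 hit: no recomputation
        else:
            value = compute(key)              # miss in both levels: compute once
            self.computations += 1
            self.l2[key] = value
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)       # evict LRU entry; it stays in level 2


def link_features(pair):
    """Hypothetical feature function for a candidate record pair."""
    a, b = pair
    return (int(a == b), abs(len(a) - len(b)))


memo = TwoLevelMemo(l1_capacity=2)
pairs = [("ann", "ann"), ("ann", "anne"), ("bob", "rob"), ("ann", "ann")]
features = [memo.get_or_compute(p, link_features) for p in pairs]
```

In this run the fourth lookup has already been evicted from the first level but is served from the second, so the feature function runs only three times for four requests.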

Type: Grant

Filed: July 24, 2018

Date of Patent: September 15, 2020

Assignee: International Business Machines Corporation

Inventors: Min Li, Lucian Popa, Prithviraj Sen

EVALUATING QUALITY OF ARTIFICIAL INTELLIGENCE (AI) SERVICES

Publication number: 20200082228

Abstract: One embodiment provides for a method for evaluation of an artificial intelligence (AI) service, the method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from the in-domain data and the out-of-domain data for evaluation by domain and sub-domain based on building a taxonomy of domains and sub-domains for the AI service. The processor further determines distribution underlying performance metrics for the held-out data using statistical processing. The processor also determines performance guarantees for multiple settings conditioned on multiple characteristics of an application scenario for the held-out data of the taxonomy based on the underlying performance metrics. The processor further provides confidence intervals based on the performance guarantees.

Type: Application

Filed: September 6, 2018

Publication date: March 12, 2020

Inventors: Prithviraj Sen, Rajasekar Krishnamurthy, Yunyao Li, Shivakumar Vaithyanathan, Hao Wang, Sang Don Han

Table Recognition in Portable Document Format Documents

Publication number: 20200042785

Abstract: Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
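As a rough illustration of the first two steps only (discretizing an area and finding white-space separator lines), the sketch below treats a page region as a grid of 0/1 cells, where 1 means the cell contains ink, and reports the fully blank rows and columns. The grid representation and the function name `blank_lines` are assumptions made for this example; clustering separators into grids and scoring candidate regions are not shown.

```python
def blank_lines(grid):
    """Return indices of all-blank rows and all-blank columns in a non-empty 0/1 grid."""
    blank_rows = [r for r, row in enumerate(grid) if not any(row)]
    blank_cols = [
        c for c in range(len(grid[0]))
        if not any(row[c] for row in grid)
    ]
    return blank_rows, blank_cols


# Hypothetical discretized region: 1 = ink present, 0 = white space.
grid = [
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
rows, cols = blank_lines(grid)
```

Here row 1 and column 1 are entirely blank, so they would be the candidate separator lines for this region.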

Type: Application

Filed: July 31, 2018

Publication date: February 6, 2020

Inventors: Douglas Ronald Burdick, Wei Cheng, Alexandre Evfimievski, Marina Danilevsky Hailpern, Rajasekar Krishnamurthy, Shajith Ikbal Mohamed, Prithviraj Sen, Shivakumar Vaithyanathan

TWO LEVEL COMPUTE MEMOING FOR LARGE SCALE ENTITY RESOLUTION

Publication number: 20200034293

Abstract: One embodiment provides for a method that includes performing, by a processor, active learning of large scale entity resolution using a distributed compute memoing cache to eliminate redundant computation. Link feature vector tables are determined for intermediate results of the active learning of large scale entity resolution. The link feature vector tables are managed by a two-level cache hierarchy.

Type: Application

Filed: July 24, 2018

Publication date: January 30, 2020

Inventors: Min Li, Lucian Popa, Prithviraj Sen

Learning Models For Entity Resolution Using Active Learning

Publication number: 20190311229

Abstract: Methods, systems, and computer program products for learning models for entity resolution using active learning are provided herein. A computer-implemented method includes determining a set of data items related to a task associated with structured knowledge base creation, and outputting the set of data items to a user for labeling. Such a method also includes generating, based on a user-labeled version of the set of data items, a candidate model for executing the task, and one or more generalized versions of the candidate model. Additionally, such a method can also include generating a final model based on one or more iterations of analysis of the candidate model and analysis of the one or more generalized versions of the candidate model, and performing the task by executing the final model on one or more datasets.

Type: Application

Filed: April 6, 2018

Publication date: October 10, 2019

Inventors: Kun Qian, Lucian Popa, Prithviraj Sen, Min Li

Extracting Structure and Semantics from Tabular Data

Publication number: 20190278853

Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
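The last step, creating a tuple per data cell from the headers that pertain to it, can be sketched for the simplest possible layout. This example assumes a single region whose first row holds column headers and whose first column holds row headers; real tables, and the region and header matching the abstract describes, are more involved. The function name `cell_tuples` is hypothetical.

```python
def cell_tuples(table):
    """Return (row_header, column_header, value) for every data cell.

    Assumes the first row holds column headers and the first column
    holds row headers; the top-left corner cell is ignored.
    """
    column_headers = table[0][1:]
    tuples = []
    for row in table[1:]:
        row_header, values = row[0], row[1:]
        for column_header, value in zip(column_headers, values):
            tuples.append((row_header, column_header, value))
    return tuples


# Hypothetical input table.
table = [
    ["Region", "2017", "2018"],
    ["North", 10, 12],
    ["South", 7, 9],
]
tuples = cell_tuples(table)
```

Each resulting tuple, such as `("North", "2018", 12)`, carries the header context needed to interpret the cell on its own.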

Type: Application

Filed: March 9, 2018

Publication date: September 12, 2019

Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen

Unified text analytics annotator development life cycle combining rule-based and machine learning based techniques

Patent number: 10289963

Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine

Type: Grant

Filed: February 27, 2017

Date of Patent: May 14, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan

Hybrid parallelization strategies for machine learning programs on top of mapreduce

Patent number: 10228922

Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.
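A minimal sketch of the partitioning described here, under strong simplifying assumptions: iterations are split into contiguous tasks, and the access pattern is assumed to be "iteration i reads row i", so the data is partitioned by rows to match the tasks. No MapReduce framework is involved, and the names `partition_iterations` and `partition_rows` are hypothetical; the patent's plan selection is not reproduced.

```python
def partition_iterations(num_iterations, num_tasks):
    """Split indices 0..num_iterations-1 into contiguous tasks.

    Every task holds at least one iteration when num_iterations >= 1.
    """
    num_tasks = max(1, min(num_tasks, num_iterations))
    base, extra = divmod(num_iterations, num_tasks)
    tasks, start = [], 0
    for t in range(num_tasks):
        size = base + (1 if t < extra else 0)   # spread the remainder over early tasks
        tasks.append(range(start, start + size))
        start += size
    return tasks


def partition_rows(matrix, tasks):
    """Give each task only the rows it reads, assuming iteration i reads row i."""
    return [[matrix[i] for i in task] for task in tasks]


matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
tasks = partition_iterations(len(matrix), num_tasks=2)
row_blocks = partition_rows(matrix, tasks)

# Each block could now be processed independently; here the loop body sums a row.
row_sums = [sum(row) for block in row_blocks for row in block]
```

Because the iterations are independent and each task receives exactly the rows it touches, the blocks could be handed to separate workers without sharing the full matrix.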

Type: Grant

Filed: January 12, 2016

Date of Patent: March 12, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan

UNIFIED TEXT ANALYTICS ANNOTATOR DEVELOPMENT LIFE CYCLE COMBINING RULE-BASED AND MACHINE LEARNING BASED TECHNIQUES

Publication number: 20180246867

Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine

Type: Application

Filed: February 27, 2017

Publication date: August 30, 2018

Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan

Maximizing the information content of system logs

Patent number: 9665625

Abstract: In a method for maximizing information content of logs, a log message from an executing software program is received. The log message includes a timestamp, a source code location ID, a session ID, and a log message text. The timestamp, the source code location ID, and the session ID of the log message are stored in a lossless buffer. A hash function value of the session ID is determined. It is determined that the hash function value of the session ID is less than a hash value threshold. The log message text is stored in a session buffer in response to determining that the hash function value of the session ID is less than the hash value threshold, wherein the session buffer contains log message texts of log messages with corresponding hash function values less than the hash value threshold.
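The buffering scheme in this abstract can be sketched as follows: metadata for every message goes into a lossless buffer, while message text is kept only for sessions whose hashed ID falls below a threshold. The choice of SHA-256 truncated to 32 bits, the class name `LogSampler`, and the in-memory lists are assumptions for illustration; the abstract does not name a hash function or storage layout.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash value range


def session_hash(session_id):
    """Stable hash of a session ID, mapped into [0, HASH_SPACE)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class LogSampler:
    def __init__(self, threshold):
        self.threshold = threshold
        self.lossless_buffer = []   # (timestamp, location ID, session ID) for every message
        self.session_buffer = []    # (session ID, text) only for sampled sessions

    def receive(self, timestamp, location_id, session_id, text):
        self.lossless_buffer.append((timestamp, location_id, session_id))
        if session_hash(session_id) < self.threshold:
            self.session_buffer.append((session_id, text))


# Hypothetical log messages: (timestamp, source code location ID, session ID, text).
messages = [
    (1, "Foo.java:10", "session-a", "start"),
    (2, "Foo.java:20", "session-b", "start"),
    (3, "Foo.java:30", "session-a", "done"),
]

sampler = LogSampler(threshold=HASH_SPACE // 2)  # keeps text for roughly half of all sessions
for message in messages:
    sampler.receive(*message)
```

Because the decision depends only on the session ID, a sampled session keeps all of its message texts and an unsampled one keeps none, while the lossless buffer still records the timing and location of every message.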

Type: Grant

Filed: June 25, 2014

Date of Patent: May 30, 2017

Assignee: International Business Machines Corporation

Inventors: Frederick R. Reiss, Saeed Ghanbari, Prithviraj Sen

HYBRID PARALLELIZATION STRATEGIES FOR MACHINE LEARNING PROGRAMS ON TOP OF MAPREDUCE

Publication number: 20160124730

Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.

Type: Application

Filed: January 12, 2016

Publication date: May 5, 2016

Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan

Hybrid parallelization strategies for machine learning programs on top of MapReduce

Patent number: 9286044

Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.

Type: Grant

Filed: June 27, 2014

Date of Patent: March 15, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan

HYBRID PARALLELIZATION STRATEGIES FOR MACHINE LEARNING PROGRAMS ON TOP OF MAPREDUCE

Publication number: 20150378696

Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.

Type: Application

Filed: June 27, 2014

Publication date: December 31, 2015

Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan

MAXIMIZING THE INFORMATION CONTENT OF SYSTEM LOGS

Publication number: 20150379008

Abstract: In a method for maximizing information content of logs, a log message from an executing software program is received. The log message includes a timestamp, a source code location ID, a session ID, and a log message text. The timestamp, the source code location ID, and the session ID of the log message are stored in a lossless buffer. A hash function value of the session ID is determined. It is determined that the hash function value of the session ID is less than a hash value threshold. The log message text is stored in a session buffer in response to determining that the hash function value of the session ID is less than the hash value threshold, wherein the session buffer contains log message texts of log messages with corresponding hash function values less than the hash value threshold.

Type: Application

Filed: June 25, 2014

Publication date: December 31, 2015

Inventors: Frederick R. Reiss, Saeed Ghanbari, Prithviraj Sen
