Patents by Inventor Prasan Roy

Prasan Roy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method for synthetic data generation for query workloads

Patent number: 9785657

Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p^*(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, with the sum taken over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and Z is a normalization factor. This distribution is determined such that the desired cardinality, and the selectivities for each node v derived from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution over a domain of attributes (x) of the plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.

Type: Grant

Filed: September 13, 2014

Date of Patent: October 10, 2017

Assignee: International Business Machines Corporation

Inventors: Atreyee Dey, Prasan Roy
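
The maximum entropy step described above can be illustrated with a small, hedged sketch. It assumes a single toy table over a tiny enumerable domain, treats each node's predicate conjunct f_v as a 0/1 function, and fits the weights w_v by coordinate-wise iterative scaling. That solver is a standard technique swapped in for illustration, not necessarily the patented procedure, and the names `fit_max_entropy`, `sample_rows`, the domain, predicates and target selectivities are all hypothetical.

```python
import math
import random
from itertools import product

def fit_max_entropy(domain, predicates, targets, sweeps=200):
    """Fit weights w_v so that p*(x) = exp(sum_v w_v * f_v(x)) / Z gives each
    0/1 predicate f_v its target selectivity (coordinate-wise iterative scaling)."""
    weights = [0.0] * len(predicates)

    def distribution():
        scores = [math.exp(sum(w * f(x) for w, f in zip(weights, predicates)))
                  for x in domain]
        z = sum(scores)                      # normalization factor Z
        return [s / z for s in scores]

    for _ in range(sweeps):
        for v, (f, target) in enumerate(zip(predicates, targets)):
            probs = distribution()
            current = sum(p for p, x in zip(probs, domain) if f(x))
            # For a single binary feature this update makes its selectivity
            # exactly equal to the target (other predicates may drift slightly).
            weights[v] += math.log(target * (1 - current) / (current * (1 - target)))
    return weights, distribution()

def sample_rows(domain, probs, n, seed=0):
    """Generate n synthetic tuples by sampling the fitted distribution."""
    return random.Random(seed).choices(domain, weights=probs, k=n)

# Hypothetical two-attribute table (a, b) with subplan predicates and targets.
domain = list(product(range(4), range(4)))
predicates = [lambda x: x[0] < 2,
              lambda x: x[1] == 0,
              lambda x: x[0] < 2 and x[1] == 0]   # conjunct at a parent node
targets = [0.3, 0.2, 0.1]
weights, probs = fit_max_entropy(domain, predicates, targets)
rows = sample_rows(domain, probs, 1000)
```

Here the desired cardinalities are expressed directly as selectivities (cardinality divided by table size). A real multi-table workload would need join-aware domains rather than full enumeration.
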
Declarative specification of data integration workflows for execution on parallel processing platforms

Patent number: 9361323

Abstract: A system for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code is configured to perform at least one of data preparation and data analysis.

Type: Grant

Filed: October 4, 2011

Date of Patent: June 7, 2016

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sanjeev K. Gupta, Mukesh K. Mohania, Sriram K. Padmanabhan, Prasan Roy
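
A minimal sketch of the stage-linking and compilation idea follows, assuming that stages are linked when one stage's output dataset is another's input, that each dataset is a list of partitions, and that "compilation" means producing a callable that runs stages in dependency order with a thread pool standing in for the parallel platform. The `Stage` fields, `link`, and `compile_workflow` are hypothetical names, not an API from the patent or any product.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    op: Callable        # atomic operation applied to one partition
    input: str          # dataset the stage consumes
    output: str         # dataset the stage produces
    input_type: type    # declared data type of each input record

def link(stages):
    """Link producer -> consumer when a stage's output is another's input."""
    producers = {s.output: s for s in stages}
    return [(producers[s.input], s) for s in stages if s.input in producers]

def compile_workflow(stages):
    """Turn the declarative stages into a callable running in dependency order."""
    links = link(stages)
    order, done = [], set()
    while len(order) < len(stages):          # simple topological sort
        progressed = False
        for s in stages:
            deps = {a.name for a, b in links if b is s}
            if s.name not in done and deps <= done:
                order.append(s)
                done.add(s.name)
                progressed = True
        if not progressed:
            raise ValueError("workflow contains a cycle")

    def run(datasets, workers=4):
        data = dict(datasets)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for s in order:
                parts = data[s.input]
                for part in parts:           # enforce the declared input type
                    for rec in part:
                        if not isinstance(rec, s.input_type):
                            raise TypeError(f"{s.name}: expected {s.input_type.__name__}")
                # Apply the atomic operation to every partition in parallel.
                data[s.output] = list(pool.map(s.op, parts))
        return data
    return run

stages = [
    Stage("parse", lambda p: [int(x) for x in p], "raw", "ints", str),
    Stage("square", lambda p: [x * x for x in p], "ints", "squares", int),
]
run = compile_workflow(stages)
out = run({"raw": [["1", "2"], ["3"]]})
```

Because the order is derived from the links rather than the listing, the stages can be declared in any order, which is the main benefit of a declarative specification.
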
Declarative specification of data integration workflows for execution on parallel processing platforms

Patent number: 9317542

Abstract: A method for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code is configured to perform at least one of data preparation and data analysis.

Type: Grant

Filed: April 29, 2013

Date of Patent: April 19, 2016

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sanjeev K. Gupta, Mukesh K. Mohania, Sriram K. Padmanabhan, Prasan Roy
Method for synthetic data generation for query workloads

Patent number: 9244950

Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p^*(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, with the sum taken over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and Z is a normalization factor. This distribution is determined such that the desired cardinality, and the selectivities for each node v derived from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution over a domain of attributes (x) of the plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.

Type: Grant

Filed: July 3, 2013

Date of Patent: January 26, 2016

Assignee: International Business Machines Corporation

Inventors: Atreyee Dey, Prasan Roy
Matching transactions in multi-level records

Patent number: 9135289

Abstract: Identifying matching transactions. First and second log files contain operation records of transactions in a transaction workload, each file recording a respective execution of the transaction workload. A first record location in the first file and an associated window of a defined number of sequential second record locations in the second file are advanced one record location at a time. It is determined whether each operation record of a complete transaction at a first record location has a matching operation record at one of the record locations in the associated window of second record locations. If so, the complete transaction in the first file and the transaction that includes the matching operation records in the second file are identified as matching transactions.

Type: Grant

Filed: June 2, 2014

Date of Patent: September 15, 2015

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
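
The window-based matching can be sketched as follows. This is an illustration under stated assumptions, not the patented method: each log record is a hypothetical `(txn_id, operation)` pair, the window is assumed to extend `window` positions on each side of the current first-file location, and a transaction matches only when every one of its operations finds an unused match inside the window and all matches belong to a single second-file transaction.

```python
from collections import defaultdict

def match_transactions(file1, file2, window=4):
    """file1/file2: lists of (txn_id, operation) records from two executions
    of the same workload. Returns {txn_in_file1: txn_in_file2}."""
    positions = defaultdict(list)            # txn id -> its record positions in file1
    for i, (txn, _) in enumerate(file1):
        positions[txn].append(i)
    used = set()                             # file2 positions already matched
    matches = {}
    for txn, locs in positions.items():
        found = []
        for i in locs:                       # advance one record location at a time
            lo, hi = max(0, i - window), min(len(file2), i + window + 1)
            j = next((j for j in range(lo, hi)
                      if j not in used and j not in found
                      and file2[j][1] == file1[i][1]), None)
            if j is None:
                break                        # an operation has no match in the window
            found.append(j)
        # Match only if every record matched and all matches share one file2 txn.
        if len(found) == len(locs) and len({file2[j][0] for j in found}) == 1:
            used.update(found)
            matches[txn] = file2[found[0]][0]
    return matches

file1 = [("T1", "INSERT a"), ("T2", "SELECT b"), ("T1", "UPDATE c"), ("T2", "DELETE d")]
file2 = [("X2", "SELECT b"), ("X1", "INSERT a"), ("X2", "DELETE d"), ("X1", "UPDATE c")]
result = match_transactions(file1, file2, window=2)
```

The interleaving differs between the two executions, which is why a window (rather than position-by-position comparison) is needed; with a window that is too small, valid matches are missed.
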
Match window size for matching multi-level transactions between log files

Patent number: 9063944

Abstract: A predefined number of matches is identified between records in a first file and records in a second file. For these matches, the span of the actual range of record positions in the second file, relative to the positions of the operation records in the first file, within which all matches were found is determined. If the actual span is smaller than the span of a current defined range of record positions by at least a first threshold value, the span of the current defined range is decreased. If the actual span is within a second threshold value of the span of the current defined range, the span of the current defined range is increased. If more than a third threshold value of operation records in the first file are not matched to operation records in the second file, the span of the current defined range is also increased.

Type: Grant

Filed: February 21, 2013

Date of Patent: June 23, 2015

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
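
The three threshold rules can be expressed as a small adjustment function. This is a hedged sketch: the actual span is assumed to be the spread of position offsets (second-file position minus first-file position) over the matches, the window changes by a hypothetical fixed `step`, and the unmatched-records rule is assumed to take priority over the other two. None of these choices is stated in the abstract.

```python
def actual_span(match_pairs):
    """match_pairs: (position_in_file1, position_in_file2) for each match.
    Returns the width of the range of relative offsets that contained all matches."""
    offsets = [p2 - p1 for p1, p2 in match_pairs]
    return max(offsets) - min(offsets)

def adjust_span(current, actual, unmatched_fraction, t1, t2, t3, step=1, min_span=1):
    """Return the new span of the match window (hypothetical step policy)."""
    if unmatched_fraction > t3:
        return current + step                  # too many unmatched records: widen
    if current - actual >= t1:
        return max(min_span, current - step)   # matches sit well inside: shrink
    if current - actual <= t2:
        return current + step                  # matches reach the window edge: widen
    return current                             # span is about right: keep it
```

For example, with `t1=4`, `t2=1` and `t3=0.05`, a window of span 10 shrinks to 9 when all matches fall within a span of 2, grows to 11 when they need a span of 9, and stays at 10 when they need a span of 7.
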
System and method for shared execution of mixed data flows

Patent number: 8984515

Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks are executed with respect to the relational operations and the at least one non-relational operation. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.

Type: Grant

Filed: May 31, 2012

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
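
One common kind of resource sharing opportunity is a shared scan, which the sketch below uses as an assumed example. Tasks reading the same input are grouped, each input is scanned once, and every record is fed to both relational tasks (modelled here as filters) and non-relational tasks (modelled as user-defined functions that emit output rows). The task tuple layout, `shared_execute`, and the sample data are hypothetical, and the tasks run sequentially here to keep the sharing logic visible.

```python
from collections import defaultdict

def shared_execute(tasks, sources):
    """tasks: list of (task_name, source_name, kind, fn), where kind is
    'relational' (fn is a row predicate) or 'udf' (fn maps a row to output rows).
    sources: source_name -> zero-argument function that scans the source."""
    by_source = defaultdict(list)             # sharing opportunity: same input
    for task in tasks:
        by_source[task[1]].append(task)
    results = {name: [] for name, *_ in tasks}
    scans = 0
    for source, group in by_source.items():
        scans += 1
        for row in sources[source]():         # one scan shared by the whole group
            for name, _, kind, fn in group:
                if kind == "relational":
                    if fn(row):
                        results[name].append(row)
                else:
                    results[name].extend(fn(row))
    return results, scans

reads = defaultdict(int)

def scan_orders():
    reads["orders"] += 1                      # count physical reads of the input
    return [{"id": 1, "amt": 50}, {"id": 2, "amt": 150}]

tasks = [
    ("big_orders", "orders", "relational", lambda r: r["amt"] > 100),
    ("labels", "orders", "udf", lambda r: [f"order-{r['id']}"]),
]
results, scans = shared_execute(tasks, {"orders": scan_orders})
```

Both tasks are served by a single read of `orders`, whereas running them independently would scan it twice.
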
System and method for shared execution of mixed data flows

Patent number: 8984516

Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks are executed with respect to the relational operations and the at least one non-relational operation. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.

Type: Grant

Filed: May 10, 2013

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
Processing hierarchical data in a map-reduce framework

Patent number: 8959519

Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. Performing the map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job, and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.

Type: Grant

Filed: August 29, 2012

Date of Patent: February 17, 2015

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sriram K. Padmanabhan, Prasan Roy
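
The cost-based choice between partitioning the data and redefining the job can be sketched on a toy job that counts child items by type in nested records. Everything specific here is an assumption for illustration: the record shape, the in-memory `map_reduce` helper, and both cost formulas (partitioning is charged one shuffled pair per child; redefining is charged one pre-aggregated pair per distinct type per record plus the largest record, as a straggler penalty).

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    """Minimal in-memory map-reduce: map, group by key, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {k: reducer(vs) for k, vs in groups.items()}

def count_child_types(records):
    sizes = [len(r["children"]) for r in records]
    # Hypothetical cost models, not taken from the patent.
    partition_cost = sum(sizes)
    redefine_cost = (sum(len({c["type"] for c in r["children"]}) for r in records)
                     + max(sizes))
    if partition_cost <= redefine_cost:
        # Partition the data: flatten the hierarchy into independent child items.
        items = [c for r in records for c in r["children"]]
        return map_reduce(items, lambda c: [(c["type"], 1)], sum), "partition"

    # Redefine the job: map whole records and pre-aggregate inside each one.
    def mapper(record):
        local = defaultdict(int)
        for c in record["children"]:
            local[c["type"]] += 1
        return local.items()
    return map_reduce(records, mapper, sum), "redefine"

small = [{"children": [{"type": "a"}, {"type": "b"}, {"type": "a"}]},
         {"children": [{"type": "a"}]}]
uniform = [{"children": [{"type": "a"}] * 10} for _ in range(10)]
```

Both strategies return the same counts; only the estimated cost decides which one runs. For `small` the flattened job is cheaper (4 versus 6), while for `uniform` pre-aggregating whole records wins (20 versus 100).
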
METHOD FOR SYNTHETIC DATA GENERATION FOR QUERY WORKLOADS

Publication number: 20150012522

Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p^*(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, with the sum taken over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and Z is a normalization factor. This distribution is determined such that the desired cardinality, and the selectivities for each node v derived from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution over a domain of attributes (x) of the plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.

Type: Application

Filed: July 3, 2013

Publication date: January 8, 2015

Inventors: Atreyee Dey, Prasan Roy
METHOD FOR SYNTHETIC DATA GENERATION FOR QUERY WORKLOADS

Publication number: 20150012523

Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p^*(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, with the sum taken over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and Z is a normalization factor. This distribution is determined such that the desired cardinality, and the selectivities for each node v derived from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution over a domain of attributes (x) of the plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.

Type: Application

Filed: September 13, 2014

Publication date: January 8, 2015

Inventors: Atreyee Dey, Prasan Roy
MATCHING TRANSACTIONS IN MULTI-LEVEL RECORDS

Publication number: 20140279945

Abstract: Identifying matching transactions. First and second log files contain operation records of transactions in a transaction workload, each file recording a respective execution of the transaction workload. A first record location in the first file and an associated window of a defined number of sequential second record locations in the second file are advanced one record location at a time. It is determined whether each operation record of a complete transaction at a first record location has a matching operation record at one of the record locations in the associated window of second record locations. If so, the complete transaction in the first file and the transaction that includes the matching operation records in the second file are identified as matching transactions.

Type: Application

Filed: June 2, 2014

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
MATCH WINDOW SIZE FOR MATCHING MULTI-LEVEL TRANSACTIONS BETWEEN LOG FILES

Publication number: 20140236976

Abstract: A predefined number of matches is identified between records in a first file and records in a second file. For these matches, the span of the actual range of record positions in the second file, relative to the positions of the operation records in the first file, within which all matches were found is determined. If the actual span is smaller than the span of a current defined range of record positions by at least a first threshold value, the span of the current defined range is decreased. If the actual span is within a second threshold value of the span of the current defined range, the span of the current defined range is increased. If more than a third threshold value of operation records in the first file are not matched to operation records in the second file, the span of the current defined range is also increased.

Type: Application

Filed: February 21, 2013

Publication date: August 21, 2014

Applicant: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
Matching transactions in multi-level records

Patent number: 8788471

Abstract: A method for identifying matching transactions between two log files, where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in one log file is compared in turn to an advancing window of records in the other log file. A first table contains associations of statements to transactions and of transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file with potential transactions of the records in the window is added to a second table. When an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.

Type: Grant

Filed: May 30, 2012

Date of Patent: July 22, 2014

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
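
The two-table bookkeeping can be sketched as below, under several assumptions that the abstract does not fix: records are hypothetical `(txn_id, statement)` pairs, `"COMMIT"` marks the end of a transaction, the first table keeps every second-file record that has entered the window (no eviction, so the window only bounds lookahead), and the best match is the candidate with the highest fraction of shared statements. The scoring rule in particular is an illustrative choice, not the patented one.

```python
from collections import Counter, defaultdict

END = "COMMIT"  # hypothetical end-of-transaction statement

def match_with_tables(file1, file2, window=3):
    """file1/file2: lists of (txn_id, statement). Returns {txn1: txn2}."""
    stmt_to_txns = defaultdict(set)    # table 1: statement -> file2 transactions
    txn_to_stmts = defaultdict(list)   # table 1: file2 transaction -> statements
    seen = 0                           # file2 records already added to table 1
    partial = defaultdict(Counter)     # table 2: txn1 -> candidate txn2 match counts
    length1 = Counter()                # statements read so far per file1 transaction
    matches, taken = {}, set()
    for i, (txn1, stmt) in enumerate(file1):
        hi = min(len(file2), i + window + 1)
        while seen < hi:               # advance the window, updating table 1
            t2, s2 = file2[seen]
            stmt_to_txns[s2].add(t2)
            txn_to_stmts[t2].append(s2)
            seen += 1
        length1[txn1] += 1
        for t2 in stmt_to_txns.get(stmt, ()):   # record potential matches in table 2
            partial[txn1][t2] += 1
        if stmt == END:
            candidates = [t2 for t2 in partial[txn1] if t2 not in taken]
            if candidates:
                # Best match: largest fraction of statements the two share.
                best = max(candidates, key=lambda t2: partial[txn1][t2]
                           / max(length1[txn1], len(txn_to_stmts[t2])))
                matches[txn1] = best
                taken.add(best)
    return matches

file1 = [("T1", "INSERT a"), ("T2", "SELECT b"), ("T1", "UPDATE c"),
         ("T1", "COMMIT"), ("T2", "COMMIT")]
file2 = [("X2", "SELECT b"), ("X1", "INSERT a"), ("X1", "UPDATE c"),
         ("X2", "COMMIT"), ("X1", "COMMIT")]
result = match_with_tables(file1, file2)
```

Unlike the earlier window sketch, the decision is deferred until the end-of-transaction record, so partial evidence from several candidate transactions can be weighed before committing to a match.
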
Matching transactions in multi-level records

Patent number: 8788473

Abstract: A method for identifying matching transactions between two log files, where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in one log file is compared in turn to an advancing window of records in the other log file. A first table contains associations of statements to transactions and of transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file with potential transactions of the records in the window is added to a second table. When an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.

Type: Grant

Filed: May 31, 2013

Date of Patent: July 22, 2014

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
Systems and methods for processing hierarchical data in a map-reduce framework

Patent number: 8677366

Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. Performing the map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job, and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.

Type: Grant

Filed: May 31, 2011

Date of Patent: March 18, 2014

Assignee: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sriram K. Padmanabhan, Prasan Roy
MATCHING TRANSACTIONS IN MULTI-LEVEL RECORDS

Publication number: 20130325826

Abstract: A method for identifying matching transactions between two log files, where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in one log file is compared in turn to an advancing window of records in the other log file. A first table contains associations of statements to transactions and of transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file with potential transactions of the records in the window is added to a second table. When an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.

Type: Application

Filed: May 30, 2012

Publication date: December 5, 2013

Applicant: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
SYSTEM AND METHOD FOR SHARED EXECUTION OF MIXED DATA FLOWS

Publication number: 20130326538

Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks are executed with respect to the relational operations and the at least one non-relational operation. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.

Type: Application

Filed: May 31, 2012

Publication date: December 5, 2013

Applicant: International Business Machines Corporation

Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
SYSTEM AND METHOD FOR SHARED EXECUTION OF MIXED DATA FLOWS

Publication number: 20130326534

Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks are executed with respect to the relational operations and the at least one non-relational operation. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.

Type: Application

Filed: May 10, 2013

Publication date: December 5, 2013

Applicant: International Business Machines Corporation

Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
MATCHING TRANSACTIONS IN MULTI-LEVEL RECORDS

Publication number: 20130325829

Abstract: A method for identifying matching transactions between two log files, where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in one log file is compared in turn to an advancing window of records in the other log file. A first table contains associations of statements to transactions and of transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file with potential transactions of the records in the window is added to a second table. When an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.

Type: Application

Filed: May 31, 2013

Publication date: December 5, 2013

Applicant: International Business Machines Corporation

Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
