Patents by Inventor Prasan Roy
Prasan Roy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9785657
Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution over the attributes (x) is determined as $p(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, summed over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality, and selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
Type: Grant
Filed: September 13, 2014
Date of Patent: October 10, 2017
Assignee: International Business Machines Corporation
Inventors: Atreyee Dey, Prasan Roy
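As an illustration of the sampling step in the abstract above, the following minimal Python sketch evaluates a distribution of the form p(x) = exp(sum over v of w_v * f_v(x)) / Z on a tiny discrete domain and draws rows from it. The domain, the two predicates, and the weights are hypothetical values chosen for the example. Fitting the weights so that the desired cardinalities are met is not shown.

```python
import itertools
import math
import random


def max_entropy_distribution(domain, predicates, weights):
    """Return {x: p(x)} with p(x) = exp(sum_v w_v * f_v(x)) / Z."""
    scores = {
        x: math.exp(sum(w * f(x) for f, w in zip(predicates, weights)))
        for x in domain
    }
    z = sum(scores.values())  # normalization factor Z
    return {x: s / z for x, s in scores.items()}


# Hypothetical domain: every (a, b) pair with a in {0, 1, 2} and b in {0, 1}.
domain = list(itertools.product(range(3), range(2)))

# Hypothetical predicates f_v: 1 if the tuple satisfies the conjunct, else 0.
predicates = [
    lambda x: 1 if x[0] >= 1 else 0,
    lambda x: 1 if x[0] >= 1 and x[1] == 0 else 0,
]
weights = [0.5, -1.0]  # assumed weights w_v, not fitted to any workload

p = max_entropy_distribution(domain, predicates, weights)

# Sample synthetic rows from the distribution (seeded for repeatability).
rng = random.Random(0)
rows = rng.choices(list(p), weights=list(p.values()), k=5)
```

In a fuller treatment the weights would be adjusted iteratively until the expected selectivity of each predicate matches the value derived from the desired cardinality; the sketch only covers evaluation and sampling.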
-
Patent number: 9361323
Abstract: A system for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code is configured to perform at least one of data preparation and data analysis.
Type: Grant
Filed: October 4, 2011
Date of Patent: June 7, 2016
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sanjeev K. Gupta, Mukesh K. Mohania, Sriram K. Padmanabhan, Prasan Roy
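The stage-and-link idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stage` class, the rule that two stages are linked when one stage's output type equals another's input type, and the stage names are all hypothetical, and the "compilation" here merely derives an execution order rather than generating code for a parallel platform.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Stage:
    name: str
    operation: str    # the atomic operation
    input_type: str   # data type of the stage's input
    output_type: str  # data type of the stage's output


def link_stages(stages):
    """Assumed rule: link a -> b when a's output type equals b's input type."""
    return [
        (a.name, b.name)
        for a in stages
        for b in stages
        if a.name != b.name and a.output_type == b.input_type
    ]


def execution_order(stages, links):
    """Order stages so that every stage runs after the stages feeding it."""
    predecessors = {s.name: set() for s in stages}
    for src, dst in links:
        predecessors[dst].add(src)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical declarative specification, listed in no particular order.
spec = [
    Stage("clean", "filter_nulls", "raw_rows", "clean_rows"),
    Stage("load", "read_csv", "file_path", "raw_rows"),
    Stage("summarize", "aggregate", "clean_rows", "summary"),
]
links = link_stages(spec)
order = execution_order(spec, links)
```

Because the specification is declarative, the order in which stages are listed does not matter; the workflow order is recovered from the links.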
-
Patent number: 9317542
Abstract: A method for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code is configured to perform at least one of data preparation and data analysis.
Type: Grant
Filed: April 29, 2013
Date of Patent: April 19, 2016
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sanjeev K. Gupta, Mukesh K. Mohania, Sriram K. Padmanabhan, Prasan Roy
-
Patent number: 9244950
Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution over the attributes (x) is determined as $p(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, summed over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality, and selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
Type: Grant
Filed: July 3, 2013
Date of Patent: January 26, 2016
Assignee: International Business Machines Corporation
Inventors: Atreyee Dey, Prasan Roy
-
Patent number: 9135289
Abstract: Identifying matching transactions. First and second log files contain operation records of transactions in a transaction workload, each file recording a respective execution of the transaction workload. A first record location in the first file and an associated window of a defined number of sequential second record locations in the second file are advanced one record location at a time. Whether each operation record of a complete transaction at a first record location has a matching operation record at one of the record locations in the associated window of second record locations is determined. If so, the complete transaction in the first file and the transaction that includes the matching operation records in the second file are identified as matching transactions.
Type: Grant
Filed: June 2, 2014
Date of Patent: September 15, 2015
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
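The windowed comparison described in the abstract above might look roughly like the following Python sketch. It is a simplification under assumptions not stated in the source: records are modeled as `(transaction_id, statement)` pairs, two records match when their statements are equal, the window is centered near the current position, and a transaction counts as matched only if all of its records match records of a single transaction in the second log.

```python
from collections import defaultdict


def match_transactions(first, second, window=4):
    """Map transaction ids in `first` to matching transaction ids in `second`.

    Both logs are lists of (transaction_id, statement) pairs.
    """
    records_per_txn = defaultdict(int)
    for txn, _ in first:
        records_per_txn[txn] += 1

    hits = defaultdict(list)  # first-log txn -> second-log txn ids it matched
    used = set()              # second-log positions already consumed

    for i, (txn, stmt) in enumerate(first):
        # The window advances one record location at a time along with i.
        lo = max(0, i - window // 2)
        hi = min(len(second), lo + window)
        for j in range(lo, hi):
            if j not in used and second[j][1] == stmt:
                used.add(j)
                hits[txn].append(second[j][0])
                break

    # Keep transactions whose every record matched the same second-log txn.
    return {
        txn: ids[0]
        for txn, ids in hits.items()
        if len(ids) == records_per_txn[txn] and len(set(ids)) == 1
    }


# Hypothetical logs: the same workload executed twice, interleaved differently.
first_log = [("A1", "INSERT x"), ("A2", "UPDATE y"),
             ("A1", "DELETE z"), ("A2", "SELECT w")]
second_log = [("B7", "UPDATE y"), ("B9", "INSERT x"),
              ("B7", "SELECT w"), ("B9", "DELETE z")]

matches = match_transactions(first_log, second_log, window=4)
```

A window that is too narrow misses records that were reordered between the two executions, which is why the window size matters for this kind of comparison.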
-
Patent number: 9063944
Abstract: A predefined number of matches is identified between records in a first file and records in a second file. For the matches, determine the span of the actual range of record positions in the second file relative to the positions of the operation records in the first file within which all matches were found. If the actual span is smaller than the span of a current defined range of record positions by at least a first threshold value, decrease the span of the current defined range. If the actual span is within a second threshold value of the span of the current defined range, increase the span of the current defined range. If an amount above a third threshold value of operation records in the first file are not matched to operation records in the second file, increase the span of the current defined range.
Type: Grant
Filed: February 21, 2013
Date of Patent: June 23, 2015
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
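The three threshold rules in the abstract above can be expressed as a small Python function. This is a sketch only: the function name, the parameter names, the default threshold values, and the fixed adjustment `step` are assumptions made for the example, since the abstract does not say by how much the span changes.

```python
def adjust_span(current_span, offsets, unmatched, *,
                shrink_gap=4, grow_gap=1, max_unmatched=2,
                step=2, min_span=1):
    """Return the new span of the defined range of record positions.

    offsets:   for each match, position in the second file minus position
               in the first file.
    unmatched: number of first-file records that found no match.
    """
    actual_span = max(offsets) - min(offsets) + 1 if offsets else 0
    gap = current_span - actual_span

    if unmatched > max_unmatched:  # third threshold: too many misses
        return current_span + step
    if gap <= grow_gap:            # second threshold: matches near the edge
        return current_span + step
    if gap >= shrink_gap:          # first threshold: range is wider than needed
        return max(min_span, current_span - step)
    return current_span


# Matches cluster tightly, so the range shrinks.
shrunk = adjust_span(10, [-1, 0, 1], unmatched=0)
# Matches fill the whole range, so the range grows.
grown = adjust_span(10, [-4, 0, 5], unmatched=0)
```

Checking the unmatched count first is a design choice of this sketch: a range that misses many records should widen even if the matches it did find happen to cluster.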
-
Patent number: 8984516
Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks relative to the relational operations and the at least one non-relational operation are executed. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.
Type: Grant
Filed: May 10, 2013
Date of Patent: March 17, 2015
Assignee: International Business Machines Corporation
Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
-
Patent number: 8984515
Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks relative to the relational operations and the at least one non-relational operation are executed. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.
Type: Grant
Filed: May 31, 2012
Date of Patent: March 17, 2015
Assignee: International Business Machines Corporation
Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
-
Patent number: 8959519
Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
Type: Grant
Filed: August 29, 2012
Date of Patent: February 17, 2015
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sriram K. Padmanabhan, Prasan Roy
-
Publication number: 20150012522
Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution over the attributes (x) is determined as $p(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, summed over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality, and selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
Type: Application
Filed: July 3, 2013
Publication date: January 8, 2015
Inventors: Atreyee Dey, Prasan Roy
-
Publication number: 20150012523
Abstract: Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution over the attributes (x) is determined as $p(x) = \frac{1}{Z}\exp\left(\sum_{v} w_v f_v(x)\right)$, summed over the nodes v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality, and selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
Type: Application
Filed: September 13, 2014
Publication date: January 8, 2015
Inventors: Atreyee Dey, Prasan Roy
-
Publication number: 20140279945
Abstract: Identifying matching transactions. First and second log files contain operation records of transactions in a transaction workload, each file recording a respective execution of the transaction workload. A first record location in the first file and an associated window of a defined number of sequential second record locations in the second file are advanced one record location at a time. Whether each operation record of a complete transaction at a first record location has a matching operation record at one of the record locations in the associated window of second record locations is determined. If so, the complete transaction in the first file and the transaction that includes the matching operation records in the second file are identified as matching transactions.
Type: Application
Filed: June 2, 2014
Publication date: September 18, 2014
Applicant: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
-
Publication number: 20140236976
Abstract: A predefined number of matches is identified between records in a first file and records in a second file. For the matches, determine the span of the actual range of record positions in the second file relative to the positions of the operation records in the first file within which all matches were found. If the actual span is smaller than the span of a current defined range of record positions by at least a first threshold value, decrease the span of the current defined range. If the actual span is within a second threshold value of the span of the current defined range, increase the span of the current defined range. If an amount above a third threshold value of operation records in the first file are not matched to operation records in the second file, increase the span of the current defined range.
Type: Application
Filed: February 21, 2013
Publication date: August 21, 2014
Applicant: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
-
Patent number: 8788471
Abstract: A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
Type: Grant
Filed: May 30, 2012
Date of Patent: July 22, 2014
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
-
Patent number: 8788473
Abstract: A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
Type: Grant
Filed: May 31, 2013
Date of Patent: July 22, 2014
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
-
Patent number: 8677366
Abstract: Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
Type: Grant
Filed: May 31, 2011
Date of Patent: March 18, 2014
Assignee: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Himanshu Gupta, Rajeev Gupta, Sriram K. Padmanabhan, Prasan Roy
-
Publication number: 20130325826
Abstract: A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
Type: Application
Filed: May 30, 2012
Publication date: December 5, 2013
Applicant: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan
-
Publication number: 20130326534
Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks relative to the relational operations and the at least one non-relational operation are executed. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.
Type: Application
Filed: May 10, 2013
Publication date: December 5, 2013
Applicant: International Business Machines Corporation
Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
-
Publication number: 20130326538
Abstract: A method, computer program product, and computer system for shared execution of mixed data flows, performed by one or more computing devices, comprises identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation. The plurality of parallel tasks relative to the relational operations and the at least one non-relational operation are executed. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across the relational operations and the at least one non-relational operation.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Applicant: International Business Machines Corporation
Inventors: Rajeev Gupta, Padmashree Ravindra, Prasan Roy
-
Publication number: 20130325829
Abstract: A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
Type: Application
Filed: May 31, 2013
Publication date: December 5, 2013
Applicant: International Business Machines Corporation
Inventors: Manoj K. Agarwal, Curt L. Cotner, Amitava Kundu, Prasan Roy, Rajesh Sambandhan