Patents by Inventor Artur M. Gruszecki

Artur M. Gruszecki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PARALLEL PREPARATION OF A QUERY EXECUTION PLAN IN A MASSIVELY PARALLEL PROCESSING ENVIRONMENT BASED ON GLOBAL AND LOW-LEVEL STATISTICS

Publication number: 20170147640

Abstract: In an approach to preparing a query execution plan, a host node receives a query implicating one or more data tables. The host node broadcasts one or more implicated data tables to one or more processing nodes. The host node receives a set of node-specific query execution plans and execution cost estimates associated with each of the node-specific query execution plans, which have been prepared in parallel based on global statistics and node-specific low level statistics. The host node selects an optimal query execution plan based on minimized execution cost.

Type: Application

Filed: November 23, 2015

Publication date: May 25, 2017

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz K. Stradomski
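
The selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes each processing node has already returned one candidate plan with a cost estimate. The names NodePlan and select_plan are hypothetical and do not come from the application; how the nodes derive their estimates from global and low-level statistics is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePlan:
    """A node-specific plan and its estimated cost (hypothetical structure)."""
    node_id: str
    plan: str
    estimated_cost: float


def select_plan(candidates):
    """Host-side step: pick the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("no candidate plans received")
    return min(candidates, key=lambda c: c.estimated_cost)


if __name__ == "__main__":
    received = [
        NodePlan("node-1", "hash join, then filter", 120.0),
        NodePlan("node-2", "filter, then hash join", 45.5),
        NodePlan("node-3", "nested loop join", 900.0),
    ]
    print(select_plan(received))
```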

APPROXIMATE STRING MATCHING OPTIMIZATION FOR A DATABASE

Publication number: 20170124147

Abstract: Software for processing a database query that includes: (i) receiving a query of a database including a search value; (ii) determining a distance between the search value and at least one reference value; (iii) determining a maximum distance from the search value to be used in searching a plurality of datasets of the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value; (iv) determining a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and (v) performing approximate string matching for the search value on the subset of datasets.

Type: Application

Filed: October 29, 2015

Publication date: May 4, 2017

Inventors: Michal Bodziony, Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski
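
One way to read the pruning idea in this abstract is through the triangle inequality: if an entry is within k edits of the search value, its distance to a reference value can differ from the search value's distance to that reference by at most k. The Python sketch below assumes Levenshtein distance, a single reference value, and a caller-supplied tolerance k. All function names are hypothetical, and the application may derive the maximum distance differently.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def distance_range(entries, ref):
    """Min and max distance of a dataset's entries to the reference value."""
    dists = [levenshtein(e, ref) for e in entries]
    return min(dists), max(dists)


def fuzzy_search(datasets, ref, query, k):
    """Return (matches, scanned dataset names) for entries within k edits."""
    d_q = levenshtein(query, ref)
    lo, hi = d_q - k, d_q + k          # search range around the query
    matches, scanned = [], []
    for name, entries in datasets.items():
        # Assumption: a real system would store this range as a statistic.
        dmin, dmax = distance_range(entries, ref)
        if dmax < lo or dmin > hi:     # ranges do not overlap: skip dataset
            continue
        scanned.append(name)
        matches += [e for e in entries if levenshtein(e, query) <= k]
    return matches, scanned
```

Only datasets whose stored distance range overlaps the search range are scanned, so the expensive string comparison runs on a subset of the data.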

EARLY DIAGNOSIS OF HARDWARE, SOFTWARE OR CONFIGURATION PROBLEMS IN DATA WAREHOUSE SYSTEM UTILIZING GROUPING OF QUERIES BASED ON QUERY PARAMETERS

Publication number: 20170123871

Abstract: A method, system and computer program product for providing early diagnosis of hardware, software or configuration problems in a data warehouse system. A received query is parsed to determine the properties of the query. The query may then be joined to existing groups of queries if those groups have shared properties of the query. After executing the query according to an execution plan, results from the execution of the query are received, which may include problem(s) that occurred during execution of the query. For those problems that reach a pre-defined threshold of becoming a “group problem” in those groups joined by the query, the problem is reported to the end user concerning those groups where the problem exceeds the pre-defined threshold. In this manner, an early diagnosis of the problems in the data warehouse system that can cause delay and failure of the processing of queries is able to occur.

Type: Application

Filed: October 28, 2015

Publication date: May 4, 2017

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Bartlomiej T. Malecki, Konrad K. Skibski, Tomasz Stradomski
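
The grouping and threshold logic in this abstract might be sketched as follows. This Python example assumes that each query property (for instance a table name or join type) defines one group, and that a problem becomes a group problem once it has been seen a fixed number of times in that group. The QueryGroups class and its threshold rule are illustrative assumptions, not the method claimed in the application.

```python
from collections import defaultdict


class QueryGroups:
    """Counts problems per query group and flags group problems."""

    def __init__(self, threshold):
        self.threshold = threshold
        # group property -> problem code -> number of occurrences
        self.problem_counts = defaultdict(lambda: defaultdict(int))

    def record(self, properties, problems):
        """Register one executed query.

        properties: set of strings describing the parsed query.
        problems: problem codes observed while executing it.
        Returns the (group, problem) pairs that just reached the threshold.
        """
        reported = []
        for prop in sorted(properties):
            for problem in problems:
                self.problem_counts[prop][problem] += 1
                if self.problem_counts[prop][problem] == self.threshold:
                    reported.append((prop, problem))
        return reported
```

A problem seen once in an isolated query is not reported; it surfaces only when enough queries sharing a property hit the same problem.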

INTER-NODES MULTICASTING COMMUNICATION IN A MONITORING INFRASTRUCTURE

Publication number: 20170093660

Abstract: A method for determining when to send monitoring data to a server within a monitoring infrastructure. The method includes a first agent computer collecting a first instance of monitoring data relating to an alert on a computer system, wherein the collecting is based, at least in part, on a set of instructions received from a monitoring server, wherein the set of instructions includes instructions for determining whether the monitoring data is relevant to triggering the alert. The first agent then receives at least one second instance of monitoring data from a set of second agents. The first agent then determines whether the first instance of monitoring data is relevant to triggering the alert based, at least in part, on the first instance of monitoring data and the second instance of monitoring data. The first agent then determines whether to send the first instance of monitoring data to the monitoring server.

Type: Application

Filed: September 29, 2015

Publication date: March 30, 2017

Inventors: Michal Bodziony, Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski

Avoidance of intermediate data skew in a massive parallel processing environment

Patent number: 9569494

Abstract: A computer-implemented method for minimizing join operation processing time within a database system based on estimated joined table spread of the database system has been provided. The computer-implemented method includes estimating value distribution of data in a joined table, wherein the joined table is a result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.

Type: Grant

Filed: June 24, 2014

Date of Patent: February 14, 2017

Assignee: International Business Machines Corporation

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski
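
The boundary determination in this abstract can be approximated with a short Python sketch. It assumes the estimated value distribution of the joined table is available as a mapping from attribute value to estimated row count, and it closes a range each time the cumulative count reaches the next equal share. The function names are hypothetical, and the patent's actual estimation and assignment procedures are not reproduced here.

```python
from bisect import bisect_left


def equal_row_boundaries(histogram, parts):
    """Return inclusive upper boundaries for the first parts-1 ranges.

    histogram: dict mapping attribute value -> estimated rows in the
    joined table (assumed input, see the note above).
    """
    items = sorted(histogram.items())
    total = sum(count for _, count in items)
    boundaries = []
    running = 0
    for value, count in items:
        running += count
        next_target = total * (len(boundaries) + 1) / parts
        if len(boundaries) < parts - 1 and running >= next_target:
            boundaries.append(value)
    return boundaries


def partition_of(value, boundaries):
    """Index of the processing unit whose range contains the value."""
    return bisect_left(boundaries, value)
```

Because ranges are cut by estimated row count rather than by value width, a heavily populated value range ends up on its own processing unit instead of overloading one that also holds many other values.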

Avoidance of intermediate data skew in a massive parallel processing environment

Patent number: 9569493

Abstract: A computer-implemented method for minimizing join operation processing time within a database system based on estimated joined table spread of the database system has been provided. The computer-implemented method includes estimating value distribution of data in a joined table, wherein the joined table is a result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.

Type: Grant

Filed: December 31, 2013

Date of Patent: February 14, 2017

Assignee: International Business Machines Corporation

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski

Method and system for estimating the size of a joined table

Patent number: 9460153

Abstract: A method, system, and/or computer program product estimate a cardinality of a joined table (T) obtained by joining at least a first data column (R) and a second data column (S), where R and S each comprise attribute values. A first density distribution function (f(x)) describes a frequency of the attribute values of R. A second density distribution function (g(x)) describes the frequency of the attribute values of S. A first information on values in R is based on a sample of values of R. A second information on values in S is based on a sample of values of S. One or more processors then estimate a cardinality of a joined table (T) based on the first and second density distribution function (f(x), g(x)) and the first and second information on values.

Type: Grant

Filed: October 14, 2013

Date of Patent: October 4, 2016

Assignee: International Business Machines Corporation

Inventors: Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski
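
For an equi-join on discrete values, a common textbook estimate of the joined table's cardinality is the sum over all values x of f(x) times g(x). The Python sketch below uses that estimate and builds f and g by scaling sample counts up to the full column size. This is an assumption chosen for illustration; the abstract does not state the formula the patent uses, and the function names are hypothetical.

```python
from collections import Counter


def scaled_frequencies(sample, column_size):
    """Estimate per-value frequencies of a column from a sample of it."""
    scale = column_size / len(sample)
    return {value: count * scale for value, count in Counter(sample).items()}


def estimate_join_cardinality(f, g):
    """Sum f(x) * g(x) over the values that occur in both columns."""
    return sum(freq * g[value] for value, freq in f.items() if value in g)


if __name__ == "__main__":
    f = scaled_frequencies([1, 1, 2, 3], column_size=8)  # column R
    g = scaled_frequencies([1, 2, 2, 4], column_size=4)  # column S
    print(estimate_join_cardinality(f, g))
```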

METHOD FOR PROCESSING A DATABASE QUERY

Publication number: 20160239538

Abstract: The invention relates to a computer-implemented method for processing a query in a database, the query comprising a search value. The database comprises a plurality of datasets, the datasets comprising entries, wherein distance statistics are assigned to the datasets. The distance statistics describe the minimum and maximum distance between the values of the entries of a dataset of the plurality of datasets and a reference value. The method comprises determining the distance between the search value and the reference value, said determination resulting in a search distance, determining a subset of datasets from the plurality of datasets for which the search distance is within the limits given by the minimum and maximum distances described by the respective distance statistics, and searching for the search value in the subset of datasets.

Type: Application

Filed: March 9, 2016

Publication date: August 18, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

METHOD FOR PROCESSING A DATABASE QUERY

Publication number: 20160239549

Abstract: The invention relates to a computer-implemented method for processing a query in a database, the query comprising a search value. The database comprises a plurality of datasets, the datasets comprising entries, wherein distance statistics are assigned to the datasets. The distance statistics describe the minimum and maximum distance between the values of the entries of a dataset of the plurality of datasets and a reference value. The method comprises determining the distance between the search value and the reference value, said determination resulting in a search distance, determining a subset of datasets from the plurality of datasets for which the search distance is within the limits given by the minimum and maximum distances described by the respective distance statistics, and searching for the search value in the subset of datasets.

Type: Application

Filed: February 13, 2015

Publication date: August 18, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

Functionality of decomposition data skew in asymmetric massively parallel processing databases

Patent number: 9355127

Abstract: Database queries are optimized through the functionality of decomposition data skew in an asymmetric massively parallel processing database system. A table having data skew is restructured by (1) storing original data values of a distribution key in a special switch column added to the table, (2) replacing the original data values of the distribution key with modified data values such as randomly generated data values, and (3) partitioning the rows across the nodes of the asymmetric massively parallel processing database system based on the distribution key. The original data values that are stored and replaced may only comprise a subset of the original data values that cause data skew in the table. Data skew is reduced, which improves performance, yet the original data values remain available, which reduces the impact on collocated joins.

Type: Grant

Filed: October 12, 2012

Date of Patent: May 31, 2016

Assignee: International Business Machines Corporation

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad Krzysztof Skibski, Tomasz Stradomski, Natalya A. Yanayt
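
The three restructuring steps in this abstract can be sketched in Python as follows. The example assumes rows are dictionaries, that a key value counts as skewed when it exceeds a given share of all rows, and that rows are placed on nodes by hashing the distribution key. The column names switch and node, the share threshold, and the random surrogate range are all illustrative assumptions; a real system would also need to keep surrogates from colliding with genuine key values.

```python
import random
from collections import Counter


def decompose_skew(rows, key, max_share, num_nodes, seed=0):
    """Move skewed key values into a switch column and redistribute rows."""
    rng = random.Random(seed)
    counts = Counter(row[key] for row in rows)
    skewed = {v for v, c in counts.items() if c / len(rows) > max_share}
    result = []
    for row in rows:
        new_row = dict(row, switch=None)
        if row[key] in skewed:
            new_row["switch"] = row[key]          # (1) keep the original value
            new_row[key] = rng.randrange(10**9)   # (2) random replacement key
        new_row["node"] = hash(new_row[key]) % num_nodes  # (3) partition
        result.append(new_row)
    return result


def original_key(row, key):
    """Recover the original distribution key of a restructured row."""
    return row["switch"] if row["switch"] is not None else row[key]
```

Only the values that cause skew are rewritten, so rows with well-distributed keys stay where a collocated join expects them.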

SELECTIVITY ESTIMATION FOR QUERY EXECUTION PLANNING IN A DATABASE

Publication number: 20160110419

Abstract: A computer-implemented method of estimating selectivity of a query may include generating, for data stored in a database in a memory, a one-dimensional value distribution for each of a plurality of attributes of the data. A multidimensional histogram may be generated, wherein the multidimensional histogram includes the one-dimensional value distributions for the plurality of attributes of the data. The multidimensional histogram may be converted to a one-dimensional histogram by assigning each bucket of the multidimensional histogram to corresponding buckets of the one-dimensional histogram and ordering the corresponding buckets according to a space-filling curve. One or more bucket ranges of the one-dimensional histogram may be determined by mapping the query conditions on the one-dimensional histogram. The selectivity of the query may be estimated by estimating how many data values in the one or more bucket ranges will meet the query conditions.

Type: Application

Filed: April 23, 2015

Publication date: April 21, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski
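
To make the histogram conversion in this abstract concrete, the Python sketch below flattens a two-dimensional bucket grid into a one-dimensional histogram ordered along a Z-order (Morton) curve. The choice of Z-order, the restriction to two attributes, and the assumption that query conditions align with bucket boundaries are simplifications made for illustration; the abstract only says "a space-filling curve" and also covers partially matching buckets.

```python
def morton(i, j, bits=16):
    """Interleave the bits of bucket indexes i and j into one Z-order code."""
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (2 * b)
        code |= ((j >> b) & 1) << (2 * b + 1)
    return code


def flatten(hist2d):
    """Map a 2-D bucket grid hist2d[i][j] to {Z-order code: count}."""
    return {morton(i, j): count
            for i, row in enumerate(hist2d)
            for j, count in enumerate(row)}


def bucket_ranges(i_range, j_range):
    """Contiguous Z-order code ranges covering an inclusive bucket rectangle."""
    codes = sorted(morton(i, j)
                   for i in range(i_range[0], i_range[1] + 1)
                   for j in range(j_range[0], j_range[1] + 1))
    ranges = []
    for code in codes:
        if ranges and code == ranges[-1][1] + 1:
            ranges[-1][1] = code          # extend the current run
        else:
            ranges.append([code, code])   # start a new run
    return [tuple(r) for r in ranges]


def selectivity(hist1d, ranges):
    """Fraction of all rows that fall into the given bucket ranges."""
    total = sum(hist1d.values())
    hit = sum(hist1d.get(code, 0)
              for lo, hi in ranges
              for code in range(lo, hi + 1))
    return hit / total
```

Buckets that are close in the attribute space tend to receive nearby codes, so a rectangular condition usually maps to a small number of ranges in the one-dimensional histogram.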

SELECTIVITY ESTIMATION FOR QUERY EXECUTION PLANNING IN A DATABASE

Publication number: 20160110426

Abstract: A computer-implemented method of estimating selectivity of a query may include generating, for data stored in a database in a memory, a one-dimensional value distribution for each of a plurality of attributes of the data. A multidimensional histogram may be generated, wherein the multidimensional histogram includes the one-dimensional value distributions for the plurality of attributes of the data. The multidimensional histogram may be converted to a one-dimensional histogram by assigning each bucket of the multidimensional histogram to corresponding buckets of the one-dimensional histogram and ordering the corresponding buckets according to a space-filling curve. One or more bucket ranges of the one-dimensional histogram may be determined by mapping the query conditions on the one-dimensional histogram. The selectivity of the query may be estimated by estimating how many data values in the one or more bucket ranges will meet the query conditions.

Type: Application

Filed: October 20, 2014

Publication date: April 21, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

OPTIMIZATION OF A PLURALITY OF TABLE PROCESSING OPERATIONS IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT

Publication number: 20160098447

Abstract: A computer-implemented method for partitioning data for a query operation of one table of the database system is provided. The computer-implemented method comprises estimating a value distribution of the attribute in the result table based on a first value distribution of the attribute in the first column of the first table. The computer-implemented method further comprises determining boundaries for partitioning ranges of the attribute, based on the estimated value distribution, wherein the partitioning ranges correspond to a same number of rows of the result table. The computer-implemented method further comprises partitioning the first table with processing nodes of the query operation, based on the determined boundaries of partitioning ranges.

Type: Application

Filed: October 3, 2014

Publication date: April 7, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

OPTIMIZATION OF A PLURALITY OF TABLE PROCESSING OPERATIONS IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT

Publication number: 20160098453

Abstract: A computer-implemented method for partitioning data for a query operation of one table of the database system is provided. The computer-implemented method comprises estimating a value distribution of the attribute in the result table based on a first value distribution of the attribute in the first column of the first table. The computer-implemented method further comprises determining boundaries for partitioning ranges of the attribute, based on the estimated value distribution, wherein the partitioning ranges correspond to a same number of rows of the result table. The computer-implemented method further comprises partitioning the first table with processing nodes of the query operation, based on the determined boundaries of partitioning ranges.

Type: Application

Filed: June 5, 2015

Publication date: April 7, 2016

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

DIRECTED BACKUP FOR MASSIVELY PARALLEL PROCESSING DATABASES

Publication number: 20150370651

Abstract: Creating a data backup of data on a first computer system to restore to a second computer system, each of the first and second computer system including one or more nodes, each node configured to manage a subset of the data. Receiving, by the first computer system, identification of data to back up and node configuration information for the second computer system. Creating, by the first computer system, a backup of the data from the one or more nodes of the first computer system, configured in accordance with the node configuration information of the second computer system, such that the backed up data is directly manageable by the one or more nodes of the second computer system.

Type: Application

Filed: February 5, 2015

Publication date: December 24, 2015

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski
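
A simplified reading of this abstract is that the backup is laid out for the target system's node count rather than the source's. The Python sketch below assumes the only relevant node configuration information is the number of target nodes and that rows are placed by hashing a distribution key. Both are assumptions made for illustration, and directed_backup is a hypothetical name.

```python
def directed_backup(source_nodes, target_node_count, key):
    """Build one backup slice per target node from the source nodes' rows.

    source_nodes: list of row lists, one list per source node.
    Returns a list with target_node_count slices; slice n can be restored
    directly onto target node n under the hashing assumption above.
    """
    backup = [[] for _ in range(target_node_count)]
    for rows in source_nodes:
        for row in rows:
            backup[hash(row[key]) % target_node_count].append(row)
    return backup


if __name__ == "__main__":
    three_node_source = [
        [{"id": 0}, {"id": 3}],
        [{"id": 1}, {"id": 4}],
        [{"id": 2}, {"id": 5}],
    ]
    for n, rows in enumerate(directed_backup(three_node_source, 2, "id")):
        print(n, rows)
```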

DIRECTED BACKUP FOR MASSIVELY PARALLEL PROCESSING DATABASES

Publication number: 20150370647

Abstract: Creating a data backup of data on a first computer system to restore to a second computer system, each of the first and second computer system including one or more nodes, each node configured to manage a subset of the data. Receiving, by the first computer system, identification of data to back up and node configuration information for the second computer system. Creating, by the first computer system, a backup of the data from the one or more nodes of the first computer system, configured in accordance with the node configuration information of the second computer system, such that the backed up data is directly manageable by the one or more nodes of the second computer system.

Type: Application

Filed: June 24, 2014

Publication date: December 24, 2015

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz Stradomski

AVOIDANCE OF INTERMEDIATE DATA SKEW IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT

Publication number: 20150186466

Abstract: A computer-implemented method for minimizing join operation processing time within a database system based on estimated joined table spread of the database system has been provided. The computer-implemented method includes estimating value distribution of data in a joined table, wherein the joined table is a result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.

Type: Application

Filed: June 24, 2014

Publication date: July 2, 2015

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski

AVOIDANCE OF INTERMEDIATE DATA SKEW IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT

Publication number: 20150186465

Abstract: A computer-implemented method for minimizing join operation processing time within a database system based on estimated joined table spread of the database system has been provided. The computer-implemented method includes estimating value distribution of data in a joined table, wherein the joined table is a result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.

Type: Application

Filed: December 31, 2013

Publication date: July 2, 2015

Applicant: International Business Machines Corporation

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski

OPTIMIZING AN ORDER OF EXECUTION OF MULTIPLE JOIN OPERATIONS

Publication number: 20140156635

Abstract: A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of the first data column splits the first data column into first subsets of rows. A second partitioning of the second data column splits the second data column into second subsets of rows. A first value frequency information indicates a frequency of attribute values within a subset of rows of the first data column processed. A second value frequency information indicates a frequency of attribute values within a subset of rows of the second data column. Cardinalities of sub-tables derived by a respective joining of the subsets of rows of the first and second data columns are estimated, based on the first and second value frequency information. An order of execution of multiple join operations is then optimized based on the estimated cardinalities of the sub-tables.

Type: Application

Filed: November 11, 2013

Publication date: June 5, 2014

Applicant: International Business Machines Corporation

Inventors: Marek Grochowski, Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski

METHOD AND SYSTEM FOR ESTIMATING THE SIZE OF A JOINED TABLE

Publication number: 20140149388

Abstract: A method, system, and/or computer program product estimate a cardinality of a joined table (T) obtained by joining at least a first data column (R) and a second data column (S), where R and S each comprise attribute values. A first density distribution function (f(x)) describes a frequency of the attribute values of R. A second density distribution function (g(x)) describes the frequency of the attribute values of S. A first information on values in R is based on a sample of values of R. A second information on values in S is based on a sample of values of S. One or more processors then estimate a cardinality of a joined table (T) based on the first and second density distribution function (f(x), g(x)) and the first and second information on values.

Type: Application

Filed: October 14, 2013

Publication date: May 29, 2014

Applicant: International Business Machines Corporation

Inventors: Artur M. Gruszecki, Tomasz Kazalski, Grzegorz S. Milka, Konrad K. Skibski, Tomasz Stradomski
