Patents by Inventor Peter Jay Haas

Peter Jay Haas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method for Estimating the Number of Distinct Values in a Partitioned Dataset

Publication number: 20090192980

Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.

Type: Application

Filed: January 30, 2008

Publication date: July 30, 2009

Applicant: International Business Machines Corporation

Inventors: Kevin Scott Beyer, Rainer Gemulla, Peter Jay Haas, Berthold Reinwald, John Sismanis
INCREMENTAL CARDINALITY ESTIMATION FOR A SET OF DATA VALUES

Publication number: 20090150421

Abstract: A system, an article, and a computer program product for estimating a cardinality value for a set of data values. In one embodiment, the system includes means for initializing a data structure for representing an array of counts; means for obtaining a data value from said set of data values; means for transforming said data value into a transformed string; means for modifying said data structure with said transformed string; means for obtaining a summary statistic value from said modified data structure, wherein the summary statistic value is based on the array of counts; and means for generating said estimated cardinality value using said summary statistic value.

Type: Application

Filed: November 26, 2008

Publication date: June 11, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Walid Rjaibi, Peter Jay Haas
Flexible, efficient and scalable sampling

Patent number: 7543006

Abstract: A sampling infrastructure/scheme that supports flexible, efficient, scalable and uniform sampling is disclosed. A sample is maintained in a compact histogram form while the sample footprint stays below a specified upper bound. If, at any point, the sample footprint exceeds the upper bound, then the compact representation is abandoned and the sample is purged to obtain a subsample. The histogram of the purged subsample is expanded to a bag of values while sampling remaining data values of the partitioned subset. The expanded purged subsample is converted to a histogram and uniform random samples are yielded. The sampling scheme retains the bounded footprint property and to a partial degree the compact representation of the Concise Sampling scheme, while ensuring statistical uniformity. Samples from at least two partitioned subsets are merged on demand to yield uniform merged samples of combined partitions wherein the merged samples also maintain the histogram representation and bounded footprint property.

Type: Grant

Filed: August 31, 2006

Date of Patent: June 2, 2009

Assignee: International Business Machines Corporation

Inventors: Paul Geoffrey Brown, Peter Jay Haas
Consistent histogram maintenance using query feedback

Patent number: 7512574

Abstract: A novel method is employed for collecting optimizer statistics for optimizing database queries by gathering feedback from the query execution engine about the observed cardinality of predicates and constructing and maintaining multidimensional histograms. This makes use of the correlation between data columns without employing an inefficient data scan. The maximum entropy principle is used to approximate the true data distribution by a histogram distribution that is as “simple” as possible while being consistent with the observed predicate cardinalities. Changes in the underlying data are readily adapted to, automatically detecting and eliminating inconsistent feedback information in an efficient manner. The size of the histogram is controlled by retaining only the most “important” feedback.

Type: Grant

Filed: September 30, 2005

Date of Patent: March 31, 2009

Assignee: International Business Machines Corporation

Inventors: Peter Jay Haas, Volker Gerhard Markl, Nimrod Megiddo, Utkarsh Srivastava
Consistent and unbiased cardinality estimation for complex queries with conjuncts of predicates

Patent number: 7512629

Abstract: The present invention provides a method of selectivity estimation in which preprocessing steps improve the feasibility and efficiency of the estimation. The preprocessing steps are partitioning (to make iterative scaling estimation terminate in a reasonable time for even large sets of predicates), forced partitioning (to enable partitioning in case there are no “natural” partitions, by finding the subsets of predicates to create partitions that least impact the overall solution); inconsistency resolution (in order to ensure that there always is a correct and feasible solution), and implied zero elimination (to ensure convergence of the iterative scaling computation under all circumstances). All of these preprocessing steps make a maximum entropy method of selectivity estimation produce a correct cardinality model, for any kind of query with conjuncts of predicates. In addition, the preprocessing steps can also be used in conjunction with prior art methods for building a cardinality model.

Type: Grant

Filed: July 13, 2006

Date of Patent: March 31, 2009

Assignee: International Business Machines Corporation

Inventors: Peter Jay Haas, Marcel Kutsch, Volker Gerhard Markl, Nimrod Megiddo
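
The iterative scaling computation mentioned in the abstract above can be illustrated with a small, generic sketch. This is not the patented method and it omits all of the preprocessing steps; it only shows how a maximum entropy estimate for a conjunction of predicates might be obtained from a few known selectivities. The function name, the input values, and the number of sweeps are assumptions made for illustration.

```python
from itertools import product

def max_entropy_selectivity(known, n, sweeps=500):
    """Estimate the selectivity of the conjunction of all n predicates.

    known maps a frozenset of predicate indices to its observed selectivity.
    """
    # One "atom" per truth assignment of the n predicates.
    atoms = list(product((0, 1), repeat=n))
    # A uniform start means the fixed point is the maximum entropy solution.
    weight = {a: 1.0 / len(atoms) for a in atoms}

    def satisfies(atom, preds):
        return all(atom[i] == 1 for i in preds)

    for _ in range(sweeps):
        for preds, target in known.items():
            current = sum(w for a, w in weight.items() if satisfies(a, preds))
            # Rescale both sides of the constraint so the weights still sum to 1.
            # Assumes 0 < current < 1; atoms forced to zero need separate handling.
            inside = target / current
            outside = (1.0 - target) / (1.0 - current)
            for a in atoms:
                weight[a] *= inside if satisfies(a, preds) else outside
    return weight[tuple([1] * n)]

# Hypothetical feedback: three single-predicate selectivities and one pair.
known = {
    frozenset({0}): 0.10,
    frozenset({1}): 0.20,
    frozenset({2}): 0.25,
    frozenset({0, 1}): 0.05,
}
estimate = max_entropy_selectivity(known, 3)
```

Because nothing is known about how the third predicate relates to the other two, the estimate treats it as independent of the observed pair, giving 0.05 × 0.25.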
Incremental cardinality estimation for a set of data values

Patent number: 7496584

Abstract: A method for incrementally maintaining column cardinality estimates in database management systems. In one embodiment, the system includes a system catalog table containing a cardinality estimate for a column that is extended to include an appropriate data structure. A modified linear counting technique is used in a first embodiment of a method for column cardinality estimation. The cardinality estimate is produced by an initial scan of the data but is then further maintained without requiring a full scan of the data. Data changes are reflected incrementally in modifications to the initial cardinality estimate, keeping the cardinality statistics more current with respect to the database condition. The technique of the invention typically provides a capability for a database management system to produce more efficient search plans providing more effective responses to user queries through the use of improved cardinality statistics.

Type: Grant

Filed: August 8, 2006

Date of Patent: February 24, 2009

Assignee: International Business Machines Corporation

Inventors: Walid Rjaibi, Peter Jay Haas
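
As a rough illustration of the linear counting idea named in the abstract above, the sketch below uses the classic estimator m · ln(m / empty buckets). The abstract does not spell out the patent's modifications, so keeping a per-bucket count (so that deletes can be applied without a rescan), the SHA-256 hash, and the class and method names are all assumptions.

```python
import hashlib
import math

class LinearCounter:
    """Classic linear counting, with per-bucket counts (an assumption) for deletes."""

    def __init__(self, num_buckets=1024):
        self.counts = [0] * num_buckets

    def _bucket(self, value):
        # Deterministic hash of the value's repr; any well-mixed hash would do.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.counts)

    def insert(self, value):
        self.counts[self._bucket(value)] += 1

    def delete(self, value):
        # Only valid for values that were inserted earlier.
        b = self._bucket(value)
        if self.counts[b] > 0:
            self.counts[b] -= 1

    def estimate(self):
        m = len(self.counts)
        empty = sum(1 for c in self.counts if c == 0)
        if empty == 0:
            return float("inf")  # structure is saturated; use more buckets
        return m * math.log(m / empty)

counter = LinearCounter()
for v in range(300):
    counter.insert(v)
    counter.insert(v)          # every value appears twice
before = counter.estimate()
for v in range(300):
    counter.delete(v)          # remove one copy of each value
after_one_delete = counter.estimate()
for v in range(300):
    counter.delete(v)          # remove the remaining copies
after_all_deletes = counter.estimate()
```

Removing one copy of each value leaves the estimate unchanged, because every occupied bucket still has a positive count; removing the last copies empties the structure.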
ENTITY-BASED BUSINESS INTELLIGENCE

Publication number: 20090006349

Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.

Type: Application

Filed: June 5, 2008

Publication date: January 1, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
ENTITY-BASED BUSINESS INTELLIGENCE

Publication number: 20090006331

Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.

Type: Application

Filed: June 29, 2007

Publication date: January 1, 2009

Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
METHOD, SYSTEM AND PROGRAM FOR PRIORITIZING MAINTENANCE OF DATABASE TABLES

Publication number: 20080228831

Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.

Type: Application

Filed: March 31, 2008

Publication date: September 18, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
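
The abstract above says only that the new interval is based on the initial interval and a randomizing factor. A minimal sketch of one way this could look follows; multiplying the two, drawing the factor uniformly, and the `jitter` parameter are assumptions, not details taken from the application.

```python
import random

def next_maintenance_interval(initial_interval, jitter=0.25, rng=random):
    """Return a randomized interval within +/- jitter of the initial one."""
    # Hypothetical choice: a uniform factor in [1 - jitter, 1 + jitter].
    factor = rng.uniform(1.0 - jitter, 1.0 + jitter)
    return initial_interval * factor

rng = random.Random(0)
intervals = [next_maintenance_interval(60.0, rng=rng) for _ in range(100)]
```

Spreading intervals this way would keep tables that share the same initial interval from all coming due for maintenance at the same moment.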
SYSTEM AND METHOD FOR UPDATING DATABASE STATISTICS ACCORDING TO QUERY FEEDBACK

Publication number: 20080133454

Abstract: An autonomic tool that supervises the collection and maintenance of database statistics for query optimization by transparently deciding what statistics to gather, when and in what detail to gather them. Feedback from data-driven statistics collection is simultaneously combined with feedback from query-driven learning-based statistics collection, to better process both rapidly changing data and data that is queried frequently. The invention monitors table activity and decides if the data in a table has changed sufficiently to require a refresh of invalid statistics. The invention determines if the invalidity is due to correlation between purportedly independent data, outdated statistics, or statistics that have too few frequent values. Tables and column groups are ranked in order of statistical invalidity, and a limited computational budget is prioritized by ranking subsequent gathering of improved statistics.

Type: Application

Filed: October 29, 2004

Publication date: June 5, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: VOLKER G. MARKL, PETER JAY HAAS, ASHRAF ISMAIL ABOULNAGA, VIJAYSHANKAR RAMAN, FELIX ENDRES
Method, system and program for prioritizing maintenance of database tables

Patent number: 7363324

Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.

Type: Grant

Filed: December 17, 2004

Date of Patent: April 22, 2008

Assignee: International Business Machines Corporation

Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
FLEXIBLE, EFFICIENT AND SCALABLE SAMPLING

Publication number: 20080059540

Abstract: A sampling infrastructure/scheme that supports flexible, efficient, scalable and uniform sampling is disclosed. A sample is maintained in a compact histogram form while the sample footprint stays below a specified upper bound. If, at any point, the sample footprint exceeds the upper bound, then the compact representation is abandoned and the sample is purged to obtain a subsample. The histogram of the purged subsample is expanded to a bag of values while sampling remaining data values of the partitioned subset. The expanded purged subsample is converted to a histogram and uniform random samples are yielded. The sampling scheme retains the bounded footprint property and to a partial degree the compact representation of the Concise Sampling scheme, while ensuring statistical uniformity. Samples from at least two partitioned subsets are merged on demand to yield uniform merged samples of combined partitions wherein the merged samples also maintain the histogram representation and bounded footprint property.

Type: Application

Filed: August 31, 2006

Publication date: March 6, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: PAUL GEOFFREY BROWN, PETER JAY HAAS
QUERY FEEDBACK-BASED CONFIGURATION OF DATABASE STATISTICS

Publication number: 20080046455

Abstract: A method is disclosed for automatically configuring database statistics by: collecting information from a database system, the database information including data query feedback; consolidating and formatting the database information into a plurality of intervals; converting the plurality of intervals into a plurality of non-overlapping buckets; computing frequencies for the buckets by solving a constrained maximum entropy problem to create a proxy data distribution function; and using the proxy data distribution function to determine a set of statistics to maintain for the database information.

Type: Application

Filed: August 16, 2006

Publication date: February 21, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: ALEXANDER BEHM, PETER JAY HAAS, VOLKER GERHARD MARKL
CONSISTENT AND UNBIASED CARDINALITY ESTIMATION FOR COMPLEX QUERIES WITH CONJUNCTS OF PREDICATES

Publication number: 20080016097

Abstract: A method of selectivity estimation is disclosed in which preprocessing steps improve the feasibility and efficiency of the estimation. The preprocessing steps are: partitioning (to make iterative scaling estimation terminate in a reasonable time for even large sets of predicates); forced partitioning (to enable partitioning in case there are no “natural” partitions, by finding the subsets of predicates to create partitions that least impact the overall solution); inconsistency resolution (in order to ensure that there always is a correct and feasible solution); and implied zero elimination (to ensure convergence of the iterative scaling computation under all circumstances). All of these preprocessing steps make a maximum entropy method of selectivity estimation produce a correct cardinality model, for any kind of query with conjuncts of predicates. In addition, the preprocessing steps can also be used in conjunction with prior art methods for building a cardinality model.

Type: Application

Filed: July 13, 2006

Publication date: January 17, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: PETER JAY HAAS, MARCEL KUTSCH, VOLKER GERHARD MARKL, NIMROD MEGIDDO
Method for discovering undeclared and fuzzy rules in databases

Patent number: 7277873

Abstract: A scheme is used to automatically discover algebraic constraints between pairs of columns in relational data. The constraints may be “fuzzy” in that they hold for most, but not all, of the records, and the columns may be in the same table or different tables. The scheme first identifies candidate sets of column value pairs that are likely to satisfy an algebraic constraint. For each candidate, the scheme constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values. In query-optimization mode, the scheme automatically partitions the data into normal and exception records. During subsequent query processing, queries can be modified to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths. The results are then combined with the results of executing the original query against the (small) set of exception records.

Type: Grant

Filed: October 31, 2003

Date of Patent: October 2, 2007

Assignee: International Business Machines Corporation

Inventors: Paul Geoffrey Brown, Peter Jay Haas
Incremental cardinality estimation for a set of data values

Patent number: 7124146

Abstract: A technique is provided for incrementally maintaining column cardinality estimates in database management systems. The system catalog table containing a cardinality estimate for a column is extended to include an appropriate data structure. A modified linear counting technique is used in a first embodiment of a method for column cardinality estimation. Moreover, a modified logarithmic counting technique is used in a second, preferred embodiment of a column cardinality estimation method to reduce storage requirements for the data structure. The cardinality estimate is produced by an initial scan of the data but is then further maintained without requiring a full scan of the data. Data changes are reflected incrementally in modifications to the initial cardinality estimate, keeping the cardinality statistics more current with respect to the database condition.

Type: Grant

Filed: April 30, 2003

Date of Patent: October 17, 2006

Assignee: International Business Machines Corporation

Inventors: Walid Rjaibi, Peter Jay Haas
Method, system and program for prioritizing maintenance of database tables

Publication number: 20060136499

Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.

Type: Application

Filed: December 17, 2004

Publication date: June 22, 2006

Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
Efficient sampling of a relational database

Patent number: 6993516

Abstract: A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database.

Type: Grant

Filed: December 26, 2002

Date of Patent: January 31, 2006

Assignee: International Business Machines Corporation

Inventors: Peter Jay Haas, Guy Maring Lohman, Mir Hamid Pirahesh, David Everett Simmen, Ashutosh Vir Vikram Singh, Michael Jeffrey Winer, Markos Zaharioudakis
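
The two-level scheme in the abstract above (pages chosen with probability P, then rows within a chosen page with probability R, for an overall row probability Q = PR) can be sketched as below. The sketch does not show how P and R would be derived from catalog statistics or a pilot sample; the function name and the toy table are assumptions.

```python
import random

def bilevel_sample(pages, page_prob, row_prob, rng):
    """Bernoulli-sample pages, then Bernoulli-sample rows within kept pages."""
    sample = []
    for page in pages:
        if rng.random() < page_prob:          # keep this page with probability P
            for row in page:
                if rng.random() < row_prob:   # keep this row with probability R
                    sample.append(row)
    return sample

# Toy table: 200 pages of 50 rows each (10,000 rows in total).
pages = [[(p, r) for r in range(50)] for p in range(200)]
rng = random.Random(42)
sample = bilevel_sample(pages, page_prob=0.5, row_prob=0.2, rng=rng)
all_rows = bilevel_sample(pages, 1.0, 1.0, rng)
no_rows = bilevel_sample(pages, 0.0, 1.0, rng)
```

With P = 0.5 and R = 0.2 the expected sample size is Q × 10,000 = 1,000 rows, while only about half of the pages are read. A lower P reduces I/O at the cost of more correlation among sampled rows, which is the precision versus speed trade-off the abstract refers to.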
Efficient sampling of a relational database

Publication number: 20040128290

Abstract: A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database.

Type: Application

Filed: December 26, 2002

Publication date: July 1, 2004

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Peter Jay Haas, Guy Maring Lohman, Mir Hamid Pirahesh, David Everett Simmen, Ashutosh Vir Vikram Singh, Michael Jeffrey Winer, Markos Zaharioudakis
Estimation of column cardinality in a partitioned relational database

Patent number: 6732110

Abstract: The present invention is directed to a system, method and computer readable medium for estimating a column cardinality value for a column in a partitioned table stored in a plurality of nodes in a relational database. According to one embodiment of the present invention, a plurality of column values for the partitioned table stored in each node are hashed, and a hash data set for each node is generated. Each of the hash data sets from each node is transferred to a coordinator node designated from the plurality of nodes. The hash data sets are merged into a merged data set, and an estimated column cardinality value for the table is calculated from the merged data set.

Type: Grant

Filed: June 27, 2001

Date of Patent: May 4, 2004

Assignee: International Business Machines Corporation

Inventors: Walid Rjaibi, Guy Maring Lohman, Peter Jay Haas
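
The hash, transfer, and merge flow in the abstract above might be sketched as follows. The abstract does not say what form the per-node hash data set takes, so the bitmap representation, the bitwise-OR merge, the linear counting estimator, and all names here are assumptions.

```python
import hashlib
import math

NUM_BITS = 4096  # assumed bitmap size, shared by every node

def node_bitmap(column_values, num_bits=NUM_BITS):
    """Run on each node: hash local column values into a bitmap (a Python int)."""
    bitmap = 0
    for value in column_values:
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        bitmap |= 1 << (int.from_bytes(digest[:8], "big") % num_bits)
    return bitmap

def coordinator_estimate(bitmaps, num_bits=NUM_BITS):
    """Run on the coordinator: OR the bitmaps together, then apply linear counting."""
    merged = 0
    for b in bitmaps:
        merged |= b
    empty = num_bits - bin(merged).count("1")
    if empty == 0:
        return float("inf")  # bitmap saturated; a larger num_bits is needed
    return num_bits * math.log(num_bits / empty)

# Two partitions whose column values overlap on 400..599.
bitmap_a = node_bitmap(range(0, 600))
bitmap_b = node_bitmap(range(400, 1000))
estimate = coordinator_estimate([bitmap_a, bitmap_b])
single_node = coordinator_estimate([node_bitmap(range(0, 1000))])
```

Because the merge is a bitwise OR, values that occur on more than one node are counted once, and the merged result is identical to what a single node holding all the data would have produced.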
