Patents by Inventor Peter Jay Haas

Peter Jay Haas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090192980
    Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
    Type: Application
    Filed: January 30, 2008
    Publication date: July 30, 2009
    Applicant: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Rainer Gemulla, Peter Jay Haas, Berthold Reinwald, John Sismanis
  • Publication number: 20090150421
    Abstract: A system, an article, and a computer program product for estimating a cardinality value for a set of data values. In one embodiment, the system includes means for initializing a data structure for representing an array of counts; means for obtaining a data value from said set of data values; means for transforming said data value into a transformed string; means for modifying said data structure with said transformed string; means for obtaining a summary statistic value from said modified data structure, wherein the summary statistic value is based on the array of counts; and means for generating said estimated cardinality value using said summary statistic value.
    Type: Application
    Filed: November 26, 2008
    Publication date: June 11, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Walid Rjaibi, Peter Jay Haas
  • Patent number: 7543006
    Abstract: A sampling infrastructure/scheme that supports flexible, efficient, scalable and uniform sampling is disclosed. A sample is maintained in a compact histogram form while the sample footprint stays below a specified upper bound. If, at any point, the sample footprint exceeds the upper bound, then the compact representation is abandoned and the sample is purged to obtain a subsample. The histogram of the purged subsample is expanded to a bag of values while sampling remaining data values of the partitioned subset. The expanded purged subsample is converted to a histogram and uniform random samples are yielded. The sampling scheme retains the bounded footprint property and, to a partial degree, the compact representation of the Concise Sampling scheme, while ensuring statistical uniformity. Samples from at least two partitioned subsets are merged on demand to yield uniform merged samples of combined partitions wherein the merged samples also maintain the histogram representation and bounded footprint property.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: June 2, 2009
    Assignee: International Business Machines Corporation
    Inventors: Paul Geoffrey Brown, Peter Jay Haas
  • Patent number: 7512574
    Abstract: A novel method is employed for collecting optimizer statistics for optimizing database queries by gathering feedback from the query execution engine about the observed cardinality of predicates and constructing and maintaining multidimensional histograms. This makes use of the correlation between data columns without employing an inefficient data scan. The maximum entropy principle is used to approximate the true data distribution by a histogram distribution that is as “simple” as possible while being consistent with the observed predicate cardinalities. The method readily adapts to changes in the underlying data, automatically and efficiently detecting and eliminating inconsistent feedback information. The size of the histogram is controlled by retaining only the most “important” feedback.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: March 31, 2009
    Assignee: International Business Machines Corporation
    Inventors: Peter Jay Haas, Volker Gerhard Markl, Nimrod Megiddo, Utkarsh Srivastava
  • Patent number: 7512629
    Abstract: The present invention provides a method of selectivity estimation in which preprocessing steps improve the feasibility and efficiency of the estimation. The preprocessing steps are: partitioning (to make iterative scaling estimation terminate in a reasonable time for even large sets of predicates); forced partitioning (to enable partitioning in case there are no “natural” partitions, by finding the subsets of predicates to create partitions that least impact the overall solution); inconsistency resolution (in order to ensure that there always is a correct and feasible solution); and implied zero elimination (to ensure convergence of the iterative scaling computation under all circumstances). All of these preprocessing steps make a maximum entropy method of selectivity estimation produce a correct cardinality model, for any kind of query with conjuncts of predicates. In addition, the preprocessing steps can also be used in conjunction with prior art methods for building a cardinality model.
    Type: Grant
    Filed: July 13, 2006
    Date of Patent: March 31, 2009
    Assignee: International Business Machines Corporation
    Inventors: Peter Jay Haas, Marcel Kutsch, Volker Gerhard Markl, Nimrod Megiddo
  • Patent number: 7496584
    Abstract: A method for incrementally maintaining column cardinality estimates in database management systems. In one embodiment, a system catalog table containing a cardinality estimate for a column is extended to include an appropriate data structure. A modified linear counting technique is used in a first embodiment of a method for column cardinality estimation. The cardinality estimate is produced by an initial scan of the data but is then further maintained without requiring a full scan of the data. Data changes are reflected incrementally in modifications to the initial cardinality estimate, keeping the cardinality statistics more current with respect to the database condition. By using improved cardinality statistics, the technique typically enables a database management system to produce more efficient search plans and thus respond more effectively to user queries.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Walid Rjaibi, Peter Jay Haas
  • Publication number: 20090006349
    Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.
    Type: Application
    Filed: June 5, 2008
    Publication date: January 1, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
  • Publication number: 20090006331
    Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
  • Publication number: 20080228831
    Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
    Type: Application
    Filed: March 31, 2008
    Publication date: September 18, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
  • Publication number: 20080133454
    Abstract: An autonomic tool that supervises the collection and maintenance of database statistics for query optimization by transparently deciding what statistics to gather, and when and in what detail to gather them. Feedback from data-driven statistics collection is simultaneously combined with feedback from query-driven, learning-based statistics collection to better process both rapidly changing data and frequently queried data. The invention monitors table activity and decides whether the data in a table has changed sufficiently to require a refresh of invalid statistics. The invention determines whether the invalidity is due to correlation between purportedly independent data, outdated statistics, or statistics that have too few frequent values. Tables and column groups are ranked in order of statistical invalidity, and this ranking is used to prioritize the subsequent gathering of improved statistics within a limited computational budget.
    Type: Application
    Filed: October 29, 2004
    Publication date: June 5, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: VOLKER G. MARKL, PETER JAY HAAS, ASHRAF ISMAIL ABOULNAGA, VIJAYSHANKAR RAMAN, FELIX ENDRES
  • Patent number: 7363324
    Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: April 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
  • Publication number: 20080059540
    Abstract: A sampling infrastructure/scheme that supports flexible, efficient, scalable and uniform sampling is disclosed. A sample is maintained in a compact histogram form while the sample footprint stays below a specified upper bound. If, at any point, the sample footprint exceeds the upper bound, then the compact representation is abandoned and the sample is purged to obtain a subsample. The histogram of the purged subsample is expanded to a bag of values while sampling remaining data values of the partitioned subset. The expanded purged subsample is converted to a histogram and uniform random samples are yielded. The sampling scheme retains the bounded footprint property and, to a partial degree, the compact representation of the Concise Sampling scheme, while ensuring statistical uniformity. Samples from at least two partitioned subsets are merged on demand to yield uniform merged samples of combined partitions wherein the merged samples also maintain the histogram representation and bounded footprint property.
    Type: Application
    Filed: August 31, 2006
    Publication date: March 6, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: PAUL GEOFFREY BROWN, PETER JAY HAAS
  • Publication number: 20080046455
    Abstract: A method is disclosed for automatically configuring database statistics by: collecting information from a database system, the database information including data query feedback; consolidating and formatting the database information into a plurality of intervals; converting the plurality of intervals into a plurality of non-overlapping buckets; computing frequencies for the buckets by solving a constrained maximum entropy problem to create a proxy data distribution function; and using the proxy data distribution function to determine a set of statistics to maintain for the database information.
    Type: Application
    Filed: August 16, 2006
    Publication date: February 21, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: ALEXANDER BEHM, PETER JAY HAAS, VOLKER GERHARD MARKL
  • Publication number: 20080016097
    Abstract: A method of selectivity estimation is disclosed in which preprocessing steps improve the feasibility and efficiency of the estimation. The preprocessing steps are: partitioning (to make iterative scaling estimation terminate in a reasonable time for even large sets of predicates); forced partitioning (to enable partitioning in case there are no “natural” partitions, by finding the subsets of predicates to create partitions that least impact the overall solution); inconsistency resolution (in order to ensure that there always is a correct and feasible solution); and implied zero elimination (to ensure convergence of the iterative scaling computation under all circumstances). All of these preprocessing steps make a maximum entropy method of selectivity estimation produce a correct cardinality model, for any kind of query with conjuncts of predicates. In addition, the preprocessing steps can also be used in conjunction with prior art methods for building a cardinality model.
    Type: Application
    Filed: July 13, 2006
    Publication date: January 17, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: PETER JAY HAAS, MARCEL KUTSCH, VOLKER GERHARD MARKL, NIMROD MEGIDDO
  • Patent number: 7277873
    Abstract: A scheme is used to automatically discover algebraic constraints between pairs of columns in relational data. The constraints may be “fuzzy” in that they hold for most, but not all, of the records, and the columns may be in the same table or different tables. The scheme first identifies candidate sets of column value pairs that are likely to satisfy an algebraic constraint. For each candidate, the scheme constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values. In query-optimization mode, the scheme automatically partitions the data into normal and exception records. During subsequent query processing, queries can be modified to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths. The results are then combined with the results of executing the original query against the (small) set of exception records.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: October 2, 2007
    Assignee: International Business Machines Corporation
    Inventors: Paul Geoffrey Brown, Peter Jay Haas
  • Patent number: 7124146
    Abstract: A technique is provided for incrementally maintaining column cardinality estimates in database management systems. The system catalog table containing a cardinality estimate for a column is extended to include an appropriate data structure. A modified linear counting technique is used in a first embodiment of a method for column cardinality estimation. Moreover, a modified logarithmic counting technique is used in a second, preferred embodiment of a column cardinality estimation method to reduce storage requirements for the data structure. The cardinality estimate is produced by an initial scan of the data but is then further maintained without requiring a full scan of the data. Data changes are reflected incrementally in modifications to the initial cardinality estimate, keeping the cardinality statistics more current with respect to the database condition.
    Type: Grant
    Filed: April 30, 2003
    Date of Patent: October 17, 2006
    Assignee: International Business Machines Corporation
    Inventors: Walid Rjaibi, Peter Jay Haas
  • Publication number: 20060136499
    Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
    Type: Application
    Filed: December 17, 2004
    Publication date: June 22, 2006
    Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
  • Patent number: 6993516
    Abstract: A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database.
    Type: Grant
    Filed: December 26, 2002
    Date of Patent: January 31, 2006
    Assignee: International Business Machines Corporation
    Inventors: Peter Jay Haas, Guy Maring Lohman, Mir Hamid Pirahesh, David Everett Simmen, Ashutosh Vir Vikram Singh, Michael Jeffrey Winer, Markos Zaharioudakis
  • Publication number: 20040128290
    Abstract: A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database.
    Type: Application
    Filed: December 26, 2002
    Publication date: July 1, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peter Jay Haas, Guy Maring Lohman, Mir Hamid Pirahesh, David Everett Simmen, Ashutosh Vir Vikram Singh, Michael Jeffrey Winer, Markos Zaharioudakis
  • Patent number: 6732110
    Abstract: The present invention is directed to a system, method and computer readable medium for estimating a column cardinality value for a column in a partitioned table stored in a plurality of nodes in a relational database. According to one embodiment of the present invention, a plurality of column values for the partitioned table stored in each node are hashed, and a hash data set for each node is generated. Each of the hash data sets from each node is transferred to a coordinator node designated from the plurality of nodes. The hash data sets are merged into a merged data set, and an estimated column cardinality value for the table is calculated from the merged data set.
    Type: Grant
    Filed: June 27, 2001
    Date of Patent: May 4, 2004
    Assignee: International Business Machines Corporation
    Inventors: Walid Rjaibi, Guy Maring Lohman, Peter Jay Haas
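The combinable distinct-value synopses of publication 20090192980 can be illustrated with a k-minimum-values (KMV) synopsis; that choice is an assumption, since the abstract does not name the synopsis. In this sketch each partition keeps the k smallest hash values of its elements, mapped into [0, 1); the synopsis of a union is built from the two input synopses alone, and the distinct-value estimate is (k - 1) divided by the k-th smallest hash value. The function names and the BLAKE2 hash are hypothetical choices, and only the union operation is shown (no other multiset operations, no deletions).

```python
import hashlib
import heapq

def unit_hash(value):
    """Map a value to a pseudo-uniform number in [0, 1) (assumed hash choice)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64

def kmv_synopsis(values, k):
    """Keep the k smallest distinct hash values of one partition (ascending)."""
    return heapq.nsmallest(k, {unit_hash(v) for v in values})

def kmv_union(syn_a, syn_b, k):
    """Synopsis of the union of two partitions, computed from their synopses only."""
    return heapq.nsmallest(k, set(syn_a) | set(syn_b))

def kmv_estimate(synopsis, k):
    """Estimate the number of distinct values from a KMV synopsis."""
    if len(synopsis) < k:
        return len(synopsis)          # fewer than k distinct values: exact count
    return (k - 1) / synopsis[-1]     # (k - 1) / k-th smallest hash value
```

Because the k smallest hashes of a union are always among the k smallest hashes of each input, the combined synopsis is identical to one computed directly over the union, so the base partitions never have to be revisited.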
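Publication 20090150421 and patents 7496584 and 7124146 describe cardinality estimation from an array of counts that is maintained incrementally with a modified linear counting technique. The following is a minimal sketch of plain linear counting over counters rather than bits. It assumes (this is not stated in the abstracts) that keeping a per-bucket count of rows is what lets deletions be applied without rescanning the data. The class LinearCounter and its methods are hypothetical. With m buckets, of which Z still hold a zero count, the estimate is -m ln(Z / m).

```python
import hashlib
import math

class LinearCounter:
    """Linear-counting sketch over an array of counts (hypothetical class).

    Each bucket counts the rows whose value hashes to it, so a deleted row
    is handled by decrementing; the distinct-value estimate depends only on
    how many buckets are currently zero.
    """

    def __init__(self, m=4096):
        self.m = m
        self.counts = [0] * m

    def _bucket(self, value):
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.m

    def insert(self, value):
        self.counts[self._bucket(value)] += 1

    def delete(self, value):
        b = self._bucket(value)
        if self.counts[b] == 0:
            raise ValueError("deleting a value that was never inserted")
        self.counts[b] -= 1

    def estimate(self):
        zeros = sum(1 for c in self.counts if c == 0)
        if zeros == 0:
            raise OverflowError("array saturated; use a larger m")
        return -self.m * math.log(zeros / self.m)
```

After an initial scan that inserts every row, later inserts and deletes update the counts directly, so the estimate stays current without another full scan.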
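Patents 7512574 and 7512629 (and publication 20080016097) rely on a maximum entropy model fitted by iterative scaling. The sketch below, whose function names are hypothetical, treats each of the 2^n truth assignments of n predicates as an atom and repeatedly rescales the atoms so that each observed conjunct selectivity is matched while the total mass stays 1. It assumes every selectivity lies strictly between 0 and 1 and that the constraints are mutually consistent; the other cases are what the patented preprocessing steps (inconsistency resolution, implied zero elimination) address, and partitioning is not shown.

```python
def max_entropy_atoms(n, constraints, iters=500):
    """Fit a maximum-entropy distribution over the 2**n atoms of n predicates.

    `constraints` maps a frozenset of predicate indices to the observed
    selectivity of their conjunction. In atom `mask`, bit i is set when
    predicate i is true. Returns a dict mask -> probability.
    """
    atoms = {mask: 1.0 / 2**n for mask in range(2**n)}  # uniform start
    for _ in range(iters):
        for preds, target in constraints.items():
            need = sum(1 << i for i in preds)
            cur = sum(p for a, p in atoms.items() if (a & need) == need)
            for a in atoms:
                if (a & need) == need:   # atom satisfies the conjunct
                    atoms[a] *= target / cur
                else:                    # rescale the rest so the total stays 1
                    atoms[a] *= (1 - target) / (1 - cur)
    return atoms

def conjunct_selectivity(atoms, preds):
    """Selectivity of the conjunction of `preds` under the fitted model."""
    need = sum(1 << i for i in preds)
    return sum(p for a, p in atoms.items() if (a & need) == need)
```

With only the single-predicate selectivities 0.5 and 0.2, the fitted model gives their conjunction a selectivity of 0.1, i.e. the independence assumption; supplying an observed joint selectivity changes the model only for the predicates it involves.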
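Patent 7363324 (and publications 20080228831 and 20060136499) selects a new maintenance scheduling interval for a table from its initial interval and a randomizing factor. A minimal sketch, assuming the factor is drawn uniformly from [1 - jitter, 1 + jitter] and multiplies the initial interval; the abstract specifies neither the distribution nor how the two are combined, and the function name and the jitter parameter are hypothetical.

```python
import random

def next_maintenance_interval(initial_interval, jitter=0.2, rng=random):
    """Return a new scheduling interval: the initial interval scaled by a
    randomizing factor drawn uniformly from [1 - jitter, 1 + jitter]."""
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    factor = rng.uniform(1.0 - jitter, 1.0 + jitter)  # the randomizing factor
    return initial_interval * factor
```

For example, an initial interval of 60 with the default jitter yields a new interval somewhere between 48 and 72, in whatever time unit the scheduler uses.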
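The two-level sampling of patent 6993516 (publication 20040128290) keeps each page with probability P and each row of a kept page with probability R, so every row lands in the sample with probability Q = PR. The sketch below assumes a table represented as a list of pages, each a list of rows, and uses a hypothetical function name; it takes P and R as given and does not model how the patent derives them from precision targets, processing speed, catalog statistics, or a pilot sample.

```python
import random

def page_row_sample(pages, p, r, seed=None):
    """Two-level Bernoulli sample: keep each page with probability p, then
    each row of a kept page with probability r, so every row is included
    with probability q = p * r."""
    rng = random.Random(seed)
    sample = []
    for page in pages:
        if rng.random() < p:              # page-level decision
            for row in page:
                if rng.random() < r:      # row-level decision within the page
                    sample.append(row)
    return sample
```

For example, P = 0.1 and R = 0.5 give Q = 0.05, so a table of 100,000 rows yields about 5,000 sampled rows while only about a tenth of its pages are read.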
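Patent 6732110 hashes the column values stored on each node, ships the resulting hash data sets to a coordinator node, merges them, and estimates the column cardinality from the merged set. The sketch below assumes each hash data set is a linear-counting bitmap, so merging is a bitwise OR; the abstract does not say which structure is used, and the names M, node_bitmap, and coordinator_estimate are hypothetical.

```python
import hashlib
import math

M = 8192  # bitmap size shared by all nodes (assumed parameter)

def node_bitmap(column_values, m=M):
    """Hash one node's column values into an m-bit bitmap (stored as an int)."""
    bits = 0
    for v in column_values:
        digest = hashlib.blake2b(repr(v).encode(), digest_size=8).digest()
        bits |= 1 << (int.from_bytes(digest, "big") % m)
    return bits

def coordinator_estimate(bitmaps, m=M):
    """Merge the per-node bitmaps with bitwise OR and apply linear counting."""
    merged = 0
    for b in bitmaps:
        merged |= b
    zeros = m - bin(merged).count("1")
    if zeros == 0:
        raise OverflowError("bitmap saturated; use a larger m")
    return -m * math.log(zeros / m)
```

Because the OR of the per-node bitmaps equals the bitmap of the whole column, a value stored on several nodes is counted only once.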