Patents by Inventor Peter Jay Haas

Peter Jay Haas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200117732
    Abstract: Techniques for analysis of relationship consistency are provided. A plurality of relationships is extracted from a plurality of documents, and a binary matrix is generated based on the plurality of relationships. A first relationship, of the plurality of relationships, is identified to be verified. A score of the first relationship in the binary matrix is set to a predefined value. Further, a factorization is performed on the binary matrix to produce a first matrix and a second matrix. A first consistency score is calculated for the first relationship by multiplying at least a portion of the first matrix and a second matrix. The first consistency score is ranked as compared to at least one other consistency score associated with at least one other relationship of the plurality of relationships. Finally, an indication of the first relationship is provided, based on the ranking.
    Type: Application
    Filed: October 11, 2018
    Publication date: April 16, 2020
    Inventors: William Scott SPANGLER, Peter Jay HAAS, Alix LACOSTE, Meenakshi NAGARAJAN, Sheng Hua BAO, Feng WANG
  • Patent number: 8983879
    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.
    Type: Grant
    Filed: August 27, 2012
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Rainer Gemulla, Peter Jay Haas, John Sismanis
  • Patent number: 8903748
    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.
    Type: Grant
    Filed: June 27, 2011
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Rainer Gemulla, Peter Jay Haas, John Sismanis
  • Patent number: 8838648
    Abstract: A method and system for discovering keys in a database. A minimal set of non-keys of the database is found. The database includes at least two entities and at least two attributes. The minimal set of non-keys includes at least two non-keys. Each entity independently includes a value of each attribute. A set of keys of the database is generated from the minimal set of non-keys. Each key of the generated set of keys independently is a unitary key consisting of one attribute or a composite key consisting of at least two attributes.
    Type: Grant
    Filed: August 17, 2006
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: John Sismanis, Peter Jay Haas, Berthold Reinwald
  • Publication number: 20120330867
    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.
    Type: Application
    Filed: June 27, 2011
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rainer Gemulla, Peter Jay Haas, John Sismanis
  • Publication number: 20120331025
    Abstract: Systems and methods directed toward processing optimization problems using loss functions, wherein a loss function is decomposed into at least one stratum loss function, a loss is decreased for each stratum loss function to a predefined stratum loss threshold individually using gradient descent, and the overall loss is decreased to a predefined threshold for the loss function by appropriately ordering the processing of the strata and spending appropriate processing time in each stratum. Other embodiments and aspects are also described herein.
    Type: Application
    Filed: August 27, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rainer Gemulla, Peter Jay Haas, John Sismanis
  • Publication number: 20120254238
    Abstract: According to one embodiment of the present invention, a method for managing uncertain data is provided. The method includes specifying data uncertainty using at least one variable generation (VG) function. The VG function generates pseudorandom samples of uncertain data values. A random database based on the VG function is specified and multiple Monte Carlo instantiations of the random database are generated. Using a Monte Carlo method, a query is repeatedly executed over the multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query-results. The Monte Carlo method result may then be used to estimate statistical properties of a probability distribution of the query-result.
    Type: Application
    Filed: June 13, 2012
    Publication date: October 4, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peter Jay Haas, Ravindranath Jampani, Christopher Matthew Jermaine, Luis Leopoldo Perez, Mingxi Wu, Fei Xu
  • Patent number: 8234295
    Abstract: According to one embodiment of the present invention, a method for managing uncertain data is provided. The method includes specifying data uncertainty using at least one variable generation (VG) function, wherein the VG function generates pseudorandom samples of uncertain data values. A random database based on the VG function is specified, and multiple Monte Carlo instantiations of the random database are generated. Using a Monte Carlo method, a query is repeatedly executed over the multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query-results. The Monte Carlo method result may then be used to estimate statistical properties of a probability distribution of the query-result.
    Type: Grant
    Filed: June 3, 2009
    Date of Patent: July 31, 2012
    Assignees: International Business Machines Corporation, University of Florida Research Foundation, Inc.
    Inventors: Peter Jay Haas, Ravindranath Jampani, Christopher Matthew Jermaine, Luis Leopoldo Perez, Mingxi Wu, Fei Xu
  • Patent number: 8176088
    Abstract: A system, an article, and a computer program product for estimating a cardinality value for a set of data values. In one embodiment, the system includes means for initializing a data structure for representing an array of counts; means for obtaining a data value from said set of data values; means for transforming said data value into a transformed string; means for modifying said data structure with said transformed string; means for obtaining a summary statistic value from said modified data structure, wherein the summary statistic value is based on the array of counts; and means for generating said estimated cardinality value using said summary statistic value.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: May 8, 2012
    Assignee: International Business Machines Corporation
    Inventors: Walid Rjaibi, Peter Jay Haas
  • Patent number: 8140490
    Abstract: There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ashraf Ismail Aboulnaga, Peter Jay Haas, Sam Sampson Lightstone, Volker Gerhard Markl, Ivan Popivanov, Vijayshankar Raman
  • Patent number: 8140466
    Abstract: One embodiment of the present invention provides a method for incrementally maintaining a Bernoulli sample S with sampling rate q over a multiset R in the presence of update, delete, and insert transactions. The method includes processing items inserted into R using Bernoulli sampling and augmenting S with tracking counters during this processing. Items deleted from R are processed by using the tracking counters and by removing newly deleted items from S using a calculated probability while maintaining a degree of uniformity in S.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Rainer Gemulla, Peter Jay Haas, Wolfgang Lehner
  • Patent number: 7987177
    Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
    Type: Grant
    Filed: January 30, 2008
    Date of Patent: July 26, 2011
    Assignee: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Rainer Gemulla, Peter Jay Haas, Berthold Reinwald, John Sismanis
  • Patent number: 7979436
    Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.
    Type: Grant
    Filed: June 5, 2008
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
  • Publication number: 20100312775
    Abstract: According to one embodiment of the present invention, a method for managing uncertain data is provided. The method includes specifying data uncertainty using at least one variable generation (VG) function, wherein the VG function generates pseudorandom samples of uncertain data values. A random database based on the VG function is specified, and multiple Monte Carlo instantiations of the random database are generated. Using a Monte Carlo method, a query is repeatedly executed over the multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query-results. The Monte Carlo method result may then be used to estimate statistical properties of a probability distribution of the query-result.
    Type: Application
    Filed: June 3, 2009
    Publication date: December 9, 2010
    Applicant: International Business Machines Corporation
    Inventors: Peter Jay Haas, Ravindranath Jampani, Christopher Matthew Jermaine, Luis Leopoldo Perez, Mingxi Wu, Fei Xu
  • Patent number: 7831592
    Abstract: An autonomic tool that supervises the collection and maintenance of database statistics for query optimization by transparently deciding what statistics to gather, when and in what detail to gather them. Feedback from data-driven statistics collection is simultaneously combined with feedback from query-driven learning-based statistics collection, to better process both rapidly changing data and data that is queried frequently. The invention monitors table activity and decides if the data in a table has changed sufficiently to require a refresh of invalid statistics. The invention determines if the invalidity is due to correlation between purportedly independent data, outdated statistics, or statistics that have too few frequent values. Tables and column groups are ranked in order of statistical invalidity, and a limited computational budget is prioritized by ranking subsequent gathering of improved statistics.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: November 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Volker G. Markl, Peter Jay Haas, Ashraf Ismail Aboulnaga, Vijayashankar Raman, Felix Endres
  • Patent number: 7792856
    Abstract: A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table.
    Type: Grant
    Filed: June 29, 2007
    Date of Patent: September 7, 2010
    Assignee: International Business Machines Corporation
    Inventors: Ariel Fuxman, Peter Jay Haas, Berthold Reinwald, Yannis Sismanis, Ling Wang
  • Patent number: 7685086
    Abstract: A scheme is used to automatically discover algebraic constraints between pairs of columns in relational data. The constraints may be “fuzzy” in that they hold for most, but not all, of the records, and the columns may be in the same table or different tables. The scheme first identifies candidate sets of column value pairs that are likely to satisfy an algebraic constraint. For each candidate, the scheme constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values. In query-optimization mode, the scheme automatically partitions the data into normal and exception records. During subsequent query processing, queries can be modified to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths. The results are then combined with the results of executing the original query against the (small) set of exception records.
    Type: Grant
    Filed: August 21, 2007
    Date of Patent: March 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Paul Geoffrey Brown, Peter Jay Haas
  • Patent number: 7647293
    Abstract: A system and method of discovering dependencies between relational database column pairs and application of discoveries to query optimization is provided. For each candidate column pair remaining after simultaneously generating column pairs, pruning pairs not satisfying specified heuristic constraints, and eliminating pairs with trivial instances of correlation, a random sample of data values is collected. A candidate column pair is tested for the existence of a soft functional dependency (FD), and if a dependency is not found, statistically tested for correlation using a robust chi-squared statistic. Column pairs for which either a soft FD or a statistical correlation exists are prioritized for recommendation to a query optimizer, based on any of: strength of dependency, degree of correlation, or adjustment factor; statistics for recommended column pairs are tracked to improve selectivity estimates.
    Type: Grant
    Filed: June 10, 2004
    Date of Patent: January 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Paul Geoffrey Brown, Peter Jay Haas, Ihab F. Ilyas, Volker G. Markl
  • Patent number: 7636735
    Abstract: Provided is a method for modeling the cost of XML as well as relational operators. As with traditional relational cost estimation, a set of system catalog statistics that summarizes the XML data is exploited; however, the novel use of a set of simple path statistics is also proposed. A new statistical learning technique called transform regression is utilized instead of detailed analytical models to predict the overall cost of an operator. Additionally, a query optimizer in a database is enabled to be self-tuning, automatically adapting to changes over time in the query workload and in the system environment.
    Type: Grant
    Filed: August 19, 2005
    Date of Patent: December 22, 2009
    Assignee: International Business Machines Corporation
    Inventors: Peter Jay Haas, Vanja Josifovski, Guy Maring Lohman, Chun Zhang
  • Publication number: 20090271421
    Abstract: One embodiment of the present invention provides a method for incrementally maintaining a Bernoulli sample S with sampling rate q over a multiset R in the presence of update, delete, and insert transactions. The method includes processing items inserted into R using Bernoulli sampling and augmenting S with tracking counters during this processing. Items deleted from R are processed by using the tracking counters and by removing newly deleted items from S using a calculated probability while maintaining a degree of uniformity in S.
    Type: Application
    Filed: April 24, 2008
    Publication date: October 29, 2009
    Applicant: International Business Machines Corporation
    Inventors: Rainer Gemulla, Peter Jay Haas, Wolfgang Lehner
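
The Bernoulli-sample entries in this listing (patent 8140466 and publication 20090271421) describe keeping a sample S with sampling rate q in step with insertions into and deletions from R. The following Python sketch illustrates only the simplest case and rests on a stated assumption: every item in R carries a unique identifier, so a deletion can be matched to the exact sampled copy. It does not implement the tracking counters or the calculated removal probability that the abstract mentions for multisets. The class and method names are hypothetical.

```python
import random


class BernoulliSample:
    """Bernoulli sample over items that have unique ids (an assumption)."""

    def __init__(self, q, seed=None):
        if not 0.0 <= q <= 1.0:
            raise ValueError("sampling rate q must be in [0, 1]")
        self.q = q
        self._rng = random.Random(seed)
        self.sample = {}  # item id -> value

    def insert(self, item_id, value):
        # Each inserted item enters the sample independently with probability q.
        if self._rng.random() < self.q:
            self.sample[item_id] = value

    def delete(self, item_id):
        # With unique ids, dropping the item from the sample (if present) leaves
        # every surviving item included independently with probability q.
        self.sample.pop(item_id, None)

    def estimate_size(self):
        # Scale-up estimate of the current size of R; requires q > 0.
        return len(self.sample) / self.q
```

With indistinguishable duplicates this shortcut no longer works, because a deletion cannot be tied to one particular sampled copy; that is the situation the abstract's tracking counters address.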
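
Patent 7987177 in this listing describes distinct-value (DV) synopses for partitions that can be combined without revisiting the base data. As a rough illustration of that idea, the Python sketch below uses a plain k-minimum-values (KMV) synopsis: hash each value to [0, 1), keep the k smallest distinct hashes, and estimate the distinct count as (k - 1) divided by the k-th smallest hash. This is a well-known technique of the same general kind and is not claimed to be the patent's construction; it handles unions only, not other multiset operations or deletions. All function names are hypothetical, and k is assumed to be at least 2.

```python
import hashlib
import heapq


def _unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values of a partition, sorted."""
    return heapq.nsmallest(k, {_unit_hash(v) for v in values})


def kmv_union(syn_a, syn_b, k):
    """Synopsis of the union of two partitions, built from their synopses only."""
    return heapq.nsmallest(k, set(syn_a) | set(syn_b))


def kmv_estimate(synopsis, k):
    """Estimate the number of distinct values from a KMV synopsis."""
    if len(synopsis) < k:
        return float(len(synopsis))  # fewer than k distinct values: exact count
    return (k - 1) / synopsis[k - 1]  # (k - 1) / k-th smallest hash
```

Because the k smallest hashes of a union are always among the k smallest hashes of each input, `kmv_union` yields the same synopsis as hashing the combined data directly.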
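
Patent 8838648 in this listing describes generating the keys of a database from a set of non-keys. The Python sketch below shows one way that relationship can be made concrete, under two assumptions: a non-key is an attribute set on which at least two entities agree, and the supplied list covers every maximal non-key. An attribute set is then a key exactly when it is not contained in any listed non-key. The code is a brute-force enumeration for small tables, not the patent's procedure, and the function names are hypothetical.

```python
from itertools import combinations


def maximal_non_keys(rows, attributes):
    """Brute force: maximal attribute sets on which some pair of rows agrees."""
    agree_sets = set()
    for r1, r2 in combinations(rows, 2):
        agree_sets.add(frozenset(a for a in attributes if r1[a] == r2[a]))
    return [s for s in agree_sets if not any(s < t for t in agree_sets)]


def keys_from_non_keys(non_keys, attributes):
    """Minimal attribute sets that are not contained in any non-key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            cand = frozenset(combo)
            if any(cand <= nk for nk in non_keys):
                continue  # covered by a non-key, so some rows collide on it
            if any(k <= cand for k in keys):
                continue  # already contains a smaller key: not minimal
            keys.append(cand)
    return keys


rows = [
    {"first": "Ann", "last": "Lee", "phone": "1"},
    {"first": "Ann", "last": "Kim", "phone": "2"},
    {"first": "Bob", "last": "Lee", "phone": "3"},
]
attrs = ["first", "last", "phone"]
non_keys = maximal_non_keys(rows, attrs)
keys = keys_from_non_keys(non_keys, attrs)
```

On this made-up table the non-keys are {first} and {last}, and the result contains one unitary key, {phone}, and one composite key, {first, last}.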
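
The uncertain-data entries in this listing (patent 8234295 and publications 20120254238 and 20100312775) describe a pipeline: a VG function produces pseudorandom samples of uncertain values, those samples instantiate a random database several times, and a query is run on each instantiation so that the distribution of the query result can be estimated. The Python sketch below walks through that flow on a toy example. The normal-distribution VG function, the table layout, the query, and every name are assumptions chosen for illustration, not details taken from the patent.

```python
import random
import statistics


def normal_vg(rng, mean, std):
    """Hypothetical VG function: one pseudorandom sample of an uncertain value."""
    return rng.gauss(mean, std)


def instantiate(uncertain_rows, rng):
    """Build one ordinary table by sampling every uncertain attribute once."""
    return [
        {"region": r["region"], "sales": normal_vg(rng, r["mean"], r["std"])}
        for r in uncertain_rows
    ]


def total_sales(table, region):
    """The query: sum of sales over the rows of one region."""
    return sum(row["sales"] for row in table if row["region"] == region)


def monte_carlo_query(uncertain_rows, region, n_instances, seed=0):
    """Run the query over n_instances instantiations and summarize the results."""
    rng = random.Random(seed)
    results = [
        total_sales(instantiate(uncertain_rows, rng), region)
        for _ in range(n_instances)
    ]
    return statistics.mean(results), statistics.stdev(results), results
```

The returned mean and standard deviation are simple examples of the statistical properties of the query-result distribution that the abstract refers to; quantiles or a histogram of `results` could be computed the same way.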