Patents by Inventor Philip Korn

Philip Korn has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10417439
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
    Type: Grant
    Filed: April 6, 2017
    Date of Patent: September 17, 2019
    Assignee: Google LLC
    Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
  • Publication number: 20170293671
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
    Type: Application
    Filed: April 6, 2017
    Publication date: October 12, 2017
    Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
  • Patent number: 9177343
    Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.
    Type: Grant
    Filed: November 23, 2010
    Date of Patent: November 3, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
  • Patent number: 9170984
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.
    Type: Grant
    Filed: March 26, 2013
    Date of Patent: October 27, 2015
    Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Patent number: 8908554
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: December 9, 2014
    Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Patent number: 8645309
    Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, are within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: February 4, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
  • Patent number: 8639667
    Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
    Type: Grant
    Filed: March 3, 2009
    Date of Patent: January 28, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
  • Patent number: 8484269
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.
    Type: Grant
    Filed: January 2, 2008
    Date of Patent: July 9, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Patent number: 8391164
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
    Type: Grant
    Filed: January 2, 2008
    Date of Patent: March 5, 2013
    Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Publication number: 20120130935
    Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.
    Type: Application
    Filed: November 23, 2010
    Publication date: May 24, 2012
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
  • Patent number: 8160837
    Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: April 17, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
  • Publication number: 20110131170
    Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, are within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.
    Type: Application
    Filed: November 30, 2009
    Publication date: June 2, 2011
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
  • Publication number: 20100153064
    Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 17, 2010
    Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
  • Publication number: 20090287721
    Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
    Type: Application
    Filed: March 3, 2009
    Publication date: November 19, 2009
    Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
  • Publication number: 20090172059
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
    Type: Application
    Filed: January 2, 2008
    Publication date: July 2, 2009
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Publication number: 20090172058
    Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps.
    Type: Application
    Filed: January 2, 2008
    Publication date: July 2, 2009
    Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
  • Publication number: 20060224609
    Abstract: A method and apparatus for computing biased or targeted quantiles are disclosed. For example, the present invention reads a plurality of items from a data stream and inserts each of the plurality of items that was read from the data stream into a data structure. Periodically, the data structure is compressed to reduce the number of stored items in the data structure. In turn, the compressed data structure can be used to output a biased or targeted quantile.
    Type: Application
    Filed: December 2, 2005
    Publication date: October 5, 2006
    Inventors: Graham Cormode, Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava
  • Publication number: 20060053122
    Abstract: A framework defining a family of index structures useful in evaluating XML path expressions (i.e., twigs) in XML database is disclosed. Within this framework, two particular index structures with different space-time tradeoffs are presented that prove effective for the evaluation of twigs with value conditions. These index structures can be realized using access methods of an underlying relational database system. Experimental results show that the indices disclosed achieve significant improvement in performance for evaluating twig queries as compared with previously proposed XML path indices.
    Type: Application
    Filed: September 9, 2004
    Publication date: March 9, 2006
    Inventors: Philip Korn, Nikolaos Koudas, Divesh Srivastava, Zhiyuan Chen, Johannes Gehrke, Jayavel Shanmugasundaram
  • Publication number: 20050131946
    Abstract: A method, apparatus, and computer readable medium for processing a data stream is described. In one example, a set of elements of a data stream are received. The set of elements are stored in a memory as a hierarchy of nodes. Each of the nodes includes frequency data associated with either an element in the set of elements or a prefix of an element in the set of elements. A set of hierarchical heavy hitters is then identified among the nodes in the hierarchy. The frequency data of each of the hierarchical heavy hitter nodes, after discounting any portion thereof attributed to a descendent hierarchical heavy hitter node in said set of hierarchical heavy hitter nodes, being greater than or equal to a fraction of the number of elements in the set of elements.
    Type: Application
    Filed: March 17, 2004
    Publication date: June 16, 2005
    Inventors: Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava, Graham Cormode
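
Illustrative note on the last entry above (Publication number 20050131946): the idea of a hierarchical heavy hitter can be sketched in a few lines. The code below is not the patented method. It is a minimal, exact, offline sketch that assumes the whole set of elements fits in memory, that each element is a tuple of path components, and that the empty (root) prefix is not reported. The function name hierarchical_heavy_hitters and the parameter phi are hypothetical names chosen for this example. Working from the deepest level upward, a node is reported when its count, after discounting whatever was already claimed by a descendant heavy hitter, is at least phi times the number of elements.

```python
from collections import Counter


def hierarchical_heavy_hitters(elements, phi):
    """Exact, offline sketch: return {node: discounted_count} for every
    prefix whose discounted count is >= phi * len(elements).

    Each element is a sequence of path components, e.g. ("a", "x").
    Hypothetical helper for illustration only.
    """
    n = len(elements)
    threshold = phi * n
    # residual[node] = count under node not yet claimed by a descendant
    # that was itself reported as a heavy hitter.
    residual = Counter(tuple(e) for e in elements)
    max_depth = max((len(node) for node in residual), default=0)
    heavy = {}
    # Walk the hierarchy bottom-up, one level at a time.
    for depth in range(max_depth, 0, -1):
        passed_up = Counter()
        for node in [k for k in residual if len(k) == depth]:
            count = residual.pop(node)
            if count >= threshold:
                # Reported: its count is NOT passed to the parent.
                heavy[node] = count
            elif depth > 1:
                # Not heavy: the parent prefix inherits the count.
                passed_up[node[:-1]] += count
        residual.update(passed_up)
    return heavy


if __name__ == "__main__":
    data = (
        [("a", "x")] * 5
        + [("a", "y")] * 2
        + [("a", "z")] * 2
        + [("b", "w")] * 1
    )
    print(hierarchical_heavy_hitters(data, phi=0.3))
```

In this example ("a", "x") is heavy on its own, so its five occurrences are discounted from the prefix ("a",), which still qualifies through the four remaining occurrences under it. A streaming setting, as the abstract describes, would replace the exact counts with approximate frequency data kept per node.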