Patents by Inventor Philip Korn
Philip Korn has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10417439
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
Type: Grant
Filed: April 6, 2017
Date of Patent: September 17, 2019
Assignee: Google LLC
Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
-
Publication number: 20170293671
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
Type: Application
Filed: April 6, 2017
Publication date: October 12, 2017
Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
-
Patent number: 9177343
Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.
Type: Grant
Filed: November 23, 2010
Date of Patent: November 3, 2015
Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
-
Patent number: 9170984
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.
Type: Grant
Filed: March 26, 2013
Date of Patent: October 27, 2015
Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
-
Patent number: 8908554
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
Type: Grant
Filed: January 31, 2013
Date of Patent: December 9, 2014
Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
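The decay weighting described in the abstract above can be illustrated with a minimal sketch. It assumes an exponential decay with a configurable half-life, which is only one possible decay function; the abstract requires just some function of the tuple's timestamp and the current time. It also computes the aggregate directly rather than through a quantile-digest, so it shows the weighting idea only, not the patented data structure. The names decayed_count, now, and half_life are hypothetical.

```python
import math

def decayed_count(tuples, now, half_life):
    """Sum of decay weights over (item_id, timestamp) tuples.

    Exponential decay is an assumption made for illustration; any function
    of the tuple's timestamp and the current time could be substituted.
    """
    rate = math.log(2) / half_life
    return sum(math.exp(-rate * (now - ts)) for _item, ts in tuples)

# Tuples arrive out of timestamp order; each weight depends only on the
# timestamp, so the arrival order does not change the aggregate.
arrivals = [("a", 20), ("b", 0), ("c", 10)]
in_order = sorted(arrivals, key=lambda t: t[1])
total = decayed_count(arrivals, now=20, half_life=10)
```

With a half-life of 10 and the current time at 20, the three tuples contribute weights of 1, 0.25, and 0.5.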
-
Patent number: 8645309
Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, are within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.
Type: Grant
Filed: November 30, 2009
Date of Patent: February 4, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
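The sequential dependency test in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming records are (x, y) pairs, the interval G is closed and given as [lo, hi], and "distance" means the signed difference between consecutive Y-values; the function name sd_violations is hypothetical, and the sketch does not attempt the tableau discovery that the patent covers.

```python
def sd_violations(records, lo, hi):
    """Return consecutive pairs (sorted on X) whose Y gap falls outside [lo, hi].

    Assumes records are (x, y) pairs and that the gap is the signed
    difference y_next - y_prev; both are illustrative assumptions.
    """
    ordered = sorted(records, key=lambda r: r[0])
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if not lo <= curr[1] - prev[1] <= hi
    ]

# The dependency holds on a data set exactly when no violations are reported.
readings = [(3, 30), (1, 10), (2, 19)]
holds = not sd_violations(readings, lo=9, hi=11)
```

Returning the violating pairs, rather than a single boolean, makes it easy to see which subsets of the data satisfy the dependency and which do not.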
-
Patent number: 8639667
Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
Type: Grant
Filed: March 3, 2009
Date of Patent: January 28, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
-
Patent number: 8484269
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.
Type: Grant
Filed: January 2, 2008
Date of Patent: July 9, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
-
Patent number: 8391164
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
Type: Grant
Filed: January 2, 2008
Date of Patent: March 5, 2013
Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
-
Publication number: 20120130935
Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.
Type: Application
Filed: November 23, 2010
Publication date: May 24, 2012
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
-
Methods and apparatus to determine statistical dominance point descriptors for multidimensional data
Patent number: 8160837
Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.
Type: Grant
Filed: December 12, 2008
Date of Patent: April 17, 2012
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
-
Publication number: 20110131170
Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, are within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.
Type: Application
Filed: November 30, 2009
Publication date: June 2, 2011
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
-
Methods and Apparatus to Determine Statistical Dominance Point Descriptors for Multidimensional Data
Publication number: 20100153064
Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.
Type: Application
Filed: December 12, 2008
Publication date: June 17, 2010
Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
-
Publication number: 20090287721
Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
Type: Application
Filed: March 3, 2009
Publication date: November 19, 2009
Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
-
Publication number: 20090172059
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.
Type: Application
Filed: January 2, 2008
Publication date: July 2, 2009
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
-
Publication number: 20090172058
Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out-of-order, that is, the sequence in which the tuples arrive are not necessarily in the same sequence as their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries are generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps.
Type: Application
Filed: January 2, 2008
Publication date: July 2, 2009
Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
-
Publication number: 20060224609
Abstract: A method and apparatus for computing biased or targeted quantiles are disclosed. For example, the present invention reads a plurality of items from a data stream and inserts each of the plurality of items that was read from the data stream into a data structure. Periodically, the data structure is compressed to reduce the number of stored items in the data structure. In turn, the compressed data structure can be used to output a biased or targeted quantile.
Type: Application
Filed: December 2, 2005
Publication date: October 5, 2006
Inventors: Graham Cormode, Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava
-
Publication number: 20060053122
Abstract: A framework defining a family of index structures useful in evaluating XML path expressions (i.e., twigs) in XML database is disclosed. Within this framework, two particular index structures with different space-time tradeoffs are presented that prove effective for the evaluation of twigs with value conditions. These index structures can be realized using access methods of an underlying relational database system. Experimental results show that the indices disclosed achieve significant improvement in performance for evaluating twig queries as compared with previously proposed XML path indices.
Type: Application
Filed: September 9, 2004
Publication date: March 9, 2006
Inventors: Philip Korn, Nikolaos Koudas, Divesh Srivastava, Zhiyuan Chen, Johannes Gehrke, Jayavel Shanmugasundaram
-
Publication number: 20050131946
Abstract: A method, apparatus, and computer readable medium for processing a data stream is described. In one example, a set of elements of a data stream are received. The set of elements are stored in a memory as a hierarchy of nodes. Each of the nodes includes frequency data associated with either an element in the set of elements or a prefix of an element in the set of elements. A set of hierarchical heavy hitters is then identified among the nodes in the hierarchy. The frequency data of each of the hierarchical heavy hitter nodes, after discounting any portion thereof attributed to a descendent hierarchical heavy hitter node in said set of hierarchical heavy hitter nodes, being greater than or equal to a fraction of the number of elements in the set of elements.
Type: Application
Filed: March 17, 2004
Publication date: June 16, 2005
Inventors: Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava, Graham Cormode
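The hierarchical heavy hitter condition in the abstract above can be made concrete with a small exact, offline sketch. It is not the streaming method the application describes: it assumes all elements are available at once, that each element is a tuple of equal length whose prefixes are its shorter leading tuples, and that phi is the fraction threshold. The function name hierarchical_heavy_hitters and these parameter names are hypothetical.

```python
from collections import Counter

def hierarchical_heavy_hitters(elements, phi):
    """Exact bottom-up computation over equal-length tuple elements.

    A node is reported when its count, after discounting the mass already
    claimed by heavy hitter descendants, is at least phi * len(elements).
    """
    threshold = phi * len(elements)
    residual = Counter(elements)  # mass not yet claimed by a heavy hitter
    depth = len(elements[0]) if elements else 0
    result = {}
    for _level in range(depth, 0, -1):
        parents = Counter()
        for node, count in residual.items():
            if count >= threshold:
                result[node] = count  # claimed here, not passed upward
            else:
                parents[node[:-1]] += count  # roll the mass up to the prefix
        residual = parents
    return result

stream = [("a", "x")] * 5 + [("a", "y")] * 2 + [("a", "z")] * 2 + [("b", "w")]
hhh = hierarchical_heavy_hitters(stream, phi=0.3)
```

In this example the element ("a", "x") qualifies on its own, and the prefix ("a",) qualifies only through the four occurrences that ("a", "x") did not already account for.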