Patents by Inventor Philip Korn

Philip Korn has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Post-hoc management of datasets

Patent number: 10417439

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.

Type: Grant

Filed: April 6, 2017

Date of Patent: September 17, 2019

Assignee: Google LLC

Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
Post-hoc management of datasets

Publication number: 20170293671

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.

Type: Application

Filed: April 6, 2017

Publication date: October 12, 2017

Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
Conservation dependencies

Patent number: 9177343

Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.

Type: Grant

Filed: November 23, 2010

Date of Patent: November 3, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
Computing time-decayed aggregates under smooth decay functions

Patent number: 9170984

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out of order, that is, the sequence in which the tuples arrive is not necessarily the same as the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries is generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.

Type: Grant

Filed: March 26, 2013

Date of Patent: October 27, 2015

Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
Computing time-decayed aggregates in data streams

Patent number: 8908554

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out of order, that is, the sequence in which the tuples arrive is not necessarily the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.

Type: Grant

Filed: January 31, 2013

Date of Patent: December 9, 2014

Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
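
The weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes an exponential decay with a hypothetical `half_life` parameter and computes exact per-item decayed counts by brute force; it does not implement the quantile-digest structure named in the abstract. The function name `decayed_counts` and the sample data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def decayed_counts(
    tuples: Iterable[Tuple[str, float]], now: float, half_life: float
) -> Dict[str, float]:
    """Sum a decay weight per item identifier.

    Each tuple is (item_id, timestamp). The weight halves every
    `half_life` time units of age (an assumed decay function).
    """
    totals: Dict[str, float] = defaultdict(float)
    for item_id, timestamp in tuples:
        age = now - timestamp
        totals[item_id] += 0.5 ** (age / half_life)
    return dict(totals)


# Tuples listed in arrival order; timestamps are deliberately out of order.
arrivals = [("a", 10.0), ("b", 8.0), ("a", 6.0)]
print(decayed_counts(arrivals, now=10.0, half_life=2.0))
```

Because the weight of a tuple depends only on its own timestamp and the current time, the exact sum does not depend on arrival order; the difficulty the patent addresses is obtaining such aggregates from a compact summary rather than from the full stream.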
Processing data using sequential dependencies

Patent number: 8645309

Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, is within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.

Type: Grant

Filed: November 30, 2009

Date of Patent: February 4, 2014

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
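
The sequential dependency defined in the abstract above can be checked directly on a small data set. The sketch below is a minimal illustration, assuming records are (x, y) pairs, that the interval G is given as a closed range `(lo, hi)`, and that "distance" means the signed difference between consecutive Y-values; the name `sd_violations` is hypothetical, and the tableau discovery described in the patent is not attempted.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (x, y)


def sd_violations(
    records: List[Record], gap: Tuple[float, float]
) -> List[Tuple[Record, Record]]:
    """Return consecutive record pairs (sorted on x) whose y-gap is outside G."""
    lo, hi = gap
    ordered = sorted(records, key=lambda r: r[0])
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if not lo <= curr[1] - prev[1] <= hi
    ]


# Hypothetical readings: y should grow by 9 to 11 per step in x.
readings = [(3, 30), (1, 10), (2, 19), (4, 55)]
print(sd_violations(readings, gap=(9, 11)))
```

An empty result means the whole data set satisfies the dependency; a non-empty result marks where it breaks, which is the kind of region a conditional SD would exclude.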
Generating conditional functional dependencies

Patent number: 8639667

Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.

Type: Grant

Filed: March 3, 2009

Date of Patent: January 28, 2014

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
Computing time-decayed aggregates under smooth decay functions

Patent number: 8484269

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out of order, that is, the sequence in which the tuples arrive is not necessarily the same as the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries is generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps. Aggregates are calculated from the set of linear data summaries.

Type: Grant

Filed: January 2, 2008

Date of Patent: July 9, 2013

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
Computing time-decayed aggregates in data streams

Patent number: 8391164

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out of order, that is, the sequence in which the tuples arrive is not necessarily the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.

Type: Grant

Filed: January 2, 2008

Date of Patent: March 5, 2013

Assignees: AT&T Intellectual Property I, L.P., Iowa State University Research Foundation, Inc.

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
Conservation dependencies

Publication number: 20120130935

Abstract: Given a set of data for which a conservation law is an appropriate characterization, “hold” and/or “fail” tableaux are provided for the underlying conservation law, thereby providing a conservation dependency whereby portions of the data for which the law approximately holds or fails can be discovered and summarized in a semantically meaningful way.

Type: Application

Filed: November 23, 2010

Publication date: May 24, 2012

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Barna Saha
Methods and apparatus to determine statistical dominance point descriptors for multidimensional data

Patent number: 8160837

Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.

Type: Grant

Filed: December 12, 2008

Date of Patent: April 17, 2012

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
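
The two quantities named in the abstract above can be computed by brute force on a small data set. This is a minimal sketch under stated assumptions: a point is counted as dominated when it is less than or equal to the reference point in every dimension, the reference point does not count itself, all values are positive, and the "combination" is simply a (dominance, skewness) pair. The function name `dominance_descriptors` is hypothetical.

```python
from typing import List, Sequence, Tuple


def dominance_descriptors(
    data: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (joint dominance, skewness) for each point in `data`."""
    descriptors = []
    for i, point in enumerate(data):
        # Joint dominance: how many other points are <= this point in every dimension.
        dominated = sum(
            1
            for j, other in enumerate(data)
            if j != i and all(o <= p for o, p in zip(other, point))
        )
        # Skewness: size of the first dimension relative to the sum of all dimensions.
        skewness = point[0] / sum(point)
        descriptors.append((dominated, skewness))
    return descriptors


# Hypothetical two-dimensional measurements.
points = [(1, 1), (2, 3), (3, 2), (4, 4)]
for point, descriptor in zip(points, dominance_descriptors(points)):
    print(point, descriptor)
```

The brute-force count is quadratic in the number of points, so it only serves to make the definitions concrete.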
Processing data using sequential dependencies

Publication number: 20110131170

Abstract: The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, is within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.

Type: Application

Filed: November 30, 2009

Publication date: June 2, 2011

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Avishek Saha
Methods and apparatus to determine statistical dominance point descriptors for multidimensional data

Publication number: 20100153064

Abstract: Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point.

Type: Application

Filed: December 12, 2008

Publication date: June 17, 2010

Inventors: Graham Cormode, Philip Korn, Divesh Srivastava
Generating conditional functional dependencies

Publication number: 20090287721

Abstract: Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.

Type: Application

Filed: March 3, 2009

Publication date: November 19, 2009

Inventors: Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava, Bei Yu
Computing time-decayed aggregates under smooth decay functions

Publication number: 20090172058

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive at a data receiver out of order, that is, the sequence in which the tuples arrive is not necessarily the same as the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by a decay function which is a function of the timestamp associated with the tuple and the current time. The statistical characteristics of the tuples are summarized by a set of linear data summaries. The set of linear data summaries is generated such that only a single linear data summary falls between a set of boundaries calculated from the decay function and a set of timestamps.

Type: Application

Filed: January 2, 2008

Publication date: July 2, 2009

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
Computing time-decayed aggregates in data streams

Publication number: 20090172059

Abstract: Aggregates are calculated from a data stream in which data is sent in a sequence of tuples, in which each tuple comprises an item identifier and a timestamp indicating when the tuple was transmitted. The tuples may arrive out of order, that is, the sequence in which the tuples arrive is not necessarily the sequence of their corresponding timestamps. In calculating aggregates, more recent data may be given more weight by multiplying each tuple by a decay function which is a function of the timestamp associated with the tuple and the current time. The tuples are recorded in a quantile-digest data structure. Aggregates are calculated from the data stored in the quantile-digest data structure.

Type: Application

Filed: January 2, 2008

Publication date: July 2, 2009

Inventors: Graham Cormode, Philip Korn, Srikanta Tirthapura
Method and apparatus for finding biased quantiles in data streams

Publication number: 20060224609

Abstract: A method and apparatus for computing biased or targeted quantiles are disclosed. For example, the present invention reads a plurality of items from a data stream and inserts each of the plurality of items that was read from the data stream into a data structure. Periodically, the data structure is compressed to reduce the number of stored items in the data structure. In turn, the compressed data structure can be used to output a biased or targeted quantile.

Type: Application

Filed: December 2, 2005

Publication date: October 5, 2006

Inventors: Graham Cormode, Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava
Method for matching XML twigs using index structures and relational query processors

Publication number: 20060053122

Abstract: A framework defining a family of index structures useful in evaluating XML path expressions (i.e., twigs) in an XML database is disclosed. Within this framework, two particular index structures with different space-time tradeoffs are presented that prove effective for the evaluation of twigs with value conditions. These index structures can be realized using access methods of an underlying relational database system. Experimental results show that the indices disclosed achieve significant improvement in performance for evaluating twig queries as compared with previously proposed XML path indices.

Type: Application

Filed: September 9, 2004

Publication date: March 9, 2006

Inventors: Philip Korn, Nikolaos Koudas, Divesh Srivastava, Zhiyuan Chen, Johannes Gehrke, Jayavel Shanmugasundaram
Method and apparatus for identifying hierarchical heavy hitters in a data stream

Publication number: 20050131946

Abstract: A method, apparatus, and computer readable medium for processing a data stream are described. In one example, a set of elements of a data stream is received. The set of elements is stored in a memory as a hierarchy of nodes. Each of the nodes includes frequency data associated with either an element in the set of elements or a prefix of an element in the set of elements. A set of hierarchical heavy hitters is then identified among the nodes in the hierarchy. The frequency data of each of the hierarchical heavy hitter nodes, after discounting any portion thereof attributed to a descendant hierarchical heavy hitter node in said set of hierarchical heavy hitter nodes, is greater than or equal to a fraction of the number of elements in the set of elements.

Type: Application

Filed: March 17, 2004

Publication date: June 16, 2005

Inventors: Philip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava, Graham Cormode
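
The discounting rule in the abstract above can be made concrete with an exact, offline computation. The sketch below is an illustration only, assuming every element is a tuple of the same length whose prefixes form the hierarchy (for example, the parts of a dotted address) and that the fraction is a hypothetical parameter `phi`; it keeps all counts in memory and is not the streaming method the application describes.

```python
from collections import Counter
from typing import Dict, List, Tuple

Node = Tuple[str, ...]


def hierarchical_heavy_hitters(elements: List[Node], phi: float) -> Dict[Node, int]:
    """Return HHH nodes mapped to their discounted frequency."""
    threshold = phi * len(elements)
    depth = len(elements[0]) if elements else 0
    residual = Counter(elements)  # counts not yet claimed by a heavy hitter
    heavy: Dict[Node, int] = {}
    # Walk from the most specific level up to the root (the empty prefix).
    for level in range(depth, -1, -1):
        passed_up: Counter = Counter()
        for node, count in residual.items():
            if count >= threshold:
                heavy[node] = count  # claimed here, so ancestors do not see it
            elif level > 0:
                passed_up[node[:-1]] += count  # unclaimed mass goes to the parent prefix
        residual = passed_up
    return heavy


# Hypothetical stream of three-part addresses.
stream = (
    [("10", "0", "1")] * 4
    + [("10", "0", "2")]
    + [("10", "1", "1")] * 2
    + [("10", "1", "2")] * 2
    + [("192", "1", "1")]
)
print(hierarchical_heavy_hitters(stream, phi=0.3))
```

In this sample, the prefix `("10", "1")` qualifies only through the combined count of its children, while `("10",)` does not qualify because most of its traffic has already been attributed to heavy-hitter descendants.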