Patents by Inventor Rajeev Motwani

Rajeev Motwani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer

Publication number: 20040122797

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
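
The five steps in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not name the error metric or the clustering method A, so the sketch assumes squared Euclidean distance and plain Lloyd-style k-means on weighted points. The function names (weighted_kmeans, divide_and_conquer_cluster) are hypothetical, and every piece is assumed to hold at least k distinct points.

```python
import random


def sq_dist(p, q):
    # Assumed error metric: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    # Index of the center closest to the point.
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Hypothetical stand-in for 'clustering method A': Lloyd iterations on weighted points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points; keep it if it got none.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def divide_and_conquer_cluster(S, k, P):
    pieces = [S[i::P] for i in range(P)]            # 1) P disjoint pieces
    inter_centers, inter_weights = [], []
    for piece in pieces:
        D_i = weighted_kmeans(piece, [1.0] * len(piece), k)  # 2) k intermediate centers
        counts = [0] * k
        for p in piece:                             # 3) assign points to nearest center
            counts[nearest(p, D_i)] += 1
        inter_centers.extend(D_i)                   # 4) weight centers by assigned counts
        inter_weights.extend(counts)
    return weighted_kmeans(inter_centers, inter_weights, k)  # 5) cluster weighted centers
```

Because each piece is clustered independently, step 2 can run in parallel or one piece at a time as data arrives, which is presumably what the title's "incremental and parallel" refers to.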

Type: Application

Filed: December 1, 2003

Publication date: June 24, 2004

Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer

Patent number: 6684177

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.

Type: Grant

Filed: May 10, 2001

Date of Patent: January 27, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Sampling over joins for database systems

Patent number: 6542886

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
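
The three sampling semantics named in this abstract can be illustrated with a short Python sketch. It is a textbook-style illustration, not the patented method: it covers only unweighted sampling of a single stream (not weighted sampling or joins), and the function names are hypothetical. Each function reads its input once, so it also works on a generator whose records are never materialized.

```python
import random


def sample_cf(stream, f, rng):
    """Independent coin flips (CF): keep each record with probability f."""
    return [r for r in stream if rng.random() < f]


def sample_wor(stream, n, rng):
    """Without replacement (WoR): classic reservoir sampling of n records in one pass."""
    reservoir = []
    for i, r in enumerate(stream):
        if i < n:
            reservoir.append(r)
        else:
            j = rng.randrange(i + 1)  # uniform in 0..i
            if j < n:
                reservoir[j] = r
    return reservoir


def sample_wr(stream, n, rng):
    """With replacement (WR): n independent one-record reservoirs filled in one pass."""
    slots = [None] * n
    for i, r in enumerate(stream):
        for s in range(n):
            # Record i takes over a slot with probability 1 / (i + 1).
            if rng.randrange(i + 1) == 0:
                slots[s] = r
    return slots
```

Under WR the same record may appear more than once in the result, under WoR it cannot, and under CF the size of the result is itself random.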

Type: Grant

Filed: March 15, 1999

Date of Patent: April 1, 2003

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
Sampling for database systems

Patent number: 6532458

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.

Type: Grant

Filed: March 15, 1999

Date of Patent: March 11, 2003

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
Sampling for database systems

Publication number: 20030018615

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.

Type: Application

Filed: September 10, 2002

Publication date: January 23, 2003

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer

Publication number: 20020183966

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.

Type: Application

Filed: May 10, 2001

Publication date: December 5, 2002

Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Sampling for aggregation queries

Publication number: 20020124001

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.
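
A minimal Python sketch of the estimate described in this abstract, for a SUM aggregate. Several details are assumptions made for illustration: the number of outliers is passed in as a parameter, the window slides over the sorted values, the remaining data is sampled uniformly without replacement, and the function names are hypothetical. The index on outliers and the weighted variants mentioned in the abstract are not shown.

```python
import random


def split_outliers(values, num_outliers):
    """Slide a window of len(values) - num_outliers over the sorted values and keep
    the window with the lowest variance; values outside it are the outliers."""
    ordered = sorted(values)
    w = len(ordered) - num_outliers
    s = sum(ordered[:w])
    sq = sum(x * x for x in ordered[:w])
    best_start, best_var = 0, float("inf")
    for start in range(num_outliers + 1):
        var = sq / w - (s / w) ** 2
        if var < best_var:
            best_start, best_var = start, var
        if start < num_outliers:
            # Shift the window right by one: drop the leftmost value, add the next one.
            out_x, in_x = ordered[start], ordered[start + w]
            s += in_x - out_x
            sq += in_x * in_x - out_x * out_x
    outliers = ordered[:best_start] + ordered[best_start + w:]
    inliers = ordered[best_start:best_start + w]
    return outliers, inliers


def estimate_sum(values, num_outliers, sample_size, rng):
    outliers, inliers = split_outliers(values, num_outliers)
    exact_part = sum(outliers)                     # outliers are aggregated exactly
    sample = rng.sample(inliers, sample_size)      # uniform sample of the remaining data
    scaled_part = sum(sample) * len(inliers) / sample_size  # extrapolate to all inliers
    return exact_part + scaled_part
```

Removing the few extreme values before sampling lowers the variance of the sampled part, which is why the extrapolated estimate tends to be more stable than sampling the raw data.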

Type: Application

Filed: January 12, 2001

Publication date: September 5, 2002

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Sampling for queries

Publication number: 20020123979

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Application

Filed: January 12, 2001

Publication date: September 5, 2002

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Histogram construction using adaptive random sampling with cross-validation for database systems

Patent number: 6278989

Abstract: Using adaptive random sampling with cross-validation helps determine when enough data of a database has been sampled to construct histograms on one or more columns of one or more tables of the database within a desired or predetermined degree of accuracy. An adaptive random sampling histogram construction tool constructs an approximate equi-height k-histogram using an initial sample of data values from the database and iteratively updates the histogram using an additional sample of data values from the database until the histogram is within the desired degree of accuracy. The accuracy of the histogram is cross-validated against the additional sample at each iteration, and the additional sample is used to update the histogram to help improve its accuracy. The accuracy of the histogram may be measured by an error in distribution of the additional sample over the histogram as compared to a threshold error using a suitable error metric.
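
The iterate-and-validate loop in this abstract can be sketched in Python as follows. The details are assumptions for illustration only: the error metric is taken to be the largest relative deviation of a bucket count from the ideal equal share, each round draws an additional sample as large as everything sampled so far, rounds are drawn independently (so a record may be drawn in more than one round), and all function names are hypothetical.

```python
import bisect
import random


def equi_height_separators(sample, k):
    """k - 1 separators that split the sorted sample into k roughly equal buckets."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // k] for i in range(1, k)]


def bucket_counts(values, separators):
    counts = [0] * (len(separators) + 1)
    for v in values:
        counts[bisect.bisect_right(separators, v)] += 1
    return counts


def max_bucket_error(values, separators):
    """Assumed error metric: worst relative deviation from the ideal bucket size."""
    ideal = len(values) / (len(separators) + 1)
    return max(abs(c - ideal) for c in bucket_counts(values, separators)) / ideal


def adaptive_histogram(data, k, initial_size, threshold, rng, max_rounds=20):
    sample = rng.sample(data, min(initial_size, len(data)))
    separators = equi_height_separators(sample, k)
    for _ in range(max_rounds):
        extra = rng.sample(data, min(len(sample), len(data)))  # additional sample
        error = max_bucket_error(extra, separators)            # cross-validate
        sample += extra                                        # then use it to update
        separators = equi_height_separators(sample, k)
        if error <= threshold:
            break
    return separators, len(sample)
```

The additional sample serves two purposes, as the abstract describes: it first checks the current histogram, and it is then folded into the sample used to rebuild the separators.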

Type: Grant

Filed: August 25, 1998

Date of Patent: August 21, 2001

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
