Patents by Inventor Rajeev Motwani

Rajeev Motwani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Continuous processing language for real-time data streams

Patent number: 8396886

Abstract: A computer software language capable of expressing registered queries that operate on one more or more data streams continuously. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries); a clause for detecting a pattern, and correlated database subqueries for correlating data stream data with database tables.

Type: Grant

Filed: February 2, 2006

Date of Patent: March 12, 2013

Assignee: Sybase Inc.

Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
Sampling for queries

Patent number: 7577638

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.

Type: Grant

Filed: December 7, 2005

Date of Patent: August 18, 2009

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Sampling for database systems

Patent number: 7567949

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.

Type: Grant

Filed: September 10, 2002

Date of Patent: July 28, 2009

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
Robust detector of fuzzy duplicates

Patent number: 7516149

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Type: Grant

Filed: August 30, 2004

Date of Patent: April 7, 2009

Assignee: Microsoft Corporation

Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
Sampling for queries

Patent number: 7493316

Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.

Type: Grant

Filed: December 7, 2005

Date of Patent: February 17, 2009

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Publish and subscribe capable continuous query processor for real-time data streams

Patent number: 7383253

Abstract: A Continuous Query Processor processes queries on continuously updating data sources or data streams and includes a Publication Manager for accepting published structured elements of data from data stream Publishers, a Subscription Manager for giving structured elements of data to one or more data stream Subscribers, a Query Module Manager for processing queries represented by Query Modules, a Query Module Store for maintaining deployed Query Modules, a Query Primitive Manager performing processing for various primitives that comprise a Query Module, and a Schedule Manager for coordinating when a primitive within a Query Module gets processed in order to maintain that each continuous query is continuously updated immediately upon the arrival of structured element data affecting any part of a continuous query.

Type: Grant

Filed: December 17, 2004

Date of Patent: June 3, 2008

Assignee: Coral 8, Inc.

Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
Database aggregation query result estimator

Patent number: 7363301

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: April 22, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Efficient fuzzy match for evaluating data records

Patent number: 7296011

Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 13, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
Database aggregation query result estimator

Patent number: 7293037

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: November 6, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Sampling for queries

Patent number: 7287020

Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query is based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.

Type: Grant

Filed: January 12, 2001

Date of Patent: October 23, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Database aggregation query result estimator

Patent number: 7191181

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

Type: Grant

Filed: June 22, 2004

Date of Patent: March 13, 2007

Assignee: Microsoft Corporation

Inventors: Sarajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Sampling for queries

Publication number: 20060085410

Abstract: A method of estimating the Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. And, can aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.

Type: Application

Filed: December 7, 2005

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Sampling for queries

Publication number: 20060085463

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.

Type: Application

Filed: December 7, 2005

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Database aggregation query result estimator

Publication number: 20060053103

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Application

Filed: October 7, 2005

Publication date: March 9, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Robust detector of fuzzy duplicates

Publication number: 20060053129

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Type: Application

Filed: August 30, 2004

Publication date: March 9, 2006

Applicant: Microsoft Corporation

Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
Database aggregation query result estimator

Publication number: 20060036600

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Application

Filed: October 7, 2005

Publication date: February 16, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer

Patent number: 6907380

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , Sp; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.

Type: Grant

Filed: December 1, 2003

Date of Patent: June 14, 2005

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
Sampling for aggregation queries

Patent number: 6842753

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

Type: Grant

Filed: January 12, 2001

Date of Patent: January 11, 2005

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Efficient fuzzy match for evaluating data records

Publication number: 20040260694

Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Type: Application

Filed: June 20, 2003

Publication date: December 23, 2004

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
Database aggregation query result estimator

Publication number: 20040236735

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

Type: Application

Filed: June 22, 2004

Publication date: November 25, 2004

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

1 2 next