Patents by Inventor Rajeev Motwani
Rajeev Motwani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8396886Abstract: A computer software language capable of expressing registered queries that operate on one more or more data streams continuously. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries); a clause for detecting a pattern, and correlated database subqueries for correlating data stream data with database tables.Type: GrantFiled: February 2, 2006Date of Patent: March 12, 2013Assignee: Sybase Inc.Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
-
Patent number: 7577638Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.Type: GrantFiled: December 7, 2005Date of Patent: August 18, 2009Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7567949Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.Type: GrantFiled: September 10, 2002Date of Patent: July 28, 2009Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
-
Patent number: 7516149Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.Type: GrantFiled: August 30, 2004Date of Patent: April 7, 2009Assignee: Microsoft CorporationInventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
-
Patent number: 7493316Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.Type: GrantFiled: December 7, 2005Date of Patent: February 17, 2009Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7383253Abstract: A Continuous Query Processor processes queries on continuously updating data sources or data streams and includes a Publication Manager for accepting published structured elements of data from data stream Publishers, a Subscription Manager for giving structured elements of data to one or more data stream Subscribers, a Query Module Manager for processing queries represented by Query Modules, a Query Module Store for maintaining deployed Query Modules, a Query Primitive Manager performing processing for various primitives that comprise a Query Module, and a Schedule Manager for coordinating when a primitive within a Query Module gets processed in order to maintain that each continuous query is continuously updated immediately upon the arrival of structured element data affecting any part of a continuous query.Type: GrantFiled: December 17, 2004Date of Patent: June 3, 2008Assignee: Coral 8, Inc.Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
-
Patent number: 7363301Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.Type: GrantFiled: October 7, 2005Date of Patent: April 22, 2008Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7296011Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.Type: GrantFiled: June 20, 2003Date of Patent: November 13, 2007Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
-
Patent number: 7293037Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.Type: GrantFiled: October 7, 2005Date of Patent: November 6, 2007Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7287020Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query is based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.Type: GrantFiled: January 12, 2001Date of Patent: October 23, 2007Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7191181Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.Type: GrantFiled: June 22, 2004Date of Patent: March 13, 2007Assignee: Microsoft CorporationInventors: Sarajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Publication number: 20060085410Abstract: A method of estimating the Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. And, can aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.Type: ApplicationFiled: December 7, 2005Publication date: April 20, 2006Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
-
Publication number: 20060085463Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.Type: ApplicationFiled: December 7, 2005Publication date: April 20, 2006Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
-
Publication number: 20060053103Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.Type: ApplicationFiled: October 7, 2005Publication date: March 9, 2006Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
-
Publication number: 20060053129Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.Type: ApplicationFiled: August 30, 2004Publication date: March 9, 2006Applicant: Microsoft CorporationInventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
-
Publication number: 20060036600Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.Type: ApplicationFiled: October 7, 2005Publication date: February 16, 2006Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
-
Patent number: 6907380Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , Sp; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.Type: GrantFiled: December 1, 2003Date of Patent: June 14, 2005Assignee: Hewlett-Packard Development Company, L.P.Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
-
Patent number: 6842753Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.Type: GrantFiled: January 12, 2001Date of Patent: January 11, 2005Assignee: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Publication number: 20040260694Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.Type: ApplicationFiled: June 20, 2003Publication date: December 23, 2004Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
-
Publication number: 20040236735Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.Type: ApplicationFiled: June 22, 2004Publication date: November 25, 2004Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar