Patents by Inventor Rajeev Motwani

Rajeev Motwani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8396886
    Abstract: A computer software language capable of expressing registered queries that operate on one more or more data streams continuously. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries); a clause for detecting a pattern, and correlated database subqueries for correlating data stream data with database tables.
    Type: Grant
    Filed: February 2, 2006
    Date of Patent: March 12, 2013
    Assignee: Sybase Inc.
    Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
  • Patent number: 7577638
    Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: August 18, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7567949
    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
    Type: Grant
    Filed: September 10, 2002
    Date of Patent: July 28, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
  • Patent number: 7516149
    Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
    Type: Grant
    Filed: August 30, 2004
    Date of Patent: April 7, 2009
    Assignee: Microsoft Corporation
    Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
  • Patent number: 7493316
    Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7383253
    Abstract: A Continuous Query Processor processes queries on continuously updating data sources or data streams and includes a Publication Manager for accepting published structured elements of data from data stream Publishers, a Subscription Manager for giving structured elements of data to one or more data stream Subscribers, a Query Module Manager for processing queries represented by Query Modules, a Query Module Store for maintaining deployed Query Modules, a Query Primitive Manager performing processing for various primitives that comprise a Query Module, and a Schedule Manager for coordinating when a primitive within a Query Module gets processed in order to maintain that each continuous query is continuously updated immediately upon the arrival of structured element data affecting any part of a continuous query.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: June 3, 2008
    Assignee: Coral 8, Inc.
    Inventors: Mark Tsimelzon, Aleksey Sanin, Rajeev Motwani, Glenn Robert Seidman, Gayatri Patel
  • Patent number: 7363301
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: April 22, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7296011
    Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 13, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
  • Patent number: 7293037
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: November 6, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7287020
    Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query is based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: October 23, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7191181
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.
    Type: Grant
    Filed: June 22, 2004
    Date of Patent: March 13, 2007
    Assignee: Microsoft Corporation
    Inventors: Sarajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Publication number: 20060085410
    Abstract: A method of estimating the Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. And, can aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
    Type: Application
    Filed: December 7, 2005
    Publication date: April 20, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
  • Publication number: 20060085463
    Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
    Type: Application
    Filed: December 7, 2005
    Publication date: April 20, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
  • Publication number: 20060053103
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Application
    Filed: October 7, 2005
    Publication date: March 9, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
  • Publication number: 20060053129
    Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
    Type: Application
    Filed: August 30, 2004
    Publication date: March 9, 2006
    Applicant: Microsoft Corporation
    Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20060036600
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Application
    Filed: October 7, 2005
    Publication date: February 16, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
  • Patent number: 6907380
    Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, . . . , Sp; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
    Type: Grant
    Filed: December 1, 2003
    Date of Patent: June 14, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani
  • Patent number: 6842753
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: January 11, 2005
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Publication number: 20040260694
    Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
    Type: Application
    Filed: June 20, 2003
    Publication date: December 23, 2004
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
  • Publication number: 20040236735
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.
    Type: Application
    Filed: June 22, 2004
    Publication date: November 25, 2004
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar