Patents by Inventor Venkatesh Ganti

Venkatesh Ganti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Minimal difference query and view matching

Patent number: 7251646

Abstract: The subject disclosure pertains to efficient computation of the difference between queries by exploiting commonality between them. A minimal difference query (MDQ) is generated that roughly corresponds to removal of as many joins as possible while still accurately representing the query difference. The minimal difference query can be employed to substantially extend the scope of view matching to cases where a query is not wholly subsumed by a view. Additionally, the minimal difference query can be employed as an analytical tool in various contexts.

Type: Grant

Filed: February 13, 2006

Date of Patent: July 31, 2007

Assignee: Microsoft Corporation

Inventors: Kaushik Shriraghav, Venkatesh Ganti, Xin Dong
Probabilistic techniques for detecting duplicate tuples

Publication number: 20070005556

Abstract: A technique for probabilistically determining fuzzy duplicates includes converting a plurality of tuples into hash vectors utilizing a locality sensitive hashing algorithm. The hash vectors are sorted, on one or more vector coordinates, to cluster similar hash coordinate values together. Each cluster of two or more hash vectors identifies candidate tuples. The candidate tuples are compared utilizing a similarity function. Tuples which are more similar than a specified threshold are returned.

Type: Application

Filed: June 30, 2005

Publication date: January 4, 2007

Applicant: Microsoft Corporation

Inventors: Venkatesh Ganti, Ying Xu
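
The pipeline in the abstract above (hash, sort on a coordinate, cluster, verify) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the application: min-hash over whitespace tokens stands in for the locality sensitive hashing algorithm, Jaccard similarity stands in for the similarity function, and the names `minhash_vector` and `fuzzy_duplicates` are hypothetical.

```python
import hashlib
from itertools import combinations


def tokens(text):
    """Lower-cased whitespace tokens of a record (assumed tokenization)."""
    return set(text.lower().split())


def minhash_vector(token_set, num_hashes=8):
    """One min-hash coordinate per seeded hash function."""
    return tuple(
        min(
            (int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
             for tok in token_set),
            default=0,
        )
        for seed in range(num_hashes)
    )


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_duplicates(records, threshold=0.5, num_hashes=8):
    """Return index pairs whose Jaccard similarity exceeds the threshold."""
    token_sets = [tokens(r) for r in records]
    vectors = [minhash_vector(t, num_hashes) for t in token_sets]
    candidates = set()
    for coord in range(num_hashes):
        # Sorting on one coordinate places equal hash values next to each other.
        order = sorted(range(len(records)), key=lambda i: vectors[i][coord])
        start = 0
        for end in range(1, len(order) + 1):
            run_ended = (
                end == len(order)
                or vectors[order[end]][coord] != vectors[order[start]][coord]
            )
            if run_ended:
                # Every cluster of two or more vectors yields candidate pairs.
                candidates.update(combinations(sorted(order[start:end]), 2))
                start = end
    # Verify candidates with the similarity function.
    return sorted(
        (i, j) for i, j in candidates
        if jaccard(token_sets[i], token_sets[j]) > threshold
    )
```

Only candidate pairs that share a hash coordinate are compared, which is what lets this kind of scheme avoid comparing every pair of tuples.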
Systems and methods for estimating functional relationships in a database

Publication number: 20060282436

Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.

Type: Application

Filed: May 6, 2005

Publication date: December 14, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav
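
As a rough illustration of estimating key strength from a sample, the sketch below draws a random sample of rows and reports the fraction of distinct values in one column. This naive ratio is an assumption chosen for simplicity, not the estimator described in the application, and the function name and the row-as-dict representation are hypothetical.

```python
import random


def key_strength_estimate(rows, column, sample_size, seed=0):
    """Naive estimate of how key-like a column is, computed on a sample.

    Returns distinct values / sampled rows: 1.0 means no duplicates were
    seen in the sample, values near 0 mean heavy duplication.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    distinct = len({row[column] for row in sample})
    return distinct / len(sample)


# Usage: "id" is unique per row, "state" is constant.
rows = [{"id": i, "state": "WA"} for i in range(100)]
id_strength = key_strength_estimate(rows, "id", 20)
state_strength = key_strength_estimate(rows, "state", 20)
```

A sample tends to show fewer duplicates than the full table contains, so a ratio like this would likely overstate key strength for large tables and would need correction in practice.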
String predicate selectivity estimation

Patent number: 7149735

Abstract: A method of estimating selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various substring lengths are estimated. For example, the selectivity of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.

Type: Grant

Filed: June 24, 2003

Date of Patent: December 12, 2006

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano
Robust detector of fuzzy duplicates

Publication number: 20060053129

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Type: Application

Filed: August 30, 2004

Publication date: March 9, 2006

Applicant: Microsoft Corporation

Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
Detecting duplicate records in databases

Publication number: 20050262044

Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

Type: Application

Filed: July 14, 2005

Publication date: November 24, 2005

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
Detecting duplicate records in databases

Patent number: 6961721

Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

Type: Grant

Filed: June 28, 2002

Date of Patent: November 1, 2005

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
Segmentation of strings into structured records

Publication number: 20050234906

Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.

Type: Application

Filed: April 14, 2004

Publication date: October 20, 2005

Applicant: Microsoft Corporation

Inventors: Venkatesh Ganti, Theodore Vassilakis, Yevgeny Agichtein
String predicate selectivity estimation

Publication number: 20040267713

Abstract: A method of estimating selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various substring lengths are estimated. For example, the selectivity of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.

Type: Application

Filed: June 24, 2003

Publication date: December 30, 2004

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano
Efficient fuzzy match for evaluating data records

Publication number: 20040260694

Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Type: Application

Filed: June 20, 2003

Publication date: December 23, 2004

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
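
To make the idea of matching an incoming value against a reference table via q-grams concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the similarity function disclosed in the application: plain Jaccard similarity over character q-grams is used as a stand-in, the reference table is scanned linearly with no index, and the names `qgrams`, `similarity`, and `fuzzy_match` are hypothetical.

```python
def qgrams(text, q=3):
    """Set of overlapping character substrings of length q."""
    text = text.lower()
    if len(text) < q:
        return {text}
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def similarity(a, b, q=3):
    """Jaccard similarity of the two q-gram sets (assumed stand-in measure)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_match(input_value, reference, q=3, min_similarity=0.3):
    """Return the closest reference value, or None if nothing is close enough."""
    if not reference:
        return None
    best = max(reference, key=lambda ref: similarity(input_value, ref, q))
    if similarity(input_value, best, q) >= min_similarity:
        return best
    return None


# Usage: a misspelled input is mapped to its closest reference entry.
reference = ["Microsoft Corporation", "Lucent Technologies"]
cleaned = fuzzy_match("Microsft Corporation", reference)
```

Because a single typo disturbs only the few q-grams that overlap it, most q-grams of the misspelled input still match the intended reference value.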
Duplicate data elimination system

Publication number: 20040249789

Abstract: A process for finding similar data records from a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on a similarity between tokens of the data records. Data records whose similarity score with respect to each other is greater than a threshold form one or more groups of data records. The records or tuples form nodes of a graph wherein edges between nodes represent a similarity score between records of a group. Within each group a canonical record is identified based on the similarity of data records to each other within the group.

Type: Application

Filed: June 4, 2003

Publication date: December 9, 2004

Applicant: Microsoft Corporation

Inventors: Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
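
The grouping and canonical-record steps in the abstract above can be sketched as a small graph computation. The details are assumptions for illustration only: token-set Jaccard similarity serves as the score, groups are taken to be connected components of the thresholded graph, and the canonical record is the one with the highest total similarity to the rest of its group. The function names are hypothetical.

```python
from itertools import combinations


def token_similarity(a, b):
    """Jaccard similarity over whitespace tokens (assumed scoring function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def group_and_canonicalize(records, threshold=0.5):
    """Return a list of (canonical_record, group_members) pairs."""
    n = len(records)
    # Records are nodes; an edge carries the similarity score when it
    # exceeds the threshold.
    edges = {i: {} for i in range(n)}
    for i, j in combinations(range(n), 2):
        score = token_similarity(records[i], records[j])
        if score > threshold:
            edges[i][j] = score
            edges[j][i] = score

    # Groups are the connected components of that graph.
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))

    # Canonical record: highest summed similarity to the rest of its group.
    result = []
    for group in groups:
        canonical = max(
            group, key=lambda i: sum(edges[i].get(j, 0.0) for j in group)
        )
        result.append((records[canonical], [records[i] for i in group]))
    return result
```

With connected components, two records can land in the same group through an intermediate record even when their direct similarity is below the threshold; a stricter grouping rule would change that behavior.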
Detecting duplicate records in databases

Publication number: 20040003005

Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

Type: Application

Filed: June 28, 2002

Publication date: January 1, 2004

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
Method of constructing binary decision trees with reduced memory access

Patent number: 6442561

Abstract: A method of creating and updating a binary decision tree from training databases that cannot fit in high speed solid state memory is provided, in which a subset of the training database that can fit into high speed memory is used to create a statistically good estimate of the desired binary decision tree. This statistically good estimate is used to review the entire training database in as little as one sequential scan to collect the statistics necessary to verify the accuracy of the binary decision tree and to refine the binary decision tree to be identical to that which would be obtained by a full analysis of the training database.

Type: Grant

Filed: December 15, 1999

Date of Patent: August 27, 2002

Assignee: Wisconsin Alumni Research Foundation

Inventors: Johannes E. Gehrke, Venkatesh Ganti, Raghu Ramakrishnan
Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases

Patent number: 6108647

Abstract: A novel and unique method of approximating the data cube and summarizing database data in order to provide quick and approximate answers to aggregate queries by precomputing a summary of the data cube using histograms and answering queries using the substantially smaller summary. A unique method according to the present invention provides for identifying accurate histogram classes and distributing the space among the histograms on various sub-cubes such that the errors are minimized, while at the same time computer resources are maximized.

Type: Grant

Filed: May 21, 1998

Date of Patent: August 22, 2000

Assignee: Lucent Technologies, Inc.

Inventors: Viswanath Poosala, Venkatesh Ganti
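
As a one-dimensional illustration of answering an aggregate query from a precomputed histogram instead of the data itself, consider the sketch below. It is a simplification under stated assumptions: an equi-width histogram over a single numeric attribute, with values assumed to be uniformly spread inside each bucket. The patent concerns histograms over sub-cubes of a data cube, which this sketch does not attempt; the function names are hypothetical.

```python
def build_histogram(values, num_buckets, lo, hi):
    """Precompute per-bucket counts over the range [lo, hi)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        index = min(int((v - lo) / width), num_buckets - 1)
        counts[index] += 1
    return counts, lo, width


def approx_count(histogram, a, b):
    """Approximate COUNT(*) WHERE a <= value < b using only the histogram."""
    counts, lo, width = histogram
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        # Fraction of the bucket covered by the query range, assuming
        # values are spread uniformly within the bucket.
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


# Usage: 100 values summarized by 10 bucket counts.
histogram = build_histogram(list(range(100)), 10, 0, 100)
estimate = approx_count(histogram, 25, 35)
```

The query touches only the bucket counts, so its cost depends on the number of buckets and not on the number of underlying rows; the price is the error introduced by the uniformity assumption.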
