Patents by Inventor Venkatesh Ganti

Venkatesh Ganti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7251646
    Abstract: The subject disclosure pertains to efficient computation of the difference between queries by exploiting commonality between them. A minimal difference query (MDQ) is generated that roughly corresponds to removal of as many joins as possible while still accurately representing the query difference. The minimal difference query can be employed to substantially extend the scope of view matching to cases where a query is not wholly subsumed by a view. Additionally, the minimal difference query can be employed as an analytical tool in various contexts.
    Type: Grant
    Filed: February 13, 2006
    Date of Patent: July 31, 2007
    Assignee: Microsoft Corporation
    Inventors: Kaushik Shriraghav, Venkatesh Ganti, Xin Dong
  • Publication number: 20070005556
    Abstract: A technique for probabilistically determining fuzzy duplicates includes converting a plurality of tuples into hash vectors utilizing a locality sensitive hashing algorithm. The hash vectors are sorted, on one or more vector coordinates, to cluster similar hash coordinate values together. Each cluster of two or more hash vectors identifies candidate tuples. The candidate tuples are compared utilizing a similarity function. Tuples that are more similar than a specified threshold are returned.
    Type: Application
    Filed: June 30, 2005
    Publication date: January 4, 2007
    Applicant: Microsoft Corporation
    Inventors: Venkatesh Ganti, Ying Xu
  • Publication number: 20060282436
    Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
    Type: Application
    Filed: May 6, 2005
    Publication date: December 14, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav
  • Patent number: 7149735
    Abstract: A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivities of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.
    Type: Grant
    Filed: June 24, 2003
    Date of Patent: December 12, 2006
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano
  • Publication number: 20060053129
    Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
    Type: Application
    Filed: August 30, 2004
    Publication date: March 9, 2006
    Applicant: Microsoft Corporation
    Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20050262044
    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.
    Type: Application
    Filed: July 14, 2005
    Publication date: November 24, 2005
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
  • Patent number: 6961721
    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.
    Type: Grant
    Filed: June 28, 2002
    Date of Patent: November 1, 2005
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
  • Publication number: 20050234906
    Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
    Type: Application
    Filed: April 14, 2004
    Publication date: October 20, 2005
    Applicant: Microsoft Corporation
    Inventors: Venkatesh Ganti, Theodore Vassilakis, Yevgeny Agichtein
  • Publication number: 20040267713
    Abstract: A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivities of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.
    Type: Application
    Filed: June 24, 2003
    Publication date: December 30, 2004
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano
  • Publication number: 20040260694
    Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
    Type: Application
    Filed: June 20, 2003
    Publication date: December 23, 2004
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
  • Publication number: 20040249789
    Abstract: A process for finding similar data records from a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on a similarity between tokens of the data records. Data records whose similarity score with respect to each other is greater than a threshold form one or more groups of data records. The records or tuples form nodes of a graph wherein edges between nodes represent a similarity score between records of a group. Within each group, a canonical record is identified based on the similarity of data records to each other within the group.
    Type: Application
    Filed: June 4, 2003
    Publication date: December 9, 2004
    Applicant: Microsoft Corporation
    Inventors: Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
  • Publication number: 20040003005
    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.
    Type: Application
    Filed: June 28, 2002
    Publication date: January 1, 2004
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
  • Patent number: 6442561
    Abstract: A method of creating and updating a binary decision tree from training databases that cannot be fit in high speed solid state memory is provided in which a subset of the training database which can fit into high speed memory is used to create a statistically good estimate of the binary decision tree desired. This statistically good estimate is used to review the entire training database in as little as one sequential scan to collect statistics necessary to verify the accuracy of the binary decision tree and to refine the binary decision tree to be identical to that which would be obtained by a full analysis of the training database.
    Type: Grant
    Filed: December 15, 1999
    Date of Patent: August 27, 2002
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Johannes E. Gehrke, Venkatesh Ganti, Raghu Ramakrishnan
  • Patent number: 6108647
    Abstract: A novel and unique method of approximating the data cube and summarizing database data in order to provide quick and approximate answers to aggregate queries by precomputing a summary of the data cube using histograms and answering queries using the substantially smaller summary. A unique method according to the present invention provides for identifying accurate histogram classes and distributing the space among the histograms on various sub-cubes such that the errors are minimized, while at the same time computer resources are maximized.
    Type: Grant
    Filed: May 21, 1998
    Date of Patent: August 22, 2000
    Assignee: Lucent Technologies, Inc.
    Inventors: Viswanath Poosala, Venkatesh Ganti
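
Illustration for publication number 20070005556: the abstract describes hashing tuples into vectors with a locality sensitive hashing algorithm, sorting on vector coordinates to cluster equal hash values, and then comparing candidates with a similarity function. The following is only a minimal sketch of that general idea, not the method claimed in the application. The abstract names neither the hash family nor the similarity function, so min-hash over word tokens and Jaccard similarity are assumptions here, as are all function names and the default threshold.

```python
import hashlib
from itertools import combinations


def tokens(record):
    """Lower-cased word set of a record (a tuple of strings)."""
    return set(" ".join(record).lower().split())


def minhash_vector(token_set, num_hashes=8):
    """Min-hash signature (assumed LSH family): one coordinate per seed."""
    return tuple(
        min(
            (int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
             for tok in token_set),
            default=0,  # empty records all hash to 0
        )
        for seed in range(num_hashes)
    )


def jaccard(a, b):
    """Assumed similarity function: Jaccard overlap of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_duplicates(records, threshold=0.5, num_hashes=8):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    token_sets = [tokens(r) for r in records]
    vectors = [minhash_vector(t, num_hashes) for t in token_sets]
    candidates = set()
    for coord in range(num_hashes):
        # Sort on one coordinate so equal hash values become adjacent.
        order = sorted(range(len(records)), key=lambda i: vectors[i][coord])
        start = 0
        for end in range(1, len(order) + 1):
            run_ended = (
                end == len(order)
                or vectors[order[end]][coord] != vectors[order[start]][coord]
            )
            if run_ended:
                # Every cluster of two or more vectors yields candidate pairs.
                for i, j in combinations(sorted(order[start:end]), 2):
                    candidates.add((i, j))
                start = end
    # Verify candidates with the similarity function.
    return sorted(
        (i, j)
        for i, j in candidates
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    )


records = [
    ("Acme Corp", "Seattle"),
    ("ACME corp", "seattle"),
    ("Lucent", "Murray Hill"),
]
print(fuzzy_duplicates(records))  # [(0, 1)]
```

Records 0 and 1 have identical token sets after lower-casing, so they share every hash coordinate and are always reported. Records that are only partly similar are found probabilistically: they become candidates only if at least one coordinate collides, which is why several hash coordinates are used.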
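
Illustration for publication number 20040260694: the abstract describes matching an incoming tuple against a reference table with a similarity function built on q-grams. The sketch below shows the general shape of such a fuzzy match under simplifying assumptions. It substitutes plain Jaccard overlap of q-gram sets, averaged over fields, for the similarity function disclosed in the application, which the abstract does not specify. The function names, the `#` padding character, and the threshold are hypothetical.

```python
def qgrams(text, q=3):
    """Set of overlapping q-character substrings of a padded, lower-cased string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard overlap of the q-gram sets of two strings (assumed measure)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def fuzzy_match(input_tuple, reference, threshold=0.5, q=3):
    """Return (best reference tuple, score), or (None, score) below threshold.

    Assumes a non-empty reference table whose tuples have the same
    (non-zero) number of fields as the input tuple.
    """
    def score(ref):
        sims = [qgram_similarity(x, y, q) for x, y in zip(input_tuple, ref)]
        return sum(sims) / len(sims)

    best = max(reference, key=score)
    best_score = score(best)
    return (best, best_score) if best_score >= threshold else (None, best_score)


reference = [
    ("Microsoft Word", "word processor"),
    ("Lucent Router", "network hardware"),
]
# A misspelled product name still resolves to the first reference tuple.
match, score = fuzzy_match(("Microsft Word", "word processor"), reference)
print(match, round(score, 2))
```

This version scans the whole reference table for every input tuple. That is fine for a sketch but would not scale to a large reference relation, where an index over q-grams would be the natural next step.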