Patents by Inventor Shriraghav Kaushik

Shriraghav Kaushik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Disk-Based Probabilistic Set-Similarity Indexes

Publication number: 20080313128

Abstract: Input set indexing for set-similarity lookups. The architecture provides input to an indexing process that enables more efficient lookups for large data sets (e.g., disk-based) without requiring a full scan of the input. A new index structure is provided, the output of which is exact, rather than approximate. The similarity of two sets is specified using a similarity function that maps two sets to a numeric value that represents the similarity of the two sets. Threshold-based lookups are addressed, where two sets are considered similar if the numeric similarity score is above a threshold. The structure efficiently identifies all input sets within a distance k (e.g., a Hamming distance) of the query set. Additional information in the form of the frequency of elements (the number of input sets in which an element occurs) is used to improve index performance.

Type: Application

Filed: June 12, 2007

Publication date: December 18, 2008

Applicant: Microsoft Corporation

Inventors: Arvind Arasu, Venkatesh Ganti, Shriraghav Kaushik
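
The lookup semantics described in the abstract above can be illustrated with a brute-force baseline. This is a minimal sketch, not the patented index: it performs exactly the full scan that the index is meant to avoid. Jaccard similarity is an assumed choice of similarity function, Hamming distance between sets is taken as the size of their symmetric difference, and all function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming_distance(a, b):
    """Hamming distance between two sets: size of their symmetric difference."""
    return len(a ^ b)


def threshold_lookup(query, input_sets, threshold):
    """Indices of input sets whose similarity to the query is above the threshold.

    Assumption: this scans every input set, which is what an index would avoid.
    """
    return [i for i, s in enumerate(input_sets) if jaccard(query, s) > threshold]


def within_distance(query, input_sets, k):
    """Indices of input sets within Hamming distance k of the query (full scan)."""
    return [i for i, s in enumerate(input_sets) if hamming_distance(query, s) <= k]


def element_frequencies(input_sets):
    """Number of input sets in which each element occurs."""
    freq = Counter()
    for s in input_sets:
        freq.update(s)
    return freq


input_sets = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"a", "b", "c", "d"}]
query = {"a", "b", "c"}

similar = threshold_lookup(query, input_sets, 0.6)   # [0, 3]
nearby = within_distance(query, input_sets, 1)       # [0, 3]
freq = element_frequencies(input_sets)               # e.g. freq["a"] == 3
```

The element frequencies computed at the end are the kind of side information the abstract says can improve index performance; how the index uses them is not stated in the listing.
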

Example-Driven Design of Efficient Record Matching Queries

Publication number: 20080306945

Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (matching) and negative (non-matching) examples to search the space of candidate queries and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs can be executed efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations: any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.

Type: Application

Filed: June 5, 2007

Publication date: December 11, 2008

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
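
As a rough illustration of how labeled examples can drive query design, the sketch below picks a threshold for a single similarity function from positive and negative example scores. It is an assumption-laden toy, not the method of the application: the rule "match if score >= t", the exhaustive search over observed scores, and the name `best_threshold` are all hypothetical, and the operator-tree composition described in the abstract is not modeled.

```python
def best_threshold(positive_scores, negative_scores):
    """Choose t maximizing correctly labeled examples under 'match if score >= t'.

    Candidate thresholds are the observed scores themselves; ties keep the
    lowest candidate. Returns (threshold, number_of_correct_examples).
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best_t, best_correct = None, -1
    for t in candidates:
        # Positives at or above t and negatives below t are labeled correctly.
        correct = sum(s >= t for s in positive_scores)
        correct += sum(s < t for s in negative_scores)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct


# Similarity scores of example record pairs under one similarity function.
positives = [0.9, 0.8, 0.75]   # pairs labeled as matching
negatives = [0.4, 0.6, 0.78]   # pairs labeled as non-matching

threshold, correct = best_threshold(positives, negatives)   # (0.75, 5)
```

One negative example (0.78) scores higher than a positive one here, so no single threshold on this function separates the examples perfectly. The monotonicity property in the abstract only requires a separation on at least one similarity function, which is why a query would combine several functions.
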

Leveraging Constraints for Deduplication

Publication number: 20080288482

Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication satisfies as many of these constraints as possible rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (Structured Query Language) aggregates are allowed, including summation.

Type: Application

Filed: May 18, 2007

Publication date: November 20, 2008

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik
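
The threshold-raising step in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed algorithm: a partition is assumed to be the connected components of the graph that links record pairs whose similarity is at or above the threshold, token-level Jaccard stands in for textual similarity, and the constraint-maximization step is omitted. All names are hypothetical.

```python
from itertools import combinations


def token_jaccard(a, b):
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def partition_at(records, similarity, threshold):
    """Connected components of the graph linking pairs with similarity >= threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return sorted(sorted(g) for g in groups.values())


def split_sequence(records, similarity, thresholds):
    """List (threshold, partition) each time raising the threshold changes the partition."""
    result, previous = [], None
    for t in sorted(thresholds):
        current = partition_at(records, similarity, t)
        if current != previous:
            result.append((t, current))
            previous = current
    return result


records = ["acme corp", "acme corp inc", "acme inc", "zenith ltd"]
splits = split_sequence(records, token_jaccard, [0.0, 0.3, 0.5, 0.7])
# Thresholds at which the partition changes: 0.0, 0.3 and 0.7
```

Each entry in `splits` is one alternative partition of the records. A constraint-aware step, as described in the abstract, would then choose among such alternatives the one that satisfies the most constraints.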
