Patents by Inventor Gautham Thambidorai

Gautham Thambidorai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Regional indexes

Patent number: 8131712

Abstract: A corpus of documents is identified, such as a large corpus of web documents. A quality score is applied to each document, and at least some of the documents in the corpus of documents are identified based on their respective quality scores. At least one query characteristic, for instance, the language of a query, associated with a plurality of search queries is identified. A subset of documents in the corpus of documents is identified that satisfy the at least one query characteristic. An index is built that includes the identified at least some documents and the identified subset of documents.

Type: Grant

Filed: October 15, 2007

Date of Patent: March 6, 2012

Assignee: Google Inc.

Inventors: Gautham Thambidorai, Eisar A. Lipkovitz, Cosmos Nicolaou, Li Fan
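
The selection logic in the abstract for patent 8131712 can be illustrated with a minimal sketch. It is not the patented implementation: the `Document` fields, the threshold-based quality rule, and the use of language as the only query characteristic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str   # hypothetical stand-in for a "query characteristic"
    quality: float  # hypothetical quality score in [0, 1]


def build_regional_index(corpus, quality_threshold, query_language):
    """Return the doc_ids selected by quality plus those matching the query language."""
    # Documents identified by their quality score (assumed rule: score >= threshold).
    high_quality = {d.doc_id for d in corpus if d.quality >= quality_threshold}
    # Documents that satisfy the query characteristic, regardless of quality.
    matching = {d.doc_id for d in corpus if d.language == query_language}
    # The index covers both groups.
    return high_quality | matching


corpus = [
    Document("a", "en", 0.9),
    Document("b", "de", 0.2),
    Document("c", "de", 0.8),
    Document("d", "fr", 0.1),
]
index = build_regional_index(corpus, quality_threshold=0.7, query_language="de")
print(sorted(index))
```

In this toy corpus, document `b` enters the index only because it matches the query language, and document `d` is left out on both criteria.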

Efficient Indexing of Documents with Similar Content

Publication number: 20120023073

Abstract: A set of documents may be stored and indexed as a compressed sequence of tokens. A set of documents is grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within these identified clusters, the documents that actually satisfy the query.

Type: Application

Filed: September 29, 2011

Publication date: January 26, 2012

Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
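
The two-phase query described in the abstract for publication 20120023073 can be sketched as follows. The application does not state its encoding here, so this sketch substitutes a simple shared-prefix scheme to elide repeated tokens and a per-cluster vocabulary check as the "likely to match" filter. Both choices, and all function names, are assumptions.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_cluster(docs):
    """Store each document as (shared prefix length, remaining tokens) relative to the previous one."""
    encoded, prev = [], []
    for tokens in docs:
        k = common_prefix_len(prev, tokens)
        encoded.append((k, tokens[k:]))
        prev = tokens
    return encoded


def decode_cluster(encoded):
    """Rebuild the full token lists from the prefix-elided form."""
    docs, prev = [], []
    for k, suffix in encoded:
        tokens = prev[:k] + suffix
        docs.append(tokens)
        prev = tokens
    return docs


def search(clusters, query_terms):
    """Phase 1: keep clusters whose vocabulary covers the query. Phase 2: verify each document."""
    terms = set(query_terms)
    hits = []
    for cid, encoded in enumerate(clusters):
        # Every token in the cluster appears in some stored suffix.
        vocab = {t for _, suffix in encoded for t in suffix}
        if not terms <= vocab:
            continue  # cluster cannot contain a matching document
        for did, tokens in enumerate(decode_cluster(encoded)):
            if terms <= set(tokens):
                hits.append((cid, did))
    return hits


raw_clusters = [
    [["the", "quick", "brown", "fox"],
     ["the", "quick", "brown", "dog"],
     ["the", "quick", "red", "fox"]],
    [["hello", "world"], ["hello", "there"]],
]
clusters = [encode_cluster(docs) for docs in raw_clusters]
print(search(clusters, ["red", "fox"]))
print(search(clusters, ["dog", "red"]))
```

The second query shows why the cluster check is only a filter: the first cluster contains both `dog` and `red`, but no single document contains both, so the verification phase returns no hits.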

Document compression system and method for use with tokenspace repository

Publication number: 20070220023

Abstract: The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. The mapping scheme includes a first mapping between unique tokens contained in a set of documents and unique global token identifiers (e.g., 32-bit integers) contained in a global-lexicon (i.e., dictionary). The mapping scheme also includes a second mapping between the global token identifiers and a set of fixed-length local token identifiers (e.g., 8-bit integers) contained in one or more mini-lexicons (i.e., sub-dictionaries). Each mini-lexicon is associated with a range of token positions in the tokenized documents. The first and second mappings are used to encode/decode documents into local token identifiers having fixed widths which can be compactly stored in the tokenspace repository. The use of fixed-length local token identifiers allows for fast and efficient decoding of tokenized documents.

Type: Application

Filed: August 13, 2004

Publication date: September 20, 2007

Inventors: Jeffrey Dean, Gautham Thambidorai, Sanjay Ghemawat, Benedict Gomes, Olcan Sercinoglu
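
The multi-tiered mapping in the abstract for publication 20070220023 can be illustrated with a minimal sketch. The block size of 256 positions, the data layout, and all function names are assumptions chosen so that each mini-lexicon fits in 8-bit local identifiers. The application itself may partition positions and build lexicons differently.

```python
BLOCK_SIZE = 256  # assumed range of token positions per mini-lexicon; at most 256 distinct tokens


def build_global_lexicon(tokens):
    """First mapping: unique token -> global token identifier."""
    lexicon = {}
    for t in tokens:
        lexicon.setdefault(t, len(lexicon))
    return lexicon


def encode(tokens, lexicon):
    """Second mapping: per block, global identifier -> one-byte local identifier."""
    blocks = []
    for start in range(0, len(tokens), BLOCK_SIZE):
        mini = []       # mini-lexicon: local id (list index) -> global id
        local_of = {}   # reverse lookup used only while encoding
        local_ids = bytearray()
        for t in tokens[start:start + BLOCK_SIZE]:
            gid = lexicon[t]
            if gid not in local_of:
                local_of[gid] = len(mini)
                mini.append(gid)
            local_ids.append(local_of[gid])  # always fits in one byte
        blocks.append((mini, bytes(local_ids)))
    return blocks


def decode(blocks, lexicon, start, end):
    """Reconstruct only the tokens at positions [start, end), e.g. for a snippet."""
    token_of = {gid: t for t, gid in lexicon.items()}
    out = []
    for pos in range(start, end):
        mini, local_ids = blocks[pos // BLOCK_SIZE]
        out.append(token_of[mini[local_ids[pos % BLOCK_SIZE]]])
    return out


tokens = "to be or not to be that is the question".split() * 40
lexicon = build_global_lexicon(tokens)
blocks = encode(tokens, lexicon)
print(decode(blocks, lexicon, 254, 259))
```

Because every local identifier has the same width, a token position maps directly to a block and a byte offset, so a range that crosses a block boundary can be decoded without touching the rest of the document.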
