Patents by Inventor Sanjay Ghemawat

Sanjay Ghemawat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Representative Document Selection for a Set of Duplicate Documents

Publication number: 20150026170

Abstract: Systems and methods are provided for obtaining a plurality of documents. A respective document in the plurality of documents is associated with a score and each document in the plurality of documents is from a different data structure in a plurality of data structures. Each data structure in the plurality of data structures represents a different portion of a document address space. A first document in the plurality of documents is selected in accordance with the score associated with the first document. The first document has a fingerprint that indicates that the first document has substantially identical content to every other document in the plurality of documents. In accordance with the score, the first document is indexed thereby producing an indexed first document. With respect to the plurality of documents, the indexed first document is included in a document index as representative of each document in the plurality of documents.

Type: Application

Filed: October 9, 2014

Publication date: January 22, 2015

Inventors: Daniel Wesley Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey Adgate Dean

Representative document selection for a set of duplicate documents

Patent number: 8868559

Abstract: Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.

Type: Grant

Filed: August 30, 2012

Date of Patent: October 21, 2014

Assignee: Google Inc.

Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean
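
The selection step described in the abstract of patent 8868559 can be illustrated with a minimal sketch. It assumes each document arrives as a (url, fingerprint, score) tuple with the fingerprint already computed, and that ties are broken by URL; the function name and the tie-break rule are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def select_representatives(docs):
    """Pick one representative URL per fingerprint.

    docs: iterable of (url, fingerprint, score) tuples, where score is a
    query-independent score and equal fingerprints mean substantially
    identical content.
    """
    groups = defaultdict(list)
    for url, fingerprint, score in docs:
        groups[fingerprint].append((score, url))
    # Highest score wins; ties fall back to the larger URL string
    # (an arbitrary, assumed rule that keeps the result deterministic).
    return {fp: max(members)[1] for fp, members in groups.items()}

docs = [
    ("a.example/x", "f1", 0.2),
    ("b.example/x", "f1", 0.9),
    ("c.example/y", "f2", 0.5),
]
representatives = select_representatives(docs)
```

Only the representative of each group would then be passed on to the indexer.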

Providing posts from an extended network

Patent number: 8856141

Abstract: A system includes: an engaging post identifier for identifying and retrieving engaging posts; an extended network post identifier for identifying extended posts from an extended network; a combining module for creating a combined list of added posts from the engaging post and the extended posts, the combining module generating one or more ranked posts by ranking the list of added posts by relevance to a user; and a user interface module for providing the one or more ranked posts. The disclosure also includes a method for finding and providing engaging posts that includes determining engaging posts; determining extended posts from an extended social network using a social graph of the user; adding the engaging posts and the extended posts to create a combined list of added posts; ranking the added posts by relevance to a user; and providing one or more of the ranked posts.

Type: Grant

Filed: October 23, 2012

Date of Patent: October 7, 2014

Assignee: Google Inc.

Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Sachin Jain, Boris Mazniker

Distributed crawling of hyperlinked documents

Patent number: 8812478

Abstract: Techniques for crawling hyperlinked documents are provided. Hyperlinked documents to be crawled are grouped by host and the host to be crawled next is selected according to a stall time of the host. The stall time can indicate the earliest time that the host should be crawled and the stall times can be a predetermined amount of time, vary by host and be adjusted according to actual retrieval times from the host.

Type: Grant

Filed: September 10, 2012

Date of Patent: August 19, 2014

Assignee: Google Inc.

Inventors: Jeffrey A. Dean, Craig Silverstein, Benedict Gomes, Sanjay Ghemawat
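
The host-based scheduling described in the abstract of patent 8812478 can be sketched with a priority queue keyed on stall time. This is only an illustration under stated assumptions: the class name, the `base_delay` and `slowdown` parameters, and the rule "next stall time = now + max(base_delay, slowdown * retrieval_time)" are hypothetical choices, not the patent's method.

```python
import heapq
from collections import deque

class HostScheduler:
    """Groups URLs by host and hands out the host with the earliest stall time."""

    def __init__(self, base_delay=1.0, slowdown=2.0):
        self.base_delay = base_delay  # assumed minimum gap between fetches
        self.slowdown = slowdown      # assumed multiplier on retrieval time
        self._pending = {}            # host -> deque of URLs still to crawl
        self._stall = {}              # host -> earliest time of next fetch
        self._ready = []              # heap of (stall_time, host)
        self._busy = set()            # hosts with a fetch in flight

    def add_url(self, host, url):
        queue = self._pending.setdefault(host, deque())
        queue.append(url)
        # Queue the host only if it is idle and was not already queued.
        if len(queue) == 1 and host not in self._busy:
            heapq.heappush(self._ready, (self._stall.get(host, 0.0), host))

    def next_url(self, now):
        """Return (host, url), or None if no host may be crawled yet."""
        if not self._ready or self._ready[0][0] > now:
            return None
        _, host = heapq.heappop(self._ready)
        self._busy.add(host)
        return host, self._pending[host].popleft()

    def finished(self, host, now, retrieval_time):
        """Record a completed fetch and adjust the host's stall time."""
        self._busy.discard(host)
        delay = max(self.base_delay, self.slowdown * retrieval_time)
        self._stall[host] = now + delay
        if self._pending[host]:
            heapq.heappush(self._ready, (self._stall[host], host))

scheduler = HostScheduler(base_delay=1.0, slowdown=2.0)
scheduler.add_url("a.example", "/1")
scheduler.add_url("a.example", "/2")
scheduler.add_url("b.example", "/1")
first = scheduler.next_url(now=0.0)
scheduler.finished("a.example", now=0.5, retrieval_time=2.0)
second = scheduler.next_url(now=1.0)
third = scheduler.next_url(now=1.0)
fourth = scheduler.next_url(now=4.5)
```

In this run a slow response from a.example pushes its stall time to 4.5, so b.example is served next and a.example is skipped until that time is reached.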

System and method of accessing a document efficiently through multi-tier web caching

Patent number: 8788475

Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.

Type: Grant

Filed: June 28, 2012

Date of Patent: July 22, 2014

Assignee: Google Inc.

Inventors: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik

Determining Corresponding Terms Written in Different Formats

Publication number: 20140188454

Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.

Type: Application

Filed: March 6, 2014

Publication date: July 3, 2014

Applicant: Google Inc.

Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer

Back-off language model compression

Patent number: 8725509

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.

Type: Grant

Filed: June 17, 2009

Date of Patent: May 13, 2014

Assignee: Google Inc.

Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat

Identification of semantic units from within a search query

Patent number: 8719262

Abstract: A search engine for searching a corpus improves the relevancy of the results by classifying multiple terms in a search query as a single semantic unit. A semantic unit locator of the search engine generates a subset of documents that are generally relevant to the query based on the individual terms within the query. Combinations of search terms that define potential semantic units from the query are then evaluated against the subset of documents to determine which combinations of search terms should be classified as a semantic unit. The resultant semantic units are used to refine the results of the search.

Type: Grant

Filed: September 14, 2012

Date of Patent: May 6, 2014

Assignee: Google Inc.

Inventors: Krishna Bharat, Sanjay Ghemawat, Urs Hoelzle

Systems and methods for searching using queries written in a different character-set and/or language from the target pages

Patent number: 8706747

Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.

Type: Grant

Filed: September 30, 2003

Date of Patent: April 22, 2014

Assignee: Google Inc.

Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer

System and Method For Large-Scale Data Processing Using an Application-Independent Framework

Publication number: 20140096138

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems include: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values.

Type: Application

Filed: December 6, 2013

Publication date: April 3, 2014

Applicant: Google Inc.

Inventors: Jeffrey Dean, Sanjay Ghemawat
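
The split between application-independent modules and user-specified map and reduce operations, as described in the abstract of publication 20140096138, can be illustrated in a single process. This is a toy sketch, not the distributed system itself: `run_mapreduce`, `word_map`, `word_reduce`, and the hash-based partitioning are all assumptions made for the example.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn, num_partitions=2):
    """Application-independent driver: knows nothing about the job itself."""
    # Intermediate data structures, one per partition, standing in for the
    # intermediate storage spread across machines.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for record in records:
        for key, value in map_fn(record):
            partitions[hash(key) % num_partitions][key].append(value)
    # Reduce phase: each partition is processed independently.
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reduce_fn(key, values)
    return output

# Application-specific operators supplied by the user (word count).
def word_map(line):
    for word in line.split():
        yield word, 1

def word_reduce(word, counts):
    return sum(counts)

word_counts = run_mapreduce(["a b a", "b c"], word_map, word_reduce)
```

Swapping in a different pair of map and reduce functions changes the job without touching the driver, which is the separation the abstract describes.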

Efficiently Updating and Deleting Data in a Data Storage System

Publication number: 20140025899

Abstract: A method of storing data is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The data storage server receives a first and second data request, the requests including a first and second range of one or more keys and an associated first and second value respectively. The data storage server identifies one or more overlap points associated with the first range and the second range. For each of the overlap points, the data storage server then creates data items including ranges of keys, the ranges of each data item including one or more keys that are either: (a) the keys between a terminal key of the first or second range and the overlap point, or (b) the keys between two adjacent overlap points.

Type: Application

Filed: June 4, 2013

Publication date: January 23, 2014

Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Andrew Fikes
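
The range splitting described in the abstract of publication 20140025899 can be sketched for two requests over integer keys. Several things here are assumptions rather than details from the abstract: ranges are half-open `[start, end)`, every range endpoint is treated as a split point, and the second request's value wins where the ranges overlap.

```python
def split_ranges(first, second):
    """Split two (start, end, value) requests into non-overlapping items.

    Ranges are half-open [start, end). Where both ranges cover a piece,
    the second request's value is used (an assumed precedence rule).
    """
    (s1, e1, v1), (s2, e2, v2) = first, second
    points = sorted({s1, e1, s2, e2})
    items = []
    # Walk adjacent boundary points; each pair delimits one candidate item.
    for lo, hi in zip(points, points[1:]):
        if s2 <= lo and hi <= e2:
            items.append((lo, hi, v2))
        elif s1 <= lo and hi <= e1:
            items.append((lo, hi, v1))
        # Otherwise the piece lies in a gap between the ranges: skip it.
    return items

overlapping = split_ranges((0, 10, "A"), (5, 15, "B"))
disjoint = split_ranges((0, 3, "A"), (5, 8, "B"))
```

Adjacent pieces that carry the same value are left unmerged here to keep the boundaries visible.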

Collecting Processor Usage Statistics

Publication number: 20140025810

Abstract: In accordance with some implementations, a method of collecting statistics about processor usage is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The server system executes one or more processes, wherein each of the one or more first processes is associated with an entity from a group of one or more entities. The server system then receives an interrupt signal at a first predetermined interval. In response to receiving the interrupt signal and for each processor of the one or more processors, the server system interrupts the process currently being executed on the processor. The server system increments the counter associated with the interrupted process. The server system then resumes the interrupted process.

Type: Application

Filed: June 4, 2013

Publication date: January 23, 2014

Inventors: Sanjay Ghemawat, Andrew Fikes, Chris Jorgen Taylor
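
The sampling idea in the abstract of publication 20140025810 can be shown as a small simulation. No real interrupts are involved: the `schedule` list, which records which process occupies each processor at each tick, and the `owner_of` mapping from process to entity are hypothetical inputs invented for the example.

```python
from collections import Counter

def sample_usage(schedule, owner_of):
    """Count samples per entity.

    schedule: list of ticks; each tick lists the process id running on each
    processor at the moment of the periodic interrupt (None if idle).
    owner_of: mapping from process id to the entity it is associated with.
    """
    counts = Counter()
    for tick in schedule:
        for pid in tick:
            if pid is not None:
                # Charge one sample to the entity owning the interrupted process.
                counts[owner_of[pid]] += 1
    return counts

owner_of = {1: "alice", 2: "bob", 3: "alice"}
schedule = [
    [1, 2],     # tick 0: processor 0 runs pid 1, processor 1 runs pid 2
    [3, None],  # tick 1: processor 1 is idle
    [1, 3],     # tick 2
]
usage = sample_usage(schedule, owner_of)
```

Each counter approximates the share of processor time used by an entity, with accuracy depending on how often the sampling interrupt fires.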

Storing and Moving Data in a Distributed Storage System

Publication number: 20130346540

Abstract: A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored is presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system.

Type: Application

Filed: May 21, 2013

Publication date: December 26, 2013

Applicant: Google Inc.

Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Yasushi Saito, Andrew Fikes, Christopher Jorgen Taylor, Sean Quinlan, Michal Piotr Szymaniak, Sebastian Kanthak, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Michael James Boyer Epstein

Method and System for Deleting Obsolete Files from a File System

Publication number: 20130339318

Abstract: A method for deleting obsolete files from a file system is provided. The method includes: receiving a request to delete a reference to a target file in a file system from a file reference data structure, wherein the file reference data structure includes target file names and reference file names; identifying a reference file name in the file reference data structure, wherein the reference file name includes a file name of the target file; deleting a reference file from the file system, wherein the reference file has the identified reference file name; checking whether the file system includes at least one reference file whose file name matches the file name of the target file; if there is no such reference file in the file system: deleting the target file from the file system; and deleting the file name of the target file from the file reference data structure.

Type: Application

Filed: June 3, 2013

Publication date: December 19, 2013

Applicant: Google Inc.

Inventors: Yasushi Saito, Sanjay Ghemawat, Jeffrey Adgate Dean
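
The deletion procedure in the abstract of publication 20130339318 can be sketched against a toy file system modelled as a set of file names. The naming convention `<target>.ref.<owner>` for reference files is an assumption made for this example, and the separate file reference data structure mentioned in the abstract is folded into the same set for brevity.

```python
def delete_reference(files, target, owner):
    """Drop one reference to `target`; delete `target` if none remain.

    files: set of file names standing in for the file system.
    Returns True if the target file itself was deleted.
    """
    prefix = target + ".ref."  # assumed naming scheme for reference files
    files.discard(prefix + owner)
    # Check whether any other reference file still names the target.
    if any(name.startswith(prefix) for name in files):
        return False
    files.discard(target)
    return True

files = {"data.sst", "data.sst.ref.t1", "data.sst.ref.t2"}
deleted_after_first = delete_reference(files, "data.sst", "t1")
remaining_after_first = set(files)
deleted_after_second = delete_reference(files, "data.sst", "t2")
```

Because the reference file names embed the target name, the check for remaining references is a lookup by name rather than a separate counter that could drift out of sync.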

Organizing Data in a Distributed Storage System

Publication number: 20130339295

Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. The amount of data for an account is not limited, because new splits can be added to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates the request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client.

Type: Application

Filed: May 20, 2013

Publication date: December 19, 2013

Applicant: Google Inc.

Inventors: Jeffrey Adgate Dean, Michael James Boyer Epstein, Andrew Fikes, Sanjay Ghemawat, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Yasushi Saito, Michal Piotr Szymaniak, Sebastian Kanthak, Chris Jorgen Taylor

Efficient Snapshot Read of a Database in a Distributed Storage System

Publication number: 20130339301

Abstract: A computer system issues a batch read operation to a tablet in a first replication group in a distributed database and obtains a most recent version of data items in the tablet that have a timestamp no greater than a snapshot timestamp T. For each data item in the tablet, the computer system determines whether the data item has a move-in timestamp less than or equal to the snapshot timestamp T, which is less than a move-out timestamp, and whether the data item has a creation timestamp less than the snapshot timestamp T, which is less than or equal to a deletion timestamp.

Type: Application

Filed: June 3, 2013

Publication date: December 19, 2013

Applicant: Google Inc.

Inventors: Yasushi Saito, Sanjay Ghemawat, Sebastian Kanthak, Christopher Cunningham Frost
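
The two timestamp conditions in the abstract of publication 20130339301 amount to a visibility test: move-in <= T < move-out, and creation < T <= deletion. The sketch below encodes that test directly. The `Item` record, its field names, and the use of infinity for "never moved" or "never deleted" are assumptions for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    key: str
    value: str
    created: float
    deleted: float = math.inf     # assumed sentinel: not yet deleted
    moved_in: float = -math.inf   # assumed sentinel: always in this tablet
    moved_out: float = math.inf   # assumed sentinel: never moved out

def visible_at(item, t):
    """True if the item belongs to the snapshot taken at timestamp t."""
    in_tablet = item.moved_in <= t < item.moved_out
    alive = item.created < t <= item.deleted
    return in_tablet and alive

def snapshot_read(tablet, t):
    """Return {key: value} for every item visible at timestamp t."""
    return {item.key: item.value for item in tablet if visible_at(item, t)}

tablet = [
    Item("k", "v1", created=1, deleted=5),
    Item("k", "v2", created=5),
    Item("m", "x", created=2, moved_out=4),
]
at_3 = snapshot_read(tablet, 3)
at_5 = snapshot_read(tablet, 5)
at_6 = snapshot_read(tablet, 6)
```

With the old version's deletion timestamp equal to the new version's creation timestamp, the asymmetric bounds ensure that exactly one version of "k" is visible at any T, including T = 5.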

System and method for large-scale data processing using an application-independent framework

Patent number: 8612510

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values.

Type: Grant

Filed: January 12, 2010

Date of Patent: December 17, 2013

Assignee: Google Inc.

Inventors: Jeffrey Dean, Sanjay Ghemawat

Permitting Users to Remove Documents

Publication number: 20130332455

Abstract: A system may present information regarding a document and provide an option for removing the document. The system may also receive selection of the option and remove the document when the option is selected. The system may aggregate information regarding documents that have been removed by a group of users and assign scores to a set of documents based on the aggregated information.

Type: Application

Filed: April 8, 2013

Publication date: December 12, 2013

Applicant: Google Inc.

Inventors: Sanjay Ghemawat, John Piscitello, Simon Tong, Matt Cutts

Associating Application-Specific Methods with Tables Used for Data Storage

Publication number: 20130297592

Abstract: A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client.

Type: Application

Filed: July 9, 2013

Publication date: November 7, 2013

Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Andrew B. Fikes, Yasushi Saito

Efficient indexing of documents with similar content

Patent number: 8554561

Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.

Type: Grant

Filed: August 9, 2012

Date of Patent: October 8, 2013

Assignee: Google Inc.

Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
