Patents by Inventor Sanjay Ghemawat

Sanjay Ghemawat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150026170
    Abstract: Systems and methods are provided for obtaining a plurality of documents. A respective document in the plurality of documents is associated with a score and each document in the plurality of documents is from a different data structure in a plurality of data structures. Each data structure in the plurality of data structures represents a different portion of a document address space. A first document in the plurality of documents is selected in accordance with the score associated with the first document. The first document has a fingerprint that indicates that the first document has substantially identical content to every other document in the plurality of documents. In accordance with the score, the first document is indexed thereby producing an indexed first document. With respect to the plurality of documents, the indexed first document is included in a document index as representative of each document in the plurality of documents.
    Type: Application
    Filed: October 9, 2014
    Publication date: January 22, 2015
    Inventors: Daniel Wesley Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey Adgate Dean
  • Patent number: 8868559
    Abstract: Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.
    Type: Grant
    Filed: August 30, 2012
    Date of Patent: October 21, 2014
    Assignee: Google Inc.
    Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean
  • Patent number: 8856141
    Abstract: A system includes: an engaging post identifier for identifying and retrieving engaging posts; an extended network post identifier for identifying extended posts from an extended network; a combining module for creating a combined list of added posts from the engaging post and the extended posts, the combining module generating one or more ranked posts by ranking the list of added posts by relevance to a user; and a user interface module for providing the one or more ranked posts. The disclosure also includes a method for finding and providing engaging posts that includes determining engaging posts; determining extended posts from an extended social network using a social graph of the user; adding the engaging posts and the extended posts to create a combined list of added posts; ranking the added posts by relevance to a user; and providing one or more of the ranked posts.
    Type: Grant
    Filed: October 23, 2012
    Date of Patent: October 7, 2014
    Assignee: Google Inc.
    Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Sachin Jain, Boris Mazniker
  • Patent number: 8812478
    Abstract: Techniques for crawling hyperlinked documents are provided. Hyperlinked documents to be crawled are grouped by host and the host to be crawled next is selected according to a stall time of the host. The stall time can indicate the earliest time that the host should be crawled and the stall times can be a predetermined amount of time, vary by host and be adjusted according to actual retrieval times from the host.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Craig Silverstein, Benedict Gomes, Sanjay Ghemawat
  • Patent number: 8788475
    Abstract: Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: July 22, 2014
    Assignee: Google Inc.
    Inventors: Eric Russell Fredricksen, Fritz John Schneider, Jeffrey Adgate Dean, Sanjay Ghemawat, Niels Provos, Georges Harik
  • Publication number: 20140188454
    Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.
    Type: Application
    Filed: March 6, 2014
    Publication date: July 3, 2014
    Applicant: Google Inc.
    Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer
  • Patent number: 8725509
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: May 13, 2014
    Assignee: Google Inc.
    Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat
  • Patent number: 8719262
    Abstract: A search engine for searching a corpus improves the relevancy of the results by classifying multiple terms in a search query as a single semantic unit. A semantic unit locator of the search engine generates a subset of documents that are generally relevant to the query based on the individual terms within the query. Combinations of search terms that define potential semantic units from the query are then evaluated against the subset of documents to determine which combinations of search terms should be classified as a semantic unit. The resultant semantic units are used to refine the results of the search.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: May 6, 2014
    Assignee: Google Inc.
    Inventors: Krishna Bharat, Sanjay Ghemawat, Urs Hoelzle
  • Patent number: 8706747
    Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.
    Type: Grant
    Filed: September 30, 2003
    Date of Patent: April 22, 2014
    Assignee: Google Inc.
    Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer
  • Publication number: 20140096138
    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems include: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values.
    Type: Application
    Filed: December 6, 2013
    Publication date: April 3, 2014
    Applicant: Google Inc.
    Inventors: Jeffrey Dean, Sanjay Ghemawat
  • Publication number: 20140025899
    Abstract: A method of storing data is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The data storage server receives a first and second data request, the requests including a first and second range of one or more keys and an associated first and second value respectively. The data storage server identifies one or more overlap points associated with the first range and the second range. For each of the overlap points, the data storage server then creates data items including ranges of keys, the ranges of each data item including one or more keys that are either: (a) the keys between a terminal key of the first or second range and the overlap point, or (b) the keys between two adjacent overlap points.
    Type: Application
    Filed: June 4, 2013
    Publication date: January 23, 2014
    Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Andrew Fikes
  • Publication number: 20140025810
    Abstract: In accordance with some implementations, a method of collecting statistics about processor usage is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The server system executes one or more processes, wherein each of the one or more first processes is associated with an entity from a group of one or more entities. The server system then receives an interrupt signal at a first predetermined interval. In response to receiving the interrupt signal and for each processor of the one or more processors, the server system interrupts the process currently being executed on the processor. The server system increments the counter associated with the interrupted process. The server system then resumes the interrupted process.
    Type: Application
    Filed: June 4, 2013
    Publication date: January 23, 2014
    Inventors: Sanjay Ghemawat, Andrew Fikes, Chris Jorgen Taylor
  • Publication number: 20130346540
    Abstract: A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored is presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system.
    Type: Application
    Filed: May 21, 2013
    Publication date: December 26, 2013
    Applicant: Google Inc.
    Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Yasushi Saito, Andrew Fikes, Christopher Jorgen Taylor, Sean Quinlan, Michal Piotr Szymaniak, Sebastian Kanthak, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Michael James Boyer Epstein
  • Publication number: 20130339318
    Abstract: A method for deleting obsolete files from a file system is provided. The method includes: receiving a request to delete a reference to a target file in a file system from a file reference data structure, wherein the file reference data structure includes target file names and reference file names; identifying a reference file name in the file reference data structure, wherein the reference file name includes a file name of the target file; deleting a reference file from the file system, wherein the reference file has the identified reference file name; checking whether the file system includes at least one reference file whose file name matches the file name of the target file; if there is no such reference file in the file system: deleting the target file from the file system; and deleting the file name of the target file from the file reference data structure.
    Type: Application
    Filed: June 3, 2013
    Publication date: December 19, 2013
    Applicant: Google Inc.
    Inventors: Yasushi Saito, Sanjay Ghemawat, Jeffrey Adgate Dean
  • Publication number: 20130339295
    Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. There is no limit to the amount of data for an account by adding new splits to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates such request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client.
    Type: Application
    Filed: May 20, 2013
    Publication date: December 19, 2013
    Applicant: Google, Inc.
    Inventors: Jeffrey Adgate Dean, Michael James Boyer Epstein, Andrew Fikes, Sanjay Ghemawat, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Yasushi Saito, Michal Piotr Szymaniak, Sebastian Kanthak, Chris Jorgen Taylor
  • Publication number: 20130339301
    Abstract: A computer system issues a batch read operation to a tablet in a first replication group in a distributed database and obtains a most recent version of data items in the tablet that have a timestamp no great than a snapshot timestamp T. For each data item in the one tablet, the computer system determines whether the data item has a move-in timestamp less than or equal to the snapshot timestamp T, which is less than a move-out timestamp, and whether the data item has a creation timestamp less than the snapshot timestamp T, which is less than or equal to a deletion timestamp.
    Type: Application
    Filed: June 3, 2013
    Publication date: December 19, 2013
    Applicant: Google Inc.
    Inventors: Yasushi Saito, Sanjay Ghemawat, Sebastian Kanthak, Christopher Cunningham Frost
  • Patent number: 8612510
    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values.
    Type: Grant
    Filed: January 12, 2010
    Date of Patent: December 17, 2013
    Assignee: Google Inc.
    Inventors: Jeffrey Dean, Sanjay Ghemawat
  • Publication number: 20130332455
    Abstract: A system may present information regarding a document and provide an option for removing the document. The system may also receive selection of the option and remove the document when the option is selected. The system may aggregate information regarding documents that have been removed by a group of users and assign scores to a set of documents based on the aggregated information.
    Type: Application
    Filed: April 8, 2013
    Publication date: December 12, 2013
    Applicant: GOOGLE INC.
    Inventors: Sanjay GHEMAWAT, John PISCITELLO, Simon TONG, Matt CUTTS
  • Publication number: 20130297592
    Abstract: A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client.
    Type: Application
    Filed: July 9, 2013
    Publication date: November 7, 2013
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Andrew B. Fikes, Yasushi Saito
  • Patent number: 8554561
    Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
    Type: Grant
    Filed: August 9, 2012
    Date of Patent: October 8, 2013
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai