Patents by Inventor Sanjay Ghemawat

Sanjay Ghemawat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20170124452
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a request from a client to process a computational graph; obtaining data representing the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node; identifying a plurality of available devices for performing the requested operation; partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes in the computational graph; and assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for operation.
    Type: Application
    Filed: October 28, 2016
    Publication date: May 4, 2017
    Inventors: Paul A. Tucker, Jeffrey Adgate Dean, Sanjay Ghemawat, Yuan Yu
  • Publication number: 20170124454
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modifying a computational graph to include send and receive nodes. Communication between unique devices performing operations of different subgraphs of the computational graph can be handled efficiently by inserting send and receive nodes into each subgraph. When executed, the operations that these send and receive nodes represent may enable pairs of unique devices to conduct communication with each other in a self-sufficient manner. This shifts the burden of coordinating communication away from the backend, which affords the system that processes this computational graph representation the opportunity to perform one or more other processes while devices are executing subgraphs.
    Type: Application
    Filed: October 28, 2016
    Publication date: May 4, 2017
    Inventors: Vijay Vasudevan, Jeffrey Adgate Dean, Sanjay Ghemawat
  • Patent number: 9621651
    Abstract: A system facilitates the distribution and redistribution of chunks of data among multiple servers. The system may identify servers to store a replica of the data based on at least one of utilization of the servers, prior data distribution involving the servers, and failure correlation properties associated with the servers, and place the replicas of the data at the identified servers. The system may also monitor total numbers of replicas of the chunks available in the system, identify chunks that have a total number of replicas below one or more chunk thresholds, assign priorities to the identified chunks, and re-replicate the identified chunks based substantially on the assigned priorities.
    Type: Grant
    Filed: May 27, 2015
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
  • Patent number: 9619565
    Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.
    Type: Grant
    Filed: August 3, 2015
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu
  • Patent number: 9612883
    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems includes: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values.
    Type: Grant
    Filed: December 6, 2013
    Date of Patent: April 4, 2017
    Assignee: Google Inc.
    Inventors: Jeffrey Dean, Sanjay Ghemawat
  • Publication number: 20170011056
    Abstract: A method for deleting obsolete files from a file system is provided. The method includes receiving a request to delete a reference to a first target file of a plurality of target files stored in a file system, the first target file having a first target file name. A first reference file whose file name includes the first target file name is identified. The first reference file is deleted from the file system. The method further includes determining whether the file system includes at least one reference file, distinct from the first reference file, whose file name includes the first target file name. In accordance with a determination that the file system does not include the at least one reference file, the first target file is deleted from the file system.
    Type: Application
    Filed: September 19, 2016
    Publication date: January 12, 2017
    Inventors: Yasushi Saito, Sanjay Ghemawat, Jeffrey Adgate Dean
  • Publication number: 20160342657
    Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
    Type: Application
    Filed: August 2, 2016
    Publication date: November 24, 2016
    Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
  • Publication number: 20160321252
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Application
    Filed: April 1, 2016
    Publication date: November 3, 2016
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Patent number: 9477758
    Abstract: In one aspect, the present disclosure can be embodied in a method that includes identifying a collection of entities from one or more data sources, calculating a score for subsets of entities from the collection based on one or more seed entities associated with the collection, identifying one or more entities from each of the subsets based on the calculated score, assigning the calculated score to the identified one or more entities from the respective subset, and ranking the one or more entities based on the assigned score, so as to identify entities in the collection that are related to the one or more seed entities.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: October 25, 2016
    Assignee: GOOGLE INC.
    Inventors: Simon Tong, Jeffrey Adgate Dean, Sanjay Ghemawat
  • Patent number: 9449006
    Abstract: A method for deleting obsolete files from a file system is provided. The method includes: receiving a request to delete a reference to a target file in a file system from a file reference data structure, wherein the file reference data structure includes target file names and reference file names; identifying a reference file name in the file reference data structure, wherein the reference file name includes a file name of the target file; deleting a reference file from the file system, wherein the reference file has the identified reference file name; checking whether the file system includes at least one reference file whose file name matches the file name of the target file; if there is no such reference file in the file system: deleting the target file from the file system; and deleting the file name of the target file from the file reference data structure.
    Type: Grant
    Filed: June 3, 2013
    Date of Patent: September 20, 2016
    Assignee: Google Inc.
    Inventors: Yasushi Saito, Sanjay Ghemawat, Jeffrey Adgate Dean
  • Patent number: 9405808
    Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
    Type: Grant
    Filed: February 28, 2012
    Date of Patent: August 2, 2016
    Assignee: GOOGLE INC.
    Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
  • Patent number: 9305091
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Grant
    Filed: November 18, 2011
    Date of Patent: April 5, 2016
    Assignee: Google Inc.
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Patent number: 9298576
    Abstract: In accordance with some implementations, a method of collecting statistics about processor usage is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The server system executes one or more processes, wherein each of the one or more processes is associated with an entity from a group of one or more entities. The server system then receives an interrupt signal at a first predetermined interval. In response to receiving the interrupt signal and for each processor of the one or more processors, the server system interrupts the process currently being executed on the processor. The server system increments a counter associated with the interrupted process. The server system then resumes the interrupted process.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: March 29, 2016
    Assignee: GOOGLE INC.
    Inventors: Sanjay Ghemawat, Andrew Fikes, Chris Jorgen Taylor
  • Patent number: 9256506
    Abstract: A system, a computer-readable storage medium storing at least one program, and a computer-implemented method for performing operations on target servers are presented. A request including an operation is received. A set of target servers associated with the operation is identified. The following request processing operations are performed until a predetermined termination condition has been satisfied: a target server in the set of target servers to which the request has not been issued and whose health metrics satisfy health criteria is identified, the request to perform the operation is issued to the target server, and when the request to perform the operation fails at the target server, health metrics for the target server are updated to indicate that the request to perform the operation failed at the target server and a health check operation is scheduled to be performed with respect to the target server.
    Type: Grant
    Filed: June 3, 2013
    Date of Patent: February 9, 2016
    Assignee: GOOGLE INC.
    Inventors: Chris Jorgen Taylor, Sanjay Ghemawat, Alexander Lloyd, Andrew Fikes, Yaz Saito, Wilson Cheng-Yi Hsieh, Christopher Cunningham Frost
  • Patent number: 9195611
    Abstract: A method of storing data is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The data storage server receives a first and second data request, the requests including a first and second range of one or more keys and an associated first and second value respectively. The data storage server identifies one or more overlap points associated with the first range and the second range. For each of the overlap points, the data storage server then creates data items including ranges of keys, the ranges of each data item including one or more keys that are either: (a) the keys between a terminal key of the first or second range and the overlap point, or (b) the keys between two adjacent overlap points.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: November 24, 2015
    Assignee: GOOGLE INC.
    Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Andrew Fikes
  • Patent number: 9098501
    Abstract: A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: August 4, 2015
    Assignee: Google Inc.
    Inventors: Jeffrey Dean, Gauthaum K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu
  • Patent number: 9069835
    Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. The amount of data for an account is not limited, because new splits can be added to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates such request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client.
    Type: Grant
    Filed: May 20, 2013
    Date of Patent: June 30, 2015
    Assignee: GOOGLE INC.
    Inventors: Jeffrey Adgate Dean, Michael James Boyer Epstein, Andrew Fikes, Sanjay Ghemawat, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Yasushi Saito, Michal Piotr Szymaniak, Sebastian Kanthak, Chris Jorgen Taylor
  • Patent number: 9047307
    Abstract: A system facilitates the distribution and redistribution of chunks of data among multiple servers. The system may identify servers to store a replica of the data based on at least one of utilization of the servers, prior data distribution involving the servers, and failure correlation properties associated with the servers, and place the replicas of the data at the identified servers. The system may also monitor total numbers of replicas of the chunks available in the system, identify chunks that have a total number of replicas below one or more chunk thresholds, assign priorities to the identified chunks, and re-replicate the identified chunks based substantially on the assigned priorities.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: June 2, 2015
    Assignee: Google Inc.
    Inventors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
  • Patent number: 9002860
    Abstract: Methods for organizing and retrieving data values in a persistent data structure are provided. Data values are grouped into data blocks and pointers are obtained for each data block. In addition, one or more summaries, related to properties of the data block, are created and associated with the data block's pointer. The summaries allow for a more efficient retrieval of data values from the data structure by preventing unnecessary retrieval calls to persistent storage when the summaries do not match query criteria.
    Type: Grant
    Filed: February 6, 2012
    Date of Patent: April 7, 2015
    Assignee: Google Inc.
    Inventor: Sanjay Ghemawat
  • Patent number: 8996517
    Abstract: A system may present information regarding a document and provide an option for removing the document. The system may also receive selection of the option and remove the document when the option is selected. The system may aggregate information regarding documents that have been removed by a group of users and assign scores to a set of documents based on the aggregated information.
    Type: Grant
    Filed: April 8, 2013
    Date of Patent: March 31, 2015
    Assignee: Google Inc.
    Inventors: Sanjay Ghemawat, John Piscitello, Simon Tong, Matt Cutts
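
As an illustration of the map and reduce structure described in the abstract of patent 9612883 above, the following is a minimal single-process sketch in Python. It is not the patented system: it is not distributed, and the names run_map_reduce, count_words, and sum_counts are hypothetical, chosen only for this example. It shows an application-independent driver that applies a user-specified map operation to portions of input, stores intermediate values in an intermediate data structure, and then applies a user-specified reduce operation to produce the final output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_map_reduce(
    inputs: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical application-independent driver (single process only)."""
    # Map phase: apply the user-specified map operation to each input
    # portion and group the intermediate values by key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for portion in inputs:
        for key, value in map_fn(portion):
            intermediate[key].append(value)

    # Reduce phase: apply the user-specified reduce operation to the
    # intermediate values collected for each key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


def count_words(portion: str) -> Iterator[Tuple[str, int]]:
    """Example application-specific map operation: emit (word, 1) pairs."""
    for word in portion.split():
        yield word.lower(), 1


def sum_counts(key: str, values: List[int]) -> int:
    """Example application-specific reduce operation: sum the counts."""
    return sum(values)


result = run_map_reduce(["a b a", "b c"], count_words, sum_counts)
print(result)
```

In this sketch only count_words and sum_counts are specific to the application (word counting); the driver itself does not change when a different map or reduce operation is supplied.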