Patents by Inventor Sanjay Ghemawat

Sanjay Ghemawat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Storing genetic data in a storage system

Patent number: 10354748

Abstract: A method includes receiving, by a processing device, a plurality of genome files. Each genome file corresponds to a different sample and defines a genetic sequence. The method also includes generating, by the processing device, a two-dimensional alignment file based on the genome files and a reference sequence. A first dimension of the alignment file corresponds to individual genetic sequences and each of the genetic sequences is aligned with respect to the reference sequence along a second dimension of the alignment file. The method includes separating, by the processing device, the alignment file into a plurality of groups and storing the groups in a non-transitory genome data store. Each group contains segments of the genetic sequences of two or more of the genome files.

Type: Grant

Filed: March 27, 2015

Date of Patent: July 16, 2019

Assignee: Google LLC

Inventors: David Konerding, Jeffrey Adgate Dean, Sanjay Ghemawat, Jonathan Bingham
Modifying computational graphs

Patent number: 10354186

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modifying a computational graph to include send and receive nodes. Communication between unique devices performing operations of different subgraphs of the computational graph can be handled efficiently by inserting send and receive nodes into each subgraph. When executed, the operations that these send and receive nodes represent may enable pairs of unique devices to conduct communication with each other in a self-sufficient manner. This shifts the burden of coordinating communication away from the backend, which affords the system that processes this computational graph representation the opportunity to perform one or more other processes while devices are executing subgraphs.

Type: Grant

Filed: April 27, 2018

Date of Patent: July 16, 2019

Assignee: Google LLC

Inventors: Vijay Vasudevan, Jeffrey Adgate Dean, Sanjay Ghemawat
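
The send/receive idea in the abstract above can be illustrated with a small sketch. It assumes the graph is given as a list of edges plus a node-to-device assignment; the function name `insert_send_recv` and the `send_*`/`recv_*` node names are hypothetical and not taken from the patent.

```python
def insert_send_recv(edges, device_of):
    """Replace every cross-device edge with a send node and a receive node.

    edges: list of (src, dst) node names.
    device_of: dict mapping each node name to its assigned device.
    Returns (new_edges, new_device_of). Names are illustrative only.
    """
    new_edges = []
    new_device_of = dict(device_of)
    for src, dst in edges:
        if device_of[src] == device_of[dst]:
            # Same device: the edge stays inside one subgraph.
            new_edges.append((src, dst))
            continue
        send = f"send_{src}_{dst}"
        recv = f"recv_{src}_{dst}"
        # The send node lives with the producer, the receive node with the consumer.
        new_device_of[send] = device_of[src]
        new_device_of[recv] = device_of[dst]
        new_edges += [(src, send), (send, recv), (recv, dst)]
    return new_edges, new_device_of


edges = [("a", "b"), ("b", "c")]
device_of = {"a": "gpu0", "b": "gpu0", "c": "gpu1"}
new_edges, new_device_of = insert_send_recv(edges, device_of)
```

In this sketch only the `b -> c` edge crosses devices, so it is the only one rewritten; the two devices can then exchange data through the inserted pair without a central coordinator.
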
System and method for large-scale data processing using an application-independent framework

Patent number: 10296500

Abstract: A method performs large-scale data processing in a distributed and parallel processing environment. The method defines application-independent map and reduce operations, each invoking one or more library functions that automatically handle data partitioning, parallelization of computations, and fault tolerance. A user specifies a map operation, which calls one or more of the application-independent map operators to perform data read and write operations. A user also specifies a reduce operation, which calls one or more of the application-independent reduce operators to perform data read and write operations. The method executes application-independent map worker processes. Each map worker process executes the user-specified map operation to read designated portions of input files and store intermediate data values in intermediate data structures. The method also executes application-independent reduce worker processes.

Type: Grant

Filed: April 4, 2017

Date of Patent: May 21, 2019

Assignee: Google LLC

Inventors: Jeffrey Dean, Sanjay Ghemawat
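
As a rough illustration of the map and reduce split described in the abstract above, here is a single-process sketch. It deliberately leaves out partitioning, parallel workers, and fault tolerance, which the abstract assigns to the framework; `run_map_reduce` and the word-count functions are hypothetical names chosen for this example.

```python
from collections import defaultdict


def run_map_reduce(inputs, map_fn, reduce_fn):
    """Run a user-specified map and reduce over inputs, sequentially."""
    intermediate = defaultdict(list)  # stands in for the intermediate data structures
    for record in inputs:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    # One reduce call per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


def word_count_map(line):
    # User-specified map operation: emit (word, 1) for every word.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    # User-specified reduce operation: sum the counts for one word.
    return sum(counts)


counts = run_map_reduce(["the quick fox", "The lazy dog"],
                        word_count_map, word_count_reduce)
```

The driver knows nothing about words or counts, which is the application-independent part; only the two small functions are specific to the job.
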
METHOD FOR INTRA-SUBGRAPH OPTIMIZATION IN TUPLE GRAPH PROGRAMS

Publication number: 20190065162

Abstract: A programming model generates a graph for a program, the graph including a plurality of nodes and edges, wherein each node of the graph represents an operation and edges between the nodes represent streams of data input to and output from the operations represented by the nodes. The model determines where in a distributed architecture to execute the operations represented by the nodes. Such determining may include determining which nodes have location restrictions, assigning locations to each node having a location restriction based on the restriction, and partitioning the graph into a plurality of subgraphs, the partitioning including assigning locations to nodes without location restrictions in accordance with a first set of constraints, wherein each node within a particular subgraph is assigned to the same location. Each of the subgraphs is executed at its assigned location in a respective single thread.

Type: Application

Filed: August 24, 2017

Publication date: February 28, 2019

Inventors: Gautham Thambidorai, Matthew Rosencrantz, Sanjay Ghemawat, Srdjan Petrovic, Ivan Posva
METHOD OF EXECUTING A TUPLE GRAPH PROGRAM ACROSS A NETWORK

Publication number: 20190068504

Abstract: A programming model provides a method for executing a program in a distributed architecture. One or more first shards of the distributed architecture execute one or more operations and send tuples to at least one second shard, the tuples being part of a stream and being based on the one or more operations. The one or more first shards send a token value to the at least one second shard when the sending of the tuples in the stream is complete. The at least one second shard determines whether a total of the token values matches a number of the one or more first shards, and takes a first action in response to determining that the total of the token values matches the number of the one or more first shards. The first action may include marking the stream as being complete and/or generating a message indicating that the stream is complete.

Type: Application

Filed: August 24, 2017

Publication date: February 28, 2019

Inventors: Gautham Thambidorai, Matthew Rosencrantz, Sanjay Ghemawat, Srdjan Petrovic, Ivan Posva
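
The token-counting completion check in the abstract above might look like the following sketch on the receiving side. It assumes each sending shard contributes a token value of 1; the class `StreamReceiver` and its method names are hypothetical.

```python
class StreamReceiver:
    """Receiving ("second") shard tracking completion of one stream."""

    def __init__(self, num_senders):
        self.num_senders = num_senders  # number of first shards feeding the stream
        self.token_total = 0
        self.tuples = []
        self.complete = False

    def on_tuple(self, tup):
        self.tuples.append(tup)

    def on_token(self, value=1):
        # A sender emits its token once it has finished sending tuples.
        self.token_total += value
        if self.token_total == self.num_senders:
            self.complete = True  # the "first action": mark the stream complete
        return self.complete


receiver = StreamReceiver(num_senders=3)
receiver.on_tuple(("a", 1))
receiver.on_token()
receiver.on_token()
still_open = receiver.complete  # only two of three senders have finished
receiver.on_token()
```
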
SYSTEM OF TYPE INFERENCE FOR TUPLE GRAPH PROGRAMS METHOD OF EXECUTING A TUPLE GRAPH PROGRAM ACROSS A NETWORK

Publication number: 20190065154

Abstract: A programming model provides a method for type inference in programming operations. Information defining one or more attributes of an operation is received, the information specifying a field including a field name and a field type identifier for each of the attributes. Constraints for the operation are determined at least based on the attributes, wherein the constraints restrict at least one of a type of input for the operation or a type of output for the operation. Information defining an input for the operation is received, and it is determined, based on the constraints and the received information defining the input, the type of output for the operation. The type of output is associated with an output for the operation.

Type: Application

Filed: August 24, 2017

Publication date: February 28, 2019

Inventors: Gautham Thambidorai, Matthew Rosencrantz, Sanjay Ghemawat, Srdjan Petrovic, Ivan Posva
Anchor tag indexing in a web crawler system

Patent number: 10210256

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Grant

Filed: April 1, 2016

Date of Patent: February 19, 2019

Assignee: GOOGLE LLC

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
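
The inversion from link log to sorted anchor map described in the abstract above can be sketched in a few lines. The sketch assumes documents are identified by integers and the link log fits in memory; `build_sorted_anchor_map` is a hypothetical name.

```python
def build_sorted_anchor_map(link_log):
    """Invert (source, target) pairs and order them by target identifier.

    link_log: iterable of (source_id, target_id) pairs.
    Returns a list of (target_id, source_id) pairs sorted by target_id,
    with ties broken by source_id.
    """
    inverted = [(target, source) for source, target in link_log]
    return sorted(inverted)


link_log = [(10, 7), (11, 3), (12, 7), (10, 3)]
anchor_map = build_sorted_anchor_map(link_log)
```

Ordering by target identifier places all sources that link to a given document next to each other, so they can be read in one sequential pass.
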
Method and system for deleting obsolete files from a file system

Patent number: 10204110

Abstract: A method for deleting obsolete files from a file system is provided. The method includes receiving a request to delete a reference to a first target file of a plurality of target files stored in a file system, the first target file having a first target file name. A first reference file whose file name includes the first target file name is identified. The first reference file is deleted from the file system. The method further includes determining whether the file system includes at least one reference file, distinct from the first reference file, whose file name includes the first target file name. In accordance with a determination that the file system does not include the at least one reference file, the first target file is deleted from the file system.

Type: Grant

Filed: September 19, 2016

Date of Patent: February 12, 2019

Assignee: GOOGLE LLC

Inventors: Yasushi Saito, Sanjay Ghemawat, Jeffrey Adgate Dean
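
The reference-file scheme in the abstract above can be sketched with a set of file names standing in for the file system. The `.ref.` naming pattern and the function `delete_reference` are assumptions made for this example; the abstract only requires that a reference file's name include the target file's name.

```python
def delete_reference(files, target_name, reference_name):
    """Delete one reference file, then the target if no references remain.

    files: set of file names standing in for the file system (mutated in place).
    """
    if target_name not in reference_name:
        raise ValueError("reference file name must include the target file name")
    files.discard(reference_name)
    # Look for any other reference file whose name includes the target name.
    has_other_refs = any(
        name != target_name and target_name in name for name in files
    )
    if not has_other_refs:
        files.discard(target_name)  # the target is now obsolete
    return files


files = {"data-001", "data-001.ref.a", "data-001.ref.b",
         "data-002", "data-002.ref.a"}
delete_reference(files, "data-001", "data-001.ref.a")
target_kept = "data-001" in files  # still referenced by data-001.ref.b
delete_reference(files, "data-001", "data-001.ref.b")
```

Plain substring matching is used here for brevity; it would give false matches if one target name were a substring of another.
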
PROCESSING COMPUTATIONAL GRAPHS

Publication number: 20180247197

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a request from a client to process a computational graph; obtaining data representing the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node; identifying a plurality of available devices for performing the requested operation; partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes in the computational graph; and assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for operation.

Type: Application

Filed: April 27, 2018

Publication date: August 30, 2018

Inventors: Paul A. Tucker, Jeffrey Adgate Dean, Sanjay Ghemawat, Yuan Yu
MODIFYING COMPUTATIONAL GRAPHS

Publication number: 20180247198

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modifying a computational graph to include send and receive nodes. Communication between unique devices performing operations of different subgraphs of the computational graph can be handled efficiently by inserting send and receive nodes into each subgraph. When executed, the operations that these send and receive nodes represent may enable pairs of unique devices to conduct communication with each other in a self-sufficient manner. This shifts the burden of coordinating communication away from the backend, which affords the system that processes this computational graph representation the opportunity to perform one or more other processes while devices are executing subgraphs.

Type: Application

Filed: April 27, 2018

Publication date: August 30, 2018

Inventors: Vijay Vasudevan, Jeffrey Adgate Dean, Sanjay Ghemawat
Associating Application-Specific Methods with Tables Used for Data Storage

Publication number: 20180173722

Abstract: A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client.

Type: Application

Filed: January 11, 2018

Publication date: June 21, 2018

Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Andrew B. Fikes, Yasushi Saito
System and Method For Analyzing Data Records

Publication number: 20180052890

Abstract: Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.

Type: Application

Filed: October 31, 2017

Publication date: February 22, 2018

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
Associating application-specific methods with tables used for data storage

Patent number: 9870371

Abstract: A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client.

Type: Grant

Filed: July 9, 2013

Date of Patent: January 16, 2018

Assignee: GOOGLE LLC

Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Andrew B. Fikes, Yasushi Saito
DETERMINING CORRESPONDING TERMS WRITTEN IN DIFFERENT FORMATS

Publication number: 20170351673

Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.

Type: Application

Filed: August 8, 2017

Publication date: December 7, 2017

Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer
System and method for analyzing data records

Patent number: 9830357

Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.

Type: Grant

Filed: August 2, 2016

Date of Patent: November 28, 2017

Assignee: GOOGLE INC.

Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
Storing and moving data in a distributed storage system

Patent number: 9774676

Abstract: A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored is presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system.

Type: Grant

Filed: May 21, 2013

Date of Patent: September 26, 2017

Assignee: GOOGLE INC.

Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Yasushi Saito, Andrew Fikes, Christopher Jorgen Taylor, Sean Quinlan, Michal Piotr Szymaniak, Sebastian Kanthak, Wilson Cheng-Yi Hsieh, Alexander Lloyd, Michael James Boyer Epstein
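
One way to picture the lookup in the abstract above is a table of per-group placement metrics and a selection function over it. The metric fields (`free_bytes`, `load`) and the selection criterion (enough free space, then lowest load) are assumptions for this sketch, since the abstract does not name specific metrics.

```python
def pick_storage_group(groups, bytes_needed):
    """Return the id of a storage group whose metrics satisfy the criterion.

    groups: dict mapping group id -> placement metrics (hypothetical fields).
    Criterion used here: enough free space, then lowest load.
    Returns None when no group qualifies.
    """
    candidates = [
        (metrics["load"], group_id)
        for group_id, metrics in groups.items()
        if metrics["free_bytes"] >= bytes_needed
    ]
    if not candidates:
        return None
    return min(candidates)[1]


groups = {
    "g1": {"free_bytes": 100, "load": 0.9},
    "g2": {"free_bytes": 500, "load": 0.4},
    "g3": {"free_bytes": 800, "load": 0.7},
}
chosen = pick_storage_group(groups, bytes_needed=300)
too_large = pick_storage_group(groups, bytes_needed=1000)
```
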
Providing posts from an extended network

Patent number: 9747347

Abstract: A system includes: an engaging post identifier for identifying and retrieving engaging posts; an extended network post identifier for identifying extended posts from an extended network; a combining module for creating a combined list of added posts from the engaging post and the extended posts, the combining module generating one or more ranked posts by ranking the list of added posts by relevance to a user; and a user interface module for providing the one or more ranked posts. The disclosure also includes a method for finding and providing engaging posts that includes determining engaging posts; determining extended posts from an extended social network using a social graph of the user; adding the engaging posts and the extended posts to create a combined list of added posts; ranking the added posts by relevance to a user; and providing one or more of the ranked posts.

Type: Grant

Filed: September 3, 2014

Date of Patent: August 29, 2017

Assignee: Google Inc.

Inventors: Jeffrey Adgate Dean, Sanjay Ghemawat, Sachin Jain, Boris Mazniker
Determining corresponding terms written in different formats

Patent number: 9734197

Abstract: Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.

Type: Grant

Filed: March 6, 2014

Date of Patent: August 15, 2017

Assignee: Google Inc.

Inventors: Vibhu Mittal, Jay M. Ponte, Mehran Sahami, Sanjay Ghemawat, John A. Bauer
System and Method for Large-Scale Data Processing Using an Application-Independent Framework

Publication number: 20170206232

Abstract: A method performs large-scale data processing in a distributed and parallel processing environment. The method defines application-independent map and reduce operations, each invoking one or more library functions that automatically handle data partitioning, parallelization of computations, and fault tolerance. A user specifies a map operation, which calls one or more of the application-independent map operators to perform data read and write operations. A user also specifies a reduce operation, which calls one or more of the application-independent reduce operators to perform data read and write operations. The method executes application-independent map worker processes. Each map worker process executes the user-specified map operation to read designated portions of input files and store intermediate data values in intermediate data structures. The method also executes application-independent reduce worker processes.

Type: Application

Filed: April 4, 2017

Publication date: July 20, 2017

Inventors: Jeffrey Dean, Sanjay Ghemawat
Efficient snapshot read of a database in a distributed storage system

Patent number: 9659038

Abstract: A computer system issues a batch read operation to a tablet in a first replication group in a distributed database and obtains a most recent version of data items in the tablet that have a timestamp no greater than a snapshot timestamp T. For each data item in the tablet, the computer system determines whether the data item has a move-in timestamp less than or equal to the snapshot timestamp T, which is less than a move-out timestamp, and whether the data item has a creation timestamp less than the snapshot timestamp T, which is less than or equal to a deletion timestamp.

Type: Grant

Filed: June 3, 2013

Date of Patent: May 23, 2017

Assignee: GOOGLE INC.

Inventors: Yasushi Saito, Sanjay Ghemawat, Sebastian Kanthak, Christopher Cunningham Frost
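
The two timestamp conditions in the abstract above translate directly into a visibility predicate. In this sketch the `Item` record and the use of `math.inf` for "never moved out" or "never deleted" are assumptions; the inequalities follow the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    key: str
    move_in: float
    move_out: float
    created: float
    deleted: float


def visible_at(item, t):
    """True if the item belongs to the tablet's snapshot at timestamp t."""
    in_tablet = item.move_in <= t < item.move_out  # move-in <= T < move-out
    alive = item.created < t <= item.deleted       # creation < T <= deletion
    return in_tablet and alive


T = 10
items = [
    Item("a", move_in=5, move_out=math.inf, created=3, deleted=math.inf),
    Item("b", move_in=5, move_out=10, created=3, deleted=math.inf),   # moved out at T
    Item("c", move_in=5, move_out=math.inf, created=10, deleted=math.inf),  # created at T
    Item("d", move_in=0, move_out=math.inf, created=1, deleted=10),   # deleted at T
]
visible_keys = [item.key for item in items if visible_at(item, T)]
```

Note the asymmetry at the boundaries: an item moved out exactly at T or created exactly at T is excluded, while an item deleted exactly at T is still included.
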
