Patents by Inventor Scott Michael Le Grand

Scott Michael Le Grand has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10936432
Abstract: Methods, systems, and computer-readable media for implementing a fault-tolerant parallel computation framework are disclosed. Execution of an application comprises execution of a plurality of processes in parallel. Process states for the processes are stored during the execution of the application. The processes use a message passing interface for exchanging messages with one another. The messages are exchanged and the process states are stored at a plurality of checkpoints during execution of the application. A final successful checkpoint is determined after the execution of the application is terminated. The final successful checkpoint represents the most recent checkpoint at which the processes exchanged messages successfully. Execution of the application is resumed from the final successful checkpoint using the process states stored at the final successful checkpoint.
    Type: Grant
    Filed: September 24, 2014
    Date of Patent: March 2, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Tin-Yu Lee, Rejith George Joseph, Scott Michael Le Grand, Saurabh Dileep Baji
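    The core idea of the abstract above can be sketched briefly: each parallel process records its state at numbered checkpoints, and after a failure the run resumes from the latest checkpoint that every process completed. This is a minimal illustration, not the patented implementation; all names and structures are invented for the example.

    ```python
    def final_successful_checkpoint(checkpoints_by_process):
        """Return the highest checkpoint id reached by ALL processes, or None."""
        completed = [set(cps) for cps in checkpoints_by_process.values()]
        common = set.intersection(*completed) if completed else set()
        return max(common) if common else None

    def resume_states(checkpoints_by_process, saved_states):
        """Restore every process from the state it stored at the final successful checkpoint."""
        cp = final_successful_checkpoint(checkpoints_by_process)
        if cp is None:
            return None  # no usable checkpoint: restart from the beginning
        return {proc: saved_states[(proc, cp)] for proc in checkpoints_by_process}

    # Example: process B crashed before completing checkpoint 3,
    # so the run resumes from checkpoint 2.
    cps = {"A": [1, 2, 3], "B": [1, 2], "C": [1, 2, 3]}
    states = {(p, c): f"state-{p}-{c}" for p in cps for c in cps[p]}
    assert final_successful_checkpoint(cps) == 2
    assert resume_states(cps, states)["B"] == "state-B-2"
    ```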
  • Patent number: 10846096
Abstract: Memory management is provided for processors, such as GPUs used to process data using a trained machine learning model. Requests received by a CPU can be stored in a request queue until the queue is full, or until a timeout value has been reached during periods of lower activity. The requests can then be batched and sent to a GPU as a single message on a single thread. Memory can be pre-allocated, and the trained model loaded into GPU memory once for processing of the relevant batches. The individual requests can be processed by the GPU and the results analyzed to determine at least a subset of results to return to the CPU, which can be provided back as results of the processing.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: November 24, 2020
    Assignee: A9.com, Inc.
    Inventors: Kiuk Chung, Edward Kandrot, Scott Michael Le Grand
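    The batching behavior described above — buffer requests until the batch is full or a timeout expires, then dispatch the whole batch as one message — can be sketched as follows. This is an illustrative assumption of how such a queue might work, not code from the patent; names and parameters are invented.

    ```python
    import time
    from queue import Queue, Empty

    def collect_batch(q, max_batch=32, timeout_s=0.01):
        """Drain up to max_batch requests, waiting at most timeout_s overall."""
        batch = []
        deadline = time.monotonic() + timeout_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timeout reached: ship whatever we have
            try:
                batch.append(q.get(timeout=remaining))
            except Empty:
                break  # queue drained for the rest of the window
        return batch

    # Five pending requests become a single batched dispatch to the GPU worker.
    q = Queue()
    for i in range(5):
        q.put({"id": i})
    assert len(collect_batch(q)) == 5
    ```

    In a real service, `collect_batch` would run on the dispatch thread, and the returned batch would be serialized into one message for the GPU process.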
  • Patent number: 10659523
Abstract: At the request of a customer, a distributed computing service provider may create multiple clusters under a single customer account, and may isolate them from each other. For example, various isolation mechanisms (or combinations of isolation mechanisms) may be applied when creating the clusters to isolate a given cluster of compute nodes from network traffic from compute nodes of other clusters (e.g., by creating the clusters in different VPCs); to restrict access to data, metadata, or resources that are within the given cluster of compute nodes or that are associated with the given cluster of compute nodes by compute nodes of other clusters in the distributed computing system (e.g., using an instance metadata tag and/or a storage system prefix); and/or to restrict access to application programming interfaces of the distributed computing service by the given cluster of compute nodes (e.g., using an identity and access manager).
    Type: Grant
    Filed: May 23, 2014
    Date of Patent: May 19, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Rejith George Joseph, Tin-Yu Lee, Scott Michael Le Grand, Saurabh Dileep Baji
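    A toy sketch of the tag/prefix isolation check mentioned above: a node may access a stored resource only when its cluster tag matches the resource's storage prefix. The tag and prefix conventions here are invented for illustration and are not taken from the patent.

    ```python
    def may_access(node, resource_key):
        """Allow access only within the node's own cluster prefix."""
        cluster_prefix = resource_key.split("/", 1)[0]
        return node["cluster_tag"] == cluster_prefix

    node_a = {"cluster_tag": "cluster-a"}
    assert may_access(node_a, "cluster-a/data/part-0")       # same cluster: allowed
    assert not may_access(node_a, "cluster-b/data/part-0")   # other cluster: denied
    ```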
  • Patent number: 10482380
    Abstract: The present disclosure is directed to parallelization of artificial neural network processing by conditionally synchronizing, among multiple computer processors, either the input or output of individual operations, and by conditionally using either rows or columns of certain matrices used in the operations. The conditional processing may depend upon the relative sizes of the input and output of the specific operations to be performed. For example, if a current layer matrix of values is larger than a next layer matrix of values to be computed, then rows of a weight matrix may be used by the computer processors to compute the next layer matrix. If the current layer matrix is smaller than the next layer matrix, then columns of the weight matrix may be used by the computer processors to compute the next layer matrix.
    Type: Grant
    Filed: December 30, 2015
    Date of Patent: November 19, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Scott Michael Le Grand, Rejith George Joseph
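    The conditional row/column choice described in the abstract can be sketched with NumPy: when the current layer is larger than the next, processors take rows of the weight matrix (sharding the input dimension); otherwise they take columns (sharding the output dimension). This is a hedged illustration of the partitioning idea only; the function and shapes are assumptions, not the patented implementation.

    ```python
    import numpy as np

    def partition_weights(W, current_size, next_size, n_procs):
        """Split a (current_size x next_size) weight matrix across processors."""
        if current_size > next_size:
            # Large input layer: each processor takes full rows (input shards)
            return np.array_split(W, n_procs, axis=0)
        # Large (or equal) output layer: each processor takes columns (output shards)
        return np.array_split(W, n_procs, axis=1)

    W = np.arange(12.0).reshape(4, 3)        # 4 current-layer units -> 3 next-layer units
    shards = partition_weights(W, 4, 3, 2)   # current > next: split by rows
    assert shards[0].shape == (2, 3)

    shards = partition_weights(W.T, 3, 4, 2)  # current < next: split by columns
    assert shards[0].shape == (3, 2)
    ```

    Splitting along the larger dimension keeps the data each processor must synchronize (the smaller of input/output) as small as possible, which matches the abstract's motivation.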
  • Patent number: 10148736
    Abstract: A client may submit a job to a service provider that processes a large data set and that employs a message passing interface (MPI) to coordinate the collective execution of the job on multiple compute nodes. The framework may create a MapReduce cluster (e.g., within a VPC) and may generate a single key pair for the cluster, which may be downloaded by nodes in the cluster and used to establish secure node-to-node communication channels for MPI messaging. A single node may be assigned as a mapper process and may launch the MPI job, which may fork its commands to other nodes in the cluster (e.g., nodes identified in a hostfile associated with the MPI job), according to the MPI interface. A rankfile may be used to synchronize the MPI job and another MPI process used to download portions of the data set to respective nodes in the cluster.
    Type: Grant
    Filed: May 19, 2014
    Date of Patent: December 4, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Tin-Yu Lee, Rejith George Joseph, Scott Michael Le Grand, Saurabh Dileep Baji, Peter Sirota
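    The launch step in the abstract — a single mapper node writes a hostfile naming the cluster's nodes and starts the MPI job so that ranks are forked onto those nodes — might look roughly like this. The paths, hostnames, and `mpirun` invocation are illustrative assumptions, not details from the patent.

    ```python
    import tempfile

    def build_mpi_launch(hosts, binary):
        """Write a hostfile for the cluster and return the mpirun command line."""
        with tempfile.NamedTemporaryFile("w", suffix=".hosts", delete=False) as f:
            f.write("\n".join(hosts) + "\n")
            hostfile = f.name
        # One rank per node; mpirun forks the job onto every host listed.
        return ["mpirun", "--hostfile", hostfile, "-np", str(len(hosts)), binary]

    cmd = build_mpi_launch(["node-1", "node-2"], "./mpi_job")
    assert cmd[0] == "mpirun" and cmd[4] == "2"
    # On the mapper node this command would be run via subprocess.run(cmd, check=True),
    # over node-to-node channels secured with the cluster's single key pair.
    ```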
  • Patent number: 10133646
    Abstract: A method for providing fault tolerance in a distributed file system of a service provider may include launching at least one data storage node on at least a first virtual machine instance (VMI) running on one or more servers of the service provider and storing file data. At least one data management node may be launched on at least a second VMI running on the one or more servers of the service provider. The at least second VMI may be associated with a dedicated IP address and the at least one data management node may store metadata information associated with the file data in a network storage attached to the at least second VMI. Upon detecting a failure of the at least second VMI, the at least one data management node may be re-launched on at least a third VMI running on the one or more servers.
    Type: Grant
    Filed: March 24, 2017
    Date of Patent: November 20, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Rejith George Joseph, Tin-Yu Lee, Bandish N. Chheda, Scott Michael Le Grand, Saurabh Dileep Baji
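    The recovery path in this abstract hinges on the metadata living on network storage attached to the management node's VMI: when the VMI fails, the node is relaunched on a fresh VMI and re-attached to the same storage, so no metadata is lost. A minimal sketch of that control loop, with all structures invented for illustration:

    ```python
    def ensure_management_node(cluster, now, heartbeat_timeout=30):
        """Relaunch the management node on a new VMI if its heartbeat is stale."""
        mgmt = cluster["management_node"]
        if now - mgmt["last_heartbeat"] <= heartbeat_timeout:
            return mgmt  # healthy: nothing to do
        # Failure detected: launch a replacement VMI, reusing the metadata volume.
        replacement = {
            "vmi_id": mgmt["vmi_id"] + 1,           # a fresh (third, fourth, ...) VMI
            "metadata_volume": mgmt["metadata_volume"],  # same attached network storage
            "last_heartbeat": now,
        }
        cluster["management_node"] = replacement
        return replacement

    cluster = {"management_node": {"vmi_id": 2, "metadata_volume": "vol-meta",
                                   "last_heartbeat": 0}}
    node = ensure_management_node(cluster, now=100)
    assert node["vmi_id"] == 3 and node["metadata_volume"] == "vol-meta"
    ```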
  • Patent number: 9836354
    Abstract: A service provider system may implement ECC-like features when executing computations on GPUs that do not include sufficient error detection and recovery for computations that are sensitive to bit errors. During execution of critical computations on behalf of customers, the system may automatically instrument program instructions received from the customers to cause each computation to be executed using multiple sets of hardware resources (e.g., different host machines, processor cores, or internal hardware resources). The service may provide APIs with which customers may instrument their code for execution using redundant resource instances, or specify parameters for applying the ECC-like features. The service or customer may instrument code to perform (or cause the system to perform) checkpointing operations at particular points in the code, and to compare intermediate results produced by different hardware resources.
    Type: Grant
    Filed: April 28, 2014
    Date of Patent: December 5, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Nachiketh Rao Potlapally, John Merrill Phillips, Nicholas Patrick Wilt, Deepak Singh, Scott Michael Le Grand
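    The ECC-like scheme above amounts to running the same computation on multiple resource instances and comparing results at checkpoints: a mismatch signals a bit error. A minimal sketch of that comparison, assuming a simple two-replica setup (all names here are illustrative):

    ```python
    def redundant_run(fn, inputs, replicas=2):
        """Execute fn on each replica's copy of the inputs and compare results.

        In the service described above the replicas would be distinct host
        machines or processor cores; here they are plain repeated calls.
        """
        results = [fn(list(inputs)) for _ in range(replicas)]
        if all(r == results[0] for r in results):
            return results[0]
        # Disagreement: a soft error occurred on at least one replica.
        # A real system might rerun, or break the tie with a third replica.
        raise RuntimeError("replica mismatch detected at checkpoint")

    assert redundant_run(lambda xs: sum(xs), [1, 2, 3]) == 6
    ```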
  • Publication number: 20170193368
    Abstract: The present disclosure is directed to parallelization of artificial neural network processing by conditionally synchronizing, among multiple computer processors, either the input or output of individual operations, and by conditionally using either rows or columns of certain matrices used in the operations. The conditional processing may depend upon the relative sizes of the input and output of the specific operations to be performed. For example, if a current layer matrix of values is larger than a next layer matrix of values to be computed, then rows of a weight matrix may be used by the computer processors to compute the next layer matrix. If the current layer matrix is smaller than the next layer matrix, then columns of the weight matrix may be used by the computer processors to compute the next layer matrix.
    Type: Application
    Filed: December 30, 2015
    Publication date: July 6, 2017
    Inventors: Scott Michael Le Grand, Rejith George Joseph
  • Patent number: 9612924
    Abstract: A method for providing fault tolerance in a distributed file system of a service provider may include launching at least one data storage node on at least a first virtual machine instance (VMI) running on one or more servers of the service provider and storing file data. At least one data management node may be launched on at least a second VMI running on the one or more servers of the service provider. The at least second VMI may be associated with a dedicated IP address and the at least one data management node may store metadata information associated with the file data in a network storage attached to the at least second VMI. Upon detecting a failure of the at least second VMI, the at least one data management node may be re-launched on at least a third VMI running on the one or more servers.
    Type: Grant
    Filed: June 25, 2014
    Date of Patent: April 4, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Rejith George Joseph, Tin-Yu Lee, Bandish N. Chheda, Scott Michael Le Grand, Saurabh Dileep Baji