Patents by Inventor John Heaton
John Heaton has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12205013
Abstract: Accelerated convolution of neural networks can be performed by executing N computing engines (CEs) of a neural network processor in parallel. An input dataset can be divided spatially into N chunks such that a respective last portion of each chunk overlaps with a respective first portion of a subsequent chunk. Portions of each chunk can be processed by a respective CE to generate a respective portion of an output dataset. The overlapping intermediate states computed by each CE from processing the overlapping portion can be stored locally for sharing with a subsequent CE using an on-chip bus.
Type: Grant
Filed: September 1, 2020
Date of Patent: January 21, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Thiam Khean Hah, Randy Renfu Huang, Richard John Heaton, Ron Diamant, Vignesh Vivekraja
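The chunking scheme in this abstract can be illustrated with a small 1-D convolution sketch (the helper names are hypothetical, and this is not the patented implementation): a kernel of size k needs k-1 neighboring inputs, so each chunk's tail overlaps the next chunk's head by k-1 elements, letting each worker convolve its chunk independently.

```python
# Sketch: split a 1-D input into N overlapping chunks so each worker
# (standing in for a computing engine) can convolve independently.
def split_with_overlap(data, n_chunks, kernel_size):
    """Divide `data` so each chunk overlaps the next by kernel_size - 1
    elements, mirroring the spatial division the abstract describes."""
    out_len = len(data) - kernel_size + 1          # valid-convolution length
    per_ce = out_len // n_chunks                   # outputs per engine
    chunks = []
    for i in range(n_chunks):
        start = i * per_ce
        stop = start + per_ce + kernel_size - 1    # overlap with next chunk
        chunks.append(data[start:stop])
    return chunks

def conv1d(chunk, kernel):
    k = len(kernel)
    return [sum(chunk[i + j] * kernel[j] for j in range(k))
            for i in range(len(chunk) - k + 1)]

data = list(range(10))        # toy input feature map
kernel = [1, 0, -1]           # toy 3-tap filter
parts = [conv1d(c, kernel) for c in split_with_overlap(data, 2, 3)]
merged = [v for part in parts for v in part]
assert merged == conv1d(data, kernel)   # parallel result matches serial
```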
-
Patent number: 12198041
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
Type: Grant
Filed: July 14, 2023
Date of Patent: January 14, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
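The under-utilization test in this abstract reduces to a simple predicate. A minimal sketch, assuming each input matrix occupies one array row (the function names and numbers below are illustrative, not from the patent):

```python
# Sketch: decide whether a convolution under-utilizes a PE array's rows.
# Each input matrix is assigned to one row; if fewer than `threshold`
# rows would be active concurrently, the compiler should add instructions
# to fan the work out across more rows.
def rows_used(num_input_matrices, array_rows):
    return min(num_input_matrices, array_rows)

def should_modify(num_input_matrices, array_rows, threshold):
    """True when the operation would occupy fewer than `threshold` rows,
    signaling that replication instructions should be added."""
    return rows_used(num_input_matrices, array_rows) < threshold

assert should_modify(num_input_matrices=3, array_rows=128, threshold=64)
assert not should_modify(num_input_matrices=100, array_rows=128, threshold=64)
```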
-
Patent number: 12182688
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can exceed the capability to map the newly developed framework-level operators onto the acceleration engine. To enable neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
Type: Grant
Filed: November 27, 2019
Date of Patent: December 31, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
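The three-way placement this abstract describes can be sketched as a classification pass over a graph's operators. The operator names and support sets below are invented for illustration:

```python
# Sketch: hierarchically partition a network's operators into three
# execution targets, as the abstract describes. Support sets are
# hypothetical stand-ins for what a real compiler would know.
ACCELERATOR_OPS = {"conv2d", "matmul", "relu"}   # compiler maps these
HOST_OPS = {"topk", "nms"}                        # compiled for host CPU

def partition(operators):
    placement = {}
    for op in operators:
        if op in ACCELERATOR_OPS:
            placement[op] = "accelerator"
        elif op in HOST_OPS:
            placement[op] = "host"
        else:
            placement[op] = "framework"   # fall back to the ML framework
    return placement

assert partition(["conv2d", "topk", "custom_op"]) == {
    "conv2d": "accelerator", "topk": "host", "custom_op": "framework"}
```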
-
Patent number: 12093801
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
Type: Grant
Filed: May 3, 2023
Date of Patent: September 17, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
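The database described here is essentially a mapping from subgraph identifiers to instruction subsets. A minimal sketch, with made-up identifiers and instruction mnemonics:

```python
# Sketch: a table keyed by subgraph identifier, each entry holding the
# subset of executable instructions for that subgraph. All names below
# are illustrative, not from the patent.
instruction_db = {
    "subgraph_a": ["LOAD w0", "MATMUL", "STORE y0"],
    "subgraph_b": ["LOAD w1", "CONV", "STORE y1"],
}

def fetch_instructions(subgraph_id):
    """Return the executable instructions associated with a subgraph."""
    return instruction_db[subgraph_id]

assert fetch_instructions("subgraph_b")[1] == "CONV"
```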
-
Patent number: 12093806
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of the subgraph may be statically allocated in dedicated caches for the processing units as part of the instructions to execute the neural network across the processing units.
Type: Grant
Filed: July 1, 2019
Date of Patent: September 17, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jindrich Zejda, Ron Diamant, Jeffrey T. Huynh, Drazen Borkovic, Randy Renfu Huang, Richard John Heaton
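The allocation flow above can be sketched in a few lines. The round-robin assignment is an assumption standing in for whatever partitioning scheme a real compiler would use:

```python
# Sketch: statically place each subgraph's weights in the dedicated
# cache of the processing unit that will execute it, before any
# execution begins (names and the round-robin policy are hypothetical).
def allocate_weights(subgraphs, num_units):
    """Assign subgraphs to units round-robin; each unit's cache holds
    the weights for every subgraph assigned to it."""
    caches = [dict() for _ in range(num_units)]
    for i, (name, weights) in enumerate(subgraphs.items()):
        caches[i % num_units][name] = weights
    return caches

subgraphs = {"sg0": [0.1, 0.2], "sg1": [0.3], "sg2": [0.4, 0.5]}
caches = allocate_weights(subgraphs, num_units=2)
assert caches[0] == {"sg0": [0.1, 0.2], "sg2": [0.4, 0.5]}
assert caches[1] == {"sg1": [0.3]}
```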
-
Patent number: 12079734
Abstract: Techniques for reducing a compilation time for compiling a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute bound subgraph and a set of memory optimizations are suppressed or the subgraph is classified as a memory bound subgraph and a set of compute optimizations are suppressed.
Type: Grant
Filed: August 1, 2022
Date of Patent: September 3, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Hongbin Zheng, Randy Renfu Huang, Richard John Heaton
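The classification step can be sketched as below. The specific performance factor (a compute/memory ratio) and the threshold are assumptions; the patent only says the factor is based on compute and memory usage:

```python
# Sketch: classify a subgraph as compute bound or memory bound from a
# performance factor, so the compiler can skip optimizations that will
# not pay off. The exact formula and threshold are assumed, not quoted.
def classify_subgraph(compute_usage, memory_usage, threshold=1.0):
    factor = compute_usage / memory_usage
    if factor > threshold:
        return "compute_bound"    # suppress memory optimizations
    return "memory_bound"         # suppress compute optimizations

assert classify_subgraph(compute_usage=900, memory_usage=100) == "compute_bound"
assert classify_subgraph(compute_usage=50, memory_usage=400) == "memory_bound"
```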
-
Patent number: 12073199
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
Type: Grant
Filed: June 6, 2019
Date of Patent: August 27, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
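The conditional NOP-overwrite can be sketched with instruction blocks as plain lists (mnemonics and the trigger mechanism below are illustrative, not the hardware's):

```python
# Sketch: instruction blocks where a conditional overwrite action
# replaces all subsequent blocks with NOPs, so repeated work is skipped.
NOP = "NOP"

def run(blocks, condition_met_at):
    """Execute blocks in order; when the overwrite condition fires at
    block `condition_met_at`, rewrite every later block to NOPs."""
    executed = []
    for i, block in enumerate(blocks):
        if i == condition_met_at:
            for j in range(i + 1, len(blocks)):
                blocks[j] = [NOP] * len(blocks[j])
        executed.extend(blocks[i])
    return executed

trace = run([["A1", "A2"], ["B1"], ["C1", "C2"]], condition_met_at=0)
assert trace == ["A1", "A2", "NOP", "NOP", "NOP"]
```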
-
Publication number: 20240232630
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
Type: Application
Filed: July 13, 2023
Publication date: July 11, 2024
Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
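The ordering this abstract claims (start exchanging part of layer 2's gradients, run layer 1's backward pass while that exchange is in flight, then send the rest) can be sketched with an event log standing in for real hardware-interface transfers. Gradient names are invented:

```python
# Sketch of the overlap ordering: exchange a first portion of layer-2
# gradients while layer-1 backprop runs, then send the remainder.
log = []

def backprop(layer):
    log.append(f"backprop_L{layer}")

def exchange(what):
    log.append(f"exchange_{what}")

grads_l2 = ["g2a", "g2b"]             # layer-2 gradients, split in two
first, rest = grads_l2[:1], grads_l2[1:]

backprop(2)
exchange(f"L2_{first[0]}")            # starts before L1 backprop finishes
backprop(1)                           # overlapped with the exchange above
exchange("L1_all")
exchange(f"L2_{rest[0]}")             # remaining layer-2 portion last

assert log == ["backprop_L2", "exchange_L2_g2a", "backprop_L1",
               "exchange_L1_all", "exchange_L2_g2b"]
```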
-
Patent number: 12008469
Abstract: A single neural network model can be used by each computing engine (CE) in a neural network processor to perform convolution operations in parallel for one or more stacks of convolutional layers. An input feature map can be divided into N chunks to be processed by N CEs, respectively. Each CE can process a last portion of a respective chunk to generate respective shared states to be used by a subsequent CE. A first CE uses pre-computed states to generate a first portion of an output feature map, while other CEs use shared states computed by a preceding CE to generate respective portions of the output feature map.
Type: Grant
Filed: September 1, 2020
Date of Patent: June 11, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Thiam Khean Hah, Randy Renfu Huang, Richard John Heaton, Ron Diamant, Vignesh Vivekraja
-
Patent number: 11941528
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
Type: Grant
Filed: September 30, 2019
Date of Patent: March 26, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
-
Publication number: 20240020514
Abstract: Systems and methods for performing improper input data detection are described. In one example, a system comprises: hardware circuits configured to receive input data and to perform computations of a neural network based on the input data to generate computation outputs; and an improper input detection circuit configured to: determine a relationship between the computation outputs of the hardware circuits and reference outputs; determine that the input data are improper based on the relationship; and perform an action based on determining that the input data are improper.
Type: Application
Filed: May 5, 2023
Publication date: January 18, 2024
Inventors: Randy Renfu Huang, Richard John Heaton, Andrea Olgiati, Ron Diamant
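The "relationship between computation outputs and reference outputs" can be sketched as a deviation check. The mean-absolute-difference metric and the tolerance are assumptions; the publication does not fix a particular relationship:

```python
# Sketch: flag input data as improper when the network's computation
# outputs deviate too far from stored reference outputs (the metric and
# tolerance below are assumed for illustration).
def is_improper(computation_outputs, reference_outputs, tolerance=0.5):
    """Large mean deviation from the references suggests the input is
    outside the distribution the references were built from."""
    diffs = [abs(a - b) for a, b in zip(computation_outputs, reference_outputs)]
    return sum(diffs) / len(diffs) > tolerance

assert not is_improper([0.9, 0.1], [1.0, 0.0])   # close to reference
assert is_improper([0.1, 0.9], [1.0, 0.0])       # far from reference
```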
-
Patent number: 11875247
Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
Type: Grant
Filed: June 18, 2020
Date of Patent: January 16, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Ron Diamant
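The load-once-then-chain pattern plus batch splitting can be sketched as follows. List copies stand in for DMA transfers, and all names are illustrative:

```python
# Sketch: read the shared data from external memory once, forward it
# accelerator-to-accelerator (modeled as chained copies), and split the
# input into per-accelerator batches for data parallelism.
def load_and_run(shared_data, input_data, num_accelerators):
    local_copies = [None] * num_accelerators
    local_copies[0] = list(shared_data)               # one external read
    for i in range(1, num_accelerators):
        local_copies[i] = list(local_copies[i - 1])   # chained DMA copy
    per = len(input_data) // num_accelerators
    batches = [input_data[i * per:(i + 1) * per]
               for i in range(num_accelerators)]
    return local_copies, batches

copies, batches = load_and_run([1, 2], list(range(8)), 4)
assert all(c == [1, 2] for c in copies)   # every accelerator holds the set
assert batches == [[0, 1], [2, 3], [4, 5], [6, 7]]
```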
-
Patent number: 11868895
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
Type: Grant
Filed: January 13, 2023
Date of Patent: January 9, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
-
Publication number: 20230359876
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
Type: Application
Filed: July 14, 2023
Publication date: November 9, 2023
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
-
Patent number: 11809981
Abstract: A method of generating executable instructions for a computing system is provided. The method comprises: receiving a first set of instructions including a kernel of a first operator and a kernel of a second operator, the kernel of the first operator including instructions of the first operator and write instructions to a virtual data node, the kernel of the second operator including instructions of the second operator and read instructions to the virtual data node; determining, based on a mapping between the write instructions and read instructions, instructions of data transfer operations between the first operator and the second operator; and generating a second set of instructions representing a fused operator of the first operator and the second operator, the second set of instructions including the instructions of the first operator, the instructions of the second operator, and the instructions of the data transfer operations.
Type: Grant
Filed: November 27, 2019
Date of Patent: November 7, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Animesh Jain, Tobias Joseph Kastulus Edler von Koch, Yizhi Liu, Taemin Kim, Jindrich Zejda, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang
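The fusion-through-a-virtual-data-node idea can be sketched with toy instruction streams: the write to the virtual node in the first kernel is matched against the read in the second, and replaced with an explicit transfer. The instruction tuples below are invented for illustration:

```python
# Sketch: fuse two operator kernels by matching writes to a virtual
# data node against reads from it, emitting one stream with a direct
# transfer where the write/read pair used to be.
kernel_a = [("op_a", "compute"), ("write", "vnode0")]
kernel_b = [("read", "vnode0"), ("op_b", "compute")]

def fuse(first, second):
    """Replace each write-to-virtual-node with a transfer instruction
    and drop the matching read, which the transfer now satisfies."""
    fused = []
    for instr, arg in first:
        fused.append(("transfer", arg) if instr == "write" else (instr, arg))
    for instr, arg in second:
        if instr != "read":
            fused.append((instr, arg))
    return fused

assert fuse(kernel_a, kernel_b) == [
    ("op_a", "compute"), ("transfer", "vnode0"), ("op_b", "compute")]
```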
-
Patent number: 11741350
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
Type: Grant
Filed: November 27, 2019
Date of Patent: August 29, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
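The two-instruction idea can be sketched as an emitter that, when row utilization is low, issues one instruction per filter element so different rows apply different elements of the same filter. The instruction encoding and numbers are assumptions:

```python
# Sketch: when a convolution would use fewer rows than the threshold,
# emit one instruction per filter element, each targeting a different
# row (instruction tuples are hypothetical, not a real ISA).
def emit_row_instructions(filter_elements, rows_available, threshold):
    if len(filter_elements) >= threshold:
        return [("row", 0, "apply", filter_elements)]   # no change needed
    # spread the filter elements across rows, one element per instruction
    return [("row", r % rows_available, "apply", [fe])
            for r, fe in enumerate(filter_elements)]

instrs = emit_row_instructions(["w00", "w01"], rows_available=128, threshold=64)
assert instrs == [("row", 0, "apply", ["w00"]), ("row", 1, "apply", ["w01"])]
```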
-
Patent number: 11714992
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
Type: Grant
Filed: December 13, 2018
Date of Patent: August 1, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
-
Patent number: 11687761
Abstract: Systems and methods for performing improper input data detection are described. In one example, a system comprises: hardware circuits configured to receive input data and to perform computations of a neural network based on the input data to generate computation outputs; and an improper input detection circuit configured to: determine a relationship between the computation outputs of the hardware circuits and reference outputs; determine that the input data are improper based on the relationship; and perform an action based on determining that the input data are improper.
Type: Grant
Filed: December 11, 2018
Date of Patent: June 27, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Randy Renfu Huang, Richard John Heaton, Andrea Olgiati, Ron Diamant
-
Publication number: 20230153620
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
Type: Application
Filed: January 13, 2023
Publication date: May 18, 2023
Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
-
Patent number: 11568238
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between the two sub-operations. The computer-implemented method further includes assigning a first sub-operation in the two sub-operations to a first computing engine, assigning a second sub-operation in the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
Type: Grant
Filed: June 28, 2019
Date of Patent: January 31, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
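The pattern of this family (also patents 11868895 and 20230153620 above) can be sketched with a matrix-vector product split into two independent sub-operations run on separate workers. Threads stand in for computing engines, and the split point is arbitrary:

```python
# Sketch: divide a tensor operation into two sub-operations with no
# data dependency, run them in parallel on two workers (standing in
# for computing engines), and combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def matvec(rows, vec):
    return [sum(r * v for r, v in zip(row, vec)) for row in rows]

matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]
vec = [1, 1]
top, bottom = matrix[:2], matrix[2:]        # independent sub-operations

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(matvec, top, vec)
    f2 = pool.submit(matvec, bottom, vec)
result = f1.result() + f2.result()          # each yields part of the output

assert result == matvec(matrix, vec)        # matches the undivided operation
```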