Patents by Inventor Jeffrey T. Huynh

Jeffrey T. Huynh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient utilization of processing element array

Patent number: 12198041

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
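The under-utilization check can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented compiler logic: it assumes one input channel occupies one PE row, that under-utilization means fewer than `threshold` rows in use, and that the replication factor is capped both by the free rows and by the number of filter elements available to spread across rows. The function name `plan_row_replication` is hypothetical.

```python
def plan_row_replication(in_channels, filter_h, filter_w, pe_rows, threshold):
    """Return (replication_factor, rows_used_concurrently) for one convolution.

    Assumption: each input channel is mapped to one PE row, so an unmodified
    convolution uses `in_channels` rows at a time.
    """
    if not 0 < in_channels <= pe_rows:
        raise ValueError("in_channels must be between 1 and pe_rows")
    if in_channels >= threshold:
        # Enough rows are already busy; leave the convolution unmodified.
        return 1, in_channels
    taps = filter_h * filter_w
    # Replicate each input across more rows, but never create more copies than
    # there are free row groups or distinct filter elements to assign to them.
    factor = min(pe_rows // in_channels, taps)
    return factor, factor * in_channels
```

For example, a 3-channel input with a 7x7 filter on a hypothetical 128-row array with a threshold of 64 would be replicated 42 times, occupying 126 rows instead of 3.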

Type: Grant

Filed: July 14, 2023

Date of Patent: January 14, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

Hierarchical partitioning of operators

Patent number: 12182688

Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. New framework-level operators can be developed faster than they can be mapped onto the acceleration engine. To enable neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed by the machine learning framework.
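A Python sketch of the three-way classification, assuming the network is a linear chain of named operators and that support is known as two sets of operator names. Adjacent operators with the same target are merged into one partition; the function name and operator names are hypothetical.

```python
def partition_operators(ops, accelerator_ops, host_ops):
    """Split a linear sequence of operator names into contiguous partitions.

    Each operator goes to the first tier that supports it: accelerator
    (compiler-supported), host processor, or framework fallback.
    """
    partitions = []
    for op in ops:
        if op in accelerator_ops:
            target = "accelerator"
        elif op in host_ops:
            target = "host"
        else:
            target = "framework"
        # Merge with the previous partition when the target does not change.
        if partitions and partitions[-1][0] == target:
            partitions[-1][1].append(op)
        else:
            partitions.append((target, [op]))
    return partitions
```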

Type: Grant

Filed: November 27, 2019

Date of Patent: December 31, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang

Static memory allocation for neural network inference

Patent number: 12093806

Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of each subgraph may be statically allocated in dedicated caches for the processing units as part of the instructions to execute the neural network across the processing units.
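A Python sketch of static weight placement, assuming a linear sequence of operations, a greedy partitioning scheme that starts a new subgraph whenever the next operation's weights would overflow a unit's cache, and byte offsets fixed at compile time. The abstract does not specify the partitioning scheme; this greedy rule and the function name are hypothetical.

```python
def partition_and_allocate(ops, num_units, cache_bytes):
    """Statically place each operation's weights in a per-unit cache.

    ops: (name, weight_bytes) pairs in execution order. Consecutive operations
    are grouped into a subgraph until the next one would overflow the cache;
    subgraph i is assigned to processing unit i.
    Returns {unit: [(name, offset, size), ...]}.
    """
    subgraphs = [[]]
    used = 0
    for name, size in ops:
        if size > cache_bytes:
            raise ValueError(f"weights of {name} do not fit in one cache")
        if used + size > cache_bytes:
            subgraphs.append([])  # start a new subgraph on the next unit
            used = 0
        subgraphs[-1].append((name, used, size))  # offset known at compile time
        used += size
    if len(subgraphs) > num_units:
        raise ValueError("not enough processing units for this partitioning")
    return dict(enumerate(subgraphs))
```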

Type: Grant

Filed: July 1, 2019

Date of Patent: September 17, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jindrich Zejda, Ron Diamant, Jeffrey T. Huynh, Drazen Borkovic, Randy Renfu Huang, Richard John Heaton

Memory operation for systolic array

Patent number: 12026607

Abstract: A neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
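The address arithmetic in this abstract can be illustrated with a short Python sketch. It assumes a single-channel input stored row-major in a flat list, no padding, and a weight-stationary flow in which one weight element is applied to a whole output row per fetch; `access_pattern`, `fetch`, and the other names are hypothetical. For weight element (r, s), the fetch for output row h starts at the input element under that weight, reads one element per output column, and skips `stride - 1` elements between reads.

```python
def access_pattern(r, s, out_row, stride, in_width, out_width):
    """Return (start_address, count, skip) for weight element (r, s) and one output row."""
    start = (out_row * stride + r) * in_width + s  # input element under the weight
    return start, out_width, stride - 1            # skip is derived from the stride

def fetch(memory, start, count, skip):
    """Read `count` elements, skipping `skip` elements between adjacent reads."""
    step = skip + 1
    return [memory[start + i * step] for i in range(count)]

def conv2d_from_access_patterns(memory, in_h, in_w, weights, stride):
    kh, kw = len(weights), len(weights[0])
    out_h = (in_h - kh) // stride + 1
    out_w = (in_w - kw) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(kh):
        for s in range(kw):
            w = weights[r][s]  # one weight data element held in the array
            for h in range(out_h):
                start, count, skip = access_pattern(r, s, h, stride, in_w, out_w)
                for j, x in enumerate(fetch(memory, start, count, skip)):
                    out[h][j] += w * x  # partial sums accumulate per output element
    return out

def conv2d_direct(memory, in_h, in_w, weights, stride):
    """Plain sliding-window convolution (cross-correlation) for comparison."""
    kh, kw = len(weights), len(weights[0])
    out_h = (in_h - kh) // stride + 1
    out_w = (in_w - kw) // stride + 1
    return [[sum(weights[r][s] * memory[(h * stride + r) * in_w + j * stride + s]
                 for r in range(kh) for s in range(kw))
             for j in range(out_w)]
            for h in range(out_h)]
```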

Type: Grant

Filed: October 12, 2022

Date of Patent: July 2, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant

Transposed convolution using systolic array

Patent number: 11954583

Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
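A 1-D Python sketch of the idea, assuming no padding or output cropping: the reference builds the usual zero-inserted (upsampled) input and correlates it with the rotated (flipped) kernel, while the per-weight version processes one rotated weight element at a time and touches only the stride-aligned inputs, so the inserted zeros are never fetched. Function names are hypothetical, and the mapping onto systolic-array rows is not modeled.

```python
def transposed_conv1d_reference(x, w, stride):
    """Zero-insertion form: upsample x, pad by K-1, correlate with rotated w."""
    k = len(w)
    up = [0] * ((len(x) - 1) * stride + 1)
    up[::stride] = x                      # insert stride-1 zeros between inputs
    padded = [0] * (k - 1) + up + [0] * (k - 1)
    w_rot = w[::-1]                       # rotated (flipped) kernel
    out_len = len(up) + k - 1
    return [sum(padded[o + j] * w_rot[j] for j in range(k)) for o in range(out_len)]

def transposed_conv1d_per_weight(x, w, stride):
    """Process one rotated weight element at a time, reading only real inputs."""
    k = len(w)
    w_rot = w[::-1]
    out = [0] * ((len(x) - 1) * stride + k)
    for j, wj in enumerate(w_rot):
        offset = k - 1 - j                # coordinate of this element before rotation
        # Only stride-aligned positions of the upsampled input are nonzero, so
        # input x[i] meets this element at output i*stride + offset.
        for i, xi in enumerate(x):
            out[i * stride + offset] += wj * xi
    return out
```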

Type: Grant

Filed: April 14, 2023

Date of Patent: April 9, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Vignesh Vivekraja

Hybrid wildcard match table

Patent number: 11943142

Abstract: Embodiments of the present invention are directed to a wildcard matching solution that uses a combination of static random access memories (SRAMs) and ternary content addressable memories (TCAMs) in a hybrid solution. In particular, the wildcard matching solution uses a plurality of SRAM pools for lookup and a spillover TCAM pool for unresolved hash conflicts.
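A toy Python model of the hybrid lookup, under assumptions that go beyond the abstract: each SRAM pool holds rules sharing one mask and uses a fixed number of buckets with a fixed number of ways, a rule whose bucket is full (or whose mask matches no pool) spills into the TCAM, and ties between matches are broken by an explicit priority field. Class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    value: int
    mask: int        # 1 bits must match; 0 bits are wildcards
    priority: int    # higher wins (assumption; real TCAMs often use entry order)
    action: str

class SramPool:
    """Hash-based pool whose rules all share one mask (an assumption)."""

    def __init__(self, mask, num_buckets, ways):
        self.mask = mask
        self.ways = ways
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, masked_key):
        return self.buckets[hash(masked_key) % len(self.buckets)]

    def try_insert(self, rule):
        if rule.mask != self.mask:
            return False
        bucket = self._bucket(rule.value & self.mask)
        if len(bucket) >= self.ways:   # unresolved hash conflict
            return False
        bucket.append(rule)
        return True

    def lookup(self, key):
        masked = key & self.mask
        return [r for r in self._bucket(masked) if (r.value & self.mask) == masked]

class HybridWildcardTable:
    def __init__(self, pools, tcam_capacity):
        self.pools = pools
        self.tcam = []                 # spillover entries, searched in parallel in hardware
        self.tcam_capacity = tcam_capacity

    def insert(self, rule):
        for pool in self.pools:
            if pool.try_insert(rule):
                return "sram"
        if len(self.tcam) >= self.tcam_capacity:
            raise OverflowError("TCAM spillover pool is full")
        self.tcam.append(rule)
        return "tcam"

    def lookup(self, key):
        hits = [r for pool in self.pools for r in pool.lookup(key)]
        hits += [r for r in self.tcam if (key & r.mask) == (r.value & r.mask)]
        return max(hits, key=lambda r: r.priority, default=None)
```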

Type: Grant

Filed: November 23, 2021

Date of Patent: March 26, 2024

Assignee: MARVELL ASIA PTE, LTD

Inventors: Jeffrey T. Huynh, Weihuang Wang, Tsahi Daniel, Srinath Atluri, Mohan Balan

Data selection circuit

Patent number: 11868875

Abstract: Provided are systems and methods for operating a neural network processor, wherein the processor includes an input selector circuit that can be configured to select the data that will be input into the processor's computational array. In various implementations, the selector circuit can determine, for a row of the array, whether the row input will be the output from a buffer memory or data that the input selector circuit has selected for a different row. The row can receive an input feature map from a set of input data or an input feature map that was selected for inputting into a different row, such that the input feature map is input into more than one row at a time. The selector circuit can also include a delay circuit, so that the duplicated input feature map can be input into the computational array later than the original input feature map.
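A cycle-level Python sketch of the selector behaviour, assuming each row is configured either to take its own buffer stream or to copy the stream selected for a lower-numbered row after a fixed delay measured in whole cycles. `select_row_inputs` and the configuration format are hypothetical; `None` marks a cycle with no valid input.

```python
def select_row_inputs(buffer_streams, config, num_cycles):
    """Return the per-cycle input selected for each row of the array.

    config[r] is None to feed row r from its own buffer stream, or
    (src_row, delay) to reuse the stream selected for src_row, delayed by
    `delay` cycles. None in the result marks a cycle with no valid data.
    """
    selected = []
    for r, cfg in enumerate(config):
        if cfg is None:
            stream = list(buffer_streams[r])
        else:
            src, delay = cfg
            if not 0 <= src < r:
                raise ValueError("a row may only duplicate an earlier row")
            stream = [None] * delay + selected[src]  # delay circuit
        selected.append((stream + [None] * num_cycles)[:num_cycles])
    return selected
```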

Type: Grant

Filed: September 10, 2018

Date of Patent: January 9, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Ron Diamant, Randy Renfu Huang, Jeffrey T. Huynh, Sundeep Amirineni

Dilated convolution using systolic array

Patent number: 11816559

Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
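A 1-D Python sketch of the subset selection, assuming no padding: for weight element k, the contributing inputs start at k * rate and advance by the convolution stride, so each weight element needs exactly one strided slice of the input. Function names are hypothetical, and the direct form is included only to check the result.

```python
def dilated_conv1d_per_weight(x, w, rate, stride=1):
    """Apply each weight element to one strided slice of the input."""
    span = (len(w) - 1) * rate + 1         # receptive field of the dilated filter
    out_len = (len(x) - span) // stride + 1
    if out_len <= 0:
        return []
    out = [0] * out_len
    for k, wk in enumerate(w):
        start = k * rate                   # offset set by coordinate and rate
        subset = x[start:start + (out_len - 1) * stride + 1:stride]
        for o, xv in enumerate(subset):
            out[o] += wk * xv
    return out

def dilated_conv1d_direct(x, w, rate, stride=1):
    span = (len(w) - 1) * rate + 1
    out_len = max(0, (len(x) - span) // stride + 1)
    return [sum(wk * x[o * stride + k * rate] for k, wk in enumerate(w))
            for o in range(out_len)]
```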

Type: Grant

Filed: June 3, 2022

Date of Patent: November 14, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant

Efficient utilization of processing element array

Publication number: 20230359876

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.

Type: Application

Filed: July 14, 2023

Publication date: November 9, 2023

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

Transposed convolution using systolic array

Publication number: 20230306249

Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.

Type: Application

Filed: April 14, 2023

Publication date: September 28, 2023

Inventors: Jeffrey T. Huynh, Vignesh Vivekraja

Efficient utilization of processing element array

Patent number: 11741350

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes fewer than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
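A small Python sketch of why assigning different filter elements to different rows preserves the result, assuming a 1-D multi-channel convolution without padding and a model in which every active row contributes to a shared column sum each pass. The baseline uses one row per channel and one filter element per pass; the modified mapping gives each (channel, filter element) pair its own row and a shifted copy of that channel's input. Function names are hypothetical.

```python
def conv_rows_per_channel(x, w):
    """Baseline: one row per input channel, one filter element per pass."""
    channels, taps = len(x), len(w[0])
    out_len = len(x[0]) - taps + 1
    out = [0] * out_len
    for k in range(taps):                  # one pass per filter element
        for c in range(channels):          # only `channels` rows active
            for o in range(out_len):
                out[o] += w[c][k] * x[c][o + k]
    return out, taps, channels             # (output, passes, rows used)

def conv_rows_per_filter_element(x, w):
    """Modified: each (channel, filter element) pair gets its own row."""
    channels, taps = len(x), len(w[0])
    out_len = len(x[0]) - taps + 1
    rows = [(w[c][k], x[c][k:k + out_len])  # weight plus shifted input copy
            for c in range(channels) for k in range(taps)]
    # All rows run concurrently; the column sums them into each output element.
    out = [sum(weight * seg[o] for weight, seg in rows) for o in range(out_len)]
    return out, 1, len(rows)
```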

Type: Grant

Filed: November 27, 2019

Date of Patent: August 29, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

Transposed convolution using systolic array

Patent number: 11681902

Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.

Type: Grant

Filed: September 27, 2019

Date of Patent: June 20, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Vignesh Vivekraja

Neural network operation reordering for parallel execution

Patent number: 11567778

Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
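A Python sketch of one way to find and fix such an inefficiency, assuming each engine executes its operations in program order and an operation starts once its engine is free and its dependencies have finished. The hill-climbing search over legal moves is an illustrative choice, not the compiler's documented algorithm; the `Op` type and engine names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    engine: str
    duration: int
    deps: tuple = ()   # names of operations that must finish first

def makespan(order):
    """Simulate in-order issue per engine and return the total runtime."""
    engine_free, finish = {}, {}
    for op in order:
        start = max([engine_free.get(op.engine, 0)] + [finish[d] for d in op.deps])
        finish[op.name] = start + op.duration
        engine_free[op.engine] = finish[op.name]
    return max(finish.values(), default=0)

def can_move_earlier(order, i, j):
    """order[i] may move to position j < i if all its deps are before j."""
    before = {op.name for op in order[:j]}
    return all(d in before for d in order[i].deps)

def reorder(order):
    """Hill-climb: apply the first legal earlier move that reduces makespan."""
    order, best = list(order), makespan(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order)):
            for j in range(i):
                if not can_move_earlier(order, i, j):
                    continue
                cand = order[:j] + [order[i]] + order[j:i] + order[i + 1:]
                span = makespan(cand)
                if span < best:
                    order, best, improved = cand, span, True
                    break
            if improved:
                break
    return order, best
```

For example, with A (pe, 4), B (act, 4, after A), C (pe, 4, after B) and an independent D (pe, 4) issued last, the modeled runtime is 16; moving D between A and B lets it overlap B and brings the runtime to 12.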

Type: Grant

Filed: April 28, 2021

Date of Patent: January 31, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant

Memory operation for systolic array

Patent number: 11501145

Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.

Type: Grant

Filed: September 17, 2019

Date of Patent: November 15, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant

Dilated convolution using systolic array

Publication number: 20220292163

Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.

Type: Application

Filed: June 3, 2022

Publication date: September 15, 2022

Inventors: Jeffrey T. Huynh, Ron Diamant

Dilated convolution using systolic array

Patent number: 11379555

Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.

Type: Grant

Filed: June 28, 2019

Date of Patent: July 5, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant

Transpose operations using processing element array

Patent number: 11347480

Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
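A Python sketch of the identity-multiplication trick, assuming a weight-stationary array model in which the contraction runs over the partition (row) dimension, so out[m][n] = sum over p of stationary[p][m] * moving[p][n]. Loading a block as the stationary operand and streaming the identity then yields the block's transpose, which is written back at the mirrored block position. The block size and function names are hypothetical, and the results-buffer and buffer-memory partitions are collapsed into plain lists.

```python
def pe_array_matmul(stationary, moving):
    """Model: out[m][n] = sum over partitions p of stationary[p][m] * moving[p][n]."""
    parts, m_dim, n_dim = len(stationary), len(stationary[0]), len(moving[0])
    return [[sum(stationary[p][m] * moving[p][n] for p in range(parts))
             for n in range(n_dim)]
            for m in range(m_dim)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def transpose_via_pe_array(tensor, block):
    rows, cols = len(tensor), len(tensor[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            blk = [row[c0:c0 + block] for row in tensor[r0:r0 + block]]
            # Block as stationary operand times identity gives the block transposed.
            blk_t = pe_array_matmul(blk, identity(len(blk)))
            for i, res_row in enumerate(blk_t):
                out[c0 + i][r0:r0 + len(res_row)] = res_row  # mirrored block position
    return out
```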

Type: Grant

Filed: December 15, 2020

Date of Patent: May 31, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh

Neural network layer-by-layer debugging

Patent number: 11308396

Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and a debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
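A Python sketch of the shortest-failing-length search, assuming that a mismatch, once introduced, persists in every longer prefix of the network. Under that assumption the lengths can be bisected rather than tried one by one; the bisection, the tolerance, and the helper names are illustrative choices rather than details from the abstract.

```python
def outputs_match(a, b, tol):
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def find_shortest_mismatch(num_layers, run_reference, run_device, tol=1e-6):
    """Return the shortest prefix length whose device tensor differs from the
    reference tensor, or None if the full network matches.

    Assumption: once a prefix mismatches, every longer prefix also mismatches.
    """
    def matches(length):
        return outputs_match(run_reference(length), run_device(length), tol)

    if matches(num_layers):
        return None
    lo, hi = 1, num_layers  # hi is known to mismatch
    while lo < hi:
        mid = (lo + hi) // 2
        if matches(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

def make_runner(layers, x):
    """Build run(length): apply the first `length` layers to input x."""
    def run(length):
        v = list(x)
        for layer in layers[:length]:
            v = layer(v)
        return v
    return run
```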

Type: Grant

Filed: June 27, 2019

Date of Patent: April 19, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang

Registers for restricted memory

Patent number: 11294599

Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
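A behavioural Python model of the two register access modes, assuming one register per bank and word-addressed banks. A parallel load followed by a serial store gathers one word from every bank into consecutive words of one bank; a serial load followed by a parallel store scatters them back out. Class and method names are hypothetical.

```python
class BankedMemory:
    """Memory banks plus one register per bank (behavioural model)."""

    def __init__(self, num_banks, bank_size):
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.registers = [0] * num_banks

    def parallel_load(self, addr):
        # Every register reads the same address of its own bank at once.
        for i, bank in enumerate(self.banks):
            self.registers[i] = bank[addr]

    def parallel_store(self, addr):
        for i, bank in enumerate(self.banks):
            bank[addr] = self.registers[i]

    def _check_span(self, start):
        if start < 0 or start + len(self.registers) > len(self.banks[0]):
            raise IndexError("serial access runs past the end of the bank")

    def serial_load(self, bank_idx, start):
        # Registers are filled one after another from consecutive words of one bank.
        self._check_span(start)
        for i in range(len(self.registers)):
            self.registers[i] = self.banks[bank_idx][start + i]

    def serial_store(self, bank_idx, start):
        self._check_span(start)
        for i, value in enumerate(self.registers):
            self.banks[bank_idx][start + i] = value

    def gather(self, addr, dst_bank, dst_start):
        """Many banks -> one bank: parallel load, then serial store."""
        self.parallel_load(addr)
        self.serial_store(dst_bank, dst_start)

    def scatter(self, src_bank, src_start, addr):
        """One bank -> many banks: serial load, then parallel store."""
        self.serial_load(src_bank, src_start)
        self.parallel_store(addr)
```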

Type: Grant

Filed: June 3, 2020

Date of Patent: April 5, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh

Hybrid wildcard match table

Publication number: 20220086089

Abstract: Embodiments of the present invention are directed to a wildcard matching solution that uses a combination of static random access memories (SRAMs) and ternary content addressable memories (TCAMs) in a hybrid solution. In particular, the wildcard matching solution uses a plurality of SRAM pools for lookup and a spillover TCAM pool for unresolved hash conflicts.

Type: Application

Filed: November 23, 2021

Publication date: March 17, 2022

Inventors: Jeffrey T. Huynh, Weihuang Wang, Tsahi Daniel, Srinath Atluri, Mohan Balan