Patents by Inventor Vignesh Vivekraja
Vignesh Vivekraja has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12205013
Abstract: Accelerated convolution of neural networks can be performed by executing N computing engines (CEs) of a neural network processor in parallel. An input dataset can be divided spatially into N chunks such that a respective last portion of each chunk overlaps with a respective first portion of a subsequent chunk. Portions of each chunk can be processed by a respective CE to generate a respective portion of an output dataset. The overlapping intermediate states computed by each CE from processing the overlapping portion can be stored locally for sharing with a subsequent CE using an on-chip bus.
Type: Grant
Filed: September 1, 2020
Date of Patent: January 21, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Thiam Khean Hah, Randy Renfu Huang, Richard John Heaton, Ron Diamant, Vignesh Vivekraja
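To make the chunking scheme concrete, here is a minimal NumPy sketch of spatially splitting an input so N workers can convolve their chunks independently. The 1-D setting, the halo-of-(k-1) overlap, and all function names are illustrative assumptions, not the patented hardware design.

```python
import numpy as np

def conv1d_valid(x, w):
    """Plain 'valid' 1-D convolution as the per-engine kernel."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def chunked_conv1d(x, w, n_chunks):
    """Split x into n_chunks with (k-1)-element overlaps, convolve each
    chunk independently (one per 'computing engine'), and concatenate."""
    k = len(w)
    out_len = len(x) - k + 1
    bounds = np.linspace(0, out_len, n_chunks + 1, dtype=int)
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Each chunk carries a halo of k-1 extra inputs; in the patent the
        # overlapping intermediate state is shared over an on-chip bus.
        pieces.append(conv1d_valid(x[start:stop + k - 1], w))
    return np.concatenate(pieces)

x = np.random.rand(64)
w = np.random.rand(5)
assert np.allclose(chunked_conv1d(x, w, 4), conv1d_valid(x, w))
```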
-
Patent number: 12182695
Abstract: A systolic array can implement an architecture tailored to perform matrix multiplications on sparse matrices. Each processing element in the systolic array may include a register configured to store a value, and a multiplexor configured to select an input element from multiple input data buses based on metadata associated with the value. Each processing element may also include a multiplier configured to multiply the selected input element with the value to generate a multiplication result, and an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
Type: Grant
Filed: September 25, 2023
Date of Patent: December 31, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
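A toy model of a single processing element as the abstract describes it: metadata drives a multiplexor that picks one of several input data buses, and the selected element is multiplied with the stored value and added to the incoming partial sum. The Python function is an illustrative stand-in for the hardware datapath.

```python
def sparse_pe(value, metadata, input_buses, partial_sum_in):
    """One processing-element step: metadata selects which input data bus
    feeds the multiplier paired with the stored value."""
    selected = input_buses[metadata]     # multiplexor
    product = selected * value           # multiplier
    return partial_sum_in + product      # adder -> partial sum output

# Example: the stored value pairs with bus 2 of four input data buses.
print(sparse_pe(value=0.5, metadata=2,
                input_buses=[1.0, 2.0, 3.0, 4.0],
                partial_sum_in=10.0))    # 10.0 + 3.0 * 0.5 = 11.5
```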
-
Publication number: 20240403646
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
Type: Application
Filed: August 8, 2024
Publication date: December 5, 2024
Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
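A toy NumPy sketch of the reordering this abstract describes: the forward propagation re-runs after the loss gradient operation, so the intermediate outputs need not be held in memory in between. The two-layer network, MSE loss, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x, target = rng.random((4, 8)), rng.random((4, 2))
w1, w2 = rng.random((8, 4)), rng.random((4, 2))

# Step 1: loss gradient operation (gradient of MSE at the output).
h = np.maximum(x @ w1, 0.0)          # intermediate, discarded in hardware
y = h @ w2
dy = 2.0 * (y - target) / y.size     # data gradients

# Step 2: forward propagation re-runs to regenerate intermediate outputs.
h = np.maximum(x @ w1, 0.0)

# Step 3: backward propagation combines data gradients with intermediates.
dw2 = h.T @ dy
dh = (dy @ w2.T) * (h > 0)
dw1 = x.T @ dh

# Step 4: the host receives the weight gradients and updates the weights.
lr = 0.1
w1 -= lr * dw1
w2 -= lr * dw2
```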
-
Patent number: 12159218
Abstract: A single instruction multiple data (SIMD) processor is used to implement a dropout layer between a first layer and a second layer of a neural network. The SIMD processor can implement the dropout layer by setting one or more elements in an output tensor of the first layer to zero before providing it as an input tensor to the second layer. Setting of the one or more elements to zero is based on a dropout rate, and pseudo-random numbers generated by a random number generator in the SIMD processor.
Type: Grant
Filed: July 27, 2020
Date of Patent: December 3, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jiading Gai, Hongbin Zheng, Animesh Jain, Randy Renfu Huang, Vignesh Vivekraja
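A minimal sketch of the dropout operation described above, using NumPy's generator in place of the SIMD processor's hardware random number generator. The inverted-dropout rescaling of the surviving elements is a common convention assumed here; the abstract itself only mentions zeroing.

```python
import numpy as np

def dropout_layer(activations, rate, rng):
    """Zero elements of the first layer's output tensor with probability
    `rate` before it becomes the second layer's input tensor, scaling
    survivors so the expected value is unchanged (inverted dropout)."""
    mask = rng.random(activations.shape) >= rate   # pseudo-random keep mask
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(42)
out_tensor = rng.standard_normal((2, 8))
print(dropout_layer(out_tensor, rate=0.5, rng=rng))
```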
-
Patent number: 12130885
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
Type: Grant
Filed: November 3, 2022
Date of Patent: October 29, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
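A sketch under an assumed 2-out-of-4 constraint, one common form of fine-grained structured sparsity (the patent's exact constraint is not specified here): check the sparsity condition, then distribute each group's nonzeros round-robin across the set of constrained matrices so that the set sums back to the original weights.

```python
import numpy as np

GROUP, MAX_NZ = 4, 2   # assumed 2-out-of-4 fine-grained constraint

def satisfies_condition(w, num_splits):
    """Sparsity condition: every GROUP-wide group of each row can spread
    its nonzeros over num_splits matrices of at most MAX_NZ nonzeros."""
    nz = (w.reshape(w.shape[0], -1, GROUP) != 0).sum(axis=2)
    return bool((nz <= MAX_NZ * num_splits).all())

def convert(w, num_splits):
    """Distribute each group's nonzeros round-robin over num_splits
    constrained matrices; their sum reproduces the original weights."""
    parts = [np.zeros_like(w) for _ in range(num_splits)]
    for r, c in zip(*np.nonzero(w)):
        group_start = c - c % GROUP
        order = np.flatnonzero(w[r, group_start:group_start + GROUP])
        slot = list(order).index(c % GROUP) % num_splits
        parts[slot][r, c] = w[r, c]
    return parts

w = np.array([[1., 2., 3., 0., 0., 5., 0., 7.]])
assert satisfies_condition(w, 2)
assert np.allclose(sum(convert(w, 2)), w)
```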
-
Patent number: 12106222
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
Type: Grant
Filed: February 21, 2023
Date of Patent: October 1, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
-
Patent number: 12073199
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
Type: Grant
Filed: June 6, 2019
Date of Patent: August 27, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
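A toy interpreter illustrating the overwrite mechanism: instruction blocks execute in sequence, and once the overwrite instruction's condition is satisfied, all subsequent blocks are rewritten as NOPs. The instruction encoding and the condition are invented for illustration.

```python
NOP = ("nop",)

def run(blocks, state):
    """Execute instruction blocks; an ('overwrite_if', cond) instruction
    replaces all later blocks with NOPs once cond(state) is satisfied,
    skipping the remaining repeated work."""
    for i, block in enumerate(blocks):
        for op in block:
            if op[0] == "add":
                state["acc"] += op[1]
            elif op[0] == "overwrite_if" and op[1](state):
                for j in range(i + 1, len(blocks)):
                    blocks[j] = [NOP] * len(blocks[j])
    return state

# Five identical blocks of a repeatable operation; the condition fires
# after the second, so the last three execute as NOPs.
blocks = [[("add", 1), ("overwrite_if", lambda s: s["acc"] >= 2)]
          for _ in range(5)]
print(run(blocks, {"acc": 0}))   # {'acc': 2}
```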
-
Publication number: 20240232630
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with a second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
Type: Application
Filed: July 13, 2023
Publication date: July 11, 2024
Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
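A sketch of the overlap the abstract describes, using a Python thread as a stand-in for the hardware interface: the first portion of the second-layer gradients is exchanged while the lower layer's backward pass proceeds, and the remaining transfers follow. Function names and sizes are illustrative.

```python
import threading
import numpy as np

def exchange(name, grads, done):
    """Stand-in for the hardware interface sending gradients to the peer."""
    print(f"exchanging {name} ({grads.size} values)")
    done.set()

rng = np.random.default_rng(0)
dW2 = rng.random(1000)                     # second-layer weight gradients
first_part, rest = np.split(dW2, [500])    # split into portions

sent = threading.Event()
t = threading.Thread(target=exchange, args=("dW2 part 1", first_part, sent))
t.start()                                  # exchange starts...

dW1 = rng.random(1000)                     # ...while lower-layer backprop
sent.wait(); t.join()                      # runs concurrently

exchange("dW1", dW1, threading.Event())        # then the first gradients
exchange("dW2 rest", rest, threading.Event())  # then the remaining portion
```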
-
Publication number: 20240220281
Abstract: In one embodiment, a method includes accessing a computational graph representing computations to be executed on a computing system comprising a plurality of Execution Units (EUs), identifying a set of candidate mapped-graphs for the computational graph, where each node in a candidate mapped-graph is mapped to an EU capable of calculating the node, ensuring that each edge from a first node to a second node in each candidate mapped-graph satisfies memory constraints, determining an expected cost for executing each candidate mapped-graph using mapped-EUs in the candidate mapped-graph for calculating respective nodes, and selecting a candidate mapped-graph with a least expected cost from the set of candidate mapped-graphs.
Type: Application
Filed: November 30, 2023
Publication date: July 4, 2024
Inventors: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Weiping Liu, Javid Jaffari
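A small sketch of the selection procedure: enumerate candidate mapped-graphs from each node's capable EUs, keep those whose edges satisfy a (placeholder) memory constraint, and pick the mapping with the least expected cost. The node names, EU names, and cost table are invented for illustration.

```python
from itertools import product

nodes = ["conv", "relu"]                  # toy computational graph
edges = [("conv", "relu")]
capable = {"conv": ["mac_array", "dsp"],  # EUs able to calculate each node
           "relu": ["vector", "dsp"]}
cost = {("conv", "mac_array"): 3, ("conv", "dsp"): 7,
        ("relu", "vector"): 1, ("relu", "dsp"): 2}

def edge_memory_ok(mapping, src, dst):
    """Placeholder memory constraint on a producer -> consumer edge,
    e.g. checking that the intermediate buffer fits the consumer EU."""
    return True

candidates = []
for eus in product(*(capable[n] for n in nodes)):
    mapping = dict(zip(nodes, eus))       # one candidate mapped-graph
    if all(edge_memory_ok(mapping, s, d) for s, d in edges):
        candidates.append(mapping)

best = min(candidates, key=lambda m: sum(cost[n, m[n]] for n in nodes))
print(best)   # {'conv': 'mac_array', 'relu': 'vector'}
```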
-
Publication number: 20240220256
Abstract: In one embodiment, a computing system may load data from a memory unit into a number of registers according to a first order by which the data is arranged. The registers may be configured to be accessed during a single operation cycle. The system may determine a second order for the data based on one or more subsequent operations to process the data. The system may read the data from the registers according to the second order during one or more operation cycles. The data read from the registers may be arranged in the second order. The system may transmit the data arranged in the second order to an execution unit configured to execute the one or more subsequent operations to process the data arranged in the second order.
Type: Application
Filed: November 30, 2023
Publication date: July 4, 2024
Inventors: Reza Tusi, Tomonari Tohara, Vignesh Vivekraja, Javid Jaffari
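A minimal sketch of the two-order idea, using a NumPy array as the single-cycle register file: data is loaded in its arrival order and read back through a permutation chosen for the downstream operation. The interleaving pattern shown is an arbitrary example.

```python
import numpy as np

registers = np.empty(8)
data = np.arange(8.0)          # arrives in its first (storage) order
registers[:] = data            # load into the register file

# A second order chosen for the subsequent operation, e.g. an interleave
# that a downstream execution unit wants (illustrative permutation).
second_order = [0, 4, 1, 5, 2, 6, 3, 7]
reordered = registers[second_order]    # read in the second order
print(reordered)               # data now arranged for the execution unit
```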
-
Publication number: 20240220259
Abstract: In one embodiment, a computing system may set data to a first group of registers. The first group of registers may be configured to be accessed during a single operation cycle. The system may set a number of patterns to a second group of registers. Each pattern of the number of patterns may include an array of indices for the data stored in the first group of registers. The system may select, for a first vector register associated with a vector engine, a first pattern from the patterns stored in the second group of registers. The system may load a first portion of the data from the first group of registers to the first vector register based on the first pattern selected for the first vector register from the patterns stored in the second group of registers.
Type: Application
Filed: November 30, 2023
Publication date: July 4, 2024
Inventors: Tomonari Tohara, Vignesh Vivekraja, Alagappan Valliappan, Andrey Bushev, Javid Jaffari
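A sketch of pattern-driven loading: one register group holds data, another holds patterns (index arrays), and a vector register is filled by gathering the data through a selected pattern. The register sizes and patterns are illustrative.

```python
import numpy as np

data_regs = np.arange(16.0)                # first group: data registers
pattern_regs = np.array([[0, 2, 4, 6],     # second group: each pattern is
                         [1, 3, 5, 7],     # an array of indices into the
                         [0, 1, 8, 9]])    # data registers

def load_vector_register(pattern_id):
    """Fill a vector register by gathering data through a chosen pattern."""
    return data_regs[pattern_regs[pattern_id]]

v0 = load_vector_register(0)   # first pattern selected for this register
print(v0)                      # [0. 2. 4. 6.]
```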
-
Publication number: 20240220779
Abstract: In one embodiment, a system comprises a processor and a non-transitory memory coupled to the processor, the memory comprising instructions executable by the processor. The processor, comprising an internal memory; a Multiply-Accumulate (MAC) array; a first vector register array; a second vector register array; and a third vector register array, is operable when executing instructions to transfer weights for M filters and an input activation tensor from an external memory to the internal memory, insert paddings to the input activation tensor in the internal memory based on first configuration parameters, configure the MAC array to a required shape based on second configuration parameters for convolution operations between the input activation tensor and the M filters, and calculate a row of the output activation tensor by performing the convolution operations on corresponding R rows of the input activation tensor with the M filters, wherein R is a filter height.
Type: Application
Filed: December 1, 2023
Publication date: July 4, 2024
Inventors: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Javid Jaffari, Vlad Fruchter, David Vakrat, Ohad Meitav
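A NumPy sketch of computing one output row from the R corresponding input rows, with padding inserted up front, as the abstract outlines. The loops stand in for the configured MAC array, and the shapes are illustrative.

```python
import numpy as np

def output_row(x, filters, row, pad):
    """Compute one row of the output activation tensor: convolve the R
    corresponding rows of the padded input with each of the M filters
    (R = filter height)."""
    M, R, S = filters.shape                 # filters: M x R x S
    xp = np.pad(x, pad)                     # paddings inserted in memory
    cols = xp.shape[1] - S + 1
    out = np.zeros((M, cols))
    for m in range(M):
        for c in range(cols):
            # the MAC array accumulates the R x S window for filter m
            out[m, c] = np.sum(xp[row:row + R, c:c + S] * filters[m])
    return out

x = np.random.rand(6, 6)
filters = np.random.rand(4, 3, 3)           # M=4 filters of height R=3
print(output_row(x, filters, row=0, pad=1).shape)   # (4, 6)
```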
-
Publication number: 20240220273
Abstract: In one embodiment, a system comprises a processor and a non-transitory memory coupled to the processor, the memory comprising instructions executable by the processor. The processor, comprising an internal memory; a Multiply-Accumulate (MAC) array; a first vector register array; a second vector register array; and a third vector register array, is operable when executing a first instruction among the instructions to feed a weight vector array from the second vector register array to the MAC array, broadcast an input activation vector to the MAC array, multiply an input activation value broadcast to the MAC unit from the input activation vector and a weight value fed to the MAC unit from the weight vector array at each MAC unit in the MAC array, and store a partial output activation vector to the third vector register array, wherein the partial output activation vector is the output of the MAC array.
Type: Application
Filed: December 1, 2023
Publication date: July 4, 2024
Inventors: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Javid Jaffari, Vlad Fruchter, David Vakrat, Ohad Meitav
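A sketch of the broadcast step as a matrix-vector product: each column of the MAC array holds one filter's weight vector, the same activation vector is broadcast to all columns, each MAC unit multiplies its weight by the broadcast value, and the column sums accumulate into a partial output vector. The dimensions are illustrative.

```python
import numpy as np

def mac_broadcast_step(weight_array, activation_vec, partial_out):
    """One instruction's worth of work: the activation vector is broadcast
    to every column of the MAC array, each MAC unit multiplies its fed
    weight by the broadcast value, and the column sums accumulate into
    the partial output activation vector."""
    return partial_out + weight_array.T @ activation_vec

C, M = 8, 4                       # input channels, filters
weights = np.random.rand(C, M)    # from the second vector register array
acts = np.random.rand(C)          # broadcast input activation vector
partial = np.zeros(M)             # third vector register array
partial = mac_broadcast_step(weights, acts, partial)
print(partial.shape)              # (4,)
```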
-
Patent number: 12008469
Abstract: A single neural network model can be used by each computing engine (CE) in a neural network processor to perform convolution operations in parallel for one or more stacks of convolutional layers. An input feature map can be divided into N chunks to be processed by N CEs, respectively. Each CE can process a last portion of a respective chunk to generate respective shared states to be used by a subsequent CE. A first CE uses pre-computed states to generate a first portion of an output feature map, while other CEs use shared states computed by a preceding CE to generate respective portions of the output feature map.
Type: Grant
Filed: September 1, 2020
Date of Patent: June 11, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Thiam Khean Hah, Randy Renfu Huang, Richard John Heaton, Ron Diamant, Vignesh Vivekraja
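A sketch of the shared-state scheme: each engine prepends the state carried from its predecessor (pre-computed for the first engine) before convolving its chunk, and emits a new tail state for the next engine. The Python loop serializes what the hardware would pipeline across CEs; all names and the 1-D setting are illustrative.

```python
import numpy as np

def ce_process(chunk, carry, w):
    """One computing engine: prepend the carried state from the preceding
    engine, convolve, and pass a new tail state to the next engine."""
    x = np.concatenate([carry, chunk])
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])
    return out, chunk[-(k - 1):]       # shared state for the next CE

w = np.random.rand(3)
data = np.random.rand(32)
chunks = np.split(data, 4)
carry = np.zeros(len(w) - 1)           # pre-computed states for the first CE
outputs = []
for chunk in chunks:                   # successive CEs over the chunks
    out, carry = ce_process(chunk, carry, w)
    outputs.append(out)
print(np.concatenate(outputs).shape)   # (32,) -- left edge zero-padded
```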
-
Patent number: 11954583
Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
Type: Grant
Filed: April 14, 2023
Date of Patent: April 9, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T Huynh, Vignesh Vivekraja
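A 1-D NumPy sketch of the per-weight formulation suggested by the abstract: each weight element pairs with a subset of input elements determined by the stride and the weight's position, and contributes to a strided subset of the outputs. The rotation bookkeeping of the actual instruction stream is omitted, and the 1-D setting is an illustrative simplification.

```python
import numpy as np

def transposed_conv1d(x, w, stride):
    """Transposed convolution computed one weight element at a time: weight
    j pairs with every input element, and its products land at the outputs
    offset by j with spacing equal to the stride."""
    n, k = len(x), len(w)
    out = np.zeros((n - 1) * stride + k)
    for j in range(k):                        # one weight data element
        out[j:j + n * stride:stride] += w[j] * x
    return out

x = np.array([1.0, 2.0, 3.0])
print(transposed_conv1d(x, np.array([1.0, 10.0]), stride=2))
# [ 1. 10.  2. 20.  3. 30.]
```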
-
Patent number: 11941528
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with a second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
Type: Grant
Filed: September 30, 2019
Date of Patent: March 26, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
-
Patent number: 11803736
Abstract: A systolic array can implement an architecture tailored to perform matrix multiplications on constrained fine-grained sparse weight matrices. Each processing element in the systolic array may include a weight register configured to store a weight value, and a multiplexor configured to select a feature map (FMAP) input element from multiple FMAP input data buses based on metadata associated with the weight value. Each processing element may also include a multiplier configured to multiply the selected feature map input element with the weight value to generate a multiplication result, and an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
Type: Grant
Filed: June 30, 2020
Date of Patent: October 31, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
-
Publication number: 20230306249
Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
Type: Application
Filed: April 14, 2023
Publication date: September 28, 2023
Inventors: Jeffrey T Huynh, Vignesh Vivekraja
-
Patent number: 11681902
Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
Type: Grant
Filed: September 27, 2019
Date of Patent: June 20, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T Huynh, Vignesh Vivekraja
-
Patent number: 11610128
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
Type: Grant
Filed: March 31, 2020
Date of Patent: March 21, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja