Patents by Inventor Vignesh Vivekraja

Vignesh Vivekraja has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11500962
    Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
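
A minimal NumPy illustration of the conversion the abstract above describes, assuming a hypothetical per-row nonzero budget as the fine-grained constraint (the patent's exact constraint is not stated in the abstract); the function name and threshold are illustrative:

```python
import numpy as np

def split_into_constrained_matrices(weights, nonzeros_per_row=1, sparsity_threshold=0.5):
    """Split a sparse weight matrix into a set of matrices, each with at
    most `nonzeros_per_row` nonzero entries per row (a hypothetical
    fine-grained constraint standing in for the patent's)."""
    # Sparsity condition: only convert if enough entries are zero.
    sparsity = 1.0 - np.count_nonzero(weights) / weights.size
    if sparsity < sparsity_threshold:
        raise ValueError("weight matrix is too dense to convert")

    pieces = []
    remaining = weights.copy()
    while np.count_nonzero(remaining):
        piece = np.zeros_like(weights)
        for r in range(weights.shape[0]):
            cols = np.flatnonzero(remaining[r])[:nonzeros_per_row]
            piece[r, cols] = remaining[r, cols]
            remaining[r, cols] = 0
        pieces.append(piece)
    return pieces

# The pieces sum back to the original matrix, so the dense product
# x @ W equals the sum of the constrained sparse products.
W = np.array([[0.0, 1.5, 0.0, 0.0],
              [2.0, 0.0, 0.0, 0.5],
              [0.0, 0.0, 3.0, 0.0]])
x = np.ones((2, 3))
pieces = split_into_constrained_matrices(W)
assert np.allclose(x @ W, sum(x @ p for p in pieces))
```
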
  • Patent number: 11182314
    Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory into the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: November 23, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Ilya Minkin, Vignesh Vivekraja, Richard John Heaton, Randy Renfu Huang
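
A toy sketch of the caching behavior in the entry above, with a hypothetical ModelLoader class standing in for the accelerator runtime; the dictionaries are stand-ins for host memory and the accelerator's local memory:

```python
class ModelLoader:
    """Hypothetical sketch: prefer the accelerator's local memory over
    host memory when loading a model into the state buffer."""

    def __init__(self, host_memory):
        self.host_memory = host_memory    # slow path over the peripheral bus
        self.local_memory = {}            # low-latency path, shareable
                                          # among multiple accelerators

    def load(self, model_name):
        if model_name in self.local_memory:
            return self.local_memory[model_name]   # reduced-latency load
        weights = self.host_memory[model_name]     # fetch over the bus once
        self.local_memory[model_name] = weights    # keep a local copy
        return weights

loader = ModelLoader(host_memory={"resnet": b"\x00" * 16})
loader.load("resnet")   # fetched from host memory
loader.load("resnet")   # served from local memory
```
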
  • Publication number: 20210304010
    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
    Type: Application
    Filed: March 31, 2020
    Publication date: September 30, 2021
    Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
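
The schedule in the abstract above resembles activation recomputation. A minimal two-layer NumPy sketch of that ordering, under the assumption that the forward intermediates are discarded once the loss gradient is computed and regenerated before the backward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5)) * 0.1
W2 = rng.normal(size=(5, 3)) * 0.1
x = rng.normal(size=(8, 4))
y_true = rng.normal(size=(8, 3))
lr = 0.1

# Loss-gradient operation: a forward pass whose intermediate outputs
# are discarded; only the data gradients dL/dy are kept.
h = np.maximum(x @ W1, 0.0)            # ReLU hidden layer
y = h @ W2
dy = 2.0 * (y - y_true) / len(x)       # data gradients from an MSE loss
del h                                   # intermediates not kept resident

# After the loss-gradient operation completes, a second forward
# propagation regenerates the intermediate outputs.
pre = x @ W1
h = np.maximum(pre, 0.0)

# Backward propagation combines the data gradients with the
# regenerated intermediates to produce weight gradients.
dW2 = h.T @ dy
dh = dy @ W2.T
dpre = dh * (pre > 0)
dW1 = x.T @ dpre

# The host receives the weight gradients and updates the weights.
W1 -= lr * dW1
W2 -= lr * dW2
```
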
  • Patent number: 10997277
    Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated and then added, in parallel, to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
    Type: Grant
    Filed: March 26, 2019
    Date of Patent: May 4, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Yu Zhou, Vignesh Vivekraja, Ron Diamant
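
A NumPy sketch of the chunked cumulative-sum sampling described above; the chunk count stands in for the number of parallel execution units, and the function name is illustrative, not the patent's:

```python
import numpy as np

def multinomial_select(probs, num_units=4, rng=None):
    """Split the probabilities across parallel execution units,
    prefix-sum each chunk, then stitch the chunks into a global
    cumulative sum for index selection."""
    if rng is None:
        rng = np.random.default_rng()
    chunks = np.array_split(np.asarray(probs, dtype=float), num_units)

    # Each execution unit computes a cumulative sum over its chunk
    # (in parallel on the hardware, sequentially here).
    local = [np.cumsum(c) for c in chunks]

    # Accumulate the largest (last) value of each chunk to get the
    # offset each later chunk needs, then add the offsets to the chunks.
    offsets = np.concatenate(([0.0], np.cumsum([c[-1] for c in local])[:-1]))
    global_cumsum = np.concatenate([c + off for c, off in zip(local, offsets)])

    # Draw a uniform sample scaled by the total mass and pick the
    # first index whose cumulative sum exceeds it.
    u = rng.uniform(0.0, global_cumsum[-1])
    return int(np.searchsorted(global_cumsum, u, side="right"))

counts = np.bincount([multinomial_select([0.1, 0.2, 0.3, 0.4],
                                         rng=np.random.default_rng(i))
                      for i in range(1000)], minlength=4)
print(counts)  # roughly proportional to the probabilities
```
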
  • Publication number: 20210097396
    Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
    Type: Application
    Filed: September 30, 2019
    Publication date: April 1, 2021
    Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
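
A sketch of the compute/communication overlap from the entry above, using a Python thread as a stand-in for the hardware interface; the exchange function is hypothetical:

```python
import threading
import numpy as np

def exchange(buffer, tag):
    """Stand-in for the hardware interface exchanging gradients with
    the second computer system (e.g. an all-reduce step)."""
    print(f"exchanging {tag} ({buffer.size} values)")

rng = np.random.default_rng(0)
grads_layer2 = rng.normal(size=1000)   # from the layer-2 backward pass

# Split the layer-2 weight gradients and start exchanging the first
# portion while the layer-1 backward pass is still running.
first, rest = np.split(grads_layer2, [500])
xfer = threading.Thread(target=exchange, args=(first, "layer-2 part 1"))
xfer.start()

# Backward propagation for the lower layer overlaps the transfer.
grads_layer1 = rng.normal(size=800)    # stand-in for the real computation

xfer.join()
exchange(grads_layer1, "layer-1")      # lower layer's gradients go out next
exchange(rest, "layer-2 remainder")    # then the remaining portions
```
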
  • Publication number: 20210097375
    Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Jeffrey T. Huynh, Vignesh Vivekraja
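
A 1-D NumPy sketch loosely modeled on the schedule above, contrasting the usual scatter formulation of transposed convolution with a per-weight-element loop; in this unpadded 1-D case every weight element touches every input, so the abstract's subset extraction reduces to a strided accumulation (with padding or cropping, only a subset of inputs would contribute):

```python
import numpy as np

def transposed_conv1d(x, w, stride):
    """Reference transposed convolution: each input element scatters a
    scaled copy of the kernel into the output."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(w)] += xi * w
    return out

def transposed_conv1d_per_weight(x, w, stride):
    """Per-weight-element schedule: for each weight element, gather the
    inputs it touches and accumulate the products into strided output
    positions, as a systolic array would do one weight at a time."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for k, wk in enumerate(w):
        # weight element k lands at output offset k for every input i
        out[np.arange(len(x)) * stride + k] += x * wk
    return out

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 0.5, 0.25])
assert np.allclose(transposed_conv1d(x, w, 2),
                   transposed_conv1d_per_weight(x, w, 2))
```
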
  • Patent number: 10929063
    Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: February 23, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Vignesh Vivekraja, Yu Zhou, Ron Diamant, Randy Renfu Huang, Richard John Heaton
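
A toy interpreter sketch of the two-pass DMA idea above: a first "patch" descriptor rewrites the source field of a later "copy" descriptor using an address read from memory, so the second transfer fetches from a dynamically computed indirect address. The descriptor format is invented for illustration:

```python
# Memory holds a pointer at 0x100 and the payload at the address it
# points to; the dict stands in for the hierarchical memory system.
memory = {0x100: 0x400,            # location holding an indirect address
          0x400: "payload"}        # data at the indirect address

descriptors = [
    {"op": "patch", "src": 0x100, "patch_field": "src", "patch_desc": 1},
    {"op": "copy",  "src": None,  "dst": 0x800},   # src filled in at run time
]

def run_dma(descriptors, memory):
    for d in descriptors:
        if d["op"] == "patch":
            # First instruction set: modify a later descriptor in place.
            descriptors[d["patch_desc"]][d["patch_field"]] = memory[d["src"]]
        elif d["op"] == "copy":
            # Second instruction set: fetch via the patched address.
            memory[d["dst"]] = memory[d["src"]]

run_dma(descriptors, memory)
print(hex(0x800), "->", memory[0x800])   # payload fetched indirectly
```
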
  • Publication number: 20200387799
    Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
    Type: Application
    Filed: June 6, 2019
    Publication date: December 10, 2020
    Inventors: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
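
A minimal Python sketch of the overwrite mechanism described above, assuming a hypothetical program of per-step instruction blocks (say, one block per RNN time step up to a maximum length) in which a satisfied condition NOPs out the remaining blocks:

```python
def make_block(step, actual_len):
    def work(program, pc):
        print(f"executing step {step}")
        # Overwrite action: once the condition is met, replace every
        # subsequent block with NOPs so the engines skip its work.
        if step + 1 >= actual_len:
            for i in range(pc + 1, len(program)):
                program[i] = lambda program, pc: None   # NOP instruction
    return work

MAX_STEPS, actual_len = 6, 3
program = [make_block(s, actual_len) for s in range(MAX_STEPS)]
for pc, instr in enumerate(program):
    instr(program, pc)   # only the first 3 steps do real work
```
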