Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Circuit architecture with biased randomization

Patent number: 11250319

Abstract: Disclosed herein are techniques for classifying data with a data processing circuit. In one embodiment, the data processing circuit includes a probabilistic circuit configurable to generate a decision at a pre-determined probability, and an output generation circuit including an output node and configured to receive input data and a weight, and generate output data at the output node for approximating a product of the input data and the weight. The generation of the output data includes propagating the weight to the output node according to a first decision of the probabilistic circuit. The probabilistic circuit is configured to generate the first decision at a probability determined based on the input data.

Type: Grant

Filed: September 25, 2017

Date of Patent: February 15, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Randy Huang, Ron Diamant
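
The idea in the abstract for patent 11250319 (propagate the weight with a probability derived from the input, so that the output approximates the product on average) can be illustrated in software. This is only a minimal sketch, not the patented circuit: it assumes the input has been normalized to the range [0, 1], and the names `stochastic_product` and `estimate_product` are hypothetical.

```python
import random


def stochastic_product(x, w, rng):
    """Return w with probability x, otherwise 0.0.

    Assumption: x is already normalized to [0, 1]. The expected value of
    the result is then x * w.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    # The "decision" is a biased coin flip whose bias is the input value.
    return w if rng.random() < x else 0.0


def estimate_product(x, w, trials=100_000, seed=0):
    """Average many stochastic outputs to approximate x * w."""
    rng = random.Random(seed)
    total = sum(stochastic_product(x, w, rng) for _ in range(trials))
    return total / trials
```

Averaging over many trials is used here only to show that the expectation matches the product; a single evaluation yields either the weight or zero.
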
Debug for computation networks using error detection codes

Patent number: 11232016

Abstract: Techniques disclosed herein relate generally to debugging complex computing systems, such as those executing neural networks. A neural network processor includes a processing engine configured to execute instructions to implement multiple layers of a neural network. The neural network processor includes a debugging circuit configured to generate error detection codes for input data to the processing engine or error detection codes for output data generated by the processing engine. The neural network processor also includes an interface to a memory device, where the interface is configured to save the error detection codes generated by the debugging circuit into the memory device. The error detection codes generated by the debugging circuit are compared with expected error detection codes generated using a function model of the neural network to identify defects of the neural network.

Type: Grant

Filed: September 21, 2018

Date of Patent: January 25, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant, Sundeep Amirineni, Randy Renfu Huang
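
The comparison step described in the abstract for patent 11232016 can be sketched as follows. The abstract does not name a specific error detection code, so CRC-32 (via Python's `zlib.crc32`) is an assumption made purely for illustration, as are the float32 packing and the function names `layer_crc` and `first_mismatch`.

```python
import struct
import zlib


def layer_crc(values):
    """Compute a CRC-32 over a layer's outputs packed as little-endian float32.

    Assumption: CRC-32 stands in for whatever code the hardware generates.
    """
    packed = struct.pack(f"<{len(values)}f", *values)
    return zlib.crc32(packed)


def first_mismatch(device_outputs, model_outputs):
    """Return the index of the first layer whose codes differ, or None.

    device_outputs: per-layer outputs captured from the processor.
    model_outputs:  per-layer outputs from a functional (reference) model.
    """
    for index, (dev, ref) in enumerate(zip(device_outputs, model_outputs)):
        if layer_crc(dev) != layer_crc(ref):
            return index
    return None
```

Comparing short codes rather than full tensors keeps the amount of saved debug data small, at the cost of only localizing a defect to a layer rather than to an element.
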
Debugging of memory operations

Patent number: 11231987

Abstract: A debugging tool, such as may take the form of a software daemon running in the background, can provide for the monitoring of utilization of access mechanisms, such as Direct Memory Access (DMA) mechanisms, for purposes such as debugging and performance improvement. Debugging tools can obtain and provide DMA utilization data, as may include statistics, graphs, predictive analytics, or other such information. The data can help to pinpoint issues that have arisen, or may arise, in the system, and take appropriate remedial or preventative action. Data from related DMAs can be aggregated intelligently, helping to identify bottlenecks where the individual DMA data might not. A debugging tool can store state information as snapshots, which may be beneficial if the system is in a state where current data is not accessible. The statistics and predictive analytics can also be leveraged to optimize system performance.

Type: Grant

Filed: June 28, 2019

Date of Patent: January 25, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Benita Bose, Ron Diamant, Georgy Zorik Machulsky, Alex Levin
Top value computation on an integrated circuit device

Patent number: 11188302

Abstract: Top-k is a process by which the largest elements among a set of elements are found. In various implementations, a top-k computation can be executed by a neural network accelerator, where the top-k computation is performed using a process that makes use of the accelerator's memory array. A set of numerical values on which to perform top-k can be stored in the memory array. The accelerator can locate the maximum value from among the set of numerical values, and can store the maximum value back into the memory array. The accelerator can next remove the maximum value from the set of numerical values, so that a next largest value can be found. To remove the maximum value, the accelerator can write a value representing negative infinity to the memory array at each location of the maximum value.

Type: Grant

Filed: February 4, 2019

Date of Patent: November 30, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Ron Diamant, Randy Renfu Huang, Richard John Heaton
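
The find-maximum-then-mask loop described in the abstract for patent 11188302 can be sketched in a few lines. This is a software illustration only: a Python list stands in for the accelerator's memory array, and the function name `top_k` is hypothetical.

```python
import math


def top_k(values, k):
    """Return up to k largest distinct values by repeated max-and-mask.

    After each maximum is found, every location holding that maximum is
    overwritten with negative infinity so the next largest can be found.
    """
    work = list(values)  # working copy standing in for the memory array
    result = []
    for _ in range(k):
        current_max = max(work, default=-math.inf)
        if current_max == -math.inf:
            break  # nothing left to select
        result.append(current_max)
        # Mask every occurrence of the maximum, as the abstract describes.
        work = [-math.inf if v == current_max else v for v in work]
    return result
```

Because every location of the maximum is masked at once, duplicate values are reported a single time in this sketch.
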
Synchronization of concurrent computation engines

Patent number: 11175919

Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.

Type: Grant

Filed: December 13, 2018

Date of Patent: November 16, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
In-band de-duplication

Patent number: 11157452

Abstract: A method for in-band de-duplication is disclosed. The method may include receiving, by a hardware accelerator, a received packet of a first sequence of packets that conveys a first data chunk; and applying a data chunk hash calculation process on the received packet while taking into account a hash calculation result obtained when applying the data chunk hash calculation process on a last packet of the first sequence that preceded the received packet; wherein the calculating of the first data chunk hash value is initiated before a completion of a reception of the entire first data chunk by the hardware accelerator.

Type: Grant

Filed: May 9, 2017

Date of Patent: October 26, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Nafea Bshara, Leah Shalev, Erez Izenberg, Georgy Machulsky, Ron Diamant
Target port with distributed transactions

Patent number: 11138106

Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction that has an address that is within the multicast address range, the target port can determine an index from the plurality of indices, and can use the index to determine a second address, which combines the index and the offset value with the address. The target port can then use the second address to write the data to the memory.

Type: Grant

Filed: March 31, 2020

Date of Patent: October 5, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Ron Diamant, Randy Renfu Huang
MULTI-MODEL TRAINING PIPELINE IN DISTRIBUTED SYSTEMS

Publication number: 20210303988

Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.

Type: Application

Filed: March 30, 2020

Publication date: September 30, 2021

Inventors: Patricio Kaplan, Ron Diamant
NEURAL NETWORK TRAINING UNDER MEMORY RESTRAINT

Publication number: 20210304010

Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.

Type: Application

Filed: March 31, 2020

Publication date: September 30, 2021

Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
GRADIENT COMPRESSION FOR DISTRIBUTED TRAINING

Publication number: 20210295168

Abstract: Techniques for exchanging compressed gradient data within a distributed system are disclosed. A set of gradients are computed at a first worker node of the distributed system using a neural network model and a set of weights associated with the neural network model. Each of the set of gradients having a value less than a threshold is clipped, resulting in non-clipped data elements and clipped data elements. A mapping indicating which of the set of gradients correspond to non-clipped data elements and which of the set of gradients correspond to clipped data elements is generated. Compressed data is generated based on the non-clipped data elements.

Type: Application

Filed: March 23, 2020

Publication date: September 23, 2021

Inventors: Kun Xu, Ron Diamant
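
The clip-and-map scheme in the abstract for publication 20210295168 can be sketched as below. Two details are assumptions not stated in the abstract: the threshold is compared against the gradient's magnitude, and clipped elements are restored as zero on the receiving side. The names `compress` and `decompress` are hypothetical.

```python
def compress(gradients, threshold):
    """Split gradients into a keep/clip mapping and the surviving values.

    Assumption: a gradient is clipped when its magnitude is below threshold.
    Returns (mask, kept), where mask[i] is True for non-clipped elements.
    """
    mask = [abs(g) >= threshold for g in gradients]
    kept = [g for g, keep in zip(gradients, mask) if keep]
    return mask, kept


def decompress(mask, kept):
    """Rebuild a dense gradient list; clipped positions become 0.0 (assumed)."""
    remaining = iter(kept)
    return [next(remaining) if keep else 0.0 for keep in mask]
```

Only `mask` and `kept` would need to be exchanged between worker nodes, which is smaller than the dense gradient list whenever many elements are clipped.
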
Non-intrusive hardware profiling

Patent number: 11119787

Abstract: Systems and methods for non-intrusive hardware profiling are provided. In some cases, integrated circuit devices can be manufactured without native support for performance measurement and/or debugging capabilities, thereby limiting visibility into the integrated circuit device. Understanding the timing of operations can help to determine whether the hardware of the device is operating correctly and, when the device is not operating correctly, provide information that can be used to debug the device. In order to measure execution time of various tasks performed by the integrated circuit device, program instructions may be inserted to generate notifications that provide tracing information, including timestamps, for operations executed by the integrated circuit device.

Type: Grant

Filed: March 28, 2019

Date of Patent: September 14, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Mohammad El-Shabani, Ron Diamant, Samuel Jacob, Ilya Minkin, Richard John Heaton
NEURAL NETWORK OPERATION REORDERING FOR PARALLEL EXECUTION

Publication number: 20210247984

Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.

Type: Application

Filed: April 28, 2021

Publication date: August 12, 2021

Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
Synchronization of concurrent computation engines

Patent number: 11061654

Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.

Type: Grant

Filed: December 12, 2018

Date of Patent: July 13, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
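
The set-event and wait-on-event pairing described in the abstract for patent 11061654 has a close software analogue. The sketch below uses Python threads and `threading.Event` as stand-ins for hardware execution engines and event registers; that mapping, and the name `run_engines`, are assumptions for illustration only.

```python
import threading


def run_engines():
    """Run two 'engines' where the second depends on data from the first."""
    event = threading.Event()  # stands in for a hardware event
    shared = {}
    log = []

    def first_engine():
        shared["tile"] = [1, 2, 3]      # produce the data the other engine needs
        log.append("engine0: set-event")
        event.set()                     # analogue of the set-event instruction

    def second_engine():
        event.wait()                    # analogue of the wait-on-event instruction
        log.append("engine1: resumed")
        shared["sum"] = sum(shared["tile"])

    consumer = threading.Thread(target=second_engine)
    producer = threading.Thread(target=first_engine)
    consumer.start()                    # start the waiter first on purpose
    producer.start()
    producer.join()
    consumer.join()
    return shared["sum"], log
```

Starting the waiting engine first shows that the ordering is enforced by the event rather than by launch order.
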
HARDWARE ACCELERATOR HAVING RECONFIGURABLE INSTRUCTION SET AND RECONFIGURABLE DECODER

Publication number: 20210173656

Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.

Type: Application

Filed: December 9, 2019

Publication date: June 10, 2021

Inventor: Ron Diamant
EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

Publication number: 20210158132

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

Type: Application

Filed: November 27, 2019

Publication date: May 27, 2021

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Neural network operation reordering for parallel execution

Patent number: 11016775

Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.

Type: Grant

Filed: June 26, 2019

Date of Patent: May 25, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
Multinomial distribution on an integrated circuit

Patent number: 10997277

Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated, and then added, in parallel to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.

Type: Grant

Filed: March 26, 2019

Date of Patent: May 4, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Yu Zhou, Vignesh Vivekraja, Ron Diamant
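
The chunked cumulative sum described in the abstract for patent 10997277 can be sketched sequentially. Each chunk stands in for the data stream of one execution unit; the chunk size, the use of `bisect` for the final lookup, and the names `chunked_cumsum` and `sample_index` are assumptions made for illustration. The caller supplies the uniform random number `u` in [0, 1).

```python
import bisect
import itertools


def chunked_cumsum(weights, chunk):
    """Cumulative sum computed per chunk, then stitched with chunk offsets."""
    chunks = [weights[i:i + chunk] for i in range(0, len(weights), chunk)]
    # Step 1: independent cumulative sums (parallel across units in hardware).
    local = [list(itertools.accumulate(c)) for c in chunks]
    # Step 2: accumulate the largest (last) value of each chunk.
    totals = [c[-1] for c in local]
    offsets = [0.0] + list(itertools.accumulate(totals))[:-1]
    # Step 3: add each chunk's offset to all of its elements.
    return [v + off for c, off in zip(local, offsets) for v in c]


def sample_index(weights, u, chunk=4):
    """Pick an index with probability proportional to its weight."""
    cdf = chunked_cumsum(weights, chunk)
    position = bisect.bisect_right(cdf, u * cdf[-1])
    return min(position, len(weights) - 1)
```

Scaling `u` by the final cumulative value means the weights do not need to sum to exactly one.
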
Reducing computations for data including padding

Patent number: 10990650

Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.

Type: Grant

Filed: March 22, 2018

Date of Patent: April 27, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Dana Michelle Vantrease, Ron Diamant
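
The approach in the abstract for patent 10990650 (adjusting coordinates so that zero padding is never multiplied) can be illustrated with a simplified case. The sketch below is one-dimensional with stride 1, which is an assumption made for brevity; the abstract concerns feature map matrices. The function names are hypothetical, and `conv1d_padded` is included only as a reference for comparison.

```python
def conv1d_padded(x, w, pad):
    """Reference: materialize the zero padding and multiply everything."""
    padded = [0] * pad + list(x) + [0] * pad
    k = len(w)
    return [sum(w[j] * padded[o + j] for j in range(k))
            for o in range(len(padded) - k + 1)]


def conv1d_skip_padding(x, w, pad):
    """Same result, but only multiply weights that overlap real input."""
    n, k = len(x), len(w)
    out = []
    for o in range(n + 2 * pad - k + 1):
        start = o - pad                 # input index aligned with w[0]
        lo = max(0, -start)             # first weight that hits real data
        hi = min(k, n - start)          # one past the last such weight
        out.append(sum(w[j] * x[start + j] for j in range(lo, hi)))
    return out
```

At the borders the inner loop runs over fewer weights, which is where the saved multiplications come from.
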
Place and route aware data pipelining

Patent number: 10990408

Abstract: Methods for place-and-route aware data pipelining for an integrated circuit device are provided. In large integrated circuits, the physical distance a data signal must travel between a signal source in a master circuit block partition and a signal destination in a servant circuit block partition can exceed the distance the signal can travel in a single clock cycle. To maintain timing requirements of the integrated circuit, a longest physical distance and signal delay for a datapath between master and servant circuit block partitions can be determined and pipelining registers added. Datapaths of master circuit block partitions further away from the servant circuit block can have more pipelining registers added within the master circuit block than datapaths of master circuit block partitions that are closer to the servant circuit block.

Type: Grant

Filed: September 25, 2019

Date of Patent: April 27, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Ron Diamant, Akshay Balasubramanian, Sundeep Amirineni
Accelerated quantized multiply-and-add operations

Patent number: 10983754

Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.

Type: Grant

Filed: June 2, 2020

Date of Patent: April 20, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni
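
The three stages in the abstract for patent 10983754 (subtract the zero point, accumulate integer products, rescale) can be sketched as follows. The 8-bit unsigned range, the per-operand scales, and the names `quantize` and `quantized_dot` are assumptions chosen for illustration; the abstract only refers to a first and a second format.

```python
def quantize(values, scale, zero_point):
    """Asymmetric quantization to integers in [0, 255] (assumed 8-bit format)."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]


def quantized_dot(qa, qb, zp_a, zp_b, scale_a, scale_b):
    """Dot product of two quantized vectors, returned in floating point.

    Stage 1: subtract each operand's zero point to get difference values.
    Stage 2: accumulate the sum of products in integer arithmetic.
    Stage 3: convert back to floating point with a scaling factor.
    """
    acc = sum((a - zp_a) * (b - zp_b) for a, b in zip(qa, qb))
    return acc * scale_a * scale_b
```

For example, quantizing `[0.5, -1.0, 1.5]` and `[1.0, 2.0, -0.5]` with a scale of 0.5 and zero points of 128 and 100 reproduces the floating-point dot product exactly, because every input is a whole multiple of the scale.
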
