Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11250319
    Abstract: Disclosed herein are techniques for classifying data with a data processing circuit. In one embodiment, the data processing circuit includes a probabilistic circuit configurable to generate a decision at a pre-determined probability, and an output generation circuit including an output node and configured to receive input data and a weight, and generate output data at the output node for approximating a product of the input data and the weight. The generation of the output data includes propagating the weight to the output node according to a first decision of the probabilistic circuit. The probabilistic circuit is configured to generate the first decision at a probability determined based on the input data.
    Type: Grant
    Filed: September 25, 2017
    Date of Patent: February 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant
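The abstract for patent 11250319 describes approximating a product by forwarding the weight with a probability set by the input. The sketch below is one possible software reading of that idea, not the patented circuit. It assumes the input is normalized to [0, 1] and that zero is emitted when the weight is not propagated; both are assumptions, and the names `stochastic_product` and `average_estimate` are hypothetical.

```python
import random


def stochastic_product(x, weight, rng):
    """Return `weight` with probability x, otherwise 0.0.

    Assumption: x is normalized to [0, 1], so the expected value
    of the result is x * weight.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    # rng.random() is uniform on [0, 1), so this branch is taken
    # with probability x.
    return weight if rng.random() < x else 0.0


def average_estimate(x, weight, trials, seed=0):
    """Average many stochastic outputs to estimate x * weight."""
    rng = random.Random(seed)
    total = sum(stochastic_product(x, weight, rng) for _ in range(trials))
    return total / trials
```

A single output is only ever `weight` or zero; the approximation of the product emerges when many such outputs are averaged or accumulated.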
  • Patent number: 11232016
    Abstract: Techniques disclosed herein relate generally to debugging complex computing systems, such as those executing neural networks. A neural network processor includes a processing engine configured to execute instructions to implement multiple layers of a neural network. The neural network processor includes a debugging circuit configured to generate error detection codes for input data to the processing engine or error detection codes for output data generated by the processing engine. The neural network processor also includes an interface to a memory device, where the interface is configured to save the error detection codes generated by the debugging circuit into the memory device. The error detection codes generated by the debugging circuit are compared with expected error detection codes generated using a function model of the neural network to identify defects of the neural network.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: January 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant, Sundeep Amirineni, Randy Renfu Huang
  • Patent number: 11231987
    Abstract: A debugging tool, such as may take the form of a software daemon running in the background, can provide for the monitoring of utilization of access mechanisms, such as Direct Memory Access (DMA) mechanisms, for purposes such as debugging and performance improvement. Debugging tools can obtain and provide DMA utilization data, as may include statistics, graphs, predictive analytics, or other such information. The data can help to pinpoint issues that have arisen, or may arise, in the system, and take appropriate remedial or preventative action. Data from related DMAs can be aggregated intelligently, helping to identify bottlenecks where the individual DMA data might not. A debugging tool can store state information as snapshots, which may be beneficial if the system is in a state where current data is not accessible. The statistics and predictive analytics can also be leveraged to optimize system performance.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: January 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Benita Bose, Ron Diamant, Georgy Zorik Machulsky, Alex Levin
  • Patent number: 11188302
    Abstract: Top-k is a process by which the largest elements among a set of elements are found. In various implementations, a top-k computation can be executed by a neural network accelerator, where the top-k computation is performed using a process that makes use of the accelerator's memory array. A set of numerical values on which to perform top-k can be stored in the memory array. The accelerator can locate the maximum value from among the set of numerical values, and can store the maximum value back into the memory array. The accelerator can next remove the maximum value from the set of numerical values, so that a next largest value can be found. To remove the maximum value, the accelerator can write a value representing negative infinity to the memory array at each location of the maximum value.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: November 30, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Richard John Heaton
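The top-k procedure in the abstract for patent 11188302 (find the maximum, record it, overwrite it with negative infinity, repeat) can be sketched in a few lines. This is an illustrative software analogue only: a Python list stands in for the accelerator's memory array, and the function name `top_k` is hypothetical.

```python
def top_k(values, k):
    """Return up to k largest distinct values, largest first."""
    memory = list(values)  # stand-in for the accelerator's memory array
    neg_inf = float("-inf")
    result = []
    for _ in range(min(k, len(memory))):
        maximum = max(memory)
        if maximum == neg_inf:
            break  # every location has already been removed
        result.append(maximum)
        # Remove the maximum by writing negative infinity at each
        # location where it occurs, as the abstract describes.
        for i, value in enumerate(memory):
            if value == maximum:
                memory[i] = neg_inf
    return result
```

Because every location holding the maximum is overwritten, duplicates are removed together, so `top_k([7, 7, 2], 2)` yields 7 and then 2 rather than 7 twice.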
  • Patent number: 11175919
    Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: November 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
  • Patent number: 11157452
    Abstract: A method for in-band de-duplication is provided. The method may include receiving, by a hardware accelerator, a received packet of a first sequence of packets that conveys a first data chunk; and applying a data chunk hash calculation process on the received packet while taking into account a hash calculation result obtained when applying the data chunk hash calculation process on a last packet of the first sequence that preceded the received packet. The calculation of the first data chunk hash value is initiated before the hardware accelerator has received the entire first data chunk.
    Type: Grant
    Filed: May 9, 2017
    Date of Patent: October 26, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Nafea Bshara, Leah Shalev, Erez Izenberg, Georgy Machulsky, Ron Diamant
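The packet-by-packet hashing in the abstract for patent 11157452 can be illustrated with an incremental hash whose running state is carried from one packet to the next. The sketch below uses SHA-256 from Python's standard `hashlib`; the abstract does not name a hash function, so that choice is an assumption, and the `ChunkHasher` class is hypothetical.

```python
import hashlib


class ChunkHasher:
    """Hashes a data chunk incrementally as its packets arrive."""

    def __init__(self):
        # Running state carried over from the preceding packet.
        self._state = hashlib.sha256()

    def on_packet(self, payload: bytes) -> None:
        # Hashing starts with the first packet, before the whole
        # chunk has been received.
        self._state.update(payload)

    def digest(self) -> str:
        """Return the chunk hash once the last packet has arrived."""
        return self._state.hexdigest()
```

Feeding the packets in order gives the same digest as hashing the reassembled chunk in one call, which is what lets a de-duplication lookup happen as soon as the final packet is processed.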
  • Patent number: 11138106
    Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction that has an address that is within the multicast address range, the target port can determine an index from the plurality of indices, and can use the index to determine a second address, which combines the index and the offset value with the address. The target port can then use the second address to write the data to the memory.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: October 5, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang
  • Publication number: 20210303988
    Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
    Type: Application
    Filed: March 30, 2020
    Publication date: September 30, 2021
    Inventors: Patricio Kaplan, Ron Diamant
  • Publication number: 20210304010
    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
    Type: Application
    Filed: March 31, 2020
    Publication date: September 30, 2021
    Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
  • Publication number: 20210295168
    Abstract: Techniques for exchanging compressed gradient data within a distributed system are disclosed. A set of gradients are computed at a first worker node of the distributed system using a neural network model and a set of weights associated with the neural network model. Each of the set of gradients having a value less than a threshold is clipped, resulting in non-clipped data elements and clipped data elements. A mapping indicating which of the set of gradients correspond to non-clipped data elements and which of the set of gradients correspond to clipped data elements is generated. Compressed data is generated based on the non-clipped data elements.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 23, 2021
    Inventors: Kun Xu, Ron Diamant
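The clip-and-map scheme in the abstract for publication 20210295168 can be sketched as a pair of functions. This is a minimal illustration under stated assumptions: the threshold is compared against each gradient's magnitude, and clipped elements are restored as zero on the receiving side. Neither detail is given in the abstract, and the names `compress` and `decompress` are hypothetical.

```python
def compress(gradients, threshold):
    """Split gradients into a keep/clip mapping and the kept values.

    Assumption: a gradient is clipped when its magnitude is below
    the threshold.
    """
    mapping = [abs(g) >= threshold for g in gradients]
    kept = [g for g, keep in zip(gradients, mapping) if keep]
    return mapping, kept


def decompress(mapping, kept):
    """Rebuild the full gradient list, using 0.0 for clipped elements."""
    values = iter(kept)
    return [next(values) if keep else 0.0 for keep in mapping]
```

Only the mapping (one flag per gradient) and the non-clipped values need to be exchanged between worker nodes, which is where the reduction in transferred data comes from.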
  • Patent number: 11119787
    Abstract: Systems and methods for non-intrusive hardware profiling are provided. In some cases, integrated circuit devices can be manufactured without native support for performance measurement and/or debugging capabilities, thereby limiting visibility into the integrated circuit device. Understanding the timing of operations can help to determine whether the hardware of the device is operating correctly and, when the device is not operating correctly, provide information that can be used to debug the device. In order to measure execution time of various tasks performed by the integrated circuit device, program instructions may be inserted to generate notifications that provide tracing information, including timestamps, for operations executed by the integrated circuit device.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: September 14, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Mohammad El-Shabani, Ron Diamant, Samuel Jacob, Ilya Minkin, Richard John Heaton
  • Publication number: 20210247984
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: April 28, 2021
    Publication date: August 12, 2021
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Patent number: 11061654
    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
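The set-event and wait-on-event pairing in the abstract for patent 11061654 has a close software analogue in an event object shared between two threads. The sketch below uses Python's `threading.Event` purely as an illustration; the threads stand in for execution engines, and `run_engines` and the `tensor` key are hypothetical names.

```python
import threading


def run_engines():
    """Run two 'engines' where the second depends on the first."""
    event = threading.Event()
    shared = {}
    log = []

    def first_engine():
        shared["tensor"] = [1, 2, 3]       # produce the dependent data
        log.append("engine1: produced")
        event.set()                         # analogue of set-event

    def second_engine():
        event.wait()                        # analogue of wait-on-event
        log.append("engine2: consumed %d" % sum(shared["tensor"]))

    # Start the consumer first to show that it really waits.
    consumer = threading.Thread(target=second_engine)
    producer = threading.Thread(target=first_engine)
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return log
```

Because the consumer blocks until the event is set, the log order is the same on every run even though the consumer thread is started first.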
  • Publication number: 20210173656
    Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.
    Type: Application
    Filed: December 9, 2019
    Publication date: June 10, 2021
    Inventor: Ron Diamant
  • Publication number: 20210158132
    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
  • Patent number: 11016775
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: May 25, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Patent number: 10997277
    Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated, and then added, in parallel to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
    Type: Grant
    Filed: March 26, 2019
    Date of Patent: May 4, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Yu Zhou, Vignesh Vivekraja, Ron Diamant
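The blocked cumulative sum in the abstract for patent 10997277 can be sketched as follows: each block gets a local running sum, the largest (last) value of each block is accumulated into an offset, and the offsets are added back to the blocks. The code runs the blocks serially, whereas the abstract describes parallel execution units; `blocked_cumsum`, `select_index`, and the `num_units` parameter are hypothetical names, and the uniform draw `u` is supplied by the caller.

```python
from bisect import bisect_right
from itertools import accumulate


def blocked_cumsum(probs, num_units):
    """Cumulative sum computed block by block, one block per unit."""
    if not probs:
        raise ValueError("probs must not be empty")
    size = -(-len(probs) // num_units)  # ceiling division
    blocks = [probs[i:i + size] for i in range(0, len(probs), size)]
    # Step 1: local cumulative sums (parallel in hardware).
    local = [list(accumulate(block)) for block in blocks]
    # Step 2: accumulate each block's largest value into an offset
    # for the blocks that follow it.
    totals = list(accumulate(block[-1] for block in local))
    offsets = [0.0] + totals[:-1]
    # Step 3: add the offsets back to the blocks.
    return [v + off for block, off in zip(local, offsets) for v in block]


def select_index(probs, u, num_units=2):
    """Pick an index for a uniform draw u in [0, 1)."""
    cdf = blocked_cumsum(probs, num_units)
    return min(bisect_right(cdf, u * cdf[-1]), len(probs) - 1)
```

With non-negative probabilities, the last entry of each local running sum is also its largest, which is why the abstract can speak of accumulating "the largest values from the sets".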
  • Patent number: 10990650
    Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
    Type: Grant
    Filed: March 22, 2018
    Date of Patent: April 27, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant
  • Patent number: 10990408
    Abstract: Methods for place-and-route aware data pipelining for an integrated circuit device are provided. In large integrated circuits, the physical distance a data signal must travel between a signal source in a master circuit block partition and a signal destination in a servant circuit block partition can exceed the distance the signal can travel in a single clock cycle. To maintain timing requirements of the integrated circuit, a longest physical distance and signal delay for a datapath between master and servant circuit block partitions can be determined and pipelining registers added. Datapaths of master circuit block partitions further away from the servant circuit block can have more pipelining registers added within the master circuit block than datapaths of master circuit block partitions that are closer to the servant circuit block.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: April 27, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Akshay Balasubramanian, Sundeep Amirineni
  • Patent number: 10983754
    Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: April 20, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni
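The three stages in the abstract for patent 10983754 (subtract the zero value, form an integer sum of products, then rescale) can be sketched for a single dot product. This is an illustration under assumptions not stated in the abstract: the first format is taken to be unsigned 8-bit integers, the second format floating point, and both inputs and weights are quantized. The names `quantize` and `quantized_dot` are hypothetical.

```python
def quantize(values, scale, zero_point):
    """Asymmetric quantization to unsigned 8-bit integers (assumed)."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]


def quantized_dot(q_x, q_w, zp_x, zp_w, scale_x, scale_w):
    """Dot product of two quantized vectors, returned as a float."""
    # Stage 1: subtract the value that represents zero in the
    # quantized format.
    dx = [q - zp_x for q in q_x]
    dw = [q - zp_w for q in q_w]
    # Stage 2: integer sum of products.
    acc = sum(a * b for a, b in zip(dx, dw))
    # Stage 3: convert back to floating point with one scaling factor.
    return acc * (scale_x * scale_w)
```

Subtracting the zero points first means the multiply-accumulate stage works on plain integers, and a single multiplication by the combined scale recovers the floating-point result.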