Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11610128
    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: March 21, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
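As an illustration of the training schedule this abstract describes, here is a minimal NumPy sketch; the two-layer network, shapes, MSE loss, and learning rate are illustrative assumptions, not taken from the patent. The loss gradient is computed first, the forward pass is re-run to regenerate the intermediate outputs, and only then does backward propagation produce the weight gradients the host uses to update the weights.
```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))
y_true = rng.normal(size=(5, 3))

def forward(x, W1, W2):
    h = np.maximum(x @ W1, 0.0)      # intermediate output (hidden activation)
    return h, h @ W2

# Loss gradient operation: compute d(loss)/d(prediction) first.
_, y_pred = forward(x, W1, W2)
data_grad = 2.0 * (y_pred - y_true) / len(x)      # MSE data gradients

# After the loss gradient completes, re-run forward propagation to
# regenerate the intermediate outputs instead of keeping them resident.
h, _ = forward(x, W1, W2)

# Backward propagation combines the data gradients with the regenerated
# intermediate outputs to produce weight gradients.
grad_W2 = h.T @ data_grad
grad_h = (data_grad @ W2.T) * (h > 0)             # ReLU gate
grad_W1 = x.T @ grad_h

# The host receives the weight gradients and updates the weights.
lr = 0.1
W1 -= lr * grad_W1
W2 -= lr * grad_W2
```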
  • Patent number: 11568238
    Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between them. The computer-implemented method further includes assigning a first sub-operation of the two sub-operations to a first computing engine, assigning a second sub-operation of the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
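A minimal sketch of the idea, with threads standing in for the two computing engines and a row-wise matmul split standing in for the sub-operations (both are illustrative assumptions):
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
B = rng.normal(size=(4, 5))

# Two sub-operations with no data dependency: each computes half the rows.
sub_ops = [(A[:3], B), (A[3:], B)]

# Threads stand in for the two computing engines running in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda op: op[0] @ op[1], sub_ops))

C = np.vstack(results)      # the inference consumes the combined result
assert np.allclose(C, A @ B)
```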
  • Patent number: 11567778
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
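A toy sketch of the reordering intuition, assuming a hypothetical two-engine machine (a DMA engine and a PE array) where instructions issue in program order per engine; the op names, durations, and dependencies are invented for illustration. Hoisting an independent load earlier removes a stall and shortens the schedule:
```python
# Each op: (name, engine it runs on, duration, names of ops it depends on).
ops = [
    ("load_x2", "dma", 2, []),           # needed only by the second matmul
    ("load_w",  "dma", 2, []),           # needed by the first matmul
    ("matmul1", "pe",  4, ["load_w"]),
    ("matmul2", "pe",  4, ["load_x2", "matmul1"]),
]

def makespan(order):
    free, done = {"dma": 0, "pe": 0}, {}
    for name, eng, dur, deps in order:   # ops issue in program order per engine
        start = max([free[eng]] + [done[d] for d in deps])
        done[name] = free[eng] = start + dur
    return max(done.values())

# Hoisting load_w ahead of load_x2 removes the stall before matmul1.
reordered = [ops[1], ops[0], ops[2], ops[3]]
print(makespan(ops), "->", makespan(reordered))   # 12 -> 10
```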
  • Publication number: 20230014783
    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 19, 2023
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
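A sketch that follows the abstract's schedule literally, with a single reconfigurable "engine" modeled as a function whose configuration selects the layer weights; the layer shapes, tanh activation, and two-context schedule are illustrative assumptions:
```python
import numpy as np

rng = np.random.default_rng(0)
layer_weights = {1: rng.normal(size=(4, 4)),
                 2: rng.normal(size=(4, 4)),
                 3: rng.normal(size=(4, 2))}

def run(config, data):
    # The engine in a given configuration implements one layer.
    return np.tanh(data @ layer_weights[config])

ctx1 = rng.normal(size=(1, 4))
ctx2 = rng.normal(size=(1, 4))

ctx1_l1 = run(1, ctx1)        # context 1 has already passed layer 1
ctx1_l2 = run(2, ctx1_l1)     # first configuration: layer 2 on context 1
ctx2_l1 = run(1, ctx2)        # switched configuration: layer 1 on context 2
result1 = run(3, ctx1_l2)     # third configuration: layer 3 consumes the
result2 = run(3, ctx2_l1)     # pending outputs of both contexts
```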
  • Patent number: 11550736
    Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
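A minimal sketch of expanding one tensorized descriptor into a series of ordinary memory descriptors; the field names and values are hypothetical:
```python
from dataclasses import dataclass

@dataclass
class TensorizedDescriptor:
    src_base: int       # memory descriptor template fields
    dst_base: int
    length: int
    src_stride: int     # stride attributes used to generate the series
    dst_stride: int
    count: int

def expand(t):
    # One fetched descriptor yields `count` concrete DMA transfers.
    return [(t.src_base + i * t.src_stride,
             t.dst_base + i * t.dst_stride,
             t.length) for i in range(t.count)]

for src, dst, n in expand(TensorizedDescriptor(0x1000, 0x8000, 64, 256, 64, 4)):
    print(f"DMA copy {n} bytes: {src:#x} -> {dst:#x}")
```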
  • Publication number: 20230004523
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length, and the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs with the second, shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
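A sketch of the split-and-recombine idea, using NumPy's float16 as the "shorter bit-length" type and float64 as the wide accumulator (both stand-ins; the patent's actual formats are not specified here). Each wide input is reduced to a high piece plus a residual piece, every combination of pieces is multiplied, and the partial outputs are summed:
```python
import numpy as np

def reduce_input(x):
    hi = np.float16(x)                              # high piece, reduced precision
    lo = np.float16(np.float64(x) - np.float64(hi)) # residual piece
    return hi, lo

a, b = 1.2345678, 7.6543219
a_hi, a_lo = reduce_input(a)
b_hi, b_lo = reduce_input(b)

# Multiply-accumulate over every unique combination of the reduced pieces,
# then sum the partial outputs in a wider accumulator.
partials = [np.float64(p) * np.float64(q)
            for p in (a_hi, a_lo) for q in (b_hi, b_lo)]
recovered = sum(partials)
direct = np.float64(np.float16(a)) * np.float64(np.float16(b))
print(abs(recovered - a * b), "<", abs(direct - a * b))  # much smaller error
```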
  • Publication number: 20230004384
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
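A sketch of reducing a float32 input by dropping trailing mantissa bits with round-to-nearest before it enters the array; the kept-bit count and bit manipulation are illustrative assumptions:
```python
import struct

def reduce_trailing_bits(x, kept_mantissa_bits=10):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 23 - kept_mantissa_bits                  # trailing bits to remove
    bits = (bits + (1 << (drop - 1))) & 0xFFFFFFFF  # round to nearest
    bits = (bits >> drop) << drop                   # clear the trailing bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

x = 3.14159265
print(x, "->", reduce_trailing_bits(x))   # shorter mantissa, similar value
```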
  • Publication number: 20220413980
    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus and validate that all paths are functional.
    Type: Application
    Filed: August 26, 2022
    Publication date: December 29, 2022
    Applicant: Amazon Technologies, Inc.
    Inventors: Noga Smith, Ron Diamant, Saar Gross
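A sketch of the monitoring flow, with a thread standing in for the dummy DMA read and an event standing in for its completion signal (all names hypothetical):
```python
import threading

def dummy_dma_read(done_event):
    # Placeholder for the real DMA read from internal DRAM over the bus.
    done_event.set()            # a hung bus would never reach this line

done = threading.Event()
threading.Thread(target=dummy_dma_read, args=(done,), daemon=True).start()

if done.wait(timeout=0.1):      # ~100 ms monitoring window
    print("bus paths validated, initialization continues")
else:
    print("read never completed: requesting hard reset")
```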
  • Patent number: 11520731
    Abstract: Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: December 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Thomas A. Volpe
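A minimal sketch of the arbitration: several sources recommend an input rate, and the strongest (most restrictive) recommendation wins. The source names and rate encoding are illustrative assumptions:
```python
recommendations = {
    "thermal_monitor": 0.50,    # fraction of full systolic-array input rate
    "power_monitor": 0.75,
    "external_host": 1.00,
}

strongest = min(recommendations, key=recommendations.get)
rate = recommendations[strongest]
print(f"throttle to {rate:.0%} of full rate (source: {strongest})")
```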
  • Patent number: 11507378
    Abstract: In one example, an integrated circuit comprises: a memory configured to store a first mapping between a first opcode and first control information and a second mapping between the first opcode and second control information; a processing engine configured to perform processing operations based on the control information; and a controller configured to: at a first time, provide the first opcode to the memory to, based on the first mapping stored in the memory, fetch the first control information for the processing engine, to enable the processing engine to perform a first processing operation based on the first control information; and at a second time, provide the first opcode to the memory to, based on the second mapping stored in the memory, fetch the second control information for the processing engine, to enable the processing engine to perform a second processing operation based on the second control information.
    Type: Grant
    Filed: March 1, 2021
    Date of Patent: November 22, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Sagar Sonar, Kenneth Wayne Patton
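A sketch of the remappable opcode table: the same opcode fetches different control information depending on which mapping is currently loaded. The opcode values and control strings are hypothetical:
```python
control_memory = {0x1: "matmul, 8-bit operands"}     # first mapping

def execute(opcode):
    # The fetched control information drives the processing engine.
    print("engine performs:", control_memory[opcode])

execute(0x1)                                         # first processing op
control_memory[0x1] = "matmul, 16-bit operands"      # load second mapping
execute(0x1)                                         # same opcode, new op
```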
  • Patent number: 11500802
    Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant, Patricio Kaplan, Henry Wang
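A sketch of the write pattern: per partition, usable data lands at the partition's offset while filler bytes are written to a null device address and discarded. Partition sizes, offsets, and payload layout are invented for illustration:
```python
NULL_DEVICE = None
partitions = [bytearray(8) for _ in range(3)]       # target-memory partitions
payload = b"\xAA\xBB" + b"\x00" * 2                 # usable bytes + filler

def write(dest, offset, data):
    if dest is NULL_DEVICE:
        return                  # filler writes land on the null device
    dest[offset:offset + len(data)] = data

# One multicast request expands into a pair of writes per partition.
for p in partitions:
    write(p, 4, payload[:2])                # usable data at the offset
    write(NULL_DEVICE, 0, payload[2:])      # filler data, discarded

print(partitions[0])    # bytearray(b'\x00\x00\x00\x00\xaa\xbb\x00\x00')
```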
  • Patent number: 11500962
    Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
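A sketch of one plausible conversion: decompose a weight matrix into a set of matrices that each satisfy a fine-grained constraint (here, at most 2 non-zeros per group of 4 along each row, an assumption; the patent's exact constraint is not specified here) so that the parts sum back to the original:
```python
import numpy as np

def convert(W, group=4, max_nonzero=2):
    parts, remaining = [], W.copy()
    while np.any(remaining):
        part = np.zeros_like(W)
        for r in range(W.shape[0]):
            for g in range(0, W.shape[1], group):
                idx = np.nonzero(remaining[r, g:g + group])[0][:max_nonzero]
                part[r, g + idx] = remaining[r, g + idx]
                remaining[r, g + idx] = 0
        parts.append(part)      # one constrained fine-grained sparse matrix
    return parts

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8)) * (rng.random((2, 8)) < 0.6)  # sparse-ish weights
parts = convert(W)
assert np.allclose(sum(parts), W)
print(f"{len(parts)} constrained sparse matrices")
```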
  • Patent number: 11501145
    Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
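A 1-D sketch of the access pattern those instruction fields encode: for each weight element, gather a fixed number of input elements starting at an address derived from the weight's coordinates, skipping elements according to the stride, and accumulate the partial products. Shapes and values are illustrative:
```python
import numpy as np

x = np.arange(16.0)          # flattened input
w = np.array([0.5, -1.0, 2.0])
stride, out_len = 2, 7       # (16 - 3) // 2 + 1 = 7 outputs

out = np.zeros(out_len)
for i, w_elem in enumerate(w):
    first_addr = i           # based on the weight element's coordinates
    gathered = x[first_addr : first_addr + stride * out_len : stride]
    out += w_elem * gathered # partial products accumulate per weight

assert np.allclose(out, [np.dot(w, x[j*stride : j*stride+3])
                         for j in range(out_len)])
```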
  • Patent number: 11494326
    Abstract: To perform complex arithmetic operations in neural networks without compromising the performance of the neural network accelerator, a programmable computation unit is integrated with a direct memory access (DMA) engine that is used to exchange neural network parameters between the neural network accelerator and system memory. The DMA engine may include a calculation circuit operable to perform a multiply-and-add calculation on a set of operands, and an operand selector circuit operable to select a source for each operand of the calculation circuit. The DMA engine may also include a control circuit operable to retrieve a meta-descriptor for performing a computation, configure the operand selector circuit based on the meta-descriptor, and use the calculation circuit to perform the computation based on the meta-descriptor to generate a computation result.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant
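A sketch of the meta-descriptor-driven multiply-and-add: the descriptor selects where each operand comes from, and the calculation circuit computes dst = a * b + c. The memory layout and field names are hypothetical:
```python
from dataclasses import dataclass

memory = {"lr": 0.01, "grad": 2.5, "weight": 1.0}

@dataclass
class MetaDescriptor:
    a_src: str      # operand sources chosen by the operand selector circuit
    b_src: str
    c_src: str
    dst: str

def execute(md):
    # Calculation circuit: multiply-and-add on the selected operands.
    memory[md.dst] = memory[md.a_src] * memory[md.b_src] + memory[md.c_src]

execute(MetaDescriptor(a_src="lr", b_src="grad", c_src="weight", dst="weight"))
print(memory["weight"])   # 1.0 + 0.01 * 2.5, computed in-flight by the DMA
```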
  • Patent number: 11483296
    Abstract: A hardware security accelerator includes a configurable parser that is configured to receive a packet and to extract from the packet headers associated with a set of protocols. The security accelerator also includes a packet type detection unit to determine a type of the packet in response to the set of protocols and to generate a packet type identifier indicative of the type of the packet. A configurable security unit includes a configuration unit and a configurable security engine. The configuration unit configures the configurable security engine according to the type of the packet and to content of at least one of the headers extracted from the packet. The configurable security engine performs security processing of the packet to provide at least one security result.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: October 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Nafea Bshara, Leah Shalev, Erez Izenberg
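A sketch of the control flow only (parse headers, derive a packet-type identifier, configure the security engine for that type); the byte offsets, type names, and engine configurations are invented stand-ins:
```python
def parse(packet):
    # Stand-in for the configurable parser extracting protocol headers.
    return {"eth": packet[0:2], "ip": packet[2:4], "tcp": packet[4:6]}

def detect_type(headers):
    # Packet type detection based on the set of protocols present.
    return "ipv4-tcp" if "tcp" in headers else "other"

ENGINE_CONFIGS = {"ipv4-tcp": "TLS record processing", "other": "bypass"}

packet = bytes(range(12))
ptype = detect_type(parse(packet))
print(f"type={ptype}: engine configured for {ENGINE_CONFIGS[ptype]}")
```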
  • Patent number: 11475306
    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
    Type: Grant
    Filed: March 22, 2018
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
  • Patent number: 11467973
    Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni
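A NumPy sketch of the access pattern: skip a configurable number of elements in each dimension of a matrix, gathering non-contiguous locations in one pattern. The matrix and skip counts are illustrative:
```python
import numpy as np

M = np.arange(36.0).reshape(6, 6)        # a matrix in memory
skip_rows, skip_cols = 1, 2              # elements skipped per dimension

# Non-contiguous access: step over `skip` elements in each dimension.
gathered = M[::skip_rows + 1, ::skip_cols + 1]
print(gathered)      # rows 0, 2, 4 and columns 0, 3 of the matrix
```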
  • Patent number: 11467992
    Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
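A sketch of the combine-and-forward step, with plain dicts standing in for the on-chip memory, the off-chip memory, and the second device's memory reached over the interconnect (all illustrative; elementwise addition stands in for the combine):
```python
import numpy as np

local_onchip = {"grad": np.array([1.0, 2.0, 3.0])}       # computation engine output
local_offchip = {"grad": np.array([0.5, 0.5, 0.5])}      # remote data, staged off-chip
next_device_memory = {}                                  # second device's local memory

combined = local_onchip["grad"] + local_offchip["grad"]  # combine local + remote
next_device_memory["grad"] = combined                    # store via the interconnect
print(next_device_memory)
```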
  • Patent number: 11468325
    Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
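A sketch of the overlap: while the first model's gradients synchronize (on a thread standing in for the second worker), the first worker already computes the second model's gradients. The gradient computation and averaging are stand-ins:
```python
import threading
import numpy as np

def compute_gradients(seed):
    rng = np.random.default_rng(seed)
    return rng.normal(size=4)            # stand-in for backprop on a model

def synchronize(grads, out, key):
    out[key] = grads / 2.0               # stand-in for cross-worker averaging

synced = {}
g1 = compute_gradients(1)                # first model's gradients
t = threading.Thread(target=synchronize, args=(g1, synced, "m1"))
t.start()                                # sync proceeds on the other worker
g2 = compute_gradients(2)                # overlapped: second model's gradients
t.join()
synchronize(g2, synced, "m2")
print(synced)
```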
  • Patent number: 11468304
    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
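A minimal sketch of the set/wait primitives, with threading.Event standing in for the event register and threads standing in for the execution engines (all hypothetical stand-ins): the wait primitive gates one engine's start time until another engine performs the set.
```python
import threading

event = threading.Event()                # stand-in for the event register

def engine_a():
    print("engine A: producing data")
    event.set()                          # "set" primitive

def engine_b():
    event.wait()                         # "wait" primitive suspends the op
    print("engine B: operation starts after the event is set")

tb = threading.Thread(target=engine_b)
tb.start()
threading.Thread(target=engine_a).start()
tb.join()
```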