Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11610128
    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: March 21, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
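As an illustration of the training schedule this abstract describes, here is a minimal NumPy sketch; the two-layer network, shapes, MSE loss, and learning rate are illustrative assumptions, not taken from the patent. The loss gradient is computed first, the forward pass is re-run to regenerate the intermediate outputs, and only then does backward propagation produce the weight gradients the host uses to update the weights.
```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))
y_true = rng.normal(size=(5, 3))

def forward(x, W1, W2):
    h = np.maximum(x @ W1, 0.0)      # intermediate output (hidden activation)
    return h, h @ W2

# Loss gradient operation: compute d(loss)/d(prediction) first.
_, y_pred = forward(x, W1, W2)
data_grad = 2.0 * (y_pred - y_true) / len(x)      # MSE data gradients

# After the loss gradient completes, re-run forward propagation to
# regenerate the intermediate outputs instead of keeping them resident.
h, _ = forward(x, W1, W2)

# Backward propagation combines the data gradients with the regenerated
# intermediate outputs to produce weight gradients.
grad_W2 = h.T @ data_grad
grad_h = (data_grad @ W2.T) * (h > 0)             # ReLU gate
grad_W1 = x.T @ grad_h

# The host receives the weight gradients and updates the weights.
lr = 0.1
W1 -= lr * grad_W1
W2 -= lr * grad_W2
```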
  • Patent number: 11568238
    Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between them. The computer-implemented method further includes assigning a first sub-operation of the two sub-operations to a first computing engine, assigning a second sub-operation of the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
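A minimal sketch of the idea, with threads standing in for the two computing engines and a row-wise matmul split standing in for the sub-operations (both are illustrative assumptions):
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
B = rng.normal(size=(4, 5))

# Two sub-operations with no data dependency: each computes half the rows.
sub_ops = [(A[:3], B), (A[3:], B)]

# Threads stand in for the two computing engines running in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda op: op[0] @ op[1], sub_ops))

C = np.vstack(results)      # the inference consumes the combined result
assert np.allclose(C, A @ B)
```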
  • Patent number: 11567778
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
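A toy sketch of the reordering intuition, assuming a hypothetical two-engine machine (a DMA engine and a PE array) where instructions issue in program order per engine; the op names, durations, and dependencies are invented for illustration. Hoisting an independent load earlier removes a stall and shortens the schedule:
```python
# Each op: (name, engine it runs on, duration, names of ops it depends on).
ops = [
    ("load_x2", "dma", 2, []),           # needed only by the second matmul
    ("load_w",  "dma", 2, []),           # needed by the first matmul
    ("matmul1", "pe",  4, ["load_w"]),
    ("matmul2", "pe",  4, ["load_x2", "matmul1"]),
]

def makespan(order):
    free, done = {"dma": 0, "pe": 0}, {}
    for name, eng, dur, deps in order:   # ops issue in program order per engine
        start = max([free[eng]] + [done[d] for d in deps])
        done[name] = free[eng] = start + dur
    return max(done.values())

# Hoisting load_w ahead of load_x2 removes the stall before matmul1.
reordered = [ops[1], ops[0], ops[2], ops[3]]
print(makespan(ops), "->", makespan(reordered))   # 12 -> 10
```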
  • Publication number: 20230014783
    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 19, 2023
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
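A sketch that follows the abstract's schedule literally, with a single reconfigurable "engine" modeled as a function whose configuration selects the layer weights; the layer shapes, tanh activation, and two-context schedule are illustrative assumptions:
```python
import numpy as np

rng = np.random.default_rng(0)
layer_weights = {1: rng.normal(size=(4, 4)),
                 2: rng.normal(size=(4, 4)),
                 3: rng.normal(size=(4, 2))}

def run(config, data):
    # The engine in a given configuration implements one layer.
    return np.tanh(data @ layer_weights[config])

ctx1 = rng.normal(size=(1, 4))
ctx2 = rng.normal(size=(1, 4))

ctx1_l1 = run(1, ctx1)        # context 1 has already passed layer 1
ctx1_l2 = run(2, ctx1_l1)     # first configuration: layer 2 on context 1
ctx2_l1 = run(1, ctx2)        # switched configuration: layer 1 on context 2
result1 = run(3, ctx1_l2)     # third configuration: layer 3 consumes the
result2 = run(3, ctx2_l1)     # pending outputs of both contexts
```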
  • Patent number: 11550736
    Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
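A minimal sketch of expanding one tensorized descriptor into a series of ordinary memory descriptors; the field names and values are hypothetical:
```python
from dataclasses import dataclass

@dataclass
class TensorizedDescriptor:
    src_base: int       # memory descriptor template fields
    dst_base: int
    length: int
    src_stride: int     # stride attributes used to generate the series
    dst_stride: int
    count: int

def expand(t):
    # One fetched descriptor yields `count` concrete DMA transfers.
    return [(t.src_base + i * t.src_stride,
             t.dst_base + i * t.dst_stride,
             t.length) for i in range(t.count)]

for src, dst, n in expand(TensorizedDescriptor(0x1000, 0x8000, 64, 256, 64, 4)):
    print(f"DMA copy {n} bytes: {src:#x} -> {dst:#x}")
```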
  • Publication number: 20230004523
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length, and the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs with the second, shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
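A sketch of the split-and-recombine idea, using NumPy's float16 as the "shorter bit-length" type and float64 as the wide accumulator (both stand-ins; the patent's actual formats are not specified here). Each wide input is reduced to a high piece plus a residual piece, every combination of pieces is multiplied, and the partial outputs are summed:
```python
import numpy as np

def reduce_input(x):
    hi = np.float16(x)                              # high piece, reduced precision
    lo = np.float16(np.float64(x) - np.float64(hi)) # residual piece
    return hi, lo

a, b = 1.2345678, 7.6543219
a_hi, a_lo = reduce_input(a)
b_hi, b_lo = reduce_input(b)

# Multiply-accumulate over every unique combination of the reduced pieces,
# then sum the partial outputs in a wider accumulator.
partials = [np.float64(p) * np.float64(q)
            for p in (a_hi, a_lo) for q in (b_hi, b_lo)]
recovered = sum(partials)
direct = np.float64(np.float16(a)) * np.float64(np.float16(b))
print(abs(recovered - a * b), "<", abs(direct - a * b))  # much smaller error
```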
  • Publication number: 20230004384
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
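A sketch of reducing a float32 input by dropping trailing mantissa bits with round-to-nearest before it enters the array; the kept-bit count and bit manipulation are illustrative assumptions:
```python
import struct

def reduce_trailing_bits(x, kept_mantissa_bits=10):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 23 - kept_mantissa_bits                  # trailing bits to remove
    bits = (bits + (1 << (drop - 1))) & 0xFFFFFFFF  # round to nearest
    bits = (bits >> drop) << drop                   # clear the trailing bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

x = 3.14159265
print(x, "->", reduce_trailing_bits(x))   # shorter mantissa, similar value
```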
  • Publication number: 20220413980
    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus and validate that all paths are functional.
    Type: Application
    Filed: August 26, 2022
    Publication date: December 29, 2022
    Applicant: Amazon Technologies, Inc.
    Inventors: Noga Smith, Ron Diamant, Saar Gross
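A sketch of the monitoring flow, with a thread standing in for the dummy DMA read and an event standing in for its completion signal (all names hypothetical):
```python
import threading

def dummy_dma_read(done_event):
    # Placeholder for the real DMA read from internal DRAM over the bus.
    done_event.set()            # a hung bus would never reach this line

done = threading.Event()
threading.Thread(target=dummy_dma_read, args=(done,), daemon=True).start()

if done.wait(timeout=0.1):      # ~100 ms monitoring window
    print("bus paths validated, initialization continues")
else:
    print("read never completed: requesting hard reset")
```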
  • Patent number: 11520731
    Abstract: Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: December 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Thomas A. Volpe
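A minimal sketch of the arbitration: several sources recommend an input rate, and the strongest (most restrictive) recommendation wins. The source names and rate encoding are illustrative assumptions:
```python
recommendations = {
    "thermal_monitor": 0.50,    # fraction of full systolic-array input rate
    "power_monitor": 0.75,
    "external_host": 1.00,
}

strongest = min(recommendations, key=recommendations.get)
rate = recommendations[strongest]
print(f"throttle to {rate:.0%} of full rate (source: {strongest})")
```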
  • Patent number: 11507378
    Abstract: In one example, an integrated circuit comprises: a memory configured to store a first mapping between a first opcode and first control information and a second mapping between the first opcode and second control information; a processing engine configured to perform processing operations based on the control information; and a controller configured to: at a first time, provide the first opcode to the memory to, based on the first mapping stored in the memory, fetch the first control information for the processing engine, to enable the processing engine to perform a first processing operation based on the first control information; and at a second time, provide the first opcode to the memory to, based on the second mapping stored in the memory, fetch the second control information for the processing engine, to enable the processing engine to perform a second processing operation based on the second control information.
    Type: Grant
    Filed: March 1, 2021
    Date of Patent: November 22, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Sagar Sonar, Kenneth Wayne Patton
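A sketch of the remappable opcode table: the same opcode fetches different control information depending on which mapping is currently loaded. The opcode values and control strings are hypothetical:
```python
control_memory = {0x1: "matmul, 8-bit operands"}     # first mapping

def execute(opcode):
    # The fetched control information drives the processing engine.
    print("engine performs:", control_memory[opcode])

execute(0x1)                                         # first processing op
control_memory[0x1] = "matmul, 16-bit operands"      # load second mapping
execute(0x1)                                         # same opcode, new op
```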
  • Patent number: 11500802
    Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant, Patricio Kaplan, Henry Wang
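A sketch of the write pattern: per partition, usable data lands at the partition's offset while filler bytes are written to a null device address and discarded. Partition sizes, offsets, and payload layout are invented for illustration:
```python
NULL_DEVICE = None
partitions = [bytearray(8) for _ in range(3)]       # target-memory partitions
payload = b"\xAA\xBB" + b"\x00" * 2                 # usable bytes + filler

def write(dest, offset, data):
    if dest is NULL_DEVICE:
        return                  # filler writes land on the null device
    dest[offset:offset + len(data)] = data

# One multicast request expands into a pair of writes per partition.
for p in partitions:
    write(p, 4, payload[:2])                # usable data at the offset
    write(NULL_DEVICE, 0, payload[2:])      # filler data, discarded

print(partitions[0])    # bytearray(b'\x00\x00\x00\x00\xaa\xbb\x00\x00')
```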
  • Patent number: 11500962
    Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
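A sketch of one plausible conversion: decompose a weight matrix into a set of matrices that each satisfy a fine-grained constraint (here, at most 2 non-zeros per group of 4 along each row, an assumption; the patent's exact constraint is not specified here) so that the parts sum back to the original:
```python
import numpy as np

def convert(W, group=4, max_nonzero=2):
    parts, remaining = [], W.copy()
    while np.any(remaining):
        part = np.zeros_like(W)
        for r in range(W.shape[0]):
            for g in range(0, W.shape[1], group):
                idx = np.nonzero(remaining[r, g:g + group])[0][:max_nonzero]
                part[r, g + idx] = remaining[r, g + idx]
                remaining[r, g + idx] = 0
        parts.append(part)      # one constrained fine-grained sparse matrix
    return parts

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8)) * (rng.random((2, 8)) < 0.6)  # sparse-ish weights
parts = convert(W)
assert np.allclose(sum(parts), W)
print(f"{len(parts)} constrained sparse matrices")
```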
  • Patent number: 11501145
    Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
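A 1-D sketch of the access pattern those instruction fields encode: for each weight element, gather a fixed number of input elements starting at an address derived from the weight's coordinates, skipping elements according to the stride, and accumulate the partial products. Shapes and values are illustrative:
```python
import numpy as np

x = np.arange(16.0)          # flattened input
w = np.array([0.5, -1.0, 2.0])
stride, out_len = 2, 7       # (16 - 3) // 2 + 1 = 7 outputs

out = np.zeros(out_len)
for i, w_elem in enumerate(w):
    first_addr = i           # based on the weight element's coordinates
    gathered = x[first_addr : first_addr + stride * out_len : stride]
    out += w_elem * gathered # partial products accumulate per weight

assert np.allclose(out, [np.dot(w, x[j*stride : j*stride+3])
                         for j in range(out_len)])
```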
  • Patent number: 11494326
    Abstract: To perform complex arithmetic operations in neural networks without compromising the performance of the neural network accelerator, a programmable computation unit is integrated with a direct memory access (DMA) engine that is used to exchange neural network parameters between the neural network accelerator and system memory. The DMA engine may include a calculation circuit operable to perform a multiply-and-add calculation on a set of operands, and an operand selector circuit operable to select a source for each operand of the calculation circuit. The DMA engine may also include a control circuit operable to retrieve a meta-descriptor for performing a computation, configure the operand selector circuit based on the meta-descriptor, and use the calculation circuit to perform the computation based on the meta-descriptor to generate a computation result.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant
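A sketch of the meta-descriptor-driven multiply-and-add: the descriptor selects where each operand comes from, and the calculation circuit computes dst = a * b + c. The memory layout and field names are hypothetical:
```python
from dataclasses import dataclass

memory = {"lr": 0.01, "grad": 2.5, "weight": 1.0}

@dataclass
class MetaDescriptor:
    a_src: str      # operand sources chosen by the operand selector circuit
    b_src: str
    c_src: str
    dst: str

def execute(md):
    # Calculation circuit: multiply-and-add on the selected operands.
    memory[md.dst] = memory[md.a_src] * memory[md.b_src] + memory[md.c_src]

execute(MetaDescriptor(a_src="lr", b_src="grad", c_src="weight", dst="weight"))
print(memory["weight"])   # 1.0 + 0.01 * 2.5, computed in-flight by the DMA
```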
  • Patent number: 11483296
    Abstract: A hardware security accelerator includes a configurable parser that is configured to receive a packet and to extract from the packet headers associated with a set of protocols. The security accelerator also includes a packet type detection unit to determine a type of the packet in response to the set of protocols and to generate a packet type identifier indicative of the type of the packet. A configurable security unit includes a configuration unit and a configurable security engine. The configuration unit configures the configurable security engine according to the type of the packet and to content of at least one of the headers extracted from the packet. The configurable security engine performs security processing of the packet to provide at least one security result.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: October 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Nafea Bshara, Leah Shalev, Erez Izenberg
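A sketch of the control flow only (parse headers, derive a packet-type identifier, configure the security engine for that type); the byte offsets, type names, and engine configurations are invented stand-ins:
```python
def parse(packet):
    # Stand-in for the configurable parser extracting protocol headers.
    return {"eth": packet[0:2], "ip": packet[2:4], "tcp": packet[4:6]}

def detect_type(headers):
    # Packet type detection based on the set of protocols present.
    return "ipv4-tcp" if "tcp" in headers else "other"

ENGINE_CONFIGS = {"ipv4-tcp": "TLS record processing", "other": "bypass"}

packet = bytes(range(12))
ptype = detect_type(parse(packet))
print(f"type={ptype}: engine configured for {ENGINE_CONFIGS[ptype]}")
```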
  • Patent number: 11475306
    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
    Type: Grant
    Filed: March 22, 2018
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
  • Patent number: 11467973
    Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni
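A NumPy sketch of the access pattern: skip a configurable number of elements in each dimension of a matrix, gathering non-contiguous locations in one pattern. The matrix and skip counts are illustrative:
```python
import numpy as np

M = np.arange(36.0).reshape(6, 6)        # a matrix in memory
skip_rows, skip_cols = 1, 2              # elements skipped per dimension

# Non-contiguous access: step over `skip` elements in each dimension.
gathered = M[::skip_rows + 1, ::skip_cols + 1]
print(gathered)      # rows 0, 2, 4 and columns 0, 3 of the matrix
```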
  • Patent number: 11467992
    Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
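A sketch of the combine-and-forward step, with plain dicts standing in for the on-chip memory, the off-chip memory, and the second device's memory reached over the interconnect (all illustrative; elementwise addition stands in for the combine):
```python
import numpy as np

local_onchip = {"grad": np.array([1.0, 2.0, 3.0])}       # computation engine output
local_offchip = {"grad": np.array([0.5, 0.5, 0.5])}      # remote data, staged off-chip
next_device_memory = {}                                  # second device's local memory

combined = local_onchip["grad"] + local_offchip["grad"]  # combine local + remote
next_device_memory["grad"] = combined                    # store via the interconnect
print(next_device_memory)
```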
  • Patent number: 11468325
    Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
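A sketch of the overlap: while the first model's gradients synchronize (on a thread standing in for the second worker), the first worker already computes the second model's gradients. The gradient computation and averaging are stand-ins:
```python
import threading
import numpy as np

def compute_gradients(seed):
    rng = np.random.default_rng(seed)
    return rng.normal(size=4)            # stand-in for backprop on a model

def synchronize(grads, out, key):
    out[key] = grads / 2.0               # stand-in for cross-worker averaging

synced = {}
g1 = compute_gradients(1)                # first model's gradients
t = threading.Thread(target=synchronize, args=(g1, synced, "m1"))
t.start()                                # sync proceeds on the other worker
g2 = compute_gradients(2)                # overlapped: second model's gradients
t.join()
synchronize(g2, synced, "m2")
print(synced)
```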
  • Patent number: 11468304
    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
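A minimal sketch of the set/wait primitives, with threading.Event standing in for the event register and threads standing in for the execution engines (all hypothetical stand-ins): the wait primitive gates one engine's start time until another engine performs the set.
```python
import threading

event = threading.Event()                # stand-in for the event register

def engine_a():
    print("engine A: producing data")
    event.set()                          # "set" primitive

def engine_b():
    event.wait()                         # "wait" primitive suspends the op
    print("engine B: operation starts after the event is set")

tb = threading.Thread(target=engine_b)
tb.start()
threading.Thread(target=engine_a).start()
tb.join()
```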