Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11983128
    Abstract: Techniques to reduce overhead in a direct memory access (DMA) engine can include processing descriptors from a descriptor queue to obtain a striding configuration, from which tensorized memory descriptors are generated. The striding configuration can include, for each striding dimension, a stride and a repetition number indicating the number of times to repeat striding in the corresponding striding dimension. One or more sets of tensorized memory descriptors can be generated based on the striding configuration. Data transfers are then performed based on the generated tensorized memory descriptors.
    Type: Grant
    Filed: December 16, 2022
    Date of Patent: May 14, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
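
    A minimal Python sketch of the descriptor expansion described above, assuming the striding configuration is a list of (stride, repetitions) pairs; the function name and byte layout are illustrative, not taken from the patent:

    ```python
    # Hypothetical expansion of one queue descriptor into tensorized memory
    # descriptors: each combination of per-dimension repeat counts yields
    # one (address, length) transfer.
    from itertools import product

    def expand_descriptor(base_addr, length, striding):
        """striding: [(stride_bytes, repetitions), ...], outermost dimension first."""
        descriptors = []
        for counts in product(*(range(reps) for _, reps in striding)):
            offset = sum(c * stride for c, (stride, _) in zip(counts, striding))
            descriptors.append((base_addr + offset, length))
        return descriptors

    # 2-D example: 4 rows spaced 1024 bytes apart, each split into 3 chunks
    # spaced 256 bytes apart -> 12 tensorized descriptors from one entry.
    print(expand_descriptor(0x1000, 64, [(1024, 4), (256, 3)]))
    ```
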
  • Patent number: 11960566
    Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values, which results in fewer computations than matrix multiplications that include the zero padding data.
    Type: Grant
    Filed: April 13, 2021
    Date of Patent: April 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant
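
    A hedged sketch of the padding-skip idea, assuming a single-channel 2-D convolution; clipping each kernel window to the unpadded region plays the role of the adjusted coordinates and dimensions (names are illustrative):

    ```python
    import numpy as np

    def conv2d_skip_padding(x, w, pad):
        # Clip every kernel window to the real (unpadded) data, so no
        # multiplication against zero padding is ever issued.
        h, wd = x.shape
        kh, kw = w.shape
        out = np.zeros((h + 2 * pad - kh + 1, wd + 2 * pad - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                r0, c0 = max(0, pad - i), max(0, pad - j)           # adjusted coords
                r1, c1 = min(kh, h + pad - i), min(kw, wd + pad - j)
                out[i, j] = np.sum(x[i - pad + r0:i - pad + r1,
                                     j - pad + c0:j - pad + c1] * w[r0:r1, c0:c1])
        return out

    # Matches an explicit zero-padded convolution, with fewer multiplications.
    x, w = np.arange(16.0).reshape(4, 4), np.ones((3, 3))
    xp = np.pad(x, 1)
    ref = np.array([[np.sum(xp[i:i+3, j:j+3] * w) for j in range(4)] for i in range(4)])
    assert np.allclose(conv2d_skip_padding(x, w, 1), ref)
    ```
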
  • Patent number: 11960997
    Abstract: Disclosed herein are techniques for classifying data with a data processing circuit. In one embodiment, the data processing circuit includes a probabilistic circuit configurable to generate a decision at a pre-determined probability, and an output generation circuit including an output node and configured to receive input data and a weight, and generate output data at the output node for approximating a product of the input data and the weight. The generation of the output data includes propagating the weight to the output node according to a first decision of the probabilistic circuit. The probabilistic circuit is configured to generate the first decision at a probability determined based on the input data.
    Type: Grant
    Filed: January 7, 2022
    Date of Patent: April 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant
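
    A rough model of the probabilistic scheme, assuming the input is normalized to [0, 1] and treated as a propagation probability; in expectation the output then approximates input × weight without a multiplier (all names are illustrative):

    ```python
    import random

    def stochastic_product(x, w, trials=100_000):
        # Propagate the weight to the output with probability x; averaging
        # many such decisions approximates x * w without multiplying.
        assert 0.0 <= x <= 1.0
        total = sum(w if random.random() < x else 0.0 for _ in range(trials))
        return total / trials

    print(stochastic_product(0.25, 0.8))   # ~0.2 = 0.25 * 0.8
    ```
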
  • Publication number: 20240111528
    Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
    Type: Application
    Filed: September 21, 2022
    Publication date: April 4, 2024
    Inventors: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
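
    A software analogue of the fused instruction, assuming the compute half is a reduction over the column elements of the transposed submatrix; the signature is a guess for illustration:

    ```python
    import numpy as np

    def transpose_and_compute(tensor, row0, col0, size, op=np.max):
        # One fused step: transpose a submatrix, then compute over what were
        # its column elements (now laid out along rows).
        sub = tensor[row0:row0 + size, col0:col0 + size]
        return op(sub.T, axis=1)

    x = np.arange(36.0).reshape(6, 6)
    print(transpose_and_compute(x, 0, 0, 4))   # per-column max of the 4x4 tile
    ```
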
  • Publication number: 20240103813
    Abstract: An integrated circuit that combines transpose and compute operations may include a transpose circuit coupled to a set of compute channels. Each compute channel may include multiple arithmetic logic unit (ALU) circuits coupled in series. The transpose circuit is operable to receive an input tensor, transpose the input tensor, and output a transposed tensor to the set of compute channels. The set of compute channels is operable to generate outputs in parallel, with each of the outputs being generated from a corresponding vector of the transposed tensor.
    Type: Application
    Filed: September 21, 2022
    Publication date: March 28, 2024
    Inventors: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
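
    A toy model of this datapath, assuming each compute channel is a short chain of ALU stages applied in series; the hardware runs the channels in parallel, modeled here as a loop:

    ```python
    import numpy as np

    def alu_chain(vec, stages):
        for stage in stages:               # ALUs coupled in series in one channel
            vec = stage(vec)
        return vec

    def transpose_then_compute(tensor, stages):
        transposed = tensor.T              # output of the transpose circuit
        # One channel per vector of the transposed tensor; outputs in parallel.
        return np.array([alu_chain(row, stages) for row in transposed])

    stages = [lambda v: v * 2.0, lambda v: np.maximum(v, 0.0), np.sum]
    x = np.arange(12.0).reshape(3, 4) - 6.0
    print(transpose_then_compute(x, stages))   # one result per column of x
    ```
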
  • Patent number: 11941528
    Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with a second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients while the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 26, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
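
    A hedged sketch of the overlap, using a thread to stand in for the hardware interface so the first portion of layer-2 gradients is in flight while layer-1 backpropagation runs; exchange() and the gradient math are placeholders:

    ```python
    import threading
    import time

    def exchange(name, grads):
        time.sleep(0.1)                         # simulated transfer to the peer
        print(f"exchanged {name}: {grads}")

    g2 = [0.1, 0.2, 0.3, 0.4]                   # second-layer weight gradients
    first_portion, rest = g2[:2], g2[2:]        # split into portions

    t = threading.Thread(target=exchange, args=("layer2 portion 1", first_portion))
    t.start()                                   # exchange is underway...
    g1 = [g * 0.5 for g in g2]                  # ...while layer-1 backprop runs
    t.join()
    exchange("layer1", g1)                      # layer-1 gradients next
    exchange("layer2 remainder", rest)          # then the remaining portions
    ```
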
  • Patent number: 11880289
    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
    Type: Grant
    Filed: August 26, 2022
    Date of Patent: January 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Noga Smith, Ron Diamant, Saar Gross
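
    The monitoring step reduces to a read with a deadline, sketched below; dma_read() is a hypothetical stand-in for the DMA-controller read of internal DRAM, and the 100 ms deadline comes from the abstract's example:

    ```python
    import concurrent.futures
    import time

    def dma_read():
        time.sleep(0.01)                 # a healthy bus completes quickly
        return b"\x00" * 64

    def check_internal_bus(timeout_s=0.1):   # ~100 ms from the abstract
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            try:
                pool.submit(dma_read).result(timeout=timeout_s)
                return "bus ok: no further action"
            except concurrent.futures.TimeoutError:
                return "read never completed: requesting hard reset"

    print(check_internal_bus())
    ```
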
  • Patent number: 11880682
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: January 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
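
    A minimal sketch of one possible reducer, assuming FP32 inputs are cut down to a bfloat16-like format by rounding and clearing the 16 trailing bits (the exact rounding mode in the filing may differ):

    ```python
    import struct

    def reduce_fp32(x):
        # Round, then clear the 16 trailing bits, so the systolic array
        # only ever sees the shorter-mantissa value.
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        bits = (bits + 0x8000) & 0xFFFF_0000
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    print(reduce_fp32(3.14159265))   # ~3.140625
    ```
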
  • Publication number: 20240020514
    Abstract: Systems and methods for performing improper input data detection are described. In one example, a system comprises: hardware circuits configured to receive input data and to perform computations of a neural network based on the input data to generate computation outputs; and an improper input detection circuit configured to: determine a relationship between the computation outputs of the hardware circuits and reference outputs; determine that the input data are improper based on the relationship; and perform an action based on determining that the input data are improper.
    Type: Application
    Filed: May 5, 2023
    Publication date: January 18, 2024
    Inventors: Randy Renfu Huang, Richard John Heaton, Andrea Olgiati, Ron Diamant
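
    As a toy version of the detector, one can compare the accelerator's outputs against stored reference outputs and flag weak agreement; the correlation measure and 0.9 threshold are illustrative assumptions, not values from the filing:

    ```python
    import numpy as np

    def detect_improper(computed, reference, threshold=0.9):
        relationship = np.corrcoef(computed, reference)[0, 1]
        if relationship < threshold:
            return "improper input: performing action (alert)"
        return "input looks proper"

    ref = np.array([0.1, 0.7, 0.2])
    print(detect_improper(np.array([0.12, 0.68, 0.2]), ref))   # proper
    print(detect_improper(np.array([0.5, 0.1, 0.9]), ref))     # improper
    ```
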
  • Patent number: 11875247
    Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
    Type: Grant
    Filed: June 18, 2020
    Date of Patent: January 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Ron Diamant
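
    A sketch of the loading scheme, with list copies standing in for DMA transfers: the shared data is read from external memory once, forwarded device-to-device, and the runtime splits the inputs into per-accelerator batches (the compute step is a placeholder):

    ```python
    def load_and_run(shared_data, inputs, n_accelerators):
        local = [None] * n_accelerators
        local[0] = list(shared_data)        # single external-memory DMA
        for i in range(1, n_accelerators):
            local[i] = list(local[i - 1])   # serial accelerator-to-accelerator DMA
        batches = [inputs[i::n_accelerators] for i in range(n_accelerators)]
        # Data parallelism: same shared data, different batch per accelerator.
        return [[x * sum(w) for x in batch] for w, batch in zip(local, batches)]

    print(load_and_run([0.5, 1.5], list(range(8)), 4))
    ```
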
  • Patent number: 11874785
    Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
    Type: Grant
    Filed: September 20, 2022
    Date of Patent: January 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
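
    A bare-bones model of the controller's sequence, with Python lists standing in for the local on-chip memory, the off-chip memory, and the peer's memory, and addition standing in for the combine step:

    ```python
    def combine_and_forward(local_on_chip, remote_off_chip, peer_memory):
        # Combine local results with remote results that arrived off-chip...
        output = [a + b for a, b in zip(local_on_chip, remote_off_chip)]
        peer_memory.extend(output)       # ...and store them in the peer device
        return output

    peer = []
    combine_and_forward([1.0, 2.0], [0.5, 0.5], peer)
    print(peer)   # [1.5, 2.5]
    ```
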
  • Patent number: 11868872
    Abstract: In one example, an apparatus comprises: a direct memory access (DMA) descriptor queue that stores DMA descriptors, each DMA descriptor including an indirect address; an address translation table that stores an address mapping between indirect addresses and physical addresses; and a DMA engine configured to: fetch a DMA descriptor from the DMA descriptor queue, translate a first indirect address of the DMA descriptor to a first physical address using the address mapping in the address translation table, and perform a DMA operation based on executing the DMA descriptor to transfer data to or from the first physical address.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Kun Xu
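
    A dictionary can play the role of the address translation table in a sketch of this descriptor flow; the addresses and lengths are made up for illustration:

    ```python
    TRANSLATION_TABLE = {0x10: 0x8000_0000, 0x11: 0x8000_1000}

    def run_dma(descriptor_queue, memory):
        for indirect_addr, length in descriptor_queue:
            physical = TRANSLATION_TABLE[indirect_addr]   # indirect -> physical
            memory[physical] = b"\x00" * length           # stand-in transfer

    mem = {}
    run_dma([(0x10, 64), (0x11, 128)], mem)
    print({hex(addr): len(data) for addr, data in mem.items()})
    ```
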
  • Patent number: 11868895
    Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
    Type: Grant
    Filed: January 13, 2023
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Renfu Huang, Ron Diamant, Richard John Heaton
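
    For intuition, a matrix multiplication split into row-block sub-operations, each of which could run on its own computing engine and yields a slice of the final output:

    ```python
    import numpy as np

    def split_matmul(a, b, n_engines):
        blocks = np.array_split(a, n_engines, axis=0)   # one sub-operation each
        partials = [blk @ b for blk in blocks]          # portions of the output
        return np.vstack(partials)

    a = np.arange(12.0).reshape(4, 3)
    b = np.arange(6.0).reshape(3, 2)
    assert np.allclose(split_matmul(a, b, 2), a @ b)
    ```
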
  • Patent number: 11868875
    Abstract: Provided are systems and methods for operating a neural network processor, wherein the processor includes an input selector circuit that can be configured to select the data that will be input into the processor's computational array. In various implementations, the selector circuit can determine, for a row of the array, whether the row input will be the output from a buffer memory or data that the input selector circuit has selected for a different row. The row can receive an input feature map from a set of input data or an input feature map that was selected for inputting into a different row, such that the input feature map is input into more than one row at a time. The selector circuit can also include a delay circuit, so that the duplicated input feature map can be input into the computational array later than the original input feature map.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Jeffrey T. Huynh, Sundeep Amirineni
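
    A schedule-level sketch of the selector, assuming a per-row choice between the buffer and a duplicate of another row's selection, with the duplicate delayed by one cycle; the data structures are illustrative:

    ```python
    def select_row_inputs(buffer_maps, duplicate_of, delay=1):
        """duplicate_of[r] is None (take the buffer) or a source row index."""
        schedule = []
        for row, src in enumerate(duplicate_of):
            if src is None:
                schedule.append((row, buffer_maps[row], 0))
            else:
                # Duplicated feature map enters the array `delay` cycles later.
                schedule.append((row, buffer_maps[src], delay))
        return schedule   # (row, feature map, start-cycle offset)

    print(select_row_inputs(["fm0", "fm1", "fm2"], [None, 0, None]))
    ```
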
  • Patent number: 11868878
    Abstract: Disclosed herein are techniques for implementing a large fully-connected layer in an artificial neural network. The large fully-connected layer is grouped into multiple fully-connected subnetworks. Each fully-connected subnetwork is configured to classify an object into an unknown class or a class in a subset of target classes. If the object is classified as the unknown class by a fully-connected subnetwork, a next fully-connected subnetwork may be used to further classify the object. In some embodiments, the fully-connected layer is grouped based on a ranking of target classes.
    Type: Grant
    Filed: March 23, 2018
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant
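
    A toy cascade in the spirit of the abstract: each subnetwork scores its subset of classes plus an extra "unknown" output, and an "unknown" verdict falls through to the next subnetwork (the weights here are random placeholders):

    ```python
    import numpy as np

    def cascade_classify(x, subnetworks, class_subsets):
        for (w, b), classes in zip(subnetworks, class_subsets):
            scores = x @ w + b                 # last output scores "unknown"
            k = int(np.argmax(scores))
            if k < len(classes):
                return classes[k]              # claimed by this subset
        return "unknown"                       # no subnetwork claimed it

    rng = np.random.default_rng(0)
    subnets = [(rng.normal(size=(4, 3)), np.zeros(3)),   # 2 classes + unknown
               (rng.normal(size=(4, 3)), np.zeros(3))]
    print(cascade_classify(rng.normal(size=4), subnets,
                           [["cat", "dog"], ["car", "bus"]]))
    ```
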
  • Patent number: 11870761
    Abstract: An integrated circuit device includes a packet type detection circuit, a security circuit, and a configuration circuit. The packet type detection circuit is operable to determine a packet type of a packet based on header portions of the packet. The security circuit is operable to perform security processing of the packet according to a set of security parameters. The configuration circuit is operable to determine the set of security parameters based on the packet type of the packet, an identifier associated with the packet, and an index associated with the packet, and to provide the set of security parameters to the security circuit.
    Type: Grant
    Filed: October 14, 2022
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Nafea Bshara, Leah Shalev, Erez Izenberg
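
    The configuration step amounts to a keyed lookup, sketched below; the packet types, table contents, and the protocol-50 (IPsec ESP) heuristic are illustrative, not from the patent:

    ```python
    SECURITY_TABLE = {
        ("ipsec", 7, 0): {"cipher": "aes-gcm-256", "key_slot": 12},
        ("tls",   7, 1): {"cipher": "aes-gcm-128", "key_slot": 3},
    }

    def detect_packet_type(headers):
        return "ipsec" if headers.get("proto") == 50 else "tls"   # ESP = 50

    def security_params(headers, identifier, index):
        ptype = detect_packet_type(headers)
        # (packet type, identifier, index) selects the security parameters.
        return SECURITY_TABLE[(ptype, identifier, index)]

    print(security_params({"proto": 50}, 7, 0))
    ```
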
  • Patent number: 11841792
    Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: December 12, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
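
    A dictionary makes a reasonable software stand-in for the CAM-backed schema table; the opcodes and field layouts below are invented for illustration:

    ```python
    SCHEMA = {  # opcode -> [(operand name, bit offset, bit width)]
        0x2A: [("dst", 8, 4), ("src", 12, 4)],
        0x3B: [("addr", 8, 16)],
    }

    def decode(instruction_word):
        opcode = instruction_word & 0xFF              # decoder extracts opcode
        operands = {}
        for name, offset, width in SCHEMA[opcode]:    # schema-driven extraction
            operands[name] = (instruction_word >> offset) & ((1 << width) - 1)
        return opcode, operands

    print(decode(0x532A))   # (0x2A, {'dst': 3, 'src': 5})
    ```
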
  • Patent number: 11816559
    Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; load a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
    Type: Grant
    Filed: June 3, 2022
    Date of Patent: November 14, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
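
    The selection rule is easiest to see in a reference implementation: for weight coordinate (r, c) and rate d, only input elements offset by (r·d, c·d) participate. A sketch (single channel, unit stride, illustrative names):

    ```python
    import numpy as np

    def dilated_conv2d(x, w, rate):
        kh, kw = w.shape
        oh = x.shape[0] - (kh - 1) * rate
        ow = x.shape[1] - (kw - 1) * rate
        out = np.zeros((oh, ow))
        for r in range(kh):
            for c in range(kw):
                # Subset of inputs selected for weight element (r, c).
                out += w[r, c] * x[r * rate:r * rate + oh, c * rate:c * rate + ow]
        return out

    x = np.arange(36.0).reshape(6, 6)
    print(dilated_conv2d(x, np.ones((2, 2)), rate=2))
    ```
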
  • Publication number: 20230359876
    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
    Type: Application
    Filed: July 14, 2023
    Publication date: November 9, 2023
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
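
    A back-of-the-envelope version of the compiler check: if fewer rows than a threshold would be occupied, spread each input matrix across several rows (the 50% threshold and the splitting rule are assumptions):

    ```python
    def plan_rows(n_inputs, array_rows, threshold_frac=0.5):
        if n_inputs >= array_rows * threshold_frac:
            return {m: [m] for m in range(n_inputs)}   # one row per input matrix
        rows_per_input = array_rows // n_inputs        # under-utilized: split
        return {m: list(range(m * rows_per_input, (m + 1) * rows_per_input))
                for m in range(n_inputs)}

    # 3 input matrices on a 128-row array -> each processed across 42 rows.
    print(plan_rows(3, 128))
    ```
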
  • Publication number: 20230351186
    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
    Type: Application
    Filed: May 5, 2023
    Publication date: November 2, 2023
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
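
    A step-by-step paraphrase of this schedule in code, with a dict standing in for the engine and a reconfiguration per step; the helper names are invented:

    ```python
    def configure(engine, layer):
        engine["layer"] = layer                 # stand-in for loading a layer's config
        return engine

    def compute(engine, data):
        return f"L{engine['layer']}({data})"    # stand-in for the layer computation

    engine = {}
    ctx1_l2 = compute(configure(engine, 2), "ctx1")    # layer 2 on context 1
    ctx2_l1 = compute(configure(engine, 1), "ctx2")    # layer 1 on context 2
    result1 = compute(configure(engine, 3), ctx1_l2)   # layer 3 consumes both...
    result2 = compute(configure(engine, 3), ctx2_l1)   # ...contexts' intermediates
    print(result1, result2)
    ```
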