Patents by Inventor Ron Diamant

Ron Diamant has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220318604
    Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
    Type: Application
    Filed: March 30, 2021
    Publication date: October 6, 2022
    Inventors: Kun Xu, Ron Diamant, Patricio Kaplan
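The compression scheme described in this abstract can be illustrated with a minimal software sketch (not the patented DMA hardware): small weights are forced to zero during training, the zeros are then dropped in favor of a presence bitmask, and decompression expands the values back to the original tensor size.

```python
# Illustrative software sketch of the scheme described above; the real
# decompression happens in-line in a DMA engine, not in Python.

def apply_sparsity(weights, threshold):
    """Force small-magnitude weights to zero (training-time step)."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def compress(weights):
    """Store only nonzero values plus a presence bitmask."""
    mask = [w != 0.0 for w in weights]
    values = [w for w in weights if w != 0.0]
    return mask, values

def decompress(mask, values):
    """Expand back to the original tensor size (DMA-side step)."""
    it = iter(values)
    return [next(it) if present else 0.0 for present in mask]
```

Raising the sparsity threshold trades accuracy for a higher compression ratio, which is how a compression-ratio target can be enforced during training.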
  • Patent number: 11461631
    Abstract: Disclosed herein are techniques for scheduling and executing multi-layer neural network computations for multiple contexts. In one embodiment, a method comprises: determining a set of computation tasks to be executed, the set including a first computation task and a second computation task, as well as a third computation task and a fourth computation task that provide input data for the first and second computation tasks; determining a first execution batch comprising the first and second computation tasks; determining a second execution batch comprising at least the third computation task to be executed before the first execution batch; determining whether to include the fourth computation task in the second execution batch based on whether the memory device has sufficient capacity to hold the input data and output data of both the third and fourth computation tasks; and executing the second execution batch followed by the first execution batch.
    Type: Grant
    Filed: March 22, 2018
    Date of Patent: October 4, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang
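The batching decision in this abstract reduces to a capacity check, sketched below in a hypothetical form (the task representation and sizes are illustrative, not from the patent): a producer task joins the earlier batch only if memory can hold the input and output data of every task in that batch.

```python
# Hypothetical sketch of the batching decision: include the fourth
# task in the second batch only if memory capacity allows it.

def fits_in_memory(tasks, capacity):
    """True if the batch's combined input+output data fits in memory."""
    return sum(t["in_size"] + t["out_size"] for t in tasks) <= capacity

def build_second_batch(third, fourth, capacity):
    """The second batch always holds the third task; the fourth is optional."""
    batch = [third]
    if fits_in_memory(batch + [fourth], capacity):
        batch.append(fourth)
    return batch
```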
  • Publication number: 20220292163
    Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
    Type: Application
    Filed: June 3, 2022
    Publication date: September 15, 2022
    Inventors: Jeffrey T. Huynh, Ron Diamant
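The input-selection rule in this abstract can be shown in one dimension as a hedged sketch (not the generated instruction stream): for a dilated convolution with rate d, the weight element at coordinate k reads inputs starting at offset k * d, and each weight contributes one pass of accumulation.

```python
# Minimal 1-D software analogue of selecting an input subset per
# weight element based on the dilation rate and weight coordinate.

def select_inputs(inputs, weight_index, rate, out_len):
    """Subset of inputs read by the weight at the given coordinate."""
    start = weight_index * rate
    return inputs[start:start + out_len]

def dilated_conv1d(inputs, weights, rate):
    out_len = len(inputs) - (len(weights) - 1) * rate
    out = [0.0] * out_len
    for k, w in enumerate(weights):
        subset = select_inputs(inputs, k, rate, out_len)
        for i, x in enumerate(subset):   # accumulate partial products
            out[i] += w * x
    return out
```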
  • Patent number: 11435941
    Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
    Type: Grant
    Filed: June 24, 2020
    Date of Patent: September 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Paul Gilbert Meyer, Ron Diamant
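The row-write/column-read/column-write access pattern described here can be modeled in software (a simplified analogue of the memory access circuit, not the circuit itself); reading a column and immediately overwriting it supports streaming transpose-style data movement.

```python
# Software-only model of a memory array supporting row writes,
# column reads, and column writes.

class MemoryArray:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_row(self, r, data):
        """Row write operation: store a group of elements in one row."""
        self.cells[r] = list(data)

    def read_column(self, c):
        """Column read operation: gather one element from each row."""
        return [row[c] for row in self.cells]

    def write_column(self, c, data):
        """Column write operation: replace the elements just read."""
        for r, value in enumerate(data):
            self.cells[r][c] = value
```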
  • Patent number: 11429503
    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: August 30, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Noga Smith, Ron Diamant, Saar Gross
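The monitoring step can be sketched as a bounded wait on a dummy read. In this hedged sketch, `issue_read` is a hypothetical stand-in for the DMA read over the internal bus, not a real driver API; it returns a callable that reports completion.

```python
# Hedged sketch of the self-check: issue a dummy read and wait a
# bounded time; an unanswered read implies a hung bus.
import time

def bus_is_healthy(issue_read, timeout_s=0.1, poll_s=0.01):
    done = issue_read()                # hypothetical: starts the DMA read
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if done():
            return True                # read completed; no further action
        time.sleep(poll_s)
    return False                       # bus hanging: request a hard reset
```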
  • Patent number: 11423313
    Abstract: Methods and systems for performing hardware approximation of function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: August 23, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Kenneth Wayne Patton, Thomas Elmer
  • Patent number: 11409685
    Abstract: In one example, a method comprises: receiving, by a hardware data processor and from a network adapter, a transfer complete message indicating that the network adapter has initiated a transfer of data received from a network to the hardware data processor, the transfer being performed over an interconnect coupled between the hardware data processor and the network adapter; based on receiving the transfer complete message, performing, by the hardware data processor, a flush operation to fetch any remaining portion of the data buffered in the interconnect to a local memory of the hardware data processor; based on determining that the flush operation is complete, storing, by the hardware data processor, the transfer complete message at the local memory; and based on determining that the transfer complete message is stored at the local memory, starting the computation operation on the data at the hardware data processor or performing an error handling operation.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: August 9, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
  • Patent number: 11379555
    Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: July 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
  • Publication number: 20220188073
    Abstract: To reduce power consumption, data bits or a portion of a data register that is not expected to toggle frequently can be grouped together, and be clock-gated independently from the rest of the data register. The grouping of the data bits can be determined based on the data types of the workload being operated on. For a data register configured to store a numeric value that supports multiple data types, the portion of the data register being clock-gated may store a group of data bits that are unused for one or more data types of the multiple data types supported by the data register. The portion of the data register being clock-gated can also be a group of data bits that remain unchanged or have a constant value for numeric values within a certain numeric range that is frequently operated on.
    Type: Application
    Filed: December 11, 2020
    Publication date: June 16, 2022
    Inventors: Joshua Wayne Bowman, Thomas A. Volpe, Sundeep Amirineni, Nishith Desai, Ron Diamant
  • Patent number: 11354258
    Abstract: In one example, an apparatus comprises: a first local memory, a computation engine configured to generate local data and to store the local data at the first local memory, and a controller. The apparatus is coupled with a host processor and a second device via an interconnect, the second device comprising a second local memory, the host processor hosting an application. The controller is configured to: receive, from the second device, a first message indicating that first data is stored in the second local memory; based on the first message: fetch the first data from the second local memory via the interconnect; control the computation engine to perform a computation operation on the first data to generate second data to support the application hosted by the host processor; and transmit, to the second device, a second message indicating that the second data is stored in the first local memory.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: June 7, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patricio Kaplan, Ron Diamant
  • Patent number: 11347480
    Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into the systolic array, and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: May 31, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
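The identity-multiplication trick can be demonstrated in plain software (an analogue of the systolic-array operation, not the hardware): multiplying a block by the identity leaves its values unchanged, and reading the result column-wise instead of row-wise yields the transpose.

```python
# Software analogue of transposing a block via identity multiplication
# and a column-to-row remapping of the results.

def matmul(a, b):
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose_block(block):
    n = len(block)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    product = matmul(identity, block)      # identity multiply: values unchanged
    # map column partitions of the result to row partitions
    return [[product[r][c] for r in range(len(product))]
            for c in range(len(product[0]))]
```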
  • Patent number: 11334358
    Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: May 17, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
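A hypothetical software model of the programmable decoder follows; the schema table, opcode values, and field layout are illustrative, not from the patent. The table maps each opcode to operand definitions (bit offset and width), which are then used to extract operands from the instruction word.

```python
# Hypothetical model: a programmable schema table maps opcodes to
# operand definitions used to extract operand fields.

SCHEMA = {
    0x01: [("src", 8, 8), ("dst", 16, 8)],   # (name, bit offset, width)
    0x02: [("imm", 8, 16)],
}

def decode(instruction):
    opcode = instruction & 0xFF              # extract the opcode
    operands = {}
    for name, offset, width in SCHEMA[opcode]:
        operands[name] = (instruction >> offset) & ((1 << width) - 1)
    return opcode, operands
```

Because the table is programmable rather than hard-wired, the same decoder hardware can support new instruction formats without a redesign.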
  • Patent number: 11321179
    Abstract: A circuit at an interface between a device and an interconnect fabric is configured to track outstanding transactions associated with the device and ensure the completion of the outstanding transactions before rebooting or powering down the device. In some embodiments, the circuit is also configurable to provide appropriate responses when the device is powered down or is being rebooted such that other devices in the system can still operate even without knowing that the device is inactive and would not hang because no response is received from the device.
    Type: Grant
    Filed: August 24, 2020
    Date of Patent: May 3, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Thomas A. Volpe, Ron Diamant, Mark Anthony Banse
  • Patent number: 11314842
    Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
    Type: Grant
    Filed: August 7, 2020
    Date of Patent: April 26, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Mohammad El-Shabani, Sundeep Amirineni, Kenneth Wayne Patton, Willis Wang
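This base-value approach can be illustrated with a first-order software analogue (the table contents and the choice of exp as the function are illustrative): a table maps base values to (function value, slope) parameters, and the estimate interpolates from the nearest base value at or below the input.

```python
# Illustrative software analogue of table-based function approximation,
# using exp (whose derivative is itself) as the example function.
import bisect
import math

BASES = [i / 4 for i in range(9)]                      # 0.0, 0.25, ..., 2.0
TABLE = {b: (math.exp(b), math.exp(b)) for b in BASES} # (value, slope)

def approx_exp(x):
    i = bisect.bisect_right(BASES, x) - 1  # select base value <= x
    base = BASES[i]
    value, slope = TABLE[base]
    return value + slope * (x - base)      # first-order estimate
```

Denser base values or higher-order parameters shrink the approximation error at the cost of a larger table.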
  • Patent number: 11308396
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
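The search for the shortest failing network can be sketched as a simple loop. In this hedged sketch, `run_on_target` and `reference` are hypothetical stand-ins for compiling-and-executing the truncated network on the target processor and for the reference processor's tensors.

```python
# Hedged sketch of the debugging loop: truncate the network to
# increasing lengths and report the shortest length whose device
# tensor mismatches the reference tensor.

def shortest_failing_length(num_layers, run_on_target, reference):
    for length in range(1, num_layers + 1):
        if run_on_target(length) != reference(length):
            return length          # re-run this length with tensor output
    return None                    # all lengths match: no divergence found
```

Once the shortest failing length is known, verbose tensor output only needs to be enabled for that small network, keeping the debug run tractable.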
  • Patent number: 11294599
    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: April 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh
  • Patent number: 11294841
    Abstract: Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
    Type: Grant
    Filed: August 4, 2020
    Date of Patent: April 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Adiel Sarusi, Ron Diamant, Ori Weber, Erez Izenberg
  • Patent number: 11275997
    Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include memory configured to store a first weight and a second weight; a row of processing elements comprising a first processing element and a second processing element, the first processing element comprising a first weight register, the second processing element comprising a second weight register, both the first weight register and the second weight register being controllable by a weight load signal; and a controller configured to: provide the first weight from the memory to the row of processing elements; set the weight load signal to enable the first weight to propagate through the row to reach the first processing element; and set the weight load signal to store the first weight at the first weight register and a flush value at the second weight register.
    Type: Grant
    Filed: April 30, 2018
    Date of Patent: March 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant, Sundeep Amirineni
  • Patent number: 11275661
    Abstract: A method of generating instructions to be executed by a plurality of execution engines that share a resource is provided. The method comprises, in a first generation step: reading a first engine logical timestamp vector of a first execution engine of the execution engines, the vector representing a history of access operations for the resource; determining whether the first engine logical timestamp vector includes a most-up-to-date logical timestamp of the resource in the first generation step; based on the first engine logical timestamp vector including the most-up-to-date logical timestamp of the resource in the first generation step, generating an access instruction to be executed by the first execution engine to access the resource; and scheduling the first execution engine to execute the access instruction.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: March 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant
  • Patent number: 11263517
    Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include an arithmetic circuit configured to perform arithmetic operations for a neural network. The integrated circuit may also include a weight processing circuit configured to: acquire data from a memory device; receive configuration information indicating a size of each quantized weight of a set of quantized weights; extract the set of quantized weights from the data based on the size of each weight indicated by the configuration information; perform de-quantization processing on the set of quantized weights to generate a set of de-quantized weights; and provide the set of de-quantized weights to the arithmetic circuit to enable the arithmetic circuit to perform the arithmetic operations. The memory device may be part of or external to the integrated circuit.
    Type: Grant
    Filed: February 28, 2018
    Date of Patent: March 1, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Huang
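The extract-then-de-quantize pipeline can be sketched in software (the packing layout, scale, and zero point below are illustrative assumptions, not taken from the patent): fixed-width quantized weights are unpacked from raw data using the configured bit width, then mapped back to real values.

```python
# Hypothetical sketch: unpack fixed-width quantized weights from raw
# data (width supplied by configuration), then de-quantize them.

def extract_quantized(raw, bit_width, count):
    """Extract `count` little-endian fields of `bit_width` bits from an int."""
    mask = (1 << bit_width) - 1
    return [(raw >> (i * bit_width)) & mask for i in range(count)]

def dequantize(q_weights, scale, zero_point):
    """Affine de-quantization: w = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_weights]
```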