Patents by Inventor Drazen Borkovic

Drazen Borkovic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11847507
    Abstract: Two or more semaphores can be used per queue for synchronization of direct memory access (DMA) transfers between a DMA engine and various computational engines by alternating the semaphores across sequential sets of consecutive DMA transfers in the queue. The DMA engine can increment a first semaphore after performing each DMA transfer of a first set of consecutive DMA transfers and a second semaphore after performing each DMA transfer of a second set of consecutive DMA transfers that is after the first set of consecutive DMA transfers in the queue. Each semaphore can be reset when all the computational engines that are dependent on the respective set of consecutive DMA transfers are done waiting on the given semaphore before performing respective operations. After reset, the first semaphore or the second semaphore can be reused for the next set of consecutive DMA transfers in the queue.
    Type: Grant
    Filed: December 2, 2020
    Date of Patent: December 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
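
A minimal Python sketch of the two-semaphore alternation described in the abstract above. The `Semaphore` class, `perform_dma`, `wait_until`, and the synchronous wait are illustrative stand-ins, not the patented hardware interface.

```python
# Alternate two semaphores across sequential sets of consecutive DMA
# transfers in one queue. All names and sizes are illustrative.

class Semaphore:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def reset(self):
        self.value = 0

def perform_dma(transfer):
    pass                                   # placeholder for the real engine

def wait_until(sem, threshold):
    assert sem.value >= threshold          # synchronous stand-in for a wait

def run_queue(queue, set_size):
    """Alternate two semaphores across consecutive sets of transfers."""
    sems = [Semaphore(), Semaphore()]
    for set_index, start in enumerate(range(0, len(queue), set_size)):
        sem = sems[set_index % 2]          # alternate across sets
        assert sem.value == 0, "semaphore must be reset before reuse"
        transfers = queue[start:start + set_size]
        for transfer in transfers:
            perform_dma(transfer)
            sem.increment()                # signal one more transfer done
        # Compute engines dependent on this set block until the count
        # reaches the set size; the semaphore is then reset so it can be
        # reused for the set after next.
        wait_until(sem, len(transfers))
        sem.reset()

run_queue(list(range(10)), set_size=4)
```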
  • Publication number: 20230359876
    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
    Type: Application
    Filed: July 14, 2023
    Publication date: November 9, 2023
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
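
A rough sketch of the under-utilization check described above, assuming a 128-row array and a half-array utilization threshold; both numbers and the split policy are invented for illustration.

```python
# Detect that a convolution would use too few PE rows and spread each
# input matrix across more rows. Geometry and threshold are assumptions.

PE_ROWS = 128                    # rows in the processing element array
MIN_ROWS_USED = PE_ROWS // 2     # hypothetical utilization threshold

def plan_convolution(num_input_matrices):
    """Return how many PE rows each input matrix should be spread over."""
    # Baseline mapping: one input matrix per PE row.
    if num_input_matrices >= MIN_ROWS_USED:
        return 1                 # utilization is acceptable; no change
    # Under-utilized: add instructions so each input matrix is processed
    # in parallel across more rows, keeping more of the array busy.
    return PE_ROWS // num_input_matrices

print(plan_convolution(3))       # 42 rows per input matrix instead of 1
```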
  • Patent number: 11782706
    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: October 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Hongbin Zheng, Drazen Borkovic, Haichen Li
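
A hedged sketch of the SESE-chain merging idea from the abstract above: runs of element-wise operators with a single input and a single output are collapsed into one merged operator. The graph encoding and the operator set are assumptions.

```python
# Find runs of element-wise operators eligible for one merged operator.

ELEMENTWISE = {"add", "mul", "relu", "exp"}    # assumed operator set

def find_mergeable_chains(nodes, succ):
    """nodes: {name: op} in topological order; succ: {name: [successors]}.
    Returns chains of element-wise nodes, each compilable as one
    merged instruction for the first execution unit."""
    npreds = {n: 0 for n in nodes}
    for outs in succ.values():
        for m in outs:
            npreds[m] += 1
    chains, visited = [], set()
    for n in nodes:                            # topological order matters
        if n in visited or nodes[n] not in ELEMENTWISE or npreds[n] > 1:
            continue
        chain, cur = [n], n
        visited.add(n)
        # Extend while there is exactly one successor that is itself an
        # element-wise op consuming only this node's output.
        while (len(succ.get(cur, [])) == 1
               and nodes[succ[cur][0]] in ELEMENTWISE
               and npreds[succ[cur][0]] == 1):
            cur = succ[cur][0]
            chain.append(cur)
            visited.add(cur)
        if len(chain) > 1:
            chains.append(chain)
    return chains

g_nodes = {"a": "matmul", "b": "add", "c": "relu", "d": "exp"}
g_succ = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(find_mergeable_chains(g_nodes, g_succ))  # [['b', 'c', 'd']]
```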
  • Patent number: 11775299
    Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes are determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: October 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
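
A minimal sketch of the vector-clock propagation described above. The `"sync"` edge type, the engine map, and the topological-order input are simplifying assumptions.

```python
def compute_vector_clocks(order, preds, engine_of, num_engines):
    """order: node names in topological order;
    preds: {node: [(predecessor, edge_type)]};
    engine_of: {node: engine index}. Returns {node: clock list}."""
    clocks = {}
    for n in order:
        clock = [0] * num_engines
        for p, etype in preds.get(n, []):
            if etype == "sync":
                # Synchronizing edge: inherit everything the predecessor
                # has seen (element-wise max of the two clocks).
                clock = [max(a, b) for a, b in zip(clock, clocks[p])]
            # A non-synchronizing edge type would establish no
            # happens-before and would contribute nothing here.
        clock[engine_of[n]] += 1   # tick this node's own engine
        clocks[n] = clock
    return clocks

order = ["load", "matmul", "store"]
preds = {"matmul": [("load", "sync")], "store": [("matmul", "sync")]}
eng = {"load": 0, "matmul": 1, "store": 1}
print(compute_vector_clocks(order, preds, eng, num_engines=2))
# {'load': [1, 0], 'matmul': [1, 1], 'store': [1, 2]}
```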
  • Patent number: 11748622
    Abstract: A computing system is configured to access intermediate outputs of a neural network by augmenting a data flow graph generated for the neural network. The data flow graph includes a plurality of nodes interconnected by connections, each node representing an operation to be executed by the neural network. To access the intermediate output, the data flow graph is augmented by inserting a node representing an operation that saves the output of a node which produces the intermediate output. The node representing the save operation is inserted while maintaining all existing nodes and connections in the data flow graph, thereby preserving the behavior of the data flow graph. The augmenting can be performed using a compiler that generates the data flow graph from program code.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: September 5, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Se jong Oh
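
A small sketch of the graph augmentation described above: a save node is attached to the producer of an intermediate output while every existing node and connection stays in place. The graph encoding is an assumption.

```python
def add_save_node(nodes, succ, target, path):
    """nodes: {name: op}; succ: {name: [successors]}. Insert a node that
    saves `target`'s output to `path`, keeping all existing edges so the
    graph's behavior is preserved."""
    save = f"save_{target}"
    nodes[save] = ("save", path)
    succ.setdefault(target, []).append(save)   # add an edge, drop nothing
    succ[save] = []                            # save node has no consumers
    return save

ops = {"conv1": "conv", "relu1": "relu", "conv2": "conv"}
graph = {"conv1": ["relu1"], "relu1": ["conv2"], "conv2": []}
add_save_node(ops, graph, "relu1", "/tmp/relu1.npy")
print(graph)   # relu1 now feeds both conv2 and save_relu1
```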
  • Patent number: 11741350
    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: August 29, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
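
A hedged sketch of the instruction-generation idea above: when few input feature maps would leave PE rows idle, different rows apply different filter elements of the same filter. The sizes and assignment rule are invented for illustration.

```python
def build_row_instructions(num_ifmaps, filter_elems, pe_rows=128):
    """Return (row, ifmap, filter_element) assignments, one per row
    instruction, using different filter elements across rows."""
    instrs = []
    rows_per_ifmap = max(1, min(len(filter_elems), pe_rows // num_ifmaps))
    for m in range(num_ifmaps):
        for k in range(rows_per_ifmap):
            row = m * rows_per_ifmap + k
            # Each row gets the same input feature map but a different
            # filter element, rather than one row applying all elements
            # in sequence while the others sit idle.
            instrs.append((row, m, filter_elems[k]))
    return instrs

print(build_row_instructions(2, ["w00", "w01", "w10"]))
```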
  • Patent number: 11610102
    Abstract: Techniques for time-based memory allocation for a neural network inference are disclosed. A description of a neural network comprising a plurality of operations to be executed across a set of accelerators is received. A plurality of interconnect times at a plurality of partition points within the neural network are calculated. Each of the plurality of interconnect times corresponds to a duration of time for transferring an output feature map from one of the set of accelerators to another of the set of accelerators to be used as an input feature map. A partitioning scheme that divides the plurality of operations into a set of subgraphs is determined based on the plurality of interconnect times. Each of the set of subgraphs is assigned to a different accelerator of the set of accelerators in accordance with the partitioning scheme.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: March 21, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Drazen Borkovic
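
A minimal sketch of partitioning by interconnect time, as described above. The link bandwidth and the greedy cheapest-cut rule are illustrative assumptions, not the patented partitioning scheme.

```python
LINK_BYTES_PER_SEC = 8e9          # assumed accelerator-to-accelerator link

def interconnect_time(fmap_bytes):
    """Time to move an output feature map between accelerators."""
    return fmap_bytes / LINK_BYTES_PER_SEC

def choose_partition_points(fmap_bytes_at_point, num_accelerators):
    """Pick the cut points with the cheapest feature-map transfers,
    yielding one subgraph per accelerator."""
    times = {p: interconnect_time(b) for p, b in fmap_bytes_at_point.items()}
    cheapest = sorted(times, key=times.get)[:num_accelerators - 1]
    return sorted(cheapest), times

points, _ = choose_partition_points(
    {3: 2e6, 7: 16e6, 11: 1e6, 15: 4e6}, num_accelerators=3)
print(points)   # [3, 11]: the two cuts with the smallest transfer times
```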
  • Patent number: 11567778
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
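
A rough sketch of the reordering pass described above, reduced to readiness-based list scheduling: an operation whose dependencies are already satisfied is hoisted past one that would stall its engine. The scheduling model and independence test are simplifying assumptions.

```python
def reorder(ops, depends_on):
    """ops: list of (name, engine); depends_on: {name: set of names}.
    Emit operations in dependency-ready order, hoisting ready ops past
    ops that are still waiting on a producer."""
    done, order, pending = set(), [], list(ops)
    while pending:
        for i, (name, engine) in enumerate(pending):
            if depends_on.get(name, set()) <= done:
                order.append((name, engine))   # ready: schedule it now
                done.add(name)
                pending.pop(i)
                break
        else:
            raise ValueError("cycle in dependency graph")
    return order

ops = [("mm1", "pe"), ("load1", "dma"), ("act1", "act")]
print(reorder(ops, {"mm1": {"load1"}}))
# load1 is hoisted ahead of mm1, which would otherwise stall waiting on it
```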
  • Patent number: 11561833
    Abstract: Techniques for operating a computing system to perform neural network operations are disclosed. In one example, a method comprises receiving a neural network model, determining a sequence of neural network operations based on data dependency in the neural network model, and determining a set of instructions to map the sequence of neural network operations to the processing resources of the neural network processor. The method further comprises determining, based on a set of memory access operations included in the set of instructions, a first set of memory references associated with a first location of an external memory to store the input data and a second set of memory references associated with a second location of the external memory to store the output data, and generating an instruction file including the set of instructions, the first set of memory references and the second set of memory references.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: January 24, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Randy Renfu Huang, Drazen Borkovic, Jindrich Zejda
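
A loose sketch of the instruction-file idea above: the file carries the instructions together with symbolic references to the external-memory regions for input and output, which a runtime can bind to concrete addresses. The JSON format and field names are assumptions.

```python
import json

def build_instruction_file(instructions, input_size, output_size):
    return json.dumps({
        "instructions": instructions,
        # Symbolic memory references, resolved by the runtime at load time.
        "memory_refs": {
            "input":  {"ref": "ext.in",  "bytes": input_size},
            "output": {"ref": "ext.out", "bytes": output_size},
        },
    }, indent=2)

print(build_instruction_file(
    [{"op": "load", "src": "ext.in"}, {"op": "store", "dst": "ext.out"}],
    input_size=1 << 20, output_size=1 << 18))
```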
  • Patent number: 11556342
    Abstract: Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines are generated. The set of engines are distributed between a set of hardware elements. A set of configurable delays are inserted into the set of instructions. Each of the set of configurable delays includes an adjustable delay amount that delays an execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: January 17, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ravi Kumar, Drazen Borkovic
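
A minimal sketch of the configurable-delay mechanism described above: the compiler plants adjustable `DELAY` slots in the instruction stream, and a runtime widens them when it detects an adverse condition. The instruction encoding and the congestion test are assumptions.

```python
def insert_delays(instrs, every=2):
    """Insert an adjustable delay slot after every `every` instructions."""
    out = []
    for i, ins in enumerate(instrs):
        out.append(ins)
        if (i + 1) % every == 0:
            out.append({"op": "DELAY", "cycles": 0})   # adjustable slot
    return out

def tune_delays(instrs, congested):
    """Runtime hook: adjust delay amounts based on a runtime condition."""
    for ins in instrs:
        if ins["op"] == "DELAY":
            ins["cycles"] = 64 if congested else 0
    return instrs

stream = insert_delays([{"op": "dma"}, {"op": "matmul"}, {"op": "act"}])
print(tune_delays(stream, congested=True))
```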
  • Patent number: 11467946
    Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
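
A simplified sketch of the two-engine breakpoint protocol above: halt points are set in both instruction streams, and once the first engine reaches its halt point, the second is stepped forward to its own. The engine interface is an assumption.

```python
class Engine:
    def __init__(self, instrs):
        self.instrs, self.pc = instrs, 0

    def run_until(self, halt_pc):
        while self.pc < halt_pc:        # execute-and-advance, simplified
            self.pc += 1

def break_at_layer(engine_a, halt_a, engine_b, halt_b):
    engine_a.run_until(halt_a)          # first engine reaches its halt point
    assert engine_a.pc == halt_a
    engine_b.run_until(halt_b)          # step the second up to its halt point
    return engine_a.pc, engine_b.pc

a = Engine(["i%d" % i for i in range(10)])
b = Engine(["j%d" % i for i in range(10)])
print(break_at_layer(a, 4, b, 6))       # both engines halted at the target layer
```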
  • Patent number: 11442794
    Abstract: Techniques for synchronizing operations of execution engines of an integrated circuit device are disclosed. A description of a plurality of operations to be performed by the execution engines may be obtained. The plurality of operations may be connected through a plurality of edges. A dependency vector may be generated for each operation of the plurality of operations. The dependency vector of a corresponding operation may include a set of values that are calculated based on the set of values of one or more dependency vectors calculated for one or more immediately preceding operations of the plurality of operations. An event register of a plurality of event registers may be assigned, for each edge of one or more of the plurality of edges, to the corresponding edge based on the dependency vector generated for a start operation associated with the corresponding edge.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: September 13, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
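
A hedged sketch of the register-assignment idea above: a small pool of event registers covers many edges, and the dependency vectors decide when a register can be safely reused. The reuse rule (componentwise comparison of dependency vectors) is a simplifying assumption.

```python
def ordered_no_later(dep_vectors, earlier, later):
    """True when `earlier`'s dependency vector is componentwise <=
    `later`'s, i.e. `later` cannot start before `earlier` completed."""
    return all(x <= y
               for x, y in zip(dep_vectors[earlier], dep_vectors[later]))

def assign_event_registers(edges, dep_vectors, num_regs):
    """edges: [(src, dst)]; dep_vectors: {node: tuple}. Reuse a register
    once the new edge's start operation is ordered after the edge the
    register last guarded."""
    assignment, last_edge = {}, {}
    for edge in edges:
        src = edge[0]
        chosen = None
        for r in range(num_regs):
            prev = last_edge.get(r)
            if prev is None or ordered_no_later(dep_vectors, prev[1], src):
                chosen = r
                break
        if chosen is None:
            raise RuntimeError("no reusable event register")
        assignment[edge] = chosen
        last_edge[chosen] = edge
    return assignment

vecs = {"a": (1, 0), "b": (1, 1), "c": (2, 1)}
print(assign_event_registers([("a", "b"), ("b", "c")], vecs, num_regs=1))
# one register suffices: the second edge starts after the first completes
```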
  • Patent number: 11354130
    Abstract: Techniques for detecting a data race condition between multiple execution engines of an integrated circuit device are provided. Computations and data movements involving execution engines of an integrated circuit may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between the operations. When a graph has incorrect dependencies, data races may result. To detect data race conditions, compiler-generated vector clocks that track the relationships of operations performed by various execution engines may be used to determine concurrent operations between nodes of different execution engines, and memory access patterns for the operations may be compared to determine if the concurrent operations access the same memory address.
    Type: Grant
    Filed: March 19, 2020
    Date of Patent: June 7, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
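
A minimal sketch of the race check from the abstract above: two operations on different engines are concurrent when neither vector clock happens-before the other, and they race when their memory accesses overlap and at least one writes. The operation tuple format is an assumption.

```python
def happens_before(c1, c2):
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

def concurrent(c1, c2):
    return not happens_before(c1, c2) and not happens_before(c2, c1)

def races(op1, op2):
    """op: (vector_clock, set_of_addresses, writes: bool)."""
    (c1, addrs1, w1), (c2, addrs2, w2) = op1, op2
    overlap = bool(addrs1 & addrs2)
    return concurrent(c1, c2) and overlap and (w1 or w2)

a = ((1, 0), {0x1000, 0x1004}, True)    # write by engine 0
b = ((0, 1), {0x1004}, False)           # unordered read by engine 1
print(races(a, b))                      # True: concurrent overlapping access
```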
  • Patent number: 11308396
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
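
A hedged sketch of the shrink-and-compare loop described above; `compile_fn`, `run_fn`, the reference tensors, and the scalar comparison are stand-ins for the real toolchain.

```python
def shortest_failing_prefix(layers, reference, compile_fn, run_fn, atol=1e-5):
    """Repeatedly truncate the network and find the shortest prefix whose
    device output diverges from the reference output."""
    for length in range(1, len(layers) + 1):
        machine_code = compile_fn(layers[:length])     # compile the prefix
        device_out = run_fn(machine_code)              # run on the target
        if abs(device_out - reference[length - 1]) > atol:
            return length       # first (shortest) mismatching length
    return None                 # no mismatch found at any length

ref = [1.0, 2.0, 4.0]
buggy = {1: 1.0, 2: 2.0, 3: 4.5}        # divergence appears at length 3
print(shortest_failing_prefix(
    ["l1", "l2", "l3"], ref,
    compile_fn=lambda ls: len(ls),
    run_fn=lambda n: buggy[n]))         # -> 3
```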
  • Patent number: 11221979
    Abstract: Synchronization of a plurality of aggregate DMA transfers on a large number of DMA queues can be achieved using a small number of semaphores. One or more semaphores from M semaphores can be assigned to each aggregate DMA transfer based on round-robin or another suitable method. Each aggregate DMA transfer can comprise N DMA transfers, where M is smaller than N. Each DMA transfer can be assigned to one of the assigned one or more semaphores from the M semaphores. Each DMA engine of N DMA engines can increment the assigned semaphore after performing a respective DMA transfer of the N DMA transfers. A computational engine waiting on completion of a certain aggregate DMA transfer can perform an operation based upon the one or more assigned semaphores reaching respective threshold values.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: January 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
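
A minimal sketch of the round-robin assignment described above: M semaphores cover any number of aggregate transfers of N DMA transfers each, and a waiter releases when its semaphore reaches a threshold. The threshold bookkeeping is a simplifying assumption.

```python
def assign_semaphores(num_aggregates, m_semaphores):
    """Aggregate transfer i uses semaphore i mod M (round-robin)."""
    return {i: i % m_semaphores for i in range(num_aggregates)}

def completion_threshold(base, n_transfers):
    """A waiter on an aggregate releases once its semaphore reaches the
    prior count plus N, since each of the N DMA engines increments it
    after performing its transfer."""
    return base + n_transfers

print(assign_semaphores(num_aggregates=10, m_semaphores=3))
# {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, ...}
print(completion_threshold(base=8, n_transfers=4))   # wait for value 12
```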
  • Patent number: 11182314
    Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: November 23, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Ilya Minkin, Vignesh Vivekraja, Richard John Heaton, Randy Renfu Huang
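
A loose sketch of the load path implied above: a model is fetched from the accelerator's local memory when cached there, and otherwise from host memory over the peripheral bus, being cached locally for later loads. The dictionary-backed stores are illustrative assumptions.

```python
def load_model(name, local_mem, host_mem):
    """Prefer the lower-latency local memory; fall back to the host."""
    if name in local_mem:
        return local_mem[name]          # fast path: local memory
    weights = host_mem[name]            # slow path: peripheral bus to host
    local_mem[name] = weights           # cache locally (sharable by peers)
    return weights

host = {"resnet": b"...weights..."}
local = {}
load_model("resnet", local, host)       # first load goes to the host
print("resnet" in local)                # True: later loads stay local
```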
  • Patent number: 11175919
    Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: November 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
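
A minimal sketch of the register-based synchronization above: one engine sets a register value while the other waits for a condition on it. The threading model is only for illustration; the patent describes concurrently operating hardware engines.

```python
import threading

reg = {"value": 0}
cond = threading.Condition()

def producer():                          # first execution engine
    with cond:
        reg["value"] = 1                 # "set a value in a register"
        cond.notify_all()

def consumer():                          # second execution engine
    with cond:
        # "wait for a condition associated with the register value"
        cond.wait_for(lambda: reg["value"] >= 1)
    print("dependency satisfied, proceeding")

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()
```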
  • Publication number: 20210247984
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: April 28, 2021
    Publication date: August 12, 2021
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Patent number: 11061654
    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
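
A hedged sketch of how a compiler might plant the set-event / wait-on-event pair described above around a dependency between two engines' instruction streams; the instruction encoding and event numbering are assumptions.

```python
def synchronize(stream_a, idx_a, stream_b, idx_b, event):
    """Engine B's instruction at idx_b depends on engine A's at idx_a:
    A signals the event right after idx_a, B waits right before idx_b."""
    a = stream_a[:idx_a + 1] + [("set_event", event)] + stream_a[idx_a + 1:]
    b = stream_b[:idx_b] + [("wait_event", event)] + stream_b[idx_b:]
    return a, b

a, b = synchronize([("matmul",)], 0, [("activation",)], 0, event=7)
print(a)   # [('matmul',), ('set_event', 7)]
print(b)   # [('wait_event', 7), ('activation',)]
```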
  • Publication number: 20210158131
    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map the newly developed operators onto the acceleration engine. To enable neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
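
A minimal sketch of the three-tier hierarchical partitioning described above: each operator falls to the accelerator, the host processor, or the framework depending on what can execute it. The capability sets are illustrative assumptions.

```python
ACCEL_OPS = {"conv2d", "matmul", "relu"}      # compiler can map these
HOST_OPS = {"argsort", "nonzero"}             # compiled for the host CPU

def partition(operators):
    tiers = {"accelerator": [], "host": [], "framework": []}
    for op in operators:
        if op in ACCEL_OPS:
            tiers["accelerator"].append(op)   # runs on the engine
        elif op in HOST_OPS:
            tiers["host"].append(op)          # compiled host code
        else:
            tiers["framework"].append(op)     # falls back to the framework
    return tiers

print(partition(["conv2d", "argsort", "custom_op", "relu"]))
```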