Patents by Inventor Drazen Borkovic
Drazen Borkovic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12210438
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
Type: Grant
Filed: September 19, 2022
Date of Patent: January 28, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
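The coordination this abstract describes can be sketched as a toy simulation. The `Engine` class, its `step` semantics, and the halt-point values below are illustrative assumptions, not the patented implementation:

```python
class Engine:
    """Toy execution engine: an instruction list and a program counter."""
    def __init__(self, instructions):
        self.instructions = instructions
        self.pc = 0  # index of the next instruction to execute

    def step(self):
        self.pc += 1

def set_layer_breakpoint(engine_a, engine_b, halt_a, halt_b):
    """Run both engines until engine_a reaches its halt point, then
    advance engine_b until it also reaches its own halt point."""
    while engine_a.pc < halt_a:
        engine_a.step()
        if engine_b.pc < halt_b:
            engine_b.step()
    # engine_a is halted at the target layer; move engine_b forward
    while engine_b.pc < halt_b:
        engine_b.step()
    return engine_a.pc, engine_b.pc
```

For example, with halt points at instruction 4 and 12, both engines end up stopped at their respective halt points regardless of how far apart their instruction streams are.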
-
Patent number: 12198041
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
Type: Grant
Filed: July 14, 2023
Date of Patent: January 14, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
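A minimal sketch of the under-utilization check: if the input matrices would occupy fewer than a threshold fraction of the array's rows, compute a split factor so each input matrix is spread across more rows. The function name, threshold default, and integer split are assumptions for illustration, not the actual compiler pass:

```python
def plan_row_utilization(num_input_rows, array_rows, threshold=0.5):
    """Return the factor by which to split each input matrix across
    processing-element rows: 1 means no modification is needed."""
    if num_input_rows >= threshold * array_rows:
        return 1  # enough rows are already used concurrently
    # largest integer split factor that still fits within the array
    return array_rows // num_input_rows
```

For a 128-row array and only 3 input matrices, the sketch would split each input across 42 rows; with 100 inputs the array is already well utilized and nothing changes.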
-
Patent number: 12182688
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability of the compiler to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning identifies operators that are supported by the compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed within the machine learning framework.
Type: Grant
Filed: November 27, 2019
Date of Patent: December 31, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
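The three-way placement described here can be sketched as a simple fallback chain — accelerator first, then host, then framework. The set names and string labels are illustrative assumptions:

```python
def partition_operators(ops, accel_supported, host_compilable):
    """Assign each operator to the acceleration engine, the host
    processor, or the framework fallback, in that order of preference."""
    placement = {}
    for op in ops:
        if op in accel_supported:
            placement[op] = "accelerator"
        elif op in host_compilable:
            placement[op] = "host"
        else:
            placement[op] = "framework"
    return placement
```

A newly introduced framework-level operator that neither the accelerator backend nor the host compiler recognizes simply stays in the framework, so the rest of the network can still run on the accelerator.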
-
Patent number: 12159217
Abstract: Methods for simplifying a dependency graph in a neural network accelerator are provided. Computations and data movements for the neural network accelerator may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between operations. A flow graph may contain redundant edges that can be removed while retaining the reachability of each of the nodes in the graph. To identify redundant edges, a compiler may generate vector clocks to track the relationships of operations performed by various execution engines prior to execution of a program reaching a given node or operation. Redundant edges may be identified and removed based on the relative values of the vector clocks to reduce the complexity of the graph.
Type: Grant
Filed: March 25, 2020
Date of Patent: December 3, 2024
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
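The pruning idea can be sketched in a few lines: compute a vector clock per node (one component per engine), then drop an edge (u, v) when another predecessor of v already "happens after" u, because u's ordering then reaches v transitively. This is a simplified illustration under the assumption of topologically ordered nodes, not the patent's exact algorithm:

```python
from collections import defaultdict

def remove_redundant_edges(nodes, edges, engine_of):
    """nodes: topological order; edges: (u, v) pairs; engine_of: node -> engine index.
    Returns the edges that survive vector-clock-based pruning."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    num_engines = max(engine_of.values()) + 1
    clock = {}
    for n in nodes:
        c = [0] * num_engines
        for p in preds[n]:  # clock = elementwise max over predecessors
            c = [max(a, b) for a, b in zip(c, clock[p])]
        c[engine_of[n]] += 1  # tick this node's own engine component
        clock[n] = c
    kept = []
    for u, v in edges:
        others = [p for p in preds[v] if p != u]
        # (u, v) is redundant if some other predecessor's clock dominates u's
        if any(all(a >= b for a, b in zip(clock[w], clock[u])) for w in others):
            continue
        kept.append((u, v))
    return kept
```

In a diamond A→B→C with an extra edge A→C, B's clock dominates A's, so A→C is pruned while every node remains reachable.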
-
Patent number: 12106102
Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating the type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes; the vector clock for a particular node is calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
Type: Grant
Filed: July 13, 2023
Date of Patent: October 1, 2024
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
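A sketch of edge-type-aware clock propagation. The two type semantics used here — a "sync" edge carries the predecessor's full clock, a "data" edge carries only the predecessor engine's own component — are illustrative assumptions, not the patent's actual edge-type rules:

```python
from collections import defaultdict

def compute_vector_clocks(nodes, edges, engine_of, num_engines):
    """nodes: topological order; edges: (u, v, edge_type) triples.
    Returns a vector clock (list of per-engine counters) per node."""
    preds = defaultdict(list)
    for u, v, etype in edges:
        preds[v].append((u, etype))
    clock = {}
    for n in nodes:
        c = [0] * num_engines
        for p, etype in preds[n]:
            if etype == "sync":  # full synchronization: merge whole clock
                c = [max(a, b) for a, b in zip(c, clock[p])]
            else:  # "data": only the predecessor engine's counter is visible
                e = engine_of[p]
                c[e] = max(c[e], clock[p][e])
        c[engine_of[n]] += 1
        clock[n] = c
    return clock
```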
-
Patent number: 12093806
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of each subgraph may be statically allocated in dedicated caches for the processing units as part of the instructions to execute the neural network across the processing units.
Type: Grant
Filed: July 1, 2019
Date of Patent: September 17, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jindrich Zejda, Ron Diamant, Jeffrey T. Huynh, Drazen Borkovic, Randy Renfu Huang, Richard John Heaton
-
Patent number: 12045611
Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
Type: Grant
Filed: August 7, 2023
Date of Patent: July 23, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Ron Diamant, Hongbin Zheng, Drazen Borkovic, Haichen Li
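The merging idea can be shown on the simplest case — a straight-line operator sequence — where consecutive element-wise operators collapse into one fused operator. This is a simplification of SESE-subgraph detection on a general graph, with hypothetical names:

```python
def merge_elementwise_runs(ops, is_elementwise):
    """Collapse maximal runs of consecutive element-wise operators in a
    straight-line sequence into single ('fused', ...) operators."""
    merged, run = [], []
    for op in ops:
        if is_elementwise(op):
            run.append(op)  # extend the current element-wise run
        else:
            if run:  # close the run before the non-element-wise op
                merged.append(("fused", tuple(run)))
                run = []
            merged.append(op)
    if run:
        merged.append(("fused", tuple(run)))
    return merged
```

On `["matmul", "relu", "add", "matmul"]` the two element-wise operators in the middle fuse into one operator, which could then be dispatched to a different execution unit than the matrix multiplications.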
-
Patent number: 11847507
Abstract: Two or more semaphores can be used per queue for synchronization of direct memory access (DMA) transfers between a DMA engine and various computational engines by alternating the semaphores across sequential sets of consecutive DMA transfers in the queue. The DMA engine can increment a first semaphore after performing each DMA transfer of a first set of consecutive DMA transfers and a second semaphore after performing each DMA transfer of a second set of consecutive DMA transfers that is after the first set of consecutive DMA transfers in the queue. Each semaphore can be reset when all the computational engines that are dependent on the respective set of consecutive DMA transfers are done waiting on the given semaphore before performing respective operations. After reset, the first semaphore or the second semaphore can be reused for the next set of consecutive DMA transfers in the queue.
Type: Grant
Filed: December 2, 2020
Date of Patent: December 19, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
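The alternation can be sketched with two plain counters: one semaphore counts the in-flight set of transfers while the other, already consumed by every dependent engine, is reset and held ready for the next set. Class and method names are illustrative assumptions:

```python
class AlternatingSemaphores:
    """Toy model of two semaphores alternating across sequential sets
    of consecutive DMA transfers in one queue."""
    def __init__(self):
        self.sems = [0, 0]  # the two semaphore counters
        self.active = 0     # which semaphore counts the current set

    def dma_done(self):
        """DMA engine increments the active semaphore per transfer."""
        self.sems[self.active] += 1

    def set_done(self):
        """All dependent engines have consumed this set: reset the
        active semaphore and swap to the other one for the next set."""
        self.sems[self.active] = 0
        self.active ^= 1
```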
-
Publication number: 20230359876
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
Type: Application
Filed: July 14, 2023
Publication date: November 9, 2023
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
-
Patent number: 11782706
Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
Type: Grant
Filed: June 29, 2021
Date of Patent: October 10, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ron Diamant, Hongbin Zheng, Drazen Borkovic, Haichen Li
-
Patent number: 11775299
Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating the type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes; the vector clock for a particular node is calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
Type: Grant
Filed: March 29, 2021
Date of Patent: October 3, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
-
Patent number: 11748622
Abstract: A computing system is configured to access intermediate outputs of a neural network by augmenting a data flow graph generated for the neural network. The data flow graph includes a plurality of nodes interconnected by connections, each node representing an operation to be executed by the neural network. To access the intermediate output, the data flow graph is augmented by inserting a node representing an operation that saves the output of a node which produces the intermediate output. The node representing the save operation is inserted while maintaining all existing nodes and connections in the data flow graph, thereby preserving the behavior of the data flow graph. The augmenting can be performed using a compiler that generates the data flow graph from program code.
Type: Grant
Filed: March 4, 2019
Date of Patent: September 5, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Drazen Borkovic, Se jong Oh
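The key property — augment the graph without disturbing any existing node or connection — can be sketched with a graph stored as a node-to-consumers mapping. The representation and function name are assumptions for illustration:

```python
def add_save_node(graph, target, save_name):
    """graph: dict mapping node -> list of consumer nodes. Returns a new
    graph with a save node that reads target's output; every existing
    node and edge is kept, so the original behavior is preserved."""
    augmented = {n: list(consumers) for n, consumers in graph.items()}
    # the save node becomes one more consumer of the target's output
    augmented[target] = augmented.get(target, []) + [save_name]
    augmented[save_name] = []  # the save node itself has no consumers
    return augmented
```

Note that the input graph is copied rather than mutated, so the un-augmented graph stays available for normal (non-debug) execution.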
-
Patent number: 11741350
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
Type: Grant
Filed: November 27, 2019
Date of Patent: August 29, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
-
Patent number: 11610102
Abstract: Techniques for time-based memory allocation for a neural network inference are disclosed. A description of a neural network comprising a plurality of operations to be executed across a set of accelerators is received. A plurality of interconnect times at a plurality of partition points within the neural network are calculated. Each of the plurality of interconnect times corresponds to a duration of time for transferring an output feature map from one of the set of accelerators to another of the set of accelerators to be used as an input feature map. A partitioning scheme that divides the plurality of operations into a set of subgraphs is determined based on the plurality of interconnect times. Each of the set of subgraphs is assigned to a different accelerator of the set of accelerators in accordance with the partitioning scheme.
Type: Grant
Filed: November 27, 2019
Date of Patent: March 21, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jindrich Zejda, Drazen Borkovic
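One way to use the interconnect times, sketched minimally: rank candidate partition points by the cost of transferring the feature map across the accelerator boundary and cut at the cheapest ones. A real scheme would also balance compute per subgraph; this greedy version is an illustrative assumption:

```python
def choose_partition_points(interconnect_times, num_accelerators):
    """interconnect_times[i] is the transfer time if the network is cut
    after operation i. Returns the num_accelerators - 1 cheapest cut
    indices, in network order."""
    ranked = sorted(range(len(interconnect_times)),
                    key=lambda i: interconnect_times[i])
    return sorted(ranked[:num_accelerators - 1])
```

With transfer times `[5.0, 1.0, 3.0, 0.5]` and three accelerators, the sketch cuts after operations 1 and 3, where the feature maps are cheapest to move.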
-
Patent number: 11567778
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
Type: Grant
Filed: April 28, 2021
Date of Patent: January 31, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
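One reordering heuristic of the kind described, sketched on dependency-free operations: avoid scheduling the same engine back-to-back when an operation for another engine is available, so engines can overlap. Real reordering must respect data dependencies; this simplification is an assumption for illustration:

```python
def interleave_engines(ops):
    """ops: list of (name, engine) pairs with no data dependencies.
    Greedily pick an op for a different engine than the last one,
    falling back to the first remaining op when none exists."""
    remaining = list(ops)
    out = []
    last_engine = None
    while remaining:
        pick = next((op for op in remaining if op[1] != last_engine),
                    remaining[0])
        remaining.remove(pick)
        out.append(pick)
        last_engine = pick[1]
    return out
```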
-
Patent number: 11561833
Abstract: Techniques for operating a computing system to perform neural network operations are disclosed. In one example, a method comprises receiving a neural network model, determining a sequence of neural network operations based on data dependency in the neural network model, and determining a set of instructions to map the sequence of neural network operations to the processing resources of the neural network processor. The method further comprises determining, based on a set of memory access operations included in the set of instructions, a first set of memory references associated with a first location of an external memory to store the input data and a second set of memory references associated with a second location of the external memory to store the output data, and generating an instruction file including the set of instructions, the first set of memory references and the second set of memory references.
Type: Grant
Filed: June 28, 2018
Date of Patent: January 24, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Drazen Borkovic, Jindrich Zejda
-
Patent number: 11556342
Abstract: Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines are generated. The set of engines are distributed between a set of hardware elements. A set of configurable delays are inserted into the set of instructions. Each of the set of configurable delays includes an adjustable delay amount that delays an execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.
Type: Grant
Filed: January 17, 2023
Date of Patent: January 17, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ravi Kumar, Drazen Borkovic
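A minimal sketch of the runtime side: the compiler emits delay placeholders in the instruction stream, and a runtime pass fills in concrete amounts chosen from observed conditions. The tuple encoding, slot names, and default of 0 are illustrative assumptions, not the actual runtime API:

```python
def apply_configurable_delays(instructions, delay_config):
    """instructions: list of tuples, where ('delay', slot) marks a
    configurable delay slot. Returns a stream with each slot replaced
    by the delay amount the runtime chose for it (0 if unconfigured)."""
    out = []
    for instr in instructions:
        if instr[0] == "delay":
            slot = instr[1]
            out.append(("delay", delay_config.get(slot, 0)))
        else:
            out.append(instr)
    return out
```

The same compiled program can then be retimed run-to-run — e.g., lengthening a delay when the runtime observes contention on a shared hardware element — without recompiling.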
-
Patent number: 11467946
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
Type: Grant
Filed: March 28, 2019
Date of Patent: October 11, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
-
Patent number: 11442794
Abstract: Techniques for synchronizing operations of execution engines of an integrated circuit device are disclosed. A description of a plurality of operations to be performed by the execution engines may be obtained. The plurality of operations may be connected through a plurality of edges. A dependency vector may be generated for each operation of the plurality of operations. The dependency vector of a corresponding operation may include a set of values that are calculated based on the set of values of one or more dependency vectors calculated for one or more immediately preceding operations of the plurality of operations. An event register of a plurality of event registers may be assigned, for each edge of one or more of the plurality of edges, to the corresponding edge based on the dependency vector generated for a start operation associated with the corresponding edge.
Type: Grant
Filed: September 27, 2019
Date of Patent: September 13, 2022
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
-
Patent number: 11354130
Abstract: Techniques for detecting a data race condition between multiple execution engines of an integrated circuit device are provided. Computations and data movements involving execution engines of an integrated circuit may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between the operations. When a graph has incorrect dependencies, data races may result. To detect data race conditions, compiler-generated vector clocks that track the relationships of operations performed by various execution engines may be used to determine concurrent operations between nodes of different execution engines, and memory access patterns for the operations may be compared to determine if the concurrent operations access the same memory address.
Type: Grant
Filed: March 19, 2020
Date of Patent: June 7, 2022
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
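The detection step can be sketched directly from the abstract: two operations are concurrent when neither vector clock dominates the other, and a concurrent pair is a race when it touches the same address with at least one write. The data shapes below are illustrative assumptions:

```python
def find_data_races(clocks, accesses):
    """clocks: op -> vector clock (list of ints).
    accesses: op -> set of (address, 'r' | 'w') pairs.
    Returns (op_u, op_v, address) triples for detected races."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b))

    races = []
    ops = list(clocks)
    for i, u in enumerate(ops):
        for v in ops[i + 1:]:
            if dominates(clocks[u], clocks[v]) or dominates(clocks[v], clocks[u]):
                continue  # the pair is ordered, so no race is possible
            for addr_u, mode_u in accesses[u]:
                for addr_v, mode_v in accesses[v]:
                    # same address and at least one write -> data race
                    if addr_u == addr_v and "w" in (mode_u, mode_v):
                        races.append((u, v, addr_u))
    return races
```

Two unordered operations that write and read address 100, for example, are flagged, while any pair ordered by a dependency edge (one clock dominating the other) is skipped outright.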