Patents by Inventor Jindrich Zejda

Jindrich Zejda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11386644
    Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: July 12, 2022
    Assignee: XILINX, INC.
    Inventors: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
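
A minimal Python model of the windowing behavior this abstract describes may help: rows enter a line buffer, and shift registers emit overlapping sample windows. The 3-wide window, image shape, and function name are illustrative assumptions, not details from the patent claims.

```python
# Software model of a line buffer feeding shift registers (cf. 11386644).
# Window size, image shape, and names are illustrative assumptions.
from collections import deque

def stream_windows(image, k=3):
    """Yield k x k sample windows per row set, modeling shift-register output."""
    rows = deque(maxlen=k)            # first buffer: stores the last k image rows
    for row in image:
        rows.append(row)
        if len(rows) < k:
            continue
        # shift registers: one per stream, loaded through a fixed interconnect
        # that maps each register to several storage locations of the row buffer
        regs = [deque(r[:k], maxlen=k) for r in rows]
        for col in range(k, len(row) + 1):
            yield [list(r) for r in regs]          # current k x k window
            if col < len(row):
                for reg, src in zip(regs, rows):
                    reg.append(src[col])           # shift in the next sample

image = [[r * 10 + c for c in range(6)] for r in range(4)]
for window in stream_windows(image):
    print(window)
```
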
  • Patent number: 11308396
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
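
The debug flow above amounts to finding the shortest network prefix whose device output diverges from the reference. A self-contained toy sketch, with the ops and the injected bug as illustrative assumptions (the real flow compares tensors produced by compiled machine instructions on a target processor):

```python
# Toy of the reduce-and-compare debug flow in 11308396: run progressively
# longer prefixes of a network on a "target" and a "reference", and report
# the shortest prefix whose outputs diverge. Ops and the bug are invented.

REFERENCE_OPS = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
TARGET_OPS = list(REFERENCE_OPS)
TARGET_OPS[1] = lambda x: x * 2.0009    # injected bug: mis-implemented op

def run(ops, length, x=1.0):
    for op in ops[:length]:             # execute a reduced (truncated) network
        x = op(x)
    return x

def shortest_failing_length(atol=1e-6):
    for length in range(1, len(REFERENCE_OPS) + 1):
        if abs(run(TARGET_OPS, length) - run(REFERENCE_OPS, length)) > atol:
            return length               # shortest prefix that mismatches
    return None                         # target matches the reference everywhere

print(shortest_failing_length())        # -> 2: the second op is the culprit
```
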
  • Patent number: 11204747
    Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator, where the two run on heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software-defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: December 21, 2021
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao, Christopher J. Case
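
As a loose software analogy of the single shared abstraction, one definition can drive both sides of an interface so that moving the boundary means editing one class. The patent achieves this with software-defined (HLS-style) code that also yields RTL; this struct-packing sketch only illustrates the idea:

```python
# Hedged illustration of the shared-abstraction idea in 11204747: a single
# interface definition is used by both the host side and the accelerator
# side, so the two cannot drift apart. All names here are assumptions.
import struct

class Field:
    def __init__(self, name, fmt):
        self.name, self.fmt = name, fmt

class Interface:
    """Single definition of the host<->accelerator message layout."""
    def __init__(self, *fields):
        self.fields = fields
        self.fmt = "<" + "".join(f.fmt for f in fields)

    def host_pack(self, **values):                 # host (CPU) side
        return struct.pack(self.fmt, *(values[f.name] for f in self.fields))

    def device_unpack(self, payload):              # accelerator side
        vals = struct.unpack(self.fmt, payload)
        return dict(zip((f.name for f in self.fields), vals))

conv_req = Interface(Field("rows", "I"), Field("cols", "I"), Field("scale", "f"))
msg = conv_req.host_pack(rows=224, cols=224, scale=0.5)
print(conv_req.device_unpack(msg))   # both sides agree by construction
```
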
  • Patent number: 11175919
    Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: November 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
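
A toy model of the set/wait register handshake, with Python threads and a condition variable standing in for the hardware execution engines and register (an illustrative sketch, not the device's mechanism):

```python
# Toy model of the register-based synchronization in 11175919: engine B
# executes a "wait for condition on register" instruction and proceeds only
# after engine A's "set register" instruction.
import threading

registers = {0: 0}                    # stands in for the hardware register
cond = threading.Condition()

def engine_a():                       # first execution engine
    # ... produce the data engine B depends on ...
    with cond:
        registers[0] = 1              # SET: write the sync register
        cond.notify_all()

def engine_b():                       # second execution engine
    with cond:
        cond.wait_for(lambda: registers[0] >= 1)   # WAIT on register value
    print("dependency satisfied, engine B continues")

t_b = threading.Thread(target=engine_b); t_b.start()
t_a = threading.Thread(target=engine_a); t_a.start()
t_a.join(); t_b.join()
```
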
  • Publication number: 20210247984
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: April 28, 2021
    Publication date: August 12, 2021
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
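
A hedged sketch of the reordering pass: detect back-to-back operations serialized on one engine and hoist a later operation assigned to a different engine in between. The op names, engine assignments, and the assumption that the hoisted op is independent are all illustrative:

```python
# Sketch of the reordering idea in 20210247984: a run of operations on one
# engine leaves the other engines idle, so an independent later operation
# is moved in between to overlap the hardware usage.

ops = [("conv1", "pe"), ("conv2", "pe"), ("pool1", "pool"), ("act1", "act")]

def reorder(ops):
    order = list(ops)
    i = 0
    while i + 1 < len(order):
        if order[i][1] == order[i + 1][1]:          # same engine back-to-back
            for j in range(i + 2, len(order)):      # find op on another engine
                if order[j][1] != order[i][1]:      # (assumed independent)
                    order.insert(i + 1, order.pop(j))
                    break
        i += 1
    return order

print(reorder(ops))
# [('conv1', 'pe'), ('pool1', 'pool'), ('conv2', 'pe'), ('act1', 'act')]
```
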
  • Patent number: 11061654
    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
  • Patent number: 11036827
    Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into a GEMM digital signal processing (DSP) engine severalfold, and also eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: June 15, 2021
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao
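
The core idea, sketched below under assumed buffer sizes and APIs: write with linear, DDR-friendly addresses in one pass, then read transposed for the compute engine, so no separate host-side reformatting pass is needed:

```python
# Sketch of the simultaneous buffer-and-reformat idea in 11036827. The
# patent realizes this as hardware logic generated from C++-defined data
# structures; this Python rendering is an illustrative assumption.

def linear_write(buffer, stream):
    for addr, value in enumerate(stream):       # one pass of linear addresses
        buffer[addr] = value                    # row-major: addr = r*cols + c

def transposed_read(buffer, rows, cols):
    for c in range(cols):                       # stream columns to the engine
        for r in range(rows):
            yield buffer[r * cols + c]

rows, cols = 2, 3
buf = [0] * (rows * cols)
linear_write(buf, [1, 2, 3, 4, 5, 6])
print(list(transposed_read(buf, rows, cols)))   # [1, 4, 2, 5, 3, 6]
```
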
  • Publication number: 20210158131
    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. New framework-level operators can be developed faster than the compiler gains the ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
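
A minimal sketch of the three-tier partitioning, assuming illustrative support sets: each operator falls to the most capable tier that supports it, i.e. the acceleration engine, else host-compiled code, else the framework runtime:

```python
# Sketch of the hierarchical partitioning in 20210158131. The support sets
# and operator names are assumptions, not the compiler's actual tables.

ACCEL_OPS = {"conv2d", "matmul", "relu"}
HOST_OPS = {"softmax", "topk"} | ACCEL_OPS

def partition(graph):
    tiers = {"accelerator": [], "host": [], "framework": []}
    for op in graph:
        if op in ACCEL_OPS:
            tiers["accelerator"].append(op)   # compiled for the accelerator
        elif op in HOST_OPS:
            tiers["host"].append(op)          # compiled for the host CPU
        else:
            tiers["framework"].append(op)     # left to the framework runtime
    return tiers

print(partition(["conv2d", "relu", "softmax", "custom_nms"]))
# {'accelerator': ['conv2d', 'relu'], 'host': ['softmax'],
#  'framework': ['custom_nms']}
```
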
  • Patent number: 11016775
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: May 25, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Patent number: 11003429
    Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for the set of dependent operations that depend from an operation. The compiler can then select, from among a set of candidate operations, the operation that has the shortest estimated execution time and whose set of dependent operations has the longest estimated execution time compared to the other sets of dependent operations.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: May 11, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Tobias Joseph Kastulus Edler von Koch, Drazen Borkovic, Taemin Kim
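
One plausible reading of the selection rule is a sort key that maximizes the dependent operations' total time and breaks ties toward the shortest own execution time. The graph, the costs, and that interpretation are my assumptions:

```python
# Sketch of the scheduling heuristic in 11003429 under an assumed reading:
# prefer the candidate whose dependent chain is longest, then the candidate
# whose own estimated execution time is shortest.
from functools import lru_cache

graph = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}   # op -> dependents
cost = {"a": 2, "b": 5, "c": 4, "d": 3}                 # estimated exec times

@lru_cache(maxsize=None)
def dependent_time(op):
    """Total estimated time of all operations that depend on op."""
    return sum(cost[d] + dependent_time(d) for d in graph[op])

ready = ["a", "b"]                     # operations with no unmet inputs
ready.sort(key=lambda op: (-dependent_time(op), cost[op]))
print(ready)   # ['a', 'b']: equal dependent time (7), so shortest op first
```
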
  • Patent number: 10943039
    Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply-accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: March 9, 2021
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
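
Selecting an MSB-to-LSB window from a wide accumulator is equivalent to a right shift plus a mask. A small arithmetic sketch, with the accumulator width and the window bounds as illustrative assumptions:

```python
# Arithmetic sketch of the quantizer in 10943039: the control data names an
# MSB..LSB window into the accumulator register, and the selected bit
# indices become the narrower output. The example values are invented.

def select_bits(acc, msb, lsb):
    """Take bits [msb:lsb] of acc, i.e. shift right by lsb and mask."""
    width = msb - lsb + 1
    return (acc >> lsb) & ((1 << width) - 1)

acc = 0b0000_0110_1101_0010            # a multiply-accumulate result
print(bin(select_bits(acc, msb=10, lsb=3)))   # 8-bit slice -> 0b11011010
```
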
  • Patent number: 10922146
    Abstract: Systems and methods are provided for synchronizing execution of program code for an integrated circuit device having multiple concurrently operating execution engines, where the operation of one execution engine may be dependent on the operation of another execution engine. Data or resource dependencies may be accommodated with a Set instruction to cause a first execution engine to set a register value and a Wait instruction to cause a second execution engine to wait for a condition associated with the register value. Concurrent operation of the execution engines may thus be synchronized.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: February 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
  • Publication number: 20200409717
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Publication number: 20200410354
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Application
    Filed: June 27, 2019
    Publication date: December 31, 2020
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
  • Patent number: 10846621
    Abstract: Provided are systems, methods, and an integrated-circuit neural network processor that can execute a fast context switch from one neural network to another. In various implementations, a neural network processor can include a plurality of memory banks storing a first set of weight values for a first neural network. When the neural network processor receives first input data, the neural network processor can compute a first result using the first set of weight values and the first input data. While computing the first result, the neural network processor can store, in the memory banks, a second set of weight values for a second neural network. When the neural network processor receives second input data, the neural network processor can compute a second result using the second set of weight values and the second input data, where the computation occurs upon completion of computation of the first result.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant, Jindrich Zejda, Drazen Borkovic
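
A hedged sketch of the overlap: compute with the weights of network A in one memory bank while network B's weights stream into another, so the switch incurs no idle reload. Threads and the bank layout are illustrative stand-ins:

```python
# Sketch of the fast context switch in 10846621: the second network's
# weights are loaded into spare memory banks while the first network is
# still computing. The threading here only models the overlap in time.
import threading

banks = {0: "weights_A", 1: None}     # on-chip memory banks

def compute(bank, inputs):
    return f"result({banks[bank]}, {inputs})"

loader = threading.Thread(
    target=lambda: banks.__setitem__(1, "weights_B"))  # prefetch network B
loader.start()
print(compute(0, "input_1"))          # first result uses network A
loader.join()                         # bank 1 now holds network B's weights
print(compute(1, "input_2"))          # switch: second result uses network B
```
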
  • Patent number: 10846201
    Abstract: Disclosed herein are techniques for debugging the performance of a neural network. In one embodiment, a neural network processor includes a processing engine, a debugging circuit coupled to the processing engine, and an interface to a memory device. The processing engine is configured to execute instructions for implementing a neural network. The debugging circuit is configurable to determine, for each instruction in a set of instructions, a first timestamp indicating a start time of executing the instruction and a second timestamp indicating an end time of executing the instruction by the processing engine. The interface is configured to save the first timestamp and the second timestamp for each instruction in the set of instructions into the memory device. The debugging circuit can be configured with different debug levels. The neural network processor can include multiple debugging circuits for multiple processing engines that operate in parallel.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Jindrich Zejda, Drazen Borkovic, Thomas A. Volpe
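
The mechanism resembles a profiler that wraps each instruction with start and end timestamps and writes both to a trace buffer; in the patent a hardware debugging circuit saves them to a memory device. A software sketch with assumed names:

```python
# Sketch of the per-instruction tracing in 10846201. The program format and
# the trace list are stand-ins for the instruction stream and memory device.
import time

trace = []                            # stands in for the memory device

def execute_traced(program):
    for name, op in program:
        t0 = time.perf_counter_ns()   # first timestamp: start of execution
        op()
        t1 = time.perf_counter_ns()   # second timestamp: end of execution
        trace.append((name, t0, t1))

execute_traced([("matmul", lambda: sum(range(10**5))),
                ("act", lambda: None)])
for name, t0, t1 in trace:
    print(f"{name}: {(t1 - t0) / 1e3:.1f} us")
```
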
  • Patent number: 10761822
    Abstract: Provided are systems and methods for generating program code for an integrated circuit, where instructions in the code synchronize computation engines that support non-blocking instructions. In various examples, a computing device can receive an input data set including operations to be performed by an integrated circuit device and dependencies between the operations. The input data set can include a non-blocking instruction, and an operation that requires that the non-blocking instruction be completed. The computing device can generate instructions for performing the operation, including a particular instruction to wait for a value to be set in a register of the integrated circuit device. The computing device can further generate program code including the non-blocking instruction and the instructions for performing the operation, wherein the non-blocking instruction is configured to set the value in the register.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: September 1, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
  • Patent number: 10678509
    Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler coupled to the multiply-accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: June 9, 2020
    Assignee: XILINX, INC.
    Inventors: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda
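
The shift-multiply-shift structure is a standard way to approximate multiplication by a fractional constant using only integer hardware. A worked sketch with illustrative constants (the patent specifies the control-data fields, not these values):

```python
# Arithmetic sketch of the scaler in 10678509: a first right shift (the
# MSB..LSB window), an integer multiplier, then a second right shift.

def scale(acc, shift1, multiplier, shift2):
    return ((acc >> shift1) * multiplier) >> shift2

acc = 1_234_567                       # wide accumulator value
# approximate acc * 0.0003: 0.0003 * 2**15 ~ 9.83, rounded to 10
print(scale(acc, shift1=0, multiplier=10, shift2=15))   # -> 376
print(int(acc * 10 / 2**15))                            # float check -> 376
```
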
  • Patent number: 10572409
    Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
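
A software rendering of the dataflow, with deques standing in for the hardware FIFOs and the split-and-merge circuitry; the matrix contents and sizes are illustrative:

```python
# Sketch of the sparse matrix-vector dataflow in 10572409: index-value
# pairs are routed into FIFOs by column, paired with vector elements,
# re-routed into FIFOs by row, and summed per row.
from collections import deque, defaultdict

matrix = [(0, 1, 2.0), (1, 0, 3.0), (1, 2, 4.0)]   # (row, col, value) pairs
vector = [10.0, 20.0, 30.0]

col_fifos = defaultdict(deque)                     # first split: by column
for row, col, val in matrix:
    col_fifos[col].append((row, val))

row_fifos = defaultdict(deque)                     # second split: by row
for col, fifo in col_fifos.items():
    while fifo:
        row, val = fifo.popleft()
        row_fifos[row].append(val * vector[col])   # pair with vector element

result = {row: sum(fifo) for row, fifo in row_fifos.items()}
print(result)   # {0: 40.0, 1: 150.0}, i.e. y = A @ x for the sparse A
```
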
  • Patent number: 10515135
    Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format addresses double data rate (DDR) dynamic random access memory (DRAM) bandwidth limits by allowing both linear DDR addressing and single-cycle loading of data into the compute array, avoiding input/output (I/O) and DDR bottlenecks.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: December 24, 2019
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Aaron Ng, Ashish Sirasao, Yongjun Wu
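
One way to realize such a format is a block-major (tiled) layout in which every tile the compute array consumes occupies one contiguous, linearly addressed DDR range. The 2x2 tile and the tile-aligned dimensions below are illustrative assumptions:

```python
# Sketch of a storage format in the spirit of 10515135: lay the matrix out
# tile by tile so each tile is a single contiguous run of addresses.

def to_block_major(matrix, tile):
    rows, cols = len(matrix), len(matrix[0])
    flat = []
    for tr in range(0, rows, tile):            # walk tiles in row-major order
        for tc in range(0, cols, tile):
            for r in range(tr, tr + tile):     # each tile is stored
                for c in range(tc, tc + tile): # contiguously in memory
                    flat.append(matrix[r][c])
    return flat

m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(to_block_major(m, tile=2))
# [1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16]
```
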