Patents by Inventor Jindrich Zejda

Jindrich Zejda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11386644
    Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: July 12, 2022
    Assignee: XILINX, INC.
    Inventors: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
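
A minimal Python model of the windowing behavior this abstract describes may help: rows enter a line buffer, and shift registers emit overlapping sample windows. The 3-wide window, image shape, and function name are illustrative assumptions, not details from the patent claims.

```python
# Software model of a line buffer feeding shift registers (cf. 11386644).
# Window size, image shape, and names are illustrative assumptions.
from collections import deque

def stream_windows(image, k=3):
    """Yield k x k sample windows per row set, modeling shift-register output."""
    rows = deque(maxlen=k)            # first buffer: stores the last k image rows
    for row in image:
        rows.append(row)
        if len(rows) < k:
            continue
        # shift registers: one per stream, loaded through a fixed interconnect
        # that maps each register to several storage locations of the row buffer
        regs = [deque(r[:k], maxlen=k) for r in rows]
        for col in range(k, len(row) + 1):
            yield [list(r) for r in regs]          # current k x k window
            if col < len(row):
                for reg, src in zip(regs, rows):
                    reg.append(src[col])           # shift in the next sample

image = [[r * 10 + c for c in range(6)] for r in range(4)]
for window in stream_windows(image):
    print(window)
```
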
  • Patent number: 11308396
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
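
The debug flow above amounts to finding the shortest network prefix whose device output diverges from the reference. A self-contained toy sketch, with the ops and the injected bug as illustrative assumptions (the real flow compares tensors produced by compiled machine instructions on a target processor):

```python
# Toy of the reduce-and-compare debug flow in 11308396: run progressively
# longer prefixes of a network on a "target" and a "reference", and report
# the shortest prefix whose outputs diverge. Ops and the bug are invented.

REFERENCE_OPS = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
TARGET_OPS = list(REFERENCE_OPS)
TARGET_OPS[1] = lambda x: x * 2.0009    # injected bug: mis-implemented op

def run(ops, length, x=1.0):
    for op in ops[:length]:             # execute a reduced (truncated) network
        x = op(x)
    return x

def shortest_failing_length(atol=1e-6):
    for length in range(1, len(REFERENCE_OPS) + 1):
        if abs(run(TARGET_OPS, length) - run(REFERENCE_OPS, length)) > atol:
            return length               # shortest prefix that mismatches
    return None                         # target matches the reference everywhere

print(shortest_failing_length())        # -> 2: the second op is the culprit
```
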
  • Patent number: 11204747
    Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator, where the two run on heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software-defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: December 21, 2021
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao, Christopher J. Case
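
As a loose software analogy of the single shared abstraction, one definition can drive both sides of an interface so that moving the boundary means editing one class. The patent achieves this with software-defined (HLS-style) code that also yields RTL; this struct-packing sketch only illustrates the idea:

```python
# Hedged illustration of the shared-abstraction idea in 11204747: a single
# interface definition is used by both the host side and the accelerator
# side, so the two cannot drift apart. All names here are assumptions.
import struct

class Field:
    def __init__(self, name, fmt):
        self.name, self.fmt = name, fmt

class Interface:
    """Single definition of the host<->accelerator message layout."""
    def __init__(self, *fields):
        self.fields = fields
        self.fmt = "<" + "".join(f.fmt for f in fields)

    def host_pack(self, **values):                 # host (CPU) side
        return struct.pack(self.fmt, *(values[f.name] for f in self.fields))

    def device_unpack(self, payload):              # accelerator side
        vals = struct.unpack(self.fmt, payload)
        return dict(zip((f.name for f in self.fields), vals))

conv_req = Interface(Field("rows", "I"), Field("cols", "I"), Field("scale", "f"))
msg = conv_req.host_pack(rows=224, cols=224, scale=0.5)
print(conv_req.device_unpack(msg))   # both sides agree by construction
```
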
  • Patent number: 11175919
    Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: November 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
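
A toy model of the set/wait register handshake, with Python threads and a condition variable standing in for the hardware execution engines and register (an illustrative sketch, not the device's mechanism):

```python
# Toy model of the register-based synchronization in 11175919: engine B
# executes a "wait for condition on register" instruction and proceeds only
# after engine A's "set register" instruction.
import threading

registers = {0: 0}                    # stands in for the hardware register
cond = threading.Condition()

def engine_a():                       # first execution engine
    # ... produce the data engine B depends on ...
    with cond:
        registers[0] = 1              # SET: write the sync register
        cond.notify_all()

def engine_b():                       # second execution engine
    with cond:
        cond.wait_for(lambda: registers[0] >= 1)   # WAIT on register value
    print("dependency satisfied, engine B continues")

t_b = threading.Thread(target=engine_b); t_b.start()
t_a = threading.Thread(target=engine_a); t_a.start()
t_a.join(); t_b.join()
```
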
  • Publication number: 20210247984
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: April 28, 2021
    Publication date: August 12, 2021
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
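
A hedged sketch of the reordering pass: detect back-to-back operations serialized on one engine and hoist a later operation assigned to a different engine in between. The op names, engine assignments, and the assumption that the hoisted op is independent are all illustrative:

```python
# Sketch of the reordering idea in 20210247984: a run of operations on one
# engine leaves the other engines idle, so an independent later operation
# is moved in between to overlap the hardware usage.

ops = [("conv1", "pe"), ("conv2", "pe"), ("pool1", "pool"), ("act1", "act")]

def reorder(ops):
    order = list(ops)
    i = 0
    while i + 1 < len(order):
        if order[i][1] == order[i + 1][1]:          # same engine back-to-back
            for j in range(i + 2, len(order)):      # find op on another engine
                if order[j][1] != order[i][1]:      # (assumed independent)
                    order.insert(i + 1, order.pop(j))
                    break
        i += 1
    return order

print(reorder(ops))
# [('conv1', 'pe'), ('pool1', 'pool'), ('conv2', 'pe'), ('act1', 'act')]
```
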
  • Patent number: 11061654
    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
  • Patent number: 11036827
    Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into a GEMM digital signal processing (DSP) engine severalfold, and also eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: June 15, 2021
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao
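
The core idea, sketched below under assumed buffer sizes and APIs: write with linear, DDR-friendly addresses in one pass, then read transposed for the compute engine, so no separate host-side reformatting pass is needed:

```python
# Sketch of the simultaneous buffer-and-reformat idea in 11036827. The
# patent realizes this as hardware logic generated from C++-defined data
# structures; this Python rendering is an illustrative assumption.

def linear_write(buffer, stream):
    for addr, value in enumerate(stream):       # one pass of linear addresses
        buffer[addr] = value                    # row-major: addr = r*cols + c

def transposed_read(buffer, rows, cols):
    for c in range(cols):                       # stream columns to the engine
        for r in range(rows):
            yield buffer[r * cols + c]

rows, cols = 2, 3
buf = [0] * (rows * cols)
linear_write(buf, [1, 2, 3, 4, 5, 6])
print(list(transposed_read(buf, rows, cols)))   # [1, 4, 2, 5, 3, 6]
```
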
  • Publication number: 20210158131
    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. New framework-level operators can be developed faster than the compiler gains the ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
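
A minimal sketch of the three-tier partitioning, assuming illustrative support sets: each operator falls to the most capable tier that supports it, i.e. the acceleration engine, else host-compiled code, else the framework runtime:

```python
# Sketch of the hierarchical partitioning in 20210158131. The support sets
# and operator names are assumptions, not the compiler's actual tables.

ACCEL_OPS = {"conv2d", "matmul", "relu"}
HOST_OPS = {"softmax", "topk"} | ACCEL_OPS

def partition(graph):
    tiers = {"accelerator": [], "host": [], "framework": []}
    for op in graph:
        if op in ACCEL_OPS:
            tiers["accelerator"].append(op)   # compiled for the accelerator
        elif op in HOST_OPS:
            tiers["host"].append(op)          # compiled for the host CPU
        else:
            tiers["framework"].append(op)     # left to the framework runtime
    return tiers

print(partition(["conv2d", "relu", "softmax", "custom_nms"]))
# {'accelerator': ['conv2d', 'relu'], 'host': ['softmax'],
#  'framework': ['custom_nms']}
```
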
  • Patent number: 11016775
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: May 25, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Patent number: 11003429
    Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for the set of dependent operations that depend from an operation. The compiler can then select, from among a set of candidate operations, the operation that has the shortest estimated execution time and whose set of dependent operations has the longest estimated execution time compared to the other sets of dependent operations.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: May 11, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Tobias Joseph Kastulus Edler von Koch, Drazen Borkovic, Taemin Kim
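
One plausible reading of the selection rule is a sort key that maximizes the dependent operations' total time and breaks ties toward the shortest own execution time. The graph, the costs, and that interpretation are my assumptions:

```python
# Sketch of the scheduling heuristic in 11003429 under an assumed reading:
# prefer the candidate whose dependent chain is longest, then the candidate
# whose own estimated execution time is shortest.
from functools import lru_cache

graph = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}   # op -> dependents
cost = {"a": 2, "b": 5, "c": 4, "d": 3}                 # estimated exec times

@lru_cache(maxsize=None)
def dependent_time(op):
    """Total estimated time of all operations that depend on op."""
    return sum(cost[d] + dependent_time(d) for d in graph[op])

ready = ["a", "b"]                     # operations with no unmet inputs
ready.sort(key=lambda op: (-dependent_time(op), cost[op]))
print(ready)   # ['a', 'b']: equal dependent time (7), so shortest op first
```
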
  • Patent number: 10943039
    Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply-accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: March 9, 2021
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
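
Selecting an MSB-to-LSB window from a wide accumulator is equivalent to a right shift plus a mask. A small arithmetic sketch, with the accumulator width and the window bounds as illustrative assumptions:

```python
# Arithmetic sketch of the quantizer in 10943039: the control data names an
# MSB..LSB window into the accumulator register, and the selected bit
# indices become the narrower output. The example values are invented.

def select_bits(acc, msb, lsb):
    """Take bits [msb:lsb] of acc, i.e. shift right by lsb and mask."""
    width = msb - lsb + 1
    return (acc >> lsb) & ((1 << width) - 1)

acc = 0b0000_0110_1101_0010            # a multiply-accumulate result
print(bin(select_bits(acc, msb=10, lsb=3)))   # 8-bit slice -> 0b11011010
```
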
  • Patent number: 10922146
    Abstract: Systems and methods are provided for synchronizing execution of program code for an integrated circuit device having multiple concurrently operating execution engines, where the operation of one execution engine may be dependent on the operation of another execution engine. Data or resource dependencies may be accommodated with a Set instruction to cause a first execution engine to set a register value and a Wait instruction to cause a second execution engine to wait for a condition associated with the register value. Concurrent operation of the execution engines may thus be synchronized.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: February 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
  • Publication number: 20200409717
    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Inventors: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
  • Publication number: 20200410354
    Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
    Type: Application
    Filed: June 27, 2019
    Publication date: December 31, 2020
    Inventors: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
  • Patent number: 10846621
    Abstract: Provided are systems, methods, and an integrated-circuit neural network processor that can execute a fast context switch from one neural network to another. In various implementations, a neural network processor can include a plurality of memory banks storing a first set of weight values for a first neural network. When the neural network processor receives first input data, the neural network processor can compute a first result using the first set of weight values and the first input data. While computing the first result, the neural network processor can store, in the memory banks, a second set of weight values for a second neural network. When the neural network processor receives second input data, the neural network processor can compute a second result using the second set of weight values and the second input data, where the computation occurs upon completion of computation of the first result.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant, Jindrich Zejda, Drazen Borkovic
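
A hedged sketch of the overlap: compute with the weights of network A in one memory bank while network B's weights stream into another, so the switch incurs no idle reload. Threads and the bank layout are illustrative stand-ins:

```python
# Sketch of the fast context switch in 10846621: the second network's
# weights are loaded into spare memory banks while the first network is
# still computing. The threading here only models the overlap in time.
import threading

banks = {0: "weights_A", 1: None}     # on-chip memory banks

def compute(bank, inputs):
    return f"result({banks[bank]}, {inputs})"

loader = threading.Thread(
    target=lambda: banks.__setitem__(1, "weights_B"))  # prefetch network B
loader.start()
print(compute(0, "input_1"))          # first result uses network A
loader.join()                         # bank 1 now holds network B's weights
print(compute(1, "input_2"))          # switch: second result uses network B
```
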
  • Patent number: 10846201
    Abstract: Disclosed herein are techniques for debugging the performance of a neural network. In one embodiment, a neural network processor includes a processing engine, a debugging circuit coupled to the processing engine, and an interface to a memory device. The processing engine is configured to execute instructions for implementing a neural network. The debugging circuit is configurable to determine, for each instruction in a set of instructions, a first timestamp indicating a start time of executing the instruction and a second timestamp indicating an end time of executing the instruction by the processing engine. The interface is configured to save the first timestamp and the second timestamp for each instruction in the set of instructions into the memory device. The debugging circuit can be configured with different debug levels. The neural network processor can include multiple debugging circuits for multiple processing engines that operate in parallel.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Jindrich Zejda, Drazen Borkovic, Thomas A. Volpe
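
The mechanism resembles a profiler that wraps each instruction with start and end timestamps and writes both to a trace buffer; in the patent a hardware debugging circuit saves them to a memory device. A software sketch with assumed names:

```python
# Sketch of the per-instruction tracing in 10846201. The program format and
# the trace list are stand-ins for the instruction stream and memory device.
import time

trace = []                            # stands in for the memory device

def execute_traced(program):
    for name, op in program:
        t0 = time.perf_counter_ns()   # first timestamp: start of execution
        op()
        t1 = time.perf_counter_ns()   # second timestamp: end of execution
        trace.append((name, t0, t1))

execute_traced([("matmul", lambda: sum(range(10**5))),
                ("act", lambda: None)])
for name, t0, t1 in trace:
    print(f"{name}: {(t1 - t0) / 1e3:.1f} us")
```
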
  • Patent number: 10761822
    Abstract: Provided are systems and methods for generating program code for an integrated circuit, where instructions in the code synchronize computation engines that support non-blocking instructions. In various examples, a computing device can receive an input data set including operations to be performed by an integrated circuit device and dependencies between the operations. The input data set can include a non-blocking instruction, and an operation that requires that the non-blocking instruction be completed. The computing device can generate instructions for performing the operation, including a particular instruction to wait for a value to be set in a register of the integrated circuit device. The computing device can further generate program code including the non-blocking instruction and the instructions for performing the operation, wherein the non-blocking instruction is configured to set the value in the register.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: September 1, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
  • Patent number: 10678509
    Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler coupled to the multiply-accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: June 9, 2020
    Assignee: XILINX, INC.
    Inventors: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda
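
The shift-multiply-shift structure is a standard way to approximate multiplication by a fractional constant using only integer hardware. A worked sketch with illustrative constants (the patent specifies the control-data fields, not these values):

```python
# Arithmetic sketch of the scaler in 10678509: a first right shift (the
# MSB..LSB window), an integer multiplier, then a second right shift.

def scale(acc, shift1, multiplier, shift2):
    return ((acc >> shift1) * multiplier) >> shift2

acc = 1_234_567                       # wide accumulator value
# approximate acc * 0.0003: 0.0003 * 2**15 ~ 9.83, rounded to 10
print(scale(acc, shift1=0, multiplier=10, shift2=15))   # -> 376
print(int(acc * 10 / 2**15))                            # float check -> 376
```
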
  • Patent number: 10572409
    Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
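
A software rendering of the dataflow, with deques standing in for the hardware FIFOs and the split-and-merge circuitry; the matrix contents and sizes are illustrative:

```python
# Sketch of the sparse matrix-vector dataflow in 10572409: index-value
# pairs are routed into FIFOs by column, paired with vector elements,
# re-routed into FIFOs by row, and summed per row.
from collections import deque, defaultdict

matrix = [(0, 1, 2.0), (1, 0, 3.0), (1, 2, 4.0)]   # (row, col, value) pairs
vector = [10.0, 20.0, 30.0]

col_fifos = defaultdict(deque)                     # first split: by column
for row, col, val in matrix:
    col_fifos[col].append((row, val))

row_fifos = defaultdict(deque)                     # second split: by row
for col, fifo in col_fifos.items():
    while fifo:
        row, val = fifo.popleft()
        row_fifos[row].append(val * vector[col])   # pair with vector element

result = {row: sum(fifo) for row, fifo in row_fifos.items()}
print(result)   # {0: 40.0, 1: 150.0}, i.e. y = A @ x for the sparse A
```
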
  • Patent number: 10515135
    Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format addresses double data rate (DDR) dynamic random access memory (DRAM) bandwidth limits by allowing both linear DDR addressing and single-cycle loading of data into the compute array, avoiding input/output (I/O) and DDR bottlenecks.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: December 24, 2019
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Aaron Ng, Ashish Sirasao, Yongjun Wu
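
One way to realize such a format is a block-major (tiled) layout in which every tile the compute array consumes occupies one contiguous, linearly addressed DDR range. The 2x2 tile and the tile-aligned dimensions below are illustrative assumptions:

```python
# Sketch of a storage format in the spirit of 10515135: lay the matrix out
# tile by tile so each tile is a single contiguous run of addresses.

def to_block_major(matrix, tile):
    rows, cols = len(matrix), len(matrix[0])
    flat = []
    for tr in range(0, rows, tile):            # walk tiles in row-major order
        for tc in range(0, cols, tile):
            for r in range(tr, tr + tile):     # each tile is stored
                for c in range(tc, tc + tile): # contiguously in memory
                    flat.append(matrix[r][c])
    return flat

m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(to_block_major(m, tile=2))
# [1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16]
```
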