Patents by Inventor Ephrem C. Wu

Ephrem C. Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240069511
    Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the second set of statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.
    Type: Application
    Filed: August 31, 2022
    Publication date: February 29, 2024
    Applicant: Xilinx, Inc.
    Inventors: Jorn Tuyls, Xiao Teng, Sanket Pandit, Rajeev Patwari, Qian Zhou, Ehsan Ghasemi, Ephrem C. Wu, Elliott Delaye, Aaron Ng
  • Publication number: 20230401480
    Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.
    Type: Application
    Filed: June 14, 2022
    Publication date: December 14, 2023
    Applicant: Xilinx, Inc.
    Inventors: Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Ephrem C. Wu, Xiao Teng, Sanket Pandit
  • Patent number: 11531869
    Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. A nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by, several branches in the neural network. Each of these branches can have several layers that have data dependencies which result in a multiply-add array sitting idly. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performed in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started) which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: December 20, 2022
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, David Berman, Xiaoqian Zhang
  • Patent number: 11429850
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 11429851
    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 11132296
    Abstract: The embodiments herein store tabulated values representing a linear or non-linear function in separate memory banks to reduce the size of memory used to store the tabulated values while being able to provide upper and lower values for performing linear interpolation in parallel (e.g., the same cycle). To do so, a linear interpolation system includes a first memory bank that stores the even indexed tabulated values while a second memory bank stores the odd indexed tabulated values. During each clock cycle, the first and second memory banks can output upper and lower values for linear interpolation (although which memory bank outputs the upper value and which outputs the lower value can vary). Using the upper and lower values, the linear interpolation system performs linear interpolation to approximate the value of a non-linear function that is between the upper and lower values.
    Type: Grant
    Filed: July 12, 2018
    Date of Patent: September 28, 2021
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
  • Patent number: 11127442
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: September 21, 2021
    Assignee: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Publication number: 20210174848
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Application
    Filed: December 6, 2019
    Publication date: June 10, 2021
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10673438
    Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier.
    Type: Grant
    Filed: April 2, 2019
    Date of Patent: June 2, 2020
    Assignee: XILINX, INC.
    Inventors: Adam Elkins, Ephrem C. Wu, John M. Thendean, Adnan Pratama, Yashodhara Parulkar, Xiaoqian Zhang
  • Publication number: 20200026989
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Application
    Filed: July 19, 2018
    Publication date: January 23, 2020
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10346093
    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.
    Type: Grant
    Filed: March 16, 2018
    Date of Patent: July 9, 2019
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang, David Berman
  • Patent number: 10141938
    Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.
    Type: Grant
    Filed: September 21, 2016
    Date of Patent: November 27, 2018
    Assignee: XILINX, INC.
    Inventor: Ephrem C. Wu
  • Publication number: 20180083635
    Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.
    Type: Application
    Filed: September 21, 2016
    Publication date: March 22, 2018
    Applicant: Xilinx, Inc.
    Inventor: Ephrem C. Wu
  • Patent number: 9858006
    Abstract: A memory device can be used with a shared routing resource that provides access to the memory device. The memory device can include a random access memory (RAM) circuit that includes a plurality of ports configured to provide access to the RAM circuit by the shared routing resource. A memory partition register circuit can be configured to store a plurality of addresses specifying respective context partitions within the RAM circuit. A plurality of pointer register circuits can each be associated with a corresponding port of the plurality of ports and can be configured to store a respective set of pointers that specify a location in the RAM circuit relative to a respective context partition. Addressing logic can be configured to provide access to the RAM circuit using the respective set of pointers for each port.
    Type: Grant
    Filed: October 13, 2015
    Date of Patent: January 2, 2018
    Assignee: XILINX, INC.
    Inventor: Ephrem C. Wu
  • Patent number: 9779786
    Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of the input tensors by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: October 3, 2017
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Inkeun Cho, Xiaoqian Zhang
  • Patent number: 9666266
    Abstract: In disclosed circuit arrangements, memory cell arrays are addressed by a first portion of an input address, and memory cells within each memory cell array are addressed by a second portion of the input address. A first first-in-first-out (FIFO) buffer is coupled to the memory cell arrays and delays the second portion of each input address to the memory cell arrays for a sleep period. Control circuits respectively coupled to the memory cell arrays include second FIFO buffers and decode the first portion of each input address and generate corresponding states of enable signals. The control circuits store the corresponding states of the enable signals in the second FIFO buffers concurrently with input of the second portion of each input address to the first FIFO buffer. The second FIFO buffers delay output of the corresponding states of the enable signals to the memory cell arrays for the sleep period.
    Type: Grant
    Filed: May 9, 2016
    Date of Patent: May 30, 2017
    Assignee: XILINX, INC.
    Inventors: Hongbin Ji, Ephrem C. Wu, Thomas H. Strader
  • Patent number: 9460007
    Abstract: An apparatus relates generally to time sharing of an arithmetic unit. In such an apparatus, a controller is coupled to provide read pointers and write pointers. A memory block is coupled to receive the read pointers and the write pointers. A selection network is coupled to the memory block and the arithmetic unit. The memory block includes a write-data network, a read-data network, and memory banks.
    Type: Grant
    Filed: September 24, 2014
    Date of Patent: October 4, 2016
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
  • Patent number: 9431095
    Abstract: A memory circuit includes an input stage having N input ports and N output ports, wherein N is an integer greater than one. The memory circuit further includes an N:1 port multiplexer coupled to the N output ports of the input stage and configured to time division multiplex the N output ports to one multiplexed port. The memory circuit also includes a random access memory matrix and a 1:N port multiplexer. The memory circuit is coupled to the multiplexed port. The 1:N port multiplexer is coupled to the random access memory matrix and is configured to de-multiplex signals received from the random access memory matrix into N output ports.
    Type: Grant
    Filed: December 10, 2014
    Date of Patent: August 30, 2016
    Assignee: XILINX, INC.
    Inventor: Ephrem C. Wu
  • Patent number: 9378170
    Abstract: An apparatus relating generally to encoding is disclosed. This apparatus includes a bus interface for communicating information from a first die including the bus interface to a second die. A first portion of a bus associated with the bus interface is associated with data bits. A second portion of the bus associated with the bus interface is associated with encoding bits. The bus interface is configured to encode a data word to provide an encoded word. The encoded word is associated with a combinatorial number system.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: June 28, 2016
    Assignee: XILINX, INC.
    Inventor: Ephrem C. Wu
  • Patent number: 9153292
    Abstract: An integrated circuit device having memory is disclosed. The integrated circuit device comprises programmable resources; programmable interconnect elements coupled to the programmable resources, the programmable interconnect elements enabling a communication of signals with the programmable resources; a plurality of memory blocks; and dedicated interconnect elements coupled to the plurality of memory blocks, the dedicated interconnect elements enabling access to the plurality of memory blocks. A method of implementing memory in an integrated circuit device is also disclosed.
    Type: Grant
    Filed: March 7, 2013
    Date of Patent: October 6, 2015
    Assignee: XILINX, INC.
    Inventor: Ephrem C. Wu
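
Illustrative sketch for Patent number 11132296 (two-bank linear interpolation): the abstract describes storing even-indexed tabulated values in one memory bank and odd-indexed values in another, so that the upper and lower values needed for interpolation can be read in the same cycle. The Python below is a minimal software model of that idea and not the patented circuit. The function names `build_banks` and `interpolate`, the parameters `x_min` and `step`, and the clamping at the table edges are assumptions made for illustration only.

```python
import math


def build_banks(table):
    """Split a table into an even-indexed bank and an odd-indexed bank."""
    return table[0::2], table[1::2]


def interpolate(even_bank, odd_bank, x, x_min, step):
    """Approximate f(x) by linear interpolation, reading each bank once.

    x_min and step describe a uniformly spaced table (an assumption here).
    """
    n = len(even_bank) + len(odd_bank)
    pos = (x - x_min) / step
    # Lower table index, clamped so that index i + 1 always exists.
    i = min(max(int(math.floor(pos)), 0), n - 2)
    frac = pos - i
    # Indices i and i + 1 have opposite parity, so they sit in different
    # banks and could be fetched in parallel. Which bank supplies the
    # lower value depends on the parity of i.
    if i % 2 == 0:
        lower = even_bank[i // 2]
        upper = odd_bank[i // 2]
    else:
        lower = odd_bank[i // 2]
        upper = even_bank[(i + 1) // 2]
    return lower + frac * (upper - lower)


# Example: f(x) = x**2 tabulated at x = 0, 1, 2, 3, 4.
even_bank, odd_bank = build_banks([0.0, 1.0, 4.0, 9.0, 16.0])
approx = interpolate(even_bank, odd_bank, 1.5, x_min=0.0, step=1.0)
```

In this model each lookup touches exactly one entry per bank, which mirrors the abstract's point that the role of each bank (upper or lower value) can vary from access to access.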