Patents by Inventor Ephrem C. Wu
Ephrem C. Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240069511Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the set of second statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.Type: ApplicationFiled: August 31, 2022Publication date: February 29, 2024Applicant: Xilinx, Inc.Inventors: Jorn Tuyls, Xiao Teng, Sanket Pandit, Rajeev Patwari, Qian Zhou, Ehsan Ghasemi, Ephrem C. Wu, Elliott Delaye, Aaron Ng
-
Publication number: 20230401480Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.Type: ApplicationFiled: June 14, 2022Publication date: December 14, 2023Applicant: Xilinx, Inc.Inventors: Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Ephrem C. Wu, Xiao Teng, Sanket Pandit
-
Patent number: 11531869Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. As mentioned above, a nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by several branches in the neural network. Each of these branches can have several layers that have data dependencies which result in a multiply-add array sitting idly. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performing in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started) which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.Type: GrantFiled: March 28, 2019Date of Patent: December 20, 2022Assignee: XILINX, INC.Inventors: Ephrem C. Wu, David Berman, Xiaoqian Zhang
-
Patent number: 11429850Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.Type: GrantFiled: July 19, 2018Date of Patent: August 30, 2022Assignee: XILINX, INC.Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
-
Patent number: 11429851Abstract: Disclosed circuits and methods involve a first register configured to store of a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.Type: GrantFiled: December 13, 2018Date of Patent: August 30, 2022Assignee: XILINX, INC.Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
-
Patent number: 11132296Abstract: The embodiments herein store tabulated values representing a linear or non-linear function in separate memory banks to reduce the size of memory used to store the tabulated values while being able to provide upper and lower values for performing linear interpolation in parallel (e.g., the same cycle). To do so, a linear interpolation system includes a first memory bank that stores the even indexed tabulated values while a second memory bank stores the odd indexed tabulated values. During each clock cycle, the first and second memory banks can output upper and lower values for linear interpolation (although which memory bank outputs the upper value and which outputs the lower value can vary). Using the upper and lower values, the linear interpolation system performs linear interpolation to approximate the value of a non-linear function that is between the upper and lower values.Type: GrantFiled: July 12, 2018Date of Patent: September 28, 2021Assignee: XILINX, INC.Inventors: Ephrem C. Wu, Xiaoqian Zhang
-
Patent number: 11127442Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.Type: GrantFiled: December 6, 2019Date of Patent: September 21, 2021Assignee: Xilinx, Inc.Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
-
Publication number: 20210174848Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.Type: ApplicationFiled: December 6, 2019Publication date: June 10, 2021Applicant: Xilinx, Inc.Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
-
Patent number: 10673438Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier.Type: GrantFiled: April 2, 2019Date of Patent: June 2, 2020Assignee: XILINX, INC.Inventors: Adam Elkins, Ephrem C. Wu, John M. Thendean, Adnan Pratama, Yashodhara Parulkar, Xiaoqian Zhang
-
Publication number: 20200026989Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.Type: ApplicationFiled: July 19, 2018Publication date: January 23, 2020Applicant: Xilinx, Inc.Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
-
Patent number: 10346093Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple of columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.Type: GrantFiled: March 16, 2018Date of Patent: July 9, 2019Assignee: XILINX, INC.Inventors: Ephrem C. Wu, Xiaoqian Zhang, David Berman
-
Patent number: 10141938Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.Type: GrantFiled: September 21, 2016Date of Patent: November 27, 2018Assignee: XILINX, INC.Inventor: Ephrem C. Wu
-
Publication number: 20180083635Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.Type: ApplicationFiled: September 21, 2016Publication date: March 22, 2018Applicant: Xilinx, Inc.Inventor: Ephrem C. Wu
-
Patent number: 9858006Abstract: A memory device can be used with a shared routing resource that provides access to the memory device. The memory device can include a random access memory (RAM) circuit that includes a plurality of ports configured to provide access to the RAM circuit by the shared routing resource. A memory partition register circuit can be configured to store a plurality of addresses specifying respective context partitions within the RAM circuit. A plurality of pointer register circuits that can each be associated with a corresponding port of the plurality of ports and can be configured to store a respective set of pointers that specify a location in the RAM circuit relative to a respective context partition. Addressing logic that can be configured to provide access to the RAM circuit using the respective set of pointers for each port.Type: GrantFiled: October 13, 2015Date of Patent: January 2, 2018Assignee: XILINX, INC.Inventor: Ephrem C. Wu
-
Patent number: 9779786Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of an input tensors by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.Type: GrantFiled: October 26, 2016Date of Patent: October 3, 2017Assignee: XILINX, INC.Inventors: Ephrem C. Wu, Inkeun Cho, Xiaoqian Zhang
-
Patent number: 9666266Abstract: In disclosed circuit arrangements, memory cell arrays are addressed by a first portion of an input address, and memory cells within each memory cell array are addressed by a second portion of the input address. A first first-in-first-out (FIFO) buffer is coupled to the memory cell arrays and delays the second portion of each input address to the memory cell arrays for a sleep period. Control circuits respectively coupled to the memory cell arrays include second FIFO buffers and decode the first portion of each input address and generate corresponding states of enable signals. The control circuits store the corresponding states of the enable signals in the second FIFO buffers concurrently with input of the second portion of each input address to the first FIFO buffer. The second FIFO buffers delay output of the corresponding states of the enable signals to the memory cell arrays for the sleep period.Type: GrantFiled: May 9, 2016Date of Patent: May 30, 2017Assignee: XILINX, INC.Inventors: Hongbin Ji, Ephrem C. Wu, Thomas H. Strader
-
Patent number: 9460007Abstract: An apparatus relates generally to time sharing of an arithmetic unit. In such an apparatus, a controller is coupled to provide read pointers and write pointers. A memory block is coupled to receive the read pointers and the write pointers. A selection network is coupled to the memory block and the arithmetic unit. The memory block includes a write-data network, a read-data network, and memory banks.Type: GrantFiled: September 24, 2014Date of Patent: October 4, 2016Assignee: XILINX, INC.Inventors: Ephrem C. Wu, Xiaoqian Zhang
-
Patent number: 9431095Abstract: A memory circuit includes an input stage having N input ports and N output ports, wherein N is an integer greater than one. The memory circuit further includes an N:1 port multiplexer coupled to the N output ports of the input stage and configured to time division multiplex the N output ports to one multiplexed port. The memory circuit also includes a random access memory matrix and a 1:N port multiplexer. The memory circuit is coupled to the multiplexed port. The 1:N port multiplexer is coupled to the random access memory matrix and is configured to de-multiplex signals received from the random access memory matrix into N output ports.Type: GrantFiled: December 10, 2014Date of Patent: August 30, 2016Assignee: XILINX, INC.Inventor: Ephrem C. Wu
-
Patent number: 9378170Abstract: An apparatus relating generally to encoding is disclosed. This apparatus includes a bus interface for communicating information from a first die including the bus interface to a second die. A first portion of a bus associated with the bus interface is associated with data bits. A second portion of the bus associated with the bus interface is associated with encoding bits. The bus interface is configured to encode a data word to provide an encoded word. The encoded word is associated with a combinatorial number system.Type: GrantFiled: March 14, 2013Date of Patent: June 28, 2016Assignee: XILINX, INC.Inventor: Ephrem C. Wu
-
Patent number: 9153292Abstract: An integrated circuit device having memory is disclosed. The integrated circuit device comprises programmable resources; programmable interconnect elements coupled to the programmable resources, the programmable interconnect elements enabling a communication of signals with the programmable resources; a plurality of memory blocks; and dedicated interconnect elements coupled to the plurality of memory blocks, the dedicated interconnect elements enabling access to the plurality of memory blocks. A method of implementing memory in an integrated circuit device is also disclosed.Type: GrantFiled: March 7, 2013Date of Patent: October 6, 2015Assignee: XILINX, INC.Inventor: Ephrem C. Wu