Patents by Inventor Xiaoqian Zhang

Xiaoqian Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240086151
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Application
    Filed: April 3, 2023
    Publication date: March 14, 2024
    Inventors: Xiaoqian ZHANG, Zhibin XIAO, Changxu ZHANG, Renjie CHEN
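A behavioral sketch of the hybrid MAC-lane idea in the abstract above (a toy model, not the patented circuit; the function name and mode encoding are assumptions): the same multipliers serve either a convolution dot product or an elementwise vector operation, selected by a control signal.

```python
# Toy model of a hybrid MAC lane: one set of multipliers, two modes
# selected by a control signal, as described in the abstract above.

def mac_lane(ctrl, weights, activations):
    """ctrl == 'conv' accumulates products; 'vector' keeps them elementwise."""
    products = [w * a for w, a in zip(weights, activations)]
    if ctrl == "conv":
        return sum(products)   # accumulate into a single partial sum
    return products            # vector mode: per-element results

print(mac_lane("conv", [1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6 = 32
print(mac_lane("vector", [1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```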
  • Patent number: 11868307
    Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include a plurality of cores and a central processing unit (CPU) respectively associated with DDRs, a data exchange interface connecting a host device to the accelerator, and a three-layer NoC architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDRs; a middle-layer NoC configured to transfer data among the plurality of cores; and an inner-layer NoC within each core and including a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
    Type: Grant
    Filed: May 15, 2023
    Date of Patent: January 9, 2024
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao
  • Publication number: 20230407928
    Abstract: A brake dust filtering apparatus (100), comprising: a base (10) which is provided with multiple first through holes (101); a scribing sheet (20) which is provided with multiple second through holes (201) and is slidably connected to the base (10); a filter screen (30) which is disposed on the side of the base (10) close to a brake caliper (200) and covers the first through holes (101); and a drive member (40) which is used for driving the scribing sheet (20) to slide with respect to the base (10), so as to form an off state in which each of the first through holes (101) and each of the second through holes (201) are staggered and an on state in which the multiple first through holes (101) at least partially overlap the multiple second through holes (201). Further disclosed is a vehicle.
    Type: Application
    Filed: November 23, 2020
    Publication date: December 21, 2023
    Applicant: Wuhan Lotus Cars Co., Ltd.
    Inventors: Bowen ZHENG, Xiaoqian ZHANG
  • Publication number: 20230259758
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving efficiency of neural network computations using adaptive tensor compute kernels. First, the adaptive tensor compute kernels may adjust their shapes according to the different shapes of the input/weight tensors for distributing the weights and input values to a processing element (PE) array for parallel processing. Depending on the shape of the tensor compute kernels, additional inter-cluster or intra-cluster adders may be needed to perform convolution computations. Second, the adaptive tensor compute kernels may support two different tensor operation modes, i.e., 1×1 tensor operation mode and 3×3 tensor operation mode, to cover all types of convolution computations. Third, the underlying PE array may configure each PE-internal buffer (e.g., a register file) differently to support different compression ratios and sparsity granularities of sparse neural networks.
    Type: Application
    Filed: February 16, 2022
    Publication date: August 17, 2023
    Inventors: Xiaoqian ZHANG, Enxu YAN, Zhibin XIAO
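A toy single-channel demonstration of why two tensor operation modes can cover general convolutions (an illustrative decomposition, not the patented kernel design): a 3×3 convolution equals nine shifted 1×1 multiply-accumulate passes, so a datapath offering only 1×1 and 3×3 modes can be composed to handle other kernel shapes.

```python
# Direct 3x3 convolution (valid padding) on a plain-list image.
def conv3x3(img, k):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            out[y][x] = sum(img[y + dy][x + dx] * k[dy][dx]
                            for dy in range(3) for dx in range(3))
    return out

# Same result computed as nine accumulated 1x1 passes over shifted windows.
def conv3x3_via_1x1(img, k):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for dy in range(3):
        for dx in range(3):
            for y in range(h - 2):
                for x in range(w - 2):
                    out[y][x] += img[y + dy][x + dx] * k[dy][dx]
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
k = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
assert conv3x3(img, k) == conv3x3_via_1x1(img, k)
```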
  • Patent number: 11726746
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Grant
    Filed: September 14, 2022
    Date of Patent: August 15, 2023
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao, Changxu Zhang, Renjie Chen
  • Patent number: 11531869
    Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. As mentioned above, a nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by, several branches in the neural network. Each of these branches can have several layers that have data dependencies which result in a multiply-add array sitting idly. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performed in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started), which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: December 20, 2022
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, David Berman, Xiaoqian Zhang
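A scheduling sketch of the pre-pooling benefit described above (the cycle counts are made-up illustration, not figures from the patent): with a dedicated pre-pooler, pooling runs concurrently with the convolution occupying the multiply-add array, so the array's idle wait shrinks or disappears.

```python
# Without a pre-pooler, the multiply-add array finishes the convolution
# and then waits for pooling before starting the next operation.
def serial_cycles(conv, pool):
    return conv + pool

# With a dedicated pre-pooler, pooling overlaps the convolution; the total
# latency is bounded by the longer of the two operations.
def overlapped_cycles(conv, pool):
    return max(conv, pool)

print(serial_cycles(100, 30))      # 130 cycles
print(overlapped_cycles(100, 30))  # 100 cycles: pooling is fully hidden
```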
  • Patent number: 11429851
    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
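A control-flow sketch of the two-register scheme above (the instruction encoding here is hypothetical): two registers hold two in-flight CNN instructions, and control circuitry selects which one feeds the shared address-generation circuits, so loading the next instruction can overlap execution of the current one.

```python
# Toy address generator: an "instruction" is (base, stride, count).
def gen_addresses(instr):
    base, stride, count = instr
    return [base + i * stride for i in range(count)]

reg0 = (0, 4, 3)     # first CNN instruction
reg1 = (100, 8, 2)   # second instruction, loaded while reg0 executes

for select in (0, 1):                 # control circuitry alternates registers
    instr = reg0 if select == 0 else reg1
    print(gen_addresses(instr))       # [0, 4, 8] then [100, 108]
```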
  • Patent number: 11429850
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
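A rate sketch of the kernel-cache idea above (toy shapes and numbers, not the patented circuit): because MAC cycles are faster than the IFM arrival rate, the same set of input data elements can be combined with several cached kernels, one OFM depth index per consecutive MAC cycle, before the next input set arrives.

```python
R = 2                                # MAC cycles per IFM data arrival
ifm = [1, 2, 3]                      # one set of IFM data elements
kernels = [[1, 0, 1],                # cached kernel for OFM depth 0
           [2, 1, 0]]                # cached kernel for OFM depth 1

ofm_partials = []
for depth in range(R):               # consecutive MAC cycles, same input set
    k = kernels[depth]
    ofm_partials.append(sum(a * w for a, w in zip(ifm, k)))

print(ofm_partials)                  # [1*1+3*1, 1*2+2*1] -> [4, 4]
```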
  • Patent number: 11132296
    Abstract: The embodiments herein store tabulated values representing a linear or non-linear function in separate memory banks to reduce the size of memory used to store the tabulated values while being able to provide upper and lower values for performing linear interpolation in parallel (e.g., the same cycle). To do so, a linear interpolation system includes a first memory bank that stores the even indexed tabulated values while a second memory bank stores the odd indexed tabulated values. During each clock cycle, the first and second memory banks can output upper and lower values for linear interpolation (although which memory bank outputs the upper value and which outputs the lower value can vary). Using the upper and lower values, the linear interpolation system performs linear interpolation to approximate the value of a non-linear function that is between the upper and lower values.
    Type: Grant
    Filed: July 12, 2018
    Date of Patent: September 28, 2021
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
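A sketch of the even/odd banking idea above (the tabulated values are made up): because consecutive indices always land in different banks, both interpolation endpoints table[i] and table[i+1] can be read in the same cycle; which bank supplies the upper versus the lower value depends on the parity of i.

```python
table = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # tabulated f(i) = i*i (example)
bank_even, bank_odd = table[0::2], table[1::2]

def lerp_lookup(x):
    i, frac = int(x), x - int(x)
    if i % 2 == 0:    # lower endpoint in even bank, upper in odd bank
        lower, upper = bank_even[i // 2], bank_odd[i // 2]
    else:             # lower endpoint in odd bank, upper in even bank
        lower, upper = bank_odd[i // 2], bank_even[i // 2 + 1]
    return lower + frac * (upper - lower)

print(lerp_lookup(2.5))    # halfway between table[2]=4.0 and table[3]=9.0 -> 6.5
print(lerp_lookup(3.25))   # 9.0 + 0.25*(16.0 - 9.0) = 10.75
```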
  • Patent number: 11127442
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: September 21, 2021
    Assignee: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
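A control sketch of the broadcast read-enable described above (a behavioral model with hypothetical names, not the patented controller): the controller stalls until every remote buffer holds data, then issues one broadcast read so all dies stream into the compute array in lockstep.

```python
from collections import deque

# One remote buffer per die, modeled as a FIFO.
buffers = [deque([10]), deque([20]), deque([30])]

def try_broadcast_read(bufs):
    if all(len(b) > 0 for b in bufs):        # every buffer has data?
        return [b.popleft() for b in bufs]   # one broadcast read-enable
    return None                              # otherwise stall

print(try_broadcast_read(buffers))   # [10, 20, 30]
print(try_broadcast_read(buffers))   # None: buffers drained, controller stalls
```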
  • Publication number: 20210174848
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Application
    Filed: December 6, 2019
    Publication date: June 10, 2021
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10673438
    Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier.
    Type: Grant
    Filed: April 2, 2019
    Date of Patent: June 2, 2020
    Assignee: XILINX, INC.
    Inventors: Adam Elkins, Ephrem C. Wu, John M. Thendean, Adnan Pratama, Yashodhara Parulkar, Xiaoqian Zhang
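A dataflow sketch of the DSP slice stages named above. The exact operand routing is an assumption modeled on common FPGA DSP slices (pre-adder feeding the multiplier, ALU accumulating), so this illustrates the stage ordering, not the patented netlist.

```python
def dsp_slice(a, d, b, c, use_preadder=True):
    """Behavioral model: pre-adder -> multiplier -> ALU (add shown;
    the ALU could instead apply a logical operation)."""
    pre = a + d if use_preadder else a   # pre-adder stage (bypassable)
    product = pre * b                    # multiplier stage
    return product + c                   # ALU stage

print(dsp_slice(3, 5, 2, 7))                       # (3+5)*2 + 7 = 23
print(dsp_slice(3, 5, 2, 7, use_preadder=False))   # 3*2 + 7 = 13
```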
  • Publication number: 20200026989
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Application
    Filed: July 19, 2018
    Publication date: January 23, 2020
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10346093
    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.
    Type: Grant
    Filed: March 16, 2018
    Date of Patent: July 9, 2019
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang, David Berman
  • Patent number: 9896474
    Abstract: (3?,9?,10?,13?,14?,17?,20S,22E)-Ergosta-5,7,22-trien-3-ol. A method for preparing the same by drying a fruiting body of Cordyceps militaris, grinding the fruiting body to yield ultrafine powders, boiling and extracting the ultrafine powders, and centrifuging and collecting a precipitate. A method for treating a tumor by administering (3?,9?,10?,13?,14?,17?,20S,22E)-ergosta-5,7,22-trien-3-ol to a patient in need of tumor treatment.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: February 20, 2018
    Assignee: ZHENGYUANTANG (TIANJIN BINHAI NEW AREA) BIOTECH CO., LTD.
    Inventors: Yaozhou Zhang, Jiachen Sun, Lei Jiang, Jian Zhang, Yujiao Chen, Xiaoqian Zhang, Simiao Du, Pengai Gu, Jinsong Cui
  • Patent number: 9779786
    Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of an input tensor by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: October 3, 2017
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Inkeun Cho, Xiaoqian Zhang
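A behavioral sketch of the mask-reuse idea above (toy shapes; the data is illustrative): each processing element applies several locally stored masks to one received input portion, producing one partial result per mask for its column data path, so input bandwidth is amortized across multiple tensor operations.

```python
input_slice = [1, 2, 3, 4]                  # one portion of an input tensor
local_masks = [[1, 0, 1, 0],                # masks held in the PE's local memory
               [0, 1, 0, 1],
               [1, 1, 1, 1]]

# One tensor operation per mask, all reusing the same received input.
column_outputs = [sum(x * m for x, m in zip(input_slice, mask))
                  for mask in local_masks]

print(column_outputs)   # [1+3, 2+4, 1+2+3+4] -> [4, 6, 10]
```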
  • Publication number: 20160340383
    Abstract: (3?,9?,10?,13?,14?,17?,20S,22E)-Ergosta-5,7,22-trien-3-ol. A method for preparing the same by drying a fruiting body of Cordyceps militaris, grinding the fruiting body to yield ultrafine powders, boiling and extracting the ultrafine powders, and centrifuging and collecting a precipitate. A method for treating a tumor by administering (3?,9?,10?,13?,14?,17?,20S,22E)-ergosta-5,7,22-trien-3-ol to a patient in need of tumor treatment.
    Type: Application
    Filed: August 3, 2016
    Publication date: November 24, 2016
    Inventors: Yaozhou ZHANG, Jiachen SUN, Lei JIANG, Jian ZHANG, Yujiao CHEN, Xiaoqian ZHANG, Simiao DU, Pengai GU, Jinsong CUI
  • Patent number: 9460007
    Abstract: An apparatus relates generally to time sharing of an arithmetic unit. In such an apparatus, a controller is coupled to provide read pointers and write pointers. A memory block is coupled to receive the read pointers and the write pointers. A selection network is coupled to the memory block and the arithmetic unit. The memory block includes a write-data network, a read-data network, and memory banks.
    Type: Grant
    Filed: September 24, 2014
    Date of Patent: October 4, 2016
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
  • Patent number: 9355696
    Abstract: In an example, a control device includes a data path, a clock path, a multiplexing circuit, and a calibration unit. The data path comprises a data delay unit coupled to a data input of a sampling circuit. The clock path comprises a clock delay unit coupled to a clock input of the sampling circuit. The multiplexing circuit selectively couples a reference clock or a data bus to an input of the data delay unit, and selectively couples the reference clock or a source clock to an input of the clock delay unit. The calibration unit is coupled to a data output of the sampling circuit. The calibration unit is operable to adjust delay values of the data delay unit and the clock delay unit based on the data output of the sampling circuit to establish and maintain a relative delay between the data path and the clock path.
    Type: Grant
    Filed: November 6, 2014
    Date of Patent: May 31, 2016
    Assignee: XILINX, INC.
    Inventors: Terence J. Magee, Xiaoqian Zhang
  • Patent number: D1021901
    Type: Grant
    Filed: February 21, 2022
    Date of Patent: April 9, 2024
    Inventor: Xiaoqian Zhang