Patents by Inventor Xiaoqian Zhang

Xiaoqian Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240086151
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Application
    Filed: April 3, 2023
    Publication date: March 14, 2024
    Inventors: Xiaoqian ZHANG, Zhibin XIAO, Changxu ZHANG, Renjie CHEN
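A behavioral sketch of the hybrid MAC-lane idea in the abstract above (a toy model, not the patented circuit; the function name and mode encoding are assumptions): the same multipliers serve either a convolution dot product or an elementwise vector operation, selected by a control signal.

```python
# Toy model of a hybrid MAC lane: one set of multipliers, two modes
# selected by a control signal, as described in the abstract above.

def mac_lane(ctrl, weights, activations):
    """ctrl == 'conv' accumulates products; 'vector' keeps them elementwise."""
    products = [w * a for w, a in zip(weights, activations)]
    if ctrl == "conv":
        return sum(products)   # accumulate into a single partial sum
    return products            # vector mode: per-element results

print(mac_lane("conv", [1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6 = 32
print(mac_lane("vector", [1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```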
  • Patent number: 11868307
    Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include a plurality of cores and a central processing unit (CPU) respectively associated with DDRs, a data exchange interface connecting a host device to the accelerator, and a three-layer NoC architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDRs; a middle-layer NoC configured to transfer data among the plurality of cores; and an inner-layer NoC within each core and including a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
    Type: Grant
    Filed: May 15, 2023
    Date of Patent: January 9, 2024
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao
  • Publication number: 20230407928
    Abstract: A brake dust filtering apparatus (100), comprising: a base (10) which is provided with multiple first through holes (101); a scribing sheet (20) which is provided with multiple second through holes (201) and is slidably connected to the base (10); a filter screen (30) which is disposed on the side of the base (10) close to a brake caliper (200) and covers the first through holes (101); and a drive member (40) which is used for driving the scribing sheet (20) to slide with respect to the base (10), so as to form an off state in which each of the first through holes (101) and each of the second through holes (201) are staggered and an on state in which the multiple first through holes (101) at least partially overlap the multiple second through holes (201). Further disclosed is a vehicle.
    Type: Application
    Filed: November 23, 2020
    Publication date: December 21, 2023
    Applicant: Wuhan Lotus Cars Co., Ltd.
    Inventors: Bowen ZHENG, Xiaoqian ZHANG
  • Publication number: 20230259758
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving efficiency of neural network computations using adaptive tensor compute kernels. First, the adaptive tensor compute kernels may adjust their shapes according to the different shapes of the input/weight tensors for distributing the weights and input values to a processing element (PE) array for parallel processing. Depending on the shape of the tensor compute kernels, additional inter-cluster or intra-cluster adders may be needed to perform convolution computations. Second, the adaptive tensor compute kernels may support two different tensor operation modes, i.e., 1×1 tensor operation mode and 3×3 tensor operation mode, to cover all types of convolution computations. Third, the underlying PE array may configure each PE-internal buffer (e.g., a register file) differently to support different compression ratios and sparsity granularities of sparse neural networks.
    Type: Application
    Filed: February 16, 2022
    Publication date: August 17, 2023
    Inventors: Xiaoqian ZHANG, Enxu YAN, Zhibin XIAO
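A toy single-channel demonstration of why two tensor operation modes can cover general convolutions (an illustrative decomposition, not the patented kernel design): a 3×3 convolution equals nine shifted 1×1 multiply-accumulate passes, so a datapath offering only 1×1 and 3×3 modes can be composed to handle other kernel shapes.

```python
# Direct 3x3 convolution (valid padding) on a plain-list image.
def conv3x3(img, k):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            out[y][x] = sum(img[y + dy][x + dx] * k[dy][dx]
                            for dy in range(3) for dx in range(3))
    return out

# Same result computed as nine accumulated 1x1 passes over shifted windows.
def conv3x3_via_1x1(img, k):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for dy in range(3):
        for dx in range(3):
            for y in range(h - 2):
                for x in range(w - 2):
                    out[y][x] += img[y + dy][x + dx] * k[dy][dx]
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
k = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
assert conv3x3(img, k) == conv3x3_via_1x1(img, k)
```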
  • Patent number: 11726746
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Grant
    Filed: September 14, 2022
    Date of Patent: August 15, 2023
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao, Changxu Zhang, Renjie Chen
  • Patent number: 11531869
    Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. As mentioned above, a nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by, several branches in the neural network. Each of these branches can have several layers that have data dependencies which result in a multiply-add array sitting idly. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performed in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started), which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: December 20, 2022
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, David Berman, Xiaoqian Zhang
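A scheduling sketch of the pre-pooling benefit described above (the cycle counts are made-up illustration, not figures from the patent): with a dedicated pre-pooler, pooling runs concurrently with the convolution occupying the multiply-add array, so the array's idle wait shrinks or disappears.

```python
# Without a pre-pooler, the multiply-add array finishes the convolution
# and then waits for pooling before starting the next operation.
def serial_cycles(conv, pool):
    return conv + pool

# With a dedicated pre-pooler, pooling overlaps the convolution; the total
# latency is bounded by the longer of the two operations.
def overlapped_cycles(conv, pool):
    return max(conv, pool)

print(serial_cycles(100, 30))      # 130 cycles
print(overlapped_cycles(100, 30))  # 100 cycles: pooling is fully hidden
```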
  • Patent number: 11429851
    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
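A control-flow sketch of the two-register scheme above (the instruction encoding here is hypothetical): two registers hold two in-flight CNN instructions, and control circuitry selects which one feeds the shared address-generation circuits, so loading the next instruction can overlap execution of the current one.

```python
# Toy address generator: an "instruction" is (base, stride, count).
def gen_addresses(instr):
    base, stride, count = instr
    return [base + i * stride for i in range(count)]

reg0 = (0, 4, 3)     # first CNN instruction
reg1 = (100, 8, 2)   # second instruction, loaded while reg0 executes

for select in (0, 1):                 # control circuitry alternates registers
    instr = reg0 if select == 0 else reg1
    print(gen_addresses(instr))       # [0, 4, 8] then [100, 108]
```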
  • Patent number: 11429850
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
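A rate sketch of the kernel-cache idea above (toy shapes and numbers, not the patented circuit): because MAC cycles are faster than the IFM arrival rate, the same set of input data elements can be combined with several cached kernels, one OFM depth index per consecutive MAC cycle, before the next input set arrives.

```python
R = 2                                # MAC cycles per IFM data arrival
ifm = [1, 2, 3]                      # one set of IFM data elements
kernels = [[1, 0, 1],                # cached kernel for OFM depth 0
           [2, 1, 0]]                # cached kernel for OFM depth 1

ofm_partials = []
for depth in range(R):               # consecutive MAC cycles, same input set
    k = kernels[depth]
    ofm_partials.append(sum(a * w for a, w in zip(ifm, k)))

print(ofm_partials)                  # [1*1+3*1, 1*2+2*1] -> [4, 4]
```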
  • Patent number: 11132296
    Abstract: The embodiments herein store tabulated values representing a linear or non-linear function in separate memory banks to reduce the size of memory used to store the tabulated values while being able to provide upper and lower values for performing linear interpolation in parallel (e.g., the same cycle). To do so, a linear interpolation system includes a first memory bank that stores the even indexed tabulated values while a second memory bank stores the odd indexed tabulated values. During each clock cycle, the first and second memory banks can output upper and lower values for linear interpolation (although which memory bank outputs the upper value and which outputs the lower value can vary). Using the upper and lower values, the linear interpolation system performs linear interpolation to approximate the value of a non-linear function that is between the upper and lower values.
    Type: Grant
    Filed: July 12, 2018
    Date of Patent: September 28, 2021
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
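A sketch of the even/odd banking idea above (the tabulated values are made up): because consecutive indices always land in different banks, both interpolation endpoints table[i] and table[i+1] can be read in the same cycle; which bank supplies the upper versus the lower value depends on the parity of i.

```python
table = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # tabulated f(i) = i*i (example)
bank_even, bank_odd = table[0::2], table[1::2]

def lerp_lookup(x):
    i, frac = int(x), x - int(x)
    if i % 2 == 0:    # lower endpoint in even bank, upper in odd bank
        lower, upper = bank_even[i // 2], bank_odd[i // 2]
    else:             # lower endpoint in odd bank, upper in even bank
        lower, upper = bank_odd[i // 2], bank_even[i // 2 + 1]
    return lower + frac * (upper - lower)

print(lerp_lookup(2.5))    # halfway between table[2]=4.0 and table[3]=9.0 -> 6.5
print(lerp_lookup(3.25))   # 9.0 + 0.25*(16.0 - 9.0) = 10.75
```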
  • Patent number: 11127442
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: September 21, 2021
    Assignee: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
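A control sketch of the broadcast read-enable described above (a behavioral model with hypothetical names, not the patented controller): the controller stalls until every remote buffer holds data, then issues one broadcast read so all dies stream into the compute array in lockstep.

```python
from collections import deque

# One remote buffer per die, modeled as a FIFO.
buffers = [deque([10]), deque([20]), deque([30])]

def try_broadcast_read(bufs):
    if all(len(b) > 0 for b in bufs):        # every buffer has data?
        return [b.popleft() for b in bufs]   # one broadcast read-enable
    return None                              # otherwise stall

print(try_broadcast_read(buffers))   # [10, 20, 30]
print(try_broadcast_read(buffers))   # None: buffers drained, controller stalls
```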
  • Publication number: 20210174848
    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
    Type: Application
    Filed: December 6, 2019
    Publication date: June 10, 2021
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10673438
    Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier.
    Type: Grant
    Filed: April 2, 2019
    Date of Patent: June 2, 2020
    Assignee: XILINX, INC.
    Inventors: Adam Elkins, Ephrem C. Wu, John M. Thendean, Adnan Pratama, Yashodhara Parulkar, Xiaoqian Zhang
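A dataflow sketch of the DSP slice stages named above. The exact operand routing is an assumption modeled on common FPGA DSP slices (pre-adder feeding the multiplier, ALU accumulating), so this illustrates the stage ordering, not the patented netlist.

```python
def dsp_slice(a, d, b, c, use_preadder=True):
    """Behavioral model: pre-adder -> multiplier -> ALU (add shown;
    the ALU could instead apply a logical operation)."""
    pre = a + d if use_preadder else a   # pre-adder stage (bypassable)
    product = pre * b                    # multiplier stage
    return product + c                   # ALU stage

print(dsp_slice(3, 5, 2, 7))                       # (3+5)*2 + 7 = 23
print(dsp_slice(3, 5, 2, 7, use_preadder=False))   # 3*2 + 7 = 13
```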
  • Publication number: 20200026989
    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an IFM at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first OFM depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Application
    Filed: July 19, 2018
    Publication date: January 23, 2020
    Applicant: Xilinx, Inc.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
  • Patent number: 10346093
    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.
    Type: Grant
    Filed: March 16, 2018
    Date of Patent: July 9, 2019
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang, David Berman
  • Patent number: 9896474
    Abstract: (3?,9?,10?,13?,14?,17?,20S,22E)-Ergosta-5,7,22-trien-3-ol. A method for preparing the same by drying a fruiting body of Cordyceps militaris, grinding the fruiting body to yield ultrafine powders, boiling and extracting the ultrafine powders, and centrifuging and collecting a precipitate. A method for treating a tumor by administering (3?,9?,10?,13?,14?,17?,20S,22E)-ergosta-5,7,22-trien-3-ol to a patient in need of tumor treatment.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: February 20, 2018
    Assignee: ZHENGYUANTANG (TIANJIN BINHAI NEW AREA) BIOTECH CO., LTD.
    Inventors: Yaozhou Zhang, Jiachen Sun, Lei Jiang, Jian Zhang, Yujiao Chen, Xiaoqian Zhang, Simiao Du, Pengai Gu, Jinsong Cui
  • Patent number: 9779786
    Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of an input tensor by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: October 3, 2017
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Inkeun Cho, Xiaoqian Zhang
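A behavioral sketch of the mask-reuse idea above (toy shapes; the data is illustrative): each processing element applies several locally stored masks to one received input portion, producing one partial result per mask for its column data path, so input bandwidth is amortized across multiple tensor operations.

```python
input_slice = [1, 2, 3, 4]                  # one portion of an input tensor
local_masks = [[1, 0, 1, 0],                # masks held in the PE's local memory
               [0, 1, 0, 1],
               [1, 1, 1, 1]]

# One tensor operation per mask, all reusing the same received input.
column_outputs = [sum(x * m for x, m in zip(input_slice, mask))
                  for mask in local_masks]

print(column_outputs)   # [1+3, 2+4, 1+2+3+4] -> [4, 6, 10]
```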
  • Publication number: 20160340383
    Abstract: (3?,9?,10?,13?,14?,17?,20S,22E)-Ergosta-5,7,22-trien-3-ol. A method for preparing the same by drying a fruiting body of Cordyceps militaris, grinding the fruiting body to yield ultrafine powders, boiling and extracting the ultrafine powders, and centrifuging and collecting a precipitate. A method for treating a tumor by administering (3?,9?,10?,13?,14?,17?,20S,22E)-ergosta-5,7,22-trien-3-ol to a patient in need of tumor treatment.
    Type: Application
    Filed: August 3, 2016
    Publication date: November 24, 2016
    Inventors: Yaozhou ZHANG, Jiachen SUN, Lei JIANG, Jian ZHANG, Yujiao CHEN, Xiaoqian ZHANG, Simiao DU, Pengai GU, Jinsong CUI
  • Patent number: 9460007
    Abstract: An apparatus relates generally to time sharing of an arithmetic unit. In such an apparatus, a controller is coupled to provide read pointers and write pointers. A memory block is coupled to receive the read pointers and the write pointers. A selection network is coupled to the memory block and the arithmetic unit. The memory block includes a write-data network, a read-data network, and memory banks.
    Type: Grant
    Filed: September 24, 2014
    Date of Patent: October 4, 2016
    Assignee: XILINX, INC.
    Inventors: Ephrem C. Wu, Xiaoqian Zhang
  • Patent number: 9355696
    Abstract: In an example, a control device includes a data path, a clock path, a multiplexing circuit, and a calibration unit. The data path comprises a data delay unit coupled to a data input of a sampling circuit. The clock path comprises a clock delay unit coupled to a clock input of the sampling circuit. The multiplexing circuit selectively couples a reference clock or a data bus to an input of the data delay unit, and selectively couples the reference clock or a source clock to an input of the clock delay unit. The calibration unit is coupled to a data output of the sampling circuit. The calibration unit is operable to adjust delay values of the data delay unit and the clock delay unit based on the data output of the sampling circuit to establish and maintain a relative delay between the data path and the clock path.
    Type: Grant
    Filed: November 6, 2014
    Date of Patent: May 31, 2016
    Assignee: XILINX, INC.
    Inventors: Terence J. Magee, Xiaoqian Zhang
  • Patent number: D1021901
    Type: Grant
    Filed: February 21, 2022
    Date of Patent: April 9, 2024
    Inventor: Xiaoqian Zhang