Patents by Inventor Zhibin Xiao

Zhibin Xiao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240086151
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Application
    Filed: April 3, 2023
    Publication date: March 14, 2024
    Inventors: Xiaoqian Zhang, Zhibin Xiao, Changxu Zhang, Renjie Chen
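    The following Python sketch is a rough behavioral model, not the patented circuit, of a single MAC lane switching between the two operation modes on a control signal; the function name, shapes, and mode strings are illustrative assumptions.

    ```python
    import numpy as np

    def mac_lane(weights, activations, mode):
        """Behavioral model of one MAC lane.

        mode == "conv":   multiply-accumulate (dot product) of weights and activations.
        mode == "vector": elementwise multiply (vector operation), no accumulation.
        """
        if mode == "conv":
            # The multipliers feed an adder tree that reduces to one partial sum.
            return np.dot(weights, activations)
        elif mode == "vector":
            # The same multipliers are reused; the accumulation stage is bypassed.
            return weights * activations
        raise ValueError(f"unknown mode: {mode}")

    w = np.array([1.0, 2.0, 3.0, 4.0])
    a = np.array([0.5, 0.5, 0.5, 0.5])
    print(mac_lane(w, a, "conv"))    # 5.0 -> one accumulated output
    print(mac_lane(w, a, "vector"))  # [0.5 1.  1.5 2. ] -> elementwise outputs
    ```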
  • Patent number: 11868307
    Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include multiple cores and a central processing unit (CPU) respectively associated with double data rate (DDR) memories, a data exchange interface connecting a host device to the accelerator, and a three-layer network-on-chip (NoC) architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDR memories, a middle-layer NoC configured to transfer data among the plurality of cores, and an inner-layer NoC within each core that includes a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
    Type: Grant
    Filed: May 15, 2023
    Date of Patent: January 9, 2024
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao
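    As a purely illustrative toy model (not the patented hardware), the sketch below mimics the three NoC layers as Python data movements; all class and method names here are assumptions.

    ```python
    # Toy behavioral model of the three NoC layers; all names are assumptions.

    class Core:
        def __init__(self, n_clusters):
            self.ddr = []                           # DDR memory attached to this core
            self.global_buffer = []                 # on-chip staging buffer
            self.clusters = [[] for _ in range(n_clusters)]

        def inner_noc_broadcast(self):
            # Inner layer: cross-bar broadcast from the global buffer to every PE cluster.
            for cluster in self.clusters:
                cluster.extend(self.global_buffer)

    class Accelerator:
        def __init__(self, n_cores, n_clusters):
            self.cores = [Core(n_clusters) for _ in range(n_cores)]

        def outer_noc_host_to_ddr(self, core_id, data):
            # Outer layer: host device -> DDR of the selected core.
            self.cores[core_id].ddr.extend(data)

        def middle_noc_core_to_core(self, src, dst):
            # Middle layer: move staged data between cores.
            self.cores[dst].global_buffer.extend(self.cores[src].global_buffer)

    acc = Accelerator(n_cores=2, n_clusters=4)
    acc.outer_noc_host_to_ddr(0, ["weights", "activations"])
    acc.cores[0].global_buffer = list(acc.cores[0].ddr)   # DDR -> global buffer
    acc.middle_noc_core_to_core(src=0, dst=1)
    acc.cores[1].inner_noc_broadcast()
    print(acc.cores[1].clusters)   # all 4 PE clusters received the broadcast data
    ```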
  • Patent number: 11763150
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing. An exemplary method comprises: obtaining an input tensor and a plurality of filters at a layer within a neural network; segmenting the input tensor into a plurality of sub-tensors; dividing a channel dimension of each of the plurality of filters into a plurality of channel groups; pruning each of the plurality of filters so that each of the plurality of channel groups of each filter comprises a same number of non-zero weights; segmenting each of the plurality of filters into a plurality of sub-filters according to the plurality of channel groups; and assigning the plurality of sub-tensors and the plurality of sub-filters to a plurality of processors for parallel convolution processing.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: September 19, 2023
    Assignee: Moffett International Co., Limited
    Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
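    A minimal numpy sketch of the balanced pruning step described above, assuming a filter flattened along its channel dimension; the function name and the keep-largest-magnitude criterion are illustrative choices, not the patented method.

    ```python
    import numpy as np

    def balanced_prune(filt, num_groups, nonzeros_per_group):
        """Prune one filter (flattened along its channel dimension) so every
        channel group keeps exactly `nonzeros_per_group` non-zero weights."""
        channels = filt.shape[0]
        assert channels % num_groups == 0
        group_size = channels // num_groups
        pruned = np.zeros_like(filt)
        for g in range(num_groups):
            group = filt[g * group_size:(g + 1) * group_size]
            # Keep the top-k magnitudes within this group, zero out the rest.
            keep = np.argsort(np.abs(group))[-nonzeros_per_group:]
            pruned[g * group_size + keep] = group[keep]
        return pruned

    filt = np.array([0.9, -0.1, 0.3, 0.05, -0.8, 0.2, 0.02, 0.6])
    print(balanced_prune(filt, num_groups=2, nonzeros_per_group=1))
    # [ 0.9  0.   0.   0.  -0.8  0.   0.   0. ]  -- one non-zero per channel group
    ```

    Because every group ends up with the same number of non-zeros, each processor assigned a sub-filter performs the same amount of work, which is what makes the parallel convolution balanced.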
  • Publication number: 20230259758
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving efficiency of neural network computations using adaptive tensor compute kernels. First, the adaptive tensor compute kernels may adjust their shapes according to the different shapes of the input/weight tensors when distributing the weights and input values to a processing element (PE) array for parallel processing. Depending on the shape of the tensor compute kernels, additional inter-cluster or intra-cluster adders may be needed to perform convolution computations. Second, the adaptive tensor compute kernels may support two different tensor operation modes, i.e., a 1×1 tensor operation mode and a 3×3 tensor operation mode, to cover all types of convolution computations. Third, the underlying PE array may configure each PE-internal buffer (e.g., a register file) differently to support different compression ratios and sparsity granularities of sparse neural networks.
    Type: Application
    Filed: February 16, 2022
    Publication date: August 17, 2023
    Inventors: Xiaoqian Zhang, Enxu Yan, Zhibin Xiao
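    The dispatch between the two tensor operation modes might look like the following toy numpy sketch; the direct 3×3 loop and the 1×1 channel-mixing matrix multiply are standard textbook formulations, not the patented kernels.

    ```python
    import numpy as np

    def conv2d_1x1(x, w):
        # 1x1 mode: a pointwise convolution is a channel-mixing matrix multiply
        # applied at every spatial position.  x: (H, W, Cin), w: (Cin, Cout)
        return x @ w

    def conv2d_3x3(x, w):
        # 3x3 mode: direct convolution, valid padding. x: (H, W, Cin), w: (3, 3, Cin, Cout)
        H, W, Cin = x.shape
        out = np.zeros((H - 2, W - 2, w.shape[-1]))
        for i in range(H - 2):
            for j in range(W - 2):
                patch = x[i:i + 3, j:j + 3, :]            # (3, 3, Cin)
                out[i, j] = np.tensordot(patch, w, axes=3)
        return out

    def adaptive_conv(x, w):
        # Dispatch on the kernel shape, mimicking the two tensor operation modes.
        if w.ndim == 2:
            return conv2d_1x1(x, w)
        if w.shape[:2] == (3, 3):
            return conv2d_3x3(x, w)
        raise NotImplementedError("other shapes would be mapped onto these two modes")

    x = np.random.rand(5, 5, 8)
    print(adaptive_conv(x, np.random.rand(8, 16)).shape)        # (5, 5, 16)
    print(adaptive_conv(x, np.random.rand(3, 3, 8, 16)).shape)  # (3, 3, 16)
    ```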
  • Patent number: 11726746
    Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
    Type: Grant
    Filed: September 14, 2022
    Date of Patent: August 15, 2023
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao, Changxu Zhang, Renjie Chen
  • Publication number: 20230111362
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallelizing convolution processing. An exemplary method comprises: segmenting an input tensor into a plurality of sub-tensors and a plurality of filters into a plurality of sub-filter groups; respectively assigning a plurality of combinations of the sub-tensors and the sub-filter groups to a plurality of processors; storing, by each of the plurality of processors, nonzero values of the sub-tensor and the sub-filter group in the assigned combination as index-value pairs; performing in parallel, by the plurality of processors and for a plurality of iterations, multiply-and-accumulate (MAC) operations based on the index-value pairs to obtain a plurality of outputs, where the index-value pairs of the sub-filter groups are rotated among the plurality of processors across the plurality of iterations; and aggregating the plurality of outputs as an output tensor.
    Type: Application
    Filed: December 12, 2022
    Publication date: April 13, 2023
    Inventors: Enxu Yan, Yong Lu, Wei Wang, Zhibin Xiao, Jiachao Liu, Hengchang Xiong
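    The rotation schedule can be illustrated with a small Python sketch (sequential here, parallel on real hardware); the names and the one-step rotation are assumptions.

    ```python
    # Each processor keeps its sub-tensor in place while the sub-filter groups
    # rotate among processors, so after P iterations every sub-tensor has been
    # paired with every sub-filter group without re-reading it from memory.

    def rotate_schedule(sub_tensors, sub_filter_groups):
        P = len(sub_tensors)
        pairs = []
        filters = list(sub_filter_groups)
        for it in range(P):
            # All P processors work in parallel in a real device; here we loop.
            for p in range(P):
                pairs.append((sub_tensors[p], filters[p], it))
            filters = filters[1:] + filters[:1]   # rotate the groups one step
        return pairs

    for tensor, group, it in rotate_schedule(["T0", "T1", "T2"], ["F0", "F1", "F2"]):
        print(f"iteration {it}: {tensor} x {group}")
    # every (Ti, Fj) combination appears exactly once across the 3 iterations
    ```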
  • Patent number: 11586601
    Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: February 21, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Zhibin Xiao, Xiaoxin Fan, Minghai Qin
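    A minimal numpy sketch of decompressing such a two-level representation, assuming row-major blocks and Python bit lists standing in for packed hardware bitmaps; the names and layout are illustrative.

    ```python
    import numpy as np

    def decompress(first_bitmap, second_bitmaps, elements, block_shape, grid_shape):
        """Rebuild a sparse matrix from a two-level bitmap representation.

        first_bitmap:   one bit per block; 0 means the block is all zeros.
        second_bitmaps: per non-zero block, one bit per element inside the block.
        elements:       the non-zero values, in block order then row-major order.
        """
        bh, bw = block_shape
        gh, gw = grid_shape
        matrix = np.zeros((gh * bh, gw * bw))
        e = 0                                   # cursor into the element array
        s = 0                                   # cursor into the second-level bitmaps
        for b, present in enumerate(first_bitmap):
            if not present:
                continue                        # level 1 says: whole block is zero
            block = np.zeros(bh * bw)
            for i, bit in enumerate(second_bitmaps[s]):
                if bit:                         # level 2 says: this element is non-zero
                    block[i] = elements[e]
                    e += 1
            r, c = divmod(b, gw)
            matrix[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = block.reshape(bh, bw)
            s += 1
        return matrix

    # 2x2 grid of 2x2 blocks; only block 0 and block 3 contain non-zeros.
    m = decompress(
        first_bitmap=[1, 0, 0, 1],
        second_bitmaps=[[1, 0, 0, 1], [0, 1, 1, 0]],
        elements=[5.0, 6.0, 7.0, 8.0],
        block_shape=(2, 2), grid_shape=(2, 2),
    )
    print(m)
    ```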
  • Patent number: 11568021
    Abstract: Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first portion of the vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: January 31, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Minghai Qin, Zhibin Xiao, Chunsheng Liu
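    The early-termination idea can be sketched in a few lines of Python for integer vectors; the bit split, function name, and zero threshold are assumptions, not the patented datapath.

    ```python
    import numpy as np

    def relu_dot_msb_first(a, b, low_bits=4, threshold=0):
        """Approximate relu(dot(a, b)) by computing the most-significant-bit
        portion of the product first, and skipping the remaining work when it
        is already below the threshold (the result would be rectified to zero)."""
        a, b = np.asarray(a, dtype=np.int64), np.asarray(b, dtype=np.int64)
        a_hi, b_hi = (a >> low_bits) << low_bits, (b >> low_bits) << low_bits
        a_lo, b_lo = a - a_hi, b - b_hi

        first_portion = int(a_hi @ b_hi)         # MSB-set x MSB-set partial product
        if first_portion < threshold:
            return 0                             # predicted negative: early exit
        # Otherwise finish the remaining cross terms and rectify the exact sum.
        remaining = int(a_hi @ b_lo + a_lo @ b_hi + a_lo @ b_lo)
        return max(first_portion + remaining, 0)

    a = [100, -120, 37, -5]
    b = [90, 110, -64, 3]
    print(relu_dot_msb_first(a, b))   # early-exits with 0: MSB partial sum is negative
    print(max(np.dot(a, b), 0))       # exact reference value for comparison
    ```

    The payoff is that activations feeding a ReLU which would be clamped to zero anyway never pay for the full-precision multiplication.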
  • Publication number: 20230021511
    Abstract: Disclosed are systems and methods that determine whether instances of data (e.g., forward activations, backward derivatives of activations) that are used to train deep neural networks are to be stored on-chip or off-chip. The disclosed systems and methods are also used to prune the data (discard or delete selected instances of data). A system includes a hierarchical arrangement of on-chip and off-chip memories, and also includes a hierarchical arrangement of data selector devices that are used to decide whether to discard data and where in the system the data is to be discarded.
    Type: Application
    Filed: October 4, 2022
    Publication date: January 26, 2023
    Inventors: Minghai Qin, Chunsheng Liu, Zhibin Xiao, Tianchan Guan, Yuan Gao
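    A toy Python decision sketch of the idea (the patent describes hardware selector devices, not software); the budget, threshold, and two-level policy are illustrative assumptions.

    ```python
    import numpy as np

    ON_CHIP_BUDGET = 2          # how many tensors fit on-chip (assumed)
    KEEP_THRESHOLD = 0.1        # magnitude below which a tensor is pruned (assumed)

    on_chip, off_chip = [], []

    def select(tensor):
        """First-level selector: prune small tensors; second-level: spill when full."""
        if np.abs(tensor).mean() < KEEP_THRESHOLD:
            return "discarded"              # pruned: never stored anywhere
        if len(on_chip) < ON_CHIP_BUDGET:
            on_chip.append(tensor)
            return "on-chip"
        off_chip.append(tensor)             # demoted down the memory hierarchy
        return "off-chip"

    for scale in (0.01, 1.0, 2.0, 3.0):
        print(select(np.full(4, scale)))
    # -> discarded, on-chip, on-chip, off-chip
    ```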
  • Patent number: 11455222
    Abstract: Systems and methods are provided for testing many-core processors consisting of processing element cores. The systems and methods can include grouping the processing elements according to the dataflow of the many-core processor. Each group can include a processing element that only receives inputs from other processing elements in the group. After grouping the processing elements, test information can be provided in parallel to each group. The test information can be configured to ensure a desired degree of test coverage for the processing element that only receives inputs from other processing elements in the group. Each group can perform testing operations in parallel to generate test results. The test results can be read out of each group. The processing elements can then be regrouped according to the dataflow of the many-core processor and the testing can be repeated to achieve a target test coverage.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: September 27, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Chunsheng Meon Liu, Arjun Chaudhuri, Zhibin Xiao
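    A small Python sketch of the group-test-regroup loop, with a linear chain of PEs and a shift-based regrouping standing in for real dataflow analysis; all names are assumptions.

    ```python
    def make_groups(num_pes, group_size, offset):
        # Rotate the PE indices, then cut them into fixed-size groups.
        pes = list(range(num_pes))
        shifted = pes[offset:] + pes[:offset]
        return [shifted[i:i + group_size] for i in range(0, num_pes, group_size)]

    def run_test_round(groups):
        # Every group is driven with test patterns in parallel on real hardware.
        return {tuple(g): "pass" for g in groups}

    covered = set()
    for offset in (0, 1):                    # regroup and repeat the test
        groups = make_groups(num_pes=8, group_size=2, offset=offset)
        results = run_test_round(groups)
        covered.update(pe for g in groups for pe in g)
    print(sorted(covered))                   # all 8 PEs were exercised
    ```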
  • Publication number: 20220268019
    Abstract: The present disclosure relates to a notched steel beam and a floor slab structure for a flange embedded floor slab, and a construction method. The notched steel beam comprises a web (1), wherein an upper flange (2) and a lower flange (3) are respectively arranged on the upper end and the lower end of the web (1). The flange embedded floor slab comprises four rectangularly distributed floor slab stand columns (7); a steel beam (8) is arranged between adjacent floor slab stand columns (7); laminated slab bottom slabs (9) are arranged between the two symmetrically distributed steel beams (8); floor slab reinforcing steel bars (10) are arranged above the laminated slab bottom slabs (9); a concrete layer (11) is arranged on the floor slab reinforcing steel bars (10); and the steel beam (8) is the notched steel beam of the flange embedded floor slab.
    Type: Application
    Filed: January 6, 2022
    Publication date: August 25, 2022
    Applicant: The Architectural Design & Research Institute of Zhejiang University Co., Ltd.
    Inventors: Quanbiao Xu, Benyue Li, Mingshan Zhang, Jiawei Zhou, Zhibin Xiao, Shunfeng Gong, Liang Xia, Kepeng Chen, Jiayin Yang, Yuxuan Wang
  • Publication number: 20220228381
    Abstract: The present disclosure provides a combined prefabricated reinforced concrete stair mold and a splicing method. The combined prefabricated reinforced concrete stair mold comprises a bottom mold platform (1), wherein an upper platform module (2), a tread module (3) and a lower platform module (4), which are spliced with one another, are sequentially arranged on the upper surface of the bottom mold platform (1), and corner modules (5) are arranged between the upper platform module (2) and the tread module (3) and between the tread module (3) and the lower platform module (4). The tread module (3) comprises a tread upper surface mold (6) and a tread lower surface mold (7); the tread lower surface mold (7) comprises a plurality of mutually spliced combined templates (8), and the tread upper surface mold (6) comprises a plurality of mutually spliced tread splicing pieces (9).
    Type: Application
    Filed: January 5, 2022
    Publication date: July 21, 2022
    Applicants: Zhejiang University, The Architectural Design & Research Institute of Zhejiang University Co., Ltd.
    Inventors: Benyue Li, Quanbiao Xu, Mingshan Zhang, Zhibin Xiao, Jiayin Yang, Liang Xia, Kepeng Chen, Tao Hong, Minwei Chen
  • Patent number: 11366690
    Abstract: A method and an apparatus for scheduling commands in a virtual computing environment include picking a command. It is determined whether the command is a synchronization command or a conditional command. A synchronization command is an independent command. A conditional command is a dependent command that depends on a synchronization command. In response to the command being determined to be a synchronization command, a waiting queue is enabled for the command, the waiting queue storing conditional commands dependent on a running synchronization command. The command is dispatched to a processing engine.
    Type: Grant
    Filed: December 2, 2019
    Date of Patent: June 21, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Zhibin Xiao, Chunsheng Liu, Yuan Xie
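    A minimal Python sketch of this scheduling policy, with a print standing in for dispatch to a processing engine; the class and method names are illustrative assumptions.

    ```python
    from collections import deque

    class Scheduler:
        def __init__(self):
            self.waiting = {}                     # sync command -> queue of dependents

        def submit(self, cmd, depends_on=None):
            if depends_on is None:                # synchronization command: independent
                self.waiting[cmd] = deque()       # enable a waiting queue for it
                self.dispatch(cmd)
            else:                                 # conditional command: park it
                self.waiting[depends_on].append(cmd)

        def dispatch(self, cmd):
            print(f"dispatch {cmd} to processing engine")

        def complete(self, sync_cmd):
            # The sync command finished: drain and dispatch its waiting queue.
            for dependent in self.waiting.pop(sync_cmd):
                self.dispatch(dependent)

    s = Scheduler()
    s.submit("sync_A")                        # dispatched immediately
    s.submit("cond_B", depends_on="sync_A")   # parked in sync_A's waiting queue
    s.submit("cond_C", depends_on="sync_A")
    s.complete("sync_A")                      # cond_B and cond_C dispatched now
    ```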
  • Publication number: 20220147826
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for convolution with workload-balanced activation sparsity are described. An exemplary method comprises: assigning an input tensor and a weight tensor at a convolution layer into a plurality of processors to perform Multiply-Accumulate (MAC) operations in parallel based on the input tensor and the weight tensor; obtaining a plurality of output values based on results of the MAC operations; constructing one or more banks of output values based on the plurality of output values; for each of the banks, performing a top-K sorting on the one or more output values in the bank to obtain K output values; pruning each of the banks by setting the one or more output values other than the obtained K output values in the each bank as zeros; and constructing an output tensor of the convolution layer based on the pruned banks.
    Type: Application
    Filed: November 6, 2020
    Publication date: May 12, 2022
    Inventors: Zhibin Xiao, Enxu Yan, Yong Lu, Wei Wang
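    A minimal numpy sketch of the per-bank top-K pruning, assuming banks are fixed-size slices of the flattened outputs and that values are kept by magnitude; the names are illustrative.

    ```python
    import numpy as np

    def topk_bank_prune(outputs, bank_size, k):
        """Group outputs into fixed-size banks and keep only the K largest-
        magnitude values per bank, zeroing the rest, so every bank carries
        the same workload (exactly K non-zeros)."""
        flat = outputs.reshape(-1, bank_size)
        pruned = np.zeros_like(flat)
        for i, bank in enumerate(flat):
            keep = np.argsort(np.abs(bank))[-k:]    # top-K sort within the bank
            pruned[i, keep] = bank[keep]
        return pruned.reshape(outputs.shape)

    y = np.array([3.0, -0.2, 0.1, -4.0, 0.5, 2.0, -0.1, 0.3])
    print(topk_bank_prune(y, bank_size=4, k=2))   # exactly 2 non-zeros per bank
    ```

    Fixing K non-zeros per bank is what makes the resulting activation sparsity workload-balanced: every downstream processor receives the same number of non-zero operands.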
  • Publication number: 20210406686
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing. An exemplary method comprises: obtaining an input tensor and a plurality of filters at a layer within a neural network; segmenting the input tensor into a plurality of sub-tensors; dividing a channel dimension of each of the plurality of filters into a plurality of channel groups; pruning each of the plurality of filters so that each of the plurality of channel groups of each filter comprises a same number of non-zero weights; segmenting each of the plurality of filters into a plurality of sub-filters according to the plurality of channel groups; and assigning the plurality of sub-tensors and the plurality of sub-filters to a plurality of processors for parallel convolution processing.
    Type: Application
    Filed: August 2, 2021
    Publication date: December 30, 2021
    Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
  • Patent number: 11200497
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for knowledge-preserving sparse pruning on neural networks are described. An exemplary method includes obtaining a pre-trained machine learning model trained based on a plurality of general-purpose training data; training a task-specific machine learning model by tuning the pre-trained machine learning model based on a plurality of task-specific training data corresponding to a task; constructing a student network based on the task-specific machine learning model; simultaneously performing (1) knowledge distillation from the trained task-specific machine learning model as a teacher network to the student network and (2) network pruning on the student network; and obtaining the trained student network for serving the task.
    Type: Grant
    Filed: March 16, 2021
    Date of Patent: December 14, 2021
    Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
    Inventors: Enxu Yan, Dongkuan Xu, Zhibin Xiao
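    A heavily simplified numpy sketch of the combined objective, with linear models and a mean-squared distillation loss standing in for the real teacher/student networks; every hyperparameter here is an assumption.

    ```python
    import numpy as np

    # The student is trained to match the teacher's outputs (distillation)
    # while a magnitude mask is reapplied every step (pruning), so the two
    # happen simultaneously rather than one after the other.

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 16))
    W_teacher = rng.normal(size=(16, 4))          # the tuned task-specific model
    W_student = rng.normal(size=(16, 4))
    keep_ratio, lr = 0.25, 0.01

    for step in range(200):
        teacher_out = X @ W_teacher
        student_out = X @ W_student
        # Distillation: gradient of the squared error to the teacher's outputs.
        grad = 2 * X.T @ (student_out - teacher_out) / len(X)
        W_student -= lr * grad
        # Pruning: keep only the largest-magnitude fraction of student weights.
        k = int(W_student.size * keep_ratio)
        threshold = np.sort(np.abs(W_student), axis=None)[-k]
        W_student *= (np.abs(W_student) >= threshold)

    print("sparsity:", np.mean(W_student == 0))   # roughly 75% of weights are zero
    print("distill mse:", np.mean((X @ W_student - X @ W_teacher) ** 2))
    ```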
  • Patent number: 11144823
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for hierarchical weight-sparse convolution processing are described. An exemplary method comprises: obtaining an input tensor and a filter at a convolution layer of a neural network; segmenting the filter into a plurality of sub-filters; generating a hierarchical bit representation of the filter representing a plurality of non-zero weights in the filter, wherein the hierarchical bit representation comprises a first layer, the first layer comprising a plurality of bits respectively corresponding to the plurality of sub-filters in the filter, each of the plurality of bits indicating whether the corresponding sub-filter includes at least one non-zero weight; and performing multiply-and-accumulate (MAC) operations based on the hierarchical bit representation of the filter and the input tensor.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: October 12, 2021
    Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
    Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
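    A minimal numpy sketch of building the first layer of such a bit representation for a 2-D filter; the sub-filter tiling and names are illustrative assumptions.

    ```python
    import numpy as np

    def first_layer_bitmap(filt, sub_filter_shape):
        """Build the first layer of the hierarchical bit representation: one bit
        per sub-filter, set when that sub-filter has at least one non-zero weight.
        MAC hardware can then skip whole sub-filters whose bit is 0."""
        sh, sw = sub_filter_shape
        H, W = filt.shape
        bits = []
        for i in range(0, H, sh):
            for j in range(0, W, sw):
                bits.append(int(np.any(filt[i:i + sh, j:j + sw])))
        return bits

    filt = np.array([[0, 0, 1, 0],
                     [0, 0, 0, 2],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]])
    print(first_layer_bitmap(filt, (2, 2)))   # [0, 1, 0, 0]: one non-zero sub-filter
    ```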
  • Publication number: 20210303426
    Abstract: Systems and methods are provided for testing many-core processors consisting of processing element cores. The systems and methods can include grouping the processing elements according to the dataflow of the many-core processor. Each group can include a processing element that only receives inputs from other processing elements in the group. After grouping the processing elements, test information can be provided in parallel to each group. The test information can be configured to ensure a desired degree of test coverage for the processing element that only receives inputs from other processing elements in the group. Each group can perform testing operations in parallel to generate test results. The test results can be read out of each group. The processing elements can then be regrouped according to the dataflow of the many-core processor and the testing can be repeated to achieve a target test coverage.
    Type: Application
    Filed: March 30, 2020
    Publication date: September 30, 2021
    Inventors: Chunsheng Meon Liu, Arjun Chaudhuri, Zhibin Xiao
  • Patent number: 11113601
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: September 7, 2021
    Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
    Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
  • Publication number: 20210263992
    Abstract: Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first portion of the vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product.
    Type: Application
    Filed: February 21, 2020
    Publication date: August 26, 2021
    Inventors: Minghai Qin, Zhibin Xiao, Chunsheng Liu