Patents by Inventor Zhibin Xiao
Zhibin Xiao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240086151
Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
Type: Application
Filed: April 3, 2023
Publication date: March 14, 2024
Inventors: Xiaoqian ZHANG, Zhibin XIAO, Changxu ZHANG, Renjie CHEN
-
Patent number: 11868307
Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include multiple cores and a central processing unit (CPU) respectively associated with DDRs, a data exchange interface connecting a host device to the accelerator, and a three-layer NoC architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDRs; a middle-layer NoC configured to transfer data among the plurality of cores; and an inner-layer NoC within each core, including a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
Type: Grant
Filed: May 15, 2023
Date of Patent: January 9, 2024
Assignee: Moffett International Co., Limited
Inventors: Xiaoqian Zhang, Zhibin Xiao
-
Patent number: 11763150
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing. An exemplary method comprises: obtaining an input tensor and a plurality of filters at a layer within a neural network; segmenting the input tensor into a plurality of sub-tensors; dividing a channel dimension of each of the plurality of filters into a plurality of channel groups; pruning each of the plurality of filters so that each of the plurality of channel groups of each filter comprises a same number of non-zero weights; segmenting each of the plurality of filters into a plurality of sub-filters according to the plurality of channel groups; and assigning the plurality of sub-tensors and the plurality of sub-filters to a plurality of processors for parallel convolution processing.
Type: Grant
Filed: August 2, 2021
Date of Patent: September 19, 2023
Assignee: Moffett International Co., Limited
Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
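The balanced pruning step this abstract describes, keeping the same number of non-zero weights in every channel group of every filter, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (filters shaped `(n, c, kh, kw)`, magnitude-based selection, and the function name are all assumptions, not taken from the patent):

```python
import numpy as np

def balanced_prune(filters, num_groups, k):
    """Zero out weights so that each channel group of each filter keeps
    exactly k non-zero (largest-magnitude) weights.

    filters: array of shape (n, c, kh, kw); c must divide by num_groups.
    """
    filters = filters.copy()
    n = filters.shape[0]
    # split the channel dimension of each filter into contiguous groups
    flat = filters.reshape(n, num_groups, -1)
    for i in range(n):
        for g in range(num_groups):
            w = flat[i, g]
            keep = np.argsort(np.abs(w))[-k:]      # indices of the top-k magnitudes
            mask = np.zeros(w.size, dtype=bool)
            mask[keep] = True
            w[~mask] = 0.0                          # prune everything else in place
    return filters
```

Because every group ends up with the same non-zero count, each sub-filter assigned to a processor carries an equal workload, which is the load-balancing property the claim is after.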
-
Publication number: 20230259758
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving efficiency of neural network computations using adaptive tensor compute kernels. First, the adaptive tensor compute kernels may adjust their shapes according to the different shapes of input/weight tensors for distributing the weights and input values to a processing element (PE) array for parallel processing. Depending on the shape of the tensor compute kernels, additional inter-cluster or intra-cluster adders may be needed to perform convolution computations. Second, the adaptive tensor compute kernels may support two different tensor operation modes, i.e., a 1×1 tensor operation mode and a 3×3 tensor operation mode, to cover all types of convolution computations. Third, the underlying PE array may configure each PE-internal buffer (e.g., a register file) differently to support different compression ratios and sparsity granularities of sparse neural networks.
Type: Application
Filed: February 16, 2022
Publication date: August 17, 2023
Inventors: Xiaoqian ZHANG, Enxu YAN, Zhibin XIAO
-
Patent number: 11726746
Abstract: This application describes hybrid hardware accelerators, systems, and apparatus for performing various computations in neural network applications using the same set of hardware resources. An example accelerator may include weight selectors, activation input interfaces, and a plurality of Multiplier-Accumulation (MAC) circuits organized as a plurality of MAC lanes. Each of the plurality of MAC lanes may be configured to: receive a control signal indicating whether to perform convolution or vector operations; receive one or more weights according to the control signal; receive one or more activations according to the control signal; and generate output data based on the one or more weights and the one or more activations according to the control signal and feed the output data into an output buffer. Each of the plurality of MAC lanes includes a plurality of multiplier circuits and a plurality of adder-subtractor circuits.
Type: Grant
Filed: September 14, 2022
Date of Patent: August 15, 2023
Assignee: Moffett International Co., Limited
Inventors: Xiaoqian Zhang, Zhibin Xiao, Changxu Zhang, Renjie Chen
-
Publication number: 20230111362
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallelizing convolution processing. An exemplary method comprises: segmenting an input tensor into a plurality of sub-tensors and a plurality of filters into a plurality of sub-filter groups; respectively assigning a plurality of combinations of the sub-tensors and the sub-filter groups to a plurality of processors; storing, by each of the plurality of processors, non-zero values of the sub-tensor and the sub-filter group in the assigned combination as index-value pairs; performing in parallel, by the plurality of processors and for a plurality of iterations, multiply-and-accumulate (MAC) operations based on the index-value pairs to obtain a plurality of outputs, where the index-value pairs of the sub-filter groups are rotated among the plurality of processors across the plurality of iterations; and aggregating the plurality of outputs as an output tensor.
Type: Application
Filed: December 12, 2022
Publication date: April 13, 2023
Inventors: Enxu YAN, Yong LU, Wei WANG, Zhibin XIAO, Jiachao LIU, Hengchang XIONG
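The rotation scheme in this abstract, where each processor keeps its sub-tensor while the index-value pairs of the sub-filter groups circulate among processors each iteration, can be simulated in a few lines. The sketch below stands in 1-D dot products for the full convolutions and simulates the processors with a loop; all names and the shift-by-one rotation schedule are illustrative assumptions:

```python
def rotating_conv(sub_tensors, sub_filters):
    """Simulate P processors: processor p holds sub_tensors[p]; the
    sub-filter groups rotate among processors, one step per iteration.
    After P iterations every (sub-tensor, group) pair has been computed.
    """
    P = len(sub_tensors)
    # store only the non-zero filter weights as (index, value) pairs
    pairs = [[(i, v) for i, v in enumerate(f) if v != 0] for f in sub_filters]
    out = [[0.0] * P for _ in range(P)]   # out[p][g]: sub-tensor p against group g
    for it in range(P):
        for p in range(P):                # each processor works in parallel in hardware
            g = (p + it) % P              # group currently resident on processor p
            out[p][g] = sum(sub_tensors[p][i] * v for i, v in pairs[g])
    return out
```

Rotating the (small, sparse) filter pairs instead of the sub-tensors keeps data movement low while still letting every processor eventually apply every sub-filter group.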
-
Patent number: 11586601
Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
Type: Grant
Filed: February 5, 2020
Date of Patent: February 21, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Zhibin Xiao, Xiaoxin Fan, Minghai Qin
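The two-level representation this abstract describes (a first-level bitmap marking which blocks contain non-zeros, a second-level bitmap marking positions within each non-zero block, and an element array of the values) can be modeled in software. The sketch below is an assumed, simplified encoding for square blocks, not the hardware format from the patent:

```python
import numpy as np

def encode(mat, bs):
    """Two-level bitmap encoding: level 1 has one bit per bs x bs block
    (does the block contain any non-zero?); level 2 has one bit per element
    of each non-zero block; elems holds the non-zero values in block order."""
    l1, l2, elems = [], [], []
    for i in range(0, mat.shape[0], bs):
        for j in range(0, mat.shape[1], bs):
            block = mat[i:i+bs, j:j+bs]
            nz = block != 0
            l1.append(bool(nz.any()))
            if nz.any():
                l2.append(nz.copy())
                elems.extend(block[nz].tolist())
    return l1, l2, np.array(elems)

def decode(l1, l2, elems, shape, bs):
    """Reverse of encode: zero blocks are skipped entirely."""
    mat = np.zeros(shape)
    bi = ei = k = 0                       # cursors into l2, elems, l1
    for i in range(0, shape[0], bs):
        for j in range(0, shape[1], bs):
            if l1[k]:
                nz = l2[bi]; bi += 1
                n = int(nz.sum())
                block = np.zeros((bs, bs))
                block[nz] = elems[ei:ei+n]; ei += n
                mat[i:i+bs, j:j+bs] = block
            k += 1
    return mat
```

The point of the hierarchy is that all-zero blocks cost a single level-1 bit, so the sparse engine can skip them without touching the level-2 bitmap or the element array at all.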
-
Patent number: 11568021
Abstract: Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first portion of the vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product.
Type: Grant
Filed: February 21, 2020
Date of Patent: January 31, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Minghai Qin, Zhibin Xiao, Chunsheng Liu
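The idea behind the threshold test is early termination for ReLU: if the partial product formed from the most-significant bit sets is already so negative that the remaining low-order terms cannot make the sum positive, the result rectifies to zero and the remaining work can be skipped. A minimal integer sketch of that scheme, with an illustrative worst-case bound standing in for the patent's threshold:

```python
def relu_dot(a, b, lo_bits=4):
    """max(0, a.b) via an MSB-first partial product with early exit.

    Each value splits as x = (x >> lo_bits) * 2**lo_bits + (x & mask),
    so a.b = first + rest, where `first` uses only the high parts.
    """
    mask = (1 << lo_bits) - 1
    ha = [x >> lo_bits for x in a]; la = [x & mask for x in a]
    hb = [y >> lo_bits for y in b]; lb = [y & mask for y in b]
    # first portion: most-significant parts only
    first = sum(x * y for x, y in zip(ha, hb)) << (2 * lo_bits)
    # worst-case magnitude the remaining (cross and low-low) terms can add
    bound = sum((abs(x) + abs(y)) * mask for x, y in zip(ha, hb)) << lo_bits
    bound += len(a) * mask * mask
    if first + bound < 0:
        return 0                      # ReLU clips it; skip the remaining work
    # remaining portion: cross terms plus low-low terms
    rest = (sum(x * y for x, y in zip(ha, lb)) +
            sum(x * y for x, y in zip(la, hb))) << lo_bits
    rest += sum(x * y for x, y in zip(la, lb))
    return max(0, first + rest)
```

Since the low parts lie in [0, 2^lo_bits - 1], the bound is sound, so the early exit never changes the result; it only saves the cross-term and low-low multiplications when the high-order estimate is decisive.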
-
Publication number: 20230021511
Abstract: Disclosed are systems and methods that determine whether instances of data (e.g., forward activations, backward derivatives of activations) that are used to train deep neural networks are to be stored on-chip or off-chip. The disclosed systems and methods are also used to prune the data (discard or delete selected instances of data). A system includes a hierarchical arrangement of on-chip and off-chip memories, and also includes a hierarchical arrangement of data selector devices that are used to decide whether to discard data and where in the system the data is to be discarded.
Type: Application
Filed: October 4, 2022
Publication date: January 26, 2023
Inventors: Minghai QIN, Chunsheng LIU, Zhibin XIAO, Tianchan GUAN, Yuan GAO
-
Patent number: 11455222
Abstract: Systems and methods are provided for testing many-core processors consisting of processing element cores. The systems and methods can include grouping the processing elements according to the dataflow of the many-core processor. Each group can include a processing element that only receives inputs from other processing elements in the group. After grouping the processing elements, test information can be provided in parallel to each group. The test information can be configured to ensure a desired degree of test coverage for the processing element that only receives inputs from other processing elements in the group. Each group can perform testing operations in parallel to generate test results. The test results can be read out of each group. The processing elements can then be regrouped according to the dataflow of the many-core processor and the testing can be repeated to achieve a target test coverage.
Type: Grant
Filed: March 30, 2020
Date of Patent: September 27, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Chunsheng Meon Liu, Arjun Chaudhuri, Zhibin Xiao
-
Publication number: 20220268019
Abstract: The present disclosure relates to a notched steel beam and a floor slab structure of a flange embedded floor slab, and a construction method. The notched steel beam comprises a web (1), wherein an upper flange (2) and a lower flange (3) are respectively arranged on the upper end and the lower end of the web (1). The flange embedded floor slab comprises four rectangularly distributed floor slab stand columns (7); a steel beam (8) is arranged between adjacent floor slab stand columns (7); laminated slab bottom slabs (9) are arranged between the two symmetrically distributed steel beams (8); floor slab reinforcing steel bars (10) are arranged above the laminated slab bottom slabs (9); a concrete layer (11) is arranged on the floor slab reinforcing steel bars (10); and the steel beam (8) is the notched steel beam of the flange embedded floor slab.
Type: Application
Filed: January 6, 2022
Publication date: August 25, 2022
Applicant: The Architectural Design & Research Institute of Zhejiang University Co., Ltd.
Inventors: Quanbiao XU, Benyue LI, Mingshan ZHANG, Jiawei ZHOU, Zhibin XIAO, Shunfeng GONG, Liang XIA, Kepeng CHEN, Jiayin YANG, Yuxuan WANG
-
Publication number: 20220228381
Abstract: The present disclosure provides a combined prefabricated reinforced concrete stair mold and a splicing method. The combined prefabricated reinforced concrete stair mold comprises a bottom mold platform (1), wherein an upper platform module (2), a tread module (3) and a lower platform module (4) which are spliced with one another are sequentially arranged on the upper surface of the bottom mold platform (1), and corner modules (5) are arranged between the upper platform module (2) and the tread module (3) and between the tread module (3) and the lower platform module (4); the tread module (3) comprises a tread upper surface mold (6) and a tread lower surface mold (7); the tread lower surface mold (7) comprises a plurality of combined templates (8) formed by mutual splicing; and the tread upper surface mold (6) comprises a plurality of tread splicing pieces (9) formed by mutual splicing.
Type: Application
Filed: January 5, 2022
Publication date: July 21, 2022
Applicants: Zhejiang University; The Architectural Design & Research Institute of Zhejiang University Co., Ltd.
Inventors: Benyue LI, Quanbiao XU, Mingshan ZHANG, Zhibin XIAO, Jiayin YANG, Liang XIA, Kepeng CHEN, Tao HONG, Minwei CHEN
-
Patent number: 11366690
Abstract: A method and an apparatus for scheduling commands in a virtual computing environment include picking a command. It is determined whether the command is a synchronization command or a conditional command. A synchronization command is an independent command. A conditional command is a dependent command that depends on a synchronization command. In response to the command being determined as a synchronization command, a waiting queue is enabled for the command, the waiting queue storing conditional commands dependent on a running synchronization command. The command is dispatched to a processing engine.
Type: Grant
Filed: December 2, 2019
Date of Patent: June 21, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Zhibin Xiao, Chunsheng Liu, Yuan Xie
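The dispatch logic in this abstract can be sketched as a small scheduler: a synchronization command is dispatched immediately and gets its own waiting queue, while a conditional command either parks in the queue of the synchronization command it depends on (if that command is still running) or dispatches directly (if it has finished). The class and method names below are illustrative assumptions:

```python
from collections import deque

class CommandScheduler:
    """Sketch of the described scheme: sync commands are independent and own a
    waiting queue; conditional commands depend on a sync command."""

    def __init__(self):
        self.waiting = {}        # sync command id -> queue of parked dependents
        self.dispatched = []     # commands handed to the processing engine, in order

    def submit(self, cmd_id, depends_on=None):
        if depends_on is None:                # synchronization command
            self.waiting[cmd_id] = deque()    # enable its waiting queue
            self.dispatched.append(cmd_id)
        elif depends_on in self.waiting:      # parent still running: park the command
            self.waiting[depends_on].append(cmd_id)
        else:                                 # parent already completed
            self.dispatched.append(cmd_id)

    def complete(self, cmd_id):
        # release any conditional commands parked on this sync command
        for dep in self.waiting.pop(cmd_id, ()):
            self.dispatched.append(dep)
```

Keeping one queue per running synchronization command means a conditional command never blocks the engine; it simply waits in its parent's queue until the dependency resolves.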
-
Publication number: 20220147826
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for convolution with workload-balanced activation sparsity are described. An exemplary method comprises: assigning an input tensor and a weight tensor at a convolution layer into a plurality of processors to perform Multiply-Accumulate (MAC) operations in parallel based on the input tensor and the weight tensor; obtaining a plurality of output values based on results of the MAC operations; constructing one or more banks of output values based on the plurality of output values; for each of the banks, performing a top-K sorting on the one or more output values in the bank to obtain K output values; pruning each of the banks by setting the one or more output values other than the obtained K output values in that bank as zeros; and constructing an output tensor of the convolution layer based on the pruned banks.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Zhibin XIAO, Enxu YAN, Yong LU, Wei WANG
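The per-bank top-K pruning step can be sketched in a few lines of NumPy. This is an assumed software illustration (bank layout, magnitude-based top-K, and the function name are not from the publication):

```python
import numpy as np

def prune_banks(outputs, bank_size, k):
    """Split the output values into fixed-size banks and keep only the k
    largest-magnitude values in each bank, zeroing the rest.

    outputs: array whose size is a multiple of bank_size; k > 0.
    """
    out = outputs.copy().reshape(-1, bank_size)
    for bank in out:
        # drop all but the top-k magnitudes within this bank
        drop = np.argsort(np.abs(bank))[:bank.size - k]
        bank[drop] = 0.0
    return out.reshape(outputs.shape)
```

Pruning per fixed-size bank (rather than globally) is what makes the resulting activation sparsity workload-balanced: every bank carries exactly k non-zeros, so downstream processors receive equal work.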
-
Publication number: 20210406686
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing. An exemplary method comprises: obtaining an input tensor and a plurality of filters at a layer within a neural network; segmenting the input tensor into a plurality of sub-tensors; dividing a channel dimension of each of the plurality of filters into a plurality of channel groups; pruning each of the plurality of filters so that each of the plurality of channel groups of each filter comprises a same number of non-zero weights; segmenting each of the plurality of filters into a plurality of sub-filters according to the plurality of channel groups; and assigning the plurality of sub-tensors and the plurality of sub-filters to a plurality of processors for parallel convolution processing.
Type: Application
Filed: August 2, 2021
Publication date: December 30, 2021
Inventors: Zhibin XIAO, Enxu YAN, Wei WANG, Yong LU
-
Patent number: 11200497
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for knowledge-preserving sparse pruning on neural networks are described. An exemplary method includes obtaining a pre-trained machine learning model trained based on a plurality of general-purpose training data; training a task-specific machine learning model by tuning the pre-trained machine learning model based on a plurality of task-specific training data corresponding to a task; constructing a student network based on the task-specific machine learning model; simultaneously performing (1) knowledge distillation from the trained task-specific machine learning model as a teacher network to the student network and (2) network pruning on the student network; and obtaining the trained student network for serving the task.
Type: Grant
Filed: March 16, 2021
Date of Patent: December 14, 2021
Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
Inventors: Enxu Yan, Dongkuan Xu, Zhibin Xiao
-
Patent number: 11144823
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for hierarchical weight-sparse convolution processing are described. An exemplary method comprises: obtaining an input tensor and a filter at a convolution layer of a neural network; segmenting the filter into a plurality of sub-filters; generating a hierarchical bit representation of the filter representing a plurality of non-zero weights in the filter, wherein the hierarchical bit representation comprises a first layer, the first layer comprising a plurality of bits respectively corresponding to the plurality of sub-filters in the filter, each of the plurality of bits indicating whether the corresponding sub-filter includes at least one non-zero weight; and performing multiply-and-accumulate (MAC) operations based on the hierarchical bit representation of the filter and the input tensor.
Type: Grant
Filed: April 5, 2021
Date of Patent: October 12, 2021
Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
-
Publication number: 20210303426
Abstract: Systems and methods are provided for testing many-core processors consisting of processing element cores. The systems and methods can include grouping the processing elements according to the dataflow of the many-core processor. Each group can include a processing element that only receives inputs from other processing elements in the group. After grouping the processing elements, test information can be provided in parallel to each group. The test information can be configured to ensure a desired degree of test coverage for the processing element that only receives inputs from other processing elements in the group. Each group can perform testing operations in parallel to generate test results. The test results can be read out of each group. The processing elements can then be regrouped according to the dataflow of the many-core processor and the testing can be repeated to achieve a target test coverage.
Type: Application
Filed: March 30, 2020
Publication date: September 30, 2021
Inventors: Chunsheng Meon LIU, Arjun Chaudhuri, Zhibin Xiao
-
Patent number: 11113601
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for balanced-weight sparse convolution processing.
Type: Grant
Filed: June 30, 2020
Date of Patent: September 7, 2021
Assignee: MOFFETT TECHNOLOGIES CO., LIMITED
Inventors: Zhibin Xiao, Enxu Yan, Wei Wang, Yong Lu
-
Publication number: 20210263992
Abstract: Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first portion of the vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product.
Type: Application
Filed: February 21, 2020
Publication date: August 26, 2021
Inventors: Minghai QIN, Zhibin XIAO, Chunsheng LIU