Patents by Inventor Guoyang CHEN

Guoyang CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240176845
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: February 6, 2024
    Publication date: May 30, 2024
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
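
To make the abstract above concrete: the idea is to label a sub-matrix by the distribution pattern of its non-zero elements and then emit their memory addresses in a pattern-dependent order, chunked to fit vector registers. The Python sketch below is one hypothetical reading of the abstract, not the patented implementation; the two pattern labels, the `VECTOR_LANES` constant, and all function names are invented for the example.

```python
# Hypothetical sketch of pattern-based sparse loading (not the claimed method).
import numpy as np

VECTOR_LANES = 8  # assumed number of elements one vector register holds

def classify_section(section: np.ndarray) -> str:
    """Label a section by where its non-zeros fall (toy heuristic)."""
    rows, cols = np.nonzero(section)
    if len(rows) and np.all(rows == cols):
        return "diagonal"
    return "scattered"

def nonzero_addresses(section: np.ndarray, pattern: str, base: int = 0) -> list[int]:
    """Flat memory offsets of non-zeros, ordered by the detected pattern."""
    rows, cols = np.nonzero(section)
    if pattern == "diagonal":
        order = np.argsort(rows)           # walk the diagonal in order
    else:
        order = np.lexsort((cols, rows))   # plain row-major order
    n_cols = section.shape[1]
    return [base + int(r) * n_cols + int(c) for r, c in zip(rows[order], cols[order])]

def register_loads(addresses: list[int]):
    """Group addresses into chunks sized for one vector register."""
    for i in range(0, len(addresses), VECTOR_LANES):
        yield addresses[i:i + VECTOR_LANES]

m = np.diag([1.0, 2.0, 3.0, 4.0])
pattern = classify_section(m)
for load in register_loads(nonzero_addresses(m, pattern)):
    print(pattern, load)  # diagonal [0, 5, 10, 15]
```
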
  • Publication number: 20240160933
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Application
    Filed: January 23, 2024
    Publication date: May 16, 2024
    Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
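
The flow in the abstract above can be read as: compress the model data, read how many registers the vector register groups, test the compressed element count against a capacity condition derived from that count, and only then bind the data for loading. The sketch below is a hedged illustration under those assumptions; the lane constant, the specific condition, and the padding step are invented for the example.

```python
# Hypothetical sketch of the compress-check-associate flow (not the claimed method).
import numpy as np

LANES_PER_REGISTER = 4  # assumed element width of one physical register

def compress(weights: np.ndarray) -> np.ndarray:
    """Drop zero elements (a stand-in for the model's compression step)."""
    return weights[weights != 0]

def satisfies_condition(n_elements: int, n_registers: int) -> bool:
    """Example first condition: the compressed data fits the grouped registers."""
    return n_elements <= n_registers * LANES_PER_REGISTER

def associate(data: np.ndarray, n_registers: int) -> np.ndarray | None:
    """Bind compressed data to the register group, zero-padded to its width."""
    if not satisfies_condition(len(data), n_registers):
        return None  # caller would re-partition or re-compress
    width = n_registers * LANES_PER_REGISTER
    return np.pad(data, (0, width - len(data)))

weights = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.5, 0.0, 0.0])
packed = associate(compress(weights), n_registers=2)
print(packed)  # [ 1.5 -2.   0.5  0.   0.   0.   0.   0. ]
```
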
  • Patent number: 11921814
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: March 5, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
  • Patent number: 11915138
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: February 27, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Weifeng Zhang, Guoyang Chen, Yu Pu, Yongzhi Zhang, Yuan Xie
  • Patent number: 11669443
    Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
    Type: Grant
    Filed: January 17, 2020
    Date of Patent: June 6, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
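
The allocation rule described in the abstract above places a producer node on a block in one array and its consumer on a block in an adjacent array, so the producer's output can move over the assembly's internal coupling rather than a host bus. The toy scheduler below illustrates that rule under invented structure; the class, its fields, and the "next array downstream" policy are hypothetical, not the patent's scheduler.

```python
# Toy placement of a computation-graph edge onto a PIM memory block assembly.
from dataclasses import dataclass, field

@dataclass
class MemoryBlockAssembly:
    n_arrays: int
    blocks_per_array: int
    placement: dict = field(default_factory=dict)  # node -> (array, block)

    def allocate(self, node: str, array: int) -> tuple[int, int]:
        """Place a node on the lowest free block of the given array."""
        used = {b for (a, b) in self.placement.values() if a == array}
        block = min(set(range(self.blocks_per_array)) - used)
        self.placement[node] = (array, block)
        return array, block

def schedule_edge(asm: MemoryBlockAssembly, producer: str, consumer: str):
    """Pin the consumer to the array adjacent to the producer's array."""
    a_prod, _ = asm.placement.get(producer) or asm.allocate(producer, array=0)
    asm.allocate(consumer, array=a_prod + 1)

asm = MemoryBlockAssembly(n_arrays=4, blocks_per_array=4)
schedule_edge(asm, "conv1", "relu1")  # relu1 lands one array downstream
print(asm.placement)  # {'conv1': (0, 0), 'relu1': (1, 0)}
```
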
  • Publication number: 20220300577
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: June 14, 2022
    Publication date: September 22, 2022
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
  • Patent number: 11366875
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: June 21, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
  • Patent number: 11263131
    Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: March 1, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Shuangchen Li, Dimin Niu, Fei Sun, Jingjun Chu, Hongzhong Zheng, Guoyang Chen, Yingmin Li, Weifeng Zhang, Xipeng Shen
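
The abstract above describes a compiler component that derives a device address for each element of a data array from its logical address and allocates accordingly. The sketch below models that flow with a toy bank-interleaving map; the bank count, bank size, and mapping are invented, since the abstract does not disclose the actual address translation.

```python
# Hypothetical compiler-side address generation for a data array allocation.
NUM_BANKS = 4
BANK_SIZE = 1024  # words per bank (assumed)

def device_address(logical: int) -> int:
    """Interleave consecutive logical addresses across memory banks."""
    bank, offset = logical % NUM_BANKS, logical // NUM_BANKS
    return bank * BANK_SIZE + offset

def allocate_array(base_logical: int, n_elements: int) -> dict[int, int]:
    """Return a logical -> device address map for one data array."""
    return {base_logical + i: device_address(base_logical + i)
            for i in range(n_elements)}

for logical, dev in allocate_array(base_logical=0, n_elements=8).items():
    print(f"logical {logical:2d} -> device {dev}")
```
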
  • Publication number: 20210318955
    Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
    Type: Application
    Filed: April 8, 2020
    Publication date: October 14, 2021
    Inventors: Shuangchen LI, Dimin NIU, Fei SUN, Jingjun CHU, Hongzhong ZHENG, Guoyang CHEN, Yingmin LI, Weifeng ZHANG, Xipeng SHEN
  • Publication number: 20210286860
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: March 13, 2020
    Publication date: September 16, 2021
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
  • Publication number: 20210256380
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Application
    Filed: February 18, 2020
    Publication date: August 19, 2021
    Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
  • Publication number: 20210224185
    Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
    Type: Application
    Filed: January 17, 2020
    Publication date: July 22, 2021
    Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
  • Publication number: 20210150311
    Abstract: The present disclosure relates to a processing in memory (PIM) enabled device for executing a neural network model. The PIM enabled device comprises a memory block assembly comprising a first array of memory blocks, a second array of memory blocks adjacent to the first array of memory blocks, a plurality of first data links associated with the first array of memory blocks and the second array of memory blocks, wherein each data link of the plurality of first data links communicatively couples two corresponding memory blocks, one from the first array of memory blocks and one from the second array of memory blocks, and a second data link communicatively coupled to the plurality of first data links. Data from a first memory block of the first array of memory blocks is transferable to a second memory block of the second array of memory blocks via the plurality of first data links and the second data link.
    Type: Application
    Filed: November 19, 2019
    Publication date: May 20, 2021
    Inventors: Minxuan ZHOU, Weifeng ZHANG, Guoyang CHEN
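
The topology in the abstract above can be pictured as pairwise "first" links between corresponding blocks of two adjacent arrays, tied together by a "second" link so data can also reach a non-corresponding block. The routing toy below illustrates that structure; the hop names and the routing rule are hypothetical.

```python
# Toy route computation over the assembly's first and second data links.
def route(src_block: int, dst_block: int) -> list[str]:
    """Hops for a transfer from array 0, block src to array 1, block dst."""
    if src_block == dst_block:
        return [f"first_link[{src_block}]"]            # direct pairwise link
    return [f"first_link[{src_block}]",                # onto the link fabric
            "second_link",                             # cross between first links
            f"first_link[{dst_block}]"]                # down to the target block

print(route(0, 0))  # ['first_link[0]']
print(route(0, 2))  # ['first_link[0]', 'second_link', 'first_link[2]']
```
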
  • Patent number: 10996976
    Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: May 4, 2021
    Assignee: Alibaba Group Holding Limited
    Inventors: Shuai Che, Guoyang Chen, Yingmin Li
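
One way to read the abstract above: profile the applications co-scheduled with the network, then pick the largest batch size whose estimated footprint fits the resources those applications leave free. The sketch below is a rough heuristic under invented numbers, not the disclosed scheduler.

```python
# Hypothetical batch-size selection from a co-scheduled application profile.
def profile_apps(apps: list[dict]) -> float:
    """Fraction of device memory the co-scheduled apps are expected to use."""
    return sum(app["mem_fraction"] for app in apps)

def choose_batch_size(apps: list[dict], mem_per_sample: float,
                      total_mem: float = 1.0) -> int:
    """Largest batch whose memory fits what the profiled apps leave free."""
    free = total_mem - profile_apps(apps)
    return max(1, int(free // mem_per_sample))

apps = [{"name": "transcode", "mem_fraction": 0.25},
        {"name": "analytics", "mem_fraction": 0.25}]
batch = choose_batch_size(apps, mem_per_sample=0.125)
print(f"co-schedule apps with batch size {batch}")  # batch size 4
```
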
  • Publication number: 20200320395
    Abstract: A method for training a machine learning model includes acquiring an initial machine learning model, updating features of the initial machine learning model, updating dimensions of the initial machine learning model based on the updated features of the initial machine learning model and one or more latency hysteresis points obtained from a hardware profile of an accelerator configured to perform machine learning operations, and generating a final machine learning model based on the updated dimensions.
    Type: Application
    Filed: April 3, 2019
    Publication date: October 8, 2020
    Inventors: Hongxu YIN, Weifeng ZHANG, Guoyang CHEN
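
An interpretive sketch of the abstract above: an accelerator's measured latency often rises in steps rather than smoothly, so a dimension update can snap each layer width up to the last value before the next latency jump, gaining capacity at no latency cost. The heuristic below is a guess from the abstract, not the disclosed algorithm, and the step values are invented.

```python
# Hypothetical dimension update driven by latency hysteresis points.
LATENCY_STEPS = [16, 32, 64, 128]  # assumed hysteresis points from a HW profile

def snap_dimension(requested: int) -> int:
    """Largest width that does not cross the next latency step."""
    for step in LATENCY_STEPS:
        if requested <= step:
            return step
    return LATENCY_STEPS[-1]

widths = [20, 48, 70]
print([snap_dimension(w) for w in widths])  # [32, 64, 128]
```
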
  • Publication number: 20200319919
    Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
    Type: Application
    Filed: April 5, 2019
    Publication date: October 8, 2020
    Inventors: Shuai CHE, Guoyang CHEN, Yingmin LI
  • Publication number: 20200175361
    Abstract: Systems and methods are provided for improving learning inference performance by partitioning the inference based on system fluctuations and available resources: parsing a trained neural network model into a data flow graph with a plurality of nodes; generating a traversal order of the data flow graph; assigning a load level range to each edge device, an interconnect connecting the edge device and a cloud computing platform, and the cloud computing platform; profiling performance of each node over the load level range for the edge device and the cloud computing platform; and determining a partition point of the data flow graph based on the profiled performance of each node. By using a lookup table storing the profiled performance, the data flow graph may be readily re-partitioned as needed to improve performance.
    Type: Application
    Filed: November 30, 2018
    Publication date: June 4, 2020
    Inventors: Shuai Che, Guoyang Chen, Yingmin Li
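
The partitioning in the abstract above amounts to choosing a cut in the graph's traversal order that minimizes edge compute plus transfer plus cloud compute; keeping the profiles in a lookup table keyed by load level lets the cut be recomputed as load shifts. The sketch below illustrates the cut selection with invented latencies.

```python
# Hypothetical partition-point search over profiled per-node latencies.
def best_partition(edge_ms, cloud_ms, transfer_ms):
    """Cut k: nodes [0, k) run on the edge, nodes [k, n) in the cloud."""
    n = len(edge_ms)
    costs = {k: sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
             for k in range(n + 1)}
    return min(costs, key=costs.get), costs

edge_ms = [5.0, 8.0, 20.0, 30.0]            # late layers are heavy on the edge
cloud_ms = [2.0, 3.0, 4.0, 5.0]
transfer_ms = [50.0, 12.0, 9.0, 6.0, 40.0]  # activation size shrinks mid-graph
cut, costs = best_partition(edge_ms, cloud_ms, transfer_ms)
print(cut, costs[cut])  # 1 29.0 -> run node 0 on the edge, the rest in the cloud
```
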
  • Publication number: 20200117978
    Abstract: The present disclosure relates to computer-implemented systems and methods for efficiently mapping neural networks to programmable logic devices (PLDs). In one implementation, a method for mapping a neural network to a PLD, such as a field-programmable gate array (FPGA), may include receiving a data structure defining an architecture of the PLD; receiving a data structure defining an architecture of the neural network; partitioning the architecture of the PLD into a plurality of layers, each layer having a starting primitive adjacent to a first off-chip buffer and an ending primitive adjacent to a second off-chip buffer; mapping the architecture of the neural network onto one or more of the plurality of layers such that a data transfer size is at least locally minimized; scheduling the mapped architecture of the neural network for execution on the one or more of the plurality of layers; and outputting an execution sequence based on the scheduled and mapped architecture of the neural network.
    Type: Application
    Filed: October 12, 2018
    Publication date: April 16, 2020
    Inventors: Guoyang CHEN, Weifeng ZHANG
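
The abstract above partitions the PLD into layers whose boundaries are off-chip buffers and packs network operations into those layers so that off-chip transfers happen only at layer edges. The greedy toy below illustrates that packing under invented op sizes and a single capacity constraint; it is not the claimed mapping or scheduling algorithm.

```python
# Greedy toy: pack consecutive network ops into PLD layers bounded by buffers.
def map_network(op_sizes: list[int], layer_capacity: int) -> list[list[int]]:
    """Group consecutive ops into PLD layers without exceeding capacity."""
    layers, current, used = [], [], 0
    for op, size in enumerate(op_sizes):
        if used + size > layer_capacity and current:
            layers.append(current)  # close the layer: off-chip buffer here
            current, used = [], 0
        current.append(op)
        used += size
    if current:
        layers.append(current)
    return layers

schedule = map_network(op_sizes=[3, 4, 2, 5, 1, 2], layer_capacity=8)
print(schedule)  # [[0, 1], [2, 3, 4], [5]]
```
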