Patents by Inventor Guoyang CHEN
Guoyang CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Publication number: 20240176845
  Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
  Type: Application
  Filed: February 6, 2024
  Publication date: May 30, 2024
  Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
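  The abstract describes classifying sparse sections of a weight matrix by their non-zero distribution and loading the non-zeros into vector registers in a pattern-dependent order. The Python sketch below is a minimal illustration of that flow; the two-pattern taxonomy ("diagonal" vs. "scattered"), the 0.8 threshold, and the ordering heuristics are assumptions made for the example, not the claimed method.

  ```python
  # Illustrative sketch only: classify a matrix section by its non-zero
  # distribution pattern, then emit non-zero addresses in an order suited
  # to vector-register gather loads. Thresholds and labels are invented.
  import numpy as np

  def classify_section(block: np.ndarray) -> str:
      """Label a block with a simple non-zero distribution heuristic."""
      rows, cols = np.nonzero(block)
      if rows.size == 0:
          return "empty"
      # If most non-zeros sit on the diagonal, call the section "diagonal";
      # otherwise treat it as "scattered".
      on_diag = np.mean(rows == cols)
      return "diagonal" if on_diag >= 0.8 else "scattered"

  def gather_order(block: np.ndarray, pattern: str):
      """Return (row, col) addresses of non-zeros in a pattern-dependent order."""
      rows, cols = np.nonzero(block)
      if pattern == "diagonal":
          order = np.argsort(rows)          # walk the diagonal sequentially
      else:
          order = np.lexsort((rows, cols))  # column-major order for strided gathers
      return list(zip(rows[order], cols[order]))

  matrix = np.eye(4)
  matrix[0, 3] = 2.0
  pattern = classify_section(matrix)
  print(pattern, gather_order(matrix, pattern))
  ```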
- Publication number: 20240160933
  Abstract: Methods and apparatus for reducing a size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
  Type: Application
  Filed: January 23, 2024
  Publication date: May 16, 2024
  Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
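  The key step here is a capacity test: the compressed element count is checked against a condition derived from the vector register's structure before the data is bound to the register. A hedged sketch, assuming the condition is simply "fits within the register file's total lanes" (the lane and register counts are invented):

  ```python
  # Minimal sketch under assumptions: "compression" keeps non-zeros, and the
  # "first condition" is modeled as fitting the register file's capacity.
  import numpy as np

  LANES_PER_REGISTER = 8   # assumed lane count per register
  NUM_REGISTERS = 4        # "number of registers included in the vector register"

  def compress(weights: np.ndarray) -> np.ndarray:
      """Toy compression: drop zero elements."""
      return weights[weights != 0]

  def fits_vector_register(compressed: np.ndarray) -> bool:
      """First condition: compressed element count must fit the register file."""
      return compressed.size <= LANES_PER_REGISTER * NUM_REGISTERS

  weights = np.zeros(64)
  weights[:20] = 1.0
  packed = compress(weights)
  if fits_vector_register(packed):
      print(f"associate {packed.size} elements with the vector register")
  else:
      print("spill: compressed data exceeds register capacity")
  ```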
- Patent number: 11921814
  Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
  Type: Grant
  Filed: June 14, 2022
  Date of Patent: March 5, 2024
  Assignee: Alibaba Group Holding Limited
  Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
- Patent number: 11915138
  Abstract: Methods and apparatus for reducing a size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
  Type: Grant
  Filed: February 18, 2020
  Date of Patent: February 27, 2024
  Assignee: Alibaba Group Holding Limited
  Inventors: Weifeng Zhang, Guoyang Chen, Yu Pu, Yongzhi Zhang, Yuan Xie
- Patent number: 11669443
  Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
  Type: Grant
  Filed: January 17, 2020
  Date of Patent: June 6, 2023
  Assignee: Alibaba Group Holding Limited
  Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
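  A small illustration of the scheduling idea: producer and consumer nodes land on memory blocks in different block arrays, so the internal data coupling can carry the first node's output to the second. The grid dimensions and the round-robin placement policy below are assumptions, not the patented scheduler.

  ```python
  # Toy placement of a computation graph onto a PIM memory block assembly.
  # A consumer is placed in the array after its producer's, so its input
  # crosses the internal coupling between adjacent arrays.
  from collections import namedtuple

  Node = namedtuple("Node", "name deps")

  def schedule(graph, num_arrays=2, blocks_per_array=4):
      """Assign each node an (array, block) slot; graph is assumed
      topologically ordered."""
      placement = {}
      next_free = [0] * num_arrays
      for node in graph:
          producer_arrays = [placement[d][0] for d in node.deps]
          array = (max(producer_arrays) + 1) % num_arrays if producer_arrays else 0
          placement[node.name] = (array, next_free[array] % blocks_per_array)
          next_free[array] += 1
      return placement

  graph = [Node("conv1", []), Node("relu1", ["conv1"]), Node("conv2", ["relu1"])]
  for name, (array, block) in schedule(graph).items():
      print(f"{name} -> array {array}, block {block}")
  ```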
- Publication number: 20220300577
  Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
  Type: Application
  Filed: June 14, 2022
  Publication date: September 22, 2022
  Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
- Patent number: 11366875
  Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
  Type: Grant
  Filed: March 13, 2020
  Date of Patent: June 21, 2022
  Assignee: Alibaba Group Holding Limited
  Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
- Patent number: 11263131
  Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
  Type: Grant
  Filed: April 8, 2020
  Date of Patent: March 1, 2022
  Assignee: Alibaba Group Holding Limited
  Inventors: Shuangchen Li, Dimin Niu, Fei Sun, Jingjun Chu, Hongzhong Zheng, Guoyang Chen, Yingmin Li, Weifeng Zhang, Xipeng Shen
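  The compiler-side flow can be pictured as translating each element's logical address into a device address and emitting an allocation instruction. A toy sketch, assuming a bank-interleaved mapping as the example address-generation policy (the abstract does not specify one):

  ```python
  # Hypothetical address-generation policy for the allocation flow above:
  # consecutive logical addresses are interleaved across memory banks.
  NUM_BANKS = 4
  BANK_SIZE = 1024  # elements per bank, an invented capacity

  def device_address(logical: int) -> tuple:
      """Map a logical address to an assumed (bank, offset) device address."""
      bank, offset = logical % NUM_BANKS, logical // NUM_BANKS
      if offset >= BANK_SIZE:
          raise MemoryError("data array exceeds device capacity")
      return bank, offset

  def allocate(array_name: str, num_elements: int):
      """Emit an allocation 'instruction' plus per-element device addresses."""
      addrs = [device_address(i) for i in range(num_elements)]
      instruction = f"ALLOC {array_name} {num_elements}"
      return instruction, addrs

  instr, addrs = allocate("activations", 8)
  print(instr)
  print(addrs)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
  ```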
- Publication number: 20210318955
  Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
  Type: Application
  Filed: April 8, 2020
  Publication date: October 14, 2021
  Inventors: Shuangchen LI, Dimin NIU, Fei SUN, Jingjun CHU, Hongzhong ZHENG, Guoyang CHEN, Yingmin LI, Weifeng ZHANG, Xipeng SHEN
- Publication number: 20210286860
  Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
  Type: Application
  Filed: March 13, 2020
  Publication date: September 16, 2021
  Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
- Publication number: 20210256380
  Abstract: Methods and apparatus for reducing a size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
  Type: Application
  Filed: February 18, 2020
  Publication date: August 19, 2021
  Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
- Publication number: 20210224185
  Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
  Type: Application
  Filed: January 17, 2020
  Publication date: July 22, 2021
  Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
- Publication number: 20210150311
  Abstract: The present disclosure relates to a processing in memory (PIM) enabled device for executing a neural network model. The PIM enabled device comprises a memory block assembly comprising a first array of memory blocks, a second array of memory blocks adjacent to the first array of memory blocks, a plurality of first data links associated with the first array of memory blocks and the second array of memory blocks, wherein each data link of the plurality of first data links communicatively couples two corresponding memory blocks, one from the first array of memory blocks and one from the second array of memory blocks, and a second data link communicatively coupled to the plurality of first data links. Data from a first memory block of the first array of memory blocks can be transferred to a second memory block of the second array of memory blocks via the plurality of first data links and the second data link.
  Type: Application
  Filed: November 19, 2019
  Publication date: May 20, 2021
  Inventors: Minxuan ZHOU, Weifeng ZHANG, Guoyang CHEN
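  To make the topology concrete: each first data link pairs one block in the first array with the corresponding block in the second array, and the shared second data link bridges those first links. The routing model below is purely illustrative and assumes exactly that structure:

  ```python
  # Toy routing over the assumed link topology: the directly coupled block
  # pair uses only its own first link; any other pair traverses the source
  # block's first link, the shared second link, and the destination's first
  # link.
  def route(src_block: int, dst_block: int) -> list:
      """Link hops from array-0 block src_block to array-1 block dst_block."""
      if src_block == dst_block:
          return [f"first_link[{src_block}]"]
      return [f"first_link[{src_block}]", "second_link", f"first_link[{dst_block}]"]

  print(route(0, 0))  # ['first_link[0]']
  print(route(0, 2))  # ['first_link[0]', 'second_link', 'first_link[2]']
  ```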
- Patent number: 10996976
  Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
  Type: Grant
  Filed: April 5, 2019
  Date of Patent: May 4, 2021
  Assignee: Alibaba Group Holding Limited
  Inventors: Shuai Che, Guoyang Chen, Yingmin Li
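  The scheduling decision can be sketched as: reduce each co-scheduled application's profile to a resource figure, then choose the largest batch whose footprint still fits alongside those applications. All numbers below are invented placeholders, and the memory-only profile is an assumption:

  ```python
  # Hedged sketch: pick the largest power-of-two batch size whose activation
  # footprint fits in the memory left over by co-scheduled applications.
  def pick_batch_size(app_profiles, mem_per_sample_mb=50, total_mb=16000):
      """app_profiles: list of {'name', 'mem_mb'} dicts from profiling."""
      free = total_mb - sum(p["mem_mb"] for p in app_profiles)
      batch = 1
      while batch * 2 * mem_per_sample_mb <= free:
          batch *= 2
      return batch

  apps = [{"name": "web", "mem_mb": 4000}, {"name": "etl", "mem_mb": 6000}]
  print(pick_batch_size(apps))  # largest batch that coexists with the apps
  ```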
- Publication number: 20200320395
  Abstract: A method for training a machine learning model, including acquiring an initial machine learning model, updating features of the initial machine learning model, updating dimensions of the initial machine learning model based on the updated features of the initial machine learning model and one or more latency hysteresis points obtained based on a hardware profile of an accelerator configured to perform machine learning operations, and generating a final machine learning model based on the updated dimensions.
  Type: Application
  Filed: April 3, 2019
  Publication date: October 8, 2020
  Inventors: Hongxu YIN, Weifeng ZHANG, Guoyang CHEN
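  One plausible reading of "latency hysteresis points" (the abstract does not define them) is the layer widths at which the accelerator's measured latency jumps, e.g., at tiling boundaries, so candidate dimensions are snapped to the last width before a jump. A speculative sketch under that assumption:

  ```python
  # Speculative illustration only: snap requested layer widths down to the
  # nearest assumed latency plateau edge from a hardware profile.
  HYSTERESIS_POINTS = [64, 128, 256, 512]  # invented profile data

  def snap_dimension(requested_width: int) -> int:
      """Round a layer width down to the last width before a latency jump."""
      eligible = [p for p in HYSTERESIS_POINTS if p <= requested_width]
      return max(eligible) if eligible else requested_width

  layer_widths = [100, 300, 500]
  print([snap_dimension(w) for w in layer_widths])  # [64, 256, 256]
  ```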
- Publication number: 20200319919
  Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
  Type: Application
  Filed: April 5, 2019
  Publication date: October 8, 2020
  Inventors: Shuai CHE, Guoyang CHEN, Yingmin LI
- Publication number: 20200175361
  Abstract: Systems and methods are provided for improving machine learning inference performance by partitioning the inference workload based on system fluctuations and available resources: parsing a trained neural network model into a data flow graph with a plurality of nodes; generating a traversal order of the data flow graph; assigning a load level range to each edge device, the interconnect connecting the edge device and a cloud computing platform, and the cloud computing platform; profiling the performance of each node over the load level range for the edge device and the cloud computing platform; and determining a partition point of the data flow graph based on the profiled performance of each node. By using a lookup table storing the profiled performance, the data flow graph may be readily re-partitioned as needed to improve performance.
  Type: Application
  Filed: November 30, 2018
  Publication date: June 4, 2020
  Inventors: Shuai Che, Guoyang Chen, Yingmin Li
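  The core of the method is a partition-point search over the data flow graph using profiled per-node costs. A minimal sketch; in the described system the latencies would come from the profiled lookup table rather than the hard-coded placeholders used here:

  ```python
  # Minimal partition-point search: nodes [0, cut) run on the edge device,
  # the rest on the cloud; transfer_cost[cut] is the assumed interconnect
  # price of shipping the data that crosses the cut (the raw input when
  # cut == 0). All numbers are placeholders.
  def best_partition(edge_lat, cloud_lat, transfer_cost):
      """Return (cut, latency) minimizing end-to-end inference time."""
      n = len(edge_lat)
      best_cut, best_time = 0, float("inf")
      for cut in range(n + 1):
          total = sum(edge_lat[:cut]) + sum(cloud_lat[cut:])
          if cut < n:  # everything after the cut must cross the interconnect
              total += transfer_cost[cut]
          if total < best_time:
              best_cut, best_time = cut, total
      return best_cut, best_time

  edge = [5, 8, 20, 30]    # ms per node on the edge device
  cloud = [1, 2, 4, 6]     # ms per node on the cloud platform
  xfer = [15, 3, 9, 2, 0]  # ms to ship the tensor at each cut point
  print(best_partition(edge, cloud, xfer))  # -> (1, 20): split after node 0
  ```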
- Publication number: 20200117978
  Abstract: The present disclosure relates to computer-implemented systems and methods for efficiently mapping neural networks to programmable logic devices (PLDs). In one implementation, a method for mapping a neural network to a PLD, such as an FPGA, may include receiving a data structure defining an architecture of the PLD; receiving a data structure defining an architecture of the neural network; partitioning the architecture of the PLD into a plurality of layers, each layer having a starting primitive adjacent to a first off-chip buffer and an ending primitive adjacent to a second off-chip buffer; mapping the architecture of the neural network onto one or more of the plurality of layers such that a data transfer size is at least locally minimized; scheduling the mapped architecture of the neural network for execution on the one or more of the plurality of layers; and outputting an execution sequence based on the scheduled and mapped architecture of the neural network.
  Type: Application
  Filed: October 12, 2018
  Publication date: April 16, 2020
  Inventors: Guoyang CHEN, Weifeng ZHANG
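  A compact sketch of the mapping step: consecutive network operations are packed into PLD layers (primitive chains bounded by off-chip buffers), closing a layer where the boundary transfer is cheap so the data transfer size is at least locally minimized. The capacity limit and the greedy cut rule are assumptions for the example:

  ```python
  # Illustrative greedy mapping: close a PLD layer either at an assumed
  # primitive-chain capacity or where the current op's output is smaller
  # than the next op's, so the off-chip transfer at the boundary is cheap.
  def map_to_layers(ops, max_ops_per_layer=3):
      """ops: list of (name, output_size). Returns the layer grouping."""
      layers, current = [], []
      for i, (name, out_size) in enumerate(ops):
          current.append(name)
          next_size = ops[i + 1][1] if i + 1 < len(ops) else None
          full = len(current) == max_ops_per_layer
          cheap_cut = next_size is not None and out_size < next_size
          if full or cheap_cut:
              layers.append(current)
              current = []
      if current:
          layers.append(current)
      return layers

  ops = [("conv1", 64), ("relu1", 64), ("pool1", 16), ("conv2", 32), ("fc", 4)]
  print(map_to_layers(ops))  # [['conv1', 'relu1', 'pool1'], ['conv2', 'fc']]
  ```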