Patents by Inventor Guoyang CHEN

Guoyang CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240176845
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: February 6, 2024
    Publication date: May 30, 2024
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
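
To make the abstract above concrete: the idea is to label a sub-matrix by the distribution pattern of its non-zero elements and then emit their memory addresses in a pattern-dependent order, chunked to fit vector registers. The Python sketch below is one hypothetical reading of the abstract, not the patented implementation; the two pattern labels, the `VECTOR_LANES` constant, and all function names are invented for the example.

```python
# Hypothetical sketch of pattern-based sparse loading (not the claimed method).
import numpy as np

VECTOR_LANES = 8  # assumed number of elements one vector register holds

def classify_section(section: np.ndarray) -> str:
    """Label a section by where its non-zeros fall (toy heuristic)."""
    rows, cols = np.nonzero(section)
    if len(rows) and np.all(rows == cols):
        return "diagonal"
    return "scattered"

def nonzero_addresses(section: np.ndarray, pattern: str, base: int = 0) -> list[int]:
    """Flat memory offsets of non-zeros, ordered by the detected pattern."""
    rows, cols = np.nonzero(section)
    if pattern == "diagonal":
        order = np.argsort(rows)           # walk the diagonal in order
    else:
        order = np.lexsort((cols, rows))   # plain row-major order
    n_cols = section.shape[1]
    return [base + int(r) * n_cols + int(c) for r, c in zip(rows[order], cols[order])]

def register_loads(addresses: list[int]):
    """Group addresses into chunks sized for one vector register."""
    for i in range(0, len(addresses), VECTOR_LANES):
        yield addresses[i:i + VECTOR_LANES]

m = np.diag([1.0, 2.0, 3.0, 4.0])
pattern = classify_section(m)
for load in register_loads(nonzero_addresses(m, pattern)):
    print(pattern, load)  # diagonal [0, 5, 10, 15]
```
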
  • Publication number: 20240160933
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Application
    Filed: January 23, 2024
    Publication date: May 16, 2024
    Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
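
The flow in the abstract above can be read as: compress the model data, read how many registers the vector register groups, test the compressed element count against a capacity condition derived from that count, and only then bind the data for loading. The sketch below is a hedged illustration under those assumptions; the lane constant, the specific condition, and the padding step are invented for the example.

```python
# Hypothetical sketch of the compress-check-associate flow (not the claimed method).
import numpy as np

LANES_PER_REGISTER = 4  # assumed element width of one physical register

def compress(weights: np.ndarray) -> np.ndarray:
    """Drop zero elements (a stand-in for the model's compression step)."""
    return weights[weights != 0]

def satisfies_condition(n_elements: int, n_registers: int) -> bool:
    """Example first condition: the compressed data fits the grouped registers."""
    return n_elements <= n_registers * LANES_PER_REGISTER

def associate(data: np.ndarray, n_registers: int) -> np.ndarray | None:
    """Bind compressed data to the register group, zero-padded to its width."""
    if not satisfies_condition(len(data), n_registers):
        return None  # caller would re-partition or re-compress
    width = n_registers * LANES_PER_REGISTER
    return np.pad(data, (0, width - len(data)))

weights = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.5, 0.0, 0.0])
packed = associate(compress(weights), n_registers=2)
print(packed)  # [ 1.5 -2.   0.5  0.   0.   0.   0.   0. ]
```
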
  • Patent number: 11921814
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: March 5, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
  • Patent number: 11915138
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: February 27, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Weifeng Zhang, Guoyang Chen, Yu Pu, Yongzhi Zhang, Yuan Xie
  • Patent number: 11669443
    Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
    Type: Grant
    Filed: January 17, 2020
    Date of Patent: June 6, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
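
The allocation rule described in the abstract above places a producer node on a block in one array and its consumer on a block in an adjacent array, so the producer's output can move over the assembly's internal coupling rather than a host bus. The toy scheduler below illustrates that rule under invented structure; the class, its fields, and the "next array downstream" policy are hypothetical, not the patent's scheduler.

```python
# Toy placement of a computation-graph edge onto a PIM memory block assembly.
from dataclasses import dataclass, field

@dataclass
class MemoryBlockAssembly:
    n_arrays: int
    blocks_per_array: int
    placement: dict = field(default_factory=dict)  # node -> (array, block)

    def allocate(self, node: str, array: int) -> tuple[int, int]:
        """Place a node on the lowest free block of the given array."""
        used = {b for (a, b) in self.placement.values() if a == array}
        block = min(set(range(self.blocks_per_array)) - used)
        self.placement[node] = (array, block)
        return array, block

def schedule_edge(asm: MemoryBlockAssembly, producer: str, consumer: str):
    """Pin the consumer to the array adjacent to the producer's array."""
    a_prod, _ = asm.placement.get(producer) or asm.allocate(producer, array=0)
    asm.allocate(consumer, array=a_prod + 1)

asm = MemoryBlockAssembly(n_arrays=4, blocks_per_array=4)
schedule_edge(asm, "conv1", "relu1")  # relu1 lands one array downstream
print(asm.placement)  # {'conv1': (0, 0), 'relu1': (1, 0)}
```
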
  • Publication number: 20220300577
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: June 14, 2022
    Publication date: September 22, 2022
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
  • Patent number: 11366875
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: June 21, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
  • Patent number: 11263131
    Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: March 1, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Shuangchen Li, Dimin Niu, Fei Sun, Jingjun Chu, Hongzhong Zheng, Guoyang Chen, Yingmin Li, Weifeng Zhang, Xipeng Shen
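
The abstract above describes a compiler component that derives a device address for each element of a data array from its logical address and allocates accordingly. The sketch below models that flow with a toy bank-interleaving map; the bank count, bank size, and mapping are invented, since the abstract does not disclose the actual address translation.

```python
# Hypothetical compiler-side address generation for a data array allocation.
NUM_BANKS = 4
BANK_SIZE = 1024  # words per bank (assumed)

def device_address(logical: int) -> int:
    """Interleave consecutive logical addresses across memory banks."""
    bank, offset = logical % NUM_BANKS, logical // NUM_BANKS
    return bank * BANK_SIZE + offset

def allocate_array(base_logical: int, n_elements: int) -> dict[int, int]:
    """Return a logical -> device address map for one data array."""
    return {base_logical + i: device_address(base_logical + i)
            for i in range(n_elements)}

for logical, dev in allocate_array(base_logical=0, n_elements=8).items():
    print(f"logical {logical:2d} -> device {dev}")
```
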
  • Publication number: 20210318955
    Abstract: Embodiments of the disclosure provide systems and methods for allocating memory space in a memory device. The system can include: a memory device for providing the memory space; and a compiler component configured for: receiving a request for allocating a data array having a plurality of data elements in the memory device, wherein each of the plurality of data elements has a logical address; generating an instruction for allocating memory space for the data array in the memory device based on the request; generating device addresses for the plurality of data elements in the memory device based on logical addresses of the plurality of data elements; and allocating the memory space for the data array in the memory device based on the device addresses and the instruction.
    Type: Application
    Filed: April 8, 2020
    Publication date: October 14, 2021
    Inventors: Shuangchen LI, Dimin NIU, Fei SUN, Jingjun CHU, Hongzhong ZHENG, Guoyang CHEN, Yingmin LI, Weifeng ZHANG, Xipeng SHEN
  • Publication number: 20210286860
    Abstract: Methods and devices are provided, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Application
    Filed: March 13, 2020
    Publication date: September 16, 2021
    Inventors: Guoyang CHEN, Yu PU, Yongzhi ZHANG, Weifeng ZHANG, Yuan XIE
  • Publication number: 20210256380
    Abstract: Methods and apparatus are provided for reducing the size of a neural network model, the method including: compressing data of the neural network model; identifying structure information of a vector register, wherein the structure information includes a number of registers included in the vector register; comparing a number of elements in the compressed data with a first condition, wherein the first condition is determined based on the number of registers in the vector register; and in response to the number of elements satisfying the first condition, associating the compressed data with the vector register to enable loading the compressed data to the vector register.
    Type: Application
    Filed: February 18, 2020
    Publication date: August 19, 2021
    Inventors: Weifeng ZHANG, Guoyang CHEN, Yu PU, Yongzhi ZHANG, Yuan XIE
  • Publication number: 20210224185
    Abstract: The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
    Type: Application
    Filed: January 17, 2020
    Publication date: July 22, 2021
    Inventors: Minxuan Zhou, Guoyang Chen, Weifeng Zhang
  • Publication number: 20210150311
    Abstract: The present disclosure relates to a processing in memory (PIM) enabled device for executing a neural network model. The PIM enabled device comprises a memory block assembly comprising a first array of memory blocks, a second array of memory blocks adjacent to the first array of memory blocks, a plurality of first data links associated with the first array of memory blocks and the second array of memory blocks, wherein each data link of the plurality of first data links communicatively couples two corresponding memory blocks, one from the first array of memory blocks and one from the second array of memory blocks, and a second data link communicatively coupled to the plurality of first data links. Data from a first memory block of the first array of memory blocks is transferable to a second memory block of the second array of memory blocks via the plurality of first data links and the second data link.
    Type: Application
    Filed: November 19, 2019
    Publication date: May 20, 2021
    Inventors: Minxuan ZHOU, Weifeng ZHANG, Guoyang CHEN
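
The topology in the abstract above can be pictured as pairwise "first" links between corresponding blocks of two adjacent arrays, tied together by a "second" link so data can also reach a non-corresponding block. The routing toy below illustrates that structure; the hop names and the routing rule are hypothetical.

```python
# Toy route computation over the assembly's first and second data links.
def route(src_block: int, dst_block: int) -> list[str]:
    """Hops for a transfer from array 0, block src to array 1, block dst."""
    if src_block == dst_block:
        return [f"first_link[{src_block}]"]            # direct pairwise link
    return [f"first_link[{src_block}]",                # onto the link fabric
            "second_link",                             # cross between first links
            f"first_link[{dst_block}]"]                # down to the target block

print(route(0, 0))  # ['first_link[0]']
print(route(0, 2))  # ['first_link[0]', 'second_link', 'first_link[2]']
```
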
  • Patent number: 10996976
    Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: May 4, 2021
    Assignee: Alibaba Group Holding Limited
    Inventors: Shuai Che, Guoyang Chen, Yingmin Li
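
One way to read the abstract above: profile the applications co-scheduled with the network, then pick the largest batch size whose estimated footprint fits the resources those applications leave free. The sketch below is a rough heuristic under invented numbers, not the disclosed scheduler.

```python
# Hypothetical batch-size selection from a co-scheduled application profile.
def profile_apps(apps: list[dict]) -> float:
    """Fraction of device memory the co-scheduled apps are expected to use."""
    return sum(app["mem_fraction"] for app in apps)

def choose_batch_size(apps: list[dict], mem_per_sample: float,
                      total_mem: float = 1.0) -> int:
    """Largest batch whose memory fits what the profiled apps leave free."""
    free = total_mem - profile_apps(apps)
    return max(1, int(free // mem_per_sample))

apps = [{"name": "transcode", "mem_fraction": 0.25},
        {"name": "analytics", "mem_fraction": 0.25}]
batch = choose_batch_size(apps, mem_per_sample=0.125)
print(f"co-schedule apps with batch size {batch}")  # batch size 4
```
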
  • Publication number: 20200320395
    Abstract: A method for training a machine learning model includes acquiring an initial machine learning model, updating features of the initial machine learning model, updating dimensions of the initial machine learning model based on the updated features of the initial machine learning model and one or more latency hysteresis points obtained from a hardware profile of an accelerator configured to perform machine learning operations, and generating a final machine learning model based on the updated dimensions.
    Type: Application
    Filed: April 3, 2019
    Publication date: October 8, 2020
    Inventors: Hongxu YIN, Weifeng ZHANG, Guoyang CHEN
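
An interpretive sketch of the abstract above: an accelerator's measured latency often rises in steps rather than smoothly, so a dimension update can snap each layer width up to the last value before the next latency jump, gaining capacity at no latency cost. The heuristic below is a guess from the abstract, not the disclosed algorithm, and the step values are invented.

```python
# Hypothetical dimension update driven by latency hysteresis points.
LATENCY_STEPS = [16, 32, 64, 128]  # assumed hysteresis points from a HW profile

def snap_dimension(requested: int) -> int:
    """Largest width that does not cross the next latency step."""
    for step in LATENCY_STEPS:
        if requested <= step:
            return step
    return LATENCY_STEPS[-1]

widths = [20, 48, 70]
print([snap_dimension(w) for w in widths])  # [32, 64, 128]
```
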
  • Publication number: 20200319919
    Abstract: The present disclosure relates to computer-implemented systems and methods for scheduling a neural network for execution. In one implementation, a system for scheduling a neural network for execution may include at least one memory storing instructions and at least one processor configured to execute the instructions to determine a profile for one or more applications co-scheduled with at least one neural network; determine a batch size for the at least one neural network based on the determined profile for the one or more applications; and schedule the one or more applications and the at least one neural network based on the batch size.
    Type: Application
    Filed: April 5, 2019
    Publication date: October 8, 2020
    Inventors: Shuai CHE, Guoyang CHEN, Yingmin LI
  • Publication number: 20200175361
    Abstract: Systems and methods are provided for improving learning inference performance by partitioning the inference based on system fluctuations and available resources: parsing a trained neural network model into a data flow graph with a plurality of nodes; generating a traversal order of the data flow graph; assigning a load level range to each edge device, an interconnect connecting the edge device and a cloud computing platform, and the cloud computing platform; profiling performance of each node over the load level range for the edge device and the cloud computing platform; and determining a partition point of the data flow graph based on the profiled performance of each node. By using a lookup table storing the profiled performance, the data flow graph may be readily re-partitioned as needed to improve performance.
    Type: Application
    Filed: November 30, 2018
    Publication date: June 4, 2020
    Inventors: Shuai Che, Guoyang Chen, Yingmin Li
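
The partitioning in the abstract above amounts to choosing a cut in the graph's traversal order that minimizes edge compute plus transfer plus cloud compute; keeping the profiles in a lookup table keyed by load level lets the cut be recomputed as load shifts. The sketch below illustrates the cut selection with invented latencies.

```python
# Hypothetical partition-point search over profiled per-node latencies.
def best_partition(edge_ms, cloud_ms, transfer_ms):
    """Cut k: nodes [0, k) run on the edge, nodes [k, n) in the cloud."""
    n = len(edge_ms)
    costs = {k: sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
             for k in range(n + 1)}
    return min(costs, key=costs.get), costs

edge_ms = [5.0, 8.0, 20.0, 30.0]            # late layers are heavy on the edge
cloud_ms = [2.0, 3.0, 4.0, 5.0]
transfer_ms = [50.0, 12.0, 9.0, 6.0, 40.0]  # activation size shrinks mid-graph
cut, costs = best_partition(edge_ms, cloud_ms, transfer_ms)
print(cut, costs[cut])  # 1 29.0 -> run node 0 on the edge, the rest in the cloud
```
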
  • Publication number: 20200117978
    Abstract: The present disclosure relates to computer-implemented systems and methods for efficiently mapping neural networks to programmable logic devices (PLDs). In one implementation, a method for mapping a neural network to a PLD, such as a field-programmable gate array (FPGA), may include receiving a data structure defining an architecture of the PLD; receiving a data structure defining an architecture of the neural network; partitioning the architecture of the PLD into a plurality of layers, each layer having a starting primitive adjacent to a first off-chip buffer and an ending primitive adjacent to a second off-chip buffer; mapping the architecture of the neural network onto one or more of the plurality of layers such that a data transfer size is at least locally minimized; scheduling the mapped architecture of the neural network for execution on the one or more of the plurality of layers; and outputting an execution sequence based on the scheduled and mapped architecture of the neural network.
    Type: Application
    Filed: October 12, 2018
    Publication date: April 16, 2020
    Inventors: Guoyang CHEN, Weifeng ZHANG
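
The abstract above partitions the PLD into layers whose boundaries are off-chip buffers and packs network operations into those layers so that off-chip transfers happen only at layer edges. The greedy toy below illustrates that packing under invented op sizes and a single capacity constraint; it is not the claimed mapping or scheduling algorithm.

```python
# Greedy toy: pack consecutive network ops into PLD layers bounded by buffers.
def map_network(op_sizes: list[int], layer_capacity: int) -> list[list[int]]:
    """Group consecutive ops into PLD layers without exceeding capacity."""
    layers, current, used = [], [], 0
    for op, size in enumerate(op_sizes):
        if used + size > layer_capacity and current:
            layers.append(current)  # close the layer: off-chip buffer here
            current, used = [], 0
        current.append(op)
        used += size
    if current:
        layers.append(current)
    return layers

schedule = map_network(op_sizes=[3, 4, 2, 5, 1, 2], layer_capacity=8)
print(schedule)  # [[0, 1], [2, 3, 4], [5]]
```
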