Patents by Inventor Hongzhong Zheng

Hongzhong Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240005127
    Abstract: This application describes systems and methods for facilitating memory access for graph neural network (GNN) processing. An example system includes a plurality of processing units, each configured to perform graph neural network (GNN) processing; and a plurality of memory extension cards, each configured to store graph data for the GNN processing, wherein: each of the plurality of processing units is communicatively coupled with three other processing units via one or more interconnects respectively; the plurality of processing units are communicatively coupled with the plurality of memory extension cards respectively; and each of the plurality of memory extension cards includes a graphic access engine circuitry configured to accelerate GNN memory access.
    Type: Application
    Filed: November 28, 2022
    Publication date: January 4, 2024
    Inventors: Yijin GUAN, Dimin NIU, Shengcheng WANG, Shuangchen LI, Hongzhong ZHENG
  • Publication number: 20240004954
    Abstract: This application describes a hardware acceleration design for improving SpGEMM efficiency. An exemplary method may include: obtaining a first sparse matrix and a second sparse matrix for performing SpGEMM; allocating a pair of buffers respectively pointed by a first pointer and a second pointer; for each first row in the first sparse matrix that comprises a plurality of non-zero elements, identifying a plurality of second rows in the second sparse matrix that correspond to the plurality of non-zero elements; obtaining a plurality of intermediate lists computed based on each of the plurality of non-zero elements in the first row and one of the plurality of second rows that corresponds to the non-zero element; performing accumulation of the intermediate lists using the pair of buffers; and migrating the one final merged list to a system memory as a row of an output matrix of the SpGEMM.
    Type: Application
    Filed: November 1, 2022
    Publication date: January 4, 2024
    Inventors: Zhaoyang DU, Yijin GUAN, Dimin NIU, Hongzhong ZHENG
  • Publication number: 20240004824
    Abstract: This application describes systems and methods for facilitating memory access for graph neural network (GNN) processing. An example method includes fetching, by an access engine circuitry implemented on a circuitry board, a portion of structure data of a graph from a pinned memory in a host memory of a host via a first peripheral component interconnect express (PCIe) connection; performing node sampling using the fetched portion of the structure data of the graph to select one or more sampled nodes; fetching, by the access engine circuitry, a portion of attribute data of the graph from the pinned memory via the first PCIe connection; sending the fetched portion of the attribute data of the graph to one or more processors; and performing, by the one or more processors, GNN processing for the graph using the fetched portion of the attribute data of the graph.
    Type: Application
    Filed: November 30, 2022
    Publication date: January 4, 2024
    Inventors: Shuangchen LI, Dimin NIU, Hongzhong ZHENG, Zhe ZHANG, Yuhao WANG
  • Publication number: 20240005075
    Abstract: This application describes systems and methods for facilitating memory access for graph neural network (GNN) processing. An example method includes fetching, by an access engine circuitry implemented on a circuitry board, a portion of structure data of a graph from one or more of a plurality of flash memory drives implemented on the circuitry board; performing node sampling using the fetched portion of the structure data of the graph to select one or more sampled nodes; fetching a portion of attribute data of the graph from two or more of the plurality of memory drives in parallel according to the selected one or more sampled nodes; sending the fetched portion of the attribute data of the graph to a host outside of the circuitry board; and performing, by the host, GNN processing for the graph using the fetched portion of the attribute data of the graph.
    Type: Application
    Filed: November 30, 2022
    Publication date: January 4, 2024
    Inventors: Shuangchen LI, Dimin NIU, Hongzhong ZHENG
  • Publication number: 20240004547
    Abstract: A 3D-stacked memory device including: a base die including a plurality of switches to direct data flow and a plurality of arithmetic logic units (ALUs) to compute data; a plurality of memory dies stacked on the base die; and an interface to transfer signals to control the base die.
    Type: Application
    Filed: September 15, 2023
    Publication date: January 4, 2024
    Inventors: Mu-Tien Chang, Prasun Gera, Dimin Niu, Hongzhong Zheng
  • Patent number: 11847049
    Abstract: The total memory space that is logically available to a processor in a general-purpose graphics processing unit (GPGPU) module is increased to accommodate terabyte-sized amounts of data by utilizing the memory space in an external memory module, and by further utilizing a portion of the memory space in a number of other external memory modules.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: December 19, 2023
    Assignee: Alibaba Damo (Hangzhou) Technology Co., Ltd
    Inventors: Yuhao Wang, Dimin Niu, Yijin Guan, Shengcheng Wang, Shuangchen Li, Hongzhong Zheng
  • Patent number: 11841799
    Abstract: This application describes a hardware accelerator, a computer system, and a method for accelerating Graph Neural Network (GNN) node attribute fetching. The hardware accelerator comprises a GNN attribute processor; and a first memory, wherein the GNN attribute processor is configured to: receive a graph node identifier; determine a target memory address within the first memory based on the graph node identifier; determine, based on the received graph node identifier, whether attribute data corresponding to the received graph node identifier is cached in the first memory at the target memory address; and in response to determining that the attribute data is not cached in the first memory: fetch the attribute data from a second memory, and write the fetched attribute data into the first memory at the target memory address.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: December 12, 2023
    Assignee: T-Head (Shanghai) Semiconductor Co., Ltd.
    Inventors: Tianchan Guan, Heng Liu, Shuangchen Li, Hongzhong Zheng
  • Publication number: 20230393851
    Abstract: A number of domain specific accelerators (DSA1-DSAn) are integrated into a conventional processing system (100) to operate on the same chip by adding additional instructions to a conventional instruction set architecture (ISA), and further adding an accelerator interface unit (130) to the processing system (100) to respond to the additional instructions and interact with the DSAs.
    Type: Application
    Filed: June 20, 2023
    Publication date: December 7, 2023
    Inventors: Yuhao WANG, Zhaoyang DU, Yen-kuang CHEN, Wei HAN, Shuangchen LI, Fei XUE, Hongzhong ZHENG
  • Patent number: 11836188
    Abstract: A programmable device receives commands from a processor and, based on the commands: identifies a root node in a graph; identifies nodes in the graph that are neighbors of the root node; identifies nodes in the graph that are neighbors of the neighbors; retrieves data associated with the root node; retrieves data associated with at least a subset of the nodes that are neighbors of the root node and that are neighbors of the neighbor nodes; and writes the data that is retrieved into a memory.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: December 5, 2023
    Assignee: Alibaba Damo (Hangzhou) Technology Co., Ltd
    Inventors: Shuangchen Li, Tianchan Guan, Zhe Zhang, Heng Liu, Wei Han, Dimin Niu, Hongzhong Zheng
  • Patent number: 11789610
    Abstract: A 3D-stacked memory device including: a base die including a plurality of switches to direct data flow and a plurality of arithmetic logic units (ALUs) to compute data; a plurality of memory dies stacked on the base die; and an interface to transfer signals to control the base die.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: October 17, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mu-Tien Chang, Prasun Gera, Dimin Niu, Hongzhong Zheng
  • Publication number: 20230326905
    Abstract: Aspects of the present technology are directed toward three-dimensional (3D) stacked processing systems characterized by high memory capacity, high memory bandwidth, low power consumption and small form factor. The 3D stacked processing systems include a plurality of processor chiplets and input/output circuits directly coupled to each of the plurality of processor chiplets.
    Type: Application
    Filed: September 17, 2020
    Publication date: October 12, 2023
    Inventors: Dimin NIU, Wei HAN, Tianchan GUAN, Yuhao WANG, Shuangchen LI, Hongzhong ZHENG
  • Patent number: 11775294
    Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
    Type: Grant
    Filed: November 30, 2021
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 11762776
    Abstract: The present application discloses a cache access method and an associated graph neural network system. The graph neural network processor is used for performing computation upon a graph neural network. The graph neural network is stored in the memory in compressed sparse row format. The method includes: receiving an address corresponding to a node of the graph neural network and a type of the address; in response to the type being one of a first type or a second type, performing lookup by comparing the address with a tag field of a degree lookup table to at least obtain a degree of the node; determining whether the degree is greater than a predetermined value to obtain a determination result; and determining whether to perform lookup on a region of the cache corresponding to the type according to the determination result.
    Type: Grant
    Filed: January 25, 2022
    Date of Patent: September 19, 2023
    Assignee: T-Head (Shanghai) Semiconductor Co., Ltd.
    Inventors: Zhe Zhang, Shuangchen Li, Hongzhong Zheng
  • Publication number: 20230289081
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, which includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Application
    Filed: May 11, 2023
    Publication date: September 14, 2023
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 11755507
    Abstract: A method of transferring data between a memory controller and at least one memory module via a primary data bus having a primary data bus width is disclosed. The method includes accessing a first one of a memory device group via a corresponding data bus path in response to a threaded memory request from the memory controller. The accessing results in data groups collectively forming a first data thread transferred across a corresponding secondary data bus path. Transfer of the first data thread across the primary data bus width is carried out over a first time interval, while using less than the primary data transfer continuous throughput during that first time interval. During the first time interval, at least one data group from a second data thread is transferred on the primary data bus.
    Type: Grant
    Filed: May 13, 2022
    Date of Patent: September 12, 2023
    Assignee: Rambus Inc.
    Inventors: Hongzhong Zheng, Frederick A Ware
  • Publication number: 20230281124
    Abstract: Apparatus, method, and system provided herein are directed to prioritizing cache line writing of compressed data. The memory controller comprises a cache line compression engine that receives raw data, compresses the raw data, determines a compression rate between the raw data and the compressed data, determines whether the compression rate is greater than a predetermined rate, and outputs the compressed data as data-to-be-written if the compression rate is greater than the predetermined rate. In response to determining that the compression rate is greater than the predetermined rate, the cache line compression engine generates a compression signal indicating the data-to-be-written is the compressed data and sends the compression signal to a scheduler of a command queue in the memory controller where writing of compressed data is prioritized.
    Type: Application
    Filed: August 6, 2020
    Publication date: September 7, 2023
    Inventors: Dimin Niu, Tianchan Guan, Lide Duan, Hongzhong Zheng
  • Patent number: 11729268
    Abstract: Various embodiments of the present disclosure relate to a computer-implemented method, a system, and a storage medium, where a graph stored in a computing system is logically divided into subgraphs, the subgraphs are stored on different interconnected (or coupled) devices in the computing system, and nodes of the subgraphs include hub nodes connected to adjacent subgraphs. Each device stores attributes and node structure information of the hub nodes of the subgraphs into other devices, and a software or hardware prefetch engine on the device prefetches attributes and node structure information associated with a sampled node. A prefetcher on a device interfacing with the interconnected (or coupled) devices may further prefetch attributes and node structure information of nodes of the subgraphs on other devices. A traffic monitor is provided on an interface device to monitor traffic. When the traffic is small, the interface device prefetches node attributes and node structure information.
    Type: Grant
    Filed: June 8, 2022
    Date of Patent: August 15, 2023
    Assignee: Alibaba (China) Co., Ltd.
    Inventors: Wei Han, Shuangcheng Li, Hongzhong Zheng, Yawen Zhang, Heng Liu, Dimin Niu
  • Publication number: 20230245711
    Abstract: The present invention provides systems and methods for efficiently and effectively priming and initializing a memory. In one embodiment, a memory controller includes a normal data path and a priming path. The normal data path directs storage operations during a normal memory read/write operation after power startup of a memory chip. The priming path includes a priming module, wherein the priming module directs memory priming operations during a power startup of the memory chip, including forwarding a priming pattern for storage in a write pattern mode register of a memory chip and selection of a memory address in the memory chip for initialization with the priming pattern. The priming pattern includes information corresponding to proper initial data values. The priming pattern can also include proper corresponding error correction code (ECC) values. The priming module can include a priming pattern register that stores the priming pattern.
    Type: Application
    Filed: January 19, 2021
    Publication date: August 3, 2023
    Inventors: Dimin NIU, Shuangchen LI, Tianchan GUAN, Hongzhong ZHENG
  • Publication number: 20230229555
    Abstract: A method of correcting a memory error of a dynamic random-access memory module (DRAM) using a double data rate (DDR) interface, the method includes conducting a memory transaction including multiple bursts with a memory controller to send data from data chips of the DRAM to the memory controller, detecting one or more errors using an ECC chip of the DRAM, determining a number of the bursts having the errors using the ECC chip of the DRAM, determining whether the number of the bursts having the errors is greater than a threshold number, determining a type of the errors, and directing the memory controller based on the determined type of the errors, wherein the DRAM includes a single ECC chip per memory channel.
    Type: Application
    Filed: March 28, 2023
    Publication date: July 20, 2023
    Inventors: Dimin NIU, Mu-Tien CHANG, Hongzhong ZHENG, Hyun-Joong KIM, Won-hyung SONG, Jangseok CHOI
  • Patent number: 11704271
    Abstract: A system-in-package architecture in accordance with aspects includes a logic die and one or more memory dice coupled together in a three-dimensional stack. The logic die can include one or more global building blocks and a plurality of local building blocks. The number of local building blocks can be scalable. The local building blocks can include a plurality of engines and memory controllers. The memory controllers can be configured to directly couple one or more of the engines to the one or more memory dice. The number and type of local building blocks, and the number and types of engines and memory controllers can be scalable.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: July 18, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Lide Duan, Wei Han, Yuhao Wang, Fei Xue, Yuanwei Fang, Hongzhong Zheng
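
Illustrative note on publication 20240004954 (SpGEMM): the abstract in the listing above describes row-wise sparse matrix multiplication in which intermediate lists are accumulated using a pair of buffers. The Python sketch below is one possible software reading of that flow, not the claimed hardware design. The data layout (a dict mapping each row index to a column-sorted list of `(column, value)` pairs) and the names `merge_sorted`, `spgemm_row`, and `spgemm` are assumptions made for illustration only.

```python
def merge_sorted(a, b):
    """Merge two (col, val) lists sorted by col, summing values on equal columns."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def spgemm_row(a_row, b_rows):
    """Compute one output row from the non-zeros of one row of the first matrix.

    a_row:  list of (col, val) non-zeros in a row of the first matrix.
    b_rows: dict mapping row index -> column-sorted (col, val) list of the
            second matrix (hypothetical layout, chosen for this sketch).
    """
    buffers = [[], []]   # the pair of buffers
    src, dst = 0, 1      # stand-ins for the first and second pointers
    for k, a_val in a_row:
        # Intermediate list: one non-zero scaled against its matching second row.
        intermediate = [(c, a_val * v) for c, v in b_rows.get(k, [])]
        # Accumulate into the other buffer, then swap the two pointers.
        buffers[dst] = merge_sorted(buffers[src], intermediate)
        src, dst = dst, src
    return buffers[src]  # final merged list, i.e. one row of the output


def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: [(col, val), ...]}."""
    return {i: spgemm_row(row, b) for i, row in a.items()}
```

For example, with `a = {0: [(0, 1.0), (2, 2.0)]}` and `b = {0: [(0, 4.0), (1, 5.0)], 2: [(0, 7.0), (2, 8.0)]}`, `spgemm(a, b)` yields the single output row `[(0, 18.0), (1, 5.0), (2, 16.0)]`. Swapping the two buffer roles after each merge means the running result is always read from one buffer and written to the other.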