Patents by Inventor Hongzhong Zheng

Hongzhong Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11556476
    Abstract: A method of processing in-memory commands in a high-bandwidth memory (HBM) system includes sending a function-in-HBM (FIM) instruction to the HBM by an HBM memory controller of a GPU. A logic component of the HBM receives the FIM instruction and coordinates the instruction's execution using the controller, an ALU, and an SRAM located on the logic component.
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: January 17, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mu-Tien Chang, Krishna T. Malladi, Dimin Niu, Hongzhong Zheng
  • Patent number: 11544189
    Abstract: Embodiments of the disclosure provide methods and systems for memory management. The method can include: receiving a request for allocating target node data to a memory space, wherein the memory space includes a buffer and an external memory and the target node data comprises property data and structural data and represents a target node of a graph having a plurality of nodes and edges; determining a node degree associated with the target node data; and allocating the target node data to the memory space based on the determined node degree.
    Type: Grant
    Filed: February 12, 2020
    Date of Patent: January 3, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Jilan Lin, Shuangchen Li, Dimin Niu, Hongzhong Zheng
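The degree-based placement described in the abstract above can be sketched as follows. This is a minimal illustration, not the patent's design: the function name, dictionary-based memory spaces, and the degree threshold are all assumptions.

```python
def allocate_node(node_id, adjacency, buffer, external, degree_threshold=32):
    """Place a node's data based on its degree (number of edges):
    high-degree nodes go to the fast buffer, the rest to external memory."""
    degree = len(adjacency.get(node_id, []))
    target = buffer if degree >= degree_threshold else external
    target[node_id] = {"degree": degree}
    return "buffer" if target is buffer else "external"

# Toy graph: node 2 is high-degree, nodes 0 and 1 are low-degree.
adjacency = {0: [1, 2, 3], 1: [0], 2: list(range(40))}
buffer, external = {}, {}
placements = {n: allocate_node(n, adjacency, buffer, external) for n in adjacency}
```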
  • Publication number: 20220417324
    Abstract: Various embodiments of the present disclosure relate to a computer-implemented method, a system, and a storage medium, where a graph stored in a computing system is logically divided into subgraphs, the subgraphs are stored on different interconnected (or coupled) devices in the computing system, and nodes of the subgraphs include hub nodes connected to adjacent subgraphs. Each device stores attributes and node structure information of the hub nodes of the subgraphs into other devices, and a software or hardware prefetch engine on the device prefetches attributes and node structure information associated with a sampled node. A prefetcher on a device interfacing with the interconnected (or coupled) devices may further prefetch attributes and node structure information of nodes of the subgraphs on other devices. A traffic monitor is provided on an interface device to monitor traffic; when traffic is low, the interface device prefetches node attributes and node structure information.
    Type: Application
    Filed: June 8, 2022
    Publication date: December 29, 2022
    Inventors: Wei HAN, Shuangchen LI, Hongzhong ZHENG, Yawen ZHANG, Heng LIU, Dimin NIU
  • Publication number: 20220414030
    Abstract: A high-bandwidth memory (HBM) includes a memory and a controller. The controller receives a data write request from a processor external to the HBM and the controller stores an entry in the memory indicating at least one address of data of the data write request and generates an indication that a data bus is available for an operation during a cycle time of the data write request based on the data write request comprising sparse data or data-value similarity. Sparse data includes a predetermined percentage of data values equal to zero, and data-value similarity includes a predetermined amount of spatial value locality of the data values. Both the predetermined zero-value percentage for sparse data and the predetermined amount of spatial value locality for data-value similarity are based on a predetermined data granularity.
    Type: Application
    Filed: September 1, 2022
    Publication date: December 29, 2022
    Inventors: Krishna T. MALLADI, Dimin NIU, Hongzhong ZHENG
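The sparsity test described in the abstract above, where a zero-value percentage is evaluated at a fixed data granularity, might look roughly like this sketch. The threshold and granularity values are illustrative assumptions, not the patent's numbers.

```python
def is_sparse(values, zero_fraction=0.75, granularity=16):
    """Treat a block as sparse when every granularity-sized chunk
    contains at least zero_fraction zeros."""
    for i in range(0, len(values), granularity):
        chunk = values[i:i + granularity]
        zeros = sum(1 for v in chunk if v == 0)
        if zeros / len(chunk) < zero_fraction:
            return False
    return True
```

A controller applying a check like this could skip driving zero-heavy chunks on the data bus, freeing cycles for other operations.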
  • Patent number: 11537531
    Abstract: The disclosed embodiments relate to a computer system with a cache memory that supports tagless addressing. During operation, the system receives a request to perform a memory access, wherein the request includes a virtual address. In response to the request, the system performs an address-translation operation, which translates the virtual address into both a physical address and a cache address. Next, the system uses the physical address to access one or more levels of physically addressed cache memory, wherein accessing a given level of physically addressed cache memory involves performing a tag-checking operation based on the physical address. If the access to the one or more levels of physically addressed cache memory fails to hit on a cache line for the memory access, the system uses the cache address to directly index a cache memory, wherein directly indexing the cache memory does not involve performing a tag-checking operation and eliminates the tag storage overhead.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: December 27, 2022
    Assignee: Rambus Inc.
    Inventors: Hongzhong Zheng, Trung A. Diep
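The central idea of the tagless scheme above, a single translation step that yields both a physical address and a direct cache index, can be sketched as follows. The class, field layout, and page size are illustrative assumptions, not the patent's structures.

```python
class TaglessCache:
    """Sketch: translation returns (physical address, cache line index),
    so the final lookup directly indexes the cache with no tag compare."""
    PAGE_BITS = 12

    def __init__(self, num_lines):
        self.lines = [0] * num_lines
        self.map = {}  # virtual page -> (physical page, cache line index)

    def translate(self, vaddr):
        vpage = vaddr >> self.PAGE_BITS
        offset = vaddr & ((1 << self.PAGE_BITS) - 1)
        ppage, line = self.map[vpage]
        return (ppage << self.PAGE_BITS) | offset, line

    def read(self, vaddr):
        _, line = self.translate(vaddr)
        return self.lines[line]  # direct index: no tag check, no tag storage

cache = TaglessCache(4)
cache.map[0x1] = (0x9, 2)   # virtual page 1 -> physical page 9, line 2
cache.lines[2] = 42
```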
  • Publication number: 20220391332
    Abstract: A memory module includes at least two memory devices. Each of the memory devices performs verify operations after attempted writes to their respective memory cores. When a write is unsuccessful, each memory device stores information about the unsuccessful write in an internal write retry buffer. A write operation may have been unsuccessful for only one memory device on the memory module and not the others. When the memory module is instructed, both memory devices on the memory module can retry the unsuccessful memory write operations concurrently, even though the unsuccessful write operations were to different addresses.
    Type: Application
    Filed: June 28, 2022
    Publication date: December 8, 2022
    Inventors: Hongzhong ZHENG, Brent HAUKNESS
  • Patent number: 11513965
    Abstract: A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: November 29, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Mu-Tien Chang, Dimin Niu, Hongzhong Zheng
  • Publication number: 20220367412
    Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logic circuit die and configured to transfer the second data between the memory circuit die and the logic circuit die.
    Type: Application
    Filed: July 25, 2022
    Publication date: November 17, 2022
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
  • Patent number: 11500781
    Abstract: A cache memory includes cache lines to store information. The stored information is associated with physical addresses that include first, second, and third distinct portions. The cache lines are indexed by the second portions of respective physical addresses associated with the stored information. The cache memory also includes one or more tables, each of which includes respective table entries that are indexed by the first portions of the respective physical addresses. The respective table entries in each of the one or more tables are to store indications of the second portions of respective physical addresses associated with the stored information.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: November 15, 2022
    Assignee: RAMBUS INC.
    Inventors: Trung Diep, Hongzhong Zheng
  • Publication number: 20220358060
    Abstract: A memory module that includes a non-volatile memory and an asynchronous memory interface to interface with a memory controller is presented. The asynchronous memory interface may use repurposed pins of a double data rate (DDR) memory channel to send asynchronous data to the memory controller. The asynchronous data may be device feedback indicating a status of the non-volatile memory.
    Type: Application
    Filed: July 25, 2022
    Publication date: November 10, 2022
    Inventors: Dimin NIU, Mu-Tien CHANG, Hongzhong ZHENG, Sun Young LIM, Indong KIM, Jangseok CHOI, Craig HANSON
  • Publication number: 20220350526
    Abstract: The presented systems enable efficient and effective network communications. In one embodiment a memory device includes a memory module, including a plurality of memory chips configured to store information; and an inter-chip network (ICN)/shared smart memory extension (SMX) memory interface controller (ICN/SMX memory interface controller) configured to interface between the memory module and an inter-chip network (ICN), wherein the ICN is configured to communicatively couple the memory device to a parallel processing unit (PPU). In one exemplary implementation, the ICN/SMX memory controller includes a plurality of package buffers, an ICN physical layer interface, a PRC/MAC interface, and a switch. The memory device may be a memory card including a memory module (e.g., DDR DIMM, etc.).
    Type: Application
    Filed: July 15, 2022
    Publication date: November 3, 2022
    Inventors: Dimin NIU, Yijin GUAN, Shengcheng WANG, Yuhao WANG, Shuangchen LI, Hongzhong ZHENG
  • Patent number: 11487676
    Abstract: A memory system includes an address mapping circuit. The address mapping circuit receives an input memory address having a first set of address bits. The address mapping circuit applies a logic function to the input memory address to generate a mapped memory address. The logic function uses at least a subset of the first set of address bits in two separate operations that respectively determine two portions of the mapped memory address.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: November 1, 2022
    Assignee: Rambus Inc.
    Inventors: Hongzhong Zheng, James Tringali
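A mapping of the kind described above, where one subset of the input address bits feeds two separate operations that each determine a portion of the mapped address, might be sketched like this. The specific bit fields, shifts, and XOR scrambling are illustrative assumptions, not the patent's logic function.

```python
def map_address(addr):
    """Remap a 32-bit address: the top 4 bits (the shared subset)
    participate in two separate operations, one producing the bank
    portion and one producing the row portion of the mapped address."""
    top4 = (addr >> 28) & 0xF                     # shared subset of bits
    bank = ((addr >> 13) & 0xF) ^ top4            # operation 1: bank portion
    row = ((addr >> 17) & 0x7FF) ^ (top4 << 7)    # operation 2: row portion
    col = addr & 0x1FFF                           # low bits pass through
    return (row << 17) | (bank << 13) | col
```

Scrambling the bank bits this way spreads strided access patterns across banks, which is a common motivation for such mappings.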
  • Publication number: 20220343146
    Abstract: This application describes a hardware accelerator, a computer system, and a method for accelerating temporal graph neural network (GNN) computations. An exemplary hardware accelerator comprises: a key-graph memory configured to store a key graph; a node classification circuit configured to: fetch the key graph from the key-graph memory; receive a current graph for performing temporal GNN computation with the key graph; and identify one or more nodes of the current graph based on a comparison between the key graph and the current graph; and a node reconstruction circuit configured to: perform spatial computations on the one or more nodes identified by the node classification circuit to obtain updated nodes; generate an updated key graph based on the key graph and the updated nodes; and store the updated key graph in the key-graph memory for processing a next graph.
    Type: Application
    Filed: April 23, 2021
    Publication date: October 27, 2022
    Inventors: Fei XUE, Yangjie ZHOU, Hongzhong ZHENG
  • Publication number: 20220342834
    Abstract: A method of transferring data between a memory controller and at least one memory module via a primary data bus having a primary data bus width is disclosed. The method includes accessing a first one of a memory device group via a corresponding data bus path in response to a threaded memory request from the memory controller. The accessing results in data groups collectively forming a first data thread transferred across a corresponding secondary data bus path. Transfer of the first data thread across the primary data bus width is carried out over a first time interval, while using less than the primary data bus's continuous transfer throughput during that interval. During the first time interval, at least one data group from a second data thread is transferred on the primary data bus.
    Type: Application
    Filed: May 13, 2022
    Publication date: October 27, 2022
    Inventors: Hongzhong Zheng, Frederick A. Ware
  • Publication number: 20220343145
    Abstract: This application describes a hardware accelerator, a computer system, and a method for accelerating Graph Neural Network (GNN) computations. The hardware accelerator comprises a matrix partitioning circuit configured to partition an adjacency matrix of an input graph for GNN computations into a plurality of sub-matrices; a sub-matrix reordering circuit configured to reorder rows and columns of the plurality of sub-matrices; a tile partitioning circuit configured to divide the plurality of sub-matrices with reordered rows and columns into a plurality of tiles based on processing granularities of one or more processors; and a tile distributing circuit configured to distribute the plurality of tiles to the one or more processors for performing the GNN computations.
    Type: Application
    Filed: April 21, 2021
    Publication date: October 27, 2022
    Inventors: Fei XUE, Yangjie ZHOU, Hongzhong ZHENG
  • Patent number: 11475102
    Abstract: An adaptive matrix multiplier. In some embodiments, the matrix multiplier includes a first multiplying unit, a second multiplying unit, a memory load circuit, and an outer buffer circuit. The first multiplying unit includes a first inner buffer circuit and a second inner buffer circuit, and the second multiplying unit includes a first inner buffer circuit and a second inner buffer circuit. The memory load circuit is configured to load data from memory, in a single burst of a burst memory access mode, into the first inner buffer circuit of the first multiplying unit and into the first inner buffer circuit of the second multiplying unit.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: October 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Dimin Niu, Hongzhong Zheng
  • Publication number: 20220318592
    Abstract: A sampler for executing a graph neural network (GNN) model is disclosed. The sampler is configured to implement random sampling for neighbor nodes around a specified node of a GNN model, and performs: obtaining a quantity of neighbor nodes around the specified node and a target number of neighbor nodes to be sampled; dividing a range into a plurality of subranges based on the target number; generating random numbers; determining a plurality of integer values within the plurality of subranges based on the random numbers; determining index values of the target number of neighbor nodes to be sampled by matching index values of the neighbor nodes and the plurality of determined integer values; and writing the determined index values into an output buffer. The sampler provided in the present disclosure can uniformly sample the neighbor nodes around the specified node.
    Type: Application
    Filed: February 22, 2022
    Publication date: October 6, 2022
    Inventors: Tianchan GUAN, Yanhong WANG, Shuangchen LI, Heng LIU, Hongzhong ZHENG
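The subrange sampling steps above can be sketched in a few lines: divide the index range into one subrange per sample, then draw one uniform random integer from each subrange. The function name and the handling of the edge case where the target exceeds the neighbor count are assumptions for illustration.

```python
import random

def sample_neighbors(neighbor_indices, target_count, rng=random):
    """Sample target_count neighbor indices by drawing one uniform
    random position from each of target_count disjoint subranges."""
    n = len(neighbor_indices)
    if target_count >= n:
        return list(neighbor_indices)
    width = n / target_count
    sampled = []
    for i in range(target_count):
        lo = int(i * width)
        hi = max(lo, int((i + 1) * width) - 1)
        pos = rng.randint(lo, hi)          # one draw per subrange
        sampled.append(neighbor_indices[pos])
    return sampled
```

Because the subranges are disjoint, the samples are distinct by construction, which gives the uniform spread over the neighbor list that the abstract describes.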
  • Publication number: 20220300426
    Abstract: According to some embodiments of the present invention, there is provided a hybrid cache memory for a processing device having a host processor, the hybrid cache memory comprising: a high bandwidth memory (HBM) configured to store host data; a non-volatile memory (NVM) physically integrated with the HBM in a same package and configured to store a copy of the host data at the HBM; and a cache controller configured to be in bi-directional communication with the host processor, and to manage data transfer between the HBM and NVM and, in response to a command received from the host processor, to manage data transfer between the hybrid cache memory and the host processor.
    Type: Application
    Filed: June 6, 2022
    Publication date: September 22, 2022
    Inventors: Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 11437337
    Abstract: A chip or integrated circuit includes a layer that includes a first device and a second device. A scribe line is located between the first device and the second device and separates the first device from the second device. An electrically conductive connection traverses the scribe line and is coupled to the first device and the second device, thus connecting the first and second devices.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: September 6, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Shuangchen Li, Wei Han, Dimin Niu, Yuhao Wang, Hongzhong Zheng
  • Patent number: 11436165
    Abstract: A high-bandwidth memory (HBM) includes a memory and a controller. The controller receives a data write request from a processor external to the HBM and the controller stores an entry in the memory indicating at least one address of data of the data write request and generates an indication that a data bus is available for an operation during a cycle time of the data write request based on the data write request comprising sparse data or data-value similarity. Sparse data includes a predetermined percentage of data values equal to zero, and data-value similarity includes a predetermined amount of spatial value locality of the data values. Both the predetermined zero-value percentage for sparse data and the predetermined amount of spatial value locality for data-value similarity are based on a predetermined data granularity.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: September 6, 2022
    Inventors: Krishna T. Malladi, Dimin Niu, Hongzhong Zheng