Patents by Inventor Hongzhong Zheng

Hongzhong Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12141438
    Abstract: Zero skipping sparsity techniques for reduced data movement between memory and accelerators and reduced computational workload of accelerators. The techniques include detection of zero and near-zero values in the memory. The non-zero values are transferred to the accelerator for computation; the zero and near-zero values are written back within the memory as zero values.
    Type: Grant
    Filed: February 25, 2021
    Date of Patent: November 12, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Fei Xue, Fei Sun, Yangjie Zhou, Lide Duan, Hongzhong Zheng
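The zero-skipping idea in the abstract above can be sketched in a few lines of Python. This is an illustrative model only, not the patented circuit: the threshold value and function name are assumptions.

```python
def zero_skip(buffer, threshold=1e-3):
    """Split a memory buffer into the non-zero payload sent to the
    accelerator, writing zeros back in place of zero/near-zero values."""
    to_accelerator = []  # (index, value) pairs actually transferred
    for i, v in enumerate(buffer):
        if abs(v) < threshold:
            buffer[i] = 0.0  # near-zero values are written back as exact zeros
        else:
            to_accelerator.append((i, v))
    return to_accelerator

data = [0.5, 1e-5, 0.0, -2.0]
payload = zero_skip(data)
```

Only `payload` crosses the memory-to-accelerator link, which is the data-movement reduction the abstract describes.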
  • Patent number: 12142338
    Abstract: The present invention provides systems and methods for efficiently and effectively priming and initializing a memory. In one embodiment, a memory controller includes a normal data path and a priming path. The normal data path directs storage operations during a normal memory read/write operation after power startup of a memory chip. The priming path includes a priming module, wherein the priming module directs memory priming operations during a power startup of the memory chip, including forwarding a priming pattern for storage in a write pattern mode register of a memory chip and selection of a memory address in the memory chip for initialization with the priming pattern. The priming pattern includes information corresponding to proper initial data values. The priming pattern can also include proper corresponding error correction code (ECC) values. The priming module can include a priming pattern register that stores the priming pattern.
    Type: Grant
    Filed: January 19, 2021
    Date of Patent: November 12, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Dimin Niu, Shuangchen Li, Tianchan Guan, Hongzhong Zheng
  • Patent number: 12141227
    Abstract: An adaptive matrix multiplier. In some embodiments, the matrix multiplier includes a first multiplying unit, a second multiplying unit, a memory load circuit, and an outer buffer circuit. The first multiplying unit includes a first inner buffer circuit and a second inner buffer circuit, and the second multiplying unit includes a first inner buffer circuit and a second inner buffer circuit. The memory load circuit is configured to load data from memory, in a single burst of a burst memory access mode, into the first inner buffer circuit of the first multiplying unit; and into the first inner buffer circuit of the second multiplying unit.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: November 12, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Dimin Niu, Hongzhong Zheng
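The paired inner buffers described above enable double buffering: one buffer is filled by a burst load while the other feeds the multiplier. A minimal behavioral sketch, with class and method names that are illustrative rather than from the patent:

```python
class MultiplyingUnit:
    """Toy model of a multiplying unit with two inner buffers."""
    def __init__(self):
        self.buffers = [[], []]  # two inner buffer circuits
        self.active = 0          # index of the buffer being computed from

    def burst_load(self, data):
        # fill the inactive buffer while the active one is in use
        self.buffers[1 - self.active] = list(data)

    def swap(self):
        self.active = 1 - self.active

    def multiply_row(self, row):
        col = self.buffers[self.active]
        return sum(a * b for a, b in zip(row, col))

unit = MultiplyingUnit()
unit.burst_load([1, 2, 3])             # burst load into the inactive buffer
unit.swap()                            # loaded data becomes active
result = unit.multiply_row([4, 5, 6])  # 4*1 + 5*2 + 6*3
```

Overlapping the load with the compute is what hides memory latency in this arrangement.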
  • Publication number: 20240370374
    Abstract: The present disclosure relates to a computer system, a method for a computer system, and a computer-readable storage medium for executing the method for a computer system.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Jiacheng MA, Dimin NIU, Tianchan GUAN, Yijin GUAN, Hongzhong ZHENG
  • Publication number: 20240370168
    Abstract: The present disclosure provides a physical host including a memory, a first buffer, a second buffer, a third buffer and a processor. The first buffer stores a log regarding a plurality of dirty pages. The second buffer stores a dirty bitmap, where the dirty bitmap is written into the second buffer according to the log read from the first buffer. The third buffer stores the dirty bitmap. The processor obtains the current memory address to be migrated and a destination memory address, and marks a page table corresponding to the memory address to be migrated as a plurality of dirty pages and writes the log marked as the plurality of dirty pages into the first buffer when the memory address to be migrated is written. The processor includes a memory copy engine for reading the dirty bitmap from the third buffer, and copying the content corresponding to the plurality of dirty pages to the destination memory according to the dirty bitmap.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Jiacheng MA, Tianchan GUAN, Yijin GUAN, Dimin NIU, Hongzhong ZHENG
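The dirty-page flow above (log writes, fold the log into a bitmap, copy only dirty pages) can be modeled in a short Python sketch. Page size and helper names are made up for illustration:

```python
PAGE = 4  # toy page size in words

def write(mem, log, addr, value):
    mem[addr] = value
    log.append(addr // PAGE)  # log the dirty page number

def log_to_bitmap(log, n_pages):
    # fold the dirty-page log into a dirty bitmap
    bitmap = [0] * n_pages
    for page in log:
        bitmap[page] = 1
    return bitmap

def copy_dirty(src, dst, bitmap):
    # the memory-copy engine moves only pages marked dirty
    for page, dirty in enumerate(bitmap):
        if dirty:
            lo = page * PAGE
            dst[lo:lo + PAGE] = src[lo:lo + PAGE]

src, dst, log = [0] * 12, [0] * 12, []
write(src, log, 5, 99)            # dirties page 1
bitmap = log_to_bitmap(log, 3)
copy_dirty(src, dst, bitmap)
```

Copying by bitmap rather than wholesale is what lets migration skip clean pages.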
  • Publication number: 20240370384
    Abstract: This disclosure discloses a memory extension device, an operation method of the memory extension device, and a computer readable storage medium for executing the operation method. The method includes: converting local information received from a local host into local transaction layer information according to a first sub-protocol of a coherent interconnection protocol; converting the local transaction layer information into converted local transaction layer information according to a second sub-protocol of the coherent interconnection protocol, the converted local transaction layer information conforming to the second sub-protocol; packaging the converted local transaction layer information into a plurality of local data packets; and transmitting the plurality of local data packets to a remote memory extension device.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Tianchan GUAN, Yijin GUAN, Dimin NIU, Jiacheng MA, Zhaoyang DU, Hongzhong ZHENG
  • Patent number: 12130884
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: October 29, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
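The "product without a multiply operation" described above works because the table entry at (row = a, column = b) already stores a*b. A toy Python sketch, assuming small unsigned operands (the value range is an assumption, not from the patent):

```python
N = 16  # assume operands are unsigned integers < N
LUT = [[a * b for b in range(N)] for a in range(N)]  # precomputed products

def lut_dot(vec_a, vec_b):
    """Dot product where each multiply is replaced by a table lookup."""
    acc = 0
    for a, b in zip(vec_a, vec_b):
        acc += LUT[a][b]  # row address = a, column address = b
    return acc

result = lut_dot([2, 3], [7, 5])  # 2*7 + 3*5
```

In the hardware, the lookup and the accumulation are what remain; the multiplier array is eliminated.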
  • Patent number: 12124709
    Abstract: The present application discloses a computing system and an associated method. The computing system includes a first host, a second host, a first memory extension device and a second memory extension device. The first host includes a first memory, and the second host includes a second memory. The first host has a plurality of first memory addresses corresponding to a plurality of memory spaces of the first memory, and a plurality of second memory addresses corresponding to a plurality of memory spaces of the second memory. The first memory extension device is coupled to the first host. The second memory extension device is coupled to the second host and the first memory extension device. The first host accesses the plurality of memory spaces of the second memory through the first memory extension device and the second memory extension device.
    Type: Grant
    Filed: December 12, 2022
    Date of Patent: October 22, 2024
    Assignee: ALIBABA (CHINA) CO., LTD.
    Inventors: Tianchan Guan, Yijin Guan, Dimin Niu, Hongzhong Zheng
  • Patent number: 12124382
    Abstract: The disclosed embodiments relate to a computer system with a cache memory that supports tagless addressing. During operation, the system receives a request to perform a memory access, wherein the request includes a virtual address. In response to the request, the system performs an address-translation operation, which translates the virtual address into both a physical address and a cache address. Next, the system uses the physical address to access one or more levels of physically addressed cache memory, wherein accessing a given level of physically addressed cache memory involves performing a tag-checking operation based on the physical address. If the access to the one or more levels of physically addressed cache memory fails to hit on a cache line for the memory access, the system uses the cache address to directly index a cache memory, wherein directly indexing the cache memory does not involve performing a tag-checking operation and eliminates the tag storage overhead.
    Type: Grant
    Filed: November 22, 2022
    Date of Patent: October 22, 2024
    Assignee: Rambus Inc.
    Inventors: Hongzhong Zheng, Trung A. Diep
  • Patent number: 12099736
    Abstract: A memory system provides deduplication of user data in the physical memory space of the system for user data that is duplicated in the virtual memory space of a host system. A transaction manager (TM) uses a transaction table to maintain data coherency and data concurrency for the virtual memory space. A write data engine manager (WDEM) uses an outstanding bucket number and command queues to maintain data coherency and data concurrency for the physical memory space. The WDEM receives data write requests from the TM and sends a corresponding write command to a selected command queue. A write data engine responds to a write command in a command queue by storing the data in an overflow memory region if the data is not duplicated in the virtual memory space, or by incrementing a reference counter for the data if the data is duplicated in the virtual memory space.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: September 24, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Dongyan Jiang, Qiang Peng, Hongzhong Zheng
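The write-path behavior above (store unique data in an overflow region, bump a reference counter for duplicates) can be sketched as follows. The class and field names are illustrative, not the patent's terminology:

```python
class DedupStore:
    """Toy write-deduplication store with reference counting."""
    def __init__(self):
        self.overflow = {}  # content -> stored copy (overflow memory region)
        self.refcount = {}  # content -> number of virtual references

    def write(self, data):
        if data in self.refcount:
            self.refcount[data] += 1    # duplicate: increment the counter
        else:
            self.overflow[data] = data  # first copy: store it once
            self.refcount[data] = 1

store = DedupStore()
store.write("blockA")
store.write("blockA")  # duplicate write: no new physical storage
store.write("blockB")
```

Two virtual writes of `"blockA"` consume one physical slot, which is the capacity saving deduplication provides.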
  • Publication number: 20240303090
    Abstract: A data processing method, applicable to an accelerator that is communicatively coupled to a processor core, includes obtaining a service data processing request from a first queue; obtaining to-be-processed service data corresponding to the service data processing request from the processor core via a service interface; generating result service data based on the to-be-processed service data; and writing the result service data into a second queue for providing to the processor core.
    Type: Application
    Filed: March 7, 2024
    Publication date: September 12, 2024
    Inventors: Shijian Zhang, Lide Duan, Hongzhong Zheng
  • Patent number: 12073490
    Abstract: The maximum capacity of a very fast memory in a system that requires very fast memory access times is increased by adding a memory with remote access times that are slower than required, and then moving infrequently accessed data from the memory with the very fast access times to the memory with the slow access times.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: August 27, 2024
    Assignee: Alibaba Damo (Hangzhou) Technology Co., Ltd.
    Inventors: Yuhao Wang, Dimin Niu, Yijin Guan, Shengcheng Wang, Shuangchen Li, Hongzhong Zheng
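The tiering policy above (demote infrequently accessed data from fast to slow memory) reduces to a threshold test over access counts. A minimal sketch; the threshold and function name are assumptions:

```python
def rebalance(fast, slow, access_counts, threshold=2):
    """Demote entries accessed fewer than `threshold` times to slow memory."""
    for key in list(fast):
        if access_counts.get(key, 0) < threshold:
            slow[key] = fast.pop(key)  # move cold data to the slower tier

fast = {"a": 1, "b": 2}
slow = {}
rebalance(fast, slow, {"a": 10, "b": 1})
```

Hot data stays in the fast tier, so the common-case access time is unchanged while total capacity grows.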
  • Publication number: 20240273032
    Abstract: A cache memory includes cache lines to store information. The stored information is associated with physical addresses that include first, second, and third distinct portions. The cache lines are indexed by the second portions of respective physical addresses associated with the stored information. The cache memory also includes one or more tables, each of which includes respective table entries that are indexed by the first portions of the respective physical addresses. The respective table entries in each of the one or more tables are to store indications of the second portions of respective physical addresses associated with the stored information.
    Type: Application
    Filed: February 29, 2024
    Publication date: August 15, 2024
    Inventors: Trung Diep, Hongzhong Zheng
  • Publication number: 20240273038
    Abstract: A memory module includes at least two memory devices. Each of the memory devices perform verify operations after attempted writes to their respective memory cores. When a write is unsuccessful, each memory device stores information about the unsuccessful write in an internal write retry buffer. The write operations may have only been unsuccessful for one memory device and not any other memory devices on the memory module. When the memory module is instructed, both memory devices on the memory module can retry the unsuccessful memory write operations concurrently. Both devices can retry these write operations concurrently even though the unsuccessful memory write operations were to different addresses.
    Type: Application
    Filed: February 26, 2024
    Publication date: August 15, 2024
    Inventors: Hongzhong ZHENG, Brent Haukness
  • Publication number: 20240273048
    Abstract: A data operation system includes: a plurality of data processing units; a memory expansion unit communicatively coupled to the plurality of data processing units; a plurality of data operation units communicatively coupled to the plurality of data processing units and the memory expansion unit; and a plurality of first storage units communicatively coupled to the plurality of data processing units; wherein the memory expansion unit comprises a plurality of memory expansion cards, each of the plurality of data processing units is communicatively coupled to at least one of the plurality of memory expansion cards, and the plurality of memory expansion cards are interconnected.
    Type: Application
    Filed: February 15, 2024
    Publication date: August 15, 2024
    Inventors: Linyong HUANG, Zhe ZHANG, Shuangchen LI, Hongzhong ZHENG
  • Publication number: 20240267256
    Abstract: Embodiments of this disclosure provide a processing system and an instruction transmission method. The instruction transmission method includes: receiving a processor instruction from a main processor communicatively coupled to an interface unit; generating an accelerator instruction corresponding to the processor instruction; determining a target accelerator corresponding to the accelerator instruction from a plurality of accelerators communicatively coupled to a first bus network; and transmitting the accelerator instruction to the target accelerator through the first bus network.
    Type: Application
    Filed: February 8, 2024
    Publication date: August 8, 2024
    Inventors: Zhe ZHANG, Shuangchen LI, Linyong HUANG, Hongzhong ZHENG
  • Patent number: 12056379
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Grant
    Filed: May 11, 2023
    Date of Patent: August 6, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 12056374
    Abstract: A dynamic bias coherency configuration engine can include control logic, a host threshold register, and device threshold register and a plurality of memory region monitoring units. The memory region monitoring units can include a starting page number register, an ending page number register, a host access register and a device access register. The memory region monitoring units can be utilized by dynamic bias coherency configuration engine to configure corresponding portions of a memory space in a device bias mode or a host bias mode.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: August 6, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Lide Duan, Dimin Niu, Hongzhong Zheng
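The monitoring scheme above can be sketched as a per-region access counter compared against host and device thresholds. The registers are modeled as plain fields and the selection rule is an assumption for illustration:

```python
class RegionMonitor:
    """Toy model of one memory-region monitoring unit."""
    def __init__(self, start_page, end_page):
        self.start_page = start_page
        self.end_page = end_page
        self.host_accesses = 0    # host access register
        self.device_accesses = 0  # device access register

def choose_bias(region, host_threshold, device_threshold):
    # configure the region into whichever bias mode its traffic favors
    if region.device_accesses >= device_threshold:
        return "device"
    if region.host_accesses >= host_threshold:
        return "host"
    return "unchanged"

r = RegionMonitor(start_page=0, end_page=255)
r.device_accesses = 120
mode = choose_bias(r, host_threshold=100, device_threshold=100)
```

Regions the device touches heavily end up in device bias, avoiding coherence round-trips to the host.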
  • Patent number: RE50130
    Abstract: A computing system includes: an adaptive back-up controller configured to calculate an adaptive back-up time based on a reserve power source for backing up a volatile memory to a nonvolatile memory; and a processor core, coupled to the adaptive back-up controller, configured to back up at least a portion of the volatile memory to the nonvolatile memory within the adaptive back-up time based on a back-up priority.
    Type: Grant
    Filed: September 5, 2020
    Date of Patent: September 17, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hongzhong Zheng, Keith Chan, Wonseok Lee, Tackhwi Lee
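The adaptive back-up idea above (derive a time budget from the reserve power source, then save regions in priority order within it) can be sketched numerically. All quantities and names here are illustrative assumptions:

```python
def adaptive_backup(regions, reserve_energy_j, power_draw_w, rate_mb_s):
    """Save highest-priority regions within the adaptive back-up time."""
    budget_s = reserve_energy_j / power_draw_w  # time the reserve can sustain
    saved, used_s = [], 0.0
    # lower priority number = more important, saved first
    for name, size_mb, priority in sorted(regions, key=lambda r: r[2]):
        cost_s = size_mb / rate_mb_s
        if used_s + cost_s <= budget_s:
            saved.append(name)
            used_s += cost_s
    return saved

saved = adaptive_backup(
    [("logs", 400, 2), ("metadata", 100, 1), ("cache", 800, 3)],
    reserve_energy_j=50.0, power_draw_w=10.0, rate_mb_s=100.0,
)
```

With a 5-second budget, the 1 s metadata and 4 s log copies fit but the 8 s cache copy is dropped, matching the priority-ordered behavior the abstract describes.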
  • Patent number: RE50205
    Abstract: A computing system includes: an adaptive back-up controller configured to calculate an adaptive back-up time based on a reserve power source for backing up a volatile memory to a nonvolatile memory; and a processor core, coupled to the adaptive back-up controller, configured to back up at least a portion of the volatile memory to the nonvolatile memory within the adaptive back-up time.
    Type: Grant
    Filed: September 5, 2020
    Date of Patent: November 12, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hongzhong Zheng, Keith Chan, Wonseok Lee, Tackhwi Lee