Patents by Inventor Dimin Niu

Dimin Niu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12181987
    Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error and, if so, to complete the memory access with the spare memory.
    Type: Grant
    Filed: October 12, 2021
    Date of Patent: December 31, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Dimin Niu, Krishna Malladi, Hongzhong Zheng
  • Patent number: 12164593
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: December 10, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
  • Patent number: 12153646
    Abstract: An adaptive matrix multiplier. In some embodiments, the matrix multiplier includes a first multiplying unit, a second multiplying unit, a memory load circuit, and an outer buffer circuit. The first multiplying unit includes a first inner buffer circuit and a second inner buffer circuit, and the second multiplying unit includes a first inner buffer circuit and a second inner buffer circuit. The memory load circuit is configured to load data from memory, in a single burst of a burst memory access mode, into the first inner buffer circuit of the first multiplying unit and into the first inner buffer circuit of the second multiplying unit.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: November 26, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Dimin Niu, Hongzhong Zheng
  • Publication number: 20240385880
    Abstract: A task scheduling unit includes a progress detection subunit having circuitry configured to obtain first progress information of a first chip in which the task scheduling unit is located, the first progress information indicating a task execution progress of the first chip; a transmission subunit having circuitry configured to transmit the first progress information to a second chip, wherein the first chip and the second chip are located on a same wafer, and the task execution progress of the first chip is less than a task execution progress of the second chip; and a transfer subunit having circuitry configured to receive first request information transmitted by the second chip in response to the first progress information; and transfer at least some of tasks executed by the first chip to the second chip for execution based on the first request information.
    Type: Application
    Filed: May 17, 2024
    Publication date: November 21, 2024
    Inventors: Youwei ZHUO, Han XU, Zhe ZHANG, Shuangchen LI, Dimin NIU, Hongzhong ZHENG
  • Patent number: 12147360
    Abstract: A memory module that includes a non-volatile memory and an asynchronous memory interface to interface with a memory controller is presented. The asynchronous memory interface may use repurposed pins of a double data rate (DDR) memory channel to send asynchronous data to the memory controller. The asynchronous data may be device feedback indicating a status of the non-volatile memory.
    Type: Grant
    Filed: July 25, 2022
    Date of Patent: November 19, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Dimin Niu, Mu-Tien Chang, Hongzhong Zheng, Sun Young Lim, Indong Kim, Jangseok Choi, Craig Hanson
  • Patent number: 12147474
    Abstract: An embodiment of the present disclosure relates to a graph node sampling system and a computer-implemented method, where structure information of nodes in a graph neural network is stored in a set of data structures, and attribute data of the nodes is stored in another set of data structures. Node sampling may be performed by a sampling unit in a solid state drive. A node sampling unit selects, reads, and collects attribute data of a sampled node and a neighboring node of the sampled node, and transfers the data to a main memory. The method and system according to the embodiments of the present disclosure can save bandwidth consumed by node sampling in large applications such as a graph neural network.
    Type: Grant
    Filed: April 20, 2022
    Date of Patent: November 19, 2024
    Assignee: Alibaba (China) Co., Ltd.
    Inventors: Tianchan Guan, Dimin Niu, Shuangchen Li, Hongzhong Zheng
  • Patent number: 12147341
    Abstract: The apparatus, method, and system provided herein are directed to prioritizing cache-line writes of compressed data. The memory controller comprises a cache line compression engine that receives raw data, compresses the raw data, determines a compression rate between the raw data and the compressed data, determines whether the compression rate is greater than a predetermined rate, and outputs the compressed data as data-to-be-written if the compression rate is greater than the predetermined rate. In response to determining that the compression rate is greater than the predetermined rate, the cache line compression engine generates a compression signal indicating the data-to-be-written is the compressed data and sends the compression signal to a scheduler of a command queue in the memory controller, where writing of compressed data is prioritized.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: November 19, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Dimin Niu, Tianchan Guan, Lide Duan, Hongzhong Zheng
  • Patent number: 12142338
    Abstract: The present invention provides systems and methods for efficiently and effectively priming and initializing a memory. In one embodiment, a memory controller includes a normal data path and a priming path. The normal data path directs storage operations during a normal memory read/write operation after power startup of a memory chip. The priming path includes a priming module, wherein the priming module directs memory priming operations during a power startup of the memory chip, including forwarding a priming pattern for storage in a write pattern mode register of a memory chip and selection of a memory address in the memory chip for initialization with the priming pattern. The priming pattern includes information corresponding to proper initial data values. The priming pattern can also include proper corresponding error correction code (ECC) values. The priming module can include a priming pattern register that stores the priming pattern.
    Type: Grant
    Filed: January 19, 2021
    Date of Patent: November 12, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Dimin Niu, Shuangchen Li, Tianchan Guan, Hongzhong Zheng
  • Patent number: 12141227
    Abstract: An adaptive matrix multiplier. In some embodiments, the matrix multiplier includes a first multiplying unit, a second multiplying unit, a memory load circuit, and an outer buffer circuit. The first multiplying unit includes a first inner buffer circuit and a second inner buffer circuit, and the second multiplying unit includes a first inner buffer circuit and a second inner buffer circuit. The memory load circuit is configured to load data from memory, in a single burst of a burst memory access mode, into the first inner buffer circuit of the first multiplying unit and into the first inner buffer circuit of the second multiplying unit.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: November 12, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Dimin Niu, Hongzhong Zheng
  • Publication number: 20240370168
    Abstract: The present disclosure provides a physical host including a memory, a first buffer, a second buffer, a third buffer and a processor. The first buffer stores a log regarding a plurality of dirty pages. The second buffer stores a dirty bitmap, where the dirty bitmap is written into the second buffer according to the log read from the first buffer. The third buffer stores the dirty bitmap. The processor obtains the current memory address to be migrated and a destination memory address, and marks a page table corresponding to the memory address to be migrated as a plurality of dirty pages and writes the log marked as the plurality of dirty pages into the first buffer when the memory address to be migrated is written. The processor includes a memory copy engine for reading the dirty bitmap from the third buffer, and copying the content corresponding to the plurality of dirty pages to the destination memory according to the dirty bitmap.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Jiacheng MA, Tianchan GUAN, Yijin GUAN, Dimin NIU, Hongzhong ZHENG
  • Publication number: 20240370384
    Abstract: This disclosure discloses a memory extension device, an operation method of the memory extension device, and a computer readable storage medium for executing the operation method. The method includes: converting local information received from a local host into local transaction layer information according to a first sub-protocol of a coherent interconnection protocol; converting the local transaction layer information into converted local transaction layer information according to a second sub-protocol of the coherent interconnection protocol, the converted local transaction layer information conforming to the second sub-protocol; packaging the converted local transaction layer information into a plurality of local data packets; and transmitting the plurality of local data packets to a remote memory extension device.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Tianchan GUAN, Yijin GUAN, Dimin NIU, Jiacheng MA, Zhaoyang DU, Hongzhong ZHENG
  • Publication number: 20240370374
    Abstract: The present disclosure relates to a computer system, a method for a computer system, and a computer-readable storage medium for executing the method for a computer system.
    Type: Application
    Filed: May 3, 2024
    Publication date: November 7, 2024
    Inventors: Jiacheng MA, Dimin NIU, Tianchan GUAN, Yijin GUAN, Hongzhong ZHENG
  • Patent number: 12130884
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: October 29, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
  • Patent number: 12124709
    Abstract: The present application discloses a computing system and an associated method. The computing system includes a first host, a second host, a first memory extension device and a second memory extension device. The first host includes a first memory, and the second host includes a second memory. The first host has a plurality of first memory addresses corresponding to a plurality of memory spaces of the first memory, and a plurality of second memory addresses corresponding to a plurality of memory spaces of the second memory. The first memory extension device is coupled to the first host. The second memory extension device is coupled to the second host and the first memory extension device. The first host accesses the plurality of memory spaces of the second memory through the first memory extension device and the second memory extension device.
    Type: Grant
    Filed: December 12, 2022
    Date of Patent: October 22, 2024
    Assignee: ALIBABA (CHINA) CO., LTD.
    Inventors: Tianchan Guan, Yijin Guan, Dimin Niu, Hongzhong Zheng
  • Publication number: 20240348422
    Abstract: A privacy calculation unit includes a first calculation subunit, a storage subunit, and a communication subunit. The first calculation subunit includes circuitry to calculate first domain conversion ciphertexts sequentially. The storage subunit is configured to store the calculated first domain conversion ciphertexts received from the first calculation subunit. The first domain conversion ciphertext is an intermediate ciphertext when first to-be-converted data is converted from a first privacy-preserving computation domain to a second privacy-preserving computation domain.
    Type: Application
    Filed: April 16, 2024
    Publication date: October 17, 2024
    Inventors: Zhaohui CHEN, Zhen GU, Yanheng LU, Dimin NIU, Ziyuan LIANG, Qi'ao JIN, Fan ZHANG, Yuan XIE
  • Patent number: 12073490
    Abstract: The maximum capacity of a very fast memory in a system that requires very fast memory access times is increased by adding a memory with access times slower than required, and then moving infrequently accessed data from the memory with the very fast access times to the memory with the slower access times.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: August 27, 2024
    Assignee: Alibaba Damo (Hangzhou) Technology Co., Ltd.
    Inventors: Yuhao Wang, Dimin Niu, Yijin Guan, Shengcheng Wang, Shuangchen Li, Hongzhong Zheng
  • Patent number: 12056374
    Abstract: A dynamic bias coherency configuration engine can include control logic, a host threshold register, a device threshold register, and a plurality of memory region monitoring units. The memory region monitoring units can include a starting page number register, an ending page number register, a host access register and a device access register. The memory region monitoring units can be utilized by the dynamic bias coherency configuration engine to configure corresponding portions of a memory space in a device bias mode or a host bias mode.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: August 6, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Lide Duan, Dimin Niu, Hongzhong Zheng
  • Publication number: 20240244013
    Abstract: Embodiments of this disclosure provide a data packet transmission method, a scheduling management unit, a chip, and a graphic card. The data packet transmission method includes: determining a source node and a destination node of a data packet to be transmitted; determining at least one intermediate routing node corresponding to the data packet to be transmitted based on the source node and the destination node of the data packet to be transmitted and a data transmission state of each node in a network on chip (NoC); and transmitting identification information of the at least one intermediate routing node to the source node of the data packet to be transmitted.
    Type: Application
    Filed: January 16, 2024
    Publication date: July 18, 2024
    Inventors: Yunfang LI, Jiayi HUANG, Lide DUAN, Dimin NIU
  • Patent number: 12032497
    Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes a HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: July 9, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
  • Patent number: 12032828
    Abstract: A memory module includes a memory array, an interface and a controller. The memory array includes an array of memory cells and is configured as a dual in-line memory module (DIMM). The DIMM includes a plurality of connections that have been repurposed from a standard DIMM pin out configuration to interface operational status of the memory device to a host device. The interface is coupled to the memory array and the plurality of connections of the DIMM to interface the memory array to the host device. The controller is coupled to the memory array and the interface, and controls at least one of a refresh operation, an error-correction operation, a memory scrubbing operation, and a wear-level control operation of the memory array; the controller also interfaces with the host device.
    Type: Grant
    Filed: April 4, 2022
    Date of Patent: July 9, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Mu-Tien Chang, Dimin Niu, Hongzhong Zheng, Sun Young Lim, Indong Kim, Jangseok Choi
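The spare-memory scheme described for patent number 12181987 above can be illustrated in software: an address table redirects accesses to known-bad addresses into a small spare store. This is a minimal sketch of the general idea, not the patented circuit; the class and method names are invented for illustration.

```python
class ReliabilityCircuit:
    """Sketch of spare-memory remapping: addresses associated with
    errors are redirected to a spare memory via an address table."""

    def __init__(self, backing):
        self.backing = backing      # models the memory cell die (addr -> data)
        self.spare = {}             # spare memory, indexed by original address
        self.address_table = set()  # addresses known to be associated with errors

    def mark_faulty(self, addr):
        # Remap a failing address: preserve its current data in the spare memory.
        self.address_table.add(addr)
        self.spare[addr] = self.backing.get(addr)

    def read(self, addr):
        # Accesses to remapped addresses complete with the spare memory.
        if addr in self.address_table:
            return self.spare[addr]
        return self.backing[addr]

    def write(self, addr, data):
        if addr in self.address_table:
            self.spare[addr] = data
        else:
            self.backing[addr] = data
```

Once an address is marked, reads and writes to it never touch the backing store, which is the ameliorating behavior the abstract describes.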
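The lookup-table technique in patent numbers 12164593 and 12130884 above replaces multipliers with table reads: one operand selects a row and the other a column of a precomputed product table, and adders accumulate the results. A minimal software sketch for small unsigned operands (table width and function names are illustrative assumptions):

```python
# Precompute a product lookup table for 4-bit unsigned operands.
# The table itself is built with addition only: lut[a][b] == a * b.
WIDTH = 16
lut = [[0] * WIDTH for _ in range(WIDTH)]
for a in range(WIDTH):
    for b in range(1, WIDTH):
        lut[a][b] = lut[a][b - 1] + a  # repeated addition, no multiply

def dot_via_lut(vec_a, vec_b):
    """Dot product where each elementwise product is a table lookup:
    elements of vec_a act as row addresses, elements of vec_b as
    column addresses; adders sum the looked-up products."""
    acc = 0
    for a, b in zip(vec_a, vec_b):
        acc += lut[a][b]
    return acc
```

In the patented circuit the table lives in a DRAM bank near the compute, so the "multiply" becomes a memory read, which is the point of the in-memory dataflow design.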
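The decision logic described for patent number 12147341 above can be sketched as a small function: compress, compare the achieved ratio against a threshold, and emit a compression signal only when the threshold is beaten. This sketch uses `zlib` as a stand-in compressor and an invented function name; the patented engine operates on cache lines in hardware.

```python
import zlib

def compress_for_write(raw: bytes, min_ratio: float = 2.0):
    """Sketch of the cache-line compression decision: return
    (data_to_be_written, compression_signal). The compressed form is
    used, and the signal set, only if the compression rate exceeds
    the predetermined rate."""
    compressed = zlib.compress(raw)
    ratio = len(raw) / len(compressed)
    if ratio > min_ratio:
        # Compression signal set: the command-queue scheduler
        # prioritizes this write.
        return compressed, True
    return raw, False
```

Highly redundant lines compress well and get the priority flag; incompressible lines fall back to a raw write with no signal.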