Patents by Inventor Krishna Malladi

Krishna Malladi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11507435
    Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: November 22, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
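
The migration flow in the entry above is essentially a reward-maximization step over candidate server nodes. The short Python sketch below illustrates that idea under assumed inputs; the Node fields, the reward formula, and the pick_target helper are illustrative stand-ins, not details taken from the patent.

```python
# Hypothetical sketch of reward-maximizing workload placement; names and the
# reward formula are illustrative, not taken from the patent.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_util: float        # fraction 0..1
    mem_util: float        # fraction 0..1
    est_latency_ms: float  # projected latency for the workload on this node

def reward(latency_req_ms: float, node: Node) -> float:
    """Illustrative reward: favor latency slack, penalize utilization."""
    slack = latency_req_ms - node.est_latency_ms
    if slack < 0:
        return float("-inf")   # migrating here would still violate the requirement
    return slack - 0.5 * (node.cpu_util + node.mem_util)

def pick_target(latency_req_ms: float, candidates: list[Node]) -> Node:
    """Choose the node that maximizes the reward for this workload."""
    return max(candidates, key=lambda n: reward(latency_req_ms, n))

if __name__ == "__main__":
    nodes = [Node("node-1", 0.9, 0.8, 12.0),
             Node("node-2", 0.4, 0.5, 6.0),
             Node("node-3", 0.2, 0.3, 9.0)]
    print(pick_target(10.0, nodes).name)  # node-2: meets the 10 ms requirement with the best score
```
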
  • Publication number: 20220367412
    Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
    Type: Application
    Filed: July 25, 2022
    Publication date: November 17, 2022
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
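
As a rough behavioral picture of the arrangement above, the sketch below models a lookup table on the memory die, the optical via as a lossless transfer, and a small piece of combinational logic on the logic die. The squaring table and parity reduction are purely illustrative assumptions.

```python
# Behavioral sketch of the die-to-die arrangement: LUT conversion on the memory
# die, transfer over an "optical via", combinational logic on the logic die.
MEMORY_DIE_LUT = {x: x * x for x in range(256)}  # first data -> second data (illustrative)

def optical_via(second_data: int) -> int:
    # Ideal, lossless die-to-die transfer of the converted value.
    return second_data

def logic_die_combinational(second_data: int) -> int:
    # Stand-in for combinational logic: reduce the received value to a parity bit.
    return bin(second_data).count("1") & 1

def process(first_data: int) -> int:
    second = MEMORY_DIE_LUT[first_data]                   # conversion on the memory die
    return logic_die_combinational(optical_via(second))   # transfer, then combinational logic

print(process(13))  # 13*13 = 169 = 0b10101001 -> parity 0
```
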
  • Patent number: 11398453
    Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
    Type: Grant
    Filed: March 2, 2018
    Date of Patent: July 26, 2022
    Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng
  • Publication number: 20220035719
    Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error and, if so, to complete the memory access with the spare memory.
    Type: Application
    Filed: October 12, 2021
    Publication date: February 3, 2022
    Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
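
The reliability mechanism above amounts to redirecting accesses at known-bad addresses through an address table into spare storage. The sketch below is a minimal software model of that behavior; the class name SpareRemapper and its dictionaries are illustrative, not the patent's circuit structure.

```python
# Behavioral sketch of an address-remapping reliability mechanism (assumed structure).
class SpareRemapper:
    def __init__(self, backing: dict[int, int]):
        self.backing = backing   # models the memory cell die: address -> data
        self.spare = {}          # models the spare memory on the logic die
        self.remap = {}          # address table: failing address -> spare slot
        self.next_slot = 0

    def mark_faulty(self, addr: int) -> None:
        """Record a failing address and reserve a spare location for it."""
        if addr not in self.remap:
            self.remap[addr] = self.next_slot
            self.next_slot += 1

    def write(self, addr: int, value: int) -> None:
        if addr in self.remap:                    # access is associated with an error:
            self.spare[self.remap[addr]] = value  # complete it with the spare memory
        else:
            self.backing[addr] = value

    def read(self, addr: int) -> int:
        if addr in self.remap:
            return self.spare[self.remap[addr]]
        return self.backing[addr]
```
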
  • Publication number: 20210374210
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Application
    Filed: July 13, 2021
    Publication date: December 2, 2021
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
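
The core trick described above is replacing multiplies with table lookups indexed by the two operand values, then accumulating the looked-up products with adders. A minimal Python sketch under assumed 4-bit unsigned operands follows; the table construction and function names are illustrative only.

```python
# Minimal sketch of lookup-table-based dot products (no multiply instruction),
# assuming small unsigned operands; table sizing and dataflow are simplified.
def build_product_lut(width_bits: int = 4) -> list[list[int]]:
    n = 1 << width_bits
    # The table itself is filled using additions only, mirroring a multiply-free design.
    lut = [[0] * n for _ in range(n)]
    for a in range(1, n):
        for b in range(1, n):
            lut[a][b] = lut[a - 1][b] + b  # a*b built up by repeated addition
    return lut

def dot(lut, first_vec, second_vec) -> int:
    # The first vector element selects the row, the second selects the column;
    # adders accumulate the looked-up products.
    acc = 0
    for a, b in zip(first_vec, second_vec):
        acc += lut[a][b]
    return acc

LUT = build_product_lut()
print(dot(LUT, [1, 2, 3], [4, 5, 6]))  # 32 = 1*4 + 2*5 + 3*6
```
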
  • Patent number: 11151006
    Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error and, if so, to complete the memory access with the spare memory.
    Type: Grant
    Filed: October 2, 2018
    Date of Patent: October 19, 2021
    Inventors: Dimin Niu, Krishna Malladi, Hongzhong Zheng
  • Patent number: 11100193
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Grant
    Filed: April 18, 2019
    Date of Patent: August 24, 2021
    Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
  • Publication number: 20200225999
    Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
    Type: Application
    Filed: March 24, 2020
    Publication date: July 16, 2020
    Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
  • Publication number: 20200184001
    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
    Type: Application
    Filed: April 18, 2019
    Publication date: June 11, 2020
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
  • Publication number: 20200183837
    Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The data flow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units are configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
    Type: Application
    Filed: April 18, 2019
    Publication date: June 11, 2020
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
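
To make the pipelined accumulation in the entry above concrete, the sketch below models a chain of multiply-and-add stages in which each stage adds its local product to the partial sum forwarded by its predecessor. The stage model and names are assumptions, not the patent's microarchitecture.

```python
# Behavioral sketch of a pipelined partial-sum chain of multiply-and-add units.
class MacUnit:
    def __init__(self, weight: int):
        self.weight = weight

    def step(self, activation: int, partial_in: int) -> int:
        # Multiply the local weight by the incoming activation and add the
        # partial output forwarded by the previous unit in the chain.
        return partial_in + self.weight * activation

def run_chain(weights: list[int], activations: list[int]) -> int:
    chain = [MacUnit(w) for w in weights]
    partial = 0
    for unit, act in zip(chain, activations):
        partial = unit.step(act, partial)  # forward the partial result down the chain
    return partial

print(run_chain([1, 2, 3], [4, 5, 6]))  # 32 = 1*4 + 2*5 + 3*6
```
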
  • Patent number: 10678704
    Abstract: A method of retrieving data stored in a memory associated with a dedupe module is provided. The method includes: identifying a logical address of the data; identifying a physical line ID of the data in accordance with the logical address by looking up at least a portion of the logical address in a translation table; locating a respective physical line, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder to the read cache, the respective hash cylinder including: a respective hash bucket, the respective hash bucket including the respective physical line; and a respective reference counter bucket, the respective reference counter bucket including a respective reference counter associated with the respective physical line.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: June 9, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
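
The read path above can be pictured as: translate the logical address to a PLID, locate the enclosing hash cylinder, copy that cylinder (physical lines plus reference counters) into the read cache, and return the requested line. The toy model below follows those steps; the data structures and field names are illustrative assumptions.

```python
# Sketch of the dedupe read path with assumed toy data structures.
from dataclasses import dataclass, field

@dataclass
class HashCylinder:
    lines: dict[int, bytes] = field(default_factory=dict)    # hash bucket: PLID -> physical line
    refcounts: dict[int, int] = field(default_factory=dict)  # reference counter bucket

class DedupeStore:
    def __init__(self):
        self.translation = {}  # logical address -> PLID
        self.cylinders = {}    # cylinder id -> HashCylinder
        self.read_cache = {}   # cylinder id -> cached HashCylinder

    def read(self, logical_addr: int, lines_per_cylinder: int = 8) -> bytes:
        plid = self.translation[logical_addr]  # 1. translate the logical address to a PLID
        cyl_id = plid // lines_per_cylinder    # 2. locate the cylinder holding that line
        cylinder = self.cylinders[cyl_id]
        self.read_cache[cyl_id] = cylinder     # 3. copy the cylinder (lines + refcounts) to the read cache
        return cylinder.lines[plid]            # 4. return the physical line's data
```
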
  • Patent number: 10628233
    Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
    Type: Grant
    Filed: March 23, 2017
    Date of Patent: April 21, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
  • Patent number: 10528284
    Abstract: A dedupe module is provided. The dedupe module includes: a host interface; a dedupe engine to receive a data request from a host system via the host interface; a memory controller; a plurality of memory modules, each memory module being coupled to the memory controller; and a read cache for caching data from the memory controller for use by the dedupe engine.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: January 7, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
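
For context, the write side of a dedupe engine of this kind typically stores each unique line once and shares it through reference counts. The sketch below illustrates that general idea only; the use of SHA-256 and the dictionary layout are assumptions, not taken from the patent.

```python
# Minimal write-path sketch for a content-deduplicating engine (assumed design).
import hashlib

class DedupeEngine:
    def __init__(self):
        self.lines = {}        # digest -> physical line data (backed by the memory modules)
        self.refcount = {}     # digest -> number of logical addresses sharing the line
        self.translation = {}  # logical address -> digest

    def write(self, logical_addr: int, line: bytes) -> None:
        digest = hashlib.sha256(line).digest()
        old = self.translation.get(logical_addr)
        if old is not None:
            self.refcount[old] -= 1       # the address no longer references its old line
        if digest not in self.lines:
            self.lines[digest] = line     # first copy: store the physical line once
            self.refcount[digest] = 0
        self.refcount[digest] += 1        # additional writers just bump the count
        self.translation[logical_addr] = digest

    def read(self, logical_addr: int) -> bytes:
        return self.lines[self.translation[logical_addr]]
```
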
  • Publication number: 20200004652
    Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error and, if so, to complete the memory access with the spare memory.
    Type: Application
    Filed: October 2, 2018
    Publication date: January 2, 2020
    Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
  • Patent number: 10372606
    Abstract: A memory device includes a memory interface to a host computer and a memory overprovisioning logic configured to provide a virtual memory capacity to a host operating system (OS). A kernel driver module of the host OS is configured to manage the virtual memory capacity of the memory device provided by the memory overprovisioning logic of the memory device and provide a fast swap of anonymous pages to a frontswap space and file pages to a cleancache space of the memory device based on the virtual memory capacity of the memory device.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: August 6, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Krishna Malladi, Jongmin Gim, Hongzhong Zheng
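
The split described above sends anonymous pages to a frontswap-style pool and clean file pages to a cleancache-style pool, both backed by the device's virtual capacity. The toy model below illustrates that division of labor; the class and method names are invented, and real frontswap/cleancache are in-kernel interfaces rather than Python objects.

```python
# Toy model of the anonymous-page / file-page split over a virtual capacity.
class OverprovisionedDevice:
    def __init__(self, virtual_capacity_pages: int):
        self.capacity = virtual_capacity_pages
        self.frontswap = {}   # (swap_type, page_offset) -> page data (anonymous pages)
        self.cleancache = {}  # (inode, page_index)      -> page data (clean file pages)

    def _full(self) -> bool:
        return len(self.frontswap) + len(self.cleancache) >= self.capacity

    def store_anon(self, key, page: bytes) -> bool:
        if self._full():
            return False              # caller falls back to the regular swap path
        self.frontswap[key] = page
        return True

    def load_anon(self, key):
        return self.frontswap.pop(key, None)

    def put_file(self, key, page: bytes) -> None:
        if not self._full():
            self.cleancache[key] = page   # clean pages can simply be dropped later if needed

    def get_file(self, key):
        return self.cleancache.get(key)
```
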
  • Publication number: 20190214365
    Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
    Type: Application
    Filed: March 2, 2018
    Publication date: July 11, 2019
    Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
  • Patent number: 10268413
    Abstract: A memory module includes a host interface configured to provide an interface to a host computer; one or more memory devices; a deduplication engine configured to provide a virtual memory capacity of the memory module that is larger than a physical size of the one or more memory devices; a memory controller for controlling access to the one or more memory devices; a volatile memory comprising a hash table, an overflow memory region, and a credit unit, wherein the overflow memory region stores user data when a hash collision occurs or the hash table is full, and wherein the credit unit stores an address of an invalidated entry in the overflow memory region; and a control logic is configured to control the overflow memory region and the credit unit and generate a warning indicating a status of the overflow memory region and the credit unit.
    Type: Grant
    Filed: March 29, 2017
    Date of Patent: April 23, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
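
The overflow memory region and credit unit described above behave like a fallback allocator that prefers to recycle invalidated slots and raises a warning as it fills. The sketch below models that behavior; the threshold and the names are illustrative assumptions.

```python
# Assumed behavioral sketch of an overflow region backed by a credit unit.
class OverflowRegion:
    def __init__(self, size: int, warn_fraction: float = 0.9):
        self.size = size
        self.warn_fraction = warn_fraction
        self.entries = {}   # overflow address -> user data
        self.credits = []   # addresses of invalidated overflow entries, ready for reuse
        self.next_free = 0

    def store(self, data: bytes) -> int:
        """Place data that could not be deduplicated (hash collision or full hash table)."""
        if self.credits:
            addr = self.credits.pop()       # reuse an invalidated entry first
        elif self.next_free < self.size:
            addr = self.next_free
            self.next_free += 1
        else:
            raise MemoryError("overflow region exhausted")
        self.entries[addr] = data
        if len(self.entries) >= self.warn_fraction * self.size:
            print("warning: overflow region nearly full")   # status warning to the host
        return addr

    def invalidate(self, addr: int) -> None:
        del self.entries[addr]
        self.credits.append(addr)           # the credit unit records the freed address
```
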
  • Patent number: 10242728
    Abstract: A dynamic random access memory (DRAM) processing unit (DPU) may include at least one computing cell array having a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows; and a controller that may be coupled to the at least one computing cell array to configure the at least one computing cell array to perform a DPU operation.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: March 26, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Shuangchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
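
The three-row computing-cell idea above can be modeled as applying a bitwise logic function to two source rows and writing the result into a third row. The sketch below does exactly that in software; the array shape and the choice of AND are illustrative assumptions.

```python
# Simplified model of row-wise logic in a DRAM-based computing cell array.
import operator

class ComputeArray:
    def __init__(self, rows: int, cols: int):
        self.cells = [[0] * cols for _ in range(rows)]

    def row_op(self, op, src_a: int, src_b: int, dst: int) -> None:
        """Apply a bitwise logic function to rows src_a and src_b, store the result in row dst."""
        self.cells[dst] = [op(a, b) for a, b in zip(self.cells[src_a], self.cells[src_b])]

arr = ComputeArray(rows=3, cols=8)
arr.cells[0] = [1, 0, 1, 1, 0, 0, 1, 0]
arr.cells[1] = [1, 1, 0, 1, 0, 1, 1, 0]
arr.row_op(operator.and_, 0, 1, 2)   # row 2 now holds row 0 AND row 1
print(arr.cells[2])                  # [1, 0, 0, 1, 0, 0, 1, 0]
```
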
  • Patent number: 10180808
    Abstract: A system includes a library, a compiler, a driver and at least one dynamic random access memory (DRAM) processing unit (DPU). The library may determine at least one DPU operation corresponding to a received command. The compiler may form at least one DPU instruction for the DPU operation. The driver may send the at least one DPU instruction to at least one DPU. The DPU may include at least one computing cell array that includes a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first row and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: January 15, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Shuangchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
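
The stack described above splits the work into a library (command to DPU operation), a compiler (operation to DPU instruction), and a driver (instruction dispatch). The sketch below mimics that flow; the command names and the textual instruction encoding are invented for illustration.

```python
# Hypothetical sketch of a library -> compiler -> driver flow for a DPU.
OP_TABLE = {"vector_and": "AND", "vector_or": "OR", "vector_xor": "XOR"}  # library mapping

def compile_op(op: str, src_a: int, src_b: int, dst: int) -> str:
    # Compiler: form a textual DPU instruction for the requested row operation.
    return f"{op} r{src_a}, r{src_b} -> r{dst}"

def driver_send(instructions: list[str]) -> None:
    # Driver: in hardware this would write to the DPU's command interface;
    # here we just print the instruction stream.
    for ins in instructions:
        print("DPU <-", ins)

def run_command(command: str, src_a: int, src_b: int, dst: int) -> None:
    op = OP_TABLE[command]                           # library determines the DPU operation
    driver_send([compile_op(op, src_a, src_b, dst)]) # compiler forms it, driver sends it

run_command("vector_and", 0, 1, 2)   # prints: DPU <- AND r0, r1 -> r2
```
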
  • Patent number: 10162554
    Abstract: A memory module has a logic including a programming register, a deduplication ratio control logic, and a deduplication engine. The programming register stores a maximum deduplication ratio of the memory module. The control logic is configured to control a deduplication ratio of the memory module according to the maximum deduplication ratio. The deduplication ratio is programmable by the host computer.
    Type: Grant
    Filed: October 4, 2016
    Date of Patent: December 25, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hongzhong Zheng, Krishna Malladi, Dimin Niu
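
The control described above caps how much virtual capacity the module may advertise, regardless of how well the current data actually deduplicates. The sketch below illustrates such a cap; the register and field names are assumptions, not the patent's interface.

```python
# Illustrative sketch of host-programmable deduplication-ratio capping.
class DedupRatioControl:
    def __init__(self, physical_bytes: int, max_ratio: float):
        self.physical_bytes = physical_bytes
        self.max_ratio = max_ratio   # programming register: ceiling set by the host

    def program_max_ratio(self, ratio: float) -> None:
        self.max_ratio = ratio       # host reprograms the register

    def advertised_capacity(self, measured_ratio: float) -> int:
        # Control logic: never advertise more virtual capacity than the
        # programmed maximum ratio allows, even if current data dedupes better.
        effective = min(measured_ratio, self.max_ratio)
        return int(self.physical_bytes * effective)

ctrl = DedupRatioControl(physical_bytes=64 << 30, max_ratio=2.0)
print(ctrl.advertised_capacity(measured_ratio=3.5))   # capped at 128 GiB
```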