Patents by Inventor Krishna Malladi
Krishna Malladi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11507435
Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
Type: Grant
Filed: March 24, 2020
Date of Patent: November 22, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
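The abstract lists the steps of the migration loop but does not define the reward. The Python sketch below shows one way those steps could fit together, assuming a hypothetical reward (latency saved by moving, minus a penalty weighted by the target node's utilization). `Node`, `Workload`, `reward`, `util_weight`, and `migrate_if_needed` are illustrative names, not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    utilization: float          # fraction of the node's capacity in use, 0.0 to 1.0
    expected_latency_ms: float  # hypothetical latency estimate for the workload on this node


@dataclass
class Workload:
    name: str
    latency_slo_ms: float       # the workload's latency requirement
    observed_latency_ms: float  # latency measured on the current node
    host: str                   # name of the node currently running the workload


def violates_requirement(w: Workload) -> bool:
    return w.observed_latency_ms > w.latency_slo_ms


def reward(w: Workload, node: Node, util_weight: float = 10.0) -> float:
    # Hypothetical reward: latency saved by moving, minus a penalty for loading a busy node.
    latency_gain = w.observed_latency_ms - node.expected_latency_ms
    return latency_gain - util_weight * node.utilization


def migrate_if_needed(w: Workload, nodes: list[Node]) -> Optional[Node]:
    """Move w to the reward-maximizing node if its latency requirement is violated."""
    if not violates_requirement(w):
        return None
    candidates = [n for n in nodes if n.name != w.host]
    if not candidates:
        return None
    target = max(candidates, key=lambda n: reward(w, n))
    w.host = target.name  # stand-in for the actual migration step
    return target
```

For example, a workload observed at 12 ms against a 5 ms requirement would move off its current node to whichever other node scores highest under this reward; a workload within its requirement is left where it is.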
-
Publication number: 20220367412
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
Type: Application
Filed: July 25, 2022
Publication date: November 17, 2022
Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
-
Patent number: 11398453
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
Type: Grant
Filed: March 2, 2018
Date of Patent: July 26, 2022
Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng
-
Publication number: 20220035719
Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.
Type: Application
Filed: October 12, 2021
Publication date: February 3, 2022
Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
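A minimal Python sketch of the spare-memory remapping described above. It assumes error detection happens elsewhere (for instance, through ECC) and is reported through a hypothetical `report_error` hook; the class and method names are illustrative, and the abstract does not specify how the address table is organized.

```python
from __future__ import annotations


class ReliableStackedMemory:
    """Memory cell die plus a logic-die reliability circuit with spare memory."""

    def __init__(self, size: int, spare_slots: int) -> None:
        self.cells = [0] * size                  # memory cell die
        self.spare = [0] * spare_slots           # spare memory on the logic die
        self.address_table: dict[int, int] = {}  # faulty address -> spare slot
        self.free_slots = list(range(spare_slots))

    def report_error(self, addr: int) -> None:
        """Hypothetical hook: called when an error is detected at addr."""
        if addr in self.address_table:
            return  # already remapped
        if not self.free_slots:
            raise RuntimeError("spare memory exhausted")
        slot = self.free_slots.pop(0)
        # Simplification: assume the current value is still recoverable (e.g. ECC-corrected).
        self.spare[slot] = self.cells[addr]
        self.address_table[addr] = slot

    def _target(self, addr: int):
        # An address found in the table is served from spare memory instead of the cell die.
        slot = self.address_table.get(addr)
        return (self.spare, slot) if slot is not None else (self.cells, addr)

    def write(self, addr: int, value: int) -> None:
        store, index = self._target(addr)
        store[index] = value

    def read(self, addr: int) -> int:
        store, index = self._target(addr)
        return store[index]
```

Once an address is remapped, every later read or write to it is completed with the spare memory, which matches the abstract's description of completing the access with the spare memory when it is associated with an error.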
-
Publication number: 20210374210
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
Type: Application
Filed: July 13, 2021
Publication date: December 2, 2021
Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
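The core idea, replacing each multiply with a table lookup addressed by the two operands and summing the looked-up products with adders, can be sketched in Python as below. It assumes 4-bit unsigned operands (a hypothetical width that keeps the table at 16-by-16 entries) and does not model the hierarchical lookup or the systolic propagation of accumulation results.

```python
from __future__ import annotations

BITS = 4            # hypothetical operand width; the LUT has 2**BITS rows and columns
LEVELS = 1 << BITS

# Peripheral lookup table, filled once: entry [a][b] holds the product a * b.
LUT = [[a * b for b in range(LEVELS)] for a in range(LEVELS)]


def lut_product(a: int, b: int) -> int:
    # a selects the row, b selects the column; no multiply happens at run time.
    if not (0 <= a < LEVELS and 0 <= b < LEVELS):
        raise ValueError("operand outside the lookup table's range")
    return LUT[a][b]


def lut_gemm(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions do not match")
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                acc += lut_product(A[i][t], B[t][j])  # adders accumulate looked-up products
            C[i][j] = acc
    return C
```

The table grows with the square of the number of operand values, which is why this approach suits low-precision operands.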
-
Patent number: 11151006
Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.
Type: Grant
Filed: October 2, 2018
Date of Patent: October 19, 2021
Inventors: Dimin Niu, Krishna Malladi, Hongzhong Zheng
-
Patent number: 11100193
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
Type: Grant
Filed: April 18, 2019
Date of Patent: August 24, 2021
Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
-
Publication number: 20200225999
Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
Type: Application
Filed: March 24, 2020
Publication date: July 16, 2020
Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
-
Publication number: 20200184001
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
Type: Application
Filed: April 18, 2019
Publication date: June 11, 2020
Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
-
Publication number: 20200183837
Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The data flow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units are configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
Type: Application
Filed: April 18, 2019
Publication date: June 11, 2020
Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
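A cycle-level Python sketch of the pipelined chain described above, in which each multiply-and-add unit passes its partial output to the next unit for accumulation. The mapping used here (unit `t` holds input element `x[t]`, and the partial sum for each output column streams through the chain one unit per cycle) is one simple assumed dataflow, not necessarily the one the application claims; `dataflow_chain` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional, Tuple


def dataflow_chain(x: list[int], W: list[list[int]]) -> list[int]:
    """Compute y[j] = sum_t x[t] * W[t][j] on a chain of MAC units.

    The partial sum for output column j enters unit 0 at cycle j and moves one
    unit per cycle; unit t adds x[t] * W[t][j] before passing it on, so several
    columns are in flight at once, one per unit.
    """
    K, M = len(x), len(W[0])
    regs: list[Optional[Tuple[int, int]]] = [None] * K  # (column j, partial sum) per unit
    y = [0] * M
    for cycle in range(M + K - 1):  # the last column leaves unit K-1 at cycle M+K-2
        new_regs: list[Optional[Tuple[int, int]]] = [None] * K
        for t in range(K):
            if t == 0:
                incoming = (cycle, 0) if cycle < M else None  # inject a fresh partial sum
            else:
                incoming = regs[t - 1]  # partial output from the previous MAC unit
            if incoming is not None:
                j, psum = incoming
                new_regs[t] = (j, psum + x[t] * W[t][j])
        if new_regs[K - 1] is not None:
            j, psum = new_regs[K - 1]
            y[j] = psum  # fully accumulated result leaves the chain
        regs = new_regs
    return y
```

Because each unit only ever talks to its neighbor, partial sums never return to a central accumulator, which is the property the pipelined chain relies on.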
-
Patent number: 10678704
Abstract: A method of retrieving data stored in a memory associated with a dedupe module is provided. The method includes: identifying a logical address of the data; identifying a physical line ID of the data in accordance with the logical address by looking up at least a portion of the logical address in a translation table; locating a respective physical line, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder to the read cache, the respective hash cylinder including: a respective hash bucket, the respective hash bucket including the respective physical line; and a respective reference counter bucket, the respective reference counter bucket including a respective reference counter associated with the respective physical line.
Type: Grant
Filed: March 31, 2017
Date of Patent: June 9, 2020
Assignee: Samsung Electronics Co., Ltd.
Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
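The read path above can be sketched in Python as follows. The sketch assumes a PLID encoded as a (bucket, way) pair and a content hash (SHA-256 here) that chooses the bucket. The write path is a hypothetical addition that only exists so the read path has data to find, and it does not model overwrites; none of these names come from the patent.

```python
from __future__ import annotations

import copy
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashCylinder:
    lines: list[Optional[bytes]]  # hash bucket: physical lines (None = free)
    refcounts: list[int]          # reference counter bucket: one counter per line


class DedupeMemory:
    def __init__(self, num_buckets: int, ways: int) -> None:
        self.cylinders = [HashCylinder([None] * ways, [0] * ways) for _ in range(num_buckets)]
        self.translation: dict[int, tuple[int, int]] = {}  # logical address -> PLID (bucket, way)
        self.read_cache: dict[int, HashCylinder] = {}     # bucket index -> cached cylinder copy

    def _bucket(self, data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "little") % len(self.cylinders)

    def write(self, logical: int, data: bytes) -> None:
        """Hypothetical write path; overwriting a mapped address is not modeled."""
        if logical in self.translation:
            raise ValueError("overwrite not modeled in this sketch")
        b = self._bucket(data)
        cyl = self.cylinders[b]
        if data in cyl.lines:
            w = cyl.lines.index(data)  # duplicate: share the existing physical line
        elif None in cyl.lines:
            w = cyl.lines.index(None)  # new data: take a free way in the bucket
            cyl.lines[w] = data
        else:
            raise MemoryError("hash bucket full")
        cyl.refcounts[w] += 1
        self.translation[logical] = (b, w)
        self.read_cache.pop(b, None)  # drop any stale cached copy of this cylinder

    def read(self, logical: int) -> bytes:
        b, w = self.translation[logical]  # translation table lookup yields the PLID
        if b not in self.read_cache:
            # Copy the whole hash cylinder (lines plus reference counters) into the read cache.
            self.read_cache[b] = copy.deepcopy(self.cylinders[b])
        return self.read_cache[b].lines[w]
```

Caching the whole cylinder means a later read of any line in the same bucket, and its reference counter, is served from the read cache.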
-
Patent number: 10628233
Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
Type: Grant
Filed: March 23, 2017
Date of Patent: April 21, 2020
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
-
Patent number: 10528284
Abstract: A dedupe module is provided. The dedupe module includes: a host interface; a dedupe engine to receive a data request from a host system via the host interface; a memory controller; a plurality of memory modules, each memory module being coupled to the memory controller; and a read cache for caching data from the memory controller for use by the dedupe engine.
Type: Grant
Filed: April 26, 2017
Date of Patent: January 7, 2020
Assignee: Samsung Electronics Co., Ltd.
Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
-
Publication number: 20200004652
Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.
Type: Application
Filed: October 2, 2018
Publication date: January 2, 2020
Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
-
Patent number: 10372606
Abstract: A memory device includes a memory interface to a host computer and a memory overprovisioning logic configured to provide a virtual memory capacity to a host operating system (OS). A kernel driver module of the host OS is configured to manage the virtual memory capacity of the memory device provided by the memory overprovisioning logic of the memory device and provide a fast swap of anonymous pages to a frontswap space and file pages to a cleancache space of the memory device based on the virtual memory capacity of the memory device.
Type: Grant
Filed: September 30, 2016
Date of Patent: August 6, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Krishna Malladi, Jongmin Gim, Hongzhong Zheng
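A small Python sketch of the page routing a kernel driver could perform, assuming the device advertises a fixed `virtual_pages` capacity; how the overprovisioning logic derives that capacity is not modeled. It follows the general frontswap/cleancache contract: storing an anonymous page may be refused when space runs out (the kernel then falls back to its regular swap device), while clean file pages can be dropped to make room. The class and method names are hypothetical and are not the Linux kernel API.

```python
from __future__ import annotations

from collections import OrderedDict


class OverprovisionedDevice:
    def __init__(self, virtual_pages: int) -> None:
        self.virtual_pages = virtual_pages  # capacity advertised by the device (assumed given)
        self.frontswap: dict[str, bytes] = {}                 # anonymous pages: must not be lost
        self.cleancache: OrderedDict[str, bytes] = OrderedDict()  # clean file pages: droppable

    def used(self) -> int:
        return len(self.frontswap) + len(self.cleancache)

    def _make_room(self) -> bool:
        # Only clean file pages may be discarded; drop the oldest one.
        if not self.cleancache:
            return False
        self.cleancache.popitem(last=False)
        return True

    def put_page(self, kind: str, key: str, data: bytes) -> bool:
        """Route an anonymous page to frontswap or a file page to cleancache."""
        if kind == "anon":
            if key not in self.frontswap and self.used() >= self.virtual_pages:
                if not self._make_room():
                    return False  # refused: caller falls back to ordinary swap
            self.frontswap[key] = data
            return True
        if kind == "file":
            if key in self.cleancache:
                self.cleancache.move_to_end(key)
            elif self.used() >= self.virtual_pages and not self._make_room():
                return False
            self.cleancache[key] = data
            return True
        raise ValueError(f"unknown page kind: {kind}")

    def get_page(self, kind: str, key: str):
        pool = {"anon": self.frontswap, "file": self.cleancache}[kind]
        return pool.get(key)
```

Treating the two pools differently reflects their different guarantees: an anonymous page has no other copy, while a clean file page can always be reread from storage.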
-
Publication number: 20190214365
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
Type: Application
Filed: March 2, 2018
Publication date: July 11, 2019
Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
-
Patent number: 10268413
Abstract: A memory module includes a host interface configured to provide an interface to a host computer; one or more memory devices; a deduplication engine configured to provide a virtual memory capacity of the memory module that is larger than a physical size of the one or more memory devices; a memory controller for controlling access to the one or more memory devices; a volatile memory comprising a hash table, an overflow memory region, and a credit unit, wherein the overflow memory region stores user data when a hash collision occurs or the hash table is full, and wherein the credit unit stores an address of an invalidated entry in the overflow memory region; and a control logic is configured to control the overflow memory region and the credit unit and generate a warning indicating a status of the overflow memory region and the credit unit.
Type: Grant
Filed: March 29, 2017
Date of Patent: April 23, 2019
Assignee: Samsung Electronics Co., Ltd.
Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
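A Python sketch of the overflow handling described above. It assumes a direct-mapped hash table with one entry per slot (so a collision and a full slot are the same event), CRC-32 as the hash, and a warning raised once a hypothetical `warn_fraction` of the overflow region is occupied. Duplicates that land in the overflow region are not detected in this simplified version.

```python
from __future__ import annotations

import zlib


class OverflowDedupe:
    def __init__(self, table_slots: int, overflow_slots: int, warn_fraction: float = 0.75) -> None:
        self.table: list[bytes | None] = [None] * table_slots        # hash table
        self.overflow: list[bytes | None] = [None] * overflow_slots  # overflow memory region
        self.next_free = 0            # next never-used overflow address
        self.credits: list[int] = []  # credit unit: addresses of invalidated overflow entries
        self.warn_fraction = warn_fraction

    def _slot(self, data: bytes) -> int:
        return zlib.crc32(data) % len(self.table)

    def store(self, data: bytes) -> tuple[str, int]:
        s = self._slot(data)
        if self.table[s] is None or self.table[s] == data:
            self.table[s] = data  # free slot, or identical data already stored (deduplicated)
            return ("table", s)
        # Hash collision: place the data in the overflow region, reusing a credited address first.
        if self.credits:
            addr = self.credits.pop()
        elif self.next_free < len(self.overflow):
            addr = self.next_free
            self.next_free += 1
        else:
            raise MemoryError("overflow region full")
        self.overflow[addr] = data
        return ("overflow", addr)

    def invalidate_overflow(self, addr: int) -> None:
        if self.overflow[addr] is None:
            raise ValueError("overflow entry is not in use")
        self.overflow[addr] = None
        self.credits.append(addr)  # remember the hole so a later collision can reuse it

    def overflow_in_use(self) -> int:
        return self.next_free - len(self.credits)

    def warning(self) -> bool:
        return self.overflow_in_use() >= self.warn_fraction * len(self.overflow)
```

Recycling credited addresses before touching fresh ones keeps the overflow region compact, so the warning reflects live entries rather than the high-water mark.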
-
Patent number: 10242728
Abstract: A dynamic random access memory (DRAM) processing unit (DPU) may include at least one computing cell array having a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows; and a controller that may be coupled to the at least one computing cell array to configure the at least one computing cell array to perform a DPU operation.
Type: Grant
Filed: February 6, 2017
Date of Patent: March 26, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Shaungchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
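A Python sketch of the row-wise computation described above: each row of the array is modeled as a bit-vector with one bit per column, so a single logic operation acts on every column at once and writes its result to a third row. The set of logic functions and the `run` controller are assumptions for illustration; the abstract does not say which functions the cells implement.

```python
from __future__ import annotations


class ComputingCellArray:
    """Rows of DRAM-based computing cells; each row is a bit-vector, one bit per column."""

    def __init__(self, num_rows: int, num_cols: int) -> None:
        if num_rows < 3:
            raise ValueError("need at least three rows: two operands and a result")
        self.mask = (1 << num_cols) - 1
        self.rows = [0] * num_rows

    def write_row(self, r: int, bits: int) -> None:
        self.rows[r] = bits & self.mask

    def read_row(self, r: int) -> int:
        return self.rows[r]

    def logic(self, op: str, src_a: int, src_b: int, dst: int) -> None:
        # Every column computes op(row a, row b) at once; the result lands in row dst.
        a, b = self.rows[src_a], self.rows[src_b]
        results = {"and": a & b, "or": a | b, "xor": a ^ b, "nor": ~(a | b)}
        self.rows[dst] = results[op] & self.mask  # mask keeps NOR within the row width


def run(array: ComputingCellArray, program: list[tuple[str, int, int, int]]) -> None:
    """Hypothetical controller: issues a sequence of row operations (one DPU operation)."""
    for op, a, b, dst in program:
        array.logic(op, a, b, dst)
```

For instance, running `[("xor", 0, 1, 2), ("and", 0, 1, 3)]` on an array with at least four rows forms a bitwise half adder across all columns: the sum bits land in row 2 and the carry bits in row 3.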
-
Patent number: 10180808
Abstract: A system includes a library, a compiler, a driver and at least one dynamic random access memory (DRAM) processing unit (DPU). The library may determine at least one DPU operation corresponding to a received command. The compiler may form at least one DPU instruction for the DPU operation. The driver may send the at least one DPU instruction to at least one DPU. The DPU may include at least one computing cell array that includes a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first row and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows.
Type: Grant
Filed: February 6, 2017
Date of Patent: January 15, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Shaungchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
-
Patent number: 10162554
Abstract: A memory module has a logic including a programming register, a deduplication ratio control logic, and a deduplication engine. The programming register stores a maximum deduplication ratio of the memory module. The control logic is configured to control a deduplication ratio of the memory module according to the maximum deduplication ratio. The deduplication ratio is programmable by the host computer.
Type: Grant
Filed: October 4, 2016
Date of Patent: December 25, 2018
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hongzhong Zheng, Krishna Malladi, Dimin Niu
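A Python sketch of a host-programmable maximum deduplication ratio, assuming one possible policy: the module exposes a virtual capacity equal to its physical capacity times the programmed ratio and rejects writes outside it, while still refusing new unique data once physical lines run out. The policy and all class and method names are hypothetical; the abstract does not say how the control logic enforces the ratio.

```python
from __future__ import annotations


class DedupeRatioModule:
    def __init__(self, physical_lines: int, max_ratio: float) -> None:
        self.physical_lines = physical_lines
        self.logical_map: dict[int, bytes] = {}  # logical line -> stored content
        self.refcounts: dict[bytes, int] = {}    # unique content -> logical lines using it
        self.program_max_ratio(max_ratio)

    def program_max_ratio(self, ratio: float) -> None:
        """Host write to the programming register holding the maximum ratio."""
        if ratio < 1.0:
            raise ValueError("a deduplication ratio below 1.0 is meaningless")
        self.max_ratio = ratio

    @property
    def virtual_lines(self) -> int:
        # Assumed policy: expose physical capacity scaled by the programmed maximum ratio.
        return int(self.physical_lines * self.max_ratio)

    def current_ratio(self) -> float:
        return len(self.logical_map) / len(self.refcounts) if self.refcounts else 1.0

    def write(self, line: int, data: bytes) -> None:
        if not 0 <= line < self.virtual_lines:
            raise IndexError("address beyond the virtual capacity")
        old = self.logical_map.get(line)
        if old == data:
            return
        # Check capacity before changing any state, so a failed write leaves the module intact.
        frees_line = old is not None and self.refcounts[old] == 1
        needs_line = data not in self.refcounts
        if needs_line and len(self.refcounts) - frees_line >= self.physical_lines:
            raise MemoryError("no free physical line")
        if old is not None:
            self.refcounts[old] -= 1
            if self.refcounts[old] == 0:
                del self.refcounts[old]
        self.refcounts[data] = self.refcounts.get(data, 0) + 1
        self.logical_map[line] = data
```

Under this policy a higher programmed ratio promises the host more addressable lines, at the cost of a greater chance that writes of unique data fail once physical lines are exhausted.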