Patents by Inventor Krishna Malladi

Krishna Malladi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Rack-level scheduling for reducing the long tail latency using high performance SSDs

Patent number: 11507435

Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.

Type: Grant

Filed: March 24, 2020

Date of Patent: November 22, 2022

Assignee: Samsung Electronics Co., Ltd.

Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi
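
The migration loop in the abstract (detect a latency violation, score other nodes, move the workload to the best one) can be sketched as follows. This is a hedged illustration, not the patented method: the abstract does not define the reward, so the reward used here (a weighted sum of CPU and SSD headroom) and every name (`Node`, `Workload`, `reward`, `schedule`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    cpu_util: float  # 0.0-1.0, hypothetical utilization metric
    ssd_util: float  # 0.0-1.0, hypothetical utilization metric


@dataclass
class Workload:
    name: str
    node: str              # server node currently running the workload
    latency_ms: float      # observed tail latency
    latency_slo_ms: float  # latency requirement


def violates(w: Workload) -> bool:
    """Detect a violation of the workload's latency requirement."""
    return w.latency_ms > w.latency_slo_ms


def reward(node: Node, w_cpu: float = 0.5, w_ssd: float = 0.5) -> float:
    """Hypothetical reward: more CPU and SSD headroom on the target scores higher."""
    return w_cpu * (1.0 - node.cpu_util) + w_ssd * (1.0 - node.ssd_util)


def pick_target(w: Workload, nodes: List[Node]) -> Optional[Node]:
    """Choose the other node in the rack that maximizes the reward."""
    candidates = [n for n in nodes if n.name != w.node]
    return max(candidates, key=reward) if candidates else None


def schedule(workloads: List[Workload], nodes: List[Node]) -> List[Tuple[str, str, str]]:
    """Migrate every violating workload; return (workload, source, target) tuples."""
    migrations = []
    for w in workloads:
        if violates(w):
            target = pick_target(w, nodes)
            if target is not None:
                migrations.append((w.name, w.node, target.name))
                w.node = target.name  # stands in for the actual migration
    return migrations
```

A fuller scheduler would refresh utilization after each move and charge a migration cost inside the reward; both are left out to keep the sketch short.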

HBM SILICON PHOTONIC TSV ARCHITECTURE FOR LOOKUP COMPUTING AI ACCELERATOR

Publication number: 20220367412

Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.

Type: Application

Filed: July 25, 2022

Publication date: November 17, 2022

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG
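
The split described in the abstract (a memory die holds a table that turns first data into second data, and a logic die consumes the second data) can be modeled in a few lines. The abstract does not say what function the table encodes; as a purely hypothetical example, this sketch stores an 8-bit quantized sigmoid, and the optical via is represented only as the hand-off between two classes. `MemoryDie`, `LogicDie`, and `build_sigmoid_lut` are illustrative names.

```python
import math
from typing import List


def build_sigmoid_lut(in_scale: float = 1 / 16) -> List[int]:
    """Hypothetical table contents: int8 input code -> uint8 sigmoid output code."""
    lut = []
    for code in range(-128, 128):
        y = 1.0 / (1.0 + math.exp(-code * in_scale))
        lut.append(min(255, round(y * 255)))
    return lut


class MemoryDie:
    """Stands in for the memory circuit die that stores the lookup table."""

    def __init__(self, lut: List[int]):
        self.lut = lut

    def lookup(self, first_data: int) -> int:
        # The first data (an int8 code) addresses the table; the entry is the second data.
        return self.lut[first_data + 128]


class LogicDie:
    """Stands in for the combinational logic that consumes the second data."""

    def accumulate(self, second_data: List[int]) -> int:
        return sum(second_data)


def run(inputs: List[int], mem: MemoryDie, logic: LogicDie) -> int:
    # Each looked-up value crosses the die boundary (the optical via in the abstract).
    transferred = [mem.lookup(x) for x in inputs]
    return logic.accumulate(transferred)
```

Because the table is filled once, the per-input cost at run time is a single read, which is the property lookup-based computing relies on.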

HBM silicon photonic TSV architecture for lookup computing AI accelerator

Patent number: 11398453

Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.

Type: Grant

Filed: March 2, 2018

Date of Patent: July 26, 2022

Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng

HBM RAS CACHE ARCHITECTURE

Publication number: 20220035719

Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.

Type: Application

Filed: October 12, 2021

Publication date: February 3, 2022

Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
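
The reliability circuit in the abstract (spare memory plus an address table that redirects accesses to faulty addresses) behaves like a small remapping cache. The sketch below models it under simple assumptions: memory is a dictionary, spare lines are handed out in order, and an exhausted spare pool raises an error. `ReliabilityCircuit` and its methods are hypothetical names.

```python
from typing import Dict


class ReliabilityCircuit:
    """Hypothetical logic-die repair cache: spare memory plus an address table."""

    def __init__(self, memory_cells: Dict[int, int], spare_lines: int):
        self.cells = memory_cells        # stand-in for the memory cell die
        self.spare = [0] * spare_lines   # spare memory on the logic die
        self.table: Dict[int, int] = {}  # faulty address -> spare line index

    def mark_faulty(self, addr: int) -> None:
        """Map an address associated with an error to a spare line."""
        if addr in self.table:
            return
        if len(self.table) >= len(self.spare):
            raise RuntimeError("spare memory exhausted")
        idx = len(self.table)
        self.spare[idx] = self.cells.get(addr, 0)  # carry over the last good value
        self.table[addr] = idx

    def read(self, addr: int) -> int:
        if addr in self.table:  # access is associated with an error
            return self.spare[self.table[addr]]
        return self.cells.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        if addr in self.table:
            self.spare[self.table[addr]] = value
        else:
            self.cells[addr] = value
```

After remapping, later corruption in the faulty cell is invisible to the host because every access to that address completes against the spare line.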

DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

Publication number: 20210374210

Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.

Type: Application

Filed: July 13, 2021

Publication date: December 2, 2021

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
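
The core idea in the abstract (one operand is the row address, the other the column address, and the table entry is their product, so no multiply happens at run time) can be shown with a small multiplication table. The 4-bit unsigned operand width and the names `LUT`, `lut_dot`, and `lut_gemm` are assumptions; the hierarchical lookup and systolic propagation described in the abstract are not modeled.

```python
from typing import List, Sequence

BITS = 4  # hypothetical unsigned operand width
SIZE = 1 << BITS

# Peripheral lookup table: entry [r][c] holds r * c. It is filled once, ahead of
# time; the run-time products below are table reads, not multiplies.
LUT = [[r * c for c in range(SIZE)] for r in range(SIZE)]


def lut_dot(row_vec: Sequence[int], col_vec: Sequence[int]) -> int:
    """Dot product using table lookups and adders only."""
    if len(row_vec) != len(col_vec):
        raise ValueError("vector length mismatch")
    acc = 0  # plays the role of the output buffer
    for a, b in zip(row_vec, col_vec):
        if not (0 <= a < SIZE and 0 <= b < SIZE):
            raise ValueError("operand out of table range")
        acc += LUT[a][b]  # a = row address, b = column address
    return acc


def lut_gemm(A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
    cols = list(zip(*B))  # columns of B act as the second vectors
    return [[lut_dot(row, col) for col in cols] for row in A]
```

The table grows as the square of the operand range, which is why schemes like this target low-precision operands.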

HBM RAS cache architecture

Patent number: 11151006

Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.

Type: Grant

Filed: October 2, 2018

Date of Patent: October 19, 2021

Inventors: Dimin Niu, Krishna Malladi, Hongzhong Zheng

Dataflow accelerator architecture for general matrix-matrix multiplication and tensor computation in deep learning

Patent number: 11100193

Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.

Type: Grant

Filed: April 18, 2019

Date of Patent: August 24, 2021

Inventors: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu

RACK-LEVEL SCHEDULING FOR REDUCING THE LONG TAIL LATENCY USING HIGH PERFORMANCE SSDS

Publication number: 20200225999

Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.

Type: Application

Filed: March 24, 2020

Publication date: July 16, 2020

Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi

DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

Publication number: 20200184001

Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.

Type: Application

Filed: April 18, 2019

Publication date: June 11, 2020

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU

DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

Publication number: 20200183837

Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The data flow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units are configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.

Type: Application

Filed: April 18, 2019

Publication date: June 11, 2020

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
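
The pipelined chain in this abstract, where the partial output of one multiply-and-add unit is fed into the next for accumulation, can be sketched as a matrix-vector product. As an assumption, each hypothetical `MacUnit` holds one column of the weight matrix; the `backward` flag only mimics the abstract's forward or backward propagation, and stacking, parallel units, and transposition are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MacUnit:
    """One multiply-and-add unit holding a column of W (hypothetical layout)."""
    weights: List[int]

    def step(self, x_k: int, partial: List[int]) -> List[int]:
        # partial[i] += W[i][k] * x[k]; the result is handed to the next unit
        return [p + w * x_k for p, w in zip(partial, self.weights)]


def chain_matvec(W: List[List[int]], x: List[int], backward: bool = False) -> List[int]:
    """Compute W @ x by passing partial outputs along a chain of MAC units."""
    n_rows = len(W)
    units = [MacUnit([W[i][k] for i in range(n_rows)]) for k in range(len(x))]
    order = reversed(range(len(units))) if backward else range(len(units))
    partial = [0] * n_rows
    for k in order:
        partial = units[k].step(x[k], partial)
    return partial
```

Since addition is order-independent for integers, the chain gives the same result in either direction; the direction choice matters for data movement, not for the answer.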

Method and apparatus for enabling larger memory capacity than physical memory size

Patent number: 10678704

Abstract: A method of retrieving data stored in a memory associated with a dedupe module is provided. The method includes: identifying a logical address of the data; identifying a physical line ID of the data in accordance with the logical address by looking up at least a portion of the logical address in a translation table; locating a respective physical line, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder to the read cache, the respective hash cylinder including: a respective hash bucket, the respective hash bucket including the respective physical line; and a respective reference counter bucket, the respective reference counter bucket including a respective reference counter associated with the respective physical line.

Type: Grant

Filed: March 31, 2017

Date of Patent: June 9, 2020

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
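
The read path in the abstract (logical address to physical line ID through a translation table, then copying the whole hash cylinder, lines plus reference counters, into the read cache) can be sketched as below. The assumption that the high bits of a PLID select its cylinder (`BUCKET_SHIFT`) and all class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

BUCKET_SHIFT = 8  # hypothetical: high bits of a PLID select its hash cylinder


@dataclass
class HashCylinder:
    hash_bucket: Dict[int, bytes]    # PLID -> physical line
    refcount_bucket: Dict[int, int]  # PLID -> reference counter


@dataclass
class DedupeReader:
    line_size: int
    translation: Dict[int, int]         # logical line number -> PLID
    cylinders: Dict[int, HashCylinder]  # backing memory, keyed by cylinder id
    read_cache: Dict[int, HashCylinder] = field(default_factory=dict)

    def read(self, logical_addr: int) -> bytes:
        # 1. Look up a portion of the logical address (its line number).
        plid = self.translation[logical_addr // self.line_size]
        # 2. Locate the cylinder that holds the physical line.
        cyl_id = plid >> BUCKET_SHIFT
        # 3. Copy the whole cylinder (lines and reference counters) into the read cache.
        if cyl_id not in self.read_cache:
            src = self.cylinders[cyl_id]
            self.read_cache[cyl_id] = HashCylinder(
                dict(src.hash_bucket), dict(src.refcount_bucket))
        # 4. Return the physical line.
        return self.read_cache[cyl_id].hash_bucket[plid]
```

Two logical lines that share a PLID return the same physical line, which is how deduplication exposes more logical capacity than physical memory.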

Rack-level scheduling for reducing the long tail latency using high performance SSDs

Patent number: 10628233

Abstract: A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.

Type: Grant

Filed: March 23, 2017

Date of Patent: April 21, 2020

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Qiumin Xu, Krishna Malladi, Manu Awasthi

Method and apparatus for enabling larger memory capacity than physical memory size

Patent number: 10528284

Abstract: A dedupe module is provided. The dedupe module includes: a host interface; a dedupe engine to receive a data request from a host system via the host interface; a memory controller; a plurality of memory modules, each memory module being coupled to the memory controller; and a read cache for caching data from the memory controller for use by the dedupe engine.

Type: Grant

Filed: April 26, 2017

Date of Patent: January 7, 2020

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
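
The components listed in this abstract (a dedupe engine behind a host interface, a memory controller in front of the memory modules, and a read cache) can be wired together in a small write-and-read sketch. The use of SHA-256 as the content fingerprint, the absence of collision handling and space reclamation, and all class names are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional


class MemoryController:
    """Hypothetical stand-in for the controller in front of the memory modules."""

    def __init__(self) -> None:
        self.lines: Dict[int, bytes] = {}
        self.next_plid = 0

    def store(self, data: bytes) -> int:
        plid = self.next_plid
        self.lines[plid] = data
        self.next_plid += 1
        return plid

    def load(self, plid: int) -> bytes:
        return self.lines[plid]


class DedupeEngine:
    def __init__(self, controller: MemoryController) -> None:
        self.ctrl = controller
        self.by_hash: Dict[bytes, int] = {}  # content fingerprint -> PLID
        self.refcount: Dict[int, int] = {}   # PLID -> number of logical users
        self.l2p: Dict[int, int] = {}        # logical address -> PLID
        self.read_cache: Dict[int, bytes] = {}

    def write(self, addr: int, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        plid: Optional[int] = self.by_hash.get(digest)
        if plid is None:  # new content: store one physical copy
            plid = self.ctrl.store(data)
            self.by_hash[digest] = plid
            self.refcount[plid] = 0
        old = self.l2p.get(addr)
        if old is not None:
            self.refcount[old] -= 1  # zero-count lines are not reclaimed here
        self.refcount[plid] += 1
        self.l2p[addr] = plid

    def read(self, addr: int) -> bytes:
        plid = self.l2p[addr]
        if plid not in self.read_cache:
            self.read_cache[plid] = self.ctrl.load(plid)
        return self.read_cache[plid]
```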

HBM RAS CACHE ARCHITECTURE

Publication number: 20200004652

Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.

Type: Application

Filed: October 2, 2018

Publication date: January 2, 2020

Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG

System and method for integrating overprovisioned memory devices

Patent number: 10372606

Abstract: A memory device includes a memory interface to a host computer and a memory overprovisioning logic configured to provide a virtual memory capacity to a host operating system (OS). A kernel driver module of the host OS is configured to manage the virtual memory capacity of the memory device provided by the memory overprovisioning logic of the memory device and provide a fast swap of anonymous pages to a frontswap space and file pages to a cleancache space of the memory device based on the virtual memory capacity of the memory device.

Type: Grant

Filed: September 30, 2016

Date of Patent: August 6, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Krishna Malladi, Jongmin Gim, Hongzhong Zheng
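
The routing decision in the abstract (anonymous pages go to a frontswap space and file pages to a cleancache space, bounded by the device's virtual capacity) can be modeled in user space. This is not Linux kernel code: `OverprovisionedDevice`, `KernelDriver`, and the fixed overprovisioning ratio are hypothetical stand-ins, and a refused page is simply reported back to the caller.

```python
from enum import Enum
from typing import Dict


class PageKind(Enum):
    ANON = "anonymous"
    FILE = "file"


class OverprovisionedDevice:
    """Hypothetical device exposing more virtual capacity than physical pages."""

    def __init__(self, physical_pages: int, overprovision_ratio: float) -> None:
        self.virtual_pages = int(physical_pages * overprovision_ratio)
        self.frontswap: Dict[int, bytes] = {}
        self.cleancache: Dict[int, bytes] = {}

    def used(self) -> int:
        return len(self.frontswap) + len(self.cleancache)


class KernelDriver:
    """Routes evicted pages to the matching space while virtual capacity remains."""

    def __init__(self, dev: OverprovisionedDevice) -> None:
        self.dev = dev

    def evict(self, page_id: int, kind: PageKind, data: bytes) -> bool:
        if self.dev.used() >= self.dev.virtual_pages:
            return False  # caller falls back to its normal swap or page-drop path
        target = self.dev.frontswap if kind is PageKind.ANON else self.dev.cleancache
        target[page_id] = data
        return True
```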

HBM SILICON PHOTONIC TSV ARCHITECTURE FOR LOOKUP COMPUTING AI ACCELERATOR

Publication number: 20190214365

Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.

Type: Application

Filed: March 2, 2018

Publication date: July 11, 2019

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG

Overflow region memory management

Patent number: 10268413

Abstract: A memory module includes a host interface configured to provide an interface to a host computer; one or more memory devices; a deduplication engine configured to provide a virtual memory capacity of the memory module that is larger than a physical size of the one or more memory devices; a memory controller for controlling access to the one or more memory devices; a volatile memory comprising a hash table, an overflow memory region, and a credit unit, wherein the overflow memory region stores user data when a hash collision occurs or the hash table is full, and wherein the credit unit stores an address of an invalidated entry in the overflow memory region; and a control logic is configured to control the overflow memory region and the credit unit and generate a warning indicating a status of the overflow memory region and the credit unit.

Type: Grant

Filed: March 29, 2017

Date of Patent: April 23, 2019

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dongyan Jiang, Changhui Lin, Krishna Malladi, Jongmin Gim, Hongzhong Zheng
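
The interplay in this abstract between the hash table, the overflow region, the credit unit (addresses of invalidated overflow entries kept for reuse), and the status warning can be sketched as follows. The direct-mapped hash (`key % slots`), the warning threshold, and the name `OverflowManager` are assumptions.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, bytes]]


class OverflowManager:
    def __init__(self, hash_slots: int, overflow_slots: int, warn_fraction: float = 0.75):
        self.hash_table: List[Entry] = [None] * hash_slots
        self.overflow: List[Entry] = [None] * overflow_slots
        self.next_free = 0              # next never-used overflow address
        self.credits: List[int] = []    # credit unit: invalidated overflow addresses
        self.warn_fraction = warn_fraction

    def _overflow_alloc(self) -> int:
        if self.credits:                # reuse an invalidated entry first
            return self.credits.pop()
        if self.next_free < len(self.overflow):
            addr = self.next_free
            self.next_free += 1
            return addr
        raise MemoryError("overflow region full")

    def insert(self, key: int, data: bytes) -> Tuple[str, int]:
        slot = key % len(self.hash_table)
        if self.hash_table[slot] is None:
            self.hash_table[slot] = (key, data)
            return ("hash", slot)
        addr = self._overflow_alloc()   # hash collision: spill to overflow region
        self.overflow[addr] = (key, data)
        return ("overflow", addr)

    def invalidate_overflow(self, addr: int) -> None:
        self.overflow[addr] = None
        self.credits.append(addr)

    def status(self) -> str:
        live = sum(e is not None for e in self.overflow)
        if live >= len(self.overflow):
            return "full"
        if live >= self.warn_fraction * len(self.overflow):
            return "warning"
        return "ok"
```

Recycling freed addresses through the credit list keeps the overflow region compact without scanning it for holes.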

DPU architecture

Patent number: 10242728

Abstract: A dynamic random access memory (DRAM) processing unit (DPU) may include at least one computing cell array having a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows; and a controller that may be coupled to the at least one computing cell array to configure the at least one computing cell array to perform a DPU operation.

Type: Grant

Filed: February 6, 2017

Date of Patent: March 26, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Shaungchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
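
The three-row arrangement in the abstract (a logic function operates on a first and second row and stores the result in a third row) can be illustrated with Python integers standing in for DRAM rows. The 16-bit row width, the set of logic functions, and the names `ComputingCellArray` and `Controller` are hypothetical; the circuit-level mechanism inside the cells is not modeled.

```python
from typing import Callable, Dict, List, Tuple

ROW_BITS = 16  # hypothetical row width
MASK = (1 << ROW_BITS) - 1

OPS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOR": lambda a, b: ~(a | b) & MASK,
}


class ComputingCellArray:
    """One column of rows; every row is a ROW_BITS-wide bit vector."""

    def __init__(self, n_rows: int) -> None:
        if n_rows < 3:
            raise ValueError("need at least three rows")
        self.rows: List[int] = [0] * n_rows

    def write(self, r: int, value: int) -> None:
        self.rows[r] = value & MASK

    def compute(self, op: str, src_a: int, src_b: int, dst: int) -> None:
        # Apply the logic function to two rows and store the result in a third.
        self.rows[dst] = OPS[op](self.rows[src_a], self.rows[src_b])


class Controller:
    """Configures the array to carry out a sequence of row operations."""

    def __init__(self, array: ComputingCellArray) -> None:
        self.array = array

    def run(self, program: List[Tuple[str, int, int, int]]) -> None:
        for op, a, b, d in program:
            self.array.compute(op, a, b, d)
```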

Software stack and programming for DPU operations

Patent number: 10180808

Abstract: A system includes a library, a compiler, a driver and at least one dynamic random access memory (DRAM) processing unit (DPU). The library may determine at least one DPU operation corresponding to a received command. The compiler may form at least one DPU instruction for the DPU operation. The driver may send the at least one DPU instruction to at least one DPU. The DPU may include at least one computing cell array that includes a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first row and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows.

Type: Grant

Filed: February 6, 2017

Date of Patent: January 15, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Shaungchen Li, Dimin Niu, Krishna Malladi, Hongzhong Zheng
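
The library, compiler, and driver layers in this abstract can be sketched as a three-stage pipeline that ends at a minimal DPU model. Everything here is hypothetical: the command names in `LIBRARY`, the `Instr` format, and the choice to lower NOT to NOR of a row with itself are illustrative assumptions, not the patented instruction set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MASK = 0xFFFF  # hypothetical 16-bit rows


@dataclass(frozen=True)
class Instr:
    op: str
    src_a: int
    src_b: int
    dst: int


# Library: received command -> DPU operations (hypothetical mapping).
LIBRARY: Dict[str, List[str]] = {"and": ["AND"], "or": ["OR"], "nand": ["AND", "NOT"]}


def compile_ops(ops: List[str], a: int, b: int, dst: int) -> List[Instr]:
    """Compiler: turn DPU operations into row-addressed instructions."""
    instrs = []
    for op in ops:
        if op == "NOT":  # NOT x == NOR(x, x), applied to the previous result
            instrs.append(Instr("NOR", dst, dst, dst))
        else:
            instrs.append(Instr(op, a, b, dst))
    return instrs


class Dpu:
    FUNCS: Dict[str, Callable[[int, int], int]] = {
        "AND": lambda x, y: x & y,
        "OR": lambda x, y: x | y,
        "NOR": lambda x, y: ~(x | y) & MASK,
    }

    def __init__(self, n_rows: int) -> None:
        self.rows = [0] * n_rows

    def execute(self, i: Instr) -> None:
        self.rows[i.dst] = self.FUNCS[i.op](self.rows[i.src_a], self.rows[i.src_b])


class Driver:
    """Driver: send instructions to every attached DPU."""

    def __init__(self, dpus: List[Dpu]) -> None:
        self.dpus = dpus

    def send(self, instrs: List[Instr]) -> None:
        for dpu in self.dpus:
            for i in instrs:
                dpu.execute(i)


def run_command(cmd: str, driver: Driver, a: int = 0, b: int = 1, dst: int = 2) -> List[Instr]:
    instrs = compile_ops(LIBRARY[cmd], a, b, dst)
    driver.send(instrs)
    return instrs
```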

System and method for controlling a programmable deduplication ratio for a memory system

Patent number: 10162554

Abstract: A memory module has a logic including a programming register, a deduplication ratio control logic, and a deduplication engine. The programming register stores a maximum deduplication ratio of the memory module. The control logic is configured to control a deduplication ratio of the memory module according to the maximum deduplication ratio. The deduplication ratio is programmable by the host computer.

Type: Grant

Filed: October 4, 2016

Date of Patent: December 25, 2018

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hongzhong Zheng, Krishna Malladi, Dimin Niu
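
A host-programmable maximum deduplication ratio, as described in the abstract, can be enforced by capping logical capacity at physical capacity times the programmed ratio. This sketch assumes that interpretation; the class name, the byte-string content keys, and the rule that a write is refused (rather than partially applied) when either limit would be exceeded are hypothetical.

```python
from typing import Dict


class DedupeRatioControl:
    def __init__(self, physical_lines: int) -> None:
        self.physical_lines = physical_lines
        self.max_ratio_reg = 1.0           # programming register, written by the host
        self.logical: Dict[int, bytes] = {}  # logical address -> content
        self.unique: Dict[bytes, int] = {}   # stored content -> reference count

    def program_max_ratio(self, ratio: float) -> None:
        if ratio < 1.0:
            raise ValueError("ratio must be at least 1.0")
        self.max_ratio_reg = ratio

    @property
    def virtual_capacity(self) -> int:
        return int(self.physical_lines * self.max_ratio_reg)

    def current_ratio(self) -> float:
        return len(self.logical) / len(self.unique) if self.unique else 1.0

    def write(self, addr: int, data: bytes) -> bool:
        old = self.logical.get(addr)
        freed = old is not None and old != data and self.unique[old] == 1
        unique_after = len(self.unique) + (data not in self.unique) - freed
        logical_after = len(self.logical) + (old is None)
        if logical_after > self.virtual_capacity or unique_after > self.physical_lines:
            return False  # would exceed the programmed ratio or physical space
        if old is not None:
            self.unique[old] -= 1
            if self.unique[old] == 0:
                del self.unique[old]
        self.unique[data] = self.unique.get(data, 0) + 1
        self.logical[addr] = data
        return True
```

Reprogramming the register to a value below the current ratio is not handled here; a real controller would need a policy for that case.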
