Patents by Inventor Krishna T. MALLADI

Krishna T. MALLADI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11262980
    Abstract: A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: March 1, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Peng Gu, Hongzhong Zheng, Robert Brennan
  • Patent number: 11226816
    Abstract: According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
    Type: Grant
    Filed: April 27, 2020
    Date of Patent: January 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Wenqin Huangfu
  • Patent number: 11226914
    Abstract: An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler in which the heterogeneous computing environment may include a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions that include instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies in which each may be configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit and the reprogrammable processing unit.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: January 18, 2022
    Inventors: Krishna T. Malladi, Hongzhong Zheng
  • Publication number: 20210406202
    Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes a HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
    Type: Application
    Filed: September 8, 2021
    Publication date: December 30, 2021
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
  • Publication number: 20210405877
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, which includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Application
    Filed: September 13, 2021
    Publication date: December 30, 2021
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Publication number: 20210373951
    Abstract: Provided are systems, methods, and apparatuses for resource allocation. The method can include: determining a first value of a parameter associated with at least one first device in a first cluster; determining a threshold based on the first value of the parameter; receiving a request for processing a workload at the first device; determining that a second value of the parameter associated with at least one second device in a second cluster meets the threshold; and responsive to meeting the threshold, routing at least a portion of the workload to the second device.
    Type: Application
    Filed: December 28, 2020
    Publication date: December 2, 2021
    Inventors: Krishna T. Malladi, Andrew Chang, Ehsan Najafabadi, Yasser A. Zaghloul
  • Publication number: 20210374056
    Abstract: Provided are systems, methods, and apparatuses for providing a storage resource. The method can include: operating a first controller coupled to a network interface in accordance with a cache coherent protocol; performing at least one operation on data associated with a cache using a second controller coupled to the first controller and coupled to a first memory; and storing the data on a second memory coupled to one of the first controller or the second controller.
    Type: Application
    Filed: April 30, 2021
    Publication date: December 2, 2021
    Inventors: Krishna T. Malladi, Andrew Chang, Ehsan Najafabadi
  • Patent number: 11188327
    Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: November 30, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Publication number: 20210349837
    Abstract: A memory module may include one or more memory devices, and a near-memory computing module coupled to the one or more memory devices, the near-memory computing module including one or more processing elements configured to process data from the one or more memory devices, and a memory controller configured to coordinate access of the one or more memory devices from a host and the one or more processing elements. A method of processing a dataset may include distributing a first portion of the dataset to a first memory module, distributing a second portion of the dataset to a second memory module, constructing a first local data structure at the first memory module based on the first portion of the dataset, constructing a second local data structure at the second memory module based on the second portion of the dataset, and merging the first and second local data structures.
    Type: Application
    Filed: March 26, 2021
    Publication date: November 11, 2021
    Inventors: Wenqin Huangfu, Krishna T. Malladi, Dongyan Jiang
  • Publication number: 20210311739
    Abstract: A system for computing. In some embodiments, the system includes: a memory, the memory including one or more function-in-memory circuits; and a cache coherent protocol interface circuit having a first interface and a second interface. A function-in-memory circuit of the one or more function-in-memory circuits may be configured to perform an operation on operands including a first operand retrieved from the memory, to form a result. The first interface of the cache coherent protocol interface circuit may be connected to the memory, and the second interface of the cache coherent protocol interface circuit may be configured as a cache coherent protocol interface on a bus interface.
    Type: Application
    Filed: June 26, 2020
    Publication date: October 7, 2021
    Inventors: Krishna T. Malladi, Andrew Chang
  • Patent number: 11138135
    Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes a HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: October 5, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
  • Publication number: 20210294711
    Abstract: A method for computing. In some embodiments, the method includes: calculating an advantage score of a first computing task, the advantage score being a measure of an extent to which a plurality of function in memory circuits is capable of executing the first computing task more efficiently than one or more extra-memory processing circuits, the first computing task including instructions and data; in response to determining that the advantage score of the first computing task is less than a first threshold, executing the first computing task by the one or more extra-memory processing circuits; and in response to determining that the advantage score of the first computing task is at least equal to the first threshold: compiling the instructions for execution by the function in memory circuits; formatting the data for the function in memory circuits; and executing the first computing task, by the function in memory circuits.
    Type: Application
    Filed: June 26, 2020
    Publication date: September 23, 2021
    Inventors: Krishna T. Malladi, Wenqin Huangfu
  • Patent number: 11119677
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, which includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Grant
    Filed: March 8, 2018
    Date of Patent: September 14, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Publication number: 20210278992
    Abstract: A method for in-memory computing. In some embodiments, the method includes: executing, by a first function-in-memory circuit, a first instruction, to produce, as a result, a first value, wherein a first computing task includes a second computing task and a third computing task, the second computing task including the first instruction; storing, by the first function-in-memory circuit, the first value in a first buffer; reading, by a second function-in-memory circuit, the first value from the first buffer; and executing, by a second function-in-memory circuit, a second instruction, the second instruction using the first value as an argument, the third computing task including the second instruction, wherein: the storing, by the first function-in-memory circuit, of the first value in the first buffer includes directly storing the first value in the first buffer.
    Type: Application
    Filed: June 26, 2020
    Publication date: September 9, 2021
    Inventor: Krishna T. Malladi
  • Publication number: 20210271594
    Abstract: A pseudo main memory system. The system includes a memory adapter circuit for performing memory augmentation using compression, deduplication, and/or error correction. The memory adapter circuit is connected to a memory, and employs the memory augmentation methods to increase the effective storage capacity of the memory. The memory adapter circuit is also connected to a memory bus and implements an NVDIMM-F or modified NVDIMM-F interface for connecting to the memory bus.
    Type: Application
    Filed: May 17, 2021
    Publication date: September 2, 2021
    Inventors: Krishna T. Malladi, Jongmin Gim, Hongzhong Zheng
  • Publication number: 20210247978
    Abstract: According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
    Type: Application
    Filed: April 27, 2020
    Publication date: August 12, 2021
    Inventors: Krishna T. Malladi, Wenqin Huangfu
  • Patent number: 11030088
    Abstract: A pseudo main memory system. The system includes a memory adapter circuit for performing memory augmentation using compression, deduplication, and/or error correction. The memory adapter circuit is connected to a memory, and employs the memory augmentation methods to increase the effective storage capacity of the memory. The memory adapter circuit is also connected to a memory bus and implements an NVDIMM-F or modified NVDIMM-F interface for connecting to the memory bus.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: June 8, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Jongmin Gim, Hongzhong Zheng
  • Publication number: 20210141735
    Abstract: A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
    Type: Application
    Filed: January 22, 2021
    Publication date: May 13, 2021
    Inventors: Krishna T. Malladi, Mu-Tien Chang, Dimin Niu, Hongzhong Zheng
  • Publication number: 20210117103
    Abstract: A high-bandwidth memory (HBM) system includes an HBM device and a logic circuit. The logic circuit receives a first command from the host device and converts the received first command to a processing-in-memory (PIM) command that is sent to the HBM device through the second interface. A time between when the first command is received from the host device and when the HBM system is ready to receive another command from the host device is deterministic. The logic circuit further receives a fourth command and a fifth command from the host device. The fifth command requests time-estimate information relating to a time between when the fifth command is received and when the HBM system is ready to receive another command from the host device. The time-estimate information includes a deterministic period of time and an estimated period of time for a non-deterministic period of time.
    Type: Application
    Filed: December 24, 2020
    Publication date: April 22, 2021
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Robert Brennan
  • Publication number: 20210096999
    Abstract: A method of processing in-memory commands in a high-bandwidth memory (HBM) system includes sending a function-in-HBM instruction to the HBM by a HBM memory controller of a GPU. A logic component of the HBM receives the FIM instruction and coordinates the instruction's execution using the controller, an ALU, and a SRAM located on the logic component.
    Type: Application
    Filed: December 14, 2020
    Publication date: April 1, 2021
    Inventors: Mu-Tien Chang, Krishna T. Malladi, Dimin Niu, Hongzhong Zheng
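
Illustrative note: several abstracts above (for example, patent 11119677) describe a computation lookup table in which each row corresponds to a kernel weight, each column corresponds to an input feature map value, and the stored entry is their product. The following is a minimal software sketch of that idea only. The function names, the small integer value ranges, and the use of Python lists are assumptions made for illustration; they are not taken from the patents and do not represent the claimed hardware.

```python
# Illustrative sketch only: all names and value ranges are hypothetical.

def build_product_lut(weights, input_values):
    """Precompute products: row i corresponds to weights[i],
    column j corresponds to input_values[j]."""
    return [[w * x for x in input_values] for w in weights]

def lut_multiply(lut, weight_index, value_index):
    """Retrieve a precomputed product instead of multiplying."""
    return lut[weight_index][value_index]

def lut_dot(lut, weight_indices, value_indices):
    """Accumulate looked-up products, as a convolution step would."""
    return sum(
        lut_multiply(lut, wi, vi)
        for wi, vi in zip(weight_indices, value_indices)
    )

# Assumed ranges: 16 signed weights and 16 unsigned input values.
weights = list(range(-8, 8))
input_values = list(range(16))
lut = build_product_lut(weights, input_values)
```

In this sketch a product is obtained by indexing, for example `lut_multiply(lut, 0, 3)` reads the entry for weight -8 and input value 3. The table size grows with the product of the two value ranges, which is why such tables are practical mainly for low-precision operands.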