Patents by Inventor Krishna T. MALLADI

Krishna T. MALLADI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12248402
    Abstract: A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
    Type: Grant
    Filed: November 28, 2022
    Date of Patent: March 11, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Mu-Tien Chang, Dimin Niu, Hongzhong Zheng
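    Illustrative sketch: the channel-mode selection described above can be pictured as a small Python model. The class and enum names, and the way the sub-channel widths are split, are assumptions made for illustration only, not the claimed hardware.

        from dataclasses import dataclass
        from enum import Enum

        class ChannelMode(Enum):
            PSEUDO_PLUS_FINE_GRAIN = 1   # first 64 bits in pseudo-channel mode, second 64 bits as two 32-bit fine-grain channels
            ALL_FINE_GRAIN = 2           # both 64-bit halves split into two 32-bit fine-grain channels each

        @dataclass
        class Channel128:
            mode: ChannelMode

            def sub_channel_widths(self) -> list[int]:
                """Widths (in bits) of the sub-channels exposed by this 128-bit channel."""
                if self.mode is ChannelMode.PSEUDO_PLUS_FINE_GRAIN:
                    return [64, 32, 32]
                return [32, 32, 32, 32]

        # A logic die managing eight 128-bit channels and reconfiguring the first one.
        channels = [Channel128(ChannelMode.PSEUDO_PLUS_FINE_GRAIN) for _ in range(8)]
        channels[0].mode = ChannelMode.ALL_FINE_GRAIN
        print(channels[0].sub_channel_widths())   # [32, 32, 32, 32]
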
  • Patent number: 12236239
    Abstract: According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
    Type: Grant
    Filed: September 14, 2023
    Date of Patent: February 25, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Wenqin Huangfu
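    Illustrative sketch: a toy Python model of a DRAM bank whose row buffer feeds an in-memory ALU that latches its output in a result register. The operation names, page contents, and activate/execute flow are assumptions for illustration, not the claimed design.

        from dataclasses import dataclass, field

        @dataclass
        class IMCModule:
            """Toy in-memory-compute unit: an ALU plus a result register."""
            result: int = 0

            def execute(self, op: str, a: int, b: int) -> int:
                ops = {"add": lambda x, y: x + y,
                       "mul": lambda x, y: x * y,
                       "and": lambda x, y: x & y}
                self.result = ops[op](a, b)      # output latched in the result register
                return self.result

        @dataclass
        class DRAMBank:
            pages: list = field(default_factory=list)        # array of DRAM "pages" (rows)
            row_buffer: list = field(default_factory=list)
            imc: IMCModule = field(default_factory=IMCModule)

            def activate(self, row: int) -> None:
                self.row_buffer = list(self.pages[row])      # copy one page into the row buffer

        # A controller-style flow: lay out operands in a bank, then run an IMC operation on them.
        bank = DRAMBank(pages=[[3, 5], [7, 11]])
        bank.activate(0)                                     # row buffer now holds [3, 5]
        print(bank.imc.execute("mul", *bank.row_buffer))     # 15
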
  • Publication number: 20250028642
    Abstract: A coherent memory system. In some embodiments, the coherent memory system includes a first memory device. The first memory device may include a cache coherent controller; a volatile memory controller; a volatile memory; a nonvolatile memory controller; and a nonvolatile memory. The first memory device may be configured to receive a quality of service requirement and to selectively enable a first feature in response to the quality of service requirement.
    Type: Application
    Filed: October 3, 2024
    Publication date: January 23, 2025
    Inventors: Krishna T. MALLADI, Andrew Z. CHANG, Ehsan NAJAFABADI
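    Illustrative sketch: one way to read "selectively enable a first feature in response to the quality of service requirement" is as a simple policy check on the device. The QoS metric, the chosen feature (caching in the volatile tier), and the threshold below are hypothetical.

        from dataclasses import dataclass

        @dataclass
        class QoSRequirement:
            max_read_latency_ns: int                 # hypothetical QoS metric, not named in the application

        @dataclass
        class CoherentMemoryDevice:
            dram_caching_enabled: bool = False       # stands in for the "first feature"

            def apply_qos(self, qos: QoSRequirement) -> None:
                # Enable the feature only when the requested latency is tight enough
                # that serving hot data from the volatile memory is worthwhile.
                self.dram_caching_enabled = qos.max_read_latency_ns < 500

        device = CoherentMemoryDevice()
        device.apply_qos(QoSRequirement(max_read_latency_ns=200))
        print(device.dram_caching_enabled)           # True
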
  • Publication number: 20250004658
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, which includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Application
    Filed: July 3, 2024
    Publication date: January 2, 2025
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
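    Illustrative sketch: the computation lookup table can be modeled as precomputed products indexed by kernel weight (row) and input-feature-map value (column), so a multiplication during the operation becomes a table read. The 4-bit operand ranges below are assumptions, not values from the application.

        # Precompute the LUT: rows indexed by kernel weight, columns by input-feature-map value.
        WEIGHTS = range(-8, 8)       # hypothetical 4-bit signed weights
        VALUES = range(0, 16)        # hypothetical 4-bit activations

        lut = {w: {v: w * v for v in VALUES} for w in WEIGHTS}

        def mac_via_lut(weight: int, activations: list[int]) -> int:
            """Multiply one kernel weight by several input-feature-map values using the LUT."""
            row = lut[weight]
            return sum(row[a] for a in activations)

        print(mac_via_lut(3, [1, 4, 7]))   # 36, i.e. 3*1 + 3*4 + 3*7
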
  • Patent number: 12136138
    Abstract: A system and method for training a neural network. In some embodiments, the system includes: a graphics processing unit cluster; and a computational storage cluster connected to the graphics processing unit cluster by a cache-coherent system interconnect. The graphics processing unit cluster may include one or more graphics processing units. The computational storage cluster may include one or more computational storage devices. A first computational storage device of the one or more computational storage devices may be configured to (i) store an embedding table, (ii) receive an index vector including a first index and a second index; and (iii) calculate an embedded vector based on: a first row of the embedding table, corresponding to the first index, and a second row of the embedding table, corresponding to the second index.
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: November 5, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Shiyu Li, Krishna T. Malladi, Andrew Chang, Yang Seok Ki
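    Illustrative sketch: the embedded-vector calculation amounts to gathering the embedding-table rows named by an index vector and reducing them near the stored data. The table contents, dimensions, and sum-pooling reduction below are assumptions for illustration.

        import numpy as np

        # Hypothetical embedding table stored on a computational storage device:
        # each row is the embedding for one categorical index.
        embedding_table = np.arange(24, dtype=np.float32).reshape(6, 4)   # 6 rows, dimension 4

        def embed(index_vector: list[int]) -> np.ndarray:
            """Gather the rows named by the index vector and reduce them to one embedded vector."""
            rows = embedding_table[index_vector]   # e.g. the rows for the first and second index
            return rows.sum(axis=0)                # sum pooling; other reductions are possible

        print(embed([1, 3]))   # row 1 + row 3, computed near the data instead of on the GPU
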
  • Publication number: 20240329872
    Abstract: A method for in-memory computing. In some embodiments, the method includes: executing, by a first function-in-memory circuit, a first instruction, to produce, as a result, a first value, wherein a first computing task includes a second computing task and a third computing task, the second computing task including the first instruction; storing, by the first function-in-memory circuit, the first value in a first buffer; reading, by a second function-in-memory circuit, the first value from the first buffer; and executing, by the second function-in-memory circuit, a second instruction, the second instruction using the first value as an argument, the third computing task including the second instruction, wherein: the storing, by the first function-in-memory circuit, of the first value in the first buffer includes directly storing the first value in the first buffer.
    Type: Application
    Filed: June 14, 2024
    Publication date: October 3, 2024
    Inventor: Krishna T. Malladi
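    Illustrative sketch: the core idea is that one function-in-memory circuit stores an intermediate value directly in a buffer that a second circuit then reads as an operand, avoiding a round trip through the host. The functions below and their add/multiply operations are placeholders.

        from collections import deque

        # A shared buffer lets one function-in-memory (FIM) circuit hand an
        # intermediate value directly to another FIM circuit.
        buffer: deque[int] = deque()

        def fim_circuit_1(x: int, y: int) -> None:
            buffer.append(x + y)          # first instruction: produce a value, store it directly

        def fim_circuit_2(z: int) -> int:
            value = buffer.popleft()      # read the produced value from the buffer
            return value * z              # second instruction: use it as an argument

        fim_circuit_1(2, 3)
        print(fim_circuit_2(10))          # 50
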
  • Patent number: 12056388
    Abstract: A method for in-memory computing. In some embodiments, the method includes: executing, by a first function-in-memory circuit, a first instruction, to produce, as a result, a first value, wherein a first computing task includes a second computing task and a third computing task, the second computing task including the first instruction; storing, by the first function-in-memory circuit, the first value in a first buffer; reading, by a second function-in-memory circuit, the first value from the first buffer; and executing, by the second function-in-memory circuit, a second instruction, the second instruction using the first value as an argument, the third computing task including the second instruction, wherein: the storing, by the first function-in-memory circuit, of the first value in the first buffer includes directly storing the first value in the first buffer.
    Type: Grant
    Filed: August 29, 2022
    Date of Patent: August 6, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Krishna T. Malladi
  • Patent number: 12056379
    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps, which includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
    Type: Grant
    Filed: May 11, 2023
    Date of Patent: August 6, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 12032497
    Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes an HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: July 9, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
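    Illustrative sketch: the card's structure (cubes of a logic die plus memory dies, a host-facing pin connection, and a fabric connection to peer cards) can be summarized as a small data model. The counts and identifiers below are illustrative assumptions.

        from dataclasses import dataclass, field

        @dataclass
        class HBMPlusCube:
            logic_die: str = "logic"
            memory_dies: int = 4                  # illustrative die count per cube

        @dataclass
        class HBMPlusCard:
            # The card controller, which interfaces with the host, is reduced here to the
            # connections it manages: a pin connection to the host and fabric links to peers.
            cubes: list = field(default_factory=lambda: [HBMPlusCube() for _ in range(4)])
            pin_connection: str = "host"
            fabric_connections: list = field(default_factory=list)

            def attach_peer(self, card_id: str) -> None:
                self.fabric_connections.append(card_id)

        card = HBMPlusCard()
        card.attach_peer("hbm_plus_card_1")
        print(len(card.cubes), card.fabric_connections)   # 4 ['hbm_plus_card_1']
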
  • Publication number: 20240211149
    Abstract: A processor includes a plurality of memory units, each of the memory units including a plurality of memory cells, wherein each of the memory units is configurable to operate as memory, as a computation unit, or as a hybrid memory-computation unit.
    Type: Application
    Filed: March 2, 2024
    Publication date: June 27, 2024
    Inventors: Dimin Niu, Shuangchen Li, Bob Brennan, Krishna T. Malladi, Hongzhong Zheng
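    Illustrative sketch: each memory unit can be placed into one of three modes. The cell count, the half-and-half split in hybrid mode, and the method names below are assumptions for illustration.

        from enum import Enum

        class UnitMode(Enum):
            MEMORY = "memory"
            COMPUTE = "compute"
            HYBRID = "hybrid"     # part of the cells store data, part of them compute

        class MemoryUnit:
            def __init__(self, cells: int = 1024):
                self.cells = cells
                self.mode = UnitMode.MEMORY

            def configure(self, mode: UnitMode) -> None:
                self.mode = mode

            def storage_cells(self) -> int:
                """Cells still available for plain storage under the current mode."""
                if self.mode is UnitMode.MEMORY:
                    return self.cells
                if self.mode is UnitMode.HYBRID:
                    return self.cells // 2     # arbitrary split, purely for illustration
                return 0

        units = [MemoryUnit() for _ in range(8)]
        units[0].configure(UnitMode.COMPUTE)
        units[1].configure(UnitMode.HYBRID)
        print([u.storage_cells() for u in units[:3]])   # [0, 512, 1024]
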
  • Publication number: 20240201909
    Abstract: A device may include an interconnect interface, a memory system including one or more first type memory devices to receive first data, one or more second type memory devices to receive second data, and an accelerator configured to perform an operation using the first data and the second data. The memory system may further include a cache configured to cache the second data for the one or more second type memory devices. A device may include an interconnect interface, a memory system coupled to the interconnect interface to receive data, an accelerator coupled to the memory system, and virtualization logic configured to partition one or more resources of the accelerator into one or more virtual accelerators, wherein a first one of the one or more virtual accelerators may be configured to perform a first operation on a first portion of the data.
    Type: Application
    Filed: February 26, 2024
    Publication date: June 20, 2024
    Inventors: Yang Seok KI, Krishna T. MALLADI, Rekha PITCHUMANI
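    Illustrative sketch of the second device described above: virtualization logic partitions an accelerator's resources into virtual accelerators, each of which performs an operation on its own portion of the data. The resource being partitioned (compute units) and the placeholder operation are assumptions.

        from dataclasses import dataclass

        @dataclass
        class Accelerator:
            compute_units: int

        @dataclass
        class VirtualAccelerator:
            compute_units: int

            def run(self, data: list) -> list:
                return [x * 2 for x in data]          # placeholder for the "first operation"

        def partition(acc: Accelerator, shares: list) -> list:
            """Split an accelerator's compute units into virtual accelerators (simplified)."""
            assert sum(shares) <= acc.compute_units
            return [VirtualAccelerator(compute_units=s) for s in shares]

        vaccs = partition(Accelerator(compute_units=16), shares=[8, 4, 4])
        print(vaccs[0].run([1, 2, 3]))   # the first virtual accelerator operates on its portion of the data
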
  • Publication number: 20240193111
    Abstract: An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler in which the heterogeneous computing environment may include a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions that include instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies in which each may be configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit and the reprogrammable processing unit.
    Type: Application
    Filed: February 16, 2024
    Publication date: June 13, 2024
    Inventors: Krishna T. MALLADI, Hongzhong ZHENG
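    Illustrative sketch: the task scheduler decides whether a task runs on the fixed-logic processing unit or on the reprogrammable unit stacked with the HBM dies, where it can exploit processing-in-memory. The memory-boundedness heuristic and threshold below are assumptions, not the claimed scheduling policy.

        def schedule(task: dict) -> str:
            """Send large memory-bound tasks to the reprogrammable processing-in-memory unit;
            run everything else on the fixed-logic processing unit."""
            if task.get("memory_bound") and task.get("bytes_touched", 0) > 1 << 20:
                return "reprogrammable_pim_unit"
            return "fixed_processing_unit"

        tasks = [
            {"name": "gemm", "memory_bound": False, "bytes_touched": 1 << 24},
            {"name": "scan", "memory_bound": True,  "bytes_touched": 1 << 26},
        ]
        print({t["name"]: schedule(t) for t in tasks})
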
  • Patent number: 11995009
    Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes an HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: May 28, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
  • Patent number: 11947961
    Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: April 2, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
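    Illustrative sketch: the memory die replaces floating-point multiplication with a lookup into a table of precomputed products, and the logic die's computation units accumulate those products into a matrix multiplication. The operand grid and the pure-Python matrix multiply below are assumptions for illustration.

        import itertools

        # Memory die: a LUT of precomputed floating-point products over a toy set of operand values.
        codes = [i / 4.0 for i in range(8)]
        fp_lut = {(a, b): a * b for a, b in itertools.product(codes, repeat=2)}

        def memory_die_multiply(a: float, b: float) -> float:
            return fp_lut[(a, b)]                 # multiplication replaced by a table access

        def logic_die_matmul(A, B):
            """Logic-die computation units accumulate LUT-provided products into a matrix product."""
            n, k, m = len(A), len(B), len(B[0])
            return [[sum(memory_die_multiply(A[i][p], B[p][j]) for p in range(k))
                     for j in range(m)] for i in range(n)]

        A = [[0.25, 0.5], [1.0, 1.5]]
        B = [[0.5, 0.25], [1.0, 0.75]]
        print(logic_die_matmul(A, B))   # [[0.625, 0.4375], [2.0, 1.375]]
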
  • Patent number: 11940922
    Abstract: A method of processing in-memory commands in a high-bandwidth memory (HBM) system includes sending a function-in-HBM (FIM) instruction to the HBM by an HBM memory controller of a GPU. A logic component of the HBM receives the FIM instruction and coordinates the instruction's execution using the controller, an ALU, and an SRAM located on the logic component.
    Type: Grant
    Filed: December 14, 2022
    Date of Patent: March 26, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mu-Tien Chang, Krishna T. Malladi, Dimin Niu, Hongzhong Zheng
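    Illustrative sketch: the GPU's HBM memory controller forwards a function-in-HBM instruction to the stack, and the logic component's controller, ALU, and SRAM carry it out near the memory. The opcode names, instruction tuple format, and SRAM model below are assumptions.

        from dataclasses import dataclass, field

        @dataclass
        class HBMLogicDie:
            """Toy model of the logic component: a small controller, an ALU, and SRAM scratch space."""
            sram: dict = field(default_factory=dict)

            def execute_fim(self, opcode: str, dst: str, a: int, b: int) -> int:
                if opcode == "add":
                    result = a + b
                elif opcode == "mul":
                    result = a * b
                else:
                    raise ValueError(f"unsupported FIM opcode: {opcode}")
                self.sram[dst] = result      # result kept in on-die SRAM rather than sent back over the bus
                return result

        def gpu_memory_controller_send(die: HBMLogicDie, instruction: tuple) -> int:
            """The GPU's HBM memory controller forwards a function-in-HBM instruction to the stack."""
            return die.execute_fim(*instruction)

        die = HBMLogicDie()
        print(gpu_memory_controller_send(die, ("mul", "r0", 6, 7)))   # 42, computed on the logic die
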
  • Patent number: 11934669
    Abstract: A processor includes a plurality of memory units, each of the memory units including a plurality of memory cells, wherein each of the memory units is configurable to operate as memory, as a computation unit, or as a hybrid memory-computation unit.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: March 19, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dimin Niu, Shuangchen Li, Bob Brennan, Krishna T. Malladi, Hongzhong Zheng
  • Publication number: 20240086292
    Abstract: A method for computing. In some embodiments, the method includes: calculating an advantage score of a first computing task, the advantage score being a measure of an extent to which a plurality of function in memory circuits is capable of executing the first computing task more efficiently than one or more extra-memory processing circuits, the first computing task including instructions and data; in response to determining that the advantage score of the first computing task is less than a first threshold, executing the first computing task by the one or more extra-memory processing circuits; and in response to determining that the advantage score of the first computing task is at least equal to the first threshold: compiling the instructions for execution by the function in memory circuits; formatting the data for the function in memory circuits; and executing the first computing task by the function in memory circuits.
    Type: Application
    Filed: November 17, 2023
    Publication date: March 14, 2024
    Inventors: Krishna T. Malladi, Wenqin Huangfu
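    Illustrative sketch: tasks whose advantage score falls below the threshold run on the extra-memory (host) processors; otherwise the instructions are compiled and the data reformatted for the function-in-memory circuits. The scoring formula and task fields below are stand-ins, not the application's definition.

        def run_task(task, fim_circuits, host_cpu, threshold=1.0):
            """Dispatch based on an 'advantage score' estimating how much the function-in-memory
            circuits would benefit this task relative to the extra-memory processors."""
            advantage = task["bytes_moved"] / max(task["arithmetic_ops"], 1)   # crude memory-boundedness proxy
            if advantage < threshold:
                return host_cpu(task)
            compiled = {"ops": task["ops"], "layout": "fim_row_major"}         # compile + reformat for FIM
            return fim_circuits(compiled)

        result = run_task(
            {"bytes_moved": 4096, "arithmetic_ops": 64, "ops": ["add"]},
            fim_circuits=lambda t: f"ran on FIM: {t['ops']}",
            host_cpu=lambda t: "ran on host",
        )
        print(result)   # advantage = 64 >= 1.0, so the task runs in memory
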
  • Patent number: 11921638
    Abstract: According to some embodiments of the present invention, there is provided a hybrid cache memory for a processing device having a host processor, the hybrid cache memory comprising: a high bandwidth memory (HBM) configured to store host data; a non-volatile memory (NVM) physically integrated with the HBM in a same package and configured to store a copy of the host data at the HBM; and a cache controller configured to be in bi-directional communication with the host processor, and to manage data transfer between the HBM and NVM and, in response to a command received from the host processor, to manage data transfer between the hybrid cache memory and the host processor.
    Type: Grant
    Filed: June 6, 2022
    Date of Patent: March 5, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Krishna T. Malladi, Hongzhong Zheng
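    Illustrative sketch: the NVM holds a copy of the host data cached in the HBM, so a write goes to both tiers and an HBM miss can be refilled from the NVM copy. The write-through policy and method names below are assumptions about one way such a hybrid cache could behave.

        class HybridCache:
            """Write-through sketch: host data lands in HBM and is mirrored to the in-package NVM."""

            def __init__(self):
                self.hbm = {}     # fast, volatile tier
                self.nvm = {}     # persistent copy

            def write(self, addr: int, value: bytes) -> None:
                self.hbm[addr] = value
                self.nvm[addr] = value            # keep the NVM copy coherent with the HBM

            def read(self, addr: int) -> bytes:
                if addr in self.hbm:
                    return self.hbm[addr]
                value = self.nvm[addr]            # HBM miss: refill from the NVM copy
                self.hbm[addr] = value
                return value

        cache = HybridCache()
        cache.write(0x10, b"host data")
        cache.hbm.clear()                          # simulate loss/eviction of the volatile tier
        print(cache.read(0x10))                    # b'host data', recovered from NVM
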
  • Patent number: 11921656
    Abstract: An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler in which the heterogeneous computing environment may include a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions that include instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies in which each may be configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit and the reprogrammable processing unit.
    Type: Grant
    Filed: January 17, 2022
    Date of Patent: March 5, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 11914903
    Abstract: A device may include an interconnect interface, a memory system including one or more first type memory devices to receive first data, one or more second type memory devices to receive second data, and an accelerator configured to perform an operation using the first data and the second data. The memory system may further include a cache configured to cache the second data for the one or more second type memory devices. A device may include an interconnect interface, a memory system coupled to the interconnect interface to receive data, an accelerator coupled to the memory system, and virtualization logic configured to partition one or more resources of the accelerator into one or more virtual accelerators, wherein a first one of the one or more virtual accelerators may be configured to perform a first operation on a first portion of the data.
    Type: Grant
    Filed: October 8, 2021
    Date of Patent: February 27, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yang Seok Ki, Krishna T. Malladi, Rekha Pitchumani