Patents Assigned to d-MATRIX CORPORATION
-
Patent number: 12353985
Abstract: A server system with AI accelerator apparatuses using in-memory compute chiplet devices. The system includes a plurality of multiprocessors, each having at least a first server central processing unit (CPU) and a second server CPU, both of which are coupled to a plurality of switch devices. Each switch device is coupled to a plurality of AI accelerator apparatuses. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a CPU, and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions.
Type: Grant
Filed: October 13, 2023
Date of Patent: July 8, 2025
Assignee: d-MATRIX CORPORATION
Inventors: Jayaprakash Balachandran, Akhil Arunkumar, Aayush Ankit, Nithesh Kurella, Sudeep Bhoja
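For context, the division of labor the abstract describes can be illustrated with scaled dot-product attention: the matrix multiplies correspond to the high-throughput work assigned to the DIMC device, and the softmax corresponds to the elementwise/reduction work assigned to the SIMD device. This is only an illustrative sketch of the underlying math, not the patented implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: the per-row normalization the
    # abstract assigns to the SIMD device.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # The two matrix multiplies are the high-throughput computations
    # the abstract assigns to the DIMC device.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V
```

In an accelerator of this shape, the two matmuls and the softmax would run on different hardware units, with the dispatch device sequencing data movement between them.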
-
Patent number: 12299484
Abstract: A hardware and software co-designed dispatch engine (DE) apparatus. The DE apparatus can be configured to store a compute workload having groups of tasks in the form of a hierarchy of serial and/or concurrent queues in a task queue. Also, the DE can use various hardware modules to asynchronously delegate the tasks to various resources or destination devices and to track the completion of such tasks and task groups in an efficient manner. The DE can also include an interrupt/completion handler module, a resource monitor module, and a task dispatcher module configured with the task queue module to track and dispatch work units that are sent to various destination devices for processing. Using this approach, the DE apparatus can be configured with a processing unit to coordinate the processing of work units in a manner that efficiently uses the most critical resources with minimal added cost of silicon area.
Type: Grant
Filed: March 16, 2022
Date of Patent: May 13, 2025
Assignee: d-MATRIX CORPORATION
Inventors: Satyam Srivastava, Joseph P. Sprowes, Mayank Gupta
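The queueing and completion-tracking pattern the abstract describes can be sketched in software. The class and method names below are hypothetical, chosen only to illustrate the idea of grouped tasks dispatched to destination resources with per-group completion counters; the patented engine realizes this in dedicated hardware modules.

```python
from collections import deque

class DispatchEngine:
    """Illustrative sketch (hypothetical API): serial groups of tasks
    dispatched to a destination resource, with per-group completion
    tracking as results come back."""

    def __init__(self):
        self.queue = deque()   # task-group hierarchy, flattened to serial groups
        self.pending = {}      # group id -> count of outstanding tasks

    def submit_group(self, gid, tasks):
        # Enqueue a group of tasks and record how many must complete.
        self.queue.append((gid, tasks))
        self.pending[gid] = len(tasks)

    def dispatch(self, run):
        # Delegate each task to a destination resource (here, `run`),
        # decrementing the group's counter on each completion.
        results = []
        while self.queue:
            gid, tasks = self.queue.popleft()
            for t in tasks:
                results.append(run(t))
                self.pending[gid] -= 1
        return results

    def group_done(self, gid):
        # A group is complete once no tasks remain outstanding.
        return self.pending.get(gid, 0) == 0
```

Tracking completion per group, rather than per task, is what lets downstream work be released as soon as an entire group finishes.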
-
Patent number: 12271321
Abstract: An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions. The chiplet can also include die-to-die (D2D) interconnects, a peripheral component interconnect express (PCIe) bus, a dynamic random access memory (DRAM) interface, and a global CPU interface to facilitate communication between the chiplets, memory, and a server or host system.
Type: Grant
Filed: October 24, 2023
Date of Patent: April 8, 2025
Assignee: d-MATRIX CORPORATION
Inventors: Sudeep Bhoja, Siddharth Sheth
-
Patent number: 12260223
Abstract: An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications, including generative AI. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions.
Type: Grant
Filed: November 23, 2022
Date of Patent: March 25, 2025
Assignee: d-MATRIX CORPORATION
Inventors: Sudeep Bhoja, Siddharth Sheth
-
Patent number: 12182028
Abstract: A transformer compute apparatus and method of operation therefor. The apparatus receives matrix inputs in a first format and generates projection tokens from these inputs. Among others, the apparatus includes a first cache device configured for processing first projection tokens and a second cache device configured for processing second projection tokens. The first cache device stores the first projection tokens in a first cache region and stores these tokens converted to a second format in a second cache region. The second cache device stores the second projection tokens converted to the second format in a first cache region and stores the converted second projection tokens after being transposed. Then, a compute device performs various matrix computations with the converted first projection tokens and transposed second projection tokens. Re-processing data and expensive padding and de-padding operations for transposed storage and byte alignment can be avoided using this caching process.
Type: Grant
Filed: September 28, 2023
Date of Patent: December 31, 2024
Assignee: d-MATRIX CORPORATION
Inventors: Akhil Arunkumar, Satyam Srivastava, Aayush Ankit
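The caching idea in this abstract — store projection tokens already converted to the second format and, for the second projections, also in transposed form, so later matrix computations need no re-processing — can be sketched as follows. The class, method names, and the choice of float16 as the "second format" are illustrative assumptions, not details from the patent.

```python
import numpy as np

class ProjectionCache:
    """Illustrative sketch (hypothetical names): a two-region cache that
    keeps format-converted projection tokens alongside their transpose,
    so compute-time matmuls need no conversion or re-transposition."""

    def __init__(self):
        self.converted = []   # tokens in the second format (assumed float16)
        self.transposed = []  # same tokens, pre-transposed for matmul

    def store(self, token_block):
        # Region 1: convert the incoming tokens to the second format.
        conv = token_block.astype(np.float16)
        self.converted.append(conv)
        # Region 2: cache the transpose at store time, not compute time.
        self.transposed.append(conv.T)

    def keys_T(self):
        # Ready-to-use transposed projections: the consumer multiplies
        # directly, with no padding/de-padding pass for transposed storage.
        return np.concatenate(self.transposed, axis=1)
```

Paying the conversion and transpose cost once at store time is the trade the abstract describes: slightly more cache traffic per write in exchange for avoiding repeated re-processing on every read.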
-
Patent number: 12147359
Abstract: An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions. The chiplet can also include die-to-die (D2D) interconnects, a peripheral component interconnect express (PCIe) bus, a dynamic random access memory (DRAM) interface, and a global CPU interface to facilitate communication between the chiplets, memory, and a server or host system.
Type: Grant
Filed: January 25, 2024
Date of Patent: November 19, 2024
Assignee: d-MATRIX CORPORATION
Inventors: Sudeep Bhoja, Siddharth Sheth
-
Patent number: 11886359
Abstract: An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions. The chiplet can also include die-to-die (D2D) interconnects, a peripheral component interconnect express (PCIe) bus, a dynamic random access memory (DRAM) interface, and a global CPU interface to facilitate communication between the chiplets, memory, and a server or host system.
Type: Grant
Filed: October 17, 2022
Date of Patent: January 30, 2024
Assignee: d-MATRIX CORPORATION
Inventors: Sudeep Bhoja, Siddharth Sheth
-
Patent number: 11847072
Abstract: An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single input multiple data (SIMD) device is configured to further process the DIMC output and compute softmax functions for the attention functions. The chiplet can also include die-to-die (D2D) interconnects, a peripheral component interconnect express (PCIe) bus, a dynamic random access memory (DRAM) interface, and a global CPU interface to facilitate communication between the chiplets, memory, and a server or host system.
Type: Grant
Filed: November 30, 2021
Date of Patent: December 19, 2023
Assignee: d-MATRIX CORPORATION
Inventors: Sudeep Bhoja, Siddharth Sheth