Patents by Inventor Lide Duan
Lide Duan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12248406
Abstract: The present application discloses a computing system and an associated method. The computing system includes a memory, a master computing device, and a slave computing device. The master computing device includes a memory controller and an input-output memory management unit (IOMMU). When the slave computing device accesses a first virtual address and a first translation lookaside buffer (TLB) of the slave computing device does not store the first virtual address, the first TLB sends a translation request to the IOMMU. The IOMMU traverses page tables of the memory controller to obtain a first physical address corresponding to the first virtual address, then selects and clears an entry from a second TLB of the computing system, according to the recent use time and dependent workload of each entry, to store the first virtual address and the first physical address.
Type: Grant
Filed: December 13, 2022
Date of Patent: March 11, 2025
Assignee: Alibaba (China) Co., Ltd.
Inventors: Lide Duan, Qichen Zhang, Shijian Zhang, Yen-Kuang Chen
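A minimal C sketch of the eviction step this abstract describes, under stated assumptions: the entry fields last_use and workload_refs and the weighted score are illustrative, since the abstract only says that recent use time and dependent workload are both considered.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical second-level TLB entry: valid bit, translation, plus the
     * two eviction criteria named in the abstract. Field names are
     * illustrative, not from the patent. */
    typedef struct {
        int      valid;
        uint64_t vpn;           /* virtual page number  */
        uint64_t ppn;           /* physical page number */
        uint64_t last_use;      /* timestamp of most recent access */
        uint32_t workload_refs; /* outstanding dependent-workload references */
    } tlb_entry_t;

    /* Select a victim: prefer entries that are least recently used and have
     * the fewest dependent references. The weighting is an assumption. */
    size_t select_victim(const tlb_entry_t *tlb, size_t n, uint64_t now)
    {
        size_t best = 0;
        uint64_t best_score = 0;
        for (size_t i = 0; i < n; i++) {
            if (!tlb[i].valid)
                return i;                 /* free slot: no eviction needed */
            uint64_t age = now - tlb[i].last_use;
            uint64_t score = age / (1 + tlb[i].workload_refs);
            if (score > best_score) {
                best_score = score;
                best = i;
            }
        }
        return best;
    }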
-
Patent number: 12248400
Abstract: A computer-implemented method for allocating memory bandwidth of multiple CPU cores in a server includes: receiving an access request to a last level cache (LLC) shared by the multiple CPU cores in the server, the access request being sent from a core with a private cache holding copies of frequently accessed data from a memory; determining whether the access request is an LLC hit or an LLC miss; and controlling a memory bandwidth controller based on the determination. The memory bandwidth controller performs memory bandwidth throttling to control the request rate between the private cache and the last level cache. An LLC hit causes the throttling to be disabled, and an LLC miss causes it to be enabled.
Type: Grant
Filed: August 16, 2023
Date of Patent: March 11, 2025
Assignee: Alibaba (China) Co., Ltd.
Inventors: Lide Duan, Bowen Huang, Qichen Zhang, Shengcheng Wang, Yen-Kuang Chen, Hongzhong Zheng
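The control rule in this abstract reduces to a simple toggle. A minimal sketch, assuming a hypothetical bw_controller_t state and an on_llc_access hook; the real controller also enforces the request rate itself, which is omitted here.

    #include <stdbool.h>

    /* Hypothetical hook called by the LLC on every access from a core's
     * private cache. The abstract's rule: an LLC hit disables bandwidth
     * throttling, an LLC miss enables it. Names are illustrative. */
    typedef struct {
        bool     throttle_enabled; /* memory-bandwidth throttling state */
        unsigned request_rate;     /* permitted private-cache -> LLC rate */
    } bw_controller_t;

    void on_llc_access(bw_controller_t *ctrl, bool llc_hit)
    {
        if (llc_hit) {
            /* Hit: the request is served on-chip, so do not throttle. */
            ctrl->throttle_enabled = false;
        } else {
            /* Miss: the request consumes memory bandwidth, so throttle. */
            ctrl->throttle_enabled = true;
        }
    }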
-
Patent number: 12147341
Abstract: Apparatus, method, and system provided herein are directed to prioritizing cache line writing of compressed data. The memory controller comprises a cache line compression engine that receives raw data, compresses the raw data, determines a compression rate between the raw data and the compressed data, determines whether the compression rate is greater than a predetermined rate, and outputs the compressed data as data-to-be-written if the compression rate is greater than the predetermined rate. In response to determining that the compression rate is greater than the predetermined rate, the cache line compression engine generates a compression signal indicating the data-to-be-written is the compressed data and sends the compression signal to a scheduler of a command queue in the memory controller where writing of compressed data is prioritized.
Type: Grant
Filed: August 6, 2020
Date of Patent: November 19, 2024
Assignee: Alibaba Group Holding Limited
Inventors: Dimin Niu, Tianchan Guan, Lide Duan, Hongzhong Zheng
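A minimal sketch of the compression-rate gate in this abstract. The types, the ratio-style compression rate, and the min_rate threshold are assumptions; the scheduler-side prioritization is represented only by the compressed flag.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical decision made by the cache line compression engine:
     * emit compressed data (and a compression signal for the scheduler)
     * only when the achieved rate beats a configured threshold. */
    typedef struct {
        const void *data;       /* data-to-be-written */
        size_t      len;
        bool        compressed; /* the "compression signal" */
    } write_request_t;

    write_request_t gate_compression(const void *raw, size_t raw_len,
                                     const void *comp, size_t comp_len,
                                     double min_rate)
    {
        write_request_t req;
        /* Rate as raw/compressed size, e.g. 2.0 means halved; guard a
         * degenerate zero-length result. */
        double rate = comp_len ? (double)raw_len / (double)comp_len : 0.0;
        if (rate > min_rate) {
            req.data = comp; req.len = comp_len; req.compressed = true;
        } else {
            req.data = raw;  req.len = raw_len;  req.compressed = false;
        }
        return req; /* scheduler prioritizes requests with compressed set */
    }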
-
Patent number: 12141438
Abstract: Zero skipping sparsity techniques for reduced data movement between memory and accelerators and reduced computational workload of accelerators. The techniques include detection of zero and near-zero values in memory. The non-zero values are transferred to the accelerator for computation. The zero and near-zero values are written back within the memory as zero values.
Type: Grant
Filed: February 25, 2021
Date of Patent: November 12, 2024
Assignee: Alibaba Group Holding Limited
Inventors: Fei Xue, Fei Sun, Yangjie Zhou, Lide Duan, Hongzhong Zheng
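A minimal sketch of the zero-skipping filter as described: near-zero values are written back as exact zeros while only non-zeros (with their indices) move to the accelerator. The eps threshold and the flat-array layout are assumptions.

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical memory-side filter. Returns the number of values the
     * accelerator actually has to compute on. */
    size_t zero_skip(float *buf, size_t n, float eps,
                     float *out_vals, size_t *out_idx)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            if (fabsf(buf[i]) <= eps) {
                buf[i] = 0.0f;           /* write near-zero back as zero */
            } else {
                out_vals[kept] = buf[i]; /* transfer only non-zeros */
                out_idx[kept]  = i;
                kept++;
            }
        }
        return kept; /* accelerator works on kept values instead of n */
    }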
-
Publication number: 20240303090
Abstract: A data processing method, applicable to an accelerator that is communicatively coupled to a processor core, includes obtaining a service data processing request from a first queue; obtaining to-be-processed service data corresponding to the service data processing request from the processor core via a service interface; generating result service data based on the to-be-processed service data; and writing the result service data into a second queue for providing to the processor core.
Type: Application
Filed: March 7, 2024
Publication date: September 12, 2024
Inventors: Shijian Zhang, Lide Duan, Hongzhong Zheng
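A minimal model of the two-queue protocol in this abstract, with the queues flattened to arrays and a placeholder computation; all types and names are illustrative.

    #include <stddef.h>

    /* Hypothetical model: the core places requests in a first queue, the
     * accelerator pulls service data through a service interface,
     * computes, and writes results to a second queue. */
    typedef struct { int req_id; int payload; } request_t;
    typedef struct { int req_id; int value;   } result_t;

    size_t accelerator_drain(const request_t *first_q, size_t n,
                             result_t *second_q)
    {
        for (size_t i = 0; i < n; i++) {
            /* "service interface": payload travels with the request here */
            second_q[i].req_id = first_q[i].req_id;
            second_q[i].value  = first_q[i].payload * 2; /* placeholder */
        }
        return n; /* results now visible to the processor core */
    }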
-
Publication number: 20240303135
Abstract: Embodiments of the present disclosure provide a data transmission method. The data transmission method is applied to an operation chip. The operation chip includes a plurality of nodes of a network on chip (NoC), and the method includes: receiving a data processing instruction of target service data, where the data processing instruction carries information about a receiving node and a processing node set; determining a relay processing node in the processing node set based on the receiving node; and transmitting the target service data from the receiving node to the relay processing node, and transmitting the target service data from the relay processing node to another processing node in the processing node set.
Type: Application
Filed: March 8, 2024
Publication date: September 12, 2024
Inventors: Huatao Zhao, Shengcheng Wang, Yunfan Li, Lide Duan
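A minimal sketch of one plausible relay choice for the scheme in this abstract; the abstract does not say how the relay is determined, so nearest-to-the-receiving-node on a mesh (Manhattan hops) is assumed here.

    #include <stdlib.h>
    #include <limits.h>

    /* Hypothetical mesh coordinates for NoC nodes. */
    typedef struct { int x, y; } node_t;

    static int hops(node_t a, node_t b)  /* Manhattan distance on a mesh */
    {
        return abs(a.x - b.x) + abs(a.y - b.y);
    }

    /* Pick the processing node nearest the receiving node as the relay. */
    int select_relay(node_t recv, const node_t *set, int n)
    {
        int best = 0, best_d = INT_MAX;
        for (int i = 0; i < n; i++) {
            int d = hops(recv, set[i]);
            if (d < best_d) { best_d = d; best = i; }
        }
        return best; /* data goes recv -> set[best] -> rest of the set */
    }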
-
Patent number: 12056374
Abstract: A dynamic bias coherency configuration engine can include control logic, a host threshold register, a device threshold register, and a plurality of memory region monitoring units. The memory region monitoring units can include a starting page number register, an ending page number register, a host access register, and a device access register. The memory region monitoring units can be utilized by the dynamic bias coherency configuration engine to configure corresponding portions of a memory space in a device bias mode or a host bias mode.
Type: Grant
Filed: February 3, 2021
Date of Patent: August 6, 2024
Assignee: Alibaba Group Holding Limited
Inventors: Lide Duan, Dimin Niu, Hongzhong Zheng
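A minimal sketch of the per-region decision the engine could make from its registers. The flip-on-threshold policy and the counter reset are assumptions; the abstract only names the registers and the two bias modes.

    #include <stdint.h>

    typedef enum { HOST_BIAS, DEVICE_BIAS } bias_mode_t;

    /* Hypothetical memory region monitoring unit state; field names map
     * onto the registers listed in the abstract. */
    typedef struct {
        uint64_t start_page, end_page; /* monitored page-number range */
        uint64_t host_accesses;        /* host access register  */
        uint64_t device_accesses;      /* device access register */
        bias_mode_t mode;
    } region_monitor_t;

    void update_bias(region_monitor_t *r,
                     uint64_t host_threshold, uint64_t device_threshold)
    {
        if (r->mode == DEVICE_BIAS && r->host_accesses >= host_threshold) {
            r->mode = HOST_BIAS;       /* host dominates: flip to host bias */
            r->host_accesses = r->device_accesses = 0;
        } else if (r->mode == HOST_BIAS &&
                   r->device_accesses >= device_threshold) {
            r->mode = DEVICE_BIAS;     /* device dominates: flip */
            r->host_accesses = r->device_accesses = 0;
        }
    }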
-
Publication number: 20240244013
Abstract: Embodiments of this disclosure provide a data packet transmission method, a scheduling management unit, a chip, and a graphics card. The data packet transmission method includes: determining a source node and a destination node of a data packet to be transmitted; determining at least one intermediate routing node corresponding to the data packet based on the source node, the destination node, and a data transmission state of each node in a network on chip (NoC); and transmitting identification information of the at least one intermediate routing node to the source node.
Type: Application
Filed: January 16, 2024
Publication date: July 18, 2024
Inventors: Yunfang Li, Jiayi Huang, Lide Duan, Dimin Niu
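A minimal sketch of a congestion-aware choice consistent with this abstract, assuming the "data transmission state" of a node is summarized by a queued-flit count; the actual selection metric is not specified.

    #include <limits.h>

    /* Hypothetical per-node state tracked by the scheduling unit. */
    typedef struct {
        int id;
        int queued_flits; /* proxy for the node's transmission state */
    } noc_node_t;

    /* Among candidate intermediate routing nodes between source and
     * destination, pick the least loaded one. */
    int pick_intermediate(const noc_node_t *candidates, int n)
    {
        int best_id = -1, best_load = INT_MAX;
        for (int i = 0; i < n; i++) {
            if (candidates[i].queued_flits < best_load) {
                best_load = candidates[i].queued_flits;
                best_id   = candidates[i].id;
            }
        }
        return best_id; /* sent to the source node as identification info */
    }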
-
Publication number: 20240061780
Abstract: A computer-implemented method for allocating memory bandwidth of multiple CPU cores in a server includes: receiving an access request to a last level cache (LLC) shared by the multiple CPU cores in the server, the access request being sent from a core with a private cache holding copies of frequently accessed data from a memory; determining whether the access request is an LLC hit or an LLC miss; and controlling a memory bandwidth controller based on the determination. The memory bandwidth controller performs memory bandwidth throttling to control the request rate between the private cache and the last level cache. An LLC hit causes the throttling to be disabled, and an LLC miss causes it to be enabled.
Type: Application
Filed: August 16, 2023
Publication date: February 22, 2024
Inventors: Lide Duan, Bowen Huang, Qichen Zhang, Shengcheng Wang, Yen-Kuang Chen, Hongzhong Zheng
-
Publication number: 20240045960
Abstract: A computing device includes a processor, at least one storage block, and an access detection unit. The processor includes a load/store unit (LSU). When the processor switches from one program to another, the LSU stores a return address of the other program to the at least one storage block. The access detection unit includes a store-once stack and a comparison logic circuit. The store-once stack stores the storage address of the return address in the at least one storage block when the at least one storage block stores the return address. Before the LSU performs a storage operation on the at least one storage block, the comparison logic circuit compares the write address of the storage operation with the storage addresses of return addresses stored in the store-once stack to determine whether a return address will be modified.
Type: Application
Filed: December 9, 2022
Publication date: February 8, 2024
Inventors: Shijian Zhang, Lide Duan
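A minimal sketch of the store-once stack and comparison logic described here; the fixed depth, and the linear scan standing in for the comparison circuit, are assumptions.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define STACK_DEPTH 64

    /* Hypothetical store-once stack: remembers where return addresses
     * were spilled so later stores to those slots can be flagged. */
    typedef struct {
        uint64_t slots[STACK_DEPTH]; /* addresses holding return addrs */
        size_t   top;
    } store_once_stack_t;

    void record_return_spill(store_once_stack_t *s, uint64_t addr)
    {
        if (s->top < STACK_DEPTH)
            s->slots[s->top++] = addr; /* LSU spilled a return addr here */
    }

    /* Comparison logic: does this store modify a saved return address? */
    bool store_modifies_return(const store_once_stack_t *s,
                               uint64_t write_addr)
    {
        for (size_t i = 0; i < s->top; i++)
            if (s->slots[i] == write_addr)
                return true;           /* flag for detection/alert */
        return false;
    }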
-
Publication number: 20240045809
Abstract: The present application discloses a computing system and an associated method. The computing system includes a memory, a master computing device, and a slave computing device. The master computing device includes a memory controller and an input-output memory management unit (IOMMU). When the slave computing device accesses a first virtual address and a first translation lookaside buffer (TLB) of the slave computing device does not store the first virtual address, the first TLB sends a translation request to the IOMMU. The IOMMU traverses page tables of the memory controller to obtain a first physical address corresponding to the first virtual address, then selects and clears an entry from a second TLB of the computing system, according to the recent use time and dependent workload of each entry, to store the first virtual address and the first physical address.
Type: Application
Filed: December 13, 2022
Publication date: February 8, 2024
Inventors: Lide Duan, Qichen Zhang, Shijian Zhang, Yen-Kuang Chen
-
Publication number: 20240045599
Abstract: The present application discloses a processing unit and an access detection method thereof. The processing unit includes an execution circuit. The execution circuit connects to a memory and is configured to: execute an access request, wherein the access request is for accessing at least one part of a first physical memory section corresponding to a first access base address; determine whether a first tag of the access request is equal to a second tag corresponding to the first access base address and whether the at least one part of the first physical memory section matches a first legal access section corresponding to the first access base address; and determine whether to send an alert message according to the determination result.
Type: Application
Filed: December 13, 2022
Publication date: February 8, 2024
Inventors: Shijian Zhang, Lide Duan
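A minimal sketch of the tag-plus-section check this abstract describes, assuming the legal access section is an offset range; the tag width and encoding are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical descriptor bound to an access base address. */
    typedef struct {
        uint64_t base;        /* access base address */
        uint16_t tag;         /* tag bound to this base address */
        uint64_t legal_start; /* legal access section, inclusive */
        uint64_t legal_end;   /* legal access section, exclusive */
    } section_desc_t;

    bool access_ok(const section_desc_t *d, uint16_t req_tag,
                   uint64_t addr, uint64_t len)
    {
        if (req_tag != d->tag)
            return false;     /* tag mismatch: send alert */
        if (addr < d->legal_start || addr + len > d->legal_end)
            return false;     /* outside legal section: send alert */
        return true;          /* access proceeds silently */
    }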
-
Publication number: 20240045805
Abstract: Core-aware caching systems and methods for non-inclusive non-exclusive shared caching based on core sharing behaviors of the data and/or instructions. In one implementation, the caching between a shared cache level and a core specific cache level can be based on physical page number (PPN) and core identifier sets for previous accesses to the respective physical page numbers. In another implementation, the caching between a shared cache level and a core specific cache level can be based on physical page number and core valid bit vector sets for previous accesses to the respective physical page numbers by each of the plurality of cores.
Type: Application
Filed: January 20, 2021
Publication date: February 8, 2024
Inventors: Lide Duan, Guocai Zhu, Yen-Kuang Chen, Hongzhong Zheng
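A minimal sketch of the second implementation (core valid bit vectors): a line whose page has been touched by more than one core is kept at the shared level. The 32-core limit and the placement rule are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical tracking entry: one bit per core (up to 32 cores). */
    typedef struct {
        uint64_t ppn;        /* physical page number */
        uint32_t core_valid; /* bit i set = core i accessed this page */
    } ppn_track_t;

    static int popcount32(uint32_t v)
    {
        int c = 0;
        while (v) { v &= v - 1; c++; }
        return c;
    }

    /* Returns true if the line should live in the shared cache level. */
    bool cache_in_shared(ppn_track_t *t, int core_id)
    {
        t->core_valid |= 1u << core_id;       /* record this access */
        return popcount32(t->core_valid) > 1; /* shared across cores? */
    }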
-
Publication number: 20240037221
Abstract: The present application discloses a processor and an attack detection method thereof. The processor includes a first register and an execution unit. The execution unit is configured to: execute a first jump-related instruction under a first privilege mode; set a first field of the first register to a first jump status parameter according to execution of the first jump-related instruction; jump to a first corresponding instruction in a specified register of the first jump-related instruction; determine whether the first corresponding instruction is a legal instruction and whether a first parameter of the first corresponding instruction is equal to the first jump status parameter to obtain a first determination; and determine whether to send an alert message according to the first determination.
Type: Application
Filed: December 13, 2022
Publication date: February 1, 2024
Inventors: Shijian Zhang, Lide Duan
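A minimal sketch of the target-side check, loosely modeled on landing-pad style control-flow checks; the encoding of the jump status parameter and the legality test are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical decoded view of the instruction at a jump target. */
    typedef struct {
        bool     is_landing; /* legal instruction for a jump target? */
        uint16_t param;      /* parameter embedded in the instruction */
    } target_insn_t;

    /* The jump recorded jump_status in a register field; the target must
     * be a legal landing instruction with a matching parameter. */
    bool jump_target_ok(uint16_t jump_status, target_insn_t insn)
    {
        if (!insn.is_landing)
            return false;                 /* illegal target: alert */
        return insn.param == jump_status; /* parameters must match */
    }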
-
Publication number: 20240004830
Abstract: Embodiments of the present disclosure include a processor. The processor may include a systolic array of processing elements; a first group of buffers coupled to the systolic array, wherein the first group comprises one or more first buffers; a second group of buffers coupled to the systolic array, wherein the second group comprises one or more second buffers; an accumulator coupled to the systolic array; and a third group of buffers coupled to the accumulator, wherein the third group comprises one or more third buffers.
Type: Application
Filed: November 7, 2022
Publication date: January 4, 2024
Inventors: Qichen Zhang, Lide Duan, Shengcheng Wang
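A toy stand-in for the dataflow through the three buffer groups named in this abstract: activations from the first group, weights from the second, accumulator results into the third. A plain 2x2 multiply replaces the actual systolic timing.

    #include <stdint.h>

    #define N 2

    void systolic_matmul(const int32_t a_buf[N][N],  /* first buffer group  */
                         const int32_t w_buf[N][N],  /* second buffer group */
                         int32_t out_buf[N][N])      /* third buffer group  */
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                int32_t acc = 0;                      /* accumulator for PE (i,j) */
                for (int k = 0; k < N; k++)
                    acc += a_buf[i][k] * w_buf[k][j]; /* MACs streamed over k */
                out_buf[i][j] = acc;
            }
    }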
-
Publication number: 20230281124
Abstract: Apparatus, method, and system provided herein are directed to prioritizing cache line writing of compressed data. The memory controller comprises a cache line compression engine that receives raw data, compresses the raw data, determines a compression rate between the raw data and the compressed data, determines whether the compression rate is greater than a predetermined rate, and outputs the compressed data as data-to-be-written if the compression rate is greater than the predetermined rate. In response to determining that the compression rate is greater than the predetermined rate, the cache line compression engine generates a compression signal indicating the data-to-be-written is the compressed data and sends the compression signal to a scheduler of a command queue in the memory controller where writing of compressed data is prioritized.
Type: Application
Filed: August 6, 2020
Publication date: September 7, 2023
Inventors: Dimin Niu, Tianchan Guan, Lide Duan, Hongzhong Zheng
-
Patent number: 11704271
Abstract: A system-in-package architecture in accordance with aspects includes a logic die and one or more memory dice coupled together in a three-dimensional stack. The logic die can include one or more global building blocks and a plurality of local building blocks. The local building blocks can include a plurality of engines and memory controllers, and the memory controllers can be configured to directly couple one or more of the engines to the one or more memory dice. The number and type of local building blocks, and the number and types of engines and memory controllers, can be scalable.
Type: Grant
Filed: August 20, 2020
Date of Patent: July 18, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Lide Duan, Wei Han, Yuhao Wang, Fei Xue, Yuanwei Fang, Hongzhong Zheng
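A configuration-struct view of the scalability claims in this abstract; every type and field name is illustrative.

    #include <stddef.h>

    typedef struct { int engine_id; } engine_t;
    typedef struct { int channel;   } mem_ctrl_t; /* couples engines to a memory die */

    /* A local building block: a scalable set of engines paired with
     * memory controllers. */
    typedef struct {
        engine_t   *engines;
        mem_ctrl_t *mem_ctrls;
        size_t      n_engines, n_mem_ctrls;
    } local_block_t;

    /* The logic die: global building blocks plus a scalable number of
     * local building blocks, stacked with one or more memory dice. */
    typedef struct {
        int            n_global_blocks;
        local_block_t *locals;
        size_t         n_locals;
        int            n_memory_dice;
    } sip_config_t;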
-
Patent number: 11604744
Abstract: A dual-mode memory interface of a computing system is provided, configurable to present memory interfaces having differently-graded bandwidth capacity to different processors of the computing system. A mode switch controller of the memory interface controller, based on at least an arbitration rule written to a configuration register, switches the memory interface controller between a narrow-band mode and a wide-band mode. In each mode, the memory interface controller disables either a plurality of narrow-band memory interfaces of the memory interface controller according to a first bus standard, or a wide-band memory interface of the memory interface controller according to a second bus standard. The memory interface controller virtualizes a plurality of system memory units of the computing system as a virtual wide-band memory unit according to the second bus standard, or virtualizes a system memory unit of the computing system as a virtual narrow-band memory unit according to the first bus standard.
Type: Grant
Filed: October 16, 2020
Date of Patent: March 14, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Yuhao Wang, Wei Han, Dimin Niu, Lide Duan, Shuangchen Li, Fei Xue, Hongzhong Zheng
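A minimal sketch of the mode switch: the arbitration rule from the configuration register enables one interface type and disables the other. The enum and struct are illustrative.

    #include <stdbool.h>

    typedef enum { NARROW_BAND, WIDE_BAND } if_mode_t;

    /* Hypothetical memory interface controller state. */
    typedef struct {
        if_mode_t mode;
        bool narrow_ifs_enabled; /* plurality of narrow-band interfaces */
        bool wide_if_enabled;    /* single wide-band interface */
    } mem_if_ctrl_t;

    void apply_arbitration(mem_if_ctrl_t *c, if_mode_t cfg_reg_rule)
    {
        c->mode = cfg_reg_rule;
        c->narrow_ifs_enabled = (cfg_reg_rule == NARROW_BAND);
        c->wide_if_enabled    = (cfg_reg_rule == WIDE_BAND);
        /* In wide-band mode, several physical memory units are presented
         * as one virtual wide-band unit; in narrow-band mode, a unit is
         * presented as a virtual narrow-band unit. */
    }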
-
Publication number: 20230026824
Abstract: A memory system for accelerating graph neural network processing can include an on-host chip memory to cache data needed for processing a current root node. The system can also include a volatile memory interface between the host and non-volatile memory. The volatile memory can be configured to save one or more sets of next root nodes, neighbor nodes and corresponding attributes. The non-volatile memory can have sufficient capacity to store the entire graph data. The non-volatile memory can also be configured to pre-arrange the sets of next root nodes, neighbor nodes and corresponding attributes for storage in the volatile memory.
Type: Application
Filed: July 15, 2022
Publication date: January 26, 2023
Inventors: Fei Xue, Yangjie Zhou, Lide Duan, Hongzhong Zheng
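A minimal double-buffered sketch of the three-level flow this abstract implies: non-volatile memory pre-arranges each root's working set, and volatile memory stages the next set while the current one is processed. Function pointers stand in for the NVM fetch and the GNN computation; all names are illustrative.

    #include <stddef.h>

    /* Hypothetical per-root working set: neighbors plus attributes. */
    typedef struct {
        int    root;
        int   *neighbors;
        float *attributes;
        int    n_neighbors;
    } root_set_t;

    void process_roots(const int *roots, int n,
                       root_set_t dram_stage[2],                /* volatile mem */
                       root_set_t (*nvm_gather)(int root),      /* NVM fetch   */
                       void (*gnn_compute)(const root_set_t *)) /* accelerator */
    {
        if (n <= 0) return;
        dram_stage[0] = nvm_gather(roots[0]);   /* prime the pipeline */
        for (int i = 0; i < n; i++) {
            root_set_t cur = dram_stage[i % 2]; /* acts as on-chip copy */
            if (i + 1 < n)                      /* overlap: stage next set */
                dram_stage[(i + 1) % 2] = nvm_gather(roots[i + 1]);
            gnn_compute(&cur);                  /* process current root */
        }
    }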
-
Publication number: 20220244870
Abstract: A dynamic bias coherency configuration engine can include control logic, a host threshold register, a device threshold register, and a plurality of memory region monitoring units. The memory region monitoring units can include a starting page number register, an ending page number register, a host access register, and a device access register. The memory region monitoring units can be utilized by the dynamic bias coherency configuration engine to configure corresponding portions of a memory space in a device bias mode or a host bias mode.
Type: Application
Filed: February 3, 2021
Publication date: August 4, 2022
Inventors: Lide Duan, Dimin Niu, Hongzhong Zheng