Patents by Inventor Aamer Jaleel

Aamer Jaleel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

OBJECT-LEVEL METADATA LOCATOR

Publication number: 20250190373

Abstract: Metadata generally refers to data that describes, or gives information about, other data. Metadata can be used for a wide variety of purposes, including for ensuring the safety of memory accesses. For example, to prevent memory safety errors, metadata, which indicates the base address and size of the data, can be used to validate memory access requests as a prerequisite to allowing the memory access. While there are many useful applications of metadata, including for memory safety as mentioned above, the underlying metadata storage and retrieval processes that have been developed to date suffer from various problems. The present disclosure provides an object-level metadata locator, which can allow for an internal object layout to be maintained and which can scale to an arbitrary number of objects while requiring lower memory overhead than that required in the prior art.

Type: Application

Filed: July 16, 2024

Publication date: June 12, 2025

Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
SOFTWARE/HARDWARE CO-DESIGN FOR MEMORY SAFETY

Publication number: 20250190632

Abstract: Applications written in memory unsafe languages, such as C, C++, and CUDA, are vulnerable to a variety of memory safety errors because they do not validate the bounds and lifetime of memory accesses. For example, spatial memory safety errors occur when a pointer is used to access an object beyond its intended bounds while temporal memory safety errors occur when a pointer is used to access an object beyond its lifetime. Memory safety errors can lead to control-flow hijacking, silent data corruption, difficult-to-diagnose crashes, and security exploitation. Unfortunately, existing software-based solutions either provide low error detection coverage or come with significant runtime overheads, and existing hardware-accelerated GPU-based solutions have poor scalability or intrusive hardware changes. The present disclosure provides memory safety using a combination of hardware and software.

Type: Application

Filed: June 17, 2024

Publication date: June 12, 2025

Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Sana Damani, Mark Stephenson, Stephen William Keckler
Alias-free tagged error correcting codes for machine memory operations

Patent number: 12321230

Abstract: Implicit Memory Tagging (IMT) mechanisms utilizing alias-free memory tags that enable hardware-assisted memory tagging without incurring storage overhead above those incurred by conventional tagging mechanisms, while providing enhanced data integrity and memory security. The IMT mechanisms enhance the utility of error correcting codes (ECCs) to test memory tags in addition to the traditional utility of ECCs for detecting and correcting data errors and enable a finer granularity of memory tagging than many conventional approaches.

Type: Grant

Filed: October 11, 2023

Date of Patent: June 3, 2025

Assignee: NVIDIA Corp.

Inventors: Michael B. Sullivan, Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
IMPLEMENTING HARDWARE-BASED MEMORY SAFETY FOR A GRAPHIC PROCESSING UNIT

Publication number: 20250021642

Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.

Type: Application

Filed: September 30, 2024

Publication date: January 16, 2025

Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
PROBABILISTIC TRACKER MANAGEMENT FOR MEMORY ATTACK MITIGATION

Publication number: 20240403417

Abstract: Rowhammer attacks, which are malicious processes that rapidly issue access requests to memory, can impose serious security threats including being used to tamper with data, take control of entire systems, and even breach confidentiality. Current solutions to defend against these attacks are limited, as they typically employ a deterministic tracker to track the portions of memory accessed and to mitigate potential attacks accordingly. However, the deterministic nature of these trackers results in their own vulnerability. The present disclosure provides probabilistic tracker management for mitigation of rowhammer attacks and/or other memory attacks in which a row (or other defined portion of memory) is maliciously targeted to disturb contents of neighboring rows, which can prevent these types of attacks that otherwise take advantage of the determinism in previously used tracker designs.

Type: Application

Filed: December 19, 2023

Publication date: December 5, 2024

Inventors: Aamer Jaleel, Gururaj Saileshwar
Implementing hardware-based memory safety for a graphic processing unit

Patent number: 12135781

Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.

Type: Grant

Filed: December 29, 2021

Date of Patent: November 5, 2024

Assignee: NVIDIA CORPORATION

Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
ALIAS-FREE TAGGED ERROR CORRECTING CODES FOR MACHINE MEMORY OPERATIONS

Publication number: 20240184670

Abstract: Implicit Memory Tagging (IMT) mechanisms utilizing alias-free memory tags that enable hardware-assisted memory tagging without incurring storage overhead above those incurred by conventional tagging mechanisms, while providing enhanced data integrity and memory security. The IMT mechanisms enhance the utility of error correcting codes (ECCs) to test memory tags in addition to the traditional utility of ECCs for detecting and correcting data errors and enable a finer granularity of memory tagging than many conventional approaches.

Type: Application

Filed: October 11, 2023

Publication date: June 6, 2024

Applicant: NVIDIA Corp.

Inventors: Michael B. Sullivan, Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
Implementing compiler-based memory safety for a graphic processing unit

Patent number: 11836361

Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that performs memory safety checks during execution. This updated compiled source code automatically determines a safety of memory access requests during execution by performing an out-of-bounds (OOB) check using the base and bounds information retrieved and stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.

Type: Grant

Filed: December 29, 2021

Date of Patent: December 5, 2023

Assignee: NVIDIA CORPORATION

Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Mark Stephenson, Michael Sullivan
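Illustrative note: the base-and-bounds check summarized in the abstract above can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation. The names `Bounds`, `MetadataTable`, `record_allocation`, and `is_in_bounds` are hypothetical, and keying the table by an integer allocation id is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    base: int  # first valid byte address of the allocation
    size: int  # allocation size in bytes


class MetadataTable:
    """Hypothetical table mapping an allocation id to its base and bounds."""

    def __init__(self) -> None:
        self._entries = {}

    def record_allocation(self, alloc_id: int, base: int, size: int) -> None:
        # Stands in for the code a compiler would insert at an allocation site.
        self._entries[alloc_id] = Bounds(base, size)

    def is_in_bounds(self, alloc_id: int, addr: int, width: int) -> bool:
        # Out-of-bounds (OOB) check: the whole access [addr, addr + width)
        # must fall inside [base, base + size). Unknown ids are rejected.
        bounds = self._entries.get(alloc_id)
        if bounds is None or width <= 0:
            return False
        return bounds.base <= addr and addr + width <= bounds.base + bounds.size


table = MetadataTable()
table.record_allocation(alloc_id=1, base=0x1000, size=64)
print(table.is_in_bounds(1, 0x1000, 4))  # True: access starts at the base
print(table.is_in_bounds(1, 0x103E, 4))  # False: access runs past the end
```

In this sketch the check runs in software on every access, which corresponds to the compiler-based variant; the hardware-based entries in this listing describe performing the same comparison in hardware.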
IMPLEMENTING HARDWARE-BASED MEMORY SAFETY FOR A GRAPHIC PROCESSING UNIT

Publication number: 20230061154

Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.

Type: Application

Filed: December 29, 2021

Publication date: March 2, 2023

Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
IMPLEMENTING COMPILER-BASED MEMORY SAFETY FOR A GRAPHIC PROCESSING UNIT

Publication number: 20230063568

Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that performs memory safety checks during execution. This updated compiled source code automatically determines a safety of memory access requests during execution by performing an out-of-bounds (OOB) check using the base and bounds information retrieved and stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.

Type: Application

Filed: December 29, 2021

Publication date: March 2, 2023

Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Mark Stephenson, Michael Sullivan
Processor and method implementing a cacheline demote machine instruction

Patent number: 11513957

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in a multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: September 21, 2020

Date of Patent: November 29, 2022

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
HARDWARE/SOFTWARE CO-OPTIMIZATION TO IMPROVE PERFORMANCE AND ENERGY FOR INTER-VM COMMUNICATION FOR NFVS AND OTHER PRODUCER-CONSUMER WORKLOADS

Publication number: 20210004328

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in a multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Application

Filed: September 21, 2020

Publication date: January 7, 2021

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
Executing distributed memory operations using processing elements connected by distributed channels

Patent number: 10853276

Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.

Type: Grant

Filed: June 17, 2019

Date of Patent: December 1, 2020

Assignee: Intel Corporation

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
Hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads

Patent number: 10817425

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in a multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: December 26, 2014

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
EXECUTING DISTRIBUTED MEMORY OPERATIONS USING PROCESSING ELEMENTS CONNECTED BY DISTRIBUTED CHANNELS

Publication number: 20190303312

Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.

Type: Application

Filed: June 17, 2019

Publication date: October 3, 2019

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features

Patent number: 10387319

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.

Type: Grant

Filed: July 1, 2017

Date of Patent: August 20, 2019

Assignee: Intel Corporation

Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury
Executing distributed memory operations using processing elements connected by distributed channels

Patent number: 10331583

Abstract: A processing device for executing distributed memory operations using spatial processing units (SPU) connected by distributed channels is disclosed. A distributed channel may or may not be associated with memory operations, such as load operations or store operations. Distributed channel information is obtained for an algorithm to be executed by a group of spatially distributed processing elements. The group of spatially distributed processing elements can be connected to a shared memory controller. For each distributed channel in the distributed channel information, one or more of the group of spatially distributed processing elements may be associated with the distributed channel based on the algorithm. By associating the spatially distributed processing elements to a distributed channel, the functionality of the processing element can vary depending on the algorithm mapped onto the SPU.

Type: Grant

Filed: September 26, 2013

Date of Patent: June 25, 2019

Assignee: Intel Corporation

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
Technologies for network device flow lookup management

Patent number: 10284470

Abstract: Technologies for managing network flow lookups of a network device include a network controller and a target device, each communicatively coupled to the network device. The network device includes a cache for a processor of the network device and a main memory. The network device additionally includes a multi-level hash table having a first-level hash table stored in the cache of the network device and a second-level hash table stored in the main memory of the network device. The network device is configured to determine whether to store a network flow hash corresponding to a network flow indicating the target device in the first-level or second-level hash table based on a priority of the network flow provided to the network device by the network controller.

Type: Grant

Filed: December 23, 2014

Date of Patent: May 7, 2019

Assignee: Intel Corporation

Inventors: Ren Wang, Namakkal N. Venkatesan, Aamer Jaleel, Tsung-Yuan C. Tai, Sameh Gobriel, Christian Maciocco
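Illustrative note: the priority-based placement described in the abstract above can be sketched as follows. This is a rough model under stated assumptions, not the patented design. The class `TwoLevelFlowTable`, the integer priority scale, the `priority_threshold` rule, and the capacity fallback are all hypothetical; two Python dictionaries stand in for the cache-resident first-level table and the main-memory second-level table.

```python
class TwoLevelFlowTable:
    """Hypothetical two-level table: a small first level (modeling the
    processor cache) and an unbounded second level (modeling main memory)."""

    def __init__(self, first_level_capacity: int, priority_threshold: int) -> None:
        self.first_level = {}
        self.second_level = {}
        self.first_level_capacity = first_level_capacity
        self.priority_threshold = priority_threshold

    def insert(self, flow_hash: int, target: str, priority: int) -> str:
        # Remove any stale copy so a flow lives in exactly one level.
        self.first_level.pop(flow_hash, None)
        self.second_level.pop(flow_hash, None)
        # Assumed rule: high-priority flows go to the first level while it
        # has room; everything else is stored in the second level.
        if (priority >= self.priority_threshold
                and len(self.first_level) < self.first_level_capacity):
            self.first_level[flow_hash] = target
            return "first"
        self.second_level[flow_hash] = target
        return "second"

    def lookup(self, flow_hash: int):
        # Probe the fast level first, then fall back to the slow level.
        if flow_hash in self.first_level:
            return self.first_level[flow_hash]
        return self.second_level.get(flow_hash)


table = TwoLevelFlowTable(first_level_capacity=1, priority_threshold=5)
print(table.insert(0xA1, "device-1", priority=9))  # first
print(table.insert(0xB2, "device-2", priority=1))  # second
print(table.lookup(0xB2))                          # device-2
```

In the abstract, the priority is supplied by the network controller; here it is simply passed as an argument.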
PROCESSORS, METHODS, AND SYSTEMS FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH MEMORY SYSTEM PERFORMANCE, POWER REDUCTION, AND ATOMICS SUPPORT FEATURES

Publication number: 20190004955

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.

Type: Application

Filed: July 1, 2017

Publication date: January 3, 2019

Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury
Instruction and logic for run-time evaluation of multiple prefetchers

Patent number: 10102134

Abstract: A processor includes a cache, a prefetcher module to select information according to a prefetcher algorithm, and a prefetcher algorithm selection module. The prefetcher algorithm selection module includes logic to select a candidate prefetcher algorithm, determine and store memory addresses of predicted memory accesses of the candidate prefetcher algorithm when performed by the prefetcher module, determine cache lines accessed during memory operations, and evaluate whether the determined cache lines match the stored memory addresses. The prefetcher algorithm selection module further includes logic to adjust an accuracy ratio of the candidate prefetcher algorithm, compare the accuracy ratio with a threshold accuracy ratio, and determine whether to apply the first candidate prefetcher algorithm to the prefetcher module.

Type: Grant

Filed: June 23, 2016

Date of Patent: October 16, 2018

Assignee: Intel Corporation

Inventors: Zeshan A. Chishti, Christopher B. Wilkerson, Seth Pugsley, Peng-Fei Chuang, Robert L. Scott, Aamer Jaleel, Shih-Lien L. Lu, Kingsum Chow
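Illustrative note: the accuracy evaluation described in the abstract above can be approximated with a short Python sketch. It is a simplification under stated assumptions: the abstract describes adjusting an accuracy ratio as accesses are observed, whereas this sketch computes the ratio in one batch. The function names `accuracy_ratio` and `should_apply` and the sample addresses are hypothetical.

```python
def accuracy_ratio(predicted_addresses, accessed_lines) -> float:
    """Fraction of a candidate prefetcher's predicted addresses that were
    actually accessed afterwards (0.0 when nothing was predicted)."""
    predicted = set(predicted_addresses)
    if not predicted:
        return 0.0
    hits = len(predicted & set(accessed_lines))
    return hits / len(predicted)


def should_apply(predicted_addresses, accessed_lines, threshold: float) -> bool:
    # Apply the candidate algorithm only if its measured accuracy
    # meets the threshold accuracy ratio.
    return accuracy_ratio(predicted_addresses, accessed_lines) >= threshold


predicted = [0x100, 0x140, 0x180, 0x1C0]  # addresses the candidate would prefetch
accessed = [0x100, 0x180, 0x200]          # cache lines the program really touched
print(accuracy_ratio(predicted, accessed))            # 0.5
print(should_apply(predicted, accessed, threshold=0.75))  # False
```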
