Patents by Inventor Aamer Jaleel
Aamer Jaleel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250190373
Abstract: Metadata generally refers to data that describes, or gives information about, other data. Metadata can be used for a wide variety of purposes, including for ensuring the safety of memory accesses. For example, to prevent memory safety errors, metadata, which indicates the base address and size of the data, can be used to validate memory access requests as prerequisite to allowing the memory access. While there are many useful applications of metadata, including for memory safety as mentioned above, the underlying metadata storage and retrieval processes that have been developed to date suffer from various problems. The present disclosure provides an object-level metadata locator, which can allow for an internal object layout to be maintained and which can scale to an arbitrary number of objects while requiring lower memory overhead than that required in the prior art.
Type: Application
Filed: July 16, 2024
Publication date: June 12, 2025
Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
-
Publication number: 20250190632
Abstract: Applications written in memory unsafe languages, such as C, C++, and CUDA, are vulnerable to a variety of memory safety errors because they do not validate the bounds and lifetime of memory accesses. For example, spatial memory safety errors occur when a pointer is used to access an object beyond its intended bounds while temporal memory safety errors occur when a pointer is used to access an object beyond its lifetime. Memory safety errors can lead to control-flow hijacking, silent data corruption, difficult-to-diagnose crashes, and security exploitation. Unfortunately, existing software-based solutions either provide low error detection coverage or come with significant runtime overheads, and existing hardware-accelerated GPU-based solutions have poor scalability or intrusive hardware changes. The present disclosure provides memory safety using a combination of hardware and software.
Type: Application
Filed: June 17, 2024
Publication date: June 12, 2025
Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Sana Damani, Mark Stephenson, Stephen William Keckler
-
Patent number: 12321230
Abstract: Implicit Memory Tagging (IMT) mechanisms utilizing alias-free memory tags that enable hardware-assisted memory tagging without incurring storage overhead above those incurred by conventional tagging mechanisms, while providing enhanced data integrity and memory security. The IMT mechanisms enhance the utility of error correcting codes (ECCs) to test memory tags in addition to the traditional utility of ECCs for detecting and correcting data errors and enable a finer granularity of memory tagging than many conventional approaches.
Type: Grant
Filed: October 11, 2023
Date of Patent: June 3, 2025
Assignee: NVIDIA Corp.
Inventors: Michael B Sullivan, Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
-
Publication number: 20250021642
Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.
Type: Application
Filed: September 30, 2024
Publication date: January 16, 2025
Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
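Illustrative sketch (not taken from the filing): the base-and-bounds check described in the abstract above can be modeled in a few lines of Python. The names `MetadataTable`, `register`, and `is_in_bounds` are hypothetical, and the table is an ordinary dictionary rather than the hardware-visible structure the abstract refers to. The sketch only shows the arithmetic of an out-of-bounds (OOB) check against stored base and size values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    base: int  # first valid address of the allocation
    size: int  # allocation size in bytes


class MetadataTable:
    """Hypothetical table mapping an allocation ID to its base and bounds."""

    def __init__(self):
        self._entries = {}
        self._next_id = 0

    def register(self, base, size):
        """Record an allocation and return the ID used to look it up later."""
        alloc_id = self._next_id
        self._next_id += 1
        self._entries[alloc_id] = Bounds(base, size)
        return alloc_id

    def is_in_bounds(self, alloc_id, address, width):
        """Return True only if [address, address + width) lies inside the allocation."""
        bounds = self._entries.get(alloc_id)
        if bounds is None:
            # Unknown allocation: treat the access as unsafe.
            return False
        return bounds.base <= address and address + width <= bounds.base + bounds.size
```

For example, after `i = table.register(0x1000, 16)`, a 4-byte access at `0x100C` passes the check, while the same access at `0x100D` fails because its last byte falls outside the 16-byte allocation.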
-
Publication number: 20240403417
Abstract: Rowhammer attacks, which are malicious processes that rapidly issue access requests to memory, can impose serious security threats including being used to tamper data, take control of entire systems, and even breach confidentiality. Current solutions to defend against these attacks are limited, as they typically employ a deterministic tracker to track the portions of memory accessed and to mitigate potential attacks accordingly. However, the deterministic nature of these trackers results in their own vulnerability. The present disclosure provides probabilistic tracker management for mitigation of rowhammer attacks and/or other memory attacks in which a row (or other defined portion of memory) is maliciously targeted to disturb contents of neighboring rows, which can prevent these types of attacks that otherwise take advantage of the determinism in prior used tracker designs.
Type: Application
Filed: December 19, 2023
Publication date: December 5, 2024
Inventors: Aamer Jaleel, Gururaj Saileshwar
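Illustrative sketch (not taken from the filing): the abstract above does not say how the probabilistic tracker is managed, so the following Python model makes its own assumptions. It samples activated rows into a small tracker with a fixed probability, replaces a random entry when the tracker is full, and picks a random tracked row for mitigation. The class name, method names, and all three policies are hypothetical; the point is only that an attacker cannot predict which rows are tracked.

```python
import random


class ProbabilisticTracker:
    """Hypothetical fixed-capacity tracker with randomized insertion and eviction."""

    def __init__(self, capacity, insert_probability, rng=None):
        self.capacity = capacity
        self.insert_probability = insert_probability
        # An injectable random source keeps experiments reproducible.
        self.rng = rng if rng is not None else random.Random()
        self.rows = []

    def on_activate(self, row):
        """Sample an activated row into the tracker with a fixed probability."""
        if self.rng.random() >= self.insert_probability:
            return
        if len(self.rows) < self.capacity:
            self.rows.append(row)
        else:
            # Assumed policy: overwrite a uniformly random entry when full.
            self.rows[self.rng.randrange(self.capacity)] = row

    def select_for_mitigation(self):
        """Remove and return a random tracked row, or None if nothing is tracked."""
        if not self.rows:
            return None
        return self.rows.pop(self.rng.randrange(len(self.rows)))
```

A memory controller model would call `on_activate` for each row activation and `select_for_mitigation` whenever it has a slot to refresh the neighbors of a suspected aggressor row.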
-
Patent number: 12135781
Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.
Type: Grant
Filed: December 29, 2021
Date of Patent: November 5, 2024
Assignee: NVIDIA CORPORATION
Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
-
Publication number: 20240184670
Abstract: Implicit Memory Tagging (IMT) mechanisms utilizing alias-free memory tags that enable hardware-assisted memory tagging without incurring storage overhead above those incurred by conventional tagging mechanisms, while providing enhanced data integrity and memory security. The IMT mechanisms enhance the utility of error correcting codes (ECCs) to test memory tags in addition to the traditional utility of ECCs for detecting and correcting data errors and enable a finer granularity of memory tagging than many conventional approaches.
Type: Application
Filed: October 11, 2023
Publication date: June 6, 2024
Applicant: NVIDIA Corp.
Inventors: Michael B. Sullivan, Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel
-
Patent number: 11836361
Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that performs memory safety checks during execution. This updated compiled source code automatically determines a safety of memory access requests during execution by performing an out-of-bounds (OOB) check using the base and bounds information retrieved and stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.
Type: Grant
Filed: December 29, 2021
Date of Patent: December 5, 2023
Assignee: NVIDIA CORPORATION
Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Mark Stephenson, Michael Sullivan
-
Publication number: 20230061154
Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that enables hardware to determine a safety of memory access requests during an implementation of the compiled source code by performing an out-of-bounds (OOB) check in hardware using the base and bounds information stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.
Type: Application
Filed: December 29, 2021
Publication date: March 2, 2023
Inventors: Aamer Jaleel, Mohamed Tarek Bnziad Mohamed Hassan, Mark Stephenson
-
Publication number: 20230063568
Abstract: While a compiler compiles source code to create an executable binary, code is added into the compiled source code that, when executed, identifies and stores in a metadata table base and bounds information associated with memory allocations. Additionally, additional code is added into the compiled source code that performs memory safety checks during execution. This updated compiled source code automatically determines a safety of memory access requests during execution by performing an out-of-bounds (OOB) check using the base and bounds information retrieved and stored in the metadata table. This enables the identification and avoidance of unsafe memory operations during the implementation of the executable by a GPU.
Type: Application
Filed: December 29, 2021
Publication date: March 2, 2023
Inventors: Mohamed Tarek Bnziad Mohamed Hassan, Aamer Jaleel, Mark Stephenson, Michael Sullivan
-
Patent number: 11513957
Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Type: Grant
Filed: September 21, 2020
Date of Patent: November 29, 2022
Assignee: Intel Corporation
Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Publication number: 20210004328
Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Type: Application
Filed: September 21, 2020
Publication date: January 7, 2021
Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Patent number: 10853276
Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
Type: Grant
Filed: June 17, 2019
Date of Patent: December 1, 2020
Assignee: Intel Corporation
Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Patent number: 10817425
Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Type: Grant
Filed: December 26, 2014
Date of Patent: October 27, 2020
Assignee: Intel Corporation
Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Publication number: 20190303312
Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
Type: Application
Filed: June 17, 2019
Publication date: October 3, 2019
Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Patent number: 10387319
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
Type: Grant
Filed: July 1, 2017
Date of Patent: August 20, 2019
Assignee: Intel Corporation
Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury
-
Patent number: 10331583
Abstract: A processing device for executing distributed memory operations using spatial processing units (SPU) connected by distributed channels is disclosed. A distributed channel may or may not be associated with memory operations, such as load operations or store operations. Distributed channel information is obtained for an algorithm to be executed by a group of spatially distributed processing elements. The group of spatially distributed processing elements can be connected to a shared memory controller. For each distributed channel in the distributed channel information, one or more of the group of spatially distributed processing elements may be associated with the distributed channel based on the algorithm. By associating the spatially distributed processing elements to a distributed channel, the functionality of the processing element can vary depending on the algorithm mapped onto the SPU.
Type: Grant
Filed: September 26, 2013
Date of Patent: June 25, 2019
Assignee: Intel Corporation
Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Patent number: 10284470
Abstract: Technologies for managing network flow lookups of a network device include a network controller and a target device, each communicatively coupled to the network device. The network device includes a cache for a processor of the network device and a main memory. The network device additionally includes a multi-level hash table having a first-level hash table stored in the cache of the network device and a second-level hash table stored in the main memory of the network device. The network device is configured to determine whether to store a network flow hash corresponding to a network flow indicating the target device in the first-level or second-level hash table based on a priority of the network flow provided to the network device by the network controller.
Type: Grant
Filed: December 23, 2014
Date of Patent: May 7, 2019
Assignee: Intel Corporation
Inventors: Ren Wang, Namakkal N. Venkatesan, Aamer Jaleel, Tsung-Yuan C. Tai, Sameh Gobriel, Christian Maciocco
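Illustrative sketch (not taken from the patent): the priority-based placement described in the abstract above can be modeled with two Python dictionaries, one standing in for the small cache-resident first-level table and one for the larger second-level table in main memory. The class name, the `priority_threshold` parameter, and the fallback to the second level when the first is full are assumptions made for this example.

```python
class TwoLevelFlowTable:
    """Hypothetical two-level flow table; dicts stand in for cache and main memory."""

    def __init__(self, first_level_capacity, priority_threshold):
        self.first_level_capacity = first_level_capacity
        self.priority_threshold = priority_threshold
        self.first_level = {}   # models the table held in the processor cache
        self.second_level = {}  # models the table held in main memory

    def insert(self, flow_hash, target, priority):
        """Store a flow and return the level (1 or 2) it was placed in."""
        # Drop any stale copy so a flow lives in exactly one level.
        self.first_level.pop(flow_hash, None)
        self.second_level.pop(flow_hash, None)
        has_room = len(self.first_level) < self.first_level_capacity
        if priority >= self.priority_threshold and has_room:
            self.first_level[flow_hash] = target
            return 1
        self.second_level[flow_hash] = target
        return 2

    def lookup(self, flow_hash):
        """Return (target, level), checking the first level before the second."""
        if flow_hash in self.first_level:
            return self.first_level[flow_hash], 1
        if flow_hash in self.second_level:
            return self.second_level[flow_hash], 2
        return None, 0
```

With a capacity of one and a threshold of 5, a flow inserted at priority 9 lands in level 1, and a later flow at priority 9 falls through to level 2 because the first level is already full.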
-
Publication number: 20190004955
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
Type: Application
Filed: July 1, 2017
Publication date: January 3, 2019
Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, JR., Samantika S. Sury
-
Patent number: 10102134
Abstract: A processor includes a cache, a prefetcher module to select information according to a prefetcher algorithm, and a prefetcher algorithm selection module. The prefetcher algorithm selection module includes logic to select a candidate prefetcher algorithm, determine and store memory addresses of predicted memory accesses of the candidate prefetcher algorithm when performed by the prefetcher module, determine cache lines accessed during memory operations, and evaluate whether the determined cache lines match the stored memory addresses. The prefetcher algorithm selection module further includes logic to adjust an accuracy ratio of the candidate prefetcher algorithm, compare the accuracy ratio with a threshold accuracy ratio, and determine whether to apply the first candidate prefetcher algorithm to the prefetcher module.
Type: Grant
Filed: June 23, 2016
Date of Patent: October 16, 2018
Assignee: Intel Corporation
Inventors: Zeshan A. Chishti, Christopher B. Wilkerson, Seth Pugsley, Peng-Fei Chuang, Robert L. Scott, Aamer Jaleel, Shih-Lien L. Lu, Kingsum Chow
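Illustrative sketch (not taken from the patent): the evaluation loop described in the abstract above, in which a candidate's predicted addresses are stored, compared against the cache lines actually accessed, and scored as an accuracy ratio against a threshold, can be modeled as follows. The class name, the 64-byte line size, the definition of the ratio as hits divided by predictions issued, and the next-line predictor in the usage note are all assumptions for this example.

```python
class CandidateEvaluator:
    """Hypothetical scorer for one candidate prefetcher algorithm."""

    def __init__(self, predict, threshold, line_size=64):
        self.predict = predict      # callable: address -> iterable of predicted addresses
        self.threshold = threshold  # minimum accuracy ratio needed to apply the candidate
        self.line_size = line_size
        self.predicted_lines = set()
        self.hits = 0
        self.predictions = 0

    def observe(self, address):
        """Score one demand access, then record the candidate's new predictions."""
        line = address // self.line_size
        if line in self.predicted_lines:
            # The access matched an earlier prediction.
            self.hits += 1
            self.predicted_lines.discard(line)
        for predicted in self.predict(address):
            predicted_line = predicted // self.line_size
            if predicted_line not in self.predicted_lines:
                self.predicted_lines.add(predicted_line)
                self.predictions += 1

    def accuracy(self):
        """Return hits per prediction issued, or 0.0 before any prediction."""
        return self.hits / self.predictions if self.predictions else 0.0

    def should_apply(self):
        return self.accuracy() >= self.threshold
```

With a next-line candidate such as `lambda a: [a + 64]`, a sequential stream of accesses at 0, 64, 128, and 192 yields three hits out of four predictions, so the candidate clears a 0.5 threshold, while a stream striding by 4096 bytes never matches and the candidate is rejected.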