Patents by Inventor David Nellans
David Nellans has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260119172
Abstract: In a system including a processing unit and a set of one or more stacked memory chips, the processing unit can request data. When the data is distributed such that there is at least one non-contiguous memory sector in the smallest unit of memory segments usable by the system, then a gather operation can be utilized to instruct the set of one or more stacked memory chips to gather the requested data into a virtual address space, e.g., a gather accelerated address space. The requested data can be aligned to the byte chunk size used by the processing unit and at least some of the unneeded memory segments can be skipped, e.g., not copied into the virtual address space. The requested data in the virtual address space can be communicated to the processing unit using less bandwidth resources than when not using the gather operation.
Type: Application
Filed: October 24, 2024
Publication date: April 30, 2026
Inventors: Donghyuk Lee, James Michael O'Connor, David Nellans, Niladrish Chatterjee
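A minimal Python sketch of the general gather idea, not the claimed implementation: the sector, segment, and chunk sizes and both function names are assumptions chosen for illustration. It packs only the requested sectors into one contiguous, chunk-aligned buffer and compares the bytes moved against sending every touched segment whole.

```python
# Illustrative sizes only; the abstract does not specify them.
SECTOR_BYTES = 32    # assumed memory sector size
SEGMENT_BYTES = 128  # assumed smallest memory segment the system can request
CHUNK_BYTES = 64     # assumed byte chunk size used by the processing unit


def naive_transfer_bytes(requested_sectors):
    """Bytes moved when every segment that holds a requested sector is sent whole."""
    segments = {s * SECTOR_BYTES // SEGMENT_BYTES for s in requested_sectors}
    return len(segments) * SEGMENT_BYTES


def gather(memory: bytes, requested_sectors) -> bytes:
    """Pack only the requested sectors into one contiguous buffer.

    Unrequested sectors are skipped, and the result is zero-padded up to a
    multiple of CHUNK_BYTES so it aligns with the processing unit's chunk size.
    """
    out = bytearray()
    for sector in sorted(requested_sectors):
        start = sector * SECTOR_BYTES
        out += memory[start:start + SECTOR_BYTES]
    out += bytes((-len(out)) % CHUNK_BYTES)
    return bytes(out)


if __name__ == "__main__":
    memory = bytes(range(256)) * 4  # 1 KiB of sample data
    wanted = [0, 5, 10, 15]         # one sector from each of four segments
    print(naive_transfer_bytes(wanted), len(gather(memory, wanted)))  # 512 128
```

With one useful sector per segment, the sketch moves 128 bytes instead of 512; the saving shrinks as more sectors per segment are requested.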
-
Patent number: 12566704
Abstract: Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
Type: Grant
Filed: April 2, 2024
Date of Patent: March 3, 2026
Assignee: NVIDIA CORPORATION
Inventors: Nicolai Alexander Oswald, Evgeny Bolotin, Daniel Joseph Lustig, David Nellans, Sean J. Treichler
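A toy Python model of the PoC migration step described above, offered as a sketch under assumptions rather than the patented mechanism: the `DirEntry` and `Cache` classes, the depth-first token search, and the scope-group name are hypothetical, and the identity of the requesting (second) cache is left out for brevity. When a root cache whose directory lists children receives a request, it pulls the PoC token up from whichever descendant holds it and marks itself as the PoC.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical per-scope-group directory entry held by one cache."""
    children: set = field(default_factory=set)  # names of child caches in the tree
    is_root: bool = False
    is_poc: bool = False  # True while this cache holds the PoC token


@dataclass
class Cache:
    name: str
    directory: dict = field(default_factory=dict)  # scope group -> DirEntry


def find_token_holder(cache, scope, caches):
    """Depth-first search of `cache` and its descendants for the PoC token."""
    entry = cache.directory.get(scope)
    if entry is None:
        return None
    if entry.is_poc:
        return cache
    for child in entry.children:
        holder = find_token_holder(caches[child], scope, caches)
        if holder is not None:
            return holder
    return None


def handle_request(cache, scope, caches):
    """On a request for `scope`, a root with children pulls the PoC token to itself."""
    entry = cache.directory.get(scope)
    if entry is None:
        return False
    if entry.is_root and entry.children and not entry.is_poc:
        holder = find_token_holder(cache, scope, caches)
        if holder is not None:
            holder.directory[scope].is_poc = False  # descendant gives up the token
            entry.is_poc = True                     # root becomes the new PoC
    return entry.is_poc
```

For example, with an L2 root whose only child L1a holds the token, `handle_request` on L2 moves the token so that L2 becomes the PoC and L1a no longer is.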
-
Publication number: 20260037310
Abstract: While the capabilities of GPUs are being consistently enhanced with each new generation thereby enabling them to process data at a faster rate, many applications configured to execute on the GPU do not exploit the full potential of a GPU. To better utilize GPU resources and to more efficiently run applications, applications can be co-scheduled on the GPU such that the GPU concurrently executes processes of the co-scheduled applications. However, current GPU scheduling solutions are limited in that they either do not consider the QoS requirements of an application or do not allow for dynamic allocations during application execution. The present disclosure provides for dynamic allocation of GPU resources for concurrent processes which can optimize GPU resource utilization while minimizing power consumption and adhering to QoS requirements of each application.
Type: Application
Filed: July 31, 2024
Publication date: February 5, 2026
Inventors: Harini Muthukrishnan, Oreste Villa, David Nellans
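One simple way to picture QoS-aware co-scheduling is sketched below in Python; it is an illustration under stated assumptions, not the disclosed method. The sketch assumes GPU resources are handed out as whole streaming multiprocessors (SMs), that each application's throughput scales linearly with its SM count, and that unneeded SMs can be power-gated; `allocate_sms` and its inputs are hypothetical. Re-running it with fresh throughput measurements gives a crude form of dynamic reallocation.

```python
import math


def allocate_sms(apps, total_sms):
    """Give each co-scheduled app the fewest SMs that still meet its QoS target.

    apps maps a name to (qos_target, measured_throughput_per_sm); throughput is
    assumed to scale linearly with SM count. SMs left over can be power-gated.
    Calling this again with fresh measurements re-balances the allocation.
    """
    alloc = {
        name: max(1, math.ceil(target / per_sm))
        for name, (target, per_sm) in apps.items()
    }
    used = sum(alloc.values())
    if used > total_sms:
        raise ValueError("QoS targets cannot all be met on this GPU")
    return alloc, total_sms - used
```

For instance, `allocate_sms({"a": (100, 10), "b": (45, 15)}, 20)` returns `({"a": 10, "b": 3}, 7)`, leaving seven SMs idle for power savings.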
-
Publication number: 20250307149
Abstract: Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
Type: Application
Filed: April 2, 2024
Publication date: October 2, 2025
Inventors: Nicolai Alexander Oswald, Evgeny Bolotin, Daniel Joseph Lustig, David Nellans, Sean J. Treichler
-
Publication number: 20250307152
Abstract: Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
Type: Application
Filed: April 2, 2024
Publication date: October 2, 2025
Inventors: Nicolai Alexander Oswald, Evgeny Bolotin, Daniel Joseph Lustig, David Nellans, Sean J. Treichler
-
Publication number: 20250307148
Abstract: Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
Type: Application
Filed: April 2, 2024
Publication date: October 2, 2025
Inventors: Nicolai Alexander Oswald, Evgeny Bolotin, Daniel Joseph Lustig, David Nellans, Sean J. Treichler
-
Patent number: 12130750
Abstract: Computer systems often employ virtual address translation hierarchies in which virtual memory addresses are mapped to physical memory. Use of the virtual address translation hierarchy speeds up the virtual address translation when the required mapping is stored in one of the higher levels of the hierarchy. To reduce a number of misses occurring in the virtual address translation hierarchy, huge memory pages may be selectively employed, which map larger continuous regions of virtual memory to continuous regions of physical memory, thereby increasing the coverage of each entry in the virtual address translation hierarchy. The present disclosure provides hardware support for optimizing this huge memory page selection.
Type: Grant
Filed: March 6, 2023
Date of Patent: October 29, 2024
Assignee: NVIDIA CORPORATION
Inventors: Aninda Manocha, Zi Yan, David Nellans
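The coverage argument in this abstract is simple arithmetic: a translation cache level with a fixed number of entries covers entries × page size bytes. The Python sketch below assumes a 1024-entry level and the familiar 4 KiB base and 2 MiB huge page sizes for illustration. `pick_huge_page_candidates` is a hypothetical "most misses first" rule included only to show what a selection policy consumes; it is not the hardware support the patent describes.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Memory covered by a fully populated translation cache level."""
    return entries * page_bytes


def pick_huge_page_candidates(tlb_misses_by_region, budget):
    """Hypothetical policy: promote the `budget` regions with the most misses."""
    ranked = sorted(tlb_misses_by_region.items(), key=lambda kv: (-kv[1], kv[0]))
    return [region for region, _ in ranked[:budget]]


if __name__ == "__main__":
    print(tlb_reach(1024, 4 * KIB) // MIB, "MiB")  # 4 MiB with 4 KiB pages
    print(tlb_reach(1024, 2 * MIB) // GIB, "GiB")  # 2 GiB with 2 MiB pages
```

Under these assumptions the same 1024 entries cover 512 times more memory with huge pages, which is why choosing the right regions to promote matters.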
-
Publication number: 20240303201
Abstract: Computer systems often employ virtual address translation hierarchies in which virtual memory addresses are mapped to physical memory. Use of the virtual address translation hierarchy speeds up the virtual address translation when the required mapping is stored in one of the higher levels of the hierarchy. To reduce a number of misses occurring in the virtual address translation hierarchy, huge memory pages may be selectively employed, which map larger continuous regions of virtual memory to continuous regions of physical memory, thereby increasing the coverage of each entry in the virtual address translation hierarchy. The present disclosure provides hardware support for optimizing this huge memory page selection.
Type: Application
Filed: March 6, 2023
Publication date: September 12, 2024
Inventors: Aninda Manocha, Zi Yan, David Nellans
-
Patent number: 11880261
Abstract: A system, method, and apparatus of power management for computing systems are included herein that optimize individual frequencies of components of the computing systems using machine learning. The computing systems can be tightly integrated systems that consider an overall operating budget that is shared between the components of the computing system while adjusting the frequencies of the individual components. An example of an automated method of power management includes: (1) learning, using a power management (PM) agent, frequency settings for different components of a computing system during execution of a repetitive application, and (2) adjusting the frequency settings of the different components using the PM agent, wherein the adjusting is based on the repetitive application and one or more limitations corresponding to a shared operating budget for the computing system.
Type: Grant
Filed: March 31, 2022
Date of Patent: January 23, 2024
Assignee: NVIDIA Corporation
Inventors: Evgeny Bolotin, Yaosheng Fu, Zi Yan, Gal Dalal, Shie Mannor, David Nellans
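The two steps listed in this abstract (learn settings over repeated runs, then apply them within a shared budget) can be illustrated with a small Python sketch. It swaps in a plain epsilon-greedy bandit as a stand-in for the learned PM agent, so it is not the patented method. The frequency levels, the linear power model, the 220 W budget, and all function names are hypothetical. Each call to `run_iteration` stands for one iteration of the repetitive application, and settings that exceed the shared budget are never tried.

```python
import itertools
import random

# Hypothetical frequency levels (GHz) and a toy linear power model (W per GHz).
LEVELS = {"cpu": [1.0, 2.0, 3.0], "gpu": [0.8, 1.4, 2.0]}
WATTS_PER_GHZ = {"cpu": 20.0, "gpu": 100.0}
BUDGET_W = 220.0  # assumed operating budget shared by both components


def power(setting):
    return sum(WATTS_PER_GHZ[c] * f for c, f in setting.items())


def feasible_settings():
    """All joint frequency settings that respect the shared budget."""
    names = sorted(LEVELS)
    for combo in itertools.product(*(LEVELS[n] for n in names)):
        setting = dict(zip(names, combo))
        if power(setting) <= BUDGET_W:
            yield setting


def learn_settings(run_iteration, iterations=50, epsilon=0.1, seed=0):
    """Epsilon-greedy search over feasible settings, one application iteration per trial.

    run_iteration(setting) returns a reward, e.g. measured throughput.
    """
    rng = random.Random(seed)
    arms = list(feasible_settings())
    totals = [0.0] * len(arms)
    counts = [0] * len(arms)

    def mean(k):
        return totals[k] / counts[k] if counts[k] else float("-inf")

    for i in range(iterations):
        if i < len(arms):
            arm = i                         # try every feasible setting once
        elif rng.random() < epsilon:
            arm = rng.randrange(len(arms))  # occasional exploration
        else:
            arm = max(range(len(arms)), key=mean)
        totals[arm] += run_iteration(arms[arm])
        counts[arm] += 1
    return arms[max(range(len(arms)), key=mean)]
```

With a toy reward of `cpu + 2 * gpu`, the sketch settles on `{"cpu": 3.0, "gpu": 1.4}`: the highest-frequency GPU setting is rejected because it only fits the budget with the slowest CPU, which scores lower.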
-
Publication number: 20230137205
Abstract: Introduced herein is a technique that uses ML to autonomously find a cache management policy that achieves an optimal execution of a given workload of an application. Leveraging ML such as reinforcement learning, the technique trains an agent in an ML environment over multiple episodes of a stabilization process. For each time step in these training episodes, the agent executes the application while making an incremental change to the current policy, i.e., cache-residency statuses of memory address space associated with the workload, until the application can be executed at a stable level. The stable level of execution, for example, can be indicated by performance variations, such as standard deviations, between a certain number of neighboring measurement periods remaining within a certain threshold. The agent, who has been trained in the training episodes, infers the final cache management policy during the final, inferring episode.
Type: Application
Filed: October 29, 2021
Publication date: May 4, 2023
Inventors: Yaosheng Fu, Shie Mannor, Evgeny Bolotin, David Nellans, Gal Dalal
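The stability test in this abstract (standard deviation across a number of neighboring measurement periods staying within a threshold) translates directly into code. In the Python sketch below, the window of 4 periods, the 0.5 threshold, and both function names are assumptions. `measure_step` is a hypothetical stand-in for executing one time step with an incremental policy change and returning the measured performance; no learning happens here.

```python
import statistics


def is_stable(measurements, window=4, threshold=0.5):
    """True once the last `window` measurement periods vary by at most `threshold`.

    Variation is the population standard deviation of those neighboring
    periods, one of the example metrics the abstract mentions.
    """
    if len(measurements) < window:
        return False
    return statistics.pstdev(measurements[-window:]) <= threshold


def steps_until_stable(measure_step, max_steps=1000, **kwargs):
    """Call measure_step() once per time step until the run is stable."""
    history = []
    for step in range(1, max_steps + 1):
        history.append(measure_step())
        if is_stable(history, **kwargs):
            return step
    return None
```

For a measurement sequence of 10, 50, 100, 100, 100, 100, the run first counts as stable at step 6, when the last four periods are identical.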
-
Patent number: 11625279
Abstract: In general, an application executes on a compute unit, such as a central processing unit (CPU) or graphics processing unit (GPU), to perform some function(s). In some circumstances, improved performance of an application, such as a graphics application, may be provided by executing the application across multiple compute units. However, when using multiple compute units in this manner, synchronization must be provided between the compute units. Synchronization, including the sharing of the data, is typically accomplished through memory. While a shared memory may cause bottlenecks, employing local memory for each compute unit may itself require synchronization (coherence) which can be costly in terms of resources, delay, etc. The present disclosure provides read-write page replication for multiple compute units that avoids the traditional challenges associated with coherence.
Type: Grant
Filed: February 11, 2020
Date of Patent: April 11, 2023
Assignee: NVIDIA CORPORATION
Inventors: Daniel Lustig, Oreste Villa, David Nellans
-
Patent number: 11609879
Abstract: In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.
Type: Grant
Filed: July 1, 2021
Date of Patent: March 21, 2023
Assignee: NVIDIA Corporation
Inventors: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, Stephen William Keckler, David Nellans
-
Publication number: 20230079978
Abstract: A system, method, and apparatus of power management for computing systems are included herein that optimize individual frequencies of components of the computing systems using machine learning. The computing systems can be tightly integrated systems that consider an overall operating budget that is shared between the components of the computing system while adjusting the frequencies of the individual components. An example of an automated method of power management includes: (1) learning, using a power management (PM) agent, frequency settings for different components of a computing system during execution of a repetitive application, and (2) adjusting the frequency settings of the different components using the PM agent, wherein the adjusting is based on the repetitive application and one or more limitations corresponding to a shared operating budget for the computing system.
Type: Application
Filed: March 31, 2022
Publication date: March 16, 2023
Inventors: Evgeny Bolotin, Yaosheng Fu, Zi Yan, Gal Dalal, Shie Mannor, David Nellans
-
Publication number: 20220276984
Abstract: In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.
Type: Application
Filed: July 1, 2021
Publication date: September 1, 2022
Inventors: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, Stephen William Keckler, David Nellans
-
Publication number: 20210248014
Abstract: In general, an application executes on a compute unit, such as a central processing unit (CPU) or graphics processing unit (GPU), to perform some function(s). In some circumstances, improved performance of an application, such as a graphics application, may be provided by executing the application across multiple compute units. However, when using multiple compute units in this manner, synchronization must be provided between the compute units. Synchronization, including the sharing of the data, is typically accomplished through memory. While a shared memory may cause bottlenecks, employing local memory for each compute unit may itself require synchronization (coherence) which can be costly in terms of resources, delay, etc. The present disclosure provides read-write page replication for multiple compute units that avoids the traditional challenges associated with coherence.
Type: Application
Filed: February 11, 2020
Publication date: August 12, 2021
Inventors: Daniel Lustig, Oreste Villa, David Nellans
-
Patent number: 10489295
Abstract: A system includes a data store and a memory cache subsystem. A method for pre-fetching data from the data store for the cache includes determining a performance characteristic of a data store. The method also includes identifying a pre-fetch policy configured to utilize the determined performance characteristic of the data store. The method also includes pre-fetching data stored in the data store by copying data from the data store to the cache according to the pre-fetch policy identified to utilize the determined performance characteristic of the data store.
Type: Grant
Filed: March 14, 2013
Date of Patent: November 26, 2019
Assignee: SANDISK TECHNOLOGIES LLC
Inventors: David Nellans, Torben Mathiasen, David Flynn, Nisha Talagala
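As one illustration of tying a pre-fetch policy to a measured performance characteristic, the Python sketch below sizes readahead from the data store's latency and bandwidth using the common bandwidth-delay-product heuristic. That heuristic is a swapped-in example, not the policy the patent claims. The 4096-byte block size, the policy labels, and both function names are assumptions.

```python
def readahead_blocks(latency_us: int, bandwidth_mb_per_s: int, block_bytes: int = 4096) -> int:
    """Blocks to keep in flight so the data store stays busy (bandwidth-delay product).

    1 MB/s (10**6 bytes/s) equals 1 byte per microsecond, so
    latency_us * bandwidth_mb_per_s is the number of bytes the store can
    deliver during one request's latency.
    """
    in_flight_bytes = latency_us * bandwidth_mb_per_s
    return max(1, -(-in_flight_bytes // block_bytes))  # ceiling division


def choose_prefetch_policy(latency_us, bandwidth_mb_per_s, block_bytes=4096):
    """Pick a hypothetical policy label and depth from the measured characteristic."""
    depth = readahead_blocks(latency_us, bandwidth_mb_per_s, block_bytes)
    return ("none", 1) if depth == 1 else ("sequential_readahead", depth)
```

For a store measured at 100 µs latency and 1000 MB/s, about 100,000 bytes can be in flight, so the sketch keeps 25 blocks of 4 KiB outstanding. A very low-latency store gets no extra prefetch.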
-
Patent number: 10318324
Abstract: Techniques are disclosed relating to enabling virtual machines to access data on a physical recording medium. In one embodiment, a computing system provides a logical address space for a storage device to an allocation agent that is executable to allocate the logical address space to a plurality of virtual machines having access to the storage device. In such an embodiment, the logical address space is larger than a physical address space of the storage device. The computing system may then process a storage request from one of the plurality of virtual machines. In some embodiments, the allocation agent is a hypervisor executing on the computing system. In some embodiments, the computing system tracks utilizations of the storage device by the plurality of virtual machines, and based on the utilizations, enforces a quality of service level associated with one or more of the plurality of virtual machines.
Type: Grant
Filed: July 13, 2017
Date of Patent: June 11, 2019
Assignee: SANDISK TECHNOLOGIES LLC
Inventors: Neil Carson, Nisha Talagala, Mark Brinicombe, Robert Wipfel, Anirudh Badam, David Nellans
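A toy Python model of the arrangement this abstract describes, written as a sketch under stated assumptions: the block-granular accounting, the `ThinProvisionedDevice` class, and the per-VM cap used as the quality-of-service rule are all hypothetical. Logical ranges handed to VMs may sum to more than the physical capacity. Physical blocks are consumed only on first write, and per-VM utilization is tracked so that a VM can be refused once it reaches its cap.

```python
class ThinProvisionedDevice:
    """Toy model: VMs share a logical space larger than the physical media."""

    def __init__(self, physical_blocks: int, logical_blocks: int):
        if logical_blocks <= physical_blocks:
            raise ValueError("logical space should exceed physical space")
        self.physical_blocks = physical_blocks
        self.logical_blocks = logical_blocks
        self.next_logical = 0
        self.ranges = {}   # vm -> (first logical block, length)
        self.written = {}  # vm -> set of logical blocks backed by physical blocks
        self.caps = {}     # vm -> hypothetical QoS cap on physical blocks

    def allocate(self, vm: str, length: int, cap: int) -> None:
        """Role of the allocation agent (e.g. a hypervisor): hand out a logical range."""
        if self.next_logical + length > self.logical_blocks:
            raise ValueError("logical space exhausted")
        self.ranges[vm] = (self.next_logical, length)
        self.next_logical += length
        self.written[vm] = set()
        self.caps[vm] = cap

    def utilization(self, vm: str) -> int:
        return len(self.written[vm])

    def total_used(self) -> int:
        return sum(len(blocks) for blocks in self.written.values())

    def write(self, vm: str, offset: int) -> bool:
        """Process one write request; return False if it is refused."""
        start, length = self.ranges[vm]
        if not 0 <= offset < length:
            raise IndexError("offset outside this VM's logical range")
        block = start + offset
        if block in self.written[vm]:
            return True  # overwrite of an already-backed block
        if self.utilization(vm) >= self.caps[vm] or self.total_used() >= self.physical_blocks:
            return False
        self.written[vm].add(block)
        return True
```

For example, two VMs can each receive 500 logical blocks on a 100-block device. One VM is stopped at its cap of 60 blocks, and the other is refused once the physical media is full.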
-
Publication number: 20190073296
Abstract: Data is stored on a non-volatile storage media in a sequential, log-based format. The formatted data defines an ordered sequence of storage operations performed on the non-volatile storage media. A storage layer maintains volatile metadata, which may include a forward index associating logical identifiers with respective physical storage units on the non-volatile storage media. The volatile metadata may be reconstructed from the ordered sequence of storage operations. Persistent notes may be used to maintain consistency between the volatile metadata and the contents of the non-volatile storage media. Persistent notes may identify data that does not need to be retained on the non-volatile storage media and/or is no longer valid.
Type: Application
Filed: November 1, 2018
Publication date: March 7, 2019
Inventors: David Atkisson, David Nellans, David Flynn, Jens Axboe, Michael Zappe
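Rebuilding the volatile forward index from the ordered log, as this abstract describes, amounts to replaying records in sequence order. The minimal Python sketch below assumes just two hypothetical record kinds: "data" binds a logical identifier to a physical storage unit, and "note" is a persistent note marking that identifier's data as no longer valid. The `LogRecord` layout is illustrative, not the patented on-media format.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    seq: int                    # position in the ordered sequence of storage operations
    kind: str                   # "data" or "note" (hypothetical record kinds)
    lid: int                    # logical identifier
    phys: Optional[int] = None  # physical storage unit, for data records only


def rebuild_forward_index(log):
    """Replay the log in order to reconstruct the volatile forward index."""
    index = {}
    for record in sorted(log, key=lambda r: r.seq):
        if record.kind == "data":
            index[record.lid] = record.phys  # later writes supersede earlier ones
        elif record.kind == "note":
            index.pop(record.lid, None)      # persistent note: data no longer valid
    return index
```

Replaying writes of identifier 7 to units 100 and then 102, a write of identifier 8 to unit 101, and a later note invalidating 8 yields `{7: 102}`. Because records are sorted by sequence number first, the result does not depend on the order in which they are scanned.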
-
Patent number: 10133663
Abstract: Data is stored on a non-volatile storage media in a sequential, log-based format. The formatted data defines an ordered sequence of storage operations performed on the non-volatile storage media. A storage layer maintains volatile metadata, which may include a forward index associating logical identifiers with respective physical storage units on the non-volatile storage media. The volatile metadata may be reconstructed from the ordered sequence of storage operations. Persistent notes may be used to maintain consistency between the volatile metadata and the contents of the non-volatile storage media. Persistent notes may identify data that does not need to be retained on the non-volatile storage media and/or is no longer valid.
Type: Grant
Filed: October 3, 2013
Date of Patent: November 20, 2018
Assignee: Longitude Enterprise Flash S.A.R.L.
Inventors: David Atkisson, David Nellans, David Flynn, Jens Axboe, Michael Zappe
-
Patent number: 10102075
Abstract: A storage layer of a non-volatile storage device may be configured to provide key-value storage services. Key conflicts may be resolved by modifying the logical interface of data stored on the non-volatile storage device. Resolving a key conflict may comprise identifying an alternative key and implementing one or more range move operations configured to bind the stored data to the alternative key. The move operations may be implemented without relocating the data on the non-volatile storage device.
Type: Grant
Filed: March 24, 2016
Date of Patent: October 16, 2018
Assignee: SanDisk Technologies LLC
Inventors: Nisha Talagala, David Flynn, Swaminathan Sundararaman, Sriram Subramanian, David Nellans, Robert Wipfel, John Strasser
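A toy Python sketch of resolving a key conflict by rebinding rather than copying, offered as an illustration under assumptions and not the patented storage layer. Keys hash to logical slots using a deliberately weak hypothetical hash. A forward map binds slots to physical locations. When a new key lands on an occupied slot, `range_move` (a hypothetical name) rebinds the existing data to an alternative slot by editing only the forward map, so the stored value stays at the same physical location.

```python
class KeyValueLayer:
    """Toy key-value layer: keys hash to logical slots; a forward map binds
    logical slots to physical locations on the media."""

    def __init__(self, slots: int):
        self.slots = slots
        self.forward = {}  # logical slot -> physical location
        self.key_at = {}   # logical slot -> key currently bound there
        self.media = {}    # physical location -> stored value (append-only)
        self.next_phys = 0

    def _home(self, key: str) -> int:
        # Deliberately weak toy hash so collisions are easy to demonstrate.
        return sum(key.encode()) % self.slots

    def _find(self, key: str):
        for slot, bound in self.key_at.items():
            if bound == key:
                return slot
        return None

    def _append(self, value) -> int:
        phys = self.next_phys
        self.media[phys] = value
        self.next_phys += 1
        return phys

    def range_move(self, src: int, dst: int) -> None:
        """Rebind the data at logical slot src to dst; the media is untouched."""
        self.forward[dst] = self.forward.pop(src)
        self.key_at[dst] = self.key_at.pop(src)

    def _alternative_slot(self, start: int) -> int:
        for i in range(1, self.slots):
            candidate = (start + i) % self.slots
            if candidate not in self.forward:
                return candidate
        raise RuntimeError("no free logical slot")

    def put(self, key: str, value) -> None:
        slot = self._find(key)
        if slot is None:
            slot = self._home(key)
            if slot in self.forward:  # key conflict with a different stored key
                self.range_move(slot, self._alternative_slot(slot))
        self.forward[slot] = self._append(value)
        self.key_at[slot] = key

    def get(self, key: str):
        slot = self._find(key)
        if slot is None:
            raise KeyError(key)
        return self.media[self.forward[slot]]
```

With 8 slots, "ab" and "ba" both hash to slot 3. Storing "ba" moves the binding for "ab" to slot 4 while its value remains at physical location 0.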