Patents by Inventor Jonathon Stuart Ramsay EVANS
Jonathon Stuart Ramsay EVANS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260133884Abstract: Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to isolate that memory error from other partitions.Type: ApplicationFiled: December 22, 2025Publication date: May 14, 2026Inventors: Jonathon Stuart Ramsay Evans, Naveen Cherukuri, Jerome Francis Duluk, JR., Shailendra Singh, Vaibhav Vyas, Wishwesh Gandhi, Arvind Gopalakrishnan, Manas Mandal
-
Patent number: 12517798Abstract: Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to isolate that memory error from other partitions.Type: GrantFiled: March 20, 2020Date of Patent: January 6, 2026Assignee: NVIDIA CorporationInventors: Jonathon Stuart Ramsay Evans, Naveen Cherukuri, Jerome Francis Duluk, Jr., Shailendra Singh, Vaibhav Vyas, Wishwesh Gandhi, Arvind Gopalakrishnan, Manas Mandal
-
Patent number: 12498979Abstract: A parallel processing unit (PPU), operating in a traditional processing environment or in a virtualized processing environment, can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: February 1, 2021Date of Patent: December 16, 2025Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsay Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Sonata Gale Wen, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20250355716Abstract: A parallel processing unit (PPU), operating in a traditional processing environment or in a virtualized processing environment, can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: July 28, 2025Publication date: November 20, 2025Inventors: Jerome F. DULUK, JR., Gregory Scott PALMER, Jonathon Stuart Ramsay EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Sonata Gale WEN, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Patent number: 12443363Abstract: Various embodiments include techniques for accessing extended memory in a parallel processing system via a high-bandwidth path to extended memory residing on a central processing unit. The disclosed extended memory system extends the directly addressable high-bandwidth memory local to a parallel processing system and avoids the performance penalties associated with low-bandwidth system memory. As a result, execution threads that are highly parallelizable and access a large memory space execute with increased performance on a parallel processing system relative to prior approaches.Type: GrantFiled: March 18, 2022Date of Patent: October 14, 2025Assignee: NVIDIA CORPORATIONInventors: Hemayet Hossain, Steven E. Molnar, Jonathon Stuart Ramsay Evans, Wishwesh Anil Gandhi, Lacky V. Shah, Vyas Venkataraman, Mark Hairgrove, Geoffrey Gerfin, Jeffrey M. Smith, Terje Bergstrom, Vikram Sethi, Piyush Patel
-
Publication number: 20240403048Abstract: Various embodiments include techniques for launching processing work in a computing system. The disclosed techniques include load and store operations that specify a category. The disclosed techniques further include barrier instructions that specify a category. A processing unit of the computing system executes a set of load and store operations that specify various categories. When the processor subsequently executes a barrier instruction that specifies a category, the barrier instruction waits for data for only load and store operations that specify the same category. After the barrier instruction completes execution, the processing unit can launch processes that are dependent on data from load and store operations of the specified category, even if data from load and store operations of other categories is still pending. As a result, the processing unit can launch processes as soon as the relevant data is available without waiting for nonrelevant data.Type: ApplicationFiled: May 30, 2023Publication date: December 5, 2024Inventors: Kaushal AGARWAL, Jonathon Stuart Ramsay EVANS
-
Publication number: 20230315328Abstract: Various embodiments include techniques for accessing extended memory in a parallel processing system via a high-bandwidth path to extended memory residing on a central processing unit. The disclosed extended memory system extends the directly addressable high-bandwidth memory local to a parallel processing system and avoids the performance penalties associated with low-bandwidth system memory. As a result, execution threads that are highly parallelizable and access a large memory space execute with increased performance on a parallel processing system relative to prior approaches.Type: ApplicationFiled: March 18, 2022Publication date: October 5, 2023Inventors: Hemayet HOSSAIN, Steven E. MOLNAR, Jonathon Stuart Ramsay EVANS, Wishwesh Anil GANDHI, Lacky V. SHAH, Vyas VENKATARAMAN, Mark HAIRGROVE, Geoffrey GERFIN, Jeffrey M. SMITH, Terje BERGSTROM, Vikram SETHI, Piyush PATEL
-
Publication number: 20210294707Abstract: Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to isolate that memory error from other partitions.Type: ApplicationFiled: March 20, 2020Publication date: September 23, 2021Inventors: Jonathon Stuart Ramsay Evans, Naveen Cherukuri, Jerome Francis Duluk, JR., Shailendra Singh, Vaibhav Vyas, Wishwesh Gandhi, Arvind Gopalakrishnan, Manas Mandal
-
Publication number: 20210157651Abstract: A parallel processing unit (PPU), operating in a traditional processing environment or in a virtualized processing environment, can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: February 1, 2021Publication date: May 27, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsay EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Patent number: 10114758Abstract: One embodiment of the present invention includes techniques to support demand paging across a processing unit. Before a host unit transmits a command to an engine that does not tolerate page faults, the host unit ensures that the virtual memory addresses associated with the command are appropriately mapped to physical memory addresses. In particular, if the virtual memory addresses are not appropriately mapped, then the processing unit performs actions to map the virtual memory address to appropriate locations in physical memory. Further, the processing unit ensures that the access permissions required for successful execution of the command are established. Because the virtual memory address mappings associated with the command are valid when the engine receives the command, the engine does not encounter page faults upon executing the command. Consequently, in contrast to prior-art techniques, the engine supports demand paging regardless of whether the engine is involved in remedying page faults.Type: GrantFiled: September 13, 2013Date of Patent: October 30, 2018Assignee: NVIDIA CORPORATIONInventors: Samuel H. Duncan, Jerome F. Duluk, Jr., Jonathon Stuart Ramsay Evans, James Leroy Deming
-
Patent number: 9442759Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.Type: GrantFiled: December 9, 2011Date of Patent: September 13, 2016Assignee: NVIDIA CorporationInventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
-
Publication number: 20150082001Abstract: One embodiment of the present invention includes techniques to support demand paging across a processing unit. Before a host unit transmits a command to an engine that does not tolerate page faults, the host unit ensures that the virtual memory addresses associated with the command are appropriately mapped to physical memory addresses. In particular, if the virtual memory addresses are not appropriately mapped, then the processing unit performs actions to map the virtual memory address to appropriate locations in physical memory. Further, the processing unit ensures that the access permissions required for successful execution of the command are established. Because the virtual memory address mappings associated with the command are valid when the engine receives the command, the engine does not encounter page faults upon executing the command. Consequently, in contrast to prior-art techniques, the engine supports demand paging regardless of whether the engine is involved in remedying page faults.Type: ApplicationFiled: September 13, 2013Publication date: March 19, 2015Applicant: NVIDIA CORPORATIONInventors: Samuel H. DUNCAN, Jerome F. DULUK, JR., Jonathon Stuart Ramsay EVANS, James Leroy DEMING
-
Publication number: 20140281263Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.Type: ApplicationFiled: December 17, 2013Publication date: September 18, 2014Applicant: NVIDIA CORPORATIONInventors: James Leroy DEMING, Jerome F. DULUK, Jr., John MASHEY, Mark HAIRGROVE, Lucien DUNNING, Jonathon Stuart Ramsay EVANS, Samuel H. DUNCAN, Cameron BUSCHARDT, Brian FAHS
-
Publication number: 20130152093Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.Type: ApplicationFiled: December 9, 2011Publication date: June 13, 2013Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS