Patents by Inventor Lacky V. Shah
Lacky V. Shah has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11893423Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 6, 2024Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Sonata Gale Wen, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20230315328Abstract: Various embodiments include techniques for accessing extended memory in a parallel processing system via a high-bandwidth path to extended memory residing on a central processing unit. The disclosed extended memory system extends the directly addressable high-bandwidth memory local to a parallel processing system and avoids the performance penalties associated with low-bandwidth system memory. As a result, execution threads that are highly parallelizable and access a large memory space execute with increased performance on a parallel processing system relative to prior approaches.Type: ApplicationFiled: March 18, 2022Publication date: October 5, 2023Inventors: Hemayet HOSSAIN, Steven E. MOLNAR, Jonathon Stuart Ramsay EVANS, Wishwesh Anil GANDHI, Lacky V. SHAH, Vyas VENKATARAMAN, Mark HAIRGROVE, Geoffrey GERFIN, Jeffrey M. SMITH, Terje BERGSTROM, Vikram SETHI, Piyush PATEL
-
Patent number: 11663036Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: May 30, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11635986Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: April 25, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11579925Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 14, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11249905Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 15, 2022Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20210349763Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: February 5, 2021Publication date: November 11, 2021Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, JR., Christopher Lamb
-
Publication number: 20210157651Abstract: A parallel processing unit (PPU), operating in a traditional processing environment or in a virtualized processing environment, can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: February 1, 2021Publication date: May 27, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsay EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073035Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073042Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073025Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, JR., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073125Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, JR., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Patent number: 10915364Abstract: Apparatuses, systems, and techniques for performing nested kernel execution within a parallel processing subsystem. In at least one embodiment, a parent thread launches a nested child grid on the parallel processing subsystem, and enables the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid.Type: GrantFiled: December 2, 2016Date of Patent: February 9, 2021Assignee: NVIDIA CorporationInventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Christopher Lamb
-
Publication number: 20210019185Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.Type: ApplicationFiled: October 5, 2020Publication date: January 21, 2021Inventors: Jerome F. DULUK, JR., Lacky V. SHAH, Sean J. TREICHLER
-
Patent number: 10795722Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.Type: GrantFiled: November 9, 2011Date of Patent: October 6, 2020Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
-
Publication number: 20200151016Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: January 17, 2020Publication date: May 14, 2020Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
-
Patent number: 10552201Abstract: One embodiment of the present invention sets forth a technique for instruction level execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. Any in-flight instructions that follow the preemption command in the processing pipeline are captured and stored in a processing task buffer to be reissued when the preempted program is resumed. The processing task buffer is designated as a high priority task to ensure the preempted instructions are reissued before any new instructions for the preempted context when execution of the preempted context is restored.Type: GrantFiled: May 12, 2017Date of Patent: February 4, 2020Assignee: NVIDIA CORPORATIONInventors: Philip Alexander Cuadra, Christopher Lamb, Lacky V. Shah
-
Patent number: 10552202Abstract: One embodiment of the present invention sets forth a technique for instruction level execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. Any in-flight instructions that follow the preemption command in the processing pipeline are captured and stored in a processing task buffer to be reissued when the preempted program is resumed. The processing task buffer is designated as a high priority task to ensure the preempted instructions are reissued before any new instructions for the preempted context when execution of the preempted context is restored.Type: GrantFiled: May 12, 2017Date of Patent: February 4, 2020Assignee: NVIDIA CORPORATIONInventors: Philip Alexander Cuadra, Christopher Lamb, Lacky V. Shah
-
Patent number: 10289418Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.Type: GrantFiled: December 27, 2012Date of Patent: May 14, 2019Assignee: NVIDIA CORPORATIONInventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Danskin
-
Patent number: 10235208Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.Type: GrantFiled: December 11, 2012Date of Patent: March 19, 2019Assignee: NVIDIA CORPORATIONInventors: Nicholas Wang, Lacky V. Shah, Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre