Patents by Inventor Alan Menezes
Alan Menezes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11893423Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 6, 2024Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Sonata Gale Wen, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20230297485Abstract: Various embodiments include a system for generating performance monitoring data in a computing system. The system includes a unit level counter with a set of counters, where each counter increments during each clock cycle in which a corresponding electronic signal is at a first state, such as a high or low logic level state. Periodically, the unit level counter transmits the counter values to a corresponding counter collection unit. The counter collection unit includes a set of counters that aggregates the values of the counters in multiple unit level counters. Based on certain trigger conditions, the counter collection unit transmits records to a reduction channel. The reduction channel includes a set of counters that aggregates the values of the counters in multiple counter collection units. Each virtual machine executing on the system can access a different corresponding reduction channel, providing secure performance metric data for each virtual machine.Type: ApplicationFiled: March 18, 2022Publication date: September 21, 2023Inventors: Pranav VAIDYA, Alan MENEZES, Siddharth SHARMA, Jin OUYANG, Gregory Paul SMITH, Timothy J. MCDONALD, Shounak KAMALAPURKAR, Abhijat RANADE, Thomas Melvin OGLETREE
-
Patent number: 11663036Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: May 30, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11635986Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: April 25, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11579925Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 14, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11307903Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.Type: GrantFiled: January 31, 2018Date of Patent: April 19, 2022Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Patent number: 11249905Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 15, 2022Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11144087Abstract: Performance monitors are placed on computational units in different clock domains of an integrated circuit. A central dispatcher generates trigger signals to the performance monitors to cause the performance monitors to respond to the trigger signals with packets reporting local performance counts for the associated computational units. The data in the packets are correlated into a single clock domain. By applying a trigger and reporting system, the disclosed approach can synchronize the performance metrics of the various computational units in the different clock domains without having to route a complex global clock reference signal to all of the performance monitors.Type: GrantFiled: March 12, 2019Date of Patent: October 12, 2021Assignee: NVIDIA CorporationInventors: Roger Allen, Alan Menezes, Tom Ogletree, Shounak Kamalapurkar, Abhijat Ranade
-
Publication number: 20210157651Abstract: A parallel processing unit (PPU), operating in a traditional processing environment or in a virtualized processing environment, can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: February 1, 2021Publication date: May 27, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsay EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073025Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, JR., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073125Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, JR., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073042Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Publication number: 20210073035Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: ApplicationFiled: September 5, 2019Publication date: March 11, 2021Inventors: Jerome F. DULUK, Jr., Gregory Scott PALMER, Jonathon Stuart Ramsey EVANS, Shailendra SINGH, Samuel H. DUNCAN, Wishwesh Anil GANDHI, Lacky V. SHAH, Eric ROCK, Feiqi SU, James Leroy DEMING, Alan MENEZES, Pranav VAIDYA, Praveen JOGINIPALLY, Timothy John PURCELL, Manas MANDAL
-
Patent number: 10817338Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.Type: GrantFiled: January 31, 2018Date of Patent: October 27, 2020Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Publication number: 20200050482Abstract: Performance monitors are placed on computational units in different clock domains of an integrated circuit. A central dispatcher generates trigger signals to the performance monitors to cause the performance monitors to respond to the trigger signals with packets reporting local performance counts for the associated computational units. The data in the packets are correlated into a single clock domain. By applying a trigger and reporting system, the disclosed approach can synchronize the performance metrics of the various computational units in the different clock domains without having to route a complex global clock reference signal to all of the performance monitors.Type: ApplicationFiled: March 12, 2019Publication date: February 13, 2020Inventors: Roger Allen, Alan Menezes, Tom Ogletree, Shounak Kamalapurkar, Abhijat Ranade
-
Publication number: 20190235928Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.Type: ApplicationFiled: January 31, 2018Publication date: August 1, 2019Inventors: Jerome F. Duluk, JR., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Publication number: 20190235924Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.Type: ApplicationFiled: January 31, 2018Publication date: August 1, 2019Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris