Patents by Inventor Wishwesh Anil GANDHI
Wishwesh Anil GANDHI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240354106Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a data center or multiprocessor computing system. During a remote memory operation, a source processor transmits multiple data segments to a destination processor. For each data segment, the source processor transmits a remote memory operation to the destination processor that includes associated metadata that identifies the memory location of a corresponding synchronization object representing a count of data segments to be stored or a flag for each data segment to be stored. The remote memory operation along with the metadata is transmitted as a single unit to the destination processor. The destination processor splits the operation into the remote memory operation and the memory synchronization operation.Type: ApplicationFiled: June 26, 2024Publication date: October 24, 2024Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN, Gregory Michael THORSON
-
Patent number: 12105960Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.Type: GrantFiled: August 31, 2022Date of Patent: October 1, 2024Assignee: NVIDIA CORPORATIONInventors: Srinivas Santosh Kumar Madugula, Olivier Giroux, Wishwesh Anil Gandhi, Michael Allen Parker, Raghuram L, Ivan Tanasic, Manan Patel, Mark Hummel, Alexander L. Minkin
-
Patent number: 12001725Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.Type: GrantFiled: August 23, 2023Date of Patent: June 4, 2024Assignee: NVIDIA CorporationInventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
-
Publication number: 20240069736Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.Type: ApplicationFiled: August 31, 2022Publication date: February 29, 2024Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
-
Patent number: 11893423Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 6, 2024Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Sonata Gale Wen, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20230393788Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.Type: ApplicationFiled: August 23, 2023Publication date: December 7, 2023Inventors: Nilandrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
-
Publication number: 20230333746Abstract: Various embodiments include techniques for performing speculative remote memory operation tracking in a multiprocessor computing system. Conventionally, transfers of data between processors and other components of a computing system require memory synchronization operations to determine that the data is valid and coherent before the data is transferred from a destination to a requesting source. Existing techniques for performing these memory synchronization operations are increasingly inefficient as the number of components in a computing system increases, particularly for remote memory operations. The disclosed techniques track remote memory operations and speculatively perform these memory synchronization operations. As a result, a given memory synchronization operation is often complete prior to the corresponding remote memory operation arrives at the destination, leading to improved efficiency and performance of remote memory operations in complex computing systems.Type: ApplicationFiled: November 17, 2022Publication date: October 19, 2023Inventors: Raymond Hoi Man WONG, Debajit BHATTACHARYA, Michael Allen PARKER, Wishwesh Anil GANDHI
-
Patent number: 11789649Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.Type: GrantFiled: April 22, 2021Date of Patent: October 17, 2023Assignee: NVIDIA CorporationInventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
-
Publication number: 20230315328Abstract: Various embodiments include techniques for accessing extended memory in a parallel processing system via a high-bandwidth path to extended memory residing on a central processing unit. The disclosed extended memory system extends the directly addressable high-bandwidth memory local to a parallel processing system and avoids the performance penalties associated with low-bandwidth system memory. As a result, execution threads that are highly parallelizable and access a large memory space execute with increased performance on a parallel processing system relative to prior approaches.Type: ApplicationFiled: March 18, 2022Publication date: October 5, 2023Inventors: Hemayet HOSSAIN, Steven E. MOLNAR, Jonathon Stuart Ramsay EVANS, Wishwesh Anil GANDHI, Lacky V. SHAH, Vyas VENKATARAMAN, Mark HAIRGROVE, Geoffrey GERFIN, Jeffrey M. SMITH, Terje BERGSTROM, Vikram SETHI, Piyush PATEL
-
Patent number: 11663036Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: May 30, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11635986Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: April 25, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Patent number: 11579925Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 14, 2023Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20230021678Abstract: Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.Type: ApplicationFiled: July 20, 2021Publication date: January 26, 2023Inventors: Michael Allen PARKER, Debajit BHATTACHARYA, David FONTAINE, Shirish GADRE, Wishwesh Anil GANDHI, Olivier GIROUX, Hemayet HOSSAIN, Ronny M. KRASHINSKY, Ze LONG, Raymond Hoi Man WONG
-
Patent number: 11513686Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: GrantFiled: May 5, 2020Date of Patent: November 29, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Publication number: 20220342595Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.Type: ApplicationFiled: April 22, 2021Publication date: October 27, 2022Inventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
-
Patent number: 11372548Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.Type: GrantFiled: May 29, 2020Date of Patent: June 28, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Patent number: 11263051Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: GrantFiled: May 5, 2020Date of Patent: March 1, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Patent number: 11249905Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.Type: GrantFiled: September 5, 2019Date of Patent: February 15, 2022Assignee: NVIDIA CORPORATIONInventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
-
Publication number: 20210373774Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.Type: ApplicationFiled: May 29, 2020Publication date: December 2, 2021Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Publication number: 20210349761Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: ApplicationFiled: May 5, 2020Publication date: November 11, 2021Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi