Patents by Inventor Wishwesh Anil GANDHI

Wishwesh Anil GANDHI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A MULTIPROCESSOR SYSTEM

Publication number: 20240393951

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Application

Filed: July 10, 2024

Publication date: November 28, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A DATA CENTER OR MULTIPROCESSOR SYSTEM

Publication number: 20240354106

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a data center or multiprocessor computing system. During a remote memory operation, a source processor transmits multiple data segments to a destination processor. For each data segment, the source processor transmits a remote memory operation to the destination processor that includes associated metadata that identifies the memory location of a corresponding synchronization object representing a count of data segments to be stored or a flag for each data segment to be stored. The remote memory operation along with the metadata is transmitted as a single unit to the destination processor. The destination processor splits the operation into the remote memory operation and the memory synchronization operation.

Type: Application

Filed: June 26, 2024

Publication date: October 24, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN, Gregory Michael THORSON
Self-synchronizing remote memory operations in a multiprocessor system

Patent number: 12105960

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Grant

Filed: August 31, 2022

Date of Patent: October 1, 2024

Assignee: NVIDIA CORPORATION

Inventors: Srinivas Santosh Kumar Madugula, Olivier Giroux, Wishwesh Anil Gandhi, Michael Allen Parker, Raghuram L, Ivan Tanasic, Manan Patel, Mark Hummel, Alexander L. Minkin
Combined on-package and off-package memory system

Patent number: 12001725

Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.

Type: Grant

Filed: August 23, 2023

Date of Patent: June 4, 2024

Assignee: NVIDIA Corporation

Inventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A MULTIPROCESSOR SYSTEM

Publication number: 20240069736

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Application

Filed: August 31, 2022

Publication date: February 29, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
Techniques for configuring a processor to function as multiple, separate processors

Patent number: 11893423

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

Type: Grant

Filed: September 5, 2019

Date of Patent: February 6, 2024

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Sonata Gale Wen, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
COMBINED ON-PACKAGE AND OFF-PACKAGE MEMORY SYSTEM

Publication number: 20230393788

Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.

Type: Application

Filed: August 23, 2023

Publication date: December 7, 2023

Inventors: Nilandrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
SPECULATIVE REMOTE MEMORY OPERATION TRACKING FOR EFFICIENT MEMORY BARRIER

Publication number: 20230333746

Abstract: Various embodiments include techniques for performing speculative remote memory operation tracking in a multiprocessor computing system. Conventionally, transfers of data between processors and other components of a computing system require memory synchronization operations to determine that the data is valid and coherent before the data is transferred from a destination to a requesting source. Existing techniques for performing these memory synchronization operations are increasingly inefficient as the number of components in a computing system increases, particularly for remote memory operations. The disclosed techniques track remote memory operations and speculatively perform these memory synchronization operations. As a result, a given memory synchronization operation is often complete prior to the corresponding remote memory operation arrives at the destination, leading to improved efficiency and performance of remote memory operations in complex computing systems.

Type: Application

Filed: November 17, 2022

Publication date: October 19, 2023

Inventors: Raymond Hoi Man WONG, Debajit BHATTACHARYA, Michael Allen PARKER, Wishwesh Anil GANDHI
Combined on-package and off-package memory system

Patent number: 11789649

Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.

Type: Grant

Filed: April 22, 2021

Date of Patent: October 17, 2023

Assignee: NVIDIA Corporation

Inventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
HIGH BANDWIDTH EXTENDED MEMORY IN A PARALLEL PROCESSING SYSTEM

Publication number: 20230315328

Abstract: Various embodiments include techniques for accessing extended memory in a parallel processing system via a high-bandwidth path to extended memory residing on a central processing unit. The disclosed extended memory system extends the directly addressable high-bandwidth memory local to a parallel processing system and avoids the performance penalties associated with low-bandwidth system memory. As a result, execution threads that are highly parallelizable and access a large memory space execute with increased performance on a parallel processing system relative to prior approaches.

Type: Application

Filed: March 18, 2022

Publication date: October 5, 2023

Inventors: Hemayet HOSSAIN, Steven E. MOLNAR, Jonathon Stuart Ramsay EVANS, Wishwesh Anil GANDHI, Lacky V. SHAH, Vyas VENKATARAMAN, Mark HAIRGROVE, Geoffrey GERFIN, Jeffrey M. SMITH, Terje BERGSTROM, Vikram SETHI, Piyush PATEL
Techniques for configuring a processor to function as multiple, separate processors

Patent number: 11663036

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

Type: Grant

Filed: September 5, 2019

Date of Patent: May 30, 2023

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
Techniques for configuring a processor to function as multiple, separate processors

Patent number: 11635986

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

Type: Grant

Filed: September 5, 2019

Date of Patent: April 25, 2023

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
Techniques for reconfiguring partitions in a parallel processing system

Patent number: 11579925

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

Type: Grant

Filed: September 5, 2019

Date of Patent: February 14, 2023

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
THREAD SYNCHRONIZATION ACROSS MEMORY SYNCHRONIZATION DOMAINS

Publication number: 20230021678

Abstract: Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.

Type: Application

Filed: July 20, 2021

Publication date: January 26, 2023

Inventors: Michael Allen PARKER, Debajit BHATTACHARYA, David FONTAINE, Shirish GADRE, Wishwesh Anil GANDHI, Olivier GIROUX, Hemayet HOSSAIN, Ronny M. KRASHINSKY, Ze LONG, Raymond Hoi Man WONG
Techniques for dynamically compressing memory regions having a uniform value

Patent number: 11513686

Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.

Type: Grant

Filed: May 5, 2020

Date of Patent: November 29, 2022

Assignee: NVIDIA Corporation

Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
COMBINED ON-PACKAGE AND OFF-PACKAGE MEMORY SYSTEM

Publication number: 20220342595

Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.

Type: Application

Filed: April 22, 2021

Publication date: October 27, 2022

Inventors: Niladrish Chatterjee, James Michael O'Connor, Donghyuk Lee, Gaurav Uttreja, Wishwesh Anil Gandhi
Techniques for accessing and utilizing compressed data and its state information

Patent number: 11372548

Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.

Type: Grant

Filed: May 29, 2020

Date of Patent: June 28, 2022

Assignee: NVIDIA Corporation

Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
Techniques for scaling dictionary-based compression

Patent number: 11263051

Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.

Type: Grant

Filed: May 5, 2020

Date of Patent: March 1, 2022

Assignee: NVIDIA Corporation

Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
Techniques for configuring a processor to function as multiple, separate processors

Patent number: 11249905

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

Type: Grant

Filed: September 5, 2019

Date of Patent: February 15, 2022

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
TECHNIQUES FOR ACCESSING AND UTILIZING COMPRESSED DATA AND ITS STATE INFORMATION

Publication number: 20210373774

Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.

Type: Application

Filed: May 29, 2020

Publication date: December 2, 2021

Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu

1 2 next