Patents by Inventor Shirish Gadre

Shirish Gadre has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934311
    Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache line allocations share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared among multiple allocations.
    Type: Grant
    Filed: May 4, 2022
    Date of Patent: March 19, 2024
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Steven James Heinrich, Shirish Gadre
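
A small host-side model can make the sharing scheme concrete. Every specific here is an assumption for illustration (four 32-byte sectors per 128-byte line, integer allocation IDs); it is a sketch of the idea, not the patented design.

```cuda
// Illustrative model: one 128-byte line holds four 32-byte sectors, and a
// line may serve sectors to more than one allocation at a time.
struct SectoredLine {
    static constexpr int kSectors = 4;   // assumed: 4 x 32 B sectors per line
    int ownerAlloc[kSectors];            // allocation ID per sector, -1 = free

    SectoredLine() { for (int s = 0; s < kSectors; ++s) ownerAlloc[s] = -1; }

    // Place up to `count` sectors of allocation `alloc` into this line and
    // return how many were placed. A nonzero result for a second allocation
    // means the line is now shared instead of forcing a fresh line.
    int assign(int alloc, int count) {
        int placed = 0;
        for (int s = 0; s < kSectors && placed < count; ++s)
            if (ownerAlloc[s] < 0) { ownerAlloc[s] = alloc; ++placed; }
        return placed;
    }
};
```
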
  • Patent number: 11928746
    Abstract: Systems and methods are configured for receiving, from a user device associated with a user, a first request to associate the user with a target and account information associated with the user; retrieving the target from a database to associate the user with the retrieved target; receiving, from an entity, event data associated with the account information; computing an amount of contribution based on the received event data; transmitting the computed amount of contribution to a distribution system configured to distribute the contribution to the associated target; updating the database by aggregating the computed amount of contribution to stored contribution data associated with the user; retrieving pre-authenticated data of the user allowing transmission of a message to a public forum; generating the message based on the event data and the computed amount of contribution; and transmitting the message to one or more devices via the public forum.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: March 12, 2024
    Assignee: Populus Financial Group, Inc.
    Inventors: Paul De Vos, Joseph Taylor, Shirish Gadre
  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request and store the needed data in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
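
CUDA's public pipeline primitives expose a matching mechanism: a copy from global to shared memory that avoids staging the data in registers (LDGSTS on Ampere-class GPUs and later). A minimal sketch, assuming a 256-thread block:

```cuda
#include <cuda_pipeline.h>

__global__ void loadTileAsync(const float* __restrict__ gmem, float* out) {
    __shared__ float tile[256];                    // assumes blockDim.x == 256
    int idx = blockIdx.x * 256 + threadIdx.x;

    // Copy 4 bytes per thread from global memory directly into shared
    // memory, without routing the data through the register file.
    __pipeline_memcpy_async(&tile[threadIdx.x], &gmem[idx], sizeof(float));
    __pipeline_commit();                           // close this batch of copies
    __pipeline_wait_prior(0);                      // wait until it has landed
    __syncthreads();

    out[idx] = tile[255 - threadIdx.x];            // reuse the staged data
}
```
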
  • Patent number: 11822541
    Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: November 21, 2023
    Assignee: NVIDIA Corporation
    Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
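
The abstract does not name the specialized instructions. CUDA 12's DPX intrinsics are a plausible public counterpart, so this cell-update sketch assumes them; __viaddmax_s32(a, b, c) evaluates max(a + b, c) in one operation, and swCellUpdate, substScore, and gapPenalty (a negative value) are illustrative names:

```cuda
// One Smith-Waterman cell update with a linear gap penalty. Three fused
// add-max operations replace six separate add and max instructions.
__device__ int swCellUpdate(int diag, int up, int left,
                            int substScore, int gapPenalty) {
    int h = __viaddmax_s32(diag, substScore, 0);   // max(diag + subst, 0)
    h     = __viaddmax_s32(up,   gapPenalty, h);   // max(up   + gap,   h)
    return  __viaddmax_s32(left, gapPenalty, h);   // max(left + gap,   h)
}
```
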
  • Publication number: 20230359560
    Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache line allocations share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared among multiple allocations.
    Type: Application
    Filed: May 4, 2022
    Publication date: November 9, 2023
    Inventors: Michael FETTERMAN, Steven James HEINRICH, Shirish GADRE
  • Patent number: 11803380
    Abstract: To synchronize operations of a computing system, a new type of synchronization barrier is disclosed. In one embodiment, the disclosed synchronization barrier provides for certain synchronization mechanisms such as, for example, “Arrive” and “Wait” to be split to allow for greater flexibility and efficiency in coordinating synchronization. In another embodiment, the disclosed synchronization barrier allows for hardware components such as, for example, dedicated copy or direct-memory-access (DMA) engines to be synchronized with software-based threads.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: October 31, 2023
    Assignee: NVIDIA Corporation
    Inventors: Olivier Giroux, Jack Choquette, Ronny Krashinsky, Steve Heinrich, Xiaogang Qiu, Shirish Gadre
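
libcu++'s cuda::barrier exposes this split arrive/wait pattern publicly. A minimal sketch, assuming a 256-thread block and placeholder work on either side of the split:

```cuda
#include <cuda/barrier>
#include <cuda/std/utility>

__global__ void splitArriveWait(const float* in, float* out) {
    __shared__ float staged[256];                   // assumes blockDim.x == 256
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    if (threadIdx.x == 0)
        init(&bar, blockDim.x);                     // expected arrival count
    __syncthreads();

    staged[threadIdx.x] = in[threadIdx.x] * 2.0f;   // produce a value
    auto token = bar.arrive();                      // "Arrive": signal, don't block

    // ... independent work that needs no other thread's result goes here ...

    bar.wait(cuda::std::move(token));               // "Wait": block only here
    out[threadIdx.x] = staged[(threadIdx.x + 1) % blockDim.x];
}
```
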
  • Publication number: 20230305844
    Abstract: Various techniques for accelerating dynamic programming algorithms are provided. For example, a fused addition and comparison instruction, a three-operand comparison instruction, and a two-operand comparison instruction are used to accelerate a Needleman-Wunsch algorithm that determines an optimized global alignment of subsequences over two entire sequences. In another example, the fused addition and comparison instruction is used in an innermost loop of a Floyd-Warshall algorithm to reduce the number of instructions required to determine shortest paths between pairs of vertices in a graph. In another example, a two-way single instruction multiple data (SIMD) floating point variant of the three-operand comparison instruction is used to reduce the number of instructions required to determine the median of an array of floating point values.
    Type: Application
    Filed: September 28, 2022
    Publication date: September 28, 2023
    Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE, Frank Joseph EATON, Daniel Alan STIFFLER
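
A sketch of the Floyd-Warshall case, assuming the fused addition-and-comparison instruction corresponds to CUDA 12's public DPX intrinsic __viaddmin_s32(a, b, c) == min(a + b, c). The grid shape and the INT_MAX / 2 convention for "no edge" are illustrative:

```cuda
// One relaxation pass over the distance matrix for a fixed intermediate
// vertex k: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
// Entries use INT_MAX / 2 as "infinity" so the addition cannot overflow.
__global__ void fwRelax(int* dist, int n, int k) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n)
        dist[i * n + j] = __viaddmin_s32(dist[i * n + k],   // d(i,k)
                                         dist[k * n + j],   // d(k,j)
                                         dist[i * n + j]);  // current d(i,j)
}
```
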
  • Publication number: 20230305957
    Abstract: Various embodiments include techniques for managing cache memory in a computing system. The computing system includes a sectored cache memory that provides a mechanism for software applications to directly invalidate data items stored in the cache memory on a sector-by-sector basis, where a sector is smaller than a cache line. When all sectors in a cache line have been invalidated, the cache line is implicitly invalidated, freeing the cache line to be reallocated for other purposes. In cases where the data items to be invalidated can be aligned to sector boundaries, the disclosed techniques effectively use status indicators in the cache tag memory to track which sectors, and corresponding data items, have been invalidated by the software application. The disclosed techniques thereby enable a low-overhead solution for invalidating individual data items that are smaller than a cache line without additional tracking data structures or consuming additional memory transfer bandwidth.
    Type: Application
    Filed: March 23, 2022
    Publication date: September 28, 2023
    Inventors: Michael FETTERMAN, Shirish GADRE, Steven James HEINRICH, Martin STICH, Liang YIN
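
A small model of the tag-side bookkeeping the abstract describes; the sizes (four 32-byte sectors per line) and field names are assumptions for illustration, not the patented hardware:

```cuda
#include <cstdint>

// Per-line tag state: one status bit per sector. A software-directed
// invalidate clears a sector's bit; once every bit is clear the line is
// implicitly invalid and free to be reallocated, with no extra structures.
struct SectoredTag {
    uint64_t tag;
    uint8_t  sectorValid;   // bit s set => 32-byte sector s holds valid data

    void invalidateSector(int s) { sectorValid &= uint8_t(~(1u << s)); }
    bool lineStillAllocated() const { return sectorValid != 0; }
};
```
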
  • Patent number: 11768686
    Abstract: In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing, for example, in-order memory returns in some contexts and/or for some traffic and out-of-order memory returns in other contexts and/or for other traffic.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corporation
    Inventors: Michael A. Fetterman, Mark Gebhart, Shirish Gadre, Mitchell Hayenga, Steven Heinrich, Ramesh Jandhyala, Raghavan Madhavan, Omkar Paranjape, James Robertson, Jeff Schottmiller
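
An illustrative host-side model of one tracking queue under the two return policies; the std::deque representation and all names are assumptions, not the hardware design:

```cuda
#include <algorithm>
#include <deque>

struct TrackingQueue {
    struct Entry { int requestId; bool dataReady = false; };
    std::deque<Entry> pending;
    bool inOrder;   // policy assigned dynamically per context / traffic class

    // Mark a returned request's data as ready, then retire what the policy
    // allows: an in-order queue retires strictly from the head, while an
    // out-of-order queue retires any entry whose data has arrived.
    void onDataReturn(int requestId) {
        for (auto& e : pending)
            if (e.requestId == requestId) e.dataReady = true;
        while (!pending.empty()) {
            auto it = inOrder
                ? pending.begin()
                : std::find_if(pending.begin(), pending.end(),
                               [](const Entry& e) { return e.dataReady; });
            if (it == pending.end() || !it->dataReady) break;
            pending.erase(it);   // hand this return back to the requester
        }
    }
};
```
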
  • Publication number: 20230297426
    Abstract: Various embodiments include techniques for utilizing resources on a processing unit. Thread groups executing on a processor begin execution with specified resources, such as a number of registers and an amount of shared memory. During execution, one or more thread groups may determine that they hold more resources than they need to execute their current functions. Such thread groups can deallocate the excess resources to a free pool. Similarly, during execution, one or more thread groups may determine that they have fewer resources than they need to execute their current functions. Such thread groups can allocate the needed resources from the free pool. Further, producer thread groups that generate data for consumer thread groups can deallocate excess resources prior to completion. The consumer thread groups can allocate the excess resources and initiate execution while the producer thread groups complete execution, thereby decreasing latency between producer and consumer thread groups.
    Type: Application
    Filed: March 18, 2022
    Publication date: September 21, 2023
    Inventors: Rajballav DASH, Stephen JONES, Jack Hilaire CHOQUETTE, Manan PATEL, Ronny M. KRASHINSKY, Shirish GADRE, Lixia QIN
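
The abstract names no public interface. On Hopper (sm_90a), the PTX setmaxnreg instruction lets warpgroups release registers to a per-SM pool and reclaim them, which is at least the same idea, so this sketch assumes that mechanism; the register counts are arbitrary:

```cuda
// sm_90a-only sketch: a producer warpgroup returns registers to the per-SM
// pool before finishing, and a consumer warpgroup grows its allocation from
// that pool. The counts (multiples of 8 between 24 and 256) are arbitrary.
__device__ void releaseExcessRegisters() {
    asm volatile("setmaxnreg.dec.sync.aligned.u32 40;");
}

__device__ void claimPooledRegisters() {
    asm volatile("setmaxnreg.inc.sync.aligned.u32 232;");
}
```
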
  • Publication number: 20230289215
    Abstract: A new level of hierarchy, Cooperative Group Arrays (CGAs), and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
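
Thread block clusters, introduced in CUDA 12 for sm_90 GPUs, are the public feature that most closely matches this description. A minimal sketch; the 2x1x1 cluster shape and the per-block work are illustrative:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Two blocks are co-scheduled as one cluster and synchronize as a unit.
__global__ void __cluster_dims__(2, 1, 1) cgaStyleKernel(float* data) {
    cg::cluster_group cluster = cg::this_cluster();
    unsigned rank = cluster.block_rank();            // this block's index, 0 or 1

    data[rank * blockDim.x + threadIdx.x] += 1.0f;   // per-block work

    cluster.sync();   // barrier across every thread of both co-scheduled blocks
}
```
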
  • Publication number: 20230289190
    Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash BANGALORE PRABHAKAR, Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. MINKIN
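
An illustrative host-side model of the tracking idea only (coalesce duplicate requests, fan the fill out to all requesters); the map-based representation is an assumption, not the hardware design:

```cuda
#include <cstdint>
#include <unordered_map>
#include <vector>

// Coalesce duplicate line requests from several CTAs into one fetch, then
// fan the returning fill out to every requester, saving L2 bandwidth.
struct MulticastTracker {
    std::unordered_map<uint64_t, std::vector<int>> waiters;  // line -> CTA ranks

    // True only for the first requester, which triggers the single real
    // memory fetch; later requesters just join the multicast set.
    bool request(uint64_t lineAddr, int ctaRank) {
        auto& w = waiters[lineAddr];
        w.push_back(ctaRank);
        return w.size() == 1;
    }

    // The fill arrived: return every CTA to deliver it to, clearing the entry.
    std::vector<int> fill(uint64_t lineAddr) {
        std::vector<int> out = std::move(waiters[lineAddr]);
        waiters.erase(lineAddr);
        return out;
    }
};
```
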
  • Publication number: 20230289242
    Abstract: A new transaction barrier synchronization primitive enables executing threads and asynchronous transactions to synchronize across parallel processors. The asynchronous transactions may include, for example, transactions resulting from hardware data movement units such as direct memory access (DMA) units. A hardware synchronization circuit may provide for the synchronization primitive to be stored in a cache memory so that barrier operations may be accelerated by the circuit. A new wait mechanism reduces software overhead associated with waiting on a barrier.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Timothy GUO, Jack CHOQUETTE, Shirish GADRE, Olivier GIROUX, Carter EDWARDS, John EDMONDSON, Manan PATEL, Raghavan MADHAVAN, Jessie HUANG, Peter NELSON, Ronny KRASHINSKY
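
libcu++'s cuda::barrier is the closest public interface: the barrier completes only when all threads have arrived and the asynchronous copy it tracks has finished. A sketch, assuming a 256-thread block:

```cuda
#include <cuda/barrier>

__global__ void barrierTrackedCopy(const float* gmem, float* out) {
    __shared__ alignas(16) float smem[256];       // assumes blockDim.x == 256
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    if (threadIdx.x == 0)
        init(&bar, blockDim.x);
    __syncthreads();

    if (threadIdx.x == 0)   // one thread issues the asynchronous transaction
        cuda::memcpy_async(smem, gmem,
                           cuda::aligned_size_t<16>(sizeof(smem)), bar);

    // Completes only when every thread has arrived AND the tracked
    // asynchronous copy has finished.
    bar.arrive_and_wait();
    out[threadIdx.x] = smem[threadIdx.x] * 2.0f;
}
```
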
  • Publication number: 20230289304
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuitry is one of a plurality of such circuits, each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
  • Publication number: 20230289292
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuitry is one of a plurality of such circuits, each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
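
The two publications above share one disclosure. The described hardware takes a tile coordinate plus a tensor descriptor and performs the address translation itself; the sketch below is only a software analogue that spells the translation out by hand, with the tile shape assumed:

```cuda
#include <cuda_pipeline.h>

__global__ void loadTileByCoordinate(const float* __restrict__ tensor,
                                     int width, int tileX, int tileY,
                                     float* out) {
    const int TILE_W = 32, TILE_H = 8;            // assumed tile shape
    __shared__ float tile[TILE_H][TILE_W];

    for (int r = threadIdx.x; r < TILE_H; r += blockDim.x) {
        // Translate the 2D tile coordinate into a global row address; the
        // circuitry described above performs this translation in hardware.
        const float* src = tensor
            + (size_t)(tileY * TILE_H + r) * width + tileX * TILE_W;
        for (int c = 0; c < TILE_W; ++c)          // 4-byte copies, alignment-agnostic
            __pipeline_memcpy_async(&tile[r][c], &src[c], sizeof(float));
    }
    __pipeline_commit();
    __pipeline_wait_prior(0);                     // transfer is asynchronous until here
    __syncthreads();

    if (threadIdx.x == 0) out[0] = tile[0][0];    // placeholder consumption
}
```
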
  • Publication number: 20230289189
    Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks using interconnects that do not interfere with the processing cores' access to main or global memory, such as memory backed by an L2 cache.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
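
Thread block clusters expose distributed shared memory publicly through map_shared_rank(). A minimal sketch for a two-block cluster; shapes and values are illustrative:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Each block fills its own __shared__ buffer, then reads its peer's buffer
// through the cluster's distributed-shared-memory mapping.
__global__ void __cluster_dims__(2, 1, 1) peerSharedRead(int* out) {
    __shared__ int buf[32];
    cg::cluster_group cluster = cg::this_cluster();

    if (threadIdx.x < 32)
        buf[threadIdx.x] = int(cluster.block_rank()) * 100 + threadIdx.x;
    cluster.sync();                               // peer buffers are populated

    unsigned peer = cluster.block_rank() ^ 1;     // the other block's rank
    int* remote = cluster.map_shared_rank(buf, peer);
    if (threadIdx.x < 32)
        out[cluster.block_rank() * 32 + threadIdx.x] = remote[threadIdx.x];

    cluster.sync();   // keep buffers alive until every block finishes reading
}
```
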
  • Publication number: 20230185570
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request and store the needed data in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: February 8, 2023
    Publication date: June 15, 2023
    Inventors: Andrew KERR, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20230101085
    Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.
    Type: Application
    Filed: September 30, 2021
    Publication date: March 30, 2023
    Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE
  • Publication number: 20230095916
    Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.
    Type: Application
    Filed: September 30, 2021
    Publication date: March 30, 2023
    Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE
  • Patent number: 11604649
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request and store the needed data in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz