Patents by Inventor Shirish Gadre

Shirish Gadre has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230021678
    Abstract: Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.
    Type: Application
    Filed: July 20, 2021
    Publication date: January 26, 2023
    Inventors: Michael Allen PARKER, Debajit BHATTACHARYA, David FONTAINE, Shirish GADRE, Wishwesh Anil GANDHI, Olivier GIROUX, Hemayet HOSSAIN, Ronny M. KRASHINSKY, Ze LONG, Raymond Hoi Man WONG
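    The domain-scoped synchronization described here resembles CUDA's memory synchronization domain launch attributes (CUDA 12.0+ on Hopper-class GPUs). A minimal sketch of tagging one launch with a non-default domain, assuming a placeholder kernel my_kernel:

    ```cuda
    #include <cuda_runtime.h>

    __global__ void my_kernel(float *data) { /* ... */ }

    int main() {
        float *d_data;
        cudaMalloc(&d_data, 1024 * sizeof(float));

        // Place this launch in the "remote" domain so that fences issued by
        // kernels in the default domain need not wait on its memory traffic.
        cudaLaunchAttribute attr;
        attr.id = cudaLaunchAttributeMemSyncDomain;
        attr.val.memSyncDomain = cudaLaunchMemSyncDomainRemote;

        cudaLaunchConfig_t cfg = {};
        cfg.gridDim = 64;
        cfg.blockDim = 256;
        cfg.attrs = &attr;
        cfg.numAttrs = 1;

        cudaLaunchKernelEx(&cfg, my_kernel, d_data);
        cudaDeviceSynchronize();
        cudaFree(d_data);
        return 0;
    }
    ```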
  • Patent number: 11550584
    Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a thread group use an interleaved cell layout to keep relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions reduce the number of cycles required to compute each sub-alignment score. In another example, some threads compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial results received from the preceding threads. After computing a maximum sub-alignment score, a thread stores that score and the corresponding position in global memory.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 10, 2023
    Assignee: NVIDIA Corporation
    Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
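    The inner loop these techniques accelerate is the classic Smith-Waterman cell recurrence. A minimal CUDA sketch of a single cell update with linear gap penalties (the interleaved register layout and the specialized score instructions from the abstract are not modeled here):

    ```cuda
    // One Smith-Waterman cell: the sub-alignment score H(i,j) is the best of
    // a diagonal match/mismatch step, a gap in either sequence, or zero.
    __device__ int sw_cell(int h_diag, int h_up, int h_left,
                           char a, char b,
                           int match, int mismatch, int gap) {
        int sub = h_diag + (a == b ? match : mismatch); // diagonal step
        int del = h_up   - gap;                         // gap in one sequence
        int ins = h_left - gap;                         // gap in the other
        return max(max(sub, 0), max(del, ins));         // clamp at zero (local alignment)
    }
    ```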
  • Publication number: 20220365882
    Abstract: Apparatuses, systems, and techniques to control operation of a memory cache. In at least one embodiment, cache guidance is specified within application source code by associating guidance with the declaration of a memory block and then applying the specified guidance to source code statements that access that memory block.
    Type: Application
    Filed: August 5, 2021
    Publication date: November 17, 2022
    Inventors: Harold Carter Edwards, Luke David Durant, Stephen Jones, Jack H. Choquette, Ronny Krashinsky, Dmitri Vainbrand, Olivier Giroux, Olivier Francois Joseph Harel, Shirish Gadre, Ze Long, Matthieu Tardy, David Dastous St Hilaire, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Jaewook Shin, Jayashree Venkatesh, Girish Bhaskar Bharambe
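    Declaration-site cache guidance of this kind is visible in CUDA's libcu++ as cuda::annotated_ptr, where an access property attached at the pointer's declaration applies to every access made through it. A minimal sketch, assuming the persisting L2 access property:

    ```cuda
    #include <cuda/annotated_ptr>

    // The cache hint is bound where the pointer is declared; the statements
    // that read through "pin" then inherit the "persisting" guidance.
    __global__ void scale(const float *in, float *out, int n, float k) {
        cuda::annotated_ptr<const float, cuda::access_property::persisting> pin{in};
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = k * pin[i];
    }
    ```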
  • Patent number: 11379944
    Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: July 5, 2022
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, Mark Gebhart, Steven J. Heinrich, Ramesh Jandhyala, William Newhall, Omkar Paranjape, Stefano Pescador, Poorna Rao
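    The decision policy in this abstract is "assume the fast path unless known information rules it out." A hypothetical sketch of that optimistic default (the TexOp fields are illustrative, not a hardware interface):

    ```cuda
    // Hypothetical sketch of one decision point in the texture pipeline:
    // acceleration is the default and is withdrawn only on positive evidence.
    struct TexOp {
        bool usesUnsupportedFilterMode;  // illustrative disqualifiers
        bool spansIncompatibleFormats;
    };

    bool canAccelerate(const TexOp &op) {
        bool accelerate = true;                             // optimistic default
        if (op.usesUnsupportedFilterMode) accelerate = false;
        if (op.spansIncompatibleFormats)  accelerate = false;
        return accelerate;                                  // true unless disproven
    }
    ```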
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
    Assignee: NVIDIA Corporation
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
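    The arbiter's behavior can be sketched as a hypothetical threshold test (all names illustrative; the patent describes fixed-function hardware, not software):

    ```cuda
    enum class Mode { GraphicsGreedy, ComputeGreedy };
    enum class Work { Graphics, Compute };

    // Hypothetical arbiter step: whichever kind of work is *not* favored by
    // the current greedy mode is admitted only when the monitored metric
    // crosses the user-configured threshold.
    Work select(Mode mode, double metric, double userThreshold) {
        bool borrow = metric > userThreshold;  // the comparison from the abstract
        if (mode == Mode::GraphicsGreedy)
            return borrow ? Work::Compute : Work::Graphics;
        else
            return borrow ? Work::Graphics : Work::Compute;
    }
    ```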
  • Patent number: 11347668
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: May 31, 2022
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
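    Because one data RAM backs both shared memory and the L1 cache in this design, CUDA exposes a hint for how to split it. A minimal sketch using the preferred shared-memory carveout attribute (the 50% split is an arbitrary example):

    ```cuda
    #include <cuda_runtime.h>

    __global__ void work(float *data) { /* uses both shared memory and L1 */ }

    int main() {
        float *d;
        cudaMalloc(&d, 1024 * sizeof(float));
        // Request roughly half of the unified data RAM as shared memory,
        // leaving the rest to serve as L1 cache.
        cudaFuncSetAttribute(work, cudaFuncAttributePreferredSharedMemoryCarveout, 50);
        work<<<64, 256>>>(d);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }
    ```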
  • Publication number: 20220036352
    Abstract: Systems and methods configured for receiving a first signal from a user device associated with a user account; modifying a data record linked to the user account to unsuspend a protection program for the user account in response to the first signal; receiving a first request from a transaction processing system charging a first amount against the user account, the first amount being greater than a current balance of the user account; and processing the first request by: generating an authorization signal for the transaction processing system approving the first amount; modifying the data record to reflect a first balance corresponding to a difference between the current balance and the first amount; and generating a first notification signal for the user device based on the first request.
    Type: Application
    Filed: July 28, 2020
    Publication date: February 3, 2022
    Applicant: POPULUS FINANCIAL GROUP, INC.
    Inventors: Paul DE VOS, Joseph TAYLOR, Shirish GADRE
  • Publication number: 20220027160
    Abstract: In a streaming cache, multiple dynamically sized tracking queues are employed. Request tracking information is distributed among the tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing, for example, in-order memory returns for some contexts or classes of traffic and out-of-order returns for others.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Michael A. FETTERMAN, Mark GEBHART, Shirish GADRE, Mitchell HAYENGA, Steven HEINRICH, Ramesh JANDHYALA, Raghavan MADHAVAN, Omkar PARANJAPE, James ROBERTSON, Jeff SCHOTTMILLER
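    A hypothetical software model of the tracking-queue policy (the real mechanism is hardware inside the streaming cache; all names are illustrative):

    ```cuda
    #include <cstdint>
    #include <deque>
    #include <vector>

    // Hypothetical model: requests whose context requires in-order returns
    // share queue 0; independent traffic is spread across the remaining
    // queues and may complete out of order relative to other queues.
    struct Request { uint64_t addr; bool mustReturnInOrder; };

    struct TrackingQueues {
        std::vector<std::deque<Request>> queues;
        explicit TrackingQueues(size_t n) : queues(n) {}  // dynamically sized, n >= 2
        void track(const Request &r) {
            size_t q = r.mustReturnInOrder ? 0 : 1 + (r.addr % (queues.size() - 1));
            queues[q].push_back(r);
        }
    };
    ```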
  • Publication number: 20220020098
    Abstract: Systems and methods configured for receiving, from a user device associated with a user, a first request to associate the user with a target and account information associated with the user; retrieving the target from a database to associate the user with the retrieved target; receiving, from an entity, event data associated with the account information; computing an amount of contribution based on the received event data; transmitting the computed amount of contribution to a distribution system configured to distribute the contribution to the associated target; updating the database by aggregating the computed amount of contribution to a stored contribution data associated with the user; retrieving pre-authenticated data of the user allowing transmission of a message to a public forum; generating the message based on the event data and the computed amount of contribution; and transmitting the message to one or more devices via the public forum.
    Type: Application
    Filed: July 20, 2020
    Publication date: January 20, 2022
    Applicant: POPULUS FINANCIAL GROUP, INC.
    Inventors: Paul DE VOS, Joseph TAYLOR, Shirish GADRE
  • Publication number: 20210398241
    Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
    Type: Application
    Filed: June 23, 2020
    Publication date: December 23, 2021
    Inventors: Michael FETTERMAN, Shirish GADRE, Mark GEBHART, Steven J. HEINRICH, Ramesh JANDHYALA, William NEWHALL, Omkar PARANJAPE, Stefano PESCADOR, Poorna RAO
  • Publication number: 20210326137
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request the data and store it in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into shared memory without storing it in the registers and/or cache memory of the multiprocessor during the transfer.
    Type: Application
    Filed: June 30, 2021
    Publication date: October 21, 2021
    Inventors: Andrew KERR, Jack CHOQUETTE, Xiaogang QIU, Omkar PARANJAPE, Poornachandra RAO, Shirish GADRE, Steven J. HEINRICH, Manan PATEL, Olivier GIROUX, Alan KAATZ
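    The register-bypassing copy described here is exposed in CUDA as asynchronous copy (cooperative_groups::memcpy_async, lowered to cp.async on Ampere-class and later GPUs). A minimal sketch of staging a block of global memory into shared memory:

    ```cuda
    #include <cooperative_groups.h>
    #include <cooperative_groups/memcpy_async.h>
    namespace cg = cooperative_groups;

    __global__ void stage(const float *global_in, int n) {
        extern __shared__ float smem[];
        auto block = cg::this_thread_block();
        // Copy straight from global memory into shared memory; on supporting
        // hardware the data bypasses the threads' registers during the transfer.
        cg::memcpy_async(block, smem, global_in, sizeof(float) * n);
        cg::wait(block);  // block until the staged data is visible in smem[]
        // ... threads may now read smem[] as many times as needed ...
    }
    ```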
  • Patent number: 11080051
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request the data and store it in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into shared memory without storing it in the registers and/or cache memory of the multiprocessor during the transfer.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20210124582
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request the data and store it in on-chip shared memory, where the threads can access it multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into shared memory without storing it in the registers and/or cache memory of the multiprocessor during the transfer.
    Type: Application
    Filed: December 12, 2019
    Publication date: April 29, 2021
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20210124627
    Abstract: To synchronize operations of a computing system, a new type of synchronization barrier is disclosed. In one embodiment, the disclosed synchronization barrier allows synchronization mechanisms such as “Arrive” and “Wait” to be split, providing greater flexibility and efficiency in coordinating synchronization. In another embodiment, the disclosed synchronization barrier allows hardware components, such as dedicated copy or direct-memory-access (DMA) engines, to be synchronized with software-based threads.
    Type: Application
    Filed: December 12, 2019
    Publication date: April 29, 2021
    Inventors: Olivier GIROUX, Jack CHOQUETTE, Ronny KRASHINSKY, Steve HEINRICH, Xiaogang QIU, Shirish GADRE
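    The split arrive/wait barrier is available to CUDA programs as cuda::barrier in libcu++. A minimal sketch of the decoupled phases, with independent work overlapping the synchronization:

    ```cuda
    #include <cuda/barrier>

    __global__ void split_sync(float *buf, float *other) {
        __shared__ cuda::barrier<cuda::thread_scope_block> bar;
        if (threadIdx.x == 0) init(&bar, blockDim.x);  // expected arrival count
        __syncthreads();

        buf[threadIdx.x] *= 2.0f;       // produce shared results
        auto token = bar.arrive();      // non-blocking: signal arrival
        other[threadIdx.x] += 1.0f;     // independent work overlaps here
        bar.wait(std::move(token));     // block until all threads have arrived
        // ... writes to buf[] made before arrive() are now visible block-wide ...
    }
    ```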
  • Patent number: 10877757
    Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: December 29, 2020
    Assignee: NVIDIA Corporation
    Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik, Amanpreet Grewal, Shekhar Divekar, Andrei Khodakovsky
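    A hypothetical sketch of the traversal the JIT compiler inserts: from a root table, follow a fixed chain of pointers that terminates at the constant, then keep the loaded value in a register shared by the warp (struct names are illustrative):

    ```cuda
    // Hypothetical shape of the generated pointer-chain walk. In the patented
    // scheme the final load would be bound to a uniform register, so one copy
    // of the constant serves every thread in the warp.
    struct Level2 { const float *constants; };
    struct Level1 { const Level2 *next; };
    struct RootTable { const Level1 *entry; };

    __device__ float bind_uniform_constant(const RootTable *root, int index) {
        const Level1 *l1 = root->entry;    // hop 1: root table
        const Level2 *l2 = l1->next;       // hop 2: intermediate table
        return l2->constants[index];       // chain terminates at the constant
    }
    ```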
  • Publication number: 20200401541
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: July 6, 2020
    Publication date: December 24, 2020
    Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
  • Patent number: 10866806
    Abstract: A compiler parses a multithreaded application into cohesive blocks of instructions. Cohesive blocks include instructions that do not diverge or converge. Each cohesive block is associated with one or more uniform registers. When a set of threads executes the instructions in a given cohesive block, each thread in the set may access the uniform register independently of the other threads in the set. Accordingly, the uniform register may store a single copy of data on behalf of all threads in the set of threads, thereby conserving resources.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: December 15, 2020
    Assignee: NVIDIA Corporation
    Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik
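    Uniform registers are not directly addressable from CUDA source, but ordinary code shows which values qualify. In the sketch below (a standard saxpy kernel), the comments mark the warp-invariant values that a compiler could assign to uniform registers within a cohesive, non-divergent block:

    ```cuda
    __global__ void saxpy(float a, const float *x, float *y, int n) {
        // "a", "x", "y", and "n" are identical for every thread in the warp:
        // candidates for a single uniform-register copy instead of 32 copies.
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // varies per thread:
        if (i < n)                                      // needs a regular register
            y[i] = a * x[i] + y[i];
    }
    ```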
  • Patent number: 10705994
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: July 7, 2020
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Patent number: 10699427
    Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. The set of texels that are generated and stored may therefore be reduced to just the identified texels; texels that are not identified need not be rendered or stored.
    Type: Grant
    Filed: August 12, 2019
    Date of Patent: June 30, 2020
    Assignee: NVIDIA Corporation
    Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
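    A hypothetical two-pass sketch of how footprint reporting serves decoupled shading: pass 1 records which coarse texel tiles the frame will sample, so pass 2 can generate and store only those tiles (all names and the tiling scheme are illustrative):

    ```cuda
    // Pass 1: mark each coarse tile of the texture that any sample will touch.
    __global__ void report_footprint(const float2 *uv, int numSamples,
                                     unsigned *tileMask, int tilesPerRow) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numSamples) return;
        int tx = (int)(uv[i].x * tilesPerRow);             // coarse tile coords
        int ty = (int)(uv[i].y * tilesPerRow);
        int tile = ty * tilesPerRow + tx;
        atomicOr(&tileMask[tile / 32], 1u << (tile % 32)); // set "needed" bit
    }
    // Pass 2 would shade and store texels only for tiles whose bit is set;
    // unmarked tiles are never generated, saving bandwidth and storage.
    ```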
  • Publication number: 20200043123
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Application
    Filed: August 2, 2018
    Publication date: February 6, 2020
    Inventors: Rajballav DASH, Gregory PALMER, Gentaro HIROTA, Lacky SHAH, Jack CHOQUETTE, Emmett KILGARIFF, Sriharsha NIVERTY, Milton LEI, Shirish GADRE, Omkar PARANJAPE, Lei YANG, Rouslan DIMITROV