Patents by Inventor Shirish Gadre

Shirish Gadre has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200013174
    Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
    Type: Application
    Filed: August 12, 2019
    Publication date: January 9, 2020
    Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
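
As a rough illustration of the footprint idea in the abstract above, the hedged C++ sketch below marks the texels a single bilinear sample would touch; the 8×8 texture size, the sample coordinates, and the 2×2 bilinear neighborhood are illustrative assumptions, not details from the patent.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int W = 8, H = 8;                    // hypothetical texture size
    std::vector<bool> footprint(W * H, false); // one flag per texel

    // One sample at normalized coords (u, v); bilinear filtering touches
    // the 2x2 texel neighborhood around the sample point.
    float u = 0.40f, v = 0.55f;
    int x0 = (int)std::floor(u * W - 0.5f);
    int y0 = (int)std::floor(v * H - 0.5f);
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx) {
            int x = std::min(std::max(x0 + dx, 0), W - 1);
            int y = std::min(std::max(y0 + dy, 0), H - 1);
            footprint[y * W + x] = true;       // this texel must be produced
        }

    // Only texels flagged in the footprint need to be rendered and stored.
    int needed = 0;
    for (bool b : footprint) needed += b;
    std::printf("%d of %d texels needed\n", needed, W * H);
}
```
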
  • Patent number: 10503513
    Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: December 10, 2019
    Assignee: NVIDIA Corporation
    Inventors: David Conrad Tannenbaum, Srinivasan (Vasu) Iyer, Stuart F. Oberman, Ming Y. Siu, Michael Alan Fetterman, John Matthew Burgess, Shirish Gadre
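
The dispatch decision in the abstract above can be illustrated with a small sketch. In the hedged C++ below, the opcode set, the split of instructions between pipelines, and the tie-breaking rule are all hypothetical; only the general idea of selecting between two pipelines based on support, availability, and power efficiency comes from the abstract.

```cpp
#include <cstdio>

enum class Op { FFMA, LOP, MUFU };  // hypothetical opcodes

// Assumed ISA split: FFMA runs on both pipelines, the frequently issued
// LOP only on the primary, the infrequent MUFU only on the secondary.
bool primary_supports(Op op)   { return op == Op::FFMA || op == Op::LOP; }
bool secondary_supports(Op op) { return op == Op::FFMA || op == Op::MUFU; }

const char* dispatch(Op op, bool primary_busy, bool secondary_busy) {
    bool p = primary_supports(op), s = secondary_supports(op);
    if (p && s) {
        // Both pipelines can run it: prefer an idle unit; break ties
        // toward the primary, assumed more power-efficient for common ops.
        if (!primary_busy)   return "primary";
        if (!secondary_busy) return "secondary";
        return "stall";
    }
    if (p) return primary_busy   ? "stall" : "primary";
    if (s) return secondary_busy ? "stall" : "secondary";
    return "illegal";
}

int main() {
    std::printf("FFMA -> %s\n", dispatch(Op::FFMA, /*busy*/ true, false));
    std::printf("LOP  -> %s\n", dispatch(Op::LOP, false, true));
}
```
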
  • Patent number: 10459861
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: October 29, 2019
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
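
The routing policy in the abstract above maps naturally onto a toy model. In the hedged C++ sketch below, the 128-byte line size, the addresses, and the resident-tag set are invented for illustration; the three pathways (direct for shared memory, reroute on tag hit, FIFO on tag miss) follow the abstract.

```cpp
#include <cstdio>
#include <queue>
#include <unordered_set>

struct Txn { unsigned addr; bool targets_shared; };

int main() {
    const unsigned line_mask = ~0x7Fu;                  // 128-byte lines
    std::unordered_set<unsigned> tags = {0x100, 0x200}; // resident lines
    std::queue<Txn> miss_fifo;                          // waits for fills

    Txn txns[] = {{0x040, true}, {0x110, false}, {0x300, false}};
    for (const Txn& t : txns) {
        if (t.targets_shared) {
            // Shared memory: no tag check, straight to the data memory.
            std::printf("0x%03x: direct pathway (shared)\n", t.addr);
        } else if (tags.count(t.addr & line_mask)) {
            // Tag hit: reroute onto the same direct pathway.
            std::printf("0x%03x: tag hit, rerouted to direct pathway\n", t.addr);
        } else {
            // Tag miss: park in the FIFO until external memory responds.
            miss_fifo.push(t);
            std::printf("0x%03x: tag miss, pushed to FIFO\n", t.addr);
        }
    }

    // Fill returns from external memory: drain the miss FIFO in order.
    while (!miss_fifo.empty()) {
        Txn t = miss_fifo.front(); miss_fifo.pop();
        tags.insert(t.addr & line_mask);
        std::printf("0x%03x: miss data returned, line filled\n", t.addr);
    }
}
```
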
  • Patent number: 10424074
    Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
    Type: Grant
    Filed: July 3, 2018
    Date of Patent: September 24, 2019
    Assignee: NVIDIA Corporation
    Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
  • Publication number: 20190146817
    Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.
    Type: Application
    Filed: February 14, 2018
    Publication date: May 16, 2019
    Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik, Amanpreet Grewal, Shekhar Divekar, Andrei Khodakovsky
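
The pointer chase that the JIT compiler emits can be mimicked in ordinary C++. In the sketch below, the two-level root-table → descriptor → constant layout is an assumed example; the abstract only specifies that a chain of pointers starts at a root table and terminates at the uniform constant.

```cpp
#include <cstdio>

int main() {
    // Hypothetical layout: root table -> buffer descriptor -> constant.
    float constant = 3.14f;                 // the uniform constant
    void* descriptor[1] = { &constant };    // intermediate table
    void* root_table[1] = { descriptor };   // root specified by the app

    // Instructions the JIT would emit: one dependent load per pointer
    // in the chain, executed once on behalf of the whole thread group.
    void** level1 = (void**)root_table[0];  // load root-table slot
    float* target = (float*)level1[0];      // load descriptor slot
    float uniform_reg = *target;            // bind value to uniform register

    std::printf("bound constant = %f\n", uniform_reg);
}
```
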
  • Publication number: 20190146796
    Abstract: A compiler parses a multithreaded application into cohesive blocks of instructions. Cohesive blocks include instructions that do not diverge or converge. Each cohesive block is associated with one or more uniform registers. When a set of threads executes the instructions in a given cohesive block, each thread in the set may access the uniform register independently of the other threads in the set. Accordingly, the uniform register may store a single copy of data on behalf of all threads in the set of threads, thereby conserving resources.
    Type: Application
    Filed: February 14, 2018
    Publication date: May 16, 2019
    Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik
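
A toy version of the parsing step described above: the hedged C++ below splits a linear opcode stream into cohesive blocks at assumed divergence and convergence markers (BRA, JOIN); the opcode names and the rule for what breaks cohesion are illustrative guesses.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical rule: branches may diverge, joins reconverge; either one
// ends a cohesive block, since uniform registers are only valid while
// all threads in the set execute in lockstep.
bool breaks_cohesion(const std::string& op) {
    return op == "BRA" || op == "JOIN";
}

int main() {
    std::vector<std::string> program =
        {"IADD", "IMUL", "BRA", "FADD", "JOIN", "FMUL", "ST"};

    std::vector<std::vector<std::string>> blocks(1);
    for (const std::string& op : program) {
        if (breaks_cohesion(op)) {
            blocks.push_back({});   // divergence point ends current block
            continue;
        }
        blocks.back().push_back(op);
    }

    // Each block could then be assigned uniform registers holding one
    // shared copy of data for every thread in the set.
    for (size_t i = 0; i < blocks.size(); ++i) {
        std::printf("block %zu:", i);
        for (const auto& op : blocks[i]) std::printf(" %s", op.c_str());
        std::printf("\n");
    }
}
```
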
  • Patent number: 10289418
    Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 14, 2019
    Assignee: NVIDIA Corporation
    Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Danskin
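
The early-exit behavior of the trap handler above can be sketched directly; the four-unit trap table and its layout are hypothetical, but the flow (every unit enters the handler, only units with a trapped thread pay for the context switch) follows the abstract.

```cpp
#include <cstdio>

// Hypothetical data structure recording which units saw a trap.
struct TrapTable { bool trapped[4] = {false, false, false, false}; };

void trap_handler(TrapTable& table, int unit) {
    if (!table.trapped[unit]) {
        // No trap here: exit the handler before the context switch.
        std::printf("unit %d: no trap, exiting handler early\n", unit);
        return;
    }
    // This unit's thread trapped: perform the context switch.
    std::printf("unit %d: saving context and switching\n", unit);
    table.trapped[unit] = false;  // trap serviced
}

int main() {
    TrapTable table;
    table.trapped[2] = true;      // a thread on unit 2 hit a trap
    for (int unit = 0; unit < 4; ++unit) trap_handler(table, unit);
}
```
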
  • Patent number: 10235208
    Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.
    Type: Grant
    Filed: December 11, 2012
    Date of Patent: March 19, 2019
    Assignee: NVIDIA Corporation
    Inventors: Nicholas Wang, Lacky V. Shah, Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre
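
A minimal model of the local-to-global re-mapping described above: in the hedged C++ below, the remap base and the flat address arithmetic are invented; the abstract states only that the LSU redirects local-memory accesses of a re-launched group to a region of global memory.

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> global_mem(1024);  // simulated global memory
    const unsigned remap_base = 512;      // where local memory was re-mapped

    // LSU-side translation: local offset -> re-mapped global address.
    auto local_to_global = [&](unsigned local_off) {
        return remap_base + local_off;
    };

    // The re-launched thread group stores and loads "local" data
    // transparently through the remap.
    global_mem[local_to_global(7)] = 42.0f;    // local store
    float v = global_mem[local_to_global(7)];  // local load
    std::printf("local[7] = %f (at global %u)\n", v, local_to_global(7));
}
```
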
  • Patent number: 10152329
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
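
The replay-count decision above is easy to model: the hedged sketch below counts how many cache lines a thread group's addresses span and derives the number of pre-scheduled replay operations; the 128-byte line size and the address set are assumptions.

```cpp
#include <cstdio>
#include <set>
#include <vector>

int main() {
    const unsigned line_bytes = 128;
    std::vector<unsigned> thread_addrs =   // one address per thread
        {0x000, 0x004, 0x080, 0x084, 0x100, 0x104, 0x108, 0x10c};

    // Distinct cache lines touched by the thread group.
    std::set<unsigned> lines;
    for (unsigned a : thread_addrs) lines.insert(a / line_bytes);

    // The first line is serviced by the instruction itself; each extra
    // line gets a pre-scheduled replay inserted behind it in the pipeline.
    int replays = (int)lines.size() - 1;
    std::printf("%zu lines touched -> %d pre-scheduled replays\n",
                lines.size(), replays);
}
```

If threads remain unserviced even after these replays execute (for example, because the footprint was underestimated), the abstract's replay loop would insert further replay operations until every thread is serviced.
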
  • Publication number: 20180322078
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: September 26, 2017
    Publication date: November 8, 2018
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Publication number: 20180322077
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: May 4, 2017
    Publication date: November 8, 2018
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Patent number: 10114755
    Abstract: A system, method, and computer program product for warming a cache for a task launch is described. The method includes the steps of receiving a task data structure that defines a processing task, extracting information stored in a cache warming field of the task data structure, and, prior to executing the processing task, generating a cache warming instruction that is configured to load one or more entries of a cache storage with data fetched from a memory.
    Type: Grant
    Filed: June 14, 2013
    Date of Patent: October 30, 2018
    Assignee: NVIDIA Corporation
    Inventors: Scott Ricketts, Nicholas Wang, Shirish Gadre, Gentaro Hirota, Robert Ohannessian, Jr.
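
A rough sketch of the warming step above: the task-descriptor layout below (a base/size pair as the cache warming field) is hypothetical, and the host-side loop stands in for generated cache-warming instructions that touch one byte per cache line before the task launches.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical task data structure with a cache warming field.
struct TaskDescriptor {
    unsigned warm_base;   // start of the region to prefetch
    unsigned warm_bytes;  // size of the region to prefetch
};

int main() {
    std::vector<char> memory(4096);
    TaskDescriptor task{1024, 512};  // extracted from the task structure

    // Generated cache-warming "instructions": one load per cache line,
    // each of which fills an entry of the cache storage.
    const unsigned line = 128;
    volatile char sink = 0;
    for (unsigned off = 0; off < task.warm_bytes; off += line)
        sink = memory[task.warm_base + off];
    (void)sink;

    std::printf("warmed %u bytes starting at %u\n",
                task.warm_bytes, task.warm_base);
    // ...then execute the processing task against warm cache lines.
}
```
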
  • Patent number: 10095548
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: October 9, 2018
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
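
The age-priority rule above can be shown with a toy TOQ: in the hedged C++ below, the two-bank resource model and the sleep list are invented details; the abstract supplies only the ideas of age order, sleeping on unavailable resources, and oldest-first grants.

```cpp
#include <cstdio>
#include <deque>

struct Request { int id; int age; const char* resource; };

int main() {
    // Oldest first; the TOQ preserves this order across execution cycles,
    // so an older request is never starved by a younger one.
    std::deque<Request> toq =
        {{0, 9, "bankA"}, {1, 5, "bankA"}, {2, 3, "bankB"}};

    bool bankA_free = true, bankB_free = true;
    std::deque<Request> sleeping;   // requests waiting on busy resources

    for (const Request& r : toq) {
        bool* avail = (r.resource[4] == 'A') ? &bankA_free : &bankB_free;
        if (*avail) {
            *avail = false;          // allocate to the oldest requester
            std::printf("req %d (age %d) granted %s\n",
                        r.id, r.age, r.resource);
        } else {
            sleeping.push_back(r);   // sleep until the resource frees up
            std::printf("req %d (age %d) sleeps on %s\n",
                        r.id, r.age, r.resource);
        }
    }
}
```

The deadlock-avoidance case in the abstract, where an older request steals resources from a younger one, would extend this loop with a reclaim step; it is omitted here to keep the sketch minimal.
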
  • Patent number: 10095542
    Abstract: Techniques are provided for restoring threads within a processing core. The techniques include, for a first thread group included in a plurality of thread groups, executing a context restore routine to restore from a memory a first portion of a context associated with the first thread group, determining whether the first thread group completed an assigned function, and, if the first thread group completed the assigned function, then exiting the context restore routine, or if the first thread group did not complete the assigned function, then executing one or more operations associated with a trap handler routine.
    Type: Grant
    Filed: October 30, 2017
    Date of Patent: October 9, 2018
    Assignee: NVIDIA Corporation
    Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
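
The restore-then-branch flow above is small enough to sketch directly; the thread-group structure below and its single saved-context word are placeholders for whatever state the real routine restores.

```cpp
#include <cstdio>

struct ThreadGroup { int id; bool completed; int saved_context; };

void context_restore(ThreadGroup& tg) {
    // Restore the first portion of the saved context from memory.
    int restored = tg.saved_context;
    (void)restored;
    if (tg.completed) {
        // Assigned function already finished: exit the restore routine.
        std::printf("group %d: done, exiting restore routine\n", tg.id);
        return;
    }
    // Unfinished group: execute trap-handler operations to resume it.
    std::printf("group %d: resuming via trap handler\n", tg.id);
}

int main() {
    ThreadGroup groups[] = {{0, true, 11}, {1, false, 22}};
    for (ThreadGroup& tg : groups) context_restore(tg);
}
```
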
  • Patent number: 10007527
    Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
    Type: Grant
    Filed: March 5, 2012
    Date of Patent: June 26, 2018
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
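
The simplest uniform pattern, every thread reading the same address, is sketched below in hedged C++; the real hardware matches a variety of patterns per the abstract, so the single all-equal check here is deliberately the minimal case.

```cpp
#include <cstdio>
#include <vector>

// Count the read requests needed for one sub-set of parallel threads.
int reads_needed(const std::vector<unsigned>& addrs) {
    bool uniform = true;
    for (unsigned a : addrs)
        if (a != addrs[0]) { uniform = false; break; }
    // Uniform pattern matched: one broadcast read serves the whole
    // sub-set; otherwise fall back to one read per thread.
    return uniform ? 1 : (int)addrs.size();
}

int main() {
    std::vector<unsigned> same  = {0x80, 0x80, 0x80, 0x80};
    std::vector<unsigned> mixed = {0x80, 0x84, 0x88, 0x8c};
    std::printf("uniform sub-set:   %d read(s)\n", reads_needed(same));
    std::printf("scattered sub-set: %d read(s)\n", reads_needed(mixed));
}
```
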
  • Publication number: 20180052707
    Abstract: Techniques are provided for restoring threads within a processing core. The techniques include, for a first thread group included in a plurality of thread groups, executing a context restore routine to restore from a memory a first portion of a context associated with the first thread group, determining whether the first thread group completed an assigned function, and, if the first thread group completed the assigned function, then exiting the context restore routine, or if the first thread group did not complete the assigned function, then executing one or more operations associated with a trap handler routine.
    Type: Application
    Filed: October 30, 2017
    Publication date: February 22, 2018
    Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
  • Patent number: 9836325
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: December 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9830224
    Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a selective fault-stalling pipeline. Upon detecting a memory access fault associated with an operation executing on a particular SM, a replay unit in the selective fault-stalling pipeline considers the operation as a faulting operation. Subsequently, instead of notifying the SM of the memory access fault, the replay unit recirculates the operation—reinserting the operation into the selective fault-stalling pipeline. Recirculating faulting operations in such a fashion enables the SM to execute other operations while the replay unit stalls the faulting request until the associated access fault is resolved. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a memory access fault, cancel the associated operation and subsequent operations.
    Type: Grant
    Filed: December 17, 2013
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: Olivier Giroux, Shirish Gadre
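
The recirculation above can be modeled with a queue: in the hedged sketch below, the fault_cycles_left counter is a stand-in for whatever external event resolves the access fault; the key behavior from the abstract is that the faulting operation is reinserted rather than cancelled while independent operations keep completing.

```cpp
#include <cstdio>
#include <queue>

struct Op { int id; int fault_cycles_left; };  // cycles until fault resolves

int main() {
    std::queue<Op> pipeline;
    pipeline.push({0, 0});   // normal operation
    pipeline.push({1, 3});   // op whose access fault takes 3 cycles to fix
    pipeline.push({2, 0});   // independent op behind the faulting one

    while (!pipeline.empty()) {
        Op op = pipeline.front();
        pipeline.pop();
        if (op.fault_cycles_left > 0) {
            --op.fault_cycles_left;
            pipeline.push(op);  // recirculate: reinsert, do not cancel
            std::printf("op %d recirculated\n", op.id);
        } else {
            std::printf("op %d completed\n", op.id);  // others progress
        }
    }
}
```
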
  • Patent number: 9804885
    Abstract: Techniques are provided for restoring threads within a processing core. The techniques include, for a first thread group included in a plurality of thread groups, executing a context restore routine to restore from a memory a first portion of a context associated with the first thread group, determining whether the first thread group completed an assigned function, and, if the first thread group completed the assigned function, then exiting the context restore routine, or if the first thread group did not complete the assigned function, then executing one or more operations associated with a trap handler routine.
    Type: Grant
    Filed: September 20, 2016
    Date of Patent: October 31, 2017
    Assignee: NVIDIA Corporation
    Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
  • Patent number: 9755994
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: September 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich