Patents by Inventor Shirish Gadre

Shirish Gadre has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

APPROACH FOR CONTEXT SWITCHING OF LOCK-BIT PROTECTED MEMORY

Publication number: 20140189260

Abstract: A streaming multiprocessor in a parallel processing subsystem processes atomic operations for multiple threads in a multi-threaded architecture. The streaming multiprocessor receives a request from a thread in a thread group to acquire access to a memory location in a lock-protected shared memory, and determines whether a address lock in a plurality of address locks is asserted, where the address lock is associated the memory location. If the address lock is asserted, then the streaming multiprocessor refuses the request. Otherwise, the streaming multiprocessor asserts the address lock, asserts a thread group lock in a plurality of thread group locks, where the thread group lock is associated with the thread group, and grants the request. One advantage of the disclosed techniques is that acquired locks are released when a thread is preempted. As a result, a preempted thread that has previously acquired a lock does not retain the lock indefinitely.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nicholas WANG, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Matthew BROCKMEYER, Stewart Glenn CARLTON
COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING

Publication number: 20140189711

Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group, executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.

Type: Application

Filed: April 15, 2013

Publication date: July 3, 2014

Inventors: Gerald F. Luiz, Phillip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING

Publication number: 20140189329

Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Nicholas WANG, Arthur DANSKIN
TECHNIQUE FOR PERFORMING MEMORY ACCESS OPERATIONS VIA TEXTURE HARDWARE

Publication number: 20140173258

Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.

Type: Application

Filed: December 19, 2012

Publication date: June 19, 2014

Applicant: NVIDIA CORPORATION

Inventors: Brian FAHS, Eric T. ANDERSON, Nick BARROW-WILLIAMS, Shirish GADRE, Joel James MCCORMACK, Bryon S. NORDQUIST, Nirmal Raj SAXENA, Lacky V. SHAH
TECHNIQUE FOR PERFORMING MEMORY ACCESS OPERATIONS VIA TEXTURE HARDWARE

Publication number: 20140168245

Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.

Type: Application

Filed: December 19, 2012

Publication date: June 19, 2014

Applicant: NVIDIA CORPORATION

Inventors: Brian Fahs, Eric T. Anderson, Nick Barrow-Williams, Shirish Gadre, Joel James McCormack, Bryon S. Nordquist, Nirmal Raj Saxena, Lacky V. Shah
TECHNIQUE FOR ACCESSING CONTENT-ADDRESSABLE MEMORY

Publication number: 20140173193

Abstract: A tag unit configured to manage a cache unit includes a coalescer that implements a set hashing function. The set hashing function maps a virtual address to a particular content-addressable memory unit (CAM). The coalescer implements the set hashing function by splitting the virtual address into upper, middle, and lower portions. The upper portion is further divided into even-indexed bits and odd-indexed bits. The even-indexed bits are reduced to a single bit using a XOR tree, and the odd-indexed are reduced in like fashion. Those single bits are combined with the middle portion of the virtual address to provide a CAM number that identifies a particular CAM. The identified CAM is queried to determine the presence of a tag portion of the virtual address, indicating a cache hit or cache miss.

Type: Application

Filed: December 19, 2012

Publication date: June 19, 2014

Applicant: NVIDIA CORPORATION

Inventors: Brian Fahs, Eric T. ANDERSON, Nick Barrow-Williams, Shirish GADRE, Joel James MCCORMACK, Bryon S. NORDQUIST, Nirmal Raj Saxena, Lacky V. Shah
TECHNIQUE FOR SAVING AND RESTORING THREAD GROUP OPERATING STATE

Publication number: 20140165072

Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.

Type: Application

Filed: December 11, 2012

Publication date: June 12, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nicholas WANG, Lacky V. SHAH, Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE
Methods and systems for command acceleration in a video processor via translation of scalar instructions into vector instructions

Patent number: 8738891

Abstract: A method for implementing command acceleration. The method includes receiving a first set of instructions from a first processor, wherein the first set of instructions are formatted in accordance with a microarchitecture of the first processor. The first set of instructions are translated into a second set of instructions, wherein the second set of instructions are formatted in accordance with a microarchitecture of a second processor. The second set instructions are then transmitted to the second processor for execution by the second processor.

Type: Grant

Filed: November 4, 2005

Date of Patent: May 27, 2014

Assignee: NVIDIA Corporation

Inventors: Ashish Karandikar, Shirish Gadre, Amir H. Salek
Programmable DMA engine for implementing memory transfers and video processing for a video processor

Patent number: 8736623

Abstract: A method for using a programmable DMA engine to implement memory transfers and video processing for a video processor. A DMA control program is configured for controlling DMA memory transfers between a frame buffer memory and a video processor. The DMA control program is stored in the DMA engine. A DMA request can be received from the video processor. The DMA control program is executable to implement the DMA request for the video processor. The DMA engine is operable to execute low-level command for accessing the frame buffer memory to implement a high-level command.

Type: Grant

Filed: November 4, 2005

Date of Patent: May 27, 2014

Assignee: NVIDIA Corporation

Inventors: Stephen D. Lew, Shirish Gadre, Ashish Karandikar, Franciscus W. Sijstermans
Video processor having scalar and vector components

Patent number: 8698817

Abstract: A video processor for executing video processing operations. The video processor includes a host interface for implementing communication between the video processor and a host CPU. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A scalar execution unit is coupled to the host interface and the memory interface and is configured to execute scalar video processing operations. A vector execution unit is coupled to the host interface and the memory interface and is configured to execute vector video processing operations.

Type: Grant

Filed: November 4, 2005

Date of Patent: April 15, 2014

Assignee: Nvidia Corporation

Inventors: Shirish Gadre, Ashish Karandikar, Stephen D. Lew, Christopher T. Cheng
Latency tolerant system for executing video processing operations

Patent number: 8687008

Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.

Type: Grant

Filed: November 4, 2005

Date of Patent: April 1, 2014

Assignee: NVIDIA Corporation

Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
Multi context execution on a video processor

Patent number: 8683184

Abstract: A method for implementing multi context execution on a video processor having a scalar execution unit and a vector execution unit. The method includes allocating a first task to a vector execution unit and allocating a second task to the vector execution unit. The first task is from a first context in the second task is from a second context. The method further includes interleaving a plurality of work packages comprising the first task and the second task to generate a combined work package stream. The combined work package stream is subsequently executed on the vector execution unit.

Type: Grant

Filed: November 4, 2005

Date of Patent: March 25, 2014

Assignee: Nvidia Corporation

Inventors: Stephen D. Lew, Ashish Karandikar, Shirish Gadre, Franciscus W. Sijstermans
MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311996

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311686

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER

Publication number: 20130311999

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
DYNAMIC BANK MODE ADDRESSING FOR MEMORY ACCESS

Publication number: 20130268715

Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.

Type: Application

Filed: April 5, 2012

Publication date: October 10, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
UNIFORM LOAD PROCESSING FOR PARALLEL THREAD SUB-SETS

Publication number: 20130232322

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.

Type: Application

Filed: March 5, 2012

Publication date: September 5, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130212364

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
Multidimensional datapath processing in a video processor

Patent number: 8493396

Abstract: A multidimensional datapath processing system for a video processor for executing video processing operations. The video processor includes a scalar execution unit configured to execute scalar video processing operations and a vector execution unit configured to execute vector video processing operations. A data store memory is included for storing data for the vector execution unit. The data store memory includes a plurality of tiles having symmetrical bank data structures arranged in an array. The bank data structures are configured to support accesses to different tiles of each bank.

Type: Grant

Filed: November 4, 2005

Date of Patent: July 23, 2013

Assignee: Nvidia Corporation

Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew, Christopher T. Cheng
SHAPED REGISTER FILE READS

Publication number: 20130166877

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN

prev 1 2 3 4 5 6 7 next