Patents by Inventor Charles McCarver
Charles McCarver has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10152329Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.Type: GrantFiled: February 9, 2012Date of Patent: December 11, 2018Assignee: NVIDIA CORPORATIONInventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Patent number: 10095548Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: GrantFiled: May 21, 2012Date of Patent: October 9, 2018Assignee: NVIDIA CORPORATIONInventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Patent number: 9952977Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.Type: GrantFiled: September 24, 2010Date of Patent: April 24, 2018Assignee: NVIDIA CORPORATIONInventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Patent number: 9836325Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: GrantFiled: May 21, 2012Date of Patent: December 5, 2017Assignee: NVIDIA CorporationInventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Patent number: 9755994Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: GrantFiled: May 21, 2012Date of Patent: September 5, 2017Assignee: NVIDIA CorporationInventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Patent number: 8997103Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.Type: GrantFiled: April 6, 2012Date of Patent: March 31, 2015Assignee: NVIDIA CorporationInventors: Shirish Gadre, Charles McCarver, Anjana Rajendran, Omkar Paranjape, Steven James Heinrich
-
Patent number: 8595425Abstract: One embodiment of the present invention sets forth a technique for providing a L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces enabling the L1 cache may replace dedicated buffers, caches, and FIFOs in previous architectures. A “direct mapped” storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may used as a global register file. A “local and global cache” storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.Type: GrantFiled: September 25, 2009Date of Patent: November 26, 2013Assignee: NVIDIA CorporationInventors: Alexander L. Minkin, Steven James Heinrich, RaJeshwaran Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
-
Publication number: 20130311686Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: ApplicationFiled: May 21, 2012Publication date: November 21, 2013Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Publication number: 20130311999Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: ApplicationFiled: May 21, 2012Publication date: November 21, 2013Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Publication number: 20130311996Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.Type: ApplicationFiled: May 21, 2012Publication date: November 21, 2013Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
-
Publication number: 20130212364Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.Type: ApplicationFiled: February 9, 2012Publication date: August 15, 2013Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Patent number: 8335892Abstract: One embodiment of the present invention sets forth a technique for arbitrating requests received by an L1 cache from multiple clients. The L1 cache outputs bubble requests to a first one of the multiple clients that cause the first one of the multiple clients to insert bubbles into the request stream, where a bubble is the absence of a request. The bubbles allow the L1 cache to grant access to another one of the multiple clients without stalling the first one of the multiple clients. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to provide memory spaces for clients executing multiple parallel threads, where the memory spaces each have a different scope.Type: GrantFiled: December 30, 2009Date of Patent: December 18, 2012Assignee: NVIDIA CorporationInventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Anjana Rajendran
-
Patent number: 8266382Abstract: One embodiment of the present invention sets forth a technique for arbitrating requests received from one of the multiple clients of an L1 cache and for providing hints to the client to assist in arbitration. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to provide memory spaces for clients executing multiple parallel threads, where the memory spaces each have a different scope.Type: GrantFiled: December 30, 2009Date of Patent: September 11, 2012Assignee: NVIDIA CorporationInventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Anjana Rajendran, Yan Yan Tang
-
Patent number: 8266383Abstract: One embodiment of the present invention sets forth a technique for processing cache misses resulting from a request received from one of the multiple clients of an L1 cache. The L1 cache services multiple clients with diverse latency and bandwidth requirements, including at least one client whose requests cannot be stalled. The L1 cache includes storage to buffer pending requests for caches misses. When an entry is available to store a pending request, a request causing a cache miss is accepted. When the data for a read request becomes available, the cache instructs the client to resubmit the read request to receive the data. When an entry is not available to store a pending request, a request causing a cache miss is deferred and the cache provides the client with status information that is used to determine when the request should be resubmitted.Type: GrantFiled: December 30, 2009Date of Patent: September 11, 2012Assignee: NVIDIA CorporationInventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Ming Y. Siu, Yan Yan Tang, Robert J. Stoll
-
Publication number: 20120198214Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.Type: ApplicationFiled: April 6, 2012Publication date: August 2, 2012Inventors: Shirish GADRE, Charles McCARVER, Anjana RAJENDRAN, Omkar PARANJAPE, Steven James HEINRICH
-
Publication number: 20110078367Abstract: One embodiment of the present invention sets forth a technique for providing a L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces enabling the L1 cache may replace dedicated buffers, caches, and FIFOs in previous architectures. A “direct mapped” storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may used as a global register file. A “local and global cache” storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.Type: ApplicationFiled: September 25, 2009Publication date: March 31, 2011Inventors: Alexander L. Minkin, Steven James Heinrich, RaJeshwaren Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
-
Publication number: 20110078381Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.Type: ApplicationFiled: September 24, 2010Publication date: March 31, 2011Inventors: Steven James HEINRICH, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Publication number: 20050188009Abstract: A method for maintaining coherent data in a multiprocessor system having a plurality of processors coupled to main memory, where each processor has an internal cache which is externally unreadable outside the processor. The method includes requesting data associated with a memory location in main memory and determining if an external cache coupled to an application specific integrated circuit associated with a second processor contains a reference to the requested data. A snoop cycle is performed on the second processor if the external cache has a reference to the requested data, whereupon a determination is made as to whether the requested data has been modified.Type: ApplicationFiled: July 7, 2004Publication date: August 25, 2005Inventors: Arthur McKinney, Charles McCarver, Vahid Samiee