Patents by Inventor Michael A. Fetterman

Michael A. Fetterman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Hybrid allocation of data lines in a streaming cache memory

Patent number: 11934311

Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache line allocations share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared among multiple allocations.
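The sector-sharing idea can be illustrated with a small Python model. This is a minimal sketch, not the patented design: it assumes four sectors per line, assumes that sector slot i lives in data bank i, and uses a hypothetical first-fit policy that tries partially used lines before empty ones. `SectoredCache`, `allocate` and `utilization` are illustrative names.

```python
SECTORS_PER_LINE = 4  # assumption: 4 sectors per line; slot i is taken to live in data bank i


class SectoredCache:
    """Hypothetical model: each line records which allocation owns each sector slot."""

    def __init__(self, num_lines):
        self.owners = [[None] * SECTORS_PER_LINE for _ in range(num_lines)]

    def _free_slots(self, line):
        return [s for s, owner in enumerate(self.owners[line]) if owner is None]

    def allocate(self, alloc_id, num_sectors):
        """Return (line, slots) for a new allocation, or None if nothing fits.

        Lines that are not completely empty are tried first, so several small
        allocations share one line instead of each occupying a line of its own.
        """
        empty_last = sorted(
            range(len(self.owners)),
            key=lambda line: len(self._free_slots(line)) == SECTORS_PER_LINE,
        )
        for line in empty_last:
            free = self._free_slots(line)
            if len(free) >= num_sectors:
                chosen = free[:num_sectors]
                # The chosen slots are distinct, so under the slot == bank
                # assumption one allocation never has two sectors in one bank.
                for slot in chosen:
                    self.owners[line][slot] = alloc_id
                return line, chosen
        return None  # a real cache would evict something here

    def utilization(self):
        used = sum(owner is not None for row in self.owners for owner in row)
        return used / (len(self.owners) * SECTORS_PER_LINE)
```

In this model, allocations of 2, 2, 3 and 1 sectors fill only two of four lines, whereas giving each allocation its own line would occupy all four. The abstract does not say how the hardware chooses sectors to spread allocations across banks, so the slot-equals-bank mapping is only a stand-in.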

Type: Grant

Filed: May 4, 2022

Date of Patent: March 19, 2024

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Steven James Heinrich, Shirish Gadre

Techniques for interleaving textures

Patent number: 11823318

Abstract: Techniques are disclosed herein for interleaving textures. In the disclosed techniques, multiple textures that would otherwise be accessed separately are interleaved into a single, interleaved texture that can be used to access the multiple textures together. The interleaved texture can include alternating blocks from the multiple textures. The interleaved texture can be generated when the multiple textures are being loaded into memory. Further, the interleaved texture can be accessed using multiple texture headers that are associated with different textures in the interleaved texture. Each of the texture headers includes a stride indicating the distance between two blocks from the same texture in the interleaved texture.
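The addressing that the per-header stride enables can be sketched in Python. Assumptions not taken from the patent: all textures have the same size, blocks are interleaved round-robin in fixed-size byte chunks, and a header stores only a base offset, a stride and a block size. `TextureHeader`, `interleave` and `read_block` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureHeader:
    """Hypothetical header: offset of the texture's first block, plus the stride
    (in bytes) between two consecutive blocks of that same texture."""
    base_offset: int
    stride: int
    block_size: int

    def block_offset(self, block_index):
        return self.base_offset + block_index * self.stride


def interleave(textures, block_size):
    """Interleave equally sized textures block by block (round-robin).

    Returns the interleaved buffer and one header per input texture.
    """
    size = len(textures[0])
    if any(len(t) != size for t in textures) or size % block_size:
        raise ValueError("textures must share one size that is a multiple of block_size")
    out = bytearray()
    for start in range(0, size, block_size):
        for tex in textures:
            out += tex[start:start + block_size]
    stride = len(textures) * block_size  # distance between two blocks of one texture
    headers = [TextureHeader(i * block_size, stride, block_size) for i in range(len(textures))]
    return bytes(out), headers


def read_block(buffer, header, block_index):
    """Fetch one block of one texture using only that texture's header."""
    start = header.block_offset(block_index)
    return buffer[start:start + header.block_size]
```

Because each header carries its own base offset and the shared stride, a reader of one texture skips over the other textures' blocks without needing to know anything about them.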

Type: Grant

Filed: June 4, 2021

Date of Patent: November 21, 2023

Assignee: NVIDIA CORPORATION

Inventors: Tomas Akenine-Moller, Michael Fetterman, Steven James Heinrich

HYBRID ALLOCATION OF DATA LINES IN A STREAMING CACHE MEMORY

Publication number: 20230359560

Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache line allocations share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared among multiple allocations.

Type: Application

Filed: May 4, 2022

Publication date: November 9, 2023

Inventors: Michael FETTERMAN, Steven James HEINRICH, Shirish GADRE

CACHE MEMORY WITH PER-SECTOR CACHE RESIDENCY CONTROLS

Publication number: 20230305957

Abstract: Various embodiments include techniques for managing cache memory in a computing system. The computing system includes a sectored cache memory that provides a mechanism for software applications to directly invalidate data items stored in the cache memory on a sector-by-sector basis, where a sector is smaller than a cache line. When all sectors in a cache line have been invalidated, the cache line is implicitly invalidated, freeing the cache line to be reallocated for other purposes. In cases where the data items to be invalidated can be aligned to sector boundaries, the disclosed techniques use status indicators in the cache tag memory to track which sectors, and corresponding data items, have been invalidated by the software application. The disclosed techniques thereby provide a low-overhead way to invalidate individual data items that are smaller than a cache line, without requiring additional tracking data structures or consuming additional memory transfer bandwidth.
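A minimal Python model of sector-granular invalidation follows. It assumes four 32-byte sectors per 128-byte line and a fully associative tag store held in a dictionary; `SectoredTagArray` and its methods are hypothetical names, and real tag memories are set-associative hardware structures.

```python
SECTORS_PER_LINE = 4  # assumption


class SectoredTagArray:
    """Hypothetical tag store: line tag -> list of per-sector valid bits."""

    def __init__(self, sector_bytes=32):
        self.sector_bytes = sector_bytes
        self.line_bytes = sector_bytes * SECTORS_PER_LINE
        self.lines = {}

    def _locate(self, addr):
        return addr // self.line_bytes, (addr % self.line_bytes) // self.sector_bytes

    def fill(self, addr):
        """Bring the whole line containing addr into the cache."""
        tag, _ = self._locate(addr)
        self.lines[tag] = [True] * SECTORS_PER_LINE

    def invalidate(self, addr, size):
        """Software-directed invalidate of [addr, addr + size); must be sector aligned."""
        if addr % self.sector_bytes or size % self.sector_bytes:
            raise ValueError("range must be aligned to sector boundaries")
        for sector_addr in range(addr, addr + size, self.sector_bytes):
            tag, sector = self._locate(sector_addr)
            valid = self.lines.get(tag)
            if valid is None:
                continue  # line not resident; nothing to do
            valid[sector] = False
            if not any(valid):
                del self.lines[tag]  # implicit line invalidation frees it for reuse

    def is_resident(self, addr):
        tag, sector = self._locate(addr)
        valid = self.lines.get(tag)
        return valid is not None and valid[sector]
```

Clearing the last valid sector removes the line's tag entry, which corresponds to the implicit line invalidation described in the abstract; no separate list of dead items is kept.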

Type: Application

Filed: March 23, 2022

Publication date: September 28, 2023

Inventors: Michael FETTERMAN, Shirish GADRE, Steven James HEINRICH, Martin STICH, Liang YIN

Out of order memory request tracking structure and technique

Patent number: 11768686

Abstract: In a streaming cache, multiple dynamically sized tracking queues are employed. Request tracking information is distributed among the tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing, for example, in-order memory returns in some contexts and/or for some traffic, and out-of-order memory returns in other contexts and/or for other traffic.
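The effect of splitting request tracking across queues can be sketched as follows. The routing policy (ordered traffic shares queue 0, other traffic is spread round-robin over the remaining queues) and the single shared pool of entries are assumptions made for illustration; `TrackingQueues`, `issue` and `memory_return` are hypothetical names.

```python
from collections import deque


class TrackingQueues:
    """Hypothetical sketch: a shared pool of tracking entries split among FIFO queues."""

    def __init__(self, num_queues, pool_size):
        self.queues = [deque() for _ in range(num_queues)]
        self.pool_size = pool_size  # queues grow and shrink within one shared pool
        self.arrived = {}           # req_id -> data that came back from memory
        self._rr = 0

    def in_flight(self):
        return sum(len(q) for q in self.queues)

    def issue(self, req_id, ordered):
        """Track a new request; return False (stall) if the pool is exhausted."""
        if self.in_flight() >= self.pool_size:
            return False
        if ordered or len(self.queues) == 1:
            q = 0  # assumed policy: all ordered traffic shares one queue
        else:
            q = 1 + self._rr % (len(self.queues) - 1)
            self._rr += 1
        self.queues[q].append(req_id)
        return True

    def memory_return(self, req_id, data):
        """Record data arriving (in any order); return (req_id, data) pairs now released."""
        self.arrived[req_id] = data
        released = []
        for q in self.queues:
            # Each queue retires strictly in its own order...
            while q and q[0] in self.arrived:
                rid = q.popleft()
                released.append((rid, self.arrived.pop(rid)))
            # ...but queues do not wait for one another.
        return released
```

Each queue still behaves as a FIFO, which is what keeps ordered traffic in order; the out-of-order benefit comes only from requests landing in different queues.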

Type: Grant

Filed: July 27, 2020

Date of Patent: September 26, 2023

Assignee: NVIDIA Corporation

Inventors: Michael A. Fetterman, Mark Gebhart, Shirish Gadre, Mitchell Hayenga, Steven Heinrich, Ramesh Jandhyala, Raghavan Madhavan, Omkar Paranjape, James Robertson, Jeff Schottmiller

TECHNIQUES FOR INTERLEAVING TEXTURES

Publication number: 20220392140

Abstract: Techniques are disclosed herein for interleaving textures. In the disclosed techniques, multiple textures that would otherwise be accessed separately are interleaved into a single, interleaved texture that can be used to access the multiple textures together. The interleaved texture can include alternating blocks from the multiple textures. The interleaved texture can be generated when the multiple textures are being loaded into memory. Further, the interleaved texture can be accessed using multiple texture headers that are associated with different textures in the interleaved texture. Each of the texture headers includes a stride indicating the distance between two blocks from the same texture in the interleaved texture.

Type: Application

Filed: June 4, 2021

Publication date: December 8, 2022

Inventors: Tomas AKENINE-MOLLER, Michael FETTERMAN, Steven James HEINRICH

Techniques for performing accelerated point sampling in a texture processing pipeline

Patent number: 11379944

Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
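The "accelerate unless known otherwise" rule can be expressed as a chain of stage checks, each of which vetoes only on definite information. The fields, the 8-byte texel limit and the three stages below are hypothetical; the abstract does not say which properties each stage examines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextureOp:
    """Hypothetical per-request facts; None means 'not known at this stage'."""
    filter_mode: Optional[str] = None        # e.g. "point" or "linear"
    needs_lod_blend: Optional[bool] = None   # blending between mip levels
    texel_bytes: Optional[int] = None


MAX_FAST_TEXEL_BYTES = 8  # assumed limit of the fast path


def stage_decode(op):
    # Only a filter mode known to be something other than point sampling vetoes.
    return op.filter_mode is None or op.filter_mode == "point"


def stage_lod(op):
    return op.needs_lod_blend is not True


def stage_format(op):
    return op.texel_bytes is None or op.texel_bytes <= MAX_FAST_TEXEL_BYTES


def accelerated(op, stages=(stage_decode, stage_lod, stage_format)):
    """Optimistic policy: stay fast unless some stage has definite evidence against it."""
    return all(stage(op) for stage in stages)
```

A conservative pipeline would instead require every property to be known and favorable before taking the fast path; the optimistic version accepts `TextureOp()` with no information at all, which is how it raises the share of accelerated operations.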

Type: Grant

Filed: June 23, 2020

Date of Patent: July 5, 2022

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Shirish Gadre, Mark Gebhart, Steven J. Heinrich, Ramesh Jandhyala, William Newhall, Omkar Paranjape, Stefano Pescador, Poorna Rao

OUT OF ORDER MEMORY REQUEST TRACKING STRUCTURE AND TECHNIQUE

Publication number: 20220027160

Abstract: In a streaming cache, multiple dynamically sized tracking queues are employed. Request tracking information is distributed among the tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing, for example, in-order memory returns in some contexts and/or for some traffic, and out-of-order memory returns in other contexts and/or for other traffic.

Type: Application

Filed: July 27, 2020

Publication date: January 27, 2022

Inventors: Michael A. FETTERMAN, Mark GEBHART, Shirish GADRE, Mitchell HAYENGA, Steven HEINRICH, Ramesh JANDHYALA, Raghavan MADHAVAN, Omkar PARANJAPE, James ROBERTSON, Jeff SCHOTTMILLER

TECHNIQUES FOR PERFORMING ACCELERATED POINT SAMPLING IN A TEXTURE PROCESSING PIPELINE

Publication number: 20210398241

Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.

Type: Application

Filed: June 23, 2020

Publication date: December 23, 2021

Inventors: Michael FETTERMAN, Shirish GADRE, Mark GEBHART, Steven J. HEINRICH, Ramesh JANDHYALA, William NEWHALL, Omkar PARANJAPE, Stefano PESCADOR, Poorna RAO

Pre-scheduled replays of divergent operations

Patent number: 10152329

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data that are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop until all threads are serviced.
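The following Python sketch shows how the passes for one divergent load might be classified. It assumes 128-byte cache lines, treats each touched cache line as one pass, and uses a hypothetical `max_prescheduled` budget for replays inserted directly behind the instruction; anything beyond that budget falls back to the replay loop. `schedule_passes` is an illustrative name.

```python
LINE_BYTES = 128  # assumption


def schedule_passes(thread_addrs, max_prescheduled):
    """Classify the passes needed by one divergent load.

    Returns (kind, cache_line, thread_ids) tuples, where kind is "original",
    "prescheduled" or "loop_replay". Each pass services the threads whose
    addresses fall in one cache line.
    """
    lines = {}
    for tid, addr in enumerate(thread_addrs):
        lines.setdefault(addr // LINE_BYTES, []).append(tid)
    passes = []
    for i, (line, tids) in enumerate(lines.items()):  # first-touch order
        if i == 0:
            kind = "original"
        elif i <= max_prescheduled:
            kind = "prescheduled"  # inserted directly behind the instruction
        else:
            kind = "loop_replay"   # still unserviced; goes around the replay loop
        passes.append((kind, line, tids))
    return passes
```

With six threads spread over four cache lines and a budget of two, the instruction itself covers the first line, two pre-scheduled replays cover the next two, and only the last line needs the slower replay loop.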

Type: Grant

Filed: February 9, 2012

Date of Patent: December 11, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

Mechanism for waking common resource requests within a resource management subsystem

Patent number: 10095548

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
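The age-ordering and stealing rules can be modeled in a few lines of Python. This is an illustrative sketch under stated assumptions: list position stands in for age, a new request greedily reserves whatever free resources it needs, and resources return to the pool as soon as a request makes progress. `TotalOrderQueue` and its methods are hypothetical names.

```python
class TotalOrderQueue:
    """Hypothetical sketch of age-ordered arbitration for common resources.

    `pending` is kept oldest-first, so list position stands in for age.
    """

    def __init__(self, free_resources):
        self.free = set(free_resources)
        self.pending = []  # (name, needed) pairs, oldest first
        self.held = {}     # name -> set of resources reserved so far

    def release(self, resource):
        """Another unit returns a resource to the pool."""
        self.free.add(resource)

    def enqueue(self, name, needed):
        needed = frozenset(needed)
        self.pending.append((name, needed))
        # Assumption: a new request immediately reserves whatever it needs that is free.
        self.held[name] = set(needed & self.free)
        self.free -= self.held[name]

    def cycle(self):
        """Run one scheduling cycle and return the requests that made progress."""
        progressed = []
        for idx, (name, needed) in enumerate(self.pending):  # oldest first = priority
            for res in needed - self.held[name]:
                if res in self.free:
                    self.free.discard(res)
                    self.held[name].add(res)
                    continue
                # Deadlock avoidance: steal from the youngest younger request holding it.
                for younger, _ in reversed(self.pending[idx + 1:]):
                    if res in self.held[younger]:
                        self.held[younger].discard(res)
                        self.held[name].add(res)
                        break
            if self.held[name] == needed:
                progressed.append(name)
            # Otherwise the request sleeps and is retried on a later cycle.
        for name in progressed:
            self.free |= self.held.pop(name)  # simplification: resources return right after use
        self.pending = [entry for entry in self.pending if entry[0] not in progressed]
        return progressed
```

For example, if an older request holds resource A and a younger one holds B while each needs both, the two would deadlock. Under this model the older request takes B from the younger one on the next cycle, completes, and releases both resources so that the younger request can finish on the cycle after.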

Type: Grant

Filed: May 21, 2012

Date of Patent: October 9, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich

Uniform load processing for parallel thread sub-sets

Patent number: 10007527

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
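Uniform-pattern detection can be sketched as a check over aligned lane groups. The warp size of 32 and the group sizes 32, 8, 4 and 2 are assumptions for illustration, not patterns named in the patent; `count_read_requests` is a hypothetical helper that returns how many reads one load would issue.

```python
def count_read_requests(addrs, group_sizes=(32, 8, 4, 2)):
    """Number of reads one load issues under hypothetical uniform patterns.

    A group size g matches when every aligned group of g consecutive lanes
    shares a single address; one read per group then suffices and its result
    is broadcast to that group. Sizes are tried largest first; if none
    matches, each lane gets its own read.
    """
    n = len(addrs)
    for g in group_sizes:
        if g > n or n % g:
            continue
        groups = [addrs[i:i + g] for i in range(0, n, g)]
        if all(len(set(group)) == 1 for group in groups):
            return len(groups)
    return n
```

For example, a warp in which every group of four lanes reads the same word needs 8 reads instead of 32.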

Type: Grant

Filed: March 5, 2012

Date of Patent: June 26, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich

Resource management subsystem that maintains fairness and order

Patent number: 9836325

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Grant

Filed: May 21, 2012

Date of Patent: December 5, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich

Batched replays of divergent operations

Patent number: 9817668

Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data that are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, inserting them into the pipeline back-to-back.
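The benefit of batching can be estimated by counting trips through the replay loop. The model below is a simplification with assumed 128-byte lines: the first pass services one cache line, each additional line needs one replay, and one trip around the loop can carry up to `batch_size` back-to-back replays. `replay_loop_trips` is a hypothetical helper.

```python
import math


def replay_loop_trips(thread_addrs, batch_size, line_bytes=128):
    """Estimate trips through the replay loop for one divergent load.

    The first pass services one cache line; every other touched line needs a
    replay. Up to batch_size replays are inserted back-to-back per trip.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    lines = {addr // line_bytes for addr in thread_addrs}
    replays = max(len(lines) - 1, 0)
    return math.ceil(replays / batch_size)
```

A load touching five cache lines needs four replays: four trips around the loop without batching, but only two with a batch size of two.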

Type: Grant

Filed: December 16, 2011

Date of Patent: November 14, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich

Mechanism for tracking age of common resource requests within a resource management subsystem

Patent number: 9755994

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Grant

Filed: May 21, 2012

Date of Patent: September 5, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich

Shaped register file reads

Patent number: 9626191

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
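One plausible reading of a shaped read, sketched in Python under assumptions the abstract does not state: the register file is indexed as `register_file[reg][thread]`, each entry is a `bytes` value, and the crossbar is modeled as a routing table that pulls the same number of leading bytes from each of the N registers for one thread. `shaped_read` is a hypothetical helper.

```python
def shaped_read(register_file, reg_ids, thread, amount):
    """Gather `amount` bytes from each of N >= 2 registers for one thread.

    The list of (register, byte offset) pairs plays the role of the crossbar
    configuration: output byte k is routed from crossbar[k].
    """
    if len(reg_ids) < 2:
        raise ValueError("a shaped read spans N >= 2 registers")
    crossbar = [(reg, offset) for reg in reg_ids for offset in range(amount)]
    return bytes(register_file[reg][thread][offset] for reg, offset in crossbar)
```

A single request of this kind returns, for example, the first two bytes of register 0 and the first two bytes of register 2 for one thread, rather than issuing two full-register reads.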

Type: Grant

Filed: December 22, 2011

Date of Patent: April 18, 2017

Assignee: NVIDIA Corporation

Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn

Dynamic bank mode addressing for memory access

Patent number: 9262174

Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to read and write the memory using different numbers of bits per bank, e.g., 32 bits per bank, 64 bits per bank, or 128 bits per bank. On each clock cycle, an access request may be received from one of the application programs, and the per-processing-thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accessed, compared with using a single bank mapping for all accesses.
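The bank mapping can be illustrated with a small Python model. It assumes 32 banks and byte addresses, and measures the conflict degree as the largest number of distinct rows any one bank must serve; `map_to_banks` and `conflict_degree` are hypothetical names, and the patent's exact address arithmetic may differ.

```python
from collections import defaultdict

NUM_BANKS = 32  # assumption


def map_to_banks(byte_addrs, bank_mode_bits):
    """Map each thread's byte address to (bank, row) under the given bank mode."""
    if bank_mode_bits not in (32, 64, 128):
        raise ValueError("bank mode must be 32, 64 or 128 bits per bank")
    width = bank_mode_bits // 8  # bytes one bank supplies per row
    return [((a // width) % NUM_BANKS, a // (width * NUM_BANKS)) for a in byte_addrs]


def conflict_degree(byte_addrs, bank_mode_bits):
    """Largest number of distinct rows any one bank must serve (1 means conflict-free)."""
    rows_per_bank = defaultdict(set)
    for bank, row in map_to_banks(byte_addrs, bank_mode_bits):
        rows_per_bank[bank].add(row)
    return max((len(rows) for rows in rows_per_bank.values()), default=0)
```

When each thread reads a 64-bit value at consecutive 8-byte addresses, the 32-bit mode makes thread t and thread t + 16 share a bank in different rows (degree 2), while the 64-bit mode gives every thread its own bank (degree 1).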

Type: Grant

Filed: April 5, 2012

Date of Patent: February 16, 2016

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich

MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311996

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH

MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311686

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH

RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER

Publication number: 20130311999

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger access request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
