Patents by Inventor Steven James Heinrich

Steven James Heinrich has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for reducing a polygon bounding box

Patent number: 9633458

Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.

Type: Grant

Filed: January 23, 2012

Date of Patent: April 25, 2017

Assignee: NVIDIA Corporation

Inventors: Walter R. Steiner, Eric Lum, Dale L. Kirkland, Steven James Heinrich, David Charles Patrick
Multi-sample surface processing using sample subsets

Patent number: 9286659

Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets.

Type: Grant

Filed: March 15, 2013

Date of Patent: March 15, 2016

Assignee: NVIDIA Corporation

Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
Dynamic bank mode addressing for memory access

Patent number: 9262174

Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.

Type: Grant

Filed: April 5, 2012

Date of Patent: February 16, 2016

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
Multi-sample surface processing using one sample

Patent number: 9262797

Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel.

Type: Grant

Filed: March 15, 2013

Date of Patent: February 16, 2016

Assignee: NVIDIA Corporation

Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
Coalescing memory barrier operations across multiple parallel threads

Patent number: 9223578

Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

Type: Grant

Filed: September 21, 2010

Date of Patent: December 29, 2015

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
N-way memory barrier operation coalescing

Patent number: 8997103

Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.

Type: Grant

Filed: April 6, 2012

Date of Patent: March 31, 2015

Assignee: NVIDIA Corporation

Inventors: Shirish Gadre, Charles McCarver, Anjana Rajendran, Omkar Paranjape, Steven James Heinrich
COALESCING TEXTURE ACCESS AND LOAD/STORE OPERATIONS

Publication number: 20150046662

Abstract: A system, method, and computer program product are provided for coalescing memory access requests. A plurality of memory access requests is received in a thread execution order and a portion of the memory access requests are coalesced into memory order, where memory access requests included in the portion are generated by threads in a thread block. A memory operation is generated that is transmitted to a memory system, where the memory operation represents the coalesced portion of memory access requests.

Type: Application

Filed: August 6, 2013

Publication date: February 12, 2015

Applicant: NVIDIA Corporation

Inventors: Steven James Heinrich, Ramesh Jandhyala, Bengt-Olaf Schneider
MULTI-SAMPLE SURFACE PROCESSING USING ONE SAMPLE

Publication number: 20140267315

Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: NVIDIA CORPORATION

Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
MULTI-SAMPLE SURFACE PROCESSING USING SAMPLE SUBSETS

Publication number: 20140267356

Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: NVIDIA CORPORATION

Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
Configurable cache for multiple clients

Patent number: 8595425

Abstract: One embodiment of the present invention sets forth a technique for providing a L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces enabling the L1 cache may replace dedicated buffers, caches, and FIFOs in previous architectures. A “direct mapped” storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may used as a global register file. A “local and global cache” storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.

Type: Grant

Filed: September 25, 2009

Date of Patent: November 26, 2013

Assignee: NVIDIA Corporation

Inventors: Alexander L. Minkin, Steven James Heinrich, RaJeshwaran Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311996

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER

Publication number: 20130311999

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311686

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
DYNAMIC BANK MODE ADDRESSING FOR MEMORY ACCESS

Publication number: 20130268715

Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.

Type: Application

Filed: April 5, 2012

Publication date: October 10, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
UNIFORM LOAD PROCESSING FOR PARALLEL THREAD SUB-SETS

Publication number: 20130232322

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.

Type: Application

Filed: March 5, 2012

Publication date: September 5, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130212364

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
METHOD AND SYSTEM FOR REDUCING A POLYGON BOUNDING BOX

Publication number: 20130187956

Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.

Type: Application

Filed: January 23, 2012

Publication date: July 25, 2013

Inventors: Walter R. STEINER, Eric LUM, Dale L. KIRKLAND, Steven James HEINRICH, David Charles PATRICK
BATCHED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130159684

Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.

Type: Application

Filed: December 16, 2011

Publication date: June 20, 2013

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
N-WAY MEMORY BARRIER OPERATION COALESCING

Publication number: 20120198214

Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.

Type: Application

Filed: April 6, 2012

Publication date: August 2, 2012

Inventors: Shirish GADRE, Charles McCARVER, Anjana RAJENDRAN, Omkar PARANJAPE, Steven James HEINRICH
COALESCING MEMORY BARRIER OPERATIONS ACROSS MULTIPLE PARALLEL THREADS

Publication number: 20110078692

Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

Type: Application

Filed: September 21, 2010

Publication date: March 31, 2011

Inventors: John R. NICKOLLS, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow

prev 1 2 3 next