Patents by Inventor Steven James Heinrich

Steven James Heinrich has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9633458
    Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.
    Type: Grant
    Filed: January 23, 2012
    Date of Patent: April 25, 2017
    Assignee: NVIDIA Corporation
    Inventors: Walter R. Steiner, Eric Lum, Dale L. Kirkland, Steven James Heinrich, David Charles Patrick
  • Patent number: 9286659
    Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: March 15, 2016
    Assignee: NVIDIA Corporation
    Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
  • Patent number: 9262174
    Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.
    Type: Grant
    Filed: April 5, 2012
    Date of Patent: February 16, 2016
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
  • Patent number: 9262797
    Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: February 16, 2016
    Assignee: NVIDIA Corporation
    Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
  • Patent number: 9223578
    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: December 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
  • Patent number: 8997103
    Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
    Type: Grant
    Filed: April 6, 2012
    Date of Patent: March 31, 2015
    Assignee: NVIDIA Corporation
    Inventors: Shirish Gadre, Charles McCarver, Anjana Rajendran, Omkar Paranjape, Steven James Heinrich
  • Publication number: 20150046662
    Abstract: A system, method, and computer program product are provided for coalescing memory access requests. A plurality of memory access requests is received in a thread execution order and a portion of the memory access requests are coalesced into memory order, where memory access requests included in the portion are generated by threads in a thread block. A memory operation is generated that is transmitted to a memory system, where the memory operation represents the coalesced portion of memory access requests.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Applicant: NVIDIA Corporation
    Inventors: Steven James Heinrich, Ramesh Jandhyala, Bengt-Olaf Schneider
  • Publication number: 20140267315
    Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
  • Publication number: 20140267356
    Abstract: A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Alexander Lev Minkin, Henry Packard Moreton, Yury Uralsky, Eric Brian Lum, Dale L. Kirkland, Steven James Heinrich, Rui Manuel Bastos, Emmett M. Kilgariff, Jeffrey Alan Bolz, Tyson Bergland, Patrick R. Brown
  • Patent number: 8595425
    Abstract: One embodiment of the present invention sets forth a technique for providing a L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces enabling the L1 cache may replace dedicated buffers, caches, and FIFOs in previous architectures. A “direct mapped” storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may used as a global register file. A “local and global cache” storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.
    Type: Grant
    Filed: September 25, 2009
    Date of Patent: November 26, 2013
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven James Heinrich, RaJeshwaran Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
  • Publication number: 20130311996
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
  • Publication number: 20130311999
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130311686
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130268715
    Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
  • Publication number: 20130232322
    Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
    Type: Application
    Filed: March 5, 2012
    Publication date: September 5, 2013
    Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
  • Publication number: 20130212364
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
  • Publication number: 20130187956
    Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.
    Type: Application
    Filed: January 23, 2012
    Publication date: July 25, 2013
    Inventors: Walter R. STEINER, Eric LUM, Dale L. KIRKLAND, Steven James HEINRICH, David Charles PATRICK
  • Publication number: 20130159684
    Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
  • Publication number: 20120198214
    Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
    Type: Application
    Filed: April 6, 2012
    Publication date: August 2, 2012
    Inventors: Shirish GADRE, Charles McCARVER, Anjana RAJENDRAN, Omkar PARANJAPE, Steven James HEINRICH
  • Publication number: 20110078692
    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
    Type: Application
    Filed: September 21, 2010
    Publication date: March 31, 2011
    Inventors: John R. NICKOLLS, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow