Patents by Inventor Steven James Heinrich
Steven James Heinrich has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11934311
Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache lines share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared with multiple allocations.
Type: Grant
Filed: May 4, 2022
Date of Patent: March 19, 2024
Assignee: NVIDIA CORPORATION
Inventors: Michael Fetterman, Steven James Heinrich, Shirish Gadre
-
Patent number: 11823318
Abstract: Techniques are disclosed herein for interleaving textures. In the disclosed techniques, multiple textures that would otherwise be accessed separately are interleaved into a single, interleaved texture that can be used to access the multiple textures together. The interleaved texture can include alternating blocks from the multiple textures. The interleaved texture can be generated when the multiple textures are being loaded into memory. Further, the interleaved texture can be accessed using multiple texture headers that are associated with different textures in the interleaved texture. Each of texture headers includes a stride indicating the distance between two blocks from a same texture in the interleaved texture.
Type: Grant
Filed: June 4, 2021
Date of Patent: November 21, 2023
Assignee: NVIDIA CORPORATION
Inventors: Tomas Akenine-Moller, Michael Fetterman, Steven James Heinrich
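Illustrative sketch (not taken from the patent): the stride idea in the abstract above can be pictured with a small Python model. The names `TextureHeader`, `interleave`, and `fetch_block` are hypothetical, and the sketch assumes every texture has the same number of blocks and that blocks alternate in round-robin order.

```python
from dataclasses import dataclass


@dataclass
class TextureHeader:
    """Hypothetical header describing one texture inside the interleaved texture."""
    first_block: int  # index of this texture's first block in the interleaved texture
    stride: int       # distance, in blocks, between two blocks of the same texture


def interleave(textures):
    """Interleave equally sized lists of blocks in round-robin order.

    Returns the interleaved block list and one header per input texture.
    """
    count = len(textures)
    if count == 0:
        return [], []
    length = len(textures[0])
    if any(len(t) != length for t in textures):
        raise ValueError("this sketch assumes every texture has the same number of blocks")
    # Block i of texture k lands at position i * count + k.
    interleaved = [t[i] for i in range(length) for t in textures]
    headers = [TextureHeader(first_block=k, stride=count) for k in range(count)]
    return interleaved, headers


def fetch_block(interleaved, header, block_index):
    """Look up block `block_index` of the texture that `header` describes."""
    return interleaved[header.first_block + block_index * header.stride]


albedo = ["a0", "a1", "a2"]
normal = ["n0", "n1", "n2"]
interleaved, headers = interleave([albedo, normal])
print(interleaved)                              # ['a0', 'n0', 'a1', 'n1', 'a2', 'n2']
print(fetch_block(interleaved, headers[1], 2))  # n2
```

With two textures the stride is 2, so consecutive blocks of either texture are two positions apart in the combined layout.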
-
Publication number: 20230359560
Abstract: Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory. Further, cache line allocations may not use all of the sectors of the cache line, leading to low utilization of the cache memory. With the present techniques, multiple cache lines share the same cache line, leading to improved cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned to reduce data bank conflicts when accessing cache memory. Reducing such data bank conflicts can result in improved memory access performance, even when cache lines are shared with multiple allocations.
Type: Application
Filed: May 4, 2022
Publication date: November 9, 2023
Inventors: Michael FETTERMAN, Steven James HEINRICH, Shirish GADRE
-
Publication number: 20230305957
Abstract: Various embodiments include techniques for managing cache memory in a computing system. The computing system includes a sectored cache memory that provides a mechanism for software applications to directly invalidate data items stored in the cache memory on a sector-by-sector basis, where a sector is smaller than a cache line. When all sectors in a cache line have been invalidated, the cache line is implicitly invalidated, freeing the cache line to be reallocated for other purposes. In cases where the data items to be invalidated can be aligned to sector boundaries, the disclosed techniques effectively use status indicators in the cache tag memory to track which sectors, and corresponding data items, have been invalidated by the software application. Thus, the disclosed techniques thereby enable a low-overhead solution for invalidating individual data items that are smaller than a cache line without additional tracking data structures or consuming additional memory transfer bandwidth.
Type: Application
Filed: March 23, 2022
Publication date: September 28, 2023
Inventors: Michael FETTERMAN, Shirish GADRE, Steven James HEINRICH, Martin STICH, Liang YIN
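Illustrative sketch (not taken from the application): the sector-by-sector invalidation described above can be modelled with one valid flag per sector, with the cache line treated as invalid once no sector remains valid. The class name `SectoredCacheLine` and the four-sector default are assumptions made for the example.

```python
class SectoredCacheLine:
    """Hypothetical model of a cache line's per-sector status bits."""

    def __init__(self, num_sectors=4):
        # Assume the line has just been filled, so every sector starts valid.
        self.valid = [True] * num_sectors

    def invalidate_sector(self, sector):
        """Software-directed invalidation of one sector."""
        self.valid[sector] = False

    @property
    def line_valid(self):
        """The line is implicitly invalid once every sector has been invalidated."""
        return any(self.valid)


line = SectoredCacheLine(num_sectors=4)
for sector in range(3):
    line.invalidate_sector(sector)
print(line.line_valid)  # True: one sector is still valid
line.invalidate_sector(3)
print(line.line_valid)  # False: the line can now be reallocated
```

No side table is needed in this model: the per-sector flags already stored with the line are enough to tell when it has become free.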
-
Patent number: 11720440
Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
Type: Grant
Filed: July 12, 2021
Date of Patent: August 8, 2023
Assignee: NVIDIA CORPORATION
Inventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
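Illustrative sketch (not taken from the patent): the last two sentences of the abstract above describe an error marker that travels with the data when it is copied. The toy model below stores each word as a `(data, tag)` pair; the `POISON` value and the function names are hypothetical, and the real bit pattern is not given in the abstract.

```python
CLEAN = 0
POISON = 1  # hypothetical marker value; the actual bit pattern is not specified here


def mark_error(memory, address):
    """Record that the word at `address` is associated with a detected memory error."""
    data, _ = memory[address]
    memory[address] = (data, POISON)


def copy_word(src, src_address, dst, dst_address):
    """Copy a word between memories; the marker is copied along with the data."""
    dst[dst_address] = src[src_address]


def is_poisoned(memory, address):
    """Check whether the word at `address` carries the error marker."""
    return memory[address][1] == POISON


dram = {0x10: (0xDEAD, CLEAN), 0x14: (0xBEEF, CLEAN)}
cache = {}
mark_error(dram, 0x14)
copy_word(dram, 0x10, cache, 0)
copy_word(dram, 0x14, cache, 1)
print(is_poisoned(cache, 0), is_poisoned(cache, 1))  # False True
```

Because the marker is part of what gets copied, a consumer of the second memory can still identify the affected word without consulting the original.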
-
Publication number: 20230011863
Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
Type: Application
Filed: July 12, 2021
Publication date: January 12, 2023
Inventors: NAVEEN CHERUKURI, SAURABH HUKERIKAR, PAUL RACUNAS, NIRMAL RAJ SAXENA, DAVID CHARLES PATRICK, YIYANG FENG, ABHIJEET GHADGE, STEVEN JAMES HEINRICH, ADAM HENDRICKSON, GENTARO HIROTA, PRAVEEN JOGINIPALLY, VAISHALI KULKARNI, PETER C. MILLS, SANDEEP NAVADA, MANAN PATEL, LIANG YIN
-
Publication number: 20220392140
Abstract: Techniques are disclosed herein for interleaving textures. In the disclosed techniques, multiple textures that would otherwise be accessed separately are interleaved into a single, interleaved texture that can be used to access the multiple textures together. The interleaved texture can include alternating blocks from the multiple textures. The interleaved texture can be generated when the multiple textures are being loaded into memory. Further, the interleaved texture can be accessed using multiple texture headers that are associated with different textures in the interleaved texture. Each of texture headers includes a stride indicating the distance between two blocks from a same texture in the interleaved texture.
Type: Application
Filed: June 4, 2021
Publication date: December 8, 2022
Inventors: Tomas AKENINE-MOLLER, Michael FETTERMAN, Steven James HEINRICH
-
Patent number: 11372548
Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.
Type: Grant
Filed: May 29, 2020
Date of Patent: June 28, 2022
Assignee: NVIDIA Corporation
Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Publication number: 20210373774
Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.
Type: Application
Filed: May 29, 2020
Publication date: December 2, 2021
Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Patent number: 10699427
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Grant
Filed: August 12, 2019
Date of Patent: June 30, 2020
Assignee: NVIDIA Corporation
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
-
Publication number: 20200013174
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Application
Filed: August 12, 2019
Publication date: January 9, 2020
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
-
Patent number: 10424074
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Grant
Filed: July 3, 2018
Date of Patent: September 24, 2019
Assignee: NVIDIA Corporation
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
-
Patent number: 10152329
Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
Type: Grant
Filed: February 9, 2012
Date of Patent: December 11, 2018
Assignee: NVIDIA CORPORATION
Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Patent number: 10095548
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
Type: Grant
Filed: May 21, 2012
Date of Patent: October 9, 2018
Assignee: NVIDIA CORPORATION
Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
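Illustrative sketch (not taken from the patent): the oldest-first rule in the abstract above can be pictured as a single arbitration pass in which requests are considered in arrival order and a request proceeds only if all of its resources are free. The function `arbitrate` and the tuple layout are assumptions; the sleep/wake timing and resource stealing mentioned in the abstract are not modelled.

```python
def arbitrate(requests, free_resources):
    """Grant resources to requests in age order (smallest arrival number is oldest).

    `requests` is a list of (arrival, name, needed) tuples, where `needed` is a
    set of resource names. Returns (granted, sleeping) lists of request names.
    """
    available = set(free_resources)
    granted, sleeping = [], []
    for arrival, name, needed in sorted(requests, key=lambda request: request[0]):
        if needed <= available:
            # Every needed resource is free: allocate them and let the request proceed.
            available -= needed
            granted.append(name)
        else:
            # A needed resource is missing or was taken by an older request.
            sleeping.append(name)
    return granted, sleeping


requests = [
    (2, "younger", {"bank0"}),
    (1, "older", {"bank0", "bank1"}),
    (3, "independent", {"bank2"}),
]
print(arbitrate(requests, {"bank0", "bank1", "bank2"}))
# (['older', 'independent'], ['younger'])
```

Because the pass is ordered by age, a younger request can never take a resource that an older, ready request also needs in the same pass.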
-
Patent number: 10007527
Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
Type: Grant
Filed: March 5, 2012
Date of Patent: June 26, 2018
Assignee: NVIDIA CORPORATION
Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
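Illustrative sketch (not taken from the patent): the read-reduction idea in the abstract above can be modelled by describing each uniform pattern as a partition of thread indices into groups and issuing one read per group when every thread in a group requests the same address. The pattern encoding and the functions `match_pattern` and `plan_reads` are assumptions made for the example.

```python
def match_pattern(addresses, pattern):
    """Return one address per group if every group is uniform, otherwise None.

    `pattern` is a list of groups, each a list of thread indices.
    """
    reads = []
    for group in pattern:
        group_addresses = {addresses[thread] for thread in group}
        if len(group_addresses) != 1:
            return None
        reads.append(group_addresses.pop())
    return reads


def plan_reads(addresses, patterns):
    """Try each pattern in order; fall back to one read per thread if none match."""
    for pattern in patterns:
        reads = match_pattern(addresses, pattern)
        if reads is not None:
            return reads
    return list(addresses)


# Two hypothetical patterns for a four-thread group:
# all threads share one address, or adjacent pairs share an address.
patterns = [[[0, 1, 2, 3]], [[0, 1], [2, 3]]]
print(plan_reads([256, 256, 256, 256], patterns))  # [256]
print(plan_reads([256, 256, 512, 512], patterns))  # [256, 512]
print(plan_reads([0, 4, 8, 12], patterns))         # [0, 4, 8, 12]
```

In the first two cases four per-thread reads collapse to one and two reads respectively; the last case matches no pattern and keeps one read per thread.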
-
Patent number: 9952977
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
Type: Grant
Filed: September 24, 2010
Date of Patent: April 24, 2018
Assignee: NVIDIA CORPORATION
Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Patent number: 9946666
Abstract: A system, method, and computer program product are provided for coalescing memory access requests. A plurality of memory access requests is received in a thread execution order and a portion of the memory access requests are coalesced into memory order, where memory access requests included in the portion are generated by threads in a thread block. A memory operation is generated that is transmitted to a memory system, where the memory operation represents the coalesced portion of memory access requests.
Type: Grant
Filed: August 6, 2013
Date of Patent: April 17, 2018
Assignee: NVIDIA Corporation
Inventors: Steven James Heinrich, Ramesh Jandhyala, Bengt-Olaf Schneider
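Illustrative sketch (not taken from the patent): coalescing as described in the abstract above can be pictured as regrouping per-thread addresses, which arrive in thread order, into one operation per aligned memory segment, sorted by address. The function name `coalesce` and the 128-byte segment size are assumptions made for the example.

```python
def coalesce(addresses, segment_bytes=128):
    """Group per-thread byte addresses into one memory operation per aligned segment.

    `addresses[i]` is the address requested by thread i (thread execution order).
    Returns a list of (segment_base, [(address, thread_id), ...]) in memory order.
    """
    segments = {}
    for thread_id, address in enumerate(addresses):
        base = address - (address % segment_bytes)
        segments.setdefault(base, []).append((address, thread_id))
    # Sort segments by base address and the requests inside each segment by address.
    return [(base, sorted(segments[base])) for base in sorted(segments)]


operations = coalesce([260, 0, 132, 4], segment_bytes=128)
print(operations)
# [(0, [(0, 1), (4, 3)]), (128, [(132, 2)]), (256, [(260, 0)])]
```

Here four requests become three memory operations, and threads 1 and 3 are served by the same operation because their addresses fall in the same segment.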
-
Patent number: 9836325
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
Type: Grant
Filed: May 21, 2012
Date of Patent: December 5, 2017
Assignee: NVIDIA Corporation
Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
-
Patent number: 9817668
Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
Type: Grant
Filed: December 16, 2011
Date of Patent: November 14, 2017
Assignee: NVIDIA Corporation
Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
-
Patent number: 9755994
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
Type: Grant
Filed: May 21, 2012
Date of Patent: September 5, 2017
Assignee: NVIDIA Corporation
Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich