Patents by Inventor Gary M. Tarolli

Gary M. Tarolli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TECHNIQUES FOR EFFICIENTLY SYNCHRONIZING MULTIPLE PROGRAM THREADS

Publication number: 20220391264

Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.

Type: Application

Filed: June 3, 2021

Publication date: December 8, 2022

Inventors: Ajay Sudarshan TIRUMALA, Olivier GIROUX, Peter NELSON, Gary M. TAROLLI, Ankita UPRETI
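
The mask-based synchronization described in the abstract above can be illustrated with a minimal Python sketch. It is a software analogy, not the patented hardware mechanism: the function name `run_instances`, the per-instance `resume_<id>` labels (standing in for the disparate return addresses), and the choice of a sum as the exchanged data are all assumptions made for illustration.

```python
import threading

def run_instances(mask, values):
    """Simulate program instances that synchronize on a shared mask.

    mask   -- set of instance ids that take part in the synchronization
    values -- dict mapping instance id -> value that instance contributes
    Returns a dict mapping instance id -> (return_label, exchanged_total).
    """
    shared = {}
    results = {}
    lock = threading.Lock()

    def exchange():
        # Called exactly once, by one of the waiting threads, after every
        # participant has arrived: this plays the "synchronous operation".
        shared["total"] = sum(values[i] for i in mask)

    barrier = threading.Barrier(len(mask), action=exchange)

    def instance(i):
        # Block until all instances named in the mask have arrived.
        barrier.wait()
        # Each instance resumes at its own (hypothetical) return label.
        with lock:
            results[i] = (f"resume_{i}", shared["total"])

    threads = [threading.Thread(target=instance, args=(i,)) for i in sorted(mask)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Instances that are not named in the mask (id 1 in a call such as `run_instances({0, 2, 3}, {0: 1, 1: 100, 2: 10, 3: 5})`) neither wait nor contribute to the exchange.
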
Dynamically detecting uniformity and eliminating redundant computations to reduce power consumption

Patent number: 11055097

Abstract: One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streaming multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output.

Type: Grant

Filed: October 8, 2013

Date of Patent: July 6, 2021

Assignee: NVIDIA Corporation

Inventors: Gary M. Tarolli, John H. Edmondson, John Matthew Burgess, Robert Ohannessian
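
The anchor-thread idea in the abstract above can be sketched in software as follows. This is only an analogy under stated assumptions: the function name `execute_with_uniformity` is hypothetical, a "uniform group" is taken to be the set of threads whose operand tuples are identical, and the anchor is assumed to be the lowest-numbered thread in each group.

```python
from collections import defaultdict

def execute_with_uniformity(op, operands_per_thread):
    """Run `op` once per uniform group and copy the result to the rest.

    op                  -- deterministic function applied by every thread
    operands_per_thread -- list of operand tuples, one per thread
    Returns (outputs, executions): the per-thread outputs and the number
    of times `op` was actually evaluated.
    """
    # Group thread ids by their (hashable) operand tuple.
    groups = defaultdict(list)
    for tid, operands in enumerate(operands_per_thread):
        groups[tuple(operands)].append(tid)

    outputs = [None] * len(operands_per_thread)
    executions = 0
    for operands, tids in groups.items():
        anchor = tids[0]                 # assumed anchor: lowest thread id
        outputs[anchor] = op(*operands)  # only the anchor executes
        executions += 1
        for tid in tids[1:]:
            outputs[tid] = outputs[anchor]  # non-anchors copy the result
    return outputs, executions
```

With four threads of which three share the operands `(2, 3)`, a multiply is evaluated twice instead of four times, which is the redundancy the abstract targets.
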
Approach for a configurable phase-based priority scheduler

Patent number: 10346212

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Grant

Filed: February 3, 2015

Date of Patent: July 9, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
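
The decision flow in the abstract above reduces to a small branch, sketched here in Python. The `ThreadGroup` class, the integer encoding of priority and phase, the `num_phases` parameter, and the choice to skew by advancing to the next phase are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

@dataclass
class ThreadGroup:
    priority: int  # value of the priority descriptor; larger means higher
    phase: int     # hypothetical phase index

def schedule_step(group, other, num_phases=2):
    """Apply one scheduling decision to `group` relative to `other`."""
    if group.phase == other.phase and group.priority > other.priority:
        # Same phase and higher priority: skew so the groups differ in phase.
        group.phase = (group.phase + 1) % num_phases
        return "skewed"
    # Different phases, or same phase without higher priority:
    # increase the priority of the thread group.
    group.priority += 1
    return "priority_increased"
```

Both remaining cases in the abstract (same phase without higher priority, and different phases) lead to the same action, so the sketch folds them into a single fall-through.
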
APPROACH FOR A CONFIGURABLE PHASE-BASED PRIORITY SCHEDULER

Publication number: 20170192822

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Application

Filed: February 3, 2015

Publication date: July 6, 2017

Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
APPROACH FOR A CONFIGURABLE PHASE-BASED PRIORITY SCHEDULER

Publication number: 20160224386

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Application

Filed: February 3, 2015

Publication date: August 4, 2016

Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
DYNAMICALLY DETECTING UNIFORMITY AND ELIMINATING REDUNDANT COMPUTATIONS TO REDUCE POWER CONSUMPTION

Publication number: 20150100764

Abstract: One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streaming multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output. Advantageously, by exploiting the uniformity of data to reduce the number of execution units that execute, the SM dramatically reduces the power consumption compared to conventional SMs.

Type: Application

Filed: October 8, 2013

Publication date: April 9, 2015

Applicant: NVIDIA CORPORATION

Inventors: Gary M. TAROLLI, John H. EDMONDSON, John Matthew BURGESS, Robert OHANNESSIAN
Approach for a configurable phase-based priority scheduler

Patent number: 8949841

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Grant

Filed: December 27, 2012

Date of Patent: February 3, 2015

Assignee: NVIDIA Corporation

Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
APPROACH FOR A CONFIGURABLE PHASE-BASED PRIORITY SCHEDULER

Publication number: 20140189698

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA Corporation

Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
Method and system for improving data coherency in a parallel rendering system

Patent number: 8379033

Abstract: A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.

Type: Grant

Filed: February 17, 2012

Date of Patent: February 19, 2013

Assignee: NVIDIA Corporation

Inventors: Steven E. Molnar, Cass W. Everitt, Roger L. Allen, Gary M. Tarolli, John M. Danskin
METHOD AND SYSTEM FOR IMPROVING DATA COHERENCY IN A PARALLEL RENDERING SYSTEM

Publication number: 20120147027

Abstract: A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.

Type: Application

Filed: February 17, 2012

Publication date: June 14, 2012

Inventors: Steven E. MOLNAR, Cass W. Everitt, Roger L. Allen, Gary M. Tarolli, John M. Danskin
Subdividing a shader program

Patent number: 8159496

Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.

Type: Grant

Filed: June 1, 2009

Date of Patent: April 17, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Gary M Tarolli
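
The phase constraint in the abstract above can be expressed as a single issue check. The sketch below is a simplified model, not the patented design: the function name `may_issue`, the string tags `"tex"` and `"math"`, and the integer phase IDs are assumptions, and anything the abstract does not explicitly restrict is allowed to issue.

```python
def may_issue(kind, phase, current_phase, current_fetches_pending):
    """Return True if an instruction may issue under the phase constraint.

    kind                    -- "tex" (texture fetch) or "math"
    phase                   -- phase ID the instruction was tagged with
    current_phase           -- phase ID currently being executed
    current_fetches_pending -- outstanding texture fetches in the current phase
    """
    if kind == "tex" and phase > current_phase:
        # A later-phase texture fetch must wait until the texture fetches
        # of the current phase have completed.
        return current_fetches_pending == 0
    # Math and texture fetches in the current phase are not held back.
    return True
```

For example, while two fetches of phase 0 are outstanding, a phase-1 texture fetch is held back but phase-0 math continues to issue.
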
Method and system for improving data coherency in a parallel rendering system

Patent number: 8139069

Abstract: A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.

Type: Grant

Filed: November 3, 2006

Date of Patent: March 20, 2012

Assignee: NVIDIA Corporation

Inventors: Steven E. Molnar, Cass W. Everitt, Roger L. Allen, Gary M. Tarolli, John M. Danskin
Method and system for improving data coherency in a parallel rendering system

Patent number: 8085272

Abstract: A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of receiving a common input stream, tracking a periodic event associated with the common input stream, generating a plurality of fragment streams from the common input stream, inserting a marker based on an occurrence of the periodic event in a first fragment stream in the multiple fragment streams, and utilizing the marker to influence the processing of the first fragment stream so that a plurality of raster operation (ROP) request streams maintains substantially the same coherence as the common input stream. Each fragment stream is independently processed and corresponds to one of the ROP request streams.

Type: Grant

Filed: November 3, 2006

Date of Patent: December 27, 2011

Assignee: NVIDIA Corporation

Inventors: Steven E. Molnar, Cass W. Everitt, Roger L. Allen, Gary M. Tarolli, John M. Danskin, Adam Clark Weitkemper, Mark J. French
Scheduler in multi-threaded processor prioritizing instructions passing qualification rule

Patent number: 7949855

Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.

Type: Grant

Filed: April 28, 2008

Date of Patent: May 24, 2011

Assignee: NVIDIA Corporation

Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
Scalable shader architecture

Patent number: 7852340

Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.

Type: Grant

Filed: December 14, 2007

Date of Patent: December 14, 2010

Assignee: NVIDIA Corporation

Inventors: Rui M. Bastos, Karim M. Abdalla, Christian Rouet, Michael J.M. Toksvig, Johnny S Rhoades, Roger L. Allen, John Douglas Tynefield, Jr., Emmett M. Kilgariff, Gary M. Tarolli, Brian Cabral, Craig Michael Wittenbrink, Sean J. Treichler
Subdividing a shader program

Patent number: 7542043

Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.

Type: Grant

Filed: May 23, 2005

Date of Patent: June 2, 2009

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
Transparent antialiased memory access

Patent number: 7508398

Abstract: A system and method for providing antialiased memory access includes receiving a request to access a memory address. The memory address is examined to determine if the memory address is within a virtual frame buffer. If the memory address is within a virtual frame buffer then the memory address is transformed into one or more physical addresses within a frame buffer that is utilized for antialiasing. The frame buffer may be a single memory space containing subpixel information corresponding to pixels of the virtual frame buffer. Subpixels located at the physical addresses within the frame buffer are then accessed. The disclosed invention provides for direct access by a software application.

Type: Grant

Filed: August 22, 2003

Date of Patent: March 24, 2009

Assignee: NVIDIA Corporation

Inventors: John S. Montrym, Brian D. Hutsell, Steven E. Molnar, Gary M. Tarolli, Christopher T. Cheng, Emmett M. Kilgariff, Abraham B. de Waal
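
The address transformation in the abstract above can be sketched as a range check followed by a mapping from one virtual pixel to several subpixel addresses. The layout used here is an assumption for illustration only: four samples per pixel, four bytes per pixel, and the samples of a pixel stored contiguously in the physical frame buffer. The function name `translate` and its parameters are hypothetical.

```python
def translate(address, virt_base, virt_size, phys_base,
              samples=4, bytes_per_pixel=4):
    """Map an address to the physical address(es) that back it.

    Addresses outside the virtual frame buffer are returned unchanged.
    Addresses inside it expand to one physical address per subpixel sample.
    """
    if not (virt_base <= address < virt_base + virt_size):
        return [address]  # not in the virtual frame buffer

    offset = address - virt_base
    pixel, byte = divmod(offset, bytes_per_pixel)
    # Assumed layout: all samples of a pixel are stored back to back.
    first = phys_base + pixel * samples * bytes_per_pixel
    return [first + s * bytes_per_pixel + byte for s in range(samples)]
```

A caller would then read or write every returned address, for example averaging the subpixel values on a read, so that the application sees an ordinary frame buffer.
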
Scalable shader architecture

Patent number: 7385607

Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.

Type: Grant

Filed: September 10, 2004

Date of Patent: June 10, 2008

Assignee: NVIDIA Corporation

Inventors: Rui M. Bastos, Karim M. Abdalla, Christian Rouet, Michael J. M. Toksvig, Johnny S. Rhoades, Roger L. Allen, John Douglas Tynefield, Jr., Emmett M. Kilgariff, Gary M. Tarolli, Brian Cabral, Craig Michael Wittenbrink, Sean J. Treichler
Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching

Patent number: 7366878

Abstract: A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prioritized, with texture cache access operations in a subsequent phase not qualified until all of the texture cache access operations in a current phase have completed. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the instructions. The instructions may also be qualified based on an age of each instruction, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute current instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.

Type: Grant

Filed: April 13, 2006

Date of Patent: April 29, 2008

Assignee: NVIDIA Corporation

Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
Multi-mode texture compression algorithm

Patent number: 6959110

Abstract: A multi-mode texture compression algorithm is provided for effective compression and decompression of texture data during graphics processing. Initially, a request is sent to memory for compressed texture data. Such compressed texture data is then received from the memory in response to the request. At least one of a plurality of compression algorithms associated with the compressed texture data is subsequently identified. Thereafter, the compressed texture data is decompressed in accordance with the identified compression algorithm.

Type: Grant

Filed: August 13, 2001

Date of Patent: October 25, 2005

Assignee: NVIDIA Corporation

Inventors: John M. Danskin, Gary M. Tarolli, Murali Sundaresan
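
The identify-then-decompress flow in the abstract above amounts to dispatching on a mode identifier. The sketch below shows that structure only. The block format (a leading mode byte), the two modes (raw copy and run-length encoding), and all function names are hypothetical stand-ins and are not the compression modes covered by the patent.

```python
def decode_raw(payload):
    """Mode 0 (hypothetical): payload is stored uncompressed."""
    return bytes(payload)

def decode_rle(payload):
    """Mode 1 (hypothetical): payload is (count, value) byte pairs."""
    out = bytearray()
    for i in range(0, len(payload), 2):
        count, value = payload[i], payload[i + 1]
        out.extend([value] * count)
    return bytes(out)

# Table mapping the mode identifier to its decompression routine.
DECODERS = {0: decode_raw, 1: decode_rle}

def decompress_block(block):
    """Identify the mode from the first byte, then decode the rest."""
    mode, payload = block[0], block[1:]
    return DECODERS[mode](payload)
```

Keeping the decoders in a table means a further mode can be added without changing `decompress_block`.
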
