Patents by Inventor Jack Choquette

Jack Choquette has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

BINDING CONSTANTS AT RUNTIME FOR IMPROVED RESOURCE UTILIZATION

Publication number: 20190146817

Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.

Type: Application

Filed: February 14, 2018

Publication date: May 16, 2019

Inventors: Ajay TIRUMALA, Jack CHOQUETTE, Manan PATEL, Shirish GADRE, Praveen KAUSHIK, Amanpreet GREWAL, Shekhar DIVEKAR, Andrei KHODAKOVSKY
Techniques for maintaining atomicity and ordering for pixel shader operations

Publication number: 20180374185

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: August 17, 2018

Publication date: December 27, 2018

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC

Publication number: 20180322078

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

Type: Application

Filed: September 26, 2017

Publication date: November 8, 2018

Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC

Publication number: 20180322077

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

Type: Application

Filed: May 4, 2017

Publication date: November 8, 2018

Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
TECHNIQUES FOR COMPREHENSIVELY SYNCHRONIZING EXECUTION THREADS

Publication number: 20180314520

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

Type: Application

Filed: April 27, 2017

Publication date: November 1, 2018

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
THREAD-LEVEL SLEEP IN A MASSIVELY MULTITHREADED ARCHITECTURE

Publication number: 20180314522

Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.

Type: Application

Filed: April 28, 2017

Publication date: November 1, 2018

Inventors: Olivier GIROUX, Peter NELSON, Jack CHOQUETTE, Ajay Sudarshan TIRUMALA
Techniques for maintaining atomicity and ordering for pixel shader operations

Patent number: 10055806

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Grant

Filed: October 27, 2015

Date of Patent: August 21, 2018

Assignee: NVIDIA CORPORATION

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
Techniques for maintaining atomicity and ordering for pixel shader operations

Patent number: 10032245

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Grant

Filed: October 27, 2015

Date of Patent: July 24, 2018

Assignee: NVIDIA CORPORATION

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
Techniques for maintaining atomicity and ordering for pixel shader operations

Patent number: 10019776

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Grant

Filed: October 27, 2015

Date of Patent: July 10, 2018

Assignee: NVIDIA CORPORATION

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116698

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116699

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116700

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
Method and system for resolving thread divergences

Patent number: 9606808

Abstract: A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.

Type: Grant

Filed: January 11, 2012

Date of Patent: March 28, 2017

Assignee: NVIDIA Corporation

Inventors: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
Instruction based interrupt masking for managing interrupts in a computer environment

Patent number: 9361114

Abstract: Managing interrupts in a computing environment includes executing an instruction, deriving an interrupt mask value based at least in part on the instruction being executed, performing a masking operation involving the interrupt mask value and at least one pending interrupt to determine whether a pending interrupt is allowable, and in the event that the pending interrupt is allowable, performing the interrupt.

Type: Grant

Filed: December 6, 2005

Date of Patent: June 7, 2016

Assignee: Azul Systems, Inc.

Inventors: Gil Tene, Scott Sellers, Jack Choquette, Michael A. Wolf
Trap handler architecture for a parallel processing unit

Patent number: 8522000

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.

Type: Grant

Filed: September 29, 2009

Date of Patent: August 27, 2013

Assignee: Nvidia Corporation

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
Method and System for Resolving Thread Divergences

Publication number: 20130179662

Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.

Type: Application

Filed: January 11, 2012

Publication date: July 11, 2013

Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
TRAP HANDLER ARCHITECTURE FOR A PARALLEL PROCESSING UNIT

Publication number: 20110078427

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.

Type: Application

Filed: September 29, 2009

Publication date: March 31, 2011

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
Processor instruction used to determine whether to perform a memory-related trap

Publication number: 20100153689

Abstract: Instruction execution includes fetching an instruction that comprises a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value. It further includes executing the instruction to determine whether to perform a trap, wherein executing the instruction includes selecting from a plurality of tests at least one test for determining whether to perform a trap and carrying out the at least one test.

Type: Application

Filed: February 12, 2010

Publication date: June 17, 2010

Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
Processor instruction used to determine whether to perform a memory-related trap

Patent number: 7689782

Abstract: An instruction used by a processor in a determination of whether to perform a trap is disclosed. The instruction includes a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value used in the determination. The determination does not include performing a memory access that uses the first address value to determine a memory location of the memory access. The determination is based at least in part on more than one of the following: a group of one or more marker bits included in the first address value, a matrix entry located at least in part using one or more bits of the first address value, a Translation Look-aside Buffer entry associated with the first address value, whether the first address value is associated with stack allocated memory, and whether the first address value includes a null value.

Type: Grant

Filed: December 6, 2005

Date of Patent: March 30, 2010

Assignee: Azul Systems, Inc.

Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
Ordering operation

Patent number: 7552302

Abstract: Executing an ordering operation is disclosed. A store operation associated with storing a value into a portion of a memory is initiated. An ordering operation to ensure that the store operation, but not necessarily all store operations, are completed is executed.

Type: Grant

Filed: September 14, 2005

Date of Patent: June 23, 2009

Assignee: Azul Systems, Inc.

Inventors: Gil Tene, Kevin Normoyle, Jack Choquette, David Kruckernyer, Cliff N. Click, Jr.

prev 1 2 3 4 next