Patents by Inventor Jack Choquette
Jack Choquette has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190146817Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.Type: ApplicationFiled: February 14, 2018Publication date: May 16, 2019Inventors: Ajay TIRUMALA, Jack CHOQUETTE, Manan PATEL, Shirish GADRE, Praveen KAUSHIK, Amanpreet GREWAL, Shekhar DIVEKAR, Andrei KHODAKOVSKY
-
Publication number: 20180374185Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: ApplicationFiled: August 17, 2018Publication date: December 27, 2018Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Publication number: 20180322078Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.Type: ApplicationFiled: September 26, 2017Publication date: November 8, 2018Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
-
Publication number: 20180322077Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.Type: ApplicationFiled: May 4, 2017Publication date: November 8, 2018Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
-
Publication number: 20180314520Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.Type: ApplicationFiled: April 27, 2017Publication date: November 1, 2018Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
-
Publication number: 20180314522Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.Type: ApplicationFiled: April 28, 2017Publication date: November 1, 2018Inventors: Olivier GIROUX, Peter NELSON, Jack CHOQUETTE, Ajay Sudarshan TIRUMALA
-
Patent number: 10055806Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: August 21, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10032245Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: July 24, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10019776Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: July 10, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Publication number: 20170116698Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: ApplicationFiled: October 27, 2015Publication date: April 27, 2017Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
-
Publication number: 20170116699Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: ApplicationFiled: October 27, 2015Publication date: April 27, 2017Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
-
Publication number: 20170116700Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: ApplicationFiled: October 27, 2015Publication date: April 27, 2017Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
-
Patent number: 9606808Abstract: A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.Type: GrantFiled: January 11, 2012Date of Patent: March 28, 2017Assignee: NVIDIA CorporationInventors: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
-
Patent number: 9361114Abstract: Managing interrupts in a computing environment includes executing an instruction, deriving an interrupt mask value based at least in part on the instruction being executed, performing a masking operation involving the interrupt mask value and at least one pending interrupt to determine whether a pending interrupt is allowable, and in the event that the pending interrupt is allowable, performing the interrupt.Type: GrantFiled: December 6, 2005Date of Patent: June 7, 2016Assignee: Azul Systems, Inc.Inventors: Gil Tene, Scott Sellers, Jack Choquette, Michael A. Wolf
-
Patent number: 8522000Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.Type: GrantFiled: September 29, 2009Date of Patent: August 27, 2013Assignee: Nvidia CorporationInventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Publication number: 20130179662Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.Type: ApplicationFiled: January 11, 2012Publication date: July 11, 2013Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
-
Publication number: 20110078427Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.Type: ApplicationFiled: September 29, 2009Publication date: March 31, 2011Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Publication number: 20100153689Abstract: Instruction execution includes fetching an instruction that comprises a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value. It further includes executing the instruction to determine whether to perform a trap, wherein executing the instruction includes selecting from a plurality of tests at least one test for determining whether to perform a trap and carrying out the at least one test.Type: ApplicationFiled: February 12, 2010Publication date: June 17, 2010Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
-
Patent number: 7689782Abstract: An instruction used by a processor in a determination of whether to perform a trap is disclosed. The instruction includes a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value used in the determination. The determination does not include performing a memory access that uses the first address value to determine a memory location of the memory access. The determination is based at least in part on more than one of the following: a group of one or more marker bits included in the first address value, a matrix entry located at least in part using one or more bits of the first address value, a Translation Look-aside Buffer entry associated with the first address value, whether the first address value is associated with stack allocated memory, and whether the first address value includes a null value.Type: GrantFiled: December 6, 2005Date of Patent: March 30, 2010Assignee: Azul Systems, Inc.Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
-
Patent number: 7552302Abstract: Executing an ordering operation is disclosed. A store operation associated with storing a value into a portion of a memory is initiated. An ordering operation to ensure that the store operation, but not necessarily all store operations, are completed is executed.Type: GrantFiled: September 14, 2005Date of Patent: June 23, 2009Assignee: Azul Systems, Inc.Inventors: Gil Tene, Kevin Normoyle, Jack Choquette, David Kruckernyer, Cliff N. Click, Jr.