Patents by Inventor Nicholas Wang
Nicholas Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9448837
Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Grant
Filed: April 15, 2013
Date of Patent: September 20, 2016
Assignee: NVIDIA Corporation
Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
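The per-thread-group decision this abstract describes can be sketched compactly. The C++ model below is illustrative only; the struct fields, function names, and the saved "completed" flag are assumptions, not details from the patent:

```cpp
#include <cstdio>
#include <vector>

// Per-thread-group state saved before the restore (field names invented).
struct ThreadGroupContext {
    int  groupId;
    bool completedBeforeTrap;   // did the group finish its assigned function?
    int  savedProgramCounter;   // where the trap handler was interrupted
};

void runTrapHandlerOps(const ThreadGroupContext& ctx) {
    std::printf("group %d: resuming trap handler at pc=%d\n",
                ctx.groupId, ctx.savedProgramCounter);
}

void contextRestoreRoutine(ThreadGroupContext& ctx) {
    // Restore the first portion of the saved context from memory (elided).
    if (ctx.completedBeforeTrap) {
        std::printf("group %d: work already complete, exiting restore\n", ctx.groupId);
        return;                  // exit the context restore routine
    }
    runTrapHandlerOps(ctx);      // continue the interrupted trap handling
}

int main() {
    // Every thread group in the CTA is launched into the restore routine.
    std::vector<ThreadGroupContext> cta = {{0, true, 0}, {1, false, 42}};
    for (auto& group : cta) contextRestoreRoutine(group);
}
```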
-
Patent number: 9378139
Abstract: A system, method, and computer program product for low-latency scheduling and launch of memory defined tasks. The method includes the steps of receiving a task metadata data structure to be stored in a memory associated with a processor, transmitting the task metadata data structure to a scheduling unit of the processor, storing the task metadata data structure in a cache unit included in the scheduling unit, and copying the task metadata data structure from the cache unit to the memory.
Type: Grant
Filed: May 8, 2013
Date of Patent: June 28, 2016
Assignee: NVIDIA Corporation
Inventors: Scott Ricketts, Brian Scott Pharris, Nicholas Wang, Luke David Durant, Philip Alexander Cuadra, Jerome F. Duluk, Jr.
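A rough model of the low-latency path: the task metadata lands in the scheduler's own cache, the launch proceeds from there, and the copy out to backing memory happens afterwards. All class and field names below are invented for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Hypothetical task metadata (TMD) layout; the fields are invented.
struct TaskMetadata {
    uint32_t taskId;
    uint32_t gridDim;
    uint64_t programAddress;
};

// The scheduling unit holds the TMD in its own cache unit so the task can
// launch immediately; the copy to backing memory happens afterwards.
class SchedulingUnit {
public:
    void receive(const TaskMetadata& tmd, std::vector<TaskMetadata>& backingMemory) {
        cache_[tmd.taskId] = tmd;       // store in the scheduler's cache unit
        launch(cache_[tmd.taskId]);     // low-latency launch, no memory round trip
        backingMemory.push_back(tmd);   // then copy cache -> memory
    }
private:
    void launch(const TaskMetadata& tmd) {
        std::printf("launching task %u at 0x%llx\n",
                    tmd.taskId, (unsigned long long)tmd.programAddress);
    }
    std::unordered_map<uint32_t, TaskMetadata> cache_;
};

int main() {
    std::vector<TaskMetadata> memory;
    SchedulingUnit scheduler;
    scheduler.receive({1, 128, 0x1000}, memory);
}
```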
-
Patent number: 9286119
Abstract: A system, method, and computer program product for management of dynamic task-dependency graphs. The method includes the steps of generating a first task data structure in a memory for a first task, generating a second task data structure in the memory, storing a pointer to the second task data structure in a first output dependence field of the first task data structure, setting a reference counter field of the second task data structure to a threshold value that indicates a number of dependent events associated with the second task, and launching the second task when the reference counter field stores a particular value. The second task data structure is a placeholder for a second task that is dependent on the first task.
Type: Grant
Filed: February 13, 2013
Date of Patent: March 15, 2016
Assignee: NVIDIA Corporation
Inventors: Igor Sevastiyanov, Brian Matthew Fahs, Nicholas Wang, Scott Ricketts, Luke David Durant, Brian Scott Pharris
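The reference-counter scheme reads like a classic dependency-counting launch gate. A minimal sketch, assuming the "particular value" that triggers launch is zero and using invented field names:

```cpp
#include <cstdio>

// Task data structure with an output-dependence pointer and a reference
// counter; the field names are invented for illustration.
struct TaskData {
    int       id;
    TaskData* outputDependence = nullptr; // edge to the dependent placeholder task
    int       referenceCounter = 0;       // dependent events still outstanding
};

// Signal one completed event; launch the task when the counter reaches
// zero (standing in for the "particular value" in the abstract).
void signalEvent(TaskData* task) {
    if (task && --task->referenceCounter == 0)
        std::printf("launching task %d\n", task->id);
}

int main() {
    TaskData second{2};
    second.referenceCounter = 1;         // one dependent event: task 1 completing
    TaskData first{1};
    first.outputDependence = &second;    // store the pointer in task 1's output field

    // ...task 1 executes and completes...
    signalEvent(first.outputDependence); // counter hits zero, task 2 launches
}
```

The abstract leaves the exact trigger value open; decrementing to zero is simply the common convention for this kind of counter.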
-
Patent number: 9256623
Abstract: A system, method, and computer program product for scheduling tasks associated with continuation thread blocks. The method includes the steps of generating a first task metadata data structure in a memory, generating a second task metadata data structure in the memory, executing a first task corresponding to the first task metadata data structure in a processor, generating state information representing a continuation task related to the first task and storing the state information in the second task metadata data structure, executing the continuation task in the processor after the one or more child tasks have finished execution, and indicating that the first task has logically finished execution once the continuation task has finished execution. The second task metadata data structure is related to the first task metadata data structure, and at least one instruction in the first task causes one or more child tasks to be executed by the processor.
Type: Grant
Filed: May 8, 2013
Date of Patent: February 9, 2016
Assignee: NVIDIA Corporation
Inventors: Scott Ricketts, Luke David Durant, Brian Scott Pharris, Igor Sevastiyanov, Nicholas Wang
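A small model of the continuation flow: the first task's children decrement a counter, and when it reaches zero the continuation stored in the second metadata structure runs, at which point the first task is logically finished. The names and the counter mechanics are assumptions:

```cpp
#include <cstdio>

// Two related task metadata structures: the first task, and a second one
// holding the continuation's saved state (names invented).
struct TaskMeta {
    int       id;
    int       outstandingChildren = 0;
    TaskMeta* continuation = nullptr;
    bool      logicallyFinished = false;
};

void runContinuation(TaskMeta& parent) {
    std::printf("running continuation %d of task %d\n",
                parent.continuation->id, parent.id);
    parent.logicallyFinished = true;   // the first task only now finishes logically
}

void childFinished(TaskMeta& parent) {
    if (--parent.outstandingChildren == 0 && parent.continuation)
        runContinuation(parent);       // all children done: run the continuation
}

int main() {
    TaskMeta continuationTask{100};
    TaskMeta first{1};
    first.continuation = &continuationTask; // state for the continuation task

    first.outstandingChildren = 2;     // executing task 1 launched two children
    childFinished(first);              // child A completes
    childFinished(first);              // child B completes -> continuation runs
}
```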
-
Patent number: 9110810
Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
Type: Grant
Filed: December 6, 2011
Date of Patent: August 18, 2015
Assignee: NVIDIA CORPORATION
Inventors: Nicholas Wang, Jack Hilaire Choquette
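A sketch of the sector-based selection, assuming 128-byte L1 lines and 512-byte L1.5 lines (both invented sizes) and a uniform draw standing in for the unspecified probability function:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

constexpr uint64_t kL1LineBytes  = 128;  // assumed line sizes, for illustration
constexpr uint64_t kL15LineBytes = 512;  // one L1.5 line spans four L1-sized sectors

// Select a prefetch target from a set of candidate lines, combining the
// current line's sector with a pseudorandom draw.
uint64_t selectPrefetchTarget(uint64_t currentL1Addr, std::mt19937& rng) {
    uint64_t l15Base        = currentL1Addr & ~(kL15LineBytes - 1);
    uint64_t sector         = (currentL1Addr - l15Base) / kL1LineBytes;
    uint64_t sectorsPerLine = kL15LineBytes / kL1LineBytes;

    if (sector == 0) {
        // Current line sits in the first sector: target a pseudorandomly
        // chosen sector of the *next* L1.5 line, as the abstract describes.
        std::uniform_int_distribution<uint64_t> pick(0, sectorsPerLine - 1);
        return l15Base + kL15LineBytes + pick(rng) * kL1LineBytes;
    }
    return currentL1Addr + kL1LineBytes; // otherwise, the next sequential L1 line
}

int main() {
    std::mt19937 rng(42);
    std::printf("prefetch target: 0x%llx\n",
                (unsigned long long)selectPrefetchTarget(0x1000, rng));
}
```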
-
Publication number: 20140372703
Abstract: A system, method, and computer program product for warming a cache for a task launch is described. The method includes the steps of receiving a task data structure that defines a processing task, extracting information stored in a cache warming field of the task data structure, and, prior to executing the processing task, generating a cache warming instruction that is configured to load one or more entries of a cache storage with data fetched from a memory.
Type: Application
Filed: June 14, 2013
Publication date: December 18, 2014
Inventors: Scott Ricketts, Nicholas Wang, Shirish Gadre, Gentaro Hirota, Robert Ohannessian, JR.
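The warming step can be modeled as a loop of loads issued before the task body runs. The descriptor layout and line size below are assumptions:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical task descriptor; the cache warming field names are invented.
struct TaskDescriptor {
    uint64_t codeAddress;
    uint64_t warmBaseAddress; // cache warming field: region to prefetch...
    uint32_t warmLineCount;   // ...and how many cache lines of it
};

void warmLine(uint64_t addr) {
    std::printf("warming cache line 0x%llx\n", (unsigned long long)addr);
}

void launchTask(const TaskDescriptor& task) {
    constexpr uint64_t kLineBytes = 128;
    // Prior to execution, issue warming loads for the region the task names.
    for (uint32_t i = 0; i < task.warmLineCount; ++i)
        warmLine(task.warmBaseAddress + i * kLineBytes);
    std::printf("executing task at 0x%llx\n", (unsigned long long)task.codeAddress);
}

int main() {
    launchTask({0x4000, 0x8000, 4});
}
```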
-
Publication number: 20140337389
Abstract: A system, method, and computer program product for scheduling tasks associated with continuation thread blocks. The method includes the steps of generating a first task metadata data structure in a memory, generating a second task metadata data structure in the memory, executing a first task corresponding to the first task metadata data structure in a processor, generating state information representing a continuation task related to the first task and storing the state information in the second task metadata data structure, executing the continuation task in the processor after the one or more child tasks have finished execution, and indicating that the first task has logically finished execution once the continuation task has finished execution. The second task metadata data structure is related to the first task metadata data structure, and at least one instruction in the first task causes one or more child tasks to be executed by the processor.
Type: Application
Filed: May 8, 2013
Publication date: November 13, 2014
Applicant: NVIDIA Corporation
Inventors: Scott Ricketts, Luke David Durant, Brian Scott Pharris, Igor Sevastiyanov, Nicholas Wang
-
Publication number: 20140337569
Abstract: A system, method, and computer program product for low-latency scheduling and launch of memory defined tasks. The method includes the steps of receiving a task metadata data structure to be stored in a memory associated with a processor, transmitting the task metadata data structure to a scheduling unit of the processor, storing the task metadata data structure in a cache unit included in the scheduling unit, and copying the task metadata data structure from the cache unit to the memory.
Type: Application
Filed: May 8, 2013
Publication date: November 13, 2014
Applicant: Nvidia Corporation
Inventors: Scott Ricketts, Brian Scott Pharris, Nicholas Wang, Luke David Durant, Philip Alexander Cuadra, Jerome F. Duluk, Jr.
-
Publication number: 20140229953
Abstract: A system, method, and computer program product for management of dynamic task-dependency graphs. The method includes the steps of generating a first task data structure in a memory for a first task, generating a second task data structure in the memory, storing a pointer to the second task data structure in a first output dependence field of the first task data structure, setting a reference counter field of the second task data structure to a threshold value that indicates a number of dependent events associated with the second task, and launching the second task when the reference counter field stores a particular value. The second task data structure is a placeholder for a second task that is dependent on the first task.
Type: Application
Filed: February 13, 2013
Publication date: August 14, 2014
Applicant: NVIDIA CORPORATION
Inventors: Igor Sevastiyanov, Brian Matthew Fahs, Nicholas Wang, Scott Ricketts, Luke David Durant, Brian Scott Pharris
-
Publication number: 20140189329
Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Nicholas WANG, Arthur DANSKIN
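A toy version of the asymmetric exit: every unit records the trap, but only the unit that needs the context switch stays in the handler for it. The data structure and flag below are invented stand-ins:

```cpp
#include <cstdio>
#include <set>
#include <vector>

std::set<int> trappedThreadIds; // data structure recording which threads trapped

struct ExecutionUnit { int id; bool needsContextSwitch; };

void trapHandler(ExecutionUnit& unit, int trappingThreadId) {
    trappedThreadIds.insert(trappingThreadId); // record the trap by thread id
    if (!unit.needsContextSwitch) {
        // Most units leave the handler before the context switch happens.
        std::printf("unit %d exits the trap handler early\n", unit.id);
        return;
    }
    std::printf("unit %d performs the context switch\n", unit.id);
}

int main() {
    std::vector<ExecutionUnit> units = {{0, true}, {1, false}, {2, false}};
    for (auto& unit : units) trapHandler(unit, /*trappingThreadId=*/7);
}
```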
-
Publication number: 20140189711
Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Application
Filed: April 15, 2013
Publication date: July 3, 2014
Inventors: Gerald F. Luiz, Phillip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
-
Publication number: 20140189260
Abstract: A streaming multiprocessor in a parallel processing subsystem processes atomic operations for multiple threads in a multi-threaded architecture. The streaming multiprocessor receives a request from a thread in a thread group to acquire access to a memory location in a lock-protected shared memory, and determines whether an address lock in a plurality of address locks is asserted, where the address lock is associated with the memory location. If the address lock is asserted, then the streaming multiprocessor refuses the request. Otherwise, the streaming multiprocessor asserts the address lock, asserts a thread group lock in a plurality of thread group locks, where the thread group lock is associated with the thread group, and grants the request. One advantage of the disclosed techniques is that acquired locks are released when a thread is preempted. As a result, a preempted thread that has previously acquired a lock does not retain the lock indefinitely.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nicholas WANG, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Matthew BROCKMEYER, Stewart Glenn CARLTON
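The two-level locking protocol maps naturally onto two tables: a set of asserted address locks, and a per-thread-group record used to release everything on preemption. A minimal sketch with invented names:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <unordered_set>

struct LockManager {
    std::unordered_set<uint64_t> addressLocks; // asserted address locks
    std::unordered_map<int, std::unordered_set<uint64_t>> groupLocks; // per-group holdings

    bool acquire(int threadGroup, uint64_t addr) {
        if (addressLocks.count(addr)) return false; // address lock asserted: refuse
        addressLocks.insert(addr);                  // assert the address lock
        groupLocks[threadGroup].insert(addr);       // assert the thread group lock
        return true;                                // grant the request
    }

    // On preemption, release everything the group holds so no lock is
    // retained indefinitely.
    void releaseOnPreemption(int threadGroup) {
        for (uint64_t addr : groupLocks[threadGroup]) addressLocks.erase(addr);
        groupLocks.erase(threadGroup);
    }
};

int main() {
    LockManager lm;
    std::printf("group 0 acquire: %d\n", lm.acquire(0, 0x100)); // granted
    std::printf("group 1 acquire: %d\n", lm.acquire(1, 0x100)); // refused
    lm.releaseOnPreemption(0);                                  // group 0 preempted
    std::printf("group 1 retry:   %d\n", lm.acquire(1, 0x100)); // granted now
}
```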
-
Publication number: 20140165072
Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.
Type: Application
Filed: December 11, 2012
Publication date: June 12, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nicholas WANG, Lacky V. SHAH, Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE
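A rough model of the re-mapping: suspending a group copies its local memory into a global-memory region and records the base address, and later local loads are serviced through that mapping. The tables and sizes are invented:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Toy global memory plus a remap table standing in for the LSU's mapping
// of a suspended group's local memory to a global-memory location.
std::vector<uint8_t> globalMemory(1 << 16);
std::unordered_map<int, uint64_t> localToGlobalBase;

void suspendGroup(int groupId, const std::vector<uint8_t>& localMem, uint64_t globalBase) {
    // Save the group's local memory into global memory and record the remap.
    std::copy(localMem.begin(), localMem.end(), globalMemory.begin() + globalBase);
    localToGlobalBase[groupId] = globalBase;
}

// After the group is re-launched, its local accesses are serviced from
// the re-mapped region in global memory.
uint8_t loadLocal(int groupId, uint64_t localAddr) {
    return globalMemory[localToGlobalBase.at(groupId) + localAddr];
}

int main() {
    std::vector<uint8_t> localMem = {10, 20, 30};
    suspendGroup(/*groupId=*/3, localMem, /*globalBase=*/0x100);
    std::printf("re-launched group reads local[1] = %d\n", loadLocal(3, 1));
}
```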
-
Patent number: 8732713
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
Type: Grant
Filed: September 28, 2011
Date of Patent: May 20, 2014
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
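The three-step selection is straightforward to sketch: filter to available groups, prefer the most senior CTA, and break ties by credit. The data layout below is an assumption, not the patented hardware structure:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct ThreadGroup { int id; int ctaId; int credit; bool available; };
struct Cta         { int id; int seniority; };

// One scheduling cycle: (i) consider only available groups, (ii) prefer
// the CTA with the greatest seniority, (iii) within it, pick the group
// with the greatest credit.
const ThreadGroup* selectNext(const std::vector<Cta>& ctas,
                              const std::vector<ThreadGroup>& groups) {
    const ThreadGroup* best = nullptr;
    int bestSeniority = -1;
    for (const auto& g : groups) {
        if (!g.available) continue;
        auto cta = std::find_if(ctas.begin(), ctas.end(),
                                [&](const Cta& c) { return c.id == g.ctaId; });
        if (cta->seniority > bestSeniority ||
            (cta->seniority == bestSeniority && g.credit > best->credit)) {
            best = &g;
            bestSeniority = cta->seniority;
        }
    }
    return best;
}

int main() {
    std::vector<Cta> ctas = {{0, 5}, {1, 9}};
    std::vector<ThreadGroup> groups = {{0, 0, 3, true}, {1, 1, 2, true}, {2, 1, 7, true}};
    if (const ThreadGroup* g = selectNext(ctas, groups))
        std::printf("issue thread group %d from CTA %d\n", g->id, g->ctaId);
}
```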
-
Patent number: 8522000
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
Type: Grant
Filed: September 29, 2009
Date of Patent: August 27, 2013
Assignee: Nvidia Corporation
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
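The all-or-nothing property can be stated as an invariant over per-group state. A toy model of the invariant, not the hardware mechanism:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Segment { Code, TrapHandler };

// The SM-wide property: thread groups move into and out of the trap
// handler together, so the state is always all-Code or all-TrapHandler.
struct StreamingMultiprocessor {
    std::vector<Segment> groups;

    void enterTrapHandler() { for (auto& g : groups) g = Segment::TrapHandler; }
    void exitTrapHandler()  { for (auto& g : groups) g = Segment::Code; }

    bool invariantHolds() const {
        return std::all_of(groups.begin(), groups.end(),
                           [&](Segment s) { return s == groups.front(); });
    }
};

int main() {
    StreamingMultiprocessor sm{{Segment::Code, Segment::Code, Segment::Code}};
    sm.enterTrapHandler();
    assert(sm.invariantHolds()); // never a mix of code-segment and trap-handler groups
}
```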
-
Publication number: 20130145102
Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
Type: Application
Filed: December 6, 2011
Publication date: June 6, 2013
Inventors: Nicholas Wang, Jack Hilaire Choquette
-
Publication number: 20130124838
Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 10, 2011
Publication date: May 16, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
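The dynamic fallback reduces to a threshold test on the estimated drain time. A sketch with invented parameter names:

```cpp
#include <cstdio>

enum class PreemptLevel { Instruction, CtaBoundary };

// Prefer CTA-boundary preemption (in-flight work drains, so less state
// needs saving), but fall back dynamically to instruction-level preemption
// when the drain would take too long.
PreemptLevel choosePreemption(int estimatedDrainCycles, int thresholdCycles) {
    return estimatedDrainCycles > thresholdCycles ? PreemptLevel::Instruction
                                                  : PreemptLevel::CtaBoundary;
}

int main() {
    PreemptLevel level = choosePreemption(/*estimatedDrainCycles=*/500,
                                          /*thresholdCycles=*/200);
    std::printf("%s\n", level == PreemptLevel::Instruction
                            ? "preempt at the instruction level"
                            : "preempt at the CTA boundary");
}
```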
-
Publication number: 20120110586
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
Type: Application
Filed: September 28, 2011
Publication date: May 3, 2012
Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
-
Publication number: 20110078427
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
Type: Application
Filed: September 29, 2009
Publication date: March 31, 2011
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Publication number: 20100199407
Abstract: A method of making a glove is described. The method comprises dipping a glove form into two different formulations. Each formulation comprises a carboxylated nitrile butadiene rubber with different amounts of covalent and ionic cross linkers. The gloves preferably have a stress retention value of greater than 50%.
Type: Application
Filed: April 23, 2010
Publication date: August 12, 2010
Applicant: Encompass Medical Supplies, Inc.
Inventor: Nicholas Wang