Patents by Inventor Nicholas Wang
Nicholas Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9448837
Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Grant
Filed: April 15, 2013
Date of Patent: September 20, 2016
Assignee: NVIDIA Corporation
Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
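The per-thread-group decision this abstract describes can be sketched compactly. The C++ model below is illustrative only; the struct fields, function names, and the saved "completed" flag are assumptions, not details from the patent:

```cpp
#include <cstdio>
#include <vector>

// Per-thread-group state saved before the restore (field names invented).
struct ThreadGroupContext {
    int  groupId;
    bool completedBeforeTrap;   // did the group finish its assigned function?
    int  savedProgramCounter;   // where the trap handler was interrupted
};

void runTrapHandlerOps(const ThreadGroupContext& ctx) {
    std::printf("group %d: resuming trap handler at pc=%d\n",
                ctx.groupId, ctx.savedProgramCounter);
}

void contextRestoreRoutine(ThreadGroupContext& ctx) {
    // Restore the first portion of the saved context from memory (elided).
    if (ctx.completedBeforeTrap) {
        std::printf("group %d: work already complete, exiting restore\n", ctx.groupId);
        return;                  // exit the context restore routine
    }
    runTrapHandlerOps(ctx);      // continue the interrupted trap handling
}

int main() {
    // Every thread group in the CTA is launched into the restore routine.
    std::vector<ThreadGroupContext> cta = {{0, true, 0}, {1, false, 42}};
    for (auto& group : cta) contextRestoreRoutine(group);
}
```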
-
Patent number: 9378139
Abstract: A system, method, and computer program product for low-latency scheduling and launch of memory defined tasks. The method includes the steps of receiving a task metadata data structure to be stored in a memory associated with a processor, transmitting the task metadata data structure to a scheduling unit of the processor, storing the task metadata data structure in a cache unit included in the scheduling unit, and copying the task metadata data structure from the cache unit to the memory.
Type: Grant
Filed: May 8, 2013
Date of Patent: June 28, 2016
Assignee: NVIDIA Corporation
Inventors: Scott Ricketts, Brian Scott Pharris, Nicholas Wang, Luke David Durant, Philip Alexander Cuadra, Jerome F. Duluk, Jr.
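A rough model of the low-latency path: the task metadata lands in the scheduler's own cache, the launch proceeds from there, and the copy out to backing memory happens afterwards. All class and field names below are invented for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Hypothetical task metadata (TMD) layout; the fields are invented.
struct TaskMetadata {
    uint32_t taskId;
    uint32_t gridDim;
    uint64_t programAddress;
};

// The scheduling unit holds the TMD in its own cache unit so the task can
// launch immediately; the copy to backing memory happens afterwards.
class SchedulingUnit {
public:
    void receive(const TaskMetadata& tmd, std::vector<TaskMetadata>& backingMemory) {
        cache_[tmd.taskId] = tmd;       // store in the scheduler's cache unit
        launch(cache_[tmd.taskId]);     // low-latency launch, no memory round trip
        backingMemory.push_back(tmd);   // then copy cache -> memory
    }
private:
    void launch(const TaskMetadata& tmd) {
        std::printf("launching task %u at 0x%llx\n",
                    tmd.taskId, (unsigned long long)tmd.programAddress);
    }
    std::unordered_map<uint32_t, TaskMetadata> cache_;
};

int main() {
    std::vector<TaskMetadata> memory;
    SchedulingUnit scheduler;
    scheduler.receive({1, 128, 0x1000}, memory);
}
```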
-
Patent number: 9286119
Abstract: A system, method, and computer program product for management of dynamic task-dependency graphs. The method includes the steps of generating a first task data structure in a memory for a first task, generating a second task data structure in the memory, storing a pointer to the second task data structure in a first output dependence field of the first task data structure, setting a reference counter field of the second task data structure to a threshold value that indicates a number of dependent events associated with the second task, and launching the second task when the reference counter field stores a particular value. The second task data structure is a placeholder for a second task that is dependent on the first task.
Type: Grant
Filed: February 13, 2013
Date of Patent: March 15, 2016
Assignee: NVIDIA Corporation
Inventors: Igor Sevastiyanov, Brian Matthew Fahs, Nicholas Wang, Scott Ricketts, Luke David Durant, Brian Scott Pharris
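The reference-counter scheme reads like a classic dependency-counting launch gate. A minimal sketch, assuming the "particular value" that triggers launch is zero and using invented field names:

```cpp
#include <cstdio>

// Task data structure with an output-dependence pointer and a reference
// counter; the field names are invented for illustration.
struct TaskData {
    int       id;
    TaskData* outputDependence = nullptr; // edge to the dependent placeholder task
    int       referenceCounter = 0;       // dependent events still outstanding
};

// Signal one completed event; launch the task when the counter reaches
// zero (standing in for the "particular value" in the abstract).
void signalEvent(TaskData* task) {
    if (task && --task->referenceCounter == 0)
        std::printf("launching task %d\n", task->id);
}

int main() {
    TaskData second{2};
    second.referenceCounter = 1;         // one dependent event: task 1 completing
    TaskData first{1};
    first.outputDependence = &second;    // store the pointer in task 1's output field

    // ...task 1 executes and completes...
    signalEvent(first.outputDependence); // counter hits zero, task 2 launches
}
```

The abstract leaves the exact trigger value open; decrementing to zero is simply the common convention for this kind of counter.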
-
Patent number: 9256623
Abstract: A system, method, and computer program product for scheduling tasks associated with continuation thread blocks. The method includes the steps of generating a first task metadata data structure in a memory, generating a second task metadata data structure in the memory, executing a first task corresponding to the first task metadata data structure in a processor, generating state information representing a continuation task related to the first task and storing the state information in the second task metadata data structure, executing the continuation task in the processor after the one or more child tasks have finished execution, and indicating that the first task has logically finished execution once the continuation task has finished execution. The second task metadata data structure is related to the first task metadata data structure, and at least one instruction in the first task causes one or more child tasks to be executed by the processor.
Type: Grant
Filed: May 8, 2013
Date of Patent: February 9, 2016
Assignee: NVIDIA Corporation
Inventors: Scott Ricketts, Luke David Durant, Brian Scott Pharris, Igor Sevastiyanov, Nicholas Wang
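A small model of the continuation flow: the first task's children decrement a counter, and when it reaches zero the continuation stored in the second metadata structure runs, at which point the first task is logically finished. The names and the counter mechanics are assumptions:

```cpp
#include <cstdio>

// Two related task metadata structures: the first task, and a second one
// holding the continuation's saved state (names invented).
struct TaskMeta {
    int       id;
    int       outstandingChildren = 0;
    TaskMeta* continuation = nullptr;
    bool      logicallyFinished = false;
};

void runContinuation(TaskMeta& parent) {
    std::printf("running continuation %d of task %d\n",
                parent.continuation->id, parent.id);
    parent.logicallyFinished = true;   // the first task only now finishes logically
}

void childFinished(TaskMeta& parent) {
    if (--parent.outstandingChildren == 0 && parent.continuation)
        runContinuation(parent);       // all children done: run the continuation
}

int main() {
    TaskMeta continuationTask{100};
    TaskMeta first{1};
    first.continuation = &continuationTask; // state for the continuation task

    first.outstandingChildren = 2;     // executing task 1 launched two children
    childFinished(first);              // child A completes
    childFinished(first);              // child B completes -> continuation runs
}
```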
-
Patent number: 9110810
Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
Type: Grant
Filed: December 6, 2011
Date of Patent: August 18, 2015
Assignee: NVIDIA CORPORATION
Inventors: Nicholas Wang, Jack Hilaire Choquette
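A sketch of the sector-based selection, assuming 128-byte L1 lines and 512-byte L1.5 lines (both invented sizes) and a uniform draw standing in for the unspecified probability function:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

constexpr uint64_t kL1LineBytes  = 128;  // assumed line sizes, for illustration
constexpr uint64_t kL15LineBytes = 512;  // one L1.5 line spans four L1-sized sectors

// Select a prefetch target from a set of candidate lines, combining the
// current line's sector with a pseudorandom draw.
uint64_t selectPrefetchTarget(uint64_t currentL1Addr, std::mt19937& rng) {
    uint64_t l15Base        = currentL1Addr & ~(kL15LineBytes - 1);
    uint64_t sector         = (currentL1Addr - l15Base) / kL1LineBytes;
    uint64_t sectorsPerLine = kL15LineBytes / kL1LineBytes;

    if (sector == 0) {
        // Current line sits in the first sector: target a pseudorandomly
        // chosen sector of the *next* L1.5 line, as the abstract describes.
        std::uniform_int_distribution<uint64_t> pick(0, sectorsPerLine - 1);
        return l15Base + kL15LineBytes + pick(rng) * kL1LineBytes;
    }
    return currentL1Addr + kL1LineBytes; // otherwise, the next sequential L1 line
}

int main() {
    std::mt19937 rng(42);
    std::printf("prefetch target: 0x%llx\n",
                (unsigned long long)selectPrefetchTarget(0x1000, rng));
}
```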
-
Publication number: 20140372703
Abstract: A system, method, and computer program product for warming a cache for a task launch is described. The method includes the steps of receiving a task data structure that defines a processing task, extracting information stored in a cache warming field of the task data structure, and, prior to executing the processing task, generating a cache warming instruction that is configured to load one or more entries of a cache storage with data fetched from a memory.
Type: Application
Filed: June 14, 2013
Publication date: December 18, 2014
Inventors: Scott Ricketts, Nicholas Wang, Shirish Gadre, Gentaro Hirota, Robert Ohannessian, JR.
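The warming step can be modeled as a loop of loads issued before the task body runs. The descriptor layout and line size below are assumptions:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical task descriptor; the cache warming field names are invented.
struct TaskDescriptor {
    uint64_t codeAddress;
    uint64_t warmBaseAddress; // cache warming field: region to prefetch...
    uint32_t warmLineCount;   // ...and how many cache lines of it
};

void warmLine(uint64_t addr) {
    std::printf("warming cache line 0x%llx\n", (unsigned long long)addr);
}

void launchTask(const TaskDescriptor& task) {
    constexpr uint64_t kLineBytes = 128;
    // Prior to execution, issue warming loads for the region the task names.
    for (uint32_t i = 0; i < task.warmLineCount; ++i)
        warmLine(task.warmBaseAddress + i * kLineBytes);
    std::printf("executing task at 0x%llx\n", (unsigned long long)task.codeAddress);
}

int main() {
    launchTask({0x4000, 0x8000, 4});
}
```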
-
Publication number: 20140337389
Abstract: A system, method, and computer program product for scheduling tasks associated with continuation thread blocks. The method includes the steps of generating a first task metadata data structure in a memory, generating a second task metadata data structure in the memory, executing a first task corresponding to the first task metadata data structure in a processor, generating state information representing a continuation task related to the first task and storing the state information in the second task metadata data structure, executing the continuation task in the processor after the one or more child tasks have finished execution, and indicating that the first task has logically finished execution once the continuation task has finished execution. The second task metadata data structure is related to the first task metadata data structure, and at least one instruction in the first task causes one or more child tasks to be executed by the processor.
Type: Application
Filed: May 8, 2013
Publication date: November 13, 2014
Applicant: NVIDIA Corporation
Inventors: Scott Ricketts, Luke David Durant, Brian Scott Pharris, Igor Sevastiyanov, Nicholas Wang
-
Publication number: 20140337569
Abstract: A system, method, and computer program product for low-latency scheduling and launch of memory defined tasks. The method includes the steps of receiving a task metadata data structure to be stored in a memory associated with a processor, transmitting the task metadata data structure to a scheduling unit of the processor, storing the task metadata data structure in a cache unit included in the scheduling unit, and copying the task metadata data structure from the cache unit to the memory.
Type: Application
Filed: May 8, 2013
Publication date: November 13, 2014
Applicant: Nvidia Corporation
Inventors: Scott Ricketts, Brian Scott Pharris, Nicholas Wang, Luke David Durant, Philip Alexander Cuadra, Jerome F. Duluk, Jr.
-
Publication number: 20140229953
Abstract: A system, method, and computer program product for management of dynamic task-dependency graphs. The method includes the steps of generating a first task data structure in a memory for a first task, generating a second task data structure in the memory, storing a pointer to the second task data structure in a first output dependence field of the first task data structure, setting a reference counter field of the second task data structure to a threshold value that indicates a number of dependent events associated with the second task, and launching the second task when the reference counter field stores a particular value. The second task data structure is a placeholder for a second task that is dependent on the first task.
Type: Application
Filed: February 13, 2013
Publication date: August 14, 2014
Applicant: NVIDIA CORPORATION
Inventors: Igor Sevastiyanov, Brian Matthew Fahs, Nicholas Wang, Scott Ricketts, Luke David Durant, Brian Scott Pharris
-
Publication number: 20140189329
Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Nicholas WANG, Arthur DANSKIN
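A toy version of the asymmetric exit: every unit records the trap, but only the unit that needs the context switch stays in the handler for it. The data structure and flag below are invented stand-ins:

```cpp
#include <cstdio>
#include <set>
#include <vector>

std::set<int> trappedThreadIds; // data structure recording which threads trapped

struct ExecutionUnit { int id; bool needsContextSwitch; };

void trapHandler(ExecutionUnit& unit, int trappingThreadId) {
    trappedThreadIds.insert(trappingThreadId); // record the trap by thread id
    if (!unit.needsContextSwitch) {
        // Most units leave the handler before the context switch happens.
        std::printf("unit %d exits the trap handler early\n", unit.id);
        return;
    }
    std::printf("unit %d performs the context switch\n", unit.id);
}

int main() {
    std::vector<ExecutionUnit> units = {{0, true}, {1, false}, {2, false}};
    for (auto& unit : units) trapHandler(unit, /*trappingThreadId=*/7);
}
```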
-
Publication number: 20140189711
Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
Type: Application
Filed: April 15, 2013
Publication date: July 3, 2014
Inventors: Gerald F. Luiz, Phillip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
-
Publication number: 20140189260
Abstract: A streaming multiprocessor in a parallel processing subsystem processes atomic operations for multiple threads in a multi-threaded architecture. The streaming multiprocessor receives a request from a thread in a thread group to acquire access to a memory location in a lock-protected shared memory, and determines whether an address lock in a plurality of address locks is asserted, where the address lock is associated with the memory location. If the address lock is asserted, then the streaming multiprocessor refuses the request. Otherwise, the streaming multiprocessor asserts the address lock, asserts a thread group lock in a plurality of thread group locks, where the thread group lock is associated with the thread group, and grants the request. One advantage of the disclosed techniques is that acquired locks are released when a thread is preempted. As a result, a preempted thread that has previously acquired a lock does not retain the lock indefinitely.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nicholas WANG, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Matthew BROCKMEYER, Stewart Glenn CARLTON
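The two-level locking protocol maps naturally onto two tables: a set of asserted address locks, and a per-thread-group record used to release everything on preemption. A minimal sketch with invented names:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <unordered_set>

struct LockManager {
    std::unordered_set<uint64_t> addressLocks; // asserted address locks
    std::unordered_map<int, std::unordered_set<uint64_t>> groupLocks; // per-group holdings

    bool acquire(int threadGroup, uint64_t addr) {
        if (addressLocks.count(addr)) return false; // address lock asserted: refuse
        addressLocks.insert(addr);                  // assert the address lock
        groupLocks[threadGroup].insert(addr);       // assert the thread group lock
        return true;                                // grant the request
    }

    // On preemption, release everything the group holds so no lock is
    // retained indefinitely.
    void releaseOnPreemption(int threadGroup) {
        for (uint64_t addr : groupLocks[threadGroup]) addressLocks.erase(addr);
        groupLocks.erase(threadGroup);
    }
};

int main() {
    LockManager lm;
    std::printf("group 0 acquire: %d\n", lm.acquire(0, 0x100)); // granted
    std::printf("group 1 acquire: %d\n", lm.acquire(1, 0x100)); // refused
    lm.releaseOnPreemption(0);                                  // group 0 preempted
    std::printf("group 1 retry:   %d\n", lm.acquire(1, 0x100)); // granted now
}
```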
-
Publication number: 20140165072
Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.
Type: Application
Filed: December 11, 2012
Publication date: June 12, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nicholas WANG, Lacky V. SHAH, Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE
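A rough model of the re-mapping: suspending a group copies its local memory into a global-memory region and records the base address, and later local loads are serviced through that mapping. The tables and sizes are invented:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Toy global memory plus a remap table standing in for the LSU's mapping
// of a suspended group's local memory to a global-memory location.
std::vector<uint8_t> globalMemory(1 << 16);
std::unordered_map<int, uint64_t> localToGlobalBase;

void suspendGroup(int groupId, const std::vector<uint8_t>& localMem, uint64_t globalBase) {
    // Save the group's local memory into global memory and record the remap.
    std::copy(localMem.begin(), localMem.end(), globalMemory.begin() + globalBase);
    localToGlobalBase[groupId] = globalBase;
}

// After the group is re-launched, its local accesses are serviced from
// the re-mapped region in global memory.
uint8_t loadLocal(int groupId, uint64_t localAddr) {
    return globalMemory[localToGlobalBase.at(groupId) + localAddr];
}

int main() {
    std::vector<uint8_t> localMem = {10, 20, 30};
    suspendGroup(/*groupId=*/3, localMem, /*globalBase=*/0x100);
    std::printf("re-launched group reads local[1] = %d\n", loadLocal(3, 1));
}
```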
-
Patent number: 8732713
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
Type: Grant
Filed: September 28, 2011
Date of Patent: May 20, 2014
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
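The three-step selection is straightforward to sketch: filter to available groups, prefer the most senior CTA, and break ties by credit. The data layout below is an assumption, not the patented hardware structure:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct ThreadGroup { int id; int ctaId; int credit; bool available; };
struct Cta         { int id; int seniority; };

// One scheduling cycle: (i) consider only available groups, (ii) prefer
// the CTA with the greatest seniority, (iii) within it, pick the group
// with the greatest credit.
const ThreadGroup* selectNext(const std::vector<Cta>& ctas,
                              const std::vector<ThreadGroup>& groups) {
    const ThreadGroup* best = nullptr;
    int bestSeniority = -1;
    for (const auto& g : groups) {
        if (!g.available) continue;
        auto cta = std::find_if(ctas.begin(), ctas.end(),
                                [&](const Cta& c) { return c.id == g.ctaId; });
        if (cta->seniority > bestSeniority ||
            (cta->seniority == bestSeniority && g.credit > best->credit)) {
            best = &g;
            bestSeniority = cta->seniority;
        }
    }
    return best;
}

int main() {
    std::vector<Cta> ctas = {{0, 5}, {1, 9}};
    std::vector<ThreadGroup> groups = {{0, 0, 3, true}, {1, 1, 2, true}, {2, 1, 7, true}};
    if (const ThreadGroup* g = selectNext(ctas, groups))
        std::printf("issue thread group %d from CTA %d\n", g->id, g->ctaId);
}
```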
-
Patent number: 8522000
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
Type: Grant
Filed: September 29, 2009
Date of Patent: August 27, 2013
Assignee: Nvidia Corporation
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
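The all-or-nothing property can be stated as an invariant over per-group state. A toy model of the invariant, not the hardware mechanism:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Segment { Code, TrapHandler };

// The SM-wide property: thread groups move into and out of the trap
// handler together, so the state is always all-Code or all-TrapHandler.
struct StreamingMultiprocessor {
    std::vector<Segment> groups;

    void enterTrapHandler() { for (auto& g : groups) g = Segment::TrapHandler; }
    void exitTrapHandler()  { for (auto& g : groups) g = Segment::Code; }

    bool invariantHolds() const {
        return std::all_of(groups.begin(), groups.end(),
                           [&](Segment s) { return s == groups.front(); });
    }
};

int main() {
    StreamingMultiprocessor sm{{Segment::Code, Segment::Code, Segment::Code}};
    sm.enterTrapHandler();
    assert(sm.invariantHolds()); // never a mix of code-segment and trap-handler groups
}
```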
-
Publication number: 20130145102
Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
Type: Application
Filed: December 6, 2011
Publication date: June 6, 2013
Inventors: Nicholas Wang, Jack Hilaire Choquette
-
Publication number: 20130124838
Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 10, 2011
Publication date: May 16, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
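The dynamic fallback reduces to a threshold test on the estimated drain time. A sketch with invented parameter names:

```cpp
#include <cstdio>

enum class PreemptLevel { Instruction, CtaBoundary };

// Prefer CTA-boundary preemption (in-flight work drains, so less state
// needs saving), but fall back dynamically to instruction-level preemption
// when the drain would take too long.
PreemptLevel choosePreemption(int estimatedDrainCycles, int thresholdCycles) {
    return estimatedDrainCycles > thresholdCycles ? PreemptLevel::Instruction
                                                  : PreemptLevel::CtaBoundary;
}

int main() {
    PreemptLevel level = choosePreemption(/*estimatedDrainCycles=*/500,
                                          /*thresholdCycles=*/200);
    std::printf("%s\n", level == PreemptLevel::Instruction
                            ? "preempt at the instruction level"
                            : "preempt at the CTA boundary");
}
```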
-
Publication number: 20120110586
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
Type: Application
Filed: September 28, 2011
Publication date: May 3, 2012
Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
-
Publication number: 20110078427
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
Type: Application
Filed: September 29, 2009
Publication date: March 31, 2011
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Publication number: 20100199407
Abstract: A method of making a glove is described. The method comprises dipping a glove form into two different formulations. Each formulation comprises a carboxylated nitrile butadiene rubber with different amounts of covalent and ionic cross linkers. The gloves preferably have a stress retention value of greater than 50%.
Type: Application
Filed: April 23, 2010
Publication date: August 12, 2010
Applicant: Encompass Medical Supplies, Inc.
Inventor: Nicholas Wang