Patents by Inventor Daniel Elliot Wexler
Daniel Elliot Wexler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210349763Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: February 5, 2021Publication date: November 11, 2021Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, JR., Christopher Lamb
-
Patent number: 10915364Abstract: Apparatuses, systems, and techniques for performing nested kernel execution within a parallel processing subsystem. In at least one embodiment, a parent thread launches a nested child grid on the parallel processing subsystem, and enables the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid.Type: GrantFiled: December 2, 2016Date of Patent: February 9, 2021Assignee: NVIDIA CorporationInventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Christopher Lamb
-
Publication number: 20200151016Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: January 17, 2020Publication date: May 14, 2020Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
-
Publication number: 20170083373Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: December 2, 2016Publication date: March 23, 2017Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Christopher Lamb
-
Patent number: 9513975Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: GrantFiled: May 2, 2012Date of Patent: December 6, 2016Assignee: NVIDIA CorporationInventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
-
Patent number: 9489245Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.Type: GrantFiled: October 26, 2012Date of Patent: November 8, 2016Assignee: NVIDIA CorporationInventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
-
Patent number: 9442759Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.Type: GrantFiled: December 9, 2011Date of Patent: September 13, 2016Assignee: NVIDIA CorporationInventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
-
Patent number: 9268601Abstract: One embodiment of the present invention sets forth a technique for launching work on a processor. The method includes the steps of initializing a first state object within a memory region accessible to a program executing on the processor, populating the first state object with data associated with a first workload that is generated by the program, and triggering the processing of the first workload on the processor according to the data within the first state object.Type: GrantFiled: March 31, 2011Date of Patent: February 23, 2016Assignee: NVIDIA CorporationInventors: Timothy Paul Lottes Farrar, Ignacio Llamas, Daniel Elliot Wexler, Craig Ross Duttweiler
-
Patent number: 9147224Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.Type: GrantFiled: November 11, 2011Date of Patent: September 29, 2015Assignee: NVIDIA CorporationInventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
-
Patent number: 9135081Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.Type: GrantFiled: October 26, 2012Date of Patent: September 15, 2015Assignee: NVIDIA CorporationInventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
-
Patent number: 8976185Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.Type: GrantFiled: November 11, 2011Date of Patent: March 10, 2015Assignee: NVIDIA CorporationInventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
-
Patent number: 8928677Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.Type: GrantFiled: January 24, 2012Date of Patent: January 6, 2015Assignee: NVIDIA CorporationInventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
-
Publication number: 20140122838Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.Type: ApplicationFiled: October 26, 2012Publication date: May 1, 2014Applicant: NVIDIA CORPORATIONInventors: Ignacio LLAMAS, Craig Ross DUTTWEILER, Jeffrey A. BOLZ, Daniel Elliot WEXLER
-
Publication number: 20140123144Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.Type: ApplicationFiled: October 26, 2012Publication date: May 1, 2014Applicant: NVIDIA CORPORATIONInventors: Ignacio LLAMAS, Craig Ross DUTTWEILER, Jeffrey A. BOLZ, Daniel Elliot WEXLER
-
Patent number: 8633927Abstract: In an example embodiment, 3D graphics object information associated with a render of a frame may be stored in an object-indexed cache in a memory. The 3D graphics object information comprises results for one or more shading operations further comprises one or more input values for the one or more shading operations.Type: GrantFiled: July 25, 2006Date of Patent: January 21, 2014Assignee: nVidia CorporationInventors: Radomir Mech, Larry I. Gritz, Eric B. Enderton, John F. Schlag, Daniel Elliot Wexler, Philip A. Nemec
-
Publication number: 20130298133Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: ApplicationFiled: May 2, 2012Publication date: November 7, 2013Inventors: Stephen JONES, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, JR., Christopher Lamb
-
Publication number: 20130187935Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.Type: ApplicationFiled: January 24, 2012Publication date: July 25, 2013Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
-
Publication number: 20130152093Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.Type: ApplicationFiled: December 9, 2011Publication date: June 13, 2013Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS
-
Publication number: 20130120413Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.Type: ApplicationFiled: November 11, 2011Publication date: May 16, 2013Inventors: Sean J. TREICHLER, Lacky V. Shah, Daniel Elliot Wexler
-
Publication number: 20130120412Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.Type: ApplicationFiled: November 11, 2011Publication date: May 16, 2013Inventors: Sean J. TREICHLER, Lacky V. Shah, Daniel Elliot Wexler