Patents by Inventor Daniel Elliot Wexler

Daniel Elliot Wexler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210349763
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: February 5, 2021
    Publication date: November 11, 2021
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
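
This family (which recurs several times in the listing below as continuations of the same disclosure) describes what shipped publicly as CUDA dynamic parallelism. The following is a minimal sketch of the pattern the abstract names: a parent thread conditionally launching a nested child grid and synchronizing on it before continuing. The kernel names, sizes, and the doWork flag are illustrative, not taken from the patent; device-side cudaDeviceSynchronize() is the classic dynamic-parallelism API and is deprecated in recent CUDA releases.

```cuda
// Child grid: each thread scales one element.
__global__ void childKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel: one thread conditionally launches a nested child grid,
// then waits on it before the block continues with the results.
__global__ void parentKernel(float* data, int n, bool doWork) {
    if (threadIdx.x == 0 && doWork) {
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
        // Device-side barrier on the child grid. This is the classic
        // dynamic-parallelism API; recent CUDA releases deprecate it in
        // favor of tail-launch semantics.
        cudaDeviceSynchronize();
    }
    __syncthreads();
    // ... parent block continues, able to consume the child's output ...
}

int main() {
    const int n = 1024;
    float* data = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    // Dynamic parallelism requires relocatable device code, e.g.:
    //   nvcc -arch=sm_35 -rdc=true nested.cu
    parentKernel<<<1, 256>>>(data, n, true);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```
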
  • Patent number: 10915364
    Abstract: Apparatuses, systems, and techniques for performing nested kernel execution within a parallel processing subsystem. In at least one embodiment, a parent thread launches a nested child grid on the parallel processing subsystem and performs a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid.
    Type: Grant
    Filed: December 2, 2016
    Date of Patent: February 9, 2021
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20200151016
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: January 17, 2020
    Publication date: May 14, 2020
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20170083373
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: December 2, 2016
    Publication date: March 23, 2017
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Patent number: 9513975
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Grant
    Filed: May 2, 2012
    Date of Patent: December 6, 2016
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Patent number: 9489245
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: November 8, 2016
    Assignee: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
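
For readers unfamiliar with the GP_GET/GP_PUT mechanism this abstract references, the sketch below reconstructs the general shape of such a work queue. Only the two pointer names come from the abstract; every structure, size, and function here is an assumption made for illustration, not the actual driver format.

```cuda
#include <cstdint>
#include <cstring>

constexpr uint32_t kQueueSize = 256;

struct CommandBlock {
    uint32_t commands[64];   // opaque processor commands filled in by threads
    uint32_t numCommands;
};

struct WorkQueue {
    CommandBlock* entries[kQueueSize];  // each entry points at a command block
    uint32_t gpGet;  // next entry the processor will read and execute
    uint32_t gpPut;  // last free entry; the range up to here is writable
};

// Driver-side initialization, per the abstract: GP_GET starts at the first
// entry and GP_PUT at the last free entry, establishing the range of
// entries into which threads may load newly generated work.
void initWorkQueue(WorkQueue* q) {
    std::memset(q, 0, sizeof(*q));
    q->gpGet = 0;
    q->gpPut = kQueueSize - 1;
}

// Producer step (conceptually a GPU thread): publish locally generated work
// by pointing the queue entry at `slot` to a populated command block, which
// the processor will later execute.
void publish(WorkQueue* q, uint32_t slot, CommandBlock* block) {
    q->entries[slot] = block;
}
```
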
  • Patent number: 9442759
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls during processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Grant
    Filed: December 9, 2011
    Date of Patent: September 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
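
The scheduling behavior in this abstract can be modeled in a few lines. The sketch below is an illustrative host-side model of a time slice group, not NVIDIA's hardware scheduler; the structures and the runnable test are assumptions. The key property it captures is that switching between channels in the group needs no context save/restore, because all channels share one context.

```cuda
struct Channel {
    int  id;
    bool stalled;   // e.g., blocked waiting on a semaphore
    bool hasWork;
};

struct TimeSliceGroup {
    Channel channels[4];
    int     numChannels;
    int     current;    // index of the channel currently executing
};

// Pick the next runnable channel in the TSG's predetermined order.
// Because every channel shares the same context information, no context
// switch is performed here; the scheduler simply moves on.
int nextRunnable(const TimeSliceGroup* tsg) {
    for (int step = 1; step <= tsg->numChannels; ++step) {
        int idx = (tsg->current + step) % tsg->numChannels;
        const Channel* c = &tsg->channels[idx];
        if (c->hasWork && !c->stalled) return idx;
    }
    return -1;  // nothing runnable; only now does the unit go idle
}
```
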
  • Patent number: 9268601
    Abstract: One embodiment of the present invention sets forth a technique for launching work on a processor. The method includes the steps of initializing a first state object within a memory region accessible to a program executing on the processor, populating the first state object with data associated with a first workload that is generated by the program, and triggering the processing of the first workload on the processor according to the data within the first state object.
    Type: Grant
    Filed: March 31, 2011
    Date of Patent: February 23, 2016
    Assignee: NVIDIA Corporation
    Inventors: Timothy Paul Lottes Farrar, Ignacio Llamas, Daniel Elliot Wexler, Craig Ross Duttweiler
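
As a rough illustration of the claimed flow (initialize a state object in program-accessible memory, populate it with workload data, then trigger processing), the hedged sketch below uses hypothetical field names throughout; the release fence stands in for whatever visibility guarantee the real mechanism provides.

```cuda
#include <atomic>

// Hypothetical layout of a launch state object; no field name here comes
// from the patent.
struct LaunchState {
    void*    workloadData;      // data associated with the workload
    unsigned gridDim, blockDim; // launch configuration
    volatile unsigned trigger;  // written last to kick off processing
};

void launchViaStateObject(LaunchState* s, void* data,
                          unsigned grid, unsigned block) {
    s->workloadData = data;     // populate the state object
    s->gridDim = grid;
    s->blockDim = block;
    // Make the populated fields visible before the triggering write.
    std::atomic_thread_fence(std::memory_order_release);
    s->trigger = 1;             // processing of the workload starts here
}
```
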
  • Patent number: 9147224
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: September 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
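
The sketch below models the per-stage steps the abstract enumerates: test whether the state-object version is relevant, increment its reference count, bind it to work arriving afterward, and forward it to the next stage. The types, the relevance bitmask, and the stage layout are all assumptions for illustration.

```cuda
#include <atomic>

struct StateVersion {
    int version;
    unsigned stageMask;          // which stages this version is relevant to
    std::atomic<int> refCount;   // pins the version while stages use it
};

struct Stage {
    unsigned      stageBit;      // this stage's bit in stageMask
    Stage*        next;          // downstream stage, or nullptr at the end
    StateVersion* currentState;  // assigned to work that arrives after receipt

    void receive(StateVersion* v) {
        if (v->stageMask & stageBit) {   // relevant to this stage?
            v->refCount.fetch_add(1);    // increment the reference counter
            currentState = v;            // bind to subsequent work requests
        }
        if (next) next->receive(v);      // transmit to the next stage
    }
};
```
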
  • Patent number: 9135081
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: September 15, 2015
    Assignee: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Patent number: 8976185
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: March 10, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
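
A compact way to picture this mechanism is a reference count object that carries the operation to execute when the count reaches zero, as in the sketch below. The std::function callback is an assumption standing in for whatever operation encoding the patent contemplates; in practice the zero-action might, for example, free a retired state-object version.

```cuda
#include <atomic>
#include <functional>

struct RefCountObject {
    std::atomic<int>      counter{0};
    std::function<void()> onZero;   // operation to execute at zero

    // A stage that finds the version relevant increments the counter.
    void acquire() { counter.fetch_add(1); }

    // Completing the associated work decrements it; fetch_sub returns the
    // prior value, so 1 means this caller was the last holder.
    void release() {
        if (counter.fetch_sub(1) == 1 && onZero) onZero();
    }
};
```
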
  • Patent number: 8928677
    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: January 6, 2015
    Assignee: NVIDIA Corporation
    Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
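
The publicly visible analogue of this scheme is CUDA's prioritized streams (cudaStreamCreateWithPriority), though the patent frames it at the OS/driver level as two functional nodes targeting one processor. The sketch below models only the dispatch policy; the task type and queue structures are illustrative assumptions.

```cuda
#include <queue>

struct Task { int id; };

struct Dispatcher {
    std::queue<Task> lowLatencyBuffer;  // fed via the low latency node
    std::queue<Task> genericBuffer;     // fed via the generic node

    // Pop the next task for the shared processor resource, always draining
    // the low latency buffer first so its tasks finish ahead of queued
    // generic work.
    bool nextTask(Task* out) {
        std::queue<Task>* q =
            !lowLatencyBuffer.empty() ? &lowLatencyBuffer
          : !genericBuffer.empty()    ? &genericBuffer
          : nullptr;
        if (!q) return false;
        *out = q->front();
        q->pop();
        return true;
    }
};
```
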
  • Publication number: 20140122838
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Publication number: 20140123144
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Patent number: 8633927
    Abstract: In an example embodiment, 3D graphics object information associated with a render of a frame may be stored in an object-indexed cache in a memory. The 3D graphics object information comprises results for one or more shading operations and further comprises one or more input values for the one or more shading operations.
    Type: Grant
    Filed: July 25, 2006
    Date of Patent: January 21, 2014
    Assignee: NVIDIA Corporation
    Inventors: Radomir Mech, Larry I. Gritz, Eric B. Enderton, John F. Schlag, Daniel Elliot Wexler, Philip A. Nemec
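
The sketch below illustrates one plausible reading of the object-indexed cache: entries keyed by object ID that store a shading result alongside a hash of the input values that produced it, so a later render of the frame can reuse the result whenever the inputs are unchanged. All types and the hashing scheme are assumptions, not the patent's data layout.

```cuda
#include <cstdint>
#include <unordered_map>

struct ShadingEntry {
    uint64_t inputsHash;   // hash of the shading operation's input values
    float    result[4];    // cached shading result (e.g., RGBA)
};

struct ObjectShadingCache {
    std::unordered_map<uint64_t, ShadingEntry> entries;  // object ID -> entry

    // Return the cached result if the inputs still match; otherwise null,
    // signalling that the shading operation must be re-executed.
    const float* lookup(uint64_t objectId, uint64_t inputsHash) const {
        auto it = entries.find(objectId);
        if (it == entries.end() || it->second.inputsHash != inputsHash)
            return nullptr;
        return it->second.result;
    }

    // Store a freshly computed result together with the inputs' hash.
    void store(uint64_t objectId, uint64_t inputsHash, const float* rgba) {
        ShadingEntry e;
        e.inputsHash = inputsHash;
        for (int i = 0; i < 4; ++i) e.result[i] = rgba[i];
        entries[objectId] = e;
    }
};
```
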
  • Publication number: 20130298133
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: May 2, 2012
    Publication date: November 7, 2013
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20130187935
    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 25, 2013
    Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
  • Publication number: 20130152093
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls during processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
  • Publication number: 20130120413
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
  • Publication number: 20130120412
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler