Patents by Inventor Daniel Elliot Wexler

Daniel Elliot Wexler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210349763
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: February 5, 2021
    Publication date: November 11, 2021
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
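
This family (which recurs several times in the listing below as continuations of the same disclosure) describes what shipped publicly as CUDA dynamic parallelism. The following is a minimal sketch of the pattern the abstract names: a parent thread conditionally launching a nested child grid and synchronizing on it before continuing. The kernel names, sizes, and the doWork flag are illustrative, not taken from the patent; device-side cudaDeviceSynchronize() is the classic dynamic-parallelism API and is deprecated in recent CUDA releases.

```cuda
// Child grid: each thread scales one element.
__global__ void childKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel: one thread conditionally launches a nested child grid,
// then waits on it before the block continues with the results.
__global__ void parentKernel(float* data, int n, bool doWork) {
    if (threadIdx.x == 0 && doWork) {
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
        // Device-side barrier on the child grid. This is the classic
        // dynamic-parallelism API; recent CUDA releases deprecate it in
        // favor of tail-launch semantics.
        cudaDeviceSynchronize();
    }
    __syncthreads();
    // ... parent block continues, able to consume the child's output ...
}

int main() {
    const int n = 1024;
    float* data = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    // Dynamic parallelism requires relocatable device code, e.g.:
    //   nvcc -arch=sm_35 -rdc=true nested.cu
    parentKernel<<<1, 256>>>(data, n, true);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```
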
  • Patent number: 10915364
    Abstract: Apparatuses, systems, and techniques for performing nested kernel execution within a parallel processing subsystem. In at least one embodiment, a parent thread launches a nested child grid on the parallel processing subsystem and performs a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid.
    Type: Grant
    Filed: December 2, 2016
    Date of Patent: February 9, 2021
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20200151016
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: January 17, 2020
    Publication date: May 14, 2020
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20170083373
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: December 2, 2016
    Publication date: March 23, 2017
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Patent number: 9513975
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Grant
    Filed: May 2, 2012
    Date of Patent: December 6, 2016
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Patent number: 9489245
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: November 8, 2016
    Assignee: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
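
For readers unfamiliar with the GP_GET/GP_PUT mechanism this abstract references, the sketch below reconstructs the general shape of such a work queue. Only the two pointer names come from the abstract; every structure, size, and function here is an assumption made for illustration, not the actual driver format.

```cuda
#include <cstdint>
#include <cstring>

constexpr uint32_t kQueueSize = 256;

struct CommandBlock {
    uint32_t commands[64];   // opaque processor commands filled in by threads
    uint32_t numCommands;
};

struct WorkQueue {
    CommandBlock* entries[kQueueSize];  // each entry points at a command block
    uint32_t gpGet;  // next entry the processor will read and execute
    uint32_t gpPut;  // last free entry; the range up to here is writable
};

// Driver-side initialization, per the abstract: GP_GET starts at the first
// entry and GP_PUT at the last free entry, establishing the range of
// entries into which threads may load newly generated work.
void initWorkQueue(WorkQueue* q) {
    std::memset(q, 0, sizeof(*q));
    q->gpGet = 0;
    q->gpPut = kQueueSize - 1;
}

// Producer step (conceptually a GPU thread): publish locally generated work
// by pointing the queue entry at `slot` to a populated command block, which
// the processor will later execute.
void publish(WorkQueue* q, uint32_t slot, CommandBlock* block) {
    q->entries[slot] = block;
}
```
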
  • Patent number: 9442759
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls during processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Grant
    Filed: December 9, 2011
    Date of Patent: September 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
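
The scheduling behavior in this abstract can be modeled in a few lines. The sketch below is an illustrative host-side model of a time slice group, not NVIDIA's hardware scheduler; the structures and the runnable test are assumptions. The key property it captures is that switching between channels in the group needs no context save/restore, because all channels share one context.

```cuda
struct Channel {
    int  id;
    bool stalled;   // e.g., blocked waiting on a semaphore
    bool hasWork;
};

struct TimeSliceGroup {
    Channel channels[4];
    int     numChannels;
    int     current;    // index of the channel currently executing
};

// Pick the next runnable channel in the TSG's predetermined order.
// Because every channel shares the same context information, no context
// switch is performed here; the scheduler simply moves on.
int nextRunnable(const TimeSliceGroup* tsg) {
    for (int step = 1; step <= tsg->numChannels; ++step) {
        int idx = (tsg->current + step) % tsg->numChannels;
        const Channel* c = &tsg->channels[idx];
        if (c->hasWork && !c->stalled) return idx;
    }
    return -1;  // nothing runnable; only now does the unit go idle
}
```
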
  • Patent number: 9268601
    Abstract: One embodiment of the present invention sets forth a technique for launching work on a processor. The method includes the steps of initializing a first state object within a memory region accessible to a program executing on the processor, populating the first state object with data associated with a first workload that is generated by the program, and triggering the processing of the first workload on the processor according to the data within the first state object.
    Type: Grant
    Filed: March 31, 2011
    Date of Patent: February 23, 2016
    Assignee: NVIDIA Corporation
    Inventors: Timothy Paul Lottes Farrar, Ignacio Llamas, Daniel Elliot Wexler, Craig Ross Duttweiler
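
As a rough illustration of the claimed flow (initialize a state object in program-accessible memory, populate it with workload data, then trigger processing), the hedged sketch below uses hypothetical field names throughout; the release fence stands in for whatever visibility guarantee the real mechanism provides.

```cuda
#include <atomic>

// Hypothetical layout of a launch state object; no field name here comes
// from the patent.
struct LaunchState {
    void*    workloadData;      // data associated with the workload
    unsigned gridDim, blockDim; // launch configuration
    volatile unsigned trigger;  // written last to kick off processing
};

void launchViaStateObject(LaunchState* s, void* data,
                          unsigned grid, unsigned block) {
    s->workloadData = data;     // populate the state object
    s->gridDim = grid;
    s->blockDim = block;
    // Make the populated fields visible before the triggering write.
    std::atomic_thread_fence(std::memory_order_release);
    s->trigger = 1;             // processing of the workload starts here
}
```
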
  • Patent number: 9147224
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: September 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
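
The sketch below models the per-stage steps the abstract enumerates: test whether the state-object version is relevant, increment its reference count, bind it to work arriving afterward, and forward it to the next stage. The types, the relevance bitmask, and the stage layout are all assumptions for illustration.

```cuda
#include <atomic>

struct StateVersion {
    int version;
    unsigned stageMask;          // which stages this version is relevant to
    std::atomic<int> refCount;   // pins the version while stages use it
};

struct Stage {
    unsigned      stageBit;      // this stage's bit in stageMask
    Stage*        next;          // downstream stage, or nullptr at the end
    StateVersion* currentState;  // assigned to work that arrives after receipt

    void receive(StateVersion* v) {
        if (v->stageMask & stageBit) {   // relevant to this stage?
            v->refCount.fetch_add(1);    // increment the reference counter
            currentState = v;            // bind to subsequent work requests
        }
        if (next) next->receive(v);      // transmit to the next stage
    }
};
```
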
  • Patent number: 9135081
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: September 15, 2015
    Assignee: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Patent number: 8976185
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: March 10, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
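
A compact way to picture this mechanism is a reference count object that carries the operation to execute when the count reaches zero, as in the sketch below. The std::function callback is an assumption standing in for whatever operation encoding the patent contemplates; in practice the zero-action might, for example, free a retired state-object version.

```cuda
#include <atomic>
#include <functional>

struct RefCountObject {
    std::atomic<int>      counter{0};
    std::function<void()> onZero;   // operation to execute at zero

    // A stage that finds the version relevant increments the counter.
    void acquire() { counter.fetch_add(1); }

    // Completing the associated work decrements it; fetch_sub returns the
    // prior value, so 1 means this caller was the last holder.
    void release() {
        if (counter.fetch_sub(1) == 1 && onZero) onZero();
    }
};
```
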
  • Patent number: 8928677
    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: January 6, 2015
    Assignee: NVIDIA Corporation
    Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
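
The publicly visible analogue of this scheme is CUDA's prioritized streams (cudaStreamCreateWithPriority), though the patent frames it at the OS/driver level as two functional nodes targeting one processor. The sketch below models only the dispatch policy; the task type and queue structures are illustrative assumptions.

```cuda
#include <queue>

struct Task { int id; };

struct Dispatcher {
    std::queue<Task> lowLatencyBuffer;  // fed via the low latency node
    std::queue<Task> genericBuffer;     // fed via the generic node

    // Pop the next task for the shared processor resource, always draining
    // the low latency buffer first so its tasks finish ahead of queued
    // generic work.
    bool nextTask(Task* out) {
        std::queue<Task>* q =
            !lowLatencyBuffer.empty() ? &lowLatencyBuffer
          : !genericBuffer.empty()    ? &genericBuffer
          : nullptr;
        if (!q) return false;
        *out = q->front();
        q->pop();
        return true;
    }
};
```
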
  • Publication number: 20140122838
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Publication number: 20140123144
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
  • Patent number: 8633927
    Abstract: In an example embodiment, 3D graphics object information associated with a render of a frame may be stored in an object-indexed cache in a memory. The 3D graphics object information comprises results for one or more shading operations and further comprises one or more input values for the one or more shading operations.
    Type: Grant
    Filed: July 25, 2006
    Date of Patent: January 21, 2014
    Assignee: NVIDIA Corporation
    Inventors: Radomir Mech, Larry I. Gritz, Eric B. Enderton, John F. Schlag, Daniel Elliot Wexler, Philip A. Nemec
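
The sketch below illustrates one plausible reading of the object-indexed cache: entries keyed by object ID that store a shading result alongside a hash of the input values that produced it, so a later render of the frame can reuse the result whenever the inputs are unchanged. All types and the hashing scheme are assumptions, not the patent's data layout.

```cuda
#include <cstdint>
#include <unordered_map>

struct ShadingEntry {
    uint64_t inputsHash;   // hash of the shading operation's input values
    float    result[4];    // cached shading result (e.g., RGBA)
};

struct ObjectShadingCache {
    std::unordered_map<uint64_t, ShadingEntry> entries;  // object ID -> entry

    // Return the cached result if the inputs still match; otherwise null,
    // signalling that the shading operation must be re-executed.
    const float* lookup(uint64_t objectId, uint64_t inputsHash) const {
        auto it = entries.find(objectId);
        if (it == entries.end() || it->second.inputsHash != inputsHash)
            return nullptr;
        return it->second.result;
    }

    // Store a freshly computed result together with the inputs' hash.
    void store(uint64_t objectId, uint64_t inputsHash, const float* rgba) {
        ShadingEntry e;
        e.inputsHash = inputsHash;
        for (int i = 0; i < 4; ++i) e.result[i] = rgba[i];
        entries[objectId] = e;
    }
};
```
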
  • Publication number: 20130298133
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
    Type: Application
    Filed: May 2, 2012
    Publication date: November 7, 2013
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20130187935
    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 25, 2013
    Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
  • Publication number: 20130152093
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls during processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
  • Publication number: 20130120413
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
  • Publication number: 20130120412
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler