Patents by Inventor Timothy John Purcell
Timothy John Purcell has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8941653
Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type: Grant
Filed: November 18, 2013
Date of Patent: January 27, 2015
Assignee: NVIDIA Corporation
Inventors: Steven E. Molnar, Emmett M. Kilgariff, John S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
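The idea in this abstract — rasterize primitives in parallel but keep per-pixel results in API submission order — can be illustrated with a minimal software sketch. All names here are hypothetical; the patented scheme is hardware-level and far more involved.

```python
# Sketch: distribute primitives to parallel workers, then recover
# API submission order via sequence numbers. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def rasterize(primitive):
    # Stand-in for per-primitive rasterization work.
    seq, data = primitive
    return (seq, f"fragments({data})")

def render_in_order(primitives, num_rasterizers=4):
    # Each primitive carries a sequence number assigned at submission time.
    tagged = list(enumerate(primitives))
    with ThreadPoolExecutor(max_workers=num_rasterizers) as pool:
        results = list(pool.map(rasterize, tagged))
    # Re-sort by sequence number so that, per pixel, later primitives
    # composite over earlier ones as the API ordering requires.
    return [frags for _, frags in sorted(results)]
```

The key design point the abstract describes is that ordering is enforced per pixel at the output, so geometry and rasterization themselves are free to run concurrently.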
-
Patent number: 8760460
Abstract: One embodiment of the present invention sets forth a technique for using a shared memory to store hardware-managed virtual buffers. A circular buffer is allocated within a general-purpose multi-use cache for storage of primitive attribute data rather than having a dedicated buffer for the storage of the primitive attribute data. The general-purpose multi-use cache is also configured to store other graphics data, since the space requirement for primitive attribute data storage is highly variable, depending on the number of attributes and the size of primitives. Entries in the circular buffer are allocated as needed and released and invalidated after the primitive attribute data has been consumed. An address to the circular buffer entry is transmitted along with primitive descriptors from object-space processing to the distributed processing in screen-space.
Type: Grant
Filed: May 4, 2010
Date of Patent: June 24, 2014
Assignee: NVIDIA Corporation
Inventors: Emmett M. Kilgariff, Steven E. Molnar, Sean J. Treichler, Johnny S. Rhoades, Gernot Schaufler, Dale L. Kirkland, Cynthia Ann Edgeworth Allison, Karl M. Wurstner, Timothy John Purcell
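The circular-buffer allocation described here can be sketched as a simple ring allocator: entries are handed out as needed and reclaimed in order once consumed. This is a minimal sketch under the assumption of in-order consumption; the class and method names are invented for illustration.

```python
class CircularBufferAllocator:
    """Sketch of circular-buffer entries carved out of a shared cache.
    Entry handles are allocated as needed and released (invalidated)
    once the attribute data has been consumed."""
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.head = 0  # next entry to allocate
        self.tail = 0  # oldest entry still live
        self.live = 0

    def allocate(self):
        if self.live == self.num_entries:
            return None  # producer must wait for a consumer to release
        entry = self.head
        self.head = (self.head + 1) % self.num_entries
        self.live += 1
        return entry  # handle sent along with the primitive descriptor

    def release(self):
        # Entries are consumed in order, so release just advances the tail.
        self.tail = (self.tail + 1) % self.num_entries
        self.live -= 1
```

Returning a handle from `allocate` mirrors the abstract's point that an address to the entry travels with the primitive descriptor from object-space to screen-space processing.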
-
Publication number: 20140152652
Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type: Application
Filed: November 18, 2013
Publication date: June 5, 2014
Applicant: NVIDIA CORPORATION
Inventors: Steven E. MOLNAR, Emmett M. KILGARIFF, John S. RHOADES, Timothy John PURCELL, Sean J. TREICHLER, Ziyad S. HAKURA, Franklin C. CROW, James C. BOWMAN
-
Publication number: 20140123146
Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
Type: Application
Filed: October 25, 2012
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
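The translation scheme in this abstract — the ASID selects a page table, and TLB entries are keyed by (ASID, virtual address) — can be sketched in a few lines. This is a software model for illustration only; structure and naming are assumptions, not the hardware design.

```python
class TLB:
    """Sketch: (ASID, virtual page) -> physical page translation, with a
    per-ASID page table backing a shared translation look-aside buffer."""
    def __init__(self, page_tables):
        self.page_tables = page_tables  # asid -> {vpage: ppage}
        self.cache = {}                 # (asid, vpage) -> ppage
        self.misses = 0

    def translate(self, asid, vpage):
        key = (asid, vpage)  # ASID is part of the TLB entry key
        if key not in self.cache:
            self.misses += 1
            # On a miss, the ASID selects which page table to walk, so two
            # tasks can map the same virtual page to different physical pages.
            self.cache[key] = self.page_tables[asid][vpage]
        return self.cache[key]
```

Keying TLB entries by (ASID, virtual address) is what lets tasks with independent address spaces share one TLB without flushing it on every task switch.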
-
Publication number: 20140122809
Abstract: One embodiment of the present invention sets forth a technique for processing commands received by an intermediary cache from one or more clients. The technique involves receiving a first write command from an arbiter unit, where the first write command specifies a first memory address, determining that a first cache line related to a set of cache lines included in the intermediary cache is associated with the first memory address, causing data associated with the first write command to be written into the first cache line, and marking the first cache line as dirty.
Type: Application
Filed: October 30, 2012
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: James Patrick ROBERTSON, Gregory Alan MUTHLER, Hemayet HOSSAIN, Timothy John PURCELL, Karan MEHRA, Peter B. HOLMQVIST, George R. LYNCH
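The write-handling steps the abstract enumerates — map the address to a cache line, write the data, mark the line dirty — follow the standard write-back pattern, sketched below. Class and field names are illustrative assumptions.

```python
class IntermediaryCache:
    """Sketch of handling a write command: locate the cache line associated
    with the address, write the data into it, and mark the line dirty so
    writeback to memory can be deferred."""
    def __init__(self, line_size):
        self.line_size = line_size
        self.lines = {}  # tag -> {"data": ..., "dirty": bool}

    def write(self, address, data):
        tag = address // self.line_size  # find the associated cache line
        line = self.lines.setdefault(tag, {"data": None, "dirty": False})
        line["data"] = data
        line["dirty"] = True  # deferred writeback; memory is now stale
```

Marking the line dirty rather than writing through is what lets the intermediary cache absorb client writes without a memory transaction per command.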
-
Publication number: 20140123145
Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
Type: Application
Filed: October 25, 2012
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, Jr., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
-
Publication number: 20140122829
Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
Type: Application
Filed: October 25, 2012
Publication date: May 1, 2014
Applicant: NVIDIA Corporation
Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
-
Patent number: 8605102
Abstract: A raster unit generates graphic data for specific regions of a display device by processing each graphics primitive in a sequence of graphics primitives. A tile coalescer within the raster unit receives graphic data based on the sequence of graphics primitives processed by the raster unit. The tile coalescer collects graphic data for each region of the display device into a different bin before shading and then outputs each bin separately.
Type: Grant
Filed: December 31, 2009
Date of Patent: December 10, 2013
Assignee: NVIDIA Corporation
Inventors: Timothy John Purcell, Steven E. Molnar
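The tile-coalescing step described here — collect graphic data per screen region into bins before shading — can be sketched as a simple binning function. The tile size and the fragment representation are assumptions for illustration.

```python
def coalesce_tiles(fragments, tile_size=16):
    """Sketch: collect per-pixel graphic data into one bin per screen
    region (tile) before shading; each bin is then output separately.
    `fragments` is a list of (x, y, data) tuples."""
    bins = {}
    for x, y, data in fragments:
        tile = (x // tile_size, y // tile_size)  # region of the display
        bins.setdefault(tile, []).append(data)
    return bins
```

Binning by region before shading improves locality: all work touching one tile of the framebuffer is processed together.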
-
Patent number: 8587581
Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type: Grant
Filed: October 15, 2009
Date of Patent: November 19, 2013
Assignee: Nvidia Corporation
Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
-
Publication number: 20130198760
Abstract: One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes, a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.
Type: Application
Filed: January 27, 2012
Publication date: August 1, 2013
Inventors: Philip Alexander CUADRA, Lacky V. Shah, Timothy John Purcell, Gerald F. Luiz, Jerome F. Duluk, JR.
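The mechanism in this abstract — an enable flag plus a pointer to the dependent task stored in the first task's metadata, with no semaphore handshake — can be sketched as a linked chain of tasks. The class layout is an invented illustration of that metadata encoding.

```python
class Task:
    """Sketch: dependent-task info lives in this task's own metadata as
    an enable flag plus a pointer to the dependent task."""
    def __init__(self, name, dependent=None):
        self.name = name
        self.dependent_enable = dependent is not None
        self.dependent = dependent
        self.complete = False

def run_chain(task, launched):
    # Each completion directly launches the dependent task; no semaphore
    # release/acquire is needed between the two.
    while task is not None:
        launched.append(task.name)
        nxt = task.dependent if task.dependent_enable else None
        task.complete = True  # metadata slot may now be reused
        task = nxt
```

Marking the first task complete only after the dependent launch is read out mirrors the abstract's point about safely reusing the metadata storage.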
-
Publication number: 20130185728
Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates the relative time at which the compute task was added to the first table. The technique further involves selecting a first compute task from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.
Type: Application
Filed: January 18, 2012
Publication date: July 18, 2013
Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
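The selection rule described — choose among eligible tasks by priority value and allocation order — can be sketched as a keyed minimum over a task table. The table layout and the assumption that a lower priority value means higher priority are illustrative choices, not details from the publication.

```python
def select_task(table):
    """Sketch: pick the eligible task with the best priority value,
    breaking ties by allocation order (earliest added to the table wins).
    `table` rows: (name, priority_value, alloc_order, eligible).
    Assumes lower priority_value = higher priority."""
    eligible = [t for t in table if t[3]]
    if not eligible:
        return None
    return min(eligible, key=lambda t: (t[1], t[2]))[0]
```

Using allocation order as the tiebreaker keeps scheduling fair among equal-priority tasks: the one that has waited in the table longest goes first.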
-
Publication number: 20130185725
Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
Type: Application
Filed: January 18, 2012
Publication date: July 18, 2013
Inventors: Karim M. ABDALLA, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Publication number: 20130160021
Abstract: One embodiment of the present invention sets forth a technique that enables the insertion of generated tasks into a scheduling pipeline of a multiple processor system, allowing a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
Type: Application
Filed: December 16, 2011
Publication date: June 20, 2013
Inventors: Timothy John PURCELL, Lacky V. Shah, Jerome F. Duluk, JR., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
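The core idea here — a write of dynamic-task data into a queue itself generates a notification to the scheduler, with no CPU in the loop — can be modeled as a queue whose write path signals the scheduler directly. This is a software analogy with invented names, not the hardware signal path.

```python
class Scheduler:
    """Sketch of the scheduling unit receiving reflected notifications."""
    def __init__(self):
        self.pending = []

    def notify(self, task_data):
        self.pending.append(task_data)

class TaskQueue:
    """Sketch: writing dynamic-task data to the queue also generates a
    'reflected' notification to the scheduler, so a running compute task
    can spawn work without CPU intervention."""
    def __init__(self, scheduler):
        self.entries = []
        self.scheduler = scheduler

    def write(self, task_data):
        self.entries.append(task_data)
        # The write request is reflected back as a notification signal.
        self.scheduler.notify(task_data)
```

Coupling notification to the write itself removes the round trip through the CPU that a conventional task-submission path would require.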
-
Publication number: 20130152094
Abstract: One embodiment of the present invention sets forth a technique for error-checking a compute task. The technique involves receiving a pointer to a compute task, storing the pointer in a scheduling queue, determining that the compute task should be executed, retrieving the pointer from the scheduling queue, determining via an error-check procedure that the compute task is eligible for execution, and executing the compute task.
Type: Application
Filed: December 9, 2011
Publication date: June 13, 2013
Inventors: Jerome F. Duluk, JR., Timothy John Purcell, Jesse David Hall, Philip Alexander Cuadra
-
Publication number: 20130132711
Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 22, 2011
Publication date: May 23, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Timothy John Purcell
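The dynamic choice the abstract describes — prefer preempting at a compute-thread-array boundary (less context state to save), but fall back to instruction-level preemption when draining in-flight work would take too long — reduces to a threshold comparison. This sketch names things for illustration; the threshold and its units are assumptions.

```python
def choose_preemption_level(drain_cycles, threshold_cycles):
    """Sketch: preempt at a compute-thread-array (CTA) boundary when
    in-flight instructions can drain quickly, so less context state must
    be stored; otherwise fall back to instruction-level preemption, which
    needs no pipeline drain at all."""
    if drain_cycles > threshold_cycles:
        return "instruction"  # stop issuing, unload context immediately
    return "cta_boundary"     # let in-flight instructions complete
```

The trade-off: CTA-boundary preemption saves state but delays the preempting task, while instruction-level preemption is immediate but must capture the full pipeline context.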
-
Publication number: 20130124838
Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 10, 2011
Publication date: May 16, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
-
Publication number: 20130117758
Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
Type: Application
Filed: November 8, 2011
Publication date: May 9, 2013
Inventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, JR., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah
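The reference-counter behavior described — a programmable limit shared by tasks, with an exit path that can override release so a child task inherits the resources — can be sketched as a small counter class. The API shape here is an invented illustration.

```python
class ResourceCounter:
    """Sketch: a programmable reference counter limiting the resources
    available to the tasks that share it."""
    def __init__(self, limit):
        self.available = limit  # programmable initial value

    def acquire(self, amount):
        if amount > self.available:
            return False  # launch must wait for resources to free up
        self.available -= amount
        return True

    def release(self, amount, retain_for_child=False):
        # The exit instruction may override the release, retaining the
        # resources for a child task generated during execution.
        if not retain_for_child:
            self.available += amount
```

Retaining resources across the parent's exit avoids a window in which another task could grab them before the child launches.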
-
Publication number: 20130074088
Abstract: One embodiment of the present invention sets forth a technique for dynamically scheduling and managing compute tasks with different execution priority levels. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes, such as round-robin, priority, and partitioned priority. Each group is maintained as a linked list of pointers to compute tasks that are encoded as queue metadata (QMD) stored in memory. A QMD encapsulates the state needed to execute a compute task. When a task is selected for execution by the scheduling circuitry, the QMD is removed from a group and transferred to a table of active compute tasks. Compute tasks are then selected from the active task table for execution by a streaming multiprocessor.
Type: Application
Filed: September 19, 2011
Publication date: March 21, 2013
Inventors: Timothy John PURCELL, Lacky V. Shah, Jerome F. Duluk, JR.
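The structure described — per-priority groups of QMD handles, from which a scheduling scheme moves one task at a time into an active table — can be sketched with a dictionary of queues. Only the strict-priority scheme is shown; the names and the assumption that a lower value means higher priority are illustrative.

```python
from collections import defaultdict, deque

class TaskScheduler:
    """Sketch: compute tasks grouped by priority level; selecting a task
    moves its metadata handle (QMD) from its group into the active-task
    table, from which a streaming multiprocessor would pick up work."""
    def __init__(self):
        self.groups = defaultdict(deque)  # priority -> queue of QMD handles
        self.active = []                  # table of active compute tasks

    def add(self, priority, qmd):
        self.groups[priority].append(qmd)

    def schedule_priority(self):
        # Strict-priority scheme: always drain the highest-priority
        # (lowest-valued) non-empty group first.
        for prio in sorted(self.groups):
            if self.groups[prio]:
                qmd = self.groups[prio].popleft()
                self.active.append(qmd)
                return qmd
        return None
```

Round-robin or partitioned-priority schemes, also named in the abstract, would differ only in how the next non-empty group is chosen.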
-
Publication number: 20110090220
Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type: Application
Filed: October 15, 2009
Publication date: April 21, 2011
Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman