Patents by Inventor Robert OHANNESSIAN

Robert OHANNESSIAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Cooperative thread array granularity context switch during trap handling

Patent number: 9448837

Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group, executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.

Type: Grant

Filed: April 15, 2013

Date of Patent: September 20, 2016

Assignee: NVIDIA Corporation

Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
Technique for counting values in a register

Patent number: 9361105

Abstract: A parallel counter accesses data generated by an application and stored within a register. The register includes different segments that include different portions of the application data. The parallel counter is configured to count the number of values within each segment that have a particular characteristic in a parallel fashion. The parallel counter may then return the individual segment counts to the application, or combine those segment counts and return a register count to the application. Advantageously, applications that rely on population count operations may be accelerated. Further, increasing the number of segments in a given register may reduce the time needed to count the values in that register, thereby providing a scalable solution to population counting. Additionally, the architecture of the parallel counter is sufficiently flexible to allow both register counting and segment counting, thereby combining two separate functionalities into just one hardware unit.

Type: Grant

Filed: September 20, 2013

Date of Patent: June 7, 2016

Assignee: NVIDIA Corporation

Inventors: Robert Ohannessian, Brian Fahs
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IMPLEMENTING SOFTWARE-BASED SCOREBOARDING

Publication number: 20150220341

Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.

Type: Application

Filed: February 3, 2014

Publication date: August 6, 2015

Applicant: NVIDIA Corporation

Inventors: Robert Ohannessian, JR., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
SYSTEM AND PROCESSOR FOR IMPLEMENTING INTERRUPTIBLE BATCHES OF INSTRUCTIONS

Publication number: 20150212819

Abstract: A system, method, and computer program product are provided for scheduling interruptible hatches of instructions for execution by one or more functional units of a processor. The method includes the steps of receiving a batch of instructions that includes a plurality of instructions and dispatching at least one instruction from the batch of instructions to one or more functional units for execution. The method further includes the step of receiving an interrupt request that causes an interrupt routine to be dispatched to the one or more functional units prior to all instructions in the batch of instructions being dispatched to the one or more functional units. When the interrupt request is received, the method further includes the step of storing batch-level resources in a memory to resume execution of the batch of instructions once the interrupt routine has finished execution.

Type: Application

Filed: January 30, 2014

Publication date: July 30, 2015

Applicant: NVIDIA Corporation

Inventors: Olivier Giroux, Robert Ohannessian, JR., Jack H. Choquette, Michael Alan Fetterman
SYSTEM AND PROCESSOR THAT INCLUDE AN IMPLEMENTATION OF DECOUPLED PIPELINES

Publication number: 20150193272

Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.

Type: Application

Filed: January 3, 2014

Publication date: July 9, 2015

Applicant: NVIDIA Corporation

Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, JR., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
Prioritized Memory Reads

Publication number: 20150193358

Abstract: A system includes a processing unit and a memory system coupled to the processing unit. The processing unit is configured to mark a memory access in the series of instructions as a priority memory access as a consequence of the memory access having a dependent instruction following less than a threshold distance after the memory access in the series of instructions. The processing unit is configured to send the marked memory access to the memory system.

Type: Application

Filed: January 6, 2014

Publication date: July 9, 2015

Applicant: NVIDIA Corporation

Inventors: James M. Van Dyke, Robert Ohannessian, JR.
DYNAMICALLY DETECTING UNIFORMITY AND ELIMINATING REDUNDANT COMPUTATIONS TO REDUCE POWER CONSUMPTION

Publication number: 20150100764

Abstract: One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streamlining multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output. Advantageously, by exploiting the uniformity of data to reduce the number of execution units that execute, the SM dramatically reduces the power consumption compared to conventional SMs.

Type: Application

Filed: October 8, 2013

Publication date: April 9, 2015

Applicant: NVIDIA CORPORATION

Inventors: Gary M. TAROLLI, John H. EDMONDSON, John Matthew BURGESS, Robert OHANNESSIAN
TECHNIQUE FOR COUNTING VALUES IN A REGISTER

Publication number: 20150089207

Abstract: A parallel counter accesses data generated by an application and stored within a register. The register includes different segments that include different portions of the application data. The parallel counter is configured to count the number of values within each segment that have a particular characteristic in a parallel fashion. The parallel counter may then return the individual segment counts to the application, or combine those segment counts and return a register count to the application. Advantageously, applications that rely on population count operations may be accelerated. Further, increasing the number of segments in a given register may reduce the time needed to count the values in that register, thereby providing a scalable solution to population counting. Additionally, the architecture of the parallel counter is sufficiently flexible to allow both register counting and segment counting, thereby combining two separate functionalities into just one hardware unit.

Type: Application

Filed: September 20, 2013

Publication date: March 26, 2015

Applicant: NVIDIA CORPORATION

Inventors: Robert OHANNESSIAN, Brian FAHS
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR MANAGING OUT-OF-ORDER EXECUTION OF PROGRAM INSTRUCTIONS

Publication number: 20150026442

Abstract: A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.

Type: Application

Filed: July 18, 2013

Publication date: January 22, 2015

Applicant: NVIDIA Corporation

Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, William Parsons Newhall, Jr.
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR WARMING A CACHE FOR A TASK LAUNCH

Publication number: 20140372703

Abstract: A system, method, and computer program product for warming a cache for a task launch is described. The method includes the steps of receiving a task data structure that defines a processing task, extracting information stored in a cache warming field of the task data structure, and, prior to executing the processing task, generating a cache warming instruction that is configured to load one or more entries of a cache storage with data fetched from a memory.

Type: Application

Filed: June 14, 2013

Publication date: December 18, 2014

Inventors: Scott Ricketts, Nicholas Wang, Shirish Gadre, Gentaro Hirota, Robert Ohannessian, JR.
APPROACH FOR CONTEXT SWITCHING OF LOCK-BIT PROTECTED MEMORY

Publication number: 20140189260

Abstract: A streaming multiprocessor in a parallel processing subsystem processes atomic operations for multiple threads in a multi-threaded architecture. The streaming multiprocessor receives a request from a thread in a thread group to acquire access to a memory location in a lock-protected shared memory, and determines whether a address lock in a plurality of address locks is asserted, where the address lock is associated the memory location. If the address lock is asserted, then the streaming multiprocessor refuses the request. Otherwise, the streaming multiprocessor asserts the address lock, asserts a thread group lock in a plurality of thread group locks, where the thread group lock is associated with the thread group, and grants the request. One advantage of the disclosed techniques is that acquired locks are released when a thread is preempted. As a result, a preempted thread that has previously acquired a lock does not retain the lock indefinitely.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nicholas WANG, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Matthew BROCKMEYER, Stewart Glenn CARLTON
COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING

Publication number: 20140189329

Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE, Robert OHANNESSIAN, Lacky V. SHAH, Nicholas WANG, Arthur DANSKIN
COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING

Publication number: 20140189711

Abstract: Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group, executes the context restore routine to restore from a memory a first portion of context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.

Type: Application

Filed: April 15, 2013

Publication date: July 3, 2014

Inventors: Gerald F. Luiz, Phillip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Merlin Danskin
PRIMITIVE RE-ORDERING BETWEEN WORLD-SPACE AND SCREEN-SPACE PIPELINES WITH BUFFER LIMITED PROCESSING

Publication number: 20140118381

Abstract: One embodiment of the present invention includes approaches for processing graphics primitives associated with cache tiles when rendering an image. A set of graphics primitives associated with a first render target configuration is received from a first portion of a graphics processing pipeline, and the set of graphics primitives is stored in a memory. A condition is detected indicating that the set of graphics primitives is ready for processing, and a cache tile is selected that intersects at least one graphics primitive in the set of graphics primitives. At least one graphics primitive in the set of graphics primitives that intersects the cache tile is transmitted to a second portion of the graphics processing pipeline for processing. One advantage of the disclosed embodiments is that graphics primitives and associated data are more likely to remain stored on-chip during cache tile rendering, thereby reducing power consumption and improving rendering performance.

Type: Application

Filed: September 10, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Robert OHANNESSIAN, Cynthia ALLISON, Dale L. KIRKLAND
Primitive re-ordering between world-space and screen-space pipelines with buffer limited processing

Patent number: 8704826

Abstract: One embodiment of the present invention includes approaches for processing graphics primitives associated with cache tiles when rendering an image. A set of graphics primitives associated with a first render target configuration is received from a first portion of a graphics processing pipeline, and the set of graphics primitives is stored in a memory. A condition is detected indicating that the set of graphics primitives is ready for processing, and a cache tile is selected that intersects at least one graphics primitive in the set of graphics primitives. At least one graphics primitive in the set of graphics primitives that intersects the cache tile is transmitted to a second portion of the graphics processing pipeline for processing. One advantage of the disclosed embodiments is that graphics primitives and associated data are more likely to remain stored on-chip during cache tile rendering, thereby reducing power consumption and improving rendering performance.

Type: Grant

Filed: September 10, 2013

Date of Patent: April 22, 2014

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, Robert Ohannessian, Cynthia Allison, Dale L. Kirkland
Alpha-to-coverage value determination using virtual samples

Patent number: 8669999

Abstract: One embodiment of the present invention sets forth a technique for converting alpha values into pixel coverage masks. Geometric coverage is sampled at a number of “real” sample positions within each pixel. Color and depth values are computed for each of these real samples. Fragment alpha values are used to determine an alpha coverage mask for the real samples and additional “virtual” samples, in which the number of bits set in the mask bits is proportional to the alpha value. An alpha-to-coverage mode uses the virtual samples to increase the number of transparency levels for each pixel compared with using only real samples. The alpha-to-coverage mode may be used in conjunction with virtual coverage anti-aliasing to provide higher-quality transparency for rendering anti-aliased images.

Type: Grant

Filed: October 14, 2010

Date of Patent: March 11, 2014

Assignee: NVIDIA Corporation

Inventors: Walter E. Donovan, Emmett M. Kilgariff, Steven E. Molnar, Christian Amsinck, Robert Ohannessian
Cull before vertex attribute fetch and vertex lighting

Patent number: 8564616

Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

Type: Grant

Filed: July 17, 2009

Date of Patent: October 22, 2013

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
Cull before vertex attribute fetch and vertex lighting

Patent number: 8542247

Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

Type: Grant

Filed: July 17, 2009

Date of Patent: September 24, 2013

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
INSTRUCTION LEVEL EXECUTION PREEMPTION

Publication number: 20130124838

Abstract: One embodiment of the present invention sets forth a technique instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If, the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.

Type: Application

Filed: November 10, 2011

Publication date: May 16, 2013

Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
ALPHA-TO-COVERAGE VALUE DETERMINATION USING VIRTUAL SAMPLES

Publication number: 20110090251

Abstract: One embodiment of the present invention sets forth a technique for converting alpha values into pixel coverage masks. Geometric coverage is sampled at a number of “real” sample positions within each pixel. Color and depth values are computed for each of these real samples. Fragment alpha values are used to determine an alpha coverage mask for the real samples and additional “virtual” samples, in which the number of bits set in the mask bits is proportional to the alpha value. An alpha-to-coverage mode uses the virtual samples to increase the number of transparency levels for each pixel compared with using only real samples. The alpha-to-coverage mode may be used in conjunction with virtual coverage anti-aliasing to provide higher-quality transparency for rendering anti-aliased images.

Type: Application

Filed: October 14, 2010

Publication date: April 21, 2011

Inventors: Walter E. Donovan, Emmett M. Kilgariff, Steven E. Molnar, Christian Amsinck, Robert Ohannessian

prev 1 2 3 next