Patents by Inventor Brian Fahs

Brian Fahs has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140019724
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Application
    Filed: September 12, 2013
    Publication date: January 16, 2014
    Applicant: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
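The barrier-with-aggregation idea described in the entry above is exposed in CUDA C++ through the __syncthreads_count, __syncthreads_and, and __syncthreads_or intrinsics, which combine a block-wide barrier with a reduction over a per-thread predicate. A minimal kernel sketch (the kernel and parameter names are illustrative, not taken from the patent):

```cuda
#include <cuda_runtime.h>

// Counts how many elements in each block exceed a threshold, using a barrier
// instruction that also performs a reduction over a per-thread predicate.
__global__ void countAboveThreshold(const float* data, int n, float threshold,
                                    int* blockCounts)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = (idx < n && data[idx] > threshold) ? 1 : 0;

    // Barrier synchronization and reduction in one instruction: every thread
    // waits here, and each receives the block-wide count of non-zero predicates.
    int count = __syncthreads_count(predicate);

    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = count;
}
```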
  • Patent number: 8539204
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: September 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
  • Publication number: 20120239909
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
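The vote instruction described in the entry above corresponds to the warp-vote intrinsics in CUDA C++ (__all_sync, __any_sync, __ballot_sync). A minimal kernel sketch (everything other than the intrinsics is illustrative):

```cuda
#include <cuda_runtime.h>

// Each warp votes on a per-thread predicate; the ballot packs one bit per
// participating lane into a 32-bit mask that every voting thread receives.
__global__ void warpVoteExample(const int* values, int n, unsigned* ballots)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = (idx < n) && (values[idx] != 0);

    unsigned mask   = __activemask();                  // lanes taking part in the vote
    unsigned ballot = __ballot_sync(mask, predicate);  // one bit per voting lane
    int any = __any_sync(mask, predicate);             // non-zero if any lane voted yes
    int all = __all_sync(mask, predicate);             // non-zero if every lane voted yes
    (void)any; (void)all;

    // One entry per warp, assuming ballots has one slot per launched warp.
    if (idx < n && (threadIdx.x % warpSize) == 0)
        ballots[idx / warpSize] = ballot;
}
```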
  • Patent number: 8214625
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: July 3, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Patent number: 8200947
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: June 12, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Publication number: 20120089792
    Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays whose width is proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
    Type: Application
    Filed: September 28, 2011
    Publication date: April 12, 2012
    Inventors: Brian Fahs, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon
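The "array of structures of arrays" layout referred to in the entry above groups structures into tiles whose width matches the SIMT/SIMD group, so the same field for adjacent lanes sits contiguously in memory. A hedged host-side sketch of the kind of address arithmetic a driver might perform (the layout parameters and function name are assumptions for illustration, not taken from the patent):

```cuda
#include <cstddef>
#include <cstdio>

// Array-of-structures-of-arrays (AoSoA) addressing sketch.
// Structures are grouped into tiles of `laneCount` elements; within a tile,
// each field is stored contiguously across the lanes.
size_t aosoaByteOffset(size_t elementIndex,
                       size_t fieldOffset,  // byte offset of the field within one structure
                       size_t structSize,   // size of one structure in bytes
                       size_t laneCount,    // SIMT/SIMD group width (threads/lanes per group)
                       size_t elementSize)  // size of the addressed field in bytes
{
    size_t tile = elementIndex / laneCount;  // which group of `laneCount` structures
    size_t lane = elementIndex % laneCount;  // position within the group
    return tile * structSize * laneCount     // skip preceding tiles
         + fieldOffset * laneCount           // skip preceding fields within the tile
         + lane * elementSize;               // select this lane's copy of the field
}

int main()
{
    // struct { float a; float b; } laid out AoSoA with 32 lanes:
    // field `b` (offset 4, size 4) of element 40 lands in tile 1, lane 8.
    printf("%zu\n", aosoaByteOffset(40, 4, 8, 32, 4)); // 1*8*32 + 4*32 + 8*4 = 416
}
```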
  • Publication number: 20110074802
    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces” stored in a user-specified data or pixel format and arranged in a graphics optimized layout are accessed by programs using surface instructions. A set of memory access instructions e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for loads and atomic surface instructions.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: John R. Nickolls, Brian Fahs, Lars Nyland, John Erik Lindholm, Richard Craig Johnson
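The surface load/store instructions described in the entry above are exposed in CUDA C++ as surface functions such as surf2Dread and surf2Dwrite, which take a surface object, a byte-addressed x coordinate, and a boundary mode controlling coordinate clamping. A minimal kernel sketch (the kernel name and the assumption of a float-format surface are illustrative):

```cuda
#include <cuda_runtime.h>

// Reads a texel from a 2D surface, scales it, and writes it back.
// Note: the x coordinate of surf2Dread/surf2Dwrite is expressed in bytes.
__global__ void scaleSurface(cudaSurfaceObject_t surf, int width, int height, float scale)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // cudaBoundaryModeClamp clamps out-of-range coordinates instead of trapping.
    float v = surf2Dread<float>(surf, x * (int)sizeof(float), y, cudaBoundaryModeClamp);
    surf2Dwrite(v * scale, surf, x * (int)sizeof(float), y, cudaBoundaryModeClamp);
}
```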
  • Publication number: 20110078690
    Abstract: One embodiment of the present invention sets forth a technique for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask, and the synchronization point are each pushed onto a stack.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 31, 2011
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
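The entry above describes reconvergence handling with an execution stack: a set-synchronization instruction pushes a synchronization token, the current active mask, and the synchronization point, and the saved state is popped when the divergent paths reach that point. A simplified host-side model of that bookkeeping (purely illustrative; the types and field names are assumptions, not the hardware format):

```cuda
#include <cstdint>
#include <cstdio>
#include <stack>

// Simplified model of a per-warp reconvergence stack entry: the token kind,
// the mask of threads active when it was pushed, and the instruction address
// (synchronization point) where the threads should reconverge.
enum class TokenKind { Sync, Diverge };

struct StackEntry {
    TokenKind kind;
    uint32_t  activeMask;  // one bit per thread in the warp
    uint32_t  syncPoint;   // target instruction address
};

int main()
{
    std::stack<StackEntry> reconvergenceStack;
    uint32_t activeMask = 0xFFFFFFFFu;  // all 32 threads active

    // A set-synchronization instruction pushes a sync token with the current
    // active mask and the address where the threads should reconverge.
    reconvergenceStack.push({TokenKind::Sync, activeMask, /*syncPoint=*/0x80});

    // A divergent branch: only half the threads take the current path.
    activeMask = 0x0000FFFFu;

    // When the active threads reach the synchronization point, the stack is
    // popped and the saved active mask is restored, reconverging the warp.
    StackEntry top = reconvergenceStack.top();
    reconvergenceStack.pop();
    activeMask = top.activeMask;

    printf("restored active mask: 0x%08X\n", activeMask);
}
```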
  • Publication number: 20110078417
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Brian FAHS, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
  • Publication number: 20110078381
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
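The per-instruction cache operations modifier described in the entry above shows up at the PTX level as cache operators on loads and stores (for example .ca to cache at all levels and .cg to cache at the global L2 level only). A minimal device-side sketch using inline PTX (the kernel itself is illustrative, not from the patent):

```cuda
#include <cuda_runtime.h>

// Loads a value with the cache-global (".cg") operator, bypassing L1, then
// stores it back with the same policy. The PTX cache operator is the
// instruction-level analogue of a cache operations modifier.
__global__ void copyBypassL1(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    float v;
    // ld.global.cg: cache in L2 only, skipping L1.
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(in + idx));
    // st.global.cg: write back through L2, again skipping L1.
    asm volatile("st.global.cg.f32 [%0], %1;" :: "l"(out + idx), "f"(v) : "memory");
}
```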
  • Patent number: 7360207
    Abstract: A method and a system for examining an inlined function using a performance analysis tool are described. An inlined function is identified in computer code. Upon identification of the inlined function, and for example in response to executing a breakpoint associated with the inlined function, a performance analysis tool is used to perform a desired task on the inlined function.
    Type: Grant
    Filed: December 13, 2001
    Date of Patent: April 15, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Brian Fahs, Robert Hundt, Vinodha Ramasamy, Tara Krishnaswamy
  • Patent number: 7103878
    Abstract: A method and system for analyzing a virtual function. In one embodiment, the present invention determines whether a virtual table exists for a virtual function, and determines a call type for the virtual function. In the present embodiment, provided the virtual table is located, the present invention replaces an existing address for the virtual function with a new address such that the new address points to instrumentation code. In this embodiment, upon a call to the virtual function, the present invention loads the new address from the virtual table such that execution is directed to the instrumentation code. The present embodiment continues execution and executes the instrumentation code such that control is delivered to the instrumentor.
    Type: Grant
    Filed: December 13, 2001
    Date of Patent: September 5, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Brian Fahs, Robert Hundt, Tara Krishnaswamy
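The technique in the entry above redirects a virtual call by replacing the function's address in the virtual table with the address of instrumentation code, which then hands control to the instrumentor. A simplified, portable model using an explicit function-pointer table in place of a compiler-generated vtable (purely illustrative; patching a real C++ vtable involves platform-specific details that the patent addresses):

```cuda
#include <cstdio>

// Stand-in for a compiler-generated virtual table: an array of function pointers.
using Handler = void (*)(void);

void originalVirtualFunction() { printf("original virtual function\n"); }

Handler vtable[1]     = { originalVirtualFunction };  // "virtual table" with one slot
Handler savedOriginal = nullptr;                      // where the instrumentor keeps the old address

// Instrumentation stub: runs profiling code, then forwards to the original.
void instrumentationStub()
{
    printf("instrumentor: call intercepted\n");  // e.g. bump a call counter here
    savedOriginal();                             // deliver control back to the original
}

// "Patches" the table: saves the existing address and installs the stub.
void instrumentSlot(int slot)
{
    savedOriginal = vtable[slot];
    vtable[slot]  = instrumentationStub;
}

int main()
{
    vtable[0]();        // normal dispatch through the table
    instrumentSlot(0);  // replace the entry with the instrumentation stub
    vtable[0]();        // dispatch now runs the stub first, then the original
}
```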
  • Publication number: 20050251629
    Abstract: A method and structure for equipping a cache with information to enable the processor to track and report whether a given speculative access causes prefetches and/or pollutions of the cache. Two types of events are tracked, each in one of two ways: prefetch operations are counted/tracked either globally or on a per-instruction-address basis, and cache pollutions are counted/tracked either globally or on a per-instruction-address basis.
    Type: Application
    Filed: May 5, 2004
    Publication date: November 10, 2005
    Inventors: Brian Fahs, Sreekumar Nair, Santosh Abraham
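The entry above tracks two event types, prefetches and pollutions caused by speculative accesses, either globally or per instruction address. A small host-side model of that bookkeeping (the structure and names are assumptions for illustration, not taken from the patent):

```cuda
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Counters for speculative-access effects on the cache, kept both globally
// and per instruction address (PC) of the speculative access.
struct SpeculationCounters {
    uint64_t globalPrefetches = 0;  // speculative accesses that later proved useful
    uint64_t globalPollutions = 0;  // speculative accesses that evicted useful data
    std::unordered_map<uint64_t, uint64_t> prefetchesByPC;
    std::unordered_map<uint64_t, uint64_t> pollutionsByPC;

    void recordPrefetch(uint64_t pc)  { ++globalPrefetches; ++prefetchesByPC[pc]; }
    void recordPollution(uint64_t pc) { ++globalPollutions; ++pollutionsByPC[pc]; }
};

int main()
{
    SpeculationCounters counters;
    counters.recordPrefetch(0x400123);   // speculative load at this PC warmed the cache
    counters.recordPollution(0x400456);  // speculative load at this PC displaced a useful line
    printf("prefetches=%llu pollutions=%llu\n",
           (unsigned long long)counters.globalPrefetches,
           (unsigned long long)counters.globalPollutions);
}
```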
  • Publication number: 20030115584
    Abstract: A method and system for analyzing a virtual function. In one embodiment, the present invention determines whether a virtual table exists for a virtual function, and determines a call type for the virtual function. In the present embodiment, provided the virtual table is located, the present invention replaces an existing address for the virtual function with a new address such that the new address points to instrumentation code. In this embodiment, upon a call to the virtual function, the present invention loads the new address from the virtual table such that execution is directed to the instrumentation code. The present embodiment continues execution and executes the instrumentation code such that control is delivered to the instrumentor.
    Type: Application
    Filed: December 13, 2001
    Publication date: June 19, 2003
    Inventors: Brian Fahs, Robert Hundt, Tara Krishnaswamy
  • Publication number: 20030115581
    Abstract: A method and system for examining an inlined function using a performance analysis tool. In one method embodiment, the present invention identifies an inlined function. Upon identification of the inlined function, the present embodiment uses a performance analysis tool to perform a desired task on the inlined function.
    Type: Application
    Filed: December 13, 2001
    Publication date: June 19, 2003
    Inventors: Brian Fahs, Robert Hundt, Vinodha Ramasamy, Tara Krishnaswamy