Patents by Inventor Brian Fahs

Brian Fahs has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140019724
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Application
    Filed: September 12, 2013
    Publication date: January 16, 2014
    Applicant: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
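The barrier-with-aggregation idea described in the entry above is exposed in CUDA C++ through the __syncthreads_count, __syncthreads_and, and __syncthreads_or intrinsics, which combine a block-wide barrier with a reduction over a per-thread predicate. A minimal kernel sketch (the kernel and parameter names are illustrative, not taken from the patent):

```cuda
#include <cuda_runtime.h>

// Counts how many elements in each block exceed a threshold, using a barrier
// instruction that also performs a reduction over a per-thread predicate.
__global__ void countAboveThreshold(const float* data, int n, float threshold,
                                    int* blockCounts)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = (idx < n && data[idx] > threshold) ? 1 : 0;

    // Barrier synchronization and reduction in one instruction: every thread
    // waits here, and each receives the block-wide count of non-zero predicates.
    int count = __syncthreads_count(predicate);

    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = count;
}
```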
  • Patent number: 8539204
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: September 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
  • Publication number: 20120239909
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
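The vote instruction described in the entry above corresponds to the warp-vote intrinsics in CUDA C++ (__all_sync, __any_sync, __ballot_sync). A minimal kernel sketch (everything other than the intrinsics is illustrative):

```cuda
#include <cuda_runtime.h>

// Each warp votes on a per-thread predicate; the ballot packs one bit per
// participating lane into a 32-bit mask that every voting thread receives.
__global__ void warpVoteExample(const int* values, int n, unsigned* ballots)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = (idx < n) && (values[idx] != 0);

    unsigned mask   = __activemask();                  // lanes taking part in the vote
    unsigned ballot = __ballot_sync(mask, predicate);  // one bit per voting lane
    int any = __any_sync(mask, predicate);             // non-zero if any lane voted yes
    int all = __all_sync(mask, predicate);             // non-zero if every lane voted yes
    (void)any; (void)all;

    // One entry per warp, assuming ballots has one slot per launched warp.
    if (idx < n && (threadIdx.x % warpSize) == 0)
        ballots[idx / warpSize] = ballot;
}
```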
  • Patent number: 8214625
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: July 3, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Patent number: 8200947
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: June 12, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Publication number: 20120089792
    Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays whose width is proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
    Type: Application
    Filed: September 28, 2011
    Publication date: April 12, 2012
    Inventors: Brian Fahs, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon
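The "array of structures of arrays" layout referred to in the entry above groups structures into tiles whose width matches the SIMT/SIMD group, so the same field for adjacent lanes sits contiguously in memory. A hedged host-side sketch of the kind of address arithmetic a driver might perform (the layout parameters and function name are assumptions for illustration, not taken from the patent):

```cuda
#include <cstddef>
#include <cstdio>

// Array-of-structures-of-arrays (AoSoA) addressing sketch.
// Structures are grouped into tiles of `laneCount` elements; within a tile,
// each field is stored contiguously across the lanes.
size_t aosoaByteOffset(size_t elementIndex,
                       size_t fieldOffset,  // byte offset of the field within one structure
                       size_t structSize,   // size of one structure in bytes
                       size_t laneCount,    // SIMT/SIMD group width (threads/lanes per group)
                       size_t elementSize)  // size of the addressed field in bytes
{
    size_t tile = elementIndex / laneCount;  // which group of `laneCount` structures
    size_t lane = elementIndex % laneCount;  // position within the group
    return tile * structSize * laneCount     // skip preceding tiles
         + fieldOffset * laneCount           // skip preceding fields within the tile
         + lane * elementSize;               // select this lane's copy of the field
}

int main()
{
    // struct { float a; float b; } laid out AoSoA with 32 lanes:
    // field `b` (offset 4, size 4) of element 40 lands in tile 1, lane 8.
    printf("%zu\n", aosoaByteOffset(40, 4, 8, 32, 4)); // 1*8*32 + 4*32 + 8*4 = 416
}
```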
  • Publication number: 20110074802
    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces” stored in a user-specified data or pixel format and arranged in a graphics optimized layout are accessed by programs using surface instructions. A set of memory access instructions e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for loads and atomic surface instructions.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: John R. Nickolls, Brian Fahs, Lars Nyland, John Erik Lindholm, Richard Craig Johnson
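The surface load/store instructions described in the entry above are exposed in CUDA C++ as surface functions such as surf2Dread and surf2Dwrite, which take a surface object, a byte-addressed x coordinate, and a boundary mode controlling coordinate clamping. A minimal kernel sketch (the kernel name and the assumption of a float-format surface are illustrative):

```cuda
#include <cuda_runtime.h>

// Reads a texel from a 2D surface, scales it, and writes it back.
// Note: the x coordinate of surf2Dread/surf2Dwrite is expressed in bytes.
__global__ void scaleSurface(cudaSurfaceObject_t surf, int width, int height, float scale)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // cudaBoundaryModeClamp clamps out-of-range coordinates instead of trapping.
    float v = surf2Dread<float>(surf, x * (int)sizeof(float), y, cudaBoundaryModeClamp);
    surf2Dwrite(v * scale, surf, x * (int)sizeof(float), y, cudaBoundaryModeClamp);
}
```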
  • Publication number: 20110078690
    Abstract: One embodiment of the present invention sets forth a technique for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask, and the synchronization point are each pushed onto a stack.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 31, 2011
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
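The entry above describes reconvergence handling with an execution stack: a set-synchronization instruction pushes a synchronization token, the current active mask, and the synchronization point, and the saved state is popped when the divergent paths reach that point. A simplified host-side model of that bookkeeping (purely illustrative; the types and field names are assumptions, not the hardware format):

```cuda
#include <cstdint>
#include <cstdio>
#include <stack>

// Simplified model of a per-warp reconvergence stack entry: the token kind,
// the mask of threads active when it was pushed, and the instruction address
// (synchronization point) where the threads should reconverge.
enum class TokenKind { Sync, Diverge };

struct StackEntry {
    TokenKind kind;
    uint32_t  activeMask;  // one bit per thread in the warp
    uint32_t  syncPoint;   // target instruction address
};

int main()
{
    std::stack<StackEntry> reconvergenceStack;
    uint32_t activeMask = 0xFFFFFFFFu;  // all 32 threads active

    // A set-synchronization instruction pushes a sync token with the current
    // active mask and the address where the threads should reconverge.
    reconvergenceStack.push({TokenKind::Sync, activeMask, /*syncPoint=*/0x80});

    // A divergent branch: only half the threads take the current path.
    activeMask = 0x0000FFFFu;

    // When the active threads reach the synchronization point, the stack is
    // popped and the saved active mask is restored, reconverging the warp.
    StackEntry top = reconvergenceStack.top();
    reconvergenceStack.pop();
    activeMask = top.activeMask;

    printf("restored active mask: 0x%08X\n", activeMask);
}
```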
  • Publication number: 20110078417
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Brian FAHS, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
  • Publication number: 20110078381
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
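The per-instruction cache operations modifier described in the entry above shows up at the PTX level as cache operators on loads and stores (for example .ca to cache at all levels and .cg to cache at the global L2 level only). A minimal device-side sketch using inline PTX (the kernel itself is illustrative, not from the patent):

```cuda
#include <cuda_runtime.h>

// Loads a value with the cache-global (".cg") operator, bypassing L1, then
// stores it back with the same policy. The PTX cache operator is the
// instruction-level analogue of a cache operations modifier.
__global__ void copyBypassL1(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    float v;
    // ld.global.cg: cache in L2 only, skipping L1.
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(in + idx));
    // st.global.cg: write back through L2, again skipping L1.
    asm volatile("st.global.cg.f32 [%0], %1;" :: "l"(out + idx), "f"(v) : "memory");
}
```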
  • Patent number: 7360207
    Abstract: A method and a system for examining an inlined function using a performance analysis tool are described. An inlined function is identified in computer code. Upon identification of the inlined function, and for example in response to executing a breakpoint associated with the inlined function, a performance analysis tool is used to perform a desired task on the inlined function.
    Type: Grant
    Filed: December 13, 2001
    Date of Patent: April 15, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Brian Fahs, Robert Hundt, Vinodha Ramasamy, Tara Krishnaswamy
  • Patent number: 7103878
    Abstract: A method and system for analyzing a virtual function. In one embodiment, the present invention determines whether a virtual table exists for a virtual function, and determines a call type for the virtual function. In the present embodiment, provided the virtual table is located, the present invention replaces an existing address for the virtual function with a new address such that the new address points to instrumentation code. In this embodiment, upon a call to the virtual function, the present invention loads the new address from the virtual table such that execution is directed to the instrumentation code. The present embodiment continues execution and executes the instrumentation code such that control is delivered to the instrumentor.
    Type: Grant
    Filed: December 13, 2001
    Date of Patent: September 5, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Brian Fahs, Robert Hundt, Tara Krishnaswamy
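The technique in the entry above redirects a virtual call by replacing the function's address in the virtual table with the address of instrumentation code, which then hands control to the instrumentor. A simplified, portable model using an explicit function-pointer table in place of a compiler-generated vtable (purely illustrative; patching a real C++ vtable involves platform-specific details that the patent addresses):

```cuda
#include <cstdio>

// Stand-in for a compiler-generated virtual table: an array of function pointers.
using Handler = void (*)(void);

void originalVirtualFunction() { printf("original virtual function\n"); }

Handler vtable[1]     = { originalVirtualFunction };  // "virtual table" with one slot
Handler savedOriginal = nullptr;                      // where the instrumentor keeps the old address

// Instrumentation stub: runs profiling code, then forwards to the original.
void instrumentationStub()
{
    printf("instrumentor: call intercepted\n");  // e.g. bump a call counter here
    savedOriginal();                             // deliver control back to the original
}

// "Patches" the table: saves the existing address and installs the stub.
void instrumentSlot(int slot)
{
    savedOriginal = vtable[slot];
    vtable[slot]  = instrumentationStub;
}

int main()
{
    vtable[0]();        // normal dispatch through the table
    instrumentSlot(0);  // replace the entry with the instrumentation stub
    vtable[0]();        // dispatch now runs the stub first, then the original
}
```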
  • Publication number: 20050251629
    Abstract: A method and structure for equipping a cache with information to enable the processor to track and report whether a given speculative access causes prefetches and/or pollutions of the cache. Two types of events are tracked, each in one of two ways: prefetch operations are counted/tracked either globally or on a per-instruction-address basis, and cache pollutions are counted/tracked either globally or on a per-instruction-address basis.
    Type: Application
    Filed: May 5, 2004
    Publication date: November 10, 2005
    Inventors: Brian Fahs, Sreekumar Nair, Santosh Abraham
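The entry above tracks two event types, prefetches and pollutions caused by speculative accesses, either globally or per instruction address. A small host-side model of that bookkeeping (the structure and names are assumptions for illustration, not taken from the patent):

```cuda
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Counters for speculative-access effects on the cache, kept both globally
// and per instruction address (PC) of the speculative access.
struct SpeculationCounters {
    uint64_t globalPrefetches = 0;  // speculative accesses that later proved useful
    uint64_t globalPollutions = 0;  // speculative accesses that evicted useful data
    std::unordered_map<uint64_t, uint64_t> prefetchesByPC;
    std::unordered_map<uint64_t, uint64_t> pollutionsByPC;

    void recordPrefetch(uint64_t pc)  { ++globalPrefetches; ++prefetchesByPC[pc]; }
    void recordPollution(uint64_t pc) { ++globalPollutions; ++pollutionsByPC[pc]; }
};

int main()
{
    SpeculationCounters counters;
    counters.recordPrefetch(0x400123);   // speculative load at this PC warmed the cache
    counters.recordPollution(0x400456);  // speculative load at this PC displaced a useful line
    printf("prefetches=%llu pollutions=%llu\n",
           (unsigned long long)counters.globalPrefetches,
           (unsigned long long)counters.globalPollutions);
}
```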
  • Publication number: 20030115584
    Abstract: A method and system for analyzing a virtual function. In one embodiment, the present invention determines whether a virtual table exists for a virtual function, and determines a call type for the virtual function. In the present embodiment, provided the virtual table is located, the present invention replaces an existing address for the virtual function with a new address such that the new address points to instrumentation code. In this embodiment, upon a call to the virtual function, the present invention loads the new address from the virtual table such that execution is directed to the instrumentation code. The present embodiment continues execution and executes the instrumentation code such that control is delivered to the instrumentor.
    Type: Application
    Filed: December 13, 2001
    Publication date: June 19, 2003
    Inventors: Brian Fahs, Robert Hundt, Tara Krishnaswamy
  • Publication number: 20030115581
    Abstract: A method and system for examining an inlined function using a performance analysis tool. In one method embodiment, the present invention identifies an inlined function. Upon identification of the inlined function, the present embodiment uses a performance analysis tool to perform a desired task on the inlined function.
    Type: Application
    Filed: December 13, 2001
    Publication date: June 19, 2003
    Inventors: Brian Fahs, Robert Hundt, Vinodha Ramasamy, Tara Krishnaswamy