Patents by Inventor Brian Pharris

Brian Pharris has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954518
    Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: April 9, 2024
    Assignee: Nvidia Corporation
    Inventors: Jonathon Evans, Lacky Shah, Phil Johnson, Jonah Alben, Brian Pharris, Greg Palmer, Brian Fahs
  • Publication number: 20230289215
    Abstract: A new level(s) of hierarchy, Cooperative Group Arrays (CGAs), and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289189
    Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as that backed by an L2 cache.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289211
    Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Gentaro HIROTA, Tanmoy MANDAL, Jeff TUCKEY, Kevin STEPHANO, Chen MEI, Shayani DEB, Naman GOVIL, Rajballav DASH, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS
  • Patent number: 11307903
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: April 19, 2022
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 11182207
    Abstract: Techniques are disclosed for reducing the latency between the completion of a producer task and the launch of a consumer task dependent on the producer task. Such latency exists when the information needed to launch the consumer task is unavailable when the producer task completes. Thus, various techniques are disclosed, where a task management unit initiates the retrieval of the information needed to launch the consumer task from memory in parallel with the producer task being launched. Because the retrieval of such information is initiated in parallel with the launch of the producer task, the information is often available when the producer task completes, thus allowing for the consumer task to be launched without delay. The disclosed techniques, therefore, enable the latency between completing the producer task and launching the consumer task to be reduced.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: November 23, 2021
    Assignee: NVIDIA CORPORATION
    Inventors: Gentaro Hirota, Brian Pharris, Jeff Tuckey, Robert Overman, Stephen Jones
  • Publication number: 20210191754
    Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
    Type: Application
    Filed: December 20, 2019
    Publication date: June 24, 2021
    Inventors: Jonathon Evans, Lacky Shah, Phil Johnson, Jonah Alben, Brian Pharris, Greg Palmer, Brian Fahs
  • Publication number: 20200401444
    Abstract: Techniques are disclosed for reducing the latency between the completion of a producer task and the launch of a consumer task dependent on the producer task. Such latency exists when the information needed to launch the consumer task is unavailable when the producer task completes. Thus, various techniques are disclosed, where a task management unit initiates the retrieval of the information needed to launch the consumer task from memory in parallel with the producer task being launched. Because the retrieval of such information is initiated in parallel with the launch of the producer task, the information is often available when the producer task completes, thus allowing for the consumer task to be launched without delay. The disclosed techniques, therefore, enable the latency between completing the producer task and launching the consumer task to be reduced.
    Type: Application
    Filed: June 24, 2019
    Publication date: December 24, 2020
    Inventors: Gentaro HIROTA, Brian PHARRIS, Jeff TUCKEY, Robert OVERMAN, Stephen JONES
  • Patent number: 10817338
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: October 27, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Publication number: 20190235928
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Publication number: 20190235924
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 8984183
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system. The technique allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: March 17, 2015
    Assignee: Nvidia Corporation
    Inventors: Timothy John Purcell, Lacky V. Shah, Jerome F. Duluk, Jr., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
  • Publication number: 20130160021
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system. The technique allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Timothy John Purcell, Lacky V. Shah, Jerome F. Duluk, Jr., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
  • Publication number: 20100257329
    Abstract: An application programming interface is disclosed for loading and storing multidimensional arrays of data between a data parallel processing unit and an external memory. Physical addresses reference the external memory and define two-dimensional arrays of data storage locations corresponding to data records. The data parallel processing unit has multiple processing lanes to parallel process data records residing in respective register files. The interface comprises an X-dimension function call parameter to define an X-dimension in the memory array corresponding to a record for one lane and a Y-dimension function call parameter to define a Y-dimension in the memory array corresponding to the record for one lane. The X-dimension and Y-dimension function call parameters cooperate to generate memory accesses corresponding to the records.
    Type: Application
    Filed: August 6, 2009
    Publication date: October 7, 2010
    Inventors: Brucek Khailany, Nuwan Jayasena, Brian Pharris, Timothy Southgate
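
Illustrative note on publication 20100257329: the abstract describes X-dimension and Y-dimension parameters that together determine which memory locations make up the record handled by one processing lane. The sketch below is only one possible reading of that idea and is not taken from the application. The function name, the row-major tiling of records, and the parameters base, row_pitch and records_per_row are all assumptions introduced for illustration.

```python
def lane_record_addresses(base, row_pitch, x_dim, y_dim, lane, records_per_row):
    """Return the addresses of the x_dim-by-y_dim record assigned to `lane`.

    Hypothetical model (not from the patent application): records are tiled
    row-major across a 2D memory array whose rows are `row_pitch` elements
    apart, with `records_per_row` records side by side in each band of rows.
    """
    # Which tile (record) this lane owns, assuming row-major lane numbering.
    tile_col = lane % records_per_row
    tile_row = lane // records_per_row

    # Address of the top-left element of the lane's record.
    origin = base + tile_row * y_dim * row_pitch + tile_col * x_dim

    # The X and Y parameters cooperate: Y selects the row within the record,
    # X selects the element within that row.
    return [origin + y * row_pitch + x
            for y in range(y_dim)
            for x in range(x_dim)]


if __name__ == "__main__":
    # An array with 8 elements per row, 2x2 records, 4 records per band.
    for lane in range(4):
        print(lane, lane_record_addresses(0, 8, 2, 2, lane, 4))
```

Under these assumptions, each lane's record is a small rectangle of the array, so neighbouring lanes touch adjacent columns while a single lane's accesses step by the row pitch.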