Patents by Inventor Brian Fahs

Brian Fahs has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9348762
    Abstract: A tag unit configured to manage a cache unit includes a coalescer that implements a set hashing function. The set hashing function maps a virtual address to a particular content-addressable memory unit (CAM). The coalescer implements the set hashing function by splitting the virtual address into upper, middle, and lower portions. The upper portion is further divided into even-indexed bits and odd-indexed bits. The even-indexed bits are reduced to a single bit using a XOR tree, and the odd-indexed are reduced in like fashion. Those single bits are combined with the middle portion of the virtual address to provide a CAM number that identifies a particular CAM. The identified CAM is queried to determine the presence of a tag portion of the virtual address, indicating a cache hit or cache miss.
    Type: Grant
    Filed: December 19, 2012
    Date of Patent: May 24, 2016
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Eric T. Anderson, Nick Barrow-Williams, Shirish Gadre, Joel James McCormack, Bryon S. Nordquist, Nirmal Raj Saxena, Lacky V. Shah
  • Publication number: 20150089207
    Abstract: A parallel counter accesses data generated by an application and stored within a register. The register includes different segments that include different portions of the application data. The parallel counter is configured to count the number of values within each segment that have a particular characteristic in a parallel fashion. The parallel counter may then return the individual segment counts to the application, or combine those segment counts and return a register count to the application. Advantageously, applications that rely on population count operations may be accelerated. Further, increasing the number of segments in a given register may reduce the time needed to count the values in that register, thereby providing a scalable solution to population counting. Additionally, the architecture of the parallel counter is sufficiently flexible to allow both register counting and segment counting, thereby combining two separate functionalities into just one hardware unit.
    Type: Application
    Filed: September 20, 2013
    Publication date: March 26, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Robert OHANNESSIAN, Brian FAHS
  • Patent number: 8850436
    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: September 30, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
  • Publication number: 20140281264
    Abstract: Embodiments of the approaches disclosed herein include a subsystem that includes an access tracking mechanism configured to monitor access operations directed to a first memory and a second memory. The access tracking mechanism detects an access operation generated by a processor for accessing a first memory page residing on the second memory. The access tracking mechanism further determines that the first memory page is included in a first subset of memory pages residing on the second memory. The access tracking mechanism further locates, within a reference vector, a reference bit that corresponds to the first memory page, and sets the reference bit. One advantage of the present invention is that memory pages in a hybrid system migrate as needed to increase overall memory performance.
    Type: Application
    Filed: December 18, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Brian FAHS
  • Publication number: 20140281356
    Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
    Type: Application
    Filed: August 27, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Cameron BUSCHARDT, Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, James Leroy DEMING, Brian FAHS
  • Publication number: 20140267334
    Abstract: One embodiment of the present invention includes techniques for a first processing unit to perform an atomic operation on a memory page shared with a second processing unit. The memory page is associated with a page table entry corresponding to the first processing unit. Before executing the atomic operation, an MMU included in the first processing unit evaluates an atomic permission bit that is included in the page table entry. If the MMU determines that the atomic permission bit is inactive, then the two processing units coordinate to change the permission status of the memory page. As part of the status change, the atomic permission bit in the page table entry is activated. Subsequently, the first processing unit performs the atomic operation uninterrupted by the second processing unit. Advantageously, coordinating the processing unit via the atomic permission bit ensures the proper and efficient execution of the atomic operation.
    Type: Application
    Filed: August 27, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, James Leroy DEMING, Cameron BUSCHARDT, Brian FAHS
  • Publication number: 20140281365
    Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes the selection a memory page in a memory page group that has fallen into disuse from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.
    Type: Application
    Filed: December 12, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: John MASHEY, Cameron BUSCHARDT, James Leroy DEMING, Jerome F. DULUK, JR., Brian FAHS
  • Publication number: 20140281299
    Abstract: Techniques are disclosed for transitioning a memory page between memories in a virtual memory subsystem. A unified virtual memory (UVM) driver detects a page fault in response to a memory access request associated with a first memory page, where a local page table does not include an entry corresponding to a virtual memory address included in the memory access request. The UVM driver, in response to the page fault, executes a page fault sequence. The page fault sequence includes modifying the ownership state associated with the first memory page to be central-processing-unit-shared. The page fault sequence further includes scheduling the first memory page for migration from a system memory associated with a central processing unit (CPU) to a local memory associated with a parallel processing unit (PPU). One advantage of the disclosed approach is that the PPU accesses memory pages with greater efficiency.
    Type: Application
    Filed: December 18, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Lucien DUNNING, Brian FAHS, Mark HAIRGROVE, John MASHEY
  • Publication number: 20140281324
    Abstract: One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.
    Type: Application
    Filed: December 19, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Lucien DUNNING, Brian FAHS, Mark HAIRGROVE, Chenghuan JIA, John MASHEY, James M. VAN DYKE
  • Publication number: 20140281297
    Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPU, and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.
    Type: Application
    Filed: December 19, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, Chenghuan JIA, Cameron BUSCHARDT, Lucien DUNNING, Brian FAHS
  • Publication number: 20140281263
    Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
    Type: Application
    Filed: December 17, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: James Leroy DEMING, Jerome F. DULUK, Jr., John MASHEY, Mark HAIRGROVE, Lucien DUNNING, Jonathon Stuart Ramsay EVANS, Samuel H. DUNCAN, Cameron BUSCHARDT, Brian FAHS
  • Publication number: 20140281110
    Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.
    Type: Application
    Filed: December 9, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, Jr., Cameron BUSCHARDT, James Leroy DEMING, Brian FAHS, Mark HAIRGROVE, John MASHEY
  • Publication number: 20140281364
    Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
    Type: Application
    Filed: August 27, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Cameron BUSCHARDT, Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, James Leroy DEMING, Brian FAHS
  • Publication number: 20140173193
    Abstract: A tag unit configured to manage a cache unit includes a coalescer that implements a set hashing function. The set hashing function maps a virtual address to a particular content-addressable memory unit (CAM). The coalescer implements the set hashing function by splitting the virtual address into upper, middle, and lower portions. The upper portion is further divided into even-indexed bits and odd-indexed bits. The even-indexed bits are reduced to a single bit using a XOR tree, and the odd-indexed are reduced in like fashion. Those single bits are combined with the middle portion of the virtual address to provide a CAM number that identifies a particular CAM. The identified CAM is queried to determine the presence of a tag portion of the virtual address, indicating a cache hit or cache miss.
    Type: Application
    Filed: December 19, 2012
    Publication date: June 19, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Brian Fahs, Eric T. ANDERSON, Nick Barrow-Williams, Shirish GADRE, Joel James MCCORMACK, Bryon S. NORDQUIST, Nirmal Raj Saxena, Lacky V. Shah
  • Publication number: 20140168245
    Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.
    Type: Application
    Filed: December 19, 2012
    Publication date: June 19, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Brian Fahs, Eric T. Anderson, Nick Barrow-Williams, Shirish Gadre, Joel James McCormack, Bryon S. Nordquist, Nirmal Raj Saxena, Lacky V. Shah
  • Publication number: 20140173258
    Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.
    Type: Application
    Filed: December 19, 2012
    Publication date: June 19, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Brian FAHS, Eric T. ANDERSON, Nick BARROW-WILLIAMS, Shirish GADRE, Joel James MCCORMACK, Bryon S. NORDQUIST, Nirmal Raj SAXENA, Lacky V. SHAH
  • Patent number: 8751771
    Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Henry Packard Moreton, Brett W. Coon, Kathleen Elliott Nickolls
  • Publication number: 20140123146
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140122829
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140123145
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, Jr., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE