Patents by Inventor Jerome F. Duluk, Jr.
Jerome F. Duluk, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9754561Abstract: One embodiment of the present invention includes a memory management unit (MMU) that is configured to manage sparse mappings. The MMU processes requests to translate virtual addresses to physical addresses based on page table entries (PTEs) that indicate a sparse status. If the MMU determines that the PTE does not include a mapping from a virtual address to a physical address, then the MMU responds to the request based on the sparse status. If the sparse status is active, then the MMU determines the physical address based on whether the type of the request is a write operation and, subsequently, generates an acknowledgement of the request. By contrast, if the sparse status is not active, then the MMU generates a page fault. Advantageously, the disclosed embodiments enable the computer system to manage sparse mappings without incurring the performance degradation associated with both page faults and conventional software-based sparse mapping management.Type: GrantFiled: October 4, 2013Date of Patent: September 5, 2017Assignee: NVIDIA CORPORATIONInventors: Jonathan Dunaisky, Henry Packard Moreton, Jeffrey A. Bolz, Yury Y. Uralsky, James Leroy Deming, Rui M. Bastos, Patrick R. Brown, Amanpreet Grewal, Christian Amsinck, Poornachandra Rao, Jerome F. Duluk, Jr., Andrew J. Tao
-
Publication number: 20170249254Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.Type: ApplicationFiled: October 16, 2013Publication date: August 31, 2017Applicant: NVIDIA CorporationInventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
-
Patent number: 9734545Abstract: One embodiment of the present invention sets forth a technique for executing a software method within a graphics processing unit (GPU) that minimizes the number of clock cycles during which the graphics engine is idled. The function of the software method is performed by a firmware method that is executed by a processor within the GPU. The firmware method is executed to access and optionally update the state stored in the GPU. Unlike execution of a conventional software method, execution of the firmware method does not require an exchange of information between a CPU and the GPU. Therefore, the CPU is not interrupted and throughput of the CPU is not reduced.Type: GrantFiled: October 7, 2010Date of Patent: August 15, 2017Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., John Christopher Cook, Fred Gruner, Gregory Scott Palmer
-
Patent number: 9715413Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.Type: GrantFiled: January 18, 2012Date of Patent: July 25, 2017Assignee: NVIDIA CorporationInventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Patent number: 9710306Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.Type: GrantFiled: April 9, 2012Date of Patent: July 18, 2017Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
-
Publication number: 20170199689Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes the selection a memory page in a memory page group that has fallen into disuse from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.Type: ApplicationFiled: May 31, 2016Publication date: July 13, 2017Inventors: John MASHEY, Cameron BUSCHARDT, James Leroy DEMING, Jerome F. DULUK, JR., Brian FAHS
-
Publication number: 20170185526Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.Type: ApplicationFiled: October 16, 2013Publication date: June 29, 2017Applicant: NVIDIA CORPORATIONInventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
-
Patent number: 9679350Abstract: One embodiment sets forth a method for associating each stencil value included in a stencil buffer with multiple fragments. Components within a graphics processing pipeline use a set of stencil masks to partition the bits of each stencil value. Each stencil mask selects a different subset of bits, and each fragment is strategically associated with both a stencil value and a stencil mask. Before performing stencil actions associated with a fragment, the raster operations unit performs stencil mask operations on the operands. No fragments are associated with both the same stencil mask and the same stencil value. Consequently, no fragments are associated with the same stencil bits included in the stencil buffer. Advantageously, by reducing the number of stencil bits associated with each fragment, certain classes of software applications may reduce the wasted memory associated with stencil buffers in which each stencil value is associated with a single fragment.Type: GrantFiled: August 3, 2015Date of Patent: June 13, 2017Assignee: NVDIA CorporationInventors: Eric B. Lum, Jerome F. Duluk, Jr.
-
Publication number: 20170161206Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.Type: ApplicationFiled: February 20, 2017Publication date: June 8, 2017Inventors: James Leroy DEMING, Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, Lucien DUNNING, Jonathon Stuart Ramsey EVANS, Samuel H. DUNCAN, Cameron BUSCHARDT, Brian FAHS
-
Patent number: 9639474Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPU, and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.Type: GrantFiled: December 19, 2013Date of Patent: May 2, 2017Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
-
Patent number: 9639367Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.Type: GrantFiled: October 4, 2013Date of Patent: May 2, 2017Assignee: NVIDIA CorporationInventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
-
Patent number: 9612839Abstract: A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprising a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode.Type: GrantFiled: October 23, 2013Date of Patent: April 4, 2017Assignee: NVIDIA CorporationInventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
-
Patent number: 9588903Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.Type: GrantFiled: August 27, 2013Date of Patent: March 7, 2017Assignee: NVIDIA CorporationInventors: Cameron Buschardt, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, James Leroy Deming, Brian Fahs
-
Patent number: 9589310Abstract: One embodiment of the present invention sets forth a technique for splitting a set of vertices into a plurality of batches for processing. The method includes receiving one or more primitives each containing an associated set of vertices. For each of the one or more primitives, one or more vertices are gathered from the set of vertices, the vertices are arranged into one or more batches, the batch is routed to a processing pipeline line to process each batch as a separate primitive, and the one or more batches are processed to produce results identical to those of processing the entire primitive as a single entity.Type: GrantFiled: October 5, 2010Date of Patent: March 7, 2017Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Thomas Roell, Patrick R. Brown
-
Patent number: 9575892Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.Type: GrantFiled: December 17, 2013Date of Patent: February 21, 2017Assignee: NVIDIA CorporationInventors: James Leroy Deming, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Lucien Dunning, Jonathon Stuart Ramsey Evans, Samuel H. Duncan, Cameron Buschardt, Brian Fahs
-
Publication number: 20160357482Abstract: One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.Type: ApplicationFiled: August 22, 2016Publication date: December 8, 2016Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Lucien DUNNING, Brian FAHS, Mark HAIRGROVE, Chenghuan JIA, John MASHEY, James M. VAN DYKE
-
Patent number: 9513975Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.Type: GrantFiled: May 2, 2012Date of Patent: December 6, 2016Assignee: NVIDIA CorporationInventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
-
Patent number: 9507638Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.Type: GrantFiled: November 8, 2011Date of Patent: November 29, 2016Assignee: NVIDIA CorporationInventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, Jr., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah
-
Patent number: 9495721Abstract: Techniques for dispatching pixel information in a graphics processing pipeline. A fragment processing unit generates a pixel that includes multiple samples based on a first portion of a graphics primitive received by a first thread. The fragment processing unit calculates a first value for the first pixel, where the first value is calculated only once for the pixel. The fragment processing unit calculates a first set of values for the samples, where each value in the first set of values corresponds to a different sample and is calculated only once for the corresponding sample. The fragment processing unit combines the first value with each value in the first set of values to create a second set of values. The fragment processing unit creates one or more dispatch messages to store the second set of values in a set of output registers.Type: GrantFiled: December 21, 2012Date of Patent: November 15, 2016Assignee: NVIDIA CorporationInventors: Jerome F. Duluk, Jr., Rouslan Dimitrov, Eric Lum, Rui Bastos
-
Patent number: 9442759Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.Type: GrantFiled: December 9, 2011Date of Patent: September 13, 2016Assignee: NVIDIA CorporationInventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans