Patents Assigned to Nvidia
-
Patent number: 9798698
Abstract: A system and method for preconditioning or smoothing (e.g., multi-color DILU preconditioning) for iterative solving of a system of equations. The method includes accessing a matrix comprising a plurality of coefficients of a system of equations and accessing coloring information corresponding to the matrix. The method further includes determining a diagonal matrix based on the matrix and the coloring information corresponding to the matrix. The determining of the diagonal matrix may be determined in parallel on a per color basis. The method may further include determining an updated solution to the system of equations where the updated solution is determined in parallel on a per color basis using the diagonal matrix.
Type: Grant
Filed: August 13, 2012
Date of Patent: October 24, 2017
Assignee: Nvidia Corporation
Inventors: Patrice Castonguay, Robert Strzodka
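The abstract above describes computing a diagonal matrix color by color and then updating the solution color by color. The following is a minimal sketch of a textbook multi-color DILU sweep, not the patented implementation. It assumes a dense list-of-lists matrix, a valid coloring (no two coupled unknowns share a color), and the hypothetical function names `dilu_diagonal` and `dilu_sweep`. The inner loops over rows of one color are independent, which is where a parallel implementation would distribute work.

```python
def dilu_diagonal(A, colors):
    """Diagonal E of a DILU factorization, built one color at a time.

    E[i] = A[i][i] - sum over j in earlier colors of A[i][j] * A[j][i] / E[j].
    Rows of the same color do not depend on each other (assumed valid coloring).
    """
    n = len(A)
    E = [0.0] * n
    for c in sorted(set(colors)):
        for i in range(n):  # rows of color c: independent, parallelizable
            if colors[i] == c:
                E[i] = A[i][i] - sum(
                    A[i][j] * A[j][i] / E[j]
                    for j in range(n) if colors[j] < c
                )
    return E


def dilu_sweep(A, colors, E, x, b):
    """One smoothing step x <- x + M^{-1} (b - A x), with M = (E + L) E^{-1} (E + U).

    L and U hold the couplings to earlier and later colors respectively.
    """
    n = len(A)
    palette = sorted(set(colors))
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

    # Forward pass over colors: solve (E + L) y = r.
    y = [0.0] * n
    for c in palette:
        for i in range(n):
            if colors[i] == c:
                lower = sum(A[i][j] * y[j] for j in range(n) if colors[j] < c)
                y[i] = (r[i] - lower) / E[i]

    # Backward pass over colors: solve (I + E^{-1} U) z = y.
    z = [0.0] * n
    for c in reversed(palette):
        for i in range(n):
            if colors[i] == c:
                upper = sum(A[i][j] * z[j] for j in range(n) if colors[j] > c)
                z[i] = y[i] - upper / E[i]

    return [x[i] + z[i] for i in range(n)]
```

For a tridiagonal matrix, a two-color (red-black) coloring such as `colors = [i % 2 for i in range(n)]` satisfies the coloring assumption.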
-
Patent number: 9798544
Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
Type: Grant
Filed: December 10, 2012
Date of Patent: October 24, 2017
Assignee: NVIDIA CORPORATION
Inventors: Olivier Giroux, Jack Hilaire Choquette, Xiaogang Qiu, Robert J. Stoll
-
Patent number: 9798569
Abstract: A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function and cause the Java virtual machine to execute the modified lambda function.
Type: Grant
Filed: February 15, 2016
Date of Patent: October 24, 2017
Assignee: Nvidia Corporation
Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
-
Patent number: 9798548
Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
Type: Grant
Filed: December 21, 2011
Date of Patent: October 24, 2017
Assignee: NVIDIA Corporation
Inventors: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux
-
Patent number: 9798543
Abstract: One embodiment of the present invention sets forth a technique for allocating register file entries included in a register file to a thread group. A request to allocate a number of register file entries to the thread group is received. A required number of mapping table entries included in a register file mapping table (RFMT) is determined based on the request, where each mapping table entry included in the RFMT is associated with a different plurality of register file entries included in the register file. The RFMT is parsed to locate an available mapping table entry in the RFMT for each of the required mapping table entries. For each available mapping table entry, a register file pointer is associated with an address that corresponds to a first register file entry in the plurality of register file entries associated with the available mapping table entry.
Type: Grant
Filed: September 3, 2010
Date of Patent: October 24, 2017
Assignee: NVIDIA Corporation
Inventors: Michael Fiyak, Ming Y. Siu
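The allocation steps in the abstract above can be illustrated in software. This is a rough sketch under stated assumptions, not the hardware design: the table is modeled as a Python list whose slots hold either `None` (free) or an owning thread-group id, each slot is assumed to cover a fixed block of `entries_per_block` consecutive register file entries, and the function name `allocate_registers` is hypothetical.

```python
import math


def allocate_registers(rfmt, group_id, requested_entries, entries_per_block=8):
    """Reserve mapping-table slots for a thread group.

    Returns the base register-file address of each reserved block,
    or None (leaving the table unchanged) if too few slots are free.
    """
    # Number of mapping-table entries needed to cover the request.
    required = math.ceil(requested_entries / entries_per_block)

    # Scan the table for free slots; the blocks need not be contiguous.
    free_slots = [i for i, owner in enumerate(rfmt) if owner is None][:required]
    if len(free_slots) < required:
        return None

    pointers = []
    for slot in free_slots:
        rfmt[slot] = group_id
        # Pointer to the first register file entry of the block this slot covers.
        pointers.append(slot * entries_per_block)
    return pointers
```

Because each slot maps to its own block, a request can be satisfied from scattered free slots, which is the practical benefit of indirection through a mapping table in this simplified model.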
-
Patent number: 9800158
Abstract: A system and method are provided for regulating a voltage level at a load. A current source generates a current and a voltage control mechanism provides a portion of the current to regulate the voltage level at the load. When the voltage level at the load is greater than a maximum voltage level, the current source is decoupled from the load and the current source is coupled to a current sink to reduce the voltage level at the load. An electric power conversion comprises the current source and the voltage control mechanism. A downstream controller is configured to control the voltage control mechanism to decouple the current source from the load and couple the current source to a current sink to reduce the voltage level at the load when the voltage level at the load is greater than a maximum voltage level.
Type: Grant
Filed: January 30, 2013
Date of Patent: October 24, 2017
Assignee: NVIDIA Corporation
Inventor: William J. Dally
-
Patent number: 9798487
Abstract: One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.
Type: Grant
Filed: August 22, 2016
Date of Patent: October 24, 2017
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, James Leroy Deming, Lucien Dunning, Brian Fahs, Mark Hairgrove, Chenghuan Jia, John Mashey, James M. Van Dyke
-
Patent number: 9794593
Abstract: A video decoder architecture for processing out-of-order macro-blocks of a video stream. A microcode engine receives compressed data representing macro-blocks of a frame of a video stream, wherein at least one macro-block is received out-of-order. The microcode engine is for buffering the compressed data and for ordering the macro-blocks of the frame in raster scan order. A digital video decoder receives the macro-blocks in raster scan order and is for decoding the macro-blocks.
Type: Grant
Filed: December 9, 2005
Date of Patent: October 17, 2017
Assignee: Nvidia Corporation
Inventors: Iole Moccagatta, Ignatius B. Tjandrasuwita, Harikrishna M. Reddy
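The buffer-and-reorder idea in the abstract above can be shown with a small generator. This is only an illustrative sketch: it assumes each macro-block arrives tagged with its raster-scan address (row times frame width in macro-blocks, plus column), and the name `raster_order` is hypothetical.

```python
def raster_order(macroblocks):
    """Yield (address, payload) pairs in raster-scan order.

    `macroblocks` is an iterable of (address, payload) pairs that may arrive
    out of order. Early arrivals are buffered until every lower address has
    been emitted.
    """
    pending = {}      # buffered blocks that arrived ahead of their turn
    next_addr = 0     # next raster-scan address the decoder expects
    for addr, payload in macroblocks:
        pending[addr] = payload
        # Release every block that is now contiguous with the emitted prefix.
        while next_addr in pending:
            yield next_addr, pending.pop(next_addr)
            next_addr += 1
```

Emitting blocks as soon as the in-order prefix is complete keeps the buffer no larger than the worst-case reordering distance, rather than a whole frame.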
-
Patent number: 9792122
Abstract: One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.
Type: Grant
Filed: October 4, 2013
Date of Patent: October 17, 2017
Assignee: NVIDIA CORPORATION
Inventors: Ziyad S. Hakura, Walter R. Steiner, Cynthia Ann Edgeworth Allison, Rouslan Dimitrov, Karim M. Abdalla, Dale L. Kirkland, Emmett M. Kilgariff
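The replay step in the abstract above amounts to filtering buffered primitives by tile intersection. The sketch below is a simplification under stated assumptions: each buffered primitive is represented by a screen-space bounding box (a conservative stand-in for an exact geometry test), each primitive carries its associated state bundle directly, and the names `Rect`, `intersects`, and `replay_against_tile` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with half-open extents [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one pixel."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def replay_against_tile(buffer, tile: Rect):
    """Return the (primitive, state) pairs from the buffer that touch `tile`.

    `buffer` is a list of (primitive_name, bounding_box, state_bundle) tuples
    kept in submission order, so the output preserves that order.
    """
    return [
        (name, state)
        for name, bbox, state in buffer
        if intersects(bbox, tile)
    ]
```

Replaying the same buffer once per tile lets the downstream stage work on one tile's worth of data at a time, at the cost of testing each primitive against every tile.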
-
Patent number: 9792220
Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
Type: Grant
Filed: August 27, 2013
Date of Patent: October 17, 2017
Assignee: NVIDIA Corporation
Inventors: Cameron Buschardt, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, James Leroy Deming, Brian Fahs
-
Patent number: 9786255
Abstract: A method, computer program product, and system for adjusting a dynamic refresh frequency of a display device are disclosed. The method includes the steps of obtaining a current frame duration associated with a first image, computing, based on the current frame duration, a repetition value for a second image, and repeating presentation of the second image on a display device based on the repetition value. The logic for implementing the method may be included in a graphics processing unit or within the display device itself.
Type: Grant
Filed: May 22, 2015
Date of Patent: October 10, 2017
Assignee: NVIDIA Corporation
Inventors: Tom Verbeure, Robert Jan Schutten, Gerrit A. Slavenburg, Thomas F. Fox
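The abstract above does not state how the repetition value is computed. One plausible reading, offered purely as an assumption, is that the current frame duration serves as an estimate for the next frame, and the image is repeated often enough that no single refresh exceeds the longest interval the panel can hold an image. The function name `repetition_value` and the parameter `max_hold_ms` are hypothetical.

```python
import math


def repetition_value(frame_ms, max_hold_ms):
    """Number of extra presentations needed so each refresh fits the panel.

    frame_ms:    duration of the current frame, used as the estimate for the next.
    max_hold_ms: longest interval the panel can go without a refresh (assumed known).
    """
    if frame_ms <= 0 or max_hold_ms <= 0:
        raise ValueError("durations must be positive")
    # Split the frame into ceil(frame / max_hold) refreshes; all but the first
    # of those are repeats of the same image.
    return max(0, math.ceil(frame_ms / max_hold_ms) - 1)
```

With a repetition value of `r`, each refresh lasts roughly `frame_ms / (r + 1)`, which stays at or below `max_hold_ms` by construction.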
-
Patent number: 9787509
Abstract: An apparatus including a receiver coupled to receive an input signal from a communication link and operable to employ decision feedback equalization to the input signal of the communication link and generate an edge sample signal. The apparatus also includes a timing recovery module coupled to the receiver and operable to receive the edge sample signal and use the edge sample signal to generate a data sampling phase signal, wherein the edge sample signal influences a settling point of the data sampling phase signal.
Type: Grant
Filed: January 5, 2016
Date of Patent: October 10, 2017
Assignee: Nvidia Corporation
Inventors: Lizhi Zhong, Vishnu Balan, Arif Al Amin, Sanjeev Maheshwari
-
Patent number: 9788347
Abstract: Provided is a recursive method and apparatus for processing a signal for determining a plurality of frequency components of the signal, the signal being a chirp-like polyphase sequence. In one embodiment, the method includes: (1) determining a first frequency component of the plurality of frequency components, (2) determining a component factor by accessing a factor table, (3) determining the second frequency component using the determined first frequency component and the determined component factor.
Type: Grant
Filed: June 3, 2015
Date of Patent: October 10, 2017
Assignee: Nvidia Corporation
Inventors: Tarik Tabet, Godfrey Costa, Nallepilli Ramesh
-
Patent number: 9785729
Abstract: A simulation engine is configured to generate a physical simulation of a chain of particles by implementing a physics-based algorithm. The simulation engine is configured to generate a predicted position for each particle and to then adjust the predicted position of each particle based on a set of constraints associated with the physics-based algorithm. The simulation engine may then generate a predicted velocity for a given particle based on the adjusted, predicted position of that particle and based on the adjusted, predicted position of an adjacent particle.
Type: Grant
Filed: December 14, 2012
Date of Patent: October 10, 2017
Assignee: NVIDIA Corporation
Inventors: Matthias Muller-Fischer, Nuttapong Chentanez, Tae-Yong Kim
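The predict, adjust, and velocity steps in the abstract above resemble position-based chain solvers. The sketch below is one such solver in 2D, written under assumptions the abstract does not confirm: the first particle is pinned, the only constraint is a fixed link length, and the velocity of a particle is corrected by a damped fraction of the adjustment applied to its neighbor further down the chain. The name `step_chain` and the `damping` parameter are hypothetical.

```python
import math


def step_chain(pos, vel, dt, rest_len, gravity=(0.0, -9.8), damping=0.9):
    """Advance a pinned 2D particle chain by one time step.

    pos, vel: lists of (x, y) tuples; pos[0] is the pinned root.
    Returns (new_positions, new_velocities).
    """
    n = len(pos)

    # 1. Predict positions from current velocities and gravity.
    pred = [
        (x + dt * vx + dt * dt * gravity[0], y + dt * vy + dt * dt * gravity[1])
        for (x, y), (vx, vy) in zip(pos, vel)
    ]
    pred[0] = pos[0]  # the root stays where it is

    # 2. Adjust each prediction so every link has length rest_len,
    #    remembering how far each particle was moved.
    corr = [(0.0, 0.0)] * n
    for i in range(1, n):
        dx = pred[i][0] - pred[i - 1][0]
        dy = pred[i][1] - pred[i - 1][1]
        dist = math.hypot(dx, dy) or 1.0  # avoid dividing by zero if particles coincide
        scale = rest_len / dist
        fixed = (pred[i - 1][0] + dx * scale, pred[i - 1][1] + dy * scale)
        corr[i] = (fixed[0] - pred[i][0], fixed[1] - pred[i][1])
        pred[i] = fixed

    # 3. Velocity from the adjusted position, corrected using the adjustment
    #    applied to the next particle in the chain (assumed form).
    new_vel = []
    for i in range(n):
        vx = (pred[i][0] - pos[i][0]) / dt
        vy = (pred[i][1] - pos[i][1]) / dt
        if i + 1 < n:
            vx -= damping * corr[i + 1][0] / dt
            vy -= damping * corr[i + 1][1] / dt
        new_vel.append((vx, vy))
    return pred, new_vel
```

A single root-to-tip pass enforces every link length exactly, which is why this style of solver suits inextensible chains such as hair strands; the velocity correction compensates for the one-directional way the adjustments propagate.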
-
Publication number: 20170286198
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
Type: Application
Filed: October 16, 2013
Publication date: October 5, 2017
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
-
Patent number: 9779533
Abstract: One embodiment of the present invention includes a method for processing graphics objects. The method includes receiving a first draw-call and a second draw-call. The method also includes dividing the first draw-call into a first set of sub-draw-calls and the second draw-call into a second set of sub-draw-calls. The method further includes identifying a first screen tile. The method also includes identifying a first group of sub-draw-calls included in the first set of sub-draw-calls that overlap the first screen tile and a second group of sub-draw-calls included in the second set of sub-draw-calls that overlap the second screen tile. The method further includes causing the first group of sub-draw-calls and the second group of sub-draw-calls to be processed together.
Type: Grant
Filed: January 27, 2014
Date of Patent: October 3, 2017
Assignee: NVIDIA Corporation
Inventors: Rouslan Dimitrov, Ziyad S. Hakura
-
Patent number: 9773460
Abstract: A system, method, and computer program product are provided for combining low motion blur and variable refresh rate in a display. In one embodiment, a hold-type display is operated in a first mode of operation where the hold-type display is dynamically refreshed such that the hold type display handles updates to image frames at unpredictable times and where for each of the image frames a backlight of the hold-type display is activated for an entire duration of display of the image frame. Additionally, it is determined that at least one predefined condition has been met. Further, in response to the determination, the hold-type display is operated in a second mode of operation where the hold-type display is statically refreshed such that the hold-type display handles updates to image frames at regular intervals and where for each of the image frames the backlight of the hold-type display is flashed.
Type: Grant
Filed: October 18, 2013
Date of Patent: September 26, 2017
Assignee: NVIDIA Corporation
Inventors: Tom Verbeure, Gerrit A. Slavenburg, Thomas F. Fox, Robert Jan Schutten, Luis Mariano Lucas, Marcel Dominicus Janssens
-
Patent number: 9773341
Abstract: One embodiment of the present invention includes techniques for rasterizing geometries. First, a processing unit defines a bounding primitive that covers the geometry and does not include any internal edges. If the bounding primitive intersects any enabled clip plane, then the processing unit generates fragments to fill a current viewport. Alternatively, the processing unit generates fragments to fill the bounding primitive. Because the rasterized region includes no internal edges, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel.
Type: Grant
Filed: August 20, 2013
Date of Patent: September 26, 2017
Assignee: NVIDIA Corporation
Inventors: Jeffrey A. Bolz, Mark J. Kilgard
-
Patent number: 9772827
Abstract: One embodiment sets forth a method for efficiently determining memory resource dependencies between instructions included in a software application. For each instruction, a dependency analyzer uses overlapping search techniques to identify one or more overlaps between the memory elements included in the current instruction and the memory elements included in previous instructions. The dependency analyzer then maps objects included in the instructions to a set of partition elements wherein each partition element represents a set of memory elements that are functionally equivalent for dependency analysis. Subsequently, the dependency analyzer uses the set of partition elements to determine memory dependencies between the instructions at the memory element level.
Type: Grant
Filed: April 22, 2013
Date of Patent: September 26, 2017
Assignee: NVIDIA Corporation
Inventor: Julius Vanderspek
-
Patent number: 9773344
Abstract: A method for graphics processor clock scaling comprises the following steps. A percentage of idle-time is calculated, based upon an elapsed idle-time and an elapsed active time. A graphics processor clock rate is reduced if the percentage of idle time is higher than a high limit threshold. The graphics processor clock rate is increased if the percentage of idle time is lower than a low limit threshold.
Type: Grant
Filed: December 12, 2012
Date of Patent: September 26, 2017
Assignee: Nvidia Corporation
Inventors: Ilan Aelion, Terje Bergstrom, Matthew R. Longnecker
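The threshold rule in the abstract above maps directly onto a small function. This is a minimal sketch: the threshold values, step size, and clock limits are illustrative assumptions, and the name `next_clock_mhz` is hypothetical.

```python
def next_clock_mhz(idle_s, active_s, clock_mhz, *,
                   low_pct=20.0, high_pct=60.0,
                   step_mhz=50, min_mhz=300, max_mhz=1500):
    """Pick the next clock rate from the idle percentage of the last interval.

    idle_s, active_s: elapsed idle and active time over the interval (seconds).
    """
    total = idle_s + active_s
    if total <= 0:
        return clock_mhz  # nothing measured yet; keep the current rate

    idle_pct = 100.0 * idle_s / total
    if idle_pct > high_pct:
        # Mostly idle: slow down, but not below the floor.
        return max(min_mhz, clock_mhz - step_mhz)
    if idle_pct < low_pct:
        # Mostly busy: speed up, but not above the ceiling.
        return min(max_mhz, clock_mhz + step_mhz)
    return clock_mhz  # between the thresholds: leave the clock alone
```

Keeping a gap between the low and high thresholds gives the controller a dead band, so the clock does not oscillate when the load sits near a single cutoff.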