Patents Assigned to NVIDIA
-
Patent number: 9727521
Abstract: Techniques are disclosed for peer-to-peer data transfers where a source device receives a request to read data words from a target device. The source device creates a first and second read command for reading a first portion and a second portion of a plurality of data words from the target device, respectively. The source device transmits the first read command to the target device, and, before a first read operation associated with the first read command is complete, transmits the second read command to the target device. The first and second portions of the plurality of data words are stored in a first and second portion of a buffer memory, respectively. Advantageously, an arbitrary number of multiple read operations may be in progress at a given time without using multiple peer-to-peer memory buffers. Performance for large data block transfers is improved without consuming peer-to-peer memory buffers needed by other peer GPUs.
Type: Grant
Filed: September 14, 2012
Date of Patent: August 8, 2017
Assignee: NVIDIA Corporation
Inventors: Dennis K. Ma, Karan Gupta, Lei Tian, Franck R. Diard, Praveen Jain, Wei-Je Huang, Atul Kalambur
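The overlapped-read idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: `split_read` and `peer_read` are hypothetical names, target memory is modeled as a Python list, and "in flight" is modeled with a queue rather than real hardware transactions.

```python
from collections import deque

def split_read(offset, length, parts=2):
    """Split one read request into up to `parts` (offset, length) read commands."""
    base, extra = divmod(length, parts)
    commands = []
    for i in range(parts):
        n = base + (1 if i < extra else 0)
        if n:
            commands.append((offset, n))
        offset += n
    return commands

def peer_read(target_memory, offset, length):
    """Issue every read command before any completes, then fill one buffer."""
    buffer = [None] * length          # single buffer; each command owns a region
    in_flight = deque()
    for command in split_read(offset, length):
        in_flight.append(command)     # transmitted without waiting for completion
    issued = len(in_flight)
    while in_flight:                  # completions arrive later, one per command
        src, n = in_flight.popleft()
        dst = src - offset            # region of the buffer reserved for this command
        buffer[dst:dst + n] = target_memory[src:src + n]
    return buffer, issued
```

For example, `peer_read(list(range(100)), 10, 7)` issues two read commands and returns the seven requested words in a single buffer.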
-
Patent number: 9727339
Abstract: Embodiments of the present invention are operable to communicate a list of important shaders and their current best-known compilations to remote client devices over a communications network. Client devices are allowed to produce modified shader compilations by varying optimizations. If a client device produces a modified compilation that beats an important shader's current best-known compilation, embodiments of the present invention can communicate this new best-known shader compilation back to a host computer system. Furthermore, embodiments of the present invention may periodically broadcast the new best-known shader compilation back to client devices for possible further optimization or for efficient rendering operations using the best-known shader compilation.
Type: Grant
Filed: July 18, 2013
Date of Patent: August 8, 2017
Assignee: Nvidia Corporation
Inventor: Jeremy Zelsnack
-
Patent number: 9727338
Abstract: A system and method of translating functions of a program. In one embodiment, the system includes: (1) a local-scope variable identifier operable to identify local-scope variables employed in the at least some of the functions as being either thread-shared local-scope variables or thread-private local-scope variables and (2) a function translator associated with the local-scope variable identifier and operable to translate the at least some of the functions to cause thread-shared memory to be employed to store the thread-shared local-scope variables and thread-private memory to be employed to store the thread-private local-scope variables.
Type: Grant
Filed: December 21, 2012
Date of Patent: August 8, 2017
Assignee: Nvidia Corporation
Inventors: Yuan Lin, Gautam Chakrabarti, Jaydeep Marathe, Okwan Kwon, Amit Sabne
-
Patent number: 9727392
Abstract: Techniques for passing dependencies in an application programming interface (API) include identifying a plurality of passes of execution commands. For each set of passes, wherein one pass is a destination pass and the other pass is a source pass to each other, one or more dependencies, of one or more dependency types, are determined between the execution commands of the destination pass and the source pass. Pass objects are then created for each identified pass, wherein each pass object records the execution commands and dependencies between the corresponding destination and source passes.
Type: Grant
Filed: September 16, 2015
Date of Patent: August 8, 2017
Assignee: NVIDIA CORPORATION
Inventor: Jeffrey Bolz
-
Patent number: 9721381
Abstract: A system, method, and computer program product are provided for discarding pixel samples. The method includes the steps of completing shading operations for a pixel set including one or more pixels to generate per-sample shaded attributes according to a shader program executed by a processing pipeline. Discard information for the pixel set is evaluated and one or more per-sample shaded attributes for at least one pixel in the pixel set are discarded based on the evaluated discard information.
Type: Grant
Filed: October 11, 2013
Date of Patent: August 1, 2017
Assignee: NVIDIA Corporation
Inventors: Christian Jean Rouet, Manan Maheshkumar Patel, Shirish Gadre, Daniel Paul Wilde
-
Patent number: 9720842
Abstract: A device driver calculates a tile size for a plurality of cache memories in a cache hierarchy. The device driver calculates a storage capacity of a first cache memory. The device driver calculates a first tile size based on the storage capacity of the first cache memory and one or more additional characteristics. The device driver calculates a storage capacity of a second cache memory. The device driver calculates a second tile size based on the storage capacity of the second cache memory and one or more additional characteristics, where the second tile size is different than the first tile size. The device driver transmits the second tile size to a second coalescing binning unit. One advantage of the disclosed techniques is that data locality and cache memory hit rates are improved where tile size is optimized for each cache level in the cache hierarchy.
Type: Grant
Filed: February 20, 2013
Date of Patent: August 1, 2017
Assignee: NVIDIA Corporation
Inventors: Rouslan Dimitrov, Rui Bastos, Ziyad S. Hakura, Eric B. Lum
-
Patent number: 9721320
Abstract: A non-transitory computer-readable storage medium having computer-executable instructions for causing a computer system to perform a method for constructing bounding volume hierarchies from binary trees is disclosed. The method includes providing a binary tree including a plurality of leaf nodes and a plurality of internal nodes. Each of the plurality of internal nodes is uniquely associated with two child nodes, wherein each child node comprises either an internal node or leaf node. The method also includes determining a plurality of bounding volumes for nodes in the binary tree by traversing the binary tree from the plurality of leaf nodes upwards toward a root node, wherein each parent node is processed once by a later arriving corresponding child node.
Type: Grant
Filed: December 31, 2012
Date of Patent: August 1, 2017
Assignee: NVIDIA CORPORATION
Inventor: Tero Karras
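The bottom-up traversal in this abstract, where the second child to arrive processes the parent, can be sketched sequentially. The following is an illustration under assumptions, not the patented implementation: the `Node` class, the 2D boxes, and the plain integer `visits` counter (standing in for what would presumably be an atomic counter on a parallel device) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax); 2D for brevity

@dataclass
class Node:
    box: Optional[Box] = None
    parent: Optional["Node"] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    visits: int = 0  # assumed stand-in for an atomic per-node arrival counter

def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def link(parent: Node, left: Node, right: Node) -> Node:
    """Attach two children to an internal node."""
    parent.left, parent.right = left, right
    left.parent = right.parent = parent
    return parent

def fit_bounding_volumes(leaves) -> None:
    """Walk from every leaf toward the root; only the later arrival does the work."""
    for leaf in leaves:                  # conceptually one thread per leaf
        node = leaf.parent
        while node is not None:
            node.visits += 1
            if node.visits == 1:
                break                    # first arrival: sibling subtree not ready yet
            # Second arrival: both child boxes exist, so the parent box can be computed.
            node.box = union(node.left.box, node.right.box)
            node = node.parent
```

Each internal node is visited exactly twice and its box is computed exactly once, so the total work is linear in the number of nodes.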
-
Patent number: 9720768
Abstract: A receiver, transmitter and method for early packet header verification are provided. In one embodiment, the method includes: (1) receiving a payload flit of a preceding packet and a header flit of a current packet; and (2) using a Cyclic Redundancy Check (CRC) in the header flit to verify the payload flit of the preceding packet and the header flit of the current packet.
Type: Grant
Filed: October 6, 2015
Date of Patent: August 1, 2017
Assignee: Nvidia Corporation
Inventors: Stephen D. Glaser, Eric Tyson, Mark Hummel, Michael Osborn, Jonathan Owen, Marvin Denman, Dennis Ma, Denis Foley
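A minimal sketch of the idea, assuming a hypothetical flit layout in which the header flit ends with a 4-byte CRC-32 computed over the preceding packet's payload flit followed by the current header fields. The abstract does not specify the CRC polynomial or the layout; `zlib.crc32` and the function names here are illustrative choices only.

```python
import zlib

CRC_BYTES = 4  # assumed CRC width for this sketch

def make_header_flit(header_fields: bytes, prev_payload_flit: bytes) -> bytes:
    """Append a CRC that covers the previous payload flit and this header."""
    crc = zlib.crc32(prev_payload_flit + header_fields)
    return header_fields + crc.to_bytes(CRC_BYTES, "big")

def verify(header_flit: bytes, prev_payload_flit: bytes) -> bool:
    """Check the previous payload flit and the current header in one step."""
    header_fields = header_flit[:-CRC_BYTES]
    received_crc = int.from_bytes(header_flit[-CRC_BYTES:], "big")
    return zlib.crc32(prev_payload_flit + header_fields) == received_crc
```

With this layout the receiver can validate the current header as soon as the header flit arrives, without waiting for the current packet's own payload.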
-
Patent number: 9720858
Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.
Type: Grant
Filed: December 19, 2012
Date of Patent: August 1, 2017
Assignee: NVIDIA CORPORATION
Inventors: Brian Fahs, Eric T. Anderson, Nick Barrow-Williams, Shirish Gadre, Joel James McCormack, Bryon S. Nordquist, Nirmal Raj Saxena, Lacky V. Shah
-
Patent number: 9723216
Abstract: A method for generating images. The method includes capturing first image data representing a first scene taken optically at a first magnification index, wherein the first image data comprises a first region of an image. The method includes capturing second image data representing a second scene taken optically at a second magnification index that is less than the first magnification index, wherein the second image data comprises a second region of the image. The method includes digitally zooming the second image data in the second region to the first magnification index. The method includes digitally stitching the second image data in the second region to the first image data in the first region.
Type: Grant
Filed: February 13, 2014
Date of Patent: August 1, 2017
Assignee: NVIDIA CORPORATION
Inventor: Rajat Aggarwal
-
Patent number: 9721187
Abstract: A system, method, and computer program product for providing a lasso selection tool for a stereoscopic image is disclosed. The method includes the steps of obtaining a lasso region of a stereoscopic image pair based on a path defined by a user using a lasso selection tool. An object in a first image of the stereoscopic image pair is identified, where the object is at least partially included within the lasso region and the object is identified in a second image of the stereoscopic image pair.
Type: Grant
Filed: August 30, 2013
Date of Patent: August 1, 2017
Assignee: NVIDIA Corporation
Inventor: David R. Cook
-
Patent number: 9716051
Abstract: A packaging substrate, a packaged semiconductor device, a computing device and methods for forming the same are provided. In one embodiment, a packaging substrate is provided that includes a packaging structure having a chip mounting surface and a bottom surface. The packaging structure has a plurality of conductive paths formed between the chip mounting surface and the bottom surface. The conductive paths are configured to provide electrical connection between an integrated circuit chip disposed on the chip mounting surface and the bottom surface of the packaging structure. The packaging structure has an opening formed in the chip mounting surface proximate a perimeter of the packaging structure. A stiffening microstructure is disposed in the opening and is coupled to the packaging structure.
Type: Grant
Filed: November 2, 2012
Date of Patent: July 25, 2017
Assignee: NVIDIA Corporation
Inventors: Leilei Zhang, Ron Boja, Abraham Yee, Zuhair Bokharey
-
Patent number: 9715413
Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
Type: Grant
Filed: January 18, 2012
Date of Patent: July 25, 2017
Assignee: NVIDIA Corporation
Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
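The selection step can be sketched as a filter followed by a maximum. Everything concrete here is an assumption made for illustration: the dictionary fields (`id`, `task`, `accepting`, `availability`), the function name, and the rule that the highest availability value wins. The abstract only says the choice is based on the availability values.

```python
def select_processor(processors, task_id):
    """Return the id of the eligible processor with the most spare capacity, or None."""
    # Eligible: already assigned this compute task and able to take more of its work.
    eligible = [p for p in processors if p["task"] == task_id and p["accepting"]]
    if not eligible:
        return None
    # Assumed policy: pick the processor reporting the highest availability value.
    return max(eligible, key=lambda p: p["availability"])["id"]
```

A caller would then issue the next unit of work for `task_id` to the returned processor.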
-
Publication number: 20170206623
Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
Type: Application
Filed: October 1, 2013
Publication date: July 20, 2017
Applicant: NVIDIA CORPORATION
Inventors: Ziyad S. HAKURA, Jeffrey A. BOLZ, Amanpreet GREWAL, Matthew JOHNSON, Andrei KHODAKOVSKY
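The decision logic in this abstract reduces to a subset test at draw time. The sketch below is illustrative only: the `RenderTargetTracker` class, the command strings, and the command list that stands in for the parallel processing subsystem are hypothetical.

```python
class RenderTargetTracker:
    """Tracks the current set of bound render targets and records issued commands."""

    def __init__(self):
        self.current = set()
        self.commands = []  # stand-in for commands sent to the hardware

    def draw(self, requested):
        requested = set(requested)
        if not requested <= self.current:
            # Some requested target is not bound yet: flush, update, then bind.
            self.commands.append("flush_tiling_unit")
            self.current |= requested
            self.commands.append(("bind_render_targets", tuple(sorted(requested))))
        # If every requested target is already in the current set, no flush is issued.
        self.commands.append("draw")
```

Two consecutive draws with the same requested set therefore produce one flush and one bind, followed by two draws.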
-
Patent number: 9711099
Abstract: A method for driving a display panel having a variable refresh rate is disclosed. The method comprises receiving a current input frame from an image source. Next, it comprises determining a number of re-scanned frames to insert between the current input frame and a subsequent input frame, wherein the re-scanned frames repeat the input frame, and wherein the number of re-scanned frames depends on the minimum refresh interval (MRI) of the display panel. Further, it comprises calculating respective intervals at which to insert the re-scanned frames between the current input frame and the subsequent input frame. Subsequently, it comprises determining if a charge accumulation in pixels of the display panel has crossed over a predetermined threshold value. Finally, responsive to a determination that the charge accumulation has crossed over a predetermined threshold value, it comprises performing a counter-measure to remediate the charge accumulation.
Type: Grant
Filed: February 26, 2014
Date of Patent: July 18, 2017
Assignee: NVIDIA CORPORATION
Inventors: Rudolf Bloks, Robert Schutten, Tom Verbeure
-
Patent number: 9710874
Abstract: One embodiment of the present invention sets forth a technique for mid-primitive execution preemption. When preemption is initiated, no new instructions are issued, in-flight instructions progress to an execution unit boundary, and the execution state is unloaded from the processing pipeline. The execution units within the processing pipeline, including the coarse rasterization unit, complete execution of in-flight instructions and become idle. However, rasterization of a triangle may be preempted at a coarse raster region boundary. The amount of context state to be stored is reduced because the execution units are idle. Preempting at the mid-primitive level during rasterization reduces the time from when preemption is initiated to when another process can execute because the entire triangle is not rasterized.
Type: Grant
Filed: December 27, 2012
Date of Patent: July 18, 2017
Assignee: NVIDIA Corporation
Inventors: Gregory Scott Palmer, Ziyad S. Hakura, Emmett M. Kilgariff, Dale L. Kirkland, Lacky V. Shah
-
Patent number: 9710306
Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
Type: Grant
Filed: April 9, 2012
Date of Patent: July 18, 2017
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
-
Patent number: 9710275
Abstract: A system and method for allocating shared memory of differing properties to shared data objects and a hybrid stack data structure. In one embodiment, the system includes: (1) a hybrid stack creator configured to create, in the shared memory, a hybrid stack data structure having a lower portion having a more favorable property and a higher portion having a less favorable property and (2) a data object allocator associated with the hybrid stack creator and configured to allocate storage for a shared data object in the lower portion if the lower portion has a sufficient remaining capacity to contain the shared data object and alternatively allocate storage for the shared data object in the higher portion if the lower portion has an insufficient remaining capacity to contain the shared data object.
Type: Grant
Filed: December 21, 2012
Date of Patent: July 18, 2017
Assignee: Nvidia Corporation
Inventors: Jaydeep Marathe, Yuan Lin, Gautam Chakrabarti, Okwan Kwon, Amit Sabne
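The allocation rule in this abstract (use the lower portion when the object fits, otherwise fall back to the higher portion) can be sketched as follows. The `HybridStack` class, its byte-count bookkeeping, and the `MemoryError` on exhaustion are assumptions made for illustration; the abstract does not describe what happens when neither portion has room.

```python
class HybridStack:
    """Two-portion stack: a small favorable lower portion and a larger fallback."""

    def __init__(self, lower_capacity: int, higher_capacity: int):
        self.capacity = {"lower": lower_capacity, "higher": higher_capacity}
        self.used = {"lower": 0, "higher": 0}

    def allocate(self, size: int):
        """Return (portion, offset) for a shared data object of `size` bytes."""
        # Prefer the lower portion; use the higher portion only if it does not fit.
        for portion in ("lower", "higher"):
            if self.capacity[portion] - self.used[portion] >= size:
                offset = self.used[portion]
                self.used[portion] += size
                return portion, offset
        raise MemoryError("hybrid stack exhausted")  # assumed behavior
```

For instance, with a 64-byte lower portion, a 48-byte object lands in the lower portion and a following 32-byte object spills to the higher portion.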
-
Patent number: 9710894
Abstract: A system and method for enhanced multi-sample anti-aliasing. The method includes determining a sampling pattern corresponding to a pixel and adjusting the sampling pattern based on a visual effect (e.g., post-processing visual effect). The method further includes accessing a first plurality of samples based on the sampling pattern. The first plurality of samples may comprise a second plurality of samples within the pixel and a third plurality of pixels outside of the pixel. The method further includes performing anti-aliasing filtering of the pixel based on the first plurality of samples and the sampling pattern.
Type: Grant
Filed: June 4, 2013
Date of Patent: July 18, 2017
Assignee: NVIDIA CORPORATION
Inventor: Timothy Paul Lottes
-
Patent number: 9704212
Abstract: A system and method for image processing are provided. The system comprises a main computing device and a secondary computing device. The main computing device comprises a main graphics card and a main central processing unit, and the secondary computing device comprises a secondary graphics card and a secondary central processing unit. The main computing device is configured to detect the secondary computing device. The main central processing unit is configured to send a request to process raw image data together to the secondary central processing unit and allocate the raw image data to the main graphics card and the secondary graphics card after receiving a response from the secondary central processing unit. The main graphics card and the secondary graphics card are configured to process images based on the allocation of the main central processing unit.
Type: Grant
Filed: February 7, 2014
Date of Patent: July 11, 2017
Assignee: Nvidia Corporation
Inventor: Maojiang (Jacen) Lin