Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
Type:
Grant
Filed:
November 18, 2019
Date of Patent:
December 8, 2020
Assignee:
NVIDIA Corporation
Inventors:
William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Abstract: Many computing systems process data organized in a matrix format. For example, artificial neural networks (ANNs) perform numerous computations on data organized into matrices using conventional matrix arithmetic operations. One such operation, which is commonly performed, is the transpose operation. Additionally, many such systems need to process many matrices and/or matrices that are large in size. For sparse matrices that hold few significant values and many values that can be ignored, transmitting and processing all the values in such matrices is wasteful. Thus, techniques are introduced for storing a sparse matrix in a compressed format that allows for a matrix transpose operation to be performed on the compressed matrix without having to first decompress the compressed matrix. By utilizing the introduced techniques, more matrix operations can be performed than conventional systems.
Type:
Grant
Filed:
February 27, 2019
Date of Patent:
December 8, 2020
Assignee:
Nvidia Corporation
Inventors:
Jorge Albericio Latorre, Jeff Pool, David Garcia
Abstract: Detection of activity in video content, and more particularly detecting in video start and end frames inclusive of an activity and a classification for the activity, is fundamental for video analytics including categorizing, searching, indexing, segmentation, and retrieval of videos. Existing activity detection processes rely on a large set of features and classifiers that exhaustively run over every time step of a video at multiple temporal scales, or as a small improvement computationally propose segments of the video on which to perform classification. These existing activity detection processes, however, are computationally expensive, particularly when trying to achieve activity detection accuracy, and moreover are not configurable for any particular time or computation budget. The present disclosure provides a time and/or computation budget-aware method for detecting activity in video that relies on a recurrent neural network implementing a learned policy.
Type:
Grant
Filed:
November 28, 2018
Date of Patent:
December 8, 2020
Assignee:
NVIDIA Corporation
Inventors:
Xiaodong Yang, Pavlo Molchanov, Jan Kautz, Behrooz Mahasseni
Abstract: A graphics processing pipeline includes three architectural features that allow a fragment shader to efficiently calculate per-sample attribute values using barycentric coordinates and per-vertex attributes. The first feature is barycentric coordinate injection to provide barycentric coordinates to the fragment shader. The second feature is an attribute qualifier that allows an attribute of a graphics primitive to be processed without conventional fixed-function interpolation. The third feature is a direct access path from the fragment shader to triangle data storage hardware resources where vertex attribute data and/or plane equation coefficients are stored. Allowing the fragment shader to calculate per-sample attribute values in this way advantageously increases system flexibility while reducing workload associated with triangle plane equation setup.
Type:
Grant
Filed:
February 6, 2019
Date of Patent:
December 8, 2020
Assignee:
NVIDIA Corporation
Inventors:
David Patrick, Dale L. Kirkland, Henry Packard Moreton, Ziyad Sami Hakura, Yury Uralsky
Abstract: Systems and methods for detecting blockages in images are described. An example method may include receiving a plurality of images captured by a camera installed on an apparatus. The method may include identifying one or more candidate blocked regions in the plurality of images. Each of the candidate blocked regions may contain image data caused by blockages in the camera's field-of-view. The method may further include assigning scores to the one or more candidate blocked regions based on relationships among the one or more candidate blocked regions in the plurality of images. In response to a determination that one of the scores is above a predetermined blockage threshold, the method may include generating an alarm signal for the apparatus.
Abstract: The disclosure is directed to methods and processes of rendering a complex scene using a combination of raytracing and rasterization. The methods and processes can be implemented in a video driver or software library. A developer of an application can provide information to an application programming interface (API) call as if a conventional raytrace API is being called. The method and processes can analyze the scene using a variety of parameters to determine a grouping of objects within the scene. The rasterization algorithm can use as input primitive cluster data retrieved from raytracing acceleration structures. Each group of objects can be rendered using its own balance of raytracing and rasterization to improve rendering performance while maintaining a visual quality target level.
Type:
Grant
Filed:
May 23, 2019
Date of Patent:
December 1, 2020
Assignee:
Nvidia Corporation
Inventors:
Christoph Kubisch, Ziyad Hakura, Manuel Kraemer
Abstract: In various examples, a portable computing device is provided that has a bottom shell and a top shell pivotally coupled to the bottom shell for movement between a closed position and at least one open position. A display fits within a perimeter rim of the top shell, the display being obscured from view when the top shell is in the closed position and being viewable when the top shell is in the at least one open position. A coupling linkage couples the display, the top shell, and the bottom shell, to move the display between at least a first position with the display closer to the top shell when the top shell is in the closed position and a second position with at least a portion of the display farther from the top shell when the top shell is in the open position.
Type:
Grant
Filed:
August 29, 2019
Date of Patent:
December 1, 2020
Assignee:
NVIDIA CORPORATION
Inventors:
Harrison Snagwha Kim, Jin Hyup Lee, Younseok Sung, Yunseok Kim, Jeongyong Jeon, Seungkug Park
Abstract: System and method of compiling a program having a mixture of host code and device code to enable Profile Guided Optimization (PGO) for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: profile instrumentation counters for the device functions; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected profile information from the device memory to generate instrumentation output. The output is fed back to the compiler for compiling the source code a second time to generate optimized executable code for the device functions defined in the source code.
Type:
Grant
Filed:
October 8, 2018
Date of Patent:
December 1, 2020
Assignee:
NVIDIA Corporation
Inventors:
Hariharan Sandanagobalane, Sean Lee, Vinod Grover
Abstract: An integrated circuit such as, for example a graphics processing unit (GPU), having an on-chip analog to digital converter (ADC) for use in overcurrent protection of the chip is described, where the overcurrent protection response times are substantially faster than techniques with external ADC. A system-on-chip (SoC) includes the integrated circuit and a multiplexer arranged externally to the chip having the ADC, where the multiplexer provides the ADC with a data stream of sampling information from a plurality of power sources. Methods for overcurrent protection using an on-chip ADC are also described.
Type:
Grant
Filed:
July 31, 2018
Date of Patent:
December 1, 2020
Assignee:
NVIDIA Corporation
Inventors:
Sachin Idgunji, Ben Pei En Tsai, Jun (Alex) Gu, James Reilley, Thomas E. Dewey
Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.
Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.
Abstract: Embodiments of the present invention provide a novel solution which can be used to detect and analyze instances of micro stutter within a given game, GPU and/or driver version. Embodiments of the present invention may be operable to divide an application session into a set of sub-sessions and perform multiple derivative calculations on time-varying application parameters (e.g., frame rates) measured during each sub-session. Embodiments of the present invention may also be operable to generate separate histograms for each derivative calculation performed. As such, based on calculations performed, embodiments of the present invention may synchronously increment histogram bins representing a corresponding range of performance in real-time. Upon the completion of the application session, sub-session histograms may be compressed and then saved into a log which can be fetched and uploaded to a host computer system for aggregation and storage into a database for server-side optimization analysis.
Abstract: A linear regulator for applications with low area constraint resulting in limited load decoupling capacitance that introduces a compensating zero in the regulator loop to counteract the loss of phase margin and further introduces a feed-forward noise cancellation path operating over a wide frequency range covering a first package resonance frequency. The feed-forward path has low power consumption and improves the power-supply rejection ratio.
Abstract: The present disclosure provides methods, systems, and computer program products that use embeddings of candidate variation information and deep learning models to accurately and efficiently detect variations in biopolymer sequencing data, particularly suboptimal sequencing data.
Type:
Application
Filed:
May 13, 2019
Publication date:
November 19, 2020
Applicant:
NVIDIA Corporation
Inventors:
Nikolai YAKOVENKO, Johnny ISRAELI, Avantika LAL, Michael VELLA, Zhen HU
Abstract: A gaze tracking system for use in head mounted displays includes an eyepiece having an opaque frame circumferentially enclosing a transparent field of view, light emitting diodes coupled to the opaque frame for emitting infrared light onto various regions of an eye gazing through the transparent field of view, and diodes for sensing intensity of infrared light reflected off of various regions of the eye.
Type:
Grant
Filed:
September 20, 2019
Date of Patent:
November 17, 2020
Assignee:
NVIDIA Corp.
Inventors:
Eric Whitmire, Kaan Aksit, Michael Stengel, Jan Kautz, David Luebke, Ben Boudaoud
Abstract: The disclosure provides a virtual view broadcaster, a cloud-based renderer, and a method of providing stereoscopic images. In one embodiment, the method includes (1) generating a monoscopic set of rendered images and (2) converting the set of rendered images into a stereoscopic pair of images employing depth information from the monoscopic set of rendered images and raymarching.
Abstract: A method for displaying a near-eye light field display (NELD) image is disclosed. The method comprises determining a pre-filtered image to be displayed, wherein the pre-filtered image corresponds to a target image. It further comprises displaying the pre-filtered image on a display. Subsequently, it comprises producing a near-eye light field after the pre-filtered image travels through a microlens array adjacent to the display, wherein the near-eye light field is operable to simulate a light field corresponding to the target image. Finally, it comprises altering the near-eye light field using at least one converging lens, wherein the altering allows a user to focus on the target image at an increased depth of field at an increased distance from an eye of the user and wherein the altering increases spatial resolution of said target image.
Abstract: This disclosure relates to a receiver comprising a clock and data recovery loop and a phase offset loop. The clock and data recovery loop may be controlled by a sum of gradients for a plurality of data interleaves. The phase offset loop may be controlled by an accumulated differential gradient for each of the data interleaves.
Abstract: Point cloud registration sits at the core of many important and challenging 3D perception problems including autonomous navigation, object/scene recognition, and augmented reality (AR). A new registration algorithm is presented that achieves speed and accuracy by registering a point cloud to a representation of a reference point cloud. A target point cloud is registered to the reference point cloud by iterating through a number of cycles of an EM algorithm where, during an Expectation step, each point in the target point cloud is associated with a node of a hierarchical tree data structure and, during a Maximization step, an estimated transformation is determined based on the association of the points with corresponding nodes of the hierarchical tree data structure. The estimated transformation is determined by solving a minimization problem associated with a sum, over a number of mixture components, over terms related to a Mahalanobis distance.
Type:
Grant
Filed:
March 12, 2019
Date of Patent:
November 3, 2020
Assignee:
NVIDIA Corporation
Inventors:
Benjamin David Eckart, Kihwan Kim, Jan Kautz