Abstract: Apparatuses, systems, and techniques to infer a sequence of actions to perform using one or more neural networks trained, at least in part, by optimizing a probability distribution function using a cost function, wherein the probability distribution represents different sequences of actions that can be performed. In at least one embodiment, a model predictive control problem is formulated as a Bayesian inference task to infer a set of solutions.
Type:
Grant
Filed:
April 28, 2020
Date of Patent:
May 9, 2023
Assignee:
NVIDIA Corporation
Inventors:
Alexander Conrad Lambert, Adam Harper Fishman, Dieter Fox, Byron Boots, Fabio Tozeto Ramos
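The abstract above describes optimizing a distribution over action sequences with a cost function. A minimal sketch of that idea, assuming an MPPI-style sampling scheme in which the cost acts as a negative log-likelihood and sequences are reweighted by `exp(-cost)` (the function and parameter names are illustrative, not taken from the patent):

```python
import math
import random

def mppi_step(mean_seq, cost_fn, n_samples=64, sigma=0.5, temperature=1.0, seed=0):
    """Reweight sampled action sequences by exp(-cost/temperature),
    treating the cost as a negative log-likelihood (the Bayesian view)."""
    rng = random.Random(seed)
    horizon = len(mean_seq)
    samples, weights = [], []
    for _ in range(n_samples):
        # Perturb the current mean sequence with Gaussian noise.
        seq = [a + rng.gauss(0.0, sigma) for a in mean_seq]
        samples.append(seq)
        weights.append(math.exp(-cost_fn(seq) / temperature))
    total = sum(weights)
    # Posterior mean over sampled action sequences.
    return [sum(w * s[t] for w, s in zip(weights, samples)) / total
            for t in range(horizon)]

# Toy cost: drive each action toward 1.0.
cost = lambda seq: sum((a - 1.0) ** 2 for a in seq)
new_seq = mppi_step([0.0, 0.0, 0.0], cost)
```

Starting from an all-zero mean, the reweighted posterior mean shifts toward the low-cost region near 1.0.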
Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
Type:
Grant
Filed:
March 17, 2020
Date of Patent:
May 9, 2023
Assignee:
NVIDIA Corp.
Inventors:
Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
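The power-map construction above (amortizing each cell's power into the grid tiles it overlaps) can be sketched as follows; the even split across overlapped tiles is an illustrative assumption:

```python
def build_power_map(grid_w, grid_h, tile, cells):
    """Amortize each cell's power into the grid tiles that overlap it,
    producing one power map (one per sub-interval of the clock period)."""
    pmap = [[0.0] * grid_w for _ in range(grid_h)]
    for x0, y0, x1, y1, power in cells:
        # Tiles covered by the cell's bounding box.
        tiles = [(tx, ty)
                 for ty in range(int(y0 // tile), int((y1 - 1e-9) // tile) + 1)
                 for tx in range(int(x0 // tile), int((x1 - 1e-9) // tile) + 1)]
        share = power / len(tiles)       # amortize evenly across tiles
        for tx, ty in tiles:
            pmap[ty][tx] += share
    return pmap

# One cell spanning two tiles splits its 2.0 units of power evenly.
pm = build_power_map(2, 1, tile=10, cells=[(0, 0, 20, 10, 2.0)])
```

A stack of such maps, one per clock sub-interval, would form the input tensor applied to the neural network.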
Abstract: Through-hole mounted semiconductor assemblies are described. A printed circuit board (“PCB”) has first and second PCB sides and has a through hole therein. The through hole defines a hole area. A semiconductor package may be disposed in the hole area such that the semiconductor package is at least partially exposed on one or more of the first and the second PCB sides. Package contacts on the semiconductor package may be electrically coupled to PCB contacts disposed on one or more of the PCB sides. In some embodiments, one or more support structures may be coupled to the PCB and may touch the semiconductor package. In some embodiments, cooling devices may be placed in thermal communication with the semiconductor package on both sides of the PCB.
Type:
Grant
Filed:
February 3, 2021
Date of Patent:
May 9, 2023
Assignee:
NVIDIA Corporation
Inventors:
Joey Cai, Tiger Yan, Jacky Zhu, Oliver Yi
Abstract: A phase-locked loop (PLL) device includes a first phase detector to receive an in-phase reference clock and an in-phase feedback clock, the first phase detector to output a first phase error; a second phase detector to receive a quadrature reference clock and a quadrature feedback clock, the second phase detector to output a second phase error; a proportional path component to generate first current pulses from the first phase error and second current pulses from the second phase error; an integrator circuit coupled to the proportional path component, the integrator circuit to sum, within a current output signal, the first current pulses and the second current pulses; a ring oscillator to be driven by the current output signal; and a pair of phase interpolators coupled to an output of the ring oscillator, the pair of phase interpolators to respectively generate the in-phase feedback clock and the quadrature feedback clock.
Abstract: A system and method may determine whether a class of process (e.g., NN execution, cryptocurrency mining, graphics processing) is executing on a processor, or which class is executing, by calculating or determining features from execution telemetry or measurements collected from processors executing processes, and determining from at least a subset of the features the likelihood that the processor is executing the class of process. Execution telemetry may include data describing the execution of the process, or describing the hardware used to execute it, such as processor temperature or memory usage.
Type:
Application
Filed:
January 3, 2022
Publication date:
May 4, 2023
Applicant:
NVIDIA CORPORATION
Inventors:
Tamar VICLIZKI, Vadim GECHMAN, Ahmad SALEH, Bartley RICHARDSON, Gorkem BATMAZ, Avighan MAJUMDER, Vibhor AGRAWAL, Fang-Yi WANG, Douglas LUU
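A toy sketch of the telemetry-to-likelihood pipeline described above. The features, thresholds, and the mining heuristic are illustrative assumptions, not the patent's actual model:

```python
def extract_features(telemetry):
    """Summarize raw telemetry samples (e.g. utilization and memory-usage
    readings) into simple features. Names and fields are illustrative."""
    n = len(telemetry["gpu_util"])
    return {
        "mean_gpu_util": sum(telemetry["gpu_util"]) / n,
        "mean_mem_used": sum(telemetry["mem_used"]) / n,
    }

def likelihood_mining(features):
    """Toy score for the 'cryptocurrency mining' class: sustained high
    GPU utilization with modest memory use. Purely illustrative logic."""
    score = 0.0
    if features["mean_gpu_util"] > 0.9:
        score += 0.6
    if features["mean_mem_used"] < 0.3:
        score += 0.4
    return score

telemetry = {"gpu_util": [0.97, 0.99, 0.95], "mem_used": [0.10, 0.12, 0.11]}
score = likelihood_mining(extract_features(telemetry))
```

In practice the subset of features and the scoring function would be learned rather than hand-written.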
Abstract: A method dynamically selects one of a first sampling order and a second sampling order for a ray trace of pixels in a tile where the selection is based on a motion vector for the tile. The sampling order may be a bowtie pattern or an hourglass pattern.
Type:
Grant
Filed:
July 6, 2021
Date of Patent:
April 25, 2023
Assignee:
NVIDIA Corp.
Inventors:
Johan Pontus Andersson, Jim Nilsson, Tomas Guy Akenine-Möller
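The dynamic selection above can be sketched as a simple dispatch on the tile's motion vector. The two sample orders below are placeholders, not the patented bowtie/hourglass patterns:

```python
BOWTIE = [(0, 0), (1, 1), (0, 1), (1, 0)]      # illustrative orders only;
HOURGLASS = [(0, 1), (1, 0), (0, 0), (1, 1)]   # not the patented patterns

def select_sampling_order(motion_vector):
    """Pick a per-tile sampling order from the tile's motion vector:
    here, horizontal-dominant motion uses one pattern, vertical the other."""
    mx, my = motion_vector
    return BOWTIE if abs(mx) >= abs(my) else HOURGLASS

order = select_sampling_order((3.0, 1.0))
```

The key point is that the choice is made per tile, at trace time, from the tile's motion.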
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
Type:
Grant
Filed:
September 5, 2019
Date of Patent:
April 25, 2023
Assignee:
NVIDIA CORPORATION
Inventors:
Jerome F. Duluk, Jr., Gregory Scott Palmer, Jonathon Stuart Ramsey Evans, Shailendra Singh, Samuel H. Duncan, Wishwesh Anil Gandhi, Lacky V. Shah, Eric Rock, Feiqi Su, James Leroy Deming, Alan Menezes, Pranav Vaidya, Praveen Joginipally, Timothy John Purcell, Manas Mandal
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type:
Grant
Filed:
August 2, 2021
Date of Patent:
April 25, 2023
Assignee:
NVIDIA Corporation
Inventors:
Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Abstract: The computational scaling challenges of holographic displays are mitigated by techniques that introduce foveation into a wavefront recording planes approach to hologram generation. Spatial hashing is applied to organize the points or polygons of a display object into keys and values.
Type:
Grant
Filed:
July 23, 2020
Date of Patent:
April 25, 2023
Assignee:
Nvidia Corp.
Inventors:
Jui-Hsien Wang, Ward Lopes, Rachel Anastasia Brown, Peter Shirley
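The spatial-hashing step above (organizing display-object points into keys and values) can be sketched as quantizing each point to a cell key; the cell size and dictionary layout are illustrative:

```python
def spatial_hash(points, cell_size):
    """Bucket 3-D display-object points by quantized cell coordinate.
    Keys are grid cells; values are the points they contain."""
    buckets = {}
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        buckets.setdefault(key, []).append(p)
    return buckets

pts = [(0.1, 0.2, 0.0), (0.4, 0.1, 0.0), (1.6, 0.2, 0.0)]
buckets = spatial_hash(pts, cell_size=1.0)
```

Points in the same cell hash to the same key, so per-cell work (e.g. per wavefront recording plane region) can be batched.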
Abstract: In a self-driving autonomous vehicle, a controller architecture includes multiple processors within the same box. Each processor monitors the others and takes appropriate safe action when needed. Some processors may run dormant or low priority redundant functions that become active when another processor is detected to have failed. The processors are independently powered and independently execute redundant algorithms from sensor data processing to actuation commands using different hardware capabilities (GPUs, processing cores, different input signals, etc.). Intentional hardware and software diversity improves fault tolerance. The resulting fault-tolerant/fail-operational system meets ISO26262 ASIL-D specifications based on a single electronic controller unit platform that can be used for self-driving vehicles.
Type:
Grant
Filed:
November 22, 2021
Date of Patent:
April 25, 2023
Assignee:
NVIDIA Corporation
Inventors:
Mohammed Abdulla Yousuf, T. Y. Chan, Ram Ganapathi, Ashok Srinivasan, Mike Truog
Abstract: In various examples, lane location criteria and object class criteria may be used to determine a set of objects in an environment to track. For example, lane information, freespace information, and/or object detection information may be used to filter out or discard non-essential objects (e.g., objects that are not in an ego-lane or adjacent lanes) from objects detected using an object detection algorithm. Further, objects corresponding to non-essential object classes may be filtered out to generate a final filtered set of objects to be tracked that may be of a lower quantity than the actual number of detected objects. As a result, object tracking may only be executed on the final filtered set of objects, thereby decreasing compute requirements and runtime of the system without sacrificing object tracking accuracy and reliability with respect to more pertinent objects.
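The lane- and class-based filtering above can be sketched as a simple predicate over detections; the class names, lane labels, and record layout are illustrative assumptions:

```python
ESSENTIAL_CLASSES = {"car", "truck", "pedestrian"}            # illustrative
ESSENTIAL_LANES = {"ego", "left_adjacent", "right_adjacent"}  # illustrative

def filter_tracked_objects(detections):
    """Keep only detections in the ego or adjacent lanes whose class is
    worth tracking; everything else is discarded before the tracker runs."""
    return [d for d in detections
            if d["lane"] in ESSENTIAL_LANES and d["cls"] in ESSENTIAL_CLASSES]

detections = [
    {"id": 1, "cls": "car", "lane": "ego"},
    {"id": 2, "cls": "sign", "lane": "ego"},          # non-essential class
    {"id": 3, "cls": "car", "lane": "far_left"},      # non-essential lane
]
tracked = filter_tracked_objects(detections)
```

Only the filtered set is passed to the tracker, reducing compute and runtime.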
Abstract: A display device includes an array of LEDs, an array of LCD pixels, and a display controller. The display controller is configured to compensate for one or more sources of color variation in light produced by the LEDs. The display controller can determine a first color variation at a given LCD pixel based on the distance between the given LCD pixel and one or more LEDs. The display controller can also determine a second color variation at the given LCD pixel based on a current level supplied to the one or more LEDs. The display controller configures the given LCD pixel to filter light that is received from the one or more LEDs in a manner that reduces or eliminates either or both of the first and second color variations.
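A toy model of the two compensation sources described above (distance to the LEDs, and LED drive current). The linear falloff and multiplicative combination are illustrative assumptions, not the patent's model:

```python
def compensation_factor(pixel_xy, led_xy, led_current,
                        falloff=0.05, current_ref=1.0):
    """Combine a distance-based color variation and a drive-current-based
    variation into one filter scale for an LCD pixel."""
    dx = pixel_xy[0] - led_xy[0]
    dy = pixel_xy[1] - led_xy[1]
    dist = (dx * dx + dy * dy) ** 0.5
    distance_var = 1.0 + falloff * dist          # first source of variation
    current_var = led_current / current_ref      # second source of variation
    # The pixel attenuates incoming light to cancel both variations.
    return 1.0 / (distance_var * current_var)

scale = compensation_factor((3.0, 4.0), (0.0, 0.0), led_current=1.25)
```

The controller would program each LCD pixel with its own scale so the combined variation cancels out.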
Abstract: Apparatuses, systems, and techniques to receive, at one or more processors associated with an image signal processing (ISP) pipeline for a camera, an image generated using an image sensor of the camera, wherein the image comprises a plurality of channels associated with color information of the image; process, by the one or more processors, the plurality of channels of the image to generate a plurality of luminance and/or radiance values; generate, by the one or more processors, an updated version of the image using the plurality of luminance and/or radiance values; and output the updated version of the image.
Type:
Grant
Filed:
December 11, 2020
Date of Patent:
April 25, 2023
Assignee:
NVIDIA Corporation
Inventors:
Sean Midthun Pieper, Robin Brian Jenkin
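The channel-to-luminance step above can be sketched with Rec. 709 luma weights; that particular formula is one common choice, not one the abstract specifies:

```python
def luminance_map(rgb_image):
    """Collapse the color channels of each pixel into a luminance value
    using Rec. 709 luma weights (an illustrative choice of formula)."""
    return [[0.2126 * r + 0.7152 * g + 0.0722 * b
             for r, g, b in row]
            for row in rgb_image]

img = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
       [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
luma = luminance_map(img)
```

The resulting luminance (or radiance) values would then drive the generation of the updated image in the ISP pipeline.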
Abstract: Machine learning systems and methods that determine gaze direction by using face orientation information, such as facial landmarks, to modify eye direction information determined from images of the subject's eyes. System inputs include eye crops of the eyes of the subject, as well as face orientation information such as facial landmarks of the subject's face in the input image. Facial orientation information, or facial landmark information, is used to determine a coarse prediction of gaze direction as well as to learn a context vector of features describing subject face pose. The context vector is then used to adaptively re-weight the eye direction features determined from the eye crops. The re-weighted features are then combined with the coarse gaze prediction to determine gaze direction.
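A minimal sketch of the re-weighting described above: a face-pose context vector scales the eye-crop direction features, and the result is combined with the coarse landmark-based gaze. The feature shapes and the additive fusion are illustrative assumptions:

```python
def fuse_gaze(eye_features, context_vector, coarse_gaze):
    """Re-weight per-eye direction features with a face-pose context
    vector, then combine the result with the coarse gaze prediction."""
    reweighted = [f * w for f, w in zip(eye_features, context_vector)]
    # Reduce the re-weighted features to a 2-D (yaw, pitch) correction.
    correction = (sum(reweighted[0::2]), sum(reweighted[1::2]))
    return (coarse_gaze[0] + correction[0], coarse_gaze[1] + correction[1])

gaze = fuse_gaze(eye_features=[0.2, -0.1, 0.4, 0.3],
                 context_vector=[1.0, 0.5, 0.0, 1.0],
                 coarse_gaze=(0.1, 0.0))
```

In the described system both the context vector and the re-weighting would be learned, not hand-coded.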
Abstract: A computation graph is accessed. In the computation graph, operations to be performed are represented as interior nodes, inputs to the operations are represented as leaf nodes, and a result of the operations is represented as a root. Selected sets of the operations are combined to form respective kernels of operations. Code is generated to execute the kernels of operations. The code is executed to determine the result.
Type:
Grant
Filed:
January 16, 2018
Date of Patent:
April 18, 2023
Assignee:
NVIDIA Corporation
Inventors:
Mahesh Ravishankar, Vinod Grover, Evghenii Gaburov, Alberto Magni, Sean Lee
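The graph structure above (interior nodes as operations, leaves as inputs, root as the result) can be sketched directly; representing a fused kernel as a closure stands in for the code-generation step:

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(node, env):
    """Evaluate a computation graph: leaves are input names, interior
    nodes are (op, left, right) tuples, and the root yields the result."""
    if isinstance(node, str):            # leaf node: a named input
        return env[node]
    op, left, right = node               # interior node: an operation
    return OPS[op](evaluate(left, env), evaluate(right, env))

def fuse(node):
    """Stand-in for code generation: the whole subtree becomes one fused
    'kernel', here simply a closure evaluated in a single call."""
    return lambda env: evaluate(node, env)

# root = (a + b) * c
graph = ("*", ("+", "a", "b"), "c")
kernel = fuse(graph)
result = kernel({"a": 2, "b": 3, "c": 4})
```

Real code generation would emit compiled kernels for selected subtrees rather than interpret them.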
Abstract: In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one-dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load/store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms.
Type:
Grant
Filed:
April 28, 2016
Date of Patent:
April 18, 2023
Assignee:
NVIDIA CORPORATION
Inventors:
Ching Y. Hung, Jagadeesh Sankaran, Ravi P. Singh, Stanley Tzeng
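The address-pattern idea above can be sketched in software: expand a loop nest's shape and strides into one flat addressing sequence, so a single 1-D loop replaces the nest. The shape/stride parameterization is an illustrative simplification:

```python
def address_pattern(base, shape, strides):
    """Generate the flat addressing sequence of a multi-dimensional loop
    nest, so the nest can be replaced by one 1-D loop over addresses."""
    addrs = [base]
    for extent, stride in zip(shape, strides):
        addrs = [a + i * stride for a in addrs for i in range(extent)]
    return addrs

# A 2x3 row-major access at base address 100 with row stride 16.
addrs = address_pattern(base=100, shape=(2, 3), strides=(16, 1))
for addr in addrs:            # the collapsed one-dimensional loop
    pass                      # load/store at `addr`
```

In the accelerator, a hardware address generator produces this sequence so the vector and load/store units see only the 1-D stream.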
Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground-truth.
Type:
Grant
Filed:
April 22, 2021
Date of Patent:
April 18, 2023
Assignee:
NVIDIA CORPORATION
Inventors:
Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
Abstract: An augmented reality display system includes a first beam path for a foveal inset image on a holographic optical element, a second beam path for a peripheral display image on the holographic optical element, and pupil position tracking logic that generates control signals to set a position of the foveal inset as perceived through the holographic optical element, to determine the peripheral display image, and to control a moveable stage.
Type:
Grant
Filed:
July 6, 2021
Date of Patent:
April 18, 2023
Assignee:
NVIDIA Corp.
Inventors:
Jonghyun Kim, Youngmo Jeong, Michael Stengel, Morgan McGuire, David Luebke
Abstract: A fully-connected neural network may be configured for execution by a processor as a fully-fused neural network by limiting slow global memory accesses to reading and writing inputs to and outputs from the fully-connected neural network. The computational cost of a fully-connected neural network scales quadratically with its width, whereas its memory traffic scales linearly. Modern graphics processing units typically have much greater computational throughput compared with memory bandwidth, so that for narrow, fully-connected neural networks, the linear memory traffic is the bottleneck. The key to improving performance of the fully-connected neural network is to minimize traffic to slow “global” memory (off-chip memory and high-level caches) and to fully utilize fast on-chip memory (low-level caches, “shared” memory, and registers), which is achieved by the fully-fused approach.
Type:
Grant
Filed:
June 7, 2021
Date of Patent:
April 18, 2023
Assignee:
NVIDIA Corporation
Inventors:
Thomas Müller, Nikolaus Binder, Fabrice Pierre Armand Rousselle, Jan Novák, Alexander Georg Keller
Abstract: A transceiver circuit includes a receiver front end utilizing a ring oscillator, and a transmitter front end utilizing a pass-gate circuit in a first feedback path across a last-stage driver circuit. The transceiver circuit provides low impedance at low frequency and high impedance at high frequency, and desirable peaking behavior.