Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.
Type:
Grant
Filed:
January 23, 2019
Date of Patent:
April 23, 2024
Assignee:
NVIDIA CORP.
Inventors:
Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
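As a rough software illustration of the pairing step described in the abstract above, the sketch below matches non-zero activations and weights held in compacted (index, value) form. It assumes a plain sparse dot product in which a pair is reducible when both entries share the same index. The names `match_pairs` and `sparse_dot` and the data layout are hypothetical and not taken from the patent, and the sequential merge here only stands in for the parallel search the PDU is said to perform.

```python
def match_pairs(acts, wts):
    """Return (index, activation, weight) triples for indices present in both
    compacted arrays. Each input is a list of (index, value) sorted by index.
    Assumption: a pair is 'reducible' when the two indices are equal."""
    pairs = []
    i = j = 0
    while i < len(acts) and j < len(wts):
        a_idx, a_val = acts[i]
        w_idx, w_val = wts[j]
        if a_idx == w_idx:
            # Both operands are non-zero at this index: keep the pair.
            pairs.append((a_idx, a_val, w_val))
            i += 1
            j += 1
        elif a_idx < w_idx:
            i += 1  # activation has no matching weight
        else:
            j += 1  # weight has no matching activation
    return pairs


def sparse_dot(acts, wts):
    """Multiply and accumulate only the matched pairs."""
    return sum(a * w for _, a, w in match_pairs(acts, wts))


# Hypothetical compacted arrays: zeros are simply absent.
acts = [(0, 2.0), (3, 1.5), (7, -1.0)]
wts = [(3, 4.0), (5, 0.5), (7, 2.0)]
print(match_pairs(acts, wts))
print(sparse_dot(acts, wts))
```

Only indices 3 and 7 appear in both arrays, so only those two products contribute to the sum.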
Abstract: A multi-rank system includes multiple circuit ranks communicating over a common data line to multiple data receivers, each corresponding to one or more of the ranks and each having a corresponding reference voltage generator and clock timing adjustment circuit, such that a rank to communicate on the shared data line is switched without reconfiguring outputs of either the reference voltage generators or the clock timing adjustment circuits.
Type:
Grant
Filed:
April 27, 2022
Date of Patent:
April 23, 2024
Assignee:
NVIDIA CORP.
Inventors:
Wen-Hung Lo, Michael Ivan Halfen, Abhishek Dhir, Jaewon Lee
Abstract: Methods of operating a serial data bus divide series of data bits into sequences of one or more bits and encode the sequences as N-level symbols, which are then transmitted at multiple discrete voltage levels. These methods may be utilized to communicate over serial data lines to improve bandwidth and reduce crosstalk and other sources of noise.
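The encoding idea in the abstract above can be sketched in a few lines. This example assumes fixed-length groups of bits, a plain binary mapping from each group to a symbol, and evenly spaced voltage levels; the function name `encode_pam`, its parameters, and the voltage range are hypothetical, and the patented methods may group and map bits differently.

```python
def encode_pam(bits, bits_per_symbol=2, v_min=0.0, v_max=1.0):
    """Split a bit sequence into groups and map each group to one of
    2**bits_per_symbol evenly spaced voltage levels.
    Assumption: plain binary mapping (no Gray coding or other remapping)."""
    if len(bits) % bits_per_symbol != 0:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    levels = 2 ** bits_per_symbol
    step = (v_max - v_min) / (levels - 1)
    symbols = []
    for k in range(0, len(bits), bits_per_symbol):
        value = 0
        for b in bits[k:k + bits_per_symbol]:
            value = (value << 1) | b  # most significant bit first
        symbols.append(value)
    voltages = [v_min + s * step for s in symbols]
    return symbols, voltages


# Eight bits become four 4-level symbols.
symbols, voltages = encode_pam([1, 0, 0, 1, 1, 1, 0, 0], v_max=3.0)
print(symbols)
print(voltages)
```

With two bits per symbol, the line carries half as many symbols as bits, which is the bandwidth benefit the abstract refers to.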
Abstract: Techniques are described for detecting an electromagnetic (“EM”) fault injection attack directed toward circuitry in a target digital system. In various embodiments, a first node may be coupled to first driving circuitry, and a second node may be coupled to second driving circuitry. The driving circuitry is implemented in a manner such that a logic state on the second node has greater sensitivity to an EM pulse than has a logic state on the first node. Comparison circuitry may be coupled to the first and to the second nodes to assert an attack detection output responsive to sensing a logic state on the second node that is unexpected relative to a logic state on the first node.
Abstract: A circuit for improving control over asynchronous signal crossings during circuit scan tests includes multiple scan registers and a decoder configured to translate a combined output of the scan registers into multiple one-hot controls to the local clock gates of scan registers disposed in multiple different clock domains. Programmable registers are provided to selectively enable and disable the local clock gates of the different clock domains.
Type:
Grant
Filed:
September 16, 2022
Date of Patent:
March 26, 2024
Assignee:
NVIDIA CORP.
Inventors:
Mahmut Yilmaz, Vinod Pagalone, Munish Aggarwal, Doochul Shin
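A small behavioral model may help picture the decoder described in the abstract above. It assumes the combined scan register output is read as a binary index selecting one clock domain, and that the programmable registers act as a per-domain enable mask; both are assumptions made for illustration, and the name `one_hot_clock_enables` is hypothetical.

```python
def one_hot_clock_enables(scan_bits, domain_enable_mask):
    """Decode scan register bits into one-hot clock-gate controls.
    Assumption: scan_bits (MSB first) select a single clock domain, and
    domain_enable_mask models programmable per-domain enable registers."""
    n_domains = 2 ** len(scan_bits)
    if len(domain_enable_mask) != n_domains:
        raise ValueError("mask must have one entry per decoded domain")
    index = 0
    for b in scan_bits:
        index = (index << 1) | b
    # Exactly one decoded line is high before masking.
    one_hot = [1 if d == index else 0 for d in range(n_domains)]
    # A domain's clock gate opens only if it is selected and enabled.
    return [h & m for h, m in zip(one_hot, domain_enable_mask)]


print(one_hot_clock_enables([1, 0], [1, 1, 1, 1]))
print(one_hot_clock_enables([1, 0], [1, 1, 0, 1]))
```

Because at most one gate is open at a time in this model, registers in different clock domains never capture in the same cycle.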
Abstract: A circuit for improving control over asynchronous signal crossings during circuit scan tests includes multiple scan registers and a decoder configured to translate a combined output of the scan registers into multiple one-hot controls to the local clock gates of scan registers disposed in multiple different clock domains. Programmable registers are provided to selectively enable and disable the local clock gates of the different clock domains.
Type:
Application
Filed:
September 16, 2022
Publication date:
March 21, 2024
Applicant:
NVIDIA Corp.
Inventors:
Mahmut Yilmaz, Vinod Pagalone, Munish Aggarwal, Doochul Shin
Abstract: A transceiver circuit includes a receiver front end utilizing a ring oscillator, and a transmitter front end utilizing a pass-gate circuit in a first feedback path across a last-stage driver circuit. The transceiver circuit provides low impedance at low frequency and high impedance at high frequency, and desirable peaking behavior.
Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
Type:
Grant
Filed:
February 24, 2021
Date of Patent:
March 19, 2024
Assignee:
NVIDIA CORP.
Inventors:
Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
Abstract: A ring oscillator circuit with a frequency that is sensitive to the timing of a clock-to-Q (clk2Q) propagation delay of one or more flip-flops utilized in the ring oscillator. The clk2Q delay is the time between the clock signal arriving at the clock pin of the flip-flop and the Q output reflecting the state of the input data signal to the flip-flop. Clk2Q delay measurements are made based on measurement of the ring oscillator frequency, leading to more accurate estimates of clk2Q for different types of flip-flops and flip-flop combinations, which may in turn enable improvements in circuit layouts, performance, and area.
Abstract: Stacked voltage domain level shifting circuits for shifting signals low-to-high or high-to-low include a storage cell powered by a mid-range supply rail of the stacked voltage domain level shifting circuit, and control drivers powered by moving supply voltages generated by the storage cell, wherein the control drivers are coupled to drive gates of common-source configured devices coupled to storage nodes of the storage cell.
Type:
Application
Filed:
September 14, 2022
Publication date:
January 25, 2024
Applicant:
NVIDIA Corp.
Inventors:
Walker Joseph Turner, John Poulton, Sanquan Song
Abstract: Level-shifting circuits utilizing storage cells for shifting signals low-to-high or high-to-low include control drivers with moving supply voltages. The moving supply voltages may power positive or negative supply terminals of the control drivers. The control drivers drive gates of common-source configured devices coupled to storage nodes of the storage cell.
Type:
Application
Filed:
September 14, 2022
Publication date:
January 25, 2024
Applicant:
NVIDIA Corp.
Inventors:
Walker Joseph Turner, John Poulton, Sanquan Song
Abstract: Stacked voltage domain level shifting circuits for shifting signals low-to-high or high-to-low include a storage cell and control drivers powered by a mid-range supply rail of the stacked voltage domain level shifting circuit, wherein the control drivers are coupled to drive common-source configured devices coupled to storage nodes of the storage cell.
Type:
Application
Filed:
September 14, 2022
Publication date:
January 25, 2024
Applicant:
NVIDIA Corp.
Inventors:
Walker Joseph Turner, John Poulton, Sanquan Song
Abstract: A multi-rank circuit system utilizing a shared IO channel includes a first stage of multiple selectors coupled to input multiple digital busses, and a second stage including one or more selectors coupled to receive outputs of the first stage of selectors and to individually select one of the outputs of the first stage of selectors to one or more control circuits for IO circuits of the ranks. The system switches one of the ranks to be an active rank on the shared IO channel, and operates the first stage of selectors to select one of the digital busses to the second stage of selectors in advance of switching a next active rank to the shared IO channel.
Type:
Grant
Filed:
April 27, 2022
Date of Patent:
January 23, 2024
Assignee:
NVIDIA CORP.
Inventors:
Jiwang Lee, Jaewon Lee, Hsuche Nee, Po-Chien Chiang, Wen-Hung Lo, Abhishek Dhir, Michael Ivan Halfen, Chunjen Su
Abstract: A circuit mask optimizer utilizes a Convolutional Fourier Neural Operator (CFNO) to efficiently learn layout tile dependencies, enabling stitch-less large-scale mask optimization with limited intervention of legacy tools. Litho-guided self-training via a trained machine learning model provides non-convex optimization, enabling iterative model and dataset refinements at a substantial performance improvement over conventional solutions.
Abstract: A transceiver configured to communicate a burst of data bits and meta-data bits for the data bits includes data channels, auxiliary data channels, and at least one error correction channel. The transceiver includes an encoder that applies 11b7s encoding to a first number of the data bits to generate first PAM-3 symbols on some or all of the communication channels, and that applies 3b2s encoding to a second number of the data bits to generate second PAM-3 symbols on at least some of the communication channels.
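The code names in the abstract above follow from simple counting: two three-level symbols offer 9 combinations, enough for 3 bits (8 values), and seven offer 2187, enough for 11 bits (2048 values). The sketch below illustrates that capacity argument with a base-3 conversion. This mapping is an assumption chosen for illustration; the abstract does not state the actual 3b2s or 11b7s code tables, and the function names are hypothetical.

```python
def bits_to_pam3(bits, n_symbols):
    """Map a bit group (MSB first) to n_symbols PAM-3 levels in {-1, 0, +1}.
    Assumption: plain base-3 conversion, not the patent's code table."""
    value = int("".join(str(b) for b in bits), 2)
    if value >= 3 ** n_symbols:
        raise ValueError("bit group does not fit in the given symbol count")
    trits = []
    for _ in range(n_symbols):
        trits.append(value % 3)
        value //= 3
    trits.reverse()
    return [t - 1 for t in trits]  # 0, 1, 2 -> -1, 0, +1


def pam3_to_bits(levels, n_bits):
    """Inverse of bits_to_pam3 under the same base-3 assumption."""
    value = 0
    for level in levels:
        value = value * 3 + (level + 1)
    return [(value >> k) & 1 for k in range(n_bits - 1, -1, -1)]


# Capacity check behind the names 3b2s and 11b7s.
assert 3 ** 2 >= 2 ** 3 and 3 ** 7 >= 2 ** 11

print(bits_to_pam3([1, 0, 1], 2))   # 3 bits on 2 symbols
print(bits_to_pam3([1] * 11, 7))    # 11 bits on 7 symbols
```

A practical code table would typically be chosen with additional goals in mind, so the base-3 mapping here should be read only as a demonstration that the bit groups fit.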
Abstract: To mitigate pulse shape degradation along a signal route, the signal is driven from two ends. One end of the route is loaded and the other is relatively unloaded. The loaded route and unloaded route may traverse two different metal layers on a printed circuit board. The two routes may thus be related such that the unloaded route has fewer RC distortion effects on the signal than does the loaded route.
Type:
Grant
Filed:
December 20, 2021
Date of Patent:
December 26, 2023
Assignee:
NVIDIA CORP.
Inventors:
Lalit Gupta, Andreas Jon Gotterba, Jesse Wang
Abstract: Layout techniques for circuits on substrates are disclosed that address the multivariate problem of minimizing routing distances for high-speed I/O pins between circuits while simultaneously providing for the rapid provision of transient power demands to the circuits. The layout techniques may also enable improved thermal management for the circuits.
Type:
Application
Filed:
September 6, 2023
Publication date:
December 21, 2023
Applicant:
NVIDIA Corp.
Inventors:
Shuo Zhang, Eric Zhu, Minto Zheng, Michael Zhai, Town Zhang, Jie Ma
Abstract: Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
Type:
Grant
Filed:
August 11, 2022
Date of Patent:
December 19, 2023
Assignee:
NVIDIA CORP.
Inventors:
Daniel Robert Johnson, Jack Choquette, Olivier Giroux, Michael Patrick McKeown, Mark Stephenson, Sana Damani
Abstract: Neural network-based structures for active user equipment device detection, estimation of time-of-arrival, and estimation of carrier frequency offset utilized with the narrowband physical random-access channel of wireless communication systems. The structure includes a neural network to generate predictions of active user equipment devices, and a twin neural network to generate time-of-arrival predictions for signals from the user equipment devices and carrier frequency offset predictions for signals from the user equipment devices.
Type:
Application
Filed:
March 24, 2023
Publication date:
November 23, 2023
Applicant:
NVIDIA Corp.
Inventors:
Faycal Ait Aoudia, Jakob Hoydis, Sebastian Cammerer, Matthijs Jules Van keirsbilck, Alexander Keller
Abstract: Voltage level conversion circuits include PMOS pull-down devices or NMOS pull-up devices, and inverters with outputs that determine gate voltages of these devices. The inverters are powered by moving supply voltages, for example complementary supply voltages generated from a pair of cross-coupled inverters. The cross-coupled inverters may implement a data storage latch with the moving supply voltages generated from the internal data storage nodes of the latch.
Type:
Grant
Filed:
July 25, 2022
Date of Patent:
November 21, 2023
Assignee:
NVIDIA CORP.
Inventors:
Walker Joseph Turner, John Poulton, Sanquan Song