Abstract: In various examples, one or more components or regions of a processing unit—such as a processing core, and/or component thereof—may be tested for faults during deployment in the field. To perform testing while in deployment, the state of a component subject to test may be retrieved and/or stored during the test to maintain state integrity, the component may be clamped to communicatively isolate the component from other components of the processing unit, a test vector may be applied to the component, and the output of the component may be compared against an expected output to determine if any faults are present. The state of the component may be restored after testing, and the clamp removed, thereby returning the component to its operating state without a perceivable detriment to operation of the processing unit in deployment.
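The save-clamp-test-restore sequence described above can be illustrated with a minimal sketch; every class, field, and value here is hypothetical and stands in for the hardware mechanisms, not the patented design.

```python
# Hypothetical sketch of the in-field test sequence: save state,
# clamp (isolate), apply a test vector, compare, restore.

class Component:
    def __init__(self):
        self.state = {"reg": 7}      # architectural state to preserve
        self.clamped = False

    def process(self, vector):
        # Stand-in for the component's real logic under test.
        return sum(vector) ^ self.state["reg"]

def run_field_test(comp, test_vector, expected):
    saved = dict(comp.state)         # retrieve/store state for integrity
    comp.clamped = True              # communicatively isolate the component
    try:
        observed = comp.process(test_vector)
        fault_detected = observed != expected
    finally:
        comp.state = saved           # restore state after testing
        comp.clamped = False         # remove the clamp
    return fault_detected

comp = Component()
# Expected output computed for a known-good component.
print(run_field_test(comp, [1, 2, 3], (1 + 2 + 3) ^ 7))  # False: no fault
```

The `try/finally` mirrors the requirement that the component return to its operating state regardless of the test outcome.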
Abstract: In a ray tracer, a cache for streaming workloads groups ray requests for coherent successive bounding volume hierarchy traversal operations by sending common data down an attached data path to all ray requests in the group at the same time or about the same time. Grouping the requests provides good performance with a smaller number of cache lines.
Type:
Grant
Filed:
April 20, 2023
Date of Patent:
October 22, 2024
Assignee:
NVIDIA Corporation
Inventors:
Gregory A. Muthler, Timo Aila, Tero Karras, Samuli Laine, William Parsons Newhall, Jr., Ronald Charles Babich, Jr., John Burgess, Ignacio Llamas
Abstract: Apparatuses, systems, and techniques to optimize processor performance. In at least one embodiment, a method increases an operation voltage of one or more processors based, at least in part, on one or more error rates of the one or more processors.
Type:
Grant
Filed:
June 23, 2022
Date of Patent:
October 22, 2024
Assignee:
NVIDIA Corporation
Inventors:
Benjamin D. Faulkner, Padmanabhan Kannan, Srinivasan Raghuraman, Peng Cheng Shen, Divya Ramakrishnan, Swanand Santosh Bindoo, Sreedhar Narayanaswamy, Amey Y. Marathe
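The error-rate-driven voltage adjustment in the abstract above can be sketched as a simple control rule; the threshold, step size, and voltage limit below are invented placeholders, not values from the patent.

```python
# Illustrative sketch: raise operating voltage when an observed error
# rate exceeds a threshold, capped at a maximum safe voltage.

def adjust_voltage(voltage_mv, error_rate, threshold=1e-6,
                   step_mv=10, max_mv=1100):
    """Return a new operating voltage (mV) based on the error rate."""
    if error_rate > threshold:
        return min(voltage_mv + step_mv, max_mv)  # errors too frequent: step up
    return voltage_mv                             # error rate acceptable

print(adjust_voltage(1000, 5e-6))  # 1010: voltage increased
print(adjust_voltage(1000, 1e-9))  # 1000: unchanged
```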
Abstract: State information can be determined for a subject that is robust to different inputs or conditions. For drowsiness, facial landmarks can be determined from captured image data and used to determine a set of blink parameters. These parameters can be used, such as with a temporal network, to estimate a state (e.g., drowsiness) of the subject. To improve robustness, an eye state determination network can determine eye state from the image data, without reliance on intermediate landmarks, that can be used, such as with another temporal network, to estimate the state of the subject. A weighted combination of these values can be used to determine an overall state of the subject. To improve accuracy, individual behavior patterns and context information can be utilized to account for variations in the data due to subject variation or current context rather than changes in state.
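The weighted combination of the two drowsiness estimates can be sketched as follows; the scores and the weight are placeholders standing in for the outputs of the two temporal networks described above.

```python
# Hedged sketch: fuse a blink-parameter-based estimate with an
# eye-state-network estimate via a weighted combination.

def fuse_drowsiness(blink_score, eye_state_score, w_blink=0.6):
    """Weighted combination of two drowsiness estimates in [0, 1]."""
    assert 0.0 <= w_blink <= 1.0
    return w_blink * blink_score + (1.0 - w_blink) * eye_state_score

overall = fuse_drowsiness(0.8, 0.5)
print(round(overall, 2))  # 0.68
```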
Abstract: Apparatuses, systems, and techniques for real-time persistent object tracking for intelligent video analytics systems. A state of a first object included in an environment may be tracked based on a first set of images depicting the environment. The first set of images may be generated during a first time period. It may be determined that the first object is not detected in the environment depicted in a second set of images. The second set of images may be generated during a second time period that is subsequent to the first time period. One or more predicted future states of the first object may be obtained in view of the state of the first object in the environment depicted in the first set of images. A second object may be detected in the environment depicted in a third set of images generated during a third time period that is subsequent to the second time period.
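A constant-velocity motion model gives a minimal illustration of the "predicted future states" idea above: when an object vanishes from detections, its last state is propagated so a later detection can be matched back to the same track. The state layout, time step, and match radius are all assumptions.

```python
# Illustrative sketch: propagate a lost track's state forward and
# re-link a later detection to it by proximity.

def predict(state, dt):
    """Constant-velocity prediction: state is (x, y, vx, vy)."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

def matches(predicted, detection, radius=5.0):
    """Does a new detection fall within the gate of a predicted state?"""
    px, py, _, _ = predicted
    dx, dy = detection
    return (px - dx) ** 2 + (py - dy) ** 2 <= radius ** 2

last_seen = (10.0, 10.0, 1.0, 0.0)   # state from the first image set
pred = predict(last_seen, dt=3.0)    # object missing in the second set
print(matches(pred, (13.5, 10.2)))   # True: third-set detection re-linked
```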
Abstract: Systems and methods of compressing video data are disclosed. The proposed systems provide a computer-implemented process configured to classify a person's behavior(s) during a video and encode the behaviors as a representation of the video. When playback of the video is requested, a reconstruction of the video is generated by a video synthesizer based on a reference image of the person and the sequence of codes corresponding to their behavior during the video. Storage and transmission of the video can then be limited to the reference image and the behavioral codes rather than the video file itself, significantly reducing memory and bandwidth requirements.
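The reference-image-plus-behavior-codes scheme above can be sketched at a toy scale; the codebook and the string-based "synthesizer" below are stand-ins for the learned models a real system would use.

```python
# Minimal sketch: store one reference image plus a byte of behavior
# code per frame instead of full video frames.

BEHAVIORS = {"neutral": 0, "smile": 1, "talk": 2, "nod": 3}
DECODE = {v: k for k, v in BEHAVIORS.items()}

def encode(behavior_sequence):
    """Map a per-frame behavior sequence to a compact byte string."""
    return bytes(BEHAVIORS[b] for b in behavior_sequence)

def synthesize(reference_image, codes):
    """Reconstruct a frame description per code from the reference image."""
    return [f"{reference_image}+{DECODE[c]}" for c in codes]

codes = encode(["neutral", "talk", "talk", "smile"])
print(len(codes))                       # 4 bytes instead of 4 frames
print(synthesize("ref.png", codes)[3])  # 'ref.png+smile'
```

Storage and transmission then scale with the code sequence, not the frame data, which is the bandwidth saving the abstract describes.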
Abstract: In various examples, a user may access or acquire an application to download to the user's local computing device. Upon accessing the application, a local instance of the application may begin downloading to the computing device, and the user may be given the option to play a cloud-hosted instance of the application. If the user selects to play a hosted instance of the application, the cloud-hosted instance of the application may begin streaming while the local instance of the application downloads to the user's computing device in the background. Application state data may be stored and associated with the user during gameplay such that, once the local instance of the application has downloaded, the user may switch from the hosted instance of the application to the local instance to begin playing locally, with the application state information accounted for.
Abstract: In various examples, game session audio data—e.g., representing speech of users participating in the game—may be monitored and/or analyzed to determine whether inappropriate language is being used. Where inappropriate language is identified, the portions of the audio corresponding to the inappropriate language may be edited or modified such that other users do not hear the inappropriate language. As a result, toxic behavior or language within instances of gameplay may be censored—thereby enhancing the user experience and making online gaming environments safer for more vulnerable populations. In some embodiments, the inappropriate language may be reported—e.g., automatically—to the game developer or game application host in order to suspend, ban, or otherwise manage users of the system that have a proclivity for toxic behavior.
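Editing the flagged audio spans can be illustrated by muting sample ranges a classifier marked as inappropriate; the sample rate, span format, and toy data are invented for the sketch.

```python
# Hypothetical sketch: zero out samples inside flagged time spans so
# other users do not hear the flagged language.

def censor(samples, flagged_spans, rate=4):
    """Mute [start_s, end_s) spans (in seconds) of a mono sample list."""
    out = list(samples)
    for start_s, end_s in flagged_spans:
        for i in range(int(start_s * rate), min(int(end_s * rate), len(out))):
            out[i] = 0
    return out

audio = [1] * 12                    # 3 seconds at 4 samples/second, toy data
print(censor(audio, [(1.0, 2.0)]))  # samples 4..7 muted
```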
Abstract: Devices and methods to update semiconductor components are disclosed. In at least one embodiment, a device updates semiconductor components independent of a semiconductor component operational state.
Type:
Grant
Filed:
January 21, 2021
Date of Patent:
October 22, 2024
Assignee:
NVIDIA Corporation
Inventors:
Ryan Albright, William Andrew Mecham, Michael Thompson, Aaron Richard Carkin, William Ryan Weese, Benjamin Goska
Abstract: Systems and methods for cooling a datacenter are disclosed. In at least one embodiment, one or more outlet reservoirs are associated with a stabilizing subsystem and a rack so that the one or more outlet reservoirs can receive two-phase fluid output from a plurality of cold plates of the rack, and so that the stabilizing subsystem can stabilize a quality factor of the two-phase fluid to a predetermined quality factor before heat is removed from the two-phase fluid and it is cycled back to such cold plates.
Abstract: Systems and methods for a datacenter cooling system are disclosed. In at least one embodiment, reconfigurable terminations are provided for fluid loops in a datacenter cooling system, with individual ones of such reconfigurable terminations to be configured in a first state to enable non-cooling fluid runs through individual ones of such fluid loops, taken individually and in combination, during commissioning of a datacenter cooling system, and to be configured in a second state to enable cooling fluid runs to cool at least one cold plate after commissioning of a datacenter cooling system.
Abstract: Apparatuses, systems, and techniques to identify at least one physical characteristic of materials from computer simulations of manipulations of materials. In at least one embodiment, physical characteristics are determined by comparing measured statistics of observed manipulations to simulations of manipulations using a simulator trained with a likelihood-free inference engine.
Abstract: Disclosed are apparatuses, systems, and techniques that improve efficiency and decrease latency of processing of authorization requests by cloud-based access servers that evaluate access rights to access various cloud-based services. The techniques include but are not limited to generating and processing advanced authorization requests that anticipate future authorization requests that may be generated by cloud-based services. The techniques further include processing of frequently accessed policies and policy data dependencies and preemptive generation and processing of authorization requests that are replicated from existing authorization requests.
Abstract: Apparatuses, systems, and techniques to parallelize operations in one or more programs with data copies from global memory to shared memory in each of the one or more programs. In at least one embodiment, a program performs operations on shared data, asynchronously copies the shared data to shared memory, and continues performing additional operations in parallel while the copy proceeds, until an indicator provided by an application programming interface that facilitates parallel computing, such as CUDA, informs said program that the shared data has been copied to shared memory.
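The copy-overlap pattern above can be illustrated in plain Python rather than CUDA: start an asynchronous copy, keep doing independent work, and only wait on the completion indicator when the copied data is needed. The thread pool and toy workload are stand-ins for the GPU mechanisms.

```python
# Illustrative sketch of overlapping an asynchronous copy with
# independent computation, waiting on a completion indicator last.

from concurrent.futures import ThreadPoolExecutor

def copy_to_shared(data):
    return list(data)                 # stand-in for a global->shared copy

with ThreadPoolExecutor() as pool:
    future = pool.submit(copy_to_shared, range(4))  # async copy begins
    independent = sum(i * i for i in range(10))     # overlapped work
    shared = future.result()          # "indicator": block until copy done

print(independent, shared)            # 285 [0, 1, 2, 3]
```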
Abstract: According to an aspect of an embodiment, operations may comprise obtaining a pose graph that comprises a plurality of nodes. The operations may also comprise dividing the pose graph into a plurality of pose subgraphs, each pose subgraph comprising one or more respective pose subgraph interior nodes and one or more respective pose subgraph boundary nodes. The operations may also comprise generating one or more boundary subgraphs based on the plurality of pose subgraphs, each of the one or more boundary subgraphs comprising one or more respective boundary subgraph boundary nodes and comprising one or more respective boundary subgraph interior nodes. The operations may also comprise obtaining an optimized pose graph by performing a pose graph optimization. The pose graph optimization may comprise performing a pose subgraph optimization of the plurality of pose subgraphs and performing a boundary subgraph optimization of the plurality of boundary subgraphs.
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two-point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type:
Grant
Filed:
August 2, 2021
Date of Patent:
October 15, 2024
Assignee:
NVIDIA Corporation
Inventors:
Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Abstract: Disclosed are apparatuses, systems, and techniques that improve efficiency and decrease latency of processing of authorization requests by a cloud service. The techniques include obtaining, from an access server, a snapshot associated with processing an authorization request to evaluate an access to a resource of the cloud service and generating, using the snapshot, preemptive authorization requests by modifying the authorization request with a new user identity or a new resource identity. The techniques further include receiving, from the cloud service, a subsequent authorization request to evaluate an authorization of a user to access a particular resource of the cloud service, determining that the subsequent authorization request corresponds to one of the preemptive authorization requests, and providing, to the cloud service, an authorization response for the user to access the resource, based on evaluation of that preemptive authorization request.
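The preemptive-evaluation idea above can be sketched as precomputing results for anticipated (user, resource) pairs and answering matching future requests from a cache; the policy function and identities below are invented placeholders.

```python
# Sketch: evaluate anticipated authorization requests ahead of time by
# varying the user and resource identities of a seed request.

def evaluate_policy(user, resource):
    # Stand-in for the real access-server policy evaluation.
    return user == "alice" and resource.startswith("bucket/")

def precompute(seed_user, seed_resource, other_users, other_resources):
    cache = {}
    for u in [seed_user] + other_users:              # vary user identity
        for r in [seed_resource] + other_resources:  # vary resource identity
            cache[(u, r)] = evaluate_policy(u, r)
    return cache

cache = precompute("alice", "bucket/a", ["bob"], ["bucket/b"])
# A later matching request is answered from the cache instead of
# re-running policy evaluation, which is the latency saving described.
print(cache[("alice", "bucket/b")])  # True
print(cache[("bob", "bucket/a")])    # False
```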
Abstract: Apparatuses, systems, and techniques to perform a K-nearest-neighbor query. In at least one embodiment, a set of bounding boxes corresponding to a set of primitives is generated that allows the query to be solved using light transport simulation acceleration features of a GPU.
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type:
Grant
Filed:
December 12, 2023
Date of Patent:
October 15, 2024
Assignee:
NVIDIA Corporation
Inventors:
William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
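The quotient/remainder decomposition in the log-format addition abstract above lends itself to a small worked example. Assume values of the form 2^(e/N) with integer exponent e and N fractional steps per octave (N is an assumption of this sketch): decomposing e = q·N + r, grouping terms by remainder r, summing the integer powers 2^q per group, and scaling each partial sum by 2^(r/N) reproduces the exact sum.

```python
# Worked sketch of log-format addition via exponent decomposition:
# sum of 2^(e/N) terms = sum over r of 2^(r/N) * (sum of 2^q in group r).

def log_format_sum(exponents, n=4):
    partial = {}                      # remainder r -> partial sum of 2^q
    for e in exponents:
        q, r = divmod(e, n)           # decompose exponent: e = q*n + r
        partial[r] = partial.get(r, 0) + 2 ** q
    # Multiply each partial sum by its remainder factor and accumulate.
    return sum(s * 2 ** (r / n) for r, s in partial.items())

exps = [8, 8, 5]                      # values 2^2, 2^2, 2^(5/4)
print(log_format_sum(exps))           # 4 + 4 + 2**1.25 ≈ 10.378
```

The partial sums involve only integer powers, which is where the hardware saving comes from: the costly non-integer scaling happens once per remainder group rather than once per value.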
Abstract: According to an aspect of an embodiment, operations may comprise receiving sensor data from one or more vehicles, determining, by combining the received sensor data, a high definition map comprising a point cloud, and labeling one or more objects in the point cloud. The operations may also comprise generating training data by receiving a new image captured by one of the vehicles, receiving a pose of the vehicle when the new image was captured, determining an object having a label in the point cloud that is observable from the pose of the vehicle, determining a position of the object in the new image, and labeling the new image by assigning the label of the object to the new image, the labeled new image comprising the training data. The operations may also comprise training a deep learning model using the training data.
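Determining whether a labeled map object is observable from a vehicle pose, and where it lands in the new image, can be sketched with a pinhole projection; the pose, intrinsics, and label below are invented, and a real system would also handle rotation and occlusion.

```python
# Hedged sketch: project a labeled 3D map point into pixel coordinates
# for a camera at the capturing vehicle's pose (translation only here).

def project(point_world, pose_xyz, focal=100.0, cx=320.0, cy=240.0):
    """Project a world point into the image plane of a camera at pose."""
    x = point_world[0] - pose_xyz[0]
    y = point_world[1] - pose_xyz[1]
    z = point_world[2] - pose_xyz[2]
    if z <= 0:
        return None                   # behind the camera: not observable
    return (focal * x / z + cx, focal * y / z + cy)

labeled_point = (2.0, -1.0, 10.0)     # e.g. a labeled object in the map
pixel = project(labeled_point, (0.0, 0.0, 0.0))
print(pixel)                          # (340.0, 230.0): assign the label here
```

If the projection succeeds, the object's label from the point cloud is transferred to the new image at that position, yielding a training example without manual annotation.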