Patents by Inventor Anima Anandkumar

Anima Anandkumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

3D OBJECT DETECTION USING TEMPORAL INPUTS

Publication number: 20250218160

Abstract: Apparatuses, systems, and techniques of using one or more machine learning processes (e.g., neural network(s)) to detect objects from a plurality of image frames. In at least one embodiment, a plurality of image frames are fused into a feature map using one or more neural networks. In at least one embodiment, a plurality of image frames are processed using one or more neural networks to detect objects in a 3D space.

Type: Application

Filed: December 27, 2023

Publication date: July 3, 2025

Inventors: Renhao Wang, Zhiding Yu, Shiyi Lan, Ke Chen, Anima Anandkumar, Jose Manuel Alvarez Lopez
TRAJECTORY STITCHING FOR ACCELERATING DIFFUSION MODELS

Publication number: 20250103968

Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from an input lower-quality data. Diffusion probabilistic models use discrete-time random processes or continuous-time stochastic differential equations (SDEs) that learn to gradually remove the noise added to the data points. With diffusion probabilistic models, high quality output currently requires sampling from a large diffusion probabilistic model which corners at a high computational cost. The present disclosure stitches together the trajectory of two or more inferior diffusion probabilistic models during a denoising process, which can in turn accelerate the denoising process by avoiding use of only a single large diffusion probabilistic model.

Type: Application

Filed: August 30, 2024

Publication date: March 27, 2025

Inventors: Zizheng Pan, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Anima Anandkumar
FULLY ATTENTIONAL NETWORKS WITH SELF-EMERGING TOKEN LABELING

Publication number: 20250078489

Abstract: One embodiment of the present invention sets forth a technique for training an image classifier. The technique includes training a first vision transformer model to generate patch labels for corresponding images patches of images, converting the patch labels to token labels, and training a second vision transformer model to classify images based on the token labels.

Type: Application

Filed: December 15, 2023

Publication date: March 6, 2025

Inventors: Bingyin ZHAO, Jose Manuel ALVAREZ LOPEZ, Anima ANANDKUMAR, Shi Yi LAN, Zhiding YU
NEURAL NETWORK-BASED ENVIRONMENT REPRESENTATION

Publication number: 20250020481

Abstract: Apparatuses, systems, and techniques are presented to determination about objects in an environment. In at least one embodiment, a neural network can be used to determine one or more positions of one or more objects within a three-dimensional (3D) environment and to generate a segmented map of the 3D environment based, at least in part, on one or more two dimensional (2D) images of the one or more objects.

Type: Application

Filed: April 7, 2022

Publication date: January 16, 2025

Inventors: Enze Xie, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jose Manuel Alvarez Lopez
CONDITIONAL DIFFUSION MODEL FOR DATA-TO-DATA TRANSLATION

Publication number: 20240273682

Abstract: Image restoration generally involves recovering a target clean image from a given image having noise, blurring, or other degraded features. Current image restoration solutions typically include a diffusion model that is trained for image restoration by a forward process that progressively diffuses data to noise, and then by learning in a reverse process to generate the data from the noise. However, the forward process relies on Gaussian noise to diffuse the original data, which has little or no structural information corresponding to the original data versus learning from the degraded image itself which is much more structurally informative compared to the random Gaussian noise. Similar problems also exist for other data-to-data translation tasks.

Type: Application

Filed: February 2, 2024

Publication date: August 15, 2024

Inventors: Weili Nie, Guan-Horng Liu, Arash Vahdat, De-An Huang, Anima Anandkumar
LONG-RANGE 3D OBJECT DETECTION USING 2D BOUNDING BOXES

Publication number: 20240249538

Abstract: 3D object detection is a computer vision task that generally detects (e.g. classifies and localizes) objects in 3D space from the 2D images or videos that capture the objects. Current techniques used for 3D object detection rely on machine learning processes that learn to detect 3D objects from existing images annotated with high-quality 3D information including depth information generally obtained using lidar technology. However, due to lidar's limited measurable range, current machine learning solutions to 3D object detection do not support detection of 3D objects beyond the lidar range, which is needed for numerous applications, including autonomous driving applications where existing close or midrange 3D object detection does not always meet the safety-critical requirement of autonomous driving. The present disclosure provides for 3D object detection using a technique that supports long-range detection (i.e. detection beyond the lidar range).

Type: Application

Filed: July 18, 2023

Publication date: July 25, 2024

Inventors: Zetong Yang, Zhiding Yu, Ren Hao Wang, Chris Choy, Anima Anandkumar, Jose M. Alvarez Lopez
POINT-LEVEL SUPERVISION FOR VIDEO INSTANCE SEGMENTATION

Publication number: 20240221166

Abstract: Video instance segmentation is a computer vision task that aims to detect, segment, and track objects continuously in videos. It can be used in numerous real-world applications, such as video editing, three-dimensional (3D) reconstruction, 3D navigation (e.g. for autonomous driving and/or robotics), and view point estimation. However, current machine learning-based processes employed for video instance segmentation are lacking, particularly because the densely annotated videos needed for supervised training of high-quality models are not readily available and are not easily generated. To address the issues in the prior art, the present disclosure provides point-level supervision for video instance segmentation in a manner that allows the resulting machine learning model to handle any object category.

Type: Application

Filed: December 22, 2023

Publication date: July 4, 2024

Inventors: Zhiding Yu, Shuaiyi Huang, De-An Huang, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez Lopez, Anima Anandkumar
Adversarial scenarios for safety testing of autonomous vehicles

Patent number: 11977386

Abstract: Techniques to generate driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. Nodes of the path are assigned a time for action to avoid collision from the node. The generated scenarios may be simulated in a computer.

Type: Grant

Filed: November 18, 2022

Date of Patent: May 7, 2024

Assignee: NVIDIA CORP.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler, Alejandro Troccoli
NEURAL NETWORK-BASED PERTURBATION REMOVAL

Publication number: 20240104698

Abstract: Apparatuses, systems, and techniques are presented to remove unintended variations introduced into data. In at least one embodiment, a first image of an object can be generated based, at least in part, upon adding noise to, and removing the noise from, a second image of the object.

Type: Application

Filed: April 12, 2022

Publication date: March 28, 2024

Inventors: Weili Nie, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
Data selection based on uncertainty quantification

Patent number: 11941899

Abstract: Apparatuses, systems, and techniques generate poses of an object based on image data of the object obtained from a first viewpoint of the object and a second viewpoint of the object. The poses can be evaluated to determine a portion of the image data usable by an estimator to generate a pose of the object.

Type: Grant

Filed: May 26, 2021

Date of Patent: March 26, 2024

Assignee: NVIDIA Corporation

Inventors: Jonathan Tremblay, Fabio Tozeto Ramos, Yuke Zhu, Anima Anandkumar, Guanya Shi
NEURAL NETWORK-BASED LANGUAGE RESTRICTION

Publication number: 20240095447

Abstract: Apparatuses, systems, and techniques are presented to identify and prevent generation of restricted content. In at least one embodiment, one or more neural networks are used to identify restricted content based only on the restricted content.

Type: Application

Filed: June 22, 2022

Publication date: March 21, 2024

Inventors: Wei Ping, Boxin Wang, Chaowei Xiao, Mohammad Shoeybi, Mostofa Patwary, Anima Anandkumar, Bryan Catanzaro
NEURAL NETWORK PROMPT TUNING

Publication number: 20240095534

Abstract: Apparatuses, systems, and techniques to perform neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.

Type: Application

Filed: September 7, 2023

Publication date: March 21, 2024

Inventors: Anima Anandkumar, Chaowei Xiao, Weili Nie, De-An Huang, Zhiding Yu, Manli Shu
Data selection based on uncertainty quantification

Patent number: 11931909

Abstract: Apparatuses, systems, and techniques generate poses of an object based on data of the object observed from a first viewpoint and a second viewpoint. The poses can be evaluated to determine a portion of the data usable by an estimator to generate a pose of the object.

Type: Grant

Filed: May 26, 2021

Date of Patent: March 19, 2024

Assignee: NVIDIA Corporation

Inventors: Jonathan Tremblay, Fabio Tozeto Ramos, Yuke Zhu, Anima Anandkumar, Guanya Shi
PERFORMING VISUAL RELATIONAL REASONING

Publication number: 20240078423

Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.

Type: Application

Filed: August 22, 2022

Publication date: March 7, 2024

Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
PERFORMING VISUAL RELATIONAL REASONING

Publication number: 20240062534

Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.

Type: Application

Filed: August 22, 2022

Publication date: February 22, 2024

Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
VIDEO INSTANCE SEGMENTATION

Publication number: 20240037756

Abstract: Apparatuses, systems, and techniques to track one or more objects in one or more frames of a video. In at least one embodiment, one or more objects in one or more frames of a video are tracked based on, for example, one or more sets of embeddings.

Type: Application

Filed: May 5, 2023

Publication date: February 1, 2024

Inventors: De-An Huang, Zhiding Yu, Anima Anandkumar
TRAJECTORY GENERATION

Publication number: 20240017745

Abstract: Apparatuses, systems, and techniques to generate trajectory data for moving objects. In at least one embodiment, adversarial trajectories are generated to evaluate a trajectory prediction model and are based, at least in part, on a differentiable dynamic model.

Type: Application

Filed: July 14, 2022

Publication date: January 18, 2024

Inventors: Yulong Cao, Chaowei Xiao, Danfei Xu, Anima Anandkumar, Marco Pavone
TECHNIQUES FOR WEAKLY SUPERVISED REFERRING IMAGE SEGMENTATION

Publication number: 20240013504

Abstract: One embodiment of a method for training a machine learning model includes receiving a training data set that includes at least one image, text referring to at least one object included in the at least one image, and at least one bounding box annotation associated with the at least one object, and performing, based on the training data set, one or more operations to generate a trained machine learning model to segment images based on text, where the one or more operations to generate the trained machine learning model include minimizing a loss function that comprises at least one of a multiple instance learning loss term or an energy loss term

Type: Application

Filed: October 31, 2022

Publication date: January 11, 2024

Inventors: Zhiding YU, Boyi LI, Chaowei XIAO, De-An HUANG, Weili NIE, Linxi FAN, Anima ANANDKUMAR
Image processing using coupled segmentation and edge learning

Patent number: 11790633

Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.

Type: Grant

Filed: July 1, 2021

Date of Patent: October 17, 2023

Assignee: NVIDIA Corporation

Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
ROBUST VISION TRANSFORMERS

Publication number: 20230290135

Abstract: Apparatuses, systems, and techniques to generate a robust representation of an image. In at least one embodiment, input tokens of an input image are received, and an inference about the input image is generated based on a vision transformer (ViT) system comprising at least one self-attention module to perform token mixing and a channel self-attention module to perform channel processing.

Type: Application

Filed: March 9, 2023

Publication date: September 14, 2023

Inventors: Daquan Zhou, Zhiding Yu, Enze Xie, Anima Anandkumar, Chaowei Xiao, Jose Manuel Alvarez Lopez

1 2 next