Patents by Inventor Zhiding Yu
Zhiding Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250020481
Abstract: Apparatuses, systems, and techniques are presented to make determinations about objects in an environment. In at least one embodiment, a neural network can be used to determine one or more positions of one or more objects within a three-dimensional (3D) environment and to generate a segmented map of the 3D environment based, at least in part, on one or more two-dimensional (2D) images of the one or more objects.
Type: Application
Filed: April 7, 2022
Publication date: January 16, 2025
Inventors: Enze Xie, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jose Manuel Alvarez Lopez
-
Publication number: 20240416963
Abstract: Apparatuses, systems, and techniques for using one or more machine learning processes (e.g., neural network(s)) to predict occupancy using an image input. In at least one embodiment, image data is processed using a neural network to predict occupancy in a 3D voxel space. In at least one embodiment, image data is processed using a neural network to detect objects in a 3D space.
Type: Application
Filed: October 12, 2023
Publication date: December 19, 2024
Inventors: Zhiqi Li, Zhiding Yu, David Austin, Shiyi Lan, Jan Kautz, Jose Manuel Alvarez Lopez
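A minimal PyTorch sketch of the image-to-voxel-occupancy idea described above; the `ToyOccupancyNet` module, the tiny encoder, and the 16x16x4 grid are hypothetical stand-ins, not the patented architecture.

```python
# Hedged sketch: map one camera image to a grid of voxel occupancy logits.
import torch
import torch.nn as nn

class ToyOccupancyNet(nn.Module):
    def __init__(self, voxel_grid=(16, 16, 4)):
        super().__init__()
        self.grid = voxel_grid
        # Tiny convolutional encoder for a single RGB camera image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Project the pooled image feature to one logit per voxel.
        n_voxels = voxel_grid[0] * voxel_grid[1] * voxel_grid[2]
        self.head = nn.Linear(64, n_voxels)

    def forward(self, image):
        feat = self.encoder(image).flatten(1)   # (B, 64)
        logits = self.head(feat)                # (B, X*Y*Z)
        return logits.view(-1, *self.grid)      # (B, X, Y, Z) occupancy logits

occ = ToyOccupancyNet()(torch.randn(1, 3, 224, 224))
print(occ.shape)  # torch.Size([1, 16, 16, 4])
```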
-
Patent number: 12169882
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
Type: Grant
Filed: September 1, 2022
Date of Patent: December 17, 2024
Assignee: NVIDIA Corporation
Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
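The warp-to-canonical-frame step can be pictured with `torch.nn.functional.grid_sample`; in this sketch the coordinate map is a random stand-in for the learned dense pixel-level correspondence map, so it only illustrates the mechanics of the warp.

```python
# Sketch of warping structure features into a shared ("canonical") frame.
import torch
import torch.nn.functional as F

feat = torch.randn(1, 8, 32, 32)   # structure features of one generated image
# Correspondence map: for each canonical pixel, where to sample in the image,
# in normalized [-1, 1] coordinates (stand-in: identity grid plus jitter).
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32),
                        torch.linspace(-1, 1, 32), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).unsqueeze(0)   # (1, 32, 32, 2)
coords = coords + 0.02 * torch.randn_like(coords)

canonical = F.grid_sample(feat, coords, align_corners=True)
print(canonical.shape)  # torch.Size([1, 8, 32, 32]), now in the shared frame
```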
-
Publication number: 20240386586
Abstract: In various examples, systems and methods are disclosed relating to using neural networks for object detection or instance/semantic segmentation for, without limitation, autonomous or semi-autonomous systems and applications. In some implementations, one or more neural networks receive an image (or other sensor data representation) and a bounding shape corresponding to at least a portion of an object in the image. The bounding shape can include or be labeled with an identifier, class, and/or category of the object. The neural network can determine a mask for the object based at least on processing the image and the bounding shape. The mask can be used for various applications, such as annotating masks for vehicle or machine perception and navigation processes.
Type: Application
Filed: May 19, 2023
Publication date: November 21, 2024
Applicant: NVIDIA Corporation
Inventors: Alperen Degirmenci, Jiwoong Choi, Zhiding Yu, Ke Chen, Shubhranshu Singh, Yashar Asgarieh, Subhashree Radhakrishnan, James Skinner, Jose Manuel Alvarez Lopez
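One common way to condition a network on a bounding shape is to rasterize it as an extra input channel; the sketch below assumes that design, and the toy `BoxPromptedMaskNet` is illustrative rather than the disclosed model.

```python
# Hedged sketch of box-prompted mask prediction: the bounding shape becomes
# a fourth input channel and a small FCN predicts per-pixel mask logits.
import torch
import torch.nn as nn

class BoxPromptedMaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),   # RGB + box channel
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                         # per-pixel mask logit
        )

    def forward(self, image, box):
        # box = (x0, y0, x1, y1) in pixel coordinates.
        b, _, h, w = image.shape
        prompt = torch.zeros(b, 1, h, w, device=image.device)
        x0, y0, x1, y1 = box
        prompt[:, :, y0:y1, x0:x1] = 1.0
        return self.net(torch.cat([image, prompt], dim=1))

mask_logits = BoxPromptedMaskNet()(torch.randn(1, 3, 64, 64), (10, 12, 40, 50))
print(mask_logits.shape)  # torch.Size([1, 1, 64, 64])
```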
-
Publication number: 20240378799
Abstract: In various examples, bi-directional projection techniques may be used to generate enhanced Bird's-Eye View (BEV) representations. For example, a system(s) may generate one or more BEV features associated with a BEV of an environment using a projection process that associates 2D image features to one or more first locations of a 3D space. At least partially using the BEV feature(s), the system(s) may determine one or more second locations of the 3D space that correspond to one or more regions of interest in the environment. The system(s) may then generate one or more additional BEV features corresponding to the second location(s) using a different projection process that associates the second location(s) from the 3D space to at least a portion of the 2D image features. The system(s) may then generate an updated BEV of the environment based at least on the BEV feature(s) and/or the additional BEV feature(s).
Type: Application
Filed: April 22, 2024
Publication date: November 14, 2024
Inventors: Zhiqi Li, Zhiding Yu, Animashree Anandkumar, Jose Manuel Alvarez Lopez
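A toy rendition of the two projection passes: a lookup table stands in for the camera geometry, a first pass lifts image features into BEV cells, and a second pass re-queries image features only at salient cells. The `image_to_bev` helper and the index math are fabricated for illustration.

```python
# Toy bi-directional 2D <-> BEV projection under a fabricated camera mapping.
import torch

def image_to_bev(img_feat, uv_index):
    """Lift 2D features to BEV cells via a precomputed (cell -> pixel) lookup."""
    b, c, h, w = img_feat.shape
    flat = img_feat.flatten(2)                      # (B, C, H*W)
    idx = uv_index.view(1, 1, -1).expand(b, c, -1)  # (B, C, n_cells)
    return flat.gather(2, idx)                      # (B, C, n_cells)

B, C, H, W, N = 1, 8, 32, 32, 64                    # 8x8 BEV grid, flattened
img_feat = torch.randn(B, C, H, W)
uv_index = torch.randint(0, H * W, (N,))            # stand-in camera geometry

bev = image_to_bev(img_feat, uv_index)              # first (image -> BEV) pass
roi = bev.norm(dim=1).topk(8, dim=1).indices        # pick salient BEV cells
# Second (BEV -> image) pass: re-query image features only at the ROI cells.
bev_refined = image_to_bev(img_feat, uv_index[roi[0]])
print(bev.shape, bev_refined.shape)
```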
-
Publication number: 20240312219
Abstract: In various examples, temporal-based perception for autonomous or semi-autonomous systems and applications is described. Systems and methods are disclosed that use a machine learning model (MLM) to intrinsically fuse feature maps associated with different sensors and different instances in time. To generate a feature map, image data generated using image sensors (e.g., cameras) located around a vehicle is processed using an MLM that is trained to generate the feature map. The MLM may then fuse the feature maps in order to generate a final feature map associated with a current instance in time. The feature maps associated with the previous instances in time may be preprocessed using one or more layers of the MLM, where the one or more layers are associated with performing temporal transformation before the fusion is performed. The MLM may then use the final feature map to generate one or more outputs.
Type: Application
Filed: March 16, 2023
Publication date: September 19, 2024
Inventors: Jiwoong Choi, Jose Manuel Alvarez Lopez, Shiyi Lan, Yashar Asgarieh, Zhiding Yu
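The fuse-after-temporal-transformation flow might be sketched as follows; the `TemporalFuser` module, the per-step convolutions standing in for the temporal transformation layers, and all sizes are assumptions.

```python
# Illustrative temporal fusion: past feature maps pass through a "temporal
# transformation" layer each, then a 1x1 conv fuses them with the current map.
import torch
import torch.nn as nn

class TemporalFuser(nn.Module):
    def __init__(self, channels=16, history=2):
        super().__init__()
        # One transformation layer per past time step (ego-motion stand-in).
        self.temporal = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1)
                                      for _ in range(history))
        self.fuse = nn.Conv2d(channels * (history + 1), channels, 1)

    def forward(self, current, past):
        aligned = [layer(p) for layer, p in zip(self.temporal, past)]
        return self.fuse(torch.cat([current] + aligned, dim=1))

f = TemporalFuser()
out = f(torch.randn(1, 16, 50, 50), [torch.randn(1, 16, 50, 50) for _ in range(2)])
print(out.shape)  # torch.Size([1, 16, 50, 50]), the fused current-time map
```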
-
Publication number: 20240265690
Abstract: A vision-language model learns skills and domain knowledge via distinct and separate task-specific neural networks, referred to as experts. Each expert is independently optimized for a specific task, facilitating the use of domain-specific data and architectures that are not feasible with a single large neural network trained for multiple tasks. The vision-language model is implemented as an ensemble of pre-trained experts and is trained more efficiently than a single large neural network. During training, the vision-language model integrates specialized skills and domain knowledge, rather than trying to simultaneously learn multiple tasks, resulting in effective multi-modal learning.
Type: Application
Filed: December 19, 2023
Publication date: August 8, 2024
Inventors: Animashree Anandkumar, Linxi Fan, Zhiding Yu, Chaowei Xiao, Shikun Liu
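A hedged sketch of the ensemble-of-frozen-experts idea: tiny linear layers stand in for the pre-trained experts, and only a small gating head is trained. `ExpertEnsemble` and every dimension here are hypothetical.

```python
# Frozen task-specific "experts" produce features; a trainable gate mixes them.
import torch
import torch.nn as nn

class ExpertEnsemble(nn.Module):
    def __init__(self, experts, dim=32):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():
            p.requires_grad = False                # experts stay frozen
        self.gate = nn.Linear(dim * len(experts), len(experts))
        self.head = nn.Linear(dim, 10)             # e.g., an answer classifier

    def forward(self, x):
        feats = [e(x) for e in self.experts]       # each (B, dim)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        mixed = sum(w.unsqueeze(1) * f for w, f in zip(weights.T, feats))
        return self.head(mixed)

experts = [nn.Linear(64, 32) for _ in range(3)]    # stand-ins for vision/language experts
print(ExpertEnsemble(experts)(torch.randn(2, 64)).shape)  # torch.Size([2, 10])
```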
-
Publication number: 20240249538
Abstract: 3D object detection is a computer vision task that generally detects (e.g., classifies and localizes) objects in 3D space from the 2D images or videos that capture the objects. Current techniques used for 3D object detection rely on machine learning processes that learn to detect 3D objects from existing images annotated with high-quality 3D information, including depth information generally obtained using lidar technology. However, due to lidar's limited measurable range, current machine learning solutions to 3D object detection do not support detection of 3D objects beyond the lidar range, which is needed for numerous applications, including autonomous driving applications where existing close- or mid-range 3D object detection does not always meet the safety-critical requirements of autonomous driving. The present disclosure provides for 3D object detection using a technique that supports long-range detection (i.e., detection beyond the lidar range).
Type: Application
Filed: July 18, 2023
Publication date: July 25, 2024
Inventors: Zetong Yang, Zhiding Yu, Ren Hao Wang, Chris Choy, Anima Anandkumar, Jose M. Alvarez Lopez
-
Publication number: 20240221166
Abstract: Video instance segmentation is a computer vision task that aims to detect, segment, and track objects continuously in videos. It can be used in numerous real-world applications, such as video editing, three-dimensional (3D) reconstruction, 3D navigation (e.g., for autonomous driving and/or robotics), and viewpoint estimation. However, current machine learning-based processes employed for video instance segmentation are lacking, particularly because the densely annotated videos needed for supervised training of high-quality models are not readily available and are not easily generated. To address the issues in the prior art, the present disclosure provides point-level supervision for video instance segmentation in a manner that allows the resulting machine learning model to handle any object category.
Type: Application
Filed: December 22, 2023
Publication date: July 4, 2024
Inventors: Zhiding Yu, Shuaiyi Huang, De-An Huang, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez Lopez, Anima Anandkumar
-
Publication number: 20240169545
Abstract: Class-agnostic object mask generation uses a vision transformer-based auto-labeling framework requiring only images and object bounding boxes to generate object (segmentation) masks. The generated object masks, images, and object labels may then be used to train instance segmentation models or other neural networks to localize and segment objects with pixel-level accuracy. The generated object masks may supplement or replace conventional human-generated annotations. The human-generated annotations may be misaligned with the object boundaries, resulting in poor-quality labeled segmentation masks. In contrast with conventional techniques, the generated object masks are class-agnostic and are automatically generated based only on a bounding box image region without relying on either labels or semantic information.
Type: Application
Filed: July 20, 2023
Publication date: May 23, 2024
Inventors: Shiyi Lan, Zhiding Yu, Subhashree Radhakrishnan, Jose Manuel Alvarez Lopez, Animashree Anandkumar
-
Patent number: 11960570
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
Type: Grant
Filed: August 25, 2021
Date of Patent: April 16, 2024
Assignee: NVIDIA Corporation
Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
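The image-level "pull pairs together, push others apart" objective is commonly realized as an InfoNCE-style loss; the sketch below shows that generic form, not the patent's exact multi-level formulation.

```python
# Minimal InfoNCE-style contrastive loss over a batch of embedding pairs:
# each anchor's positive is at the matching batch index; all others repel.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.T / temperature         # (B, B) similarity matrix
    targets = torch.arange(a.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```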
-
Publication number: 20240104842
Abstract: A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering, which can be used to train a separate encoder-based model for downstream tasks like estimating a triplane representation, neural radiance field, mesh, depth map, 3D keypoints, or the like, given a single input image, using the pseudo ground truth 3D synthetic training data set. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
Type: Application
Filed: September 22, 2023
Publication date: March 28, 2024
Inventors: Koki Nagano, Alexander Trevithick, Chao Liu, Eric Ryan Chan, Sameh Khamis, Michael Stengel, Zhiding Yu
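The training setup can be pictured as distillation from a frozen generator: it synthesizes (3D representation, 2D rendering) pairs, and the encoder learns to regress the representation from the rendering alone. The linear `generator` and `renderer` below are toy stand-ins for the 3D generative model and volume renderer.

```python
# Sketch of distilling a frozen 3D generative model into an image encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Linear(64, 128)   # stand-in: latent -> 3D representation
renderer = nn.Linear(128, 96)    # stand-in: 3D representation -> 2D "image"
encoder = nn.Sequential(nn.Linear(96, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(200):
    with torch.no_grad():                   # the pre-trained model stays frozen
        z = torch.randn(32, 64)
        rep = generator(z)                  # pseudo ground-truth 3D representation
        image = renderer(rep)               # corresponding 2D rendering
    loss = F.mse_loss(encoder(image), rep)  # encoder: image -> 3D representation
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```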
-
Publication number: 20240095534
Abstract: Apparatuses, systems, and techniques to select outputs of neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.
Type: Application
Filed: September 7, 2023
Publication date: March 21, 2024
Inventors: Anima Anandkumar, Chaowei Xiao, Weili Nie, De-An Huang, Zhiding Yu, Manli Shu
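A rough sketch of variance-based selection: each candidate network is run on several perturbed copies of the input, and the one whose predictions vary least is chosen. The Gaussian perturbation and the `most_consistent` helper are assumptions for illustration.

```python
# Consistency-based selection among pre-trained models (toy stand-ins).
import torch
import torch.nn as nn

def most_consistent(models, x, n_perturb=8, noise=0.05):
    variances = []
    for m in models:
        preds = torch.stack([m(x + noise * torch.randn_like(x)).softmax(dim=1)
                             for _ in range(n_perturb)])
        variances.append(preds.var(dim=0).mean().item())  # lower = more consistent
    best = min(range(len(models)), key=variances.__getitem__)
    return models[best](x), best

models = [nn.Linear(16, 4) for _ in range(3)]   # pretend pre-trained networks
out, chosen = most_consistent(models, torch.randn(2, 16))
print(chosen, out.shape)
```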
-
Publication number: 20240087222
Abstract: An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.
Type: Application
Filed: November 20, 2023
Publication date: March 14, 2024
Inventors: Yiming Li, Zhiding Yu, Christopher B. Choy, Chaowei Xiao, Jose Manuel Alvarez Lopez, Sanja Fidler, Animashree Anandkumar
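The two-stage attention flow maps naturally onto `torch.nn.MultiheadAttention`; below, random tensors stand in for the image feature maps, the depth-derived query proposals, and the mask tokens, so only the dataflow is faithful.

```python
# Toy two-stage attention: cross-attention with depth-derived queries,
# then self-attention over the combined voxel features.
import torch
import torch.nn as nn

d = 32
cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
self_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

img_feats = torch.randn(1, 196, d)    # flattened image feature map
queries = torch.randn(1, 50, d)       # stand-in for depth-based query proposals
mask_token = torch.zeros(1, 14, d)    # tokens for unobserved voxels

voxels, _ = cross_attn(queries, img_feats, img_feats)  # stage 1: cross-attention
voxels = torch.cat([voxels, mask_token], dim=1)        # combine with mask tokens
refined, _ = self_attn(voxels, voxels, voxels)         # stage 2: self-attention
print(refined.shape)  # torch.Size([1, 64, 32]), refined voxel features
```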
-
Publication number: 20240078423
Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
Type: Application
Filed: August 22, 2022
Publication date: March 7, 2024
Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
-
Publication number: 20240062534
Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
Type: Application
Filed: August 22, 2022
Publication date: February 22, 2024
Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
-
Patent number: 11899749
Abstract: In various examples, training methods are described to generate a trained neural network that is robust to various environmental features. In an embodiment, training includes modifying images of a dataset and generating bounding boxes and/or other segmentation information for the modified images, which is used to train a neural network.
Type: Grant
Filed: March 15, 2021
Date of Patent: February 13, 2024
Assignee: NVIDIA Corporation
Inventors: Subhashree Radhakrishnan, Partha Sriram, Farzin Aghdasi, Seunghwan Cha, Zhiding Yu
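Regenerating labels for a modified image can be as simple as transforming the boxes with the same geometry; this sketch uses a horizontal flip, and the choice of transform is purely illustrative.

```python
# Sketch of regenerating bounding-box labels after an image modification.
import torch

def hflip_with_boxes(image, boxes):
    """image: (C, H, W); boxes: (N, 4) as (x0, y0, x1, y1) in pixels."""
    _, _, w = image.shape
    flipped = torch.flip(image, dims=[2])          # mirror left-right
    x0, y0, x1, y1 = boxes.unbind(dim=1)
    new_boxes = torch.stack([w - x1, y0, w - x0, y1], dim=1)
    return flipped, new_boxes

img, boxes = torch.randn(3, 64, 64), torch.tensor([[10., 5., 30., 40.]])
aug_img, aug_boxes = hflip_with_boxes(img, boxes)
print(aug_boxes)  # tensor([[34., 5., 54., 40.]])
```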
-
Publication number: 20240037756
Abstract: Apparatuses, systems, and techniques to track one or more objects in one or more frames of a video. In at least one embodiment, one or more objects in one or more frames of a video are tracked based on, for example, one or more sets of embeddings.
Type: Application
Filed: May 5, 2023
Publication date: February 1, 2024
Inventors: De-An Huang, Zhiding Yu, Anima Anandkumar
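Embedding-based tracking can be illustrated with a greedy cosine-similarity association between consecutive frames; the `associate` helper and the 0.5 threshold are hypothetical.

```python
# Toy embedding-based association across two frames (greedy, for brevity).
import torch
import torch.nn.functional as F

def associate(prev_emb, curr_emb, threshold=0.5):
    sim = F.normalize(prev_emb, dim=1) @ F.normalize(curr_emb, dim=1).T
    matches = []
    for i in range(sim.size(0)):
        j = sim[i].argmax().item()
        if sim[i, j] > threshold:
            matches.append((i, j))   # track i continues as detection j
    return matches

prev = torch.randn(3, 64)            # embeddings of 3 tracked objects
curr = torch.cat([prev[1:] + 0.01 * torch.randn(2, 64), torch.randn(1, 64)])
print(associate(prev, curr))         # typically [(1, 0), (2, 1)]
```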
-
Publication number: 20240013504
Abstract: One embodiment of a method for training a machine learning model includes receiving a training data set that includes at least one image, text referring to at least one object included in the at least one image, and at least one bounding box annotation associated with the at least one object, and performing, based on the training data set, one or more operations to generate a trained machine learning model to segment images based on text, where the one or more operations to generate the trained machine learning model include minimizing a loss function that comprises at least one of a multiple instance learning loss term or an energy loss term.
Type: Application
Filed: October 31, 2022
Publication date: January 11, 2024
Inventors: Zhiding Yu, Boyi Li, Chaowei Xiao, De-An Huang, Weili Nie, Linxi Fan, Anima Anandkumar
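A multiple-instance-learning term for box supervision is often written so that every row and column inside the box must contain at least one positive pixel while everything outside is negative; the `mil_box_loss` sketch below follows that common pattern and is not taken from the patent.

```python
# Sketch of an MIL-style loss for box-supervised mask training (simplified).
import torch
import torch.nn.functional as F

def mil_box_loss(mask_logits, box):
    """mask_logits: (H, W); box: (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = box
    prob = mask_logits.sigmoid()
    inside = prob[y0:y1, x0:x1]
    # Each row/column inside the box is a positive "bag" (contains the object).
    pos = torch.cat([inside.max(dim=0).values, inside.max(dim=1).values])
    loss_pos = F.binary_cross_entropy(pos, torch.ones_like(pos))
    # Everything outside the box is negative (zero weight inside the box).
    neg_mask = torch.ones_like(prob)
    neg_mask[y0:y1, x0:x1] = 0
    loss_neg = F.binary_cross_entropy(prob, torch.zeros_like(prob), weight=neg_mask)
    return loss_pos + loss_neg

print(mil_box_loss(torch.randn(32, 32), (8, 8, 24, 24)).item())
```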
-
Publication number: 20230385687
Abstract: Approaches for training data set size estimation for machine learning model systems and applications are described. Examples include a machine learning model training system that estimates target data requirements for training a machine learning model, given an approximate relationship between training data set size and model performance using one or more validation score estimation functions. To derive a validation score estimation function, a regression data set is generated from training data, and subsets of the regression data set are used to train the machine learning model. A validation score is computed for the subsets and used to compute regression function parameters to curve fit the selected regression function to the training data set. The validation score estimation function is then solved to provide an estimate of the number of additional training samples needed for the validation score estimation function to meet or exceed a target validation score.
Type: Application
Filed: May 31, 2022
Publication date: November 30, 2023
Inventors: Rafid Reza Mahmood, James Robert Lucas, David Jesus Acuna Marrero, Daiqing Li, Jonah Philion, Jose Manuel Alvarez Lopez, Zhiding Yu, Sanja Fidler, Marc Law
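The curve-fitting step can be illustrated with SciPy: fit a saturating power law to (subset size, validation score) pairs, then invert it at the target score. The functional form v(n) = a - b * n^(-c) and the synthetic data below are assumptions, not the patent's chosen regression function.

```python
# Sketch of estimating the training set size needed for a target score.
import numpy as np
from scipy.optimize import curve_fit

def v_est(n, a, b, c):
    # Saturating power law: score approaches ceiling a as n grows.
    return a - b * np.power(n, -c)

sizes = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
scores = np.array([0.62, 0.70, 0.76, 0.80, 0.83])   # synthetic validation scores
(a, b, c), _ = curve_fit(v_est, sizes, scores, p0=(0.9, 5.0, 0.5), maxfev=10000)

target = 0.85
# Invert v(n) = target  =>  n = (b / (a - target)) ** (1 / c)
if a > target:
    n_needed = (b / (a - target)) ** (1.0 / c)
    print(f"estimated samples for score {target}: {n_needed:.0f} "
          f"({n_needed - sizes[-1]:.0f} more than the largest subset)")
else:
    print("fitted score ceiling is below the target; more data will not suffice")
```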