Patents by Inventor Zhiding Yu

Zhiding Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11960570
    Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
    Type: Grant
    Filed: August 25, 2021
    Date of Patent: April 16, 2024
    Assignee: NVIDIA Corporation
    Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
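The pull/push objective described in the abstract above is, at the image level, essentially an InfoNCE-style contrastive loss. A minimal numpy sketch of that general idea (the function name, cosine similarity, and temperature value are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Image-level contrastive loss: pull the corresponding pair
    (anchor, positive) together, push contrasting pairs apart."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(anchor, positive) / temperature)
    neg = sum(np.exp(cos(anchor, n) / temperature) for n in negatives)
    # loss is small when the anchor matches its positive and differs
    # from all negatives, large in the opposite case
    return -np.log(pos / (pos + neg))
```

Minimizing this loss over many pairs is what "pulls corresponding image pairs closer and pushes contrasting image pairs apart" in practice.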
  • Publication number: 20240104842
    Abstract: A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering, which can be used to train a separate encoder-based model for downstream tasks like estimating a triplane representation, neural radiance field, mesh, depth map, 3D key points, or the like, given a single input image, using the pseudo ground truth 3D synthetic training data set. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
    Type: Application
    Filed: September 22, 2023
    Publication date: March 28, 2024
    Inventors: Koki Nagano, Alexander Trevithick, Chao Liu, Eric Ryan Chan, Sameh Khamis, Michael Stengel, Zhiding Yu
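One building block the abstract relies on is the triplane representation, in which a 3D point is featurized by projecting it onto three axis-aligned feature planes and combining the sampled features. A simplified nearest-neighbor sketch (the shapes, extent convention, and summation are assumptions for illustration; real triplane decoders use learned planes and bilinear sampling):

```python
import numpy as np

def sample_triplane(planes, point, extent=1.0):
    """Query a triplane at a 3D point: project onto the XY, XZ, and YZ
    feature planes, sample each, and sum the three feature vectors.
    `planes` has shape (3, R, R, C)."""
    R = planes.shape[1]
    x, y, z = point

    def idx(u, v):
        # map [-extent, extent] to pixel indices (nearest neighbor)
        i = int(np.clip((u + extent) / (2 * extent) * (R - 1), 0, R - 1))
        j = int(np.clip((v + extent) / (2 * extent) * (R - 1), 0, R - 1))
        return i, j

    feats = []
    for plane, (u, v) in zip(planes, [(x, y), (x, z), (y, z)]):
        i, j = idx(u, v)
        feats.append(plane[i, j])
    return np.sum(feats, axis=0)
```

A volume renderer evaluates such per-point features along camera rays to produce the output image from a given viewpoint.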
  • Publication number: 20240095534
Abstract: Apparatuses, systems, and techniques to perform inference using neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.
    Type: Application
    Filed: September 7, 2023
    Publication date: March 21, 2024
    Inventors: Anima Anandkumar, Chaowei Xiao, Weili Nie, De-An Huang, Zhiding Yu, Manli Shu
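Selecting a most consistent output across perturbed inputs can be sketched under a simplifying assumption: run the pre-trained model on several augmented copies of the input and keep the prediction they agree on most. The majority vote below is an illustrative stand-in, not the patent's specific selection rule:

```python
import numpy as np
from collections import Counter

def most_consistent_output(model, augmentations, x):
    """Run a pre-trained model on several perturbed copies of the input
    and select the class label the perturbed runs agree on most."""
    preds = [int(np.argmax(model(aug(x)))) for aug in augmentations]
    label, _ = Counter(preds).most_common(1)[0]
    return label
```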
  • Publication number: 20240087222
    Abstract: An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.
    Type: Application
    Filed: November 20, 2023
    Publication date: March 14, 2024
    Inventors: Yiming Li, Zhiding Yu, Christopher B. Choy, Chaowei Xiao, Jose Manuel Alvarez Lopez, Sanja Fidler, Animashree Anandkumar
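Both transformers in the abstract are built on scaled dot-product attention: with voxel query proposals attending to image feature maps it is cross-attention, and with queries, keys, and values all drawn from the voxel features it is self-attention. A minimal sketch of the shared mechanism (single head, no learned projections):

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention. Cross-attention when queries come
    from one source (e.g. voxel proposals) and keys/values from another
    (e.g. image features); self-attention when all three coincide."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values
```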
  • Publication number: 20240078423
    Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
    Type: Application
    Filed: August 22, 2022
    Publication date: March 7, 2024
    Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
  • Publication number: 20240062534
    Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
    Type: Application
    Filed: August 22, 2022
    Publication date: February 22, 2024
    Inventors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
  • Patent number: 11899749
Abstract: In various examples, training methods are described to generate a trained neural network that is robust to various environmental features. In an embodiment, training includes modifying images of a dataset and generating bounding boxes and/or other segmentation information for the modified images, which is used to train a neural network.
    Type: Grant
    Filed: March 15, 2021
    Date of Patent: February 13, 2024
Assignee: NVIDIA Corporation
    Inventors: Subhashree Radhakrishnan, Partha Sriram, Farzin Aghdasi, Seunghwan Cha, Zhiding Yu
  • Publication number: 20240037756
    Abstract: Apparatuses, systems, and techniques to track one or more objects in one or more frames of a video. In at least one embodiment, one or more objects in one or more frames of a video are tracked based on, for example, one or more sets of embeddings.
    Type: Application
    Filed: May 5, 2023
    Publication date: February 1, 2024
    Inventors: De-An Huang, Zhiding Yu, Anima Anandkumar
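Tracking objects across frames via sets of embeddings can be sketched as matching each new detection to the existing track with the most similar embedding. The greedy policy, cosine similarity, and threshold below are illustrative simplifications, not the patent's method:

```python
import numpy as np

def match_tracks(track_embs, det_embs, threshold=0.5):
    """Greedy embedding matching: assign each detection to the unused
    track whose embedding is most similar, if the cosine similarity
    clears `threshold`; otherwise mark it as starting a new track."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    assignments, used = [], set()
    for d, demb in enumerate(det_embs):
        sims = [(cos(temb, demb), t) for t, temb in enumerate(track_embs)
                if t not in used]
        best = max(sims, default=(None, None))
        if sims and best[0] >= threshold:
            assignments.append((d, best[1]))
            used.add(best[1])
        else:
            assignments.append((d, None))  # unmatched: new track
    return assignments
```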
  • Publication number: 20240013504
Abstract: One embodiment of a method for training a machine learning model includes receiving a training data set that includes at least one image, text referring to at least one object included in the at least one image, and at least one bounding box annotation associated with the at least one object, and performing, based on the training data set, one or more operations to generate a trained machine learning model to segment images based on text, where the one or more operations to generate the trained machine learning model include minimizing a loss function that comprises at least one of a multiple instance learning loss term or an energy loss term.
    Type: Application
    Filed: October 31, 2022
    Publication date: January 11, 2024
Inventors: Zhiding Yu, Boyi Li, Chaowei Xiao, De-An Huang, Weili Nie, Linxi Fan, Anima Anandkumar
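The multiple-instance-learning idea behind box supervision is that a box guarantees at least one foreground pixel inside it, while pixels far outside should be background. A simplified sketch of such a loss on the maximum response (illustrative names and formulation, not the patent's exact loss term):

```python
import numpy as np

def mil_box_loss(mask_probs, box):
    """MIL-style loss for box-supervised segmentation: push the
    strongest predicted probability inside the box toward 1 and the
    strongest probability outside it toward 0 via cross-entropy."""
    y0, x0, y1, x1 = box
    inside = np.zeros_like(mask_probs, dtype=bool)
    inside[y0:y1, x0:x1] = True
    eps = 1e-7
    pos = mask_probs[inside].max()   # at least one foreground pixel in box
    neg = mask_probs[~inside].max() if (~inside).any() else 0.0
    return -np.log(pos + eps) - np.log(1.0 - neg + eps)
```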
  • Publication number: 20230385687
Abstract: Approaches for training data set size estimation for machine learning model systems and applications are described. Examples include a machine learning model training system that estimates target data requirements for training a machine learning model, given an approximate relationship between training data set size and model performance using one or more validation score estimation functions. To derive a validation score estimation function, a regression data set is generated from training data, and subsets of the regression data set are used to train the machine learning model. A validation score is computed for the subsets and used to compute regression function parameters that curve-fit the selected regression function to the training data set. The validation score estimation function is then solved to provide an estimate of the number of additional training samples needed for the validation score estimation function to meet or exceed a target validation score.
    Type: Application
    Filed: May 31, 2022
    Publication date: November 30, 2023
    Inventors: Rafid Reza Mahmood, James Robert Lucas, David Jesus Acuna Marrero, Daiqing Li, Jonah Philion, Jose Manuel Alvarez Lopez, Zhiding Yu, Sanja Fidler, Marc Law
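A validation score estimation function of the kind described can be sketched with an assumed power-law learning curve, fit by linear regression in log space and then inverted to get the training-set size that reaches a target score. The specific functional form v(n) = 1 - b * n^(-c) is an illustrative assumption, not the patent's regression function:

```python
import numpy as np

def estimate_samples_needed(sizes, scores, target):
    """Fit a power-law learning curve v(n) = 1 - b * n**(-c) to observed
    (training-set size, validation score) pairs, then solve it for the
    size n* at which the curve reaches the target validation score."""
    # linearize: log(1 - v) = log(b) - c * log(n), then least squares
    y = np.log(1.0 - np.asarray(scores, dtype=float))
    x = np.log(np.asarray(sizes, dtype=float))
    slope, logb = np.polyfit(x, y, 1)
    c, b = -slope, np.exp(logb)
    # invert the curve: 1 - b * n**(-c) = target
    return (b / (1.0 - target)) ** (1.0 / c)
```

For example, scores observed at sizes 100, 400, and 1600 let the fit extrapolate how much more data a higher target requires.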
  • Publication number: 20230376849
Abstract: In various examples, optimal training data set sizes are estimated for machine learning model systems and applications. Systems and methods are disclosed that estimate an amount of data to include in a training data set, where the training data set is then used to train one or more machine learning models to reach a target validation performance. To estimate the amount of training data, subsets of an initial training data set may be used to train the machine learning model(s) in order to determine estimates for the minimum amount of training data needed to train the machine learning model(s) to reach the target validation performance. The estimates may then be used to generate one or more functions, such as a cumulative density function and/or a probability density function, wherein the function(s) is then used to estimate the amount of training data needed to train the machine learning model(s).
    Type: Application
    Filed: May 16, 2023
    Publication date: November 23, 2023
    Inventors: Rafid Reza Mahmood, Marc Law, James Robert Lucas, Zhiding Yu, Jose Manuel Alvarez Lopez, Sanja Fidler
  • Patent number: 11790633
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
    Type: Grant
    Filed: July 1, 2021
    Date of Patent: October 17, 2023
    Assignee: NVIDIA Corporation
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
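One step of edge-gated message passing can be sketched in 1D: a pixel receives a message from its neighbor weighted by their affinity, and the semantic edge acts as a gate that blocks smoothing across boundaries. This toy version is an illustration of the gating idea, not the patent's learnable recurrent layer:

```python
import numpy as np

def edge_gated_smooth(feat, affinity, edge):
    """One spatial-propagation step on a 1D row of features: each pixel
    mixes with its left neighbor, weighted by the affinity between them
    and gated off where the edge map fires (edge near 1 blocks the
    message, so distinct regions keep their values)."""
    out = feat.copy()
    for i in range(1, len(feat)):
        w = affinity[i] * (1.0 - edge[i])  # edge acts as a gating signal
        out[i] = (1.0 - w) * feat[i] + w * feat[i - 1]
    return out
```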
  • Publication number: 20230290135
    Abstract: Apparatuses, systems, and techniques to generate a robust representation of an image. In at least one embodiment, input tokens of an input image are received, and an inference about the input image is generated based on a vision transformer (ViT) system comprising at least one self-attention module to perform token mixing and a channel self-attention module to perform channel processing.
    Type: Application
    Filed: March 9, 2023
    Publication date: September 14, 2023
    Inventors: Daquan Zhou, Zhiding Yu, Enze Xie, Anima Anandkumar, Chaowei Xiao, Jose Manuel Alvarez Lopez
  • Publication number: 20230252692
    Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
    Type: Application
    Filed: September 1, 2022
    Publication date: August 10, 2023
    Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
  • Publication number: 20230186428
    Abstract: Apparatuses, systems, and techniques for texture synthesis from small input textures in images using convolutional neural networks. In at least one embodiment, one or more convolutional layers are used in conjunction with one or more transposed convolution operations to generate a large textured output image from a small input textured image while preserving global features and texture, according to various novel techniques described herein.
    Type: Application
    Filed: February 6, 2023
    Publication date: June 15, 2023
    Inventors: Guilin Liu, Andrew Tao, Bryan Christopher Catanzaro, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum Reda, Karan Sapra, Brandon Rowlett
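The transposed convolution at the heart of such upsampling scatters a scaled copy of the kernel for each input element into a stride-expanded output, which is how a small feature map grows into a large one. A minimal 1D sketch (real layers are 2D, learned, and batched; this is only the scatter arithmetic):

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Minimal 1D transposed convolution: each input value adds a
    scaled copy of the kernel into the stride-upsampled output."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        out[i * stride : i * stride + k] += v * np.asarray(kernel)
    return out
```

Overlapping kernel copies blend neighboring outputs, which helps preserve texture continuity in the enlarged image.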
  • Publication number: 20230074706
    Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
    Type: Application
    Filed: August 25, 2021
    Publication date: March 9, 2023
    Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
  • Publication number: 20230015989
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
    Type: Application
    Filed: July 1, 2021
    Publication date: January 19, 2023
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
  • Publication number: 20220292306
Abstract: In various examples, training methods are described to generate a trained neural network that is robust to various environmental features. In an embodiment, training includes modifying images of a dataset and generating bounding boxes and/or other segmentation information for the modified images, which is used to train a neural network.
    Type: Application
    Filed: March 15, 2021
    Publication date: September 15, 2022
    Inventors: Subhashree Radhakrishnan, Partha Sriram, Farzin Aghdasi, Seunghwan Cha, Zhiding Yu
  • Publication number: 20220261593
    Abstract: Apparatuses, systems, and techniques to train one or more neural networks. In at least one embodiment, one or more neural networks are trained to perform segmentation tasks based at least in part on training data comprising bounding box annotations.
    Type: Application
    Filed: February 16, 2021
    Publication date: August 18, 2022
    Inventors: Zhiding Yu, Shiyi Lan, Chris Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Anima Anandkumar
  • Patent number: 11367268
    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: June 21, 2022
Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz
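With id-related factors disentangled from id-unrelated ones, re-identification retrieval reduces to ranking a gallery by embedding similarity to a query. A minimal sketch (cosine similarity over unit-normalized embeddings is an illustrative choice, not the patent's specific metric):

```python
import numpy as np

def reid_rank(query_emb, gallery_embs):
    """Object re-identification retrieval: rank gallery embeddings by
    cosine similarity to the query embedding; top-ranked indices should
    correspond to images of the same identity captured elsewhere."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)  # indices, most similar first
```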