Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11960570
    Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: August 25, 2021
    Date of Patent: April 16, 2024
    Assignee: NVIDIA Corporation
    Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
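    Code sketch: A minimal, hedged illustration of the image-level contrastive idea described above, using an InfoNCE-style loss; the function name, embedding sizes, and temperature are assumptions for illustration, not taken from the patent.

      import torch
      import torch.nn.functional as F

      def image_level_contrastive_loss(z_a, z_b, temperature=0.07):
          """z_a, z_b: (N, D) embeddings of two views; row i of z_a and z_b
          form a corresponding pair, all other rows are contrasting pairs."""
          z_a = F.normalize(z_a, dim=1)
          z_b = F.normalize(z_b, dim=1)
          logits = z_a @ z_b.t() / temperature      # (N, N) similarity matrix
          targets = torch.arange(z_a.size(0), device=z_a.device)
          # Cross entropy pulls diagonal (corresponding) pairs together and
          # pushes off-diagonal (contrasting) pairs apart.
          return F.cross_entropy(logits, targets)

      loss = image_level_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))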
  • Publication number: 20240119361
    Abstract: One embodiment of a method for training a first machine learning model having a different architecture than a second machine learning model includes receiving a first data set, performing one or more operations to generate a second data set based on the first data set and the second machine learning model, wherein the second data set includes at least one feature associated with one or more tasks that the second machine learning model was previously trained to perform, and performing one or more operations to train the first machine learning model based on the second data set and the second machine learning model. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: July 6, 2023
    Publication date: April 11, 2024
    Inventors: Hongxu YIN, Wonmin BYEON, Jan KAUTZ, Divyam MADAAN, Pavlo MOLCHANOV
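    Code sketch: A minimal sketch of the cross-architecture training idea above, assuming a knowledge-distillation setup in which the pretrained second model (teacher) generates a second data set of soft targets for the first model (student). All networks, sizes, and the temperature are illustrative assumptions.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # pretrained stand-in
      student = nn.Sequential(nn.Flatten(), nn.Linear(784, 256),
                              nn.ReLU(), nn.Linear(256, 10))     # different architecture
      opt = torch.optim.Adam(student.parameters(), lr=1e-3)

      x = torch.randn(32, 1, 28, 28)                # a batch from the first data set
      with torch.no_grad():                         # teacher generates the second data set
          soft_targets = F.softmax(teacher(x) / 2.0, dim=1)

      log_probs = F.log_softmax(student(x) / 2.0, dim=1)
      loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
      loss.backward()                               # student learns the teacher's task
      opt.step()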
  • Patent number: 11948078
    Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: April 2, 2024
    Assignee: NVIDIA Corporation
    Inventors: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz
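    Code sketch: A minimal sketch of training a critic on paired image and text embeddings with an InfoNCE-style mutual-information bound; the bilinear critic form and all dimensions are assumptions, not the patent's exact design.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class BilinearCritic(nn.Module):
          def __init__(self, img_dim, txt_dim):
              super().__init__()
              self.W = nn.Parameter(torch.randn(img_dim, txt_dim) * 0.01)
          def forward(self, img_emb, txt_emb):
              return img_emb @ self.W @ txt_emb.t()  # (N, N) critic scores

      critic = BilinearCritic(512, 256)
      opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

      img_emb, txt_emb = torch.randn(16, 512), torch.randn(16, 256)  # paired rows
      scores = critic(img_emb, txt_emb)
      # InfoNCE: matched image/text pairs (the diagonal) should outscore
      # mismatches, which maximizes a lower bound on mutual information.
      loss = F.cross_entropy(scores, torch.arange(16))
      loss.backward()
      opt.step()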
  • Patent number: 11941719
    Abstract: Various embodiments enable a robot, or other autonomous or semi-autonomous device or system, to receive data involving the performance of a task in the physical world. The data can be provided as input to a perception network to infer a set of percepts about the task, which can correspond to relationships between objects observed during the performance. The percepts can be provided as input to a plan generation network, which can infer a set of actions as part of a plan. Each action can correspond to one of the observed relationships. The plan can be reviewed and any corrections made, either manually or through another demonstration of the task. Once the plan is verified as correct, the plan (and any related data) can be provided as input to an execution network that can infer instructions to cause the robot, and/or another robot, to perform the task.
    Type: Grant
    Filed: January 23, 2019
    Date of Patent: March 26, 2024
    Assignee: NVIDIA Corporation
    Inventors: Jonathan Tremblay, Stan Birchfield, Stephen Tyree, Thang To, Jan Kautz, Artem Molchanov
  • Publication number: 20240096115
    Abstract: Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks which match the input more finely. However, the iterations required by current methods typically increase the training memory cost linearly, and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: September 7, 2023
    Publication date: March 21, 2024
    Inventors: Pavlo Molchanov, Jan Kautz, Arash Vahdat, Hongxu Yin, Paul Micaelli
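    Code sketch: A hedged sketch of the iterative refinement idea, assuming a single weight-shared network that repeatedly predicts corrections to the current landmark estimate; the architecture and iteration count are illustrative, not the publication's design.

      import torch
      import torch.nn as nn

      class IterativeLandmarkRefiner(nn.Module):
          def __init__(self, feat_dim=128, n_landmarks=68):
              super().__init__()
              self.step = nn.Sequential(
                  nn.Linear(feat_dim + n_landmarks * 2, 256), nn.ReLU(),
                  nn.Linear(256, n_landmarks * 2))
          def forward(self, image_feat, init_landmarks, n_iters=4):
              lm = init_landmarks                    # (B, n_landmarks, 2)
              for _ in range(n_iters):               # the same weights every pass
                  inp = torch.cat([image_feat, lm.flatten(1)], dim=1)
                  lm = lm + self.step(inp).view_as(lm)  # predict a correction
              return lm

      refiner = IterativeLandmarkRefiner()
      landmarks = refiner(torch.randn(2, 128), torch.zeros(2, 68, 2))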
  • Publication number: 20240070874
    Abstract: Estimating motion of a human or other object in video is a common computer vision task with applications in robotics, sports, mixed reality, etc. However, motion estimation becomes difficult when the camera capturing the video is moving, because the observed object and camera motions are entangled. The present disclosure provides for joint estimation of the motion of a camera and the motion of articulated objects captured in video by the camera.
    Type: Application
    Filed: April 17, 2023
    Publication date: February 29, 2024
    Inventors: Muhammed Kocabas, Ye Yuan, Umar Iqbal, Pavlo Molchanov, Jan Kautz
  • Publication number: 20240070987
    Abstract: Transferring pose to three-dimensional characters is a common computer graphics task that typically involves transferring the pose of a reference avatar to a (stylized) three-dimensional character. Because three-dimensional characters are created by professional artists through imagination and exaggeration, and therefore, unlike human or animal avatars, have distinct shapes and features, matching the pose of a three-dimensional character to that of a reference avatar generally requires manually creating the shape information for the three-dimensional character that is needed for pose transfer. The present disclosure provides for the automated transfer of a reference pose to a three-dimensional character, based specifically on a learned shape code for the three-dimensional character.
    Type: Application
    Filed: February 15, 2023
    Publication date: February 29, 2024
    Inventors: Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Jiashun Wang, Jan Kautz
  • Patent number: 11907846
    Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 10, 2020
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Sifei Liu, Shalini De Mello, Varun Jampani, Jan Kautz, Xueting Li
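    Code sketch: A hedged sketch of propagating features through a DAG of unstructured points: points are visited in topological order and each point blends its feature with affinity-weighted features of its parents. The data layout and blending rule are illustrative assumptions.

      import numpy as np

      def propagate(features, parents, affinity, order):
          """features: (N, C); parents[j]: points with an edge into j;
          affinity[(i, j)] in [0, 1]; order: a topological order of the DAG."""
          out = features.copy()
          for j in order:
              for i in parents.get(j, []):
                  a = affinity[(i, j)]
                  out[j] = (1 - a) * out[j] + a * out[i]  # affinity-weighted blend
          return out

      feats = np.random.randn(4, 8)                       # 4 unstructured points
      parents = {1: [0], 2: [1], 3: [1, 2]}               # directed, acyclic edges
      aff = {(0, 1): 0.8, (1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.6}
      labels = propagate(feats, parents, aff, order=[0, 1, 2, 3])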
  • Publication number: 20240054720
    Abstract: Systems and methods generate a hybrid lighting model for rendering objects within an image. The hybrid lighting model includes lighting effects attributed to a first source, such as the sun, and to a second source, such as spatially-varying effects of objects within the image. The hybrid lighting model may be generated for an input image and then one or more virtual objects may be rendered to appear as if part of the input image, where the hybrid lighting model is used to apply one or more lighting effects to the one or more virtual objects. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: August 11, 2022
    Publication date: February 15, 2024
    Inventors: Sanja Fidler, Zian Wang, Jan Kautz, Wenzheng Chen
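    Code sketch: A minimal sketch of the hybrid lighting idea, assuming simple Lambertian shading that sums a directional sun term and a spatially-varying term sampled from a per-pixel irradiance map; the names and the shading model are assumptions for illustration.

      import numpy as np

      def shade(albedo, normal, sun_dir, sun_color, irradiance_map, px, py):
          """albedo, sun_color: (3,) RGB; normal, sun_dir: (3,) unit vectors;
          irradiance_map: (H, W, 3) spatially-varying light from scene objects."""
          sun_term = sun_color * max(np.dot(normal, sun_dir), 0.0)  # first source: sun
          local_term = irradiance_map[py, px]                       # second source: scene
          return albedo * (sun_term + local_term)

      irradiance = np.full((480, 640, 3), 0.2)       # dim, uniform scene lighting
      rgb = shade(np.array([0.9, 0.5, 0.3]),         # virtual object's albedo
                  np.array([0.0, 1.0, 0.0]),         # surface normal (up)
                  np.array([0.0, 0.7071, 0.7071]),   # sun direction
                  np.array([1.0, 0.95, 0.9]),        # sun color
                  irradiance, px=320, py=240)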
  • Patent number: 11880927
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
    Type: Grant
    Filed: May 19, 2023
    Date of Patent: January 23, 2024
    Assignee: NVIDIA Corporation
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
  • Publication number: 20240020897
    Abstract: Apparatuses, systems, and techniques are presented to generate image data. In at least one embodiment, one or more neural networks are used to cause a lighting effect to be applied to one or more objects within one or more images based, at least in part, on synthetically generated images of the one or more objects.
    Type: Application
    Filed: July 12, 2022
    Publication date: January 18, 2024
    Inventors: Ting-Chun Wang, Ming-Yu Liu, Koki Nagano, Sameh Khamis, Jan Kautz
  • Publication number: 20230394781
    Abstract: Vision transformers are deep learning models that employ a self-attention mechanism to obtain feature representations for an input image. To date, the configuration of vision transformers has limited the self-attention computation to a local window of the input image, such that only short-range dependencies are modeled in the output. The present disclosure provides a vision transformer that captures global context, and that is therefore able to model long-range dependencies in its output. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: December 16, 2022
    Publication date: December 7, 2023
    Applicant: NVIDIA Corporation
    Inventors: Ali Hatamizadeh, Hongxu Yin, Jan Kautz, Pavlo Molchanov
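    Code sketch: A hedged, generic illustration of capturing global context: a set of learned global tokens attends over every patch token, so long-range dependencies can be modeled regardless of window boundaries. This is an illustrative stand-in, not the publication's exact attention design.

      import torch
      import torch.nn as nn

      class GlobalContextAttention(nn.Module):
          def __init__(self, dim=96, n_heads=4, n_global=8):
              super().__init__()
              self.global_tokens = nn.Parameter(torch.randn(1, n_global, dim))
              self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
          def forward(self, patch_tokens):          # (B, n_patches, dim)
              g = self.global_tokens.expand(patch_tokens.size(0), -1, -1)
              # Global tokens query every patch, regardless of window location,
              # so the output can encode long-range dependencies.
              ctx, _ = self.attn(query=g, key=patch_tokens, value=patch_tokens)
              return ctx                            # (B, n_global, dim)

      ctx = GlobalContextAttention()(torch.randn(2, 196, 96))  # 14x14 patches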
  • Publication number: 20230368501
    Abstract: A neural network is trained to identify one or more features of an image. The neural network is trained using a small number of original images, from which a plurality of additional images are derived. The additional images are generated by rotating and decoding embeddings of the image in a latent space generated by an autoencoder. The images generated by the rotation and decoding exhibit changes to a feature that are in proportion to the amount of rotation. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: February 24, 2023
    Publication date: November 16, 2023
    Inventors: Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Jan Kautz
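    Code sketch: A minimal sketch of deriving additional training images by rotating an image's embedding in a latent space and decoding the result; the toy two-dimensional latent space and stand-in autoencoder are assumptions for illustration.

      import torch
      import torch.nn as nn

      encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 2))  # toy autoencoder halves
      decoder = nn.Sequential(nn.Linear(2, 784))

      def rotate_latent(z, angle):
          c, s = torch.cos(angle), torch.sin(angle)
          R = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
          return z @ R.t()                          # rotate in the latent plane

      x = torch.randn(1, 1, 28, 28)                 # one original image
      z = encoder(x)
      # Decoding rotated embeddings yields additional images whose feature
      # change grows with the rotation angle.
      augmented = [decoder(rotate_latent(z, torch.tensor(a)))
                   for a in (0.1, 0.2, 0.3)]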
  • Patent number: 11790633
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: July 1, 2021
    Date of Patent: October 17, 2023
    Assignee: NVIDIA Corporation
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
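    Code sketch: A hedged, one-dimensional illustration of edge-gated smoothing: affinity values control how strongly neighboring features mix, and semantic edge values gate the message path shut across boundaries. The scanline formulation is an illustrative simplification of the disclosed propagation.

      import numpy as np

      def edge_gated_smooth(feat, affinity, edge):
          """feat: (W, C) features along one scanline; affinity, edge: (W,)
          in [0, 1], from the affinity map and semantic edge map."""
          out = feat.copy()
          for x in range(1, feat.shape[0]):
              gate = affinity[x] * (1.0 - edge[x])  # an edge shuts the message path
              out[x] = (1 - gate) * out[x] + gate * out[x - 1]
          return out

      feat = np.random.randn(6, 4)
      affinity = np.full(6, 0.7)
      edge = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # semantic edge at x = 3
      refined = edge_gated_smooth(feat, affinity, edge)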
  • Publication number: 20230316458
    Abstract: In various examples, dynamic seam placement is used to position seams in regions of overlapping image data to avoid crossing salient objects or regions. Objects may be detected from image frames representing overlapping views of an environment surrounding an ego-object such as a vehicle. The images may be aligned to create an aligned composite image or surface (e.g., a panorama, a 360° image, bowl shaped surface) with regions of overlapping image data, and a representation of the detected objects and/or salient regions (e.g., a saliency mask) may be generated and projected onto the aligned composite image or surface. Seams may be positioned in the overlapping regions to avoid or minimize crossing salient pixels represented in the projected masks, and the image data may be blended at the seams to create a stitched image or surface (e.g., a stitched panorama, stitched 360° image, stitched textured surface). (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Yuzhuo REN, Kenneth TURKOWSKI, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Niranjan AVADHANAM, Hang SU
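    Code sketch: A hedged illustration of seam placement that avoids salient pixels, using a seam-carving-style dynamic program to find the top-to-bottom path of lowest total saliency through an overlap region; this is a generic stand-in, not the disclosed placement method itself.

      import numpy as np

      def lowest_saliency_seam(saliency):
          """saliency: (H, W) cost map over an overlap region; returns one
          column index per row describing a connected top-to-bottom seam."""
          H, W = saliency.shape
          cost = saliency.astype(float).copy()
          for y in range(1, H):                    # accumulate minimal path cost
              for x in range(W):
                  lo, hi = max(x - 1, 0), min(x + 2, W)
                  cost[y, x] += cost[y - 1, lo:hi].min()
          seam = np.zeros(H, dtype=int)
          seam[-1] = int(cost[-1].argmin())        # cheapest endpoint
          for y in range(H - 2, -1, -1):           # backtrack the path
              lo = max(seam[y + 1] - 1, 0)
              hi = min(seam[y + 1] + 2, W)
              seam[y] = lo + int(cost[y, lo:hi].argmin())
          return seam

      # Example: seam through a random saliency map for a 120x64 overlap region.
      seam = lowest_saliency_seam(np.random.rand(120, 64))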
  • Publication number: 20230319218
    Abstract: In various examples, a state machine is used to select between a default seam placement or dynamic seam placement that avoids salient regions, and to enable and disable dynamic seam placement based on speed of ego-motion, direction of ego-motion, proximity to salient objects, active viewport, driver gaze, and/or other factors. Images representing overlapping views of an environment may be aligned to create an aligned composite image or surface (e.g., a panorama, a 360° image, bowl shaped surface) with overlapping regions of image data, and a default or dynamic seam placement may be selected based on driving scenario (e.g., driving direction, speed, proximity to nearby objects). As such, seams may be positioned in the overlapping regions of image data, and the image data may be blended at the seams to create a stitched image or surface (e.g., a stitched panorama, stitched 360° image, stitched textured surface). (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Yuzhuo REN, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Niranjan AVADHANAM, Hang SU
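    Code sketch: A minimal sketch of the mode-selection idea as a small state machine that enables dynamic seam placement based on the driving scenario; the inputs, thresholds, and hysteresis rule are illustrative assumptions.

      from dataclasses import dataclass

      @dataclass
      class DrivingState:
          speed_mps: float          # speed of ego-motion
          nearest_object_m: float   # proximity to the closest salient object
          reverse: bool             # direction of ego-motion

      def select_seam_mode(state, current_mode="default"):
          # Slow maneuvering near objects (e.g., parking): avoid crossing them.
          if state.reverse or (state.speed_mps < 5.0 and state.nearest_object_m < 3.0):
              return "dynamic"
          # Fast ego-motion: fall back to the cheaper default seam placement.
          if state.speed_mps > 15.0:
              return "default"
          return current_mode       # hysteresis: otherwise keep the current mode

      mode = select_seam_mode(DrivingState(speed_mps=2.0, nearest_object_m=1.5,
                                           reverse=False))   # -> "dynamic"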
  • Publication number: 20230316635
    Abstract: In various examples, an environment surrounding an ego-object is visualized using an adaptive 3D bowl that models the environment with a shape that changes based on distance (and direction) to one or more representative point(s) on detected objects. Distance (and direction) to detected objects may be determined using 3D object detection or a top-down 2D or 3D occupancy grid, and used to adapt the shape of the adaptive 3D bowl in various ways (e.g., by sizing its ground plane to fit within the distance to the closest detected object, or by fitting a shape using an optimization algorithm). The adaptive 3D bowl may be enabled or disabled during each time slice (e.g., based on ego-speed), and the 3D bowl for each time slice may be used to render a visualization of the environment (e.g., a top-down projection image, a textured 3D bowl, and/or a rendered view thereof). (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Hairong JIANG, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Ronan LETOQUIN
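    Code sketch: A speculative illustration of the adaptive bowl: the flat ground plane is sized to fit within the distance to the closest detected object, with a wall rising beyond it; the height function and its parameters are assumptions, not the disclosed shape model.

      import numpy as np

      def adaptive_bowl_height(r, nearest_object_m, max_radius=20.0, steepness=0.15):
          """Height of the bowl surface at ground distance r from the ego-object."""
          ground_radius = min(nearest_object_m, max_radius)  # flat region shrinks
          return np.where(r <= ground_radius,
                          0.0,                               # ground plane
                          steepness * (r - ground_radius) ** 2)  # rising bowl wall

      r = np.linspace(0.0, 30.0, 7)
      near = adaptive_bowl_height(r, nearest_object_m=4.0)    # close object: small plane
      far = adaptive_bowl_height(r, nearest_object_m=18.0)    # open space: wide plane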
  • Publication number: 20230290038
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
    Type: Application
    Filed: May 19, 2023
    Publication date: September 14, 2023
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
  • Patent number: 11748887
    Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: September 5, 2023
    Assignee: NVIDIA Corporation
    Inventors: Varun Jampani, Wei-Chih Hung, Sifei Liu, Pavlo Molchanov, Jan Kautz
  • Publication number: 20230267306
    Abstract: In various embodiments, a training application generates a trained machine learning model that represents items in a spectral domain. The training application executes a first neural network on a first set of data points associated with both a first item and the spectral domain to generate a second neural network. Subsequently, the training application generates a set of predicted data points that are associated with both the first item and the spectral domain via the second neural network. The training application generates the trained machine learning model based on the first neural network, the second neural network, and the set of predicted data points. The trained machine learning model maps one or more positions within the spectral domain to one or more values associated with an item based on a set of data points associated with both the item and the spectral domain. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: September 20, 2022
    Publication date: August 24, 2023
    Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU
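    Code sketch: A hedged sketch of the two-network idea: a first network consumes an item's observed (position, value) data points and emits the weights of a second, tiny network that maps spectral positions to predicted values. All sizes and the weight layout are illustrative assumptions.

      import torch
      import torch.nn as nn

      HIDDEN = 16  # width of the generated second network

      class HyperNet(nn.Module):                    # the "first neural network"
          def __init__(self, n_points=32):
              super().__init__()
              n_weights = (HIDDEN + HIDDEN) + (HIDDEN + 1)  # w1, b1, w2, b2
              self.net = nn.Sequential(nn.Linear(n_points * 2, 128), nn.ReLU(),
                                       nn.Linear(128, n_weights))
          def forward(self, points):                # points: (n_points, 2)
              return self.net(points.flatten())     # -> flat weight vector

      def second_net(weights, pos):                 # the generated "second network"
          w1 = weights[:HIDDEN].view(HIDDEN, 1)
          b1 = weights[HIDDEN:2 * HIDDEN]
          w2 = weights[2 * HIDDEN:3 * HIDDEN].view(1, HIDDEN)
          b2 = weights[-1]
          return torch.relu(pos @ w1.t() + b1) @ w2.t() + b2

      points = torch.rand(32, 2)                    # observed (position, value) pairs
      weights = HyperNet()(points)
      pred = second_net(weights, torch.tensor([[0.5]]))  # predicted value at a position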