Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11790633
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
    Type: Grant
    Filed: July 1, 2021
    Date of Patent: October 17, 2023
    Assignee: NVIDIA Corporation
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
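    Code sketch: a minimal, hypothetical illustration of the edge-gated spatial propagation the abstract describes, assuming 4-neighbor message passing; the gating formula, update rule, and names are assumptions for illustration, not the patented implementation.

      import numpy as np

      def refine(feat, affinity, edge, iters=3):
          """Smooth a semantic feature map (H, W, C) along 4-neighbor paths.

          affinity: (H, W, 4) per-direction affinity weights in [0, 1]
          edge:     (H, W) semantic edge probability; edges gate (block) messages
          """
          shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
          for _ in range(iters):
              out = np.zeros_like(feat)
              total = np.zeros(feat.shape[:2] + (1,))
              for d, (dy, dx) in enumerate(shifts):
                  neighbor = np.roll(feat, (dy, dx), axis=(0, 1))
                  # message strength: affinity, suppressed where an edge separates pixels
                  w = (affinity[..., d] * (1.0 - edge))[..., None]
                  out += w * neighbor
                  total += w
              # convex blend of the original features and the incoming messages
              feat = 0.5 * feat + 0.5 * out / np.maximum(total, 1e-6)
          return feat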
  • Publication number: 20230316458
    Abstract: In various examples, dynamic seam placement is used to position seams in regions of overlapping image data to avoid crossing salient objects or regions. Objects may be detected from image frames representing overlapping views of an environment surrounding an ego-object such as a vehicle. The images may be aligned to create an aligned composite image or surface (e.g., a panorama, a 360° image, bowl shaped surface) with regions of overlapping image data, and a representation of the detected objects and/or salient regions (e.g., a saliency mask) may be generated and projected onto the aligned composite image or surface. Seams may be positioned in the overlapping regions to avoid or minimize crossing salient pixels represented in the projected masks, and the image data may be blended at the seams to create a stitched image or surface (e.g., a stitched panorama, stitched 360° image, stitched textured surface).
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Yuzhuo REN, Kenneth TURKOWSKI, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Niranjan AVADHANAM, Hang SU
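    Code sketch: one plausible way to position a seam that avoids salient pixels, assuming a seam-carving-style dynamic program over the overlap region's saliency mask; the DP formulation is an assumption, not taken from the application.

      import numpy as np

      def place_seam(saliency):
          """Return a top-to-bottom seam through an (H, W) overlap-region
          saliency map that minimizes the salient pixels it crosses."""
          H, W = saliency.shape
          cost = saliency.astype(np.float64).copy()
          for y in range(1, H):
              left = np.roll(cost[y - 1], 1);   left[0] = np.inf
              right = np.roll(cost[y - 1], -1); right[-1] = np.inf
              cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
          seam = np.empty(H, dtype=int)
          seam[-1] = int(np.argmin(cost[-1]))
          for y in range(H - 2, -1, -1):        # backtrack through the DP table
              x = seam[y + 1]
              lo, hi = max(0, x - 1), min(W, x + 2)
              seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
          return seam                           # seam[y] = seam column in row y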
  • Publication number: 20230319218
    Abstract: In various examples, a state machine is used to select between a default seam placement or dynamic seam placement that avoids salient regions, and to enable and disable dynamic seam placement based on speed of ego-motion, direction of ego-motion, proximity to salient objects, active viewport, driver gaze, and/or other factors. Images representing overlapping views of an environment may be aligned to create an aligned composite image or surface (e.g., a panorama, a 360° image, bowl shaped surface) with overlapping regions of image data, and a default or dynamic seam placement may be selected based on driving scenario (e.g., driving direction, speed, proximity to nearby objects). As such, seams may be positioned in the overlapping regions of image data, and the image data may be blended at the seams to create a stitched image or surface (e.g., a stitched panorama, stitched 360° image, stitched textured surface).
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Yuzhuo REN, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Niranjan AVADHANAM, Hang SU
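    Code sketch: a toy version of the selection state machine, assuming hypothetical speed and proximity thresholds with hysteresis; the actual application also weighs driving direction, active viewport, and driver gaze.

      from enum import Enum

      class SeamMode(Enum):
          DEFAULT = 0
          DYNAMIC = 1

      def select_seam_mode(mode, speed_mps, reversing, min_obstacle_m):
          """Toggle dynamic seam placement from the driving scenario; hysteresis
          (different enter/exit thresholds) keeps the mode from flickering."""
          if mode is SeamMode.DEFAULT:
              # engage dynamic seams when reversing or moving slowly near obstacles
              if reversing or (speed_mps < 3.0 and min_obstacle_m < 5.0):
                  return SeamMode.DYNAMIC
          else:
              # fall back to cheap default seams at speed or in open space
              if not reversing and (speed_mps > 5.0 or min_obstacle_m > 8.0):
                  return SeamMode.DEFAULT
          return mode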
  • Publication number: 20230316635
    Abstract: In various examples, an environment surrounding an ego-object is visualized using an adaptive 3D bowl that models the environment with a shape that changes based on distance (and direction) to one or more representative point(s) on detected objects. Distance (and direction) to detected objects may be determined using 3D object detection or a top-down 2D or 3D occupancy grid, and used to adapt the shape of the adaptive 3D bowl in various ways (e.g., by sizing its ground plane to fit within the distance to the closest detected object, fitting a shape using an optimization algorithm). The adaptive 3D bowl may be enabled or disabled during each time slice (e.g., based on ego-speed), and the 3D bowl for each time slice may be used to render a visualization of the environment (e.g., a top-down projection image, a textured 3D bowl, and/or a rendered view thereof).
    Type: Application
    Filed: February 23, 2023
    Publication date: October 5, 2023
    Inventors: Hairong JIANG, Nuri Murat ARAR, Orazio GALLO, Jan KAUTZ, Ronan LETOQUIN
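    Code sketch: one way the bowl could adapt, assuming the ground plane is sized to fit inside the distance to the closest detected object point; the radii, margin, and quadratic wall profile are illustrative assumptions.

      import numpy as np

      def bowl_radius(object_points_xy, r_min=2.0, r_max=20.0, margin=0.5):
          """Size the flat ground plane of the 3D bowl so it ends just short of
          the closest detected object point (ego origin at (0, 0))."""
          if len(object_points_xy) == 0:
              return r_max
          d = np.linalg.norm(np.asarray(object_points_xy, dtype=float), axis=1)
          return float(np.clip(d.min() - margin, r_min, r_max))

      def bowl_height(r, radius, rim_height=3.0):
          """Bowl profile: flat inside `radius`, rising quadratically outside,
          so nearby obstacles land on the bowl wall instead of the ground."""
          return 0.0 if r <= radius else rim_height * ((r - radius) / radius) ** 2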
  • Publication number: 20230290038
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
    Type: Application
    Filed: May 19, 2023
    Publication date: September 14, 2023
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
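    Code sketch: the temporal-consistency fine-tuning idea in its simplest form, assuming per-frame texture and base-shape predictions are penalized for deviating from their average over the video; the actual constraints (including part correspondence invariance) are richer.

      import torch

      def invariance_loss(per_frame):
          """Penalize deviation of a per-frame prediction (e.g. texture maps or
          base-shape vertices stacked along dim 0) from its mean over T frames."""
          ref = per_frame.mean(dim=0, keepdim=True)
          return (per_frame - ref).abs().mean()

      def temporal_consistency(textures, base_shapes):
          # an object's texture and base shape should stay constant across frames
          return invariance_loss(textures) + invariance_loss(base_shapes)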
  • Patent number: 11748887
    Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: September 5, 2023
    Assignee: NVIDIA Corporation
    Inventors: Varun Jampani, Wei-Chih Hung, Sifei Liu, Pavlo Molchanov, Jan Kautz
  • Publication number: 20230267306
    Abstract: In various embodiments, a training application generates a trained machine learning model that represents items in a spectral domain. The training application executes a first neural network on a first set of data points associated with both a first item and the spectral domain to generate a second neural network. Subsequently, the training application generates a set of predicted data points that are associated with both the first item and the spectral domain via the second neural network. The training application generates the trained machine learning model based on the first neural network, the second neural network, and the set of predicted data points. The trained machine learning model maps one or more positions within the spectral domain to one or more values associated with an item based on a set of data points associated with both the item and the spectral domain.
    Type: Application
    Filed: September 20, 2022
    Publication date: August 24, 2023
    Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU
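    Code sketch: a small hypernetwork in the spirit of this abstract (and the two related publications below): a first network encodes an item's observed spectral-domain samples into the weights of a second network that maps positions to values. Dimensions and architecture are assumptions.

      import torch
      import torch.nn as nn

      class HyperNet(nn.Module):
          def __init__(self, d_pos=2, d_hidden=64):
              super().__init__()
              self.encoder = nn.Sequential(nn.Linear(d_pos + 1, 128), nn.ReLU(),
                                           nn.Linear(128, 128))
              # emits weights and bias of a (d_hidden -> 1) readout as one vector
              self.head = nn.Linear(128, d_hidden + 1)
              self.features = nn.Sequential(nn.Linear(d_pos, d_hidden), nn.ReLU())

          def forward(self, pos, val, query):
              # first network: mean-pool a set encoding of (position, value) samples
              code = self.encoder(torch.cat([pos, val], dim=-1)).mean(dim=0)
              wb = self.head(code)
              w, b = wb[:-1], wb[-1]
              # second network: maps spectral positions to predicted values
              return self.features(query) @ w + b

      net = HyperNet()
      pos, val = torch.rand(100, 2), torch.rand(100, 1)  # observed samples
      pred = net(pos, val, torch.rand(500, 2))           # values at new positions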
  • Publication number: 20230267656
    Abstract: In various embodiments, an inference application constructs medical images. The inference application executes a first trained machine learning model on a set of data points associated with both a medical item and a spectral domain to generate a second model that represents the medical item within the spectral domain. The inference application maps a set of positions to a set of predicted values associated with both the medical item and the spectral domain via the second model. The inference application constructs an image of the medical item based on the set of predicted values.
    Type: Application
    Filed: September 20, 2022
    Publication date: August 24, 2023
    Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU
  • Publication number: 20230267659
    Abstract: In various embodiments, an inference application reconstructs representations of items in a spectral domain. The inference application maps a first set of data points associated with both an item and the spectral domain to conditioning information via a first trained machine learning model. The inference application updates a second trained machine learning model based on the conditioning information to generate a model that represents the item within the spectral domain. The inference application generates a second set of data points associated with both the item and the spectral domain via the model. The inference application constructs an image associated with the item based on the second set of data points.
    Type: Application
    Filed: September 20, 2022
    Publication date: August 24, 2023
    Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU
  • Publication number: 20230252692
    Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
    Type: Application
    Filed: September 1, 2022
    Publication date: August 10, 2023
    Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
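    Code sketch: the warp step in isolation, assuming each pixel of a generated image carries predicted canonical (x, y) coordinates, so grid sampling expresses the structure latent in the shared canonical frame; names and conventions are assumptions.

      import torch
      import torch.nn.functional as F

      def warp_to_canonical(structure, coords):
          """structure: (N, C, H, W) structure latents for N generated images
          coords:    (N, H, W, 2) predicted canonical (x, y) in [-1, 1] per pixel
          """
          return F.grid_sample(structure, coords, mode='bilinear',
                               align_corners=False)

      # once every image lives in the same canonical space, a semantic label
      # painted on a reference image can be propagated to any synthesized image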
  • Patent number: 11704857
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
    Type: Grant
    Filed: May 2, 2022
    Date of Patent: July 18, 2023
    Assignee: NVIDIA Corporation
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
  • Publication number: 20230186077
    Abstract: One embodiment of the present invention sets forth a technique for executing a transformer neural network. The technique includes computing a first set of halting scores for a first set of tokens that has been input into a first layer of the transformer neural network. The technique also includes determining that a first halting score included in the first set of halting scores exceeds a threshold value. The technique further includes in response to the first halting score exceeding the threshold value, causing a first token that is included in the first set of tokens and is associated with the first halting score not to be processed by one or more layers within the transformer neural network that are subsequent to the first layer.
    Type: Application
    Filed: June 15, 2022
    Publication date: June 15, 2023
    Inventors: Hongxu YIN, Jan KAUTZ, Jose Manuel ALVAREZ LOPEZ, Arun MALLYA, Pavlo MOLCHANOV, Arash VAHDAT
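    Code sketch: the halting mechanic in isolation, assuming halting scores are accumulated across layers and tokens are zeroed (frozen) once the running total crosses the threshold; shapes and the 0.99 threshold are illustrative.

      import torch

      def apply_halting(tokens, halt_scores, cum_halt, threshold=0.99):
          """tokens: (B, N, D); halt_scores, cum_halt: (B, N). Returns updated
          tokens, the new cumulative scores, and the active-token mask."""
          cum_halt = cum_halt + halt_scores
          active = (cum_halt < threshold).float()   # 1 = keep processing this token
          tokens = tokens * active.unsqueeze(-1)    # halted tokens skip later layers
          return tokens, cum_halt, active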
  • Publication number: 20230177810
    Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
    Type: Application
    Filed: June 29, 2022
    Publication date: June 8, 2023
    Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz
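    Code sketch: a symmetric InfoNCE loss of the kind the abstract's contrastive step suggests, assuming each image's pixel groupings and each caption's nouns have already been pooled to one embedding per image; that pooling is an assumption here.

      import torch
      import torch.nn.functional as F

      def image_text_contrastive(group_feats, text_feats, tau=0.07):
          """group_feats, text_feats: (B, D), L2-normalized; the diagonal pairs
          are positives, every other pairing in the batch is a negative."""
          logits = group_feats @ text_feats.t() / tau      # (B, B) similarities
          labels = torch.arange(len(logits), device=logits.device)
          # symmetric: image-to-text and text-to-image directions
          return (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels)) / 2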
  • Publication number: 20230144458
    Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized and then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
    Type: Application
    Filed: October 31, 2022
    Publication date: May 11, 2023
    Inventors: Alexander Malafeev, Shalini De Mello, Jaewoo Seo, Umar Iqbal, Koki Nagano, Jan Kautz, Simon Yuen
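    Code sketch: the per-region sub-network layout, assuming normalized 2D landmarks, hypothetical region index sets, and FACS-style outputs squashed to [0, 1]; layer sizes are arbitrary.

      import torch
      import torch.nn as nn

      class RegionFACSNet(nn.Module):
          """Sub-networks over per-region landmark subsets, then a fusion network
          that also sees the global landmark locations."""
          def __init__(self, regions, n_landmarks=68, n_facs=52):
              super().__init__()
              self.regions = regions  # e.g. {'mouth': [48, ..., 67], 'eyes': [...]}
              self.subnets = nn.ModuleDict({
                  name: nn.Sequential(nn.Linear(len(idx) * 2, 64), nn.ReLU(),
                                      nn.Linear(64, 16))
                  for name, idx in regions.items()})
              d_in = 16 * len(regions) + n_landmarks * 2
              self.fusion = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                          nn.Linear(128, n_facs), nn.Sigmoid())

          def forward(self, lm):                   # lm: (B, n_landmarks, 2)
              parts = [net(lm[:, idx].flatten(1))  # per-region profiles
                       for idx, net in zip(self.regions.values(),
                                           self.subnets.values())]
              return self.fusion(torch.cat(parts + [lm.flatten(1)], dim=1))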
  • Patent number: 11645530
    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: May 9, 2023
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
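    Code sketch: the weight transform for a fully connected layer, assuming the original layer used a ReLU so the first time step reproduces its output; parameter names follow PyTorch's RNN conventions.

      import torch
      import torch.nn as nn

      def fc_to_rnn(fc: nn.Linear) -> nn.RNN:
          """Replace a trained FC layer with a recurrent layer: the feedforward
          weights become input-to-hidden weights, and the hidden-to-hidden
          weights start from zero (the 'initial values' the patent mentions)."""
          rnn = nn.RNN(fc.in_features, fc.out_features,
                       nonlinearity='relu', batch_first=True)
          with torch.no_grad():
              rnn.weight_ih_l0.copy_(fc.weight)  # transformed feedforward weights
              rnn.bias_ih_l0.copy_(fc.bias)
              rnn.weight_hh_l0.zero_()           # learned during further training
              rnn.bias_hh_l0.zero_()
          return rnn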
  • Patent number: 11636668
    Abstract: A method includes filtering a point cloud transformation of a 3D object to generate a 3D lattice and processing the 3D lattice through a series of bilateral convolution networks (BCL), each BCL in the series having a lower lattice feature scale than a preceding BCL in the series. The output of each BCL in the series is concatenated to generate an intermediate 3D lattice. Further filtering of the intermediate 3D lattice generates a first prediction of features of the 3D object.
    Type: Grant
    Filed: May 22, 2018
    Date of Patent: April 25, 2023
    Inventors: Varun Jampani, Hang Su, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
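    Code sketch: only the series-and-concatenate pattern, with a point-wise convolution standing in for a real bilateral convolution layer (splat onto a scaled lattice, convolve there, slice back to points), which is too involved to reproduce here.

      import torch
      import torch.nn as nn

      class BCLStack(nn.Module):
          def __init__(self, c_in, c_out, n_stages=3):
              super().__init__()
              self.stages = nn.ModuleList(
                  [nn.Conv1d(c_in if i == 0 else c_out, c_out, 1)
                   for i in range(n_stages)])
              # each stage filters at a lower lattice feature scale than the last
              self.scales = [1.0 / 2 ** i for i in range(n_stages)]

          def forward(self, x):                  # x: (B, c_in, n_points)
              outs = []
              for scale, stage in zip(self.scales, self.stages):
                  # a real BCL would splat x onto a lattice scaled by `scale`
                  x = torch.relu(stage(x))
                  outs.append(x)
              return torch.cat(outs, dim=1)      # the intermediate lattice features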
  • Patent number: 11631239
    Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor handles large offsets between the predicted bounding boxes and the ground truth.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: April 18, 2023
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
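    Code sketch: the iterative refinement loop of the abstract, reduced to a single box; `predict_offset` stands in for the learned regressor and is a placeholder.

      import numpy as np

      def iterative_refine(box, predict_offset, steps=3):
          """Close a large offset to the ground truth over several small,
          re-predicted steps. box: (x, y, w, h)."""
          box = np.asarray(box, dtype=float)
          for _ in range(steps):
              box = box + predict_offset(box)    # regress a residual, apply it
          return box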
  • Publication number: 20230088912
    Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
    Type: Application
    Filed: September 26, 2022
    Publication date: March 23, 2023
    Inventors: Ruben Villegas, Alejandro Troccoli, Iuri Frosio, Stephen Tyree, Wonmin Byeon, Jan Kautz
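    Code sketch: the encode-then-fuse structure the abstract outlines, assuming (x, y, vx, vy) track histories and hypothetical maneuver classes; treating the set of surrounding objects as the bi-directional LSTM's sequence is a simplification.

      import torch
      import torch.nn as nn

      class ManeuverPredictor(nn.Module):
          def __init__(self, d_state=32, n_lat=3, n_lon=2):
              super().__init__()
              self.state_enc = nn.LSTM(4, d_state, batch_first=True)
              self.spatial = nn.LSTM(d_state, d_state, batch_first=True,
                                     bidirectional=True)
              self.lat = nn.Linear(3 * d_state, n_lat)  # e.g. keep / left / right
              self.lon = nn.Linear(3 * d_state, n_lon)  # e.g. hold / brake

          def forward(self, tracks):             # tracks: (n_objs, T, 4) histories
              _, (h, _) = self.state_enc(tracks)
              state = h[-1]                                 # per-object state feature
              spatial, _ = self.spatial(state.unsqueeze(0)) # across-object feature
              feat = torch.cat([state, spatial[0]], dim=1)
              return self.lat(feat), self.lon(feat)         # maneuver logits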
  • Publication number: 20230080247
    Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but use more memory than required. In response, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces a size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
    Type: Application
    Filed: December 14, 2021
    Publication date: March 16, 2023
    Inventors: Hongxu Yin, Huanrui Yang, Pavlo Molchanov, Jan Kautz
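    Code sketch: the score-and-threshold step, assuming a first-order Taylor-style importance score summed per row of a block's weight matrix; that scoring choice is an assumption, not necessarily the application's.

      import torch

      def prune_by_score(weight, score, threshold):
          """Keep only rows whose importance score reaches the threshold,
          shrinking the layer (and its memory footprint)."""
          keep = score >= threshold
          return weight[keep], keep              # smaller weight + index map

      w = torch.randn(64, 128)
      s = (w * torch.randn_like(w)).abs().sum(dim=1)  # stand-in |w * grad| scores
      w_pruned, kept = prune_by_score(w, s, s.median())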
  • Publication number: 20230074706
    Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
    Type: Application
    Filed: August 25, 2021
    Publication date: March 9, 2023
    Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
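    Code sketch: the pixel-level term, assuming features at sampled pixels of two views of the same object and a given correspondence map; in the patent application the correspondences emerge from training on image pairs alone.

      import torch
      import torch.nn.functional as F

      def pixel_contrastive(feat_a, feat_b, match, tau=0.1):
          """feat_a, feat_b: (P, D) L2-normalized features at P sampled pixels;
          match: (P,) long tensor, index in feat_b of each feat_a pixel's
          corresponding pixel. The match is pulled closer, the rest pushed away."""
          logits = feat_a @ feat_b.t() / tau     # (P, P) similarities
          return F.cross_entropy(logits, match)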