Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MACHINE-LEARNING TECHNIQUES FOR REPRESENTING ITEMS IN A SPECTRAL DOMAIN

Publication number: 20230267306

Abstract: In various embodiments, a training application generates a trained machine learning model that represents items in a spectral domain. The training application executes a first neural network on a first set of data points associated with both a first item and the spectral domain to generate a second neural network. Subsequently, the training application generates a set of predicted data points that are associated with both the first item and the spectral domain via the second neural network. The training application generates the trained machine learning model based on the first neural network, the second neural network, and the set of predicted data points. The trained machine learning model maps one or more positions within the spectral domain to one or more values associated with an item based on a set of data points associated with both the item and the spectral domain.

Type: Application

Filed: September 20, 2022

Publication date: August 24, 2023

Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU

MACHINE-LEARNING TECHNIQUES FOR CONSTRUCTING MEDICAL IMAGES

Publication number: 20230267656

Abstract: In various embodiments, an inference application constructs medical images. The inference application executes a first trained machine learning model on a set of data points associated with both a medical item and a spectral domain to generate a second model that represents the medical item within the spectral domain. The inference application maps a set of positions to a set of predicted values associated with both the medical item and the spectral domain via the second model. The inference application constructs an image of the medical item based on the set of predicted values.

Type: Application

Filed: September 20, 2022

Publication date: August 24, 2023

Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU

MACHINE-LEARNING TECHNIQUES FOR SPARSE-TO-DENSE SPECTRAL RECONSTRUCTION

Publication number: 20230267659

Abstract: In various embodiments, an inference application reconstructs representations of items in a spectral domain. The inference application maps a first set of data points associated with both an item and the spectral domain to conditioning information via a first trained machine learning model. The inference application updates a second trained machine learning model based on the conditioning information to generate a model that represents the item within the spectral domain. The inference application generates a second set of data points associated with both the item and the spectral domain via the model. The inference application constructs an image associated with the item based on the second set of data points.

Type: Application

Filed: September 20, 2022

Publication date: August 24, 2023

Inventors: Benjamin ECKART, Jan KAUTZ, Chao LIU, Benjamin WU

LEARNING DENSE CORRESPONDENCES FOR IMAGES

Publication number: 20230252692

Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.

Type: Application

Filed: September 1, 2022

Publication date: August 10, 2023

Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz

Three-dimensional object reconstruction from a video

Patent number: 11704857

Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.

Type: Grant

Filed: May 2, 2022

Date of Patent: July 18, 2023

Assignee: NVIDIA Corporation

Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz

ADAPTIVE TOKEN DEPTH ADJUSTMENT IN TRANSFORMER NEURAL NETWORKS

Publication number: 20230186077

Abstract: One embodiment of the present invention sets forth a technique for executing a transformer neural network. The technique includes computing a first set of halting scores for a first set of tokens that has been input into a first layer of the transformer neural network. The technique also includes determining that a first halting score included in the first set of halting scores exceeds a threshold value. The technique further includes in response to the first halting score exceeding the threshold value, causing a first token that is included in the first set of tokens and is associated with the first halting score not to be processed by one or more layers within the transformer neural network that are subsequent to the first layer.
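
The halting step described in this abstract can be illustrated with a small sketch. This is not the patented implementation: the function `run_with_halting`, the per-token toy layer `add_one`, and the scoring function (the first token component) are hypothetical stand-ins chosen only to show a token being skipped by later layers once its halting score exceeds a threshold. A real transformer layer would also mix information across tokens, which this sketch omits.

```python
from typing import Callable, List, Sequence, Tuple

Token = List[float]


def run_with_halting(
    tokens: Sequence[Token],
    layers: Sequence[Callable[[Token], Token]],
    halting_score: Callable[[Token], float],
    threshold: float,
) -> Tuple[List[Token], List[int]]:
    """Apply layers in order; a token whose score exceeds the threshold
    after a layer is not processed by any subsequent layer."""
    outputs = [list(t) for t in tokens]
    depth = [0] * len(outputs)          # number of layers each token passed through
    active = list(range(len(outputs)))  # indices of tokens still being processed

    for layer in layers:
        if not active:
            break
        for i in active:
            outputs[i] = layer(outputs[i])
            depth[i] += 1
        # Keep only tokens whose halting score has not exceeded the threshold.
        active = [i for i in active if halting_score(outputs[i]) <= threshold]

    return outputs, depth


def add_one(token: Token) -> Token:
    """Hypothetical toy layer: adds 1.0 to every component."""
    return [x + 1.0 for x in token]


outputs, depth = run_with_halting(
    tokens=[[0.0], [2.0]],
    layers=[add_one, add_one, add_one],
    halting_score=lambda t: t[0],  # assumed score: first component of the token
    threshold=2.5,
)
print(depth)
```

In this toy run the second token crosses the threshold after the first layer and stops there, while the first token passes through all three layers, so `depth` is `[3, 1]`.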

Type: Application

Filed: June 15, 2022

Publication date: June 15, 2023

Inventors: Hongxu YIN, Jan KAUTZ, Jose Manuel ALVAREZ LOPEZ, Arun MALLYA, Pavlo MOLCHANOV, Arash VAHDAT

PERFORMING SEMANTIC SEGMENTATION TRAINING WITH IMAGE/TEXT PAIRS

Publication number: 20230177810

Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.

Type: Application

Filed: June 29, 2022

Publication date: June 8, 2023

Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz

ESTIMATING FACIAL EXPRESSIONS USING FACIAL LANDMARKS

Publication number: 20230144458

Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.

Type: Application

Filed: October 31, 2022

Publication date: May 11, 2023

Inventors: Alexander Malafeev, Shalini De Mello, Jaewoo Seo, Umar Iqbal, Koki Nagano, Jan Kautz, Simon Yuen

Transforming convolutional neural networks for visual sequence learning

Patent number: 11645530

Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.

Type: Grant

Filed: May 19, 2021

Date of Patent: May 9, 2023

Assignee: NVIDIA Corporation

Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz

Bilateral convolution layer network for processing point clouds

Patent number: 11636668

Abstract: A method includes filtering a point cloud transformation of a 3D object to generate a 3D lattice and processing the 3D lattice through a series of bilateral convolution networks (BCL), each BCL in the series having a lower lattice feature scale than a preceding BCL in the series. The output of each BCL in the series is concatenated to generate an intermediate 3D lattice. Further filtering of the intermediate 3D lattice generates a first prediction of features of the 3D object.

Type: Grant

Filed: May 22, 2018

Date of Patent: April 25, 2023

Inventors: Varun Jampani, Hang Su, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

Iterative spatio-temporal action detection in video

Patent number: 11631239

Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground-truth.

Type: Grant

Filed: April 22, 2021

Date of Patent: April 18, 2023

Assignee: NVIDIA CORPORATION

Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang

FUTURE OBJECT TRAJECTORY PREDICTIONS FOR AUTONOMOUS MACHINE APPLICATIONS

Publication number: 20230088912

Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.

Type: Application

Filed: September 26, 2022

Publication date: March 23, 2023

Inventors: Ruben Villegas, Alejandro Troccoli, Iuri Frosio, Stephen Tyree, Wonmin Byeon, Jan Kautz

PRUNING A VISION TRANSFORMER

Publication number: 20230080247

Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but use more memory than required. In response, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces a size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
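
The score-and-threshold rule in this abstract can be sketched in a few lines. The abstract does not say how the score is computed, so the sketch assumes absolute magnitude as the score; `prune_block`, the parameter names, and the sample values are hypothetical and are not taken from the application.

```python
from typing import Callable, Dict, List


def prune_block(
    params: Dict[str, float],
    score: Callable[[float], float],
    threshold: float,
) -> Dict[str, float]:
    """Return a copy of the block without parameters whose score
    falls below the threshold."""
    return {name: value for name, value in params.items() if score(value) >= threshold}


# Two hypothetical same-size blocks, each mapping parameter name to value.
blocks: List[Dict[str, float]] = [
    {"w0": 0.9, "w1": -0.01, "w2": 0.4},
    {"w0": 0.02, "w1": -0.7, "w2": 0.03},
]

# Assumed score: absolute magnitude of the parameter.
pruned = [prune_block(block, score=abs, threshold=0.05) for block in blocks]
print([len(block) for block in pruned])
```

After pruning, the blocks no longer have the same size (two parameters remain in the first and one in the second), which mirrors the abstract's point that same-size blocks carry parameters that can be removed.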

Type: Application

Filed: December 14, 2021

Publication date: March 16, 2023

Inventors: Hongxu Yin, Huanrui Yang, Pavlo Molchanov, Jan Kautz

LEARNING CONTRASTIVE REPRESENTATION FOR SEMANTIC CORRESPONDENCE

Publication number: 20230074706

Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.

Type: Application

Filed: August 25, 2021

Publication date: March 9, 2023

Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz

PERFORMING OCCLUSION-AWARE GLOBAL 3D POSE AND SHAPE ESTIMATION OF ARTICULATED OBJECTS

Publication number: 20230070514

Abstract: In order to determine accurate three-dimensional (3D) models for objects within a video, the objects are first identified and tracked within the video, and a pose and shape are estimated for these tracked objects. A translation and global orientation are removed from the tracked objects to determine local motion for the objects, and motion infilling is performed to fill in any missing portions for the object within the video. A global trajectory is then determined for the objects within the video, and the infilled motion and global trajectory are then used to determine infilled global motion for the object within the video. This enables the accurate depiction of each object as a 3D pose sequence for that model that accounts for occlusions and global factors within the video.

Type: Application

Filed: January 25, 2022

Publication date: March 9, 2023

Inventors: Ye Yuan, Umar Iqbal, Pavlo Molchanov, Jan Kautz

Self-supervised hierarchical motion learning for video action recognition

Patent number: 11594006

Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.

Type: Grant

Filed: August 20, 2020

Date of Patent: February 28, 2023

Assignee: NVIDIA CORPORATION

Inventors: Xiaodong Yang, Xitong Yang, Sifei Liu, Jan Kautz

Few-shot training of a neural network

Patent number: 11593661

Abstract: A neural network is trained to identify one or more features of an image. The neural network is trained using a small number of original images, from which a plurality of additional images are derived. The additional images are generated by rotating and decoding embeddings of the image in a latent space generated by an autoencoder. The images generated by the rotation and decoding exhibit changes to a feature that are in proportion to the amount of rotation.

Type: Grant

Filed: April 19, 2019

Date of Patent: February 28, 2023

Assignee: NVIDIA Corporation

Inventors: Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Jan Kautz

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Publication number: 20230035306

Abstract: Apparatuses, systems, and techniques are presented to generate media content.

Type: Application

Filed: July 21, 2021

Publication date: February 2, 2023

Inventors: Ming-Yu Liu, Koki Nagano, Yeongho Seol, Jose Rafael Valle Gomes da Costa, Jaewoo Seo, Ting-Chun Wang, Arun Mallya, Sameh Khamis, Wei Ping, Rohan Badlani, Kevin Jonathan Shih, Bryan Catanzaro, Simon Yuen, Jan Kautz

IMAGE PROCESSING USING COUPLED SEGMENTATION AND EDGE LEARNING

Publication number: 20230015989

Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.

Type: Application

Filed: July 1, 2021

Publication date: January 19, 2023

Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz

TRAINING OBJECT DETECTION SYSTEMS WITH GENERATED IMAGES

Publication number: 20230004760

Abstract: Apparatuses, systems, and techniques to identify objects within an image using self-supervised machine learning. In at least one embodiment, a machine learning system is trained to recognize objects by training a first network to recognize objects within images that are generated by a second network. In at least one embodiment, the second network is a controllable network.

Type: Application

Filed: June 28, 2021

Publication date: January 5, 2023

Inventors: Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Jan Kautz
