Patents by Inventor Jan Kautz

Jan Kautz has filed patents for the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220222832
    Abstract: A method and system are provided for tracking instances within a sequence of video frames. The method includes the steps of processing an image frame by a backbone network to generate a set of feature maps, processing the set of feature maps by one or more prediction heads, and analyzing the embedding features corresponding to a set of instances in two or more image frames of the sequence of video frames to establish a one-to-one correlation between instances in different image frames. The one or more prediction heads includes an embedding head configured to generate a set of embedding features corresponding to one or more instances of an object identified in the image frame. The method may also include training the one or more prediction heads using a set of annotated image frames and/or a plurality of sequences of unlabeled video frames.
    Type: Application
    Filed: January 6, 2022
    Publication date: July 14, 2022
    Inventors: Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Jan Kautz
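The one-to-one correlation step described in this abstract can be illustrated with a small sketch: embeddings from two frames are compared by cosine similarity and matched greedily so that each instance keeps a single identity across frames. All data here is hypothetical and the greedy matcher is an assumption; the patent does not specify the actual assignment procedure.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def match_instances(emb_prev, emb_curr):
    # Score every (previous, current) pair, then greedily keep the best
    # unused pair so each instance is matched at most once (one-to-one).
    scored = sorted(
        ((cosine(p, c), i, j)
         for i, p in enumerate(emb_prev)
         for j, c in enumerate(emb_curr)),
        reverse=True,
    )
    pairs, used_prev, used_curr = [], set(), set()
    for _, i, j in scored:
        if i not in used_prev and j not in used_curr:
            pairs.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return pairs
```

In practice the embedding head produces these vectors per detected instance; a learned or optimal (e.g. Hungarian) assignment could replace the greedy loop.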
  • Patent number: 11375176
    Abstract: When an image is projected from 3D, the viewpoint of objects in the image, relative to the camera, must be determined. Since the image itself will not have sufficient information to determine the viewpoint of the various objects in the image, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object category basis, but must first be trained with numerous examples that have been manually created. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimations for a new object category.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: June 28, 2022
    Assignee: NVIDIA Corporation
    Inventors: Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Jan Kautz, Stanley Thomas Birchfield
  • Patent number: 11367268
    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: June 21, 2022
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz
  • Publication number: 20220191638
    Abstract: Apparatuses, systems, and techniques to determine head poses of users and provide audio for the users. In at least one embodiment, a head pose is determined based, at least in part, on camera frame information, and an audio signal is generated, based at least in part, on the determined head pose.
    Type: Application
    Filed: March 3, 2021
    Publication date: June 16, 2022
    Inventors: Michael Stengel, Jan Kautz, David Patrick Luebke, Morgan Samuel McGuire
  • Patent number: 11361507
    Abstract: Estimating a three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure are processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations are computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint location at a “joint”, but an arbitrary number of rotations of the “joint” keypoint may produce a twist in a connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
    Type: Grant
    Filed: May 7, 2021
    Date of Patent: June 14, 2022
    Assignee: NVIDIA Corporation
    Inventors: Umar Iqbal, Pavlo Molchanov, Jan Kautz, Yun Rong Guo, Cheng Xie
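The rotation-alignment step, computing rotations that bring mesh keypoints onto estimated keypoints, can be sketched with Rodrigues' formula. Aligning a single bone direction leaves the twist about that bone free, which is exactly the ambiguity the patent's rules resolve. A minimal pure-Python sketch (all names hypothetical):

```python
import math

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def rotation_between(src, dst):
    # Rodrigues' formula: the 3x3 rotation taking unit vector src onto unit
    # vector dst. The twist about the dst axis remains undetermined.
    src, dst = _unit(src), _unit(dst)
    ax = [src[1] * dst[2] - src[2] * dst[1],
          src[2] * dst[0] - src[0] * dst[2],
          src[0] * dst[1] - src[1] * dst[0]]      # cross product = rotation axis
    s = math.sqrt(sum(x * x for x in ax))          # sin(angle)
    c = sum(x * y for x, y in zip(src, dst))       # cos(angle)
    if s < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # parallel
    k = [x / s for x in ax]
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]

def rotate(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
```

Applying `rotation_between` per bone positions each child keypoint correctly; the patented rules then constrain the leftover twist.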
  • Patent number: 11354847
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: June 7, 2022
    Assignee: NVIDIA Corporation
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
  • Patent number: 11328173
    Abstract: A temporal propagation network (TPN) system learns the affinity matrix for video image processing tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The TPN system includes a guidance neural network model and a temporal propagation module and is trained for a particular computer vision task to propagate visual properties from a key-frame represented by dense data (color), to another frame that is represented by coarse data (grey-scale). The guidance neural network model generates an affinity matrix referred to as a global transformation matrix from task-specific data for the key-frame and the other frame. The temporal propagation module applies the global transformation matrix to the key-frame property data to produce propagated property data (color) for the other frame. For example, the TPN system may be used to colorize several frames of greyscale video using a single manually colorized key-frame.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: May 10, 2022
    Assignee: NVIDIA Corporation
    Inventors: Sifei Liu, Shalini De Mello, Jinwei Gu, Varun Jampani, Jan Kautz
  • Patent number: 11328169
    Abstract: A temporal propagation network (TPN) system learns the affinity matrix for video image processing tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The TPN system includes a guidance neural network model and a temporal propagation module and is trained for a particular computer vision task to propagate visual properties from a key-frame represented by dense data (color), to another frame that is represented by coarse data (grey-scale). The guidance neural network model generates an affinity matrix referred to as a global transformation matrix from task-specific data for the key-frame and the other frame. The temporal propagation module applies the global transformation matrix to the key-frame property data to produce propagated property data (color) for the other frame. For example, the TPN system may be used to colorize several frames of greyscale video using a single manually colorized key-frame.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: May 10, 2022
    Assignee: NVIDIA Corporation
    Inventors: Sifei Liu, Shalini De Mello, Jinwei Gu, Varun Jampani, Jan Kautz
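The propagation step in the two TPN patents above, applying a transformation matrix to key-frame property data, reduces to an affinity-weighted average once the matrix rows are normalized. A toy sketch with hypothetical data (in the TPN, the affinities are produced by the guidance network, not hand-written):

```python
def propagate(affinity, key_props):
    # affinity[i][j]: similarity between target-frame point i and key-frame
    # point j. Each target point receives an affinity-weighted average of the
    # key frame's property values (e.g. RGB colors).
    out = []
    for row in affinity:
        total = sum(row)
        out.append([
            sum(w * key_props[j][c] for j, w in enumerate(row)) / total
            for c in range(len(key_props[0]))
        ])
    return out
```

With an identity-like affinity the colors copy over unchanged; spread-out affinities blend colors from several key-frame points, which is how a single colorized key-frame can colorize neighboring greyscale frames.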
  • Publication number: 20220139037
    Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
    Type: Application
    Filed: January 18, 2022
    Publication date: May 5, 2022
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
  • Patent number: 11315018
    Abstract: A method, computer readable medium, and system are disclosed for neural network pruning. The method includes the steps of receiving first-order gradients of a cost function relative to layer parameters for a trained neural network and computing a pruning criterion for each layer parameter based on the first-order gradient corresponding to the layer parameter, where the pruning criterion indicates an importance of each neuron that is included in the trained neural network and is associated with the layer parameter. The method includes the additional steps of identifying at least one neuron having a lowest importance and removing the at least one neuron from the trained neural network to produce a pruned neural network.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: April 26, 2022
    Assignee: NVIDIA Corporation
    Inventors: Pavlo Molchanov, Stephen Walter Tyree, Tero Tapani Karras, Timo Oskari Aila, Jan Kautz
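The pruning criterion, a first-order estimate of how much the cost changes when a neuron is removed, can be sketched as follows. The per-neuron parameter and gradient lists are hypothetical, and the exact claimed criterion may differ; this shows only the shape of a gradient-times-parameter importance score:

```python
def taylor_criterion(neuron_weights, neuron_grads):
    # First-order Taylor estimate of the cost change from removing each
    # neuron: |sum of (gradient * parameter)| over that neuron's parameters.
    return [abs(sum(g * w for g, w in zip(gs, ws)))
            for ws, gs in zip(neuron_weights, neuron_grads)]

def least_important_neuron(neuron_weights, neuron_grads):
    # The neuron with the smallest criterion is the first pruning candidate.
    crit = taylor_criterion(neuron_weights, neuron_grads)
    return min(range(len(crit)), key=crit.__getitem__)
```

Pruning then removes the lowest-scoring neurons and, typically, fine-tunes the network to recover accuracy.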
  • Patent number: 11295514
    Abstract: Inverse rendering estimates physical scene attributes (e.g., reflectance, geometry, and lighting) from image(s) and is used for gaming, virtual reality, augmented reality, and robotics. An inverse rendering network (IRN) receives a single input image of a 3D scene and generates the physical scene attributes for the image. The IRN is trained by using the estimated physical scene attributes generated by the IRN to reproduce the input image and updating parameters of the IRN to reduce differences between the reproduced input image and the input image. A direct renderer and a residual appearance renderer (RAR) reproduce the input image. The RAR predicts a residual image representing complex appearance effects of the real (not synthetic) image based on features extracted from the image and the reflectance and geometry properties. The residual image represents near-field illumination, cast shadows, inter-reflections, and realistic shading that are not provided by the direct renderer.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: April 5, 2022
    Assignee: NVIDIA Corporation
    Inventors: Jinwei Gu, Kihwan Kim, Jan Kautz, Guilin Liu, Soumyadip Sengupta
  • Publication number: 20220101145
    Abstract: One embodiment sets forth a technique for creating a generative model. The technique includes generating a trained generative model with a first component that converts data points in the training dataset into latent variable values, a second component that learns a distribution of the latent variable values, and a third component that converts the latent variable values into output distributions. The technique also includes training an energy-based model to learn an energy function based on values sampled from a first distribution associated with the training dataset and values sampled from a second distribution during operation of the trained generative model. The technique further includes creating a joint model that includes one or more portions of the trained generative model and the energy-based model, and that applies energy values from the energy-based model to samples from the second distribution to produce additional values used to generate a new data point.
    Type: Application
    Filed: June 24, 2021
    Publication date: March 31, 2022
    Inventors: Arash Vahdat, Karsten Kreis, Zhisheng Xiao, Jan Kautz
  • Publication number: 20220101122
    Abstract: One embodiment of the present invention sets forth a technique for generating data using a generative model. The technique includes sampling from one or more distributions of one or more variables to generate a first set of values for the one or more variables, where the one or more distributions are used during operation of one or more portions of the generative model. The technique also includes applying one or more energy values generated via an energy-based model to the first set of values to produce a second set of values for the one or more variables. The technique further includes either outputting the second set of values as output data or performing one or more operations based on the second set of values to generate output data.
    Type: Application
    Filed: June 24, 2021
    Publication date: March 31, 2022
    Inventors: Arash Vahdat, Karsten Kreis, Zhisheng Xiao, Jan Kautz
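One common way to "apply energy values to samples", consistent with the two abstracts above but not necessarily the claimed procedure, is importance resampling with weights exp(-E): candidates the energy-based model scores as low-energy are kept, high-energy ones are rejected. A toy sketch with hypothetical inputs:

```python
import math
import random

def energy_resample(samples, energies, k, seed=0):
    # Draw k samples with probability proportional to exp(-energy), so
    # low-energy (higher-quality) candidates are drawn far more often.
    rng = random.Random(seed)
    lo = min(energies)                       # shift energies for stability
    weights = [math.exp(-(e - lo)) for e in energies]
    return rng.choices(samples, weights=weights, k=k)
```

In the joint model, the `samples` would be latents drawn from the generative model's prior and the `energies` the energy-based model's scores for them.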
  • Publication number: 20220076128
    Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
    Type: Application
    Filed: September 10, 2020
    Publication date: March 10, 2022
    Inventors: Sifei Liu, Shalini De Mello, Varun Jampani, Jan Kautz
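The directed propagation over unstructured points can be sketched as a topological-order sweep of the DAG in which each point blends its own feature with affinity-weighted features already propagated to its predecessors. All names and the blending rule are hypothetical simplifications of the patented technique:

```python
def propagate_dag(order, parents, affinities, features):
    # Visit nodes in topological order; each node blends its own feature
    # with affinity-weighted features already propagated to its parents.
    out = list(features)
    for i in order:
        if parents[i]:
            w = sum(affinities[i])
            incoming = sum(a * out[p] for p, a in zip(parents[i], affinities[i]))
            out[i] = (1.0 - w) * features[i] + incoming  # convex blend if w <= 1
    return out
```

Running such sweeps along several directions (as the abstract's "first direction" suggests) lets labels reach every point in the unstructured set.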
  • Patent number: 11270161
    Abstract: When a computer image is generated from a real-world scene having a semi-reflective surface (e.g. window), the computer image will create, at the semi-reflective surface from the viewpoint of the camera, both a reflection of a scene in front of the semi-reflective surface and a transmission of a scene located behind the semi-reflective surface. Similar to a person viewing the real-world scene from different locations, angles, etc., the reflection and transmission may change, and also move relative to each other, as the viewpoint of the camera changes. Unfortunately, the dynamic nature of the reflection and transmission negatively impacts the performance of many computer applications, but performance can generally be improved if the reflection and transmission are separated. The present disclosure uses deep learning to separate reflection and transmission at a semi-reflective surface of a computer image generated from a real-world scene.
    Type: Grant
    Filed: July 8, 2020
    Date of Patent: March 8, 2022
    Assignee: NVIDIA Corporation
    Inventors: Orazio Gallo, Jinwei Gu, Jan Kautz, Patrick Wieschollek
  • Patent number: 11256961
    Abstract: Segmentation is the identification of separate objects within an image. An example is identification of a pedestrian passing in front of a car, where the pedestrian is a first object and the car is a second object. Superpixel segmentation is the identification of regions of pixels within an object that have similar properties. An example is identification of pixel regions having a similar color, such as different articles of clothing worn by the pedestrian and different components of the car. A pixel affinity neural network (PAN) model is trained to generate pixel affinity maps for superpixel segmentation. The pixel affinity map defines the similarity of two points in space. In an embodiment, the pixel affinity map indicates a horizontal affinity and vertical affinity for each pixel in the image. The pixel affinity map is processed to identify the superpixels.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: February 22, 2022
    Assignee: NVIDIA Corporation
    Inventors: Wei-Chih Tu, Ming-Yu Liu, Varun Jampani, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
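A pixel affinity map can be turned into superpixels by merging neighboring pixels whose horizontal or vertical affinity exceeds a threshold. A union-find sketch over a small grid (the thresholding rule is an illustrative assumption, not the patented grouping step):

```python
def superpixels(h_aff, v_aff, threshold=0.5):
    # h_aff[r][c]: affinity between pixels (r, c) and (r, c+1).
    # v_aff[r][c]: affinity between pixels (r, c) and (r+1, c).
    H, W = len(h_aff), len(h_aff[0]) + 1
    parent = list(range(H * W))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(H):                       # merge horizontally
        for c in range(W - 1):
            if h_aff[r][c] > threshold:
                union(r * W + c, r * W + c + 1)
    for r in range(H - 1):                   # merge vertically
        for c in range(W):
            if v_aff[r][c] > threshold:
                union(r * W + c, (r + 1) * W + c)

    # Relabel component roots to consecutive superpixel ids.
    labels, out = {}, []
    for r in range(H):
        row = []
        for c in range(W):
            root = find(r * W + c)
            row.append(labels.setdefault(root, len(labels)))
        out.append(row)
    return out
```

In the patented system the affinities come from the trained PAN model rather than being fixed numbers, and the grouping step may be more elaborate than a single threshold.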
  • Publication number: 20220036635
    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
    Type: Application
    Filed: July 31, 2020
    Publication date: February 3, 2022
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
  • Patent number: 11238650
    Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
    Type: Grant
    Filed: April 15, 2020
    Date of Patent: February 1, 2022
    Assignee: NVIDIA Corporation
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
  • Publication number: 20220012536
    Abstract: A method, computer readable medium, and system are disclosed for creating an image utilizing a map representing different classes of specific pixels within a scene. One or more computing systems use the map to create a preliminary image. This preliminary image is then compared to an original image that was used to create the map. A determination is made whether the preliminary image matches the original image, and results of the determination are used to adjust the computing systems that created the preliminary image, which improves a performance of such computing systems. The adjusted computing systems are then used to create images based on different input maps representing various object classes of specific pixels within a scene.
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Ting-Chun Wang, Ming-Yu Liu, Bryan Christopher Catanzaro, Jan Kautz, Andrew J. Tao
  • Publication number: 20210397945
    Abstract: One embodiment of the present invention sets forth a technique for performing machine learning. The technique includes inputting a training dataset into a variational autoencoder (VAE) comprising an encoder network, a prior network, and a decoder network. The technique also includes training the VAE by updating one or more parameters of the VAE based on a smoothness of one or more outputs produced by the VAE from the training dataset. The technique further includes producing generative output that reflects a first distribution of the training dataset by applying the decoder network to one or more values sampled from a second distribution of latent variables generated by the prior network.
    Type: Application
    Filed: November 4, 2020
    Publication date: December 23, 2021
    Inventors: Arash Vahdat, Jan Kautz