Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210248772
    Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications, and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks: for example, traditionally used training data sets lack diversity and therefore limit the extent to which neural networks can estimate 3D pose. Expanding these training data sets is also difficult, since it requires manually provided annotations for 2D images, which is time-consuming and error-prone. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.
    Type: Application
    Filed: June 9, 2020
    Publication date: August 12, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Jan Kautz
  • Publication number: 20210241489
    Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
  • Patent number: 11082720
    Abstract: A method, computer readable medium, and system are disclosed for identifying residual video data. This data describes data that is lost during a compression of original video data. For example, the original video data may be compressed and then decompressed, and this result may be compared to the original video data to determine the residual video data. This residual video data is transformed into a smaller format by means of encoding, binarizing, and compressing, and is sent to a destination. At the destination, the residual video data is transformed back into its original format and is used during the decompression of the compressed original video data to improve a quality of the decompressed original video data.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
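The residual computation described in this abstract can be sketched as a lossy round-trip; `lossy_round_trip` below is a hypothetical stand-in (coarse quantization) for a real video codec, not the patented compression scheme.

```python
import numpy as np

def lossy_round_trip(frame, step=16):
    # Hypothetical stand-in for compress-then-decompress with a real
    # codec: coarse quantization discards detail, as lossy compression does.
    return (frame // step) * step

# Residual video data: what the lossy round-trip lost.
frame = np.array([[7, 18, 33], [64, 90, 255]], dtype=np.int32)
residual = frame - lossy_round_trip(frame)

# At the destination, the residual improves the decompressed result.
restored = lossy_round_trip(frame) + residual
```

In this toy setting the residual recovers the original exactly; with a real codec it would instead raise the quality of the decompressed video.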
  • Publication number: 20210233273
    Abstract: Apparatuses, systems, and techniques that determine the pose of a human hand from a 2D image are described herein. In at least one embodiment, training of a neural network is augmented using weakly labeled or unlabeled pose data, combined with losses based on a human hand model.
    Type: Application
    Filed: January 24, 2020
    Publication date: July 29, 2021
    Inventors: Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz
  • Patent number: 11049018
    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: June 29, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
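The weight-transfer step above can be illustrated with a minimal NumPy sketch; the variable names and the identity-scaled hidden-to-hidden initialization are illustrative assumptions, not the patent's exact initial values.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features = 4, 8

# Feedforward weights from the trained non-recurrent layer.
W_ff = rng.standard_normal((hidden, features))

# Step 1: reuse them as the recurrent layer's input-to-hidden weights.
W_ih = W_ff.copy()

# Step 2: set hidden-to-hidden weights to initial values
# (here: a small identity-scaled matrix, an illustrative choice).
W_hh = 0.1 * np.eye(hidden)

def recurrent_step(x_t, h_prev):
    # Simple recurrent update using the transferred weights.
    return np.tanh(W_ih @ x_t + W_hh @ h_prev)

# Process a short "video" sequence of feature vectors.
h = np.zeros(hidden)
for _ in range(3):
    h = recurrent_step(rng.standard_normal(features), h)
```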
  • Patent number: 11037051
    Abstract: Planar regions in three-dimensional scenes offer important geometric cues in a variety of three-dimensional perception tasks such as scene understanding, scene reconstruction, and robot navigation. Image analysis to detect planar regions can be performed by a deep learning architecture that includes a number of neural networks configured to estimate parameters for the planar regions. The neural networks process an image to detect an arbitrary number of plane objects in the image. Each plane object is associated with a number of estimated parameters including bounding box parameters, plane normal parameters, and a segmentation mask. Global parameters for the image, including a depth map, can also be estimated by one of the neural networks. Then, a segmentation refinement network jointly optimizes (i.e., refines) the segmentation masks for each instance of the plane objects and combines the refined segmentation masks to generate an aggregate segmentation mask for the image.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: June 15, 2021
    Assignee: NVIDIA Corporation
    Inventors: Kihwan Kim, Jinwei Gu, Chen Liu, Jan Kautz
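A minimal sketch of the per-plane parameters and the aggregate segmentation mask described above; the `PlaneObject` structure and all values are hypothetical, chosen only to show how per-instance masks combine.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlaneObject:
    # Per-plane parameters estimated by the detection networks.
    bbox: tuple          # (x1, y1, x2, y2) bounding box
    normal: np.ndarray   # 3D plane normal
    mask: np.ndarray     # per-pixel segmentation mask (H, W)

H, W = 4, 6
planes = [
    PlaneObject((0, 0, 3, 2), np.array([0.0, 1.0, 0.0]),
                np.zeros((H, W), dtype=bool)),
    PlaneObject((2, 1, 5, 3), np.array([0.0, 0.0, 1.0]),
                np.zeros((H, W), dtype=bool)),
]
planes[0].mask[0:2, 0:3] = True   # e.g. a floor region
planes[1].mask[2:4, 3:6] = True   # e.g. a wall region

# Aggregate mask combines the (refined) per-instance masks;
# each pixel gets the index of the plane covering it (0 = none).
aggregate_mask = np.zeros((H, W), dtype=int)
for idx, p in enumerate(planes, start=1):
    aggregate_mask[p.mask] = idx
```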
  • Patent number: 11017556
    Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: May 25, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
  • Publication number: 20210150757
    Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify the orientations of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
    Type: Application
    Filed: November 20, 2019
    Publication date: May 20, 2021
    Inventors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
  • Publication number: 20210150736
    Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
    Type: Application
    Filed: January 22, 2021
    Publication date: May 20, 2021
    Inventors: Zhaoyang Lv, Kihwan Kim, Deqing Sun, Alejandro Jose Troccoli, Jan Kautz
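The separation of camera-induced motion from dynamic (non-rigid) scene flow can be sketched for a single 3D point; the rotation, translation, and point coordinates below are toy assumptions, not outputs of the described network.

```python
import numpy as np

# A point observed at time t and t+1, in the camera frame (x, y, z).
p_t  = np.array([1.0, 0.5, 4.0])
p_t1 = np.array([1.2, 0.5, 3.6])

# Estimated camera motion between frames: rotation R, translation tvec.
R = np.eye(3)                      # no rotation in this toy example
tvec = np.array([0.0, 0.0, -0.2])  # camera moved 0.2 units forward

# Motion a static point would exhibit due to camera motion alone.
p_static = R @ p_t + tvec

# Dynamic 3D scene flow: residual motion after removing camera motion.
scene_flow = p_t1 - p_static
```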
  • Publication number: 20210142177
    Abstract: Apparatuses, systems, and techniques are presented to generate data useful for further training of a neural network. In at least one embodiment, one or more neural networks can be re-trained based, at least in part, on data generated by the one or more neural networks including data used to previously train the one or more neural networks.
    Type: Application
    Filed: November 13, 2019
    Publication date: May 13, 2021
    Inventors: Arun Mallya, Jan Kautz, Zhizhong Li, Pavlo Molchanov, Hongxu Danny Yin
  • Publication number: 20210133990
    Abstract: Apparatuses, systems, and techniques to generate a 3D model of an object. In at least one embodiment, a 3D model of an object is generated by one or more neural networks, based on a plurality of images of the object.
    Type: Application
    Filed: November 5, 2019
    Publication date: May 6, 2021
    Inventors: Benjamin David Eckart, Wentao Yuan, Varun Jampani, Kihwan Kim, Jan Kautz
  • Publication number: 20210117661
    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
    Type: Application
    Filed: December 28, 2020
    Publication date: April 22, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
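The keypoint representation described above, two image coordinates plus a network-predicted depth per keypoint, can be sketched as follows; all numeric values are illustrative.

```python
import numpy as np

# 2D keypoints (x, y) detected in the image, one row per joint.
keypoints_2d = np.array([[120.0, 80.0],
                         [130.0, 95.0],
                         [142.0, 110.0]])

# Per-keypoint depth with respect to the camera (illustrative values
# standing in for the neural network's depth predictions).
depths = np.array([[0.62], [0.60], [0.57]])

# Stacking depth onto the 2D coordinates yields the 3D pose estimate.
keypoints_3d = np.hstack([keypoints_2d, depths])
```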
  • Patent number: 10984286
    Abstract: A style transfer neural network may be used to generate stylized synthetic images, where real images provide the style (e.g., seasons, weather, lighting) for transfer to synthetic images. The stylized synthetic images may then be used to train a recognition neural network. In turn, the trained neural network may be used to predict semantic labels for the real images, providing recognition data for the real images. Finally, the real training dataset (real images and predicted recognition data) and the synthetic training dataset are used by the style transfer neural network to generate stylized synthetic images. The training of the neural network, prediction of recognition data for the real images, and stylizing of the synthetic images may be repeated for a number of iterations. The stylization operation more closely aligns a covariate of the synthetic images to the covariate of the real images, improving accuracy of the recognition neural network.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: April 20, 2021
    Assignee: NVIDIA Corporation
    Inventors: Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz
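The iterative procedure above (stylize synthetic images, train the recognition network, pseudo-label the real images, repeat) can be sketched at a high level; every function and value here is a hypothetical stub, not the actual networks.

```python
def stylize(synthetic_images, real_images):
    # Stub for the style transfer network: real images provide the
    # style (seasons, weather, lighting) transferred onto synthetics.
    return [f"styled({s})" for s in synthetic_images]

def train_recognition(images, labels):
    # Stub for training the recognition network on stylized synthetics.
    return {"trained_on": len(images)}

def predict_labels(model, images):
    # Stub for predicting semantic labels (pseudo-labels) for real images.
    return [f"pred({im})" for im in images]

synthetic, synth_labels = ["s1", "s2"], ["car", "road"]
real = ["r1", "r2"]

for _ in range(3):  # repeat for a number of iterations
    styled = stylize(synthetic, real)
    model = train_recognition(styled, synth_labels)
    real_labels = predict_labels(model, real)
    # The real dataset (images + predicted labels) and the synthetic
    # dataset now both feed the next stylization round.
```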
  • Patent number: 10964061
    Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: March 30, 2021
    Assignee: NVIDIA Corporation
    Inventors: Jinwei Gu, Samarth Manoj Brahmbhatt, Kihwan Kim, Jan Kautz
  • Publication number: 20210089867
    Abstract: Learning the dynamics of an environment and predicting future consequences is a recent technical advancement that can be applied to video prediction and speech recognition, among other applications. Generally, machine learning models, such as deep learning models, neural networks, or other artificial intelligence algorithms, are used to make the predictions. However, current artificial intelligence algorithms used for making predictions are typically limited to short-term future predictions, mainly as a result of 1) the presence of complex dynamics in high-dimensional video data, 2) prediction error propagation over time, and 3) inherent uncertainty of the future. The present disclosure enables the modeling of long-term dependencies in sequential data for use in making long-term predictions by providing a dual (i.e., two-part) recurrent neural network architecture.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Wonmin Byeon, Jan Kautz
  • Publication number: 20210088784
    Abstract: A gaze tracking system for use by the driver of a vehicle includes an opaque frame circumferentially enclosing a transparent field of view of the driver, light emitting diodes coupled to the opaque frame for emitting infrared light onto various regions of the driver's eye gazing through the transparent field of view, and diodes for sensing intensity of infrared light reflected off of various regions of the driver's eye.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Applicant: Nvidia Corp.
    Inventors: Eric Whitmire, Kaan Aksit, Michael Stengel, Jan Kautz, David Luebke, Ben Boudaoud
  • Publication number: 20210073575
    Abstract: A temporal propagation network (TPN) system learns the affinity matrix for video image processing tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The TPN system includes a guidance neural network model and a temporal propagation module and is trained for a particular computer vision task to propagate visual properties from a key-frame represented by dense data (color), to another frame that is represented by coarse data (grey-scale). The guidance neural network model generates an affinity matrix referred to as a global transformation matrix from task-specific data for the key-frame and the other frame. The temporal propagation module applies the global transformation matrix to the key-frame property data to produce propagated property data (color) for the other frame. For example, the TPN system may be used to colorize several frames of greyscale video using a single manually colorized key-frame.
    Type: Application
    Filed: October 27, 2020
    Publication date: March 11, 2021
    Inventors: Sifei Liu, Shalini De Mello, Jinwei Gu, Varun Jampani, Jan Kautz
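The propagation step above, applying the global transformation (affinity) matrix to key-frame property data, reduces to a matrix product; the matrix and colors below are toy values, not outputs of the guidance network.

```python
import numpy as np

# Key-frame property data: per-pixel color (RGB), 4 pixels flattened.
key_colors = np.array([[255, 0, 0],
                       [0, 255, 0],
                       [0, 0, 255],
                       [128, 128, 128]], dtype=float)

# Global transformation (affinity) matrix: row i mixes key-frame
# pixels into pixel i of the other frame. Rows sum to 1 so the
# propagated colors stay in range.
G = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])

# Propagate color from the key-frame to the grey-scale frame.
propagated = G @ key_colors
```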
  • Publication number: 20210073612
    Abstract: In at least one embodiment, differentiable neural architecture search and reinforcement learning are combined under one framework to discover network architectures with desired properties such as high accuracy, low latency, or both. In at least one embodiment, an objective function for search based on generalization error prevents the selection of architectures prone to overfitting.
    Type: Application
    Filed: September 10, 2019
    Publication date: March 11, 2021
    Inventors: Arash Vahdat, Arun Mohanray Mallya, Ming-Yu Liu, Jan Kautz
  • Publication number: 20210067735
    Abstract: Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
  • Publication number: 20210064907
    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class.
    Type: Application
    Filed: August 20, 2020
    Publication date: March 4, 2021
    Inventors: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz