Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210390653
    Abstract: Various embodiments enable a robot, or other autonomous or semi-autonomous device or system, to receive data involving the performance of a task in the physical world. The data can be provided as input to a perception network to infer a set of percepts about the task, which can correspond to relationships between objects observed during the performance. The percepts can be provided as input to a plan generation network, which can infer a set of actions as part of a plan. Each action can correspond to one of the observed relationships. The plan can be reviewed and any corrections made, either manually or through another demonstration of the task. Once the plan is verified as correct, the plan (and any related data) can be provided as input to an execution network that can infer instructions to cause the robot, and/or another robot, to perform the task.
    Type: Application
    Filed: August 26, 2021
    Publication date: December 16, 2021
    Inventors: Jonathan Tremblay, Stan Birchfield, Stephen Tyree, Thang To, Jan Kautz, Artem Molchanov
  • Publication number: 20210326694
    Abstract: Apparatuses, systems, and techniques are presented to determine distance for one or more objects. In at least one embodiment, a disparity network is trained to determine distance data from input stereoscopic images using a loss function that includes at least one of a gradient loss term and an occlusion loss term.
    Type: Application
    Filed: April 20, 2020
    Publication date: October 21, 2021
    Inventors: Jialiang Wang, Varun Jampani, Stan Birchfield, Charles Loop, Jan Kautz
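The disparity-network abstract above mentions a loss with a gradient term and an occlusion term. A minimal sketch of one plausible combination is below; the function names, weights, and the exact form of each term are assumptions for illustration, not the patented formulation.

```python
import numpy as np

def gradient_loss(pred, target):
    # Compare horizontal and vertical gradients of the predicted and
    # reference disparity maps, penalizing edge mismatches explicitly.
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1))
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0))
    return gx.mean() + gy.mean()

def disparity_loss(pred, target, occ_mask, w_grad=0.5, w_occ=0.1):
    # Data term is down-weighted where occ_mask flags occluded pixels;
    # the occlusion term discourages marking everything as occluded.
    data = (np.abs(pred - target) * (1.0 - occ_mask)).mean()
    return data + w_grad * gradient_loss(pred, target) + w_occ * occ_mask.mean()
```

With identical prediction and target and an all-zero occlusion mask, every term vanishes, which is a quick sanity check on the formulation.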
  • Publication number: 20210314629
    Abstract: A method, computer readable medium, and system are disclosed for identifying residual video data, which describes the information lost when original video data is compressed. For example, the original video data may be compressed and then decompressed, and the result compared to the original video data to determine the residual video data. The residual video data is transformed into a smaller format by encoding, binarizing, and compressing, and is sent to a destination. At the destination, the residual video data is transformed back into its original format and used during decompression of the compressed original video data to improve the quality of the decompressed video.
    Type: Application
    Filed: June 18, 2021
    Publication date: October 7, 2021
    Inventors: Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
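The residual-video-data abstract above can be sketched end to end. The "codec" here is a deliberately crude quantizer standing in for a real lossy video codec, and the helper names are hypothetical; the point is only the round trip: residual = original minus compress/decompress, shrunk for transmission, then added back at the destination.

```python
import numpy as np
import zlib

def lossy_compress(frame, step=16):
    # Stand-in for a lossy codec: coarse quantization discards detail.
    return (frame // step).astype(np.uint8)

def lossy_decompress(q, step=16):
    return q.astype(np.int32) * step + step // 2

def make_residual(frame, step=16):
    # Residual = original frame minus its compress/decompress round trip.
    return frame.astype(np.int32) - lossy_decompress(lossy_compress(frame, step), step)

def encode_residual(res):
    # "Encode, binarize, and compress": pack to bytes, then entropy-code.
    return zlib.compress(res.astype(np.int8).tobytes())

def restore(q, packed_res, shape):
    # At the destination: decode the residual and add it back to the
    # lossily decompressed frame to recover the original quality.
    res = np.frombuffer(zlib.decompress(packed_res), dtype=np.int8).reshape(shape)
    return lossy_decompress(q) + res
```

In this toy setup the restored frame matches the original exactly; a real codec's residual would only narrow, not close, the quality gap.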
  • Patent number: 11132543
    Abstract: A method, computer readable medium, and system are disclosed for performing unconstrained appearance-based gaze estimation. The method includes the steps of identifying an image of an eye and a head orientation associated with the image of the eye, determining an orientation for the eye by analyzing, within a convolutional neural network (CNN), the image of the eye and the head orientation associated with the image of the eye, and returning the orientation of the eye.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: September 28, 2021
    Assignee: NVIDIA Corporation
    Inventors: Rajeev Ranjan, Shalini De Mello, Jan Kautz
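The gaze-estimation abstract above feeds both an eye image and a head orientation into a CNN that returns the eye's orientation. A convolution-free sketch of that fusion is below; the input shapes, layer sizes, and tanh nonlinearity are assumptions, and the linear projections stand in for the CNN trunk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 36x60 grayscale eye crop and a 2-D head
# orientation (yaw, pitch); the patent does not fix these dimensions.
params = {
    "W_img": rng.standard_normal((36 * 60, 32)) * 0.01,
    "W_out": rng.standard_normal((32 + 2, 3)) * 0.01,
}

def gaze_forward(eye_image, head_pose, params):
    # Stand-in for the CNN trunk: project the eye image to a feature vector.
    feat = np.tanh(eye_image.ravel() @ params["W_img"])
    # Fuse with the head orientation, regress a 3D gaze vector,
    # and normalize it to a unit direction.
    gaze = np.concatenate([feat, head_pose]) @ params["W_out"]
    return gaze / np.linalg.norm(gaze)

gaze = gaze_forward(rng.random((36, 60)), np.array([0.1, -0.2]), params)
```

The output is a unit 3-vector, which is the usual representation for a gaze direction.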
  • Publication number: 20210287430
    Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
    Type: Application
    Filed: April 15, 2020
    Publication date: September 16, 2021
    Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
  • Publication number: 20210271977
    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
    Type: Application
    Filed: May 19, 2021
    Publication date: September 2, 2021
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
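The weight transformation described in the abstract above can be sketched directly: copy the trained feedforward weights into the input-to-hidden weights of a recurrent layer and set the hidden-to-hidden weights to initial values. The Elman-style cell and tanh nonlinearity are assumptions for illustration.

```python
import numpy as np

def convert_to_recurrent(W_ff, b_ff):
    # The trained feedforward weights become the input-to-hidden
    # weights of an Elman-style recurrent layer.
    W_ih = W_ff.copy()
    n_hidden = W_ff.shape[1]
    # Hidden-to-hidden weights start at initial values; zeros make the
    # first time step reproduce the original layer exactly.
    W_hh = np.zeros((n_hidden, n_hidden))
    return W_ih, W_hh, b_ff.copy()

def rnn_step(x, h_prev, W_ih, W_hh, b):
    return np.tanh(x @ W_ih + h_prev @ W_hh + b)
```

Starting from a zero hidden state, the first step of the converted layer matches the original feedforward layer, so pretrained features are preserved while temporal modeling is trained on top.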
  • Publication number: 20210248772
    Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data that annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks, including, for example, that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult, since it requires manually provided annotations for 2D images, which is time consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.
    Type: Application
    Filed: June 9, 2020
    Publication date: August 12, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Jan Kautz
  • Publication number: 20210241489
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate both action tubes and respective action labels, where the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor handles large offsets between the bounding boxes and the ground truth.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
  • Patent number: 11082720
    Abstract: A method, computer readable medium, and system are disclosed for identifying residual video data, which describes the information lost when original video data is compressed. For example, the original video data may be compressed and then decompressed, and the result compared to the original video data to determine the residual video data. The residual video data is transformed into a smaller format by encoding, binarizing, and compressing, and is sent to a destination. At the destination, the residual video data is transformed back into its original format and used during decompression of the compressed original video data to improve the quality of the decompressed video.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
  • Publication number: 20210233273
    Abstract: Apparatuses, systems, and techniques that determine the pose of a human hand from a 2-D image are described herein. In at least one embodiment, training of a neural network is augmented using weakly labeled or unlabeled pose data, with losses based on a human hand model.
    Type: Application
    Filed: January 24, 2020
    Publication date: July 29, 2021
    Inventors: Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz
  • Patent number: 11049018
    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: June 29, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
  • Patent number: 11037051
    Abstract: Planar regions in three-dimensional scenes offer important geometric cues in a variety of three-dimensional perception tasks such as scene understanding, scene reconstruction, and robot navigation. Image analysis to detect planar regions can be performed by a deep learning architecture that includes a number of neural networks configured to estimate parameters for the planar regions. The neural networks process an image to detect an arbitrary number of plane objects in the image. Each plane object is associated with a number of estimated parameters including bounding box parameters, plane normal parameters, and a segmentation mask. Global parameters for the image, including a depth map, can also be estimated by one of the neural networks. Then, a segmentation refinement network jointly optimizes (i.e., refines) the segmentation masks for each instance of the plane objects and combines the refined segmentation masks to generate an aggregate segmentation mask for the image.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: June 15, 2021
    Assignee: NVIDIA Corporation
    Inventors: Kihwan Kim, Jinwei Gu, Chen Liu, Jan Kautz
  • Patent number: 11017556
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate both action tubes and respective action labels, where the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor handles large offsets between the bounding boxes and the ground truth.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: May 25, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
  • Publication number: 20210150757
    Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify the orientation of one or more objects based, at least in part, on one or more characteristics of the object other than its orientation.
    Type: Application
    Filed: November 20, 2019
    Publication date: May 20, 2021
    Inventors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
  • Publication number: 20210150736
    Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
    Type: Application
    Filed: January 22, 2021
    Publication date: May 20, 2021
    Inventors: Zhaoyang Lv, Kihwan Kim, Deqing Sun, Alejandro Jose Troccoli, Jan Kautz
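The decomposition in the abstract above, separating motion in 3D space from motion induced by the camera, can be sketched with a rigid ego-motion transform and a dynamic/static mask. Function and parameter names are hypothetical; the point is subtracting the camera-induced component from the total observed motion.

```python
import numpy as np

def dynamic_scene_flow(points_t0, points_t1, R, t, dynamic_mask):
    # Total observed 3D motion of each point between the two frames.
    total = points_t1 - points_t0
    # Motion that camera ego-motion (R, t) alone would induce on a
    # static point.
    ego = (points_t0 @ R.T + t) - points_t0
    # The non-rigid 3D motion field: total motion with the
    # camera-induced component removed, kept only where the predicted
    # mask marks the scene as dynamic.
    return (total - ego) * dynamic_mask[:, None]
```

Static points then yield zero flow regardless of how the camera moved, which is exactly the separation the abstract describes.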
  • Publication number: 20210142177
    Abstract: Apparatuses, systems, and techniques are presented to generate data useful for further training of a neural network. In at least one embodiment, one or more neural networks can be re-trained based, at least in part, on data generated by the one or more neural networks including data used to previously train the one or more neural networks.
    Type: Application
    Filed: November 13, 2019
    Publication date: May 13, 2021
    Inventors: Arun Mallya, Jan Kautz, Zhizhong Li, Pavlo Molchanov, Hongxu Danny Yin
  • Publication number: 20210133990
    Abstract: Apparatuses, systems, and techniques to generate a 3D model of an object. In at least one embodiment, a 3D model of an object is generated by one or more neural networks, based on a plurality of images of the object.
    Type: Application
    Filed: November 5, 2019
    Publication date: May 6, 2021
    Inventors: Benjamin David Eckart, Wentao Yuan, Varun Jampani, Kihwan Kim, Jan Kautz
  • Publication number: 20210117661
    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
    Type: Application
    Filed: December 28, 2020
    Publication date: April 22, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
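Once the network in the abstract above has produced a depth value per keypoint, recovering the 3D pose is a standard pinhole back-projection. This sketch assumes known camera intrinsics (fx, fy, cx, cy); the function name is hypothetical.

```python
import numpy as np

def lift_keypoints(kp_2d, depth, fx, fy, cx, cy):
    # Back-project each (x, y) pixel keypoint, paired with its
    # predicted depth z, through a pinhole camera model to 3D.
    x = (kp_2d[:, 0] - cx) * depth / fx
    y = (kp_2d[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)
```

Projecting known 3D points to pixels and lifting them back with their true depths recovers the originals, which is the invariant this step relies on.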
  • Patent number: 10984286
    Abstract: A style transfer neural network may be used to generate stylized synthetic images, where real images provide the style (e.g., seasons, weather, lighting) for transfer to synthetic images. The stylized synthetic images may then be used to train a recognition neural network. In turn, the trained neural network may be used to predict semantic labels for the real images, providing recognition data for the real images. Finally, the real training dataset (real images and predicted recognition data) and the synthetic training dataset are used by the style transfer neural network to generate stylized synthetic images. The training of the neural network, prediction of recognition data for the real images, and stylizing of the synthetic images may be repeated for a number of iterations. The stylization operation more closely aligns a covariate of the synthetic images to the covariate of the real images, improving accuracy of the recognition neural network.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: April 20, 2021
    Assignee: NVIDIA Corporation
    Inventors: Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz
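The iterative pipeline in the abstract above (stylize synthetic images, train the recognizer, predict labels for real images, repeat) can be sketched as a loop. The stubs below are deliberately trivial stand-ins, and all names are hypothetical; only the loop structure reflects the abstract.

```python
class StubRecognizer:
    # Minimal stand-in for the recognition network.
    def fit(self, images, labels):
        self.labels = list(labels)

    def predict(self, images):
        return [self.labels[i % len(self.labels)] for i in range(len(images))]

def stub_style_transfer(synthetic_images, real_images, real_labels):
    # Stand-in: a real system would align covariates (season, weather,
    # lighting) of the synthetic images to the real ones, optionally
    # guided by the predicted labels for the real images.
    return synthetic_images

def iterative_training(real_images, synthetic_images, synthetic_labels,
                       style_transfer, recognizer, n_iters=3):
    real_labels = None
    for _ in range(n_iters):
        # 1) Transfer the style of real images onto the synthetic set.
        stylized = style_transfer(synthetic_images, real_images, real_labels)
        # 2) Train the recognition network on stylized synthetic data.
        recognizer.fit(stylized, synthetic_labels)
        # 3) Predict labels for the real images; these feed the next
        #    round of style transfer.
        real_labels = recognizer.predict(real_images)
    return real_labels
```

Each round tightens the covariate alignment between synthetic and real data, which is what improves the recognizer's accuracy on real images.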
  • Patent number: 10964061
    Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: March 30, 2021
    Assignee: NVIDIA Corporation
    Inventors: Jinwei Gu, Samarth Manoj Brahmbhatt, Kihwan Kim, Jan Kautz
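Camera-pose regression systems like the one in the abstract above are commonly trained with a position term plus a weighted rotation term on unit quaternions. The loss below is a generic sketch of that objective, not the patented formulation; the weight beta and the quaternion representation are assumptions.

```python
import numpy as np

def pose_loss(pred_t, pred_q, gt_t, gt_q, beta=3.0):
    # Position error plus weighted rotation error on unit quaternions;
    # the sign flip handles the q / -q double-cover ambiguity.
    pred_q = pred_q / np.linalg.norm(pred_q)
    gt_q = gt_q / np.linalg.norm(gt_q)
    if pred_q @ gt_q < 0:
        gt_q = -gt_q
    return np.linalg.norm(pred_t - gt_t) + beta * np.linalg.norm(pred_q - gt_q)
```

Because q and -q encode the same rotation, the sign flip ensures an exact prediction scores zero loss in either representation.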