Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210248772
    Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications, and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks: for example, traditionally used training data sets lack diversity and therefore limit the extent to which neural networks can estimate 3D pose. Expanding these training data sets is also difficult, since it requires manually provided annotations for 2D images, which is time-consuming and error-prone. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.
    Type: Application
    Filed: June 9, 2020
    Publication date: August 12, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Jan Kautz
  • Publication number: 20210241489
    Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
  • Patent number: 11082720
    Abstract: A method, computer readable medium, and system are disclosed for identifying residual video data. This data describes data that is lost during a compression of original video data. For example, the original video data may be compressed and then decompressed, and this result may be compared to the original video data to determine the residual video data. This residual video data is transformed into a smaller format by means of encoding, binarizing, and compressing, and is sent to a destination. At the destination, the residual video data is transformed back into its original format and is used during the decompression of the compressed original video data to improve a quality of the decompressed original video data.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
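The residual computation described in this abstract can be sketched as a lossy round-trip; `lossy_round_trip` below is a hypothetical stand-in (coarse quantization) for a real video codec, not the patented compression scheme.

```python
import numpy as np

def lossy_round_trip(frame, step=16):
    # Hypothetical stand-in for compress-then-decompress with a real
    # codec: coarse quantization discards detail, as lossy compression does.
    return (frame // step) * step

# Residual video data: what the lossy round-trip lost.
frame = np.array([[7, 18, 33], [64, 90, 255]], dtype=np.int32)
residual = frame - lossy_round_trip(frame)

# At the destination, the residual improves the decompressed result.
restored = lossy_round_trip(frame) + residual
```

In this toy setting the residual recovers the original exactly; with a real codec it would instead raise the quality of the decompressed video.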
  • Publication number: 20210233273
    Abstract: Apparatuses, systems, and techniques that determine the pose of a human hand from a 2D image are described herein. In at least one embodiment, training of a neural network is augmented using weakly labeled or unlabeled pose data, combined with losses based on a human hand model.
    Type: Application
    Filed: January 24, 2020
    Publication date: July 29, 2021
    Inventors: Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz
  • Patent number: 11049018
    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: June 29, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
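The weight-transfer step above can be illustrated with a minimal NumPy sketch; the variable names and the identity-scaled hidden-to-hidden initialization are illustrative assumptions, not the patent's exact initial values.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features = 4, 8

# Feedforward weights from the trained non-recurrent layer.
W_ff = rng.standard_normal((hidden, features))

# Step 1: reuse them as the recurrent layer's input-to-hidden weights.
W_ih = W_ff.copy()

# Step 2: set hidden-to-hidden weights to initial values
# (here: a small identity-scaled matrix, an illustrative choice).
W_hh = 0.1 * np.eye(hidden)

def recurrent_step(x_t, h_prev):
    # Simple recurrent update using the transferred weights.
    return np.tanh(W_ih @ x_t + W_hh @ h_prev)

# Process a short "video" sequence of feature vectors.
h = np.zeros(hidden)
for _ in range(3):
    h = recurrent_step(rng.standard_normal(features), h)
```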
  • Patent number: 11037051
    Abstract: Planar regions in three-dimensional scenes offer important geometric cues in a variety of three-dimensional perception tasks such as scene understanding, scene reconstruction, and robot navigation. Image analysis to detect planar regions can be performed by a deep learning architecture that includes a number of neural networks configured to estimate parameters for the planar regions. The neural networks process an image to detect an arbitrary number of plane objects in the image. Each plane object is associated with a number of estimated parameters including bounding box parameters, plane normal parameters, and a segmentation mask. Global parameters for the image, including a depth map, can also be estimated by one of the neural networks. Then, a segmentation refinement network jointly optimizes (i.e., refines) the segmentation masks for each instance of the plane objects and combines the refined segmentation masks to generate an aggregate segmentation mask for the image.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: June 15, 2021
    Assignee: NVIDIA Corporation
    Inventors: Kihwan Kim, Jinwei Gu, Chen Liu, Jan Kautz
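A minimal sketch of the per-plane parameters and the aggregate segmentation mask described above; the `PlaneObject` structure and all values are hypothetical, chosen only to show how per-instance masks combine.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlaneObject:
    # Per-plane parameters estimated by the detection networks.
    bbox: tuple          # (x1, y1, x2, y2) bounding box
    normal: np.ndarray   # 3D plane normal
    mask: np.ndarray     # per-pixel segmentation mask (H, W)

H, W = 4, 6
planes = [
    PlaneObject((0, 0, 3, 2), np.array([0.0, 1.0, 0.0]),
                np.zeros((H, W), dtype=bool)),
    PlaneObject((2, 1, 5, 3), np.array([0.0, 0.0, 1.0]),
                np.zeros((H, W), dtype=bool)),
]
planes[0].mask[0:2, 0:3] = True   # e.g. a floor region
planes[1].mask[2:4, 3:6] = True   # e.g. a wall region

# Aggregate mask combines the (refined) per-instance masks;
# each pixel gets the index of the plane covering it (0 = none).
aggregate_mask = np.zeros((H, W), dtype=int)
for idx, p in enumerate(planes, start=1):
    aggregate_mask[p.mask] = idx
```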
  • Patent number: 11017556
    Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: May 25, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
  • Publication number: 20210150757
    Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify the orientations of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
    Type: Application
    Filed: November 20, 2019
    Publication date: May 20, 2021
    Inventors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
  • Publication number: 20210150736
    Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
    Type: Application
    Filed: January 22, 2021
    Publication date: May 20, 2021
    Inventors: Zhaoyang Lv, Kihwan Kim, Deqing Sun, Alejandro Jose Troccoli, Jan Kautz
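The separation of camera-induced motion from dynamic (non-rigid) scene flow can be sketched for a single 3D point; the rotation, translation, and point coordinates below are toy assumptions, not outputs of the described network.

```python
import numpy as np

# A point observed at time t and t+1, in the camera frame (x, y, z).
p_t  = np.array([1.0, 0.5, 4.0])
p_t1 = np.array([1.2, 0.5, 3.6])

# Estimated camera motion between frames: rotation R, translation tvec.
R = np.eye(3)                      # no rotation in this toy example
tvec = np.array([0.0, 0.0, -0.2])  # camera moved 0.2 units forward

# Motion a static point would exhibit due to camera motion alone.
p_static = R @ p_t + tvec

# Dynamic 3D scene flow: residual motion after removing camera motion.
scene_flow = p_t1 - p_static
```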
  • Publication number: 20210142177
    Abstract: Apparatuses, systems, and techniques are presented to generate data useful for further training of a neural network. In at least one embodiment, one or more neural networks can be re-trained based, at least in part, on data generated by the one or more neural networks including data used to previously train the one or more neural networks.
    Type: Application
    Filed: November 13, 2019
    Publication date: May 13, 2021
    Inventors: Arun Mallya, Jan Kautz, Zhizhong Li, Pavlo Molchanov, Hongxu Danny Yin
  • Publication number: 20210133990
    Abstract: Apparatuses, systems, and techniques to generate a 3D model of an object. In at least one embodiment, a 3D model of an object is generated by one or more neural networks, based on a plurality of images of the object.
    Type: Application
    Filed: November 5, 2019
    Publication date: May 6, 2021
    Inventors: Benjamin David Eckart, Wentao Yuan, Varun Jampani, Kihwan Kim, Jan Kautz
  • Publication number: 20210117661
    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
    Type: Application
    Filed: December 28, 2020
    Publication date: April 22, 2021
    Inventors: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
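The keypoint representation described above, two image coordinates plus a network-predicted depth per keypoint, can be sketched as follows; all numeric values are illustrative.

```python
import numpy as np

# 2D keypoints (x, y) detected in the image, one row per joint.
keypoints_2d = np.array([[120.0, 80.0],
                         [130.0, 95.0],
                         [142.0, 110.0]])

# Per-keypoint depth with respect to the camera (illustrative values
# standing in for the neural network's depth predictions).
depths = np.array([[0.62], [0.60], [0.57]])

# Stacking depth onto the 2D coordinates yields the 3D pose estimate.
keypoints_3d = np.hstack([keypoints_2d, depths])
```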
  • Patent number: 10984286
    Abstract: A style transfer neural network may be used to generate stylized synthetic images, where real images provide the style (e.g., seasons, weather, lighting) for transfer to synthetic images. The stylized synthetic images may then be used to train a recognition neural network. In turn, the trained neural network may be used to predict semantic labels for the real images, providing recognition data for the real images. Finally, the real training dataset (real images and predicted recognition data) and the synthetic training dataset are used by the style transfer neural network to generate stylized synthetic images. The training of the neural network, prediction of recognition data for the real images, and stylizing of the synthetic images may be repeated for a number of iterations. The stylization operation more closely aligns a covariate of the synthetic images to the covariate of the real images, improving accuracy of the recognition neural network.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: April 20, 2021
    Assignee: NVIDIA Corporation
    Inventors: Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz
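The iterative procedure above (stylize synthetic images, train the recognition network, pseudo-label the real images, repeat) can be sketched at a high level; every function and value here is a hypothetical stub, not the actual networks.

```python
def stylize(synthetic_images, real_images):
    # Stub for the style transfer network: real images provide the
    # style (seasons, weather, lighting) transferred onto synthetics.
    return [f"styled({s})" for s in synthetic_images]

def train_recognition(images, labels):
    # Stub for training the recognition network on stylized synthetics.
    return {"trained_on": len(images)}

def predict_labels(model, images):
    # Stub for predicting semantic labels (pseudo-labels) for real images.
    return [f"pred({im})" for im in images]

synthetic, synth_labels = ["s1", "s2"], ["car", "road"]
real = ["r1", "r2"]

for _ in range(3):  # repeat for a number of iterations
    styled = stylize(synthetic, real)
    model = train_recognition(styled, synth_labels)
    real_labels = predict_labels(model, real)
    # The real dataset (images + predicted labels) and the synthetic
    # dataset now both feed the next stylization round.
```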
  • Patent number: 10964061
    Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: March 30, 2021
    Assignee: NVIDIA Corporation
    Inventors: Jinwei Gu, Samarth Manoj Brahmbhatt, Kihwan Kim, Jan Kautz
  • Publication number: 20210089867
    Abstract: Learning the dynamics of an environment and predicting future consequences is a recent technical advancement that can be applied to video prediction and speech recognition, among other applications. Generally, machine learning models, such as deep learning models, neural networks, or other artificial intelligence algorithms, are used to make the predictions. However, current artificial intelligence algorithms used for making predictions are typically limited to short-term future predictions, mainly as a result of 1) the presence of complex dynamics in high-dimensional video data, 2) prediction error propagation over time, and 3) inherent uncertainty of the future. The present disclosure enables the modeling of long-term dependencies in sequential data for use in making long-term predictions by providing a dual (i.e., two-part) recurrent neural network architecture.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Wonmin Byeon, Jan Kautz
  • Publication number: 20210088784
    Abstract: A gaze tracking system for use by the driver of a vehicle includes an opaque frame circumferentially enclosing a transparent field of view of the driver, light emitting diodes coupled to the opaque frame for emitting infrared light onto various regions of the driver's eye gazing through the transparent field of view, and diodes for sensing intensity of infrared light reflected off of various regions of the driver's eye.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Applicant: Nvidia Corp.
    Inventors: Eric Whitmire, Kaan Aksit, Michael Stengel, Jan Kautz, David Luebke, Ben Boudaoud
  • Publication number: 20210073575
    Abstract: A temporal propagation network (TPN) system learns the affinity matrix for video image processing tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The TPN system includes a guidance neural network model and a temporal propagation module and is trained for a particular computer vision task to propagate visual properties from a key-frame represented by dense data (color), to another frame that is represented by coarse data (grey-scale). The guidance neural network model generates an affinity matrix referred to as a global transformation matrix from task-specific data for the key-frame and the other frame. The temporal propagation module applies the global transformation matrix to the key-frame property data to produce propagated property data (color) for the other frame. For example, the TPN system may be used to colorize several frames of greyscale video using a single manually colorized key-frame.
    Type: Application
    Filed: October 27, 2020
    Publication date: March 11, 2021
    Inventors: Sifei Liu, Shalini De Mello, Jinwei Gu, Varun Jampani, Jan Kautz
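The propagation step above, applying the global transformation (affinity) matrix to key-frame property data, reduces to a matrix product; the matrix and colors below are toy values, not outputs of the guidance network.

```python
import numpy as np

# Key-frame property data: per-pixel color (RGB), 4 pixels flattened.
key_colors = np.array([[255, 0, 0],
                       [0, 255, 0],
                       [0, 0, 255],
                       [128, 128, 128]], dtype=float)

# Global transformation (affinity) matrix: row i mixes key-frame
# pixels into pixel i of the other frame. Rows sum to 1 so the
# propagated colors stay in range.
G = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])

# Propagate color from the key-frame to the grey-scale frame.
propagated = G @ key_colors
```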
  • Publication number: 20210073612
    Abstract: In at least one embodiment, differentiable neural architecture search and reinforcement learning are combined under one framework to discover network architectures with desired properties such as high accuracy, low latency, or both. In at least one embodiment, an objective function for search based on generalization error prevents the selection of architectures prone to overfitting.
    Type: Application
    Filed: September 10, 2019
    Publication date: March 11, 2021
    Inventors: Arash Vahdat, Arun Mohanray Mallya, Ming-Yu Liu, Jan Kautz
  • Publication number: 20210067735
    Abstract: Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
  • Publication number: 20210064907
    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class.
    Type: Application
    Filed: August 20, 2020
    Publication date: March 4, 2021
    Inventors: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz