Patents by Inventor Jan Kautz

Jan Kautz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210067735
    Abstract: Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
  • Publication number: 20210064907
    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g., person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g., environment) but do not transfer well to other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for the same object class. A minimal sketch of the swap-and-decode idea appears after this entry.
    Type: Application
    Filed: August 20, 2020
    Publication date: March 4, 2021
    Inventors: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz
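
The cross-domain disentanglement idea can be illustrated with a toy swap-and-decode sketch. Everything below is an assumption for illustration: the linear encoder/decoder, the 50/50 split of the embedding into id-related and id-unrelated halves, and the random weights stand in for the patented architecture and its training losses.

```python
# Illustrative NumPy sketch of swap-and-decode disentanglement.
import numpy as np

rng = np.random.default_rng(0)
D, H = 256, 64                           # image feature size, embedding size
enc = rng.normal(size=(D, H)) * 0.05     # placeholder encoder weights
dec = rng.normal(size=(H, D)) * 0.05     # placeholder decoder weights

def encode(x):
    z = x @ enc
    return z[: H // 2], z[H // 2 :]      # (id-related, id-unrelated) halves

def decode(z_id, z_unrel):
    return np.concatenate([z_id, z_unrel]) @ dec

x_labeled = rng.normal(size=D)           # image from the labeled source domain
x_unlabeled = rng.normal(size=D)         # image from the unlabeled target domain

id_a, unrel_a = encode(x_labeled)
id_b, unrel_b = encode(x_unlabeled)

# Swapping the id-unrelated half carries identity a into domain b's conditions;
# training losses (identity, reconstruction, adversarial) would constrain this.
x_cross = decode(id_a, unrel_b)
print(x_cross.shape)                     # (256,)
```
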
  • Publication number: 20210064931
    Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
    Type: Application
    Filed: August 20, 2020
    Publication date: March 4, 2021
    Inventors: Xiaodong Yang, Xitong Yang, Sifei Liu, Jan Kautz
  • Publication number: 20210056353
    Abstract: The disclosure provides a framework or system for learning visual representations using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training a critic function, using mutual information, to learn relationships between the set of image embeddings and the set of text embeddings. A minimal sketch of such a critic appears after this entry.
    Type: Application
    Filed: August 21, 2020
    Publication date: February 25, 2021
    Inventors: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz
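
A common way to train a critic with mutual information, consistent with (but not necessarily identical to) the method described above, is an InfoNCE-style objective over matched image/text pairs. The bilinear critic and temperature below are illustrative assumptions.

```python
# Minimal NumPy sketch of an InfoNCE-style critic over paired embeddings.
import numpy as np

def infonce_loss(img_emb, txt_emb, W, temperature=0.1):
    """img_emb: (N, Di); txt_emb: (N, Dt); W: (Di, Dt) bilinear critic."""
    scores = img_emb @ W @ txt_emb.T / temperature   # (N, N) pair scores
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # matched pairs on diagonal

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 128))                      # 8 image embeddings
txt = rng.normal(size=(8, 64))                       # 8 matching text embeddings
W = rng.normal(size=(128, 64)) * 0.01
print(infonce_loss(img, txt, W))
```
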
  • Patent number: 10929654
    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x, y) represent spatial displacement and a third coordinate represents the depth of each point with respect to the camera. A monocular camera is used to capture an image of the 3D pose but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object. A sketch of the final lifting step appears after this entry.
    Type: Grant
    Filed: March 1, 2019
    Date of Patent: February 23, 2021
    Assignee: NVIDIA Corporation
    Inventors: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
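
Once per-keypoint depths are available, the lifting to 3D is standard pinhole back-projection. The sketch below assumes known camera intrinsics (fx, fy, cx, cy) and omits the network that predicts the depths.

```python
# Pinhole back-projection of 2D keypoints plus predicted depths to 3D.
import numpy as np

def backproject(keypoints_xy, depths, fx, fy, cx, cy):
    """keypoints_xy: (K, 2) pixel coordinates; depths: (K,) metric depths."""
    x, y = keypoints_xy[:, 0], keypoints_xy[:, 1]
    X = (x - cx) * depths / fx
    Y = (y - cy) * depths / fy
    return np.stack([X, Y, depths], axis=1)      # (K, 3) camera-space points

kp = np.array([[320.0, 240.0], [350.0, 260.0]])  # e.g. two hand keypoints
z = np.array([0.60, 0.63])                       # predicted depths in meters
print(backproject(kp, z, fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```
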
  • Patent number: 10929987
    Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene: information identifying the dynamic and static portions of each image, and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera. A sketch of the rigid-motion subtraction appears after this entry.
    Type: Grant
    Filed: August 1, 2018
    Date of Patent: February 23, 2021
    Assignee: NVIDIA Corporation
    Inventors: Zhaoyang Lv, Kihwan Kim, Deqing Sun, Alejandro Jose Troccoli, Jan Kautz
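
The separation described above can be illustrated by removing the rigid flow that camera motion alone would induce. The sketch assumes the per-frame 3D points, camera motion (R, t), and dynamic mask are already estimated; it shows only the residual computation.

```python
# Residual (non-rigid) scene flow after removing camera-induced motion.
import numpy as np

def nonrigid_scene_flow(p_t, p_t1, R, t, dynamic_mask):
    """p_t, p_t1: (N, 3) points; R: (3, 3); t: (3,); dynamic_mask: (N,) bool."""
    rigid_pred = p_t @ R.T + t            # where static points should move
    flow = p_t1 - rigid_pred              # residual, camera motion removed
    flow[~dynamic_mask] = 0.0             # static regions carry no scene flow
    return flow

theta = np.deg2rad(2.0)                   # small illustrative camera rotation
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1.0]])
t = np.array([0.01, 0.0, 0.05])
p_t = np.random.default_rng(1).normal(size=(4, 3))
p_t1 = p_t @ R.T + t                      # a perfectly static scene...
p_t1[0] += [0.1, 0.0, 0.0]                # ...except one moving point
mask = np.array([True, False, False, False])
print(nonrigid_scene_flow(p_t, p_t1, R, t, mask))  # only point 0 carries flow
```
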
  • Patent number: 10922793
    Abstract: Missing image content is generated using a neural network. In an embodiment, a high resolution image and associated high resolution semantic label map are generated from a low resolution image and associated low resolution semantic label map. The input image/map pair (low resolution image and associated low resolution semantic label map) lacks detail and is therefore missing content. Rather than simply enhancing the input image/map pair, a neural network improvises, or hallucinates, the data missing from the pair, creating plausible content while maintaining spatio-temporal consistency. Missing content is hallucinated to generate a detailed zoomed-in portion of an image, or to generate different variations of an image, such as different seasons or weather conditions for a driving video.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: February 16, 2021
    Assignee: NVIDIA Corporation
    Inventors: Seung-Hwan Baek, Kihwan Kim, Jinwei Gu, Orazio Gallo, Alejandro Jose Troccoli, Ming-Yu Liu, Jan Kautz
  • Patent number: 10872399
    Abstract: Photorealistic image stylization concerns transferring the style of a reference photo to a content photo under the constraint that the stylized photo remain photorealistic. Examples of styles include seasons (summer, winter, etc.), weather (sunny, rainy, foggy, etc.), and lighting (daytime, nighttime, etc.). A photorealistic image stylization process includes a stylization step and a smoothing step. The stylization step transfers the style of the reference photo to the content photo: a photo style transfer neural network model receives a photorealistic content image and a photorealistic style image and generates an intermediate stylized photorealistic image that includes the content of the content image modified according to the style image. A smoothing function then receives the intermediate stylized photorealistic image and pixel similarity data and generates the final stylized photorealistic image, ensuring spatially consistent stylization. A sketch of the smoothing step appears after this entry.
    Type: Grant
    Filed: January 11, 2019
    Date of Patent: December 22, 2020
    Assignee: NVIDIA Corporation
    Inventors: Yijun Li, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
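
The abstract does not specify the smoothing function; one standard graph-based choice consistent with it is closed-form propagation of the stylized colors along a pixel-similarity affinity matrix, sketched here on a tiny flattened image. The Gaussian affinity and alpha value are assumptions.

```python
# Graph-based smoothing of stylized colors Y along pixel-similarity affinities A.
import numpy as np

def smooth(Y, A, alpha=0.8):
    """Y: (P, 3) stylized colors; A: (P, P) nonnegative pixel affinities."""
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))                # symmetrically normalized affinity
    P = A.shape[0]
    # Closed form (1 - alpha) * (I - alpha * S)^-1 @ Y, solved as a linear system.
    return (1 - alpha) * np.linalg.solve(np.eye(P) - alpha * S, Y)

rng = np.random.default_rng(0)
Y = rng.random((16, 3))                            # 4x4 stylized image, flattened
content = rng.random((16, 3))                      # content-image pixels
d2 = ((content[:, None] - content[None]) ** 2).sum(-1)
A = np.exp(-d2 / 0.1)                              # Gaussian content similarity
print(smooth(Y, A).shape)                          # (16, 3) smoothed colors
```
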
  • Publication number: 20200394458
    Abstract: Apparatuses, systems, and techniques to detect objects in images, including digital representations of those objects. In at least one embodiment, one or more objects are detected in an image based, at least in part, on one or more pseudo-labels corresponding to said one or more objects. A sketch of pseudo-label filtering appears after this entry.
    Type: Application
    Filed: June 17, 2019
    Publication date: December 17, 2020
    Inventors: Zhiding Yu, Jason Ren, Xiaodong Yang, Ming-Yu Liu, Jan Kautz
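
A minimal sketch of the pseudo-labeling idea: keep only confident detections from the current model as training labels for unlabeled images. The detector output format and the confidence threshold are placeholders, not the patented method.

```python
# Keep confident detections as pseudo-labels for retraining on unlabeled data.
def make_pseudo_labels(detections, threshold=0.9):
    """detections: list of (box, class_id, score); keep confident ones."""
    return [(box, cls) for box, cls, score in detections if score >= threshold]

# Example: fake detector output for one unlabeled image.
dets = [((10, 10, 50, 80), 0, 0.97),   # confident -> becomes a pseudo-label
        ((60, 20, 90, 70), 1, 0.42)]   # uncertain -> discarded
print(make_pseudo_labels(dets))
```
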
  • Patent number: 10860859
    Abstract: Detection of activity in video content, and in particular detection of the start and end frames that bound an activity along with a classification of that activity, is fundamental to video analytics, including categorizing, searching, indexing, segmenting, and retrieving videos. Existing activity detection processes rely on a large set of features and classifiers that exhaustively run over every time step of a video at multiple temporal scales, or, as a modest computational improvement, first propose segments of the video on which to perform classification. These existing processes, however, are computationally expensive, particularly when high detection accuracy is required, and they are not configurable for a particular time or computation budget. The present disclosure provides a time- and/or computation-budget-aware method for detecting activity in video that relies on a recurrent neural network implementing a learned policy. A sketch of the budgeted control flow appears after this entry.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: December 8, 2020
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Pavlo Molchanov, Jan Kautz, Behrooz Mahasseni
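
Only the control flow of a budget-aware policy is sketched below: a recurrent state is updated from each observed frame and a policy picks the next frame to inspect until the budget is spent. The random policy and tanh update are placeholders for the learned recurrent network.

```python
# Budget-aware frame inspection: observe at most `budget` frames, not all of them.
import numpy as np

def detect_with_budget(video, budget, rng):
    n = len(video)
    state, t, observed = np.zeros(8), n // 2, []
    for _ in range(budget):                        # hard cap on observations
        observed.append(t)
        state = np.tanh(state + video[t])          # placeholder recurrent update
        step = rng.integers(-n // 4, n // 4 + 1)   # placeholder policy output
        t = int(np.clip(t + step, 0, n - 1))
    return observed                                # frames inspected within budget

rng = np.random.default_rng(0)
video = rng.normal(size=(100, 8))                  # 100 frames of 8-dim features
print(detect_with_budget(video, budget=6, rng=rng))
```
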
  • Patent number: 10838492
    Abstract: A gaze tracking system for use in head mounted displays includes an eyepiece having an opaque frame circumferentially enclosing a transparent field of view, light emitting diodes coupled to the opaque frame for emitting infrared light onto various regions of an eye gazing through the transparent field of view, and diodes for sensing intensity of infrared light reflected off of various regions of the eye.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: November 17, 2020
    Assignee: NVIDIA Corp.
    Inventors: Eric Whitmire, Kaan Aksit, Michael Stengel, Jan Kautz, David Luebke, Ben Boudaoud
  • Patent number: 10826786
    Abstract: Point cloud registration sits at the core of many important and challenging 3D perception problems, including autonomous navigation, object/scene recognition, and augmented reality (AR). A new registration algorithm is presented that achieves speed and accuracy by registering a point cloud to a representation of a reference point cloud. A target point cloud is registered to the reference point cloud by iterating through a number of cycles of an EM algorithm where, during an Expectation step, each point in the target point cloud is associated with a node of a hierarchical tree data structure and, during a Maximization step, an estimated transformation is determined based on the association of the points with corresponding nodes of the hierarchical tree data structure. The estimated transformation is determined by solving a minimization problem associated with a sum, over a number of mixture components, of terms related to a Mahalanobis distance. A simplified EM sketch appears after this entry.
    Type: Grant
    Filed: March 12, 2019
    Date of Patent: November 3, 2020
    Assignee: NVIDIA Corporation
    Inventors: Benjamin David Eckart, Kihwan Kim, Jan Kautz
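
A heavily simplified sketch of the EM loop: here the reference cloud is summarized by a flat, isotropic Gaussian mixture rather than the hierarchical tree of the patent, so the Mahalanobis objective reduces to a weighted Procrustes problem solvable with an SVD.

```python
# Simplified EM point-cloud registration against a flat isotropic mixture.
import numpy as np

def em_register(target, mu, sigma=0.2, iters=10):
    """target: (N, 3) points to register; mu: (K, 3) reference mixture means."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        x = target @ R.T + t                       # current transform applied
        # E-step: soft association of each point with each mixture component.
        d2 = ((x[:, None] - mu[None]) ** 2).sum(-1)
        g = np.exp(-d2 / (2 * sigma**2))
        g /= g.sum(axis=1, keepdims=True) + 1e-12
        # M-step: weighted Procrustes against the soft correspondences.
        corr = g @ mu
        pc, cc = target.mean(0), corr.mean(0)
        H = (target - pc).T @ (corr - cc)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                         # rotation, reflection-corrected
        t = cc - R @ pc
    return R, t

rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))                     # reference cloud = mixture means
theta = np.deg2rad(6)
true_R = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
target = (ref - 0.05) @ true_R.T                   # displaced copy to register
R, t = em_register(target, mu=ref)
print(np.round(R @ true_R, 2))                     # should be close to identity
```
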
  • Publication number: 20200342263
    Abstract: When a computer image is generated from a real-world scene having a semi-reflective surface (e.g. window), the computer image will create, at the semi-reflective surface from the viewpoint of the camera, both a reflection of a scene in front of the semi-reflective surface and a transmission of a scene located behind the semi-reflective surface. Similar to a person viewing the real-world scene from different locations, angles, etc., the reflection and transmission may change, and also move relative to each other, as the viewpoint of the camera changes. Unfortunately, the dynamic nature of the reflection and transmission negatively impacts the performance of many computer applications, but performance can generally be improved if the reflection and transmission are separated. The present disclosure uses deep learning to separate reflection and transmission at a semi-reflective surface of a computer image generated from a real-world scene.
    Type: Application
    Filed: July 8, 2020
    Publication date: October 29, 2020
    Inventors: Orazio Gallo, Jinwei Gu, Jan Kautz, Patrick Wieschollek
  • Publication number: 20200334502
    Abstract: Segmentation is the identification of separate objects within an image. An example is identification of a pedestrian passing in front of a car, where the pedestrian is a first object and the car is a second object. Superpixel segmentation is the identification of regions of pixels within an object that have similar properties. An example is identification of pixel regions having a similar color, such as different articles of clothing worn by the pedestrian and different components of the car. A pixel affinity neural network (PAN) model is trained to generate pixel affinity maps for superpixel segmentation. The pixel affinity map defines the similarity of two points in space. In an embodiment, the pixel affinity map indicates a horizontal affinity and vertical affinity for each pixel in the image. The pixel affinity map is processed to identify the superpixels. A sketch of what an affinity map encodes appears after this entry.
    Type: Application
    Filed: July 6, 2020
    Publication date: October 22, 2020
    Inventors: Wei-Chih Tu, Ming-Yu Liu, Varun Jampani, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
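
What a horizontal/vertical affinity map encodes can be seen by deriving ideal affinities from a toy label map (affinity 1 within a segment, 0 across a boundary); in the patented method a network predicts such maps from the image itself.

```python
# Ideal horizontal/vertical affinities derived from a toy segment label map.
import numpy as np

def affinity_maps(labels):
    """labels: (H, W) integer segment ids -> (H, W-1) and (H-1, W) affinities."""
    horiz = (labels[:, :-1] == labels[:, 1:]).astype(float)
    vert = (labels[:-1, :] == labels[1:, :]).astype(float)
    return horiz, vert

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 1]])
h, v = affinity_maps(labels)
print(h)   # 0 where a pixel and its right neighbor lie in different segments
print(v)   # 0 where a pixel and its bottom neighbor lie in different segments
```
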
  • Publication number: 20200334543
    Abstract: A neural network is trained to identify one or more features of an image. The neural network is trained using a small number of original images, from which a plurality of additional images are derived. The additional images are generated by rotating and decoding embeddings of the original images in a latent space produced by an autoencoder. The images generated by this rotation and decoding exhibit changes to a feature in proportion to the amount of rotation. A sketch of the latent rotation appears after this entry.
    Type: Application
    Filed: April 19, 2019
    Publication date: October 22, 2020
    Inventors: Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Jan Kautz
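
A toy version of the augmentation: encode, rotate the embedding within a latent plane, and decode. The linear autoencoder and the choice of rotation plane are placeholder assumptions; in the patent the autoencoder is learned so that decoded images change a feature in proportion to the rotation.

```python
# Data augmentation by rotating an embedding in a latent plane, then decoding.
import numpy as np

def rotate_latent(z, angle, dims=(0, 1)):
    """Rotate latent vector z by `angle` radians within the plane `dims`."""
    z = z.copy()
    i, j = dims
    c, s = np.cos(angle), np.sin(angle)
    z[i], z[j] = c * z[i] - s * z[j], s * z[i] + c * z[j]
    return z

rng = np.random.default_rng(0)
enc = rng.normal(size=(64, 8)) * 0.1     # placeholder encoder
dec = rng.normal(size=(8, 64)) * 0.1     # placeholder decoder
x = rng.normal(size=64)                  # one "image"
z = x @ enc
# A few rotated copies decode to a small family of augmented samples.
augmented = [rotate_latent(z, a) @ dec for a in np.linspace(0, 0.5, 5)]
print(len(augmented), augmented[0].shape)
```
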
  • Publication number: 20200320401
    Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
    Type: Application
    Filed: April 8, 2019
    Publication date: October 8, 2020
    Inventors: Varun Jampani, Wei-Chih Hung, Sifei Liu, Pavlo Molchanov, Jan Kautz
  • Patent number: 10789678
    Abstract: A superpixel sampling network utilizes a neural network coupled to a differentiable simple linear iterative clustering (SLIC) component to determine pixel-superpixel associations from a set of pixel features output by the neural network. The superpixel sampling network computes updated superpixel centers and final pixel-superpixel associations over a number of iterations. A sketch of the association and center updates appears after this entry.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: September 29, 2020
    Assignee: NVIDIA Corp.
    Inventors: Varun Jampani, Deqing Sun, Ming-Yu Liu, Jan Kautz
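
The iterative core can be sketched directly: soft pixel-to-superpixel associations from feature distances, then centers recomputed as association-weighted means. For brevity each pixel is compared with every superpixel, whereas the actual method restricts comparisons to nearby superpixels.

```python
# Differentiable-SLIC-style loop: soft associations, then weighted center updates.
import numpy as np

def soft_slic(features, centers, iters=5, temp=1.0):
    """features: (P, D) pixel features; centers: (S, D) initial centers."""
    for _ in range(iters):
        d2 = ((features[:, None] - centers[None]) ** 2).sum(-1)   # (P, S)
        q = np.exp(-d2 / temp)
        q /= q.sum(axis=1, keepdims=True)          # soft pixel-superpixel assoc.
        centers = (q.T @ features) / q.sum(axis=0)[:, None]
    return q, centers                              # final associations, centers

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 5))                  # e.g. (x, y, L, a, b) per pixel
init = feats[rng.choice(100, size=9, replace=False)]
q, c = soft_slic(feats, init)
hard = q.argmax(axis=1)                            # hard superpixel assignment
print(hard[:10])
```
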
  • Patent number: 10783394
    Abstract: A method, computer-readable medium, and system are disclosed to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, eye gaze tracking, pose estimation, etc. A transform is applied to input image data to produce transformed input image data. The transform is also applied to predicted coordinates for landmarks of the input image data to produce transformed predicted coordinates. A neural network model processes the transformed input image data to generate additional landmarks of the transformed input image data and additional predicted coordinates for each one of the additional landmarks. Parameters of the neural network model are updated to reduce differences between the transformed predicted coordinates and the additional predicted coordinates. A sketch of this consistency loss appears after this entry.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: September 22, 2020
    Assignee: NVIDIA Corporation
    Inventors: Pavlo Molchanov, Stephen Walter Tyree, Jan Kautz, Sina Honari
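
The update rule can be read as a transform-consistency (equivariance) loss: predicting on a transformed input should agree with transforming the predictions on the original input. To stay self-contained, the sketch replaces the image with a point set and the network with a fixed linear map, both placeholders.

```python
# Transform-consistency loss: predict-then-transform vs. transform-then-predict.
import numpy as np

def rotate2d(pts, angle):
    c, s = np.cos(angle), np.sin(angle)
    return pts @ np.array([[c, -s], [s, c]]).T

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2)) * 0.5          # placeholder "landmark predictor"
predict = lambda pts: pts @ W              # maps input points to landmarks

image_pts = rng.normal(size=(5, 2))        # stand-in for an input image
angle = np.deg2rad(30)

pred_then_transform = rotate2d(predict(image_pts), angle)
transform_then_pred = predict(rotate2d(image_pts, angle))

# Training would update the predictor to drive these two to agree.
loss = ((pred_then_transform - transform_then_pred) ** 2).mean()
print(loss)
```
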
  • Patent number: 10783393
    Abstract: A method, computer-readable medium, and system are disclosed for sequential multi-tasking to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, eye gaze tracking, pose estimation, etc. A neural network model processes input image data to generate pixel-level likelihood estimates for landmarks in the input image data, and a soft-argmax function computes predicted coordinates of each landmark based on the pixel-level likelihood estimates. A sketch of the soft-argmax appears after this entry.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: September 22, 2020
    Assignee: NVIDIA Corporation
    Inventors: Pavlo Molchanov, Stephen Walter Tyree, Jan Kautz, Sina Honari
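
The soft-argmax itself is concrete enough to sketch: a softmax turns pixel-level likelihoods into a distribution, and the coordinate is the probability-weighted average of the pixel grid, which keeps coordinate prediction differentiable, unlike a hard argmax.

```python
# Soft-argmax over a single landmark heatmap.
import numpy as np

def soft_argmax(heatmap, temperature=1.0):
    """heatmap: (H, W) pixel-level likelihoods -> continuous (x, y) coordinate."""
    h, w = heatmap.shape
    p = np.exp((heatmap - heatmap.max()) / temperature)
    p /= p.sum()                          # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

heat = np.zeros((5, 5))
heat[1, 3] = 10.0                         # strong response at x=3, y=1
print(soft_argmax(heat))                  # approximately (3.0, 1.0)
```
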
  • Publication number: 20200294194
    Abstract: A video stitching system combines video from different cameras to form a panoramic video that, in various embodiments, is temporally stable and tolerant to strong parallax. In an embodiment, the system provides a smooth spatial interpolation that can be used to connect the input video images. In an embodiment, the system applies an interpolation layer to slices of the overlapping video sources, and the network learns a dense flow field to smoothly align the input videos with spatial interpolation. Various embodiments are applicable to areas such as virtual reality, immersive telepresence, autonomous driving, and video surveillance. A sketch of warp-and-blend stitching appears after this entry.
    Type: Application
    Filed: March 11, 2019
    Publication date: September 17, 2020
    Inventors: Deqing Sun, Orazio Gallo, Jan Kautz, Jinwei Gu, Wei-Sheng Lai
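
A toy stand-in for the learned interpolation layer: warp one overlap strip along a given horizontal flow field and cross-fade it with the other. The given flow, nearest-neighbor 1D warping, and linear blend weights are all assumptions made for brevity.

```python
# Flow-guided alignment and cross-fade in the overlap region of two strips.
import numpy as np

def stitch_overlap(left, right, flow):
    """left, right: (H, W) overlap strips; flow: (H, W) horizontal offsets."""
    h, w = left.shape
    cols = np.arange(w)
    weights = cols / (w - 1)                 # cross-fade: 0 -> left, 1 -> right
    out = np.empty((h, w))
    for y in range(h):
        src = np.clip(cols + np.round(flow[y]).astype(int), 0, w - 1)
        warped_right = right[y, src]         # align the right strip to the left
        out[y] = (1 - weights) * left[y] + weights * warped_right
    return out

rng = np.random.default_rng(0)
left = rng.random((4, 8))
right = np.roll(left, 1, axis=1)             # right view shifted by one pixel
flow = np.ones((4, 8))                       # a flow that undoes the disparity
print(np.round(stitch_overlap(left, right, flow), 2))
```
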