Patents by Inventor Sifei Liu
Sifei Liu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250111592
Abstract: Virtual reality and augmented reality bring increasing demand for 3D content creation. In an effort to automate the generation of 3D content, artificial intelligence-based processes have been developed. However, these processes are limited in the quality of their output: they typically rely either on a model trained on limited 3D data, which does not generalize well to unseen objects, or on a model trained on 2D data, which suffers from poor geometry because it ignores 3D information. The present disclosure jointly uses both 2D and 3D data to train a machine learning model to generate 3D content from a single 2D image.
Type: Application
Filed: September 20, 2024
Publication date: April 3, 2025
Inventors: Dejia Xu, Morteza Mardani, Jiaming Song, Sifei Liu, Ye Yuan, Arash Vahdat
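The joint use of 2D and 3D supervision can be illustrated with a minimal sketch. This is not the disclosed method; the toy point-cloud generator, the placeholder render function, and all tensor shapes are assumptions made for the example:

```python
# Hedged sketch of mixed 2D/3D supervision (not the patented method):
# one generator is trained with a 3D loss where ground-truth geometry
# exists and a 2D rendering loss everywhere else.
import torch
import torch.nn.functional as F

generator = torch.nn.Linear(64, 3000)   # toy stand-in for a 3D generator

def joint_loss(image_code, batch, render, w2d=1.0, w3d=1.0):
    points = generator(image_code).view(-1, 1000, 3)  # predicted 3D points
    loss = torch.zeros(())
    if "points3d" in batch:                # sample with 3D annotation
        loss = loss + w3d * F.mse_loss(points, batch["points3d"])
    loss = loss + w2d * F.l1_loss(render(points), batch["image"])
    return loss

# Toy usage: a placeholder "renderer" standing in for differentiable rendering.
render = lambda p: p[..., :2].mean(dim=1)
batch = {"image": torch.zeros(4, 2), "points3d": torch.randn(4, 1000, 3)}
print(joint_loss(torch.randn(4, 64), batch, render))
```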
-
Patent number: 12266144
Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify an orientation of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
Type: Grant
Filed: November 20, 2019
Date of Patent: April 1, 2025
Assignee: NVIDIA Corporation
Inventors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
-
Patent number: 12182940
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
Type: Grant
Filed: January 18, 2022
Date of Patent: December 31, 2024
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
-
Patent number: 12169882
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
Type: Grant
Filed: September 1, 2022
Date of Patent: December 17, 2024
Assignee: NVIDIA Corporation
Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
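The warping step can be sketched with PyTorch's grid_sample; the shapes and the random correspondence map are assumptions for illustration, not the disclosed system:

```python
# Minimal sketch: warp a per-image structure code into a shared canonical
# coordinate frame using a predicted dense correspondence map, so pixels
# that land at the same canonical coordinate correspond across images.
import torch
import torch.nn.functional as F

B, C, H, W = 2, 16, 32, 32
structure = torch.randn(B, C, H, W)        # structure latent per image
# Correspondence map: for each canonical pixel, where to sample in the
# image's own frame, in [-1, 1] normalized coordinates (random here).
corr = torch.rand(B, H, W, 2) * 2 - 1

canonical = F.grid_sample(structure, corr, align_corners=False)
# canonical[b, :, i, j] holds image b's structure feature at canonical
# location (i, j); labels painted in this shared space can be propagated
# back to any synthesized image through its own correspondence map.
print(canonical.shape)  # torch.Size([2, 16, 32, 32])
```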
-
Publication number: 20240404174
Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing the coarse 3D geometry of the head avatar, the detailed appearance of a source image, and the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression, and pose is generated.
Type: Application
Filed: May 2, 2024
Publication date: December 5, 2024
Inventors: Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz
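A hedged sketch of the tri-plane lookup that volumetric rendering builds on, using a single tri-plane with assumed shapes (the disclosure combines three such tri-planes for geometry, appearance, and expression):

```python
# Sketch of tri-plane feature lookup: a 3D query point is projected onto
# the XY, XZ, and YZ planes, features are sampled bilinearly from each
# plane, and the samples are summed before a radiance/density decoder.
import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """planes: (3, C, H, W) feature planes; pts: (N, 3) in [-1, 1]."""
    xy, xz, yz = pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]
    feats = 0
    for plane, uv in zip(planes, (xy, xz, yz)):
        grid = uv.view(1, -1, 1, 2)                       # (1, N, 1, 2)
        f = F.grid_sample(plane[None], grid, align_corners=False)
        feats = feats + f.view(plane.shape[0], -1).t()    # (N, C)
    return feats

planes = torch.randn(3, 32, 64, 64)     # e.g. one of the three tri-planes
pts = torch.rand(1024, 3) * 2 - 1       # query points along camera rays
print(sample_triplane(planes, pts).shape)  # torch.Size([1024, 32])
```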
-
Publication number: 20240338871
Abstract: One embodiment of a method includes applying a first generator model to a semantic representation of an image to generate an affine transformation, where the affine transformation represents a bounding box associated with at least one region within the image. The method further includes applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object. The method further includes inserting the object into the image based on the bounding box and the shape.
Type: Application
Filed: June 18, 2024
Publication date: October 10, 2024
Inventors: Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Jan Kautz
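The final insertion step can be sketched as a compositing operation; the tensors and the box format are hypothetical, and the two generator models are replaced here by ready-made inputs:

```python
# Minimal sketch (toy tensors, not the disclosed generators): given a
# bounding box derived from a predicted affine transform and a predicted
# shape mask, composite the object into the image inside that box.
import torch
import torch.nn.functional as F

def insert_object(image, obj, mask, box):
    """image: (C, H, W); obj: (C, h, w); mask: (1, h, w); box = (y, x, bh, bw)."""
    y, x, bh, bw = box
    obj = F.interpolate(obj[None], size=(bh, bw), mode="bilinear",
                        align_corners=False)[0]
    m = F.interpolate(mask[None], size=(bh, bw), mode="bilinear",
                      align_corners=False)[0]
    out = image.clone()
    region = out[:, y:y + bh, x:x + bw]
    out[:, y:y + bh, x:x + bw] = m * obj + (1 - m) * region  # alpha blend
    return out

img = torch.zeros(3, 128, 128)
print(insert_object(img, torch.rand(3, 20, 20),
                    torch.ones(1, 20, 20), (40, 50, 32, 32)).shape)
```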
-
Publication number: 20240169652
Abstract: In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, green, blue, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstruction model computes a first reconstruction loss based on the first 3D representation and the set of RGBD images. The scene reconstruction model modifies at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
Type: Application
Filed: October 30, 2023
Publication date: May 23, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
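A minimal sketch of the refinement loop, assuming a toy linear decoder and random RGBD samples in place of the disclosed pre-trained geometry and texture decoders:

```python
# Hedged sketch of test-time fitting: a 3D feature grid and a decoder are
# refined by minimizing a reconstruction loss against RGBD observations.
import torch
import torch.nn.functional as F

grid = torch.nn.Parameter(torch.randn(1, 8, 16, 16, 16))  # 3D feature grid
decoder = torch.nn.Linear(8, 4)        # toy geometry + texture decoder
opt = torch.optim.Adam([grid, *decoder.parameters()], lr=1e-2)

target = torch.rand(1024, 4)           # placeholder RGBD samples
pts = torch.rand(1024, 3) * 2 - 1      # 3D sample locations in [-1, 1]

for step in range(100):
    # Trilinearly sample grid features at the 3D points.
    g = F.grid_sample(grid, pts.view(1, -1, 1, 1, 3), align_corners=False)
    pred = decoder(g.view(8, -1).t())  # decode to an RGBD prediction
    loss = F.mse_loss(pred, target)    # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```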
-
Publication number: 20240161404
Abstract: In various embodiments, a training application trains a machine learning model to generate three-dimensional (3D) representations of two-dimensional images. The training application maps a depth image and a viewpoint to signed distance function (SDF) values associated with 3D query points. The training application maps a red, green, and blue (RGB) image to radiance values associated with the 3D query points. The training application computes a red, green, blue, and depth (RGBD) reconstruction loss based on at least the SDF values and the radiance values. The training application modifies at least one of a pre-trained geometry encoder, a pre-trained geometry decoder, an untrained texture encoder, or an untrained texture decoder based on the RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
Type: Application
Filed: October 30, 2023
Publication date: May 16, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
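The mapping from a depth image to SDF supervision can be sketched for a single camera ray; the truncation trick and free-space assumption here are common simplifications, not the disclosed training procedure:

```python
# Hedged sketch: points sampled along a camera ray receive signed-distance
# targets equal to the depth remaining to the observed surface, truncated
# to a narrow band (positive in front of the surface, negative behind).
import torch

def sdf_targets_along_ray(depth_at_pixel, sample_depths, trunc=0.05):
    sdf = depth_at_pixel - sample_depths   # remaining distance to surface
    return sdf.clamp(-trunc, trunc)        # truncate far free space

samples = torch.linspace(0.1, 2.0, 8)      # depths sampled along one ray
print(sdf_targets_along_ray(torch.tensor(1.2), samples))
```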
-
Publication number: 20240161383
Abstract: In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model maps a first red, green, blue, and depth (RGBD) image associated with both a first scene and a first viewpoint to a first surface representation of at least a first portion of the first scene. The scene reconstruction model maps a second RGBD image associated with both the first scene and a second viewpoint to a second surface representation of at least a second portion of the first scene. The scene reconstruction model aggregates at least the first surface representation and the second surface representation in a 3D space to generate a first fused surface representation of the first scene. The scene reconstruction model maps the first fused surface representation of the first scene to a 3D representation of the first scene.
Type: Application
Filed: October 30, 2023
Publication date: May 16, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
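A hedged sketch of aggregating per-view surfaces in one 3D space, assuming pinhole intrinsics K and camera-to-world poses; the fusion here is a plain concatenation of back-projected points, a simplification of the disclosed fused surface representation:

```python
# Sketch: each RGBD view is back-projected to a colored point cloud with
# its camera pose, and the per-view clouds are aggregated in world space.
import torch

def backproject(depth, rgb, K, cam2world):
    """depth: (H, W); rgb: (3, H, W); K: (3, 3); cam2world: (4, 4)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).float()  # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).t()          # camera-space directions
    pts_cam = rays * depth[..., None]             # scale by observed depth
    pts = pts_cam.reshape(-1, 3) @ cam2world[:3, :3].t() + cam2world[:3, 3]
    return pts, rgb.reshape(3, -1).t()            # (N, 3) points and colors

K = torch.tensor([[100., 0, 32], [0, 100., 32], [0, 0, 1]])
views = [(torch.rand(64, 64) + 0.5, torch.rand(3, 64, 64), torch.eye(4)),
         (torch.rand(64, 64) + 0.5, torch.rand(3, 64, 64), torch.eye(4))]
fused = torch.cat([backproject(d, c, K, T)[0] for d, c, T in views])
print(fused.shape)  # aggregated surface points from both viewpoints
```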
-
Publication number: 20240161468
Abstract: Techniques are disclosed herein for generating an image. The techniques include performing one or more first denoising operations based on a first machine learning model and an input image that includes a first object to generate a mask that indicates a spatial arrangement associated with a second object interacting with the first object, and performing one or more second denoising operations based on a second machine learning model, the input image, and the mask to generate an image of the second object interacting with the first object.
Type: Application
Filed: August 21, 2023
Publication date: May 16, 2024
Inventors: Xueting Li, Stanley Birchfield, Shalini De Mello, Sifei Liu, Jiaming Song, Yufei Ye
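The two-stage structure can be sketched conceptually; the stand-in convolutional denoisers and the crude update loop below are placeholders, not real diffusion models or samplers:

```python
# Conceptual sketch only: a first model denoises toward a layout mask
# conditioned on the input image; a second denoises toward the final
# image conditioned on both the input image and that mask.
import torch

mask_denoiser = torch.nn.Conv2d(1 + 3, 1, 3, padding=1)       # stage 1
image_denoiser = torch.nn.Conv2d(3 + 3 + 1, 3, 3, padding=1)  # stage 2

def denoise(model, x, cond, steps=10):
    for _ in range(steps):  # crude update loop, not a real diffusion sampler
        x = x - 0.1 * model(torch.cat([x, cond], dim=1))
    return x

image = torch.rand(1, 3, 64, 64)                 # input with first object
mask = denoise(mask_denoiser, torch.randn(1, 1, 64, 64), image)
out = denoise(image_denoiser, torch.randn(1, 3, 64, 64),
              torch.cat([image, mask], dim=1))
print(out.shape)  # image of the second object interacting with the first
```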
-
Publication number: 20240153093
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories seen only during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, and in doing so computes internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object.
Type: Application
Filed: May 1, 2023
Publication date: May 9, 2024
Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
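The open-vocabulary classification step can be sketched with cosine similarity; the random text and mask embeddings below are placeholders for a real text encoder and the diffusion model's internal features:

```python
# Minimal sketch: each mask's visual feature is matched to category-label
# embeddings by cosine similarity, so an unseen category only requires
# adding a new text label to the vocabulary.
import torch
import torch.nn.functional as F

labels = ["cat", "dog", "unicycle"]           # any vocabulary at test time
text_emb = F.normalize(torch.randn(len(labels), 512), dim=-1)  # placeholder
mask_feats = F.normalize(torch.randn(5, 512), dim=-1)  # per-mask features

scores = mask_feats @ text_emb.t()            # cosine similarity
pred = scores.argmax(dim=-1)
print([labels[i] for i in pred])              # category per predicted mask
```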
-
Patent number: 11960570
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and at the region or pixel level. The neural network is trained using contrasting image pairs containing different objects and corresponding image pairs containing different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update the parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs, where each image in a pair includes an object in a particular category.
Type: Grant
Filed: August 25, 2021
Date of Patent: April 16, 2024
Assignee: NVIDIA Corporation
Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
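The image-level loss can be sketched in its standard InfoNCE form (the patent's exact losses and pair sampling are not reproduced here):

```python
# Hedged sketch of an image-level contrastive objective: corresponding
# pairs are pulled together, contrasting pairs pushed apart.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """anchors/positives: (B, D); row i of each is a corresponding pair,
    and all other rows serve as the contrasting (negative) pairs."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    target = torch.arange(a.shape[0])         # diagonal = positive pair
    return F.cross_entropy(logits, target)

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)))
```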
-
Publication number: 20240070987
Abstract: Transferring pose to three-dimensional characters is a common computer graphics task that typically involves transferring the pose of a reference avatar to a (stylized) three-dimensional character. Because three-dimensional characters are created by professional artists through imagination and exaggeration, they have distinct shapes and features unlike those of human or animal avatars, so matching the pose of a three-dimensional character to that of a reference avatar generally requires manually creating the shape information needed for pose transfer. The present disclosure provides for the automated transfer of a reference pose to a three-dimensional character, based specifically on a learned shape code for the three-dimensional character.
Type: Application
Filed: February 15, 2023
Publication date: February 29, 2024
Inventors: Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Jiashun Wang, Jan Kautz
-
Patent number: 11907846
Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
Type: Grant
Filed: September 10, 2020
Date of Patent: February 20, 2024
Assignee: NVIDIA Corporation
Inventors: Sifei Liu, Shalini De Mello, Varun Jampani, Jan Kautz, Xueting Li
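The propagation recurrence can be sketched in a simplified single-predecessor form; the patent builds DAGs over unstructured points, whereas this toy orders nodes in a chain:

```python
# Minimal 1D sketch of affinity-guided propagation: each node blends its
# own feature with the already-propagated feature of its predecessor,
# weighted by a pairwise affinity, so information flows along the edges.
import torch

def propagate(features, affinities):
    """features: (N, D) in topological order; affinities: (N,) in [0, 1]."""
    h = features.clone()
    for i in range(1, features.shape[0]):    # follow the directed edges
        w = affinities[i]
        h[i] = (1 - w) * features[i] + w * h[i - 1]
    return h

feats = torch.randn(6, 4)
aff = torch.sigmoid(torch.randn(6))          # e.g. predicted by a network
print(propagate(feats, aff))
```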
-
Patent number: 11880927
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Grant
Filed: May 19, 2023
Date of Patent: January 23, 2024
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
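The consistency constraints can be sketched as penalties that keep per-frame texture and base-shape predictions close to their video-level mean; this toy formulation is an assumption made for illustration, not the patented constraints:

```python
# Hedged sketch of a temporal-consistency penalty: frames of one video
# should share a single texture and a single base shape, so per-frame
# estimates are penalized for drifting apart.
import torch
import torch.nn.functional as F

def consistency_loss(textures, base_shapes):
    """textures: (T, C); base_shapes: (T, V, 3) per-frame predictions."""
    tex_mean = textures.mean(0, keepdim=True).expand_as(textures)
    shape_mean = base_shapes.mean(0, keepdim=True).expand_as(base_shapes)
    return F.mse_loss(textures, tex_mean) + F.mse_loss(base_shapes, shape_mean)

print(consistency_loss(torch.randn(5, 64), torch.randn(5, 100, 3)))
```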
-
Patent number: 11790633
Abstract: The disclosure provides a learning framework that unifies semantic segmentation and semantic edge detection. A learnable recurrent message-passing layer is disclosed in which semantic edges serve as explicitly learned gating signals that refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
Type: Grant
Filed: July 1, 2021
Date of Patent: October 17, 2023
Assignee: NVIDIA Corporation
Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
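The edge-gated smoothing can be sketched in one dimension: affinities are suppressed where the edge map fires, so propagation stops at boundaries. The 1D layout and gating form below are simplifications of the disclosed recurrent layer:

```python
# Hedged sketch of edge-gated smoothing (toy 1D version of the idea):
# features are averaged with neighbors using affinities that are zeroed
# wherever a semantic edge is detected, so smoothing does not blur
# across object boundaries.
import torch

def edge_gated_smooth(feat, affinity, edge, iters=3):
    """feat: (N, D); affinity, edge: (N-1,) for consecutive-pixel pairs."""
    gate = affinity * (1 - edge)             # edges close the message path
    for _ in range(iters):
        left = torch.cat([feat[:1], feat[:-1]])     # neighbor features
        right = torch.cat([feat[1:], feat[-1:]])
        g_l = torch.cat([torch.zeros(1), gate])     # per-pixel gates
        g_r = torch.cat([gate, torch.zeros(1)])
        norm = 1 + g_l + g_r
        feat = (feat + g_l[:, None] * left + g_r[:, None] * right) / norm[:, None]
    return feat

feat = torch.randn(8, 16)
aff = torch.rand(7)
edge = (torch.rand(7) > 0.7).float()         # 1 where a semantic edge fires
print(edge_gated_smooth(feat, aff, edge).shape)
```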
-
Publication number: 20230290038
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Application
Filed: May 19, 2023
Publication date: September 14, 2023
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
-
Patent number: 11748887
Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
Type: Grant
Filed: April 8, 2019
Date of Patent: September 5, 2023
Assignee: NVIDIA Corporation
Inventors: Varun Jampani, Wei-Chih Hung, Sifei Liu, Pavlo Molchanov, Jan Kautz
-
Publication number: 20230252692
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
Type: Application
Filed: September 1, 2022
Publication date: August 10, 2023
Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
-
Patent number: 11704857
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Grant
Filed: May 2, 2022
Date of Patent: July 18, 2023
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz