Patents by Inventor Sifei Liu
Sifei Liu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250111592
Abstract: Virtual reality and augmented reality bring increasing demand for 3D content creation. In an effort to automate the generation of 3D content, artificial intelligence-based processes have been developed. However, these processes are limited in the quality of their output: they typically rely either on a model trained on limited 3D data, which does not generalize well to unseen objects, or on a model trained on 2D data, which suffers from poor geometry because it ignores 3D information. The present disclosure jointly uses both 2D and 3D data to train a machine learning model to generate 3D content from a single 2D image.
Type: Application
Filed: September 20, 2024
Publication date: April 3, 2025
Inventors: Dejia Xu, Morteza Mardani, Jiaming Song, Sifei Liu, Ye Yuan, Arash Vahdat
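The joint use of 2D and 3D supervision can be illustrated with a minimal sketch. This is not the disclosed method; the toy point-cloud generator, the placeholder render function, and all tensor shapes are assumptions made for the example:

```python
# Hedged sketch of mixed 2D/3D supervision (not the patented method):
# one generator is trained with a 3D loss where ground-truth geometry
# exists and a 2D rendering loss everywhere else.
import torch
import torch.nn.functional as F

generator = torch.nn.Linear(64, 3000)   # toy stand-in for a 3D generator

def joint_loss(image_code, batch, render, w2d=1.0, w3d=1.0):
    points = generator(image_code).view(-1, 1000, 3)  # predicted 3D points
    loss = torch.zeros(())
    if "points3d" in batch:                # sample with 3D annotation
        loss = loss + w3d * F.mse_loss(points, batch["points3d"])
    loss = loss + w2d * F.l1_loss(render(points), batch["image"])
    return loss

# Toy usage: a placeholder "renderer" standing in for differentiable rendering.
render = lambda p: p[..., :2].mean(dim=1)
batch = {"image": torch.zeros(4, 2), "points3d": torch.randn(4, 1000, 3)}
print(joint_loss(torch.randn(4, 64), batch, render))
```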
-
Patent number: 12266144
Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify an orientation of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
Type: Grant
Filed: November 20, 2019
Date of Patent: April 1, 2025
Assignee: NVIDIA Corporation
Inventors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
-
Patent number: 12182940
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
Type: Grant
Filed: January 18, 2022
Date of Patent: December 31, 2024
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
-
Patent number: 12169882
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
Type: Grant
Filed: September 1, 2022
Date of Patent: December 17, 2024
Assignee: NVIDIA Corporation
Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
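The warping step can be sketched with PyTorch's grid_sample; the shapes and the random correspondence map are assumptions for illustration, not the disclosed system:

```python
# Minimal sketch: warp a per-image structure code into a shared canonical
# coordinate frame using a predicted dense correspondence map, so pixels
# that land at the same canonical coordinate correspond across images.
import torch
import torch.nn.functional as F

B, C, H, W = 2, 16, 32, 32
structure = torch.randn(B, C, H, W)        # structure latent per image
# Correspondence map: for each canonical pixel, where to sample in the
# image's own frame, in [-1, 1] normalized coordinates (random here).
corr = torch.rand(B, H, W, 2) * 2 - 1

canonical = F.grid_sample(structure, corr, align_corners=False)
# canonical[b, :, i, j] holds image b's structure feature at canonical
# location (i, j); labels painted in this shared space can be propagated
# back to any synthesized image through its own correspondence map.
print(canonical.shape)  # torch.Size([2, 16, 32, 32])
```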
-
Publication number: 20240404174
Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing the coarse 3D geometry of the head avatar, the detailed appearance of a source image, and the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression, and pose is generated.
Type: Application
Filed: May 2, 2024
Publication date: December 5, 2024
Inventors: Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz
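A hedged sketch of the tri-plane lookup that volumetric rendering builds on, using a single tri-plane with assumed shapes (the disclosure combines three such tri-planes for geometry, appearance, and expression):

```python
# Sketch of tri-plane feature lookup: a 3D query point is projected onto
# the XY, XZ, and YZ planes, features are sampled bilinearly from each
# plane, and the samples are summed before a radiance/density decoder.
import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """planes: (3, C, H, W) feature planes; pts: (N, 3) in [-1, 1]."""
    xy, xz, yz = pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]
    feats = 0
    for plane, uv in zip(planes, (xy, xz, yz)):
        grid = uv.view(1, -1, 1, 2)                       # (1, N, 1, 2)
        f = F.grid_sample(plane[None], grid, align_corners=False)
        feats = feats + f.view(plane.shape[0], -1).t()    # (N, C)
    return feats

planes = torch.randn(3, 32, 64, 64)     # e.g. one of the three tri-planes
pts = torch.rand(1024, 3) * 2 - 1       # query points along camera rays
print(sample_triplane(planes, pts).shape)  # torch.Size([1024, 32])
```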
-
Publication number: 20240338871
Abstract: One embodiment of a method includes applying a first generator model to a semantic representation of an image to generate an affine transformation, where the affine transformation represents a bounding box associated with at least one region within the image. The method further includes applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object. The method further includes inserting the object into the image based on the bounding box and the shape.
Type: Application
Filed: June 18, 2024
Publication date: October 10, 2024
Inventors: Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Jan Kautz
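The final insertion step can be sketched as a compositing operation; the tensors and the box format are hypothetical, and the two generator models are replaced here by ready-made inputs:

```python
# Minimal sketch (toy tensors, not the disclosed generators): given a
# bounding box derived from a predicted affine transform and a predicted
# shape mask, composite the object into the image inside that box.
import torch
import torch.nn.functional as F

def insert_object(image, obj, mask, box):
    """image: (C, H, W); obj: (C, h, w); mask: (1, h, w); box = (y, x, bh, bw)."""
    y, x, bh, bw = box
    obj = F.interpolate(obj[None], size=(bh, bw), mode="bilinear",
                        align_corners=False)[0]
    m = F.interpolate(mask[None], size=(bh, bw), mode="bilinear",
                      align_corners=False)[0]
    out = image.clone()
    region = out[:, y:y + bh, x:x + bw]
    out[:, y:y + bh, x:x + bw] = m * obj + (1 - m) * region  # alpha blend
    return out

img = torch.zeros(3, 128, 128)
print(insert_object(img, torch.rand(3, 20, 20),
                    torch.ones(1, 20, 20), (40, 50, 32, 32)).shape)
```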
-
Publication number: 20240169652
Abstract: In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, green, blue, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstruction model computes a first reconstruction loss based on the first 3D representation and the set of RGBD images. The scene reconstruction model modifies at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
Type: Application
Filed: October 30, 2023
Publication date: May 23, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
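A minimal sketch of the refinement loop, assuming a toy linear decoder and random RGBD samples in place of the disclosed pre-trained geometry and texture decoders:

```python
# Hedged sketch of test-time fitting: a 3D feature grid and a decoder are
# refined by minimizing a reconstruction loss against RGBD observations.
import torch
import torch.nn.functional as F

grid = torch.nn.Parameter(torch.randn(1, 8, 16, 16, 16))  # 3D feature grid
decoder = torch.nn.Linear(8, 4)        # toy geometry + texture decoder
opt = torch.optim.Adam([grid, *decoder.parameters()], lr=1e-2)

target = torch.rand(1024, 4)           # placeholder RGBD samples
pts = torch.rand(1024, 3) * 2 - 1      # 3D sample locations in [-1, 1]

for step in range(100):
    # Trilinearly sample grid features at the 3D points.
    g = F.grid_sample(grid, pts.view(1, -1, 1, 1, 3), align_corners=False)
    pred = decoder(g.view(8, -1).t())  # decode to an RGBD prediction
    loss = F.mse_loss(pred, target)    # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```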
-
Publication number: 20240161404
Abstract: In various embodiments, a training application trains a machine learning model to generate three-dimensional (3D) representations of two-dimensional images. The training application maps a depth image and a viewpoint to signed distance function (SDF) values associated with 3D query points. The training application maps a red, green, and blue (RGB) image to radiance values associated with the 3D query points. The training application computes a red, green, blue, and depth (RGBD) reconstruction loss based on at least the SDF values and the radiance values. The training application modifies at least one of a pre-trained geometry encoder, a pre-trained geometry decoder, an untrained texture encoder, or an untrained texture decoder based on the RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
Type: Application
Filed: October 30, 2023
Publication date: May 16, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
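The mapping from a depth image to SDF supervision can be sketched for a single camera ray; the truncation trick and free-space assumption here are common simplifications, not the disclosed training procedure:

```python
# Hedged sketch: points sampled along a camera ray receive signed-distance
# targets equal to the depth remaining to the observed surface, truncated
# to a narrow band (positive in front of the surface, negative behind).
import torch

def sdf_targets_along_ray(depth_at_pixel, sample_depths, trunc=0.05):
    sdf = depth_at_pixel - sample_depths   # remaining distance to surface
    return sdf.clamp(-trunc, trunc)        # truncate far free space

samples = torch.linspace(0.1, 2.0, 8)      # depths sampled along one ray
print(sdf_targets_along_ray(torch.tensor(1.2), samples))
```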
-
Publication number: 20240161383
Abstract: In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model maps a first red, green, blue, and depth (RGBD) image associated with both a first scene and a first viewpoint to a first surface representation of at least a first portion of the first scene. The scene reconstruction model maps a second RGBD image associated with both the first scene and a second viewpoint to a second surface representation of at least a second portion of the first scene. The scene reconstruction model aggregates at least the first surface representation and the second surface representation in a 3D space to generate a first fused surface representation of the first scene. The scene reconstruction model maps the first fused surface representation of the first scene to a 3D representation of the first scene.
Type: Application
Filed: October 30, 2023
Publication date: May 16, 2024
Inventors: Yang Fu, Sifei Liu, Jan Kautz, Xueting Li, Shalini De Mello, Amey Kulkarni, Milind Naphade
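A hedged sketch of aggregating per-view surfaces in one 3D space, assuming pinhole intrinsics K and camera-to-world poses; the fusion here is a plain concatenation of back-projected points, a simplification of the disclosed fused surface representation:

```python
# Sketch: each RGBD view is back-projected to a colored point cloud with
# its camera pose, and the per-view clouds are aggregated in world space.
import torch

def backproject(depth, rgb, K, cam2world):
    """depth: (H, W); rgb: (3, H, W); K: (3, 3); cam2world: (4, 4)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).float()  # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).t()          # camera-space directions
    pts_cam = rays * depth[..., None]             # scale by observed depth
    pts = pts_cam.reshape(-1, 3) @ cam2world[:3, :3].t() + cam2world[:3, 3]
    return pts, rgb.reshape(3, -1).t()            # (N, 3) points and colors

K = torch.tensor([[100., 0, 32], [0, 100., 32], [0, 0, 1]])
views = [(torch.rand(64, 64) + 0.5, torch.rand(3, 64, 64), torch.eye(4)),
         (torch.rand(64, 64) + 0.5, torch.rand(3, 64, 64), torch.eye(4))]
fused = torch.cat([backproject(d, c, K, T)[0] for d, c, T in views])
print(fused.shape)  # aggregated surface points from both viewpoints
```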
-
Publication number: 20240161468
Abstract: Techniques are disclosed herein for generating an image. The techniques include performing one or more first denoising operations based on a first machine learning model and an input image that includes a first object to generate a mask that indicates a spatial arrangement associated with a second object interacting with the first object, and performing one or more second denoising operations based on a second machine learning model, the input image, and the mask to generate an image of the second object interacting with the first object.
Type: Application
Filed: August 21, 2023
Publication date: May 16, 2024
Inventors: Xueting Li, Stanley Birchfield, Shalini De Mello, Sifei Liu, Jiaming Song, Yufei Ye
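The two-stage structure can be sketched conceptually; the stand-in convolutional denoisers and the crude update loop below are placeholders, not real diffusion models or samplers:

```python
# Conceptual sketch only: a first model denoises toward a layout mask
# conditioned on the input image; a second denoises toward the final
# image conditioned on both the input image and that mask.
import torch

mask_denoiser = torch.nn.Conv2d(1 + 3, 1, 3, padding=1)       # stage 1
image_denoiser = torch.nn.Conv2d(3 + 3 + 1, 3, 3, padding=1)  # stage 2

def denoise(model, x, cond, steps=10):
    for _ in range(steps):  # crude update loop, not a real diffusion sampler
        x = x - 0.1 * model(torch.cat([x, cond], dim=1))
    return x

image = torch.rand(1, 3, 64, 64)                 # input with first object
mask = denoise(mask_denoiser, torch.randn(1, 1, 64, 64), image)
out = denoise(image_denoiser, torch.randn(1, 3, 64, 64),
              torch.cat([image, mask], dim=1))
print(out.shape)  # image of the second object interacting with the first
```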
-
Publication number: 20240153093
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories seen only during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, and in doing so computes internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object.
Type: Application
Filed: May 1, 2023
Publication date: May 9, 2024
Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
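The open-vocabulary classification step can be sketched with cosine similarity; the random text and mask embeddings below are placeholders for a real text encoder and the diffusion model's internal features:

```python
# Minimal sketch: each mask's visual feature is matched to category-label
# embeddings by cosine similarity, so an unseen category only requires
# adding a new text label to the vocabulary.
import torch
import torch.nn.functional as F

labels = ["cat", "dog", "unicycle"]           # any vocabulary at test time
text_emb = F.normalize(torch.randn(len(labels), 512), dim=-1)  # placeholder
mask_feats = F.normalize(torch.randn(5, 512), dim=-1)  # per-mask features

scores = mask_feats @ text_emb.t()            # cosine similarity
pred = scores.argmax(dim=-1)
print([labels[i] for i in pred])              # category per predicted mask
```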
-
Patent number: 11960570
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and at the region or pixel level. The neural network is trained using contrasting image pairs containing different objects and corresponding image pairs containing different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update the parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs, where each image in a pair includes an object in a particular category.
Type: Grant
Filed: August 25, 2021
Date of Patent: April 16, 2024
Assignee: NVIDIA Corporation
Inventors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
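The image-level loss can be sketched in its standard InfoNCE form (the patent's exact losses and pair sampling are not reproduced here):

```python
# Hedged sketch of an image-level contrastive objective: corresponding
# pairs are pulled together, contrasting pairs pushed apart.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """anchors/positives: (B, D); row i of each is a corresponding pair,
    and all other rows serve as the contrasting (negative) pairs."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    target = torch.arange(a.shape[0])         # diagonal = positive pair
    return F.cross_entropy(logits, target)

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)))
```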
-
Publication number: 20240070987
Abstract: Transferring pose to three-dimensional characters is a common computer graphics task that typically involves transferring the pose of a reference avatar to a (stylized) three-dimensional character. Because three-dimensional characters are created by professional artists through imagination and exaggeration, they have distinct shapes and features unlike those of human or animal avatars, so matching the pose of a three-dimensional character to that of a reference avatar generally requires manually creating the shape information needed for pose transfer. The present disclosure provides for the automated transfer of a reference pose to a three-dimensional character, based specifically on a learned shape code for the three-dimensional character.
Type: Application
Filed: February 15, 2023
Publication date: February 29, 2024
Inventors: Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Jiashun Wang, Jan Kautz
-
Patent number: 11907846
Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
Type: Grant
Filed: September 10, 2020
Date of Patent: February 20, 2024
Assignee: NVIDIA Corporation
Inventors: Sifei Liu, Shalini De Mello, Varun Jampani, Jan Kautz, Xueting Li
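The propagation recurrence can be sketched in a simplified single-predecessor form; the patent builds DAGs over unstructured points, whereas this toy orders nodes in a chain:

```python
# Minimal 1D sketch of affinity-guided propagation: each node blends its
# own feature with the already-propagated feature of its predecessor,
# weighted by a pairwise affinity, so information flows along the edges.
import torch

def propagate(features, affinities):
    """features: (N, D) in topological order; affinities: (N,) in [0, 1]."""
    h = features.clone()
    for i in range(1, features.shape[0]):    # follow the directed edges
        w = affinities[i]
        h[i] = (1 - w) * features[i] + w * h[i - 1]
    return h

feats = torch.randn(6, 4)
aff = torch.sigmoid(torch.randn(6))          # e.g. predicted by a network
print(propagate(feats, aff))
```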
-
Patent number: 11880927
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Grant
Filed: May 19, 2023
Date of Patent: January 23, 2024
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
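The consistency constraints can be sketched as penalties that keep per-frame texture and base-shape predictions close to their video-level mean; this toy formulation is an assumption made for illustration, not the patented constraints:

```python
# Hedged sketch of a temporal-consistency penalty: frames of one video
# should share a single texture and a single base shape, so per-frame
# estimates are penalized for drifting apart.
import torch
import torch.nn.functional as F

def consistency_loss(textures, base_shapes):
    """textures: (T, C); base_shapes: (T, V, 3) per-frame predictions."""
    tex_mean = textures.mean(0, keepdim=True).expand_as(textures)
    shape_mean = base_shapes.mean(0, keepdim=True).expand_as(base_shapes)
    return F.mse_loss(textures, tex_mean) + F.mse_loss(base_shapes, shape_mean)

print(consistency_loss(torch.randn(5, 64), torch.randn(5, 100, 3)))
```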
-
Patent number: 11790633
Abstract: The disclosure provides a learning framework that unifies semantic segmentation and semantic edge detection. A learnable recurrent message-passing layer is disclosed in which semantic edges serve as explicitly learned gating signals that refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
Type: Grant
Filed: July 1, 2021
Date of Patent: October 17, 2023
Assignee: NVIDIA Corporation
Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
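The edge-gated smoothing can be sketched in one dimension: affinities are suppressed where the edge map fires, so propagation stops at boundaries. The 1D layout and gating form below are simplifications of the disclosed recurrent layer:

```python
# Hedged sketch of edge-gated smoothing (toy 1D version of the idea):
# features are averaged with neighbors using affinities that are zeroed
# wherever a semantic edge is detected, so smoothing does not blur
# across object boundaries.
import torch

def edge_gated_smooth(feat, affinity, edge, iters=3):
    """feat: (N, D); affinity, edge: (N-1,) for consecutive-pixel pairs."""
    gate = affinity * (1 - edge)             # edges close the message path
    for _ in range(iters):
        left = torch.cat([feat[:1], feat[:-1]])     # neighbor features
        right = torch.cat([feat[1:], feat[-1:]])
        g_l = torch.cat([torch.zeros(1), gate])     # per-pixel gates
        g_r = torch.cat([gate, torch.zeros(1)])
        norm = 1 + g_l + g_r
        feat = (feat + g_l[:, None] * left + g_r[:, None] * right) / norm[:, None]
    return feat

feat = torch.randn(8, 16)
aff = torch.rand(7)
edge = (torch.rand(7) > 0.7).float()         # 1 where a semantic edge fires
print(edge_gated_smooth(feat, aff, edge).shape)
```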
-
Publication number: 20230290038
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Application
Filed: May 19, 2023
Publication date: September 14, 2023
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
-
Patent number: 11748887
Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
Type: Grant
Filed: April 8, 2019
Date of Patent: September 5, 2023
Assignee: NVIDIA Corporation
Inventors: Varun Jampani, Wei-Chih Hung, Sifei Liu, Pavlo Molchanov, Jan Kautz
-
Publication number: 20230252692
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
Type: Application
Filed: September 1, 2022
Publication date: August 10, 2023
Inventors: Sifei Liu, Jiteng Mu, Shalini De Mello, Zhiding Yu, Jan Kautz
-
Patent number: 11704857
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as the generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include the motion of the character, as predicted from the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
Type: Grant
Filed: May 2, 2022
Date of Patent: July 18, 2023
Assignee: NVIDIA Corporation
Inventors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz