Patents by Inventor Sanja Fidler
Sanja Fidler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250020481
Abstract: Apparatuses, systems, and techniques are presented to make determinations about objects in an environment. In at least one embodiment, a neural network can be used to determine one or more positions of one or more objects within a three-dimensional (3D) environment and to generate a segmented map of the 3D environment based, at least in part, on one or more two-dimensional (2D) images of the one or more objects.
Type: Application
Filed: April 7, 2022
Publication date: January 16, 2025
Inventors: Enze Xie, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jose Manuel Alvarez Lopez
-
Patent number: 12192547
Abstract: In various examples, systems and methods are disclosed relating to aligning images into frames of a first video using at least one first temporal attention layer of a neural network model. The first video has a first spatial resolution. A second video having a second spatial resolution is generated by up-sampling the first video using at least one second temporal attention layer of an up-sampler neural network model, wherein the second spatial resolution is higher than the first spatial resolution.
Type: Grant
Filed: March 10, 2023
Date of Patent: January 7, 2025
Assignee: NVIDIA Corporation
Inventors: Karsten Julian Kreis, Robin Rombach, Andreas Blattmann, Seung Wook Kim, Huan Ling, Sanja Fidler, Tim Dockhorn
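A temporal attention layer mixes information across frames. The minimal pure-Python sketch below is an illustrative assumption, not the patented model. It applies scaled dot-product self-attention along the time axis to the feature vectors at one spatial location. Queries, keys, and values are the raw features; a real layer would add learned projections. The function name `temporal_attention` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames: List[Vector]) -> List[Vector]:
    """Hypothetical sketch: self-attention over T frames at one spatial location.

    frames[t] is the D-dimensional feature of frame t. Queries, keys, and values
    are the raw features (no learned projections), an assumption for brevity.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        weights = softmax(scores)
        # Each output is a convex combination of all frames' features.
        outputs.append(
            [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]
        )
    return outputs
```

Every output is a weighted average over all frames at the same location. This is how such a layer can make per-frame content consistent over time while leaving spatial processing to other layers.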
-
Patent number: 12141986
Abstract: Various types of image analysis benefit from a multi-stream architecture that allows the analysis to consider shape data. A shape stream can process image data in parallel with a primary stream, where data from layers of a network in the primary stream is provided as input to a network of the shape stream. The shape data can be fused with the primary analysis data to produce more accurate output, such as accurate boundary information when the shape data is used with semantic segmentation data produced by the primary stream. A gate structure can be used to connect the intermediate layers of the primary and shape streams, using higher-level activations to gate lower-level activations in the shape stream. Such a gate structure can help focus the shape stream on the relevant information and reduce any additional weight of the shape stream.
Type: Grant
Filed: June 12, 2023
Date of Patent: November 12, 2024
Assignee: Nvidia Corporation
Inventors: David Jesus Acuna Marrero, Towaki Takikawa, Varun Jampani, Sanja Fidler
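The gating idea can be shown on flattened activation maps. In this hypothetical sketch (the function name, `weight`, and `bias` are assumptions standing in for learned parameters), each lower-level shape-stream activation is scaled by a sigmoid of the matching higher-level primary-stream activation. A real gate would typically compute its attention map with a learned convolution.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for large positive or negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_shape_activations(
    low_level: List[float], high_level: List[float], weight: float = 1.0, bias: float = 0.0
) -> List[float]:
    """Hypothetical gate: scale each lower-level shape-stream activation by a
    sigmoid of the matching higher-level primary-stream activation.

    weight and bias stand in for learned gate parameters (assumption).
    """
    if len(low_level) != len(high_level):
        raise ValueError("activation maps must have the same number of elements")
    return [low * sigmoid(weight * high + bias) for low, high in zip(low_level, high_level)]
```

A gate value near 0 suppresses a shape-stream activation and a value near 1 passes it through. This is how higher-level context can steer the shape stream toward relevant boundaries.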
-
Publication number: 20240371096
Abstract: Approaches presented herein provide systems and methods for disentangling identity from expression in input models. One or more machine learning systems may be trained directly from three-dimensional (3D) points to develop unique latent codes for expressions associated with different identities. These codes may then be mapped to different identities to independently model an object, such as a face, to generate a new mesh including an expression for an independent identity. A pipeline may include a set of machine learning systems to determine model parameters and also adjust input expression codes using gradient backpropagation in order to train models for incorporation into a content development pipeline.
Type: Application
Filed: May 4, 2023
Publication date: November 7, 2024
Inventors: Sameh Khamis, Koki Nagano, Jan Kautz, Sanja Fidler
-
Publication number: 20240362897
Abstract: In various examples, systems and methods are disclosed relating to synthetic data generation using viewpoint augmentation for autonomous and semi-autonomous systems and applications. One or more circuits can identify a set of sequential images corresponding to a first viewpoint and generate a first transformed image corresponding to a second viewpoint using a first image of the set of sequential images as input to a machine-learning model. The one or more circuits can update the machine-learning model based at least on a loss determined according to the first transformed image and a second image of the set of sequential images.
Type: Application
Filed: April 12, 2024
Publication date: October 31, 2024
Applicant: NVIDIA Corporation
Inventors: Tzofi Klinghoffer, Jonah Philion, Zan Gojcic, Sanja Fidler, Or Litany, Wenzheng Chen, Jose Manuel Alvarez Lopez
-
Patent number: 12112445
Abstract: Generation of three-dimensional (3D) object models may be challenging for users without a sufficient skill set for content creation and may also be resource intensive. One or more style transfer networks may be used for part-aware style transformation of both geometric features and textural components of a source asset to a target asset. The source asset may be segmented into particular parts, and ellipsoid approximations may then be warped according to the correspondence of the particular parts to the target asset. Moreover, a texture associated with the target asset may be used to warp or adjust a source texture, where the new texture can be applied to the warped parts.
Type: Grant
Filed: September 7, 2021
Date of Patent: October 8, 2024
Assignee: Nvidia Corporation
Inventors: Kangxue Yin, Jun Gao, Masha Shugrina, Sameh Khamis, Sanja Fidler
-
Publication number: 20240312123
Abstract: In various examples, systems and methods are disclosed that relate to data augmentation for training/updating perception models in autonomous or semi-autonomous systems and applications. For example, a system may receive data associated with a set of frames that are captured using a plurality of cameras positioned in fixed relation relative to the machine; generate a panoramic view based at least on the set of frames; provide data associated with the panoramic view to a model to cause the model to generate a high dynamic range (HDR) panoramic view; determine lighting information associated with a light distribution map based at least on the HDR panoramic view; determine a virtual scene; and render an asset and a shadow on at least one of the frames, based at least on the virtual scene and the light distribution map, the shadow being a shadow corresponding to the asset.
Type: Application
Filed: February 29, 2024
Publication date: September 19, 2024
Applicant: NVIDIA Corporation
Inventors: Malik Aqeel Anwar, Tae Eun Choe, Zian Wang, Sanja Fidler, Minwoo Park
-
Publication number: 20240296623
Abstract: Approaches presented herein provide for the reconstruction of implicit multi-dimensional shapes. In one embodiment, oriented point cloud data representative of an object can be obtained using a physical scanning process. The point cloud data can be provided as input to a trained density model that can infer density functions for various points. The points can be mapped to a voxel hierarchy, allowing density functions to be determined for those voxels at the various levels that are associated with at least one point of the input point cloud. Contribution weights can be determined for the various density functions for the sparse voxel hierarchy, and the weighted density functions combined to obtain a density field. The density field can be evaluated to generate a geometric mesh where points having a zero, or near-zero, value are determined to contribute to the surface of the object.
Type: Application
Filed: February 15, 2023
Publication date: September 5, 2024
Inventors: Jiahui Huang, Francis Williams, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler
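Combining weighted per-voxel density functions into one field can be sketched as a normalized weighted sum. This is a simplified assumption, not the patented weighting. The tent-shaped weight, the `VoxelDensity` class, and the `eps` surface tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class VoxelDensity:
    """One voxel of a (hypothetical) sparse hierarchy with its own density function."""
    center: Point
    size: float
    density: Callable[[Point], float]

    def weight(self, p: Point) -> float:
        # Assumed tent weight: 1 at the voxel center, falling to 0 one voxel size away.
        w = 1.0
        for pc, cc in zip(p, self.center):
            w *= max(0.0, 1.0 - abs(pc - cc) / self.size)
        return w


def density_field(voxels: Sequence[VoxelDensity], p: Point) -> float:
    """Blend the per-voxel density functions into one field value at p."""
    total_weight = 0.0
    total = 0.0
    for voxel in voxels:
        w = voxel.weight(p)
        if w > 0.0:
            total_weight += w
            total += w * voxel.density(p)
    if total_weight == 0.0:
        raise ValueError("p is not covered by any voxel")
    return total / total_weight


def on_surface(voxels: Sequence[VoxelDensity], p: Point, eps: float = 1e-3) -> bool:
    # Points whose field value is zero or near zero are treated as surface points.
    return abs(density_field(voxels, p)) <= eps
```

For example, suppose a coarse voxel and a fine voxel both carry the density of a unit sphere, `lambda p: sum(c * c for c in p) ** 0.5 - 1.0`. Then `(1.0, 0.0, 0.0)` is classified as a surface point and the origin is not. A mesher such as marching cubes could then extract the zero level set.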
-
Publication number: 20240296627
Abstract: In various examples, a deep three-dimensional (3D) conditional generative model is implemented that can synthesize high-resolution 3D shapes using simple guides, such as coarse voxels or point clouds, by marrying implicit and explicit 3D representations into a hybrid 3D representation. The present approach may directly optimize for the reconstructed surface, allowing for the synthesis of finer geometric details with fewer artifacts. The systems and methods described herein may use a deformable tetrahedral grid that encodes a discretized signed distance function (SDF) and a differentiable marching tetrahedral layer that converts the implicit SDF representation to an explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology as well as generation of the hierarchy of subdivisions using reconstruction and adversarial losses defined explicitly on the surface mesh.
Type: Application
Filed: May 13, 2024
Publication date: September 5, 2024
Inventors: Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler
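The vertex-placement step of marching tetrahedra can be written down directly. Along an edge whose SDF values change sign, the surface vertex sits where the linearly interpolated SDF is zero. Solving $s_a + t(s_b - s_a) = 0$ gives $(v_a s_b - v_b s_a)/(s_b - s_a)$. The sketch below shows only this step for a single tetrahedron. Triangle assembly and the learned grid deformation are omitted, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, ...]


def edge_crossing(va: Vec3, vb: Vec3, sa: float, sb: float) -> Vec3:
    """Zero crossing of the linearly interpolated SDF along edge (va, vb).

    Solves sa + t * (sb - sa) = 0, giving (va * sb - vb * sa) / (sb - sa).
    """
    denom = sb - sa  # nonzero whenever the endpoint signs differ
    return tuple((a * sb - b * sa) / denom for a, b in zip(va, vb))


def tet_surface_vertices(verts: Sequence[Vec3], sdf: Sequence[float]) -> List[Vec3]:
    """Surface vertices for one tetrahedron: one per edge whose endpoints differ in sign."""
    points = []
    for i, j in combinations(range(4), 2):
        if (sdf[i] < 0) != (sdf[j] < 0):
            points.append(edge_crossing(verts[i], verts[j], sdf[i], sdf[j]))
    return points
```

The vertex formula is a smooth function of both the SDF values and the grid vertex positions. This is what allows a loss defined on the extracted mesh to send gradients back to the implicit representation.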
-
Publication number: 20240296205
Abstract: Approaches presented herein provide for unsupervised domain transfer learning. In particular, three neural networks can be trained together using at least labeled data from a first domain and unlabeled data from a second domain. Features of the data are extracted using a feature extraction network. A first classifier network uses these features to classify the data, while a second classifier network uses these features to determine the relevant domain. A combined loss function is used to optimize the networks, with the goal that the feature extraction network extracts features that the first classifier network can use to accurately classify the data while preventing the second classifier from determining the domain of the image. Such optimization enables object classification to be performed with high accuracy for either domain, even though there may have been little to no labeled training data for the second domain.
Type: Application
Filed: May 6, 2024
Publication date: September 5, 2024
Inventors: David Acuna Marrero, Guojun Zhang, Marc Law, Sanja Fidler
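A common way to express a combined loss like this is the domain-adversarial (DANN-style) formulation. The label classifier minimizes the classification loss and the domain classifier minimizes the domain loss. The feature extractor minimizes the classification loss minus a weighted domain loss, and in practice this is often implemented with a gradient reversal layer. The sketch below computes these per-sample objectives. It assumes DANN-style losses, and the trade-off weight `lam` is hypothetical; neither is taken from the application.

```python
import math
from typing import Dict, Optional, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    # Clamp to avoid log(0).
    return -math.log(max(probs[label], 1e-12))


def combined_losses(
    class_probs: Sequence[float],
    class_label: Optional[int],
    domain_probs: Sequence[float],
    domain_label: int,
    lam: float = 0.1,
) -> Dict[str, float]:
    """DANN-style objectives for one sample (an assumed formulation).

    class_label is None for unlabeled second-domain samples, so they contribute
    only to the domain terms.
    """
    cls_loss = cross_entropy(class_probs, class_label) if class_label is not None else 0.0
    dom_loss = cross_entropy(domain_probs, domain_label)
    return {
        # Label classifier: classify labeled first-domain data well.
        "classifier": cls_loss,
        # Domain classifier: tell the two domains apart.
        "domain_classifier": dom_loss,
        # Feature extractor: help the label classifier while maximizing the
        # domain classifier's loss (lam trades the two off).
        "feature_extractor": cls_loss - lam * dom_loss,
    }
```

When the domain classifier is reduced to chance (probability 0.5 for each domain), the feature extractor has made the domains indistinguishable. This is the condition under which a classifier trained on the labeled domain can be expected to transfer.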
-
Publication number: 20240290054
Abstract: Generation of three-dimensional (3D) object models may be challenging for users without a sufficient skill set for content creation and may also be resource intensive. One or more style transfer networks may be combined with a generative network to generate objects based on parameters associated with a textual input. An input including a 3D mesh and texture may be provided to a trained system along with a textual input that includes parameters for object generation. Features of the input object may be identified and then tuned in accordance with the textual input to generate a modified 3D object that includes a new texture along with one or more geometric adjustments.
Type: Application
Filed: February 27, 2023
Publication date: August 29, 2024
Inventors: Kangxue Yin, Huan Ling, Masha Shugrina, Sameh Khamis, Sanja Fidler
-
Publication number: 20240256831
Abstract: In various examples, systems and methods are disclosed relating to generating a response from image and/or video input for image/video-based artificial intelligence (AI) systems and applications. Systems and methods are disclosed for a first model (e.g., a teacher model) distilling its knowledge to a second model (e.g., a student model). The second model receives a downstream image in a downstream task and generates at least one feature. The first model generates first features corresponding to an image, which can be a real image or a synthetic image. The second model generates second features using the image as an input to the second model. A loss with respect to the first features is determined, and the second model is updated using the loss.
Type: Application
Filed: January 26, 2023
Publication date: August 1, 2024
Applicant: NVIDIA Corporation
Inventors: Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Julian Kreis, Antonio Torralba Barriuso, Sanja Fidler, Amlan Kar
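Feature distillation can be illustrated with a deliberately tiny student. This hypothetical sketch assumes the loss is a mean squared error between student and teacher features; the abstract only says "a loss with respect to the first features". The student is a single scalar weight `w` whose features are `w * x`, and the function names and learning rate are illustrative.

```python
from typing import List, Tuple


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_step(
    w: float, image: List[float], teacher_feats: List[float], lr: float = 0.05
) -> Tuple[float, float]:
    """One gradient step for a toy student whose features are w * image.

    Returns the updated weight and the feature-matching loss before the update.
    """
    student_feats = [w * x for x in image]
    loss = mse(student_feats, teacher_feats)
    # d/dw of mean((w*x - t)^2) = mean(2 * (w*x - t) * x)
    grad = sum(2 * (s - t) * x for s, t, x in zip(student_feats, teacher_feats, image)) / len(image)
    return w - lr * grad, loss
```

Suppose a hypothetical teacher maps `[1.0, 2.0, 3.0]` to `[2.0, 4.0, 6.0]`. Repeated calls to `distill_step` then drive `w` toward 2. In a real system, the image could come from either real or generated data.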
-
Publication number: 20240212261
Abstract: Systems and methods are described for rendering complex surfaces or geometry. In at least one embodiment, neural signed distance functions (SDFs) can be used that efficiently capture multiple levels of detail (LODs), and that can be used to reconstruct multi-dimensional geometry or surfaces with high image quality. An example architecture can represent complex shapes in a compressed format with high visual fidelity, and can generalize across different geometries from a single learned example. Extremely small multi-layer perceptrons (MLPs) can be used with an octree-based feature representation for the learned neural SDFs.
Type: Application
Filed: January 12, 2024
Publication date: June 27, 2024
Inventors: Towaki Alan Takikawa, Joey Litalien, Kangxue Yin, Karsten Julian Kreis, Charles Loop, Morgan McGuire, Sanja Fidler
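An octree-based feature representation with levels of detail can be sketched as one sparse grid of corner features per level. A query point's feature is trilinearly interpolated at each level and summed up to the requested LOD. Several details are assumptions for illustration: level k has `2**k` voxels per axis, missing corners count as zero, and features are summed across levels. In the full method, a tiny MLP (not shown) would decode the summed feature into a signed distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Corner = Tuple[int, int, int]
FeatureGrid = Dict[Corner, List[float]]  # sparse: only allocated corners are stored


def trilinear(grid: FeatureGrid, resolution: int, p: Sequence[float], dim: int) -> List[float]:
    """Trilinearly interpolate corner features at point p in [0, 1]^3."""
    coords = [c * resolution for c in p]
    # Lower corner of the containing voxel; clamp so p == 1.0 stays in the last voxel.
    base = [min(int(math.floor(c)), resolution - 1) for c in coords]
    frac = [c - b for c, b in zip(coords, base)]
    out = [0.0] * dim
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1 - frac[0])
                     * (frac[1] if dy else 1 - frac[1])
                     * (frac[2] if dz else 1 - frac[2]))
                feat = grid.get((base[0] + dx, base[1] + dy, base[2] + dz))
                if feat is None or w == 0.0:
                    continue  # unallocated corners are treated as zero (assumption)
                for i in range(dim):
                    out[i] += w * feat[i]
    return out


def lod_feature(levels: List[FeatureGrid], p: Sequence[float], lod: int, dim: int) -> List[float]:
    """Sum interpolated features from level 0 up to the requested level of detail.

    Level k is assumed to be a grid with 2**k voxels per axis.
    """
    total = [0.0] * dim
    for k in range(lod + 1):
        feat = trilinear(levels[k], 2 ** k, p, dim)
        total = [a + b for a, b in zip(total, feat)]
    return total
```

Stopping the sum at a lower `lod` gives a coarser but cheaper feature. This is one way a single learned representation can serve several levels of detail.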
-
Publication number: 20240185523
Abstract: In various examples, a technique for performing three-dimensional (3D) scene completion includes determining an initial representation of a first 3D scene. The technique also includes executing a machine learning model to generate a first update to the initial representation at a previous time step and a second update to the initial representation at a current time step, wherein the second update is generated based at least on a threshold applied to a set of predictions corresponding to the first update. The technique also includes generating a 3D model of the 3D scene based at least on the second update to the initial representation.
Type: Application
Filed: June 22, 2023
Publication date: June 6, 2024
Inventors: Dongsu Zhang, Amlan Kar, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler
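The thresholded update can be illustrated on a sparse voxel set. At each time step a model predicts occupancy probabilities from the current state, and only cells at or above a threshold survive into the next state. In the sketch below, `toy_predictor` is a hypothetical stand-in for the learned model: it keeps occupied cells at 0.9 and proposes face neighbors at 0.6. The function names and the 0.5 default threshold are assumptions.

```python
from typing import Callable, Dict, Set, Tuple

Cell = Tuple[int, int, int]
Predictor = Callable[[Set[Cell]], Dict[Cell, float]]

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def toy_predictor(state: Set[Cell]) -> Dict[Cell, float]:
    """Hypothetical stand-in for the learned model: keep occupied cells with
    probability 0.9 and propose face neighbors with probability 0.6."""
    probs: Dict[Cell, float] = {}
    for x, y, z in state:
        probs[(x, y, z)] = 0.9
        for dx, dy, dz in NEIGHBORS:
            n = (x + dx, y + dy, z + dz)
            probs[n] = max(probs.get(n, 0.0), 0.6)
    return probs


def completion_step(state: Set[Cell], predict: Predictor, threshold: float = 0.5) -> Set[Cell]:
    # The next representation keeps only cells whose predicted occupancy clears the threshold.
    return {cell for cell, p in predict(state).items() if p >= threshold}
```

Starting from a single occupied cell, one step with the default threshold grows it to 7 cells. With a threshold of 0.7, only the original cell survives, so the threshold controls how aggressively the scene is completed.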
-
Publication number: 20240185506
Abstract: In various examples, information may be received for a 3D model, such as 3D geometry information, lighting information, and material information. A machine learning model may be trained to disentangle the 3D geometry information, the lighting information, and/or the material information from input data to provide the information, which may be used to project geometry of the 3D model onto an image plane to generate a mapping between pixels and portions of the 3D model. Rasterization may then use the mapping to determine which pixels are covered by the geometry, and in what manner. The mapping may also be used to compute radiance for points corresponding to the one or more 3D models using light transport simulation. Disclosed approaches may be used in various applications, such as image editing, 3D model editing, synthetic data generation, and/or data set augmentation.
Type: Application
Filed: February 14, 2024
Publication date: June 6, 2024
Inventors: Wenzheng Chen, Joey Litalien, Jun Gao, Zian Wang, Clement Tse Tsian Christophe Louis Fuji Tsang, Sameh Khamis, Or Litany, Sanja Fidler
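A pixel-to-geometry mapping can be illustrated with a pinhole projection and a z-buffer. This simplified, point-based sketch is an assumption; a real rasterizer would work with triangles and barycentric coordinates. Each camera-space point is projected with intrinsics `fx`, `fy`, `cx`, `cy`, and each covered pixel records the index of the nearest point.

```python
import math
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int]


def project_to_pixels(
    points: Sequence[Tuple[float, float, float]],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Dict[Pixel, int]:
    """Map each covered pixel to the index of the nearest point that projects into it.

    Assumes camera-space points (z > 0 in front of the camera) and a pinhole model.
    """
    depth: Dict[Pixel, float] = {}
    mapping: Dict[Pixel, int] = {}
    for idx, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # behind the camera
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if not (0 <= u < width and 0 <= v < height):
            continue  # outside the image
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z  # z-buffer test: keep the closest point
            mapping[(u, v)] = idx
    return mapping
```

The resulting dictionary plays the role of the pixel-to-model mapping. Shading or light transport could then be evaluated only for the geometry that each pixel actually sees.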
-
Publication number: 20240171788
Abstract: In various examples, systems and methods are disclosed relating to aligning images into frames of a first video using at least one first temporal attention layer of a neural network model. The first video has a first spatial resolution. A second video having a second spatial resolution is generated by up-sampling the first video using at least one second temporal attention layer of an up-sampler neural network model, wherein the second spatial resolution is higher than the first spatial resolution.
Type: Application
Filed: March 10, 2023
Publication date: May 23, 2024
Applicant: NVIDIA Corporation
Inventors: Karsten Julian Kreis, Robin Rombach, Andreas Blattmann, Seung Wook Kim, Huan Ling, Sanja Fidler, Tim Dockhorn
-
Patent number: 11989262
Abstract: Approaches presented herein provide for unsupervised domain transfer learning. In particular, three neural networks can be trained together using at least labeled data from a first domain and unlabeled data from a second domain. Features of the data are extracted using a feature extraction network. A first classifier network uses these features to classify the data, while a second classifier network uses these features to determine the relevant domain. A combined loss function is used to optimize the networks, with the goal that the feature extraction network extracts features that the first classifier network can use to accurately classify the data while preventing the second classifier from determining the domain of the image. Such optimization enables object classification to be performed with high accuracy for either domain, even though there may have been little to no labeled training data for the second domain.
Type: Grant
Filed: April 9, 2021
Date of Patent: May 21, 2024
Assignee: Nvidia Corporation
Inventors: David Acuna Marrero, Guojun Zhang, Marc Law, Sanja Fidler
-
Publication number: 20240161403
Abstract: Text-to-image generation generally refers to the process of generating an image from one or more text prompts input by a user. While artificial intelligence has been a valuable tool for text-to-image generation, current artificial intelligence-based solutions are more limited when it comes to text-to-3D content creation. For example, these solutions are oftentimes category-dependent, or synthesize 3D content at a low resolution. The present disclosure provides a process and architecture for high-resolution text-to-3D content creation.
Type: Application
Filed: August 9, 2023
Publication date: May 16, 2024
Inventors: Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, Karsten Kreis, Luming Tang, Xiaohui Zeng, Jun Gao, Xun Huang, Towaki Takikawa
-
Publication number: 20240160888
Abstract: In various examples, systems and methods are disclosed relating to neural networks for realistic and controllable agent simulation using guided trajectories. The neural networks can be configured using training data including trajectories and other state data associated with subjects or agents and remote or neighboring subjects or agents, as well as context data representative of an environment in which the subjects are present. The trajectories can be determined using the neural networks and various forms of guidance for controllability, such as waypoint navigation, obstacle avoidance, and group movement.
Type: Application
Filed: March 31, 2023
Publication date: May 16, 2024
Applicant: NVIDIA Corporation
Inventors: Davis Winston Rempe, Karsten Julian Kreis, Sanja Fidler, Or Litany, Jonah Philion
-
Publication number: 20240161377
Abstract: In various examples, systems and methods are disclosed relating to generating a simulated environment and updating a machine learning model to move each of a plurality of human characters having a plurality of body shapes to follow a corresponding trajectory within the simulated environment, conditioned on a respective body shape. The simulated human characters can have diverse characteristics (such as gender, body proportions, body shape, and so on) as observed in real-life crowds. A machine learning model can determine an action for a human character in a simulated environment based at least on a humanoid state, a body shape, and task-related features. The task-related features can include an environmental feature and a trajectory.
Type: Application
Filed: March 31, 2023
Publication date: May 16, 2024
Applicant: NVIDIA Corporation
Inventors: Zhengyi Luo, Jason Peng, Sanja Fidler, Or Litany, Davis Winston Rempe, Ye Yuan