Patents by Inventor Vincent Sitzmann

Vincent Sitzmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CROSS-ATTENTION DECODING FOR VOLUMETRIC RENDERING

Publication number: 20240161389

Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the scene. The method includes decoding, with the decoder, the latent space using cross-attention with the volumetric embedding, and generating a novel viewing frame of the scene based on an output of the decoder.

Type: Application

Filed: August 3, 2023

Publication date: May 16, 2024

Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha

Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
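The abstract above describes decoding a latent space using cross-attention with an embedding of the novel view. The following is a minimal sketch of the generic cross-attention operation only, not the method claimed in the application. It assumes the query embeddings and latent vectors share one dimension and uses the latents directly as keys and values, with no learned projections. The names `cross_attention`, `latents`, and `queries` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, latents):
    """Scaled dot-product cross-attention.

    Each query (standing in for a view embedding) attends over the latent
    vectors. Assumption: keys and values are the latents themselves.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Similarity of this query to every latent vector.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in latents]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the latent vectors.
        out = [sum(w * v[j] for w, v in zip(weights, latents)) for j in range(dim)]
        outputs.append(out)
    return outputs


# Toy data: two latent vectors and one query aligned with the first latent.
latents = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 0.0]]
decoded = cross_attention(queries, latents)
```

In this toy case the query aligns with the first latent vector, so nearly all of the attention weight falls on it. A real decoder would add learned query, key, and value projections and further layers that map the attended features to pixel values.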

RADIANT AND VOLUMETRIC LATENT SPACE ENCODING FOR VOLUMETRIC RENDERING

Publication number: 20240161471

Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating, through training, a shared latent space based on (i) image data that include multiple images, where each image has a different viewing frame of a scene, and (ii) first and second types of embeddings, and training a decoder based on the first type of embeddings. The method also includes generating an embedding based on the first type of embeddings that is representative of a novel viewing frame of the scene, decoding, with the decoder, the shared latent space using cross-attention with the generated embedding, and generating the novel viewing frame of the scene based on an output of the decoder.

Type: Application

Filed: August 3, 2023

Publication date: May 16, 2024

Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha

Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon

SHARED LATENT SPACES FOR VOLUMETRIC RENDERING

Publication number: 20240161510

Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes training a shared latent space and a first decoder based on first image data that includes multiple images, and training the shared latent space and a second decoder based on second image data that includes multiple images. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the first scene. Further, the method includes decoding, with the first decoder, the shared latent space with the volumetric embedding, and generating the novel viewing frame of the first scene based on the output of the first decoder.

Type: Application

Filed: August 3, 2023

Publication date: May 16, 2024

Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha

Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon

SELF-SUPERVISED DEPTH FOR VOLUMETRIC RENDERING REGULARIZATION

Publication number: 20240153197

Abstract: An example method includes generating embeddings of image data that includes multiple images, where each image has a different viewpoint of a scene; generating a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determining a volumetric rendering view synthesis loss and a multi-view photometric loss; and applying an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.

Type: Application

Filed: August 3, 2023

Publication date: May 9, 2024

Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha

Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
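The abstract above describes optimizing over a number of epochs until two losses each fall within their own threshold. The following is a minimal sketch of that stopping rule only, under stated assumptions: a single scalar parameter stands in for the latent space and decoder, two toy quadratic functions stand in for the view synthesis and photometric losses, and plain gradient descent on their sum stands in for the optimization algorithm. All names are hypothetical and none come from the application.

```python
def train_until_thresholds(param, view_loss, photo_loss, grad, lr,
                           view_threshold, photo_threshold, max_epochs):
    """Run gradient descent until both losses are within their thresholds.

    Returns the final parameter and the number of epochs that were run.
    """
    for epoch in range(max_epochs):
        # Stop only when BOTH losses satisfy their own threshold.
        if (view_loss(param) <= view_threshold
                and photo_loss(param) <= photo_threshold):
            return param, epoch
        param -= lr * grad(param)
    return param, max_epochs


# Toy stand-ins for the two losses (assumptions, not the patented losses).
def toy_view_loss(p):
    return (p - 2.0) ** 2


def toy_photo_loss(p):
    return 0.5 * (p - 2.0) ** 2


def toy_grad(p):
    # Gradient of the sum of the two toy losses with respect to p.
    return 3.0 * (p - 2.0)


final_param, epochs_run = train_until_thresholds(
    param=0.0,
    view_loss=toy_view_loss,
    photo_loss=toy_photo_loss,
    grad=toy_grad,
    lr=0.1,
    view_threshold=1e-4,
    photo_threshold=1e-4,
    max_epochs=1000,
)
```

Checking both thresholds together, rather than a single combined loss, keeps one term from masking slow progress on the other. The `max_epochs` cap is an added safeguard for the case where the thresholds are never reached.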

Systems and methods for reconstructing a scene in three dimensions from a two-dimensional image

Patent number: 11887248

Abstract: Systems and methods described herein relate to reconstructing a scene in three dimensions from a two-dimensional image. One embodiment processes an image using a detection transformer to detect an object in the scene and to generate a NOCS map of the object and a background depth map; uses MLPs to relate the object to a differentiable database of object priors (PriorDB); recovers, from the NOCS map, a partial 3D object shape; estimates an initial object pose; fits a PriorDB object prior to align in geometry and appearance with the partial 3D shape to produce a complete shape and refines the initial pose estimate; generates an editable and re-renderable 3D scene reconstruction based, at least in part, on the complete shape, the refined pose estimate, and the depth map; and controls the operation of a robot based, at least in part, on the editable and re-renderable 3D scene reconstruction.

Type: Grant

Filed: March 16, 2022

Date of Patent: January 30, 2024

Assignees: Toyota Research Institute, Inc., Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University

Inventors: Sergey Zakharov, Wadim Kehl, Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Dennis Park, Joshua Tenenbaum, Jiajun Wu, Fredo Durand, Vincent Sitzmann

SYSTEM AND METHOD OF CONDITIONAL NEURAL FLOORPLANS FOR STATIC-DYNAMIC DISENTANGLEMENT

Publication number: 20240005627

Abstract: A method of conditional neural ground planes for static-dynamic disentanglement is described. The method includes extracting, using a convolutional neural network (CNN), CNN image features from an image to form a feature tensor. The method also includes resampling unprojected 2D features of the feature tensor to form feature pillars. The method further includes aggregating the feature pillars to form an entangled neural ground plane. The method also includes decomposing the entangled neural ground plane into a static neural ground plane and a dynamic neural ground plane.

Type: Application

Filed: April 18, 2023

Publication date: January 4, 2024

Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA, MASSACHUSETTS INSTITUTE OF TECHNOLOGY

Inventors: Prafull SHARMA, Ayush TEWARI, Yilun DU, Sergey ZAKHAROV, Rares Andrei AMBRUS, Adrien David GAIDON, William Tafel FREEMAN, Frederic Pierre DURAND, Joshua B. TENENBAUM, Vincent SITZMANN

SYSTEMS AND METHODS FOR RECONSTRUCTING A SCENE IN THREE DIMENSIONS FROM A TWO-DIMENSIONAL IMAGE

Publication number: 20220414974

Abstract: Systems and methods described herein relate to reconstructing a scene in three dimensions from a two-dimensional image. One embodiment processes an image using a detection transformer to detect an object in the scene and to generate a NOCS map of the object and a background depth map; uses MLPs to relate the object to a differentiable database of object priors (PriorDB); recovers, from the NOCS map, a partial 3D object shape; estimates an initial object pose; fits a PriorDB object prior to align in geometry and appearance with the partial 3D shape to produce a complete shape and refines the initial pose estimate; generates an editable and re-renderable 3D scene reconstruction based, at least in part, on the complete shape, the refined pose estimate, and the depth map; and controls the operation of a robot based, at least in part, on the editable and re-renderable 3D scene reconstruction.

Type: Application

Filed: March 16, 2022

Publication date: December 29, 2022

Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University

Inventors: Sergey Zakharov, Wadim Kehl, Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Dennis Park, Joshua Tenenbaum, Jiajun Wu, Fredo Durand, Vincent Sitzmann
