Patents by Inventor Igor Vasiljevic
Igor Vasiljevic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12293548
Abstract: Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera. The method also includes computing a probability distribution of the conditioned latent representations by factoring scale priors. The method also includes sampling the probability distribution to generate variations for the data embeddings. The method also includes estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
Type: Grant
Filed: October 13, 2023
Date of Patent: May 6, 2025
Assignees: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Campagnolo Guizilini, Igor Vasiljevic, Dian Chen, Adrien David Gaidon, Rares A. Ambrus
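As a rough illustration of the sampling step this abstract describes, the hedged PyTorch sketch below conditions image tokens on calibration tokens with cross-attention, draws a reparameterized sample from the conditioned distribution, and decodes per-token scaled depth. The module layout, names, and dimensions are assumptions for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn

class VariationalDepthHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Cross-attention conditions image features on camera calibration.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.to_mu = nn.Linear(feat_dim, feat_dim)
        self.to_logvar = nn.Linear(feat_dim, feat_dim)
        self.to_depth = nn.Linear(feat_dim, 1)

    def forward(self, image_tokens, calib_tokens):
        # Conditioned latent representations from image and calibration embeddings.
        cond, _ = self.attn(image_tokens, calib_tokens, calib_tokens)
        mu, logvar = self.to_mu(cond), self.to_logvar(cond)
        # Reparameterized sample from the conditioned distribution.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Positive per-token depth via softplus.
        return nn.functional.softplus(self.to_depth(z))

head = VariationalDepthHead()
depth = head(torch.randn(1, 100, 256), torch.randn(1, 4, 256))
print(depth.shape)  # torch.Size([1, 100, 1])
```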
-
Publication number: 20250095380
Abstract: Systems and methods described herein relate to self-supervised scale-aware learning of camera extrinsic parameters. One embodiment processes the instantaneous velocity between a target image and a context image captured by a first camera; jointly trains a depth network and a pose network based on scaling by the instantaneous velocity; produces a depth map using the depth network; produces the ego-motion of the first camera using the pose network; generates a synthesized image from the target image using a reprojection operation based on the depth map, the ego-motion, the context image, and the camera intrinsics; determines a photometric loss by comparing the synthesized image to the target image; generates a photometric consistency constraint using a gradient from the photometric loss; determines a pose consistency constraint between the first camera and a second camera; and optimizes the photometric consistency constraint, the pose consistency constraint, the depth network, and the pose network to generate estimated extrinsic parameters.
Type: Application
Filed: September 18, 2023
Publication date: March 20, 2025
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Takayuki Kanai, Vitor Campagnolo Guizilini, Rares A. Ambrus, Adrien Gaidon, Igor Vasiljevic
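A minimal sketch of the scale-recovery idea in this abstract: the predicted pose translation is rescaled so its magnitude matches the metric displacement implied by the instantaneous velocity, and a simple L1 term compares the synthesized image to the target. Function names are hypothetical, and the full method's SSIM term and reprojection machinery are omitted.

```python
import torch

def scale_translation(pred_translation, speed, dt):
    # Rescale the predicted translation so its magnitude matches the metric
    # displacement implied by the instantaneous velocity (speed * dt).
    norm = pred_translation.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    return pred_translation * (speed * dt) / norm

def l1_photometric_loss(synthesized, target):
    # L1 component of the photometric loss between synthesized and target images.
    return (synthesized - target).abs().mean()

t = scale_translation(torch.tensor([[0.3, 0.0, 0.9]]), speed=10.0, dt=0.1)
print(t.norm())  # ~1.0 meter of travel at 10 m/s over 0.1 s
```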
-
Patent number: 12175708
Abstract: Systems and methods described herein relate to self-supervised learning of camera intrinsic parameters from a sequence of images. One embodiment produces a depth map from a current image frame captured by a camera; generates a point cloud from the depth map using a differentiable unprojection operation; produces a camera pose estimate from the current image frame and a context image frame; produces a warped point cloud based on the camera pose estimate; generates a warped image frame from the warped point cloud using a differentiable projection operation; compares the warped image frame with the context image frame to produce a self-supervised photometric loss; updates a set of estimated camera intrinsic parameters on a per-image-sequence basis using one or more gradients from the self-supervised photometric loss; and generates, based on a converged set of learned camera intrinsic parameters, a rectified image frame from an image frame captured by the camera.
Type: Grant
Filed: March 11, 2022
Date of Patent: December 24, 2024
Assignees: Toyota Research Institute, Inc., Toyota Technological Institute at Chicago
Inventors: Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Igor Vasiljevic, Jiading Fang, Gregory Shakhnarovich, Matthew R. Walter
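The differentiable unprojection and projection operations mentioned here follow the standard pinhole relations x = (u - cx) d / fx and y = (v - cy) d / fy. The sketch below uses a plain pinhole model (fx, fy, cx, cy) as a stand-in for the learned intrinsics; it illustrates the geometry, not the patented training loop.

```python
import torch

def unproject(depth, fx, fy, cx, cy):
    # Lift each pixel (u, v) with depth d to a 3D point (X, Y, Z).
    h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                          torch.arange(w, dtype=depth.dtype), indexing="ij")
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return torch.stack((x, y, depth), dim=-1)  # (H, W, 3) point cloud

def project(points, fx, fy, cx, cy):
    # Map 3D points back to pixel coordinates (the differentiable projection).
    z = points[..., 2].clamp(min=1e-6)
    u = points[..., 0] / z * fx + cx
    v = points[..., 1] / z * fy + cy
    return torch.stack((u, v), dim=-1)

cloud = unproject(torch.ones(480, 640), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
pixels = project(cloud, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```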
-
Publication number: 20240354991
Abstract: Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera. The method also includes computing a probability distribution of the conditioned latent representations by factoring scale priors. The method also includes sampling the probability distribution to generate variations for the data embeddings. The method also includes estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
Type: Application
Filed: October 13, 2023
Publication date: October 24, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Campagnolo Guizilini, Igor Vasiljevic, Dian Chen, Adrien David Gaidon, Rares A. Ambrus
-
Publication number: 20240354973
Abstract: Systems, methods, and other embodiments described herein relate to augmenting image embeddings using derived geometries for estimating scaled depth. In one embodiment, a method includes generating a geometric viewing vector using pixel coordinates and intrinsic parameters of a camera for an image captured of a scene. The method also includes deriving geometric embeddings from the geometric viewing vector associated with the image for the camera. The method also includes computing a representation by augmenting image embeddings with the geometric embeddings, the image embeddings associated with visual characteristics of the image. The method also includes estimating a scaled depth of the image from the representation.
Type: Application
Filed: September 12, 2023
Publication date: October 24, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Campagnolo Guizilini, Igor Vasiljevic, Dian Chen, Adrien David Gaidon, Rares A. Ambrus
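A geometric viewing vector of the kind described is conventionally the per-pixel ray direction obtained by mapping homogeneous pixel coordinates through the inverse intrinsics. A sketch under that assumption (the downstream embedding MLP is left out and is hypothetical):

```python
import torch

def viewing_vectors(h, w, K):
    # Homogeneous pixel grid mapped through K^{-1}, normalized to unit rays.
    v, u = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                          torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack((u, v, torch.ones_like(u)), dim=-1)  # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T
    return rays / rays.norm(dim=-1, keepdim=True)

K = torch.tensor([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
rays = viewing_vectors(480, 640, K)  # pass through an MLP to get embeddings
```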
-
Publication number: 20240354974
Abstract: Systems, methods, and other embodiments described herein relate to augmenting an image frame during training in a way that enhances scene geometries and transformation capabilities for depth prediction. In one embodiment, a method includes generating rays with camera intrinsics to form a grid for an image frame. The method also includes injecting noise, by an encoder during training of a learning model, to individually perturb pixels within pixel boundaries for the rays, the pixel boundaries defined by the grid. The method also includes removing a subset of the rays randomly by the encoder and extracting features from the rays. The method also includes comparing scaled depth estimates to a ground truth for a grid resolution using the features and adjusting the learning model from the comparison.
Type: Application
Filed: November 17, 2023
Publication date: October 24, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Campagnolo Guizilini, Igor Vasiljevic, Dian Chen, Adrien David Gaidon, Rares Andrei Ambrus
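The two augmentations named in this abstract, per-pixel jitter within cell boundaries and random ray removal, can be sketched in a few lines. The function below is a hedged illustration with an assumed keep probability, not the patented encoder.

```python
import torch

def jitter_and_drop_rays(u, v, keep_prob=0.75):
    # Perturb each (float) pixel coordinate inside its own cell, [-0.5, 0.5).
    u_j = u + torch.rand_like(u) - 0.5
    v_j = v + torch.rand_like(v) - 0.5
    # Randomly remove a subset of rays so training sees sparse, jittered grids.
    keep = torch.rand(u.shape) < keep_prob
    return u_j[keep], v_j[keep]

v, u = torch.meshgrid(torch.arange(480.0), torch.arange(640.0), indexing="ij")
u_aug, v_aug = jitter_and_drop_rays(u, v)
```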
-
Publication number: 20240331268
Abstract: Systems, methods, and other embodiments described herein relate to generating an image by interpolating features estimated from a learning model. In one embodiment, a method includes sampling three-dimensional (3D) points of a light ray that crosses a frustum space associated with a single-view camera, the 3D points reflecting depth estimates derived from data that the single-view camera generates for a scene. The method also includes deriving feature values for the 3D points using tri-linear interpolation across feature planes of the frustum space, the feature planes being estimated by a learning model. The method also includes inferring an image in two dimensions (2D) by translating the feature values and compositing the data with volumetric rendering for the scene. The method also includes executing a control task by a controller using the image.
Type: Application
Filed: March 29, 2023
Publication date: October 3, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, Toyota Technological Institute at Chicago
Inventors: Jiading Fang, Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Gregory Shakhnarovich, Matthew R. Walter, Adrien David Gaidon
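Tri-linear interpolation of ray-point features can be expressed directly with PyTorch's grid_sample on a 5-D tensor. The snippet below is a sketch in which a dense (D, H, W) feature volume stands in for the frustum feature planes; shapes are illustrative.

```python
import torch
import torch.nn.functional as F

# A (N, C, D, H, W) feature volume stands in for the frustum feature planes.
features = torch.randn(1, 32, 16, 48, 64)
# 128 sampled ray points with normalized (x, y, z) coordinates in [-1, 1].
points = torch.rand(1, 1, 1, 128, 3) * 2 - 1
# On 5-D input, mode="bilinear" performs tri-linear interpolation.
sampled = F.grid_sample(features, points, mode="bilinear", align_corners=True)
print(sampled.shape)  # torch.Size([1, 32, 1, 1, 128])
```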
-
Publication number: 20240249465
Abstract: Systems and methods for enhanced computer vision capabilities, particularly depth synthesis, which may be applicable to autonomous vehicle operation, are described. A vehicle may be equipped with a geometric scene representation (GSR) architecture for synthesizing depth views at arbitrary viewpoints. The synthesized depth views enable advanced functions, including depth interpolation and depth extrapolation, which are useful for various computer vision applications for autonomous vehicles, such as predicting depth maps from unseen locations. For example, a vehicle includes a processor device synthesizing depth views at multiple viewpoints, where the multiple viewpoints are derived from image data of the surrounding environment of the vehicle.
Type: Application
Filed: January 19, 2023
Publication date: July 25, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, Toyota Technological Institute at Chicago
Inventors: Vitor Guizilini, Igor Vasiljevic, Adrien D. Gaidon, Greg Shakhnarovich, Matthew Walter, Jiading Fang, Rares A. Ambrus
-
Patent number: 12033341
Abstract: A method for scale-aware depth estimation using multi-camera projection loss is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes training a scale-aware depth estimation model and an ego-motion estimation model according to the multi-camera photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the scale-aware depth estimation model and the ego-motion estimation model. The method also includes planning a vehicle control action of the ego vehicle according to the 360° point cloud of the scene surrounding the ego vehicle.
Type: Grant
Filed: July 30, 2021
Date of Patent: July 9, 2024
Assignees: Toyota Research Institute, Inc., Toyota Technological Institute at Chicago
Inventors: Vitor Guizilini, Rares Andrei Ambrus, Adrien David Gaidon, Igor Vasiljevic, Gregory Shakhnarovich
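A multi-camera photometric loss, in its simplest form, averages per-camera image reconstruction error over every camera in the rig. The sketch below shows only the L1 component; the full method also typically includes a structural similarity (SSIM) term and the reprojection step that produces the synthesized images.

```python
import torch

def multi_camera_photometric_loss(synthesized, target):
    # synthesized/target: (num_cams, 3, H, W); average the per-camera L1 terms.
    per_camera = (synthesized - target).abs().mean(dim=(1, 2, 3))
    return per_camera.mean()
```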
-
Publication number: 20240161471
Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating, through training, a shared latent space based on (i) image data that includes multiple images, where each image has a different viewing frame of a scene, and (ii) first and second types of embeddings, and training a decoder based on the first type of embeddings. The method also includes generating an embedding based on the first type of embeddings that is representative of a novel viewing frame of the scene, decoding, with the decoder, the shared latent space using cross-attention with the generated embedding, and generating the novel viewing frame of the scene based on an output of the decoder.
Type: Application
Filed: August 3, 2023
Publication date: May 16, 2024
Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
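Decoding a shared latent space "using cross-attention with the generated embedding" maps naturally onto a cross-attention layer whose queries are the novel-view embedding and whose keys and values are the latent tokens. The module below is a hedged sketch of that pattern; layer choices and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionDecoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, query_embedding, latent):
        # Queries describe the novel viewing frame; keys/values are the
        # shared latent space produced during training.
        out, _ = self.attn(query_embedding, latent, latent)
        return self.to_rgb(out)  # one RGB value per query token

decoder = CrossAttentionDecoder()
rgb = decoder(torch.randn(1, 4096, 256), torch.randn(1, 512, 256))
```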
-
Publication number: 20240161389
Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the scene. The method includes decoding, with the decoder, the latent space using cross-attention with the volumetric embedding, and generating a novel viewing frame of the scene based on an output of the decoder.
Type: Application
Filed: August 3, 2023
Publication date: May 16, 2024
Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
-
Publication number: 20240161510
Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes training a shared latent space and a first decoder based on first image data that includes multiple images of a first scene, and training the shared latent space and a second decoder based on second image data that includes multiple images. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the first scene. Further, the method includes decoding, with the first decoder, the shared latent space with the volumetric embedding, and generating the novel viewing frame of the first scene based on the output of the first decoder.
Type: Application
Filed: August 3, 2023
Publication date: May 16, 2024
Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
-
Publication number: 20240153197
Abstract: An example method includes generating embeddings of image data that includes multiple images, where each image has a different viewpoint of a scene; generating a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determining a volumetric rendering view synthesis loss and a multi-view photometric loss; and applying an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
Type: Application
Filed: August 3, 2023
Publication date: May 9, 2024
Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
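The stopping rule this abstract describes, optimizing until both losses fall within their thresholds, can be sketched as a plain training loop. The model, optimizer, and losses_fn below are placeholders; the threshold values are illustrative assumptions.

```python
import torch

def optimize(model, optimizer, batches, losses_fn,
             vol_thresh=1e-3, photo_thresh=1e-3, max_epochs=100):
    for epoch in range(max_epochs):
        for batch in batches:
            optimizer.zero_grad()
            vol_loss, photo_loss = losses_fn(model, batch)
            (vol_loss + photo_loss).backward()
            optimizer.step()
        # Stop once both losses are within their respective thresholds.
        if vol_loss.item() < vol_thresh and photo_loss.item() < photo_thresh:
            return epoch
    return max_epochs
```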
-
Publication number: 20240029286
Abstract: A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture is provided. The method includes receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses, generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras, projecting information from the pointcloud onto the viewpoint of the virtual camera, and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
Type: Application
Filed: February 16, 2023
Publication date: January 25, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, Toyota Technological Institute at Chicago
Inventors: Vitor Guizilini, Igor Vasiljevic, Adrien D. Gaidon, Jiading Fang, Gregory Shakhnarovich, Matthew R. Walter, Rares A. Ambrus
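Projecting pointcloud information onto a virtual camera's viewpoint reduces to a rigid transform followed by a pinhole projection. The sketch below assumes a world-to-camera pose (R, t) and intrinsics K for the virtual camera; it shows the geometry only, not the decoding step.

```python
import torch

def project_to_virtual_camera(points, R, t, K):
    # points: (N, 3) pointcloud; (R, t) is the virtual camera's world-to-camera
    # pose and K its intrinsics; returns pixel locations and per-point depth.
    cam = points @ R.T + t
    z = cam[:, 2:3].clamp(min=1e-6)
    pix = (cam / z) @ K.T
    return pix[:, :2], z.squeeze(1)
```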
-
Patent number: 11875521
Abstract: A method for self-supervised depth and ego-motion estimation is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes generating a self-occlusion mask by manually segmenting self-occluded areas of images captured by the multi-camera rig of the ego vehicle. The method further includes multiplying the multi-camera photometric loss with the self-occlusion mask to form a self-occlusion masked photometric loss. The method also includes training a depth estimation model and an ego-motion estimation model according to the self-occlusion masked photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the depth estimation model and the ego-motion estimation model.
Type: Grant
Filed: July 26, 2021
Date of Patent: January 16, 2024
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares Andrei Ambrus, Adrien David Gaidon, Igor Vasiljevic, Gregory Shakhnarovich
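Multiplying the photometric loss by a self-occlusion mask zeroes out pixels where the vehicle occludes its own cameras. A minimal sketch, assuming a per-pixel loss map and a binary mask as inputs:

```python
import torch

def masked_photometric_loss(per_pixel_loss, mask):
    # mask: 1 for valid pixels, 0 where the rig occludes itself; masked
    # pixels contribute nothing to the photometric loss.
    return (per_pixel_loss * mask).sum() / mask.sum().clamp(min=1.0)
```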
-
Patent number: 11727589
Abstract: A method for multi-camera monocular depth estimation using pose averaging is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes determining a multi-camera pose consistency constraint (PCC) loss associated with the multi-camera rig of the ego vehicle. The method further includes adjusting the multi-camera photometric loss according to the multi-camera PCC loss to form a multi-camera PCC photometric loss. The method also includes training a multi-camera depth estimation model and an ego-motion estimation model according to the multi-camera PCC photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the trained multi-camera depth estimation model and the ego-motion estimation model.
Type: Grant
Filed: July 16, 2021
Date of Patent: August 15, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares Andrei Ambrus, Adrien David Gaidon, Igor Vasiljevic, Gregory Shakhnarovich
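Since all cameras on a rigid rig share the same ego-motion, a pose consistency constraint can penalize each camera's predicted motion for deviating from the rig-wide average. The sketch below shows the translation part only; rotations would be handled analogously, and the exact PCC formulation in the patent may differ.

```python
import torch

def pose_consistency_loss(translations):
    # translations: (num_cams, 3) per-camera ego-motion translations; penalize
    # deviation from the rig-wide average (rotations handled analogously).
    mean_t = translations.mean(dim=0, keepdim=True)
    return (translations - mean_t).norm(dim=-1).mean()
```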
-
Patent number: 11704821
Abstract: A method for monocular depth/pose estimation in a camera agnostic network is described. The method includes projecting lifted 3D points onto an image plane according to a predicted ray vector based on a monocular depth model, a monocular pose model, and a camera center of a camera agnostic network. The method also includes predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.
Type: Grant
Filed: January 21, 2022
Date of Patent: July 18, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Sudeep Pillai, Adrien David Gaidon, Rares A. Ambrus, Igor Vasiljevic
-
Patent number: 11704822
Abstract: Systems and methods for self-supervised depth estimation using image frames captured from a camera mounted on a vehicle comprise: receiving a first image from the camera mounted at a first location on the vehicle; receiving a second image from the camera mounted at a second location on the vehicle; predicting a depth map for the first image; warping the first image to a perspective of the camera mounted at the second location on the vehicle to arrive at a warped first image; projecting the warped first image onto the second image; determining a loss based on the projection; and updating the predicted depth values for the first image.
Type: Grant
Filed: January 13, 2022
Date of Patent: July 18, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Adrien Gaidon
-
Patent number: 11688090
Abstract: A method for multi-camera self-supervised depth evaluation is described. The method includes training a self-supervised depth estimation model and an ego-motion estimation model according to a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes generating a single-scale correction factor according to a depth map of each camera of the multi-camera rig during a time-step. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the self-supervised depth estimation model and the ego-motion estimation model. The method also includes scaling the 360° point cloud according to the single-scale correction factor to form an aligned 360° point cloud.
Type: Grant
Filed: July 15, 2021
Date of Patent: June 27, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares Andrei Ambrus, Adrien David Gaidon, Igor Vasiljevic, Gregory Shakhnarovich
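A single-scale correction factor is one scalar shared by all cameras that aligns the predicted cloud to metric scale. One common way to compute such a factor is a median ratio against a metric reference; the sketch below uses that approach as an assumption, since the patent does not specify the exact formula here.

```python
import torch

def single_scale_correction(pred_depths, ref_depths):
    # One shared scalar (a median ratio against a metric reference) rescales
    # the whole 360-degree point cloud.
    return torch.median(ref_depths / pred_depths.clamp(min=1e-6))

# aligned_cloud = point_cloud * single_scale_correction(pred, ref)
```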
-
Patent number: 11652972
Abstract: Systems, methods, and other embodiments described herein relate to improving depth estimates for monocular images using a neural camera model that is independent of a camera type. In one embodiment, a method includes receiving a monocular image from a pair of training images derived from a monocular video. The method includes generating, using a ray surface network, a ray surface that approximates an image character of the monocular image as produced by a camera having the camera type. The method includes creating a synthesized image according to at least the ray surface and a depth map associated with the monocular image.
Type: Grant
Filed: June 12, 2020
Date of Patent: May 16, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Sudeep Pillai, Adrien David Gaidon