Patents by Inventor Vitor Guizilini
Vitor Guizilini has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11966234
Abstract: A method for controlling an ego agent includes capturing a two-dimensional (2D) image of an environment adjacent to the ego agent. The method also includes generating a semantically segmented image of the environment based on the 2D image. The method further includes generating a depth map of the environment based on the semantically segmented image. The method additionally includes generating a three-dimensional (3D) estimate of the environment based on the depth map. The method also includes controlling an action of the ego agent based on the identified location.
Type: Grant
Filed: July 23, 2020
Date of Patent: April 23, 2024
Assignee: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Vitor Guizilini, Jie Li, Rares A. Ambrus, Sudeep Pillai, Adrien Gaidon
-
Patent number: 11948310
Abstract: Systems and methods described herein relate to jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator. One embodiment processes a pair of temporally adjacent monocular image frames using a first neural network structure to produce a first optical flow estimate; processes the pair of temporally adjacent monocular image frames using a second neural network structure to produce an estimated depth map and an estimated scene flow; processes the estimated depth map and the estimated scene flow using the second neural network structure to produce a second optical flow estimate; and imposes a consistency loss between the first optical flow estimate and the second optical flow estimate that minimizes a difference between the first optical flow estimate and the second optical flow estimate to improve performance of the first neural network structure in estimating optical flow and the second neural network structure in estimating depth and scene flow.
Type: Grant
Filed: September 29, 2021
Date of Patent: April 2, 2024
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares A. Ambrus, Kuan-Hui Lee, Adrien David Gaidon
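As a rough illustration of the kind of consistency loss the abstract above describes (a sketch only, not the patented implementation), a per-pixel L1 penalty between the two branches' flow estimates could be written as:

```python
import numpy as np

def flow_consistency_loss(flow_a, flow_b):
    """L1 consistency loss between two per-pixel optical flow estimates.

    flow_a, flow_b: float arrays of shape (H, W, 2) holding (dx, dy)
    displacements from the two network branches. Minimizing this value
    pushes the two estimates toward agreement, coupling the training
    of the flow branch with the depth/scene-flow branch.
    """
    return float(np.mean(np.abs(flow_a - flow_b)))
```

During training, this term would be added to each branch's own loss so that gradients flow into both network structures.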
-
Patent number: 11948309
Abstract: Systems and methods described herein relate to jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator. One embodiment processes a pair of temporally adjacent monocular image frames using a first neural network structure to produce an optical flow estimate and to extract, from at least one image frame in the pair of temporally adjacent monocular image frames, a set of encoded image context features; triangulates the optical flow estimate to generate a depth map; extracts a set of encoded depth context features from the depth map using a depth context encoder; and combines the set of encoded image context features and the set of encoded depth context features to improve performance of a second neural network structure in estimating depth and scene flow.
Type: Grant
Filed: September 29, 2021
Date of Patent: April 2, 2024
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares A. Ambrus, Kuan-Hui Lee, Adrien David Gaidon
-
Publication number: 20240087151
Abstract: A method for controlling a vehicle in an environment includes generating, via a cross-attention model, a cross-attention cost volume based on a current image of the environment and a previous image of the environment in a sequence of images. The method also includes generating combined features by combining cost volume features of the cross-attention cost volume with single-frame features associated with the current image. The single-frame features may be generated via a single-frame encoding model. The method further includes generating a depth estimate of the current image based on the combined features. The method still further includes controlling an action of the vehicle based on the depth estimate.
Type: Application
Filed: September 6, 2022
Publication date: March 14, 2024
Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA
Inventor: Vitor GUIZILINI
-
Patent number: 11915487
Abstract: Systems and methods to improve machine learning by explicitly over-fitting environmental data obtained by an imaging system, such as a monocular camera, are disclosed. The system includes training self-supervised depth and pose networks on monocular visual data collected from a certain area over multiple passes. Pose and depth networks may be trained by extracting data from multiple images of a single environment or trajectory, allowing the system to overfit the image data.
Type: Grant
Filed: May 5, 2020
Date of Patent: February 27, 2024
Assignee: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Rares A. Ambrus, Vitor Guizilini, Sudeep Pillai, Adrien David Gaidon
-
Patent number: 11900626
Abstract: A method for learning depth-aware keypoints and associated descriptors from monocular video for ego-motion estimation is described. The method includes training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors. The training is based on a target image and a context image from successive images of the monocular video. The method also includes lifting 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network. The method further includes estimating ego-motion from the target image to the context image based on the learned 3D keypoints.
Type: Grant
Filed: November 9, 2020
Date of Patent: February 13, 2024
Assignee: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Jiexiong Tang, Rares A. Ambrus, Vitor Guizilini, Sudeep Pillai, Hanme Kim, Adrien David Gaidon
-
Publication number: 20240046655
Abstract: A method for keypoint matching performed by a semantically aware keypoint matching model includes generating a semantically segmented image from an image captured by a sensor of an agent, the semantically segmented image associating a respective semantic label with each pixel of a group of pixels associated with the image. The method also includes generating a set of augmented keypoint descriptors by augmenting, for each keypoint of the set of keypoints associated with the image, a keypoint descriptor with semantic information associated with one or more pixels, of the semantically segmented image, corresponding to the keypoint. The method further includes controlling an action of the agent in accordance with identifying a target image having one or more first augmented keypoint descriptors that match one or more second augmented keypoint descriptors of the set of augmented keypoint descriptors.
Type: Application
Filed: October 18, 2023
Publication date: February 8, 2024
Applicant: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Jiexiong TANG, Rares Andrei AMBRUS, Vitor GUIZILINI, Adrien David GAIDON
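One simple way to realize the descriptor augmentation described above (a hypothetical sketch, not the filed implementation) is to append a one-hot encoding of the keypoint's semantic class to its visual descriptor:

```python
import numpy as np

def augment_descriptor(descriptor, semantic_label, num_classes):
    """Append a one-hot encoding of a keypoint's semantic class
    (e.g., road, sign, building) to its visual descriptor, so that
    matching can reject keypoints that look alike but lie on
    semantically different structures."""
    one_hot = np.zeros(num_classes, dtype=descriptor.dtype)
    one_hot[semantic_label] = 1.0
    return np.concatenate([descriptor, one_hot])
```

Matching would then compare the augmented vectors, so two visually similar keypoints with different semantic labels score a larger distance.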
-
Patent number: 11891094
Abstract: Information that identifies a location can be received. In response to a receipt of the information that identifies the location, a file can be retrieved. The file can be for the location. The file can include image data and a set of node data. The set of node data can include information that identifies nodes in a neural network, information that identifies inputs of the nodes, and values of weights to be applied to the inputs. In response to a retrieval of the file, the weights can be applied to the inputs of the nodes and the image data can be received for the neural network. In response to an application of the weights and a receipt of the image data, the neural network can be executed to produce a digital map for the location. The digital map for the location can be transmitted to an automotive navigation system.
Type: Grant
Filed: June 10, 2020
Date of Patent: February 6, 2024
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares A. Ambrus, Sudeep Pillai, Adrien David Gaidon
-
Patent number: 11887248
Abstract: Systems and methods described herein relate to reconstructing a scene in three dimensions from a two-dimensional image. One embodiment processes an image using a detection transformer to detect an object in the scene and to generate a NOCS map of the object and a background depth map; uses MLPs to relate the object to a differentiable database of object priors (PriorDB); recovers, from the NOCS map, a partial 3D object shape; estimates an initial object pose; fits a PriorDB object prior to align in geometry and appearance with the partial 3D shape to produce a complete shape and refines the initial pose estimate; generates an editable and re-renderable 3D scene reconstruction based, at least in part, on the complete shape, the refined pose estimate, and the depth map; and controls the operation of a robot based, at least in part, on the editable and re-renderable 3D scene reconstruction.
Type: Grant
Filed: March 16, 2022
Date of Patent: January 30, 2024
Assignees: Toyota Research Institute, Inc., Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University
Inventors: Sergey Zakharov, Wadim Kehl, Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Dennis Park, Joshua Tenenbaum, Jiajun Wu, Fredo Durand, Vincent Sitzmann
-
Publication number: 20240029286
Abstract: A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture is provided. The method includes receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras, each with known intrinsics and poses, generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras, projecting information from the pointcloud onto the viewpoint of the virtual camera, and decoding the latent scene representation based on the virtual camera, thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
Type: Application
Filed: February 16, 2023
Publication date: January 25, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, Toyota Technological Institute at Chicago
Inventors: Vitor Guizilini, Igor Vasiljevic, Adrien D. Gaidon, Jiading Fang, Gregory Shakhnarovich, Matthew R. Walter, Rares A. Ambrus
-
Patent number: 11875521
Abstract: A method for self-supervised depth and ego-motion estimation is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes generating a self-occlusion mask by manually segmenting self-occluded areas of images captured by the multi-camera rig of the ego vehicle. The method further includes multiplying the multi-camera photometric loss with the self-occlusion mask to form a self-occlusion masked photometric loss. The method also includes training a depth estimation model and an ego-motion estimation model according to the self-occlusion masked photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the depth estimation model and the ego-motion estimation model.
Type: Grant
Filed: July 26, 2021
Date of Patent: January 16, 2024
Assignee: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Vitor Guizilini, Rares Andrei Ambrus, Adrien David Gaidon, Igor Vasiljevic, Gregory Shakhnarovich
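The masking step above amounts to zeroing the photometric loss wherever the vehicle's own body occludes the camera, then averaging only over valid pixels. A minimal sketch (illustrative only; function and argument names are assumptions, not the patented code):

```python
import numpy as np

def masked_photometric_loss(loss_map, self_occlusion_mask):
    """Average a per-pixel photometric loss over pixels that are NOT
    self-occluded. Mask value 1 marks a valid pixel; 0 marks a pixel
    covered by the vehicle body, which would otherwise corrupt the
    self-supervised training signal."""
    valid = self_occlusion_mask.sum()
    if valid == 0:
        return 0.0  # no valid pixels: contribute nothing to training
    return float((loss_map * self_occlusion_mask).sum() / valid)
```

Normalizing by the count of valid pixels keeps the loss magnitude comparable across cameras with differently sized occluded regions.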
-
Publication number: 20240010225
Abstract: A method of representation learning for object detection from unlabeled point cloud sequences is described. The method includes detecting moving object traces from temporally-ordered, unlabeled point cloud sequences. The method also includes extracting a set of moving objects based on the moving object traces detected from the temporally-ordered, unlabeled point cloud sequences. The method further includes classifying the set of moving objects extracted based on the moving object traces detected from the temporally-ordered, unlabeled point cloud sequences. The method also includes estimating 3D bounding boxes for the set of moving objects based on the classifying of the set of moving objects.
Type: Application
Filed: July 7, 2022
Publication date: January 11, 2024
Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Inventors: Xiangru HUANG, Yue WANG, Vitor GUIZILINI, Rares Andrei AMBRUS, Adrien David GAIDON, Justin SOLOMON
-
Patent number: 11868439
Abstract: Systems, methods, and other embodiments described herein relate to training a multi-task network using real and virtual data. In one embodiment, a method includes acquiring training data that includes real data and virtual data for training a multi-task network that performs at least depth prediction and semantic segmentation. The method includes generating a first output from the multi-task network using the real data and a second output from the multi-task network using the virtual data. The method includes generating a mixed loss by analyzing the first output to produce a real loss and the second output to produce a virtual loss. The method includes updating the multi-task network using the mixed loss.
Type: Grant
Filed: March 29, 2021
Date of Patent: January 9, 2024
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Adrien David Gaidon, Jie Li, Rares A. Ambrus
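In its simplest form, a mixed loss of the kind described above is a weighted combination of the real-data loss and the virtual-data loss (a sketch under that assumption; the weighting scheme is illustrative, not taken from the patent):

```python
def mixed_loss(real_loss, virtual_loss, virtual_weight=0.5):
    """Combine a loss computed on real data with one computed on
    virtual (synthetic) data into a single training objective.
    virtual_weight balances how strongly the synthetic domain,
    where dense ground truth is cheap, influences the update."""
    return real_loss + virtual_weight * virtual_loss
```

A single backward pass through this scalar then updates the multi-task network with gradients from both domains at once.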
-
Publication number: 20240005540
Abstract: Systems, methods, and other embodiments described herein relate to an improved approach to training a depth model to derive depth estimates from monocular images using cost volumes. In one embodiment, a method includes predicting, using a depth model, depth values from at least one input image that is a monocular image. The method includes generating a cost volume by sampling the depth values corresponding to bins of the cost volume. The method includes determining loss values for the bins of the cost volume. The method includes training the depth model according to the loss values of the cost volume.
Type: Application
Filed: May 27, 2022
Publication date: January 4, 2024
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Sergey Zakharov
-
Publication number: 20230394691
Abstract: Systems and methods are provided for depth estimation from monocular images using a depth model with sparse range sensor data and uncertainty in the range sensor as inputs thereto. According to some embodiments, the methods and systems comprise receiving an image captured by an image sensor, where the image represents a scene of an environment. The methods and systems also comprise deriving a point cloud representative of the scene of the environment from range sensor data, and deriving range sensor uncertainty from the range sensor data. Then a depth map can be derived for the image based on the point cloud and the range sensor uncertainty as one or more inputs into a depth model.
Type: Application
Filed: June 7, 2022
Publication date: December 7, 2023
Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA
Inventors: Vitor Guizilini, Jie Li, Charles Christopher Ochoa
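One intuitive way to use sensor uncertainty as an input signal (purely an illustrative sketch; the publication feeds these into a learned depth model rather than fusing them by hand) is to down-weight each sparse range return as its uncertainty grows:

```python
import numpy as np

def fuse_sparse_depth(predicted_depth, sparse_depth, uncertainty):
    """Blend a dense predicted depth map with sparse range-sensor
    returns, trusting each return less as its uncertainty grows.
    Pixels with no return (sparse_depth == 0) keep the prediction."""
    weight = 1.0 / (1.0 + uncertainty)  # confidence in the sensor return
    fused = weight * sparse_depth + (1.0 - weight) * predicted_depth
    return np.where(sparse_depth > 0, fused, predicted_depth)
```

A learned model can discover a richer version of this trade-off, but the hand-written rule shows why the uncertainty channel carries useful information.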
-
Publication number: 20230386060
Abstract: Systems, methods, and other embodiments described herein relate to an improved approach to training a depth model to derive depth estimates from monocular images using histograms to assess photometric losses. In one embodiment, a method includes determining loss values according to a photometric loss function. The loss values are associated with a depth map derived from an input image that is a monocular image. The method includes generating histograms for the loss values corresponding to different regions of a target image. The method includes, responsive to identifying erroneous values of the loss values, masking the erroneous values to avoid considering the erroneous values during training of the depth model.
Type: Application
Filed: May 27, 2022
Publication date: November 30, 2023
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Sergey Zakharov
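A toy version of histogram-based masking (an assumed scheme for illustration; the publication's actual criterion for "erroneous" may differ) flags loss values that land in sparsely populated histogram bins, on the idea that isolated extreme losses often come from artifacts such as dynamic objects:

```python
import numpy as np

def mask_outlier_losses(loss_values, num_bins=10, min_count=2):
    """Return a boolean mask that is False for loss values falling in
    histogram bins with fewer than min_count members, treating such
    isolated values as likely erroneous and excluding them from the
    training signal."""
    counts, edges = np.histogram(loss_values, bins=num_bins)
    # Map each loss value to its bin index (0 .. num_bins - 1).
    bin_idx = np.clip(np.digitize(loss_values, edges[1:-1]), 0, num_bins - 1)
    return counts[bin_idx] >= min_count
```

Applying the mask before averaging keeps a handful of outlier pixels from dominating the photometric gradient.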
-
Publication number: 20230386059
Abstract: Systems, methods, and other embodiments described herein relate to an improved approach to training a depth model for monocular depth estimation by warping depth features prior to decoding. In one embodiment, a method includes encoding, using an encoder of a depth model, a source image into depth features of a scene depicted by the source image. The method includes warping the depth features into warped features of a target frame of a target image associated with the source image. The method includes decoding, using a decoder of the depth model, the warped features into a depth map. The method includes training the depth model according to a loss derived from the depth map.
Type: Application
Filed: May 27, 2022
Publication date: November 30, 2023
Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
Inventors: Vitor Guizilini, Rares A. Ambrus, Sergey Zakharov
-
Patent number: 11830253
Abstract: A method for keypoint matching includes receiving an input image obtained by a sensor of an agent. The method also includes identifying a set of keypoints of the received image. The method further includes augmenting the descriptor of each of the keypoints with semantic information of the input image. The method also includes identifying a target image based on one or more semantically augmented descriptors of the target image matching one or more semantically augmented descriptors of the input image. The method further includes controlling an action of the agent in response to identifying the target image.
Type: Grant
Filed: April 14, 2021
Date of Patent: November 28, 2023
Assignee: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Jiexiong Tang, Rares Andrei Ambrus, Vitor Guizilini, Adrien David Gaidon
-
Patent number: 11822621
Abstract: Systems and methods described herein relate to training a machine-learning-based monocular depth estimator.
Type: Grant
Filed: March 31, 2021
Date of Patent: November 21, 2023
Assignee: Toyota Research Institute, Inc.
Inventors: Vitor Guizilini, Rares A. Ambrus, Adrien David Gaidon, Jie Li
-
Publication number: 20230360243
Abstract: A method for multi-camera monocular depth estimation using pose averaging is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes determining a multi-camera pose consistency constraint (PCC) loss associated with the multi-camera rig of the ego vehicle. The method further includes adjusting the multi-camera photometric loss according to the multi-camera PCC loss to form a multi-camera PCC photometric loss. The method also includes training a multi-camera depth estimation model and an ego-motion estimation model according to the multi-camera PCC photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the trained multi-camera depth estimation model and the ego-motion estimation model.
Type: Application
Filed: June 29, 2023
Publication date: November 9, 2023
Applicant: TOYOTA RESEARCH INSTITUTE, INC.
Inventors: Vitor GUIZILINI, Rares Andrei AMBRUS, Adrien David GAIDON, Igor VASILJEVIC, Gregory SHAKHNAROVICH
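The intuition behind a pose consistency constraint is that every camera on the rig is rigidly attached to the same vehicle, so their per-camera ego-motion estimates should agree. A minimal sketch of such a penalty on the translation components (illustrative only; the publication's PCC formulation, including how rotations are handled, is not reproduced here):

```python
import numpy as np

def pose_consistency_loss(translations):
    """Penalize disagreement among per-camera ego-motion translation
    estimates. translations: array of shape (num_cameras, 3), one
    estimated (x, y, z) translation per camera. The loss is the mean
    distance of each estimate from the rig-wide average, which is
    zero exactly when all cameras agree."""
    mean_t = translations.mean(axis=0)
    return float(np.mean(np.linalg.norm(translations - mean_t, axis=1)))
```

Adding this term to the multi-camera photometric loss discourages any single camera's pose network from drifting away from the rig consensus.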