Patents by Inventor Vitor Guizilini

Vitor Guizilini has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220005217
    Abstract: A method for estimating depth of a scene includes selecting an image of the scene from a sequence of images of the scene captured via an in-vehicle sensor of a first agent. The method also includes identifying previously captured images of the scene. The method further includes selecting a set of images from the previously captured images based on each image of the set of images satisfying depth criteria. The method still further includes estimating the depth of the scene based on the selected image and the selected set of images.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 6, 2022
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Jiexiong TANG, Rares Andrei AMBRUS, Sudeep PILLAI, Vitor GUIZILINI, Adrien David GAIDON
  • Publication number: 20210407117
    Abstract: Systems and methods for extracting ground plane information directly from monocular images using self-supervised depth networks are disclosed. Self-supervised depth networks are used to generate a three-dimensional reconstruction of observed structures. From this reconstruction the system may generate surface normals. The surface normals can be calculated directly from depth maps in a way that is far less computationally expensive than, and more accurate than, surface normal extraction from standard LiDAR data. Surface normals facing substantially the same direction and facing upwards may be determined to reflect a ground plane.
    Type: Application
    Filed: June 26, 2020
    Publication date: December 30, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Rares A. AMBRUS, Adrien David GAIDON
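The normals-from-depth idea in this abstract can be illustrated with a minimal numpy sketch: derive per-pixel normals from a depth map by finite differences, then flag pixels whose normal points upward as ground candidates. The "up" axis, threshold, and gradient scheme here are illustrative assumptions, not the patented method.

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel surface normals from a depth map via finite differences --
    a cheap stand-in for normals derived from a full 3D reconstruction."""
    dz_dv, dz_du = np.gradient(depth)            # vertical, horizontal slope
    n = np.dstack([-dz_du, -dz_dv, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def ground_mask(normals, up=(0.0, 0.0, 1.0), cos_thresh=0.95):
    """Flag pixels whose normal points substantially along `up` as ground."""
    return normals @ np.asarray(up) > cos_thresh

# A gently sloped planar depth map: every pixel should be flagged as ground.
depth = np.fromfunction(lambda v, u: 5.0 + 0.01 * v, (64, 64))
mask = ground_mask(normals_from_depth(depth))
```

Because the normals come straight from the depth map, no LiDAR sweep or explicit plane fitting is needed for this rough ground test.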
  • Publication number: 20210407115
    Abstract: Systems and methods for generating depth models and depth maps from images obtained from an imaging system are presented. A self-supervised neural network may be capable of regularizing depth information from surface normals. Rather than relying on separate depth and surface normal networks, surface normal information is extracted from the depth information and a smoothness function is applied to the surface normals instead of a depth gradient. Smoothing the surface normals may provide an improved representation of environmental structures by smoothing texture-less areas while preserving sharp boundaries between structures.
    Type: Application
    Filed: June 26, 2020
    Publication date: December 30, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON, Rares A. AMBRUS
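A smoothness term over surface normals, as described above, can be sketched in a few lines of numpy. The specific loss below (mean absolute difference between neighboring normals) is an illustrative assumption, not the patent's exact regularizer:

```python
import numpy as np

def normal_smoothness_loss(normals):
    """Mean absolute difference between neighboring surface normals:
    zero on a planar (texture-less) region, larger where normals vary."""
    dx = np.abs(normals[:, 1:] - normals[:, :-1]).mean()
    dy = np.abs(normals[1:] - normals[:-1]).mean()
    return dx + dy

flat = np.tile([0.0, 0.0, 1.0], (8, 8, 1))               # planar patch
rng = np.random.default_rng(0)
bumpy = flat + 0.1 * rng.standard_normal(flat.shape)     # noisy patch
losses = (normal_smoothness_loss(flat), normal_smoothness_loss(bumpy))
```

A planar region incurs zero penalty while a noisy one is penalized, which is why minimizing this term flattens texture-less areas without requiring a raw depth-gradient penalty.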
  • Patent number: 11210802
    Abstract: Systems, methods, and other embodiments described herein relate to self-supervised training for monocular depth estimation. In one embodiment, a method includes filtering disfavored images from first training data to produce second training data that is a subsampled version of the first training data. The disfavored images correspond with anomaly maps within a set of depth maps. A first depth model is trained according to the first training data and generates the depth maps from the first training data after initially being trained with the first training data. The method includes training a second depth model according to a self-supervised training process using the second training data. The method includes providing the second depth model to infer distances from monocular images.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: December 28, 2021
    Assignee: Toyota Research Institute, Inc.
    Inventors: Vitor Guizilini, Rares A. Ambrus, Rui Hou, Jie Li, Adrien David Gaidon
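The filtering step above, producing a subsampled second training set, can be sketched as follows. The anomaly criterion here (non-finite or non-positive depth pixels) is a crude illustrative stand-in for the patent's anomaly maps:

```python
import numpy as np

def filter_disfavored(images, depth_maps, max_anomaly_frac=0.05):
    """Keep only images whose depth maps are mostly well behaved; an
    'anomaly' is crudely a non-finite or non-positive depth pixel."""
    kept = []
    for image, dm in zip(images, depth_maps):
        anomalies = ~np.isfinite(dm) | (dm <= 0)
        if anomalies.mean() <= max_anomaly_frac:
            kept.append(image)
    return kept

good = np.full((4, 4), 3.0)
bad = good.copy()
bad[:2] = np.nan                      # half the pixels are anomalous
second_training_data = filter_disfavored(["img0", "img1"], [good, bad])
```

The second depth model would then be trained self-supervised on `second_training_data` only, so systematically bad frames never contribute gradients.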
  • Publication number: 20210398302
    Abstract: A method for scene reconstruction includes generating a depth estimate and a first pose estimate from a current image. The method also includes generating a second pose estimate based on the current image and one or more previous images in a sequence of images. The method further includes generating a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate. The method still further includes controlling an action of an agent based on the warped image.
    Type: Application
    Filed: June 22, 2020
    Publication date: December 23, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON
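The core pixel-warping operation that several of these abstracts rely on (lift each pixel to 3D with its depth, apply a relative pose, reproject) can be sketched in numpy. The intrinsics and pose below are illustrative assumptions, and a real pipeline would bilinearly sample the other image at the returned coordinates:

```python
import numpy as np

def warp_grid(depth, K, T):
    """For every pixel: lift to 3D with its depth estimate, apply the 4x4
    relative pose T, and project back through the 3x3 intrinsics K.
    Returns the (u, v) sampling grid into the other view."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # camera-space 3D
    pts = np.vstack([pts, np.ones((1, pts.shape[1]))])
    proj = K @ (T @ pts)[:3]
    return (proj[:2] / proj[2]).reshape(2, h, w)

K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
grid = warp_grid(np.full((64, 64), 10.0), K, np.eye(4))   # identity pose
```

With an identity pose the grid maps every pixel to itself, a useful sanity check; with a real pose estimate the photometric difference between the warped and actual images becomes the self-supervised training signal.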
  • Publication number: 20210398301
    Abstract: A method for monocular depth/pose estimation in a camera agnostic network is described. The method includes training a monocular depth model and a monocular pose model to learn monocular depth estimation and monocular pose estimation based on a target image and context images from monocular video captured by the camera agnostic network. The method also includes lifting 3D points from image pixels of the target image according to the context images. The method further includes projecting the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center of the camera agnostic network. The method also includes predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.
    Type: Application
    Filed: June 17, 2020
    Publication date: December 23, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Sudeep PILLAI, Adrien David GAIDON, Rares A. AMBRUS, Igor VASILJEVIC
  • Publication number: 20210397855
    Abstract: A method includes capturing a two-dimensional (2D) image of an environment adjacent to an ego vehicle, the environment includes at least a dynamic object and a static object. The method also includes generating, via a depth estimation network, a depth map of the environment based on the 2D image, an accuracy of a depth estimate for the dynamic object in the depth map is greater than an accuracy of a depth estimate for the static object in the depth map. The method further includes generating a three-dimensional (3D) estimate of the environment based on the depth map and identifying a location of the dynamic object in the 3D estimate. The method additionally includes controlling an action of the ego vehicle based on the identified location.
    Type: Application
    Filed: June 23, 2020
    Publication date: December 23, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON
  • Publication number: 20210390714
    Abstract: A two-dimensional image can be received. A depth map can be produced, via a first neural network, from the two-dimensional image. A bird's eye view image can be produced, via a second neural network, from the depth map. The second neural network can implement a machine learning algorithm that preserves spatial gradient information associated with one or more objects included in the depth map and causes a position of a pixel in an object, included in the bird's eye view image, to be represented by a differentiable function. Three-dimensional objects can be detected, via a third neural network, from the two-dimensional image, the bird's eye view image, and the spatial gradient information. A combination of the first neural network, the second neural network, and the third neural network can be end-to-end trainable and can be included in a perception system.
    Type: Application
    Filed: June 11, 2020
    Publication date: December 16, 2021
    Inventors: Vitor Guizilini, Rares A. Ambrus, Sudeep Pillai, Adrien David Gaidon
  • Publication number: 20210387648
    Abstract: Information that identifies a location can be received. In response to a receipt of the information that identifies the location, a file can be retrieved. The file can be for the location. The file can include image data and a set of node data. The set of node data can include information that identifies nodes in a neural network, information that identifies inputs of the nodes, and values of weights to be applied to the inputs. In response to a retrieval of the file, the weights can be applied to the inputs of the nodes and the image data can be received for the neural network. In response to an application of the weights and a receipt of the image data, the neural network can be executed to produce a digital map for the location. The digital map for the location can be transmitted to an automotive navigation system.
    Type: Application
    Filed: June 10, 2020
    Publication date: December 16, 2021
    Inventors: Vitor Guizilini, Rares A. Ambrus, Sudeep Pillai, Adrien David Gaidon
  • Publication number: 20210387649
    Abstract: A representation of a spatial structure of objects in an image can be determined. A mode of a neural network can be set, in response to a receipt of the image and a receipt of a facing direction of a camera that produced the image. The mode can account for the facing direction. The facing direction can include one or more of a first facing direction of a first camera disposed on a vehicle or a second facing direction of a second camera disposed on the vehicle. The neural network can be executed, in response to the mode having been set, to determine the representation of the spatial structure of the objects in the image. The representation of the spatial structure of the objects in the image can be transmitted to an automotive navigation system to determine a distance between the vehicle and a specific object in the image.
    Type: Application
    Filed: June 11, 2020
    Publication date: December 16, 2021
    Inventors: Sudeep Pillai, Vitor Guizilini, Rares A. Ambrus, Adrien David Gaidon
  • Publication number: 20210390718
    Abstract: A method for estimating depth is presented. The method includes generating, at each decoding layer of a neural network, decoded features of an input image. The method also includes upsampling, at each decoding layer, the decoded features to a resolution of a final output of the neural network. The method still further includes concatenating, at each decoding layer, the upsampled decoded features with features generated at a convolution layer of the neural network. The method additionally includes sequentially receiving the concatenated upsampled decoded features at a long short-term memory (LSTM) module of the neural network from each decoding layer. The method still further includes generating, at the LSTM module, a depth estimate of the input image after receiving the concatenated upsampled decoded features from a final layer of a decoder of the neural network. The method also includes controlling an action of an agent based on the depth estimate.
    Type: Application
    Filed: June 11, 2020
    Publication date: December 16, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON
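The upsample-then-concatenate step that feeds the LSTM module can be illustrated with plain numpy (a real network would use a deep learning framework; the shapes and nearest-neighbour scheme here are illustrative assumptions):

```python
import numpy as np

def upsample_nn(feat, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

# Decoded features from two layers at different resolutions, brought to the
# final output resolution and concatenated channel-wise before the LSTM:
f_coarse = np.ones((8, 16, 16))
f_fine = np.ones((4, 32, 32))
stacked = np.concatenate([upsample_nn(f_coarse, 64, 64),
                          upsample_nn(f_fine, 64, 64)])
```

Since every decoding layer is resampled to the same output resolution, the LSTM can consume the layers one after another as a sequence, which is what lets it fuse coarse and fine features into a single depth estimate.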
  • Publication number: 20210383553
    Abstract: A method includes generating a first warped image based on a pose and a depth estimated from a current image and a previous image in a sequence of images captured by a camera of the agent. The method also includes estimating a motion of a dynamic object between the previous image and the current image. The method further includes generating a second warped image from the first warped image based on the estimated motion. The method still further includes controlling an action of an agent based on the second warped image.
    Type: Application
    Filed: June 4, 2020
    Publication date: December 9, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor Guizilini, Adrien David Gaidon
  • Publication number: 20210383240
    Abstract: A neural architecture search system for generating a neural network includes one or more processors and a memory. The memory includes a generator module, a self-supervised training module, and an output module. The modules cause the one or more processors to generate a candidate neural network by a controller neural network, obtain training data, generate an output by the candidate neural network performing a specific task using the training data as an input, determine a loss value using a loss function that considers the output of the candidate neural network and at least a portion of the training data, adjust the one or more model weights of the controller neural network based on the loss value, and output the candidate neural network. The candidate neural network may be derived from the controller neural network and one or more model weights of the controller neural network.
    Type: Application
    Filed: June 9, 2020
    Publication date: December 9, 2021
    Inventors: Adrien David Gaidon, Jie Li, Vitor Guizilini
  • Publication number: 20210365733
    Abstract: A method for image reconstruction and domain transfer through an invertible depth network is described. The method includes training a first invertible depth network model using a first image dataset corresponding to a first geographic region to estimate a first depth map. The method also includes retraining the first invertible depth network model using a second image dataset corresponding to a second geographic region to estimate a second depth map. The method further includes reconstructing, by the first invertible depth network model, a third image dataset based on the second depth map. The method also includes training a second invertible depth network model using the third image dataset corresponding to the first geographic region and the second geographic region to estimate a third depth map.
    Type: Application
    Filed: May 20, 2020
    Publication date: November 25, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON
  • Publication number: 20210365697
    Abstract: A system and method generate feature space data that may be used for object detection. The system includes one or more processors and a memory. The memory may include one or more modules having instructions that, when executed by the one or more processors, cause the one or more processors to obtain a two-dimensional image of a scene, generate an output depth map based on the two-dimensional image of the scene, generate a pseudo-LIDAR point cloud based on the output depth map, generate a bird's eye view (BEV) feature space based on the pseudo-LIDAR point cloud, and modify the BEV feature space to generate an improved BEV feature space using a feature space neural network that was trained by using a training LIDAR feature space as a ground truth based on a LIDAR point cloud.
    Type: Application
    Filed: May 20, 2020
    Publication date: November 25, 2021
    Inventors: Victor Vaquero Gomez, Rares A. Ambrus, Vitor Guizilini, Adrien David Gaidon
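The pseudo-LIDAR and BEV steps can be sketched geometrically in numpy: unproject each depth pixel into a 3D point, then rasterize the cloud onto a top-down grid. The intrinsics, grid extents, and binary occupancy encoding are illustrative assumptions; the patent's BEV feature space is learned, not a simple occupancy raster.

```python
import numpy as np

def pseudo_lidar(depth, K):
    """Lift a depth map into a 3D 'pseudo-LIDAR' point cloud (camera frame)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    return (np.linalg.inv(K) @ pix * depth.reshape(1, -1)).T   # (N, 3)

def bev_occupancy(points, x_range=(-10.0, 10.0), z_range=(0.0, 20.0), res=0.5):
    """Rasterize the cloud into a top-down grid (x across, z forward)."""
    nx = int((x_range[1] - x_range[0]) / res)
    nz = int((z_range[1] - z_range[0]) / res)
    grid = np.zeros((nz, nx))
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    zi = ((points[:, 2] - z_range[0]) / res).astype(int)
    ok = (0 <= xi) & (xi < nx) & (0 <= zi) & (zi < nz)
    grid[zi[ok], xi[ok]] = 1.0
    return grid

K = np.array([[50.0, 0.0, 32.0], [0.0, 50.0, 32.0], [0.0, 0.0, 1.0]])
cloud = pseudo_lidar(np.full((64, 64), 8.0), K)
bev = bev_occupancy(cloud)
```

Once the scene is in this top-down representation, standard LiDAR-style detectors can operate on camera-only input, which is the motivation for the pseudo-LIDAR intermediate.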
  • Patent number: 11176709
    Abstract: System, methods, and other embodiments described herein relate to self-supervised training of a depth model for monocular depth estimation. In one embodiment, a method includes processing a first image of a pair according to the depth model to generate a depth map. The method includes processing the first image and a second image of the pair according to a pose model to generate a transformation that defines a relationship between the pair. The pair of images are separate frames depicting a scene of a monocular video. The method includes generating a monocular loss and a pose loss, the pose loss including at least a velocity component that accounts for motion of a camera between the training images. The method includes updating the pose model according to the pose loss and the depth model according to the monocular loss to improve scale awareness of the depth model in producing depth estimates.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: November 16, 2021
    Assignee: Toyota Research Institute, Inc.
    Inventors: Sudeep Pillai, Rares A. Ambrus, Vitor Guizilini, Adrien David Gaidon
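One way to picture the velocity component of the pose loss described above: compare the magnitude of the predicted inter-frame translation against the distance implied by measured speed. This scalar form is an illustrative assumption, not the patent's exact loss:

```python
import numpy as np

def velocity_supervision_loss(pred_translation, speed, dt):
    """Penalize mismatch between the predicted inter-frame translation
    magnitude and the distance implied by measured speed, anchoring the
    otherwise scale-ambiguous monocular estimates to metric scale."""
    return abs(np.linalg.norm(pred_translation) - speed * dt)

# A pose prediction whose translation has the correct metric scale:
loss = velocity_supervision_loss(np.array([0.0, 0.0, 1.5]), speed=15.0, dt=0.1)
```

Because monocular photometric losses are satisfied by any global scale, a term like this is what makes the resulting depth estimates metrically meaningful ("scale aware").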
  • Publication number: 20210350222
    Abstract: Systems and methods to improve machine learning by explicitly over-fitting environmental data obtained by an imaging system, such as a monocular camera, are disclosed. The system includes training self-supervised depth and pose networks on monocular visual data collected from a certain area over multiple passes. Pose and depth networks may be trained by extracting data from multiple images of a single environment or trajectory, allowing the system to overfit the image data.
    Type: Application
    Filed: May 5, 2020
    Publication date: November 11, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Rares A. AMBRUS, Vitor GUIZILINI, Sudeep PILLAI, Adrien David GAIDON
  • Publication number: 20210350616
    Abstract: A method is presented. The method includes estimating an ego-motion of an agent based on a current image from a sequence of images and at least one previous image from the sequence of images. Each image in the sequence of images may be a two-dimensional (2D) image. The method also includes estimating a depth of the current image based on the at least one previous image. The estimated depth accounts for a depth uncertainty measurement in the current image and the at least one previous image. The method further includes generating a three-dimensional (3D) reconstruction of the current image based on the estimated ego-motion and the estimated depth. The method still further includes controlling an action of the agent based on the three-dimensional reconstruction.
    Type: Application
    Filed: May 7, 2020
    Publication date: November 11, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor GUIZILINI, Adrien David GAIDON
  • Publication number: 20210326601
    Abstract: A method for keypoint matching includes determining a first set of keypoints corresponding to a current environment of the agent. The method further includes determining a second set of keypoints from a pre-built map of the current environment. The method still further includes identifying matching pairs of keypoints from the first set of keypoints and the second set of keypoints based on geometrical similarities between respective keypoints of the first set of keypoints and the second set of keypoints. The method also includes determining a current location of the agent based on the identified matching pairs of keypoints. The method further includes controlling an action of the agent based on the current location.
    Type: Application
    Filed: April 15, 2021
    Publication date: October 21, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Jiexiong TANG, Rares Andrei AMBRUS, Jie LI, Vitor GUIZILINI, Sudeep PILLAI, Adrien David GAIDON
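The pair-identification step above can be illustrated with a simple mutual nearest-neighbour test over keypoint descriptors. This is a generic matching sketch standing in for the patent's geometric-similarity criterion, with made-up descriptors:

```python
import numpy as np

def mutual_matches(desc_a, desc_b):
    """Pair keypoints whose descriptors are mutual nearest neighbours --
    a cheap consistency check between the two keypoint sets."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    a2b = d.argmin(axis=1)              # best match in B for each A keypoint
    b2a = d.argmin(axis=0)              # best match in A for each B keypoint
    return [(i, j) for i, j in enumerate(a2b) if b2a[j] == i]

rng = np.random.default_rng(0)
desc_b = rng.standard_normal((5, 32))                        # map keypoints
desc_a = desc_b[::-1] + 0.01 * rng.standard_normal((5, 32))  # permuted + noise
matches = mutual_matches(desc_a, desc_b)
```

Given enough such pairs between the live view and the pre-built map, the agent's pose (and hence its current location) can be solved for, e.g. with a PnP-style estimator.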
  • Publication number: 20210318140
    Abstract: A method for localization performed by an agent includes receiving a query image of a current environment of the agent captured by a sensor integrated with the agent. The method also includes receiving a target image comprising a first set of keypoints matching a second set of keypoints of the query image. The first set of keypoints may be generated based on a task specified for the agent. The method still further includes determining a current location based on the target image.
    Type: Application
    Filed: April 14, 2021
    Publication date: October 14, 2021
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Jiexiong TANG, Rares Andrei AMBRUS, Hanme KIM, Vitor GUIZILINI, Adrien David GAIDON, Xipeng WANG, Jeff WALLS, SR., Sudeep PILLAI