Patents by Inventor Rares A. Ambrus

Rares A. Ambrus has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11615544
    Abstract: Systems and methods for map construction using a video sequence captured on a camera of a vehicle in an environment, comprising: receiving a video sequence from the camera, the video sequence including a plurality of image frames capturing a scene of the environment of the vehicle; using a neural camera model to predict a depth map and a ray surface for the plurality of image frames in the received video sequence; and constructing a map of the scene of the environment based on image data captured in the plurality of frames and depth information in the predicted depth maps.
    Type: Grant
    Filed: September 15, 2020
    Date of Patent: March 28, 2023
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Sudeep Pillai, Adrien Gaidon
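
A minimal sketch of the unprojection step behind this map-construction approach, assuming the neural camera model outputs a per-pixel depth map and a unit ray surface; the shapes and random inputs below are illustrative stand-ins, not the patented implementation:

```python
# Lift pixels to 3D with a per-pixel ray surface and depth map. `depth` and `rays`
# would come from the neural camera model; here they are random placeholders.
import torch

def unproject(depth: torch.Tensor, rays: torch.Tensor) -> torch.Tensor:
    """depth: (H, W); rays: (H, W, 3) unit ray directions per pixel.
    Returns an (H*W, 3) point cloud in the camera frame."""
    points = rays * depth.unsqueeze(-1)  # scale each ray by its depth
    return points.reshape(-1, 3)

H, W = 192, 640
depth = torch.rand(H, W) * 80.0  # stand-in for a predicted depth map
rays = torch.nn.functional.normalize(torch.randn(H, W, 3), dim=-1)  # stand-in ray surface
cloud = unproject(depth, rays)  # per-frame points, to be fused across frames into the map
```
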
  • Publication number: 20230080638
    Abstract: Systems and methods described herein relate to self-supervised learning of camera intrinsic parameters from a sequence of images. One embodiment produces a depth map from a current image frame captured by a camera; generates a point cloud from the depth map using a differentiable unprojection operation; produces a camera pose estimate from the current image frame and a context image frame; produces a warped point cloud based on the camera pose estimate; generates a warped image frame from the warped point cloud using a differentiable projection operation; compares the warped image frame with the context image frame to produce a self-supervised photometric loss; updates a set of estimated camera intrinsic parameters on a per-image-sequence basis using one or more gradients from the self-supervised photometric loss; and generates, based on a converged set of learned camera intrinsic parameters, a rectified image frame from an image frame captured by the camera.
    Type: Application
    Filed: March 11, 2022
    Publication date: March 16, 2023
    Applicants: Toyota Research Institute, Inc., Toyota Technological Institute at Chicago
    Inventors: Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Igor Vasiljevic, Jiading Fang, Gregory Shakhnarovich, Matthew R. Walter
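
A minimal sketch of the per-sequence intrinsics update described in the abstract above, assuming a pinhole camera model (the patent's camera model may differ); the depth map, pose estimate, and images are random stand-ins, and only the differentiable unproject-warp-project-loss loop is illustrated:

```python
import torch
import torch.nn.functional as F

H, W = 96, 128
# Learnable intrinsics: focal lengths and principal point.
K = torch.nn.Parameter(torch.tensor([100.0, 100.0, W / 2, H / 2]))  # fx, fy, cx, cy
opt = torch.optim.Adam([K], lr=1e-3)

depth = torch.rand(H, W) + 1.0      # stand-in predicted depth map
current = torch.rand(1, 3, H, W)    # stand-in current frame
context = torch.rand(1, 3, H, W)    # stand-in context frame
R = torch.eye(3)                    # stand-in camera pose estimate (rotation
t = torch.tensor([0.05, 0.0, 0.0])  #   and translation between the two frames)

v, u = torch.meshgrid(torch.arange(H).float(), torch.arange(W).float(), indexing="ij")

for step in range(10):
    fx, fy, cx, cy = K
    # Differentiable unprojection: pixels -> 3D point cloud.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts = torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
    # Warp the point cloud by the pose estimate, then project back to pixels.
    warped = pts @ R.T + t
    up = fx * warped[:, 0] / warped[:, 2] + cx
    vp = fy * warped[:, 1] / warped[:, 2] + cy
    # Sample the context frame at the projected locations (normalized for grid_sample).
    grid = torch.stack([2 * up / (W - 1) - 1, 2 * vp / (H - 1) - 1], dim=-1)
    synth = F.grid_sample(context, grid.reshape(1, H, W, 2), align_corners=True)
    loss = (synth - current).abs().mean()  # self-supervised photometric loss
    opt.zero_grad(); loss.backward(); opt.step()  # gradient update of the intrinsics
```
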
  • Publication number: 20230037731
    Abstract: Systems and methods for self-supervised depth estimation using image frames captured from cameras may include: receiving a first image captured by a first camera while the first camera is mounted at a first location, the first image comprising pixels representing a first scene of an environment of a vehicle; receiving a reference image captured by a second camera while the second camera is mounted at a second location, the reference image comprising pixels representing a second scene of the environment; warping the first image to a perspective of the second camera at the second location on the vehicle to arrive at a warped first image; projecting the warped first image onto the reference image; determining a loss based on the projection; and updating predicted depth values for the first image.
    Type: Application
    Filed: October 13, 2022
    Publication date: February 9, 2023
    Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Adrien Gaidon
  • Publication number: 20230029993
    Abstract: Systems, methods, computer-readable media, techniques, and methodologies are disclosed for generating vehicle controls and/or driving policies based on machine learning models that utilize an intermediate representation of driving scenes as well as demonstrations (e.g., by behavioral cloning). An intermediate representation that includes inductive biases about the structure of driving scenes for a vehicle can be generated by a self-supervised first machine learning model. A driving policy for the vehicle can be determined by a second machine learning model trained by a set of expert demonstrations and based on the intermediate representation. The expert demonstrations can include labelled data. An appropriate vehicle action may then be determined based on the driving policy. A control signal indicative of this vehicle action may then be output to cause an autonomous vehicle, for example, to implement the appropriate vehicle action.
    Type: Application
    Filed: July 28, 2021
    Publication date: February 2, 2023
    Inventors: Albert Zhao, Rares A. Ambrus, Adrien D. Gaidon
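
A minimal behavioral-cloning sketch of the second stage described above, assuming a discrete action space; the intermediate representations, action set, and network sizes are hypothetical stand-ins:

```python
# A policy head is trained by behavioral cloning on expert actions, taking a
# (here random) intermediate scene representation as input.
import torch
import torch.nn as nn

N_ACTIONS = 4  # hypothetical discrete vehicle actions (e.g., keep lane, turn, ...)

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

for step in range(100):
    reps = torch.randn(32, 128)  # stand-in intermediate representations of driving scenes
    expert_actions = torch.randint(0, N_ACTIONS, (32,))  # labelled expert demonstrations
    loss = ce(policy(reps), expert_actions)  # clone the expert's action choices
    opt.zero_grad(); loss.backward(); opt.step()

action = policy(torch.randn(1, 128)).argmax(dim=-1)  # basis for the control signal
```
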
  • Patent number: 11557051
    Abstract: Systems, methods, and other embodiments described herein relate to training a depth model for joint depth completion and prediction. In one arrangement, a method includes generating depth features from sparse depth data according to a sparse auxiliary network (SAN) of a depth model. The method includes generating a first depth map from a monocular image and a second depth map from the monocular image and the depth features using the depth model. The method includes generating a depth loss from the second depth map and the sparse depth data and an image loss from the first depth map and the sparse depth data. The method includes updating the depth model including the SAN using the depth loss and the image loss.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: January 17, 2023
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor Guizilini, Rares A. Ambrus, Adrien David Gaidon
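
A minimal sketch of the two-loss update described above; the depth maps are random stand-ins for the model's outputs, and a sparse validity mask restricts supervision to pixels where sparse depth exists:

```python
import torch

H, W = 64, 64
sparse = torch.rand(H, W)
valid = torch.rand(H, W) < 0.05  # sparse depth is defined at only ~5% of pixels
sparse = sparse * valid

pred_image_only = torch.rand(H, W, requires_grad=True)  # stand-in for the first depth map
pred_with_san = torch.rand(H, W, requires_grad=True)    # stand-in for the second depth map

# Supervise both depth maps only where sparse depth is valid.
image_loss = (pred_image_only[valid] - sparse[valid]).abs().mean()
depth_loss = (pred_with_san[valid] - sparse[valid]).abs().mean()
total = image_loss + depth_loss  # both losses update the shared depth model
total.backward()
```
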
  • Publication number: 20220414974
    Abstract: Systems and methods described herein relate to reconstructing a scene in three dimensions from a two-dimensional image. One embodiment processes an image using a detection transformer to detect an object in the scene and to generate a NOCS map of the object and a background depth map; uses MLPs to relate the object to a differentiable database of object priors (PriorDB); recovers, from the NOCS map, a partial 3D object shape; estimates an initial object pose; fits a PriorDB object prior to align in geometry and appearance with the partial 3D shape to produce a complete shape and refines the initial pose estimate; generates an editable and re-renderable 3D scene reconstruction based, at least in part, on the complete shape, the refined pose estimate, and the depth map; and controls the operation of a robot based, at least in part, on the editable and re-renderable 3D scene reconstruction.
    Type: Application
    Filed: March 16, 2022
    Publication date: December 29, 2022
    Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University
    Inventors: Sergey Zakharov, Wadim Kehl, Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Dennis Park, Joshua Tenenbaum, Jiajun Wu, Fredo Durand, Vincent Sitzmann
  • Patent number: 11531892
    Abstract: Systems and methods for detecting and matching keypoints between different views of a scene are disclosed herein. One embodiment acquires first and second images; subdivides the first and second images into first and second pluralities of cells, respectively; processes both pluralities of cells using a neural keypoint detection network to identify a first keypoint for a particular cell in the first plurality of cells and a second keypoint for a particular cell in the second plurality of cells, at least one of the first and second keypoints lying in a cell other than the particular cell in the first or second plurality of cells for which it was identified; and classifies the first keypoint and the second keypoint as a matching keypoint pair based, at least in part, on a comparison between a first descriptor associated with the first keypoint and a second descriptor associated with the second keypoint.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: December 20, 2022
    Assignee: Toyota Research Institute, Inc.
    Inventors: Jiexiong Tang, Rares A. Ambrus, Vitor Guizilini, Sudeep Pillai, Hanme Kim
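
A minimal sketch of the matching step, assuming the keypoint network has already produced L2-normalized descriptors for both views; mutual nearest neighbours in descriptor space stand in for the patent's classification of matching pairs:

```python
import torch

D = 256
desc1 = torch.nn.functional.normalize(torch.randn(100, D), dim=1)  # view-1 descriptors
desc2 = torch.nn.functional.normalize(torch.randn(120, D), dim=1)  # view-2 descriptors

sim = desc1 @ desc2.T        # cosine similarity between all descriptor pairs
nn12 = sim.argmax(dim=1)     # best view-2 match for each view-1 keypoint
nn21 = sim.argmax(dim=0)     # best view-1 match for each view-2 keypoint
mutual = nn21[nn12] == torch.arange(len(desc1))  # keep only mutual nearest neighbours
matches = torch.stack([torch.arange(len(desc1))[mutual], nn12[mutual]], dim=1)
```
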
  • Publication number: 20220392083
    Abstract: Systems and methods described herein relate to jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator. One embodiment processes a pair of temporally adjacent monocular image frames using a first neural network structure to produce a first optical flow estimate; processes the pair of temporally adjacent monocular image frames using a second neural network structure to produce an estimated depth map and an estimated scene flow; processes the estimated depth map and the estimated scene flow using the second neural network structure to produce a second optical flow estimate; and imposes a consistency loss between the first optical flow estimate and the second optical flow estimate that minimizes a difference between the first optical flow estimate and the second optical flow estimate to improve performance of the first neural network structure in estimating optical flow and the second neural network structure in estimating depth and scene flow.
    Type: Application
    Filed: September 29, 2021
    Publication date: December 8, 2022
    Inventors: Vitor Guizilini, Rares A. Ambrus, Kuan-Hui Lee, Adrien David Gaidon
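
A minimal sketch of the consistency term described above; the two flow estimates are random stand-ins for the outputs of the two network structures:

```python
import torch

flow_net1 = torch.rand(1, 2, 64, 64, requires_grad=True)  # flow from the first network
flow_net2 = torch.rand(1, 2, 64, 64, requires_grad=True)  # flow recomposed from depth + scene flow
consistency_loss = (flow_net1 - flow_net2).abs().mean()   # minimize their difference
consistency_loss.backward()  # both estimators receive a gradient
```
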
  • Publication number: 20220392089
    Abstract: Systems and methods described herein relate to jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator. One embodiment processes a pair of temporally adjacent monocular image frames using a first neural network structure to produce an optical flow estimate and to extract, from at least one image frame in the pair of temporally adjacent monocular image frames, a set of encoded image context features; triangulates the optical flow estimate to generate a depth map; extracts a set of encoded depth context features from the depth map using a depth context encoder; and combines the set of encoded image context features and the set of encoded depth context features to improve performance of a second neural network structure in estimating depth and scene flow.
    Type: Application
    Filed: September 29, 2021
    Publication date: December 8, 2022
    Inventors: Vitor Guizilini, Rares A. Ambrus, Kuan-Hui Lee, Adrien David Gaidon
  • Patent number: 11508080
    Abstract: Systems and methods for self-supervised learning for visual odometry using camera images captured on a camera may include: using a keypoint network to learn a keypoint matrix for a target image and a context image captured by the camera; using the learned descriptors to estimate correspondences between the target image and the context image; based on the keypoint correspondences, lifting a set of 2D keypoints to 3D using a learned neural camera model; estimating a transformation between the target image and the context image using 3D-2D keypoint correspondences; and projecting the 3D keypoints into the context image using the learned neural camera model.
    Type: Grant
    Filed: September 15, 2020
    Date of Patent: November 22, 2022
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Sudeep Pillai, Adrien Gaidon
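
A minimal sketch of the transformation-estimation step, substituting OpenCV's generic PnP solver for the patent's learned pipeline; the intrinsics, 3D keypoints, and the ground-truth motion used to synthesize correspondences are stand-ins:

```python
import numpy as np
import cv2

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # stand-in pinhole intrinsics
pts3d = np.random.uniform([-2, -2, 4], [2, 2, 10], (50, 3))  # lifted 3D keypoints

# Synthesize matching 2D keypoints by projecting with a known ground-truth motion.
rvec_gt = np.array([0.01, 0.02, 0.0])
tvec_gt = np.array([0.1, 0.0, 0.05])
pts2d, _ = cv2.projectPoints(pts3d, rvec_gt, tvec_gt, K, None)

# Recover the transformation from the 3D-2D keypoint correspondences.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d.astype(np.float32), pts2d.astype(np.float32), K, None)
# rvec/tvec now approximate the motion between the target and context frames.
```
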
  • Patent number: 11501490
    Abstract: The embodiments disclosed herein describe vehicles, systems, and methods for multi-resolution fusion of pseudo-LiDAR features. In one aspect, a method for multi-resolution fusion of pseudo-LiDAR features includes receiving image data from one or more image sensors, generating a point cloud from the image data, generating, from the point cloud, a first bird's eye view map having a first resolution, generating, from the point cloud, a second bird's eye view map having a second resolution, and generating a combined bird's eye view map by combining features of the first bird's eye view map with features from the second bird's eye view map.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: November 15, 2022
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Victor Vaquero Gomez, Rares A. Ambrus, Vitor Guizilini, Adrien D. Gaidon
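
A minimal sketch of the multi-resolution step, assuming simple occupancy features: a stand-in pseudo-LiDAR cloud is binned into bird's eye view grids at two cell sizes, and the coarse grid is upsampled and stacked with the fine one:

```python
import numpy as np

pts = np.random.uniform([-40, 0, -2], [40, 80, 2], (5000, 3))  # stand-in pseudo-LiDAR (x, y, z)

def bev_occupancy(points, cell):
    """Bin x/y coordinates into a 2D occupancy grid with the given cell size (meters)."""
    xs = ((points[:, 0] + 40) / cell).astype(int)  # shift x from [-40, 40] to [0, 80]
    ys = (points[:, 1] / cell).astype(int)
    grid = np.zeros((int(80 / cell), int(80 / cell)))
    grid[ys.clip(0, grid.shape[0] - 1), xs.clip(0, grid.shape[1] - 1)] = 1.0
    return grid

fine = bev_occupancy(pts, cell=0.25)    # first BEV map, higher resolution
coarse = bev_occupancy(pts, cell=0.5)   # second BEV map, lower resolution
upsampled = coarse.repeat(2, axis=0).repeat(2, axis=1)  # match the fine grid size
combined = np.stack([fine, upsampled])  # fused multi-resolution BEV features
```
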
  • Patent number: 11494927
    Abstract: Systems and methods for self-supervised depth estimation using image frames captured from a vehicle-mounted camera may include: receiving a first image captured by the camera while the camera is mounted at a first location on the vehicle, the first image comprising pixels representing a scene of the environment of the vehicle; receiving a reference image captured by the camera while the camera is mounted at a second location on the vehicle, the reference image comprising pixels representing a scene of the environment; predicting a depth map for the first image comprising predicted depth values for pixels of the first image; warping the first image to a perspective of the camera at the second location on the vehicle to arrive at a warped first image; projecting the warped first image onto the reference image; determining a loss based on the projection; and updating predicted depth values for the first image.
    Type: Grant
    Filed: September 15, 2020
    Date of Patent: November 8, 2022
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Adrien Gaidon
  • Publication number: 20220343096
    Abstract: A method for 3D object detection is described. The method includes detecting semantic keypoints from monocular images of a video stream capturing a 3D object. The method also includes inferring a 3D bounding box of the 3D object corresponding to the detected semantic keypoints. The method further includes scoring the inferred 3D bounding box of the 3D object. The method also includes detecting the 3D object according to a final 3D bounding box generated based on the scoring of the inferred 3D bounding box.
    Type: Application
    Filed: April 27, 2021
    Publication date: October 27, 2022
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Arjun Bhargava, Haofeng Chen, Adrien David Gaidon, Rares A. Ambrus, Sudeep Pillai
  • Publication number: 20220309695
    Abstract: Systems and methods described herein relate to training a machine-learning-based monocular depth estimator. One embodiment selects a virtual image in a virtual dataset, the virtual dataset including a plurality of computer-generated virtual images; generates, from the virtual image in accordance with virtual-camera intrinsics, a point cloud in three-dimensional space based on ground-truth depth information associated with the virtual image; reprojects the point cloud back to two-dimensional image space in accordance with real-world camera intrinsics to generate a transformed virtual image; and trains the machine-learning-based monocular depth estimator, at least in part, using the transformed virtual image.
    Type: Application
    Filed: March 25, 2021
    Publication date: September 29, 2022
    Inventors: Vitor Guizilini, Rares A. Ambrus, Adrien David Gaidon, Jie Li
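
A minimal sketch of the intrinsics swap described above, assuming pinhole models for both the virtual and real cameras; the intrinsics and ground-truth depth are stand-ins:

```python
import numpy as np

H, W = 120, 160
K_virtual = np.array([[150.0, 0, W / 2], [0, 150.0, H / 2], [0, 0, 1]])
K_real = np.array([[180.0, 0, W / 2 + 4], [0, 180.0, H / 2 - 2], [0, 0, 1]])
depth = np.random.uniform(1, 50, (H, W))  # stand-in ground-truth depth

v, u = np.mgrid[0:H, 0:W]
pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # homogeneous pixels
rays = np.linalg.inv(K_virtual) @ pix  # unproject with the virtual-camera intrinsics
points = rays * depth.reshape(-1)      # 3D point cloud from ground-truth depth
reproj = K_real @ points               # reproject with real-world camera intrinsics
uv = (reproj[:2] / reproj[2]).T        # pixel locations in the transformed virtual image
```
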
  • Publication number: 20220300768
    Abstract: Systems, methods, and other embodiments described herein relate to evaluating a perception network in relation to the accuracy of depth estimates and object detections. In one embodiment, a method includes segmenting range data associated with an image according to bounding boxes of objects identified in the image to produce masked data. The method includes comparing the masked data with corresponding depth estimates in a depth map according to an evaluation mask that correlates the depth estimates with the depth map. The method includes providing a metric that quantifies the comparing to assess a network that generated the depth map and the bounding boxes.
    Type: Application
    Filed: June 25, 2021
    Publication date: September 22, 2022
    Inventors: Rares A. Ambrus, Dennis Park, Vitor Guizilini, Jie Li, Adrien David Gaidon
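
A minimal sketch of the evaluation described above, using an absolute-relative error as the metric (the patent does not specify this choice); the boxes, range data, and depth estimates are stand-ins:

```python
import numpy as np

H, W = 96, 128
range_data = np.random.uniform(2, 60, (H, W))              # stand-in range measurements
depth_map = range_data + np.random.normal(0, 0.5, (H, W))  # stand-in network depth estimates

boxes = [(10, 20, 40, 60), (50, 70, 80, 110)]  # detected boxes: (y0, x0, y1, x1)
mask = np.zeros((H, W), dtype=bool)
for y0, x0, y1, x1 in boxes:
    mask[y0:y1, x0:x1] = True  # evaluation mask built from the bounding boxes

# Compare masked range data against the depth estimates and report a single metric.
abs_rel = np.abs(depth_map[mask] - range_data[mask]) / range_data[mask]
print(f"masked abs-rel depth error: {abs_rel.mean():.3f}")
```
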
  • Publication number: 20220300748
    Abstract: A method for tracking an object performed by an object tracking system includes encoding locations of visible objects in an environment captured in a current frame of a sequence of frames. The method also includes generating a representation of a current state of the environment based on an aggregation of the encoded locations and an encoded location of each object visible in one or more frames of the sequence of frames occurring prior to the current frame. The method further includes predicting a location of an object occluded in the current frame based on a comparison of object centers decoded from the representation of the current state to object centers saved from each prior representation associated with a different respective frame of the sequence of frames occurring prior to the current frame. The method still further includes adjusting a behavior of an autonomous agent in response to identifying the location of the occluded object.
    Type: Application
    Filed: March 16, 2021
    Publication date: September 22, 2022
    Applicant: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Pavel V. Tokmakov, Rares A. Ambrus, Wolfram Burgard, Adrien David Gaidon
  • Publication number: 20220301202
    Abstract: Systems, methods, and other embodiments described herein relate to performing depth estimation and object detection using a common network architecture. In one embodiment, a method includes generating, using a backbone of a combined network, a feature map at multiple scales from an input image. The method includes decoding, using a top-down pathway of the combined network, the feature map to provide features at the multiple scales. The method includes generating, using a head of the combined network, a depth map from the features for a scene depicted in the input image, and bounding boxes identifying objects in the input image.
    Type: Application
    Filed: May 28, 2021
    Publication date: September 22, 2022
    Inventors: Dennis Park, Rares A. Ambrus, Vitor Guizilini, Jie Li, Adrien David Gaidon
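
A minimal sketch of the shared-backbone idea, assuming a toy convolutional backbone with one depth head and one detection head; the layer sizes and heads are illustrative, not the patented architecture:

```python
import torch
import torch.nn as nn

class CombinedNet(nn.Module):
    """Toy network: one backbone feeds both a depth head and a detection head."""
    def __init__(self, n_anchors=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)             # per-pixel depth
        self.box_head = nn.Conv2d(64, n_anchors * 4, 3, padding=1)   # box regressions

    def forward(self, x):
        feats = self.backbone(x)  # shared features for both tasks
        return self.depth_head(feats), self.box_head(feats)

depth, boxes = CombinedNet()(torch.rand(1, 3, 128, 160))
```
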
  • Publication number: 20220300746
    Abstract: Described are systems and methods for self-learned label refinement of a training set. In one example, a system includes a processor and a memory having a training set generation module that causes the processor to train a model using an image as an input to the model and 2D bounding boxes based on 3D bounding boxes as ground truths, select a first subset from predicted 2D bounding boxes previously outputted by the model, retrain the model using the image as the input and the first subset as ground truths, select a second subset of predicted 2D bounding boxes previously outputted by the model, and generate the training set by selecting the 3D bounding boxes from a master set of 3D bounding boxes that have corresponding 2D bounding boxes that form the second subset.
    Type: Application
    Filed: May 25, 2021
    Publication date: September 22, 2022
    Inventors: Dennis Park, Rares A. Ambrus, Vitor Guizilini, Jie Li, Adrien David Gaidon
  • Publication number: 20220301203
    Abstract: Systems, methods, and other embodiments described herein relate to a manner of training a depth prediction system using bounding boxes. In one embodiment, a method includes segmenting an image to mask areas beyond bounding boxes and identify unmasked areas within the bounding boxes. The method also includes training a depth model using depth losses from comparing weighted points associated with pixels of the image within the unmasked areas to ground-truth depth. The method also includes providing the depth model for object detection.
    Type: Application
    Filed: July 23, 2021
    Publication date: September 22, 2022
    Inventors: Rares A. Ambrus, Dennis Park, Vitor Guizilini, Jie Li, Adrien David Gaidon
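
A minimal sketch of the masking and loss computation described above; the predicted depth, ground truth, per-point weights, and boxes are stand-ins:

```python
import torch

H, W = 96, 128
pred = torch.rand(H, W, requires_grad=True)  # stand-in predicted depth
gt = torch.rand(H, W)                        # stand-in ground-truth depth
weights = torch.rand(H, W)                   # stand-in per-pixel point weights

# Mask areas beyond the bounding boxes; keep only the unmasked areas inside them.
mask = torch.zeros(H, W, dtype=torch.bool)
for y0, x0, y1, x1 in [(10, 20, 40, 60), (50, 70, 80, 110)]:  # bounding boxes
    mask[y0:y1, x0:x1] = True

# Depth loss over weighted points within the unmasked areas only.
depth_loss = (weights[mask] * (pred[mask] - gt[mask]).abs()).mean()
depth_loss.backward()
```
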
  • Patent number: 11436743
    Abstract: Systems, methods, and other embodiments described herein relate to semi-supervised training of a depth model using a neural camera model that is independent of a camera type. In one embodiment, a method includes acquiring training data including at least a pair of training images and depth data associated with the training images. The method includes training the depth model using the training data to generate a self-supervised loss from the pair of training images and a supervised loss from the depth data. Training the depth model includes learning the camera type by generating, using a ray surface model, a ray surface that approximates an image character of the training images as produced by a camera having the camera type. The method includes providing the depth model to infer depths from monocular images in a device.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: September 6, 2022
    Assignee: Toyota Research Institute, Inc.
    Inventors: Vitor Guizilini, Igor Vasiljevic, Rares A. Ambrus, Sudeep Pillai, Adrien David Gaidon