Patents by Inventor James Watson
James Watson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12573153
Abstract: A model predicts the geometry of both visible and occluded traversable surfaces from input images. The model may be trained from stereo video sequences, using camera poses, per-frame depth, and semantic segmentation to form training data, which is used to supervise an image-to-image network. In various embodiments, the model is applied to a single RGB image depicting a scene to produce information describing traversable space of the scene that includes occluded traversable space. The information describing traversable space can include a segmentation mask of traversable space (both visible and occluded) and non-traversable space and a depth map indicating an estimated depth to traversable surfaces corresponding to each pixel determined to correspond to traversable space.
Type: Grant
Filed: July 5, 2023
Date of Patent: March 10, 2026
Assignee: Niantic Spatial, Inc.
Inventors: James Watson, Michael David Firman, Aaron Monszpart, Gabriel J. Brostow
-
Publication number: 20250371761
Abstract: A depth estimation model leverages a geometry-rendered depth map from a low-cost geometry model to provide depth hints. The model is trained and configured to input a time series of frames including a target frame. The time series of images is captured as monocular video data by a camera assembly. Applying the model includes: applying a feature encoder to extract visual features forming a feature map for each frame, matching features across the feature maps forming a cost volume, obtaining a geometry-rendered depth map from the low-cost geometry model of the scene based on a pose of the target frame, modifying the cost volume based on the geometry-rendered depth map, and applying a depth decoder to the modified cost volume to generate the depth map for the target frame. A client device implementing the model may generate virtual content using the depth map to display the target frame of the scene augmented with the virtual content.
Type: Application
Filed: May 29, 2024
Publication date: December 4, 2025
Inventors: Mohamed Amr Abdelfattah Sayed, Filippo Aleotti, James Watson, Zawar Imam Qureshi, Sara Alexandra Gomes Vicente, Michael David Firman, Guillermo Garcia-Hernando, Gabriel J. Brostow
-
Patent number: 12488413
Abstract: A depth prediction model for predicting a depth map from an input image is disclosed. The depth prediction model leverages wavelet decomposition to minimize computations. The depth prediction model comprises a plurality of encoding layers, a coarse prediction layer, a plurality of decoding layers, and a plurality of inverse discrete wavelet transforms (IDWTs). The encoding layers are configured to input the image and to downsample the image into feature maps including a coarse feature map. The coarse depth prediction layer is configured to input the coarse feature map and to output a coarse depth map. The decoding layers are configured to input the feature maps and to predict wavelet coefficients based on the feature maps. The IDWTs are configured to upsample the coarse depth map based on the predicted wavelet coefficients to the final depth map at the same resolution as the input image.
Type: Grant
Filed: May 20, 2022
Date of Patent: December 2, 2025
Assignee: Niantic Spatial, Inc.
Inventors: Michaël Lalaina Ramamonjisoa, Michael David Firman, James Watson, Daniyar Turmukhambetov
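The upsampling step in the abstract above, where a coarse depth map is combined with predicted wavelet coefficients through an inverse discrete wavelet transform, can be illustrated with a small sketch. The abstract does not name a wavelet family or give any implementation details, so the Haar wavelet, the 1D signal, and the function names `haar_dwt_1d` and `haar_idwt_1d` are all assumptions made for illustration only. The forward transform is included just to show that the inverse recovers the original samples.

```python
import math


def haar_dwt_1d(signal):
    """Forward Haar transform of an even-length signal (assumed wavelet).

    Returns (approx, detail), each half the length of the input.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse Haar transform: doubles the resolution of `approx`.

    `approx` plays the role of the coarse depth values and `detail`
    the role of the predicted wavelet coefficients (hypothetical mapping).
    """
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)  # even sample
        out.append((a - d) / s)  # odd sample
    return out


# Round trip: decomposing and then reconstructing returns the input.
depth_row = [4.0, 2.0, 1.0, 3.0]
coarse, coeffs = haar_dwt_1d(depth_row)
restored = haar_idwt_1d(coarse, coeffs)
```

In this toy setting, a detail coefficient of zero yields two identical output samples, which suggests why computation might be saved wherever the predicted coefficients are negligible.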
-
Publication number: 20250252674
Abstract: Depth maps are generated based on a sequence of posed images captured by a camera, the depth maps are fused into a truncated signed distance function (TSDF), and an initial estimate of 3-dimensional (3D) scene geometry is generated by extracting a 3D mesh via the TSDF. 3D embeddings are estimated for each vertex in the 3D mesh by mapping each vertex to a multi-view consistent plane embedding space such that vertices on a same plane map to nearly a same place in the embedding space. The vertices are clustered into 3D plane instances based on respective 3D embeddings and geometry information defined by the 3D mesh to create a planar representation of the scene. A location of a virtual element in a virtual world of an augmented reality game is determined based on the planar representation.
Type: Application
Filed: February 3, 2025
Publication date: August 7, 2025
Inventors: James Watson, Filippo Aleotti, Mohamed Amr Abdelfattah Sayed, Zawar Imam Qureshi, Oisin Mac Aodha, Gabriel J. Brostow, Michael David Firman, Sara Alexandra Gomes Vicente
-
Patent number: 12250116
Abstract: Techniques for implementing instance local boots in a cloud provider network via auxiliary domains are described. An auxiliary compute instance is launched and attached to a local storage device of the computing device. The auxiliary compute instance can pre-warm a boot volume by fetching its data from a remote system and storing the boot volume to the local storage device. The auxiliary compute instance is terminated, and a user compute device is launched into the same slot and connected to the local storage device. The user compute device utilizes the pre-warmed boot volume for launch.
Type: Grant
Filed: October 30, 2023
Date of Patent: March 11, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Sean Cameron, Aviv David Greenberg, James Watson
-
Publication number: 20250054255
Abstract: A computer-implemented method is disclosed for generating scene reconstructions from image data. The method includes: receiving image data of a scene captured by a camera; inputting the image data of the scene into a scene reconstruction model; receiving, from the scene reconstruction model, a final spatial model of the scene, wherein the scene reconstruction model generates the final spatial model by: predicting a depth map for each image of the image data, extracting a feature map for each image of the image data, generating a first spatial model based on the predicted depth maps of the images, generating a second spatial model based on the extracted feature maps of the images, and determining the final spatial model by combining the first spatial model and the second spatial model; and providing functionality on a computing device related to the scene and based on the final spatial model.
Type: Application
Filed: October 30, 2024
Publication date: February 13, 2025
Inventors: James Watson, Sara Alexandra Gomes Vicente, Oisin Mac Aodha, Clément Godard, Gabriel J. Brostow, Michael David Firman
-
Patent number: 12159358
Abstract: A scene reconstruction model is disclosed that outputs a heightfield for a series of input images. The model, for each input image, predicts a depth map and extracts a feature map. The model builds a 3D model utilizing the predicted depth maps and camera poses for the images. The model raycasts the 3D model to determine a raw heightfield for the scene. The model utilizes the raw heightfield to sample features from the feature maps corresponding to positions on the heightfield. The model aggregates the sampled features into an aggregate feature map. The model regresses a refined heightfield based on the aggregate feature map. The model determines the final heightfield based on a combination of the raw heightfield and the refined heightfield. With the final heightfield, a client device may generate virtual content augmented on real-world images captured by the client device.
Type: Grant
Filed: December 14, 2022
Date of Patent: December 3, 2024
Assignee: Niantic, Inc.
Inventors: James Watson, Sara Alexandra Gomes Vicente, Oisin MacAodha, Clément Godard, Gabriel J. Brostow, Michael David Firman
-
Patent number: 12103175
Abstract: Described is a robotic apparatus (10) for investigating a confined area comprising: an articulated robot (20) for insertion into a confined area, the robotic apparatus further comprising a robot control system (30) for controlling the articulated robot. Further, the robot control system comprises a control unit (50), a robot driving means, a seal (70) for isolating the confined area from the external environment and at least one transmission member (80), wherein the control unit is configured to send control signals to the robot driving means, and the at least one transmission member extends from the robot driving means to connect to the articulated robot, the at least one transmission member extending through the seal.
Type: Grant
Filed: September 8, 2020
Date of Patent: October 1, 2024
Assignee: Process Vision Ltd.
Inventors: Harry Thorpe, James Watson, Gisle-Andre Larsen, Vincent Strong, Simon White, Paul Stockwell
-
Patent number: 12080010
Abstract: A multi-frame depth estimation model is disclosed. The model is trained and configured to receive an input image and an additional image. The model outputs a depth map for the input image based on the input image and the additional image. The model may extract a feature map for the input image and an additional feature map for the additional image. For each of a plurality of depth planes, the model warps the feature map to the depth plane based on relative pose between the input image and the additional image, the depth plane, and camera intrinsics. The model builds a cost volume from the warped feature maps for the plurality of depth planes. A decoder of the model inputs the cost volume and the input image to output the depth map.
Type: Grant
Filed: December 8, 2021
Date of Patent: September 3, 2024
Assignee: NIANTIC, INC.
Inventors: James Watson, Oisin MacAodha, Victor Adrian Prisacariu, Gabriel J. Brostow, Michael David Firman
-
Publication number: 20240185478
Abstract: A system generates augmented reality content by generating an occlusion mask via implicit depth estimation. The system receives input image(s) of a real-world environment captured by a camera assembly. The system generates a feature map from the input image(s), wherein the feature map comprises abstract features representing depth of object(s) in the real-world environment. The system generates an occlusion mask from the feature map and a depth map for the virtual object. The depth map for the virtual object indicates a depth of each pixel of the virtual object. The occlusion mask indicates pixel(s) of the virtual object that are occluded by an object in the real-world environment. The system generates a composite image based on a first input image at a current timestamp, the virtual object, and the occlusion mask. The composite image may then be displayed on an electronic display.
Type: Application
Filed: December 5, 2023
Publication date: June 6, 2024
Inventors: James Watson, Mohamed Sayed, Zawar Imam Qureshi, Gabriel J. Brostow, Sara Alexandra Gomes Vicente, Oisin Mac Aodha, Michael David Firman
-
Publication number: 20230360241
Abstract: A depth estimation module may receive a reference image and a set of source images of an environment. The depth module may receive image features of the reference image and the set of source images. The depth module may generate a 4D feature volume that includes the image features and metadata associated with the reference image and set of source images. The image features and the metadata may be arranged in the feature volume based on relative pose distances between the reference image and the set of source images. The depth module may reduce the 4D feature volume to generate a 3D cost volume. The depth module may apply a depth estimation model to the 3D cost volume and data based on the reference image to generate a two-dimensional (2D) depth map for the reference image.
Type: Application
Filed: May 5, 2023
Publication date: November 9, 2023
Inventors: Mohamed Sayed, John Gibson, James Watson, Victor Adrian Prisacariu, Michael David Firman, Clément Godard
-
Publication number: 20230360339
Abstract: A model predicts the geometry of both visible and occluded traversable surfaces from input images. The model may be trained from stereo video sequences, using camera poses, per-frame depth, and semantic segmentation to form training data, which is used to supervise an image-to-image network. In various embodiments, the model is applied to a single RGB image depicting a scene to produce information describing traversable space of the scene that includes occluded traversable space. The information describing traversable space can include a segmentation mask of traversable space (both visible and occluded) and non-traversable space and a depth map indicating an estimated depth to traversable surfaces corresponding to each pixel determined to correspond to traversable space.
Type: Application
Filed: July 5, 2023
Publication date: November 9, 2023
Inventors: James Watson, Michael David Firman, Aaron Monszpart, Gabriel J. Brostow
-
Patent number: 11805236
Abstract: A computer system generates stereo image data from monocular images. The system generates depth maps for single images using a monocular depth estimation method. The system converts the depth maps to disparity maps and uses the disparity maps to generate additional images forming stereo pairs with the monocular images. The stereo pairs can be used to form a stereo image training data set for training various models, including depth estimation models or stereo matching models.
Type: Grant
Filed: May 11, 2021
Date of Patent: October 31, 2023
Assignee: NIANTIC, INC.
Inventors: James Watson, Oisin MacAodha, Daniyar Turmukhambetov, Gabriel J. Brostow, Michael David Firman
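The depth-to-disparity conversion and image synthesis described in the abstract above can be sketched on a single image row. This is only a rough illustration under stated assumptions: the standard rectified-stereo relation disparity = focal length × baseline / depth, a forward warp that shifts each pixel left by its rounded disparity, and a rule that the nearest surface wins when two pixels land on the same target. The function names, the `focal_px` and `baseline` parameters, and the use of `None` for unfilled holes are hypothetical and do not come from the patent.

```python
def depth_to_disparity(depth_row, focal_px, baseline):
    """Convert depths to disparities with d = f * B / z (rectified stereo)."""
    return [focal_px * baseline / z for z in depth_row]


def warp_row(row, disparity_row):
    """Forward-warp one image row to a second, horizontally shifted view.

    Each source pixel moves left by its rounded disparity. If several
    pixels land on the same target, the one with the larger disparity
    (the nearer surface) is kept. Unfilled targets stay None (holes).
    """
    width = len(row)
    out = [None] * width
    best = [-1.0] * width  # largest disparity written so far per target
    for x, (value, disp) in enumerate(zip(row, disparity_row)):
        tx = x - int(round(disp))
        if 0 <= tx < width and disp > best[tx]:
            out[tx] = value
            best[tx] = disp
    return out


# Toy example: two far pixels followed by two near pixels.
left_row = ["a", "b", "c", "d"]
depth_row = [10.0, 10.0, 5.0, 5.0]
disparity = depth_to_disparity(depth_row, focal_px=10.0, baseline=1.0)
right_row = warp_row(left_row, disparity)
```

The holes left in `right_row` correspond to regions visible only from the second viewpoint; how such regions are filled is outside the scope of this sketch.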
-
Patent number: 11741675
Abstract: A model predicts the geometry of both visible and occluded traversable surfaces from input images. The model may be trained from stereo video sequences, using camera poses, per-frame depth, and semantic segmentation to form training data, which is used to supervise an image-to-image network. In various embodiments, the model is applied to a single RGB image depicting a scene to produce information describing traversable space of the scene that includes occluded traversable space. The information describing traversable space can include a segmentation mask of traversable space (both visible and occluded) and non-traversable space and a depth map indicating an estimated depth to traversable surfaces corresponding to each pixel determined to correspond to traversable space.
Type: Grant
Filed: March 5, 2021
Date of Patent: August 29, 2023
Assignee: Niantic, Inc.
Inventors: James Watson, Michael David Firman, Aron Monszpart, Gabriel J. Brostow
-
Patent number: 11711508
Abstract: A method for training a depth estimation model with depth hints is disclosed. For each image pair: for a first image, a depth prediction is determined by the depth estimation model and a depth hint is obtained; the second image is projected onto the first image once to generate a synthetic frame based on the depth prediction and again to generate a hinted synthetic frame based on the depth hint; a primary loss is calculated with the synthetic frame; a hinted loss is calculated with the hinted synthetic frame; and an overall loss is calculated for the image pair based on a per-pixel determination of whether the primary loss or the hinted loss is smaller, wherein if the hinted loss is smaller than the primary loss, then the overall loss includes the primary loss and a supervised depth loss between depth prediction and depth hint. The depth estimation model is trained by minimizing the overall losses for the image pairs.
Type: Grant
Filed: March 16, 2022
Date of Patent: July 25, 2023
Assignee: Niantic, Inc.
Inventors: James Watson, Michael David Firman, Gabriel J. Brostow, Daniyar Turmukhambetov
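The per-pixel loss selection described in the abstract above can be sketched as follows. The abstract states only that, where the hinted loss is smaller than the primary loss, the overall loss includes the primary loss plus a supervised depth loss between the prediction and the hint. The specific supervised term used here, log(1 + |prediction - hint|), the averaging over pixels, and the function name `overall_loss` are assumptions chosen for illustration. The per-pixel primary and hinted losses are taken as precomputed inputs.

```python
import math


def overall_loss(primary, hinted, depth_pred, depth_hint):
    """Combine per-pixel losses for one image pair.

    primary, hinted: per-pixel losses computed from the synthetic frame
        and the hinted synthetic frame respectively.
    depth_pred, depth_hint: per-pixel predicted depth and depth hint.
    The supervised term is added only where the hint reprojects better.
    """
    total = 0.0
    for lp, lh, d, h in zip(primary, hinted, depth_pred, depth_hint):
        pixel_loss = lp
        if lh < lp:
            # Assumed form of the supervised depth loss (log-L1).
            pixel_loss += math.log(1.0 + abs(d - h))
        total += pixel_loss
    return total / len(primary)  # assumed reduction: mean over pixels


# Pixel 0: the hint wins, so the supervised term applies.
# Pixel 1: the prediction wins, so only the primary loss counts.
loss = overall_loss(
    primary=[0.5, 0.2],
    hinted=[0.3, 0.4],
    depth_pred=[2.0, 1.0],
    depth_hint=[3.0, 5.0],
)
```

Under this reading, a hint influences training only at pixels where it explains the image pair better than the current prediction does, so an unreliable hint is simply ignored at that pixel.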
-
Publication number: 20230196690
Abstract: A scene reconstruction model is disclosed that outputs a heightfield for a series of input images. The model, for each input image, predicts a depth map and extracts a feature map. The model builds a 3D model utilizing the predicted depth maps and camera poses for the images. The model raycasts the 3D model to determine a raw heightfield for the scene. The model utilizes the raw heightfield to sample features from the feature maps corresponding to positions on the heightfield. The model aggregates the sampled features into an aggregate feature map. The model regresses a refined heightfield based on the aggregate feature map. The model determines the final heightfield based on a combination of the raw heightfield and the refined heightfield. With the final heightfield, a client device may generate virtual content augmented on real-world images captured by the client device.
Type: Application
Filed: December 14, 2022
Publication date: June 22, 2023
Inventors: James Watson, Sara Alexandra Gomes Vicente, Oisin Mac Aodha, Clément Godard, Gabriel J. Brostow, Michael David Firman
-
Publication number: 20220410374
Abstract: Described is a robotic apparatus (10) for investigating a confined area comprising: an articulated robot (20) for insertion into a confined area, the robotic apparatus further comprising a robot control system (30) for controlling the articulated robot. Further, the robot control system comprises a control unit (50), a robot driving means, a seal (70) for isolating the confined area from the external environment and at least one transmission member (80), wherein the control unit is configured to send control signals to the robot driving means, and the at least one transmission member extends from the robot driving means to connect to the articulated robot, the at least one transmission member extending through the seal.
Type: Application
Filed: September 8, 2020
Publication date: December 29, 2022
Inventors: Harry Thorpe, James Watson, Gisle-Andre Larsen, Vincent Strong, Simon White, Paul Stockwell
-
Publication number: 20220383449
Abstract: A depth prediction model for predicting a depth map from an input image is disclosed. The depth prediction model leverages wavelet decomposition to minimize computations. The depth prediction model comprises a plurality of encoding layers, a coarse prediction layer, a plurality of decoding layers, and a plurality of inverse discrete wavelet transforms (IDWTs). The encoding layers are configured to input the image and to downsample the image into feature maps including a coarse feature map. The coarse depth prediction layer is configured to input the coarse feature map and to output a coarse depth map. The decoding layers are configured to input the feature maps and to predict wavelet coefficients based on the feature maps. The IDWTs are configured to upsample the coarse depth map based on the predicted wavelet coefficients to the final depth map at the same resolution as the input image.
Type: Application
Filed: May 20, 2022
Publication date: December 1, 2022
Inventors: Michaël Lalaina Ramamonjisoa, Michael David Firman, James Watson, Daniyar Turmukhambetov
-
Publication number: 20220210392
Abstract: A method for training a depth estimation model with depth hints is disclosed. For each image pair: for a first image, a depth prediction is determined by the depth estimation model and a depth hint is obtained; the second image is projected onto the first image once to generate a synthetic frame based on the depth prediction and again to generate a hinted synthetic frame based on the depth hint; a primary loss is calculated with the synthetic frame; a hinted loss is calculated with the hinted synthetic frame; and an overall loss is calculated for the image pair based on a per-pixel determination of whether the primary loss or the hinted loss is smaller, wherein if the hinted loss is smaller than the primary loss, then the overall loss includes the primary loss and a supervised depth loss between depth prediction and depth hint. The depth estimation model is trained by minimizing the overall losses for the image pairs.
Type: Application
Filed: March 16, 2022
Publication date: June 30, 2022
Inventors: James Watson, Michael David Firman, Gabriel J. Brostow, Daniyar Turmukhambetov
-
Publication number: 20220189049
Abstract: A multi-frame depth estimation model is disclosed. The model is trained and configured to receive an input image and an additional image. The model outputs a depth map for the input image based on the input image and the additional image. The model may extract a feature map for the input image and an additional feature map for the additional image. For each of a plurality of depth planes, the model warps the feature map to the depth plane based on relative pose between the input image and the additional image, the depth plane, and camera intrinsics. The model builds a cost volume from the warped feature maps for the plurality of depth planes. A decoder of the model inputs the cost volume and the input image to output the depth map.
Type: Application
Filed: December 8, 2021
Publication date: June 16, 2022
Inventors: James Watson, Oisin Mac Aodha, Victor Adrian Prisacariu, Gabriel J. Brostow, Michael David Firman