Patents by Inventor Sergey Zakharov

Sergey Zakharov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12149489
    Abstract: Described herein is a technique for processing a received media content item (e.g., a message), received at a messaging application of a first end-user of a messaging service, to generate a selection of some predetermined number of recommended stickers. The recommended stickers are then presented in a user interface to the first end-user, allowing the first end-user to select a sticker for use in replying to the received media content item. To generate the selection of recommended stickers, in response to receiving the media content item, the messaging application processes the media content item to identify specific attributes and characteristics (e.g., text included with the message, stickers used with the message, and other contextual metadata). The identified attributes and characteristics of the received message are then processed by a scoring model to identify the predetermined number of stickers for presenting in the reply interface as recommended reply stickers.
    Type: Grant
    Filed: March 14, 2023
    Date of Patent: November 19, 2024
    Assignee: SNAP INC.
    Inventors: Roman Golobokov, Sergey Smetanin, Sofya Savinova, Aleksandr Zakharov
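The scoring pipeline this abstract describes — extract message attributes, score candidate stickers, present the top few — can be sketched in a few lines. This is a hypothetical illustration, not Snap's implementation: `extract_features`, `score_sticker`, and the tag-overlap model are assumed stand-ins for the patent's attribute extraction and scoring model.

```python
from collections import Counter

def extract_features(message):
    """Pull simple attributes from a received message (hypothetical schema)."""
    words = message.get("text", "").lower().split()
    return Counter(words)

def score_sticker(sticker, features):
    """Toy scoring model: overlap between message words and sticker tags."""
    return sum(features[tag] for tag in sticker["tags"])

def recommend_stickers(message, candidates, k=3):
    """Return the top-k candidate stickers ranked by the scoring model."""
    features = extract_features(message)
    ranked = sorted(candidates, key=lambda s: score_sticker(s, features),
                    reverse=True)
    return [s["id"] for s in ranked[:k]]

message = {"text": "happy birthday to you"}
candidates = [
    {"id": "cake", "tags": ["birthday", "cake"]},
    {"id": "sun", "tags": ["weather", "sunny"]},
    {"id": "party", "tags": ["happy", "birthday"]},
]
print(recommend_stickers(message, candidates, k=2))  # → ['party', 'cake']
```

A production scoring model would be learned; the rank-and-truncate step is the part the abstract actually specifies.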
  • Patent number: 12136251
    Abstract: In accordance with one embodiment of the present disclosure, a method includes receiving an input image having an object and a background, intrinsically decomposing the object and the background into an input image data having a set of features, augmenting the input image data with a 2.5D differentiable renderer for each feature of the set of features to create a set of augmented images, and compiling the input image and the set of augmented images into a training data set for training a downstream task network.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: November 5, 2024
    Assignee: Toyota Research Institute, Inc.
    Inventors: Sergey Zakharov, Rares Ambrus, Vitor Guizilini, Adrien Gaidon
  • Publication number: 20240320843
    Abstract: Aspects of the present disclosure provide techniques for category and joint agnostic reconstruction of articulated objects. An example method includes obtaining images of an environment having objects and generating, using a trained AI encoder, first information associated with the images based at least in part on the images, the first information comprising a plurality of joint codes and a plurality of shape codes associated with the images. The method further includes generating, using a trained AI decoder, second information associated with the objects based at least in part on the plurality of joint codes and the plurality of shape codes, the second information comprising shape information, one or more joint types, and one or more joint states corresponding to at least one of the objects. The method further includes storing the second information in memory.
    Type: Application
    Filed: February 14, 2024
    Publication date: September 26, 2024
    Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, The Board of Trustees of the Leland Stanford Junior University
    Inventors: Thomas KOLLAR, Nick HEPPERT, Muhammed Zubair IRSHAD, Rares A AMBRUS, Katherine LIU, Jeannette BOHG, Sergey ZAKHAROV
  • Publication number: 20240314091
    Abstract: Described herein is a technique for processing a received media content item (e.g., a message), received at a messaging application of a first end-user of a messaging service, to generate a selection of some predetermined number of recommended stickers. The recommended stickers are then presented in a user interface to the first end-user, allowing the first end-user to select a sticker for use in replying to the received media content item. To generate the selection of recommended stickers, in response to receiving the media content item, the messaging application processes the media content item to identify specific attributes and characteristics (e.g., text included with the message, stickers used with the message, and other contextual metadata). The identified attributes and characteristics of the received message are then processed by a scoring model to identify the predetermined number of stickers for presenting in the reply interface as recommended reply stickers.
    Type: Application
    Filed: March 14, 2023
    Publication date: September 19, 2024
    Inventors: Roman Golobokov, Sergey Smetanin, Sofya Savinova, Aleksandr Zakharov
  • Publication number: 20240303923
    Abstract: Systems, methods, and other embodiments described herein relate to using octrees and trilinear interpolation to generate field-specific representations. In one embodiment, a method includes acquiring a latent vector describing an object. The method includes generating an octree from the latent vector according to a recursive network, the octree representing the object at a desired level-of-detail (LoD). The method includes extracting features from the octree at separate resolutions. The method includes providing a field as a representation of the object according to the features.
    Type: Application
    Filed: August 31, 2023
    Publication date: September 12, 2024
    Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha
    Inventors: Sergey Zakharov, Katherine Y Liu, Adrien David Gaidon, Rares A Ambrus
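The trilinear interpolation used to query values between octree grid points can be sketched as follows. This is the generic textbook formulation, not the patent's recursive network or feature extraction.

```python
def trilinear(corners, x, y, z):
    """Trilinearly interpolate a scalar stored at the 8 corners of a unit cell.

    corners[i][j][k] holds the value at corner (i, j, k); (x, y, z) in [0, 1]^3.
    """
    c = corners
    # Interpolate along x, then y, then z.
    c00 = c[0][0][0] * (1 - x) + c[1][0][0] * x
    c01 = c[0][0][1] * (1 - x) + c[1][0][1] * x
    c10 = c[0][1][0] * (1 - x) + c[1][1][0] * x
    c11 = c[0][1][1] * (1 - x) + c[1][1][1] * x
    c0 = c00 * (1 - y) + c10 * y
    c1 = c01 * (1 - y) + c11 * y
    return c0 * (1 - z) + c1 * z

# A cell whose corner values equal their x coordinate: interpolation recovers x.
cell = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]
print(trilinear(cell, 0.25, 0.5, 0.5))  # → 0.25
```

In the patent's setting the interpolated quantities would be feature vectors at several octree resolutions rather than scalars, but the interpolation itself is the same per channel.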
  • Publication number: 20240296606
    Abstract: Examples disclosed herein describe techniques related to automated image generation in an interaction system. An image generation request is received from a first user device associated with a first user of an interaction system. The image generation request comprises a text prompt. Responsive to receiving the image generation request, an image is automatically generated by an automated text-to-image generator, based on the text prompt. The image is caused to be presented on the first user device. An indication of user input to select the image is received from the user device. Responsive to receiving the indication of the user input to select the image, the image is associated with the first user within the interaction system, and a second user of the interaction system is enabled to be presented with the image.
    Type: Application
    Filed: March 1, 2023
    Publication date: September 5, 2024
    Inventors: Sergey Smetanin, Arnab Ghosh, Pavel Savchenkov, Jian Ren, Sergey Tulyakov, Ivan Babanin, Timur Zakirov, Roman Golobokov, Aleksandr Zakharov, Dor Ayalon, Nikita Demidov, Vladimir Gordienko, Daniel Moreno, Nikita Belosludtcev, Sofya Savinova
  • Publication number: 20240295953
    Abstract: Examples disclosed herein describe prompt modification techniques for automated image generation. An image generation request comprising a base prompt is received from a user device. A plurality of prompt modifiers is identified. A processor-implemented scoring engine determines, for each prompt modifier, a modifier score. The modifier score for each prompt modifier is associated with the base prompt. One or more of the prompt modifiers are automatically selected based on the modifier scores. A modified prompt is generated. The modified prompt is based on the base prompt and the one or more selected prompt modifiers. The modified prompt is provided as input to an automated image generator to generate an image, and the image is caused to be presented on the user device.
    Type: Application
    Filed: March 1, 2023
    Publication date: September 5, 2024
    Inventors: Aleksandr Zakharov, Sergey Smetanin, Arnab Ghosh, Pavel Savchenkov
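The modifier-selection flow in this abstract reduces to: score each modifier against the base prompt, keep the top scorers, and concatenate. A minimal sketch, with an assumed word-overlap function standing in for the processor-implemented scoring engine:

```python
def select_modifiers(base_prompt, modifiers, score_fn, top_n=2):
    """Score each prompt modifier against the base prompt and keep the top-n."""
    ranked = sorted(modifiers, key=lambda m: score_fn(base_prompt, m),
                    reverse=True)
    return ranked[:top_n]

def build_modified_prompt(base_prompt, selected):
    """Append the selected modifiers to form the final prompt."""
    return ", ".join([base_prompt] + selected)

# Toy scoring engine: count words the modifier shares with the prompt.
def word_overlap(prompt, modifier):
    return len(set(prompt.lower().split()) & set(modifier.lower().split()))

base = "castle on a hill"
modifiers = ["oil painting of a castle", "macro photo", "hill at sunset"]
chosen = select_modifiers(base, modifiers, word_overlap, top_n=2)
print(build_modified_prompt(base, chosen))
# → castle on a hill, oil painting of a castle, hill at sunset
```

The modified prompt would then be fed to the automated image generator as described in the abstract.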
  • Publication number: 20240249426
    Abstract: A method for dynamic modeling and manipulation of multi-object scenes is described. The method includes using object-centric neural implicit scattering functions (OSFs) as object representations in a model-predictive control (MPC) framework for the multi-object scenes. The method also includes modeling a per-object light transport to enable compositional scene re-rendering under object rearrangement and varying lighting conditions. The method further includes applying inverse parameter estimation and graph neural network (GNN) dynamics models to estimate initial object poses and a light position in the multi-object scene. The method also includes manipulating an object perceived in the multi-object scene according to the applying of the inverse parameter estimation and the GNN dynamics models.
    Type: Application
    Filed: December 13, 2023
    Publication date: July 25, 2024
    Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA, THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY
    Inventors: Stephen TIAN, Yancheng CAI, Hong-Xing YU, Sergey ZAKHAROV, Katherine LIU, Adrien David GAIDON, Yunzhu LI, Jiajun WU
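The model-predictive control (MPC) loop at the heart of this abstract can be sketched at its simplest: simulate candidate actions through a learned dynamics model and pick the one with the lowest predicted cost. The 1-D dynamics and cost below are toy stand-ins for the patent's GNN dynamics model and re-rendering-based objective.

```python
def mpc_step(state, candidate_actions, dynamics, cost):
    """One MPC step: roll each candidate action through the dynamics model
    and return the action whose predicted next state minimizes the cost."""
    return min(candidate_actions, key=lambda a: cost(dynamics(state, a)))

# Toy 1-D setup: the state is a position, the goal is 0, actions displace it.
dynamics = lambda s, a: s + a
cost = lambda s: abs(s)
print(mpc_step(5.0, [-2.0, -1.0, 0.0, 1.0], dynamics, cost))  # → -2.0
```

A full controller would repeat this step over a horizon, re-planning after each executed action.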
  • Patent number: 12045998
    Abstract: In accordance with one embodiment of the present disclosure, a method includes receiving a set of images, each image depicting a view of a scene, generating sparse depth data from each image of the set of images, training a monocular depth estimation model with the sparse depth data, generating, with the trained monocular depth estimation model, depth data and uncertainty data for each image, training a NeRF model with the set of images, wherein the training is constrained by the depth data and uncertainty data, and rendering, with the trained NeRF model, a new image having a new view of the scene.
    Type: Grant
    Filed: May 18, 2022
    Date of Patent: July 23, 2024
    Assignee: TOYOTA RESEARCH INSTITUTE, INC.
    Inventors: Rares Ambrus, Sergey Zakharov, Vitor C. Guizilini, Adrien Gaidon
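The abstract does not spell out how the depth and uncertainty data constrain NeRF training; one common way to fold them in is a heteroscedastic (uncertainty-weighted) depth term, sketched here with hypothetical names.

```python
import math

def depth_loss(pred_depth, sparse_depth, uncertainty):
    """Uncertainty-weighted depth term: pixels the monocular depth network is
    confident about (small sigma) penalize NeRF depth error more heavily."""
    total = 0.0
    for d_hat, d, sigma in zip(pred_depth, sparse_depth, uncertainty):
        total += ((d_hat - d) ** 2) / (2 * sigma ** 2) + math.log(sigma)
    return total / len(pred_depth)

# A perfect prediction with unit uncertainty contributes zero loss.
print(depth_loss([2.0, 5.0], [2.0, 5.0], [1.0, 1.0]))  # → 0.0
```

This term would be added to the usual NeRF photometric loss during training; the exact weighting in the patented method may differ.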
  • Publication number: 20240171724
    Abstract: The present disclosure provides neural fields for sparse novel view synthesis of outdoor scenes. Given just a single or a few input images from a novel scene, the disclosed technology can render new 360° views of complex unbounded outdoor scenes. This can be achieved by constructing an image-conditional triplanar representation to model the 3D surrounding from various perspectives. The disclosed technology can generalize across novel scenes and viewpoints for complex 360° outdoor scenes.
    Type: Application
    Filed: October 16, 2023
    Publication date: May 23, 2024
    Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: MUHAMMAD ZUBAIR IRSHAD, SERGEY ZAKHAROV, KATHERINE Y. LIU, VITOR GUIZILINI, THOMAS KOLLAR, ADRIEN D. GAIDON, RARES A. AMBRUS
  • Publication number: 20240161471
    Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating, through training, a shared latent space based on (i) image data that include multiple images, where each image has a different viewing frame of a scene, and (ii) first and second types of embeddings, and training a decoder based on the first type of embeddings. The method also includes generating an embedding based on the first type of embeddings that is representative of a novel viewing frame of the scene, decoding, with the decoder, the shared latent space using cross-attention with the generated embedding, and generating the novel viewing frame of the scene based on an output of the decoder.
    Type: Application
    Filed: August 3, 2023
    Publication date: May 16, 2024
    Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
    Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
  • Publication number: 20240161510
    Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes training a shared latent space and a first decoder based on first image data that includes multiple images, and training the shared latent space and a second decoder based on second image data that includes multiple images. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the first scene. Further, the method includes decoding, with the first decoder, the shared latent space with the volumetric embedding, and generating the novel viewing frame of the first scene based on the output of the first decoder.
    Type: Application
    Filed: August 3, 2023
    Publication date: May 16, 2024
    Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
    Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
  • Publication number: 20240161389
    Abstract: Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the scene. The method includes decoding, with the decoder, the latent space using cross-attention with the volumetric embedding, and generating a novel viewing frame of the scene based on an output of the decoder.
    Type: Application
    Filed: August 3, 2023
    Publication date: May 16, 2024
    Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
    Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
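The cross-attention decoding shared by the three publications above can be sketched single-headed and without learned projections: the queries play the role of novel-view embeddings, while keys and values come from the shared latent space. A minimal sketch, not the patented architecture:

```python
import math

def cross_attention(queries, keys, values):
    """Single-head cross-attention (toy, no learned projections):
    each query attends over the keys and averages the value vectors."""
    out = []
    for q in queries:
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in keys]
        # Softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

# A query aligned with the first key attends mostly to the first value.
out = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
print(out)
```

In the patented methods the queries would be volumetric or generated embeddings for the novel viewing frame, and the attended output would feed the image decoder.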
  • Publication number: 20240153197
    Abstract: An example method includes generating embeddings of image data that includes multiple images, where each image has a different viewpoint of a scene, generating a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint, for each viewpoint in the image data, determining a volumetric rendering view synthesis loss and a multi-view photometric loss, and applying an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
    Type: Application
    Filed: August 3, 2023
    Publication date: May 9, 2024
    Applicants: Toyota Research Institute, Inc., Massachusetts Institute of Technology, Toyota Jidosha Kabushiki Kaisha
    Inventors: Vitor Guizilini, Rares A. Ambrus, Jiading Fang, Sergey Zakharov, Vincent Sitzmann, Igor Vasiljevic, Adrien Gaidon
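The stopping criterion in this abstract — optimize until both losses fall within their thresholds — is easy to sketch. The geometric-decay `step` function below is a toy stand-in for a real optimization step returning the two losses.

```python
def train(step_fn, vol_threshold, photo_threshold, max_epochs=100):
    """Run optimization epochs until both losses fall within their thresholds."""
    for epoch in range(max_epochs):
        vol_loss, photo_loss = step_fn(epoch)
        if vol_loss <= vol_threshold and photo_loss <= photo_threshold:
            return epoch, vol_loss, photo_loss
    return max_epochs, vol_loss, photo_loss

# Stand-in step: both losses halve each epoch.
step = lambda e: (1.0 * 0.5 ** e, 2.0 * 0.5 ** e)
print(train(step, vol_threshold=0.1, photo_threshold=0.1))
# → (5, 0.03125, 0.0625)
```

In practice each step would render views, compute the volumetric view-synthesis and multi-view photometric losses, and backpropagate into the latent space and decoder.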
  • Publication number: 20240135721
    Abstract: A method for improving 3D object detection via object-level augmentations is described. The method includes recognizing, using an image recognition model of a differentiable data generation pipeline, an object in an image of a scene. The method also includes generating, using a 3D reconstruction model, a 3D reconstruction of the scene from the image including the recognized object. The method further includes manipulating, using an object level augmentation model, a random property of the object by a random magnitude at an object level to determine a set of properties and a set of magnitudes of an object manipulation that maximizes a loss function of the image recognition model. The method also includes training a downstream task network based on a set of training data generated based on the set of properties and the set of magnitudes of the object manipulation, such that the loss function is minimized.
    Type: Application
    Filed: October 12, 2022
    Publication date: April 25, 2024
    Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Rares Andrei AMBRUS, Sergey ZAKHAROV, Vitor GUIZILINI, Adrien David GAIDON
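The adversarial search described above — find the object manipulation that maximizes the recognizer's loss, then train on it — can be sketched as follows. The abstract samples properties and magnitudes randomly; an exhaustive grid is used here for clarity, and the loss function is a toy stand-in.

```python
def worst_case_augmentation(obj, properties, magnitudes, loss_fn):
    """Search (property, magnitude) pairs for the object manipulation that
    maximizes the image recognition model's loss."""
    return max(((p, m) for p in properties for m in magnitudes),
               key=lambda pm: loss_fn(obj, *pm))

# Toy loss: only rotating this object confuses the recognizer.
loss_fn = lambda obj, prop, mag: mag if prop == "rotation" else 0.0
best = worst_case_augmentation("car", ["rotation", "scale"], [0.1, 0.5], loss_fn)
print(best)  # → ('rotation', 0.5)
```

Training the downstream network on the images rendered with these worst-case manipulations is what drives the loss back down, as the abstract describes.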
  • Publication number: 20240104774
    Abstract: Various embodiments include a pose estimation method for refining an initial multi-dimensional pose of an object of interest to generate a refined multi-dimensional object pose Tpr(NL) with NL ≥ 1. The method may include: providing the initial object pose Tpr(0) and at least one 2D-3D correspondence map (one per view i, with i = 1, . . . , I and I ≥ 1); and estimating the refined object pose Tpr(NL) using an iterative optimization of a loss, according to a given loss function LF(k), based on discrepancies between the one or more provided 2D-3D correspondence maps and the one or more respective rendered 2D-3D correspondence maps at iteration k.
    Type: Application
    Filed: December 9, 2021
    Publication date: March 28, 2024
    Applicant: Siemens Aktiengesellschaft
    Inventors: Slobodan Ilic, Ivan Shugurov, Sergey Zakharov, Ivan Pavlov
  • Patent number: 11915451
    Abstract: A method and a system for object detection and pose estimation within an input image. A 6-degree-of-freedom object detection and pose estimation is performed using a trained encoder-decoder convolutional artificial neural network including an encoder head, an ID mask decoder head, a first correspondence color channel decoder head and a second correspondence color channel decoder head. The ID mask decoder head creates an ID mask for identifying objects, and the color channel decoder heads are used to create a 2D-to-3D-correspondence map. For at least one object identified by the ID mask, a pose estimation based on the generated 2D-to-3D-correspondence map and on a pre-generated bijective association of points of the object with unique value combinations in the first and the second correspondence color channels is generated.
    Type: Grant
    Filed: January 17, 2020
    Date of Patent: February 27, 2024
    Assignee: Siemens Aktiengesellschaft
    Inventors: Ivan Shugurov, Andreas Hutter, Sergey Zakharov, Slobodan Ilic
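The core lookup step in this patent — using the ID mask to select object pixels, then mapping each pixel's pair of correspondence-color values back to a unique 3D model point — can be sketched directly. The data layout below is a hypothetical simplification (nested lists, a dict as the bijective table), not the network's actual output format.

```python
def decode_correspondences(id_mask, u_map, v_map, lookup):
    """For each pixel the ID mask marks as the object, recover the 3D model
    point from the (u, v) correspondence-color pair via a bijective table."""
    matches = []
    for y, row in enumerate(id_mask):
        for x, is_obj in enumerate(row):
            if is_obj:
                key = (u_map[y][x], v_map[y][x])
                if key in lookup:
                    matches.append(((x, y), lookup[key]))
    return matches

# Toy 2x2 image: one object pixel whose colors map to model point (0.1, 0.2, 0.3).
id_mask = [[0, 1], [0, 0]]
u_map = [[0, 10], [0, 0]]
v_map = [[0, 20], [0, 0]]
lookup = {(10, 20): (0.1, 0.2, 0.3)}
print(decode_correspondences(id_mask, u_map, v_map, lookup))
# → [((1, 0), (0.1, 0.2, 0.3))]
```

The recovered 2D-3D matches would then feed a pose estimator (e.g., a PnP solver) to produce the 6-degree-of-freedom pose.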
  • Patent number: 11887248
    Abstract: Systems and methods described herein relate to reconstructing a scene in three dimensions from a two-dimensional image. One embodiment processes an image using a detection transformer to detect an object in the scene and to generate a NOCS map of the object and a background depth map; uses MLPs to relate the object to a differentiable database of object priors (PriorDB); recovers, from the NOCS map, a partial 3D object shape; estimates an initial object pose; fits a PriorDB object prior to align in geometry and appearance with the partial 3D shape to produce a complete shape and refines the initial pose estimate; generates an editable and re-renderable 3D scene reconstruction based, at least in part, on the complete shape, the refined pose estimate, and the depth map; and controls the operation of a robot based, at least in part, on the editable and re-renderable 3D scene reconstruction.
    Type: Grant
    Filed: March 16, 2022
    Date of Patent: January 30, 2024
    Assignees: Toyota Research Institute, Inc., Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University
    Inventors: Sergey Zakharov, Wadim Kehl, Vitor Guizilini, Adrien David Gaidon, Rares A. Ambrus, Dennis Park, Joshua Tenenbaum, Jiajun Wu, Fredo Durand, Vincent Sitzmann
  • Publication number: 20240028792
    Abstract: The disclosure provides implicit representations for multi-object 3D shape, 6D pose and size, and appearance optimization, including obtaining shape, 6D pose and size, and appearance codes. Training is employed using shape and appearance priors from an implicit joint differential database. 2D masks are also obtained and are used in an optimization process that utilizes a combined loss minimizing function and an Octree-based coarse-to-fine differentiable optimization to jointly optimize the latest shape, appearance, pose and size, and 2D masks. An object surface is recovered from the latest shape codes to a desired resolution level. The database represents shapes as Signed Distance Fields (SDF), and appearance as Texture Fields (TF).
    Type: Application
    Filed: July 19, 2022
    Publication date: January 25, 2024
    Applicants: TOYOTA RESEARCH INSTITUTE, INC., TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: MUHAMMAD ZUBAIR IRSHAD, Sergey Zakharov, Rares A. Ambrus, Adrien D. Gaidon
  • Publication number: 20240013409
    Abstract: A method for multiple object tracking includes receiving, with a computing device, a point cloud dataset, detecting one or more objects in the point cloud dataset, each of the detected one or more objects defined by points of the point cloud dataset and a bounding box, querying one or more historical tracklets for historical tracklet states corresponding to each of the one or more detected objects, implementing a 4D encoding backbone comprising two branches: a first branch configured to compute per-point features for each of the one or more objects and the corresponding historical tracklet states, and a second branch configured to obtain 4D point features, concatenating the per-point features and the 4D point features, and predicting, with a decoder receiving the concatenated per-point features, current tracklet states for each of the one or more objects.
    Type: Application
    Filed: May 26, 2023
    Publication date: January 11, 2024
    Applicants: Toyota Research Institute, Inc., Toyota Jidosha Kabushiki Kaisha, The Board of Trustees of the Leland Stanford Junior University
    Inventors: Colton Stearns, Jie Li, Rares A. Ambrus, Vitor Campagnolo Guizilini, Sergey Zakharov, Adrien D. Gaidon, Davis Rempe, Tolga Birdal, Leonidas J. Guibas
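The two-branch 4D encoding in the last abstract — per-point features conditioned on the historical tracklet state in one branch, 4D (space + time) point features in the other, concatenated per point — can be sketched with toy branches. All names and feature choices here are illustrative, not the patented backbone.

```python
def branch_per_point(points, tracklet_state):
    """First branch (toy): per-point features relative to the tracklet center."""
    cx, cy, cz = tracklet_state["center"]
    return [[x - cx, y - cy, z - cz] for x, y, z in points]

def branch_4d(points, t):
    """Second branch (toy): append the timestamp to obtain 4D point features."""
    return [[x, y, z, t] for x, y, z in points]

def encode(points, tracklet_state, t):
    """Concatenate the two branches' outputs per point, as in the abstract."""
    return [a + b for a, b in zip(branch_per_point(points, tracklet_state),
                                  branch_4d(points, t))]

pts = [(1.0, 2.0, 3.0)]
state = {"center": (1.0, 2.0, 3.0)}
print(encode(pts, state, t=0.5))  # → [[0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.5]]
```

A decoder consuming these concatenated features would then predict the current tracklet state for each detected object.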