Patents by Inventor Peter Vajda
Peter Vajda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260100204
Abstract: A method to generate synchronized audio for a video includes receiving the video including a sequence of frames and receiving a text input describing at least one of a scene, an event, or a mood to be reflected in an audio track. The method also includes generating a latent audio representation via an audio generation model conditioned jointly on video embeddings associated with the sequence of frames and text embeddings associated with the text input. The method also includes decoding the latent audio representation to produce an audio track temporally aligned with the video and semantically consistent with the text input.
Type: Application
Filed: October 2, 2025
Publication date: April 9, 2026
Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou
-
Publication number: 20260100203
Abstract: A method to edit a video includes receiving an input video including a sequence of frames and receiving an editing instruction expressed in natural language. The method also includes generating a multimodal condition based on the textual editing instruction and the input video. The multimodal condition may include an embedding of the input video concatenated with an embedding of the textual editing instruction. The method also includes applying, via a video editing model, the multimodal condition to modify visual content of the input video. The method further includes generating an edited video including visual modifications corresponding to the textual editing instruction. The edited video preserves temporal coherence and overall visual fidelity of the input video.
Type: Application
Filed: October 2, 2025
Publication date: April 9, 2026
Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou
-
Publication number: 20260101081
Abstract: A system and method to generate a video are provided. The method may include generating, based on a user input including a description of a desired video, a structured script including one or more of scene descriptions, dialogue, or explicit shot-level information. The method also includes generating, based on the structured script, a sequence of video frames representing one or more scenes. The method further includes generating, based on the structured script and the sequence of video frames, an audio track including one or more of ambient sounds, sound effects, or music. The generated audio track is temporally synchronized with the sequence of video frames. The method also includes combining the sequence of video frames with the audio track to generate a synchronized video output representing the desired video.
Type: Application
Filed: October 2, 2025
Publication date: April 9, 2026
Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou
-
Publication number: 20260099978
Abstract: A method to generate a video includes receiving an input (a text prompt) describing a scene. The method also includes receiving a reference image depicting a character. The method further includes generating, via an encoder, embeddings of identity features of the reference image. The method also includes generating, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.
Type: Application
Filed: October 2, 2025
Publication date: April 9, 2026
Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou
-
Publication number: 20260004489
Abstract: A system and method to generate a target image from a reference image are provided. The system may receive, via an LDM, a reference image and a text prompt. The system may extract, via a trained vision encoder in the LDM, a vision control signal from an object in the reference image. The vision control signal indicates an identity of the object. The system may extract, via trained text encoders in the LDM, text control signals associated with the text prompt. The system may generate spatial features indicative of the reference image and the text prompt via cross attention summation, that is, by summing an output of one or more vision cross attention units associated with the vision control signal and an output of text cross attention units associated with the text control signals. The system may output, via a decoder in communication with the LDM, a target image based on the generated spatial features.
Type: Application
Filed: May 20, 2025
Publication date: January 1, 2026
Inventors: Zecheng He, Bo Sun, Juefei Xu, Animesh Sinha, Roshan Rajesh Sumbaly, Ning Zhang, Peizhao Zhang, Ankit Rajesh Ramchandani, Peter Vajda, Vincent Charles Cheung, Haoyu Ma
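The cross attention summation step can be illustrated with a minimal sketch. It assumes standard scaled dot-product attention: the same spatial-feature queries attend separately to text tokens and to vision (identity) tokens, and the two outputs are added. LDM is assumed here to mean latent diffusion model. Learned query/key/value projections are omitted for brevity, and `summed_cross_attention` and its `vision_scale` parameter are hypothetical names that do not come from the filing.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def summed_cross_attention(queries, text_kv, vision_kv, vision_scale=1.0):
    """Add the text and vision cross-attention outputs computed for the same queries.

    text_kv and vision_kv are (keys, values) pairs. vision_scale is a hypothetical
    weight on the identity signal; the abstract does not specify one.
    """
    text_out = attention(queries, *text_kv)
    vision_out = attention(queries, *vision_kv)
    return [[t + vision_scale * v for t, v in zip(t_row, v_row)]
            for t_row, v_row in zip(text_out, vision_out)]
```

Summing (rather than concatenating) the two attention outputs keeps the spatial feature width unchanged, so the vision branch can be added to an existing text-conditioned model without altering downstream layer shapes.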
-
Patent number: 11748615
Abstract: Computer implemented systems are described that implement a differentiable neural architecture search (DNAS) engine executing on one or more processors. The DNAS engine is configured with a stochastic super net defining a layer-wise search space having a plurality of candidate layers, each of the candidate layers specifying one or more operators for a neural network architecture. Further, the DNAS engine is configured to process training data to train weights for the operators in the stochastic super net based on a loss function representing a latency of the respective operator on a target platform, and to select a set of candidate neural network architectures from the trained stochastic super net. The DNAS engine may, for example, be configured to train the stochastic super net by traversing the layer-wise search space using gradient-based optimization of the network architecture distribution.
Type: Grant
Filed: December 5, 2019
Date of Patent: September 5, 2023
Assignee: META PLATFORMS, INC.
Inventors: Bichen Wu, Peizhao Zhang, Peter Vajda, Xiaoliang Dai, Yanghan Wang, Yuandong Tian
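The latency-aware search in this abstract can be sketched as follows, assuming a Gumbel-softmax relaxation over each layer's candidate operators (a common choice for differentiable architecture search, not something the abstract names). The operator names, latency table, and the additive form of `latency_aware_loss` are all hypothetical. The sketch covers only the forward computation; in practice an autodiff framework would propagate gradients of the loss to both the operator weights and the architecture logits.

```python
import math
import random

# Hypothetical candidate operators shared by every searchable layer.
CANDIDATE_OPS = ["k3_e1", "k3_e6", "k5_e6", "skip"]


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample over one layer's candidate operators."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(layer_weights, latency_table):
    """Per layer, average operator latencies by the sampled weights; sum over layers."""
    return sum(
        sum(w * lat for w, lat in zip(weights, lats))
        for weights, lats in zip(layer_weights, latency_table)
    )


def latency_aware_loss(task_loss, latency_ms, latency_weight=0.01):
    """Illustrative additive latency penalty; the abstract does not fix this form."""
    return task_loss + latency_weight * latency_ms


def derive_architecture(arch_logits):
    """Keep the most probable operator in each layer of the trained super net."""
    return [CANDIDATE_OPS[max(range(len(logits)), key=logits.__getitem__)]
            for logits in arch_logits]


rng = random.Random(0)
arch_logits = [[0.1, 1.2, 0.3, -0.5], [2.0, 0.0, 0.1, 0.4]]
# Hypothetical per-operator latencies (ms) measured on a target device.
latency_table = [[1.0, 2.5, 3.1, 0.1], [0.9, 2.2, 2.8, 0.1]]
weights = [gumbel_softmax(logits, temperature=5.0, rng=rng) for logits in arch_logits]
loss = latency_aware_loss(task_loss=1.3, latency_ms=expected_latency(weights, latency_table))
print(derive_architecture(arch_logits))  # ['k3_e6', 'k3_e1']
```

Because the expected latency is a weighted sum of table lookups, it stays differentiable with respect to the sampled weights even though a measured latency on real hardware is not.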
-
Patent number: 11210854
Abstract: Systems, methods, and non-transitory computer readable media can determine a placement in a camera view for displaying an augmented reality (AR) advertisement, where the camera view is associated with a computing device. An AR advertisement for a user associated with the computing device can be determined based on attributes associated with the user. Display of the AR advertisement can be caused at the determined placement in the camera view.
Type: Grant
Filed: December 20, 2017
Date of Patent: December 28, 2021
Assignee: Facebook, Inc.
Inventors: John Samuel Barnett, Dantley Davis, Congxi Lu, Jonathan Morton, Peter Vajda, Joshua Charles Harris
-
Patent number: 11170470
Abstract: Techniques are described for content-adaptive downsampling of digital images and videos for computer vision operations, such as semantic segmentation. A computer vision system comprises a memory, one or more processors operably coupled to the memory, and a downsampling module configured for execution by the one or more processors to downsample input image data, based on a non-uniform sampling model trained to predict content-aware sampling parameters, to generate downsampled image data. A segmentation module is configured for execution by the one or more processors to perform segmentation on the downsampled image data to produce a segmentation result, such as a feature map that assigns pixels of the downsampled image data to object classes. An upsampling module is configured for execution by the one or more processors to perform upsampling according to the segmentation result to produce upsampled image data.
Type: Grant
Filed: December 6, 2019
Date of Patent: November 9, 2021
Assignee: Facebook, Inc.
Inventors: Zijian He, Peter Vajda, Priyam Chatterjee, Shanghsuan Tsai, Dmitrii Marin
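The downsample, segment, upsample pipeline can be illustrated with a toy sketch. It assumes a separable sampling grid (one list of row coordinates and one of column coordinates), nearest-neighbour gathering, and a threshold in place of the segmentation network; the trained non-uniform sampling model is replaced by hand-picked coordinates. All function names here are hypothetical. The point is only that placing samples densely near an object boundary lets a low-resolution segmentation recover the boundary more accurately after upsampling.

```python
from bisect import bisect_left


def downsample(image, rows, cols):
    """Gather pixels at (possibly non-uniform) sampling coordinates."""
    return [[image[r][c] for c in cols] for r in rows]


def nearest_index(coords, x):
    """Index of the sampling coordinate in sorted `coords` closest to x."""
    i = bisect_left(coords, x)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    return i if coords[i] - x < x - coords[i - 1] else i - 1


def upsample_labels(labels, rows, cols, height, width):
    """Give every full-resolution pixel the label of its nearest sampled location."""
    row_idx = [nearest_index(rows, y) for y in range(height)]
    col_idx = [nearest_index(cols, x) for x in range(width)]
    return [[labels[ri][ci] for ci in col_idx] for ri in row_idx]


def segment(patch, threshold=128):
    """Hypothetical stand-in for the segmentation network: threshold intensities."""
    return [[1 if v >= threshold else 0 for v in row] for row in patch]


# 4x8 image with an object occupying columns 5-7.
image = [[255 if c >= 5 else 0 for c in range(8)] for _ in range(4)]
rows = [0, 2]
uniform_cols = [0, 2, 4, 6]
adaptive_cols = [1, 4, 5, 7]  # denser near the boundary, as a trained model might predict


def run(cols):
    labels = segment(downsample(image, rows, cols))
    return upsample_labels(labels, rows, cols, height=4, width=8)


print(run(uniform_cols)[0])   # [0, 0, 0, 0, 0, 0, 1, 1]
print(run(adaptive_cols)[0])  # [0, 0, 0, 0, 0, 1, 1, 1]
```

With the same four samples per row, the uniform grid mislabels column 5, while the boundary-aware grid reproduces the ground truth exactly.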
-
Patent number: 11030440
Abstract: Systems, methods, and non-transitory computer-readable media can identify a first user depicted in image content captured by a second user. It is determined that the first user should be obscured in the image content based on privacy settings. The image content is modified to obscure the first user.
Type: Grant
Filed: December 20, 2017
Date of Patent: June 8, 2021
Assignee: Facebook, Inc.
Inventors: John Samuel Barnett, Dantley Davis, Congxi Lu, Jonathan Morton, Peter Vajda, Joshua Charles Harris
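A minimal sketch of the privacy check and the image modification, assuming the first user has already been identified and localized by an upstream recognizer, and that "obscure" means pixelating the user's bounding box. The `Detection` type, the `hide_in_others_content` setting key, and the pixelation choice are hypothetical; the abstract names none of them.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    user_id: str
    box: tuple  # (top, left, bottom, right); bottom/right exclusive


def should_obscure(user_id, capturing_user_id, privacy_settings):
    """Hypothetical rule: obscure anyone other than the capturer who opted out."""
    if user_id == capturing_user_id:
        return False
    return privacy_settings.get(user_id, {}).get("hide_in_others_content", False)


def pixelate(image, box, block=2):
    """Replace each block x block tile inside `box` with its mean intensity."""
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def obscure_users(image, detections, capturing_user_id, privacy_settings):
    """Return a copy of `image` with every opted-out, non-capturing user pixelated."""
    for det in detections:
        if should_obscure(det.user_id, capturing_user_id, privacy_settings):
            image = pixelate(image, det.box)
    return image
```

The original image is never mutated, so the unmodified capture could still be shown to the capturing user if a platform chose to allow that.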
-
Patent number: 10796452
Abstract: In one embodiment, a system accesses a probability model associated with an image depicting a body. The probability model includes probability values associated with regions of the image and each probability value represents a probability of the associated region of the image containing a particular body part. The system selects a subset (e.g., 3) of the probability values based on a comparison of the probability values. For each selected probability value, the system identifies surrounding probability values surrounding the selected probability value and computes a probabilistic maximum based on the selected probability value and the surrounding probability values. Each probabilistic maximum is associated with a location within the regions associated with the selected probability value and the surrounding probability values.
Type: Grant
Filed: December 31, 2018
Date of Patent: October 6, 2020
Assignee: Facebook, Inc.
Inventors: Peter Vajda, Peizhao Zhang, Matthieu Tony Uyttendaele, Yanghan Wang
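One plausible reading of the "probabilistic maximum" is a probability-weighted centroid over a selected heatmap cell and its neighbours, which yields a sub-cell location. The sketch below assumes that reading, uses a plain top-k for the "comparison of the probability values", and uses a 3x3 neighbourhood; these choices and the function names are assumptions, not details taken from the patent. Note that a plain top-k can select several cells belonging to the same peak; the abstract does not say how that case is handled.

```python
def top_k_cells(heatmap, k=3):
    """Grid cells with the k highest probabilities, as (row, col)."""
    cells = [(p, r, c) for r, row in enumerate(heatmap) for c, p in enumerate(row)]
    cells.sort(reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


def probabilistic_maximum(heatmap, r, c, radius=1):
    """Probability-weighted centroid of cell (r, c) and its neighbours (sub-cell location)."""
    h, w = len(heatmap), len(heatmap[0])
    total = sum_y = sum_x = 0.0
    for y in range(max(0, r - radius), min(h, r + radius + 1)):
        for x in range(max(0, c - radius), min(w, c + radius + 1)):
            p = heatmap[y][x]
            total += p
            sum_y += p * y
            sum_x += p * x
    if total == 0.0:
        return (float(r), float(c))  # no mass nearby: fall back to the cell itself
    return (sum_y / total, sum_x / total)
```

For a peak of 0.6 at (2, 2) with 0.2 at (2, 3) and 0.2 at (1, 2), the refined location is about (1.8, 2.2), pulled toward the neighbouring mass rather than snapped to the grid.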
-
Patent number: 10733431
Abstract: In one embodiment, a system may access first, second, and third probability models that are respectively associated with predetermined first and second body parts and a predetermined segment connecting the first and second body parts. Each model includes probability values associated with regions in an image, with each value representing the probability of the associated region containing the associated body part or segment. The system may select first and second regions based on the first probability model and a third region based on the second probability model. Based on the third probability model, the system may compute a first probability score for regions connecting the first and third regions and a second probability score for regions connecting the second and third regions. Based on the first and second probability scores, the system may select the first region to indicate where the predetermined first body part appears in the image.
Type: Grant
Filed: December 31, 2018
Date of Patent: August 4, 2020
Assignee: Facebook, Inc.
Inventors: Peizhao Zhang, Peter Vajda, Kevin Matzen, Ross Girshick
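The disambiguation step can be sketched by scoring each candidate connection as the mean segment probability sampled along the straight line between the two regions. Straight-line sampling, the sample count, and the wrist/elbow labels are assumptions for illustration; the abstract only says that scores are computed for regions connecting the candidates, based on the segment model.

```python
def line_score(segment_map, p, q, samples=10):
    """Mean segment probability at `samples` (>= 2) evenly spaced points on the line p -> q."""
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        y = round(p[0] + t * (q[0] - p[0]))
        x = round(p[1] + t * (q[1] - p[1]))
        total += segment_map[y][x]
    return total / samples


def pick_region(candidates, anchor, segment_map):
    """Choose the candidate whose connection to `anchor` the segment model supports best."""
    return max(candidates, key=lambda cand: line_score(segment_map, cand, anchor))


# Hypothetical 5x5 segment map: the limb runs along row 2.
segment_map = [[0.9 if r == 2 else 0.1 for _ in range(5)] for r in range(5)]
elbow = (2, 4)                       # third region, from the second body part's model
wrist_candidates = [(2, 0), (0, 0)]  # first and second regions, from the first model
print(pick_region(wrist_candidates, elbow, segment_map))  # (2, 0)
```

Both wrist candidates might look equally likely on their own heatmap; the segment model breaks the tie because only one of them is connected to the elbow by a high-probability limb.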
-
Patent number: 10692243
Abstract: In one embodiment, a system may access an image and generate a feature map for the image using a neural network. The system may identify regions of interest in the feature map. Regional feature maps may be generated for the regions of interest, respectively. Each of the regional feature maps has a first, a second, and a third dimension. The system may generate a first combined regional feature map by combining the regional feature maps. The first combined regional feature map has a first, a second, and a third dimension. The system may generate a second combined regional feature map by processing the first combined regional feature map using one or more convolutional layers. The system may generate, for each of the regions of interest, information associated with an object instance based on a portion of the second combined regional feature map associated with that region of interest.
Type: Grant
Filed: May 4, 2018
Date of Patent: June 23, 2020
Assignee: Facebook, Inc.
Inventors: Peter Vajda, Peizhao Zhang, Fei Yang, Yanghan Wang
-
Patent number: 10650072
Abstract: One general aspect includes a method including capturing an image of an object having a multi-part identifier displayed thereon, the multi-part identifier including a first portion and a second portion, the first portion including graphical content and the second portion including human-recognizable textual content. The method also includes, based on the captured image, identifying a domain associated with the graphical content. The method also includes, based on the captured image, identifying a sub-part of the domain associated with the textual content. The method also includes identifying a digital destination based on the identified domain and the identified sub-part. The method also includes performing an action based on the digital destination. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Type: Grant
Filed: October 30, 2017
Date of Patent: May 12, 2020
Assignee: FACEBOOK, INC.
Inventors: Maria Loveva, Matthew William Canton, Peizhao Zhang, Shihang Wei, Shen Wang, Peter Vajda, Han Wang
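The final resolution step (domain from the graphic, sub-part from the text, combined into a destination) might look like the sketch below. It assumes the graphic has already been classified and the text already read by OCR, and that the destination is a URL path. The `DOMAIN_BY_GRAPHIC` registry, the `logo:acme` identifier, and the `acme.example` domain are hypothetical.

```python
from urllib.parse import quote

# Hypothetical registry mapping recognized graphical content (e.g. a logo class) to a domain.
DOMAIN_BY_GRAPHIC = {"logo:acme": "acme.example"}


def resolve_destination(graphic_id, text):
    """Combine the domain (from the graphic) with the sub-part (from the text) into a URL."""
    domain = DOMAIN_BY_GRAPHIC.get(graphic_id)
    if domain is None:
        return None  # unrecognized graphic: no destination
    sub_part = " ".join(text.split()).lower()  # normalise whitespace and case from OCR
    return f"https://{domain}/{quote(sub_part)}"


print(resolve_destination("logo:acme", "  Joes   Pizza "))  # https://acme.example/joes%20pizza
```

Keeping the human-readable text as the sub-part means the same graphic can front many destinations, while the graphic alone narrows the lookup to one namespace.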
-
Patent number: 10586350
Abstract: In one embodiment, a system accesses pose probability models for predetermined parts of a body depicted in an image. Each of the pose probability models is configured for determining a probability of the associated predetermined body part being at a location in the image. The system determines a candidate pose that is defined by a set of coordinates representing candidate locations of the predetermined body parts. The system further determines a first probability score for the candidate pose based on the pose probability models and the set of coordinates of the candidate pose. A pose representation is generated for the candidate pose using a transformation model and the candidate pose. The system determines a second probability score for the pose representation based on a pose-representation probability model. The system selects the candidate pose to represent a pose of the body based on at least the first and second probability scores.
Type: Grant
Filed: May 4, 2018
Date of Patent: March 10, 2020
Assignee: Facebook, Inc.
Inventors: Peter Vajda, Peizhao Zhang, Fei Yang, Yanghan Wang
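A toy version of the two-score selection is sketched below. Several choices are assumptions rather than details from the abstract: the first score is a sum of log heatmap probabilities, the "transformation model" is a simple centre-and-scale normalization, the pose-representation model is an isotropic Gaussian around a hypothetical mean pose, and the two scores are combined by addition.

```python
import math


def heatmap_score(heatmaps, pose):
    """First score: sum of log-probabilities of each body part at its candidate (row, col)."""
    return sum(math.log(max(hm[y][x], 1e-9)) for hm, (y, x) in zip(heatmaps, pose))


def normalize_pose(pose):
    """Stand-in transformation model: centre the pose and scale it to unit radius."""
    cy = sum(y for y, _ in pose) / len(pose)
    cx = sum(x for _, x in pose) / len(pose)
    centred = [(y - cy, x - cx) for y, x in pose]
    scale = max(math.hypot(y, x) for y, x in centred) or 1.0
    return [(y / scale, x / scale) for y, x in centred]


def prior_score(representation, mean_pose, sigma=0.3):
    """Second score: isotropic Gaussian log-likelihood (up to a constant) around a mean pose."""
    sq = sum((y - my) ** 2 + (x - mx) ** 2
             for (y, x), (my, mx) in zip(representation, mean_pose))
    return -sq / (2 * sigma ** 2)


def select_pose(candidates, heatmaps, mean_pose):
    """Pick the candidate maximizing the sum of both log-scores."""
    return max(candidates,
               key=lambda pose: heatmap_score(heatmaps, pose)
               + prior_score(normalize_pose(pose), mean_pose))


# Three parts (head, hip, foot) with uninformative heatmaps, so the prior decides.
heatmaps = [[[0.5] * 5 for _ in range(5)] for _ in range(3)]
mean_pose = normalize_pose([(0, 2), (2, 2), (4, 2)])  # hypothetical upright template
upright = [(0, 1), (2, 1), (4, 1)]
bent = [(0, 0), (2, 2), (0, 4)]
print(select_pose([bent, upright], heatmaps, mean_pose))  # [(0, 1), (2, 1), (4, 1)]
```

Normalizing before applying the prior makes the second score insensitive to where the body sits in the frame and how large it appears, so it judges only the shape of the pose.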
-
Patent number: 10565729
Abstract: In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image, and generate a target region definition associated with a location of the person using the regional feature map. Based on the target region definition associated with the location of the person, a target regional feature map may be generated by sampling the feature map for the image. The system may process the target regional feature map to generate a keypoint mask and an instance segmentation mask.
Type: Grant
Filed: May 4, 2018
Date of Patent: February 18, 2020
Assignee: Facebook, Inc.
Inventors: Peter Vajda, Peizhao Zhang, Fei Yang, Yanghan Wang
-
Patent number: 10452898
Abstract: Systems, methods, and non-transitory computer-readable media can identify one or more objects depicted in a camera view of a camera application displayed on a display of a user device. An augmented reality overlay is determined based on the one or more objects identified in the camera view. The camera view is modified based on the augmented reality overlay.
Type: Grant
Filed: February 1, 2019
Date of Patent: October 22, 2019
Assignee: Facebook, Inc.
Inventors: John Samuel Barnett, Dantley Davis, Congxi Lu, Jonathan Morton, Peter Vajda, Joshua Charles Harris
-
Publication number: 20190171903
Abstract: In one embodiment, a system may access an image and generate a feature map for the image using a neural network. The system may identify regions of interest in the feature map. Regional feature maps may be generated for the regions of interest, respectively. Each of the regional feature maps has a first, a second, and a third dimension. The system may generate a first combined regional feature map by combining the regional feature maps. The first combined regional feature map has a first, a second, and a third dimension. The system may generate a second combined regional feature map by processing the first combined regional feature map using one or more convolutional layers. The system may generate, for each of the regions of interest, information associated with an object instance based on a portion of the second combined regional feature map associated with that region of interest.
Type: Application
Filed: May 4, 2018
Publication date: June 6, 2019
Inventors: Peter Vajda, Peizhao Zhang, Fei Yang, Yanghan Wang
-
Publication number: 20190171870
Abstract: In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image, and generate a target region definition associated with a location of the person using the regional feature map. Based on the target region definition associated with the location of the person, a target regional feature map may be generated by sampling the feature map for the image. The system may process the target regional feature map to generate a keypoint mask and an instance segmentation mask.
Type: Application
Filed: May 4, 2018
Publication date: June 6, 2019
Inventors: Peter Vajda, Peizhao Zhang, Fei Yang, Yanghan Wang
-
Publication number: 20190172224
Abstract: In one embodiment, a system accesses a probability model associated with an image depicting a body. The probability model includes probability values associated with regions of the image and each probability value represents a probability of the associated region of the image containing a particular body part. The system selects a subset (e.g., 3) of the probability values based on a comparison of the probability values. For each selected probability value, the system identifies surrounding probability values surrounding the selected probability value and computes a probabilistic maximum based on the selected probability value and the surrounding probability values. Each probabilistic maximum is associated with a location within the regions associated with the selected probability value and the surrounding probability values.
Type: Application
Filed: December 31, 2018
Publication date: June 6, 2019
Inventors: Peter Vajda, Peizhao Zhang, Matthieu Tony Uyttendaele, Yanghan Wang
-
Publication number: 20190171867
Abstract: Systems, methods, and non-transitory computer-readable media can identify one or more objects depicted in a camera view of a camera application displayed on a display of a user device. An augmented reality overlay is determined based on the one or more objects identified in the camera view. The camera view is modified based on the augmented reality overlay.
Type: Application
Filed: February 1, 2019
Publication date: June 6, 2019
Inventors: John Samuel Barnett, Dantley Davis, Congxi Lu, Jonathan Morton, Peter Vajda, Joshua Charles Harris