Patents by Inventor Adam Polyak

Adam Polyak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR IMAGE EDITING VIA RECOGNITION AND GENERATION TASKS

Publication number: 20260127792

Abstract: Methods and systems are provided to edit or update images or videos based on instructions. A system may analyze an input image and may determine an instruction associated with the input image. The instruction may include content to edit or update the input image. The system may select an edit task, among predetermined edit tasks associated with changes to images, based on a description of the content of the instruction. The system may generate an output image, based on implementing the selected edit task, including an update to the input image depicting the description of the content of the instruction.

Type: Application

Filed: November 3, 2025

Publication date: May 7, 2026

Inventors: Adam Polyak, Yuval Kirstain, Yaniv Nechemia Taigman, Shelly Sheynin, Uriel Singer, Amit Zohar, Devi Niru Parikh

SYSTEMS AND METHODS FOR AUTOMATED MOVIE GENERATION AND EDITING

Publication number: 20260100204

Abstract: A method to generate synchronized audio for a video includes receiving the video including a sequence of frames and receiving a text input describing at least one of a scene, an event, or a mood to be reflected in an audio track. The method also includes generating a latent audio representation via an audio generation model conditioned jointly on video embeddings associated with the sequence of frames and text embeddings associated with the text input. The method also includes decoding the latent audio representation to produce an audio track temporally aligned with the video and semantically consistent with the text input.

Type: Application

Filed: October 2, 2025

Publication date: April 9, 2026

Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou

SYSTEMS AND METHODS FOR AUTOMATED MOVIE GENERATION AND EDITING

Publication number: 20260100203

Abstract: A method to edit a video includes receiving an input video including a sequence of frames and receiving an editing instruction expressed in natural language. The method also includes generating a multimodal condition based on the textual editing instruction and the input video. The multimodal condition may include an embedding of the input video concatenated with an embedding of the textual editing instruction. The method also includes applying, via a video editing model, the multimodal condition to modify visual content of the input video. The method further includes generating an edited video including visual modifications corresponding to the textual editing instruction. The edited video preserves temporal coherence and overall visual fidelity of the input video.

Type: Application

Filed: October 2, 2025

Publication date: April 9, 2026

Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou

SYSTEMS AND METHODS FOR AUTOMATED MOVIE GENERATION AND EDITING

Publication number: 20260101081

Abstract: A system and method to generate a video is provided. The method may include generating, based on a user input including a description of a desired video, a structured script including one or more of scene descriptions, dialogue, or explicit shot-level information. The method also includes generating, based on the structured script, a sequence of video frames representing one or more scenes. The method further includes generating, based on the structured script and the sequence of video frames, an audio track including one or more of ambient sounds, sound effects, or music, the generated audio track being temporally synchronized with the sequence of video frames. The method also includes combining the sequence of video frames with the audio track to generate a synchronized video output representing the desired video.

Type: Application

Filed: October 2, 2025

Publication date: April 9, 2026

Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou

SYSTEMS AND METHODS FOR AUTOMATED MOVIE GENERATION AND EDITING

Publication number: 20260099978

Abstract: A method to generate a video includes receiving an input describing a scene. The method also includes receiving a reference image depicting a character. The method further includes generating, via an encoder, embeddings of identity features of the reference image. The method also includes generating, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.

Type: Application

Filed: October 2, 2025

Publication date: April 9, 2026

Inventors: Zecheng He, Samaneh Azadi, Bowen Shi, Apoorv Vyas, Ann Lee, Ishan Satish Misra, Peizhao Zhang, Roshan Rajesh Sumbaly, Yaniv Nechemia Taigman, Peter Vajda, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Amit Zohar, Animesh Sinha, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Matthew Le, Juefei Xu, Haoyu Ma, Tingbo Hou

EQUIPPING MACHINE LEARNING MODELS WITH SOCIAL NETWORK KNOWLEDGE, VIDEO EDITING VIA FACTORIZED DIFFUSION DISTILLATION & EFFICIENT DEPTH STABILIZER FOR MIXED REALITY & AUGMENTED REALITY

Publication number: 20260006286

Abstract: Various systems, methods, and devices are described for utilizing an artificial intelligence (AI) bot (e.g., a chatbot) to fetch or create content associated with a third-party platform based on an input associated with an electronic device. In an example, systems and methods of AI bot fetching or creating content may include receiving an input, via a user device. The input may be textual, audible, or any other suitable method. Based on the input, one or more content items may be fetched or created. A machine learning model may be utilized to determine context associated with the input. The machine learning model may determine a number of content items associated with the input and data sources related to the retrieval generators. A result may be presented to a user, where the result may comprise the one or more content items determined.

Type: Application

Filed: May 20, 2025

Publication date: January 1, 2026

Inventors: Hong Yan, Adam Polyak, Yaniv Nechemia Taigman, Devi Niru Parikh, Rakesh Ranjan, Hao Jiang, Shelly Sheynin, Uriel Singer, Yuval Kirstain, Jingqing Huang, Amit Zohar

Scene-based text-to-image generation with human priors

Patent number: 12387388

Abstract: In one embodiment, a method includes accessing a text input and a scene input corresponding to the text input, wherein the scene input comprises semantic segmentations, generating text tokens for the text input and scene tokens for the scene input by machine-learning models, generating predicted image tokens based on the text tokens and the scene tokens by the machine-learning models, and generating an image corresponding to the text input and the scene input based on the predicted image tokens by the machine-learning models.

Type: Grant

Filed: January 3, 2023

Date of Patent: August 12, 2025

Assignee: Meta Platforms, Inc.

Inventors: Oran Gafni, Adam Polyak, Yaniv Nechemia Taigman

Scene-Based Text-to-Image Generation with Human Priors

Publication number: 20240221235

Abstract: In one embodiment, a method includes accessing a text input and a scene input corresponding to the text input, wherein the scene input comprises semantic segmentations, generating text tokens for the text input and scene tokens for the scene input by machine-learning models, generating predicted image tokens based on the text tokens and the scene tokens by the machine-learning models, and generating an image corresponding to the text input and the scene input based on the predicted image tokens by the machine-learning models.

Type: Application

Filed: January 3, 2023

Publication date: July 4, 2024

Inventors: Oran Gafni, Adam Polyak, Yaniv Nechemia Taigman

TEXT TO VIDEO GENERATION

Publication number: 20240155071

Abstract: A method and system for text-to-video generation. The method includes receiving a text input, generating a representation frame based on the text input using a model trained on text-image pairs, generating a set of frames based on the representation frame and a first frame rate, interpolating the set of frames to a higher frame rate, generating a first video based on the interpolated set of frames, increasing a resolution of the first video based on a first and second super-resolution model, and generating an output video based on a result of the super-resolution models.

Type: Application

Filed: September 29, 2023

Publication date: May 9, 2024

Inventors: Sonal Gupta, Adam Polyak, Thomas Falstad Hayes, Xi Yin, Jie An, Chao Yang, Oron Ashual, Oran Gafni, Devi Niru Parikh, Yaniv Nechemia Taigman, Uriel Singer, Songyang Zhang, Qiyuan Hu

GENERATING AUDIO FILES FROM TEXT INPUT

Publication number: 20240112687

Abstract: Methods, systems, and storage media for generating audio data include receiving a text input. The method also includes receiving a plurality of representative audio sources and encoding the plurality of representative audio sources into a plurality of audio tokens. The method includes encoding the text input into a plurality of text representations. The method comprises mapping each audio token of the plurality of audio tokens to a text representation of the plurality of text representations. The method also comprises determining a relationship score based on mapping each audio token to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens. The method and systems can also comprise decoding the subgroup of audio tokens to yield a reconstructed audio source.

Type: Application

Filed: September 29, 2023

Publication date: April 4, 2024

Inventors: Yaniv Nechemia Taigman, Felix Kruk, Yossef Mordechay Adi, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Devi Niru Parikh, Alexandre Défossez, Jade Copet

Generating a voice model for a user

Patent number: 11430424

Abstract: Disclosed herein are a system, a method, and a device for generating a voice model for a user. A device can include an encoder and a decoder to generate a voice model for converting text to an audio output that resembles a voice of the person sending respective text. The encoder can include a neural network and can receive a plurality of audio samples from a user. The encoder can generate a sequence of values and provide the sequence of values to the decoder. The decoder can establish, using the sequence of values and one or more speaker embeddings of the user, a voice model corresponding to the plurality of audio samples of the user.

Type: Grant

Filed: November 13, 2019

Date of Patent: August 30, 2022

Assignee: Meta Platforms Technologies, LLC

Inventors: Lior Wolf, David Vazquez, Tali Zvi, Yaniv Nechemia Taigman, Adam Polyak, Hyunbin Park

GENERATING A VOICE MODEL FOR A USER

Publication number: 20210142782

Abstract: Disclosed herein are a system, a method, and a device for generating a voice model for a user. A device can include an encoder and a decoder to generate a voice model for converting text to an audio output that resembles a voice of the person sending respective text. The encoder can include a neural network and can receive a plurality of audio samples from a user. The encoder can generate a sequence of values and provide the sequence of values to the decoder. The decoder can establish, using the sequence of values and one or more speaker embeddings of the user, a voice model corresponding to the plurality of audio samples of the user.

Type: Application

Filed: November 13, 2019

Publication date: May 13, 2021

Inventors: Lior Wolf, David Vazquez, Tali Zvi, Yaniv Nechemia Taigman, Adam Polyak, Hyunbin Park