Patents by Inventor Willi Menapace

Willi Menapace has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HIERARCHICAL PATCH-WISE DIFFUSION MODELS FOR HIGH-RESOLUTION VIDEO GENERATION

Publication number: 20260148438

Abstract: Hierarchical patch-wise diffusion models (HPDMs) use a diffusion paradigm that learns a hierarchical distribution of patches instead of whole videos for efficient patch-wise training of diffusion models. To enforce consistency between the patches, deep context fusion may be used to propagate the context information from low-scale to high-scale patches in a hierarchical manner. To accelerate patch-wise training and inference, adaptive computation also may be used to allocate more computational resources and network capacity towards coarse image details and to cheapen synthesis of high-frequency texture details. All the processing stages are jointly trained to provide spatially aligned global context to the higher levels of the cascade. As a result, the model does not operate on the full-resolution inputs, which allows the model to be trained on high-resolution video datasets in an end-to-end fashion.
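As a rough illustration of the "spatially aligned global context" idea in this abstract, the sketch below computes, for one high-resolution patch, a chain of enclosing patches at each coarser level of a cascade. The function name `context_chain`, the doubling of resolution per level, the fixed 64-pixel patch size, and the centering rule are all assumptions made for illustration; the abstract does not specify them.

```python
def context_chain(x, y, level, patch=64):
    """Return [(level, x, y), ...] from the coarsest level down to `level`.

    Assumptions (not from the patent text): level l is a square frame of
    side patch * 2**l, every level uses the same patch size in pixels, and
    (x, y) is the integer top-left corner of a patch inside its level.
    """
    chain = [(level, x, y)]
    while level > 0:
        level -= 1
        side = patch * 2 ** level  # frame side at the coarser level
        # In coarser-level pixels the finer patch starts at x // 2 and spans
        # patch // 2, so shift by patch // 4 to roughly center it, then
        # clamp so the coarser patch stays inside the frame.
        x = min(max(x // 2 - patch // 4, 0), side - patch)
        y = min(max(y // 2 - patch // 4, 0), side - patch)
        chain.append((level, x, y))
    return chain[::-1]


# A patch in the top-right corner of a 256x256 frame (level 2).
print(context_chain(192, 0, level=2))
```

Each coarser patch in the chain fully contains the region covered by the finer patch after it, so features from the lower levels could in principle be cropped and passed upward as aligned context.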

Type: Application

Filed: December 11, 2025

Publication date: May 28, 2026

Inventors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

DIGITAL EFFECTS EXPERIENCE RENDERING SYSTEM

Publication number: 20260112124

Abstract: Examples relate to systems and methods for generating digital effects experiences. The system performs operations including accessing a set of instructions that defines a digital effects experience. The system processes the set of instructions by a generative machine learning model to generate one or more digital effects comprising the digital effects experience. The system continuously processes one or more inputs, received by a user device, while the one or more digital effects comprising the digital effects experience are presented on the user device, along with the set of instructions in real time by the generative machine learning model to update presentation of the one or more digital effects comprising the digital effects experience.

Type: Application

Filed: October 18, 2024

Publication date: April 23, 2026

Inventors: Willi Menapace, Robert Cornelius Murphy, Aliaksandr Siarohin, Aleksei Stoliar, Sergey Tulyakov

PHOTOREALISTIC 4D SCENE GENERATION USING VIDEO DIFFUSION MODELS

Publication number: 20260089303

Abstract: A method for generating photorealistic 4D scenes from text inputs is disclosed. The method utilizes a text-to-video diffusion model to generate a reference video and a freeze-time video. A canonical 3D representation is reconstructed using deformable 3D Gaussian Splats (D-3DGS) based on the freeze-time video. Temporal deformations are learned to capture dynamic interactions in the reference video. The method employs a novel Score Distillation Sampling strategy combining multi-view and temporal aspects to enhance consistency and robustness. The resulting 4D scenes feature multiple objects interacting with detailed background environments, viewable from different angles and times. The method enables flexible camera control and integration with augmented and virtual reality applications. Some examples include features such as image-to-4D generation.

Type: Application

Filed: September 23, 2024

Publication date: March 26, 2026

Inventors: Hsin-Ying Lee, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov, Chaoyang Wang, Heng Yu, Peiye Zhuang

SINGLE PASS VIDEO GENERATION MODEL

Publication number: 20260073578

Abstract: The present disclosure addresses technological challenges arising in the field of artificial intelligence (AI) with respect to inefficient use of computing resources and runtime delay. In particular, the present disclosure provides for development of a machine learning model that generates an image sample for a video in a single forward pass. The development of this machine learning model uses an adversarial training approach involving training two machine learning models, a generator model and a discriminator model. With the generator model trained in this way, the generator model can be used to generate image samples for a video in a single forward pass.

Type: Application

Filed: September 10, 2024

Publication date: March 12, 2026

Inventors: Junli Cao, Anil Kag, Yanyu Li, Willi Menapace, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov, Yushu Wu, Zhixing Zhang

INFINITE-SCALE CITY SYNTHESIS

Publication number: 20260051121

Abstract: An environment synthesis framework generates virtual environments from a synthesized two-dimensional (2D) satellite map of a geographic area, a three-dimensional (3D) voxel environment, and a voxel-based neural rendering framework. In an example implementation, the synthesized 2D satellite map is generated by a map synthesis generative adversarial network (GAN) which is trained using sample city datasets. The multi-stage framework lifts the 2D map into a set of 3D octrees, generates an octree-based 3D voxel environment, and then converts it into a texturized 3D virtual environment using a neural rendering GAN and a set of pseudo ground truth images. The resulting 3D virtual environment is texturized, lifelike, editable, traversable in virtual reality (VR) and augmented reality (AR) experiences, and very large in scale.

Type: Application

Filed: October 23, 2025

Publication date: February 19, 2026

Inventors: Menglei Chai, Hsin-Ying Lee, Chieh Lin, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

Hierarchical patch-wise diffusion models for high-resolution video generation

Patent number: 12524925

Abstract: Hierarchical patch-wise diffusion models (HPDMs) use a diffusion paradigm that learns a hierarchical distribution of patches instead of whole videos for efficient patch-wise training of diffusion models. To enforce consistency between the patches, deep context fusion may be used to propagate the context information from low-scale to high-scale patches in a hierarchical manner. To accelerate patch-wise training and inference, adaptive computation also may be used to allocate more computational resources and network capacity towards coarse image details and to cheapen synthesis of high-frequency texture details. All the processing stages are jointly trained to provide spatially aligned global context to the higher levels of the cascade. As a result, the model does not operate on the full-resolution inputs, which allows the model to be trained on high-resolution video datasets in an end-to-end fashion.

Type: Grant

Filed: February 9, 2024

Date of Patent: January 13, 2026

Assignee: Snap Inc.

Inventors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

UNSUPERVISED VOLUMETRIC ANIMATION

Publication number: 20250356569

Abstract: Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable perspective-n-point (PnP) algorithm, the UVA model learns the underlying object 3D geometry and parts decomposition in an entirely unsupervised manner from still or video images. This allows the UVA model to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. The UVA model can obtain animatable 3D objects from a single or a few images. The UVA method also features a space in which all objects are represented in their canonical, animation-ready form. Applications include the creation of lenses from images or videos for social media applications.

Type: Application

Filed: July 29, 2025

Publication date: November 20, 2025

Inventors: Menglei Chai, Hsin-Ying Lee, Willi Menapace, Kyle Olszewski, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

PLOTTING BEHIND THE SCENES WITH LEARNABLE GAME ENGINES

Publication number: 20250345710

Abstract: A framework trains game-engine-like neural models from annotated videos to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects and agents in it, and enables rendering the environment from a controllable viewpoint. The LGE models the logic of the game and the rules of physics, making it possible for the user to play the game by specifying both high- and low-level action sequences. The LGE also unlocks a director's mode where the game is played by plotting behind the scenes, specifying high-level actions and goals for the agents using text-based instructions. To implement the director's mode, a trained diffusion-based animation model navigates the scene using high-level constraints, to enable play against an adversary, and to devise the strategy to win a point. To render the resulting state of the environment and its agents, a compositional neural radiance field (NeRF) representation is used in a synthesis model.

Type: Application

Filed: July 16, 2025

Publication date: November 13, 2025

Inventors: Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

Infinite-scale city synthesis

Patent number: 12469217

Abstract: An environment synthesis framework generates virtual environments from a synthesized two-dimensional (2D) satellite map of a geographic area, a three-dimensional (3D) voxel environment, and a voxel-based neural rendering framework. In an example implementation, the synthesized 2D satellite map is generated by a map synthesis generative adversarial network (GAN) which is trained using sample city datasets. The multi-stage framework lifts the 2D map into a set of 3D octrees, generates an octree-based 3D voxel environment, and then converts it into a texturized 3D virtual environment using a neural rendering GAN and a set of pseudo ground truth images. The resulting 3D virtual environment is texturized, lifelike, editable, traversable in virtual reality (VR) and augmented reality (AR) experiences, and very large in scale.

Type: Grant

Filed: December 29, 2022

Date of Patent: November 11, 2025

Assignee: Snap Inc.

Inventors: Menglei Chai, Hsin-Ying Lee, Chieh Lin, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

Unsupervised volumetric animation

Patent number: 12400388

Abstract: Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable perspective-n-point (PnP) algorithm, the UVA model learns the underlying object 3D geometry and parts decomposition in an entirely unsupervised manner from still or video images. This allows the UVA model to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. The UVA model can obtain animatable 3D objects from a single or a few images. The UVA method also features a space in which all objects are represented in their canonical, animation-ready form. Applications include the creation of lenses from images or videos for social media applications.

Type: Grant

Filed: December 28, 2022

Date of Patent: August 26, 2025

Assignee: Snap Inc.

Inventors: Menglei Chai, Hsin-Ying Lee, Willi Menapace, Kyle Olszewski, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

ASYMMETRICALLY DISTRIBUTED CONVOLUTION-ATTENTION NEURAL NETWORKS

Publication number: 20250265448

Abstract: An asymmetrically distributed convolution-attention neural network (AsCAN) includes a simple hybrid architecture in which the number of convolutional and transformer blocks is asymmetrically distributed in different processing stages. AsCAN adopts more convolutional blocks in the early processing stages, where the feature maps have relatively large spatial sizes, and more transformer blocks at the later processing stages. Transformer layers are incorporated in the early processing stages as well, except that fewer transformer blocks are used compared to convolutions in the early part. This trend is reversed at the lower resolution in the later processing stages. This uneven distribution of the convolutional and transformer blocks yields better throughput due to improved accelerator utilization at various batch sizes during the inference stage.
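To make the asymmetric distribution concrete, here is a minimal sketch that lays out convolutional ("C") and transformer ("T") blocks per processing stage. The function name `ascan_stage_plan`, the four-stage and four-blocks-per-stage defaults, and the linear shift between block types are illustrative assumptions, not details taken from the filing.

```python
def ascan_stage_plan(blocks_per_stage=4, num_stages=4):
    """Return one list of block types per stage, e.g. ["C", "C", "C", "T"].

    Hypothetical allocation rule: the transformer share grows linearly with
    stage depth, while every stage keeps at least one block of each type.
    Requires blocks_per_stage >= 2.
    """
    if blocks_per_stage < 2:
        raise ValueError("need at least two blocks per stage")
    plan = []
    for stage in range(num_stages):
        # Fraction of the way through the network: 0.0 (first) to 1.0 (last).
        frac = stage / (num_stages - 1) if num_stages > 1 else 0.0
        n_transformer = round(frac * blocks_per_stage)
        # Clamp so early stages still get a transformer block and late
        # stages still get a convolutional block.
        n_transformer = min(max(n_transformer, 1), blocks_per_stage - 1)
        n_conv = blocks_per_stage - n_transformer
        plan.append(["C"] * n_conv + ["T"] * n_transformer)
    return plan


for i, stage in enumerate(ascan_stage_plan()):
    print(f"stage {i}: {''.join(stage)}")
```

With the defaults, the early stages (large feature maps) are convolution-heavy and the later stages (small feature maps) are transformer-heavy, which is the trend the abstract describes.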

Type: Application

Filed: February 21, 2024

Publication date: August 21, 2025

Inventors: Junli Cao, Anil Kag, Willi Menapace, Jian Ren, Aliaksandr Siarohin, Sergey Tulyakov

Plotting behind the scenes with learnable game engines

Patent number: 12390738

Abstract: A framework trains game-engine-like neural models from annotated videos to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects and agents in it, and enables rendering the environment from a controllable viewpoint. The LGE models the logic of the game and the rules of physics, making it possible for the user to play the game by specifying both high- and low-level action sequences. The LGE also unlocks a director's mode where the game is played by plotting behind the scenes, specifying high-level actions and goals for the agents using text-based instructions. To implement the director's mode, a trained diffusion-based animation model navigates the scene using high-level constraints, to enable play against an adversary, and to devise the strategy to win a point. To render the resulting state of the environment and its agents, a compositional neural radiance field (NeRF) representation is used in a synthesis model.

Type: Grant

Filed: March 14, 2023

Date of Patent: August 19, 2025

Assignee: Snap Inc.

Inventors: Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

HIERARCHICAL PATCH-WISE DIFFUSION MODELS FOR HIGH-RESOLUTION VIDEO GENERATION

Publication number: 20250259339

Abstract: Hierarchical patch-wise diffusion models (HPDMs) use a diffusion paradigm that learns a hierarchical distribution of patches instead of whole videos for efficient patch-wise training of diffusion models. To enforce consistency between the patches, deep context fusion may be used to propagate the context information from low-scale to high-scale patches in a hierarchical manner. To accelerate patch-wise training and inference, adaptive computation also may be used to allocate more computational resources and network capacity towards coarse image details and to cheapen synthesis of high-frequency texture details. All the processing stages are jointly trained to provide spatially aligned global context to the higher levels of the cascade. As a result, the model does not operate on the full-resolution inputs, which allows the model to be trained on high-resolution video datasets in an end-to-end fashion.

Type: Application

Filed: February 9, 2024

Publication date: August 14, 2025

Inventors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

SCALED SPATIOTEMPORAL TRANSFORMERS FOR TEXT-TO-VIDEO SYNTHESIS

Publication number: 20250245897

Abstract: A text-to-video framework including a far-reaching interleaved transformer (FIT) block configured to learn a compressed representation of video input using a set of learnable latent tokens. The FIT block includes a diffusion framework and joint spatiotemporal modeling. The FIT block performs patchification of the video input to produce a sequence of patch tokens that are divided into groups. The FIT block instantiates the set of latent tokens and applies a sequence of computational blocks, and projects the patch tokens to generate video frames.
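The patchification and grouping steps named in this abstract can be sketched as follows. This is only an index-level illustration: the helper names `patchify` and `group_tokens`, the 2x4x4 patch shape, and the choice to group contiguous tokens are assumptions, and the learnable latent tokens and computational blocks of the FIT block are not modeled here.

```python
def patchify(frames, height, width, patch=(2, 4, 4)):
    """List the (t, y, x) origin of each non-overlapping spatiotemporal patch.

    Each origin stands in for one patch token; a real model would also
    flatten and embed the pixels inside the patch.
    """
    pt, ph, pw = patch
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch shape")
    return [
        (t, y, x)
        for t in range(0, frames, pt)
        for y in range(0, height, ph)
        for x in range(0, width, pw)
    ]


def group_tokens(tokens, group_size):
    """Split the token sequence into consecutive groups of `group_size`."""
    return [tokens[i:i + group_size] for i in range(0, len(tokens), group_size)]


tokens = patchify(frames=8, height=16, width=16)
groups = group_tokens(tokens, group_size=16)
print(len(tokens), "patch tokens in", len(groups), "groups")
```

With these sizes each group holds the patches of one two-frame slab of the video, so grouping contiguous tokens keeps spatially and temporally nearby patches together.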

Type: Application

Filed: January 31, 2024

Publication date: July 31, 2025

Inventors: Tsai-Shien Chen, Yuwei Fang, Anil Kag, Willi Menapace, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov

PLOTTING BEHIND THE SCENES WITH LEARNABLE GAME ENGINES

Publication number: 20240307783

Abstract: A framework trains game-engine-like neural models from annotated videos to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects and agents in it, and enables rendering the environment from a controllable viewpoint. The LGE models the logic of the game and the rules of physics, making it possible for the user to play the game by specifying both high- and low-level action sequences. The LGE also unlocks a director's mode where the game is played by plotting behind the scenes, specifying high-level actions and goals for the agents using text-based instructions. To implement the director's mode, a trained diffusion-based animation model navigates the scene using high-level constraints, to enable play against an adversary, and to devise the strategy to win a point. To render the resulting state of the environment and its agents, a compositional neural radiance field (NeRF) representation is used in a synthesis model.

Type: Application

Filed: March 14, 2023

Publication date: September 19, 2024

Inventors: Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

INFINITE-SCALE CITY SYNTHESIS

Publication number: 20240221309

Abstract: An environment synthesis framework generates virtual environments from a synthesized two-dimensional (2D) satellite map of a geographic area, a three-dimensional (3D) voxel environment, and a voxel-based neural rendering framework. In an example implementation, the synthesized 2D satellite map is generated by a map synthesis generative adversarial network (GAN) which is trained using sample city datasets. The multi-stage framework lifts the 2D map into a set of 3D octrees, generates an octree-based 3D voxel environment, and then converts it into a texturized 3D virtual environment using a neural rendering GAN and a set of pseudo ground truth images. The resulting 3D virtual environment is texturized, lifelike, editable, traversable in virtual reality (VR) and augmented reality (AR) experiences, and very large in scale.

Type: Application

Filed: December 29, 2022

Publication date: July 4, 2024

Inventors: Menglei Chai, Hsin-Ying Lee, Chieh Lin, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

UNSUPERVISED VOLUMETRIC ANIMATION

Publication number: 20240221258

Abstract: Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable perspective-n-point (PnP) algorithm, the UVA model learns the underlying object 3D geometry and parts decomposition in an entirely unsupervised manner from still or video images. This allows the UVA model to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. The UVA model can obtain animatable 3D objects from a single or a few images. The UVA method also features a space in which all objects are represented in their canonical, animation-ready form. Applications include the creation of lenses from images or videos for social media applications.

Type: Application

Filed: December 28, 2022

Publication date: July 4, 2024

Inventors: Menglei Chai, Hsin-Ying Lee, Willi Menapace, Kyle Olszewski, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Sergey Tulyakov