Abstract: A method includes receiving a prompt describing a desired characteristic of an image. The method further includes generating, using a set of encoding models, a prompt encoding based on the prompt. The method further includes generating, using a first transformer block of a diffusion transformer model, a first prompt embedding and a first image embedding based on the prompt encoding and a noise input. The method further includes generating, using a second transformer block of the diffusion transformer model, a second image embedding based on the first image embedding and the first prompt embedding. The method further includes generating the image based on the second image embedding.
Type:
Grant
Filed:
September 11, 2024
Date of Patent:
April 8, 2025
Assignee:
Stability AI Ltd
Inventors:
Rahim Entezari, Patrick Esser, Robin Rombach, Andreas Blattmann
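
The data flow recited in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The abstract does not say how a transformer block combines prompt and image tokens, so the joint self-attention with identity projections used here is an assumption, as are the toy encoders, the embedding width DIM, the sigmoid "decoder", and every function name. A practical system would presumably also use learned weights and repeat the blocks over many denoising steps, which this sketch omits.

```python
import math
import random

DIM = 8  # hypothetical embedding width, chosen only for illustration


def toy_encoder(prompt, seed):
    """Stand-in for one encoding model: one DIM-vector per word (assumption)."""
    out = []
    for word in prompt.split():
        rng = random.Random(f"{seed}:{word}")  # deterministic per encoder and word
        out.append([rng.uniform(-1, 1) for _ in range(DIM)])
    return out


def encode_prompt(prompt, seeds=(0, 1)):
    """Prompt encoding: concatenate the token sequences of a set of encoders."""
    tokens = []
    for s in seeds:
        tokens.extend(toy_encoder(prompt, s))
    return tokens


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def joint_attention(tokens):
    """Single-head self-attention with identity projections plus a residual."""
    scale = 1.0 / math.sqrt(DIM)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        mixed = [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(DIM)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out


def transformer_block(prompt_emb, image_emb):
    """Attend over prompt and image tokens together, then split them again."""
    joint = joint_attention(prompt_emb + image_emb)
    n = len(prompt_emb)
    return joint[:n], joint[n:]


def generate(prompt, num_patches=4, seed=0):
    prompt_encoding = encode_prompt(prompt)
    rng = random.Random(seed)
    noise = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(num_patches)]
    # First block: prompt encoding + noise -> first prompt and image embeddings.
    prompt_1, image_1 = transformer_block(prompt_encoding, noise)
    # Second block: first embeddings -> second image embedding.
    _, image_2 = transformer_block(prompt_1, image_1)
    # Hypothetical decoder: squash each patch's mean to a pixel value in (0, 1).
    return [1 / (1 + math.exp(-sum(p) / DIM)) for p in image_2]


if __name__ == "__main__":
    print(generate("a red fox"))
```

The point of the sketch is the ordering of the steps: the first block emits both a prompt embedding and an image embedding, and the second block consumes both of them, so the prompt representation is updated alongside the image rather than held fixed.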