Abstract: A method includes receiving a prompt describing desired characteristics of audio. The method further includes generating, using a set of machine learning models and based on the prompt, a latent space representation of the audio at a latent rate of less than 40 Hz. The method further includes generating, using the set of machine learning models and the latent space representation of the audio, an audio file at an output rate greater than the latent rate. The audio file includes the audio based on the latent space representation, and the audio has a length greater than 90 seconds.
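The abstract does not give concrete rates. As a worked example under assumed values (a 44.1 kHz output rate and an autoencoder that downsamples by a factor of 2048, neither taken from the filing), the latent rate comes out near 21.5 Hz, comfortably below 40 Hz, and a clip just over 90 seconds needs only about two thousand latent frames:

```python
import math


def latent_frames(duration_s: float, sample_rate_hz: int, downsample: int) -> int:
    """Number of latent frames needed to cover `duration_s` seconds of audio."""
    num_samples = math.ceil(duration_s * sample_rate_hz)
    return math.ceil(num_samples / downsample)


SAMPLE_RATE_HZ = 44_100  # assumed output rate (hypothetical, not from the filing)
DOWNSAMPLE = 2048        # assumed autoencoder downsampling factor (hypothetical)

latent_rate_hz = SAMPLE_RATE_HZ / DOWNSAMPLE  # latent rate = output rate / downsampling
frames = latent_frames(95.0, SAMPLE_RATE_HZ, DOWNSAMPLE)
print(f"latent rate = {latent_rate_hz:.2f} Hz, frames for 95 s = {frames}")
```

At these assumed values the generative model works over 2,046 latent positions instead of roughly 4.2 million output samples, which is one plausible reason to keep the latent rate low when the target audio is long.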
Abstract: A method includes receiving an input from a user interface of a device, the input indicating a desired characteristic of an image. The method further includes transmitting a prompt indicating the desired characteristic to a set of servers with a request to generate the image, causing the set of servers to: generate, using a set of encoding models, a prompt encoding based on the prompt; generate, using a first transformer block of a diffusion transformer model, a first prompt embedding and a first image embedding based on the prompt encoding and a noise input; generate, using a second transformer block of the diffusion transformer model, a second image embedding based on the first image embedding and the first prompt embedding; and generate the image based on the second image embedding. The method further includes receiving the image from the set of servers and presenting the image on a display of the device.
Type: Application
Filed: February 25, 2025
Publication date: September 25, 2025
Applicant: Stability AI Ltd
Inventors: Rahim Entezari, Patrick Esser, Robin Rombach, Andreas Blattmann
Abstract: A method includes receiving a first representation of an image in a first latent space of a first machine learning model. The method further includes generating, by a second machine learning model based at least in part on the first representation, a second representation of the image in a second latent space of the second machine learning model. The method further includes updating, without generating an output image corresponding to the image, a set of weights of the second machine learning model based at least in part on the first representation and the second representation.
Type: Application
Filed: January 31, 2025
Publication date: September 25, 2025
Applicant: Stability AI Ltd
Inventors: Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach
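The abstract leaves the training objective unspecified. A minimal sketch, assuming the second model is a single linear layer and the loss is a squared distance between its output and a fixed projection of the first representation (both hypothetical choices, not details from the filing), illustrates the key property: the weight update uses only the two latent representations, and no decoder is ever called to produce an image.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def alignment_loss(z2: Vector, target: Vector) -> float:
    # Hypothetical loss: half the squared Euclidean distance in the second latent space.
    return 0.5 * sum((a - b) ** 2 for a, b in zip(z2, target))


def train_step(weights: Matrix, z1: Vector, proj: Matrix, lr: float) -> float:
    """One gradient step on the second model using only latent representations.

    weights: the second model, a single linear layer here (an assumption).
    z1: first representation, produced by the frozen first model.
    proj: fixed, hypothetical map from the first latent space into the second.
    No decoder is called, so no output image is generated.
    """
    z2 = matvec(weights, z1)   # second representation
    target = matvec(proj, z1)  # target derived from the first representation
    loss = alignment_loss(z2, target)
    residual = [a - b for a, b in zip(z2, target)]
    # For L = 0.5 * ||W z1 - P z1||^2, dL/dW[i][j] = residual[i] * z1[j].
    for i, row in enumerate(weights):
        for j in range(len(row)):
            row[j] -= lr * residual[i] * z1[j]
    return loss


rng = random.Random(0)
d1, d2 = 4, 3  # sizes of the first and second latent spaces (illustrative)
z1 = [rng.uniform(-1, 1) for _ in range(d1)]
proj = [[rng.uniform(-1, 1) for _ in range(d1)] for _ in range(d2)]
weights = [[0.0] * d1 for _ in range(d2)]

losses = [train_step(weights, z1, proj, lr=0.1) for _ in range(50)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because every quantity in the update lives in a latent space, the cost of decoding to pixels is avoided entirely during training in this sketch.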
Abstract: A method includes receiving a prompt describing a desired characteristic of an image. The method further includes generating, using a set of encoding models, a prompt encoding based on the prompt. The method further includes generating, using a first transformer block of a diffusion transformer model, a first prompt embedding and a first image embedding based on the prompt encoding and a noise input. The method further includes generating, using a second transformer block of the diffusion transformer model, a second image embedding based on the first image embedding and the first prompt embedding. The method further includes generating the image based on the second image embedding.
Type: Grant
Filed: September 11, 2024
Date of Patent: April 8, 2025
Assignee: Stability AI Ltd
Inventors: Rahim Entezari, Patrick Esser, Robin Rombach, Andreas Blattmann
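The two-block dataflow in this abstract can be sketched as a toy, pure-Python example. Several choices here are assumptions rather than details from the patent: the hypothetical `TwoStreamBlock` class, joint attention over the concatenated prompt and image tokens with identity query/key/value maps, and separate per-stream weight matrices. The final step of turning the second image embedding into an image (for example with a latent decoder) is omitted.

```python
import math
import random

Vec = list[float]
Mat = list[list[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: Vec) -> Vec:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(tokens: list[Vec]) -> list[Vec]:
    """Single-head self-attention over all tokens (identity Q/K/V for brevity)."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        attn = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(attn, tokens)) for i in range(len(q))])
    return out


class TwoStreamBlock:
    """Hypothetical block: joint attention, then separate per-stream linear maps."""

    def __init__(self, dim: int, rng: random.Random):
        self.w_prompt = [[rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(dim)]
        self.w_image = [[rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, prompt: list[Vec], image: list[Vec]) -> tuple[list[Vec], list[Vec]]:
        mixed = joint_attention(prompt + image)  # prompt and image tokens attend jointly
        n = len(prompt)
        # Residual connection plus a stream-specific projection of the attention output.
        new_prompt = [[x + m for x, m in zip(t, matvec(self.w_prompt, a))]
                      for t, a in zip(prompt, mixed[:n])]
        new_image = [[x + m for x, m in zip(t, matvec(self.w_image, a))]
                     for t, a in zip(image, mixed[n:])]
        return new_prompt, new_image


rng = random.Random(0)
dim = 4
prompt_encoding = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]  # stand-in encoder output
noise_input = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]      # noised image latent tokens

block1, block2 = TwoStreamBlock(dim, rng), TwoStreamBlock(dim, rng)
prompt_emb1, image_emb1 = block1(prompt_encoding, noise_input)
_, image_emb2 = block2(prompt_emb1, image_emb1)
print(len(image_emb2), len(image_emb2[0]))  # 5 4
```

Only the image stream of the second block is kept, mirroring the abstract's statement that the second block produces a second image embedding; the second block's prompt output is simply discarded in this sketch.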