Patents by Inventor Michael Rubinstein
Michael Rubinstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260154903
Abstract: A fractional training process can be performed with training images on an instance of a machine-learned generative image model to obtain a partially trained instance of the model. A fractional optimization process can be performed, using the partially trained instance, on an instance of a machine-learned three-dimensional (3D) implicit representation model to obtain a partially optimized instance of the model. Based on the plurality of training images, pseudo multi-view subject images can be generated with the partially optimized instance of the 3D implicit representation model and a fully trained instance of the generative image model. The partially trained instance of the model can be trained with a set of training data. The partially optimized instance of the machine-learned 3D implicit representation model can be trained with the machine-learned multi-view image model.
Type: Application
Filed: January 27, 2026
Publication date: June 4, 2026
Inventors: Yuanzhen Li, Amit Raj, Varun Jampani, Benjamin Joseph Mildenhall, Benjamin Michael Poole, Jonathan Tilton Barron, Kfir Aberman, Michael Niemeyer, Michael Rubinstein, Nataniel Ruiz Gutierrez, Shiran Elyahu Zada, Srinivas Kaza
-
Publication number: 20260154861
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text-to-image model so that the text-to-image model generates images that each depict a variable instance of an object class when the object class without the unique identifier is provided as a text input, and that generates images that each depict a same subject instance of the object class when the unique identifier is provided as the text input.
Type: Application
Filed: January 15, 2026
Publication date: June 4, 2026
Inventors: Kfir Aberman, Nataniel Ruiz Gutierrez, Michael Rubinstein, Yuanzhen Li, Yael Pritch Knaan, Varun Jampani
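As a rough illustration of the behavior described in this abstract, the training data can be pictured as two kinds of text/image pairs: subject photos paired with a prompt containing the unique identifier, and generic class images paired with the bare class noun. The sketch below is a minimal Python example under stated assumptions: the identifier token `sks`, the `build_pairs` helper, and the mixing-in of generic class images are illustrative choices, not details confirmed by the abstract.

```python
# Hypothetical sketch of the two prompt types implied by the abstract.
# The identifier token "sks", the helper name, and the use of generic
# class images are illustrative assumptions, not taken from the patent.
from dataclasses import dataclass


@dataclass
class TrainingPair:
    image_path: str
    prompt: str


def build_pairs(subject_images, class_images, class_noun, identifier="sks"):
    """Pair subject photos with an identifier prompt and class images with a plain class prompt."""
    subject_prompt = f"a {identifier} {class_noun}"  # should yield the same subject instance
    class_prompt = f"a {class_noun}"                 # should yield variable instances of the class
    pairs = [TrainingPair(path, subject_prompt) for path in subject_images]
    pairs += [TrainingPair(path, class_prompt) for path in class_images]
    return pairs


pairs = build_pairs(["dog1.jpg", "dog2.jpg"], ["generic_dog.jpg"], "dog")
for pair in pairs:
    print(pair.prompt, "<-", pair.image_path)
```

Keeping the identifier-free class prompt in the data is one plausible way to preserve the "variable instance" behavior for the plain class noun while the identifier prompt is tied to the specific subject.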
-
Patent number: 12561905
Abstract: A fractional training process can be performed with training images on an instance of a machine-learned generative image model to obtain a partially trained instance of the model. A fractional optimization process can be performed, using the partially trained instance, on an instance of a machine-learned three-dimensional (3D) implicit representation model to obtain a partially optimized instance of the model. Based on the plurality of training images, pseudo multi-view subject images can be generated with the partially optimized instance of the 3D implicit representation model and a fully trained instance of the generative image model. The partially trained instance of the model can be trained with a set of training data. The partially optimized instance of the machine-learned 3D implicit representation model can be trained with the machine-learned multi-view image model.
Type: Grant
Filed: March 20, 2024
Date of Patent: February 24, 2026
Assignee: GOOGLE LLC
Inventors: Yuanzhen Li, Amit Raj, Varun Jampani, Benjamin Joseph Mildenhall, Benjamin Michael Poole, Jonathan Tilton Barron, Kfir Aberman, Michael Niemeyer, Michael Rubinstein, Nataniel Ruiz Gutierrez, Shiran Elyahu Zada, Srinivas Kaza
-
Patent number: 12555275
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text-to-image model so that the text-to-image model generates images that each depict a variable instance of an object class when the object class without the unique identifier is provided as a text input, and that generates images that each depict a same subject instance of the object class when the unique identifier is provided as the text input.
Type: Grant
Filed: August 23, 2023
Date of Patent: February 17, 2026
Assignee: Google LLC
Inventors: Kfir Aberman, Nataniel Ruiz Gutierrez, Michael Rubinstein, Yuanzhen Li, Yael Pritch Knaan, Varun Jampani
-
Publication number: 20250384251
Abstract: Provided are systems and methods for generating custom text-to-video (T2V) models starting from a custom text-to-image (T2I) model and without requiring customized video data. The proposed techniques can be particularly beneficial for applications where video data of a specific subject or style is not available. For example, the proposed approach can be used to create custom videos from a small set of custom still images or generate videos in a specific custom artistic style without having prior videos in that style.
Type: Application
Filed: June 17, 2025
Publication date: December 18, 2025
Inventors: Hila Chefer-Livshen, Shiran Elyahu Zada, Rony Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Tali Dekel, Tomer Michaeli, Inbar Mosseri
-
Publication number: 20250363643
Abstract: Techniques for tuning an image editing operator for reducing a distractor in raw image data are presented herein. The image editing operator can access the raw image data and a mask. The mask can indicate a region of interest associated with the raw image data. The image editing operator can process the raw image data and the mask to generate processed image data. Additionally, a trained saliency model can process at least the processed image data within the region of interest to generate a saliency map that provides saliency values. Moreover, a saliency loss function can compare the saliency values provided by the saliency map for the processed image data within the region of interest to one or more target saliency values. Subsequently, the one or more parameter values of the image editing operator can be modified based at least in part on the saliency loss function.
Type: Application
Filed: August 7, 2025
Publication date: November 27, 2025
Inventors: Kfir Aberman, David Edward Jacobs, Kai Jochen Kohlhoff, Michael Rubinstein, Yossi Gandelsman, Junfeng He, Inbar Mosseri, Yael Pritch Knaan
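The saliency loss described in this abstract compares saliency values inside the masked region of interest against target values. A minimal Python sketch follows, assuming equal-sized 2D lists for the saliency map and mask, a single scalar target, and a mean-squared-error comparison; the abstract does not fix the form of the loss, so these are illustrative choices.

```python
# Minimal sketch of a masked saliency loss.
# Assumptions (not from the abstract): saliency maps and masks are
# equal-sized 2D lists, a single scalar target saliency value, and a
# mean-squared-error comparison inside the region of interest.
def saliency_loss(saliency_map, mask, target=0.0):
    """Mean squared gap between saliency inside the mask and the target."""
    diffs = [
        (s - target) ** 2
        for s_row, m_row in zip(saliency_map, mask)
        for s, m in zip(s_row, m_row)
        if m  # only pixels in the region of interest contribute
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Saliency values predicted for the edited image (hypothetical numbers).
saliency = [[0.9, 0.1],
            [0.5, 0.2]]
# Region of interest: the left column contains the distractor.
mask = [[1, 0],
        [1, 0]]
loss = saliency_loss(saliency, mask)
print(loss)
```

A tuning loop would then adjust the editing operator's parameters to lower this value. With a target of 0, the loss rewards edits that make the masked distractor less salient.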
-
Publication number: 20250363590
Abstract: Despite recent progress, existing frame interpolation methods still struggle with extremely high resolution images and challenging cases such as repetitive textures, thin objects, and fast motion. To address these issues, provided is a cascaded diffusion frame interpolation approach that excels in these scenarios while achieving competitive performance on standard benchmarks.
Type: Application
Filed: May 22, 2025
Publication date: November 27, 2025
Inventors: Deqing Sun, Junhwa Hur, Charles Irwin Herrmann, Saurabh Saxena, David James Fleet, Janne Matias Kontkanen, Wei-Sheng Lai, Yichang Shih, Michael Rubinstein
-
Patent number: 12406377
Abstract: Techniques for tuning an image editing operator for reducing a distractor in raw image data are presented herein. The image editing operator can access the raw image data and a mask. The mask can indicate a region of interest associated with the raw image data. The image editing operator can process the raw image data and the mask to generate processed image data. Additionally, a trained saliency model can process at least the processed image data within the region of interest to generate a saliency map that provides saliency values. Moreover, a saliency loss function can compare the saliency values provided by the saliency map for the processed image data within the region of interest to one or more target saliency values. Subsequently, the one or more parameter values of the image editing operator can be modified based at least in part on the saliency loss function.
Type: Grant
Filed: July 1, 2022
Date of Patent: September 2, 2025
Assignee: GOOGLE LLC
Inventors: Kfir Aberman, David Edward Jacobs, Kai Jochen Kohlhoff, Michael Rubinstein, Yossi Gandelsman, Junfeng He, Inbar Mosseri, Yael Pritch Knaan
-
Publication number: 20250259323
Abstract: Methods, systems, and apparatus, including medium-encoded computer program products, for determining video characteristics from videos captured with limited motion cameras. A first set of pairs of images can be selected from a video taken with a limited motion camera. Using the images, a neural network can first be trained for camera parameters while holding the network weights constant. After performing the first training, a second set of pairs of images can be selected from the video. A second training of the neural network can be performed and can include adjusting the camera parameters and the network weights in the neural network. After performing the second training, the camera parameters and the network weights of the neural network can be persisted.
Type: Application
Filed: May 5, 2023
Publication date: August 14, 2025
Inventors: Forrester H. Cole, Michael Rubinstein, Keith Noah Snavely, Zhengqi Li, William Freeman, Zhoutong Zhang
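The two-stage schedule in this abstract (first update only the camera parameters with the network weights held constant, then update both) can be illustrated with a toy optimization. This is a hedged sketch, not the patented model: it assumes a linear "network" y = w * x + c in which `c` stands in for the camera parameters and `w` for the network weights, and it replaces image pairs with hypothetical scalar (x, y) samples.

```python
# Toy sketch of the two-stage schedule in the abstract (not the patented model).
# Assumption: a linear "network" y = w * x + c, where `c` stands in for the
# camera parameters and `w` for the network weights; training is plain
# gradient descent on mean squared error.
def mse_grads(w, c, pairs):
    n = len(pairs)
    gw = sum(2 * (w * x + c - y) * x for x, y in pairs) / n
    gc = sum(2 * (w * x + c - y) for x, y in pairs) / n
    return gw, gc


def train(params, pairs, steps, lr, update_weights):
    w, c = params["w"], params["c"]
    for _ in range(steps):
        gw, gc = mse_grads(w, c, pairs)
        c -= lr * gc              # camera parameters are updated in both stages
        if update_weights:
            w -= lr * gw          # network weights are held constant in stage 1
    return {"w": w, "c": c}


first_pairs = [(0.0, 1.0), (1.0, 3.0)]     # hypothetical first set of samples
second_pairs = [(-1.0, -1.0), (1.0, 3.0)]  # hypothetical second set of samples

initial = {"w": 1.0, "c": 0.0}
stage1 = train(initial, first_pairs, steps=200, lr=0.1, update_weights=False)
final = train(stage1, second_pairs, steps=200, lr=0.1, update_weights=True)
print(stage1, final)  # `final` is what would be persisted
```

Warming up the camera parameters first gives the joint stage a reasonable starting point; in this toy, stage 1 settles `c` near 1.5 with `w` untouched, and the joint stage then moves both toward the values that fit the second set.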
-
Publication number: 20250252645
Abstract: A computer-implemented method for decomposing videos into multiple layers that can be re-combined with modified relative timings can include obtaining video data including image frames depicting objects. For each of the frames, the method can include generating object maps descriptive of a respective location of at least one object of the objects within the image frame. For each of the frames, the image frame and the object maps can be input into a machine-learned layer renderer model. For each of the frames, the method can include receiving, as output from the model, a background layer illustrative of a background of the video data and one or more object layers associated with respective object maps. The object layers can include image data illustrative of the object and associated trace effects such that the one or more object layers and the background layer can be re-combined with modified relative timings.
Type: Application
Filed: February 5, 2025
Publication date: August 7, 2025
Inventors: Forrester H. Cole, Erika Lu, Tali Dekel, William T. Freeman, David Henry Salesin, Michael Rubinstein
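Once a video has been split into a background layer and per-object layers, "re-combining with modified relative timings" amounts to reading each object layer at a shifted frame index before compositing. The Python sketch below makes several assumptions that are not stated in the abstract: grayscale pixels, a single static background frame, object-layer frames stored as (color, alpha) tuples, back-to-front compositing with the standard "over" operator, and shifted indices clamped to the clip length. Because the object layers also carry trace effects such as shadows, compositing them with their alpha moves those effects together with the object.

```python
# Sketch: re-combine decomposed layers with per-layer time offsets.
# Assumptions (not from the abstract): grayscale pixels, a single static
# background frame, object-layer frames stored as lists of (color, alpha)
# tuples, back-to-front compositing with the standard "over" operator, and
# shifted frame indices clamped to the clip length.
def composite(background, layer_frames):
    out = list(background)
    for layer in layer_frames:  # back-to-front order
        out = [a * c + (1 - a) * o for (c, a), o in zip(layer, out)]
    return out


def retime(background, object_layers, offsets, num_frames):
    """object_layers[i][t] is frame t of layer i; offsets[i] shifts that layer in time."""
    video = []
    for t in range(num_frames):
        shifted = []
        for layer, shift in zip(object_layers, offsets):
            src = min(max(t + shift, 0), len(layer) - 1)
            shifted.append(layer[src])
        video.append(composite(background, shifted))
    return video


background = [0.0, 0.0]
# One object moving left to right over two frames.
ball = [
    [(1.0, 1.0), (0.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0)],
]
print(retime(background, [ball], [0], num_frames=2))  # [[1.0, 0.0], [0.0, 1.0]]
print(retime(background, [ball], [1], num_frames=2))  # [[0.0, 1.0], [0.0, 1.0]]
```

With an offset of 1, the ball's motion is advanced by one frame relative to the background; giving different objects different offsets changes their relative timing.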
-
Publication number: 20250238905
Abstract: Provided is a video generation model for performing text-to-video (T2V) or other video generation techniques. The proposed model reduces the computational costs associated with video generation. In particular, unlike traditional T2V methods, the disclosed technology can generate the full temporal duration of a video clip at once, bypassing the need for extensive computation. As one example, a machine-learned denoising diffusion model can simultaneously process a plurality of noisy inputs that correspond to various timestamps spanning the temporal dimension of a video to simultaneously generate synthetic frames for the video that match the timestamps.
Type: Application
Filed: January 22, 2025
Publication date: July 24, 2025
Inventors: Inbar Mosseri, Omer Bar Tal, Hila Chefer-Livshen, Omer Tov, Charles Irwin Herrmann, Rony Paiss, Shiran Elyahu Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel
-
Patent number: 12243145
Abstract: A computer-implemented method for decomposing videos into multiple layers (212, 213) that can be re-combined with modified relative timings includes obtaining video data including a plurality of image frames (201) depicting one or more objects. For each of the plurality of frames, the computer-implemented method includes generating one or more object maps descriptive of a respective location of at least one object of the one or more objects within the image frame. For each of the plurality of frames, the computer-implemented method includes inputting the image frame and the one or more object maps into a machine-learned layer renderer model (220). For each of the plurality of frames, the computer-implemented method includes receiving, as output from the machine-learned layer renderer model, a background layer illustrative of a background of the video data and one or more object layers respectively associated with one of the one or more object maps.
Type: Grant
Filed: May 22, 2020
Date of Patent: March 4, 2025
Assignee: GOOGLE LLC
Inventors: Forrester H. Cole, Erika Lu, Tali Dekel, William T. Freeman, David Henry Salesin, Michael Rubinstein
-
Publication number: 20240428816
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
Type: Application
Filed: August 7, 2024
Publication date: December 26, 2024
Inventors: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
-
Publication number: 20240412458
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for editing images based on decoder-based accumulative score sampling (DASS) losses.
Type: Application
Filed: June 12, 2024
Publication date: December 12, 2024
Inventors: Varun Jampani, Chun-Han Yao, Amit Raj, Wei-Chih Hung, Ming-Hsuan Yang, Michael Rubinstein, Yuanzhen Li
-
Publication number: 20240320912
Abstract: A fractional training process can be performed with training images on an instance of a machine-learned generative image model to obtain a partially trained instance of the model. A fractional optimization process can be performed, using the partially trained instance, on an instance of a machine-learned three-dimensional (3D) implicit representation model to obtain a partially optimized instance of the model. Based on the plurality of training images, pseudo multi-view subject images can be generated with the partially optimized instance of the 3D implicit representation model and a fully trained instance of the generative image model. The partially trained instance of the model can be trained with a set of training data. The partially optimized instance of the machine-learned 3D implicit representation model can be trained with the machine-learned multi-view image model.
Type: Application
Filed: March 20, 2024
Publication date: September 26, 2024
Inventors: Yuanzhen Li, Amit Raj, Varun Jampani, Benjamin Joseph Mildenhall, Benjamin Michael Poole, Jonathan Tilton Barron, Kfir Aberman, Michael Niemeyer, Michael Rubinstein, Nataniel Ruiz Gutierrez, Shiran Elyahu Zada, Srinivas Kaza
-
Publication number: 20240296596
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text-to-image model so that the text-to-image model generates images that each depict a variable instance of an object class when the object class without the unique identifier is provided as a text input, and that generates images that each depict a same subject instance of the object class when the unique identifier is provided as the text input.
Type: Application
Filed: August 23, 2023
Publication date: September 5, 2024
Inventors: Kfir Aberman, Nataniel Ruiz Gutierrez, Michael Rubinstein, Yuanzhen Li, Yael Pritch Knaan, Varun Jampani
-
Patent number: 12073844
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
Type: Grant
Filed: October 1, 2020
Date of Patent: August 27, 2024
Assignee: Google LLC
Inventors: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
-
Publication number: 20240249523
Abstract: The present disclosure provides systems and methods for identifying and extracting object-related effects in videos. Given an ordinary video and a rough segmentation mask over time of one or more subjects of interest, example systems proposed herein can estimate an omnimatte for each subject: an alpha matte and color image that includes the subject along with all its related time-varying scene elements. Example implementations of the proposed models can be trained only on the input video in a self-supervised manner, without any manual labels, and are generic. For example, the models can produce omnimattes automatically for arbitrary objects and a variety of effects.
Type: Application
Filed: May 11, 2022
Publication date: July 25, 2024
Inventors: Forrester H. Cole, Andrew Zisserman, Tali Dekel, William Tafel Freeman, Erika Lu, Michael Rubinstein
-
Publication number: 20240043369
Abstract: Disclosed herein are cyclobutane-based crosslinking compounds that, when incorporated into acrylate-based polymeric materials, can produce toughened acrylate polymer networks. Also disclosed herein are polymers comprising the crosslinkers, methods of preparing toughened polymer networks using the crosslinkers, and methods of using the polymer networks.
Type: Application
Filed: June 28, 2023
Publication date: February 8, 2024
Inventors: Stephen L. Craig, Jeremiah A. Johnson, Shu Wang, Michael Rubinstein, Abraham Herzog-Arbeitman
-
Patent number: 11894014
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
Type: Grant
Filed: September 22, 2022
Date of Patent: February 6, 2024
Assignee: Google LLC
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
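The last two steps in this abstract (a spectrogram mask per speaker, then an isolated speech spectrogram per speaker) are commonly realized by multiplying the mixture spectrogram element-wise by each mask. The Python sketch below assumes exactly that: real-valued ratio masks in [0, 1] applied to a magnitude spectrogram stored as a 2D list indexed [time][frequency]. The abstract does not specify the mask type, and the mask values here are hypothetical.

```python
# Minimal sketch: apply per-speaker masks to a mixture magnitude spectrogram.
# Assumptions: spectrograms are 2D lists [time][frequency] of magnitudes and
# masks are real-valued ratio masks in [0, 1] applied by element-wise
# multiplication (the abstract does not specify the mask type).
def apply_mask(mixture, mask):
    return [[m * s for m, s in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


def isolate_speakers(mixture, masks):
    """Return one isolated spectrogram per speaker mask."""
    return [apply_mask(mixture, mask) for mask in masks]


mixture = [[2.0, 4.0],
           [1.0, 3.0]]
masks = [
    [[1.0, 0.25], [0.5, 0.0]],   # speaker A (hypothetical mask)
    [[0.0, 0.75], [0.5, 1.0]],   # speaker B (hypothetical mask)
]
speaker_a, speaker_b = isolate_speakers(mixture, masks)
print(speaker_a)  # [[2.0, 1.0], [0.5, 0.0]]
print(speaker_b)  # [[0.0, 3.0], [0.5, 3.0]]
```

In this example, the two masks sum to 1 in every time-frequency bin, so the isolated spectrograms add back up to the mixture. A real system would still need to convert each isolated spectrogram back to a waveform, which this sketch omits.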