Patents by Inventor Jason Wen Yong Kuen
Jason Wen Yong Kuen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12272127
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.
Type: Grant
Filed: January 31, 2022
Date of Patent: April 8, 2025
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Su Chen, Scott Cohen, Zhe Lin, Zijun Wei, Jianming Zhang
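The post-processing step this abstract mentions (filtering out low-quality masks) can be illustrated with a minimal sketch. The dict-based mask format and the score/area thresholds below are assumptions for illustration, not the patented implementation:

```python
def filter_masks(masks, min_score=0.5, min_area=25):
    """Keep only masks whose detector confidence and pixel area pass
    thresholds. Each mask is a dict with a binary 2D grid under
    "pixels" and a confidence under "score" (illustrative format)."""
    kept = []
    for m in masks:
        area = sum(sum(row) for row in m["pixels"])
        if m["score"] >= min_score and area >= min_area:
            kept.append(m)
    return kept
```

In practice such a filter would run after the masking head, discarding detections that are confidently wrong (low score) or degenerate (tiny area).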
-
Publication number: 20250086849
Abstract: Embodiments of the present disclosure include obtaining a text prompt describing an element, layout information indicating a target region for the element, and a precision level corresponding to the element. Some embodiments generate a text feature pyramid based on the text prompt, the layout information, and the precision level, wherein the text feature pyramid comprises a plurality of text feature maps at a plurality of scales. Then, an image is generated based on the text feature pyramid. In some cases, the image includes an object corresponding to the element of the text prompt at the target region. Additionally, a shape of the object corresponds to a shape of the target region based on the precision level.
Type: Application
Filed: September 8, 2023
Publication date: March 13, 2025
Inventors: Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, Jason Wen Yong Kuen, John Philip Collomosse
-
Patent number: 12223439
Abstract: Systems and methods for multi-modal representation learning are described. One or more embodiments provide a visual representation learning system trained using machine learning techniques. For example, some embodiments of the visual representation learning system are trained using cross-modal training tasks including a combination of intra-modal and inter-modal similarity preservation objectives. In some examples, the training tasks are based on contrastive learning techniques.
Type: Grant
Filed: March 3, 2021
Date of Patent: February 11, 2025
Assignee: Adobe Inc.
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, Yilin Wang, Ajinkya Kale, Baldo Faieta
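The combination of intra-modal and inter-modal similarity preservation described here can be sketched with an InfoNCE-style contrastive objective. The function names, the use of cosine similarity, and the batch layout are illustrative assumptions, not the claimed training procedure:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-softmax of the positive pair's similarity."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

def multimodal_loss(img, img_aug, txt, temperature=0.1):
    """Sum an inter-modal term (image vs. its caption, against other
    captions) and an intra-modal term (image vs. its augmented view,
    against other images) for each aligned pair in the batch."""
    n = len(img)
    total = 0.0
    for i in range(n):
        other_txt = [txt[j] for j in range(n) if j != i]
        other_aug = [img_aug[j] for j in range(n) if j != i]
        total += info_nce(img[i], txt[i], other_txt, temperature)      # inter-modal
        total += info_nce(img[i], img_aug[i], other_aug, temperature)  # intra-modal
    return total / n
```

Correctly aligned image/text pairs should yield a lower loss than shuffled pairs, which is what the similarity-preservation objectives reward.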
-
Patent number: 12198224
Abstract: Systems and methods for image generation are described. Embodiments of the present disclosure receive a text phrase that describes a target image to be generated; generate text features based on the text phrase; retrieve a search image based on the text phrase; and generate the target image using an image generation network based on the text features and the search image.
Type: Grant
Filed: February 15, 2022
Date of Patent: January 14, 2025
Assignee: Adobe Inc.
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, John Philip Collomosse
-
Publication number: 20240249413
Abstract: In implementations of systems for performing multiple segmentation tasks, a computing device implements a segment system to receive input data describing a digital image depicting an object. The segment system computes per-pixel embeddings for the digital image using a pixel decoder of a machine learning model. Output embeddings are generated using a transformer decoder of the machine learning model based on the per-pixel embeddings for the digital image, input embeddings for a first segmentation task, and input embeddings for a second segmentation task. The segment system outputs a first digital image and a second digital image. The first digital image depicts the object segmented based on the first segmentation task, and the second digital image depicts the object segmented based on the second segmentation task.
Type: Application
Filed: January 23, 2023
Publication date: July 25, 2024
Applicant: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Zhe Lin, Sukjun Hwang, Jianming Zhang, Brian Lynn Price
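The idea of combining shared per-pixel embeddings with task-specific input embeddings can be sketched as a dot product between each pixel's embedding and a per-task query vector. This sign-thresholded scoring is a simplifying assumption, not the transformer decoder described in the application:

```python
def segment(pixel_embeddings, task_queries):
    """Produce one binary mask per task: a pixel is 'on' for a task when
    the dot product of its embedding with that task's query is positive.

    pixel_embeddings: H x W grid of vectors (shared across tasks).
    task_queries: one query vector per segmentation task."""
    masks = []
    for q in task_queries:
        mask = [[1 if sum(p * c for p, c in zip(pix, q)) > 0 else 0
                 for pix in row]
                for row in pixel_embeddings]
        masks.append(mask)
    return masks
```

Because the pixel embeddings are computed once and reused, adding a segmentation task only adds one query, which is the efficiency argument behind this kind of multi-task design.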
-
Publication number: 20240169623
Abstract: Systems and methods for multi-modal image generation are provided. One or more aspects of the systems and methods include obtaining a text prompt and layout information indicating a target location for an element of the text prompt within an image to be generated, and computing a text feature map including a plurality of values corresponding to the element of the text prompt at pixel locations corresponding to the target location. Then the image is generated based on the text feature map using a diffusion model. The generated image includes the element of the text prompt at the target location.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, Jason Wen Yong Kuen, John Philip Collomosse
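The text feature map described here (text embedding values placed at the pixel locations of the target region) can be sketched as a simple rasterization. The box convention and zero background are illustrative assumptions, not the disclosed conditioning mechanism:

```python
def rasterize_text_feature(height, width, embedding, box):
    """Build an H x W x D feature map that holds the text embedding at
    every pixel inside the target box (x0, y0, x1, y1; high edges
    exclusive) and zeros everywhere else."""
    dim = len(embedding)
    fmap = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            fmap[y][x] = list(embedding)
    return fmap
```

A diffusion model conditioned on such a map sees, at each pixel, which prompt element (if any) should appear there, which is how the layout constraint reaches the generator.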
-
Publication number: 20240104951
Abstract: In various examples, a table recognition machine learning model receives an image of a table and generates, using a first encoder of the model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generates, using a second decoder of the model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table; and then determines, using a third decoder of the model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
Type: Application
Filed: September 19, 2022
Publication date: March 28, 2024
Inventors: Jiuxiang Gu, Vlad Morariu, Tong Sun, Jason Wen Yong Kuen, Ani Nenkova
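One way the predicted row/column coordinates could be combined with the cell bounding boxes into a grid structure is sketched below. The center-point assignment via `bisect` is an illustrative assumption, not the third decoder described in the application:

```python
import bisect

def cell_grid_position(cell_box, row_coords, col_coords):
    """Assign a cell bounding box (x0, y0, x1, y1) to a (row, col) grid
    index by locating its center between the predicted row separator
    y-coordinates and column separator x-coordinates (sorted lists)."""
    x0, y0, x1, y1 = cell_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    row = bisect.bisect_right(row_coords, cy)
    col = bisect.bisect_right(col_coords, cx)
    return row, col
```

Running this over every detected cell box yields a grid layout, onto which the cells' semantic features can then be attached.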
-
Patent number: 11941884
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive an image having a plurality of object instances; encode the image to obtain image features; decode the image features to obtain object features; generate object detection information based on the object features using an object detection branch, wherein the object detection branch is trained based on a first training set using a detection loss; generate semantic segmentation information based on the object features using a semantic segmentation branch, wherein the semantic segmentation branch is trained based on a second training set different from the first training set using a semantic segmentation loss; and combine the object detection information and the semantic segmentation information to obtain panoptic segmentation information that indicates which pixels of the image correspond to each of the plurality of object instances.
Type: Grant
Filed: November 12, 2021
Date of Patent: March 26, 2024
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Bo Sun, Zhe Lin, Simon Su Chen
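The final combining step (merging detection output with semantic segmentation into panoptic segmentation) can be sketched as overlaying instance masks onto a per-pixel semantic map. The "later detection wins on overlap" tie-break and the tuple labels are illustrative assumptions, not the patented merge:

```python
def panoptic_merge(semantic_map, detections):
    """Overlay detected instances onto a per-pixel semantic map.

    semantic_map: H x W grid of 'stuff' class names.
    detections: list of (instance_id, mask) pairs, where mask is an
    H x W binary grid; later detections win on overlap (illustrative)."""
    h, w = len(semantic_map), len(semantic_map[0])
    panoptic = [[("stuff", semantic_map[y][x]) for x in range(w)]
                for y in range(h)]
    for inst_id, mask in detections:
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    panoptic[y][x] = ("thing", inst_id)
    return panoptic
```

The result assigns every pixel either a background class or a specific object instance, which is exactly what panoptic segmentation requires.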
-
Patent number: 11868889
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Grant
Filed: January 31, 2022
Date of Patent: January 9, 2024
Assignee: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
-
Publication number: 20230401827
Abstract: Systems and methods for image segmentation are described. Embodiments of the present disclosure receive a training image and a caption for the training image, wherein the caption includes text describing an object in the training image; generate a pseudo mask for the object using a teacher network based on the text describing the object; generate a mask for the object using a student network; compute noise information for the training image using a noise estimation network; and update parameters of the student network based on the mask, the pseudo mask, and the noise information.
Type: Application
Filed: June 9, 2022
Publication date: December 14, 2023
Inventors: Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, Jiuxiang Gu
-
Publication number: 20230368003
Abstract: The technology described herein is directed to an adaptive sparse attention pattern that is learned during fine-tuning and deployed in a machine-learning model. In aspects, a row or column in an attention matrix whose importance score for a task is above a threshold importance score is identified. The important row or column is included in an adaptive attention pattern used with a machine-learning model having a self-attention operation. In response to an input, a task-specific inference is generated for the input using the machine-learning model with the adaptive attention pattern.
Type: Application
Filed: May 10, 2022
Publication date: November 16, 2023
Inventors: Jiuxiang Gu, Zihan Wang, Jason Wen Yong Kuen, Handong Zhao, Vlad Ion Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun
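The row/column selection described in this abstract can be sketched as follows. Using a single per-index importance list and keeping the diagonal so every token still attends to itself are simplifying assumptions, not the claimed method:

```python
def adaptive_attention_pattern(importance, threshold):
    """Build a boolean n x n attention mask that keeps full rows and
    columns whose task importance score exceeds the threshold, plus
    the diagonal (so each token attends to itself)."""
    n = len(importance)
    keep = {i for i, score in enumerate(importance) if score > threshold}
    return [[(i in keep) or (j in keep) or (i == j) for j in range(n)]
            for i in range(n)]
```

Masking attention this way reduces the quadratic self-attention cost to roughly the kept rows/columns plus the diagonal, which is the point of a learned sparse pattern.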
-
Publication number: 20230325992
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize artificial intelligence to learn to recommend foreground object images for use in generating composite images based on geometry and/or lighting features. For instance, in one or more embodiments, the disclosed systems transform a foreground object image corresponding to a background image using at least one of a geometry transformation or a lighting transformation. The disclosed systems further generate predicted embeddings for the background image, the foreground object image, and the transformed foreground object image within a geometry-lighting-sensitive embedding space utilizing a geometry-lighting-aware neural network. Using a loss determined from the predicted embeddings, the disclosed systems update parameters of the geometry-lighting-aware neural network. The disclosed systems further provide a variety of efficient user interfaces for generating composite digital images.
Type: Application
Filed: April 11, 2022
Publication date: October 12, 2023
Inventors: Zhe Lin, Sijie Zhu, Jason Wen Yong Kuen, Scott Cohen, Zhifei Zhang
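A loss over the three embeddings described here (background, compatible foreground, transformed foreground) might take a hinge-style triplet form, pulling the background toward the compatible foreground and pushing it away from the geometry/lighting-transformed one. The exact loss is not specified in this abstract, so this sketch is illustrative only:

```python
import math

def geometry_lighting_triplet_loss(bg, fg, fg_transformed, margin=1.0):
    """Hinge triplet loss: penalize when the background embedding is not
    at least `margin` closer to the compatible foreground than to the
    transformed (geometry/lighting-perturbed) foreground."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(bg, fg) - dist(bg, fg_transformed) + margin)
```

Training on such a loss makes the embedding space sensitive to geometry and lighting mismatches, which is what lets nearest-neighbor search recommend compatible foreground objects.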
-
Publication number: 20230325991
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize artificial intelligence to learn to recommend foreground object images for use in generating composite images based on geometry and/or lighting features. For instance, in one or more embodiments, the disclosed systems transform a foreground object image corresponding to a background image using at least one of a geometry transformation or a lighting transformation. The disclosed systems further generate predicted embeddings for the background image, the foreground object image, and the transformed foreground object image within a geometry-lighting-sensitive embedding space utilizing a geometry-lighting-aware neural network. Using a loss determined from the predicted embeddings, the disclosed systems update parameters of the geometry-lighting-aware neural network. The disclosed systems further provide a variety of efficient user interfaces for generating composite digital images.
Type: Application
Filed: April 11, 2022
Publication date: October 12, 2023
Inventors: Zhe Lin, Sijie Zhu, Jason Wen Yong Kuen, Scott Cohen, Zhifei Zhang
-
Publication number: 20230260164
Abstract: Systems and methods for image generation are described. Embodiments of the present disclosure receive a text phrase that describes a target image to be generated; generate text features based on the text phrase; retrieve a search image based on the text phrase; and generate the target image using an image generation network based on the text features and the search image.
Type: Application
Filed: February 15, 2022
Publication date: August 17, 2023
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, John Philip Collomosse
-
Publication number: 20230252774
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive a training image and a caption for the training image, wherein the caption includes text describing an object in the training image; generate a pseudo mask for the object using a teacher network based on the text describing the object; generate a mask for the object using a student network; and update parameters of the student network based on the mask and the pseudo mask.
Type: Application
Filed: February 9, 2022
Publication date: August 10, 2023
Inventors: Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, Jiuxiang Gu
-
Publication number: 20230154221
Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
Type: Application
Filed: November 16, 2021
Publication date: May 18, 2023
Inventors: Jiuxiang Gu, Ani Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason Wen Yong Kuen, Handong Zhao
-
Publication number: 20230154185
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive an image having a plurality of object instances; encode the image to obtain image features; decode the image features to obtain object features; generate object detection information based on the object features using an object detection branch, wherein the object detection branch is trained based on a first training set using a detection loss; generate semantic segmentation information based on the object features using a semantic segmentation branch, wherein the semantic segmentation branch is trained based on a second training set different from the first training set using a semantic segmentation loss; and combine the object detection information and the semantic segmentation information to obtain panoptic segmentation information that indicates which pixels of the image correspond to each of the plurality of object instances.
Type: Application
Filed: November 12, 2021
Publication date: May 18, 2023
Inventors: Jason Wen Yong Kuen, Bo Sun, Zhe Lin, Simon Su Chen
-
Publication number: 20230128792
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.
Type: Application
Filed: January 31, 2022
Publication date: April 27, 2023
Inventors: Jason Wen Yong Kuen, Su Chen, Scott Cohen, Zhe Lin, Zijun Wei, Jianming Zhang
-
Patent number: 11610393
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
Type: Grant
Filed: October 2, 2020
Date of Patent: March 21, 2023
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Zhe Lin, Jiuxiang Gu
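The knowledge distillation loss mentioned in this abstract is commonly a KL divergence between temperature-softened teacher and student class distributions; the exact loss and the temperature value below are assumptions for illustration, not the patented function:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between softened class
    distributions for the same (e.g., heavily augmented) input."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Minimizing this loss over heavily augmented images pushes the distilled (student) network to reproduce the source (teacher) network's predictions, which is how the learned parameters are transferred.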
-
Publication number: 20220284321
Abstract: Systems and methods for multi-modal representation learning are described. One or more embodiments provide a visual representation learning system trained using machine learning techniques. For example, some embodiments of the visual representation learning system are trained using cross-modal training tasks including a combination of intra-modal and inter-modal similarity preservation objectives. In some examples, the training tasks are based on contrastive learning techniques.
Type: Application
Filed: March 3, 2021
Publication date: September 8, 2022
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, Yilin Wang, Ajinkya Kale, Baldo Faieta