Patents by Inventor Jason Wen Yong Kuen
Jason Wen Yong Kuen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12272127
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.
Type: Grant
Filed: January 31, 2022
Date of Patent: April 8, 2025
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Su Chen, Scott Cohen, Zhe Lin, Zijun Wei, Jianming Zhang
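The post-processing step this abstract mentions (filtering out low-quality masks) can be illustrated with a minimal sketch. The dict-based mask format and the score/area thresholds below are assumptions for illustration, not the patented implementation:

```python
def filter_masks(masks, min_score=0.5, min_area=25):
    """Keep only masks whose detector confidence and pixel area pass
    thresholds. Each mask is a dict with a binary 2D grid under
    "pixels" and a confidence under "score" (illustrative format)."""
    kept = []
    for m in masks:
        area = sum(sum(row) for row in m["pixels"])
        if m["score"] >= min_score and area >= min_area:
            kept.append(m)
    return kept
```

In practice such a filter would run after the masking head, discarding detections that are confidently wrong (low score) or degenerate (tiny area).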
-
Publication number: 20250086849
Abstract: Embodiments of the present disclosure include obtaining a text prompt describing an element, layout information indicating a target region for the element, and a precision level corresponding to the element. Some embodiments generate a text feature pyramid based on the text prompt, the layout information, and the precision level, wherein the text feature pyramid comprises a plurality of text feature maps at a plurality of scales. Then, an image is generated based on the text feature pyramid. In some cases, the image includes an object corresponding to the element of the text prompt at the target region. Additionally, a shape of the object corresponds to a shape of the target region based on the precision level.
Type: Application
Filed: September 8, 2023
Publication date: March 13, 2025
Inventors: Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, Jason Wen Yong Kuen, John Philip Collomosse
-
Patent number: 12223439
Abstract: Systems and methods for multi-modal representation learning are described. One or more embodiments provide a visual representation learning system trained using machine learning techniques. For example, some embodiments of the visual representation learning system are trained using cross-modal training tasks including a combination of intra-modal and inter-modal similarity preservation objectives. In some examples, the training tasks are based on contrastive learning techniques.
Type: Grant
Filed: March 3, 2021
Date of Patent: February 11, 2025
Assignee: Adobe Inc.
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, Yilin Wang, Ajinkya Kale, Baldo Faieta
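The combination of intra-modal and inter-modal similarity preservation described here can be sketched with an InfoNCE-style contrastive objective. The function names, the use of cosine similarity, and the batch layout are illustrative assumptions, not the claimed training procedure:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-softmax of the positive pair's similarity."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

def multimodal_loss(img, img_aug, txt, temperature=0.1):
    """Sum an inter-modal term (image vs. its caption, against other
    captions) and an intra-modal term (image vs. its augmented view,
    against other images) for each aligned pair in the batch."""
    n = len(img)
    total = 0.0
    for i in range(n):
        other_txt = [txt[j] for j in range(n) if j != i]
        other_aug = [img_aug[j] for j in range(n) if j != i]
        total += info_nce(img[i], txt[i], other_txt, temperature)      # inter-modal
        total += info_nce(img[i], img_aug[i], other_aug, temperature)  # intra-modal
    return total / n
```

Correctly aligned image/text pairs should yield a lower loss than shuffled pairs, which is what the similarity-preservation objectives reward.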
-
Patent number: 12198224
Abstract: Systems and methods for image generation are described. Embodiments of the present disclosure receive a text phrase that describes a target image to be generated; generate text features based on the text phrase; retrieve a search image based on the text phrase; and generate the target image using an image generation network based on the text features and the search image.
Type: Grant
Filed: February 15, 2022
Date of Patent: January 14, 2025
Assignee: Adobe Inc.
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, John Philip Collomosse
-
Publication number: 20240249413
Abstract: In implementations of systems for performing multiple segmentation tasks, a computing device implements a segment system to receive input data describing a digital image depicting an object. The segment system computes per-pixel embeddings for the digital image using a pixel decoder of a machine learning model. Output embeddings are generated using a transformer decoder of the machine learning model based on the per-pixel embeddings for the digital image, input embeddings for a first segmentation task, and input embeddings for a second segmentation task. The segment system outputs a first digital image and a second digital image. The first digital image depicts the object segmented based on the first segmentation task, and the second digital image depicts the object segmented based on the second segmentation task.
Type: Application
Filed: January 23, 2023
Publication date: July 25, 2024
Applicant: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Zhe Lin, Sukjun Hwang, Jianming Zhang, Brian Lynn Price
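The idea of combining shared per-pixel embeddings with task-specific input embeddings can be sketched as a dot product between each pixel's embedding and a per-task query vector. This sign-thresholded scoring is a simplifying assumption, not the transformer decoder described in the application:

```python
def segment(pixel_embeddings, task_queries):
    """Produce one binary mask per task: a pixel is 'on' for a task when
    the dot product of its embedding with that task's query is positive.

    pixel_embeddings: H x W grid of vectors (shared across tasks).
    task_queries: one query vector per segmentation task."""
    masks = []
    for q in task_queries:
        mask = [[1 if sum(p * c for p, c in zip(pix, q)) > 0 else 0
                 for pix in row]
                for row in pixel_embeddings]
        masks.append(mask)
    return masks
```

Because the pixel embeddings are computed once and reused, adding a segmentation task only adds one query, which is the efficiency argument behind this kind of multi-task design.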
-
Publication number: 20240169623
Abstract: Systems and methods for multi-modal image generation are provided. One or more aspects of the systems and methods include obtaining a text prompt and layout information indicating a target location for an element of the text prompt within an image to be generated, and computing a text feature map including a plurality of values corresponding to the element of the text prompt at pixel locations corresponding to the target location. Then the image is generated based on the text feature map using a diffusion model. The generated image includes the element of the text prompt at the target location.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, Jason Wen Yong Kuen, John Philip Collomosse
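The text feature map described here (text embedding values placed at the pixel locations of the target region) can be sketched as a simple rasterization. The box convention and zero background are illustrative assumptions, not the disclosed conditioning mechanism:

```python
def rasterize_text_feature(height, width, embedding, box):
    """Build an H x W x D feature map that holds the text embedding at
    every pixel inside the target box (x0, y0, x1, y1; high edges
    exclusive) and zeros everywhere else."""
    dim = len(embedding)
    fmap = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            fmap[y][x] = list(embedding)
    return fmap
```

A diffusion model conditioned on such a map sees, at each pixel, which prompt element (if any) should appear there, which is how the layout constraint reaches the generator.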
-
Publication number: 20240104951
Abstract: In various examples, a table recognition machine learning model receives an image of a table and generates, using a first encoder of the model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generates, using a second decoder of the model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table; and then determines, using a third decoder of the model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
Type: Application
Filed: September 19, 2022
Publication date: March 28, 2024
Inventors: Jiuxiang Gu, Vlad Morariu, Tong Sun, Jason Wen Yong Kuen, Ani Nenkova
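One way the predicted row/column coordinates could be combined with the cell bounding boxes into a grid structure is sketched below. The center-point assignment via `bisect` is an illustrative assumption, not the third decoder described in the application:

```python
import bisect

def cell_grid_position(cell_box, row_coords, col_coords):
    """Assign a cell bounding box (x0, y0, x1, y1) to a (row, col) grid
    index by locating its center between the predicted row separator
    y-coordinates and column separator x-coordinates (sorted lists)."""
    x0, y0, x1, y1 = cell_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    row = bisect.bisect_right(row_coords, cy)
    col = bisect.bisect_right(col_coords, cx)
    return row, col
```

Running this over every detected cell box yields a grid layout, onto which the cells' semantic features can then be attached.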
-
Patent number: 11941884
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive an image having a plurality of object instances; encode the image to obtain image features; decode the image features to obtain object features; generate object detection information based on the object features using an object detection branch, wherein the object detection branch is trained based on a first training set using a detection loss; generate semantic segmentation information based on the object features using a semantic segmentation branch, wherein the semantic segmentation branch is trained based on a second training set different from the first training set using a semantic segmentation loss; and combine the object detection information and the semantic segmentation information to obtain panoptic segmentation information that indicates which pixels of the image correspond to each of the plurality of object instances.
Type: Grant
Filed: November 12, 2021
Date of Patent: March 26, 2024
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Bo Sun, Zhe Lin, Simon Su Chen
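The final combining step (merging detection output with semantic segmentation into panoptic segmentation) can be sketched as overlaying instance masks onto a per-pixel semantic map. The "later detection wins on overlap" tie-break and the tuple labels are illustrative assumptions, not the patented merge:

```python
def panoptic_merge(semantic_map, detections):
    """Overlay detected instances onto a per-pixel semantic map.

    semantic_map: H x W grid of 'stuff' class names.
    detections: list of (instance_id, mask) pairs, where mask is an
    H x W binary grid; later detections win on overlap (illustrative)."""
    h, w = len(semantic_map), len(semantic_map[0])
    panoptic = [[("stuff", semantic_map[y][x]) for x in range(w)]
                for y in range(h)]
    for inst_id, mask in detections:
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    panoptic[y][x] = ("thing", inst_id)
    return panoptic
```

The result assigns every pixel either a background class or a specific object instance, which is exactly what panoptic segmentation requires.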
-
Patent number: 11868889
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Grant
Filed: January 31, 2022
Date of Patent: January 9, 2024
Assignee: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
-
Publication number: 20230401827
Abstract: Systems and methods for image segmentation are described. Embodiments of the present disclosure receive a training image and a caption for the training image, wherein the caption includes text describing an object in the training image; generate a pseudo mask for the object using a teacher network based on the text describing the object; generate a mask for the object using a student network; compute noise information for the training image using a noise estimation network; and update parameters of the student network based on the mask, the pseudo mask, and the noise information.
Type: Application
Filed: June 9, 2022
Publication date: December 14, 2023
Inventors: Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, Jiuxiang Gu
-
Publication number: 20230368003
Abstract: The technology described herein is directed to an adaptive sparse attention pattern that is learned during fine-tuning and deployed in a machine-learning model. In aspects, a row or column in an attention matrix whose importance score for a task is above a threshold importance score is identified. The important row or column is included in an adaptive attention pattern used with a machine-learning model having a self-attention operation. In response to an input, a task-specific inference is generated for the input using the machine-learning model with the adaptive attention pattern.
Type: Application
Filed: May 10, 2022
Publication date: November 16, 2023
Inventors: Jiuxiang Gu, Zihan Wang, Jason Wen Yong Kuen, Handong Zhao, Vlad Ion Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun
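The row/column selection described in this abstract can be sketched as follows. Using a single per-index importance list and keeping the diagonal so every token still attends to itself are simplifying assumptions, not the claimed method:

```python
def adaptive_attention_pattern(importance, threshold):
    """Build a boolean n x n attention mask that keeps full rows and
    columns whose task importance score exceeds the threshold, plus
    the diagonal (so each token attends to itself)."""
    n = len(importance)
    keep = {i for i, score in enumerate(importance) if score > threshold}
    return [[(i in keep) or (j in keep) or (i == j) for j in range(n)]
            for i in range(n)]
```

Masking attention this way reduces the quadratic self-attention cost to roughly the kept rows/columns plus the diagonal, which is the point of a learned sparse pattern.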
-
Publication number: 20230325992
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize artificial intelligence to learn to recommend foreground object images for use in generating composite images based on geometry and/or lighting features. For instance, in one or more embodiments, the disclosed systems transform a foreground object image corresponding to a background image using at least one of a geometry transformation or a lighting transformation. The disclosed systems further generate predicted embeddings for the background image, the foreground object image, and the transformed foreground object image within a geometry-lighting-sensitive embedding space utilizing a geometry-lighting-aware neural network. Using a loss determined from the predicted embeddings, the disclosed systems update parameters of the geometry-lighting-aware neural network. The disclosed systems further provide a variety of efficient user interfaces for generating composite digital images.
Type: Application
Filed: April 11, 2022
Publication date: October 12, 2023
Inventors: Zhe Lin, Sijie Zhu, Jason Wen Yong Kuen, Scott Cohen, Zhifei Zhang
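A loss over the three embeddings described here (background, compatible foreground, transformed foreground) might take a hinge-style triplet form, pulling the background toward the compatible foreground and pushing it away from the geometry/lighting-transformed one. The exact loss is not specified in this abstract, so this sketch is illustrative only:

```python
import math

def geometry_lighting_triplet_loss(bg, fg, fg_transformed, margin=1.0):
    """Hinge triplet loss: penalize when the background embedding is not
    at least `margin` closer to the compatible foreground than to the
    transformed (geometry/lighting-perturbed) foreground."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(bg, fg) - dist(bg, fg_transformed) + margin)
```

Training on such a loss makes the embedding space sensitive to geometry and lighting mismatches, which is what lets nearest-neighbor search recommend compatible foreground objects.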
-
Publication number: 20230325991
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize artificial intelligence to learn to recommend foreground object images for use in generating composite images based on geometry and/or lighting features. For instance, in one or more embodiments, the disclosed systems transform a foreground object image corresponding to a background image using at least one of a geometry transformation or a lighting transformation. The disclosed systems further generate predicted embeddings for the background image, the foreground object image, and the transformed foreground object image within a geometry-lighting-sensitive embedding space utilizing a geometry-lighting-aware neural network. Using a loss determined from the predicted embeddings, the disclosed systems update parameters of the geometry-lighting-aware neural network. The disclosed systems further provide a variety of efficient user interfaces for generating composite digital images.
Type: Application
Filed: April 11, 2022
Publication date: October 12, 2023
Inventors: Zhe Lin, Sijie Zhu, Jason Wen Yong Kuen, Scott Cohen, Zhifei Zhang
-
Publication number: 20230260164
Abstract: Systems and methods for image generation are described. Embodiments of the present disclosure receive a text phrase that describes a target image to be generated; generate text features based on the text phrase; retrieve a search image based on the text phrase; and generate the target image using an image generation network based on the text features and the search image.
Type: Application
Filed: February 15, 2022
Publication date: August 17, 2023
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, John Philip Collomosse
-
Publication number: 20230252774
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive a training image and a caption for the training image, wherein the caption includes text describing an object in the training image; generate a pseudo mask for the object using a teacher network based on the text describing the object; generate a mask for the object using a student network; and update parameters of the student network based on the mask and the pseudo mask.
Type: Application
Filed: February 9, 2022
Publication date: August 10, 2023
Inventors: Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, Jiuxiang Gu
-
Publication number: 20230154221
Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
Type: Application
Filed: November 16, 2021
Publication date: May 18, 2023
Inventors: Jiuxiang Gu, Ani Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason Wen Yong Kuen, Handong Zhao
-
Publication number: 20230154185
Abstract: Systems and methods for image processing are described. Embodiments of the present disclosure receive an image having a plurality of object instances; encode the image to obtain image features; decode the image features to obtain object features; generate object detection information based on the object features using an object detection branch, wherein the object detection branch is trained based on a first training set using a detection loss; generate semantic segmentation information based on the object features using a semantic segmentation branch, wherein the semantic segmentation branch is trained based on a second training set different from the first training set using a semantic segmentation loss; and combine the object detection information and the semantic segmentation information to obtain panoptic segmentation information that indicates which pixels of the image correspond to each of the plurality of object instances.
Type: Application
Filed: November 12, 2021
Publication date: May 18, 2023
Inventors: Jason Wen Yong Kuen, Bo Sun, Zhe Lin, Simon Su Chen
-
Publication number: 20230128792
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.
Type: Application
Filed: January 31, 2022
Publication date: April 27, 2023
Inventors: Jason Wen Yong Kuen, Su Chen, Scott Cohen, Zhe Lin, Zijun Wei, Jianming Zhang
-
Patent number: 11610393
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
Type: Grant
Filed: October 2, 2020
Date of Patent: March 21, 2023
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Zhe Lin, Jiuxiang Gu
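The knowledge distillation loss mentioned in this abstract is commonly a KL divergence between temperature-softened teacher and student class distributions; the exact loss and the temperature value below are assumptions for illustration, not the patented function:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between softened class
    distributions for the same (e.g., heavily augmented) input."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Minimizing this loss over heavily augmented images pushes the distilled (student) network to reproduce the source (teacher) network's predictions, which is how the learned parameters are transferred.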
-
Publication number: 20220284321
Abstract: Systems and methods for multi-modal representation learning are described. One or more embodiments provide a visual representation learning system trained using machine learning techniques. For example, some embodiments of the visual representation learning system are trained using cross-modal training tasks including a combination of intra-modal and inter-modal similarity preservation objectives. In some examples, the training tasks are based on contrastive learning techniques.
Type: Application
Filed: March 3, 2021
Publication date: September 8, 2022
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, Yilin Wang, Ajinkya Kale, Baldo Faieta