Patents by Inventor Jason Wen Yong Kuen
Jason Wen Yong Kuen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230154221
Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
Type: Application
Filed: November 16, 2021
Publication date: May 18, 2023
Inventors: Jiuxiang Gu, Ani Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason Wen Yong Kuen, Handong Zhao
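The core operation above — cross-attention from a sentence's masked textual features to the masked visual features of its bounding box — can be sketched as follows. This is a minimal, single-head, weight-free NumPy sketch: the zeroing-based masking function, the masking ratio, and all feature shapes are illustrative assumptions, not details from the filing.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mask_features(feats, mask_ratio=0.3, rng=None):
    # Masking function (assumed form): zero out a random subset of rows.
    rng = np.random.default_rng(0) if rng is None else rng
    out = feats.copy()
    out[rng.random(len(feats)) < mask_ratio] = 0.0
    return out

def cross_attention(text_feats, visual_feats):
    # Queries come from the sentence's masked textual features; keys and
    # values come from the masked visual features of its bounding box.
    d = text_feats.shape[-1]
    weights = softmax(text_feats @ visual_feats.T / np.sqrt(d))
    return weights @ visual_feats  # one predicted feature per text token

rng = np.random.default_rng(1)
text = mask_features(rng.standard_normal((5, 16)), rng=rng)
visual = mask_features(rng.standard_normal((7, 16)), rng=rng)
pred = cross_attention(text, visual)
```

In the full method, the resulting predicted features would feed the listed pretraining tasks (masked sentence modeling, visual contrastive learning, visual-language alignment).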
-
Publication number: 20230128792
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.
Type: Application
Filed: January 31, 2022
Publication date: April 27, 2023
Inventors: Jason Wen Yong Kuen, Su Chen, Scott Cohen, Zhe Lin, Zijun Wei, Jianming Zhang
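The multi-scale normalization idea above can be illustrated with a minimal sketch that pools per-channel statistics across all feature levels, rather than normalizing each level independently. The level shapes and the omission of learned scale/shift parameters are simplifying assumptions.

```python
import numpy as np

def multiscale_batchnorm(feature_maps, eps=1e-5):
    # Pool per-channel mean/variance across every feature level, then
    # normalize each level with the shared statistics (sketch of the
    # "synchronized across scales" idea; no learned affine parameters).
    channels = feature_maps[0].shape[-1]
    flat = np.concatenate([f.reshape(-1, channels) for f in feature_maps])
    mu, var = flat.mean(axis=0), flat.var(axis=0)
    return [(f - mu) / np.sqrt(var + eps) for f in feature_maps]

rng = np.random.default_rng(0)
# Three pyramid levels with the same channel count but different spatial sizes.
levels = [rng.standard_normal((8, 8, 4)),
          rng.standard_normal((4, 4, 4)),
          rng.standard_normal((2, 2, 4))]
normalized = multiscale_batchnorm(levels)
```

After normalization, the statistics pooled over all levels are approximately zero-mean and unit-variance per channel, which is the point of synchronizing across scales.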
-
Patent number: 11610393
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
Type: Grant
Filed: October 2, 2020
Date of Patent: March 21, 2023
Assignee: Adobe Inc.
Inventors: Jason Wen Yong Kuen, Zhe Lin, Jiuxiang Gu
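A knowledge distillation loss of the kind referenced above is commonly formulated as a temperature-softened KL divergence between the source (teacher) and distilled (student) class distributions. The sketch below assumes that standard formulation; the filing does not specify the exact loss form or temperature.

```python
import numpy as np

def softmax(z, temperature=1.0):
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / temperature)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # KL(teacher || student) over temperature-softened class distributions,
    # scaled by T^2 as in standard knowledge distillation. Both networks
    # would classify the same heavily augmented images.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    return float(kl * temperature ** 2)
```

The loss is zero when the student exactly matches the teacher's predictions and positive otherwise, so minimizing it transfers the teacher's behavior to the student.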
-
Publication number: 20220284321
Abstract: Systems and methods for multi-modal representation learning are described. One or more embodiments provide a visual representation learning system trained using machine learning techniques. For example, some embodiments of the visual representation learning system are trained using cross-modal training tasks including a combination of intra-modal and inter-modal similarity preservation objectives. In some examples, the training tasks are based on contrastive learning techniques.
Type: Application
Filed: March 3, 2021
Publication date: September 8, 2022
Inventors: Xin Yuan, Zhe Lin, Jason Wen Yong Kuen, Jianming Zhang, Yilin Wang, Ajinkya Kale, Baldo Faieta
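Contrastive similarity-preservation objectives of the kind described can be sketched with an InfoNCE-style loss applied both within a modality (image-to-image) and across modalities (image-to-text). The temperature value and the same-index pairing scheme below are assumptions for illustration.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    # InfoNCE: each anchor's positive is the same-index row of `positives`;
    # every other row in the batch serves as a negative.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())

def cross_modal_loss(img_view_a, img_view_b, text_embed):
    # Intra-modal term (two augmented views of the same images) plus
    # inter-modal term (images paired with their text descriptions).
    return info_nce(img_view_a, img_view_b) + info_nce(img_view_a, text_embed)
```

Aligned pairs should produce a lower loss than mismatched pairs, which is the similarity-preservation property the objective enforces.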
-
Publication number: 20220157054
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Application
Filed: January 31, 2022
Publication date: May 19, 2022
Applicant: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
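One plausible way the two conditional inputs — the attention map and the concept's word embedding — could influence region scoring is sketched below. The dot-product fusion and attention gating are illustrative assumptions, not the network architecture from the filing; the point is only that the target concept enters the scoring path, so unseen concepts can still be queried.

```python
import numpy as np

def conditional_region_scores(region_feats, pooled_attention, concept_emb):
    # Similarity between each region feature and the concept's word
    # embedding, gated by the region's pooled attention-map response
    # for that concept (both fusion choices are assumptions).
    similarity = region_feats @ concept_emb
    return similarity * pooled_attention

# Toy example: region 0's feature matches the concept embedding exactly.
concept = np.array([1.0, 0.0, 0.0, 0.0])
regions = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0, 0.0]])
attention = np.array([1.0, 1.0, 1.0])  # attention response pooled per region
scores = conditional_region_scores(regions, attention, concept)
```

Because the concept embedding is an input rather than a fixed classifier weight, swapping in the embedding of a novel concept re-scores the same regions without retraining.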
-
Publication number: 20220147838
Abstract: Methods and systems disclosed herein relate generally to systems and methods for generating visual relationship graphs that identify relationships between objects depicted in an image. A vision-language application uses transformer encoders to generate a graph structure, in which the graph structure represents a dependency between a first region and a second region of an image. The dependency indicates that a contextual representation of the first region was derived, at least in part, by processing the second region. The contextual representation identifies a predicted identity of an image object depicted in the first region. The predicted identity is determined at least in part by identifying a relationship between the first region and other data objects associated with various modalities.
Type: Application
Filed: November 9, 2020
Publication date: May 12, 2022
Inventors: Jiuxiang Gu, Vlad Ion Morariu, Tong Sun, Jason Wen Yong Kuen, Handong Zhao
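The dependency structure described — an edge recording that one region's contextual representation was derived in part by processing another region — can be read off a transformer encoder's attention weights. The thresholding rule below is an illustrative assumption about how such a graph might be extracted.

```python
import numpy as np

def dependency_graph(attention, threshold=0.3):
    # Edge (i, j) means region i's contextual representation was derived,
    # at least in part, by attending to region j. Self-edges are dropped;
    # the fixed threshold is an assumption.
    n = attention.shape[0]
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and attention[i, j] >= threshold]

# Row i holds region i's attention distribution over all regions.
attn = np.array([[0.70, 0.30, 0.00],
                 [0.10, 0.80, 0.10],
                 [0.50, 0.25, 0.25]])
edges = dependency_graph(attn)  # → [(0, 1), (2, 0)]
```

Here region 0 depends on region 1, and region 2 depends on region 0 — e.g., classifying a "rider" region by also processing the "bicycle" region beneath it.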
-
Publication number: 20220108131
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
Type: Application
Filed: October 2, 2020
Publication date: April 7, 2022
Inventors: Jason Wen Yong Kuen, Zhe Lin, Jiuxiang Gu
-
Patent number: 11256918
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Grant
Filed: May 14, 2020
Date of Patent: February 22, 2022
Assignee: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
-
Publication number: 20200272822
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Application
Filed: May 14, 2020
Publication date: August 27, 2020
Applicant: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
-
Patent number: 10755099
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Grant
Filed: November 13, 2018
Date of Patent: August 25, 2020
Assignee: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen
-
Publication number: 20200151448
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
Type: Application
Filed: November 13, 2018
Publication date: May 14, 2020
Applicant: Adobe Inc.
Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen