Patents by Inventor Jiuxiang Gu

Jiuxiang Gu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12387043
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train a named entity recognition (NER) model with noisy training data through a self-cleaning discriminator model. For example, the disclosed systems utilize a self-cleaning guided denoising framework to improve NER learning on noisy training data via a guidance training set. In one or more implementations, the disclosed systems utilize, within the denoising framework, an auxiliary discriminator model to correct noise in the noisy training data while training an NER model through the noisy training data. For example, while training the NER model to predict labels from the noisy training data, the disclosed systems utilize a discriminator model to detect noisy NER labels and reweight the noisy NER labels provided for training in the NER model.
    Type: Grant
    Filed: September 22, 2023
    Date of Patent: August 12, 2025
    Assignee: Adobe Inc.
    Inventors: Ruiyi Zhang, Zhendong Chu, Vlad Morariu, Tong Yu, Rajiv Jain, Nedim Lipka, Jiuxiang Gu
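The reweighting idea in patent 12387043 above can be pictured with a minimal PyTorch sketch: a stand-in discriminator scores each (token, noisy label) pair, and that score scales the per-token NER loss. The module sizes, discriminator architecture, and loss form are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (assumed shapes and modules, not the patented system): an
# auxiliary discriminator scores how trustworthy each noisy NER label looks,
# and that score reweights the per-token training loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_tags, hidden = 9, 128
ner_head = nn.Linear(hidden, num_tags)          # stand-in for the NER model's tag classifier
discriminator = nn.Sequential(                  # stand-in for the self-cleaning discriminator
    nn.Linear(hidden + num_tags, 64), nn.ReLU(), nn.Linear(64, 1)
)

def reweighted_ner_loss(token_feats, noisy_labels):
    """token_feats: (batch, seq, hidden); noisy_labels: (batch, seq) int64."""
    logits = ner_head(token_feats)                                   # (B, S, T)
    label_onehot = F.one_hot(noisy_labels, num_tags).float()
    # Discriminator estimates the probability that each (token, label) pair is clean.
    clean_prob = torch.sigmoid(
        discriminator(torch.cat([token_feats, label_onehot], dim=-1))
    ).squeeze(-1)                                                    # (B, S)
    per_token = F.cross_entropy(
        logits.transpose(1, 2), noisy_labels, reduction="none"
    )                                                                # (B, S)
    # Down-weight tokens the discriminator flags as likely mislabeled.
    return (clean_prob.detach() * per_token).mean()

loss = reweighted_ner_loss(torch.randn(2, 16, hidden), torch.randint(0, num_tags, (2, 16)))
loss.backward()
```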
  • Patent number: 12346827
    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
    Type: Grant
    Filed: June 3, 2022
    Date of Patent: July 1, 2025
    Assignee: Adobe Inc.
    Inventors: Handong Zhao, Zhe Lin, Sheng Li, Mingyang Ling, Jiuxiang Gu
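One way to read the feature-refinement step in patent 12346827 above is as a retrieve-and-fuse operation: relationship embeddings pulled from an external knowledgebase are combined with an object or subgraph proposal's features. The knowledgebase entries and fusion layer below are hypothetical placeholders.

```python
# Illustrative sketch only: refine an object/subgraph proposal feature with
# embeddings retrieved from an external knowledgebase. The KB contents and the
# fusion layer are assumptions, not the disclosed network.
import torch
import torch.nn as nn

feat_dim = 256
kb_embeddings = {                                 # hypothetical knowledgebase relationship entries
    "person-rides-horse": torch.randn(feat_dim),
    "dog-on-grass": torch.randn(feat_dim),
}
fuse = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())

def refine_proposal(proposal_feat, retrieved_keys):
    """Average the retrieved relationship embeddings and fuse them into the proposal feature."""
    retrieved = torch.stack([kb_embeddings[k] for k in retrieved_keys]).mean(dim=0)
    return fuse(torch.cat([proposal_feat, retrieved], dim=-1))

refined = refine_proposal(torch.randn(feat_dim), ["person-rides-horse"])
```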
  • Patent number: 12346655
    Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: July 1, 2025
    Assignee: Adobe Inc.
    Inventors: Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, Jiuxiang Gu
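The graph construction in patent 12346655 above can be sketched with a toy attention layer over a nested query/section/line/token graph. The adjacency, feature dimensions, and single-layer attention below are simplifications, not the described graph attention network.

```python
# Compact sketch of attention over a nested document graph (query, sections,
# lines, tokens). Node features and adjacency are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64
W = nn.Linear(dim, dim, bias=False)
attn_score = nn.Linear(2 * dim, 1, bias=False)

def graph_attention(node_feats, adjacency):
    """node_feats: (N, dim); adjacency: (N, N) with 1 where nodes are linked."""
    h = W(node_feats)
    # Pairwise attention logits, masked so nodes only attend to graph neighbors.
    pairs = torch.cat([h.unsqueeze(1).expand(-1, h.size(0), -1),
                       h.unsqueeze(0).expand(h.size(0), -1, -1)], dim=-1)
    logits = attn_score(pairs).squeeze(-1)
    logits = logits.masked_fill(adjacency == 0, float("-inf"))
    return F.softmax(logits, dim=-1) @ h        # new embedding per node

# Nodes 0=query, 1=section, 2-3=lines, 4-7=tokens; edges follow the nesting.
adj = torch.eye(8)
for parent, child in [(0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]:
    adj[parent, child] = adj[child, parent] = 1
updated = graph_attention(torch.randn(8, dim), adj)
```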
  • Patent number: 12333845
    Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: June 17, 2025
    Assignee: Adobe Inc.
    Inventors: Jiuxiang Gu, Ani Nenkova Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason wen yong Kuen, Handong Zhao
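For patent 12333845 above, the core cross-attention step (masked textual features for a sentence attending to masked visual features from its bounding box) can be approximated with PyTorch's built-in attention module; the feature extraction and masking function are assumed, not reproduced.

```python
# Hedged sketch of the cross-attention step only.
import torch
import torch.nn as nn

dim = 256
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

masked_text_feats = torch.randn(1, 12, dim)    # (batch, text tokens, dim), some positions masked
masked_visual_feats = torch.randn(1, 49, dim)  # (batch, visual patches in the bounding box, dim)

predicted_feats, _ = cross_attn(
    query=masked_text_feats, key=masked_visual_feats, value=masked_visual_feats
)
```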
  • Patent number: 12333844
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
    Type: Grant
    Filed: November 15, 2022
    Date of Patent: June 17, 2025
    Assignee: Adobe Inc.
    Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
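A toy version of the link-scoring step in patent 12333844 above: score every candidate parent element against every child element and select the highest-probability parent per child. The scoring network and embeddings are placeholders for the patented neural network.

```python
# Sketch only: parent-child link probabilities and parent selection.
import torch
import torch.nn as nn

dim = 128
link_scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

def select_parents(parent_embs, child_embs):
    """parent_embs: (P, dim); child_embs: (C, dim) -> best parent index per child."""
    P, C = parent_embs.size(0), child_embs.size(0)
    pairs = torch.cat([parent_embs.unsqueeze(1).expand(-1, C, -1),
                       child_embs.unsqueeze(0).expand(P, -1, -1)], dim=-1)
    link_probs = torch.sigmoid(link_scorer(pairs)).squeeze(-1)   # (P, C)
    return link_probs.argmax(dim=0)

parents = select_parents(torch.randn(3, dim), torch.randn(5, dim))
```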
  • Publication number: 20250103813
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train a named entity recognition (NER) model with noisy training data through a self-cleaning discriminator model. For example, the disclosed systems utilize a self-cleaning guided denoising framework to improve NER learning on noisy training data via a guidance training set. In one or more implementations, the disclosed systems utilize, within the denoising framework, an auxiliary discriminator model to correct noise in the noisy training data while training an NER model through the noisy training data. For example, while training the NER model to predict labels from the noisy training data, the disclosed systems utilize a discriminator model to detect noisy NER labels and reweight the noisy NER labels provided for training in the NER model.
    Type: Application
    Filed: September 22, 2023
    Publication date: March 27, 2025
    Inventors: Ruiyi Zhang, Zhendong Chu, Vlad Morariu, Tong Yu, Rajiv Jain, Nedim Lipka, Jiuxiang Gu
  • Publication number: 20250095631
    Abstract: Position-based text-to-speech model and training techniques are described. A digital document, for instance, is received by an audio synthesis service. A text-to-speech model is utilized by the audio synthesis service to generate digital audio from text included in the digital document. The text-to-speech model, for instance, is configured to generate a text encoding and a document positional encoding from an initial text sequence of the digital document. The document positional encoding is based on a location of the text encoding within the digital document. Digital audio is then generated by the text-to-speech model that includes a spectrogram having a reordered text sequence, which is different from the initial text sequence, by decoding the text encoding and the document positional encoding.
    Type: Application
    Filed: December 4, 2023
    Publication date: March 20, 2025
    Applicant: Adobe Inc.
    Inventors: Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad Ion Morariu, Rajiv Bhawanji Jain, Dinesh Manocha
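The document positional encoding in publication 20250095631 above can be illustrated with a standard sinusoidal encoding keyed to where the text sits in the document, added to the text encoding before decoding. The actual text-to-speech model is not reproduced here.

```python
# Illustrative sketch: sinusoidal encoding of a span's location in the document.
import torch

def document_positional_encoding(position, dim):
    """Encode a span's location (e.g., character offset) in the document."""
    idx = torch.arange(0, dim, 2, dtype=torch.float)
    angle = position / torch.pow(10000.0, idx / dim)
    enc = torch.zeros(dim)
    enc[0::2] = torch.sin(angle)
    enc[1::2] = torch.cos(angle)
    return enc

dim = 128
text_encoding = torch.randn(dim)                       # stand-in for the encoder's output
positioned = text_encoding + document_positional_encoding(position=2048, dim=dim)
```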
  • Publication number: 20250078200
    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback.
    Type: Application
    Filed: November 19, 2024
    Publication date: March 6, 2025
    Inventors: Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, Jiuxiang Gu, Tong Yu, Tong Sun
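One loose reading of publication 20250078200 above (and the related grant 12148119 further down this list) is that encoded natural language feedback is mapped into the generator's style space and combined with the previous round's style vector. The mapping layer and additive combination below are assumptions for illustration.

```python
# Loose sketch: inject text feedback features into a style-vector space.
import torch
import torch.nn as nn

text_dim, style_dim = 512, 512
text_to_style = nn.Linear(text_dim, style_dim)   # maps feedback features into the style space

def update_style(current_style, feedback_features):
    """Condition the next image-generation round on natural language feedback."""
    return current_style + text_to_style(feedback_features)

style = torch.randn(1, style_dim)                 # style vector from the previous round
feedback = torch.randn(1, text_dim)               # encoded "make the sky darker" feedback
next_style = update_style(style, feedback)
```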
  • Publication number: 20250078393
    Abstract: Systems and methods for generating a 3D model from a single input image are described. Embodiments are configured to obtain an input image and camera view information corresponding to the input image; encode the input image to obtain 2D features comprising a plurality of 2D tokens corresponding to patches of the input image; decode the 2D features based on the camera view information to obtain 3D features comprising a plurality of 3D tokens corresponding to regions of a 3D representation; and generate a 3D model of the input image based on the 3D features.
    Type: Application
    Filed: September 5, 2023
    Publication date: March 6, 2025
    Inventors: Hao Tan, Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan K. Sunkavalli, Trung Huu Bui
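For publication 20250078393 above, the 2D-to-3D decoding step can be pictured as learnable 3D-region tokens cross-attending to the image's 2D patch tokens, with the camera view information injected into the queries. The dimensions and camera embedding are illustrative guesses, not the disclosed model.

```python
# Rough sketch of the decode step only.
import torch
import torch.nn as nn

dim, n_patches, n_3d_tokens = 256, 196, 64
camera_proj = nn.Linear(16, dim)                       # e.g., a flattened 4x4 camera matrix
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
tokens_3d = nn.Parameter(torch.randn(1, n_3d_tokens, dim))   # learnable 3D-region tokens

image_tokens = torch.randn(1, n_patches, dim)          # 2D features from the image encoder
camera = torch.randn(1, 1, 16)                         # camera view information for the input image

queries = tokens_3d + camera_proj(camera)              # condition the 3D queries on the camera
features_3d, _ = cross_attn(queries, image_tokens, image_tokens)
```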
  • Publication number: 20250013866
    Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
    Type: Application
    Filed: July 6, 2023
    Publication date: January 9, 2025
    Inventors: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
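Publication 20250013866 above describes two progressive pruning stages, each a prune followed by a fine-tune. A simplified sketch using PyTorch's built-in magnitude pruning is shown below; the embedding network, sparsity levels, and fine-tuning objective are placeholders, not the patented procedure.

```python
# Simplified sketch of two prune-then-fine-tune stages.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

embed_net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
optimizer = torch.optim.Adam(embed_net.parameters(), lr=1e-4)

def prune_stage(amount, finetune_steps=10):
    # Prune: remove the smallest-magnitude weights in each Linear layer.
    for module in embed_net:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    # Fine-tune: a placeholder objective just to drive the recovery loop.
    for _ in range(finetune_steps):
        x = torch.randn(32, 512)
        loss = embed_net(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

prune_stage(amount=0.3)   # first progressive pruning stage
prune_stage(amount=0.3)   # second stage, applied to the already-pruned network
```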
  • Publication number: 20250013831
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a temporal dependency graph. For example, the disclosed systems generate, from a text document, a structural vector, a syntactic vector, and a semantic vector. In some embodiments, the disclosed systems generate a multi-dimensional vector by combining the various vectors. In these or other embodiments, the disclosed systems generate an initial dependency graph structure and an adjacency matrix utilizing an iterative deep graph learning model. Further, in some embodiments, the disclosed systems generate an entity-level relation matrix utilizing a convolutional graph neural network. Moreover, in some embodiments, the disclosed systems generate a temporal dependency graph from the entity-level relation matrix and the adjacency matrix.
    Type: Application
    Filed: October 24, 2023
    Publication date: January 9, 2025
    Inventors: Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain
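For publication 20250013831 above, only the first step is sketched here: concatenating the structural, syntactic, and semantic vectors into a multi-dimensional vector and deriving an initial adjacency matrix from pairwise similarity. The similarity threshold is an assumption, and the iterative graph learning and graph-convolution stages are omitted.

```python
# Minimal sketch of the vector combination and initial graph structure.
import torch
import torch.nn.functional as F

structural = torch.randn(6, 32)   # one row per entity mention in the text document
syntactic = torch.randn(6, 32)
semantic = torch.randn(6, 64)

multi_dim = torch.cat([structural, syntactic, semantic], dim=-1)              # (6, 128)
similarity = F.cosine_similarity(multi_dim.unsqueeze(1), multi_dim.unsqueeze(0), dim=-1)
adjacency = (similarity > 0.5).float()            # initial dependency graph structure
```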
  • Publication number: 20240427995
    Abstract: A method includes receiving a text to be used for generating an image. The method further includes determining whether the text is a visual text using a machine learning model trained to classify whether an input text is non-visual text or visual text. The method further includes responsive to determining that the text is a visual text, generating the image using a second machine learning model based on the text. The method further includes displaying the image and the text.
    Type: Application
    Filed: June 22, 2023
    Publication date: December 26, 2024
    Inventors: Jiuxiang GU, Ryan ROSSI, Gaurav VERMA, Christopher TENSMEYER, Ani NENKOVA
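The gating logic of publication 20240427995 above reduces to a small pipeline: classify the text as visual or non-visual, and only generate and display an image for visual text. Both functions below are hypothetical stand-ins for the two machine learning models.

```python
# Pipeline sketch under assumed interfaces; neither function is the real model.
def is_visual_text(text: str) -> bool:
    # Stand-in for the trained visual/non-visual text classifier.
    visual_cues = {"sunset", "mountain", "red", "river", "dog"}
    return any(word in text.lower() for word in visual_cues)

def generate_image(text: str) -> str:
    # Stand-in for the second (text-to-image) machine learning model.
    return f"<image generated for: {text}>"

text = "A red kayak drifting down the river at sunset"
if is_visual_text(text):
    print(generate_image(text))
    print(text)            # display the image alongside the text
```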
  • Publication number: 20240404243
    Abstract: Systems and methods for multimodal machine learning are provided. According to one aspect, a method for multimodal machine learning includes obtaining a prompt; encoding the prompt using a multimodal encoder to obtain a prompt embedding, wherein the encoding comprises generating a plurality of multi-head attention (MHA) outputs corresponding to a plurality of different scales, respectively, and combining the plurality of MHA outputs using a multi-scale aggregator; and generating a response to the prompt based on the prompt embedding.
    Type: Application
    Filed: June 5, 2023
    Publication date: December 5, 2024
    Inventors: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
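Publication 20240404243 above can be sketched as running attention over the prompt tokens at several pooled scales and aggregating the per-scale outputs. The three scales, pooling choice, and linear aggregator below are assumptions.

```python
# Hedged sketch of multi-scale attention with a simple aggregator.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
aggregator = nn.Linear(3 * dim, dim)             # multi-scale aggregator over three scales

def multi_scale_encode(tokens):
    """tokens: (batch, seq, dim) prompt inputs -> (batch, dim) prompt embedding."""
    outputs = []
    for scale in (1, 2, 4):
        pooled = F.avg_pool1d(tokens.transpose(1, 2), kernel_size=scale).transpose(1, 2)
        out, _ = attn(pooled, pooled, pooled)
        outputs.append(out.mean(dim=1))          # summary of this scale's attention output
    return aggregator(torch.cat(outputs, dim=-1))

prompt_embedding = multi_scale_encode(torch.randn(2, 16, dim))
```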
  • Publication number: 20240386621
    Abstract: Techniques and systems for training and/or implementing a text-to-image generation model are provided. A pre-trained multimodal model is leveraged for avoiding slower and more labor-intensive methodologies for training a text-to-image generation model. Accordingly, images without associated text (i.e., bare images) are provided to the pre-trained multimodal model so that it can produce generated text-image pairs. The generated text-image pairs are provided to the text-to-image generation model for training and/or implementing the text-to-image generation model.
    Type: Application
    Filed: May 17, 2023
    Publication date: November 21, 2024
    Applicant: Adobe Inc.
    Inventors: Ruiyi Zhang, Yufan Zhou, Tong Yu, Tong Sun, Rajiv Jain, Jiuxiang Gu, Christopher Alan Tensmeyer
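The data flow in publication 20240386621 above is straightforward to sketch: a pretrained multimodal model captions bare images to produce generated text-image pairs, which then drive text-to-image training. Both functions below are hypothetical stand-ins.

```python
# Sketch of the data flow only; both functions are placeholders.
def caption_model(image):
    return "a hypothetical caption for this image"        # pretrained multimodal model (assumed)

def t2i_training_step(text, image):
    pass                                                  # stand-in for one text-to-image update

bare_images = ["img_001.png", "img_002.png"]              # images without associated text
generated_pairs = [(caption_model(img), img) for img in bare_images]
for text, image in generated_pairs:
    t2i_training_step(text, image)
```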
  • Patent number: 12148119
    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback.
    Type: Grant
    Filed: January 14, 2022
    Date of Patent: November 19, 2024
    Assignee: Adobe Inc.
    Inventors: Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, Jiuxiang Gu, Tong Yu, Tong Sun
  • Patent number: 12136185
    Abstract: Systems and methods for image processing are described. The systems and methods include receiving a low-resolution image; generating a feature map based on the low-resolution image using an encoder of a student network, wherein the encoder of the student network is trained based on comparing a predicted feature map from the encoder of the student network and a fused feature map from a teacher network, and wherein the fused feature map represents a combination of a first feature map from a high-resolution encoder of the teacher network and a second feature map from a low-resolution encoder of the teacher network; and decoding the feature map to obtain prediction information for the low-resolution image.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: November 5, 2024
    Assignee: Adobe Inc.
    Inventors: Jason Kuen, Jiuxiang Gu, Zhe Lin
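The distillation objective in patent 12136185 above can be pictured as matching the student encoder's predicted feature map to a fused teacher map. Averaging the high- and low-resolution teacher maps below is one possible fusion, used here only for illustration.

```python
# Minimal sketch of the feature-distillation loss.
import torch
import torch.nn.functional as F

student_map = torch.randn(1, 64, 32, 32, requires_grad=True)   # from the student encoder
teacher_high = torch.randn(1, 64, 32, 32)                      # high-resolution teacher encoder
teacher_low = torch.randn(1, 64, 32, 32)                       # low-resolution teacher encoder

fused_teacher_map = 0.5 * (teacher_high + teacher_low)          # one way to combine the two maps
distill_loss = F.mse_loss(student_map, fused_teacher_map)
distill_loss.backward()
```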
  • Publication number: 20240232525
    Abstract: Systems and methods for document classification are described. Embodiments of the present disclosure generate classification data for a plurality of samples using a neural network trained to identify a plurality of known classes; select a set of samples for annotation from the plurality of samples using an open-set metric based on the classification data, wherein the annotation includes an unknown class; and train the neural network to identify the unknown class based on the annotation of the set of samples.
    Type: Application
    Filed: October 24, 2022
    Publication date: July 11, 2024
    Inventors: Rajiv Bhawanji Jain, Michelle Yuan, Vlad Ion Morariu, Ani Nenkova Nenkova, Smitha Bangalore Naresh, Nikolaos Barmpalios, Ruchi Deshpande, Ruiyi Zhang, Jiuxiang Gu, Varun Manjunatha, Nedim Lipka, Andrew Marc Greene
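For publication 20240232525 above, one plausible open-set metric is predictive entropy over the known classes: the most uncertain samples are sent for annotation, where they may receive the unknown-class label. The metric and selection budget below are assumptions.

```python
# Sketch of one possible open-set selection metric (entropy-based).
import torch

def select_for_annotation(class_probs, budget=3):
    """class_probs: (num_samples, num_known_classes) softmax outputs."""
    entropy = -(class_probs * class_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.topk(budget).indices        # indices of samples to send for annotation

probs = torch.softmax(torch.randn(10, 5), dim=-1)   # classification data for 10 samples
to_annotate = select_for_annotation(probs)
```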
  • Patent number: 11995394
    Abstract: Systems and methods for document editing are provided. One aspect of the systems and methods includes obtaining a document and a natural language edit request. Another aspect of the systems and methods includes generating a structured edit command using a machine learning model based on the document and the natural language edit request. Yet another aspect of the systems and methods includes generating a modified document based on the document and the structured edit command, where the modified document includes a revision of the document that incorporates the natural language edit request.
    Type: Grant
    Filed: February 7, 2023
    Date of Patent: May 28, 2024
    Assignee: Adobe Inc.
    Inventors: Vlad Ion Morariu, Puneet Mathur, Rajiv Bhawanji Jain, Jiuxiang Gu, Franck Dernoncourt
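Patent 11995394 above ends by applying a structured edit command to the document; a minimal sketch of that final step, with an assumed command schema, is shown below. The model that turns the natural language request into the command is not shown.

```python
# Illustrative sketch of applying a structured edit command (assumed schema).
from dataclasses import dataclass

@dataclass
class EditCommand:
    action: str        # e.g. "replace"
    target: str        # text span the edit applies to
    replacement: str   # new text

def apply_edit(document: str, cmd: EditCommand) -> str:
    if cmd.action == "replace":
        return document.replace(cmd.target, cmd.replacement)
    return document

document = "The meeting is on Tuesday."
# A command a model might produce for the request "move the meeting to Friday".
modified = apply_edit(document, EditCommand("replace", "Tuesday", "Friday"))
```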
  • Publication number: 20240161529
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
    Type: Application
    Filed: November 15, 2022
    Publication date: May 16, 2024
    Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
  • Publication number: 20240135103
    Abstract: In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with clients and including sensitive samples and non-sensitive samples. The machine learning model is trained by sampling a client of the clients and using a subset of the sensitive samples associated with the client and a subset of the non-sensitive samples associated with the client to update parameters of the machine learning model. The privacy system generates an indication of the next word after the last word in the sequence of words for display in a user interface.
    Type: Application
    Filed: February 23, 2023
    Publication date: April 25, 2024
    Applicant: Adobe Inc.
    Inventors: Franck Dernoncourt, Tong Sun, Thi kim phung Lai, Rajiv Bhawanji Jain, Nikolaos Barmpalios, Jiuxiang Gu
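The client-level sampling scheme in publication 20240135103 above can be sketched as: pick a client, then draw separate subsets of that client's sensitive and non-sensitive samples for one parameter update. The corpus, batch sizes, and update function below are placeholders.

```python
# Rough sketch of the per-client sampling scheme only; the model update is a stub.
import random

corpus = {
    "client_a": {"sensitive": ["acct 1234 note", "ssn draft"],
                 "non_sensitive": ["meeting recap", "blog intro"]},
    "client_b": {"sensitive": ["salary memo"],
                 "non_sensitive": ["press release", "faq answer"]},
}

def training_step(update_fn, k=1):
    client = random.choice(list(corpus))                       # sample a client
    data = corpus[client]
    batch = (random.sample(data["sensitive"], k)               # subset of sensitive samples
             + random.sample(data["non_sensitive"], k))        # subset of non-sensitive samples
    update_fn(batch)                                           # update model parameters

training_step(update_fn=lambda batch: None)
```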