Patents by Inventor Jiuxiang Gu

Jiuxiang Gu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12387043
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train a named entity recognition (NER) model with noisy training data through a self-cleaning discriminator model. For example, the disclosed systems utilize a self-cleaning guided denoising framework to improve NER learning on noisy training data via a guidance training set. In one or more implementations, the disclosed systems utilize, within the denoising framework, an auxiliary discriminator model to correct noise in the noisy training data while training an NER model through the noisy training data. For example, while training the NER model to predict labels from the noisy training data, the disclosed systems utilize a discriminator model to detect noisy NER labels and reweight the noisy NER labels provided for training in the NER model.
    Type: Grant
    Filed: September 22, 2023
    Date of Patent: August 12, 2025
    Assignee: Adobe Inc.
    Inventors: Ruiyi Zhang, Zhendong Chu, Vlad Morariu, Tong Yu, Rajiv Jain, Nedim Lipka, Jiuxiang Gu
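The reweighting idea in patent 12387043 above can be pictured with a minimal PyTorch sketch: a stand-in discriminator scores each (token, noisy label) pair, and that score scales the per-token NER loss. The module sizes, discriminator architecture, and loss form are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (assumed shapes and modules, not the patented system): an
# auxiliary discriminator scores how trustworthy each noisy NER label looks,
# and that score reweights the per-token training loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_tags, hidden = 9, 128
ner_head = nn.Linear(hidden, num_tags)          # stand-in for the NER model's tag classifier
discriminator = nn.Sequential(                  # stand-in for the self-cleaning discriminator
    nn.Linear(hidden + num_tags, 64), nn.ReLU(), nn.Linear(64, 1)
)

def reweighted_ner_loss(token_feats, noisy_labels):
    """token_feats: (batch, seq, hidden); noisy_labels: (batch, seq) int64."""
    logits = ner_head(token_feats)                                   # (B, S, T)
    label_onehot = F.one_hot(noisy_labels, num_tags).float()
    # Discriminator estimates the probability that each (token, label) pair is clean.
    clean_prob = torch.sigmoid(
        discriminator(torch.cat([token_feats, label_onehot], dim=-1))
    ).squeeze(-1)                                                    # (B, S)
    per_token = F.cross_entropy(
        logits.transpose(1, 2), noisy_labels, reduction="none"
    )                                                                # (B, S)
    # Down-weight tokens the discriminator flags as likely mislabeled.
    return (clean_prob.detach() * per_token).mean()

loss = reweighted_ner_loss(torch.randn(2, 16, hidden), torch.randint(0, num_tags, (2, 16)))
loss.backward()
```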
  • Patent number: 12346827
    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
    Type: Grant
    Filed: June 3, 2022
    Date of Patent: July 1, 2025
    Assignee: Adobe Inc.
    Inventors: Handong Zhao, Zhe Lin, Sheng Li, Mingyang Ling, Jiuxiang Gu
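One way to read the feature-refinement step in patent 12346827 above is as a retrieve-and-fuse operation: relationship embeddings pulled from an external knowledgebase are combined with an object or subgraph proposal's features. The knowledgebase entries and fusion layer below are hypothetical placeholders.

```python
# Illustrative sketch only: refine an object/subgraph proposal feature with
# embeddings retrieved from an external knowledgebase. The KB contents and the
# fusion layer are assumptions, not the disclosed network.
import torch
import torch.nn as nn

feat_dim = 256
kb_embeddings = {                                 # hypothetical knowledgebase relationship entries
    "person-rides-horse": torch.randn(feat_dim),
    "dog-on-grass": torch.randn(feat_dim),
}
fuse = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())

def refine_proposal(proposal_feat, retrieved_keys):
    """Average the retrieved relationship embeddings and fuse them into the proposal feature."""
    retrieved = torch.stack([kb_embeddings[k] for k in retrieved_keys]).mean(dim=0)
    return fuse(torch.cat([proposal_feat, retrieved], dim=-1))

refined = refine_proposal(torch.randn(feat_dim), ["person-rides-horse"])
```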
  • Patent number: 12346655
    Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: July 1, 2025
    Assignee: Adobe Inc.
    Inventors: Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, Jiuxiang Gu
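The graph construction in patent 12346655 above can be sketched with a toy attention layer over a nested query/section/line/token graph. The adjacency, feature dimensions, and single-layer attention below are simplifications, not the described graph attention network.

```python
# Compact sketch of attention over a nested document graph (query, sections,
# lines, tokens). Node features and adjacency are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64
W = nn.Linear(dim, dim, bias=False)
attn_score = nn.Linear(2 * dim, 1, bias=False)

def graph_attention(node_feats, adjacency):
    """node_feats: (N, dim); adjacency: (N, N) with 1 where nodes are linked."""
    h = W(node_feats)
    # Pairwise attention logits, masked so nodes only attend to graph neighbors.
    pairs = torch.cat([h.unsqueeze(1).expand(-1, h.size(0), -1),
                       h.unsqueeze(0).expand(h.size(0), -1, -1)], dim=-1)
    logits = attn_score(pairs).squeeze(-1)
    logits = logits.masked_fill(adjacency == 0, float("-inf"))
    return F.softmax(logits, dim=-1) @ h        # new embedding per node

# Nodes 0=query, 1=section, 2-3=lines, 4-7=tokens; edges follow the nesting.
adj = torch.eye(8)
for parent, child in [(0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]:
    adj[parent, child] = adj[child, parent] = 1
updated = graph_attention(torch.randn(8, dim), adj)
```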
  • Patent number: 12333845
    Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: June 17, 2025
    Assignee: Adobe Inc.
    Inventors: Jiuxiang Gu, Ani Nenkova Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason wen yong Kuen, Handong Zhao
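For patent 12333845 above, the core cross-attention step (masked textual features for a sentence attending to masked visual features from its bounding box) can be approximated with PyTorch's built-in attention module; the feature extraction and masking function are assumed, not reproduced.

```python
# Hedged sketch of the cross-attention step only.
import torch
import torch.nn as nn

dim = 256
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

masked_text_feats = torch.randn(1, 12, dim)    # (batch, text tokens, dim), some positions masked
masked_visual_feats = torch.randn(1, 49, dim)  # (batch, visual patches in the bounding box, dim)

predicted_feats, _ = cross_attn(
    query=masked_text_feats, key=masked_visual_feats, value=masked_visual_feats
)
```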
  • Patent number: 12333844
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
    Type: Grant
    Filed: November 15, 2022
    Date of Patent: June 17, 2025
    Assignee: Adobe Inc.
    Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
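A toy version of the link-scoring step in patent 12333844 above: score every candidate parent element against every child element and select the highest-probability parent per child. The scoring network and embeddings are placeholders for the patented neural network.

```python
# Sketch only: parent-child link probabilities and parent selection.
import torch
import torch.nn as nn

dim = 128
link_scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

def select_parents(parent_embs, child_embs):
    """parent_embs: (P, dim); child_embs: (C, dim) -> best parent index per child."""
    P, C = parent_embs.size(0), child_embs.size(0)
    pairs = torch.cat([parent_embs.unsqueeze(1).expand(-1, C, -1),
                       child_embs.unsqueeze(0).expand(P, -1, -1)], dim=-1)
    link_probs = torch.sigmoid(link_scorer(pairs)).squeeze(-1)   # (P, C)
    return link_probs.argmax(dim=0)

parents = select_parents(torch.randn(3, dim), torch.randn(5, dim))
```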
  • Publication number: 20250103813
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train a named entity recognition (NER) model with noisy training data through a self-cleaning discriminator model. For example, the disclosed systems utilize a self-cleaning guided denoising framework to improve NER learning on noisy training data via a guidance training set. In one or more implementations, the disclosed systems utilize, within the denoising framework, an auxiliary discriminator model to correct noise in the noisy training data while training an NER model through the noisy training data. For example, while training the NER model to predict labels from the noisy training data, the disclosed systems utilize a discriminator model to detect noisy NER labels and reweight the noisy NER labels provided for training in the NER model.
    Type: Application
    Filed: September 22, 2023
    Publication date: March 27, 2025
    Inventors: Ruiyi Zhang, Zhendong Chu, Vlad Morariu, Tong Yu, Rajiv Jain, Nedim Lipka, Jiuxiang Gu
  • Publication number: 20250095631
    Abstract: Position-based text-to-speech model and training techniques are described. A digital document, for instance, is received by an audio synthesis service. A text-to-speech model is utilized by the audio synthesis service to generate digital audio from text included in the digital document. The text-to-speech model, for instance, is configured to generate a text encoding and a document positional encoding from an initial text sequence of the digital document. The document positional encoding is based on a location of the text encoding within the digital document. Digital audio is then generated by the text-to-speech model that includes a spectrogram having a reordered text sequence, which is different from the initial text sequence, by decoding the text encoding and the document positional encoding.
    Type: Application
    Filed: December 4, 2023
    Publication date: March 20, 2025
    Applicant: Adobe Inc.
    Inventors: Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad Ion Morariu, Rajiv Bhawanji Jain, Dinesh Manocha
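The document positional encoding in publication 20250095631 above can be illustrated with a standard sinusoidal encoding keyed to where the text sits in the document, added to the text encoding before decoding. The actual text-to-speech model is not reproduced here.

```python
# Illustrative sketch: sinusoidal encoding of a span's location in the document.
import torch

def document_positional_encoding(position, dim):
    """Encode a span's location (e.g., character offset) in the document."""
    idx = torch.arange(0, dim, 2, dtype=torch.float)
    angle = position / torch.pow(10000.0, idx / dim)
    enc = torch.zeros(dim)
    enc[0::2] = torch.sin(angle)
    enc[1::2] = torch.cos(angle)
    return enc

dim = 128
text_encoding = torch.randn(dim)                       # stand-in for the encoder's output
positioned = text_encoding + document_positional_encoding(position=2048, dim=dim)
```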
  • Publication number: 20250078200
    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback.
    Type: Application
    Filed: November 19, 2024
    Publication date: March 6, 2025
    Inventors: Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, Jiuxiang Gu, Tong Yu, Tong Sun
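One loose reading of publication 20250078200 above (and the related grant 12148119 further down this list) is that encoded natural language feedback is mapped into the generator's style space and combined with the previous round's style vector. The mapping layer and additive combination below are assumptions for illustration.

```python
# Loose sketch: inject text feedback features into a style-vector space.
import torch
import torch.nn as nn

text_dim, style_dim = 512, 512
text_to_style = nn.Linear(text_dim, style_dim)   # maps feedback features into the style space

def update_style(current_style, feedback_features):
    """Condition the next image-generation round on natural language feedback."""
    return current_style + text_to_style(feedback_features)

style = torch.randn(1, style_dim)                 # style vector from the previous round
feedback = torch.randn(1, text_dim)               # encoded "make the sky darker" feedback
next_style = update_style(style, feedback)
```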
  • Publication number: 20250078393
    Abstract: Systems and methods for generating a 3D model from a single input image are described. Embodiments are configured to obtain an input image and camera view information corresponding to the input image; encode the input image to obtain 2D features comprising a plurality of 2D tokens corresponding to patches of the input image; decode the 2D features based on the camera view information to obtain 3D features comprising a plurality of 3D tokens corresponding to regions of a 3D representation; and generate a 3D model of the input image based on the 3D features.
    Type: Application
    Filed: September 5, 2023
    Publication date: March 6, 2025
    Inventors: Hao Tan, Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan K. Sunkavalli, Trung Huu Bui
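For publication 20250078393 above, the 2D-to-3D decoding step can be pictured as learnable 3D-region tokens cross-attending to the image's 2D patch tokens, with the camera view information injected into the queries. The dimensions and camera embedding are illustrative guesses, not the disclosed model.

```python
# Rough sketch of the decode step only.
import torch
import torch.nn as nn

dim, n_patches, n_3d_tokens = 256, 196, 64
camera_proj = nn.Linear(16, dim)                       # e.g., a flattened 4x4 camera matrix
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
tokens_3d = nn.Parameter(torch.randn(1, n_3d_tokens, dim))   # learnable 3D-region tokens

image_tokens = torch.randn(1, n_patches, dim)          # 2D features from the image encoder
camera = torch.randn(1, 1, 16)                         # camera view information for the input image

queries = tokens_3d + camera_proj(camera)              # condition the 3D queries on the camera
features_3d, _ = cross_attn(queries, image_tokens, image_tokens)
```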
  • Publication number: 20250013866
    Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
    Type: Application
    Filed: July 6, 2023
    Publication date: January 9, 2025
    Inventors: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
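Publication 20250013866 above describes two progressive pruning stages, each a prune followed by a fine-tune. A simplified sketch using PyTorch's built-in magnitude pruning is shown below; the embedding network, sparsity levels, and fine-tuning objective are placeholders, not the patented procedure.

```python
# Simplified sketch of two prune-then-fine-tune stages.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

embed_net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
optimizer = torch.optim.Adam(embed_net.parameters(), lr=1e-4)

def prune_stage(amount, finetune_steps=10):
    # Prune: remove the smallest-magnitude weights in each Linear layer.
    for module in embed_net:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    # Fine-tune: a placeholder objective just to drive the recovery loop.
    for _ in range(finetune_steps):
        x = torch.randn(32, 512)
        loss = embed_net(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

prune_stage(amount=0.3)   # first progressive pruning stage
prune_stage(amount=0.3)   # second stage, applied to the already-pruned network
```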
  • Publication number: 20250013831
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a temporal dependency graph. For example, the disclosed systems generate, from a text document, a structural vector, a syntactic vector, and a semantic vector. In some embodiments, the disclosed systems generate a multi-dimensional vector by combining the various vectors. In these or other embodiments, the disclosed systems generate an initial dependency graph structure and an adjacency matrix utilizing an iterative deep graph learning model. Further, in some embodiments, the disclosed systems generate an entity-level relation matrix utilizing a convolutional graph neural network. Moreover, in some embodiments, the disclosed systems generate a temporal dependency graph from the entity-level relation matrix and the adjacency matrix.
    Type: Application
    Filed: October 24, 2023
    Publication date: January 9, 2025
    Inventors: Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain
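For publication 20250013831 above, only the first step is sketched here: concatenating the structural, syntactic, and semantic vectors into a multi-dimensional vector and deriving an initial adjacency matrix from pairwise similarity. The similarity threshold is an assumption, and the iterative graph learning and graph-convolution stages are omitted.

```python
# Minimal sketch of the vector combination and initial graph structure.
import torch
import torch.nn.functional as F

structural = torch.randn(6, 32)   # one row per entity mention in the text document
syntactic = torch.randn(6, 32)
semantic = torch.randn(6, 64)

multi_dim = torch.cat([structural, syntactic, semantic], dim=-1)              # (6, 128)
similarity = F.cosine_similarity(multi_dim.unsqueeze(1), multi_dim.unsqueeze(0), dim=-1)
adjacency = (similarity > 0.5).float()            # initial dependency graph structure
```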
  • Publication number: 20240427995
    Abstract: A method includes receiving a text to be used for generating an image. The method further includes determining whether the text is a visual text using a machine learning model trained to classify whether an input text is non-visual text or visual text. The method further includes responsive to determining that the text is a visual text, generating the image using a second machine learning model based on the text. The method further includes displaying the image and the text.
    Type: Application
    Filed: June 22, 2023
    Publication date: December 26, 2024
    Inventors: Jiuxiang GU, Ryan ROSSI, Gaurav VERMA, Christopher TENSMEYER, Ani NENKOVA
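The gating logic of publication 20240427995 above reduces to a small pipeline: classify the text as visual or non-visual, and only generate and display an image for visual text. Both functions below are hypothetical stand-ins for the two machine learning models.

```python
# Pipeline sketch under assumed interfaces; neither function is the real model.
def is_visual_text(text: str) -> bool:
    # Stand-in for the trained visual/non-visual text classifier.
    visual_cues = {"sunset", "mountain", "red", "river", "dog"}
    return any(word in text.lower() for word in visual_cues)

def generate_image(text: str) -> str:
    # Stand-in for the second (text-to-image) machine learning model.
    return f"<image generated for: {text}>"

text = "A red kayak drifting down the river at sunset"
if is_visual_text(text):
    print(generate_image(text))
    print(text)            # display the image alongside the text
```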
  • Publication number: 20240404243
    Abstract: Systems and methods for multimodal machine learning are provided. According to one aspect, a method for multimodal machine learning includes obtaining a prompt; encoding the prompt using a multimodal encoder to obtain a prompt embedding, wherein the encoding comprises generating a plurality of multi-head attention (MHA) outputs corresponding to a plurality of different scales, respectively, and combining the plurality of MHA outputs using a multi-scale aggregator; and generating a response to the prompt based on the prompt embedding.
    Type: Application
    Filed: June 5, 2023
    Publication date: December 5, 2024
    Inventors: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
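Publication 20240404243 above can be sketched as running attention over the prompt tokens at several pooled scales and aggregating the per-scale outputs. The three scales, pooling choice, and linear aggregator below are assumptions.

```python
# Hedged sketch of multi-scale attention with a simple aggregator.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
aggregator = nn.Linear(3 * dim, dim)             # multi-scale aggregator over three scales

def multi_scale_encode(tokens):
    """tokens: (batch, seq, dim) prompt inputs -> (batch, dim) prompt embedding."""
    outputs = []
    for scale in (1, 2, 4):
        pooled = F.avg_pool1d(tokens.transpose(1, 2), kernel_size=scale).transpose(1, 2)
        out, _ = attn(pooled, pooled, pooled)
        outputs.append(out.mean(dim=1))          # summary of this scale's attention output
    return aggregator(torch.cat(outputs, dim=-1))

prompt_embedding = multi_scale_encode(torch.randn(2, 16, dim))
```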
  • Publication number: 20240386621
    Abstract: Techniques and systems for training and/or implementing a text-to-image generation model are provided. A pre-trained multimodal model is leveraged for avoiding slower and more labor-intensive methodologies for training a text-to-image generation model. Accordingly, images without associated text (i.e., bare images) are provided to the pre-trained multimodal model so that it can produce generated text-image pairs. The generated text-image pairs are provided to the text-to-image generation model for training and/or implementing the text-to-image generation model.
    Type: Application
    Filed: May 17, 2023
    Publication date: November 21, 2024
    Applicant: Adobe Inc.
    Inventors: Ruiyi Zhang, Yufan Zhou, Tong Yu, Tong Sun, Rajiv Jain, Jiuxiang Gu, Christopher Alan Tensmeyer
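The data flow in publication 20240386621 above is straightforward to sketch: a pretrained multimodal model captions bare images to produce generated text-image pairs, which then drive text-to-image training. Both functions below are hypothetical stand-ins.

```python
# Sketch of the data flow only; both functions are placeholders.
def caption_model(image):
    return "a hypothetical caption for this image"        # pretrained multimodal model (assumed)

def t2i_training_step(text, image):
    pass                                                  # stand-in for one text-to-image update

bare_images = ["img_001.png", "img_002.png"]              # images without associated text
generated_pairs = [(caption_model(img), img) for img in bare_images]
for text, image in generated_pairs:
    t2i_training_step(text, image)
```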
  • Patent number: 12148119
    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback.
    Type: Grant
    Filed: January 14, 2022
    Date of Patent: November 19, 2024
    Assignee: Adobe Inc.
    Inventors: Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, Jiuxiang Gu, Tong Yu, Tong Sun
  • Patent number: 12136185
    Abstract: Systems and methods for image processing are described. The systems and methods include receiving a low-resolution image; generating a feature map based on the low-resolution image using an encoder of a student network, wherein the encoder of the student network is trained based on comparing a predicted feature map from the encoder of the student network and a fused feature map from a teacher network, and wherein the fused feature map represents a combination of a first feature map from a high-resolution encoder of the teacher network and a second feature map from a low-resolution encoder of the teacher network; and decoding the feature map to obtain prediction information for the low-resolution image.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: November 5, 2024
    Assignee: Adobe Inc.
    Inventors: Jason Kuen, Jiuxiang Gu, Zhe Lin
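The distillation objective in patent 12136185 above can be pictured as matching the student encoder's predicted feature map to a fused teacher map. Averaging the high- and low-resolution teacher maps below is one possible fusion, used here only for illustration.

```python
# Minimal sketch of the feature-distillation loss.
import torch
import torch.nn.functional as F

student_map = torch.randn(1, 64, 32, 32, requires_grad=True)   # from the student encoder
teacher_high = torch.randn(1, 64, 32, 32)                      # high-resolution teacher encoder
teacher_low = torch.randn(1, 64, 32, 32)                       # low-resolution teacher encoder

fused_teacher_map = 0.5 * (teacher_high + teacher_low)          # one way to combine the two maps
distill_loss = F.mse_loss(student_map, fused_teacher_map)
distill_loss.backward()
```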
  • Publication number: 20240232525
    Abstract: Systems and methods for document classification are described. Embodiments of the present disclosure generate classification data for a plurality of samples using a neural network trained to identify a plurality of known classes; select a set of samples for annotation from the plurality of samples using an open-set metric based on the classification data, wherein the annotation includes an unknown class; and train the neural network to identify the unknown class based on the annotation of the set of samples.
    Type: Application
    Filed: October 24, 2022
    Publication date: July 11, 2024
    Inventors: Rajiv Bhawanji Jain, Michelle Yuan, Vlad Ion Morariu, Ani Nenkova Nenkova, Smitha Bangalore Naresh, Nikolaos Barmpalios, Ruchi Deshpande, Ruiyi Zhang, Jiuxiang Gu, Varun Manjunatha, Nedim Lipka, Andrew Marc Greene
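For publication 20240232525 above, one plausible open-set metric is predictive entropy over the known classes: the most uncertain samples are sent for annotation, where they may receive the unknown-class label. The metric and selection budget below are assumptions.

```python
# Sketch of one possible open-set selection metric (entropy-based).
import torch

def select_for_annotation(class_probs, budget=3):
    """class_probs: (num_samples, num_known_classes) softmax outputs."""
    entropy = -(class_probs * class_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.topk(budget).indices        # indices of samples to send for annotation

probs = torch.softmax(torch.randn(10, 5), dim=-1)   # classification data for 10 samples
to_annotate = select_for_annotation(probs)
```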
  • Patent number: 11995394
    Abstract: Systems and methods for document editing are provided. One aspect of the systems and methods includes obtaining a document and a natural language edit request. Another aspect of the systems and methods includes generating a structured edit command using a machine learning model based on the document and the natural language edit request. Yet another aspect of the systems and methods includes generating a modified document based on the document and the structured edit command, where the modified document includes a revision of the document that incorporates the natural language edit request.
    Type: Grant
    Filed: February 7, 2023
    Date of Patent: May 28, 2024
    Assignee: Adobe Inc.
    Inventors: Vlad Ion Morariu, Puneet Mathur, Rajiv Bhawanji Jain, Jiuxiang Gu, Franck Dernoncourt
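Patent 11995394 above ends by applying a structured edit command to the document; a minimal sketch of that final step, with an assumed command schema, is shown below. The model that turns the natural language request into the command is not shown.

```python
# Illustrative sketch of applying a structured edit command (assumed schema).
from dataclasses import dataclass

@dataclass
class EditCommand:
    action: str        # e.g. "replace"
    target: str        # text span the edit applies to
    replacement: str   # new text

def apply_edit(document: str, cmd: EditCommand) -> str:
    if cmd.action == "replace":
        return document.replace(cmd.target, cmd.replacement)
    return document

document = "The meeting is on Tuesday."
# A command a model might produce for the request "move the meeting to Friday".
modified = apply_edit(document, EditCommand("replace", "Tuesday", "Friday"))
```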
  • Publication number: 20240161529
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
    Type: Application
    Filed: November 15, 2022
    Publication date: May 16, 2024
    Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
  • Publication number: 20240135103
    Abstract: In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with clients and including sensitive samples and non-sensitive samples. The machine learning model is trained by sampling a client of the clients and using a subset of the sensitive samples associated with the client and a subset of the non-sensitive samples associated with the client to update parameters of the machine learning model. The privacy system generates an indication of the next word after the last word in the sequence of words for display in a user interface.
    Type: Application
    Filed: February 23, 2023
    Publication date: April 25, 2024
    Applicant: Adobe Inc.
    Inventors: Franck Dernoncourt, Tong Sun, Thi kim phung Lai, Rajiv Bhawanji Jain, Nikolaos Barmpalios, Jiuxiang Gu
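The client-level sampling scheme in publication 20240135103 above can be sketched as: pick a client, then draw separate subsets of that client's sensitive and non-sensitive samples for one parameter update. The corpus, batch sizes, and update function below are placeholders.

```python
# Rough sketch of the per-client sampling scheme only; the model update is a stub.
import random

corpus = {
    "client_a": {"sensitive": ["acct 1234 note", "ssn draft"],
                 "non_sensitive": ["meeting recap", "blog intro"]},
    "client_b": {"sensitive": ["salary memo"],
                 "non_sensitive": ["press release", "faq answer"]},
}

def training_step(update_fn, k=1):
    client = random.choice(list(corpus))                       # sample a client
    data = corpus[client]
    batch = (random.sample(data["sensitive"], k)               # subset of sensitive samples
             + random.sample(data["non_sensitive"], k))        # subset of non-sensitive samples
    update_fn(batch)                                           # update model parameters

training_step(update_fn=lambda batch: None)
```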