Patents by Inventor Trung Bui
Trung Bui has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12652444
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for segmenting digital videos into topic chapters. In particular, in some embodiments, the disclosed systems generate, utilizing a text encoder, a text representation for a transcript sentence of a video transcript. In addition, in some embodiments, the disclosed systems generate, utilizing a frame encoder, a set of frame representations for a set of video frames associated with the transcript sentence. Moreover, in some embodiments, the disclosed systems generate, utilizing a cross-modal attention model, a text-aware visual representation from the text representation and the set of frame representations. Furthermore, in some embodiments, the disclosed systems determine a topic-boundary label for the transcript sentence from the text representation and the text-aware visual representation.
Type: Grant
Filed: September 10, 2024
Date of Patent: June 9, 2026
Assignee: Adobe Inc.
Inventors: Fabian David Caba Heilbron, Franck Dernoncourt, Linzi Xing, Quan Tran, Seunghyun Yoon, Trung Bui, Zhaowen Wang
-
Patent number: 12645959
Abstract: This disclosure describes methods, non-transitory computer readable storage media, and systems that provide a platform for on-demand selection of machine-learning models and on-demand learning of parameters for the selected machine-learning models via cloud-based systems. For instance, the disclosed system receives a request indicating a selection of a machine-learning model to perform a machine-learning task (e.g., a natural language task) utilizing a specific dataset (e.g., a user-defined dataset). The disclosed system utilizes a scheduler to monitor available computing devices on cloud-based storage systems for instantiating the selected machine-learning model. Using the indicated dataset at a determined cloud-based computing device, the disclosed system automatically trains the machine-learning model.
Type: Grant
Filed: May 26, 2021
Date of Patent: June 2, 2026
Assignee: Adobe Inc.
Inventors: Nham Van Le, Tuan Manh Lai, Trung Bui, Doo Soon Kim
-
Patent number: 12646507
Abstract: The disclosed method generates helpful training data for a language model, for example, a model implementing a punctuation restoration task, for real-world ASR texts. The method uses a reinforcement learning method using a generative AI model to generate additional data to train the language model. The method allows the generative AI model to learn from real-world ASR text to generate more effective training examples based on gradient feedback from the language model.
Type: Grant
Filed: July 12, 2023
Date of Patent: June 2, 2026
Assignee: Adobe Inc.
Inventors: Viet Dac Lai, Trung Bui, Seunghyun Yoon, Quan Tran, Hao Tan, Hanieh Deilamsalehy, Abel Salinas, Franck Dernoncourt
-
Patent number: 12586392
Abstract: Embodiments are disclosed for training an image caption evaluation system to perform evaluations of image captions. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image, a ground truth image caption for the training image, and a perturbed image caption for the training image, where the perturbed image caption includes modifications to the ground truth image caption. The disclosed systems and methods further comprise generating, by a visual encoder, a visual embedding representation of the training image and generating, by a perturbation-aware text encoder, a first text embedding for the ground truth image caption and a second text embedding for the perturbed image caption. The disclosed systems and methods further comprise computing losses between the visual embedding, the first text embedding, and the second text embedding and training the perturbation-aware text encoder based on the computed losses.
Type: Grant
Filed: March 6, 2023
Date of Patent: March 24, 2026
Assignee: Adobe Inc.
Inventors: Seunghyun Yoon, Trung Bui
-
Patent number: 12586374
Abstract: A method includes receiving a video input and a text transcription of the video input. The video input includes a plurality of frames and the text transcription includes a plurality of sentences. The method further includes determining, by a multimodal summarization model, a subset of key frames of the plurality of frames and a subset of key sentences of the plurality of sentences. The method further includes providing a summary of the video input and a summary of the text transcription based on the subset of key frames and the subset of key sentences.
Type: Grant
Filed: June 2, 2023
Date of Patent: March 24, 2026
Assignee: Adobe Inc.
Inventors: Zhaowen Wang, Trung Bui, Bo He
-
Patent number: 12580003
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for segmenting digital videos into topic chapters utilizing a sliding window and video segmentation models. Specifically, the disclosed systems utilize a sliding window to divide a digital video into overlapping segments, each segment including a subset of sentences of a transcript of the video and corresponding video frames for a given time window of the digital video. Further, the disclosed systems generate, for each overlapping segment, topic-boundary label predictions for the subset of sentences. Specifically, the disclosed systems generate text representations for the sentences using a text encoder and frame representations for the corresponding video frames using a frame encoder. Moreover, the disclosed systems generate the topic-boundary label predictions based on the text representations and the frame representations.
Type: Grant
Filed: November 26, 2024
Date of Patent: March 17, 2026
Assignee: Adobe Inc.
Inventors: Linzi Xing, Franck Dernoncourt, Fabian David Caba Heilbron, Seunghyun Yoon, Zhaowen Wang, Trung Bui
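The sliding-window step in the abstract above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `sliding_windows`, the `window` and `stride` parameters, and the choice to index by sentence count rather than by time are all assumptions made for illustration.

```python
from typing import List, Tuple


def sliding_windows(num_sentences: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) sentence-index ranges for overlapping segments.

    Hypothetical helper: segments overlap whenever stride < window, and the
    last segment is clipped to the end of the transcript.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_sentences <= 0:
        return []
    segments = []
    start = 0
    while True:
        end = min(start + window, num_sentences)
        segments.append((start, end))
        if end == num_sentences:
            break  # the transcript is fully covered
        start += stride
    return segments


# A 10-sentence transcript, 4 sentences per segment, advancing 2 at a time.
print(sliding_windows(10, window=4, stride=2))
```

Each range would then be paired with the video frames for the same time span before a model predicts topic-boundary labels; that model is outside the scope of this sketch.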
-
Publication number: 20260075295
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for segmenting digital videos into topic chapters. In particular, in some embodiments, the disclosed systems generate, utilizing a text encoder, a text representation for a transcript sentence of a video transcript. In addition, in some embodiments, the disclosed systems generate, utilizing a frame encoder, a set of frame representations for a set of video frames associated with the transcript sentence. Moreover, in some embodiments, the disclosed systems generate, utilizing a cross-modal attention model, a text-aware visual representation from the text representation and the set of frame representations. Furthermore, in some embodiments, the disclosed systems determine a topic-boundary label for the transcript sentence from the text representation and the text-aware visual representation.
Type: Application
Filed: September 10, 2024
Publication date: March 12, 2026
Inventors: Fabian David Caba Heilbron, Franck Dernoncourt, Linzi Xing, Quan Tran, Seunghyun Yoon, Trung Bui, Zhaowen Wang
-
Patent number: 12547616
Abstract: Systems and methods for natural language processing are described. One or more embodiments of the present disclosure receive a query related to information in a table, compute an operation selector by combining the query with an operation embedding representing a plurality of table operations, compute a column selector by combining the query with a weighted operation embedding, compute a row selector based on the operation selector and the column selector, compute a probability value for a cell in the table based on the row selector and the column selector, where the probability value represents a probability that the cell provides an answer to the query, and transmit contents of the cell based on the probability value.
Type: Grant
Filed: May 11, 2021
Date of Patent: February 10, 2026
Assignee: ADOBE INC.
Inventors: Dung Thai, Doo Soon Kim, Franck Dernoncourt, Trung Bui
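As a rough illustration of the final step in the abstract above, the sketch below scores each cell from a row selector and a column selector. The abstract does not say how the two are combined; treating each selector as a probability distribution and multiplying them is an assumption, and the names `cell_probabilities` and `best_cell` are hypothetical. How the selectors themselves are computed from the query is not shown.

```python
from typing import List, Sequence, Tuple


def cell_probabilities(row_probs: Sequence[float], col_probs: Sequence[float]) -> List[List[float]]:
    """Score every cell as P(row) * P(column) (an assumed combination rule)."""
    return [[r * c for c in col_probs] for r in row_probs]


def best_cell(table: Sequence[Sequence[str]],
              row_probs: Sequence[float],
              col_probs: Sequence[float]) -> Tuple[str, float]:
    """Return the contents and score of the highest-probability cell."""
    probs = cell_probabilities(row_probs, col_probs)
    i, j = max(
        ((r, c) for r in range(len(row_probs)) for c in range(len(col_probs))),
        key=lambda rc: probs[rc[0]][rc[1]],
    )
    return table[i][j], probs[i][j]


# Toy table: the selectors favour row 1 and column 0.
table = [["Paris", "2.1"], ["Rome", "2.8"]]
answer, score = best_cell(table, row_probs=[0.2, 0.8], col_probs=[0.9, 0.1])
print(answer, round(score, 2))
```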
-
Patent number: 12547901
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for providing multilingual semantic search results utilizing meta-learning and knowledge distillation. For example, in some implementations, the disclosed systems perform a first inner learning loop for a monolingual to bilingual meta-learning task for a teacher model. Additionally, in some implementations, the disclosed systems perform a second inner learning loop for a bilingual to multilingual meta-learning task for a student model. In some embodiments, the disclosed systems perform knowledge distillation based on the first inner learning loop for the monolingual to bilingual meta-learning task and the second inner learning loop for the bilingual to multilingual meta-learning task.
Type: Grant
Filed: August 14, 2023
Date of Patent: February 10, 2026
Assignee: Adobe Inc.
Inventors: Meryem M'Hamdi, Seunghyun Yoon, Franck Dernoncourt, Trung Bui
-
Publication number: 20260017471
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for training a multilingual large language model to embed text into an embedding space of a vision language model comprising a text encoder for a first language and a vision encoder. In particular, in some embodiments, the disclosed systems generate, utilizing the vision encoder, image embeddings for images. Additionally, in some embodiments, the disclosed systems generate, utilizing the multilingual large language model, text embeddings for text in languages other than the first language. Furthermore, in some embodiments, the disclosed systems determine similarity metrics between the image embeddings for the images and the text embeddings for the text. Moreover, in some embodiments, the disclosed systems adjust parameters of the multilingual large language model to reduce an output of a contrastive loss function based on the similarity metrics without adjusting parameters of the vision encoder.
Type: Application
Filed: July 12, 2024
Publication date: January 15, 2026
Inventors: Handong Zhao, Tracy King, Kushal Kafle, Rohith Reddy Katikireddy, Sanat Sharma, Scott Cohen, Seunghyun Yoon, Trung Bui, Tushar Vatsa, Venkata Naveen Kumar Yadav Marri, Wei-ting Hsu, Hao Tan, Fangzheng Wu, Amine Ben Khalifa, Ajinkya Gorakhnath Kale
-
Patent number: 12518523
Abstract: A method and a system for performing distance metric learning using proxies are provided. The method for performing distance metric learning assigns a proxy as an anchor to represent a class and associates the proxy with all data points in a training batch. The method allows data points to interact with each other via proxies during training. Additionally, the fine-grained data-to-data relation is actively considered, which is combined with a learnable margin parameter leading to intra-class compactness and inter-class separability.
Type: Grant
Filed: October 12, 2023
Date of Patent: January 6, 2026
Assignee: VINBRAIN JOINT STOCK COMPANY
Inventors: Nguyen Phan, Sen Kim Tran, Huy Duc Ta, Soan Thi Minh Duong, Chanh Do Trung Nguyen, Trung Bui, Steven Q. H. Truong
-
Publication number: 20250342628
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that perform text-to-image editing using executable code generated from natural language text input. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a digital image and natural language text input providing instructions for modifying the digital image. The disclosed systems also generate, using a large language model, executable action code for modifying the digital image in accordance with the instructions of the natural language text input, the executable action code being compatible with an editing application. The disclosed systems further modify the digital image by executing the executable action code via the editing application and provide the modified digital image for display via a graphical user interface of the client device.
Type: Application
Filed: May 3, 2024
Publication date: November 6, 2025
Inventors: Handong Zhao, Qiucheng Wu, Trung Bui, Seunghyun Yoon, Quan Tran, Jing Shi
-
Patent number: 12451132
Abstract: A computer-implemented method is disclosed for determining one or more characteristics of a dialog between a computer system and user. The method may comprise receiving a system utterance comprising one or more tokens defining one or more words generated by the computer system; receiving a user utterance comprising one or more tokens defining one or more words uttered by a user in response to the system utterance, the system utterance and the user utterance forming a dialog context; receiving one or more utterance candidates comprising one or more tokens; for each utterance candidate, generating an input sequence combining the one or more tokens of each of the system utterance, the user utterance, and the utterance candidate; and for each utterance candidate, evaluating the generated input sequence with a model to determine a probability that the utterance candidate is relevant to the dialog context.
Type: Grant
Filed: February 9, 2023
Date of Patent: October 21, 2025
Assignee: Adobe Inc.
Inventors: Tuan Manh Lai, Trung Bui, Quan Tran
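The input-sequence step described in the abstract above might look like the following sketch. The `[CLS]` and `[SEP]` marker tokens, the ordering of the three parts, and the function names are assumptions borrowed from common sentence-pair encoders, not details confirmed by the abstract. The relevance model that scores each sequence is not shown.

```python
from typing import List, Sequence


def build_input_sequence(system_tokens: Sequence[str],
                         user_tokens: Sequence[str],
                         candidate_tokens: Sequence[str],
                         cls: str = "[CLS]",
                         sep: str = "[SEP]") -> List[str]:
    """Concatenate dialog context and one candidate into a single token sequence."""
    return [cls, *system_tokens, sep, *user_tokens, sep, *candidate_tokens, sep]


def build_all(system_tokens: Sequence[str],
              user_tokens: Sequence[str],
              candidates: Sequence[Sequence[str]]) -> List[List[str]]:
    """Build one input sequence per utterance candidate."""
    return [build_input_sequence(system_tokens, user_tokens, c) for c in candidates]


# One system turn, one user reply, two candidate utterances to score.
sequences = build_all(
    ["which", "city", "?"],
    ["new", "york"],
    [["city", "=", "new", "york"], ["date", "=", "today"]],
)
for seq in sequences:
    print(" ".join(seq))
```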
-
Publication number: 20250307606
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images via a generative neural network with localized constraints. The disclosed system generates, utilizing one or more encoder neural networks, a sequence of embeddings comprising a prompt embedding representing a text prompt and an object text embedding representing a phrase indicating an object in the text prompt. The disclosed system generates, utilizing the one or more encoder neural networks, a visual embedding representing an object image corresponding to the object. The disclosed system determines a modified sequence of embeddings by replacing the object text embedding with the visual embedding in the sequence of embeddings. The disclosed system also generates, utilizing a generative neural network, a synthetic digital image from the modified sequence of embeddings comprising the visual embedding.
Type: Application
Filed: March 28, 2024
Publication date: October 2, 2025
Inventors: Weixi Feng, Yijun Li, Trung Bui, Tobias Hinz, Scott Cohen, Quan Tran, Jianming Zhang, Handong Zhao, Franck Dernoncourt
-
Publication number: 20250298487
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for modifying a digital design by performing a selective object-level undo operation. In one or more embodiments, the disclosed systems generate a modified object by performing a series of operations on an object depicted within the digital design. In some embodiments, the disclosed systems receive a selective object-level undo operation on the modified object, wherein the request specifies an operation to undo from among the series of operations performed on the object. In one or more embodiments, the disclosed systems modify the modified object by performing the selective object-level undo operation on the modified object to undo the operation from among the series of operations. In some embodiments, the disclosed systems provide an updated digital design depicting the modified object reflecting modifications from the series of operations excluding the operation undone by the selective object-level undo operation.
Type: Application
Filed: March 20, 2024
Publication date: September 25, 2025
Inventors: Nikita Soni, Trung Bui, Kevin Gary Smith
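One simple way to picture the selective undo described above is to replay an object's operation history while skipping the operation being undone. This is only a sketch under that assumption: the abstract does not state that replay is the mechanism used, and the names `apply_ops` and `selective_undo`, as well as the dictionary-based object, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# An operation is a (name, function) pair; the function returns a new object state.
Operation = Tuple[str, Callable[[Dict], Dict]]


def apply_ops(original: Dict, ops: List[Operation]) -> Dict:
    """Apply each operation in order, starting from a copy of the original object."""
    state = dict(original)
    for _, fn in ops:
        state = fn(state)
    return state


def selective_undo(original: Dict, ops: List[Operation], name: str) -> Tuple[Dict, List[Operation]]:
    """Rebuild the object from its history with the named operation left out."""
    remaining = [op for op in ops if op[0] != name]
    if len(remaining) == len(ops):
        raise KeyError(f"no operation named {name!r} in history")
    return apply_ops(original, remaining), remaining


# History for one object: move, then recolor, then resize.
original = {"x": 0, "scale": 1.0, "color": "red"}
history: List[Operation] = [
    ("move", lambda s: {**s, "x": s["x"] + 10}),
    ("recolor", lambda s: {**s, "color": "blue"}),
    ("resize", lambda s: {**s, "scale": s["scale"] * 2}),
]

# Undo only the recolor; the move and the resize are kept.
updated, history = selective_undo(original, history, "recolor")
print(updated)
```

Replay keeps the later operations intact without needing an inverse for each one, at the cost of recomputing the object from its original state.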
-
Patent number: 12399890
Abstract: Systems and methods for natural language processing are described. Embodiments are configured to receive a structured representation of a search query, wherein the structured representation comprises a plurality of nodes and at least one edge connecting two of the nodes, receive a modification expression for the search query, wherein the modification expression comprises a natural language expression, generate a modified structured representation based on the structured representation and the modification expression using a neural network configured to combine structured representation features and natural language expression features, and perform a search based on the modified structured representation.
Type: Grant
Filed: November 3, 2020
Date of Patent: August 26, 2025
Assignee: ADOBE INC.
Inventors: Quan Tran, Zhe Lin, Xuanli He, Walter Chang, Trung Bui, Franck Dernoncourt
-
Publication number: 20250252265
Abstract: The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a contextual query answering system that trains and implements a unique machine learning architecture to generate accurate domain-specific contextual responses. For example, the disclosed systems receive a contextual query indicating a software context of a computer application within a software-specific domain. The disclosed systems utilize a context retrieval model to generate query embeddings from the contextual query and data segment embeddings from data segments of stored digital documents. Further, the context retrieval model determines relevant digital documents from among the stored digital documents based on comparing the query embeddings and the data segment embeddings. The disclosed systems provide the relevant digital documents to a response generator model to generate a contextual response within the software-specific domain.
Type: Application
Filed: February 5, 2024
Publication date: August 7, 2025
Inventors: Varun Kumar Kotte, Trung Bui, Seunghyun Yoon, Sanat Sharma, Franck Dernoncourt, Dewang Sultania
-
Publication number: 20250209278
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for identifying speaker names in transcripts. In particular, in one or more embodiments, the disclosed systems determine, from a set of sentences in a textual transcript of a dialogue, a first sentence spoken by a first speaker and a second sentence spoken by a second speaker. Additionally, in some embodiments, the disclosed systems generate a first feature representation for the first sentence and a second feature representation for the second sentence. Moreover, in some embodiments, the disclosed systems determine a speaker name for at least one of the first sentence or the second sentence by comparing each of the first feature representation and the second feature representation with a name representation for a name spoken in at least one of the first sentence or the second sentence.
Type: Application
Filed: December 20, 2023
Publication date: June 26, 2025
Inventors: Minh Nguyen, Franck Dernoncourt, Hanieh Deilamsalehy, Hao Tan, Quan Tran, Seunghyun Yoon, Trung Bui
-
Publication number: 20250124698
Abstract: A method and a system for performing distance metric learning using proxies are provided. The method for performing distance metric learning assigns a proxy as an anchor to represent a class and associates the proxy with all data points in a training batch. The method allows data points to interact with each other via proxies during training. Additionally, the fine-grained data-to-data relation is actively considered, which is combined with a learnable margin parameter leading to intra-class compactness and inter-class separability.
Type: Application
Filed: October 12, 2023
Publication date: April 17, 2025
Inventors: Nguyen PHAN, Sen Kim TRAN, Huy Duc TA, Soan Thi Minh DUONG, Chanh Do Trung NGUYEN, Trung BUI, Steven Q. H. TRUONG
-
Publication number: 20250077775
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating aspect-based summaries utilizing deep learning. In particular, in one or more embodiments, the disclosed systems access a transcript comprising sentences. The disclosed systems generate, utilizing a sentence classification machine learning model, aspect labels for the sentences of the transcript. The disclosed systems organize the sentences based on the aspect labels. The disclosed systems generate, utilizing a summary machine learning model, a summary of the transcript for each aspect of the plurality of aspects from the organized sentences.
Type: Application
Filed: August 29, 2023
Publication date: March 6, 2025
Inventors: Zhongfen Deng, Seunghyun Yoon, Trung Bui, Quan Tran, Franck Dernoncourt