Patents by Inventor Duy Vu
Duy Vu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260162009
Abstract: Techniques are disclosed herein towards a process for enhancing generative model robustness. The process includes accessing a training data set comprising examples, each with a natural language utterance and a database schema. Each example is augmented by generating augmentation prompts with perturbation instructions, which direct the generative model to modify the prompt using one or more categories of perturbations. These perturbations may alter the natural language utterance, the database schema, or both, resulting in variant prompts. The augmented examples are added to an augmented training data set. Using these augmented training examples, a pre-trained generative model is fine-tuned, yielding a fine-tuned generative model capable of accurately and consistently producing structured queries in response to diverse natural language inputs and schema configurations.
Type: Application
Filed: August 15, 2025
Publication date: June 11, 2026
Applicant: Oracle International Corporation
Inventors: Varsha Kuppur Rajendra, Duy Vu, Gioacchino Tangari, Dalu Guo, Steve Wai-Chun Siu, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong
-
Publication number: 20260161622
Abstract: Techniques are disclosed herein to augment a prompt for a generative model. The techniques may include accessing a set of training examples, generating a logical form query that corresponds to the natural language utterance for each example of the set of training examples, evaluating performance of the generative model using the logical form query generated for each example of the set of training examples, identifying, based on the evaluating, a subset of rejected examples from the set of examples, grouping the subset of rejected examples into at least one or more groups of rejected examples based on similarity between embedding vectors for the subset of rejected examples, generating, by a generative model, instructions for each group of the one or more groups of rejected examples, and modifying the baseline prompt to include the instructions for each group of the one or more groups of rejected examples.
Type: Application
Filed: June 30, 2025
Publication date: June 11, 2026
Applicant: Oracle International Corporation
Inventors: Duy Vu, Gioacchino Tangari, Budhaditya Saha, Dai Quoc Nguyen, Thanh Tien Vu, Nagesh Panyam Chandrasekarasatry, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong
-
Publication number: 20260161680
Abstract: Techniques are disclosed herein towards logical form date-time expression generation. For example, methods are provided for receiving a natural language (NL) utterance, generating a prompt including the NL utterance and instructions to transform the NL utterance into a logical form query including a coded-form expression, generating, by a generative model based on the prompt, a logical form query including a coded-form expression, transforming the coded-form expression into a period definition expression by executing the coded-form expression with one or more pre-defined period-definition content items, updating the logical form query to include the period definition expression by replacing the coded-form expression with the period definition expression, and providing at least one of i) the updated logical form query or ii) a query result obtained based on the updated logical form query, to a client system.
Type: Application
Filed: June 30, 2025
Publication date: June 11, 2026
Applicant: Oracle International Corporation
Inventors: Duy Vu, Nagesh Panyam Chandrasekarasastry, Gioacchino Tangari, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Anshuk Pal Chaudhuri, Karthik Puthenparampil Srinivasan, Kok Wei Kam, Prabhakara Reddy Munnangi, Subhash Kumar Bhamidipati
-
Patent number: 12632786
Abstract: Techniques for named entity bias detection and mitigation for sentence sentiment analysis. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, preparing a list of named entities using one or more data sources, for each example in the training set of labeled examples with a named entity, replacing the named entity with a corresponding entity type tag to generate a labeled template data set, executing a sampling process for each entity type t within the labeled template data set to generate an augmented invariance data set comprising one or more invariance groups having labeled examples for each entity type t, and training the machine learning model using labeled examples from the augmented invariance data set.
Type: Grant
Filed: November 10, 2022
Date of Patent: May 19, 2026
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Duy Vu, Varsha Kuppur Rajendra, Shivashankar Subramanian, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Mark Edward Johnson
-
Patent number: 12602545
Abstract: Techniques disclosed herein relate generally to language detection. In one particular aspect, a method is provided that includes obtaining a sequence of n-grams of a textual unit; using an embedding layer to obtain an ordered plurality of embedding vectors for the sequence of n-grams; using a deep network to obtain an encoded vector that is based on the ordered plurality of embedding vectors; and using a classifier to obtain a language prediction for the textual unit that is based on the encoded vector. The deep network includes an attention mechanism, and using the embedding layer to obtain the ordered plurality of embedding vectors comprises, for each n-gram in the sequence of n-grams: obtaining hash values for the n-gram; based on the hash values, selecting component vectors from among the plurality of component vectors; and obtaining an embedding vector for the n-gram that is based on the component vectors.
Type: Grant
Filed: November 4, 2022
Date of Patent: April 14, 2026
Assignee: Oracle International Corporation
Inventors: Thanh Tien Vu, Poorya Zaremoodi, Duy Vu, Mark Edward Johnson, Thanh Long Duong, Xu Zhong, Vladislav Blinov, Cong Duy Vu Hoang, Yu-Heng Hong, Vinamr Goel, Philip Victor Ogren, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
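Illustrative sketch (not taken from the patent): the embedding step in the abstract above, where each n-gram is hashed and the hash values select component vectors from a shared table, might look roughly like the following. The table size, dimension, number of hashes, the salted SHA-256 hashing, and the choice to sum the selected vectors are all assumptions made for illustration; the abstract only says the embedding is "based on" the component vectors.

```python
import hashlib
import random

NUM_COMPONENTS = 1000   # size of the shared component-vector table (assumed)
DIM = 8                 # embedding dimension (assumed)
NUM_HASHES = 2          # hash values per n-gram (assumed)

# Randomly initialised component vectors; a real model would learn these.
_rng = random.Random(0)
COMPONENTS = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)]
              for _ in range(NUM_COMPONENTS)]


def char_ngrams(text, n=3):
    """Return the sequence of character n-grams of a textual unit."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hash_values(ngram):
    """Derive NUM_HASHES table indices for an n-gram using salted SHA-256."""
    indices = []
    for salt in range(NUM_HASHES):
        digest = hashlib.sha256(f"{salt}:{ngram}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % NUM_COMPONENTS)
    return indices


def embed_ngram(ngram):
    """Sum the selected component vectors (summation is an assumption)."""
    selected = [COMPONENTS[i] for i in hash_values(ngram)]
    return [sum(values) for values in zip(*selected)]


def embed_sequence(text, n=3):
    """Ordered embedding vectors, one per n-gram, ready for an encoder."""
    return [embed_ngram(gram) for gram in char_ngrams(text, n)]
```

The ordered list returned by `embed_sequence` would then be passed to the attention-based encoder and classifier, which are not sketched here.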
-
Publication number: 20260099675
Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
Type: Application
Filed: December 10, 2025
Publication date: April 9, 2026
Applicant: Oracle International Corporation
Inventors: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu
-
Publication number: 20260080260
Abstract: The present disclosure relates to manufacturing training and testing data by leveraging data augmentation techniques to generate examples of long context database schemas. Aspects are directed towards accessing a training dataset comprising training examples where each training example may include i) a prompt including a natural language utterance and a database schema having one or more tables, and ii) a gold logical form corresponding to the natural language utterance, combining the tables from the database schemas in the training examples may generate a combined database schema set, generating a set of long context training examples based on the training dataset and the combined database schema set, and incorporating the long context database schema into the selected training example to generate a long context training example to train a generative artificial intelligence model with at least the set of long context training examples to generate a trained generative artificial intelligence model.
Type: Application
Filed: January 23, 2025
Publication date: March 19, 2026
Applicant: Oracle International Corporation
Inventors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Steve Wai-Chun Siu, Dalu Guo, Budhaditya Saha, Thanh Tien Vu, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Anshuk Pal Chaudhuri, Prabhakara Reddy Munnangi, Subash Kumar Bhamidipati
-
Patent number: 12579471
Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, searching the training set of labeled examples or an unlabeled corpus of text on target domains for sentiment examples having negation cues, sentiment laden words, words with sentiment prefixes or suffixes, or a combination thereof, rewriting the sentiment examples to create negated versions thereof and generate a labeled negation pair data set, and training the machine learning model using labeled examples from the labeled negation pair data set.
Type: Grant
Filed: November 10, 2022
Date of Patent: March 17, 2026
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20260072909
Abstract: The present disclosure relates to machine learning techniques for In-Context-Learning (ICL) with pattern-based retrieval for the task of converting Natural Language (NL) to Structured Query Language (SQL). Aspects are directed towards acquiring a natural language utterance and a database schema, searching, using at least a portion of the natural language utterance as a key, a memory bank for one or more in-context examples that are relevant to the key, generating a prompt comprising the natural language query, the database schema, and the one or more in-context examples, transmitting the prompt to a first pretrained generative artificial intelligence model, receiving, from the first pretrained generative artificial intelligence model, a logical form corresponding to the natural language utterance based at least in part on the prompt, executing the logical form on a database to obtain a query result, and providing the query result to a user.
Type: Application
Filed: February 25, 2025
Publication date: March 12, 2026
Applicant: Oracle International Corporation
Inventors: Nagesh Panyam Chandrasekarasatry, Duy Vu, Dai Quoc Nguyen, Gioacchino Tangari, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vladislav Blinov, Thanh Tien Vu, Ying Xu, Thanh Long Duong
-
Patent number: 12554934
Abstract: A method includes accessing document including sentences, document being associated with configuration flag indicating whether ABSA, SLSA, or both are to be performed; inputting the document into language model that generates chunks of token embeddings for the document; and, based on the configuration flag, performing at least one from among the ABSA and the SLSA by inputting the chunks of token embeddings into a multi-task model. When performing the SLSA, a part of token embeddings in each of the chunks is masked, and the masked token embeddings do not belong to a particular sentence on which the SLSA is performed.
Type: Grant
Filed: October 12, 2023
Date of Patent: February 17, 2026
Assignee: Oracle International Corporation
Inventors: Poorya Zaremoodi, Duy Vu, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza
-
Publication number: 20260037505
Abstract: Techniques are disclosed herein for providing and using a natural language to logical form model having execution and semantic error correction capabilities. In one aspect, a method is disclosed that includes: accessing a set of training examples and generating a set of error correction training examples via an iterative process performed for each training example. The iterative process includes generating an inferred logical form, executing the inferred logical form on a database, when executing the inferred logical form on the database fails, obtaining an execution error message corresponding to the failure, and recording the inferred logical form and the execution error message as part of an execution error example, and populating an error correction prompt template with the execution error example to generate an error correction training example. A machine learning model may then be trained with at least the set of error correction training examples.
Type: Application
Filed: July 30, 2024
Publication date: February 5, 2026
Applicant: Oracle International Corporation
Inventors: Duy Vu, Steve Wai-Chun Siu, Gioacchino Tangari, Cong Duy Vu Hoang, Vladislav Blinov, Yakupitiyage Don Thanuja Samodhve Dharmasiri, Ying Xu, Thanh Long Duong
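Illustrative sketch (not taken from the application): the loop described in the abstract above, which executes each inferred query, captures the error message on failure, and fills an error correction prompt template, could be approximated as below. SQLite stands in for the database, and the template wording, the `infer_query` callable, the dictionary keys, and the use of the gold query as the training target are all hypothetical choices made for illustration.

```python
import sqlite3

# Hypothetical prompt template; the wording is illustrative only.
ERROR_CORRECTION_TEMPLATE = (
    "Question: {utterance}\n"
    "Previous query: {query}\n"
    "Execution error: {error}\n"
    "Write a corrected query."
)


def build_error_correction_examples(conn, training_examples, infer_query):
    """Run each inferred query; turn execution failures into training examples."""
    examples = []
    for example in training_examples:
        query = infer_query(example["utterance"])
        try:
            conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            # Record the failed query and its error message in the template.
            prompt = ERROR_CORRECTION_TEMPLATE.format(
                utterance=example["utterance"], query=query, error=exc)
            # Using the gold query as the target is an assumption.
            examples.append({"prompt": prompt, "target": example["gold_query"]})
    return examples


def demo():
    """Build one error correction example from a deliberately faulty model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
    training = [{"utterance": "List employee names",
                 "gold_query": "SELECT name FROM employees"}]

    def faulty_model(utterance):
        # Stand-in for a generative model that hallucinates a column name.
        return "SELECT full_name FROM employees"

    return build_error_correction_examples(conn, training, faulty_model)
```

Queries that execute successfully are simply skipped in this sketch; the semantic error correction mentioned in the abstract is not covered.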
-
Patent number: 12530545
Abstract: A computer-implemented method includes: accessing a plurality of datasets, where each dataset of the plurality of datasets includes training examples; selecting datasets that include the training examples in a source language and a target language; and sampling, based on a sampling weight that is determined for each of the selected datasets, the training examples from the selected datasets to generate the training batches; training an ML model for performing at least a first task using the training examples of the training batches, by interleavingly inputting the training batches to the ML model; and outputting the trained ML model configured to perform the at least the first task on input utterances provided in at least one among the source language and the target language. The sampling weight is determined for each of the selected datasets based on one or more attributes common to the training examples of the selected dataset.
Type: Grant
Filed: October 12, 2023
Date of Patent: January 20, 2026
Assignee: Oracle International Corporation
Inventors: Duy Vu, Poorya Zaremoodi, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza
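Illustrative sketch (not taken from the patent): weighted sampling of training batches across several datasets, as outlined in the abstract above, might be approximated as follows. The function name, the decision to pick one source dataset per batch, and sampling with replacement are assumptions; the abstract does not say how the weights are computed, so they are passed in as given.

```python
import random


def build_batches(datasets, weights, batch_size, num_batches, seed=0):
    """Draw training batches, choosing a source dataset per batch by weight.

    datasets: mapping of dataset name to a list of training examples.
    weights:  mapping of dataset name to its sampling weight (assumed given).
    """
    rng = random.Random(seed)
    names = list(datasets)
    name_weights = [weights[name] for name in names]
    batches = []
    for _ in range(num_batches):
        # Pick the source dataset for this batch in proportion to its weight.
        name = rng.choices(names, weights=name_weights, k=1)[0]
        # Sample examples with replacement from the chosen dataset.
        batch = [rng.choice(datasets[name]) for _ in range(batch_size)]
        batches.append((name, batch))
    return batches
```

Feeding the resulting list to a training loop in order gives batches from different datasets interleaved according to the weights.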
-
Patent number: 12518098
Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
Type: Grant
Filed: September 29, 2022
Date of Patent: January 6, 2026
Assignee: Oracle International Corporation
Inventors: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu
-
Publication number: 20250307556
Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
Type: Application
Filed: June 11, 2025
Publication date: October 2, 2025
Applicant: Oracle International Corporation
Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
-
Patent number: 12412126
Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes generating a list of demographic words associated with a demographic group, searching an unlabeled corpus of text to identify unlabeled examples in a target domain comprising at least one demographic word from the list of demographic words, rewriting the unlabeled examples to create one or more versions of each of the unlabeled examples and generate a fairness invariance data set, and training the machine learning model using unlabeled examples from the fairness invariance data set.
Type: Grant
Filed: November 10, 2022
Date of Patent: September 9, 2025
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
-
Patent number: 12361219
Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
Type: Grant
Filed: November 28, 2023
Date of Patent: July 15, 2025
Assignee: Oracle International Corporation
Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
-
Publication number: 20250218428
Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
Type: Application
Filed: March 20, 2025
Publication date: July 3, 2025
Applicant: Oracle International Corporation
Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
-
Patent number: 12299402
Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
Type: Grant
Filed: May 9, 2024
Date of Patent: May 13, 2025
Assignee: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
-
Patent number: 12288550
Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
Type: Grant
Filed: September 23, 2022
Date of Patent: April 29, 2025
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
-
Publication number: 20250095636
Abstract: Techniques are disclosed herein for improving the performance of an end-to-end (E2E) Automatic Speech Recognition (ASR) model in a target domain. A set of test examples are generated. The set of test examples comprise multiple subsets of test examples and each subset of test examples corresponds to a particular test category. A machine language model is then used to convert audio samples of the subset of test examples to text transcripts. A word error rate is determined for the subset of test examples. A test category is then selected based on the word error rates and a set of training examples is generated for training the ASR model in a particular target domain from a selected subset of test examples. The training examples are used to fine-tune the model in the target domain. The trained model is then deployed in a cloud infrastructure of a cloud service provider.
Type: Application
Filed: September 3, 2024
Publication date: March 20, 2025
Applicant: Oracle International Corporation
Inventors: Duy Vu, Yu-Heng Hong, Ying Xu, Philip Arthur
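Illustrative sketch (not taken from the application): the per-category word error rate comparison described in the abstract above can be approximated with a word-level edit distance. The function names, the averaging of WER within a category, and the rule of selecting the category with the highest mean WER are assumptions; the abstract only says a category is selected "based on the word error rates".

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1] / len(ref)


def worst_category(results):
    """Return the category with the highest mean WER, plus all mean WERs.

    results: mapping of category name to a list of
             (reference transcript, model transcript) pairs.
    """
    mean_wer = {}
    for category, pairs in results.items():
        rates = [word_error_rate(ref, hyp) for ref, hyp in pairs]
        mean_wer[category] = sum(rates) / len(rates)
    return max(mean_wer, key=mean_wer.get), mean_wer
```

The examples in the selected category would then seed the training set used for fine-tuning, a step this sketch does not cover.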