Patents by Inventor Duy Vu

Duy Vu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

INSTRUCTION INDUCTION FOR NL2SQL PROMPTS AND GENERATIVE MODELS

Publication number: 20260161622

Abstract: Techniques are disclosed herein to augment a prompt for a generative model. The techniques may include accessing a set of training examples, generating a logical form query that corresponds to the natural language utterance for each example of the set of training examples, evaluating performance of the generative model using the logical form query generated for each example of the set of training examples, identifying, based on the evaluating, a subset of rejected examples from the set of training examples, grouping the subset of rejected examples into one or more groups of rejected examples based on similarity between embedding vectors for the subset of rejected examples, generating, by a generative model, instructions for each group of the one or more groups of rejected examples, and modifying the baseline prompt to include the instructions for each group of the one or more groups of rejected examples.

Type: Application

Filed: June 30, 2025

Publication date: June 11, 2026

Applicant: Oracle International Corporation

Inventors: Duy Vu, Gioacchino Tangari, Budhaditya Saha, Dai Quoc Nguyen, Thanh Tien Vu, Nagesh Panyam Chandrasekarasatry, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong
BENCHMARKING AND MODIFYING BEHAVIORAL ROBUSTNESS OF TEXT-TO-SQL MODELS

Publication number: 20260162009

Abstract: Techniques are disclosed herein towards a process for enhancing generative model robustness. The process includes accessing a training data set comprising examples, each with a natural language utterance and a database schema. Each example is augmented by generating augmentation prompts with perturbation instructions, which direct the generative model to modify the prompt using one or more categories of perturbations. These perturbations may alter the natural language utterance, the database schema, or both, resulting in variant prompts. The augmented examples are added to an augmented training data set. Using these augmented training examples, a pre-trained generative model is fine-tuned, yielding a fine-tuned generative model capable of accurately and consistently producing structured queries in response to diverse natural language inputs and schema configurations.

Type: Application

Filed: August 15, 2025

Publication date: June 11, 2026

Applicant: Oracle International Corporation

Inventors: Varsha Kuppur Rajendra, Duy Vu, Gioacchino Tangari, Dalu Guo, Steve Wai-Chun Siu, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong
GENERATIVE MODEL BASED QUERY LANGUAGE GENERATION FOR DATE TIME EXPRESSIONS

Publication number: 20260161680

Abstract: Techniques are disclosed herein towards logical form date-time expression generation. For example, methods are provided for receiving a natural language (NL) utterance, generating a prompt including the NL utterance and instructions to transform the NL utterance into a logical form query including a coded-form expression, generating, by a generative model based on the prompt, a logical form query including a coded-form expression, transforming the coded-form expression into a period definition expression by executing the coded-form expression with one or more pre-defined period-definition content items, updating the logical form query to include the period definition expression by replacing the coded-form expression with the period definition expression, and providing at least one of i) the updated logical form query or ii) a query result obtained based on the updated logical form query, to a client system.

Type: Application

Filed: June 30, 2025

Publication date: June 11, 2026

Applicant: Oracle International Corporation

Inventors: Duy Vu, Nagesh Panyam Chandrasekarasastry, Gioacchino Tangari, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Anshuk Pal Chaudhuri, Karthik Puthenparampil Srinivasan, Kok Wei Kam, Prabhakara Reddy Munnangi, Subhash Kumar Bhamidipati
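
The replacement step described in the abstract above (swapping a coded-form expression for a period definition expression) can be illustrated with a small sketch. It is not the patented method. The `PERIOD('...')` marker syntax, the `PERIOD_DEFINITIONS` table, and the function names are all hypothetical, chosen only to make the idea concrete.

```python
import re
from datetime import date, timedelta
from typing import Tuple

# Hypothetical coded-form marker that a generative model might emit in a query.
CODED_FORM = re.compile(r"PERIOD\('(\w+)'\)")


def last_7_days(today: date) -> Tuple[date, date]:
    # Half-open range: [today - 7 days, today)
    return today - timedelta(days=7), today


def this_month(today: date) -> Tuple[date, date]:
    start = today.replace(day=1)
    # Day 28 plus 4 days always lands in the next month; snap to its first day.
    next_month = (start.replace(day=28) + timedelta(days=4)).replace(day=1)
    return start, next_month


# Hypothetical pre-defined period definitions, keyed by the code in the marker.
PERIOD_DEFINITIONS = {"last_7_days": last_7_days, "this_month": this_month}


def expand_periods(query: str, column: str, today: date) -> str:
    """Replace each coded-form marker with an explicit date-range predicate."""
    def replace(match):
        start, end = PERIOD_DEFINITIONS[match.group(1)](today)
        return (f"{column} >= DATE '{start.isoformat()}' "
                f"AND {column} < DATE '{end.isoformat()}'")
    return CODED_FORM.sub(replace, query)
```

Under these assumptions, `expand_periods("SELECT * FROM orders WHERE PERIOD('this_month')", "order_date", date(2025, 6, 30))` yields a query whose predicate covers 2025-06-01 up to, but not including, 2025-07-01. Resolving the period outside the model keeps the calendar arithmetic deterministic.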
Named entity bias detection and mitigation techniques for sentence sentiment analysis

Patent number: 12632786

Abstract: Techniques for named entity bias detection and mitigation for sentence sentiment analysis. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, preparing a list of named entities using one or more data sources, for each example in the training set of labeled examples with a named entity, replacing the named entity with a corresponding entity type tag to generate a labeled template data set, executing a sampling process for each entity type t within the labeled template data set to generate an augmented invariance data set comprising one or more invariance groups having labeled examples for each entity type t, and training the machine learning model using labeled examples from the augmented invariance data set.

Type: Grant

Filed: November 10, 2022

Date of Patent: May 19, 2026

Assignee: Oracle International Corporation

Inventors: Duy Vu, Varsha Kuppur Rajendra, Shivashankar Subramanian, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Mark Edward Johnson
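
The template-and-sampling idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ENTITIES` table, the `<TYPE>` tag format, and both function names are hypothetical, and the naive substring match stands in for a real named entity recognizer.

```python
import random

# Hypothetical entity list keyed by entity type; a real list would be
# prepared from one or more data sources.
ENTITIES = {"PERSON": ["Alice", "Bob", "Chen"], "ORG": ["Acme", "Globex"]}


def to_template(text, label):
    """Replace the first known named entity with its type tag, e.g. <PERSON>."""
    for etype, names in ENTITIES.items():
        for name in names:
            if name in text:  # naive substring match, for illustration only
                return text.replace(name, f"<{etype}>"), etype, label
    return None  # no known named entity in this example


def invariance_group(template, etype, label, k, rng):
    """Sample k distinct entities of the given type; each variant keeps the label."""
    chosen = rng.sample(ENTITIES[etype], k)
    return [(template.replace(f"<{etype}>", name), label) for name in chosen]
```

For example, `to_template("Bob gave a great talk", "positive")` produces the template `<PERSON> gave a great talk`, and `invariance_group` then fills it with sampled names that all share the `positive` label. Training on such groups encourages the same prediction regardless of which entity appears.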
Wide and deep network for language detection using hash embeddings

Patent number: 12602545

Abstract: Techniques disclosed herein relate generally to language detection. In one particular aspect, a method is provided that includes obtaining a sequence of n-grams of a textual unit; using an embedding layer to obtain an ordered plurality of embedding vectors for the sequence of n-grams; using a deep network to obtain an encoded vector that is based on the ordered plurality of embedding vectors; and using a classifier to obtain a language prediction for the textual unit that is based on the encoded vector. The deep network includes an attention mechanism, and using the embedding layer to obtain the ordered plurality of embedding vectors comprises, for each n-gram in the sequence of n-grams: obtaining hash values for the n-gram; based on the hash values, selecting component vectors from among the plurality of component vectors; and obtaining an embedding vector for the n-gram that is based on the component vectors.

Type: Grant

Filed: November 4, 2022

Date of Patent: April 14, 2026

Assignee: Oracle International Corporation

Inventors: Thanh Tien Vu, Poorya Zaremoodi, Duy Vu, Mark Edward Johnson, Thanh Long Duong, Xu Zhong, Vladislav Blinov, Cong Duy Vu Hoang, Yu-Heng Hong, Vinamr Goel, Philip Victor Ogren, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
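
The hash embedding lookup described in the abstract above (hash each n-gram, select component vectors, combine them) can be sketched as follows. The sketch covers only the embedding step, not the deep network, attention mechanism, or classifier. The constants, the salted SHA-256 hashing, and the choice to sum the selected vectors are assumptions made for illustration; the toy component vectors stand in for learned parameters.

```python
import hashlib

NUM_HASHES = 2      # hash functions per n-gram (assumed value)
NUM_COMPONENTS = 8  # size of the shared pool of component vectors (assumed)
DIM = 4             # embedding dimension (assumed)

# Deterministic toy component vectors; a real model would learn these.
COMPONENTS = [[((i + 1) * (j + 1)) % 7 / 7.0 for j in range(DIM)]
              for i in range(NUM_COMPONENTS)]


def char_ngrams(text, n=3):
    """Character n-grams of a textual unit, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hash_values(ngram):
    """One bucket index per hash function, from a salted SHA-256 digest."""
    return [
        int(hashlib.sha256(f"{seed}:{ngram}".encode("utf-8")).hexdigest(), 16)
        % NUM_COMPONENTS
        for seed in range(NUM_HASHES)
    ]


def embed(ngram):
    """Sum the selected component vectors into one embedding vector."""
    selected = [COMPONENTS[h] for h in hash_values(ngram)]
    return [sum(vec[d] for vec in selected) for d in range(DIM)]
```

Calling `[embed(g) for g in char_ngrams("hello")]` gives an ordered list of three embedding vectors. Because every n-gram maps into a fixed pool of component vectors, the table size stays constant however many distinct n-grams occur.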
FUSION OF WORD EMBEDDINGS AND WORD SCORES FOR TEXT CLASSIFICATION

Publication number: 20260099675

Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.

Type: Application

Filed: December 10, 2025

Publication date: April 9, 2026

Applicant: Oracle International Corporation

Inventors: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu
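
One simple way to combine word embedding vectors with word scores, as the abstract above describes, is a score-weighted average. The abstract does not say which fusion is used, so the weighted average and the function name `fuse` are assumptions made for illustration, and the feed-forward layers and classifier are omitted.

```python
def fuse(word_vectors, word_scores):
    """Score-weighted average of word vectors: one vector per textual unit."""
    if len(word_vectors) != len(word_scores):
        raise ValueError("need exactly one score per word vector")
    total = sum(word_scores)
    if total == 0:
        raise ValueError("word scores must not sum to zero")
    dim = len(word_vectors[0])
    return [
        sum(score * vec[d] for score, vec in zip(word_scores, word_vectors)) / total
        for d in range(dim)
    ]
```

With two words whose vectors are `[1.0, 0.0]` and `[0.0, 1.0]` and whose scores are `3.0` and `1.0`, the fused vector is `[0.75, 0.25]`, so the higher-scored word dominates the representation.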
LARGE LANGUAGE MODELS FOR NL2SQL WITH LONG CONTEXT FINETUNING

Publication number: 20260080260

Abstract: The present disclosure relates to manufacturing training and testing data by leveraging data augmentation techniques to generate examples of long context database schemas. Aspects are directed towards accessing a training dataset comprising training examples, where each training example may include i) a prompt including a natural language utterance and a database schema having one or more tables, and ii) a gold logical form corresponding to the natural language utterance, combining the tables from the database schemas in the training examples to generate a combined database schema set, generating a set of long context training examples based on the training dataset and the combined database schema set, including incorporating a long context database schema into a selected training example to generate a long context training example, and training a generative artificial intelligence model with at least the set of long context training examples to generate a trained generative artificial intelligence model.

Type: Application

Filed: January 23, 2025

Publication date: March 19, 2026

Applicant: Oracle International Corporation

Inventors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Steve Wai-Chun Siu, Dalu Guo, Budhaditya Saha, Thanh Tien Vu, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Anshuk Pal Chaudhuri, Prabhakara Reddy Munnangi, Subash Kumar Bhamidipati
Data augmentation and batch balancing methods to enhance negation and fairness

Patent number: 12579471

Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, searching the training set of labeled examples or an unlabeled corpus of text on target domains for sentiment examples having negation cues, sentiment laden words, words with sentiment prefixes or suffixes, or a combination thereof, rewriting the sentiment examples to create negated versions thereof and generate a labeled negation pair data set, and training the machine learning model using labeled examples from the labeled negation pair data set.

Type: Grant

Filed: November 10, 2022

Date of Patent: March 17, 2026

Assignee: Oracle International Corporation

Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
IN-CONTEXT LEARNING FOR NL2SQL WITH PATTERN BASED RETRIEVAL

Publication number: 20260072909

Abstract: The present disclosure relates to machine learning techniques for In-Context-Learning (ICL) with pattern-based retrieval for the task of converting Natural Language (NL) to Structured Query Language (SQL). Aspects are directed towards acquiring a natural language utterance and a database schema, searching, using at least a portion of the natural language utterance as a key, a memory bank for one or more in-context examples that are relevant to the key, generating a prompt comprising the natural language query, the database schema, and the one or more in-context examples, transmitting the prompt to a first pretrained generative artificial intelligence model, receiving, from the first pretrained generative artificial intelligence model, a logical form corresponding to the natural language utterance based at least in part on the prompt, executing the logical form on a database to obtain a query result, and providing the query result to a user.

Type: Application

Filed: February 25, 2025

Publication date: March 12, 2026

Applicant: Oracle International Corporation

Inventors: Nagesh Panyam Chandrasekarasatry, Duy Vu, Dai Quoc Nguyen, Gioacchino Tangari, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vladislav Blinov, Thanh Tien Vu, Ying Xu, Thanh Long Duong
Multi-task model with context masking

Patent number: 12554934

Abstract: A method includes accessing a document including sentences, the document being associated with a configuration flag indicating whether aspect-based sentiment analysis (ABSA), sentence-level sentiment analysis (SLSA), or both are to be performed; inputting the document into a language model that generates chunks of token embeddings for the document; and, based on the configuration flag, performing at least one of the ABSA and the SLSA by inputting the chunks of token embeddings into a multi-task model. When performing the SLSA, a part of the token embeddings in each of the chunks is masked, and the masked token embeddings do not belong to the particular sentence on which the SLSA is performed.

Type: Grant

Filed: October 12, 2023

Date of Patent: February 17, 2026

Assignee: Oracle International Corporation

Inventors: Poorya Zaremoodi, Duy Vu, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza
EXECUTION AND SEMANTIC ERROR CORRECTION CAPABILITIES FOR NATURAL LANGUAGE TO LOGICAL FORM MODEL

Publication number: 20260037505

Abstract: Techniques are disclosed herein for providing and using a natural language to logical form model having execution and semantic error correction capabilities. In one aspect, a method is disclosed that includes: accessing a set of training examples and generating a set of error correction training examples via an iterative process performed for each training example. The iterative process includes generating an inferred logical form, executing the inferred logical form on a database, when executing the inferred logical form on the database fails, obtaining an execution error message corresponding to the failure, and recording the inferred logical form and the execution error message as part of an execution error example, and populating an error correction prompt template with the execution error example to generate an error correction training example. A machine learning model may then be trained with at least the set of error correction training examples.

Type: Application

Filed: July 30, 2024

Publication date: February 5, 2026

Applicant: Oracle International Corporation

Inventors: Duy Vu, Steve Wai-Chun Siu, Gioacchino Tangari, Cong Duy Vu Hoang, Vladislav Blinov, Yakupitiyage Don Thanuja Samodhve Dharmasiri, Ying Xu, Thanh Long Duong
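
The execute-and-record loop in the abstract above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration rather than the disclosed implementation. The prompt template wording, the function name `build_error_example`, and the sample table are all hypothetical, and the hand-written query stands in for a logical form that a model would infer.

```python
import sqlite3

# Hypothetical error correction prompt template; the abstract does not
# specify its wording.
TEMPLATE = (
    "Question: {utterance}\n"
    "Previous query: {query}\n"
    "Execution error: {error}\n"
    "Corrected query:"
)


def build_error_example(conn, utterance, inferred_query):
    """Execute an inferred query; on failure, return a populated prompt."""
    try:
        conn.execute(inferred_query).fetchall()
    except sqlite3.Error as exc:
        # Record the failing query together with the database's error message.
        return TEMPLATE.format(utterance=utterance, query=inferred_query, error=exc)
    return None  # the query ran, so there is no execution error to record


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
# "nme" is a deliberate typo standing in for a faulty model output.
prompt = build_error_example(conn, "List all employee names",
                             "SELECT nme FROM employees")
```

Here `prompt` contains the failing query and SQLite's error message, which is the kind of execution error example the abstract describes. A query that executes cleanly returns `None` and contributes no example of this kind.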
Data augmentation and batch balancing for training multi-lingual model

Patent number: 12530545

Abstract: A computer-implemented method includes: accessing a plurality of datasets, where each dataset of the plurality of datasets includes training examples; selecting datasets that include the training examples in a source language and a target language; sampling, based on a sampling weight that is determined for each of the selected datasets, the training examples from the selected datasets to generate training batches; training an ML model for performing at least a first task using the training examples of the training batches, by interleavingly inputting the training batches to the ML model; and outputting the trained ML model configured to perform at least the first task on input utterances provided in at least one among the source language and the target language. The sampling weight is determined for each of the selected datasets based on one or more attributes common to the training examples of the selected dataset.

Type: Grant

Filed: October 12, 2023

Date of Patent: January 20, 2026

Assignee: Oracle International Corporation

Inventors: Duy Vu, Poorya Zaremoodi, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza
Fusion of word embeddings and word scores for text classification

Patent number: 12518098

Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.

Type: Grant

Filed: September 29, 2022

Date of Patent: January 6, 2026

Assignee: Oracle International Corporation

Inventors: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu
CONTEXT TAG INTEGRATION WITH NAMED ENTITY RECOGNITION MODELS

Publication number: 20250307556

Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.

Type: Application

Filed: June 11, 2025

Publication date: October 2, 2025

Applicant: Oracle International Corporation

Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
Data augmentation and batch balancing methods to enhance negation and fairness

Patent number: 12412126

Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes generating a list of demographic words associated with a demographic group, searching an unlabeled corpus of text to identify unlabeled examples in a target domain comprising at least one demographic word from the list of demographic words, rewriting the unlabeled examples to create one or more versions of each of the unlabeled examples and generate a fairness invariance data set, and training the machine learning model using unlabeled examples from the fairness invariance data set.

Type: Grant

Filed: November 10, 2022

Date of Patent: September 9, 2025

Assignee: Oracle International Corporation

Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
Context tag integration with named entity recognition models

Patent number: 12361219

Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.

Type: Grant

Filed: November 28, 2023

Date of Patent: July 15, 2025

Assignee: Oracle International Corporation

Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
FRAMEWORK FOR FOCUSED TRAINING OF LANGUAGE MODELS AND TECHNIQUES FOR END-TO-END HYPERTUNING OF THE FRAMEWORK

Publication number: 20250218428

Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

Type: Application

Filed: March 20, 2025

Publication date: July 3, 2025

Applicant: Oracle International Corporation

Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
Techniques for out-of-domain (OOD) detection

Patent number: 12299402

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.

Type: Grant

Filed: May 9, 2024

Date of Patent: May 13, 2025

Assignee: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework

Patent number: 12288550

Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

Type: Grant

Filed: September 23, 2022

Date of Patent: April 29, 2025

Assignee: Oracle International Corporation

Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
SYSTEM AND METHOD FOR IMPROVING AN END-TO-END AUTOMATIC SPEECH RECOGNITION MODEL

Publication number: 20250095636

Abstract: Techniques are disclosed herein for improving the performance of an end-to-end (E2E) Automatic Speech Recognition (ASR) model in a target domain. A set of test examples is generated. The set of test examples comprises multiple subsets of test examples, and each subset of test examples corresponds to a particular test category. A machine language model is then used to convert audio samples of the subset of test examples to text transcripts. A word error rate is determined for the subset of test examples. A test category is then selected based on the word error rates, and a set of training examples is generated for training the ASR model in a particular target domain from a selected subset of test examples. The training examples are used to fine-tune the model in the target domain. The trained model is then deployed in a cloud infrastructure of a cloud service provider.

Type: Application

Filed: September 3, 2024

Publication date: March 20, 2025

Applicant: Oracle International Corporation

Inventors: Duy Vu, Yu-Heng Hong, Ying Xu, Philip Arthur
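
The word error rate used in the abstract above is conventionally the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of reference words. The sketch below computes it and then picks a category. Choosing the category with the highest mean rate is an assumption, since the abstract only says the selection is based on the word error rates, and both function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j]: edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def worst_category(samples):
    """samples maps a test category to a list of (reference, hypothesis) pairs."""
    rates = {
        category: sum(word_error_rate(r, h) for r, h in pairs) / len(pairs)
        for category, pairs in samples.items()
    }
    return max(rates, key=rates.get), rates
```

For instance, `word_error_rate("turn on the light", "turn off the light")` is `0.25`, which is one substitution over four reference words. The category returned by `worst_category` would then be a natural candidate for generating additional fine-tuning examples.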
