Patents by Inventor Thanh Long Duong

Thanh Long Duong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240061834
    Abstract: Systems and methods identify whether an input utterance is suitable for providing to a machine learning model configured to generate a query for a database. Techniques include generating an input string by concatenating a natural language utterance with a database schema representation for a database; providing the input string to a first machine learning model; based on the input string, generating, by the first machine learning model, a score indicating whether the natural language utterance is translatable to a database query for the database and should be routed to a second machine learning model, the second machine learning model configured to generate a query for the database based on the natural language utterance; comparing the score to a threshold value; and responsive to determining that the score exceeds the threshold value, providing the natural language utterance or the input string to the second machine learning model.
    Type: Application
    Filed: August 21, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Gioacchino Tangari, Cong Duy Vu Hoang, Poorya Zaremoodi, Philip Arthur, Nitika Mathur, Mark Edward Johnson, Thanh Long Duong
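    Illustrative sketch: The routing logic in the abstract above can be pictured in a few lines of Python. This is a minimal sketch, assuming a router model exposing a predict_score method and a text-to-SQL model exposing a generate_query method; neither interface nor the 0.5 threshold comes from the patent.
        def route_utterance(utterance: str, schema: str, router, translator, threshold: float = 0.5):
            """Decide whether an utterance should be handed to the text-to-SQL model."""
            # Concatenate the natural language utterance with the database schema representation.
            input_string = f"{utterance} | {schema}"
            # First model: score how likely the utterance is translatable to a database query.
            score = router.predict_score(input_string)  # assumed API returning a float in [0, 1]
            # Route to the second model only if the score exceeds the threshold.
            if score > threshold:
                return translator.generate_query(input_string)  # assumed API
            return None  # caller can fall back to asking the user to rephrase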
  • Publication number: 20240062021
    Abstract: Techniques are disclosed herein for calibrating confidence scores of a machine learning model trained to translate natural language to a meaning representation language. The techniques include obtaining one or more raw beam scores generated from one or more beam levels of a decoder of a machine learning model trained to translate natural language to a logical form, where each of the one or more raw beam scores is a conditional probability of a sub-tree determined by a heuristic search algorithm of the decoder at one of the one or more beam levels, classifying, by a calibration model, a logical form output by the machine learning model as correct or incorrect based on the one or more raw beam scores, and providing the logical form with a confidence score that is determined based on the classifying of the logical form.
    Type: Application
    Filed: February 9, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Gioacchino Tangari, Cong Duy Vu Hoang, Mark Edward Johnson, Poorya Zaremoodi, Nitika Mathur, Aashna Devang Kanuga, Thanh Long Duong
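    Illustrative sketch: One way to read the calibration step above is as a small classifier fitted on raw beam scores. The sketch below is an assumption using scikit-learn logistic regression as the calibration model; the patent does not specify this choice.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def fit_calibrator(beam_scores: np.ndarray, is_correct: np.ndarray) -> LogisticRegression:
            """beam_scores: (n_examples, n_beam_levels) raw scores; is_correct: 0/1 labels."""
            calibrator = LogisticRegression()
            calibrator.fit(beam_scores, is_correct)
            return calibrator

        def confidence(calibrator: LogisticRegression, beam_scores: np.ndarray) -> np.ndarray:
            # Probability that the emitted logical form is correct, used as the confidence score.
            return calibrator.predict_proba(beam_scores)[:, 1]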
  • Publication number: 20240062044
    Abstract: Techniques are disclosed herein for addressing catastrophic forgetting and over-generalization while training a model to transform natural language to a logical form such as a meaning representation language. The techniques include accessing training data comprising natural language examples, augmenting the training data to generate expanded training data, training a machine learning model on the expanded training data, and providing the trained machine learning model. The augmenting includes (i) generating contrastive examples by revising natural language of examples identified to have caused regression during training of a machine learning model with the training data, (ii) generating alternative examples by modifying operators of examples identified within the training data that belong to a concept that exhibits bias, or (iii) a combination of (i) and (ii).
    Type: Application
    Filed: August 18, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Shivashankar Subramanian, Dalu Guo, Gioacchino Tangari, Nitika Mathur, Cong Duy Vu Hoang, Mark Edward Johnson, Thanh Long Duong
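    Illustrative sketch: The augmentation flow above combines the original examples with contrastive and alternative variants. The outline below is a hedged sketch; make_contrastive and make_alternative are hypothetical placeholders for the revision steps the abstract describes.
        from typing import Callable, List, Tuple

        Example = Tuple[str, str]  # (natural language utterance, logical form)

        def augment_training_data(
            training_data: List[Example],
            make_contrastive: Callable[[Example], List[Example]],  # revises regression-causing examples
            make_alternative: Callable[[Example], List[Example]],  # modifies operators of biased concepts
        ) -> List[Example]:
            expanded = list(training_data)  # keep the original examples
            for example in training_data:
                expanded.extend(make_contrastive(example))
                expanded.extend(make_alternative(example))
            return expanded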
  • Publication number: 20240062108
    Abstract: Techniques are disclosed herein for training and deploying a named entity recognition model. The techniques include implementing a nested labeling scheme for named entities within the training data and then training a machine learning model on the training data. The techniques further include extracting an entity hierarchy for a predicted class based on a hierarchical template associated with a composite label, where the predicted class is representative of multiple named entity classes comprising at least a parent class and a child class associated with the composite label. The techniques further include increasing the volume of training data via data mining for sequence tags in a language corpus and then training a machine learning model on the training data.
    Type: Application
    Filed: May 25, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Tuyen Quang Pham, Bhagya Hettige, Gioacchino Tangari, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong
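    Illustrative sketch: Extracting an entity hierarchy from a composite label can be pictured as a template lookup, as below. The composite label format and the example mappings are assumptions for illustration only.
        HIERARCHY_TEMPLATES = {
            # hypothetical composite label -> (parent class, child class)
            "DATE.START": ("DATE", "START"),
            "DATE.END": ("DATE", "END"),
        }

        def expand_composite_label(label: str):
            """Return the (parent, child) classes a composite label stands for."""
            return HIERARCHY_TEMPLATES.get(label, (label, None))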
  • Publication number: 20240061835
    Abstract: Systems and methods fine-tune a pretrained machine learning model. For a model having multiple layers, an initial set of configurations is identified, each configuration establishing layers to be frozen and layers to be fine-tuned. A configuration that is optimized with respect to one or more parameters is selected, establishing a set of fine-tuning layers and a set of frozen layers. An input for the model is provided to a remote system. An output of the set of frozen layers of the model, given the provided input, is received back and locally stored. The set of fine-tuning layers of the model is loaded from the remote system. The model is fine-tuned by retrieving the locally stored output of the set of frozen layers, and updating weights associated with the set of fine-tuning layers of the machine learning model.
    Type: Application
    Filed: August 21, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Shivashankar Subramanian, Gioacchino Tangari, Thanh Tien Vu, Cong Duy Vu Hoang, Poorya Zaremoodi, Dalu Guo, Mark Edward Johnson, Thanh Long Duong
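    Illustrative sketch: The key idea above is that the frozen layers' output can be computed once, stored locally, and reused while only the fine-tuning layers are updated. A minimal PyTorch sketch, with the model split and optimizer treated as assumptions:
        import torch

        def cache_frozen_outputs(frozen_layers: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():                      # frozen layers are never updated
                return frozen_layers(inputs).detach()  # stored locally for reuse

        def fine_tune_step(tuning_layers, cached_features, targets, optimizer, loss_fn):
            optimizer.zero_grad()
            outputs = tuning_layers(cached_features)   # only the fine-tuning layers run
            loss = loss_fn(outputs, targets)
            loss.backward()                            # gradients flow only through the tuning layers
            optimizer.step()
            return loss.item()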
  • Publication number: 20240061989
    Abstract: Techniques for generating text content arranged in a consistent read order from a source document including text corresponding to different read orders are disclosed. A system parses a binary file representing an electronic document to identify characters and metadata associated with the characters. The system pre-sorts a character order of characters in each line of the electronic document to generate an ordered list of characters arranged according to the right-to-left reading order. The system performs a layout-mirroring operation to change a position of characters within the modified document relative to a right edge of the document and a left edge of the document. Subsequent to performing layout-mirroring, the system identifies native left-to-right reading-order text in-line with the native right-to-left reading-order text.
    Type: Application
    Filed: February 15, 2023
    Publication date: February 22, 2024
    Applicant: Oracle International Corporation
    Inventors: Xu Zhong, Vishank Bhatia, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
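    Illustrative sketch: The layout-mirroring operation above can be thought of as re-expressing each character's position relative to the right edge of the page instead of the left edge. A toy sketch; the position encoding and page-width parameter are assumptions.
        def mirror_positions(char_positions, page_width: float):
            """char_positions: list of (char, x_from_left); returns (char, x_from_right)."""
            return [(ch, page_width - x) for ch, x in char_positions]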
  • Patent number: 11908460
    Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.
    Type: Grant
    Filed: August 13, 2020
    Date of Patent: February 20, 2024
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
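    Illustrative sketch: The adversarial setup above pairs a semantic parser (generator) with a discriminator over logical forms. The loop below is a hedged outline; the parse, update, and score interfaces are assumptions, not the patented training procedure.
        def gan_training_epoch(seed_tuples, parser, discriminator):
            for utterance, seed_logical_form in seed_tuples:
                generated = parser.parse(utterance)                        # assumed API
                # Discriminator learns to separate authentic seed forms from parser outputs.
                discriminator.update(real=seed_logical_form, fake=generated)
                # Parser learns to produce forms the discriminator scores as authentic.
                parser.update(utterance, reward=discriminator.score(generated))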
  • Publication number: 20240028963
    Abstract: An augmentation and feature caching subsystem is described for training AI/ML models. In one particular aspect, a method is provided that includes receiving data comprising training examples, one or more augmentation configuration hyperparameters and one or more feature extraction configuration hyperparameters; generating a first key based on one of the training examples and the one or more augmentation configuration hyperparameters; searching a first key-value storage based on the first key; obtaining one or more augmentations based on the search of the first key-value storage; applying the obtained one or more augmentations to the training examples to result in augmented training examples; generating a second key based on one of the augmented training examples and the one or more feature extraction configuration hyperparameters; searching a second key-value storage based on the second key; obtaining one or more features based on the search of the second key-value storage.
    Type: Application
    Filed: July 11, 2023
    Publication date: January 25, 2024
    Applicant: Oracle International Corporation
    Inventors: Vladislav Blinov, Vishal Vishnoi, Thanh Long Duong, Mark Edward Johnson, Xin Xu, Elias Luqman Jalaluddin, Ying Xu, Ahmed Ataallah Ataallah Abobakr, Umanga Bista, Thanh Tien Vu
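    Illustrative sketch: The two key-value storages above can be pictured as caches keyed on a training example plus the relevant configuration hyperparameters, so a rerun with the same configuration reuses stored augmentations and features. The hashing scheme and in-memory dictionaries below are assumptions.
        import hashlib, json

        def make_key(example: str, config: dict) -> str:
            payload = json.dumps({"example": example, "config": config}, sort_keys=True)
            return hashlib.sha256(payload.encode("utf-8")).hexdigest()

        augmentation_cache = {}   # first key-value storage
        feature_cache = {}        # second key-value storage

        def get_augmentations(example, aug_config, compute):
            key = make_key(example, aug_config)
            if key not in augmentation_cache:
                augmentation_cache[key] = compute(example, aug_config)  # cache miss: compute and store
            return augmentation_cache[key]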
  • Patent number: 11868727
    Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: January 9, 2024
    Assignee: Oracle International Corporation
    Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
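    Illustrative sketch: The concatenation step above fuses, per token, the word embedding with the regular expression/gazetteer feature vector and the context tag distribution vector. A minimal NumPy sketch; interpolation, which the abstract also mentions as an option, is not shown.
        import numpy as np

        def combine_features(word_embedding: np.ndarray,
                             regex_gazetteer_vec: np.ndarray,
                             context_tag_dist: np.ndarray) -> np.ndarray:
            # One fused feature vector per token, later fed to the encoder.
            return np.concatenate([word_embedding, regex_gazetteer_vec, context_tag_dist])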
  • Publication number: 20230419040
    Abstract: Novel techniques are described for data augmentation using a two-stage entity-aware augmentation to improve model robustness to entity value changes for intent prediction.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
  • Publication number: 20230419052
    Abstract: Novel techniques are described for positive entity-aware augmentation using a two-stage augmentation to improve the stability of the model to entity value changes for intent prediction. In one particular aspect, a method is provided that includes accessing a first set of training data for an intent prediction model, the first set of training data comprising utterances and intent labels; applying one or more positive data augmentation techniques to the first set of training data, depending on the tuning requirements for hyper-parameters, to result in a second set of training data, where the positive data augmentation techniques comprise an Entity-Aware (“EA”) technique and a two-stage augmentation technique; combining the first set of training data and the second set of training data to generate expanded training data; and training the intent prediction model using the expanded training data.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
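    Illustrative sketch: Entity-aware augmentation, as used in this and the neighboring entries, aims to make intent prediction robust to entity value changes. The sketch below swaps entity values while keeping the intent label; the value lists and replacement strategy are assumptions, not the patented two-stage procedure.
        import random

        ENTITY_VALUES = {"CITY": ["Paris", "Hanoi", "Sydney"], "DATE": ["today", "next Friday"]}

        def entity_aware_augment(utterance: str, entities: dict, intent: str, n: int = 2):
            """entities maps entity type -> surface value found in the utterance."""
            augmented = []
            for _ in range(n):
                text = utterance
                for ent_type, value in entities.items():
                    text = text.replace(value, random.choice(ENTITY_VALUES.get(ent_type, [value])))
                augmented.append((text, intent))   # the intent label is preserved
            return augmented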
  • Publication number: 20230419127
    Abstract: Novel techniques are described for negative entity-aware augmentation using a two-stage augmentation to improve the stability of the model to entity value changes for intent prediction. In some embodiments, a method comprises accessing a first set of training data for an intent prediction model, the first set of training data comprising utterances and intent labels; applying one or more negative entity-aware data augmentation techniques to the first set of training data, depending on the tuning requirements for hyper-parameters, to result in a second set of training data, where the one or more negative entity-aware data augmentation techniques comprise Keyword Augmentation Technique (“KAT”) plus entity without context technique and KAT plus entity in random context as OOD technique; combining the first set of training data and the second set of training data to generate expanded training data; and training the intent prediction model using the expanded training data.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
  • Publication number: 20230376700
    Abstract: Techniques are provided for generating training data to facilitate fine-tuning embedding models. Training data including anchor utterances is obtained. Positive utterances and negative utterances are generated from the anchor utterances. Tuples including the anchor utterances, the positive utterances, and the negative utterances are formed. Embeddings for the tuples are generated and a pre-trained embedding model is fine-tuned based on the embeddings. The fine-tuned model can be deployed to a system.
    Type: Application
    Filed: May 9, 2023
    Publication date: November 23, 2023
    Applicant: Oracle International Corporation
    Inventors: Umanga Bista, Vladislav Blinov, Mark Edward Johnson, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Elias Luqman Jalaluddin, Xin Xu, Shivashankar Subramanian
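    Illustrative sketch: Fine-tuning on (anchor, positive, negative) tuples is commonly done with a triplet loss, as below. The encoder interface and the choice of PyTorch's TripletMarginLoss are assumptions, not details taken from the patent.
        import torch

        triplet_loss = torch.nn.TripletMarginLoss(margin=1.0)

        def triplet_step(encoder, optimizer, anchor, positive, negative):
            optimizer.zero_grad()
            a, p, n = encoder(anchor), encoder(positive), encoder(negative)
            loss = triplet_loss(a, p, n)   # pull anchor toward positive, push away from negative
            loss.backward()
            optimizer.step()
            return loss.item()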
  • Publication number: 20230376696
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Application
    Filed: August 2, 2023
    Publication date: November 23, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Patent number: 11810553
    Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form, by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.
    Type: Grant
    Filed: October 26, 2022
    Date of Patent: November 7, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
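    Illustrative sketch: The round-trip training signal above can be outlined as: render a seed logical form to speech, recognize it back to text, re-parse it, and compare the result to the original form. The module interfaces and objective below are assumptions, and the differentiability of each stage is glossed over.
        def round_trip_loss(logical_form, tts, asr, parser, objective):
            audio = tts.synthesize(logical_form)        # text-to-speech
            text = asr.transcribe(audio)                # automatic speech recognition
            converted = parser.parse(text)              # semantic parser
            return objective(converted, logical_form)   # training signal used for backpropagation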
  • Patent number: 11790901
    Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
    Type: Grant
    Filed: December 30, 2022
    Date of Patent: October 17, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Eric Black, Andrew David Bleeker, Serge Le Huitouze
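    Illustrative sketch: The dialog-manager flow above (dialog state tracker, execution subsystem, dialog policy, context stack) can be outlined as a small class. The interfaces are assumptions for illustration only.
        class DialogManager:
            def __init__(self, state_tracker, executor, policy):
                self.state_tracker = state_tracker
                self.executor = executor
                self.policy = policy
                self.context_stack = []          # history of the current dialog

            def handle(self, input_logical_form):
                context = self.context_stack[-1] if self.context_stack else None
                intermediate = self.state_tracker.resolve(input_logical_form, context)  # fill in missing context
                result = self.executor.execute(intermediate)
                output_logical_form = self.policy.respond(result)
                self.context_stack.append(intermediate)   # update the dialog history
                return output_logical_form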
  • Patent number: 11763092
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: September 19, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Publication number: 20230206125
    Abstract: Techniques are provided for improved training of a machine learning model using lexical dropout. A machine learning model and a training data set are accessed. The training data set can include sample utterances and corresponding labels. A dropout parameter is identified. The dropout parameter can indicate a likelihood for dropping out one or more feature vectors for tokens associated with respective entities during training of the machine learning model. The dropout parameter is applied to feature vectors for tokens associated with respective entities. The machine learning model is trained using the training data set and the dropout parameter to generate a trained machine learning model. The use of the trained machine learning model is facilitated.
    Type: Application
    Filed: December 22, 2022
    Publication date: June 29, 2023
    Applicant: Oracle International Corporation
    Inventors: Tuyen Quang Pham, Cong Duy Vu Hoang, Thanh Tien Vu, Mark Edward Johnson, Thanh Long Duong
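    Illustrative sketch: Lexical dropout as described above can be pictured as zeroing, with some probability, the feature vectors of tokens that belong to entities, so the model does not over-rely on specific entity values. The masking scheme below is an assumption.
        import numpy as np

        def lexical_dropout(token_features: np.ndarray, entity_mask: np.ndarray, p: float) -> np.ndarray:
            """token_features: (n_tokens, dim); entity_mask: 1 where the token is part of an entity."""
            drop = (np.random.rand(len(entity_mask)) < p) & entity_mask.astype(bool)
            out = token_features.copy()
            out[drop] = 0.0                      # drop the lexical signal for those tokens
            return out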
  • Publication number: 20230205999
    Abstract: Techniques are provided for named entity recognition using a gazetteer incorporated with a neural network. An utterance is received from a user. The utterance is input into a neural network comprising model parameters learned for named entity recognition. The neural network generates a first representation of one or more named entities based on the utterance. A gazetteer is searched based on the input utterance to generate a second representation of one or more named entities identified in the utterance. The first named entity representation is combined with the second named entity representation to generate a combined named entity representation. The combined named entity representation is output for facilitating a response to the user.
    Type: Application
    Filed: December 22, 2022
    Publication date: June 29, 2023
    Applicant: Oracle International Corporation
    Inventors: Tuyen Quang Pham, Cong Duy Vu Hoang, Mark Edward Johnson, Thanh Long Duong
  • Publication number: 20230186161
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
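    Illustrative sketch: Template-based synthesis, one of the frameworks the abstract above mentions, pairs an utterance template with a logical-form template and fills matching slots. The templates and slot values below are hypothetical examples, not ones from the patent.
        import itertools

        TEMPLATES = [
            ("show {metric} for {dept}", "SELECT {metric} FROM employees WHERE dept = '{dept}'"),
        ]
        SLOTS = {"metric": ["salary", "headcount"], "dept": ["sales", "finance"]}

        def synthesize_pairs():
            pairs = []
            for utt_tpl, lf_tpl in TEMPLATES:
                for metric, dept in itertools.product(SLOTS["metric"], SLOTS["dept"]):
                    pairs.append((utt_tpl.format(metric=metric, dept=dept),
                                  lf_tpl.format(metric=metric, dept=dept)))
            return pairs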