Patents by Inventor Thanh Long Duong

Thanh Long Duong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230186025
    Abstract: Techniques for preprocessing data assets to be used in a natural language to logical form model based on scalable search and content-based schema linking. In one particular aspect, a method includes accessing an utterance, classifying named entities within the utterance into predefined classes, searching value lists within the database schema using tokens from the utterance to identify and output value matches including: (i) any value within the value lists that matches a token from the utterance and (ii) any attribute associated with a matching value, generating a data structure by organizing and storing: (i) each of the named entities and an assigned class for each of the named entities, (ii) each of the value matches and the token matching each of the value matches, and (iii) the utterance, in a predefined format for the data structure, and outputting the data structure.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Jae Min John, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Shivashankar Subramanian, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Nitika Mathur, Aashna Devang Kanuga, Philip Arthur, Gioacchino Tangari, Steve Wai-Chun Siu
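The content-based value matching described in publication 20230186025 (matching utterance tokens against value lists in a database schema and returning both the matching value and its attribute) can be sketched as below. All names, the schema shape, and the exact, case-insensitive matching strategy are illustrative assumptions, not the patented implementation.

```python
def match_values(utterance, value_lists):
    """Match utterance tokens against schema value lists.

    `value_lists` maps an attribute (column) name to the list of values
    it may take. Returns (value, attribute, matching token) triples for
    every exact, case-insensitive token match.
    """
    tokens = utterance.lower().split()
    matches = []
    for attribute, values in value_lists.items():
        for value in values:
            if value.lower() in tokens:
                matches.append((value, attribute, value.lower()))
    return matches

schema = {"city": ["Paris", "Tokyo"], "airline": ["Qantas"]}
print(match_values("flights to tokyo on qantas", schema))
```

A real system would also handle multi-token values and fuzzy matches; this sketch only shows the token-to-value-list lookup that feeds the output data structure.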
  • Publication number: 20230185799
    Abstract: Techniques are disclosed for training a model, using multi-task learning, to transform natural language to a logical form. In one particular aspect, a method includes accessing a first set of utterances that have non-follow-up utterances and a second set of utterances that have initial utterances and associated one or more follow-up utterances and training a model for translating an utterance to a logical form. The training is a joint training process that includes calculating a first loss for a first semantic parsing task based on one or more non-follow-up utterances from the first set of utterances, calculating a second loss for a second semantic parsing task based on one or more initial utterances and associated one or more follow-up utterances from the second set of utterances, combining the first and second losses to obtain a final loss, and updating model parameters of the model based on the final loss.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Cong Duy Vu Hoang, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota
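The joint-training step in publication 20230185799 combines a loss over non-follow-up utterances with a loss over initial/follow-up pairs before a single parameter update. A minimal sketch of the loss combination follows; the patent only states that the two losses are combined, so the weighted-sum scheme and the `alpha` hyperparameter are assumptions.

```python
def final_loss(first_loss, second_loss, alpha=0.5):
    """Combine the two semantic-parsing task losses into one final loss.

    `first_loss`: loss on non-follow-up utterances.
    `second_loss`: loss on initial + follow-up utterance pairs.
    `alpha` is an assumed mixing weight; other combinations (e.g. a
    plain sum) would also fit the description.
    """
    return alpha * first_loss + (1.0 - alpha) * second_loss

# One joint step (pseudocode): compute both task losses on their
# respective batches, combine, then update the shared model once.
# loss = final_loss(parse_loss(model, batch1), parse_loss(model, batch2))
# loss.backward(); optimizer.step()
```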
  • Publication number: 20230185834
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
  • Publication number: 20230186914
    Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
    Type: Application
    Filed: December 30, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Eric Black, Andrew David Bleeker, Serge Le Huitouze
  • Publication number: 20230186026
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
  • Publication number: 20230169955
    Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, independent of the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
    Type: Application
    Filed: November 23, 2022
    Publication date: June 1, 2023
    Applicant: Oracle International Corporation
    Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
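The noise-augmentation step in publication 20230169955 inserts noise text into training utterances at a predefined augmentation ratio. A minimal sketch, assuming the ratio counts noise tokens per original token and that insertion positions are chosen at random (neither detail is fixed by the abstract):

```python
import random

def augment_with_noise(utterances, noise_words, ratio=0.2, seed=0):
    """Insert noise tokens into each utterance at a fixed augmentation
    ratio (noise tokens per original token). The insertion strategy
    and the names here are assumptions."""
    rng = random.Random(seed)
    augmented = []
    for utt in utterances:
        tokens = utt.split()
        n_noise = max(1, round(len(tokens) * ratio))
        for _ in range(n_noise):
            pos = rng.randrange(len(tokens) + 1)
            tokens.insert(pos, rng.choice(noise_words))
        augmented.append(" ".join(tokens))
    return augmented
```

The augmented utterances keep their original intent labels, so the classifier learns to ignore irrelevant tokens.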
  • Publication number: 20230153688
    Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, searching the training set of labeled examples or an unlabeled corpus of text on target domains for sentiment examples having negation cues, sentiment laden words, words with sentiment prefixes or suffixes, or a combination thereof, rewriting the sentiment examples to create negated versions thereof and generate a labeled negation pair data set, and training the machine learning model using labeled examples from the labeled negation pair data set.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
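The negation-pair construction in publication 20230153688 rewrites sentiment examples into negated versions with flipped labels. A toy cue-based sketch follows; the real rewriting rules are not public, so the swap table and the label-flipping convention here are assumptions.

```python
def negate_example(text, label):
    """Rewrite a sentiment example into a negated version and pair it
    with the original, yielding one entry of a labeled negation-pair
    data set. The cue list is illustrative only."""
    swaps = {"is": "is not", "was": "was not", "like": "do not like"}
    tokens = text.split()
    negated = " ".join(swaps.get(t, t) for t in tokens)
    flipped = "negative" if label == "positive" else "positive"
    return (text, label), (negated, flipped)
```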
  • Publication number: 20230154455
    Abstract: Techniques are provided for improved training of a machine-learning model that includes multiple layers and is configured to process textual language input. The machine-learning model includes one or more blocks in which each block includes a multi-head self-attention network, a first connection for providing input to the multi-head self-attention network, and a second (residual) connection for providing the input to a normalization layer, bypassing the multi-head self-attention network. During training, the second connection is dropped out according to a dropout parameter. Additionally, or alternatively, an attention weight matrix is used for dropout by blocking diagonal entries in the attention weight matrix. As a result, the machine-learning model increasingly focuses on contextual information, which provides more accurate language processing results.
    Type: Application
    Filed: November 16, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong
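The attention-matrix dropout in publication 20230154455 blocks diagonal entries so a position cannot attend to itself, pushing the model toward contextual information. A minimal sketch using a plain softmax over masked logits (the masking-with-negative-infinity mechanism is a standard assumption; the patent abstract only says diagonal entries are blocked):

```python
import math

def blocked_softmax_attention(scores):
    """Softmax over attention logits with the diagonal blocked:
    entry (i, i) is set to -inf before normalization, so each
    position distributes all its attention over other positions."""
    n = len(scores)
    out = []
    for i in range(n):
        row = [(-math.inf if i == j else scores[i][j]) for j in range(n)]
        m = max(v for v in row if v != -math.inf)
        exps = [0.0 if v == -math.inf else math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out
```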
  • Publication number: 20230153687
    Abstract: Techniques for named entity bias detection and mitigation for sentence sentiment analysis. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, preparing a list of named entities using one or more data sources, for each example in the training set of labeled examples with a named entity, replacing the named entity with a corresponding entity type tag to generate a labeled template data set, executing a sampling process for each entity type t within the labeled template data set to generate an augmented invariance data set comprising one or more invariance groups having labeled examples for each entity type t, and training the machine learning model using labeled examples from the augmented invariance data set.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Duy Vu, Varsha Kuppur Rajendra, Shivashankar Subramanian, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Mark Edward Johnson
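The templating step in publication 20230153687 replaces each named entity with its entity-type tag so that different entity names can later be sampled into the same slot. A minimal sketch, with the tag format and the entity map being illustrative assumptions:

```python
def to_template(example, entities):
    """Replace each named entity surface form with its entity-type
    tag, producing one entry of the labeled template data set.
    `entities` maps surface form -> type tag (e.g. "Alice" -> "PERSON").
    """
    text, label = example
    for surface, tag in entities.items():
        text = text.replace(surface, f"[{tag}]")
    return text, label
```

Sampling different names into `[PERSON]` then yields an invariance group: examples that should all receive the same sentiment label regardless of the entity.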
  • Publication number: 20230153528
    Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes generating a list of demographic words associated with a demographic group, searching an unlabeled corpus of text to identify unlabeled examples in a target domain comprising at least one demographic word from the list of demographic words, rewriting the unlabeled examples to create one or more versions of each of the unlabeled examples and generate a fairness invariance data set, and training the machine learning model using unlabeled examples from the fairness invariance data set.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
  • Patent number: 11651768
    Abstract: Techniques for stop word data augmentation for training chatbot systems in natural language processing. In one particular aspect, a computer-implemented method includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with stop words to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent; and training the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances. The augmenting includes: selecting one or more utterances from the training set of utterances, and for each selected utterance, preserving existing stop words within the utterance and replacing at least one non-stop word within the utterance with a stop word or stop word phrase selected from a list of stop words to generate an out-of-domain utterance.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: May 16, 2023
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Balakota Srinivas Vinnakota, Thanh Long Duong, Gautam Singaraju
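The stop-word augmentation in patent 11651768 preserves existing stop words and replaces non-stop words with stop words to manufacture out-of-domain utterances. A minimal sketch; the stop-word list is illustrative, and this version replaces every non-stop word, whereas the claim only requires replacing at least one.

```python
import random

STOP_WORDS = {"the", "a", "to", "of", "and", "is", "in"}  # illustrative list

def make_out_of_domain(utterance, seed=0):
    """Keep existing stop words in place and replace each non-stop
    word with a randomly chosen stop word, yielding an utterance with
    the original's shape but no in-domain content."""
    rng = random.Random(seed)
    pool = sorted(STOP_WORDS)
    return " ".join(
        t if t in STOP_WORDS else rng.choice(pool)
        for t in utterance.lower().split()
    )
```

The resulting utterances are labeled with the unresolved-intent category and mixed into training so the classifier learns to route content-free inputs there.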
  • Publication number: 20230141853
    Abstract: Techniques disclosed herein relate generally to language detection. In one particular aspect, a method is provided that includes obtaining a sequence of n-grams of a textual unit; using an embedding layer to obtain an ordered plurality of embedding vectors for the sequence of n-grams; using a deep network to obtain an encoded vector that is based on the ordered plurality of embedding vectors; and using a classifier to obtain a language prediction for the textual unit that is based on the encoded vector. The deep network includes an attention mechanism, and using the embedding layer to obtain the ordered plurality of embedding vectors comprises, for each n-gram in the sequence of n-grams: obtaining hash values for the n-gram; based on the hash values, selecting component vectors from among the plurality of component vectors; and obtaining an embedding vector for the n-gram that is based on the component vectors.
    Type: Application
    Filed: November 4, 2022
    Publication date: May 11, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Poorya Zaremoodi, Duy Vu, Mark Edward Johnson, Thanh Long Duong, Xu Zhong, Vladislav Blinov, Cong Duy Vu Hoang, Yu-Heng Hong, Vinamr Goel, Philip Victor Ogren, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
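The hashed n-gram embedding in publication 20230141853 obtains several hash values per n-gram, selects a component vector for each hash, and combines them into the n-gram's embedding. A minimal sketch; the hash function, the number of hashes, and combination by averaging are all assumptions.

```python
import hashlib

def ngram_embedding(ngram, component_vectors, num_hashes=2):
    """Hash an n-gram `num_hashes` times, use each hash to index into
    the shared table of component vectors, and average the selected
    vectors into the n-gram's embedding."""
    dim = len(component_vectors[0])
    picked = []
    for i in range(num_hashes):
        h = hashlib.md5(f"{i}:{ngram}".encode()).digest()
        idx = int.from_bytes(h[:4], "big") % len(component_vectors)
        picked.append(component_vectors[idx])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]
```

Because the table of component vectors is much smaller than a full n-gram vocabulary, this keeps the embedding layer compact enough for a language-ID model.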
  • Publication number: 20230136965
    Abstract: In some aspects, a computer obtains a trained conditional random field (CRF) model comprising a set of model parameters learned from training data and stored in a transition matrix. Pairs of tags whose transitions are inconsistent with the tag sequence logic are identified within the transition matrix. For each inconsistent pair, the cost associated with transitioning between the pair of tags is set, within the transition matrix, equal to a predefined hyperparameter value that penalizes the transition. The CRF model then receives a string of text comprising one or more named entities and, with the penalized transition costs in place, classifies the words within the string of text into different classes, which may include the one or more named entities.
    Type: Application
    Filed: October 31, 2022
    Publication date: May 4, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong, Aashna Devang Kanuga, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
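The transition-matrix fix in publication 20230136965 overwrites the learned cost of logically impossible tag transitions with a penalizing hyperparameter. A minimal sketch for BIO-style entity tags (the BIO scheme, dict-of-dicts matrix, and penalty value are assumptions; the patent only specifies setting inconsistent transitions to a predefined hyperparameter):

```python
def penalize_inconsistent_transitions(tags, transitions, penalty=-1e4):
    """Overwrite the score of BIO transitions that violate tag-sequence
    logic (e.g. O -> I-X, or B-X -> I-Y) with a large negative
    hyperparameter. `transitions[a][b]` is the learned score of moving
    from tag a to tag b."""
    for a in tags:
        for b in tags:
            if b.startswith("I-"):
                ok = a[2:] == b[2:] and a[:2] in ("B-", "I-")
                if not ok:
                    transitions[a][b] = penalty
    return transitions
```

Applying this after training means Viterbi decoding can never emit an `I-` tag without a matching `B-`/`I-` predecessor.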
  • Publication number: 20230139397
    Abstract: Deep learning techniques are disclosed for extraction of embedded data from documents. In an exemplary technique, a set of unstructured text data is received. One or more text groupings are generated by processing the set of unstructured text data. One or more text grouping embeddings are generated in a format for input to a machine learning model based on the one or more generated text groupings. One or more output predictions are generated by inputting the one or more text grouping embeddings into the machine learning model. Each output prediction of the one or more output predictions corresponds to a predicted aspect of a text grouping of the one or more text groupings.
    Type: Application
    Filed: August 12, 2022
    Publication date: May 4, 2023
    Applicant: Oracle International Corporation
    Inventors: Xu Zhong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20230115321
    Abstract: Techniques are provided for customizing or fine-tuning a pre-trained version of a machine-learning model that includes multiple layers and is configured to process audio or textual language input. Each of the multiple layers is configured with a plurality of layer-specific pre-trained parameter values corresponding to a plurality of parameters, and each of the multiple layers is configured to implement multi-head attention. An incomplete subset of the multiple layers is identified for which corresponding layer-specific pre-trained parameter values are to be fine-tuned using a client data set. The machine-learning model is fine-tuned using the client data set to generate an updated version of the machine-learning model, where the layer-specific pre-trained parameter values configured for each layer of one or more of the multiple layers not included in the incomplete subset are frozen during the fine-tuning. Use of the updated version of the machine-learning model is facilitated.
    Type: Application
    Filed: May 3, 2022
    Publication date: April 13, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Tuyen Quang Pham, Omid Mohamad Nezami, Mark Edward Johnson, Thanh Long Duong, Cong Duy Vu Hoang
  • Publication number: 20230098783
    Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 30, 2023
    Applicant: Oracle International Corporation
    Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
  • Publication number: 20230095673
    Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, including the text from the key fields and each of the tables, can be output.
    Type: Application
    Filed: August 15, 2022
    Publication date: March 30, 2023
    Applicant: Oracle International Corporation
    Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Xu Zhong, Ahmed Ataallah Ataallah Abobakr, Hongtao Yang, Budhaditya Saha, Shaoke Xu, Shashi Prasad Suravarapu, Mark Edward Johnson, Thanh Long Duong
  • Publication number: 20230100508
    Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
    Type: Application
    Filed: September 29, 2022
    Publication date: March 30, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu
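The fusion step in publication 20230100508 combines per-word embedding vectors with per-word scores into a single embedding for the textual unit. A minimal sketch using a score-weighted average, which is one plausible fusion; the abstract does not fix the combination rule.

```python
def fuse(word_embeddings, word_scores):
    """Score-weighted average of word embeddings for a textual unit.

    `word_embeddings[i]` is the embedding of word i; `word_scores[i]`
    is its score (e.g. a TF-IDF-style weight, an assumption here).
    The fused vector is then passed through feed-forward layers for
    classification, per the abstract.
    """
    dim = len(word_embeddings[0])
    total = sum(word_scores)
    return [
        sum(s * e[d] for s, e in zip(word_scores, word_embeddings)) / total
        for d in range(dim)
    ]
```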
  • Publication number: 20230080553
    Abstract: Techniques for adjusting outlier datasets for training chatbot systems in natural language processing are disclosed. In one particular aspect, a method is provided that includes receiving a dataset that includes training or inference data. An initial set of outlier data points can be identified within the dataset based on a score of the outlier data points being above or below a threshold. The initial set can be adjusted by identifying one or more nearest neighbors, which can be included in the dataset. Outlier data points that include a label that matches a number of labels of the nearest neighbors that exceeds a predetermined threshold can be removed from the initial set of outlier data points to generate a final set. Outlier data points of the final set can be adjusted with respect to the dataset to generate a set of training data that is used to train a machine-learning model.
    Type: Application
    Filed: May 25, 2022
    Publication date: March 16, 2023
    Applicant: Oracle International Corporation
    Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Mark Edward Johnson, Thanh Long Duong
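The outlier adjustment in publication 20230080553 removes an outlier when its label agrees with enough of its nearest neighbors' labels, on the logic that such a point is not truly anomalous. A minimal sketch; the distance metric, `k`, and the agreement threshold are assumptions standing in for the claimed predetermined threshold.

```python
def filter_outliers(outliers, dataset, k=3, agree=2):
    """Drop an outlier whose label matches at least `agree` of its k
    nearest neighbors in `dataset`; keep the rest as true outliers.
    Points are (vector, label) pairs; distance is squared Euclidean.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    kept = []
    for vec, label in outliers:
        neighbors = sorted(dataset, key=lambda p: dist(p[0], vec))[:k]
        matches = sum(1 for _, lbl in neighbors if lbl == label)
        if matches < agree:  # label disagrees with its neighborhood
            kept.append((vec, label))
    return kept
```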
  • Publication number: 20230061999
    Abstract: Techniques for improving a semantic parser of a dialog system, by breaking the semantic parser into a coarse semantic parser and a fine semantic parser, are described. A method described herein includes accessing an utterance received in a dialog system. The utterance is a text-based natural language expression. The method further includes applying a coarse semantic parser to the utterance to determine an intermediate logical form for the utterance. The intermediate logical form indicates one or more intents in the utterance. The method further includes applying a fine semantic parser to the intermediate logical form to determine a logical form for the utterance. The logical form is a syntactic expression of the utterance according to an established grammar, and the logical form includes one or more parameters of the one or more intents. The logical form can be used to conduct a dialog with a user of the dialog system.
    Type: Application
    Filed: October 26, 2022
    Publication date: March 2, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson