Patents by Inventor Thanh Long Duong
Thanh Long Duong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12153881
Abstract: Techniques for keyword data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: identifying keywords within utterances of the training set of utterances, generating a set of OOD examples with the identified keywords, filtering out OOD examples from the set of OOD examples that have a context substantially similar to the context of the utterances of the training set of utterances, and incorporating the set of OOD examples without the filtered OOD examples into the training set of utterances to generate an augmented training set of utterances. Thereafter, the machine-learning model is trained using the augmented training set of utterances.
Type: Grant
Filed: October 28, 2021
Date of Patent: November 26, 2024
Assignee: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Thanh Long Duong, Mark Edward Johnson, Poorya Zaremoodi, Gautam Singaraju, Ying Xu, Vladislav Blinov
-
Patent number: 12153885
Abstract: Techniques are disclosed for multi-feature balancing for natural language processors. In an embodiment, a method includes receiving a natural language query to be processed by a machine learning model, the machine learning model utilizing a dataset of natural language phrases for processing natural language queries, determining, based on the machine learning model and the natural language query, a feature dropout value, generating, based on the natural language query, one or more contextual features and one or more expressional features that may be input to the machine learning model, modifying at least one of the one or more contextual features and the one or more expressional features based on the feature dropout value to generate a set of input features for the machine learning model, and processing the set of input features to generate an output dataset corresponding to the natural language query.
Type: Grant
Filed: January 20, 2022
Date of Patent: November 26, 2024
Assignee: Oracle International Corporation
Inventors: Thanh Long Duong, Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Tuyen Quang Pham, Cong Duy Vu Hoang, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Aashna Devang Kanuga, Zikai Li, Yuanxu Wu
-
Publication number: 20240338395
Abstract: Techniques for multi-layer training of a machine learning model are disclosed. A system pre-trains a machine learning model on training data obtained from unlabeled document graph data by executing unsupervised pre-training tasks on the unlabeled document graph data to generate a labeled pre-training data set. The system modifies document graphs to change attributes of nodes in the document graphs. The system pre-trains the machine learning model with a data set including the modified document graphs and unmodified document graphs to generate predictions associated with the modifications to the document graphs. Subsequent to pre-training, the system fine-tunes the machine learning model with a set of labeled training data to generate predictions associated with a specific attribute of a document graph.
Type: Application
Filed: April 10, 2023
Publication date: October 10, 2024
Applicant: Oracle International Corporation
Inventors: Xu Zhong, Don Dharmasiri, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
-
Patent number: 12099816
Abstract: Techniques are disclosed for multi-factor modelling for training and utilizing chatbot systems for natural language processing. In an embodiment, a method includes receiving a set of utterance data corresponding to a natural language-based query, determining one or more intents for the chatbot, each corresponding to a possible context for the natural language-based query and associated with a skill for the chatbot, generating one or more intent classification datasets, each intent classification dataset associated with a probability that the natural language query corresponds to an intent of the one or more intents, generating one or more transformed datasets each corresponding to a skill of one or more skills, determining a first skill of the one or more skills based on the one or more transformed datasets, and processing, based on the determined first skill, the set of utterance data to resolve the natural language-based query.
Type: Grant
Filed: January 18, 2022
Date of Patent: September 24, 2024
Assignee: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Ying Xu
-
Publication number: 20240289555
Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
Type: Application
Filed: May 9, 2024
Publication date: August 29, 2024
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
-
Patent number: 12056434
Abstract: Techniques for generating formatting tags for textual content obtained from a source electronic document are disclosed. A system parses a digital file to obtain information about characters in an electronic document. The system applies tags to text generated based on the textual content of the electronic document by creating segments of textually-consecutive characters and applying corresponding text formatting style tags to the segments. The system further identifies segments of text overlapping bounding boxes in the electronic document. The system generates textual content including a segment of text and a corresponding hyperlink associated with the segment of text. The system further generates textual content by selectively applying line breaks from the source electronic document in the textual content.
Type: Grant
Filed: January 6, 2023
Date of Patent: August 6, 2024
Assignee: Oracle International Corporation
Inventors: Vishank Bhatia, Xu Zhong, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, King-Hwa Lee, Christopher Kennewick
-
Publication number: 20240256777
Abstract: A method includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of OOD examples, and generating augmented batches of utterances including utterances from the training set of utterances and utterances from the filtered data set of OOD examples based on the difficulty value for each OOD example. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
Type: Application
Filed: April 9, 2024
Publication date: August 1, 2024
Applicant: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Thanh Long Duong, Mark Edward Johnson, Poorya Zaremoodi, Gautam Singaraju, Ying Xu, Vladislav Blinov, Yu-Heng Hong
-
Publication number: 20240256771
Abstract: Techniques for identifying content in key-value pairs of documents using a graph neural network (GNN) are disclosed. A system trains a GNN to identify key-value pair groupings in documents. The GNN classifies nodes in document graphs as key-type nodes and answer-type nodes. The GNN also classifies edges connecting nodes in the document graphs for keeping in the document graph or removing from the document graph. The resulting document graph includes key-value groupings in the document. Upon identifying content matching a query, a system returns as a query response content from among the key-value groupings.
Type: Application
Filed: January 30, 2023
Publication date: August 1, 2024
Applicant: Oracle International Corporation
Inventors: Xu Zhong, Thanh Long Duong, Mark Johnson
-
Publication number: 20240232187
Abstract: The present disclosure is related to techniques for converting a natural language utterance to a logical form query and deriving a natural language interpretation of the logical form query. The techniques include accessing a Meaning Resource Language (MRL) query and converting the MRL query into an MRL structure including logical form statements. The converting includes extracting operations and associated attributes from the MRL query and generating the logical form statements from the operations and associated attributes. The techniques further include translating each of the logical form statements into a natural language expression based on a grammar data structure that includes a set of rules for translating logical form statements into corresponding natural language expressions, combining the natural language expressions into a single natural language expression, and providing the single natural language expression as an interpretation of the natural language utterance.
Type: Application
Filed: May 22, 2023
Publication date: July 11, 2024
Applicant: Oracle International Corporation
Inventors: Chang Xu, Poorya Zaremoodi, Cong Duy Vu Hoang, Nitika Mathur, Philip Arthur, Steve Wai-Chun Siu, Aashna Devang Kanuga, Gioacchino Tangari, Mark Edward Johnson, Thanh Long Duong, Vishal Vishnoi, Stephen Andrew McRitchie, Christopher Mark Broadbent
-
Publication number: 20240232541
Abstract: Techniques for using enhanced logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system and inputting the utterance into a machine-learning model including a series of network layers. A final network layer of the series of network layers can include a logit function. The machine-learning model can map a first probability for a resolvable class to a first logit value using the logit function. The machine-learning model can map a second probability for an unresolvable class to an enhanced logit value. The method can also include the chatbot system classifying the utterance as the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.
Type: Application
Filed: March 20, 2024
Publication date: July 11, 2024
Applicant: Oracle International Corporation
Inventors: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson
-
Patent number: 12026468
Abstract: Techniques for out-of-domain data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of OOD examples, and generating augmented batches of utterances comprising utterances from the training set of utterances and utterances from the filtered data set of OOD examples based on the difficulty value for each OOD example. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
Type: Grant
Filed: October 28, 2021
Date of Patent: July 2, 2024
Assignee: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Thanh Long Duong, Mark Edward Johnson, Poorya Zaremoodi, Gautam Singaraju, Ying Xu, Vladislav Blinov, Yu-Heng Hong
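Illustration (not part of the patent record): the batching step in this abstract can be sketched as follows. The abstract does not say how difficulty values are computed or how batches are composed, so everything here is an assumption: the function name `curriculum_batches`, the fixed number of OOD examples per batch, and the easy-to-hard ordering are all hypothetical choices made for the sketch.

```python
def curriculum_batches(in_domain, ood_scored, batch_size=4, ood_per_batch=1):
    """Mix in-domain utterances with OOD examples, easiest OOD examples first.

    in_domain:   list of in-domain utterances (strings).
    ood_scored:  list of (utterance, difficulty) pairs; how difficulty is
                 computed is not specified in the abstract (assumed given).
    Returns a list of batches; each batch is a list of (utterance, source).
    """
    if not 0 < ood_per_batch < batch_size:
        raise ValueError("ood_per_batch must be between 1 and batch_size - 1")

    # Curriculum ordering (assumption): lower difficulty is presented earlier.
    ood_sorted = [u for u, _ in sorted(ood_scored, key=lambda pair: pair[1])]
    in_per_batch = batch_size - ood_per_batch

    batches = []
    i = j = 0
    while i < len(in_domain) or j < len(ood_sorted):
        batch = [(u, "in_domain") for u in in_domain[i:i + in_per_batch]]
        batch += [(u, "ood") for u in ood_sorted[j:j + ood_per_batch]]
        batches.append(batch)
        i += in_per_batch
        j += ood_per_batch
    return batches
```

A training loop would then consume the batches in order, so that the model sees harder OOD examples only in later batches.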
-
Patent number: 12019994
Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
Type: Grant
Filed: November 30, 2021
Date of Patent: June 25, 2024
Assignee: Oracle International Corporation
Inventors: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson
-
Patent number: 12014146
Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
Type: Grant
Filed: August 2, 2023
Date of Patent: June 18, 2024
Assignee: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
-
Patent number: 12001775
Abstract: A data corpus is partitioned into text strings for header classification. A group characteristic is computed for a text string, and whether the group characteristic satisfies a group characteristic criterion is determined. The text string may be disqualified from header classification if the group characteristic criterion is not satisfied, or one or more font characteristics may be determined for the text string if the group characteristic criterion is satisfied. A font characteristic that meets one or more prevalence criteria may be identified and evaluated to determine whether the font characteristic meets at least one font characteristic criterion. The text string may be disqualified from header classification if the font characteristic criterion is not satisfied, or if the font characteristic meets the font characteristic criterion, the text string is classified as a header, and tagged content is generated by applying a header tag to the text string.
Type: Grant
Filed: June 13, 2023
Date of Patent: June 4, 2024
Assignee: Oracle International Corporation
Inventors: Sagar Gollamudi, Vishank Bhatia, Xu Zhong, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
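Illustration (not part of the patent record): the decision sequence in this abstract can be read as a three-stage filter. The abstract names no concrete characteristics or thresholds, so the sketch below fills them with assumptions: word count stands in for the group characteristic, font size for the font characteristic, a minimum share of characters for the prevalence criterion, and an `<h1>` tag for the header tag. The function name `tag_header` and all parameters are hypothetical.

```python
from collections import Counter


def tag_header(chars, body_font_size, max_words=12, min_prevalence=0.8):
    """Return header-tagged text if the string qualifies, else the plain text.

    chars: list of (character, font_size) pairs for one text string.
    """
    text = "".join(c for c, _ in chars)

    # Stage 1, group characteristic (assumed: word count). Long or empty
    # strings are disqualified from header classification.
    if not text.strip() or len(text.split()) > max_words:
        return text

    # Stage 2, prevalence (assumed: the dominant font size must cover at
    # least min_prevalence of the non-space characters).
    sizes = Counter(size for c, size in chars if not c.isspace())
    dominant_size, count = sizes.most_common(1)[0]
    if count / sum(sizes.values()) < min_prevalence:
        return text

    # Stage 3, font characteristic criterion (assumed: larger than body text).
    if dominant_size <= body_font_size:
        return text

    return f"<h1>{text}</h1>"
```

For example, a short string set entirely in 18 pt type against an 11 pt body would be tagged, while the same string at 11 pt would be returned unchanged.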
-
Publication number: 20240169155
Abstract: Techniques for automatically switching between chatbot skills in the same domain. In one particular aspect, a method is provided that includes receiving an utterance from a user within a chatbot session, where a current skill context is a first skill and a current group context is a first group, inputting the utterance into a candidate skills model for the first group, obtaining, using the candidate skills model, a ranking of skills within the first group, determining, based on the ranking of skills, that a second skill is the highest-ranked skill, changing the current skill context of the chatbot session to the second skill, inputting the utterance into a candidate flows model for the second skill, obtaining, using the candidate flows model, a ranking of intents within the second skill that match the utterance, and determining, based on the ranking of intents, an intent that is the highest-ranked intent.
Type: Application
Filed: January 26, 2024
Publication date: May 23, 2024
Applicant: Oracle International Corporation
Inventors: Vishal Vishnoi, Xin Xu, Elias Luqman Jalaluddin, Srinivasa Phani Kumar Gadde, Crystal C. Pan, Mark Edward Johnson, Thanh Long Duong, Balakota Srinivas Vinnakota, Manish Parekh
-
Publication number: 20240169161
Abstract: Obtaining collections of sentences in different languages that are usable for training models in various applications of artificial intelligence is provided. A method is provided that obtains, from a text corpus, webpages in a plurality of languages, each of the webpages corresponding to a URL; obtains annotations for each of the webpages based on its URL, to obtain annotated data entries corresponding to the webpages, each of the annotated data entries including a classification label corresponding to a sub-topic of one of a plurality of topics, where each of the plurality of topics includes a corresponding plurality of sub-topics; filters the annotated data entries to obtain topic-specific content in a target language based on the classification labels, the topic-specific content corresponding to one or more sub-topics; performs post-processing on the topic-specific content to obtain result data; and outputs the result data for the topic.
Type: Application
Filed: August 21, 2023
Publication date: May 23, 2024
Applicant: Oracle International Corporation
Inventors: Paria Jamshid Lou, Gioacchino Tangari, Jason Black, Bhagya Gayathri Hettige, Xu Zhong, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20240144923
Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.
Type: Application
Filed: January 11, 2024
Publication date: May 2, 2024
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20240143934
Abstract: A method includes accessing a document including sentences, the document being associated with a configuration flag indicating whether aspect-based sentiment analysis (ABSA), sentence-level sentiment analysis (SLSA), or both are to be performed; inputting the document into a language model that generates chunks of token embeddings for the document; and, based on the configuration flag, performing at least one of the ABSA and the SLSA by inputting the chunks of token embeddings into a multi-task model. When performing the SLSA, a part of the token embeddings in each of the chunks is masked, and the masked token embeddings do not belong to a particular sentence on which the SLSA is performed.
Type: Application
Filed: October 12, 2023
Publication date: May 2, 2024
Applicant: Oracle International Corporation
Inventors: Poorya Zaremoodi, Duy Vu, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza
-
Patent number: 11972755
Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant to the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
Type: Grant
Filed: November 23, 2022
Date of Patent: April 30, 2024
Assignee: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
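Illustration (not part of the patent record): a minimal sketch of the augmentation step described in this abstract. The abstract does not define how the augmentation ratio is applied, so the sketch assumes it means noise tokens inserted per original token (at least one per utterance). The function name `augment_with_noise`, the random insertion positions, and the seed parameter are hypothetical.

```python
import random


def augment_with_noise(utterances, noise_words, ratio=0.2, seed=0):
    """Return the original utterances followed by noise-augmented copies.

    ratio is interpreted (an assumption) as the number of noise tokens
    inserted per original token, with a minimum of one per utterance.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(utterances)
    for text in utterances:
        tokens = text.split()
        n_noise = max(1, round(len(tokens) * ratio))
        for _ in range(n_noise):
            # Insert a noise word at a random position, including the ends.
            position = rng.randint(0, len(tokens))
            tokens.insert(position, rng.choice(noise_words))
        augmented.append(" ".join(tokens))
    return augmented
```

The original tokens keep their relative order, so each augmented utterance would presumably retain the intent label of the utterance it was derived from.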
-
Patent number: 11972220
Abstract: Techniques for using enhanced logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system and inputting the utterance into a machine-learning model including a series of network layers. A final network layer of the series of network layers can include a logit function. The machine-learning model can map a first probability for a resolvable class to a first logit value using the logit function. The machine-learning model can map a second probability for an unresolvable class to an enhanced logit value. The method can also include the chatbot system classifying the utterance as the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.
Type: Grant
Filed: November 29, 2021
Date of Patent: April 30, 2024
Assignee: Oracle International Corporation
Inventors: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson
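Illustration (not part of the patent record): the standard logit function maps a probability p to its log-odds, log(p / (1 - p)). The abstract does not say how the enhanced logit value is derived, so the sketch below substitutes a placeholder: the standard logit plus a fixed bias. That bias, the function name `classify`, and the comparison rule are assumptions made only to show the shape of the decision.

```python
import math


def logit(p):
    """Standard logit: the log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def classify(p_resolvable, p_unresolvable, bias=0.5):
    """Pick a class by comparing a logit value with an enhanced logit value.

    The enhancement used here (adding a constant bias) is a hypothetical
    stand-in; the abstract does not specify the actual mapping.
    """
    first_logit = logit(p_resolvable)
    enhanced_logit = logit(p_unresolvable) + bias  # placeholder enhancement
    return "resolvable" if first_logit > enhanced_logit else "unresolvable"
```

With this placeholder, a positive bias tilts borderline utterances toward the unresolvable class, which is one plausible reason to treat that class differently.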