Patents by Inventor Mark Edward Johnson

Mark Edward Johnson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230419040
    Abstract: Novel techniques are described for data augmentation using a two-stage entity-aware augmentation to improve model robustness to entity value changes for intent prediction.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
  • Publication number: 20230419127
    Abstract: Novel techniques are described for negative entity-aware augmentation using a two-stage augmentation to improve the stability of the model to entity value changes for intent prediction. In some embodiments, a method comprises accessing a first set of training data for an intent prediction model, the first set of training data comprising utterances and intent labels; applying one or more negative entity-aware data augmentation techniques to the first set of training data, depending on the tuning requirements for hyper-parameters, to result in a second set of training data, where the one or more negative entity-aware data augmentation techniques comprise Keyword Augmentation Technique (“KAT”) plus entity without context technique and KAT plus entity in random context as OOD technique; combining the first set of training data and the second set of training data to generate expanded training data; and training the intent prediction model using the expanded training data.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
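The abstract above names two negative augmentation techniques but does not specify their implementation. A minimal sketch of the shape they suggest, with hypothetical helper names, entity values, and labels (the actual KAT logic is not described in the abstract):

```python
import random

def augment(first_set, entity_values, intent_keywords, ood_contexts, seed=0):
    """Two negative entity-aware augmentations over a (utterance, intent) set."""
    rng = random.Random(seed)
    second_set = []
    for value in entity_values:
        # "KAT plus entity without context": an intent keyword paired with a
        # bare entity value, with no carrier sentence around it.
        for keyword, intent in intent_keywords:
            second_set.append((f"{keyword} {value}", intent))
        # "KAT plus entity in random context as OOD": the entity value embedded
        # in an unrelated sentence, labeled out-of-domain.
        context = rng.choice(ood_contexts)
        second_set.append((context.format(entity=value), "out_of_domain"))
    # Expanded training data = original set plus augmented set.
    return first_set + second_set
```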
  • Publication number: 20230419052
    Abstract: Novel techniques are described for positive entity-aware augmentation using a two-stage augmentation to improve the stability of the model to entity value changes for intent prediction. In one particular aspect, a method is provided that includes accessing a first set of training data for an intent prediction model, the first set of training data comprising utterances and intent labels; applying one or more positive data augmentation techniques to the first set of training data, depending on the tuning requirements for hyper-parameters, to result in a second set of training data, where the positive data augmentation techniques comprise Entity-Aware (“EA”) technique and a two-stage augmentation technique; combining the first set of training data and the second set of training data to generate expanded training data; and training the intent prediction model using the expanded training data.
    Type: Application
    Filed: February 1, 2023
    Publication date: December 28, 2023
    Applicant: Oracle International Corporation
    Inventors: Ahmed Ataallah Ataallah Abobakr, Shivashankar Subramanian, Ying Xu, Vladislav Blinov, Umanga Bista, Tuyen Quang Pham, Thanh Long Duong, Mark Edward Johnson, Elias Luqman Jalaluddin, Vanshika Sridharan, Xin Xu, Srinivasa Phani Kumar Gadde, Vishal Vishnoi
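The positive counterpart keeps the intent label while varying the entity value. A sketch of the two-stage idea, under the assumption that the first stage abstracts a known entity value into a slot template and the second stage re-instantiates it (helper names are illustrative, not from the patent):

```python
def entity_aware_swap(utterance, intent, entity_value, replacement_values):
    """Positive entity-aware augmentation: same intent, new entity values."""
    # Stage 1: abstract the known entity value into a slot template.
    template = utterance.replace(entity_value, "{entity}")
    # Stage 2: re-instantiate the slot, keeping the original intent label.
    return [(template.format(entity=v), intent) for v in replacement_values]
```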
  • Publication number: 20230376700
    Abstract: Techniques are provided for generating training data to facilitate fine-tuning embedding models. Training data including anchor utterances is obtained. Positive utterances and negative utterances are generated from the anchor utterances. Tuples including the anchor utterances, the positive utterances, and the negative utterances are formed. Embeddings for the tuples are generated and a pre-trained embedding model is fine-tuned based on the embeddings. The fine-tuned model can be deployed to a system.
    Type: Application
    Filed: May 9, 2023
    Publication date: November 23, 2023
    Applicant: Oracle International Corporation
    Inventors: Umanga Bista, Vladislav Blinov, Mark Edward Johnson, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Elias Luqman Jalaluddin, Xin Xu, Shivashankar Subramanian
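The anchor/positive/negative tuples described above are the standard input to a triplet-style objective. A toy sketch with plain-list embeddings and a triplet margin loss (the patent does not state which loss is used, so this is an assumption):

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def make_tuples(anchors, positives, negatives):
    """One (anchor, positive, negative) tuple per anchor utterance."""
    return list(zip(anchors, positives, negatives))

def triplet_loss(embed, tuples, margin=0.5):
    """Average triplet margin objective over the embedded tuples."""
    total = 0.0
    for a, p, n in tuples:
        total += max(0.0, l2(embed(a), embed(p)) - l2(embed(a), embed(n)) + margin)
    return total / len(tuples)
```

Fine-tuning would then update the embedding model to drive this loss toward zero, pulling positives toward their anchors and pushing negatives away.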
  • Publication number: 20230376696
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Application
    Filed: August 2, 2023
    Publication date: November 23, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Patent number: 11810553
    Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form, by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.
    Type: Grant
    Filed: October 26, 2022
    Date of Patent: November 7, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
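The round trip in this abstract has a simple pipeline shape. A sketch of how the training signal is computed, with the subsystems and objective passed in as placeholder callables (the real subsystems are neural models; this only illustrates the data flow):

```python
def round_trip_signal(logical_form, tts, asr, parser, objective):
    """Score one logical form after a TTS -> ASR -> semantic-parser round trip."""
    audio = tts(logical_form)        # render the logical form as speech
    text = asr(audio)                # transcribe the speech back to text
    converted = parser(text)         # parse the text into a logical form
    return objective(converted, logical_form)  # training signal
```

Backpropagating this signal through all three subsystems is what tunes them to work together.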
  • Patent number: 11804219
    Abstract: Techniques for data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes generating a list of values to cover for an entity, selecting utterances from a set of data that have context for the entity, converting the utterances into templates, where each template of the templates comprises a slot that maps to the list of values for the entity, selecting a template from the templates, selecting a value from the list of values based on the mapping between the slot within the selected template and the list of values for the entity; and creating an artificial utterance based on the selected template and the selected value, where the creating the artificial utterance comprises inserting the selected value into the slot of the selected template that maps to the list of values for the entity.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: October 31, 2023
    Assignee: Oracle International Corporation
    Inventors: Srinivasa Phani Kumar Gadde, Yuanxu Wu, Aashna Devang Kanuga, Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson
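The template-and-slot procedure in this abstract maps directly to string operations. A minimal sketch, assuming a single slot per template (slot name and helpers are illustrative):

```python
def to_template(utterance, entity_value, slot="{city}"):
    """Replace the concrete entity value with a slot mapped to the value list."""
    return utterance.replace(entity_value, slot)

def create_artificial_utterances(templates, values, slot="{city}"):
    """Insert each value from the entity's value list into each template slot."""
    return [t.replace(slot, v) for t in templates for v in values]
```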
  • Patent number: 11790901
    Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
    Type: Grant
    Filed: December 30, 2022
    Date of Patent: October 17, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Eric Black, Andrew David Bleeker, Serge Le Huitouze
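The dialog-manager components named above can be sketched as a small class, assuming logical forms are plain dicts and the state tracker simply inherits missing slots from the most recent context (the patent's actual representations are not specified):

```python
class DialogManager:
    """Toy dialog manager: state tracker + execution + context stack."""

    def __init__(self, execute):
        self.context_stack = []   # history of the current dialog
        self.execute = execute    # execution subsystem, passed in as a callable

    def track_state(self, input_lf):
        # Fill slots missing from the input using the most recent context.
        merged = dict(self.context_stack[-1]) if self.context_stack else {}
        merged.update(input_lf)
        return merged

    def turn(self, input_lf):
        intermediate = self.track_state(input_lf)  # dialog state tracker
        result = self.execute(intermediate)        # execution subsystem
        self.context_stack.append(intermediate)    # remember this turn
        return {"respond": result}                 # dialog policy output LF
```

A follow-up turn that omits a slot ("what about tomorrow?") still executes correctly because the context stack supplies the missing information.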
  • Patent number: 11763092
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: September 19, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Publication number: 20230206125
    Abstract: Techniques are provided for improved training of a machine learning model using lexical dropout. A machine learning model and a training data set are accessed. The training data set can include sample utterances and corresponding labels. A dropout parameter is identified. The dropout parameter can indicate a likelihood for dropping out one or more feature vectors for tokens associated with respective entities during training of the machine learning model. The dropout parameter is applied to feature vectors for tokens associated with respective entities. The machine learning model is trained using the training data set and the dropout parameter to generate a trained machine learning model. The use of the trained the machine learning model is facilitated.
    Type: Application
    Filed: December 22, 2022
    Publication date: June 29, 2023
    Applicant: Oracle International Corporation
    Inventors: Tuyen Quang Pham, Cong Duy Vu Hoang, Thanh Tien Vu, Mark Edward Johnson, Thanh Long Duong
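Lexical dropout as described here (zeroing entity-token feature vectors with some probability during training) can be sketched in a few lines; the zeroing choice is an assumption, since the abstract only says the vectors are "dropped out":

```python
import random

def lexical_dropout(token_vectors, entity_mask, p, rng):
    """Zero the feature vector of entity tokens with probability p."""
    out = []
    for vec, is_entity in zip(token_vectors, entity_mask):
        if is_entity and rng.random() < p:
            out.append([0.0] * len(vec))  # drop this entity token's features
        else:
            out.append(list(vec))         # keep non-entity tokens unchanged
    return out
```

At inference time the dropout parameter is simply set to 0, so all features pass through.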
  • Publication number: 20230205999
    Abstract: Techniques are provided for named entity recognition using a gazetteer incorporated with a neural network. An utterance is received from a user. The utterance is input into a neural network comprising model parameters learned for named entity recognition. The neural network generates a first representation of one or more named entities based on the utterance. A gazetteer is searched based on the input utterance to generate a second representation of one or more named entities identified in the utterance. The first named entity representation is combined with the second named entity representation to generate a combined named entity representation. The combined named entity representation is output for facilitating a response to the user.
    Type: Application
    Filed: December 22, 2022
    Publication date: June 29, 2023
    Applicant: Oracle International Corporation
    Inventors: Tuyen Quang Pham, Cong Duy Vu Hoang, Mark Edward Johnson, Thanh Long Duong
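The combination step in this abstract is essentially a concatenation of a learned representation with a gazetteer lookup. A sketch using per-token indicator features and lowercase whole-token matching (both simplifying assumptions):

```python
def gazetteer_features(tokens, gazetteer):
    """1.0 where a token appears in the gazetteer for an entity type, else 0.0."""
    types = sorted(gazetteer)
    return [[1.0 if tok.lower() in gazetteer[t] else 0.0 for t in types]
            for tok in tokens]

def combine(neural_reprs, gaz_reprs):
    """Concatenate the network's token representation with the gazetteer one."""
    return [n + g for n, g in zip(neural_reprs, gaz_reprs)]
```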
  • Publication number: 20230186025
    Abstract: Techniques for preprocessing data assets to be used in a natural language to logical form model based on scalable search and content-based schema linking. In one particular aspect, a method includes accessing an utterance, classifying named entities within the utterance into predefined classes, searching value lists within the database schema using tokens from the utterance to identify and output value matches including: (i) any value within the value lists that matches a token from the utterance and (ii) any attribute associated with a matching value, generating a data structure by organizing and storing: (i) each of the named entities and an assigned class for each of the named entities, (ii) each of the value matches and the token matching each of the value matches, and (iii) the utterance, in a predefined format for the data structure, and outputting the data structure.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Jae Min John, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Shivashankar Subramanian, Cong Duy Vu Hoang, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Nitika Mathur, Aashna Devang Kanuga, Philip Arthur, Gioacchino Tangari, Steve Wai-Chun Siu
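The value-match search described above can be sketched as a scan of schema value lists against utterance tokens, emitting both the matching value and its owning attribute (exact lowercase token matching is an assumption; named-entity classification is omitted):

```python
def link_values(utterance, value_lists):
    """value_lists: attribute -> list of values from the database schema."""
    tokens = utterance.lower().split()
    matches = []
    for attribute, values in value_lists.items():
        for value in values:
            if value.lower() in tokens:
                matches.append({"token": value.lower(), "value": value,
                                "attribute": attribute})
    # Predefined output data structure: the utterance plus its value matches.
    return {"utterance": utterance, "value_matches": matches}
```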
  • Publication number: 20230186161
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
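The synchronous-grammar idea in this abstract is that a rule rewrites the natural-language side and the logical-form side in lockstep. A toy sketch with one paired slot per template (the patent's grammar formalism is far richer than this):

```python
def synthesize(paired_templates, paired_slots):
    """Expand synchronized NL/LF templates into (utterance, logical form) pairs."""
    examples = []
    for nl_template, lf_template in paired_templates:
        # The slot rewrites on both sides at once, as in a synchronous grammar.
        for surface, logical in paired_slots:
            examples.append((nl_template.format(x=surface),
                             lf_template.format(x=logical)))
    return examples
```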
  • Publication number: 20230185834
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
  • Publication number: 20230186026
    Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga
  • Publication number: 20230185799
    Abstract: Techniques are disclosed for training a model, using multi-task learning, to transform natural language to a logical form. In one particular aspect, a method includes accessing a first set of utterances that have non-follow-up utterances and a second set of utterances that have initial utterances and associated one or more follow-up utterances and training a model for translating an utterance to a logical form. The training is a joint training process that includes calculating a first loss for a first semantic parsing task based on one or more non-follow-up utterances from the first set of utterances, calculating a second loss for a second semantic parsing task based on one or more initial utterances and associated one or more follow-up utterances from the second set of utterances, combining the first and second losses to obtain a final loss, and updating model parameters of the model based on the final loss.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Cong Duy Vu Hoang, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota
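The joint training described above reduces to computing two task losses and combining them before the parameter update. A sketch with the per-task loss passed in as a callable and an equal weighting (the weights are an assumption; the abstract only says the losses are combined):

```python
def joint_loss(task_loss, non_followup_batch, followup_batch, w1=0.5, w2=0.5):
    """Combine the two semantic-parsing task losses into one final loss."""
    loss_1 = task_loss(non_followup_batch)  # task 1: non-follow-up utterances
    loss_2 = task_loss(followup_batch)      # task 2: initial + follow-ups
    return w1 * loss_1 + w2 * loss_2        # final loss used for the update
```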
  • Publication number: 20230186914
    Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
    Type: Application
    Filed: December 30, 2022
    Publication date: June 15, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Eric Black, Andrew David Bleeker, Serge Le Huitouze
  • Publication number: 20230169955
    Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
    Type: Application
    Filed: November 23, 2022
    Publication date: June 1, 2023
    Applicant: Oracle International Corporation
    Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
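The noise-augmentation step above can be sketched as inserting irrelevant tokens into an utterance at a predefined ratio; interpreting the ratio as "noise tokens per original token" is an assumption, since the abstract does not define it precisely:

```python
import random

def add_noise(utterance, noise_words, ratio, rng):
    """Insert noise tokens into the utterance at roughly the given ratio."""
    tokens = utterance.split()
    n_noise = max(1, round(len(tokens) * ratio))
    for _ in range(n_noise):
        pos = rng.randrange(len(tokens) + 1)   # random insertion point
        tokens.insert(pos, rng.choice(noise_words))
    return " ".join(tokens)
```

Training the intent classifier on such augmented utterances teaches it to ignore the injected text and attend to the original signal.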
  • Publication number: 20230153687
    Abstract: Techniques for named entity bias detection and mitigation for sentence sentiment analysis. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, preparing a list of named entities using one or more data sources, for each example in the training set of labeled examples with a named entity, replacing the named entity with a corresponding entity type tag to generate a labeled template data set, executing a sampling process for each entity type t within the labeled template data set to generate an augmented invariance data set comprising one or more invariance groups having labeled examples for each entity type t, and training the machine learning model using labeled examples from the augmented invariance data set.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Duy Vu, Varsha Kuppur Rajendra, Shivashankar Subramanian, Ahmed Ataallah Ataallah Abobakr, Thanh Long Duong, Mark Edward Johnson
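The templating and invariance-group steps above are straightforward string operations. A sketch with a hypothetical tag format (`[PERSON]`) and substring replacement, both simplifications of whatever the patent actually uses:

```python
def make_templates(examples, entity_list):
    """Replace each known named entity with its entity-type tag."""
    templated = []
    for text, label in examples:
        for name, etype in entity_list.items():
            if name in text:
                text = text.replace(name, f"[{etype}]")
        templated.append((text, label))
    return templated

def invariance_group(template, label, etype, sample_names):
    """Fill one template with several same-type names; the label must not change."""
    return [(template.replace(f"[{etype}]", n), label) for n in sample_names]
```

Training on such groups penalizes any sentiment prediction that flips merely because the named entity changed.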
  • Publication number: 20230154455
    Abstract: Techniques are provided for improved training of a machine-learning model that includes multiple layers and is configured to process textual language input. The machine-learning model includes one or more blocks in which each block includes a multi-head self-attention network, a first connection for providing input to the multi-head self-attention network, and a second (residual) connection for providing the input to a normalization layer, bypassing the multi-head self-attention network. During training, the second connection is dropped out according to a dropout parameter. Additionally, or alternatively, an attention weight matrix is used for dropout by blocking diagonal entries in the attention weight matrix. As a result, the machine-learning model increasingly focuses on contextual information, which provides more accurate language processing results.
    Type: Application
    Filed: November 16, 2022
    Publication date: May 18, 2023
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong
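Of the two mechanisms in this abstract, the diagonal-blocking one is easy to illustrate: masking the diagonal of the attention weight matrix stops each token from attending to itself, forcing the model onto contextual information. A minimal self-contained sketch (the residual-connection dropout is omitted here):

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(scores, block_diagonal=True):
    """Attention weights with the diagonal (self-attention to self) blocked."""
    out = []
    for i, row in enumerate(scores):
        masked = [(-math.inf if (block_diagonal and i == j) else x)
                  for j, x in enumerate(row)]
        out.append(softmax(masked))
    return out
```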