Patents by Inventor Mark Edward Johnson

Mark Edward Johnson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210304075
    Abstract: The present disclosure relates to chatbot systems, and more particularly, to batching techniques for handling unbalanced training data when training a model such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias-eliminating distribution is determined and a subset of the plurality of raw utterances is batched according to the bias-reducing distribution. The resulting unbiased training data may be input into a prediction model for training the prediction model. The trained prediction model may be obtained and utilized to predict unbiased results from new inputs received by the trained prediction model.
    Type: Application
    Filed: March 30, 2021
    Publication date: September 30, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Balakota Srinivas Vinnakota, Yu-Heng Hong, Elias Luqman Jalaluddin
  • Publication number: 20210304733
    Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant to the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
    Type: Application
    Filed: September 9, 2020
    Publication date: September 30, 2021
    Applicant: Oracle International Corporation
    Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
  • Publication number: 20210081609
    Abstract: Techniques for reducing memory and processing resources used by a dialog system by sharing resources between pipelined processes of the dialog system. An integrated shared dictionary is constructed for concurrent use by automated speech recognition (ASR) and natural language understanding (NLU) subsystems of the dialog system. The integrated shared dictionary comprises multiple entries, with each entry comprising first information that is used by the ASR subsystem, second information used by the NLU subsystem, and information correlating the first information and the second information. The ASR subsystem uses the integrated shared dictionary to identify a dictionary entry containing a set of words corresponding to speech input. The dictionary entry information is communicated to the NLU subsystem, which uses the entry to generate a meaning representation for the speech input.
    Type: Application
    Filed: July 13, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventor: Mark Edward Johnson
  • Publication number: 20210082414
    Abstract: Techniques are described for using data stored for a user in association with context levels to improve the efficiency and accuracy of dialog processing tasks. A dialog system stores historical dialog data in association with a plurality of configured context levels. The dialog system receives an utterance and identifies a term for disambiguation from the utterance. Based on a determined context level, the dialog system identifies relevant historical data stored to a database. The historical data may be used to perform tasks such as resolving an ambiguity based on user preferences, disambiguating named entities based on a prior dialog, and identifying previously generated answers to queries. Based on the context level, the dialog system can efficiently identify the relevant information and use the identified information to provide a response.
    Type: Application
    Filed: August 26, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventor: Mark Edward Johnson
  • Publication number: 20210082400
    Abstract: Techniques for stop word data augmentation for training chatbot systems in natural language processing. In one particular aspect, a computer-implemented method includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with stop words to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent; and training the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances. The augmenting includes: selecting one or more utterances from the training set of utterances, and for each selected utterance, preserving existing stop words within the utterance and replacing at least one non-stop word within the utterance with a stop word or stop word phrase selected from a list of stop words to generate an out-of-domain utterance.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventors: Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Balakota Srinivas Vinnakota, Thanh Long Duong, Gautam Singaraju
  • Publication number: 20210081799
    Abstract: A model for a natural language understanding task is generated based on labeled data generated by a labeling model. The model for the natural language understanding task is smaller than the labeling model (i.e., with lower computational and memory requirements than the combined model), but with substantially the same performance as the labeling model. In some cases, the labeling model may be generated based on a large pre-trained model.
    Type: Application
    Filed: July 24, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventor: Mark Edward Johnson
  • Publication number: 20210082425
    Abstract: Techniques are described for training and executing a machine learning model using data derived from a database. A dialog system uses data from the database to generate related training data for natural language understanding applications. The generated training data is then used to train a machine learning model. This enables the dialog system to leverage a large amount of available data to speed up the training process as compared to conventional labeling techniques. The dialog system uses the trained machine learning model to identify a named entity from a received spoken utterance and generate and output a speech response based upon the identified named entity.
    Type: Application
    Filed: August 3, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventors: Mark Edward Johnson, Michael Rye Kennewick
  • Publication number: 20210082424
    Abstract: The present disclosure relates generally to determining intent based upon speech input using a dialog system. More particularly, techniques are described using matching-based machine learning techniques to identify an intent corresponding to speech input in a dialog system. These procedures do not require training when intents are added to or removed from the set of possible intents.
    Type: Application
    Filed: July 29, 2020
    Publication date: March 18, 2021
    Applicant: Oracle International Corporation
    Inventor: Mark Edward Johnson
  • Publication number: 20210074274
    Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.
    Type: Application
    Filed: August 13, 2020
    Publication date: March 11, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20210074262
    Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.
    Type: Application
    Filed: August 13, 2020
    Publication date: March 11, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20210073465
    Abstract: Techniques for improving a semantic parser of a dialog system, by breaking the semantic parser into a coarse semantic parser and a fine semantic parser, are described. A method described herein includes accessing an utterance received in a dialog system. The utterance is a text-based natural language expression. The method further includes applying a coarse semantic parser to the utterance to determine an intermediate logical form for the utterance. The intermediate logical form indicates one or more intents in the utterance. The method further includes applying a fine semantic parser to the intermediate logical form to determine a logical form for the utterance. The logical form is a syntactic expression of the utterance according to an established grammar, and the logical form includes one or more parameters of the one or more intents. The logical form can be used to conduct a dialog with a user of the dialog system.
    Type: Application
    Filed: August 13, 2020
    Publication date: March 11, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20210074269
    Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form, by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 11, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20210065709
    Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
    Type: Application
    Filed: August 28, 2020
    Publication date: March 4, 2021
    Applicant: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew David Bleeker, Serge Le Huitouze
  • Patent number: 10579738
    Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: March 3, 2020
    Assignee: Voicebox Technologies Corporation
    Inventor: Mark Edward Johnson
  • Patent number: 10460036
    Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: October 29, 2019
    Assignee: Voicebox Technologies Corporation
    Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
  • Publication number: 20180307679
    Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).
    Type: Application
    Filed: April 23, 2018
    Publication date: October 25, 2018
    Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
  • Publication number: 20180300313
    Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.
    Type: Application
    Filed: April 5, 2018
    Publication date: October 18, 2018
    Applicant: Voicebox Technologies Corporation
    Inventor: Mark Edward Johnson
  • Patent number: 9867845
    Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.
    Type: Grant
    Filed: July 30, 2015
    Date of Patent: January 16, 2018
    Assignee: The Regents of the University of California
    Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
  • Publication number: 20160030590
    Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.
    Type: Application
    Filed: July 30, 2015
    Publication date: February 4, 2016
    Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
  • Patent number: 8275607
    Abstract: A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word and the selected part-of-speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: September 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Kristina Nikolova Toutanova, Mark Edward Johnson
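
Illustrative sketch: the stop word augmentation summarized in the abstract of publication 20210082400 above (preserve the stop words already in an utterance and replace non-stop words with entries from a stop word list to produce an out-of-domain utterance) might look roughly like the following. This is not taken from the application. The function name make_out_of_domain, the whitespace tokenization, the sample stop word list, and the choice to replace every non-stop word (the abstract requires only at least one) are all assumptions made for illustration.

```python
import random


def make_out_of_domain(utterance, stop_words, rng=None):
    """Return a copy of `utterance` in which existing stop words are kept
    and every other token is replaced by a randomly chosen stop word.

    Hypothetical helper: tokenization is plain whitespace splitting and
    matching is case-insensitive, both assumptions for this sketch.
    """
    rng = rng or random.Random()
    stop_list = sorted({w.lower() for w in stop_words})  # stable order for seeded runs
    stop_set = set(stop_list)
    tokens = utterance.split()
    return " ".join(
        tok if tok.lower() in stop_set else rng.choice(stop_list)
        for tok in tokens
    )


# Assumed, deliberately tiny stop word list.
STOP_WORDS = ["the", "a", "to", "of", "is", "and"]

in_domain = "send the invoice to accounting"
out_of_domain = make_out_of_domain(in_domain, STOP_WORDS, random.Random(0))
print(out_of_domain)
```

In this sketch the generated string would be labeled with the unresolved intent and added to the training data alongside the original utterances; how the labels are stored and how many utterances are selected are left open here.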