Patents by Inventor Mark Edward Johnson
Mark Edward Johnson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210304075
Abstract: The present disclosure relates to chatbot systems and, more particularly, to batching techniques for handling unbalanced training data when training a model, such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias-reducing distribution is determined, and a subset of the plurality of raw utterances is batched according to the bias-reducing distribution. The resulting unbiased training data may be input into a prediction model to train the prediction model. The trained prediction model may then be used to predict unbiased results from new inputs it receives.
Type: Application
Filed: March 30, 2021
Publication date: September 30, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Balakota Srinivas Vinnakota, Yu-Heng Hong, Elias Luqman Jalaluddin
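The abstract does not say what the bias-reducing distribution is. As a minimal sketch, assuming the simplest such distribution (the same number of utterances per intent label in every batch, with rare labels sampled with replacement), batching could look like this. The function name and the `(text, label)` data layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_balanced_batch(utterances, batch_size, rng):
    """Draw one training batch with the same number of utterances per label.

    `utterances` is a list of (text, label) pairs. The uniform per-label
    distribution is an assumption standing in for the patent's
    "bias-reducing distribution".
    """
    by_label = defaultdict(list)
    for text, label in utterances:
        by_label[label].append((text, label))
    labels = sorted(by_label)
    per_label = batch_size // len(labels)  # any remainder is dropped
    batch = []
    for label in labels:
        # Sampling with replacement lets rare labels fill their quota.
        batch.extend(rng.choices(by_label[label], k=per_label))
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    data = [("order a pizza", "order")] * 90 + [("cancel my order", "cancel")] * 10
    batch = sample_balanced_batch(data, 20, random.Random(0))
    print(Counter(label for _, label in batch))
```

Even though the raw data here is split 90/10, each batch of 20 contains 10 utterances of each label, so the model sees both intents equally often during training.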
-
Publication number: 20210304733
Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, independent of the original text within the utterances of the training set of utterances; and incorporating the noise text within the utterances, relative to the original text in the utterances of the training set of utterances, at a predefined augmentation ratio to generate augmented utterances.
Type: Application
Filed: September 9, 2020
Publication date: September 30, 2021
Applicant: Oracle International Corporation
Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
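A minimal sketch of the augmentation step, assuming that the "predefined augmentation ratio" means noise tokens per original token (rounded up) and that noise words are inserted at random positions. The abstract fixes neither detail, and the function names and noise list are hypothetical.

```python
import math
import random


def add_noise(utterance, noise_words, ratio, rng):
    """Insert randomly chosen noise words into one utterance.

    Assumption: `ratio` is noise tokens per original token, rounded up,
    so ratio=0.5 adds 3 noise words to a 5-word utterance.
    """
    tokens = utterance.split()
    n_noise = math.ceil(ratio * len(tokens))
    for _ in range(n_noise):
        pos = rng.randint(0, len(tokens))  # any slot, including start and end
        tokens.insert(pos, rng.choice(noise_words))
    return " ".join(tokens)


def augment_with_noise(training_set, noise_words, ratio, seed=0):
    """Return the original (text, intent) pairs plus one noisy copy of each.

    Keeping the originals alongside the noisy copies is an assumption.
    """
    rng = random.Random(seed)
    noisy = [(add_noise(text, noise_words, ratio, rng), intent)
             for text, intent in training_set]
    return training_set + noisy
```

For example, `augment_with_noise([("book a flight to boston", "book_flight")], ["zebra", "quartz"], 0.5)` keeps the original pair and adds an 8-token copy labeled `book_flight` in which the five original words still appear in order. Because the noise words carry no intent signal, the classifier is pushed to rely on the original words.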
-
Publication number: 20210081609
Abstract: Techniques for reducing memory and processing resources used by a dialog system by sharing resources between pipelined processes of the dialog system. An integrated shared dictionary is constructed for concurrent use by automated speech recognition (ASR) and natural language understanding (NLU) subsystems of the dialog system. The integrated shared dictionary comprises multiple entries, with each entry comprising first information that is used by the ASR subsystem, second information that is used by the NLU subsystem, and information correlating the first information and the second information. The ASR subsystem uses the integrated shared dictionary to identify a dictionary entry containing a set of words corresponding to speech input. The dictionary entry information is communicated to the NLU subsystem, which uses the entry to generate a meaning representation for the speech input.
Type: Application
Filed: July 13, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventor: Mark Edward Johnson
-
Publication number: 20210082414
Abstract: Techniques are described for using data stored for a user in association with context levels to improve the efficiency and accuracy of dialog processing tasks. A dialog system stores historical dialog data in association with a plurality of configured context levels. The dialog system receives an utterance and identifies a term for disambiguation from the utterance. Based on a determined context level, the dialog system identifies relevant historical data stored to a database. The historical data may be used to perform tasks such as resolving an ambiguity based on user preferences, disambiguating named entities based on a prior dialog, and identifying previously generated answers to queries. Based on the context level, the dialog system can efficiently identify the relevant information and use the identified information to provide a response.
Type: Application
Filed: August 26, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventor: Mark Edward Johnson
-
Publication number: 20210082400
Abstract: Techniques for stop word data augmentation for training chatbot systems in natural language processing. In one particular aspect, a computer-implemented method includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with stop words to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent; and training the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances. The augmenting includes: selecting one or more utterances from the training set of utterances, and for each selected utterance, preserving existing stop words within the utterance and replacing at least one non-stop word within the utterance with a stop word or stop word phrase selected from a list of stop words to generate an out-of-domain utterance.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventors: Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Balakota Srinivas Vinnakota, Thanh Long Duong, Gautam Singaraju
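A minimal sketch of the stop word augmentation, assuming lowercase whitespace-tokenized utterances, a small hypothetical stop word list, and a hypothetical `"unresolved"` label for the unresolved intent category. The abstract only requires that at least one non-stop word be replaced; replacing all of them, as done here, is an assumption.

```python
import random

# Hypothetical stop word list; a real system would use a fuller one.
STOP_WORDS = ["the", "a", "to", "of", "and", "is", "it", "for", "on", "in"]


def make_out_of_domain(utterance, stop_words, rng):
    """Keep existing stop words in place; replace every other word with a stop word.

    Returns None when the utterance has no non-stop word to replace.
    """
    stop_set = set(stop_words)
    tokens = utterance.split()
    if all(t in stop_set for t in tokens):
        return None
    return " ".join(t if t in stop_set else rng.choice(stop_words) for t in tokens)


def augment_with_stop_words(training_set, stop_words, unresolved="unresolved", seed=0):
    """Append out-of-domain utterances, labeled with the unresolved intent."""
    rng = random.Random(seed)
    out_of_domain = []
    for text, _intent in training_set:
        new_text = make_out_of_domain(text, stop_words, rng)
        if new_text is not None:
            out_of_domain.append((new_text, unresolved))
    return training_set + out_of_domain
```

Applied to `"book a flight to boston"`, this keeps `a` and `to` at positions 1 and 3 and fills the other three slots with stop words. The result has the shape of a real request but no content words, which is the kind of input the classifier should learn to route to the unresolved intent.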
-
Publication number: 20210081799
Abstract: A model for a natural language understanding task is generated based on labeled data generated by a labeling model. The model for the natural language understanding task is smaller than the labeling model (i.e., it has lower computational and memory requirements than the labeling model) but has substantially the same performance as the labeling model. In some cases, the labeling model may be generated based on a large pre-trained model.
Type: Application
Filed: July 24, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventor: Mark Edward Johnson
-
Publication number: 20210082425
Abstract: Techniques are described for training and executing a machine learning model using data derived from a database. A dialog system uses data from the database to generate related training data for natural language understanding applications. The generated training data is then used to train a machine learning model. This enables the dialog system to leverage a large amount of available data to speed up the training process as compared to conventional labeling techniques. The dialog system uses the trained machine learning model to identify a named entity from a received spoken utterance and generate and output a speech response based upon the identified named entity.
Type: Application
Filed: August 3, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventors: Mark Edward Johnson, Michael Rye Kennewick
-
Publication number: 20210082424
Abstract: The present disclosure relates generally to determining intent based upon speech input using a dialog system. More particularly, techniques are described using matching-based machine learning techniques to identify an intent corresponding to speech input in a dialog system. These procedures do not require training when intents are added to or removed from the set of possible intents.
Type: Application
Filed: July 29, 2020
Publication date: March 18, 2021
Applicant: Oracle International Corporation
Inventor: Mark Edward Johnson
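The abstract does not name the matching technique. As a stand-in, assuming a nearest-neighbour match over bag-of-words cosine similarity (a deliberately simple substitute for learned sentence encodings), this sketch shows why no retraining is needed: adding or removing an intent only edits the stored examples. The class and method names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # missing keys in a Counter count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MatchingIntentClassifier:
    """Predict the intent of the most similar stored example utterance."""

    def __init__(self):
        self.examples = {}  # intent -> list of bag-of-words vectors

    def add_intent(self, intent, utterances):
        self.examples[intent] = [bag_of_words(u) for u in utterances]

    def remove_intent(self, intent):
        self.examples.pop(intent, None)

    def predict(self, utterance):
        """Return the best-matching intent, or None if nothing overlaps."""
        query = bag_of_words(utterance)
        best_intent, best_score = None, 0.0
        for intent, vectors in self.examples.items():
            for vec in vectors:
                score = cosine(query, vec)
                if score > best_score:
                    best_intent, best_score = intent, score
        return best_intent
```

With a `weather` intent ("what is the weather today", "will it rain tomorrow") and a `music` intent ("play some jazz", "play my playlist"), `predict("play jazz music")` returns `"music"`; after `remove_intent("music")` the same query returns `None`, with no training step in between.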
-
Publication number: 20210074274
Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.
Type: Application
Filed: August 13, 2020
Publication date: March 11, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20210074262
Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.
Type: Application
Filed: August 13, 2020
Publication date: March 11, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20210073465
Abstract: Techniques for improving a semantic parser of a dialog system, by breaking the semantic parser into a coarse semantic parser and a fine semantic parser, are described. A method described herein includes accessing an utterance received in a dialog system. The utterance is a text-based natural language expression. The method further includes applying a coarse semantic parser to the utterance to determine an intermediate logical form for the utterance. The intermediate logical form indicates one or more intents in the utterance. The method further includes applying a fine semantic parser to the intermediate logical form to determine a logical form for the utterance. The logical form is a syntactic expression of the utterance according to an established grammar, and the logical form includes one or more parameters of the one or more intents. The logical form can be used to conduct a dialog with a user of the dialog system.
Type: Application
Filed: August 13, 2020
Publication date: March 11, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20210074269
Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.
Type: Application
Filed: August 25, 2020
Publication date: March 11, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson
-
Publication number: 20210065709
Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
Type: Application
Filed: August 28, 2020
Publication date: March 4, 2021
Applicant: Oracle International Corporation
Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew David Bleeker, Serge Le Huitouze
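A minimal sketch of the state-tracking step, assuming logical forms are flat slot dictionaries and that the tracker fills any slot missing from the input logical form with the value from the most recent context on the stack. Pushing each intermediate logical form onto the stack is also an assumption. The execution and dialog policy subsystems are passed in as plain callables, and all names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DialogManager:
    context_stack: list = field(default_factory=list)  # history of the current dialog

    def track_state(self, input_lf):
        """Build an intermediate logical form: input slots win, context fills the gaps."""
        intermediate = dict(input_lf)
        for context in reversed(self.context_stack):  # newest context first
            for slot, value in context.items():
                intermediate.setdefault(slot, value)
        return intermediate

    def handle(self, input_lf, execute, policy):
        """Track state, execute, then let the policy produce the output logical form."""
        intermediate = self.track_state(input_lf)
        self.context_stack.append(intermediate)
        result = execute(intermediate)
        return policy(result)


def execute_weather(lf):
    return f"{lf['intent']}:{lf['city']}:{lf['date']}"


def say(result):
    return {"say": result}
```

A follow-up such as "what about tomorrow?" might parse to just `{"date": "tomorrow"}`. After an earlier turn `{"intent": "weather", "city": "Boston", "date": "today"}`, the tracker restores `intent` and `city` from the stack, so `handle({"date": "tomorrow"}, execute_weather, say)` returns `{"say": "weather:Boston:tomorrow"}`.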
-
Patent number: 10579738
Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.
Type: Grant
Filed: April 5, 2018
Date of Patent: March 3, 2020
Assignee: Voicebox Technologies Corporation
Inventor: Mark Edward Johnson
-
Patent number: 10460036
Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).
Type: Grant
Filed: April 23, 2018
Date of Patent: October 29, 2019
Assignee: Voicebox Technologies Corporation
Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
-
Publication number: 20180307679
Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).
Type: Application
Filed: April 23, 2018
Publication date: October 25, 2018
Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
-
Publication number: 20180300313
Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.
Type: Application
Filed: April 5, 2018
Publication date: October 18, 2018
Applicant: Voicebox Technologies Corporation
Inventor: Mark Edward Johnson
-
Patent number: 9867845
Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.
Type: Grant
Filed: July 30, 2015
Date of Patent: January 16, 2018
Assignee: The Regents of the University of California
Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
-
Publication number: 20160030590
Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.
Type: Application
Filed: July 30, 2015
Publication date: February 4, 2016
Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
-
Patent number: 8275607
Abstract: A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word, and the selected part-of-speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word.
Type: Grant
Filed: December 12, 2007
Date of Patent: September 25, 2012
Assignee: Microsoft Corporation
Inventors: Kristina Nikolova Toutanova, Mark Edward Johnson
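The abstract leaves the exact weighting rule open. One plausible reading, sketched here as an assumption, multiplies each candidate tag's score by the total probability of the predicted tag sets that contain it, then picks the tag with the highest weighted score. The variational training of the base scores is not shown, the function names are hypothetical, and the numbers in the example are made up for illustration.

```python
def weight_tag_scores(tag_scores, tag_set_probs):
    """Weight each candidate tag's score by the probability mass of tag sets containing it.

    tag_scores: dict tag -> base score for the selected word.
    tag_set_probs: dict frozenset of tags -> probability predicted from the word's features.
    """
    weighted = {}
    for tag, score in tag_scores.items():
        mass = sum(p for tag_set, p in tag_set_probs.items() if tag in tag_set)
        weighted[tag] = score * mass
    return weighted


def select_tag(tag_scores, tag_set_probs):
    """Return the tag with the highest weighted score."""
    weighted = weight_tag_scores(tag_scores, tag_set_probs)
    return max(weighted, key=weighted.get)


# Illustrative inputs for a word like "run" (values invented for the example).
tag_set_probs = {frozenset({"NN", "VB"}): 0.7, frozenset({"NN"}): 0.3}
tag_scores = {"NN": 0.4, "VB": 0.5, "JJ": 0.9}
```

Here `select_tag(tag_scores, tag_set_probs)` picks `"NN"` (about 0.4 against 0.35 for `"VB"`). `"JJ"` has the highest raw score but a weighted score of 0, because no predicted tag set contains it. This is how the tag-set model can veto tags that are implausible for the word.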