Patents by Inventor Mark Edward Johnson

Mark Edward Johnson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

BATCHING TECHNIQUES FOR HANDLING UNBALANCED TRAINING DATA FOR A CHATBOT

Publication number: 20210304075

Abstract: The present disclosure relates to chatbot systems, and more particularly, to batching techniques for handling unbalanced training data when training a model such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias-reducing distribution is determined, and a subset of the plurality of raw utterances is batched according to the bias-reducing distribution. The resulting unbiased training data may be input into a prediction model for training the prediction model. The trained prediction model may be obtained and utilized to predict unbiased results from new inputs received by the trained prediction model.

Type: Application

Filed: March 30, 2021

Publication date: September 30, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Balakota Srinivas Vinnakota, Yu-Heng Hong, Elias Luqman Jalaluddin
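The batching idea in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say which distribution is used, so the sketch assumes a uniform distribution over intents, and the names balanced_batch, utterances, and batch_size are hypothetical.

```python
import random
from collections import defaultdict


def balanced_batch(utterances, batch_size, seed=0):
    """Draw one training batch in which every intent is equally likely.

    `utterances` is a list of (text, intent) pairs. The uniform-over-intents
    target distribution is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    by_intent = defaultdict(list)
    for text, intent in utterances:
        by_intent[intent].append(text)
    intents = sorted(by_intent)  # fixed order keeps sampling reproducible

    batch = []
    for _ in range(batch_size):
        intent = rng.choice(intents)          # pick an intent uniformly
        text = rng.choice(by_intent[intent])  # then an utterance within it
        batch.append((text, intent))
    return batch
```

With 90 utterances for one intent and 10 for another, a batch drawn this way contains the two intents in roughly equal proportion, so the larger class no longer dominates each training step.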
NOISE DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING

Publication number: 20210304733

Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant to the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.

Type: Application

Filed: September 9, 2020

Publication date: September 30, 2021

Applicant: Oracle International Corporation

Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota
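A rough Python sketch of the noise augmentation step described in the abstract above follows. It rests on assumptions: the "augmentation ratio" is read here as the number of noise words inserted per original word, insertion positions are chosen at random, and the names add_noise and augment are hypothetical.

```python
import random


def add_noise(utterance, noise_words, ratio, rng):
    """Insert unrelated words at random positions in an utterance.

    Roughly `ratio` noise words are added per original word, with at
    least one insertion. This reading of "ratio" is an assumption.
    """
    tokens = utterance.split()
    n_noise = max(1, round(len(tokens) * ratio))
    for _ in range(n_noise):
        pos = rng.randint(0, len(tokens))  # upper bound is inclusive, so appending is possible
        tokens.insert(pos, rng.choice(noise_words))
    return " ".join(tokens)


def augment(training_set, noise_words, ratio=0.5, seed=0):
    """Return the original (text, intent) pairs plus one noisy copy of each."""
    rng = random.Random(seed)
    noisy = [(add_noise(text, noise_words, ratio, rng), intent)
             for text, intent in training_set]
    return list(training_set) + noisy
```

Each noisy copy keeps its original intent label and the original words in their original order, so a classifier trained on the result is pushed to ignore the inserted words.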
STREAMLINING DIALOG PROCESSING USING INTEGRATED SHARED RESOURCES

Publication number: 20210081609

Abstract: Techniques for reducing memory and processing resources used by a dialog system by sharing resources between pipelined processes of the dialog system. An integrated shared dictionary is constructed for concurrent use by automated speech recognition (ASR) and natural language understanding (NLU) subsystems of the dialog system. The integrated shared dictionary comprises multiple entries, with each entry comprising first information that is used by the ASR subsystem, second information used by the NLU subsystem, and information correlating the first information and the second information. The ASR subsystem uses the integrated shared dictionary to identify a dictionary entry containing a set of words corresponding to speech input. The dictionary entry information is communicated to the NLU subsystem, which uses the entry to generate a meaning representation for the speech input.

Type: Application

Filed: July 13, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventor: Mark Edward Johnson
TECHNIQUES FOR DIALOG PROCESSING USING CONTEXTUAL DATA

Publication number: 20210082414

Abstract: Techniques are described for using data stored for a user in association with context levels to improve the efficiency and accuracy of dialog processing tasks. A dialog system stores historical dialog data in association with a plurality of configured context levels. The dialog system receives an utterance and identifies a term for disambiguation from the utterance. Based on a determined context level, the dialog system identifies relevant historical data stored to a database. The historical data may be used to perform tasks such as resolving an ambiguity based on user preferences, disambiguating named entities based on a prior dialog, and identifying previously generated answers to queries. Based on the context level, the dialog system can efficiently identify the relevant information and use the identified information to provide a response.

Type: Application

Filed: August 26, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventor: Mark Edward Johnson
STOP WORD DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING

Publication number: 20210082400

Abstract: Techniques for stop word data augmentation for training chatbot systems in natural language processing. In one particular aspect, a computer-implemented method includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with stop words to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent; and training the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances. The augmenting includes: selecting one or more utterances from the training set of utterances, and for each selected utterance, preserving existing stop words within the utterance and replacing at least one non-stop word within the utterance with a stop word or stop word phrase selected from a list of stop words to generate an out-of-domain utterance.

Type: Application

Filed: September 9, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventors: Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Balakota Srinivas Vinnakota, Thanh Long Duong, Gautam Singaraju
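The stop word augmentation in the abstract above can be sketched in Python as follows. This is an illustration under assumptions: it replaces every non-stop word (the abstract requires only "at least one"), the stop word list is a toy list, and the label "unresolvedIntent" and the function names are hypothetical.

```python
import random

# Toy stop word list for illustration only.
STOP_WORDS = ["a", "the", "to", "of", "is", "in", "for", "my"]


def to_out_of_domain(utterance, stop_words, rng):
    """Keep existing stop words; replace every other word with a random stop word."""
    stop = set(stop_words)
    tokens = utterance.lower().split()
    replaced = [tok if tok in stop else rng.choice(stop_words) for tok in tokens]
    return " ".join(replaced)


def make_unresolved_examples(training_set, stop_words=STOP_WORDS, seed=0):
    """Build out-of-domain utterances labeled with a hypothetical unresolved intent."""
    rng = random.Random(seed)
    return [(to_out_of_domain(text, stop_words, rng), "unresolvedIntent")
            for text, _intent in training_set]
```

The generated utterances carry no content words, so they serve as examples that should not match any in-domain intent. They would be added to the original training set rather than replacing it.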
COMPRESSING NEURAL NETWORKS FOR NATURAL LANGUAGE UNDERSTANDING

Publication number: 20210081799

Abstract: A model for a natural language understanding task is generated based on labeled data generated by a labeling model. The model for the natural language understanding task is smaller than the labeling model (i.e., with lower computational and memory requirements than the combined model), but with substantially the same performance as the labeling model. In some cases, the labeling model may be generated based on a large pre-trained model.

Type: Application

Filed: July 24, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventor: Mark Edward Johnson
REDUCED TRAINING FOR DIALOG SYSTEMS USING A DATABASE

Publication number: 20210082425

Abstract: Techniques are described for training and executing a machine learning model using data derived from a database. A dialog system uses data from the database to generate related training data for natural language understanding applications. The generated training data is then used to train a machine learning model. This enables the dialog system to leverage a large amount of available data to speed up the training process as compared to conventional labeling techniques. The dialog system uses the trained machine learning model to identify a named entity from a received spoken utterance and generate and output a speech response based upon the identified named entity.

Type: Application

Filed: August 3, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventors: Mark Edward Johnson, Michael Rye Kennewick
REDUCED TRAINING INTENT RECOGNITION TECHNIQUES

Publication number: 20210082424

Abstract: The present disclosure relates generally to determining intent based upon speech input using a dialog system. More particularly, techniques are described using matching-based machine learning techniques to identify an intent corresponding to speech input in a dialog system. These procedures do not require training when intents are added or removed from the set of possible intents.

Type: Application

Filed: July 29, 2020

Publication date: March 18, 2021

Applicant: Oracle International Corporation

Inventor: Mark Edward Johnson
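The property stated in the abstract above, that intents can be added or removed without training, can be shown with a small matching-based sketch in Python. The abstract does not specify the matching model, so bag-of-words cosine similarity is used here as a stand-in, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector for a text (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class MatchingIntentRecognizer:
    """Picks the intent whose stored example best matches the utterance."""

    def __init__(self):
        self.examples = {}  # intent name -> list of example vectors

    def add_intent(self, intent, example_utterances):
        # Adding an intent only stores its examples; nothing is trained.
        self.examples[intent] = [bow(u) for u in example_utterances]

    def remove_intent(self, intent):
        self.examples.pop(intent, None)

    def predict(self, utterance):
        query = bow(utterance)
        best_intent, best_score = None, 0.0
        for intent, vectors in self.examples.items():
            score = max((cosine(query, v) for v in vectors), default=0.0)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent  # None when nothing overlaps with the utterance
```

Because prediction is a comparison against stored examples, calling add_intent or remove_intent takes effect immediately on the next predict call.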
USING A GENERATIVE ADVERSARIAL NETWORK TO TRAIN A SEMANTIC PARSER OF A DIALOG SYSTEM

Publication number: 20210074274

Abstract: Disclosed herein are techniques for using a generative adversarial network (GAN) to train a semantic parser of a dialog system. A method described herein involves accessing seed data that includes seed tuples. Each seed tuple includes a respective seed utterance and a respective seed logical form corresponding to the respective seed utterance. The method further includes training a semantic parser and a discriminator in a GAN. The semantic parser learns to map utterances to logical forms based on output from the discriminator, and the discriminator learns to recognize authentic logical forms based on output from the semantic parser. The semantic parser may then be integrated into a dialog system.

Type: Application

Filed: August 13, 2020

Publication date: March 11, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson
IMPLEMENTING A CORRECTION MODEL TO REDUCE PROPAGATION OF AUTOMATIC SPEECH RECOGNITION ERRORS

Publication number: 20210074262

Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.

Type: Application

Filed: August 13, 2020

Publication date: March 11, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson
SEMANTIC PARSER INCLUDING A COARSE SEMANTIC PARSER AND A FINE SEMANTIC PARSER

Publication number: 20210073465

Abstract: Techniques for improving a semantic parser of a dialog system, by breaking the semantic parser into a coarse semantic parser and a fine semantic parser, are described. A method described herein includes accessing an utterance received in a dialog system. The utterance is a text-based natural language expression. The method further includes applying a coarse semantic parser to the utterance to determine an intermediate logical form for the utterance. The intermediate logical form indicates one or more intents in the utterance. The method further includes applying a fine semantic parser to the intermediate logical form to determine a logical form for the utterance. The logical form is a syntactic expression of the utterance according to an established grammar, and the logical form includes one or more parameters of the one or more intents. The logical form can be used to conduct a dialog with a user of the dialog system.

Type: Application

Filed: August 13, 2020

Publication date: March 11, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson
USING BACKPROPAGATION TO TRAIN A DIALOG SYSTEM

Publication number: 20210074269

Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form, by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.

Type: Application

Filed: August 25, 2020

Publication date: March 11, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson
TASK-ORIENTED DIALOG SUITABLE FOR A STANDALONE DEVICE

Publication number: 20210065709

Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.

Type: Application

Filed: August 28, 2020

Publication date: March 4, 2021

Applicant: Oracle International Corporation

Inventors: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew David Bleeker, Serge Le Huitouze
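The dialog manager flow in the abstract above (state tracker, execution subsystem, dialog policy, context stack) can be sketched in Python. This is a toy illustration under assumptions: logical forms are plain dictionaries, the backend is a hard-coded weather table, and every name below is hypothetical.

```python
class ContextStack:
    """Keeps the logical forms of earlier turns, most recent last."""

    def __init__(self):
        self._frames = []

    def push(self, frame):
        self._frames.append(frame)

    def lookup(self, key):
        # Search from the most recent turn backwards for a filled slot.
        for frame in reversed(self._frames):
            if frame.get(key) is not None:
                return frame[key]
        return None


def track_state(input_form, context):
    """Dialog state tracker: fill empty slots from the dialog history."""
    intermediate = dict(input_form)
    for key, value in input_form.items():
        if value is None:
            intermediate[key] = context.lookup(key)
    return intermediate


WEATHER = {("paris", "today"): "sunny", ("paris", "tomorrow"): "rain"}  # toy backend


def execute(form):
    """Execution subsystem: run the intermediate logical form."""
    return WEATHER.get((form.get("city"), form.get("day")))


def dialog_policy(form, result):
    """Dialog policy: turn the execution result into an output logical form."""
    if result is None:
        return {"act": "clarify"}
    return {"act": "inform", "city": form["city"], "day": form["day"], "weather": result}


def dialog_turn(input_form, context):
    intermediate = track_state(input_form, context)
    result = execute(intermediate)
    context.push(intermediate)
    return dialog_policy(intermediate, result)
```

If a first turn asks about Paris today and a second turn leaves the city slot empty while asking about tomorrow, the state tracker fills the city from the context stack before execution.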
System and method for generating a multi-lingual and multi-intent capable semantic parser based on automatically generated operators and user-designated utterances relating to the operators

Patent number: 10579738

Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.

Type: Grant

Filed: April 5, 2018

Date of Patent: March 3, 2020

Assignee: Voicebox Technologies Corporation

Inventor: Mark Edward Johnson
Multi-lingual semantic parser based on transferred learning

Patent number: 10460036

Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).

Type: Grant

Filed: April 23, 2018

Date of Patent: October 29, 2019

Assignee: Voicebox Technologies Corporation

Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
MULTI-LINGUAL SEMANTIC PARSER BASED ON TRANSFERRED LEARNING

Publication number: 20180307679

Abstract: The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a trained model in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such transfer of knowledge may occur and be useful when the first language has sufficient training data but the second language has insufficient training data. The foregoing transfer of knowledge may extend the semantic parser for multiple languages (e.g., the first language and the second language).

Type: Application

Filed: April 23, 2018

Publication date: October 25, 2018

Inventors: Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Edward Johnson
SYSTEM AND METHOD FOR GENERATING A MULTI-LINGUAL AND MULTI-INTENT CAPABLE SEMANTIC PARSER BASED ON AUTOMATICALLY GENERATED OPERATORS AND USER-DESIGNATED UTTERANCES RELATING TO THE OPERATORS

Publication number: 20180300313

Abstract: The disclosure relates to systems and methods for generating semantic parsers based on automatically generated operators and user-designated utterances relating to the operators for use in natural language processing. The system may automatically generate multiple operators that each express a respective computer-executable instruction that resolves a request. These operators may be expressed in a manner that is machine-readable and not necessarily for consumption by a human user. The system may generate a canonical statement that expresses the request in a first manner that a human user would be able to understand. The system may generate a task, such as a crowd-sourced task, for a human user to provide an utterance that conveys the canonical statement in a second manner different than the first manner. By doing so, the system may rapidly build operators and learn how humans would utter requests resolved by instructions encoded in the operators for building semantic parsers.

Type: Application

Filed: April 5, 2018

Publication date: October 18, 2018

Applicant: Voicebox Technologies Corporation

Inventor: Mark Edward Johnson
Vectors for delivery of agents across biological membranes

Patent number: 9867845

Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.

Type: Grant

Filed: July 30, 2015

Date of Patent: January 16, 2018

Assignee: The Regents of the University of California

Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
VECTORS FOR DELIVERY OF AGENTS ACROSS BIOLOGICAL MEMBRANES

Publication number: 20160030590

Abstract: The disclosure provides for peptide-based bolaamphiphile vectors that are capable of encapsulating a variety of agents, including peptides, proteins, nucleic acids, and drugs. The disclosure further provides for delivering these agents across biological membranes using the peptide-based bolaamphiphile vectors.

Type: Application

Filed: July 30, 2015

Publication date: February 4, 2016

Inventors: Zhibin Guan, Hanxiang Zeng, Mark Edward Johnson
Semi-supervised part-of-speech tagging

Patent number: 8275607

Abstract: A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word and the selected part of speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word.

Type: Grant

Filed: December 12, 2007

Date of Patent: September 25, 2012

Assignee: Microsoft Corporation

Inventors: Kristina Nikolova Toutanova, Mark Edward Johnson
