Patents by Inventor Mark Edward Johnson

Mark Edward Johnson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12374322
    Abstract: Techniques for adjusting outlier datasets for training chatbot systems in natural language processing are disclosed. In one particular aspect, a method is provided that includes receiving a dataset that includes training or inference data. An initial set of outlier data points can be identified within the dataset based on a score of the outlier data points being above or below a threshold. The initial set can be adjusted by identifying one or more nearest neighbors, which can be included in the dataset. Outlier data points whose label matches the labels of a number of nearest neighbors exceeding a predetermined threshold can be removed from the initial set of outlier data points to generate a final set. Outlier data points of the final set can be adjusted with respect to the dataset to generate a set of training data that is used to train a machine-learning model.
    Type: Grant
    Filed: May 25, 2022
    Date of Patent: July 29, 2025
    Assignee: Oracle International Corporation
    Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Mark Edward Johnson, Thanh Long Duong
  • Patent number: 12367352
    Abstract: Deep learning techniques are disclosed for extraction of embedded data from documents. In an exemplary technique, a set of unstructured text data is received. One or more text groupings are generated by processing the set of unstructured text data. One or more text grouping embeddings are generated in a format for input to a machine learning model based on the one or more generated text groupings. One or more output predictions are generated by inputting the one or more text grouping embeddings into the machine learning model. Each output prediction of the one or more output predictions corresponds to a predicted aspect of a text grouping of the one or more text groupings.
    Type: Grant
    Filed: August 12, 2022
    Date of Patent: July 22, 2025
    Assignee: Oracle International Corporation
    Inventors: Xu Zhong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Mark Edward Johnson
  • Patent number: 12361219
    Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
    Type: Grant
    Filed: November 28, 2023
    Date of Patent: July 15, 2025
    Assignee: Oracle International Corporation
    Inventors: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
  • Publication number: 20250225129
    Abstract: Techniques for natural language processing include accessing an input string comprising a natural language utterance and a database schema representation for a database; providing the natural language utterance to a first encoder to generate one or more embeddings of the natural language utterance; providing the database schema representation to the first encoder to generate one or more embeddings of the database schema representation; encoding, by a second encoder, relations between elements in the database schema representation and words in the natural language utterance based on the one or more embeddings of the natural language utterance and the one or more embeddings of the database schema representation; and generating a logical form for the natural language utterance based on the encoded relations, the one or more embeddings of the natural language utterance, and the one or more embeddings of the database schema representation.
    Type: Application
    Filed: January 10, 2024
    Publication date: July 10, 2025
    Applicant: Oracle International Corporation
    Inventors: Cong Duy Vu Hoang, Poorya Zaremoodi, Thanh Tien Vu, Gioacchino Tangari, Mark Edward Johnson, Thanh Long Duong, Vishal Vishnoi
  • Publication number: 20250225342
    Abstract: Techniques are disclosed herein for resolving date/time expressions while transforming natural language to a logical form such as a meaning representation language. A class label for a token in a natural language utterance and a meaning representation for the natural language utterance can be predicted. The class label can be associated with a date/time expression. The meaning representation can include an operator and a value. When the value associated with the class label matches a predetermined value type or the operator matches a predetermined operator, the value and/or the operator can be modified, and an executable statement can be generated for the meaning representation. A query on a computing system can be executed using the executable statement.
    Type: Application
    Filed: January 10, 2024
    Publication date: July 10, 2025
    Applicant: Oracle International Corporation
    Inventors: Aashna Devang Kanuga, Cong Duy Vu Hoang, Mark Edward Johnson, Vasisht Raghavendra, Yuanxu Wu, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Shubham Pawankumar Shah, Vanshika Sridharan, Thanh Long Duong, Zikai Li, Diego Andres Cornejo Barra, Stephen Andrew McRitchie, Christopher Mark Broadbent, Vishal Vishnoi, Srinivasa Phani Kumar Gadde, Poorya Zaremoodi, Arash Shamaei, Thanh Tien Vu, Yakupitiyage Don Thanuja Samodhye Dharmasiri
  • Publication number: 20250218428
    Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
    Type: Application
    Filed: March 20, 2025
    Publication date: July 3, 2025
    Applicant: Oracle International Corporation
    Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
  • Patent number: 12340172
    Abstract: Techniques for improving a semantic parser of a dialog system, by breaking the semantic parser into a coarse semantic parser and a fine semantic parser, are described. A method described herein includes accessing an utterance received in a dialog system. The utterance is a text-based natural language expression. The method further includes applying a coarse semantic parser to the utterance to determine an intermediate logical form for the utterance. The intermediate logical form indicates one or more intents in the utterance. The method further includes applying a fine semantic parser to the intermediate logical form to determine a logical form for the utterance. The logical form is a syntactic expression of the utterance according to an established grammar, and the logical form includes one or more parameters of the one or more intents. The logical form can be used to conduct a dialog with a user of the dialog system.
    Type: Grant
    Filed: October 26, 2022
    Date of Patent: June 24, 2025
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20250190710
    Abstract: Techniques for augmenting training data include accessing training data comprising a plurality of training examples comprising a first training example comprising a first natural language utterance and a first logical form for the first natural language utterance. A second natural language utterance is generated by adding or replacing one or more values in the first natural language utterance. A logical form for the second natural language utterance is generated. A second training example is generated, comprising the second natural language utterance and the logical form for the second natural language utterance. The training data is augmented by adding the second training example to the plurality of training examples to generate an augmented training data set. A machine learning model is trained to generate logical forms for utterances using the augmented training data set.
    Type: Application
    Filed: December 6, 2023
    Publication date: June 12, 2025
    Applicant: Oracle International Corporation
    Inventors: Philip Arthur, Gioacchino Tangari, Nitika Mathur, Aashna Devang Kanuga, Cong Duy Vu Hoang, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20250156649
    Abstract: Techniques are disclosed herein for improving model robustness on operators and triggering keywords in natural language to a meaning representation language system. The techniques include augmenting an original set of training data for a target robustness bucket by leveraging a combination of two training data generation techniques: (1) modification of existing training examples and (2) synthetic template-based example generation. The augmented data examples produced by the two training data generation techniques are appended to the original set of training data to generate an augmented training data set, and the augmented training data set is used to train a machine learning model to generate logical forms for utterances.
    Type: Application
    Filed: November 9, 2023
    Publication date: May 15, 2025
    Applicant: Oracle International Corporation
    Inventors: Gioacchino Tangari, Chang Xu, Nitika Mathur, Philip Arthur, Syed Najam Abbas Zaidi, Aashna Devang Kanuga, Cong Duy Vu Hoang, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi
  • Publication number: 20250157209
    Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, including the text from the key fields and each of the tables, can be output.
    Type: Application
    Filed: December 26, 2024
    Publication date: May 15, 2025
    Applicant: Oracle International Corporation
    Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Xu Zhong, Ahmed Ataallah Ataallah Abobakr, Hongtao Yang, Budhaditya Saha, Shaoke Xu, Shashi Prasad Suravarapu, Mark Edward Johnson, Thanh Long Duong
  • Patent number: 12299402
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Grant
    Filed: May 9, 2024
    Date of Patent: May 13, 2025
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Patent number: 12293155
    Abstract: A method includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of OOD examples, and generating augmented batches of utterances that include utterances from the training set of utterances and utterances from the filtered data set of OOD examples, based on the difficulty value for each OOD example. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
    Type: Grant
    Filed: April 9, 2024
    Date of Patent: May 6, 2025
    Assignee: Oracle International Corporation
    Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Thanh Long Duong, Mark Edward Johnson, Poorya Zaremoodi, Gautam Singaraju, Ying Xu, Vladislav Blinov, Yu-Heng Hong
  • Patent number: 12288550
    Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
    Type: Grant
    Filed: September 23, 2022
    Date of Patent: April 29, 2025
    Assignee: Oracle International Corporation
    Inventors: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
  • Publication number: 20250117585
    Abstract: In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and when the length of the utterance exceeds a pre-determined threshold of token pieces: dividing the utterance into a plurality of overlapping chunks of token pieces; assigning a label together with a confidence score for each token piece in a chunk; determining a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determining a final annotated label for the utterance based at least on the merging the two confidence scores; and storing the final annotated label in a memory.
    Type: Application
    Filed: December 19, 2024
    Publication date: April 10, 2025
    Applicant: Oracle International Corporation
    Inventors: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong, Ying Xu, Poorya Zaremoodi, Omid Mohamad Nezami, Budhaditya Saha, Cong Duy Vu Hoang
  • Publication number: 20250117591
    Abstract: Techniques are disclosed for using logit values to classify utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
    Type: Application
    Filed: December 19, 2024
    Publication date: April 10, 2025
    Applicant: Oracle International Corporation
    Inventors: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20250094465
    Abstract: Techniques are disclosed herein for executing an execution plan for a digital assistant with generative artificial intelligence (genAI). A first genAI model can generate a list of executable actions based on an utterance provided by a user. An execution plan can be generated to include the executable actions. The execution plan can be executed by performing an iterative process for each of the executable actions. The iterative process can include identifying an action type, invoking one or more states, and executing, by the one or more states, the executable action using an asset to obtain an output. A second prompt can be generated based on the output obtained from executing each of the executable actions. A second genAI model can generate a response to the utterance based on the second prompt.
    Type: Application
    Filed: September 5, 2024
    Publication date: March 20, 2025
    Applicant: Oracle International Corporation
    Inventors: Xin Xu, Bhagya Gayathri Hettige, Srinivasa Phani Kumar Gadde, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vanshika Sridharan, Vishal Vishnoi, Mark Edward Johnson
  • Publication number: 20250094725
    Abstract: Techniques are disclosed herein for implementing digital assistants using generative artificial intelligence. An input prompt comprising a natural language utterance and candidate agents and associated actions can be constructed. An execution plan can be generated using a first generative artificial intelligence model based on the input prompt. The execution plan can be executed to perform actions included in the execution plan using agents indicated by the execution plan. A response to the natural language utterance can be generated by a second generative artificial intelligence model using one or more outputs from executing the execution plan.
    Type: Application
    Filed: April 2, 2024
    Publication date: March 20, 2025
    Applicant: Oracle International Corporation
    Inventors: Vishal Vishnoi, Xin Xu, Diego Andres Cornejo Barra, Ying Xu, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Aashna Devang Kanuga, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson
  • Publication number: 20250068627
    Abstract: Techniques are disclosed herein for transforming natural language conversations into a visual output. In one aspect, a computer-implemented method includes generating an input string by concatenating a natural language utterance with a schema representation comprising a set of entities for visualization actions, generating, by a first encoder of a machine learning model, one or more embeddings of the input string, encoding, by a second encoder of the machine learning model, relations between elements in the schema representation and words in the natural language utterance based on the one or more embeddings, generating, by a grammar-based decoder of the machine learning model and based on the encoded relations and the one or more embeddings, an intermediate logical form that represents at least the query, the one or more visualization actions, or the combination thereof, and generating, based on the intermediate logical form, a command for a computing system.
    Type: Application
    Filed: March 26, 2024
    Publication date: February 27, 2025
    Applicant: Oracle International Corporation
    Inventors: Cong Duy Vu Hoang, Gioacchino Tangari, Stephen Andrew McRitchie, Nitika Mathur, Aashna Devang Kanuga, Steve Wai-Chun Siu, Dalu Guo, Chang Xu, Mark Edward Johnson, Christopher Mark Broadbent, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Chandan Basavaraju, Kenneth Khiaw Hong Eng
  • Publication number: 20250068626
    Abstract: The present disclosure relates to manufacturing training data by leveraging an automated pipeline that manufactures visualization training datasets to train a machine learning model to convert a natural language utterance into a meaning representation language logical form that includes one or more visualization actions. Aspects are directed towards accessing an original training dataset, a visualization query dataset, an incremental visualization dataset, a manipulation visualization dataset, or any combination thereof. One or more visualization training datasets are generated by: (i) modifying examples in the original training dataset, the visualization query dataset, or both to include visualization actions, (ii) generating examples, using the incremental visualization dataset, the manipulation visualization dataset, or both, that include visualization actions, or (iii) both (i) and (ii).
    Type: Application
    Filed: March 1, 2024
    Publication date: February 27, 2025
    Applicant: Oracle International Corporation
    Inventors: Gioacchino Tangari, Steve Wai-Chun Siu, Dalu Guo, Cong Duy Vu Hoang, Berk Sarioz, Chang Xu, Stephen Andrew McRitchie, Mark Edward Johnson, Christopher Mark Broadbent, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Chandan Basavaraju, Kenneth Khiaw Hong Eng
  • Patent number: 12236321
    Abstract: The present disclosure relates to chatbot systems, and more particularly, to batching techniques for handling unbalanced training data when training a model such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias-eliminating distribution is determined, and a subset of the plurality of raw utterances is batched according to that distribution. The resulting unbiased training data may be input into a prediction model for training the prediction model. The trained prediction model may be obtained and utilized to predict unbiased results from new inputs received by the trained prediction model.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: February 25, 2025
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Balakota Srinivas Vinnakota, Yu-Heng Hong, Elias Luqman Jalaluddin
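
The outlier-filtering step summarized in the abstract of patent 12374322 can be illustrated with a short Python sketch. It is not the patented implementation: the Euclidean distance (`math.dist`), the rule that a point is an initial outlier when its score is above `score_threshold`, the neighbor count `k`, and the choice to simply drop the final outliers from the training data are all assumptions, and `final_outliers` and `drop_outliers` are hypothetical names.

```python
import math


def final_outliers(points, labels, scores, score_threshold, k, agreement_threshold):
    """Return indices of outliers that survive a nearest-neighbor label check."""
    # Initial set (assumption): points whose outlier score is above the threshold.
    initial = [i for i, s in enumerate(scores) if s > score_threshold]
    final = []
    for i in initial:
        # The k nearest neighbors of point i, excluding the point itself.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        # Count neighbors whose label matches the outlier's label.
        matches = sum(1 for j in neighbors if labels[j] == labels[i])
        # Points that agree with enough neighbors are removed from the outlier set.
        if matches <= agreement_threshold:
            final.append(i)
    return final


def drop_outliers(points, labels, outlier_indices):
    """One possible adjustment (assumption): remove final outliers from the data."""
    drop = set(outlier_indices)
    return [(p, y) for idx, (p, y) in enumerate(zip(points, labels)) if idx not in drop]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (0.5, 0.5)]
labels = ["a", "a", "a", "a", "b"]
scores = [0.1, 0.1, 0.1, 0.9, 0.8]
outliers = final_outliers(points, labels, scores, score_threshold=0.5, k=3, agreement_threshold=1)
training_data = drop_outliers(points, labels, outliers)
```

Here points 3 and 4 both start as outliers. Point 3 is cleared because two of its three nearest neighbors share its label `"a"`, while point 4 (label `"b"`, surrounded by `"a"` points) stays in the final set and is dropped from the training data.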
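
The chunk-and-merge procedure in the abstract of publication 20250117585 can be sketched in Python as below. The abstract does not say how the two confidence scores for an overlapping region are merged; keeping the more confident per-token prediction is an assumption made here. `chunk_spans`, `label_long_utterance`, and `toy_labeler` (a stand-in for a real named entity recognizer) are hypothetical names.

```python
from typing import Optional


def chunk_spans(n_tokens: int, max_len: int, overlap: int) -> list[tuple[int, int]]:
    """Return [start, end) spans of overlapping chunks that cover n_tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if n_tokens <= max_len:
        return [(0, n_tokens)]  # short utterances are not split
    step = max_len - overlap
    spans, start = [], 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            return spans
        start += step


def label_long_utterance(tokens, labeler, max_len, overlap):
    """Label every token, merging predictions where chunks overlap."""
    best: list[Optional[tuple[str, float]]] = [None] * len(tokens)
    for start, end in chunk_spans(len(tokens), max_len, overlap):
        for offset, (label, conf) in enumerate(labeler(tokens[start:end])):
            i = start + offset
            # Merge rule (assumption): keep the more confident of the two predictions.
            if best[i] is None or conf > best[i][1]:
                best[i] = (label, conf)
    return best


def toy_labeler(chunk):
    # Hypothetical recognizer: capitalized tokens are entities, and confidence
    # falls off with the token's position inside the chunk.
    return [("ENT" if t[0].isupper() else "O", 1.0 / (1 + k)) for k, t in enumerate(chunk)]


tokens = ["Alice", "met", "Bob", "in", "Paris", "today"]
result = label_long_utterance(tokens, toy_labeler, max_len=4, overlap=2)
```

With `max_len=4` and `overlap=2`, the six tokens are split into the chunks `[0, 4)` and `[2, 6)`, and tokens 2 and 3 take the higher-confidence label from whichever chunk scored them better.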
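
The batching idea in the abstract of patent 12236321 can be approximated by drawing the same number of utterances from every label in each batch, so minority intents are oversampled rather than drowned out. The abstract does not specify the bias-eliminating distribution; a uniform per-label distribution is assumed here, and `balanced_batches` is a hypothetical helper, not the patented method.

```python
from collections import defaultdict
from itertools import cycle, islice


def balanced_batches(utterances, labels, per_class, n_batches):
    """Build batches holding `per_class` utterances of every label."""
    by_label = defaultdict(list)
    for utterance, label in zip(utterances, labels):
        by_label[label].append(utterance)
    # One endless stream per label; small classes repeat, i.e. they are oversampled.
    streams = {label: cycle(items) for label, items in sorted(by_label.items())}
    batches = []
    for _ in range(n_batches):
        batch = []
        for label, stream in streams.items():
            batch.extend((u, label) for u in islice(stream, per_class))
        batches.append(batch)
    return batches


# Imbalanced toy data: nine "greet" utterances and one "cancel" utterance.
utterances = [f"hello {i}" for i in range(9)] + ["cancel my order"]
labels = ["greet"] * 9 + ["cancel"]
batches = balanced_batches(utterances, labels, per_class=2, n_batches=3)
```

Every batch then contains two utterances of each intent even though 90% of the raw data is `greet`. Cycling is used only to keep the sketch deterministic; a production system would more likely sample randomly and might weight classes differently.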